Department of Electrical Engineering

Vehicle Detection in Monochrome Images

Master’s thesis (Examensarbete) performed in Image Processing
at the Institute of Technology, Linköping University
by
Marcus Lundagårds

LiTH-ISY-EX--08/4148--SE
Linköping 2008

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Supervisors (Handledare): Ognjan Hedberg, Autoliv Electronics AB
Fredrik Tjärnström, Autoliv Electronics AB
Klas Nordberg, ISY, Linköpings universitet
Examiner (Examinator): Klas Nordberg, ISY, Linköpings universitet
Linköping, 28 May 2008
Avdelning, Institution / Division, Department: Division of Computer Vision, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Datum / Date: 2008-05-28
Språk / Language: English
Rapporttyp / Report category: Examensarbete (Master’s thesis)
URL för elektronisk version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11819
ISBN: —
ISRN: LiTH-ISY-EX--08/4148--SE
Serietitel och serienummer / Title of series, numbering: —
ISSN: —
Titel / Title: Detektering av Fordon i Monokroma Bilder / Vehicle Detection in Monochrome Images
Författare / Author: Marcus Lundagårds
Nyckelord / Keywords: vehicle detection, edge based detection, shadow based detection, motion based detection, mono camera system
Abstract
The purpose of this master’s thesis was to study computer vision algorithms for vehicle detection in monochrome images captured by a mono camera. The work has mainly been focused on detecting rear-view cars in daylight conditions. Previous work in the literature has been reviewed, and algorithms based on edges, shadows and motion as vehicle cues have been modified, implemented and evaluated.
This work presents a combination of a multiscale edge based detection and a shadow based detection as the most promising algorithm, with a positive detection rate of 96.4% on vehicles at distances between 5 m and 30 m.
For the algorithm to work in a complete system for vehicle detection, future work should focus on developing a vehicle classifier to reject false detections.
Acknowledgments
This thesis project is the ﬁnal part of the educational program in Applied Physics
and Electrical Engineering at Linköping University. The work has been carried
out at Autoliv Electronics AB in Mjärdevi, Linköping.
I would like to take this opportunity to thank the people who have helped me during this work: my tutors Ognjan Hedberg and Fredrik Tjärnström for their helpfulness and valuable advice, Klas Nordberg for his theoretical input, Salah Hadi for showing interest in my thesis work, and Alexander Vikström for his work on the performance evaluation tool I used to evaluate my algorithms.
Contents

1 Introduction
  1.1 Background
  1.2 Thesis Objective
  1.3 Problem Conditions
  1.4 System Overview
  1.5 Report Structure
2 Theory
  2.1 Vanishing Points and the Hough Transform
  2.2 Edge Detection
  2.3 The Correspondence Problem
    2.3.1 The Harris Operator
    2.3.2 Normalized Correlation
    2.3.3 Lucas-Kanade Tracking
  2.4 Optical Flow
  2.5 Epipolar Geometry
    2.5.1 Homogeneous Coordinates
    2.5.2 The Fundamental Matrix
    2.5.3 Normalized Eight-Point Algorithm
    2.5.4 RANSAC
3 Vehicle Detection Approaches
  3.1 Knowledge Based Methods
    3.1.1 Color
    3.1.2 Corners and Edges
    3.1.3 Shadow
    3.1.4 Symmetry
    3.1.5 Texture
    3.1.6 Vehicle Lights
  3.2 Stereo Based Methods
  3.3 Motion Based Methods
4 Implemented Algorithms
  4.1 Calculation of the Vanishing Point
  4.2 Distance to Vehicle and Size Constraint
  4.3 Vehicle Model
  4.4 Edge Based Detection
    4.4.1 Method Outline
  4.5 Shadow Based Detection
    4.5.1 Method Outline
  4.6 Motion Based Detection
    4.6.1 Method Outline
  4.7 Complexity
5 Results and Evaluation
  5.1 Tests on Edge Based and Shadow Based Detection
    5.1.1 Edge Based Detection Tests
    5.1.2 Shadow Based Detection Tests
    5.1.3 Tests on Combining Edge Based and Shadow Based Detection
  5.2 Tests on Motion Based Detection
6 Conclusions
Bibliography
Chapter 1
Introduction
This chapter introduces the problem to be addressed. Some background is given
along with the thesis’ objective, a discussion of the problem conditions and a
system overview. Finally, the structure of the report is outlined.
1.1 Background
Road traffic accidents account for an estimated 1.2 million deaths and up to 50
million injuries worldwide every year [1]. Furthermore, the costs of these accidents
add up to a shocking 1–3% of the world’s Gross National Product [2]. As the world
leader in automotive safety, Autoliv is continuously developing products to reduce
the risks associated with driving. In Linköping, different vision-based systems which
aim to help the driver are being developed.
One example is Autoliv’s Night Vision system, shown in Figure 1.1, which improves
the driver’s vision at night using an infrared camera. The camera, installed in the
front of the car, detects heat from objects and is calibrated to be especially sensitive
to the temperature of humans and animals. The view from the infrared camera is
projected on a display in front of the driver.
Figure 1.1. Autoliv’s Night Vision system.
Detection of obstacles, such as vehicles or pedestrians, is a vital part of such
a system. In a current project, CCD (Charge-Coupled Device) image sensors and
stereo vision capture visible light and are used as the base for a driver-aid system.
The motive for this thesis is Autoliv’s wish to investigate what can be achieved
with a mono camera system as far as vehicle detection is concerned.
1.2 Thesis Objective
As shown in Figure 1.2, vehicle detection is basically a two-step process consisting
of detection and classification. Detection is the step where the image is scanned
for ROIs (Regions Of Interest), i.e., vehicle hypotheses in this case. The detector is
often used together with a classifier which eliminates false hypotheses. Classification
is therefore the process of deciding whether or not a particular ROI contains
a vehicle. A classifier is typically trained on a large amount of test data, both
positive (vehicles) and negative (non-vehicles).
The objectives of the individual steps vary between approaches, but in general
the detector aims to overestimate the number of ROIs so as not to miss vehicles.
The task of the classifier is then to discard as many of the false vehicle detections
as possible. A tracker can be used to further improve the performance of the system.
By tracking regions which have been classified as vehicles in multiple consecutive
frames, the system can behave more stably over time.
This thesis focuses on the detection step, aiming to investigate existing algorithms
for detection of vehicles in monochrome (i.e., grayscale) images from a
mono camera. From this study, three of them are implemented in MATLAB and
tested on data provided by Autoliv. Improvements are made by modifications
of the existing algorithms. Furthermore, the complexity of the algorithms is discussed.
Herein, vehicle detection will refer to the detection step, excluding the
classification.
1.3 Problem Conditions
Different approaches to vehicle detection have been proposed in the literature,
as will be further discussed in Chapter 3. Creating a robust system for vehicle
detection is a very complex problem. There are numerous difficulties that need to
be taken into account:
• Vehicles can be of different size, shape and color. Furthermore, a vehicle can
be observed from different angles, making the definition of a vehicle even
broader.
• Lighting and weather conditions vary substantially. Rain, snow, fog, daylight
and darkness must all be taken into account when designing the system.
• Vehicles might be occluded by other vehicles, buildings, etc.
• For a pre-crash system to serve its purpose, it is crucial for the system to
achieve real-time performance.
(a) Initial image. (b) Detected ROIs. (c) The ROIs have been classified as vehicles and non-vehicles.
Figure 1.2. Illustration of the two-step vehicle detection strategy.
Due to these variabilities in conditions, it is absolutely necessary to strictly
define and delimit the problem. Detecting all vehicles in every possible situation
is not realistic. The work in this thesis focuses on detecting fully visible rear-view
cars and SUVs during daytime. Trucks are not prioritized. To the largest possible
extent, the algorithms are designed to detect vehicles in various weather conditions
(excluding night scenes) and at any distance. The issue of real-time performance
is outside the scope of this thesis.
1.4 System Overview
The system that captured the test data consists of a pair of forward-directed
CCD image sensors mounted on the vehicle’s rear-view mirror and a computer
used for information storage. Lens distortion compensation is performed on the
monochrome images, and every second horizontal line is removed to avoid artifacts
due to interlacing. The output of the system is therefore image frames with half
the original vertical resolution, 720×240 pixels in size. The frame rate is 30 Hz.
To investigate the performance of a mono system, only the information from the
left image sensor has been used.
1.5 Report Structure
This report is organized as follows: Chapter 2 explains different theoretical
concepts needed in later chapters. Chapter 3 describes the approaches to vehicle
detection that have been proposed in the literature. The algorithms chosen for
implementation are presented in more detail in Chapter 4. The results from the
evaluation of the algorithms follow in Chapter 5. Finally, Chapter 6 summarizes
the conclusions drawn.
Chapter 2
Theory
This chapter introduces the theory and concepts needed to comprehend the methods
used for vehicle detection. The different sections are fairly independent and can
be read separately. It is up to the reader to decide whether to read this
chapter in its full extent before continuing the report, or to follow the references
to the necessary theoretical explanations from Chapter 4, which describes the
implemented algorithms.
2.1 Vanishing Points and the Hough Transform
Figure 2.1. Vanishing points. (a) The three possible vanishing points (from [32]). (b) The interesting vanishing point in vehicle detection.
Vanishing points are defined as points in the image plane where parallel
lines in 3D space converge. As shown in Figure 2.1(a), there are at most
three vanishing points in an image. The interesting vanishing point in the
application of vehicle detection is the point located on the horizon, seen in Figure
2.1(b). This point is valuable because the vertical distance in pixels between the
vanishing point and the bottom of a vehicle can yield a distance measure between
the camera and the vehicle under certain assumptions.
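To make this distance measure concrete: under a flat-road pinhole model, a road point at distance d projects f·h/d pixels below the vanishing point, where f is the focal length in pixels and h the camera height above the road, so d = f·h/Δy. The sketch below illustrates this (in Python rather than the thesis’s MATLAB; all parameter values and names are illustrative, not taken from the thesis).

```python
def distance_to_vehicle(y_bottom, y_vp, focal_px, cam_height_m):
    """Flat-road pinhole model: a road point at distance d projects
    focal_px * cam_height_m / d pixels below the vanishing point,
    so d = f * h / dy.  All names and values here are illustrative."""
    dy = y_bottom - y_vp          # pixel offset below the horizon
    if dy <= 0:
        raise ValueError("vehicle bottom must lie below the vanishing point")
    return focal_px * cam_height_m / dy

# With f = 400 px and the camera 1.2 m above the road, a vehicle whose
# bottom edge is 16 px below the vanishing point is 480/16 = 30 m away.
d = distance_to_vehicle(y_bottom=216.0, y_vp=200.0, focal_px=400.0,
                        cam_height_m=1.2)
```

Note how the measure degrades gracefully: the farther the vehicle, the smaller Δy, which is why an accurate vanishing point estimate matters most at long range.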
Most of the methods (e.g., [4], [5]) proposed in the literature to calculate the
vanishing points depend on the Hough transform [3] to locate dominant line segments.
The vanishing point is then determined as the intersection point of these line
segments. Due to measurement noise, there is usually not a unique intersection
point, and some sort of error target function is therefore minimized to find the best
candidate.
The Hough transform, illustrated in Figure 2.2, maps every point (x, y) in the
image plane to a sinusoidal curve in the Hough space (ρθ-space) according to:

x cos θ + y sin θ = ρ

where ρ can be interpreted as the perpendicular distance between the origin and
a line passing through the point (x, y), and θ as the angle between the x-axis and
the normal of the same line.
The sinusoidal curves from different points along the same line in the image
plane will intersect in the same point in the Hough space, superimposing the value
at that point. Every point in the Hough space transforms back to a line in the
image plane. By thresholding the Hough space one can therefore detect dominant
line segments.
Figure 2.2. The Hough transform transforms a point in the image plane to a sinusoidal
curve in the Hough space. All image points on the same line will intersect in a common
point in the Hough space (from [30]).
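The voting scheme just described can be sketched in a few lines. This is a minimal accumulator, not a production detector: it takes a list of edge points rather than an edge map, and the bin counts n_theta and n_rho are arbitrary illustrative choices.

```python
import numpy as np

def hough_accumulate(points, rho_max, n_theta=180, n_rho=200):
    """Minimal Hough accumulator for the line model x cos t + y sin t = rho.

    Each point votes along its sinusoid in (rho, theta) space; points on
    a common line vote for a common cell, which then dominates."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for px, py in points:
        r = px * np.cos(thetas) + py * np.sin(thetas)   # rho(theta) sinusoid
        idx = np.round((r + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1               # one vote per column
    return acc, rhos, thetas

# Twenty points on the horizontal line y = 5 give one dominant peak
# near (rho, theta) = (5, pi/2).
pts = [(float(px), 5.0) for px in range(20)]
acc, rhos, thetas = hough_accumulate(pts, rho_max=30.0)
ri, ti = np.unravel_index(np.argmax(acc), acc.shape)
```

Thresholding `acc`, as the text describes, then yields the dominant line segments.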
2.2 Edge Detection
Edges are important features in image processing. They arise from sharp changes
in image intensity and can, for example, indicate depth discontinuities or changes
of material. Edges are detected by estimating image gradients, which indicate how
the intensity changes over an image. The simplest approach to estimating
gradients is to use first-order discrete differences, e.g., the symmetric difference:

∂I/∂x = ( I(x + ∆x, y) − I(x − ∆x, y) ) / (2∆x)

Combining this differentiation with an averaging filter in the direction perpendicular
to the differentiation yields the famous Sobel filter. Edges are detected as
pixels with a high absolute response when convolving an image with a Sobel filter.
The following Sobel kernel detects vertical edges:

s_x = (1/4)[1  2  1]^T · (1/2)[1  0  −1] = (1/8) [ 1  0  −1
                                                   2  0  −2
                                                   1  0  −1 ].
To detect horizontal edges, s_y = s_x^T is used as the Sobel kernel. If both the
vertical and horizontal edge maps I_x = I ∗ s_x and I_y = I ∗ s_y have been calculated,
the magnitude of the gradient is given by the vector norm

|∇I(x, y)| = √( I_x²(x, y) + I_y²(x, y) ).

One of the most widely used edge detection techniques was introduced by John F.
Canny in 1986. It is known as the Canny edge detector and detects edges by
searching for local maxima in the gradient norm |∇I(x, y)|. A more detailed
description of the Canny edge detector can be found in [7].
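A minimal sketch of the edge maps and gradient magnitude described above (Python/NumPy here, whereas the thesis’s implementation is in MATLAB; the naive double loop stands in for a proper convolution routine):

```python
import numpy as np

# Sobel kernels as in the text: a [1 2 1]/4 average perpendicular to, and a
# symmetric difference [1 0 -1]/2 along, the gradient direction (factor 1/8).
SX = np.array([[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]]) / 8.0   # responds to vertical edges
SY = SX.T                           # responds to horizontal edges

def conv2_valid(img, k3):
    """'Valid' 2-D convolution of img with a 3x3 kernel (kernel flipped)."""
    kf = k3[::-1, ::-1]
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kf)
    return out

def gradient_magnitude(img):
    """|grad I| = sqrt(Ix^2 + Iy^2) with Ix = I * sx, Iy = I * sy."""
    ix = conv2_valid(img, SX)
    iy = conv2_valid(img, SY)
    return np.sqrt(ix ** 2 + iy ** 2)

# A vertical intensity step: the response peaks at the step and is zero
# in the flat regions on either side.
step = np.tile([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], (6, 1))
mag = gradient_magnitude(step)
```

Thresholding `mag` gives the binary edge maps that the edge based detection in Chapter 4 builds its profiles from.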
2.3 The Correspondence Problem
The correspondence problem refers to the problem of finding a set of corresponding
points in two images taken of the same 3D scene from different views (see Figure
2.3). Two points correspond if they are the projections onto the respective image
planes of the same 3D point. This is a fundamental and well-studied problem in the
field of computer vision. Although humans solve this problem easily, it is a very
hard problem to solve automatically by a computer.
Solving the correspondence problem for every point in an image is seldom
desirable, or even possible. Apart from the massive computational effort of such
an operation, the aperture problem makes it impossible to match some points
unambiguously. The aperture problem, shown in Figure 2.4, arises when one-dimensional
structures in motion are viewed through an aperture. Within the
aperture it is impossible to match points on such a structure between two consecutive
image frames, since the perceptual system is faced with an ambiguity in the
direction of motion. This clearly shows that some points are inappropriate to match
against others. Thus, to solve the correspondence problem, first a way to extract
points suitable for matching is needed; the Harris operator described below is a
popular such method. Secondly, point correspondences must be found; Sections
2.3.2 and 2.3.3 describe two different ways of achieving just that.
Figure 2.3. Finding corresponding points in two images of the same scene is called the
correspondence problem.
Figure 2.4. The aperture problem. Despite the fact that the lines move diagonally, only
horizontal motion can be observed through the aperture.
2.3.1 The Harris Operator
Interesting points in an image are often called feature points. The properties of
such points are not clearly defined; instead, they depend on the problem at hand.
Points suited for matching typically contain local two-dimensional structure, e.g.,
corners. The Harris operator [6] is probably the most used method to extract such
feature points.
First, image gradients are estimated, e.g., using Sobel filters. Then the 2×2
structure tensor matrix T is calculated as

∇I(x) = [ I_x  I_y ]^T

T = ∇I(x) ∇I(x)^T = [ I_x I_x   I_x I_y
                      I_y I_x   I_y I_y ].

The Harris response is then calculated as

H(x) = det T(x) − c tr² T(x).

The constant c has been assigned different values in the literature, typically in the
range 0.04–0.05, which empirically has proven to work well. Local maxima in
the Harris response indicate feature points, here defined as points with sufficient
two-dimensional structure. In Figure 2.5, feature points have been extracted using
the Harris operator.
Figure 2.5. Feature points extracted with the Harris operator
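The computation of the Harris response can be sketched as follows. One practical detail is made explicit in the code: the structure tensor is summed over a small neighbourhood, as is usual in practice, since a single pixel’s outer-product tensor always has det T = 0; the window size, the gradient estimator and c = 0.04 are illustrative choices, not prescriptions from the thesis.

```python
import numpy as np

def harris_response(img, c=0.04, win=1):
    """Harris response H = det(T) - c * tr(T)^2 at each interior pixel.

    The structure tensor entries are summed over a (2*win+1)^2 window;
    window size and c are illustrative choices."""
    iy, ix = np.gradient(img)                 # simple gradient estimates
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy
    H = np.zeros_like(img, dtype=float)
    rows, cols = img.shape
    for r in range(win, rows - win):
        for q in range(win, cols - win):
            w = (slice(r - win, r + win + 1), slice(q - win, q + win + 1))
            a, d, b = ixx[w].sum(), iyy[w].sum(), ixy[w].sum()
            H[r, q] = (a * d - b * b) - c * (a + d) ** 2
    return H

# A bright square on a dark background: corners give strong positive
# responses, straight edges give negative ones, flat regions give zero.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
H = harris_response(img)
```

Taking local maxima of `H` above a threshold then yields the feature points shown in Figure 2.5.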
2.3.2 Normalized Correlation
Correlation can be used for template matching, i.e., to find a small region in
an image which matches a template. In particular, putative point correspondences
can be found by comparing a template region t around one feature point against
regions around all feature points (x, y) in another image I. In its simplest form,
correlation is defined as

C(x, y) = Σ_{α,β} I(x + α, y + β) t(α, β)

where a high response indicates a good match between the region and the
template. The pixel in the centre of the template is assumed to be t(0, 0).
However, in unipolar images (with only positive intensity values) this correlation
formula can give a high response in a region even though the region does not fit
the template at all. This is due to the fact that regions with high intensity yield
higher responses, since no normalization is performed.
A couple of different normalized correlation formulas are common. While the
first only normalizes the correlation with the norms of the region and the template,
the second also subtracts the mean intensity values of the image region, µ_I, and
the template, µ_t:

C(x, y) = Σ_{α,β} I(x + α, y + β) t(α, β) / ( √(Σ_{α,β} I²(x + α, y + β)) · √(Σ_{α,β} t²(α, β)) )

C(x, y) = Σ_{α,β} (I(x + α, y + β) − µ_I)(t(α, β) − µ_t) / ( √(Σ_{α,β} (I(x + α, y + β) − µ_I)²) · √(Σ_{α,β} (t(α, β) − µ_t)²) )
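The second, mean-subtracted formula can be sketched for a single region/template pair; scanning over all candidate feature points is then just a loop over regions. The patch values below are made up for illustration:

```python
import numpy as np

def zncc(region, template):
    """Zero-mean normalized cross-correlation of two equal-size patches
    (the mean-subtracted formula).  Returns a value in [-1, 1]."""
    r = region - region.mean()
    t = template - template.mean()
    denom = np.sqrt((r * r).sum()) * np.sqrt((t * t).sum())
    return float((r * t).sum() / denom)

# The score is invariant to gain and offset changes in the image,
# which the plain (unnormalized) correlation formula is not.
tpl = np.array([[0.0, 1.0],
                [1.0, 0.0]])
same = zncc(tpl, tpl)                    # identical patch
brighter = zncc(3.0 * tpl + 10.0, tpl)   # same pattern, higher gain/offset
inverted = zncc(1.0 - tpl, tpl)          # contrast-reversed pattern
```

This gain/offset invariance is exactly why the mean-subtracted variant is preferred when illumination differs between the two views.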
2.3.3 Lucas-Kanade Tracking
Another way of deciding point correspondences is to extract feature points in one
of the images and track them in the other image using Lucas-Kanade tracking [25].
Although an affine motion field is first introduced as the motion model in [25], it is
reduced to a pure translation, since inter-frame motion is usually small. The
tracking therefore consists of solving for d in the equation

J(x + d) = I(x)     (2.1)

for two images I and J, a point x = [x  y]^T and a translational motion d.
Equation (2.1) can be written as

J(x + d/2) = I(x − d/2)     (2.2)
to make it symmetric with respect to both images. Because of image noise, changes
in illumination, etc., Equation (2.2) is rarely satisfied exactly. Therefore, the
dissimilarity

ε = ∫_W ( J(x + d/2) − I(x − d/2) )² w(x) dx     (2.3)

is minimized by solving

∂ε/∂d = 0.

The weight function w(x) is usually set to 1. In [8] it is shown that solving
Equation (2.3) is approximately equivalent to solving

Zd = e
where Z is the 2×2 matrix

Z = ∫_W g(x) g^T(x) dx

and e the 2×1 vector

e = ∫_W ( I(x) − J(x) ) g(x) w(x) dx,

where

g(x) = [ ∂/∂x( (I + J)/2 )   ∂/∂y( (I + J)/2 ) ]^T.

In practice, the Lucas-Kanade method is often used in an iterative scheme,
where the interesting region of the image is interpolated anew in each iteration.
2.4 Optical Flow
The problem of determining the optical flow between two images is closely related
to the correspondence problem described in Section 2.3. The optical flow is an
estimate of the apparent motion of each pixel in an image between two image
frames, i.e., the flow of image intensity values. The optical flow should not be
confused with the motion field, which is the real motion of an object in a 3D scene
projected onto the image plane [9]. The two are identical only if the object does not
change the image intensity while moving.
There are a number of different approaches to computing the optical flow,
of which one derived by Lucas and Kanade [10] will be briefly discussed here.
Assuming that two regions in an image are identical apart from a translational motion,
the optical flow is derived from the equation [33]:
I(x + ∆x, y + ∆y, t + ∆t) = I(x, y, t).     (2.4)

Equation (2.4) states that a translation vector (∆x, ∆y) exists such that the image
intensity at I(x, y, t) is, after a time ∆t, located at I(x + ∆x, y + ∆y, t + ∆t).
Rewriting the equation using a Taylor series of first order yields

∆x I_x + ∆y I_y + ∆t I_t = 0.     (2.5)

After division by ∆t, and with u = ∆x/∆t and v = ∆y/∆t, the equation for optical
flow is given as

∇I^T [u  v]^T = −I_t.     (2.6)
With one equation and two unknowns, further assumptions must be made in order
to solve for the optical ﬂow. The classical approach was proposed by Horn and
Schunck [11] but other methods are found in the literature.
2.5 Epipolar Geometry
Epipolar geometry describes the geometry of two views, i.e., stereo vision. Given a
single image, the 3D point corresponding to a point in the image plane must lie on
a straight line passing through the camera centre and the image point. Because of
the loss of one dimension when a 3D point is projected onto an image plane, it is
impossible to reconstruct the world coordinates of that point from a single image.
However, with two images of the same scene taken from different angles, the 3D
point can be calculated by determining the intersection of the two straight
lines passing through the respective camera centres and image points. Such a line,
projected onto the image plane of a camera at a different view point, is known
as the epipolar line of that image point.
The epipole is the point in one of the images where the camera centre of the
other image is projected. Another way to put it is that the epipoles are
the intersections between the two image planes and the line passing through the two
camera centres.
Figure 2.6. The projection of a 3D point X onto two image planes. The camera centres C_1 and C_2, image coordinates x_1 and x_2, epipolar lines l_1 and l_2, and epipoles e_1 and e_2 are shown in the figure.
2.5.1 Homogeneous Coordinates
Homogeneous coordinates are the representation used in projective geometry to
project a 3D scene onto a 2D image plane. The homogeneous representation of a
point x = [x  y]^T in an image plane is x_h = [cx  cy  c]^T for any non-zero
constant c. Thus, all vectors separated by a constant factor c are equivalent, and
such a vector space is called a projective space. It is common in computer vision to
set c = 1 in the homogeneous representation, so that the other elements represent
actual coordinates in the metric unit chosen.
In computer vision, homogeneous coordinates are convenient in that they
can express an affine transformation, e.g., a rotation and a translation, as a single
matrix operation by rewriting

y = Ax + b

into

[ y ; 1 ] = [ A  b ; 0 … 0  1 ] [ x ; 1 ].

In this way, affine transformations can be combined simply by multiplying their
matrices.
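A small sketch of this composition trick (the rotation angle, translation and test point are illustrative numbers; any affine map works the same way):

```python
import numpy as np

def affine_to_homogeneous(A, b):
    """Embed y = A x + b into one 3x3 matrix acting on [x, y, 1]^T."""
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = b
    return M

# A 90-degree rotation followed by a translation, combined into a single
# matrix so both steps are applied with one product.
rot = affine_to_homogeneous(np.array([[0.0, -1.0],
                                      [1.0, 0.0]]), np.zeros(2))
shift = affine_to_homogeneous(np.eye(2), np.array([3.0, 1.0]))
combined = shift @ rot                       # rotate first, then translate
q = combined @ np.array([2.0, 0.0, 1.0])     # the point (2, 0), homogeneous
```

The point (2, 0) rotates to (0, 2) and then shifts to (3, 3); chains of any length collapse into one matrix the same way, which is what makes the representation so convenient.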
2.5.2 The Fundamental Matrix
The fundamental matrix F is the algebraic representation of epipolar geometry. It
is a 3×3 matrix of rank two and depends on the two cameras’ internal parameters
and relative pose.
The epipolar constraint describes how corresponding points in a two-view
geometry relate and is defined as

x_h2^T F x_h1 = 0.

Here x_h1 is the homogeneous representation of the projection of a 3D point in the
first image, and x_h2 the coordinates of the same 3D point in the second image.
This is a necessary but not sufficient condition for point correspondence. Therefore,
one can only discard putative correspondences, not confirm them.
The fundamental matrix can be calculated either by using the camera calibration
matrices and the relative pose [13], or by using known point correspondences
as described in Section 2.5.3.
2.5.3 Normalized Eight-Point Algorithm
The normalized eight-point algorithm [12] estimates the fundamental matrix between
two stereo images from eight corresponding point pairs. The eight points
in each image are first transformed to place their centroid at the origin.
The coordinate system is also scaled to make the mean distance from the origin
to a point equal to √2. This normalization makes the algorithm more resistant to
noise by ensuring that the point coordinates are in the same size range as the 1 in the
homogeneous representation x = (cx, cy, c) = (x, y, 1).
The normalization is done by multiplying the homogeneous coordinates of the
eight points with the normalization matrix P [13]:

P = [ α  0  −αx_c
      0  α  −αy_c
      0  0   1 ]

where (x_c, y_c) are the coordinates of the centroid of the eight points and α is the
scale factor, defined by

x_c = (1/8) Σ_{i=1}^{8} x_i,    y_c = (1/8) Σ_{i=1}^{8} y_i,

α = 8√2 / Σ_{i=1}^{8} √( (x_i − x_c)² + (y_i − y_c)² ).
After normalization, the problem consists of minimizing

F_v^T M^T M F_v = ∥M F_v∥²   while   ∥F_v∥² = 1

where each corresponding point pair (x_1, y_1) ↔ (x_2, y_2) gives a vector

Y = [ x_1x_2  x_1y_2  x_1  y_1x_2  y_1y_2  y_1  x_2  y_2  1 ]^T,

M = [ Y_1^T ; Y_2^T ; … ; Y_8^T ],

F_v = [ F_11  F_21  F_31  F_12  F_22  F_32  F_13  F_23  F_33 ]^T.
This is a total least squares problem, and the standard solution is to choose F_v as
the eigenvector of M^T M belonging to the smallest eigenvalue [12]. The vector
F_v is then reshaped into a 3×3 matrix F_est in the reverse order to that in which it
was reshaped into a vector.
To ensure that the estimated fundamental matrix has rank two, the norm

∥F_opt − F_est∥

is minimized under the constraint

det F_opt = 0.
This problem can be solved using Singular Value Decomposition [13]. Let

F_est = UDV^T

where U and V are orthogonal matrices and the diagonal matrix D consists of
the singular values:

D = diag(r, s, t),    r ≥ s ≥ t.

The solution is given by F_opt = U diag(r, s, 0) V^T [13]. Finally, the fundamental
matrix is denormalized according to

F = P_2^T F_opt P_1

where P_1 and P_2 are the transformation matrices for images one and two, respectively [13].
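The whole estimation chain of this section, normalization, total least squares via SVD, rank-two enforcement and denormalization, fits in a short sketch. One harmless deviation from the text is noted in the code: the rows below vectorize F row by row instead of using the column-major F_v ordering, which yields the same matrix without extra reshuffling. The demo data is synthetic:

```python
import numpy as np

def normalization_matrix(pts):
    """The matrix P above: centroid to the origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    alpha = np.sqrt(2) * len(pts) / np.linalg.norm(pts - c, axis=1).sum()
    return np.array([[alpha, 0.0, -alpha * c[0]],
                     [0.0, alpha, -alpha * c[1]],
                     [0.0, 0.0, 1.0]])

def eight_point(x1, x2):
    """Estimate F (with x2^T F x1 = 0) from N >= 8 correspondences.

    x1, x2 are N x 2 arrays.  Rows of M vectorize F row by row (same
    result as the text's column-major F_v, just a different ordering)."""
    P1, P2 = normalization_matrix(x1), normalization_matrix(x2)
    h1 = np.column_stack([x1, np.ones(len(x1))]) @ P1.T  # normalized points
    h2 = np.column_stack([x2, np.ones(len(x2))]) @ P2.T
    # One row per correspondence, from expanding x2^T F x1 = 0.
    M = np.column_stack([h2[:, :1] * h1, h2[:, 1:2] * h1, h1])
    _, _, Vt = np.linalg.svd(M)
    F_est = Vt[-1].reshape(3, 3)         # singular vector of smallest s.v.
    U, D, Vt2 = np.linalg.svd(F_est)
    F_opt = U @ np.diag([D[0], D[1], 0.0]) @ Vt2   # closest rank-2 matrix
    return P2.T @ F_opt @ P1             # denormalize

# Noise-free synthetic check: project random 3D points into two views.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (12, 3)) + np.array([0.0, 0.0, 6.0])
x1 = X[:, :2] / X[:, 2:]                 # view 1: pinhole at the origin
Xs = X + np.array([1.0, 0.5, 0.0])       # view 2: camera translated
x2 = Xs[:, :2] / Xs[:, 2:]
F = eight_point(x1, x2)
```

With exact correspondences, every pair satisfies the epipolar constraint to numerical precision; with noisy data, the normalization is what keeps the estimate well conditioned.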
2.5.4 RANSAC
RANSAC [13], short for Random Sample Consensus, is an iterative method for
estimating the parameters of a mathematical model from a set of observed data which
contains outliers. In computer vision, RANSAC is often used to estimate the
fundamental matrix given a set of putative point correspondences. Its advantage
is its ability to give robust estimates even when there are outliers in the data set.
However, there is no upper bound on the computation time if RANSAC is to find the
optimal parameter estimates.
The use of RANSAC to estimate the fundamental matrix between two stereo
images from a set of putative point correspondences is described in the following
algorithm description. Note that this is only one of many variants of RANSAC.
1. Choose eight random point pairs from a larger set of putative point correspondences.
2. Estimate F with the normalized eight-point algorithm using the eight point
pairs.
3. If ∥F_opt − F_est∥ (see Section 2.5.3) is below a certain threshold:
• Evaluate the estimate F by determining the number of correspondences
that agree with this fundamental matrix. If this is the best estimate so
far, save it.
4. Repeat from step 1 until a maximum number of iterations has been processed.
5. Return F along with the consensus set of corresponding point pairs.
In step 3, the number of corresponding point pairs is calculated as the number
of points that are close enough to their respective epipolar lines.
The epipolar lines
and the normalized sum of distances to the epipolar lines are defined as follows:

l_1 = F^T x_h2,    l_2 = F x_h1,

d_sum = |x_h1^T l_1| / √(l_1,1² + l_1,2²) + |x_h2^T l_2| / √(l_2,1² + l_2,2²)

where l_i,j denotes the j-th component of l_i. A threshold is applied to d_sum to
separate inliers from outliers. Because of the normalization of the epipolar lines,
the distance measure will be in pixels.
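The loop in steps 1–5 is generic in the model being fitted. To keep the sketch self-contained it is shown here for a 2-D line fitted from two-point samples instead of a fundamental matrix from eight pairs; the sample-estimate-count-keep-best structure is identical, and the thresholded point-to-model distance plays the role of d_sum. Iteration count and threshold are illustrative.

```python
import numpy as np

def ransac_line(pts, n_iter=200, thresh=0.05, rng=None):
    """RANSAC loop of steps 1-5, fitting a 2-D line a*x + b*y + c = 0.

    Minimal sample: two points (the analogue of the eight pairs).
    Consensus: points whose distance to the line is below `thresh`
    (the analogue of thresholding d_sum)."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_model, best_inliers = None, np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)  # 1. sample
        (x1, y1), (x2, y2) = pts[i], pts[j]
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2       # 2. fit model
        norm = np.hypot(a, b)
        if norm < 1e-12:                                    # degenerate pair
            continue
        a, b, c = a / norm, b / norm, c / norm              # unit normal, so
        dist = np.abs(pts @ np.array([a, b]) + c)           # dist is in pixels
        inliers = dist < thresh                             # 3. consensus set
        if inliers.sum() > best_inliers.sum():              # keep best so far
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers                         # 5. model + set

# Thirty exact points on y = 0.5 x + 1 plus ten gross outliers.
rng = np.random.default_rng(42)
xs = np.linspace(0.0, 10.0, 30)
good = np.column_stack([xs, 0.5 * xs + 1.0])
bad = rng.uniform(-20.0, 20.0, (10, 2))
line, mask = ransac_line(np.vstack([good, bad]))
```

A least-squares fit to the same data would be dragged off by the outliers; RANSAC recovers the line because only consensus size, not residual magnitude, drives the choice of model.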
Chapter 3
Vehicle Detection Approaches
As mentioned in Section 1.2, the purpose of the detection step in a vision based
vehicle detection system is to find ROIs. How well this is achieved is a matter
of definitions. Commonly used measures are the percentage of detected vehicles,
detected non-vehicles, the alignment of found ROIs, etc. There are essentially three
different approaches to vehicle detection proposed in the literature: knowledge
based, stereo based and motion based [14].
3.1 Knowledge Based Methods
The knowledge based methods all use a priori image information to extract ROIs.
Different cues have been proposed in the literature, and systems often combine two
or more of these cues to make the detection more robust.
3.1.1 Color
Color information could possibly be used to distinguish vehicles from the background.
Examples exist where road segmentation has been performed using the color cue
[15]. This thesis investigates vehicle detection in monochrome images, and therefore
no further research has been made concerning the color cue.
3.1.2 Corners and Edges
Man-made objects like vehicles contain a high degree of corners and edges compared
to the background, from whichever view they are looked upon. Although
corners might not always be very well-defined at feasible image resolutions, the
edge cue is probably the most exploited of the knowledge based approaches.
In the case of rear-view vehicle detection, a vehicle model of two vertical (corresponding
to the left and right sides) and two horizontal (bottom and top) edges
could be used. This model holds for front-view detection as well. Sun et al. [26]
describe a system that detects vehicles by computing vertical and horizontal edge
profiles at three different levels of detail (see Figure 3.1). A horizontal edge candidate
corresponding to the bottom of the vehicle is then combined with left and
right side candidates to form ROIs. Wenner [27] uses sliding windows of different
sizes to better capture local edge structures. Edges are detected within the image
region delimited by the window instead of using global edge profiles for each row
and column.
Figure 3.1. The left column shows images at different scales. The second and third columns show vertical and horizontal edge maps. The right column shows the edge profiles used in [26].
Jung and Schramm [31] describe an interesting way of detecting rectangles in an image. By sliding a window over the image, the Hough transform of small regions is computed. Rectangles can then be detected based on certain geometrical relations in the Hough space, e.g., that the line segments delimiting the rectangle appear in pairs and that the two pairs are separated by a 90° angle. This is shown in Figure 3.2. Using the fact that the camera and the vehicles are located on the same ground plane, i.e., the road, could simplify the model further by not only assuming ∆θ = 90° but locking the θ parameters to 0° and 90° for the two line pairs respectively.
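The geometric test on the Hough peaks can be sketched as follows (a simplified check; representing the peaks as (ρ, θ) pairs in degrees and the tolerance values are illustrative assumptions, not taken from [31]):

```python
import numpy as np

def find_rectangle(peaks, angle_tol=3.0, rho_tol=2.0):
    """Search a list of Hough peaks (rho, theta in degrees) for four peaks
    forming a rectangle centred at the origin:
      - peaks come in pairs with equal theta and opposite rho, and
      - the two pairs are separated by a 90 degree angle.
    Returns the two (theta, half-extent) pairs, or None."""
    pairs = []
    for i, (r1, t1) in enumerate(peaks):
        for r2, t2 in peaks[i + 1:]:
            if abs(t1 - t2) < angle_tol and abs(r1 + r2) < rho_tol:
                pairs.append((t1, abs(r1)))
    for i, (ta, wa) in enumerate(pairs):
        for tb, wb in pairs[i + 1:]:
            if abs(abs(ta - tb) - 90.0) < angle_tol:
                return (ta, wa), (tb, wb)
    return None
```

Under the ground-plane simplification above, the search could be restricted to pairs with θ near 0° and 90° instead of testing all angle combinations.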
Edge detection is fairly fast to compute and the detection scheme is simple to
comprehend. On the downside, other man-made objects like buildings, rails, etc.
can confuse a system only based on the edge cue.
(a) (b)
Figure 3.2. Properties shown for Hough peaks corresponding to the four sides of a
rectangle centered at the origin (from [31]).
3.1.3 Shadow
The fact that the area underneath a vehicle is darker than the surrounding road due to the shadow of the vehicle has been suggested as a sign pattern for vehicle detection in a number of articles [16] [17]. Although this technique can yield very good results in perfect conditions, it suffers in scenes with changing illumination.
Tzomakas et al. [18] partly overcame this problem by deriving an upper threshold for the shadow based on the intensity of the free driving space (i.e., the road). After extracting the free driving space, they calculated the mean and standard deviation of a Gaussian curve fitted to the intensity values of the road area. The upper threshold was then set to µ − 3σ, where µ and σ are the road mean and standard deviation respectively. They combined the detected shadows with horizontal edge extraction to distill ROIs.
Figure 3.3. Low sun from the side misaligns the ROIs when using the shadow cue.
The most serious drawback of the shadow cue is scenes with low sun (Figure 3.3), making vehicles cast long shadows. Hence, the detected shadows become wider in the case of sun from the side, or ill-positioned in the case of the camera facing the sun. Even though the shadow is still darker beneath the vehicle than beside it, this is a very weak cue for aligning the ROIs. Surprisingly, this problem has not been encountered during the literature study.
3.1.4 Symmetry
Symmetry is another sign of objects created by mankind. Vehicles include a fair amount of symmetry, especially in the rear-view. Kim et al. [19] used symmetry as a complement to the shadow cue to better align the left and right sides of the ROI. A disadvantage with this cue is that the free driving space is very symmetrical too, making symmetry unsuitable as a standalone detection cue.
3.1.5 Texture
Little research has been done on texture as an object detection cue. Kalinke et al. [20] used entropy to find ROIs. The local entropy within a window was calculated and regions with high entropy were considered as possible vehicles. They proposed energy, contrast and correlation as other possible texture cues.
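The entropy cue can be sketched as follows (a simplified non-overlapping-window variant; the window size, bin count and the exact windowing scheme of Kalinke et al. are assumptions here):

```python
import numpy as np

def local_entropy(image, win=8, bins=32):
    """Entropy of the grey-level histogram inside each win x win window
    (non-overlapping, for brevity). High-entropy windows are texture-rich
    and would be kept as possible vehicle regions."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    out = np.zeros((h // win, w // win))
    for i in range(h // win):
        for j in range(w // win):
            patch = img[i * win:(i + 1) * win, j * win:(j + 1) * win]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]
            out[i, j] = -(p * np.log2(p)).sum()   # Shannon entropy in bits
    return out
```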
3.1.6 Vehicle Lights
Vehicle lights could be used as a night-time detection cue [19]. However, the vehicle lights detection scheme should only be seen as a complement to other techniques. Brighter illumination and the fact that vehicle lights are not compulsory during daytime in many countries make it unsuitable for a robust vehicle detection system.
3.2 Stereo Based Methods
Vehicle detection based on stereo vision uses either the disparity map or Inverse Perspective Mapping. The disparity map is generated by solving the correspondence problem for every pixel in the left and right image and shows the difference between the two views. From the disparity map a disparity histogram can be calculated. Since the rear-view of a vehicle is a vertical surface, and the points on the surface therefore are at the same distance from the camera, it should occur as a peak in the histogram [21].
The Inverse Perspective Mapping transforms an image point onto a horizontal
plane in the 3D space. Zhao et al. [22] used this to transform all points in the
left image onto the ground plane and reproject them back onto the right image.
Then they compared the result with the true right image to detect points above
the ground plane as obstacles.
Since this thesis only deals with detection methods based on images from one
camera, the stereo based approach has not been investigated further.
3.3 Motion Based Methods
As opposed to the previous methods, motion based approaches use temporal information to detect ROIs. The basic idea is to detect vehicles by their motion. In systems with fixed cameras (e.g., a traffic surveillance system) this is rather straightforward, performed by background subtraction: two consecutive image frames are subtracted and the result is thresholded in order to extract moving objects. However, the problem becomes significantly more complex in an on-road system because of the ego-motion of the camera.
At least two different approaches to solve this problem are possible. The first would be to compute the dense optical flow between two consecutive frames, solving the correspondence problem for every pixel [23]. Although it would, in theory, be possible to calculate the dense optical flow and detect moving objects as areas of diverging flow (compared to the dominant background flow), this is very time consuming and not a practical solution. In addition, the aperture problem makes it impossible to estimate the motion of every pixel.
Another, more realistic, approach would be to compute a sparse optical flow. This could be done by extracting distinct feature points (e.g., corners) and solving the correspondence problem for these points. Either feature points are extracted from both image frames and point pairs are matched using normalized correlation, or feature points are extracted from one image frame and tracked to the other using, e.g., Lucas-Kanade tracking [25].
By carefully selecting feature points from the background and not from moving vehicles, the fundamental matrix describing the ego-motion of the camera could be estimated from the putative point matches using RANSAC and the eight-point algorithm. In theory, the fundamental matrix could then be used to find outliers in the set of putative point matches. Such an outlier could either originate from a false point match or from a point on a moving object that does not meet the motion constraints of the background [24].
However, reality is such that very few feature points can be extracted from the
homogeneous road. On the other hand, vehicles contain a lot of feature points,
e.g., corners and edges. Therefore, the problem of only choosing feature points
from the background is quite intricate.
A possible advantage of the motion based approach could be in detecting partly occluded vehicles, e.g., overtaking vehicles. Such a vehicle should cause a diverging motion field even though the whole vehicle is not visible to the camera. An obvious disadvantage is the fact that a method solely based on motion cannot detect stationary vehicles like parked cars. This is a major drawback as stationary vehicles can cause dangerous situations if parked in the wrong place. On the same premises, slow vehicles are hard to detect using motion as the only detection cue.
A property well worth noting is that motion based methods detect all moving objects, not just vehicles. This could be an advantage as well as a disadvantage. All moving objects, such as bicycles, pedestrians, etc., could be interesting to detect. Combined with an algorithm that detects the road area this could be useful. However, there is no easy way of distinguishing a vehicle from other moving objects without using other cues than motion.
Chapter 4
Implemented Algorithms
The purpose of the literature study was to choose 2-3 promising algorithms to implement and test on data provided by Autoliv. Preferably, they could also be modified in order to improve performance. As described in Chapter 3, two fundamentally different approaches were possible: on one hand the knowledge based approach, using a priori information from the images; on the other hand the motion based approach, using the fact that moving vehicles move differently than the background in an image sequence.
After weighing the pros and cons of each method studied in the literature, two
knowledge based and one motion based approach were chosen. These algorithms
have all been implemented in MATLAB and are described in detail in this chapter.
All of them have been modiﬁed in diﬀerent ways to improve performance. First,
however, the calculation of the vanishing point will be explained.
4.1 Calculation of the Vanishing Point
All the implemented algorithms need the y-coordinate of the vanishing point (Section 2.1) to calculate a distance measure from the camera to a vehicle and to determine size constraints for a vehicle based on its location in the image. In this implementation a static estimate of the vanishing point is calculated based on the pitch angle of the camera, determined during calibration. Since only the y-coordinate is needed for the distance calculation, the x-coordinate is not estimated.
The test data have been captured with a CCD camera with a field of view (FOV) in the x-direction of 48°. The image size is 720x240 pixels. The ratio between the height and width of one pixel can be calculated from the known intrinsic parameters f_x and f_y. From these data, the y-coordinate of the vanishing point is calculated as:

f_x = f / s_x
f_y = f / s_y
FOV_y = (240/720) · (s_y/s_x) · FOV_x = (240/720) · (f_x/f_y) · FOV_x
y_vp = 240/2 + 240 · α / FOV_y

where f is the focal length, s_x and s_y are the pixel width and height respectively and α is the pitch angle, defined positive downwards.
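In code, the calculation reads roughly as follows (a Python sketch of the formulas above; the thesis' implementation is in MATLAB, and the intrinsics and pitch angle are assumed to come from calibration):

```python
import math

def vanishing_point_y(fov_x_deg, fx, fy, pitch_deg, width=720, height=240):
    """y-coordinate of the vanishing point from the camera pitch angle.

    fov_x_deg : horizontal field of view in degrees (48 for the test camera).
    fx, fy    : intrinsic focal lengths in pixels (f/s_x and f/s_y).
    pitch_deg : pitch angle, defined positive downwards.
    """
    # FOV_y = (height/width) * (s_y/s_x) * FOV_x, and s_y/s_x = f_x/f_y.
    fov_y = (height / width) * (fx / fy) * fov_x_deg
    return height / 2 + height * pitch_deg / fov_y
```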
4.2 Distance to Vehicle and Size Constraint
To determine the allowed width interval in pixels of a bottom candidate at vertical distance ∆y from the vanishing point, the angle this distance corresponds to in the camera is calculated as

β = (∆y / 240) · FOV_y.

Assuming that the road is flat and that the vehicles are located on the road, the distance to the vehicle in meters can be calculated as

l = H_cam / tan β

where H_cam is the height of the camera above the ground in meters. Finally, assuming that the vehicle is located on the same longitudinal axis as the ego vehicle, the width in pixels of a vehicle is determined as

w = (720 / FOV_x) · 2 arctan(W / (2l))

where W is the width in meters of a vehicle. The upper and lower bound for vehicle width in meters generate an interval in pixels of allowed vehicle widths.
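The derivation above can be sketched as follows (a Python sketch; the camera height and the 1.0-2.6 m width bounds are taken from the surrounding text, while the example parameter values are placeholders):

```python
import math

def expected_width_px(dy, fov_y_deg, cam_height_m, vehicle_width_m,
                      fov_x_deg=48.0, width=720, height=240):
    """Distance (m) and expected pixel width of a vehicle whose bottom lies
    dy pixels below the vanishing point, under the flat-road assumption."""
    beta = math.radians(dy / height * fov_y_deg)   # angle below the horizon
    dist = cam_height_m / math.tan(beta)           # l = H_cam / tan(beta)
    # w = (720 / FOV_x) * 2 * arctan(W / (2 l)), converted back to degrees
    w = width * 2 * math.degrees(math.atan(vehicle_width_m / (2 * dist))) / fov_x_deg
    return dist, w

def width_interval(dy, fov_y_deg, cam_height_m, w_min=1.0, w_max=2.6):
    """Allowed pixel-width interval for a bottom candidate at offset dy."""
    _, lo = expected_width_px(dy, fov_y_deg, cam_height_m, w_min)
    _, hi = expected_width_px(dy, fov_y_deg, cam_height_m, w_max)
    return lo, hi
```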
4.3 Vehicle Model
A vehicle is assumed to be less than 2.6 m in width. This is the maximum allowed width for a vehicle in Sweden [29] and many other countries have approximately the same regulation. A lower limit of 1.0 m is also set. In the same way, vehicles are assumed to be between 1.0 m and 2.0 m in height. Note that the upper limit is set low to avoid unnecessary false detections, since truck detection is not prioritized. The bottom edge is assumed to be a light-to-dark edge looking bottom-up. In the same way, the left edge is assumed to be a light-to-dark edge and the right edge to be a dark-to-light edge looking left to right. This holds because the algorithm only looks for the edges generated by the transition between road and wheels when finding the side edges.
4.4 Edge Based Detection
This method uses a multiscale edge detection technique to perform rear-view vehicle detection. It is inspired by the work done by Sun et al. [26] and Wenner [27] but has been modified to improve performance in the current application.
The basic idea is that a rear-view vehicle is detected as two horizontal lines corresponding to its bottom and top and two vertical lines corresponding to its left and right sides. ROIs not satisfying constraints on size, based on the distance from the camera, are rejected. The same goes for ROIs asymmetric around a vertical line in the middle of the ROI and ROIs with too small variance. The method uses both a coarse and a fine scale to improve robustness. Figure 4.1 shows different steps of the edge based detection scheme.
4.4.1 Method Outline
Edges, both vertical and horizontal, are extracted by convolving the original image with Sobel operators (Section 2.2). Edges a certain distance above the horizon cannot be part of a vehicle, so to save computation time that area is ignored.
Candidates for vehicle bottoms are found by sliding a window of 1xN pixels
over the horizontal edge image, adding up the values inside the window. Local
minima are found for each column and window size and candidates beneath a
certain threshold, corresponding to a vehicle darker than the road, are kept.
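The sliding-window search can be sketched as follows (a simplified single-scale NumPy version; the actual implementation varies the window size and works on two scales, and the threshold here is illustrative):

```python
import numpy as np

def bottom_candidates(h_edges, win=31, thresh=-50.0):
    """Find vehicle-bottom candidates in a horizontal edge image.

    A 1 x win window slides over every row; window sums below `thresh`
    (i.e. strong light-to-dark edges, vehicle darker than road) that are
    local minima in their column are kept as (row, col) candidates.
    """
    kernel = np.ones(win)
    sums = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"),
                               1, h_edges.astype(float))
    cands = []
    for c in range(sums.shape[1]):
        col = sums[:, c]
        for r in range(1, len(col) - 1):
            if col[r] < thresh and col[r] <= col[r - 1] and col[r] <= col[r + 1]:
                cands.append((r, c))
    return cands
```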
Candidates not meeting the constraints on width described in Section 4.2 are rejected. Next, the algorithm tries to find corresponding left and right sides to the bottom candidates. Upon each bottom candidate a column summation of the vertical edge image is performed with a window size of w_bottom/8 x 1 pixels. Left sides, defined as light-to-dark edges, are searched for close to the left side of the bottom candidate, and right sides, defined as dark-to-light edges, are assumed to lie to the right of the bottom candidate. Each combination of left and right sides is saved along with the corresponding bottom as a candidate. A bottom candidate without detected left or right sides is discarded.
Vehicle top edges, defined as any kind of horizontal edge of a certain magnitude, were extracted in the same process as the bottom edges. Now, each candidate is matched against all vehicle top edges to complete the ROI rectangles. Using geometry, a height interval in pixels is decided in which a top edge must be found in order to keep the candidate. This height is calculated as

θ_1 = arctan((H − H_cam) / l)
θ_2 = arctan(H_cam / l)
h = 240 · (θ_1 + θ_2) / FOV_y
where one upper and one lower bound on the height H in meters of a vehicle give an interval of the vehicle height h in pixels. The parameter l is the distance from the camera to the vehicle as derived in Section 4.2. The highest located top edge within this interval completes the ROI. If no top edge is found within the interval, the candidate is discarded. Another scale test is performed to reject candidates not meeting the width constraint criteria.

(a) Bottom candidates meeting size constraints described in Section 4.2.
(b) Candidates with detected left and right sides.
(c) Candidates with detected top edges.
(d) Asymmetric, homogeneous and redundant candidates rejected.
Figure 4.1. The edge based detection scheme.
The ROIs are now checked against symmetry and variance constraints to discard more false detections. To calculate an asymmetry measure, the right half of the ROI is flipped and the median of the squared differences between the left half and the flipped right half is determined. An upper bound threshold on the asymmetry decides whether or not to discard a ROI. Likewise, a ROI is discarded if the variance over rows is below a certain threshold. The image intensity is summed up row-wise within the ROI and the variance is calculated on these row sums. Because of this constraint, possible homogeneous candidates covering, e.g., the road are hopefully rejected. Finally, small ROIs within larger ones are rejected to get rid of ROIs covering, e.g., vehicle windows.
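The two rejection tests can be sketched as follows (a NumPy sketch; the threshold values are illustrative, not the tuned values used in the thesis):

```python
import numpy as np

def asymmetry(roi):
    """Median squared difference between the left half of the ROI and the
    mirrored right half; low values indicate a symmetric, vehicle-like ROI."""
    half = roi.shape[1] // 2
    left = roi[:, :half].astype(float)
    right = np.fliplr(roi[:, -half:]).astype(float)
    return np.median((left - right) ** 2)

def row_variance(roi):
    """Variance of the row-wise intensity sums; near zero for homogeneous
    regions such as empty road."""
    return np.var(roi.astype(float).sum(axis=1))

def keep_roi(roi, asym_max=100.0, var_min=1.0):
    """Keep a ROI only if it is symmetric enough and not homogeneous."""
    return asymmetry(roi) <= asym_max and row_variance(roi) >= var_min
```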
4.5 Shadow Based Detection
The shadow based detection algorithm implemented is based on the work by Tzomakas et al. [18]. The free driving space is detected and local means and standard deviations of the road are calculated. An upper shadow threshold of µ − 3σ is applied and the result is combined with a horizontal edge detection to create vehicle bottom candidates. A fixed aspect ratio is used to complete the ROIs. Figure 4.2 shows different steps of the shadow based detection scheme.
4.5.1 Method Outline
The free driving space is first estimated as the lowest central homogeneous region in the image delimited by edges. This is done by detecting edges using the Canny detector (Section 2.2) and then adding pixels to the free driving space in a bottom-up scheme until the first edge is encountered. The y-coordinate of the vanishing point is used as an upper bound of the free driving space in the case of no edges being present. The image is cropped at the bottom to discard edges on the ego-vehicle, and on the sides to prevent non-road regions from being included in the free driving space estimate. Figures 4.3 and 4.4 show a couple of examples.
As opposed to Tzomakas et al., this algorithm estimates a local mean and standard deviation for each row of the free driving space. This is to better capture the local road intensity. As seen in Figure 4.4(c), problems occur when vehicles, road markings or shadows occlude parts of the road, making it impossible, with this algorithm, to detect road points in front of these obstacles. To deal with this problem, the mean is extrapolated using linear regression to rows where no road pixels exist to average over. For these rows the standard deviation of the closest row estimate is used.
The image is thresholded below the horizon using an upper bound on the shadow. This threshold is calculated as µ − 3σ, where µ and σ are the road mean and standard deviation for the row including the current image point. Horizontal edges are extracted by simply subtracting a row-shifted copy from the original image and thresholding the result. By an AND-operation, interesting regions are extracted as regions classified as both shadows and light-to-dark (counting bottom-up) horizontal edges.

(a) The free driving space.
(b) Thresholded image to extract shadows.
(c) Regions classified as both shadows and light-to-dark, horizontal edges.
(d) Final vehicle candidates.
Figure 4.2. The shadow based detection scheme.
(a) Initial image.
(b) Edge map created with the Canny detector.
(c) Detected road area.
Figure 4.3. Detection of the free driving space.
After morphological closing, to remove small holes in the shadows, and opening, to remove noise, segmentation is performed. Each candidate's bottom position is then decided as the row with the most points belonging to the shadow region. The left and right borders are placed at the leftmost and rightmost points of the candidate's bottom row, and a fixed aspect ratio decides the height of the ROI. The ROIs are checked against size constraints as described in Section 4.2. In the same way as in the edge based detection scheme, asymmetric ROIs are discarded.

(a) Initial image.
(b) Edge map created with the Canny detector.
(c) Detected road area.
Figure 4.4. Detection of the free driving space.
4.6 Motion Based Detection
The idea of the following algorithm (inspired by the work done by Yamaguchi et al. [24]) was to look at two consecutive image frames as a stereo view. This can be done as long as the ego-vehicle is moving, since two image frames taken at different points in time then capture the same scene from two different positions. The method outline was initially the following:
• Extract a number of feature points in each image using the Harris operator (Section 2.3.1).
• Determine a set of point correspondences between the two frames using either Lucas-Kanade tracking (Section 2.3.3) or normalized correlation (Section 2.3.2).
• Estimate the fundamental matrix from background point correspondences using RANSAC and the eight-point algorithm (Section 2.5).
• Use the epipolar constraint to detect outliers as points on moving objects or falsely matched point correspondences.
• Create ROIs from these outliers.
However, a major difficulty turned out to be the problem of detecting points on vehicles as outliers. Consider two image frames from a mono system where the camera translates forward. The epipole will coincide with the vanishing point in such a system [13]. This implies that all epipolar lines will pass through the vanishing point on the horizon. Therefore, points on vehicles located on the road will translate along their corresponding epipolar lines, either towards or away from the vanishing point. Since the epipolar constraint can only be used to reject points away from their epipolar lines as outliers, and not confirm points close to their epipolar lines as inliers, it is impossible to detect points on moving vehicles on the road as outliers. Figure 4.5 illustrates the problem. This is a major issue, though not encountered in the literature.
Instead, a couple of assumptions were made in order to modify the motion
based algorithm to detect certain vehicles:
• Points on vehicles are used when estimating the fundamental matrix. Since
they are moving along their epipolar lines their direction is consistent with
the epipolar constraint for the background motion. This means that the
signiﬁcant problem of deciding which points to base the estimate on vanishes.
• The vehicle on which the camera is mounted is assumed to move forward.
Thus, the background can be assumed to move towards the camera and
points moving away from the camera can be detected either as points on
overtaking vehicles or mismatched points.
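The direction test underlying the second assumption can be sketched as follows (a NumPy sketch; the angular tolerance is an illustrative parameter, not a value from the thesis):

```python
import numpy as np

def moving_towards_vp(p_prev, p_curr, vp, tol_deg=30.0):
    """True if a tracked point moves towards the vanishing point, i.e.
    appears to move away from the forward-translating camera (candidate
    overtaking vehicle or mismatch). Background points move the other way.

    p_prev, p_curr, vp : 2D image points (x, y).
    """
    motion = np.asarray(p_curr, float) - np.asarray(p_prev, float)
    to_vp = np.asarray(vp, float) - np.asarray(p_prev, float)
    n = np.linalg.norm(motion) * np.linalg.norm(to_vp)
    if n == 0:
        return False
    # Keep points whose motion vector points at the vanishing point
    # within the angular tolerance.
    cos_angle = motion @ to_vp / n
    return cos_angle > np.cos(np.radians(tol_deg))
```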
Figure 4.5. Two consecutive image frames, matched point pairs and their epipolar lines. Points on vehicles move along their epipolar lines.
Since this method cannot detect vehicles moving towards, or at the same distance from, the camera it is solely to be seen as a complement algorithm. It has potential to complement the other two algorithms, especially in the case of overtaking vehicles not yet fully visible in the images. However, the alignment of the ROIs cannot be expected to be very precise, since only a limited number of points are available to base each ROI upon. If points moving towards the vanishing point are not detected on each of the four edges delimiting the vehicle, the ROI will be misaligned.
4.6.1 Method Outline
The implemented algorithm uses two image frames separated by 1/15 s to detect vehicles. By convolving the current image frame with Sobel kernels, edge maps are obtained. These edge maps are used to extract feature points with the Harris operator. Since feature points tend to appear in clusters, the image is divided horizontally into three equal subimages from which an equal number of feature points are extracted, as long as their Harris response meets a minimum threshold.
The extracted feature points are tracked from the current image frame to the previous one using Lucas-Kanade tracking. All the putative point correspondences are then used to estimate the fundamental matrix using the eight-point algorithm and RANSAC. Points moving towards the vanishing point are detected and sorted based on their horizontal position.
The points are then clustered into groups depending on their position in the image and their velocity towards the vanishing point. ROIs not meeting the scale constraints are discarded, as in the other two algorithms. Since this algorithm focuses on finding vehicles in difficult side poses, no symmetry constraint is applied.
4.7 Complexity
Using MATLAB execution time to judge an algorithm's complexity is risky and difficult. To make the analysis more accurate, data structures have been preallocated where possible to avoid time consuming assignments. The algorithms have been optimized to some degree to lower computation time while generating detections in the many tests. However, a lot more could be done. Of course, a real-time implementation would need to be written in another language, e.g., C++.
Some comments can be made based on the MATLAB code, which has been run on an Intel Pentium 4 CPU at 3.20 GHz. In the edge based algorithm, the most time consuming operation is the nested loop finding horizontal candidates for vehicle bottoms and tops. Although it consists of fairly simple operations, the fact that both window size and location in the image are varied makes it expensive to compute. It accounts for around 30% of the total computation time in MATLAB. The rest of the edge based detection scheme consists of simple operations which are fast to compute. The execution time of this part of the program is mainly governed by the number of horizontal candidates found by the nested loop described above. In MATLAB the algorithm operates at roughly 1 Hz.
The shadow based detection is faster than the edge based. The operation standing out as time consuming is the Canny edge detection used to extract the free driving space. Since the shadow based detection only creates one ROI per shadow, the number of ROIs is always kept low and therefore operations on the whole set of ROIs, e.g., checking them against size constraints, are not expensive to compute. The algorithm can handle a frame rate of 2-3 Hz in MATLAB.
The motion based detection scheme is without doubt the most time consuming algorithm. The 2D interpolation used during the Lucas-Kanade tracking, the 2D local maxima extraction from the Harris response and the Singular Value Decomposition in the eight-point algorithm are all time consuming steps in this algorithm. Since the Lucas-Kanade tracking is the most time consuming function of the motion based detection algorithm, the computation time is largely dependent on how many feature points meet the Harris threshold.
Also, the maximum number of iterations for the tracking is another critical parameter, since a larger number of iterations implies more interpolation. In general, the algorithm runs at 0.1-0.25 Hz in MATLAB.
Chapter 5
Results and Evaluation
The test data has been captured by Autoliv with a stereo vision setup consisting of two CCD cameras with a FOV in the x-direction of 48° and with a frame rate of 30 Hz. The image frames are 720x240 pixels, where every second row has been removed to avoid distortion due to interlacing effects. Only the information from the left camera is used to evaluate the mono vision algorithms implemented in this thesis.
The data used to tune the algorithm parameters have been separated from the validation data. The algorithm tuning has been done manually due to time limitations. In order to be able to evaluate algorithm performance, staff at Autoliv have manually marked vehicles, pedestrians, etc. in the test data. On a frame-by-frame basis, each vehicle has been marked with an optimal bounding box surrounding the vehicle, called a reference marking. Detection results from the algorithms have been compared against these vehicle reference markings to produce performance statistics.
The following deﬁnitions have been used when analyzing the performance of
the algorithms:
• A PD (Positive Detection) is a vehicle detection that matches a reference marking better than the minimum requirements. If several detections match the same reference, only the best match is a PD.
• An ND (Negative Detection) is a reference marking not overlapped by any detection at all.
• An OD (Object Detection) is a reported detection that does not overlap a reference at all.
• The PD rate is the ratio between the number of PDs and the number of reference markings.
The minimum requirements for a detection to be classified as a PD are for the left and right edges to differ less than 30% of the reference marking width from the true position (based on the reference marking). The top edge is allowed to differ 50% of the reference marking height from its true position, and the same limit for the bottom edge is 30%. The looser constraint on the top edge is because the top edge is not as important as the other edges for a number of purposes, e.g., measuring the distance to the vehicle.
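The PD check can be sketched as follows (a Python sketch of the stated requirements; representing boxes as (left, top, right, bottom) tuples is an assumption, not the format of Autoliv's evaluation tool):

```python
def is_positive_detection(det, ref):
    """Check a detection box against a reference marking using the minimum
    PD requirements: left, right and bottom edges within 30% of the
    reference width/height, top edge within 50% of the height.

    Boxes are (left, top, right, bottom) in pixels, y growing downwards.
    """
    ref_w = ref[2] - ref[0]
    ref_h = ref[3] - ref[1]
    return (abs(det[0] - ref[0]) <= 0.30 * ref_w and   # left edge
            abs(det[2] - ref[2]) <= 0.30 * ref_w and   # right edge
            abs(det[1] - ref[1]) <= 0.50 * ref_h and   # top edge (looser)
            abs(det[3] - ref[3]) <= 0.30 * ref_h)      # bottom edge
```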
5.1 Tests on Edge Based and Shadow Based Detection
A test data set of 70 files of different scenes, including a total of 17339 image frames, has been used. The data include motorway as well as city scenes from different locations around Europe. Different weather conditions such as sunny, cloudy, foggy, etc. are all represented. Table 5.1 shows a summary of the test data.
Type of scene         # of files
Motorway                      54
City                          16
Conditions
Fine (sun/overcast)           48
Low sun                       15
Fog                            3
Snow                           1
Tunnel                         3
Other
Trucks, etc.                  11

Table 5.1. Summary of the 70 test data files.
Occluded vehicles and vehicles overlapping each other have been left out of the analysis. Tests have been performed on vehicles at different distances from the ego-vehicle. Vehicles closer than 5 m have been disregarded since they are impossible to detect with these two algorithms; the bottom edge is simply not visible in the image at such close distances.
Tests on vehicle reference markings up to 30 m, 50 m and 100 m have been done. In addition, two different sets of poses have been evaluated. The first set only includes vehicles seen straight from behind or from the front. The other allows the vehicles to be seen from an angle showing the rear or front along with one side of the vehicle, i.e., front-left, front-right, rear-left or rear-right.
Although trucks have not been prioritized for detection, the test data includes trucks as seen in Table 5.1. The evaluation tool does not distinguish between cars and trucks and therefore the PD rate of the different tests could probably increase further if such a separation was made.
The total number of vehicle reference markings in each test is displayed in
Table 5.2.
                                      Max distance [m]
Poses                                  30      50      100
Front, Rear                          6456   12560    17524
Front, Rear, Front-left,
Front-right, Rear-right, Rear-left  13483   23162    29023

Table 5.2. The number of vehicle reference markings in the different test scenarios.
5.1.1 Edge Based Detection Tests
Table 5.3 shows a summary of the PD rates obtained in the different tests. As seen, this detection scheme is very good at detecting vehicles in side poses. In fact, the PD rate is higher on the set of vehicle poses including the side poses than on the set only consisting of rear and front views.
                                      Max distance [m]
Poses                                  30      50      100
Front, Rear                          89.9%   85.1%   75.1%
Front, Rear, Front-left,
Front-right, Rear-right, Rear-left   92.2%   86.1%   78.6%

Table 5.3. PD rate for the edge based detection tests.
Figure 5.1 is taken from the test with vehicles up to 100 m in all poses and shows histograms of how the borders of the ROIs differ between detections and references for PDs.
The ratio along the x-axis is defined as the position difference between the detection and reference border lines divided by the size of the reference marking (width for the left and right border lines and height for the bottom and top border lines). A negative ratio corresponds to a detection smaller than the reference, while a positive ratio indicates that the detection is larger than the reference at that specific border. As seen, the left and right borders are very well positioned.
The histogram of the bottom border line does have a significant peak; however,
it is overestimated with an offset of 10%. This is partly explained by the fact
that the vehicle bottoms have been marked where the vehicle wheels meet the
road, while the algorithm often detects the edge arising from the vehicle shadow
situated a few pixels further down. This must be considered if a distance measure
based on the vehicle bottom is to be implemented: if the vehicle bottom coordinate
is overestimated, the distance will be underestimated.
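The effect can be illustrated with the flat-road model mentioned in the theory chapter, where the vertical pixel distance between the vanishing point and the vehicle bottom yields a distance measure. The camera parameters below are purely illustrative assumptions:

```python
def flat_road_distance(y_bottom, y_vp, focal_px, cam_height_m):
    """Distance to a vehicle from the image row of its bottom edge.

    Flat-road pinhole model: the row offset between the vehicle bottom and
    the horizon (vanishing point row) is inversely proportional to the
    distance along the road.
    """
    return focal_px * cam_height_m / (y_bottom - y_vp)

# Assumed camera: 800 px focal length, mounted 1.2 m above the road.
true_d = flat_road_distance(y_bottom=152, y_vp=120, focal_px=800, cam_height_m=1.2)
# The shadow edge a few pixels lower makes the vehicle appear closer:
shadow_d = flat_road_distance(y_bottom=156, y_vp=120, focal_px=800, cam_height_m=1.2)
```

With these assumed numbers a four-pixel overestimate of the bottom row shrinks the distance estimate from 30 m to below 27 m, which is the underestimation discussed above.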
Figure 5.1. Difference ratio for PDs normalized against the number of reference markings. Test run on the edge based algorithm on vehicles up to 100 m in all poses.
The top edge is the border line with the most uncertainty. This comes as no
surprise, since this edge has been chosen as the highest positioned edge in a
certain height interval above the bottom border line. In the case of a car, this
edge will therefore more likely overestimate the vehicle height than underestimate
it. A truck, however, does not fit into the height interval used and is therefore
only detected if it contains an edge (somewhere in the middle of the rear view)
within the interval that can be taken as the top edge. The chosen edge will
underestimate the vehicle height, and this is one reason why the histogram is so
widely spread.
Another interesting graph is shown in Figure 5.2, where the PD rate has been
plotted against the distance to the vehicles for the test using all poses. This
distance was calculated from stereo view information during the reference marking
process by staff at Autoliv. The PD rate is clearly dependent on the distance to
the vehicle: smaller objects, possibly just a few pixels in size, are obviously
harder to detect.
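A PD-rate-versus-distance curve like the one in Figure 5.2 can be produced by simple binning. The following is a minimal sketch; the (distance, detected) data layout is an assumption:

```python
from collections import defaultdict

def pd_rate_by_distance(references, bin_width_m=10.0):
    """PD rate per distance bin.

    `references` is a list of (distance to the reference vehicle in metres,
    bool flag saying whether it was positively detected). Returns a dict
    mapping bin start distance -> PD rate within that bin.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for dist, detected in references:
        b = int(dist // bin_width_m) * bin_width_m
        totals[b] += 1
        hits[b] += detected   # True counts as 1
    return {b: hits[b] / totals[b] for b in totals}

rates = pd_rate_by_distance([(12, True), (18, True), (55, True), (58, False)])
```

Plotting the resulting dictionary against bin start distance gives the kind of curve shown in the figure.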
To give some perspective on the number of false detections: the edge based
algorithm produced 36.5 detections per frame on average, but only 12.1 of
these were ODs, i.e., detections not overlapping any reference at all. The edge
based algorithm typically generated multiple detections per vehicle, though only
the best was classified as a PD. Many of the ODs arise from railings on the side
of the road. These are mistakenly detected as vehicles since they contain all
four needed edges and also a fair amount of symmetry.
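The OD criterion (a detection overlapping no reference marking at all) can be sketched as follows; box layout and names are assumptions:

```python
def overlap_area(a, b):
    """Intersection area of two (left, top, right, bottom) boxes in pixels."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def count_ods(detections, references):
    """Count ODs: detections whose box overlaps no reference at all."""
    return sum(
        all(overlap_area(d, r) == 0 for r in references)
        for d in detections
    )

n_od = count_ods(
    detections=[(0, 0, 10, 10), (50, 50, 60, 60)],
    references=[(5, 5, 20, 20)],
)
```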
Many of the references not detected well enough by this algorithm are trucks.
Figure 5.2. PD rate at different distances [m]. Test run on the edge based algorithm on vehicles up to 100 m in all poses.
Another scenario where the algorithm suffers is in scenes with large variations in
illumination, e.g., driving into or out of a tunnel. This, however, is not a
drawback of the algorithm itself; instead, it is a consequence of the exposure
control used by the camera.
5.1.2 Shadow Based Detection Tests
The PD rate summary is shown in Table 5.4. As opposed to the edge based
detection, the shadow based detection is more sensitive to the vehicle poses. The
PD rate is also lower than that of the edge based detection in all but one test
case: front/rear detection up to 100 m.
                                          Max distance [m]
Poses                                     30       50       100
Front, Rear                               86.1%    82.0%    75.6%
Front, Rear, Front-left, Front-right,
Rear-right, Rear-left                     79.2%    75.0%    71.3%

Table 5.4. PD rate for the shadow based detection tests.
An interesting observation can be made from Figure 5.3, taken from the test
with vehicles up to 100 m in all poses. The top border line is better aligned with
this algorithm than with the edge based one, even though the shadow based
algorithm uses a constant aspect ratio between the height and width of the ROI.
The bottom border line histogram, however, shows the same tendency as with the
edge based algorithm, i.e., a slightly overestimated border line coordinate.
Even though the PD rate drops as the distance to the vehicles increases, Figure
5.4 does in fact show that the shadow based algorithm is better at maintaining
the detection rate at larger distances.
Figure 5.3. Difference ratio for PDs normalized against the number of reference markings. Test run on the shadow based algorithm on vehicles up to 100 m in all poses.
The shadow based algorithm is in general more sensitive to difficult conditions
than the edge based one. As mentioned before, low sun is a particularly
difficult scenario for the shadow based algorithm, misaligning the ROIs. In
addition, edges on the road from road markings, different layers of asphalt, etc.
will confuse the free driving space detection. Moreover, the alignment of the
ROIs is less stable through time compared to the edge based algorithm. The reason
is that the edge based algorithm searches for left and right edges to delimit the
ROI, something the shadow based algorithm does not.
The shadow based algorithm, more restrictive in its way of generating ROIs
per vehicle, had on average 10.2 detections per frame, while the number of ODs
per frame was as low as 4.9. This is an advantage compared to the edge based
algorithm: fewer false detections mean less work for a classifier. Also, in a
hardware implementation the total number of detections per frame will be an
important figure to consider when designing the system. All ROIs, vehicles as
well as non-vehicles, will have to be stored and processed by the classifier.

Figure 5.4. PD rate at different distances [m]. Test run on the shadow based algorithm on vehicles up to 100 m in all poses.
5.1.3 Tests on Combining Edge Based and Shadow Based Detection
To investigate how well the algorithms complemented each other, the detections
from the edge and shadow based algorithms were simply merged. The results in
Table 5.5 show a significant increase in PD rate, especially at larger distances.
Combining the detections from the two algorithms gave 46.7 detections per frame
on average and 17.0 ODs per frame.
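Since the per-frame figures add up exactly (36.5 + 10.2 = 46.7 detections, 12.1 + 4.9 = 17.0 ODs), the merge appears to be a plain concatenation of the two ROI lists. A minimal sketch:

```python
def merge_detections(edge_rois, shadow_rois):
    """Merge ROI lists from the two detectors by simple concatenation.

    No duplicate suppression is attempted: overlapping ROIs from the two
    cues are both kept, and a later classifier/tracker stage is left to
    resolve them. A vehicle counts as detected if either cue found it.
    """
    return edge_rois + shadow_rois

merged = merge_detections([(0, 0, 10, 10)], [(2, 1, 11, 12), (40, 40, 50, 50)])
```

Keeping both copies of a doubly detected vehicle is what raises the PD rate at the cost of a higher detection count per frame.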
                                          Max distance [m]
Poses                                     30       50       100
Front, Rear                               96.2%    94.3%    88.7%
Front, Rear, Front-left, Front-right,
Rear-right, Rear-left                     96.4%    93.1%    88.9%

Table 5.5. PD rate for the tests on combining edge and shadow detection.
Figure 5.5. Difference ratio for PDs normalized against the number of reference markings. Test run on the edge and shadow based algorithms combined on vehicles up to 100 m in all poses.
Figure 5.5 illustrates the difference ratio between detections and reference
markings for each border line of the positive detections. The detection rate is
improved significantly at distances above 30 m compared to the edge based
detection alone, as shown in Figure 5.6.
Figure 5.6. PD rate at different distances [m]. Test run on the edge and shadow based algorithms combined on vehicles up to 100 m in all poses.
5.2 Tests on Motion Based Detection
The motion based algorithm was only tested as a complement algorithm on dif
ﬁcult test cases where the simpler and faster implemented algorithms would fail.
Therefore a single test was performed on overtaking vehicles not yet fully visible
in the images. This was done by ﬁltering out occluded vehicle reference markings
with the evaluation tool at a distance of 0 to 30 m in rearleft, rearright and side
poses from a subset of 15 ﬁles (4202 frames and 883 vehicle reference markings)
with overtaking vehicles from the original test data set. All border lines were,
as opposed to the other test, allowed to diﬀer 50% from the reference marking
position. The reason was the diﬃculty to align the left and bottom border when
the vehicles were not fully visible. The resulting PD rate was 35.0% with 8.2
detections per frame out of which 2.9 where ODs.
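The reference filtering step can be sketched as follows. The field names of the reference markings are assumptions about the evaluation tool's data layout, not its actual interface:

```python
def filter_references(refs, max_dist_m=30.0,
                      poses=("rear-left", "rear-right", "side"),
                      occluded_only=True):
    """Select reference markings for the motion based test.

    `refs` are dicts with 'distance' (metres), 'pose' (string) and
    'occluded' (bool) fields. Only occluded vehicles in the given poses
    within the distance limit are kept.
    """
    return [
        r for r in refs
        if r["distance"] <= max_dist_m
        and r["pose"] in poses
        and (r["occluded"] or not occluded_only)
    ]

kept = filter_references([
    {"distance": 12.0, "pose": "rear-left", "occluded": True},
    {"distance": 12.0, "pose": "rear", "occluded": True},
    {"distance": 60.0, "pose": "side", "occluded": True},
])
```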
Figure 5.7. Difference ratio for PDs normalized against the number of reference markings. Test run on the motion based algorithm on occluded vehicles up to 30 m.
Figure 5.7 shows the alignment of the four borders of the ROIs. The top
and right borders are well aligned. The left and bottom border lines are almost
exclusively underestimated. Obviously, this is a consequence of the vehicles not
being fully visible in the image: their left or bottom (or both) reference border
line has been set to the leftmost or lowermost coordinate in the image, and thus
that border line cannot be overestimated.
Chapter 6
Conclusions
Three algorithms for vehicle detection have been implemented and tested. The
work has been focused on detecting fully visible rear-view cars and SUVs during
daytime. However, the algorithms have been able to detect vehicles in other poses
as well. To the largest possible extent, the algorithms have been designed to
detect vehicles in various weather conditions and at any distance.
Out of the implemented algorithms, there is no doubt that the motion based one
is the low achiever and also the slowest implementation. The largest difference
from the article [24] that inspired this algorithm is the acknowledgement of
the problem of vehicle points moving along their epipolar lines. The algorithm
was therefore modified to only detect overtaking vehicles.
Even if improvements are made concerning feature extraction, point matching
and estimation of the fundamental matrix, the core problem still remains: points
on vehicles move along their epipolar lines and are therefore impossible to reject
as outliers. One alternative way to estimate the fundamental matrix would be
to assume a forward translation and use the velocity from the vehicle's CAN bus
along with the camera calibration matrices to calculate the estimate.
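For a purely translating camera (rotation R = I), that estimate reduces to F = K^-T [t]_x K^-1, where [t]_x is the cross-product matrix of the translation. The following sketch uses an assumed calibration; it is an illustration of the suggestion above, not the thesis implementation:

```python
def skew(t):
    """Matrix form [t]_x of the cross product, so that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    """3x3 matrix product without external dependencies."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def fundamental_from_forward_motion(K_inv, t):
    """F = K^-T [t]_x K^-1 for a purely translating camera (R = I).

    t = (0, 0, v * dt) would come from the CAN-bus speed, K_inv from the
    camera calibration; any scale of t gives the same F up to scale.
    """
    K_inv_T = [[K_inv[j][i] for j in range(3)] for i in range(3)]
    return matmul(matmul(K_inv_T, skew(t)), K_inv)

# Assumed calibration: focal length 800 px, principal point (360, 120).
K_inv = [[1 / 800, 0.0, -360 / 800],
         [0.0, 1 / 800, -120 / 800],
         [0.0, 0.0, 1.0]]
F = fundamental_from_forward_motion(K_inv, (0.0, 0.0, 1.0))  # 1 m forward
```

Static scene points then satisfy the epipolar constraint x2^T F x1 = 0, while the constraint still cannot separate out vehicle points moving along their own epipolar lines, which is exactly the core problem stated above.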
The edge based and the shadow based algorithms should be seen as two more
mature algorithms, ready to use for rear-view vehicle detection in daylight. The
edge based algorithm has been modified from the original article by Sun et al.
[26] by, e.g., using local edge detection, and from Wenner's approach [27] by
introducing the top edge, symmetry and variance as additional vehicle cues. The
shadow based algorithm, as opposed to the article by Tzomakas et al. [18], uses a
row-by-row local estimation of the road intensity mean and standard deviation,
with interpolation to rows where no road points were extracted. Also, variance
is used to reject ROIs.
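The row-by-row road statistics with interpolation can be sketched as follows (means only, and the sample layout is an assumption; the real implementation would track the standard deviation as well):

```python
def row_road_statistics(road_samples, n_rows):
    """Row-by-row mean road intensity with interpolation to empty rows.

    `road_samples` maps image row -> list of intensities sampled from the
    free driving space on that row. Rows without samples get linearly
    interpolated means from the nearest sampled rows; rows beyond the
    first/last sampled row just copy the nearest known mean.
    """
    means = [None] * n_rows
    for row, vals in road_samples.items():
        means[row] = sum(vals) / len(vals)
    known = [i for i, m in enumerate(means) if m is not None]
    for i in range(n_rows):
        if means[i] is None:
            lo = max((k for k in known if k < i), default=None)
            hi = min((k for k in known if k > i), default=None)
            if lo is None:
                means[i] = means[hi]
            elif hi is None:
                means[i] = means[lo]
            else:
                w = (i - lo) / (hi - lo)
                means[i] = (1 - w) * means[lo] + w * means[hi]
    return means

m = row_road_statistics({0: [100, 110], 4: [120]}, n_rows=5)
```

Shadow pixels can then be thresholded per row against the local mean, which is what makes the method robust to the intensity gradient of the road surface over distance.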
The ultimate goal for a detection algorithm is ideally, but not very
realistically, a PD rate of 100%, and neither of the two algorithms is there.
Still, two algorithms have been implemented that work well in good conditions
and, in the case of the edge based algorithm, also in more difficult scenes. Yet,
the most interesting statistics were those from the combination of the two
algorithms, with a PD rate of close to 90% even at distances up to as far as
100 m. Vehicle detection is a complex problem with a large number of parameters
like illumination, poses, occlusion, etc. The way to go is therefore probably to
combine different detection cues to improve robustness. The results from the
tests in this thesis indicate that this is indeed the case.
When judging the performance of the algorithms, it is very important to keep
in mind that these statistics are calculated on a frame-by-frame basis. It is
clear that a tracking algorithm would be able to increase these figures notably
by closing detection gaps of a few frames. These gaps are therefore not to be
seen as a critical drawback of the detection algorithms. Furthermore, it is hard
to predict the final outcome of a complete system for vehicle detection solely
from the detection stage. A classifier would get rid of a lot of false detections
but also slightly decrease the PD rate. As said, a tracker would then increase
the PD rate by closing detection gaps.
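The gap-closing behaviour of such a tracker can be mimicked on a per-vehicle detection sequence. A simplified sketch (the maximum gap length is an assumed parameter, and real trackers would of course also predict positions):

```python
def close_detection_gaps(detected, max_gap=3):
    """Fill short runs of missed frames between detections of one vehicle.

    `detected` is a per-frame boolean sequence for a single tracked
    vehicle. Interior gaps of at most `max_gap` frames, i.e. runs of
    misses with detections on both sides, are counted as detected.
    """
    out = list(detected)
    i = 0
    while i < len(out):
        if not out[i]:
            j = i
            while j < len(out) and not out[j]:
                j += 1
            # interior gap, short enough to close
            if 0 < i and j < len(out) and j - i <= max_gap:
                for k in range(i, j):
                    out[k] = True
            i = j
        else:
            i += 1
    return out

filled = close_detection_gaps([True, False, False, True, False])
```

Note that a trailing run of misses is left untouched: without a later detection there is nothing for the tracker to bridge to.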
No real evaluation has been made of which vehicles are not detected. The same
vehicle going undetected over multiple frames is a larger problem than vehicles
missing in separate frames. However, the results from the algorithms in this
thesis indicate good performance up to approximately 60 m and useful performance
up to 100 m.
To be useful in an on-road system, the algorithms need to run in real time.
The complexity of these three algorithms indicates that the edge based and shadow
based algorithms stand a good chance of meeting this requirement; even a
combination of the two might work. However, the motion based algorithm is slow,
and achieving a frame rate of 30 Hz would most probably be difficult.
Future work should be focused on adjusting the algorithms to suit an even
broader range of problem conditions. As it is now, trucks are not within the
applied vehicle model, and detecting them is therefore not prioritized. A
relaxation of the height constraint would, however, mean a greater number of ODs
and difficulties in aligning the top border line.
The top edge was better aligned with the shadow based algorithm than with the
edge based one. This indicates that a fixed aspect ratio works well when only
detecting cars. Since the top edge is used to reject structures consisting of
only a bottom and two side edges, it should still be part of the algorithm. But
once detected, a ROI could possibly be better aligned using a fixed aspect ratio.
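Re-aligning a detected ROI with a fixed aspect ratio is a one-line operation. In this sketch the 0.8 height/width ratio is an assumed value, not one taken from the thesis:

```python
def roi_with_fixed_aspect(left, right, bottom, aspect=0.8):
    """Place the top border from a fixed height/width aspect ratio.

    Given left/right/bottom borders found from shadow or edge cues, the
    top border is set so that height = aspect * width. Image rows grow
    downwards, so the top is above (smaller than) the bottom.
    """
    width = right - left
    top = bottom - aspect * width
    return (left, top, right, bottom)

roi = roi_with_fixed_aspect(left=100, right=200, bottom=240)
```

This is the post-detection step suggested above: keep the top-edge test for rejection, then discard the detected top edge in favour of the aspect-ratio top.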
As it is now, the vanishing point is fixed for a certain camera calibration. An
error will therefore be introduced when the road on which the vehicle is driving
is not flat. This leads to errors when estimating the allowed width and height of
a vehicle at a certain distance from the camera, and possibly to rejections of
vehicles or approvals of non-vehicles. A dynamic, robust calculation of the
vanishing point, e.g., using the Hough transform, could prevent this.
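Given dominant line segments from the Hough transform, the vanishing point can be computed as their least-squares intersection. A sketch using the line parametrization from the theory chapter, y*cos(theta) + x*sin(theta) = rho:

```python
import math

def vanishing_point(lines):
    """Least-squares intersection of dominant Hough lines.

    Each line is (rho, theta) with y*cos(theta) + x*sin(theta) = rho.
    Solves the 2x2 normal equations for the point minimizing the summed
    squared distances to all lines.
    """
    a11 = sum(math.sin(t) ** 2 for _, t in lines)
    a12 = sum(math.sin(t) * math.cos(t) for _, t in lines)
    a22 = sum(math.cos(t) ** 2 for _, t in lines)
    b1 = sum(r * math.sin(t) for r, t in lines)
    b2 = sum(r * math.cos(t) for r, t in lines)
    det = a11 * a22 - a12 * a12
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y

# Three lines through the point (100, 50):
vp = vanishing_point([(50.0, 0.0),
                      (100.0, math.pi / 2),
                      (150 / math.sqrt(2), math.pi / 4)])
```

Recomputing this every frame from detected lane and road edges would let the vanishing point follow the actual road geometry instead of the static calibration.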
In addition, an algorithm detecting the road boundaries would help to decrease
the number of ODs. In fact, Figure 6.1 shows that there are a lot of ODs on the
left and right sides of the image, which indicates that a road boundary algorithm
would help reduce the number of ODs significantly.
Figure 6.1. Statistics on ODs (Object Detections) taken from the test combining the edge and shadow based algorithms (up to 100 m, all poses).
Also, a larger test data set would increase the credibility of the test results.
A separation of cars and trucks during the reference marking process would have
made it possible to evaluate the algorithms on cars only, which would have been
interesting since trucks were not prioritized. Instead of being manually chosen,
the parameters of the different algorithms, e.g., the edge thresholds, could be
tuned automatically on a set of test data to further optimize performance.
Bibliography

[1] Peden, M. et al. (2004): World report on road traffic injury prevention: summary.

[2] Jones, W. (2001): Keeping cars from crashing, IEEE Spectrum, Vol. 38, No. 9, pp. 40-45.

[3] Ballard, D.H. (1987): Generalizing the Hough transform to detect arbitrary shapes, Readings in Computer Vision, pp. 714-725.

[4] Cantoni, V., Lombardi, L., Porta, M., Sicard, N. (2001): Vanishing Point Detection: Representation Analysis and New Approaches, Image Analysis and Processing, Proceedings, 11th International Conference, pp. 90-94.

[5] Lutton, E., Maitre, H., Lopez-Krahe, J. (1994): Contribution to the Determination of Vanishing Points Using Hough Transform, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 4, pp. 430-438.

[6] Harris, C., Stephens, M. (1988): A Combined Corner and Edge Detector, Proceedings of 4th Alvey Vision Conference, Manchester, pp. 147-151.

[7] Canny, J. (1986): A Computational Approach to Edge Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 679-714.

[8] Birchfield, S. (1997): Derivation of Kanade-Lucas-Tomasi Tracking Equation, unpublished.

[9] Jähne, B. (2005): Digital Image Processing, 6th revised and extended edition, Springer.

[10] Lucas, B., Kanade, T. (1981): An Iterative Image Registration Technique with an Application to Stereo Vision, Proc. 7th Intl. Joint Conf. on Artificial Intelligence, pp. 674-679.

[11] Horn, B.K.P., Schunck, B.G. (1981): Determining optical flow, Artificial Intelligence 17, pp. 185-203.

[12] Chojnacki, W., Brooks, M.J. (2003): Revisiting Hartley's normalized eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, pp. 1172-1177.

[13] Hartley, R., Zisserman, A. (2003): Multiple View Geometry in Computer Vision, Cambridge University Press.

[14] Sun, Z., Bebis, G., Miller, R. (2006): On-Road Vehicle Detection: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 5, pp. 694-711.

[15] Guo, D., Fraichard, T., Xie, M., Laugier, C. (2000): Color Modeling by Spherical Influence Field in Sensing Driving Environment, IEEE Intelligent Vehicle Symp., pp. 249-254.

[16] Mori, H., Charkai, N. (1993): Shadow and Rhythm as Sign Patterns of Obstacle Detection, Proc. Int'l Symp. Industrial Electronics, pp. 271-277.

[17] Hoffmann, C., Dang, T., Stiller, C. (2004): Vehicle detection fusing 2D visual features, IEEE Intelligent Vehicles Symposium.

[18] Tzomakas, C., Seelen, W. (1998): Vehicle Detection in Traffic Scenes Using Shadows, Technical Report 98-06, Institut für Neuroinformatik, Ruhr-Universität Bochum.

[19] Kim, S., Kim, K. et al. (2005): Front and Rear Vehicle Detection and Tracking in the Day and Night Times using Vision and Sonar Sensor Fusion, Intelligent Robots and Systems, 2005 IEEE/RSJ International Conference, pp. 2173-2178.

[20] Kalinke, T., Tzomakas, C., Seelen, W. (1998): A Texture-based Object Detection and an adaptive Model-based Classification, Proc. IEEE International Conf. Intelligent Vehicles, pp. 143-148.

[21] Franke, U., Kutzbach, I. (1996): Fast Stereo based Object Detection for Stop&Go Traffic, Intelligent Vehicles, pp. 339-344.

[22] Zhao, G., Yuta, S. (1993): Obstacle Detection by Vision System For An Autonomous Vehicle, Intelligent Vehicles, pp. 31-36.

[23] Giachetti, A., Campani, M., Torre, V. (1998): The Use of Optical Flow for Road Navigation, IEEE Transactions on Robotics and Automation, Vol. 14, No. 1, pp. 34-48.

[24] Yamaguchi, K., Kato, T., Ninomiya, Y. (2006): Vehicle Ego-Motion Estimation and Moving Object Detection using a Monocular Camera, The 18th International Conference on Pattern Recognition (ICPR'06), Volume 4, pp. 610-613.

[25] Shi, J., Tomasi, C. (1994): Good Features to Track, Computer Vision and Pattern Recognition, Proceedings CVPR '94, IEEE Computer Society Conference, pp. 593-600.

[26] Sun, Z., Bebis, G., Miller, R. (2006): Monocular Precrash Vehicle Detection: Features and Classifiers, IEEE Transactions on Image Processing, Vol. 15, No. 7, pp. 2019-2034.

[27] Wenner, P. (2007): Nightvision II Vehicle Classification, Umeå University.

[28] Kalinke, T., Tzomakas, C. (1997): Objekthypothesen in Verkehrsszenen unter Nutzung der Kamerageometrie, IR-INI 97-07, Institut für Neuroinformatik, Ruhr-Universität Bochum.

[29] Näringsdepartementet (1998): Trafikförordning (1998:1276), pp. 27.

[30] Pentenrieder, K. (2005): Tracking Scissors Using Retro-Reflective Lines, Technical University Munich, Department of Informatics.

[31] Jung, C., Schramm, R. (2004): Rectangle detection based on a windowed Hough transform, Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing.

[32] Hanson, K. (2007): Perspective drawings, Wood Digest, September 2007 Issue.

[33] Danielsson, P-E. et al. (2007): Bildanalys, TSBB07, 2007: Kompendium, Department of Electrical Engineering, Linköping University.
Acknowledgments
This thesis project is the ﬁnal part of the educational program in Applied Physics and Electrical Engineering at Linköping University. The work has been carried out at Autoliv Electronics AB in Mjärdevi, Linköping. I would like to take this opportunity to thank people that have helped me during this work. My tutors Ognjan Hedberg and Fredrik Tjärnström for their helpfulness and valuable advice, Klas Nordberg for his theoretical input, Salah Hadi for showing interest in my thesis work and Alexander Vikström for his work on the performance evaluation tool I used to evaluate my algorithms.
vii
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . 2. . . . . . . . 1. . . . . . . . . . . . . . . . . .2 Corners and Edges . . . . . .5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Stereo Based Methods . . . . . . . . . . . .4 System Overview . . . . . . . . . . .5 Report Structure . . . . . . 2. . . . .3 Problem Conditions 1.1. . . . . . . . . . . . . . . . .3 The Correspondence Problem . . . . . . . . . . . . . . .Contents 1 Introduction 1. . . . . . . .1. . . . . . . . . . . . . . . . . . . .2 Normalized Correlation .2 The Fundamental Matrix . . . . . . . . . . . . .5 Texture . . . . . . . . . .3 Shadow . . . . . . . . . . . . . . .1 Knowledge Based Methods . 2. . . . . . . . . . . . 2. . 1. . . .2 Edge Detection .4 Optical Flow . . . . 2. . . .1 Vanishing Points and the Hough Transform 2. . . . . . . . . . . . .5.6 Vehicle Lights . 2 Theory 2. . . . . . . . . . . . . . . .5. . . . . . . . . . 3 Vehicle Detection Approaches 3. . . . .3. . . .3 LucasKanade Tracking . . . . . . . . . . . . . . . . . . . . . . 3. . . . 2. . . . . . . . . . . 1. . . .1. . . . . . . . . . . . . . . . . . . .1 The Harris Operator . 2. . . . . . . . . . 3. . . . . . . . . . . . . . . . . . . . . . 2. . . . . . . . . . . . . . . . . . . . . .1 Background . . 3. . . . . . .3 Motion Based Methods . . . . . . . . . . . . . . . . . . . . .4 Symmetry . . . . . . . . . . . . . . . . . . . . .1 Homogeneous Coordinates . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . 3. . . ix . . . . . . .3. . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . 2. . . . . . . .1 Color . . . . . 3. . . . . . . . . . . . .3. . . . . .2 Thesis Objective . 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 RANSAC . . . . . 1 1 2 2 4 4 5 5 6 7 9 9 10 11 12 13 13 13 15 17 17 17 17 19 20 20 20 20 21 . . . . . . . 3. . . . . . .5 Epipolar Geometry . 3. . .5. .3 Normalized EightPoint Algorithm . . . . . . . . . . . . . . . . . . 
. 3. . . . . . . . . . . .
. . . . . . . . . . . . . . . . . Contents 23 23 24 24 25 25 27 27 31 33 33 35 36 37 39 42 44 45 49 . .1 Tests on Edge Based and Shadow Based Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . 5. . . . . . . . . . .5. . . . . . . . . . . . . . . . . . . . . . . .6 Motion Based Detection . .1 Edge Based Detection Tests . . . . .1 Method Outline .2 Tests on Motion Based Detection . . . . . . . . . . . . .1. . .1.2 Shadow Based Detection Tests . . 5. . . . . . . . . .1 Calculation of the Vanishing Point . . . . 4. . . . .1. . . . . . . . . . . . . . . . . . . . . . 4. . . .2 Distance to Vehicle and Size Constraint 4. . . . . . . . . . . . . . . . 5. . . . . . . .3 Tests on Combining Edge Based and Shadow Based Detection 5. . . . . . . . . . . . . . . . . . 4. . . . . . . . 4. . . . . . . . .x 4 Implemented Algorithms 4. . . . . . . . . . . . . . . .5 Shadow Based Detection . . . . . . . 4. . .3 Vehicle Model . .6. . .4. . . . . . . . . . . . . . .1 Method Outline . . .4 Edge Based Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Results and Evaluation 5. . . . 4.1 Method Outline . 6 Conclusions Bibliography . . . . . . . . . . 4. .7 Complexity .
The view from the infrared camera is projected on a display in front of the driver and the camera is installed in the front of the car. Finally. is a vital part of such 1 . As the world leader in automotive safety. the costs of these accidents add up to a shocking 13% of the world’s Gross National Product [2]. Some background is given along with the thesis’ objective. An example is Autoliv’s Night Vision system.2 million deaths and up to 50 million injuries worldwide every year [1].1 Background Road traﬃc accidents account for an estimated 1. The camera detects heat from objects and is calibrated to be especially sensitive to the temperature of humans and animals. such as vehicles or pedestrians. improving the driver’s vision at night using an infrared camera. a discussion of the problem conditions and a system overview.Chapter 1 Introduction This chapter introduces the problem to be addressed. Detection of obstacles.1.1. 1. Figure 1. In Linköping diﬀerent vision based systems which aim to help the driver are being developed. shown in Figure 1. Autoliv is continuously developing products to reduce the risk associated with driving. Autoliv’s Night Vision system. the structure of the report is outlined. Furthermore.
3 Problem Conditions Diﬀerent approaches to vehicle detection have been proposed in the literature as will be further discussed in Chapter 3.e. grayscale) images from a mono camera. 1. vehicle detection will refer to the detection step. A tracker can be used to further improve the performance of the system. Creating a robust system for vehicle detection is a very complex problem. but in general the detector aims to overestimate the number of ROIs with the intention of not missing vehicles.2 Introduction a system. vehicle detection is basically a twostep process consisting of detection and classiﬁcation. The task for the classiﬁer is then to discard as many of the false vehicle detections as possible. Rain. i. daylight and darkness must all be taken into account when designing the system. CCD (ChargeCoupled Device) image sensors and stereo vision capture visible light and are used as the base for a driveraid system. vehicle hypothesis in this case. Herein. • Lighting and weather conditions vary substantially. The motive for this thesis is Autoliv’s wish to investigate what can be achieved with a mono camera system as far as vehicle detection is concerned. A classiﬁer is typically trained on a large amount of test data. The detector is often used together with a classiﬁer which eliminates false hypotheses. the complexity of the algorithms is discussed. Classiﬁcation is therefore the process of deciding whether or not a particular ROI contains a vehicle. • Vehicles might be occluded by other vehicles. excluding the classiﬁcation. Furthermore. • For a precrash system to serve its purpose it is crucial for the system to achieve realtime performance. There are numerous diﬃculties that need to be taken into account: • Vehicles can be of diﬀerent size. fog. By tracking regions which in multiple consecutive frames have been classiﬁed as vehicles. making the deﬁnition of a vehicle even broader. 
This thesis focuses on the detection step.2 Thesis Objective As shown in Figure 1. both positive (vehicles) and negative (nonvehicles). aiming to investigate existing algorithms for detection of vehicles in monochrome (i.. Furthermore. etc. buildings.e. the system can act more stable through time. shape and color.2. What the objectives of the diﬀerent steps are can vary from diﬀerent approaches. Improvements are made by modiﬁcations of the existing algorithms. Detection is the step where the image is scanned for ROIs (Regions Of Interest). snow. a vehicle can be observed from diﬀerent angles. In a current project. From this study three of them are implemented in MATLAB and tested on data provided by Autoliv. . 1..
(c) The ROIs have been classiﬁed as vehicles and nonvehicles. Illustration of the twostep vehicle detection strategy . (b) Detected ROIs. Figure 1.2.3 Problem Conditions 3 (a) Initial image.1.
To investigate the performance of a mono system. Finally.4 System Overview The system that has captured the test data consists of a pair of forwarddirected CCD image sensors mounted on the vehicle’s rearview mirror and a computer used for information storage.4 Introduction Due to these variabilities in conditions. Detecting all vehicles in every possible situation is not realistic. 720x240 pixels in size. Trucks are not prioritized. Chapter 3 describes the approaches to vehicle detection that have been proposed in the literature. . The results from the evaluation of the algorithms follow in Chapter 5. The algorithms chosen for implementation are presented in more detail in Chapter 4. Chapter 6 sums up the drawn conclusions. To largest possible extent the algorithms are designed to detect vehicles in various weather conditions (excluding night scenes) and at any distance. The work in this thesis focus on detecting fully visible rearview cars and SUVs during daytime. The frame rate is 30 Hz. The output of the system is therefore image frames with half the original vertical resolution. Lens distortion compensation is performed on the monochrome images and every second horizontal line is removed to avoid artifacts due to interlacing. 1. only the information from the left image sensor has been used. it is absolutely necessary to strictly deﬁne and delimit the problem. The issue of realtime performance is outside the scope of this thesis.5 Report Structure This report is organized as follows: Chapter 2 explains diﬀerent theoretical concepts needed in later chapters. 1.
tection.1 Vanishing Points and the Hough Transform (a) The three possible vanishing points (b) The interesting vanishing point in vehicle de(from [32]). to necessary theoretical explanations.1(a) there are a maximum of three vanishing points in an image.1. describing the implemented algorithms. Figure 2. 2. The vanishing points are deﬁned as points in an image plane where parallel lines in the 3Dspace converge. It is up to the reader to decide whether to read this chapter in its full extent before continuing the report or to follow the references from Chapter 4. The diﬀerent sections are fairly separate and could be read independently. Vanishing points. As shown in Figure 2. The interesting vanishing point in the application of vehicle detection is the point located on the horizon. seen in Figure 5 .Chapter 2 Theory This chapter introduces theory and concepts needed to comprehend the methods used for vehicle detection.
By thresholding the Hough space one can therefore detect dominant line segments. 2. The vanishing point is then decided as the intersection point of these line segments. [5]) proposed in the literature to calculate the vanishing points depend on the Hough transform [3] to locate dominant line segments. y) and θ the angle between the xaxis and the normal of the same line.2.6 Theory 2. Edges are detected by estimating image gradients which indicate how the intensity changes over an image. Most of the methods (e. The Hough transform. The simplest of all approaches to estimate . superimposing the value at that point. This point is valuable because the vertical distance in pixels between the vanishing point and the bottom of a vehicle can yield a distance measure between the camera and the vehicle under certain assumptions.1(b). y) in the image plane to a sinusoidal curve in the Hough space (ρθspace) according to: y cos θ + x sin θ = ρ where ρ can be interpreted as the perpendicular distance between the origin and a line passing through the point (x. The Hough transform transforms a point in the image plane to a sinusoidal curve in the Hough space. Due to measurement noise.2. Every point in the Hough space transforms back to a line in the image plane. Figure 2. The sinusoidal curves from diﬀerent points along the same line in the image plane will intersect in the same point in the Hough space. indicate depth discontinuities or changes of material.g. there is usually not an unique intersection point and some sort of error target function is therefore minimized to ﬁnd the best candidate. for example. They arise from sharp changes in image intensity and can.2 Edge Detection Edges are important features in image processing. [4].. All image points on the same line will intersect in a common point in the Hough space (from [30]). illustrated in Figure 2. maps every point (x.
Gradients can be estimated with first-order discrete differences, e.g. the symmetric difference

δI/δx ≈ (I(x + ∆x, y) − I(x − ∆x, y)) / (2∆x)

Combining this differentiation with an average filter in the direction perpendicular to the differentiation yields the famous Sobel filter. The following Sobel kernel detects vertical edges:

sx = (1/4) [1 2 1]^T · (1/2) [1 0 −1] = (1/8) [1 0 −1; 2 0 −2; 1 0 −1]

To detect horizontal edges, sy = sx^T is used as Sobel kernel. Edges are detected as pixels with a high absolute response when convolving an image with a Sobel filter. If both the vertical and horizontal edge maps Ix = I ∗ sx and Iy = I ∗ sy have been calculated, the magnitude of the gradient is given as the vector norm

|∇I(x, y)| = √(Ix²(x, y) + Iy²(x, y))

One of the most used edge detection techniques was introduced by John F. Canny in 1986. It is known as the Canny edge detector and detects edges by searching for local maxima in the gradient norm |∇I(x, y)|. A more detailed description of the Canny edge detector can be found in [7].

2.3 The Correspondence Problem

The correspondence problem refers to the problem of finding a set of corresponding points in two images taken of the same 3D scene from different views (see Figure 2.3). Two points correspond if they are the projections onto the respective image planes of the same 3D point. This is a fundamental and well-studied problem in the field of computer vision. Although humans solve this problem easily, it is a very hard problem to solve automatically by a computer.

Solving the correspondence problem for every point in an image is seldom either wanted or possible. Apart from the massive computational effort of such an operation, the aperture problem makes it impossible to match some points unambiguously. The aperture problem, shown in Figure 2.4, arises when one-dimensional structures in motion are looked at through an aperture. Within the aperture it is impossible to match points on such structures between two consecutive image frames, since the perceptual system is faced with a direction-of-motion ambiguity.

Thus, to solve the correspondence problem, a way to extract points suitable for matching is needed first. The Harris operator described below is a popular such method. Secondly, point correspondences must be found. Sections 2.3.2 and 2.3.3 describe two different ways of achieving just that.

Figure 2.3. Finding corresponding points in two images of the same scene is called the correspondence problem.

Figure 2.4. The aperture problem. Despite the fact that the lines move diagonally, only horizontal motion can be observed through the aperture.
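The Sobel filtering and gradient magnitude of Section 2.2 can be sketched as follows in Python (the thesis implementation is in MATLAB; the minimal convolution routine and the step-edge test image are illustrative choices):

```python
import numpy as np

def convolve2d(img, kernel):
    """Minimal 'valid' 2D convolution (kernel flipped, as in a true convolution)."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Sobel kernel for vertical edges: sx = 1/4 [1 2 1]^T * 1/2 [1 0 -1].
sx = np.outer([1, 2, 1], [1, 0, -1]) / 8.0
sy = sx.T

# A vertical step edge: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
Ix = convolve2d(img, sx)
Iy = convolve2d(img, sy)
grad_mag = np.sqrt(Ix**2 + Iy**2)
# The vertical edge responds in Ix only; Iy is zero everywhere here.
```

The gradient magnitude peaks along the step edge, which is exactly the local maximum the Canny detector searches for.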
2.3.1 The Harris Operator

Interesting points in an image are often called feature points, here defined as points with sufficient two-dimensional structure. The properties of such a point are not clearly defined; instead they depend on the problem at hand. Points suited for matching typically contain local two-dimensional structure, e.g. corners. The Harris operator [6] is probably the most used method to extract such feature points. First, image gradients are estimated, e.g. using Sobel filters. Then the 2x2 structure tensor matrix T is calculated as

T = ∇I(x) ∇I(x)^T = (Ix Iy)^T (Ix Iy) = [IxIx IxIy; IyIx IyIy]

The Harris response is then calculated as

H(x) = det T(x) − c tr² T(x)

The constant c has been assigned different values in the literature, typically in the range 0.04 - 0.05, which empirically has proven to work well. Local maxima in the Harris response indicate feature points. In Figure 2.5, feature points have been extracted using the Harris operator. It clearly shows that some points are inappropriate to match against others.

Figure 2.5. Feature points extracted with the Harris operator.
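A minimal sketch of the Harris response is given below. Note that, as is common in practice, the structure tensor is here summed over a small neighbourhood around each pixel; that window, the gradient estimator and the test image are assumptions for illustration, not taken from the thesis:

```python
import numpy as np

def harris_response(img, c=0.05, win=1):
    """Harris response H = det(T) - c*tr(T)^2, with the structure tensor
    summed over a (2*win+1)^2 neighbourhood (a common practical choice)."""
    # Central-difference gradient estimates (interior pixels only).
    Ix = np.zeros_like(img)
    Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    H = np.zeros_like(img)
    h, w = img.shape
    for r in range(win, h - win):
        for col in range(win, w - win):
            sl = (slice(r - win, r + win + 1), slice(col - win, col + win + 1))
            t11, t22, t12 = Ixx[sl].sum(), Iyy[sl].sum(), Ixy[sl].sum()
            det = t11 * t22 - t12 * t12
            tr = t11 + t22
            H[r, col] = det - c * tr * tr
    return H

# A bright square on a dark background: the corners, which have
# two-dimensional structure, give the strongest response.
img = np.zeros((12, 12))
img[4:8, 4:8] = 1.0
H = harris_response(img)
peak = np.unravel_index(np.argmax(H), H.shape)
# The peak lies at one of the four corners of the square; along the
# straight edges the response is negative (one-dimensional structure).
```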
2.3.2 Normalized Correlation

Correlation can be used for template matching, i.e. to find a small region in an image which matches a template. In particular, putative point correspondences can be found by comparing a template region t around one feature point against regions around all feature points (x, y) in another image I. The pixel in the centre of the template is assumed to be t(0, 0). In its simplest form the correlation is defined as

C(x, y) = Σα,β I(x + α, y + β) t(α, β)

where a high response is to indicate a good match between the region and the template. However, in unipolar images (with only positive intensity values) this correlation formula can give a high response in a region even though the region does not fit the template at all. This is due to the fact that regions with high intensity yield a higher response, since no normalization is performed. Therefore, a couple of different normalized correlation formulas are common:

C(x, y) = Σα,β I(x + α, y + β) t(α, β) / √( Σα,β I²(x + α, y + β) · Σα,β t²(α, β) )

C(x, y) = Σα,β (I(x + α, y + β) − µI)(t(α, β) − µt) / √( Σα,β (I(x + α, y + β) − µI)² · Σα,β (t(α, β) − µt)² )

While the first only normalizes the correlation with the norms of the region and the template, the second also subtracts the mean intensity values µI and µt from the image region and the template.

2.3.3 Lucas-Kanade Tracking

Another way of deciding point correspondences is to extract feature points in one of the images and track them in the other image using Lucas-Kanade tracking [25]. For two images I and J, the tracking consists of solving for the displacement d in the equation

J(x + d) = I(x)    (2.1)

Because of image noise, Equation (2.1) is rarely satisfied exactly. For a point x = (x y)^T and a translational motion d, Equation (2.1) can be written as

J(x + d/2) = I(x − d/2)    (2.2)

to make it symmetric with respect to both images. The dissimilarity

ε = ∫W (J(x + d/2) − I(x − d/2))² w(x) dx    (2.3)

is then minimized by solving δε/δd = 0. The weight function w(x) is usually set to 1. Though first introducing an affine motion field as a motion model, [25] reduces this to a pure translation, since inter-frame motion is usually small. In [8] it is shown that solving Equation (2.3) is approximately equivalent to solving Zd = e
where Z is the 2x2 matrix

Z = ∫W g(x) g^T(x) dx

and e the 2x1 vector

e = ∫W (I(x) − J(x)) g(x) w(x) dx

with g(x) = ( δ/δx (I+J)/2 , δ/δy (I+J)/2 )^T. In practice, the Lucas-Kanade method is often used in an iterative scheme where the interesting region in the image is interpolated in each iteration.

2.4 Optical Flow

The problem of deciding the optical flow between two images is closely related to the correspondence problem described in Section 2.3. The optical flow is an estimate of the apparent motion of each pixel in an image between two image frames, i.e. the flow of image intensity values. The optical flow should not be confused with the motion field, which is the real motion of an object in a 3D scene projected onto the image plane [9]. These are identical only if the object does not change the image intensity while moving. There are a number of different approaches to computing the optical flow. The classical approach was proposed by Horn and Schunck [11], but other methods are found in the literature, of which one derived by Lucas and Kanade [10] will be briefly discussed here.

Assuming that two regions in an image are identical besides a translational motion, the optical flow is derived from the equation [33]:

I(x + ∆x, y + ∆y, t + ∆t) = I(x, y, t)    (2.4)

Equation (2.4) states that a translation vector (∆x, ∆y) exists such that the image intensity at I(x, y, t) after a time ∆t is located at I(x + ∆x, y + ∆y, t + ∆t). Rewriting the equation using a Taylor series of first order yields

∆x Ix + ∆y Iy + ∆t It = 0    (2.5)

After division by ∆t, and with

v = (u v)^T = (∆x/∆t ∆y/∆t)^T

the equation for optical flow is given as

∇I^T v = −It    (2.6)

With one equation and two unknowns, further assumptions must be made in order to solve for the optical flow.
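The Lucas-Kanade idea of Sections 2.3.3 and 2.4 can be illustrated with a single least-squares solve of ∇I^T v = −It over a window: one translational flow vector, with no iteration or interpolation. The synthetic images below and the window size are illustrative assumptions:

```python
import numpy as np

def lucas_kanade_flow(I, J, win=5):
    """One translational Lucas-Kanade step: solve grad(I)^T v = -I_t in the
    least-squares sense over a window centred in the image pair (I, J)."""
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0   # central differences
    Iy[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0
    It = J - I                                   # temporal derivative
    r0, c0 = I.shape[0] // 2, I.shape[1] // 2
    sl = (slice(r0 - win, r0 + win + 1), slice(c0 - win, c0 + win + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)  # one row per pixel
    b = -It[sl].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (u, v): the flow in the x- and y-direction

# A smooth synthetic image and a copy of it translated one pixel in x.
xs = np.arange(16.0)
ys = np.arange(16.0)
I = np.sin(0.4 * xs)[None, :] + 0.5 * np.sin(0.3 * ys)[:, None]
J = np.sin(0.4 * (xs - 1.0))[None, :] + 0.5 * np.sin(0.3 * ys)[:, None]
u, v = lucas_kanade_flow(I, J)
# u comes out close to 1 (one pixel of motion in +x) and v close to 0.
```

Stacking one gradient equation per pixel in the window is what resolves the aperture problem locally: a single pixel gives one equation in two unknowns, but a window with two-dimensional structure gives a well-conditioned system.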
2.5 Epipolar Geometry

Epipolar geometry describes the geometry of two views, i.e. stereo vision, with two images of the same scene taken from different angles. Because of the loss of one dimension when a 3D point is projected onto an image plane, it is impossible to reconstruct the world coordinates of that point from a single image. Given a single image, the 3D point corresponding to a point in the image plane must lie on a straight line passing through the camera centre and the image point. With two views, however, the 3D point can be calculated by determining the intersection between the two straight lines passing through the respective camera centres and image points.

One such line projected onto another image plane of a camera at a different view point is known as the epipolar line of that image point. The epipole is the point in one of the images where the camera centre of the other image is projected. Another way to put it is that the epipoles are the intersections between the two image planes and a line passing through the two camera centres.

Figure 2.6. The projection of a 3D point X onto two image planes. The camera centres C1, C2, image coordinates x1, x2, epipolar lines l1, l2 and epipoles e1, e2 are shown in the figure.
2.5.1 Homogeneous Coordinates

Homogeneous coordinates is the representation for the projective geometry used to project a 3D scene onto a 2D image plane. The homogeneous representation of a point x = (x y)^T in an image plane is xh = (cx cy c)^T for any non-zero constant c. Thus, all vectors separated by a constant c are equivalent, and such a vector space is called a projective space. It is common in computer vision to set c = 1 in the homogeneous representation x = (cx, cy, c) = (x, y, 1), so that the other elements represent actual coordinates in the metric unit chosen.

In computer vision the homogeneous coordinates are convenient in that they can express an affine transformation, e.g. a rotation and a translation, as a single matrix operation by rewriting y = Ax + b into

(y; 1) = [A b; 0 1] (x; 1)

In this way, affine transformations can be combined simply by multiplying their matrices.

2.5.2 The Fundamental Matrix

The fundamental matrix F is the algebraic representation of epipolar geometry. It is a 3x3 matrix of rank two and depends on the two cameras' internal parameters and relative pose. The epipolar constraint describes how corresponding points in a two-view geometry relate and is defined as

xh2^T F xh1 = 0

Here xh1 is the homogeneous representation of a 3D point in the first image and xh2 the coordinates of the same 3D point in the second image. This is a necessary but not sufficient condition for point correspondence. Thus, one can only discard putative correspondences, not confirm them. The fundamental matrix can be calculated either by using the camera calibration matrices and their relative pose [13] or by using known point correspondences as described in Section 2.5.3.

2.5.3 Normalized Eight-Point Algorithm

The normalized eight-point algorithm [12] estimates the fundamental matrix between two stereo images from eight corresponding point pairs. The eight points in each image are first transformed to place their centroid at the origin. The coordinate system is also scaled to make the mean distance from the origin to a point equal to √2. This normalization makes the algorithm more resistant to noise by ensuring that the point coordinates are in the same size range as the 1 in the homogeneous representation.
The normalization is done by multiplying the homogeneous coordinates of the eight points with the normalization matrix P [13]:

P = [α 0 −αxc; 0 α −αyc; 0 0 1]

where (xc, yc) are the coordinates of the centroid of the eight points and α is the scale factor, defined by

xc = (1/8) Σi xi
yc = (1/8) Σi yi
α = √2 · 8 / Σi √((xi − xc)² + (yi − yc)²)

After normalization, the problem consists of minimizing ||M F_v|| while ||F_v|| = 1, where

M = (Y1 Y2 ... Y8)^T

with one row per point pair,

Y = (x1x2, x1y2, x1, y1x2, y1y2, y1, x2, y2, 1)^T
F_v = (F11 F21 F31 F12 F22 F32 F13 F23 F33)^T

This is a total least squares problem, and the standard solution is to choose F_v as the eigenvector of M^T M belonging to the smallest eigenvalue [12]. The vector F_v is reshaped into a 3x3 matrix F_est in the reverse order it was reshaped into a vector. To ensure that the estimated fundamental matrix has rank two, the norm ||F_opt − F_est|| is minimized under the constraint det F_opt = 0.
This problem can be solved using Singular Value Decomposition [13]. Let

F_est = U D V^T

where U and V are orthogonal matrices and the diagonal matrix D consists of the singular values: D = diag(r, s, t), r ≥ s ≥ t. The solution is given by F_opt = U diag(r, s, 0) V^T [13]. Finally, the fundamental matrix is denormalized according to

F = P2^T F_opt P1

where P1 and P2 are the normalization matrices for image one and two respectively [13].

2.5.4 RANSAC

RANSAC [13], short for Random Sample Consensus, is an iterative method to estimate the parameters of a mathematical model from a set of observed data which contains outliers. Its advantage is its ability to give robust estimates even when there are outliers in the data set. However, there is no upper bound on the computation time if RANSAC is to find the optimal parameter estimates. In computer vision, RANSAC is often used to estimate the fundamental matrix given a set of putative point correspondences. One of many variants of RANSAC for this purpose is described by the following algorithm:

1. Choose eight random point pairs from a larger set of putative point correspondences.
2. Estimate F with the normalized eight-point algorithm using the eight point pairs.
3. If ||F_opt − F_est|| (see Section 2.5.3) is below a certain threshold, evaluate the estimate F by determining the number of correspondences that agree with this fundamental matrix.
4. If this is the best estimate so far, save it.
5. Repeat from step 1 until a maximum number of iterations have been processed. Return F along with the consensus set of corresponding point pairs.

In step 3, the number of agreeing point pairs is calculated as the number of points that are close enough to their respective epipolar lines. The epipolar lines
and the normalized sum of distances to the epipolar lines are defined as follows:

l1 = F^T xh2
l2 = F xh1
dsum = |xh1^T l1| / √(l11² + l12²) + |xh2^T l2| / √(l21² + l22²)

A threshold is applied to dsum to separate inliers from outliers. Because of the normalization of the epipolar lines, the distance measure will be in pixels.
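The epipolar distance dsum can be sketched as below. The fundamental matrix used here is constructed for an idealized pure sideways camera translation with identical internal parameters (F proportional to the cross-product matrix of the baseline), not estimated with the eight-point algorithm, and the point coordinates are hypothetical:

```python
import numpy as np

def epipolar_distance(F, xh1, xh2):
    """Normalized sum of point-to-epipolar-line distances dsum (in pixels)
    for homogeneous points xh1, xh2 and fundamental matrix F."""
    l1 = F.T @ xh2          # epipolar line of xh2 in image 1
    l2 = F @ xh1            # epipolar line of xh1 in image 2
    d1 = abs(xh1 @ l1) / np.hypot(l1[0], l1[1])
    d2 = abs(xh2 @ l2) / np.hypot(l2[0], l2[1])
    return d1 + d2

# Fundamental matrix for a pure sideways translation along x: epipolar
# lines are the image rows, so corresponding points share the same y.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

# A true correspondence only shifts along x (same row) ...
p1 = np.array([100.0, 50.0, 1.0])
p2 = np.array([112.0, 50.0, 1.0])
# ... while a mismatched pair also differs in y.
q1 = np.array([100.0, 50.0, 1.0])
q2 = np.array([112.0, 58.0, 1.0])

inlier = epipolar_distance(F, p1, p2) < 1.0   # True: dsum is 0 pixels
outlier = epipolar_distance(F, q1, q2) < 1.0  # False: dsum is 8 pixels
```

This per-pair test is exactly the inlier criterion used in step 3 of the RANSAC loop above.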
Chapter 3

Vehicle Detection Approaches

As mentioned in Section 1.1, the purpose of the detection step in a vision based vehicle detection system is to find ROIs. How well this is achieved is a matter of definitions. Commonly used measures are the percentage of detected vehicles, detected non-vehicles, alignment of found ROIs, etc. There are essentially three different approaches to detection of vehicles proposed in the literature: knowledge based, stereo based and motion based [14].

3.1 Knowledge Based Methods

The knowledge based methods all use a priori image information to extract ROIs. Different cues have been proposed in the literature, and systems often combine two or more of these cues to make the detection more robust.

3.1.1 Color

Color information could possibly be used to distinguish vehicles from the background. Examples exist where road segmentation has been performed using the color cue [15]. This thesis investigates vehicle detection in monochrome images, and therefore no further research has been made concerning the color cue.

3.1.2 Corners and Edges

Man-made objects like vehicles contain a high degree of corners and edges compared to the background, from whichever view they are looked upon. In the case of rear-view vehicle detection, the edge cue is probably the most exploited of the knowledge based approaches. Although corners might not always be very well-defined in feasible image resolutions, a vehicle model of two vertical (corresponding to the left and right side) and two horizontal (bottom and top) edges
could be used. This model holds for front-view detection as well.

Sun et al. [26] describe a system that detects vehicles by computing vertical and horizontal edge profiles at three different levels of detail (see Figure 3.1). Using the fact that the camera and the vehicles are located on the same ground plane, a horizontal edge candidate corresponding to the bottom of the vehicle is then combined with left and right side candidates to form ROIs. Wenner [27] uses sliding windows of different sizes to better capture local edge structures. Edges are detected within the image region delimited by the window instead of using global edge profiles for each row and column.

Jung and Schramm [31] describe an interesting way of detecting rectangles in an image. By sliding a window over the image, the Hough transform of small regions is computed. Rectangles can then be detected based on certain geometrical relations in the Hough space, i.e. that the line segments delimiting the rectangle appear in pairs and that the two pairs are separated by a 90° angle. This is shown in Figure 3.2. One could simplify the model further by not only assuming ∆θ = 90° but locking the θ parameters to 0° and 90° for the two line pairs respectively.

Edge detection is fairly fast to compute and the detection scheme is simple to comprehend. On the downside, other man-made objects like buildings, rails, the road, etc., can confuse a system only based on the edge cue.

Figure 3.1. Left column shows images at different scales. Second and third column show vertical and horizontal edge maps. The right column shows edge profiles used in [26].
Figure 3.2. (a), (b) Properties shown for Hough peaks corresponding to the four sides of a rectangle centered at the origin (from [31]).

3.1.3 Shadow

The fact that the area underneath a vehicle is darker than the surrounding road, due to the shadow of the vehicle, has been suggested as a sign pattern for vehicle detection in a number of articles [16] [17]. Although this technique can yield very good results in perfect conditions, it suffers in scenes with changing illumination. Tzomakas et al. [18] partly overcame this problem by deciding an upper threshold for the shadow based on the intensity of the free driving space (i.e. the road). After extracting the free driving space, they calculated the mean and standard deviation of a Gaussian curve fitted to the intensity values of the road area. The upper threshold was then set to µ − 3σ, where µ and σ are the road mean and standard deviation respectively. They combined the detected shadows with horizontal edge extraction to distill ROIs.

The most serious drawback of the shadow cue is scenes with low sun (Figure 3.3), making vehicles cast long shadows. Even though the shadow is still darker beneath the vehicle than beside it, the detected shadows become wider in the case of sun from the side, or ill positioned in the case of the camera facing the sun.

Figure 3.3. Low sun from the side misaligns the ROIs when using the shadow cue.
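The µ − 3σ shadow threshold of Tzomakas et al. can be illustrated with hypothetical grey levels; the intensity values below are invented for illustration and are not measurements from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit grey levels: free driving space (road) around 120,
# shadow pixels underneath a vehicle around 40.
road = rng.normal(120.0, 10.0, size=2000)
shadow_pixels = np.array([35.0, 42.0, 38.0, 45.0])

# Fit a Gaussian to the road intensities and set the upper shadow
# threshold to mu - 3*sigma, as proposed in [18].
mu, sigma = road.mean(), road.std()
threshold = mu - 3.0 * sigma   # about 90 for these values

is_shadow = shadow_pixels < threshold
# All four sample shadow pixels fall below the threshold, while about
# 99.9 % of the road pixels stay above it.
```

The 3σ margin is what keeps ordinary road texture from being classified as shadow while still admitting the much darker area underneath a vehicle.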
3.1.4 Symmetry

Symmetry is another sign of objects created by mankind. Vehicles, especially the rear view, include a fair amount of symmetry. Kim et al. [19] used symmetry as a complement to the shadow cue to better align the left and right side of the ROI. A disadvantage with this cue is that the free driving space is very symmetrical too, making symmetry unsuitable as a standalone detection cue. Surprisingly, this problem has not been encountered during the literature study.

3.1.5 Texture

Little research has been made on texture as an object detection cue. Kalinke et al. [20] used entropy to find ROIs. The local entropy within a window was calculated, and regions with high entropy were considered as possible vehicles. They proposed energy, contrast and correlation as other possible texture cues.

3.1.6 Vehicle Lights

Vehicle lights could be used as a night time detection cue [19]. Brighter illumination, and the fact that vehicle lights are not compulsory to use during daytime in many countries, makes the cue unsuitable for a robust vehicle detection system. Hence, the vehicle lights detection scheme should only be seen as a complement to other techniques.

3.2 Stereo Based Methods

Vehicle detection based on stereo vision uses either the disparity map or Inverse Perspective Mapping. The disparity map is generated by solving the correspondence problem for every pixel in the left and right image and shows the difference between the two views. From the disparity map a disparity histogram can be calculated. Since the rear view of a vehicle is a vertical surface, and the points on the surface therefore are at the same distance from the camera, it should occur as a peak in the histogram [21].

The Inverse Perspective Mapping transforms an image point onto a horizontal plane in the 3D space. Zhao et al. [22] used this to transform all points in the left image onto the ground plane and reproject them back onto the right image. Then they compared the result with the true right image to detect points above the ground plane as obstacles. Since this thesis only deals with detection methods based on images from one camera, the stereo based approach has not been investigated further.
3.3 Motion Based Methods

As opposed to the previous methods, motion based approaches use temporal information to detect ROIs. The basic idea is to detect vehicles by their motion, using the fact that moving vehicles move differently than the background in an image sequence. In systems with fixed cameras (e.g. a traffic surveillance system) this is rather straightforward by performing background subtraction: two consecutive image frames are subtracted and the result is thresholded in order to extract moving objects. However, the problem becomes significantly more complex in an on-road system because of the egomotion of the camera.

At least two different approaches to solve this problem are possible. The first would be to compute the dense optical flow between two consecutive frames, solving the correspondence problem for every pixel [23]. Although it would in theory be possible to calculate the dense optical flow and detect moving objects as areas of diverging flow (compared to the dominant background flow), this is very time consuming and not a practical solution. In addition, the aperture problem makes it impossible to estimate the motion of every pixel.

Another, more realistic approach would be to compute a sparse optical flow. This could be done by extracting distinct feature points (e.g. corners) and solving the correspondence problem for these points. Either feature points are extracted from both image frames and point pairs are matched using normalized correlation, or feature points are extracted from one image frame and tracked to the other using e.g. Lucas-Kanade tracking [25]. By carefully selecting feature points from the background and not from moving vehicles, the fundamental matrix describing the egomotion of the camera could be estimated from the putative point matches using RANSAC and the eight-point algorithm. The fundamental matrix could then be used to find outliers in the set of putative point matches. Such an outlier could either originate from a false point match or from a point on a moving object that does not meet the motion constraints of the background [24]. In theory, vehicles contain a lot of feature points, e.g. corners and edges. On the other hand, reality is such that very few feature points can be extracted from the homogeneous road. Therefore, the problem of only choosing feature points from the background is quite intricate.

An obvious disadvantage is the fact that a method solely based on motion cannot detect stationary vehicles like parked cars. This is a major drawback, as stationary vehicles can cause dangerous situations if parked in the wrong place. On the same premises, slow vehicles are hard to detect using motion as the only detection cue. A property well worth noting is that motion based methods detect all moving objects, not just vehicles. This could be an advantage as well as a disadvantage. All moving objects, such as bicycles, pedestrians, overtaking vehicles, etc., could be interesting to detect. However, there is no easy way of distinguishing a vehicle from other moving objects without using other cues than motion. Combined with an algorithm that detects the road area this could be useful. A possible advantage with the motion based approach could be at detecting partly occluded vehicles: such a vehicle should cause a diverging motion field even though the whole vehicle is not visible to the camera.
Chapter 4

Implemented Algorithms

The purpose of the literature study was to choose two to three promising algorithms to implement and test on data provided by Autoliv. Preferably, they could also be modified in order to improve performance. As described in Chapter 3, two fundamentally different approaches were possible. On one hand, the knowledge based approach, using a priori information from the images. On the other hand, the motion based approach, using the fact that moving vehicles move differently than the background in an image sequence. After weighing the pros and cons of each method studied in the literature, two knowledge based and one motion based approach were chosen. All of them have been modified in different ways to improve performance. These algorithms have all been implemented in MATLAB and are described in detail in this chapter.

The test data have been captured with a CCD camera with a field of view (FOV) in the x-direction of 48°. The image size is 720x240 pixels.

4.1 Calculation of the Vanishing Point

All the implemented algorithms need the y-coordinate of the vanishing point (Section 2.1) to calculate a distance measure from the camera to a vehicle and to determine size constraints for a vehicle based on its location in the image. In this implementation a static estimate of the vanishing point is calculated, based on the pitch angle of the camera decided during calibration. Since only the y-coordinate is interesting for the distance calculation, the x-coordinate will not be estimated. First, the calculation of the vanishing point will be explained. The ratio between the height and width of one pixel can be calculated from the known intrinsic parameters fx and fy. From these data, the y-coordinate of the vanishing
point is calculated as

fx = f / sx
fy = f / sy
FOVy = (240 sy) / (720 sx) · FOVx = (240 fx) / (720 fy) · FOVx
yvp = (α / FOVy) · 240 + 240 / 2

where f is the focal length, sx and sy are the pixel width and height respectively, and α is the pitch angle, defined positive downwards.

4.2 Distance to Vehicle and Size Constraint

To determine the allowed width interval in pixels of a bottom candidate at vertical distance ∆y from the vanishing point, the angle this distance corresponds to in the camera is calculated as

β = (∆y / 240) · FOVy

Assuming that the road is flat and that the vehicles are located on the road, the distance to the vehicle in meters can be calculated as l = Hcam / tan β, where Hcam is the height above the ground of the camera in meters. Finally, the width in pixels of a vehicle is determined as

w = (720 / FOVx) · 2 arctan((W/2) / l)

where W is the width in meters of a vehicle, assuming that the vehicle is located on the same longitudinal axis as the ego vehicle. The upper and lower bounds for the vehicle width in meters generate an interval in pixels of allowed vehicle widths.

4.3 Vehicle Model

A vehicle is assumed to be less than 2.6 m in width. This is the maximum allowed width for a vehicle in Sweden [29], and many other countries have approximately the same regulation. A lower limit of 1.0 m is also set. In the same way, vehicles are assumed to be between 1.0 m and 2.0 m in height. Note that the upper limit is set low to avoid unnecessary false detections, since truck detection is not prioritized. The bottom edge is assumed to be a light-to-dark edge looking bottom-up. In the same way, the left edge is assumed to be a light-to-dark edge and the right edge a dark-to-light edge looking left to right. This is true because the algorithm only looks for the edges generated by the transition between road and wheels when finding the side edges.
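A small numerical sketch of the distance and width formulas above. The camera height Hcam = 1.2 m and the square-pixel assumption giving FOVy = FOVx / 3 are illustrative assumptions, not values stated in this chapter:

```python
import math

# Camera parameters from the text: 48 degree horizontal FOV, 720x240 images.
FOV_X = math.radians(48.0)
FOV_Y = FOV_X * 240.0 / 720.0   # assumes sx = sy (square pixels), for illustration
H_CAM = 1.2                     # assumed camera height in metres (not from the text)

def distance_to_vehicle(dy):
    """Distance l (m) to a vehicle whose bottom lies dy pixels below the
    vanishing point: beta = dy/240 * FOVy, l = Hcam / tan(beta)."""
    beta = (dy / 240.0) * FOV_Y
    return H_CAM / math.tan(beta)

def vehicle_width_in_pixels(W, l):
    """Image width w (pixels) of a vehicle of width W metres at distance l."""
    return (720.0 / FOV_X) * 2.0 * math.atan((W / 2.0) / l)

l = distance_to_vehicle(30.0)            # bottom candidate 30 px below the horizon
w_min = vehicle_width_in_pixels(1.0, l)  # lower bound of the vehicle model
w_max = vehicle_width_in_pixels(2.6, l)  # Swedish maximum vehicle width
```

For this configuration the candidate lies roughly 35 m ahead, and only bottom candidates spanning on the order of 25 to 65 pixels would be accepted, which is how the width interval in pixels follows from the bounds in meters.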
4.4 Edge Based Detection

This method uses a multiscale edge detection technique to perform rear-view vehicle detection. The basic idea is that a rear-view vehicle is detected as two horizontal lines corresponding to its bottom and top and two vertical lines corresponding to its left and right side. The method uses both a coarse and a fine scale to improve robustness. It is inspired by the work done by Sun et al. [26] and Wenner [27], but has been modified to improve performance in the current application.

4.4.1 Method Outline

Edges, both vertical and horizontal, are extracted by convolving the original image with Sobel operators (Section 2.2). Edges a certain distance above the horizon are not interesting as part of a vehicle, and to save computation time that area is ignored. Candidates for vehicle bottoms are found by sliding a window of 1xN pixels over the horizontal edge image, adding up the values inside the window. Local minima are found for each column and window size, and candidates beneath a certain threshold, corresponding to a vehicle darker than the road, are kept. Candidates not meeting the constraints on width described in Section 4.2 are rejected.

Next, the algorithm tries to find corresponding left and right sides to the bottom candidates. Upon each bottom candidate a column summation of the vertical edge image is performed with a window size of wbottom/8 x 1 pixels. Left sides, defined as light-to-dark edges, are searched for close to the left side of the bottom candidate, and right sides, defined as dark-to-light edges, are assumed to lie to the right of the bottom candidate. Each combination of left and right sides is saved along with the corresponding bottom as a candidate. A bottom candidate without detected left or right sides is discarded.

Vehicle top edges, defined as any kind of horizontal edge of a certain magnitude, were extracted in the same process as the bottom edges. Each candidate is matched against all vehicle top edges to complete the ROI rectangles. ROIs not satisfying constraints on size based on the distance from the camera are rejected. Same goes for ROIs asymmetric around a vertical line in the middle of the ROI and ROIs with too small variance. Figure 4.1 shows different steps of the edge based detection scheme.

Using geometry, a height interval in pixels is decided in which a top edge must be found in order to keep the candidate. This height is calculated as

θ1 = arctan((H − Hcam) / l)
θ2 = arctan(Hcam / l)
h = ((θ1 + θ2) / FOVy) · 240

where one upper and one lower bound on the height H in meters of a vehicle give an interval of the vehicle height h in pixels. The parameter l is the distance from the camera to the vehicle as derived in Section 4.2.
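The bottom-candidate search can be sketched as follows in Python. The window size, threshold and synthetic edge map are illustrative choices; the thesis implementation additionally repeats the search for several window sizes, as described above:

```python
import numpy as np

def bottom_candidates(h_edges, N=9, thresh=-2.0):
    """Find vehicle-bottom candidates: slide a 1xN window over each row of a
    horizontal edge map, sum the responses inside the window and keep local
    minima below a (negative) threshold, i.e. strong light-to-dark
    transitions corresponding to something darker than the road."""
    rows, cols = h_edges.shape
    half = N // 2
    cands = []
    for r in range(rows):
        sums = np.convolve(h_edges[r], np.ones(N), mode="same")
        for c in range(half, cols - half):
            is_local_min = sums[c] == sums[max(0, c - 1):c + 2].min()
            if sums[c] < thresh and is_local_min:
                cands.append((r, c, sums[c]))
    return cands

# Synthetic horizontal edge map: a strong dark-over-light transition
# (negative response) around row 10, columns 20-32.
edges = np.zeros((20, 60))
edges[10, 20:33] = -1.0
cands = bottom_candidates(edges)
# All candidates lie on row 10, inside the dark segment.
```

Each surviving candidate would then be checked against the width interval of Section 4.2 before the side and top searches are run.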
Figure 4.1. The edge based detection scheme. (a) Bottom candidates meeting the size constraints described in Section 4.2. (b) Candidates with detected left and right sides. (c) Candidates with detected top edges. (d) Asymmetric, homogeneous and redundant candidates rejected.
Another scale test is performed to reject candidates not meeting the width constraint criteria, given the distance from the camera to the vehicle as derived in Section 4.2. A fixed aspect ratio is used to complete the ROIs: the highest located top edge within the height interval given by this ratio completes the ROI, and if no top edge is found within the interval, the candidate is discarded.

The ROIs are now checked against symmetry and variance constraints to discard more false detections. To calculate an asymmetry measure, the right half of the ROI is flipped and the median of the squared differences between the left half and the flipped right half is determined. An upper bound threshold on the asymmetry decides whether or not to discard a ROI. Likewise, the image intensity is summed up row-wise within the ROI and the variance is calculated on these row sums; a ROI is discarded if the variance over rows is below a certain threshold. Because of this constraint, homogeneous candidates covering e.g. the road are hopefully rejected. Finally, small ROIs within larger ones are rejected to get rid of ROIs covering e.g. vehicle windows.

4.5 Shadow Based Detection

The shadow based detection algorithm implemented is based on the work by Tzomakas et al. [18]. The free driving space is detected and local means and standard deviations of the road are calculated. An upper shadow threshold of µ − 3σ is applied and the result is combined with a horizontal edge detection to create vehicle bottom candidates. Figure 4.2 shows different steps of the shadow based detection scheme.

4.5.1 Method Outline

The free driving space is first estimated as the lowest central homogeneous region in the image delimited by edges. This is done by detecting edges using the Canny detector (Section 2.2) and then adding pixels to the free driving space in a bottom-up scheme until the first edge is encountered. The image is cropped at the bottom to discard edges on the ego-vehicle, and on the sides to prevent non-road regions from being included in the free driving space estimate. The y-coordinate of the vanishing point is used as an upper bound of the free driving space in the case of no edges present. Figures 4.3 and 4.4 show a couple of examples. As seen in Figure 4.4(c), problems occur when vehicles, road markings or shadows occlude parts of the road, making it impossible to detect road points in front of these obstacles.

To deal with this problem, this algorithm, as opposed to Tzomakas', estimates a local mean and standard deviation for each row of the free driving space. This is to better capture the local road intensity. The mean is extrapolated using linear regression to rows where no road pixels exist to average over; for these rows the standard deviation of the closest row estimate is used. The image is then thresholded below the horizon using an upper bound on the shadow intensity. This threshold is calculated as µ − 3σ, where µ and σ are the road mean and standard deviation for the row including the current image point. Horizontal edges are extracted by simply subtracting a row-shifted copy from the original image and thresholding the result.
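The asymmetry measure described above can be sketched as follows. Representing the ROI as a list of pixel rows is an assumption made for illustration, and the rejection threshold would be tuned on test data:

```python
from statistics import median

def asymmetry(roi):
    """Flip the right half of the ROI and take the median of the squared
    differences against the left half; 0 means perfectly symmetric.
    `roi` is a list of equally long rows of pixel intensities."""
    diffs = []
    for row in roi:
        w = len(row) // 2
        left = row[:w]
        flipped_right = row[-w:][::-1]
        diffs.extend((a - b) ** 2 for a, b in zip(left, flipped_right))
    return median(diffs)
```

A ROI would then be discarded when this value exceeds an upper bound threshold.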
Figure 4.2. The shadow based detection scheme. (a) The free driving space. (b) Thresholded image to extract shadows. (c) Regions classified as both shadows and light-to-dark horizontal edges. (d) Final vehicle candidates.
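The per-row shadow threshold µ − 3σ, with linear-regression extrapolation of the mean to rows lacking road pixels, can be sketched like this. The data layout, a dict from row index to the road pixel intensities of that row, is an illustrative assumption:

```python
def shadow_threshold_per_row(road_rows):
    """Per-row shadow thresholds mu - 3*sigma from road pixels of the free
    driving space. Rows without road pixels get a mean extrapolated by
    linear regression over the row index, and the sigma of the closest
    estimated row. `road_rows` maps row index -> list of intensities."""
    rows = sorted(road_rows)
    means = {y: sum(v) / len(v) for y, v in road_rows.items()}
    sigmas = {y: (sum((p - means[y]) ** 2 for p in v) / len(v)) ** 0.5
              for y, v in road_rows.items()}
    # least-squares line mu = a*y + b through the estimated row means
    n = len(rows)
    my = sum(rows) / n
    mm = sum(means[y] for y in rows) / n
    a = (sum((y - my) * (means[y] - mm) for y in rows)
         / sum((y - my) ** 2 for y in rows))
    b = mm - a * my

    def threshold(y):
        if y in means:
            mu, sigma = means[y], sigmas[y]
        else:
            mu = a * y + b                           # extrapolated mean
            closest = min(rows, key=lambda r: abs(r - y))
            sigma = sigmas[closest]                  # closest row's sigma
        return mu - 3 * sigma

    return threshold
```

A pixel is then classified as shadow when its intensity falls below the threshold of its row.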
By an AND-operation, interesting regions are extracted as regions classified as both shadows and light-to-dark (counting bottom-up) horizontal edges. After morphological closing, to remove small holes in the shadows, and opening, to remove noise, segmentation is performed. Each candidate's bottom position is then decided as the row with the most points belonging to the shadow region. The left and right borders are situated on the leftmost and rightmost points of the candidate's bottom row, and a fixed aspect ratio decides the height of the ROI. The ROIs are checked against size constraints as described in Section 4.2. In the same way as in the edge based detection scheme, asymmetric ROIs are discarded.

Figure 4.3. Detection of the free driving space. (a) Initial image. (b) Edge map created with the Canny detector. (c) Detected road area.
Figure 4.4. Detection of the free driving space. (a) Initial image. (b) Edge map created with the Canny detector. (c) Detected road area.
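The combination of the shadow threshold with light-to-dark horizontal edges into bottom candidates can be sketched as below. The per-row statistics `mu`/`sigma` and the edge threshold are assumed given, and the morphological clean-up is omitted for brevity:

```python
def bottom_candidates(img, mu, sigma, edge_thr):
    """Mark pixels that are both darker than the per-row shadow threshold
    mu - 3*sigma and on a light-to-dark transition counted bottom-up.
    `img` is a list of pixel rows; all names are illustrative."""
    h, w = len(img), len(img[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w):
            shadow = img[y][x] < mu[y] - 3 * sigma[y]
            # light-to-dark when moving bottom-up: the pixel below is brighter
            light_to_dark = img[y + 1][x] - img[y][x] > edge_thr
            out[y][x] = shadow and light_to_dark
    return out
```

The resulting mask corresponds to panel (c) of Figure 4.2; connected regions of it seed the vehicle bottom candidates.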
4.6 Motion Based Detection

The idea of the following algorithm (inspired by the work done by Yamaguchi et al. [24]) was to look at two consecutive image frames as a stereo view. This can be done as long as the vehicle is moving, so that two image frames taken at different points in time capture the same scene from two different positions. The method outline was initially the following:

• Extract a number of feature points in each image using the Harris operator (Section 2.1).
• Determine a set of point correspondences between the two frames using either Lucas-Kanade tracking (Section 2.3) or normalized correlation (Section 2.3.1).
• Estimate the fundamental matrix from background point correspondences using RANSAC and the eight point algorithm (Section 2.5).
• Use the epipolar constraint to detect outliers as points on moving objects or falsely matched point correspondences.
• Create ROIs from these outliers.

However, a major difficulty turned out to be the problem of detecting points on vehicles as outliers. Consider two image frames from a mono system where the camera translates forward. The epipole will equal the vanishing point in such a system [13]. This implies that all epipolar lines will pass through the vanishing point on the horizon. Points on vehicles located on the road will translate along their corresponding epipolar lines, either towards or away from the vanishing point. Since they move along their epipolar lines, their direction is consistent with the epipolar constraint for the background motion. Figure 4.5 illustrates the problem. Since the epipolar constraint can only be used to reject points far from their epipolar lines as outliers, and not to confirm points close to their epipolar lines as inliers, it will be impossible to detect points on moving vehicles on the road as outliers. This is a major issue, though not encountered in the literature.

Therefore, a couple of assumptions were made in order to modify the motion based algorithm to detect certain vehicles:

• Points on vehicles are used when estimating the fundamental matrix. This means that the significant problem of deciding which points to base the estimate on vanishes.
• The vehicle on which the camera is mounted is assumed to move forward. Thus, the background can be assumed to move towards the camera, and points moving away from the camera can be detected either as points on overtaking vehicles or as mismatched points.
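The epipolar test in the outline reduces to a point-to-line distance, and the core problem above can be reproduced numerically. In the sketch below, F is a plain 3×3 nested list; the example matrix corresponds to a pure forward translation (epipole at the image origin, i.e. the vanishing point), an assumption chosen for illustration:

```python
def epipolar_distance(F, p, q):
    """Distance from point q in image 2 to the epipolar line F*p of point p
    in image 1. Points are (x, y) pixel coordinates, implicitly homogeneous."""
    x, y = p
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    u, v = q
    return abs(a * u + b * v + c) / (a * a + b * b) ** 0.5

# Fundamental matrix of a forward-translating camera (illustrative):
# its epipolar lines all pass through the origin.
F_forward = [[0, -1, 0],
             [1, 0, 0],
             [0, 0, 0]]
```

A point sliding outward along its epipolar ray, e.g. (1, 2) to (2, 4), keeps a zero distance and therefore cannot be rejected as an outlier, which is exactly the difficulty noted above.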
Figure 4.5. Two consecutive image frames, matched point pairs and their epipolar lines. Points on vehicles move along their epipolar lines.
4.6.1 Method Outline

The implemented algorithm uses two image frames separated by 1/15 s to detect vehicles. By convolving the current image frame with Sobel kernels, edge maps are obtained. These edge maps are used to extract feature points with the Harris operator. Since feature points tend to appear in clusters, the image is divided horizontally into three equal subimages, from which an equal number of feature points are extracted, as long as their Harris response meets a minimum threshold. The extracted feature points are tracked from the current image frame to the previous one using Lucas-Kanade tracking. All the putative point correspondences are then used to estimate the fundamental matrix using the eight point algorithm and RANSAC. Points moving towards the vanishing point are detected and sorted based on their horizontal position. The points are then clustered into groups depending on their position in the image and their velocity towards the vanishing point, and ROIs are created from these clusters. ROIs not meeting the scale constraints are discarded. Since this algorithm focuses on finding vehicles in difficult side poses, no symmetry constraint is applied.

Of course, since only a limited number of points are available to base each ROI upon, the alignment of the ROIs cannot be expected to be very precise. If points moving towards the vanishing point are not detected on each of the four edges delimiting the vehicle, the ROI will be misaligned. Since this method cannot detect vehicles moving towards the camera or staying at the same distance from the camera, it is solely to be seen as a complement algorithm. It has potential to complement the other two algorithms, especially in the case of overtaking vehicles not yet fully visible in the images.

4.7 Complexity

Using MATLAB execution time to judge an algorithm's complexity is risky and difficult. Still, some comments can be made based on the MATLAB code, which has been run on an Intel Pentium 4 CPU at 3.20 GHz. To make the analysis more accurate, data structures have been preallocated where possible to avoid time consuming assignments. The algorithms have been optimized to some degree to lower computation time while generating detections in the many tests. Of course, a lot more could be done; a real-time implementation would need to be written in another language, e.g. C++.

In the edge based algorithm, the most time consuming operation is the nested loop finding horizontal candidates for vehicle bottoms and tops. Although it consists of fairly simple operations, the fact that both window size and location in the image are varied makes it expensive to compute. It accounts for around 30% of the total computation time in MATLAB. The rest of the edge based detection scheme consists of simple operations which are fast to compute, and the execution time of this part of the program is mainly governed by the number of horizontal candidates found by the nested loop described above. In MATLAB the algorithm operates at roughly 1 Hz.

The shadow based detection is faster than the edge based. The operation standing out as time consuming is the Canny edge detection used to extract the free driving space. Since the shadow based detection only creates one ROI per shadow, the number of ROIs is always kept low, and therefore operations on the whole set of ROIs, e.g. checking them against size constraints, are not expensive to compute. The algorithm can handle a frame rate of 2–3 Hz in MATLAB.

The motion based detection scheme is without doubt the most time consuming algorithm. The 2D interpolation used during the Lucas-Kanade tracking, the 2D local maxima extraction from the Harris response and the Singular Value Decomposition in the eight point algorithm are all time consuming steps in this algorithm. Since the Lucas-Kanade tracking is the most time consuming function, the computation time is largely dependent on how many feature points meet the Harris threshold. Also, the maximum number of iterations for the tracking is another critical parameter, since a larger number of iterations implies more interpolation. In general, the algorithm runs at 0.25 Hz in MATLAB.
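The spreading of Harris features over three equal subimages to counter clustering can be sketched as follows; splitting along the x-axis, the per-strip budget and the tuple layout (x, y, response) are illustrative assumptions:

```python
def select_features(points, per_strip, width, min_response):
    """Keep at most `per_strip` strongest Harris points in each of three
    equal strips of an image of the given width, provided their response
    meets the minimum threshold."""
    selected = []
    for i in range(3):
        lo, hi = i * width / 3, (i + 1) * width / 3
        strip = [p for p in points
                 if lo <= p[0] < hi and p[2] >= min_response]
        strip.sort(key=lambda p: -p[2])   # strongest response first
        selected.extend(strip[:per_strip])
    return selected
```

This guarantees that one heavily textured region cannot consume the whole feature budget.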
Chapter 5

Results and Evaluation

The test data has been captured by Autoliv with a stereo vision setup consisting of two CCD cameras with a FOV in the x-direction of 48° and a frame rate of 30 Hz. The image frames are 720x240 pixels, where every second row has been removed to avoid distortion due to effects from interlacing. Only the information from the left camera is used to evaluate the mono vision algorithms implemented in this thesis.

In order to be able to evaluate algorithm performance, staff at Autoliv have manually marked vehicles, pedestrians, etc. in the test data. Each vehicle has been marked with an optimal bounding box surrounding the vehicle, called a reference marking. Detection results from the algorithms have been compared against these vehicle reference markings to produce performance statistics. The data used to tune the algorithm parameters have been separated from the validation data. The algorithm tuning has been done manually due to time limitations.

The following definitions have been used when analyzing the performance of the algorithms:

• A PD (Positive Detection) is a vehicle detection that matches a reference marking better than the minimum requirements. If several detections match the same reference, only the best match is a PD.
• The PD rate is the ratio between the number of PDs and the number of reference markings.
• A ND (Negative Detection) is a reference marking not overlapped by a detection at all.
• An OD (Object Detection) is a reported detection that does not overlap a reference at all.

The minimum requirements for a detection to be classified as a PD are for the left and right edges to differ less than 30% of the reference marking width from the true position (based on the reference marking). The top edge is allowed to differ 50% of the reference marking height from its true position, and the same limit for the bottom edge is 30%. The easier constraint on the top edge is because the top edge is not as important as the other edges for a number of reasons, e.g. for measuring a distance to the vehicle.

5.1 Tests on Edge Based and Shadow Based Detection

A test data set of 70 files of different scenes, including a total of 17339 image frames, has been used. The data include motorway as well as city scenes from different locations around Europe. Different weather conditions such as sunny, cloudy, foggy, etc. are all represented. Table 5.1 shows a summary of the test data. Although trucks have not been prioritized for detection, the test data includes trucks, as seen in Table 5.1. The evaluation tool does not distinguish between cars and trucks, and therefore the PD rate of the different tests could probably increase further if such a separation was made.

Table 5.1. Summary of the 70 test data files.

    Type of scene              # of files
      Motorway                     54
      City                         16
    Conditions
      Fine (sun/overcast)          48
      Low sun                      15
      Fog                           3
      Snow                          1
      Tunnel                        3
    Other
      Trucks, etc.                 11

Tests have been performed on vehicles at different distances from the ego-vehicle: tests on vehicle reference markings up to 30 m, 50 m and 100 m have been done. Vehicles closer than 5 m have been disregarded, since they are impossible to detect with these two algorithms; e.g., the bottom edge is simply not visible in the image at such close distance. Occluded vehicles and vehicles overlapping each other have been left out from the analysis.

In addition, two different sets of poses have been evaluated. The first set only includes vehicles seen straight from behind or from the front. The other allows the vehicles to be seen from an angle showing their rear or front along with one side of the vehicle, i.e. front-left, front-right, rear-left or rear-right.
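The PD criteria above can be written down directly; representing boxes as dicts of border coordinates is an illustrative choice:

```python
def is_positive_detection(det, ref):
    """True if a detection matches a reference marking within the stated
    tolerances: left/right within 30% of the reference width, bottom
    within 30% and top within 50% of the reference height."""
    w = ref["right"] - ref["left"]
    h = ref["bottom"] - ref["top"]
    return (abs(det["left"] - ref["left"]) < 0.30 * w
            and abs(det["right"] - ref["right"]) < 0.30 * w
            and abs(det["bottom"] - ref["bottom"]) < 0.30 * h
            and abs(det["top"] - ref["top"]) < 0.50 * h)
```

The PD rate then follows as the fraction of reference markings matched, counting at most one (the best) detection per reference.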
The total number of vehicle reference markings in each test is displayed in Table 5.2.

Table 5.2. The number of vehicle reference markings in the different test scenarios.

    Poses                                         Max distance [m]
                                                   30      50      100
    Front, Rear                                   6456   12560   17524
    Front, Rear, Front-left, Front-right,
    Rear-right, Rear-left                        13483   23162   29023

5.1.1 Edge Based Detection Tests

Table 5.3 shows a summary of the PD rate obtained in the different tests. As seen, the PD rate is higher on the set of vehicle poses including the side poses than on the set only consisting of rear- and front-views. In fact, this detection scheme is very good at detecting vehicles from side poses.

Table 5.3. PD rate for the edge based detection tests.

    Poses                                         Max distance [m]
                                                   30      50      100
    Front, Rear                                   89.9%   85.1%   75.1%
    Front, Rear, Front-left, Front-right,
    Rear-right, Rear-left                         92.2%   86.1%   78.6%

Figure 5.1 is taken from the test with vehicles up to 100 m of all poses and shows interesting histograms of how the borders of the ROIs differ between detections and references for PDs. The ratio along the x-axis is defined as the position difference between the detection and reference border line divided by the size of the reference marking (width for left and right border lines and height for bottom and top border lines). A negative ratio corresponds to a detection smaller than the reference, while a positive ratio indicates that the detection is larger than the reference at that specific border. As seen, the left and right borders are very well positioned. The histogram of the bottom border line does have a significant peak; however, it is overestimated with an offset of 10%. This is partly explained by the fact that the vehicle bottoms have been marked where the vehicle wheels meet the road, while the algorithm often detects the edge arising from the vehicle shadow situated a few pixels down. This must be considered if a distance measure based on the vehicle bottom is to be implemented. If the vehicle bottom coordinate is overestimated, the distance will be underestimated.
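Why an overestimated bottom row underestimates the distance follows from the standard flat-road pinhole model Z = f·h / (y_bottom − y_horizon); the function and the numbers below are made up for illustration:

```python
def distance_from_bottom_row(y_bottom, y_horizon, focal_px, cam_height_m):
    """Flat-road pinhole model: distance to a vehicle from the image row of
    its bottom edge. A bottom row marked a few pixels too low (e.g. on the
    shadow edge instead of the wheels) yields a shorter distance."""
    return focal_px * cam_height_m / (y_bottom - y_horizon)
```

With a focal length of 400 px and a camera height of 1.2 m, moving the bottom row from 195 to 200 (horizon at row 120) shortens the estimated distance from 6.4 m to 6.0 m.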
Figure 5.1. Difference ratio for PDs normalized against the number of reference markings. Test run on the edge based algorithm on vehicles up to 100 m of all poses.

The top edge is the border line with the most uncertainty. This comes as no surprise, since this edge has been chosen as the highest positioned edge in a certain height interval above the bottom border line. In the case of a car, this edge will therefore more likely overestimate the vehicle height than underestimate it. Many of the references not detected well enough by this algorithm are trucks. A truck does not fit into the height interval used, and is therefore only detected if it contains an edge (somewhere in the middle of the rear-view) within the interval that can be taken as the top edge. The chosen edge will underestimate the vehicle height, and this is one reason why the histogram is so widely spread.

To give some perspective on the number of false detections, the edge based algorithm produced 36.5 detections per frame on average; however, only 12.1 of these were ODs, i.e. detections not overlapping a reference at all. The edge based algorithm typically generated multiple detections per vehicle, though only the best was classified as a PD. Many of the ODs arise from railings on the side of the road. These are mistakenly detected as vehicles, as they contain all four needed edges and also a fair amount of symmetry. Obviously, smaller objects, possibly just a few pixels in size, are harder to detect.

Another interesting graph is shown in Figure 5.2, where the PD rate has been plotted against the distance to the vehicles for the test using all poses. The PD rate is clearly dependent on the distance to the vehicle. This distance has been calculated with stereo view information during the reference marking process by staff at Autoliv.
Figure 5.2. PD rate at different distances [m]. Test run on the edge based algorithm on vehicles up to 100 m of all poses.

5.1.2 Shadow Based Detection Tests

The PD rate summary is shown in Table 5.4. As opposed to the edge based detection, the shadow based is more sensitive to the vehicle poses. The PD rate is also lower than for the edge based in all but one test case: front-rear detection up to 100 m.

Table 5.4. PD rate for the shadow based detection tests.

    Poses                                         Max distance [m]
                                                   30      50      100
    Front, Rear                                   86.0%   82.6%   75.3%
    Front, Rear, Front-left, Front-right,
    Rear-right, Rear-left                         79.1%   75.2%   71.0%

Another scenario where the algorithm suffers is in scenes with large variations in illumination, e.g. driving into or out of a tunnel. This, however, is not a drawback of the algorithm itself; instead it is a consequence of the exposure control used by the camera.
An interesting observation can be made from Figure 5.3, taken from the test with vehicles up to 100 m of all poses. The top border line is better aligned with this algorithm than with the edge based, even though the shadow based algorithm uses a constant aspect ratio between the height and width of the ROI. The bottom border line histogram, however, shows the same tendency as with the edge based algorithm, i.e. a slightly overestimated border line coordinate. In addition, the alignment of the ROIs is more unstable through time compared to the edge based algorithm. The reason is that the edge based algorithm searches for left and right edges to delimit the ROI, something the shadow based algorithm does not.

Figure 5.3. Difference ratio for PDs normalized against the number of reference markings. Test run on the shadow based algorithm on vehicles up to 100 m of all poses.

The shadow based algorithm is in general more sensitive to different problem conditions than the edge based. As mentioned before, low sun is a particularly difficult scenario for the shadow based algorithm. Also, edges on the road from road markings, different layers of asphalt, etc. will confuse the free driving space detection, misaligning the ROIs.

The shadow based algorithm is, however, more restrictive in its way of generating ROIs per vehicle. It had on average 10.2 detections per frame, while the number of ODs per frame was as low as 4.9. This is an advantage compared to the edge based algorithm. Even though the PD rate drops as the distance to the vehicles increases, Figure 5.4 does in fact show that the shadow based algorithm is better at keeping the detection rate at larger distances. In addition, fewer false detections mean less work for a classifier. In a hardware implementation, the total number of detections per frame will be an important figure to consider when designing the system, since all ROIs, vehicles as well as non-vehicles, will have to be stored and processed by the classifier.

Figure 5.4. PD rate at different distances [m]. Test run on the shadow based algorithm on vehicles up to 100 m of all poses.
5.1.3 Tests on Combining Edge Based and Shadow Based Detection

To investigate how well the algorithms complemented each other, the detections from the edge and shadow based algorithms were simply merged. The results in Table 5.5 show a significant increase in PD rate, especially at larger distances.

Table 5.5. PD rate for the tests on combining edge and shadow based detection.

    Poses                                         Max distance [m]
                                                   30      50      100
    Front, Rear                                   96.4%   93.3%   88.9%
    Front, Rear, Front-left, Front-right,
    Rear-right, Rear-left                         96.2%   94.1%   88.7%

Combining the detections from the two algorithms gave 46.7 detections per frame on average and an OD rate per frame of 17.0. Figure 5.5 illustrates the difference ratio between detections and reference markings for each border line of the positive detections. The detection rate is improved significantly at distances above 30 m compared to the edge based detection alone, as shown in Figure 5.6.

Figure 5.5. Difference ratio for PDs normalized against the number of reference markings. Test run on the edge and shadow based algorithms combined on vehicles up to 100 m of all poses.

Figure 5.6. PD rate at different distances [m]. Test run on the edge and shadow based algorithms combined on vehicles up to 100 m of all poses.
5.2 Tests on Motion Based Detection

The motion based algorithm was only tested as a complement algorithm on difficult test cases where the simpler and faster implemented algorithms would fail. Therefore, a single test was performed on overtaking vehicles not yet fully visible in the images. This was done by filtering out occluded vehicle reference markings with the evaluation tool at a distance of 0 to 30 m in rear-left, rear-right and side poses from a subset of 15 files (4202 frames and 883 vehicle reference markings) with overtaking vehicles from the original test data set. All border lines were, as opposed to the other tests, allowed to differ 50% from the reference marking position.

The resulting PD rate was 35.0%, with 8.2 detections per frame, out of which 2.9 were ODs. The reason was the difficulty of aligning the left and bottom borders when the vehicles were not fully visible. Figure 5.7 shows the alignment of the four borders of the ROIs. The top and right borders are well aligned, while the left and bottom border lines are almost exclusively underestimated. Obviously, this is a consequence of the vehicles not being fully visible in the image: their left or bottom (or both) reference border line has been set to the leftmost or lowermost coordinate in the image, and thus that border line cannot be overestimated.

Figure 5.7. Difference ratio for PDs normalized against the number of reference markings. Test run on the motion based algorithm on occluded vehicles up to 30 m.
Chapter 6

Conclusions

Three algorithms for vehicle detection have been implemented and tested. The work has been focused on detecting fully visible rear-view cars and SUVs during daytime. Still, the algorithms have been able to detect vehicles in other poses as well. To the largest possible extent, the algorithms have been designed to detect vehicles in various weather conditions and at any distance.

The edge based algorithm has been modified from the original article by Sun et al. [26] by, e.g., using local edge detection, and from Wenner's approach [27] by introducing the top edge, symmetry and variance as additional vehicle cues. The shadow based algorithm, as opposed to the article by Tzomakas et al. [18], uses a row-by-row local estimation of the road intensity mean and standard deviation and interpolation to rows where no road points were extracted. Also, variance is used to reject ROIs.

Out of the implemented algorithms, there is no doubt that the motion based is the low achiever and also the slowest implementation. The largest difference from the article [24] that inspired this algorithm is the acknowledgement of the problem of vehicle points moving along their epipolar lines. The algorithm was therefore modified to only detect overtaking vehicles. One alternative way to estimate the fundamental matrix would be to assume a forward translation and use the velocity from the vehicle's CAN bus along with the camera calibration matrices to calculate the estimate. Even if improvements are made concerning feature extraction, point matching and estimation of the fundamental matrix, the core problem still remains: points on vehicles move along their epipolar lines and are therefore impossible to reject as outliers.

The edge based and the shadow based algorithms should be seen as two maturer algorithms, ready to use for rear-view vehicle detection in daylight. The most interesting statistics were those from the combination of the two algorithms, with a PD rate of close to 90% even at distances up to as far as 100 m. The ultimate goal for a detection algorithm is ideally, but not very realistically, a PD rate of 100%, and none of the two algorithms is there. Vehicle detection is a complex problem with a large number of parameters like illumination, occlusion, poses, etc. The way to go is therefore probably to combine different detection cues to improve robustness. Yet, two algorithms have been implemented that work well in good conditions, and in the case of the edge based algorithm also in more difficult scenes. However, the results from the algorithms in this thesis indicate good performance up to approximately 60 m and useful performance up to 100 m.

When judging the performance of the algorithms, it is very important to keep in mind that these statistics are calculated on a frame by frame basis. The same vehicle not being detected during multiple frames is a larger problem than that of vehicles missing in separate frames, and no real evaluation has been made on which vehicles are not detected. It is clear that a tracking algorithm would be able to increase these figures notably by closing detection gaps of a few frames. These gaps are therefore not to be seen as a critical drawback of the detection algorithms. A classifier would get rid of a lot of false detections but also slightly decrease the PD rate; once a vehicle is detected, a tracker would then increase the PD rate by closing detection gaps. However, it is hard to predict the final outcome of a complete system for vehicle detection solely on the detection stage.

As said, trucks are not within the applied vehicle model and are therefore not prioritized for detection. Since the top edge is used to reject structures including a bottom and two side edges, it should still be a part of the algorithm. A relaxation of the height constraint would definitely mean a greater number of ODs and difficulties in aligning the top border line, though. The top edge was better aligned with the shadow based algorithm than with the edge based; in fact, a ROI could possibly be better aligned using a fixed aspect ratio. The results from the tests in this thesis indicate that a fixed aspect ratio works well when only detecting cars.

Future work should be focused on adjusting the algorithms to suit an even broader range of problem conditions. As it is now, the vanishing point is fixed for a certain camera calibration. An error will therefore be introduced when the road on which the vehicle is driving is not flat. This leads to errors when estimating the allowed width and height of a vehicle at a certain distance from the camera, and possibly to rejections of vehicles or approvals of non-vehicles. A dynamic, robust calculation of the vanishing point, e.g. using the Hough transform, could prevent this. Furthermore, an algorithm detecting the road boundaries would help to decrease the number of ODs. Figure 6.1 shows that there are a lot of ODs on the left and right sides of the image, a fact that indicates that a road boundary algorithm would help reduce the ODs significantly.

To be useful in an on-road system, the algorithms need to run in real time. The complexity of these three algorithms indicates that the edge based and shadow based algorithms stand a good chance of meeting this requirement; even a combination of the two might work. The motion based is slow, and achieving a frame rate of 30 Hz would most probably be difficult.
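A dynamic vanishing point estimate could, for instance, intersect the road-direction lines delivered by a Hough transform in a least-squares sense. The sketch below is one possible formulation of that idea, not the thesis' implementation; lines are given in normal form a·x + b·y + c = 0:

```python
def vanishing_point(lines):
    """Least-squares intersection of lines a*x + b*y + c = 0, e.g. lane
    markings found by a Hough transform. Solves the 2x2 normal equations
    of min sum (a*x + b*y + c)^2 over the point (x, y)."""
    saa = sab = sbb = sac = sbc = 0.0
    for a, b, c in lines:
        n = (a * a + b * b) ** 0.5  # normalize so each line counts equally
        a, b, c = a / n, b / n, c / n
        saa += a * a; sab += a * b; sbb += b * b
        sac += a * c; sbc += b * c
    det = saa * sbb - sab * sab
    x = (-sac * sbb + sab * sbc) / det
    y = (-saa * sbc + sab * sac) / det
    return x, y
```

With noise-free lines through a common point, the estimate recovers that point exactly; with noisy lines it returns the point minimizing the summed squared line distances.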
Figure 6.1. Statistics on ODs (Object Detections), taken from the test combining the edge and shadow based algorithms (up to 100 m, all poses).

Instead of being manually chosen, the parameters of the different algorithms, e.g. the edge thresholds, could be tuned automatically on a set of test data to further optimize performance. A separation of cars and trucks during the reference marking process would have made it possible to evaluate the algorithms on cars only, which would have been interesting since trucks were not prioritized. Also, a larger test data set would increase the credibility of the test results.
Bibliography

[1] Peden, M. et al. (2004): World report on road traffic injury prevention: summary.

[2] Jones, W.D. (2001): Keeping cars from crashing. IEEE Spectrum, Vol. 38, No. 9, pp. 40–45.

[3] Ballard, D.H. (1987): Generalizing the Hough transform to detect arbitrary shapes. In Readings in Computer Vision, pp. 714–725.

[4] Cantoni, V., Lombardi, L., Porta, M., Sicard, N. (2001): Vanishing Point Detection: Representation Analysis and New Approaches. Proceedings of the 11th International Conference on Image Analysis and Processing, pp. 90–94.

[5] Lutton, E., Maitre, H., Lopez-Krahe, J. (1994): Contribution to the Determination of Vanishing Points Using Hough Transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 4, pp. 430–438.

[6] Harris, C., Stephens, M. (1988): A Combined Corner and Edge Detector. Proceedings of the 4th Alvey Vision Conference, Manchester, pp. 147–151.

[7] Canny, J. (1986): A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 679–698.

[8] Birchfield, S. (1997): Derivation of Kanade-Lucas-Tomasi Tracking Equation. Unpublished.

[9] Jähne, B. (2005): Digital Image Processing. 6th revised and extended edition, Springer.

[10] Lucas, B.D., Kanade, T. (1981): An Iterative Image Registration Technique with an Application to Stereo Vision. Proceedings of the 7th International Joint Conference on Artificial Intelligence, pp. 674–679.

[11] Horn, B.K.P., Schunck, B.G. (1981): Determining Optical Flow. Artificial Intelligence, Vol. 17, pp. 185–203.

[12] Chojnacki, W., Brooks, M.J. et al. (2003): Revisiting Hartley's Normalized Eight-Point Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, pp. 1172–1177.

[13] Hartley, R., Zisserman, A. (2003): Multiple View Geometry in Computer Vision. Cambridge University Press.

[14] Sun, Z., Bebis, G., Miller, R. (2006): On-Road Vehicle Detection: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 5, pp. 694–711.

[15] Guo, D., Fraichard, T., Xie, M., Laugier, C. (2000): Color Modeling by Spherical Influence Field in Sensing Driving Environment. Proc. IEEE Intelligent Vehicles Symposium, pp. 249–254.

[16] Mori, H., Charkai, N.M. (1993): Shadow and Rhythm as Sign Patterns of Obstacle Detection. Proc. International Symposium on Industrial Electronics, pp. 271–277.

[17] Hoffmann, C., Dang, T., Stiller, C. (2004): Vehicle Detection Fusing 2D Visual Features. Proc. IEEE Intelligent Vehicles Symposium.

[18] Tzomakas, C., von Seelen, W. (1998): Vehicle Detection in Traffic Scenes Using Shadows. Technical Report 98-06, Institut für Neuroinformatik, Ruhr-Universität Bochum.

[19] Kim, S. et al. (2005): Front and Rear Vehicle Detection and Tracking in the Day and Night Times Using Vision and Sonar Sensor Fusion. Proc. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2173–2178.

[20] Kalinke, T., Tzomakas, C., von Seelen, W. (1998): A Texture-based Object Detection and an Adaptive Model-based Classification. Proc. IEEE International Conference on Intelligent Vehicles, pp. 143–148.

[21] Franke, U., Kutzbach, I. (1996): Fast Stereo Based Object Detection for Stop&Go Traffic. Proc. Intelligent Vehicles Symposium, pp. 339–344.

[22] Zhao, G., Yuta, S. (1993): Obstacle Detection by Vision System for an Autonomous Vehicle. Proc. Intelligent Vehicles Symposium, pp. 31–36.

[23] Giachetti, A., Campani, M., Torre, V. (1998): The Use of Optical Flow for Road Navigation. IEEE Transactions on Robotics and Automation, Vol. 14, No. 1, pp. 34–48.

[24] Yamaguchi, K., Kato, T., Ninomiya, Y. (2006): Vehicle Ego-Motion Estimation and Moving Object Detection Using a Monocular Camera. The 18th International Conference on Pattern Recognition (ICPR'06), Volume 4, pp. 610–613.

[25] Shi, J., Tomasi, C. (1994): Good Features to Track. Proceedings CVPR '94, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600.

[26] Sun, Z., Bebis, G., Miller, R. (2006): Monocular Precrash Vehicle Detection: Features and Classifiers. IEEE Transactions on Image Processing, Vol. 15, No. 7, pp. 2019–2034.

[27] Wenner, P. (2007): Nightvision II Vehicle Classification. Department of Informatics, Umeå University.

[28] Kalinke, T., Tzomakas, C. (1997): Objekthypothesen in Verkehrsszenen unter Nutzung der Kamerageometrie. Technical Report IRINI 97-07, Institut für Neuroinformatik, Ruhr-Universität Bochum.

[29] Näringsdepartementet (1998): Trafikförordning (1998:1276).

[30] Pentenrieder, K. (2005): Tracking Scissors Using Retro-Reflective Lines. Technical University Munich.

[31] Jung, C.R., Schramm, R. (2004): Rectangle Detection Based on a Windowed Hough Transform. Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing.

[32] Hanson et al. (2007): Perspective Drawings. Wood Digest, September 2007 issue.

[33] Danielsson, P.-E. (2007): Bildanalys, TSBB07: Kompendium. Department of Electrical Engineering, Linköping University.