
Year 2012

Relations internationales – ENSMM


École Nationale Supérieure de Mécanique et des Microtechniques
26, chemin de l’Épitaphe – 25030 BESANÇON – France - Tél. 03 81 40 27 31 - Fax 03 81 80 98 70 - www.ens2m.fr


EU4M: European Master in Mechatronics
and Micro-Mechatronics Systems



Master's internship report (Mémoire de stage master)



Field: Sciences – Technologies – Health

Mention: Engineering Sciences (Sciences Pour l'Ingénieur)

Speciality: Mechanics – Mechatronics – Microsystems and Engineering




Obstacle avoidance strategies for
visual servoing of a mobile robot
Daniel CASAGRANDE
Laboratoire d'Analyse et d'Architecture des Systèmes (C.N.R.S.)
7 Avenue du Colonel Roche - 31077 TOULOUSE - Cedex 4

Written and edited with:
LaTeX / MiKTeX 2.9, TeXnicCenter 1 Stable Release Candidate 1.
Font: Computer Modern, 12 pt
Print date: August 20, 2012
Contents

Contents
List of Figures
List of Tables

1 Introduction
  1.1 Presentation of LAAS
  1.2 Project background
  1.3 Project organization

2 Project theoretical bases
  2.1 Robot Description
  2.2 Robot model
  2.3 Camera model
  2.4 Interaction Matrix
  2.5 Visual servoing control law

3 Extraction of the image features
  3.1 Artificial landmarks
  3.2 Natural landmarks
    3.2.1 SURF Implementation
  3.3 Visual servoing test results

4 Vision based navigation through a cluttered environment
  4.1 Control law during the obstacle avoidance phase
    4.1.1 Mobile base control
    4.1.2 Control law for the camera
    4.1.3 Global control law
  4.2 Implementation of the controller
    4.2.1 Detecting and defining an obstacle
    4.2.2 Estimation of the obstacle position
  4.3 Obstacle avoidance test results

5 Conclusions

Appendices
A GenoM: Generator of Modules
  A.1 General overview
  A.2 Module Description
  A.3 Rackham's Software Architecture

Bibliography
Index
Abstract

List of Figures

1.1 Simplified software architecture of Rackham
1.2 Gantt chart of the project
2.1 Rackham
2.2 Rackham's model
2.3 Camera pinhole model
3.1 Artificial target
3.2 Flux diagram for the artificial landmarks algorithm
3.3 Images from the final and initial position of the robot
3.4 Initialization request flux chart
3.5 Current points stored in the iSort vector
3.6 Launch SURF request flux chart
3.7 Initial position
3.8 Final position
3.9 Linear speed of the base
3.10 Angular speed of the base
3.11 Angular speed of the PTU
3.12 Normalized error
4.1 Path following
4.2 Rotating potential field
4.3 Laser field of view (a) and lateral view of Rackham (b)
4.4 Obstacle's points determination
4.5 Additional measures to define an obstacle
4.6 Sudden loss of the obstacle
4.7 Initial position
4.8 Final position
4.9 Linear speed of the base
4.10 Angular speed of the base
4.11 Angular speed of the PTU
4.12 Normalized error
A.1 Rackham's software architecture

List of Tables

3.1 New distances for $P_0$
1 Introduction
This report is the result of my master's final project, carried out to obtain the degree of Erasmus Mundus European Master in Mechatronics and Micro-Mechatronics Systems. The work was performed at the Laboratoire d'Analyse et d'Architecture des Systèmes (hereafter LAAS) of the Centre National de la Recherche Scientifique (C.N.R.S.).
1.1 Presentation of LAAS
LAAS is a CNRS research unit located in Toulouse. With currently more than 600 employees, it is one of the largest laboratories in France and one of the most recognized European research organizations in the field of robotics. Furthermore, it is a Carnot institute, a label awarded by the French Ministry of Research that recognizes the excellence of its partnerships with industry.
LAAS conducts research activities on different types of systems: micro and nano systems, embedded systems, integrated systems, large scale systems, biological systems, mobile systems, autonomous systems and critical information systems. Various application domains are considered, such as aeronautics, space, transportation, energy, services, health, telecommunications, environment, production and defense. Eight research topics cover the activities of the 21 research teams of the laboratory:
• Critical Information Processing
• Networks and Communications
• Robotics
• Decision and Optimization
• MicroNanosystems RF and Optical
• Nano-Engineering and Integration
• Energy Management
• MicroNanoBio Technologies
The present work finds its place in the research group called Robotics, Action and Perception (RAP). The scientific themes studied in this group continue work carried out at LAAS over the past thirty years on the functional and decisional autonomy of robotic systems executing tasks in dynamic or changing environments. Within the Robotics department of the lab, the RAP team mainly focuses its work on the functional layer integrated into robotic systems: either indoor or outdoor robots, generally equipped with several sensors (vision, audio, laser scanners, RFID readers), or humanoid robots fitted only with visual and audio sensors. The RAP group is concerned with:
• the integration of advanced functions on a physical robot executing tasks in a passive environment,
• the interaction between humans, robots and the environment, taking advantage of Ambient Intelligence,
• applications of robotics technologies in other domains.
1.2 Project background
This work addresses the problem of vision based navigation in a cluttered environment. The RAP team proposes different solutions to this problem based on visual servoing. Visual servoing is a technique that allows controlling a robot using the information delivered by one or more cameras, which may or may not be embedded on the robot. The goal is to minimize the error between a set of image measurements and a set of reference values that characterizes the desired final position of the robot. When the error falls below a preselected threshold, the visual servoing controller has successfully completed its task.
When it comes to the implementation of such a controller, there are different methods
to achieve it. The most general classification [see Chaumette and Hutchinson, 2006] is:
3D Visual Servoing (also known as Position Based or PBVS), 2D Visual Servoing (also
known as Image Based or IBVS), and the hybrid approaches such as, for example, the
2D 1/2 approach [see Malis et al., 1999].
The 3D Visual Servoing uses the visual measurements and reconstructs the 3D pose
of the robot, then calculates a kinematic error in the Cartesian space and forwards the
control values to the actuators. This estimated 3D pose is highly sensitive to modelling
and measurement errors, which reduces the accuracy of the control scheme.
On the other hand, in 2D Visual Servoing the error is computed directly from the visual measurements. The main advantage of this strategy is that it is less sensitive to camera noise, camera calibration errors and uncertainties in the robot's model, which results in a more robust control. Another great advantage is that it works directly with the visual measurements instead of the 3D pose estimated in PBVS; this means that it uses the smallest amount of data able to perform the task. This reduces the processing time, which is often a critical parameter given the amount of information obtained at each image acquisition. The disadvantage of this method is that it struggles when long distances or large rotations about one axis have to be performed, and the control may then fail.
2D Visual Servoing was chosen for this project because of its robustness with respect to camera noise and modelling uncertainties; from now on, the term visual servoing refers to this strategy. Natural or artificial landmarks can be used to perform visual servoing. In this work we attempt to perform natural landmark based navigation.
Pure visual servoing strategies are not sufficient when the goal is to perform an indoor navigation task in a real environment where moving or static obstacles may be found. In this case, two risks must be kept in mind: occlusions and collisions. In other words, the robot has to be able to deal with landmark occlusions and to navigate among obstacles. An occlusion is defined as the loss of the scene's visual features. Many situations can cause an occlusion, and many solutions to deal with them have been proposed [see Folio, 2007; Durand Petiteville, 2012]. Nevertheless, this project focuses on the implementation of obstacle avoidance strategies, in particular on two different and already validated strategies: path following and artificial potential field [see Folio, 2007].
So, two goals are defined for the project:
• exploring the feasibility of natural landmark based navigation,
• implementing the chosen obstacle avoidance routines.
Figure 1.1: Simplified software architecture of Rackham.
The developed methods will be implemented on Rackham, one of the robots available at LAAS. This robot runs on a modular architecture, which relies on a module generator developed at LAAS called GenoM. It is a very useful tool for complex real-time systems, where each generated module performs a specific task (acquiring the images, controlling the robot, acquiring the laser data, etc.) and communicates with the other modules in order to accomplish a global, more complex task.
Figure 1.1 presents the simplified software architecture of Rackham, where surfvs and visuserv are the most relevant modules for the project. The first one was entirely developed during this internship, and the second one was largely updated. For the sake of clarity and consistency, GenoM is described more precisely in appendix A.
1.3 Project organization
The timespan for the project is five months, from March 1st, 2012 to August 3rd, 2012. Figure 1.2 presents the Gantt chart of the project. The main tasks to be performed during this internship are:
• recalculate Rackham's model, which has changed due to modifications of the robot structure,
• develop a new GenoM module to extract the visual features from the images,
• implement the visual servoing controller,
• confirm whether the openSURF implementation is suitable for our application,
• implement the obstacle avoidance controller,
• write the final report of the project.
This report is organized as follows: the first chapter provides all the relevant information about the models used and the applied control law. The second part describes the extraction of the visual features from the images, and the third one describes the obstacle avoidance controller. The appendix attempts to shed some light on the software platform on which all the algorithms are executed.
Figure 1.2: Gantt chart of the project.
2 Project theoretical bases
This chapter describes the theoretical bases required for our project. First of all, the model of Rackham, the robot used here, will be detailed. Second, the model of the camera will be presented. Finally, the last two sections of this chapter will introduce the elements needed to synthesize a control law to perform vision based navigation with Rackham in a free environment.
2.1 Robot Description
Rackham is a modified B21R platform manufactured by iRobot, one of the world's leading companies in the field of robotics. Rackham, shown in figure 2.1, is a nonholonomic mobile robot with a cylindrical shape and the following dimensions: 53 cm in diameter, 180 cm in height and a weight of 140 kg. It has four driving wheels powered by four high-torque 24 V DC servo motors. The speed limit is 1 m/s for translations and 160°/s for rotations.
Figure 2.1: Rackham
The most important elements Rackham is equipped with are listed below:
• 3 Intel Pentium III processors under Linux,
• IEEE 802.11b wireless LAN (11 Mb/s),
• 1 SICK LMS-200 2D laser range finder,
• 8 RFID antennas,
• 24 infrared proximity sensors,
• 48 sonar range sensors,
• a pan-tilt unit (PTU),
• 1 MicroPix IEEE-1394 progressive scan CCD camera, up to 30 frames/s and 1024×768 pixels, delivering 24-bit RGB color images,
• 1 Ladybug2 camera.
The most relevant items for this project are the SICK laser, used for obstacle detection, and the MicroPix camera mounted on the PTU. The PTU is used to add another degree of freedom (the one corresponding to the tilt is not used in this project) to move the camera independently of the robot's base. The aim is to control the movement of the camera instead of the movement of the robot itself. This approach allows controlling Rackham as a holonomic robot, eliminating the constraints due to nonholonomy.
2.2 Robot model
Rackham’s accurate motion control relies on having a proper model, therefore modelling
the robot is a crucial part of the project. The available model of the robot was not valid
any more due to modifications in the robot’s structure. So, during the internship, we
calculated a new model; which is presented in the figure 2.2.
Figure 2.2: Rackham’s model.
Four reference frames are defined:
• $R_O(O, x_O, y_O, z_O)$: the scene's reference frame.
• $R_M(M, x_M, y_M, z_M)$: the robot's base reference frame.
• $R_P(P, x_P, y_P, z_P)$: the plate's reference frame.
• $R_C(C, x_C, y_C, z_C)$: the camera's reference frame.
The position of the robot’s base relative to the frame R
O
is represented by X
M
and Y
M
;
and its rotation, also relative to R
0
, is defined by θ. The position of the PTU relative
to R
M
is defined by D
x
and its rotation by ϑ. The position of the camera relative to
R
P
is only defined by C
x
and C
y
because the camera orientation is fixed to the PTU
orientation. The distances D
x
, C
x
and C
y
are fixed and defined by Rackham’s geometry,
while X
M
, Y
M
, θ and ϑ change with the movement of the robot.
From this we can introduce the following elements:
• the vector $q(t)$:
\[
q(t) = [\,l(t) \ \ \theta(t) \ \ \vartheta(t)\,]^T \qquad (2.1)
\]
where $l(t)$ represents the curvilinear abscissa of the point $M$ in $R_O$ (see figure 2.2).
• The time derivative of $q(t)$, which will be the speed control vector of the robot $\dot{q}(t)$:
\[
\dot{q}(t) = \frac{dq(t)}{dt} = [\,\upsilon(t) \ \ \omega(t) \ \ \varpi(t)\,]^T \qquad (2.2)
\]
where $\upsilon(t)$ and $\omega(t)$ are the linear and angular speeds of the robot's base and $\varpi(t)$ is the plate's pan speed.
• The three following transformation matrices:
\[
H_{R_O/R_M} =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 & X_M \\
\sin\theta & \cos\theta & 0 & Y_M \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
H_{R_P/R_C} =
\begin{pmatrix}
0 & 0 & 1 & C_x \\
0 & 1 & 0 & C_y \\
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
\[
H_{R_M/R_P} =
\begin{pmatrix}
\cos\vartheta & -\sin\vartheta & 0 & D_x \\
-\sin\vartheta & -\cos\vartheta & 0 & 0 \\
0 & 0 & -1 & h_r \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad (2.3)
\]
where $h_r$ represents the difference in height between the reference frame of the plate $R_P$ and that of the base $R_M$.
Now we focus on the determination of the camera kinematic screw with respect to the scene frame, denoted by $T_{C/R_O}$. This screw is required to control the robot using visual servoing. The approach to obtain it is the same as in [Cadenat, 1999]. The classical mechanics relation (2.4) expresses the kinematic screw of the camera as follows:
\[
T_{C/R_O} =
\begin{pmatrix}
\vec{V}_{C/R_O} \\
\vec{\Omega}_{R_C/R_O}
\end{pmatrix}
=
\begin{pmatrix}
\vec{V}_{C/R_M} + \vec{V}_{M/R_O} + \vec{\Omega}_{R_M/R_O} \times \overrightarrow{MC} \\
\vec{\Omega}_{R_C/R_M} + \vec{\Omega}_{R_M/R_O}
\end{pmatrix}
\qquad (2.4)
\]
Before starting the calculations, we explain the notation used: $T^{R'}_{X/R}$ denotes the kinematic screw of $X$ with respect to the frame $R$, expressed in the frame $R'$.
The computation of $T_{C/R_O}$ involves the determination of two distinct screws, $T_{M/R_O}$ and $T_{C/R_M}$, whose expressions are recalled below:
\[
T_{M/R_O} =
\begin{pmatrix}
\vec{V}_{M/R_O} \\
\vec{\Omega}_{R_M/R_O}
\end{pmatrix}
\qquad
T_{C/R_M} =
\begin{pmatrix}
\vec{V}_{C/R_M} \\
\vec{\Omega}_{R_C/R_M}
\end{pmatrix}
\qquad (2.5)
\]
The step by step calculation of $T_{M/R_O}$, $T_{C/R_M}$ and the vector $\overrightarrow{MC}$ is needed to obtain $T^{R_M}_{C/R_O}$. Once these steps are completed, the transformation matrix from the frame $R_M$ to $R_C$ is used to express the resulting screw in the camera frame, obtaining $T^{R_C}_{C/R_O}$.
• First, $T^{R_M}_{M/R_O}$: the values can be obtained directly from figure 2.2.
\[
T^{R_M}_{M/R_O} =
\begin{pmatrix}
\vec{V}^{R_M}_{M/R_O} \\
\vec{\Omega}^{R_M}_{R_M/R_O}
\end{pmatrix}
=
\begin{pmatrix}
\upsilon \\ 0 \\ 0 \\ 0 \\ 0 \\ \omega
\end{pmatrix}
\qquad (2.6)
\]
• Then, $T^{R_M}_{C/R_M}$:
\[
T_{C/R_M} =
\begin{pmatrix}
\vec{V}_{C/R_M} \\
\vec{\Omega}_{R_C/R_M}
\end{pmatrix}
=
\begin{pmatrix}
\vec{V}_{C/R_P} + \vec{V}_{P/R_M} + \vec{\Omega}_{R_P/R_M} \times \overrightarrow{PC} \\
\vec{\Omega}_{R_C/R_P} + \vec{\Omega}_{R_P/R_M}
\end{pmatrix}
\qquad (2.7)
\]
As the point $C$ is bound to the frame $R_P$ and $P$ to the frame $R_M$, the velocities $\vec{V}_{C/R_P}$ and $\vec{V}_{P/R_M}$ are zero; $\vec{\Omega}_{R_C/R_P}$ is also zero because $R_C$ is fixed to $R_P$:
\[
T_{C/R_M} =
\begin{pmatrix}
\vec{\Omega}_{R_P/R_M} \times \overrightarrow{PC} \\
\vec{\Omega}_{R_P/R_M}
\end{pmatrix}
\qquad (2.8)
\]
The vector $\overrightarrow{PC} = [C_x \ C_y \ C_z]^T$ is known in the frame $R_P$ and can be expressed in $R_M$ using the transformation matrix defined in (2.3). On the other hand, $\vec{\Omega}^{R_M}_{R_C/R_M}$ is nothing else than $[0 \ 0 \ \varpi]^T$. So, after performing the cross product, we obtain $T^{R_M}_{C/R_M}$:
\[
T^{R_M}_{C/R_M} =
\begin{pmatrix}
\vec{V}^{R_M}_{C/R_M} \\
\vec{\Omega}^{R_M}_{R_C/R_M}
\end{pmatrix}
=
\begin{pmatrix}
-C_x\sin\vartheta - C_y\cos\vartheta \\
C_x\cos\vartheta - C_y\sin\vartheta \\
0 \\ 0 \\ 0 \\ 1
\end{pmatrix}
\varpi
\qquad (2.9)
\]
• Finally, the vector $\overrightarrow{MC}^{R_M}$: this vector is obtained as the vector sum of $\overrightarrow{MP}$ and $\overrightarrow{PC}$, both expressed in the frame $R_M$:
\[
\overrightarrow{MC}^{R_M} =
\begin{pmatrix}
D_x + C_x\cos\vartheta - C_y\sin\vartheta \\
C_x\sin\vartheta + C_y\cos\vartheta \\
h + C_z
\end{pmatrix}
\qquad (2.10)
\]
where $h$ represents the height difference between the frame of the PTU and the robot's base frame.
Finally, having all the required values, the kinematic screw of the camera relative to the frame $R_O$ is rearranged so as to be expressed as a matrix multiplied by the control vector $\dot{q}$:
\[
T^{R_M}_{C/R_O} =
\begin{pmatrix}
1 & -(C_x\sin\vartheta + C_y\cos\vartheta) & -(C_x\sin\vartheta + C_y\cos\vartheta) \\
0 & D_x + C_x\cos\vartheta - C_y\sin\vartheta & C_x\cos\vartheta - C_y\sin\vartheta \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 1 & 1
\end{pmatrix}
\dot{q}
\qquad (2.11)
\]
Projecting this screw into the camera frame using the transformation matrix from $R_M$ to $R_C$, the final result is:
\[
T^{R_C}_{C/R_O} =
\begin{pmatrix}
V_{x_C} \\ V_{y_C} \\ V_{z_C} \\ \Omega_{x_C} \\ \Omega_{y_C} \\ \Omega_{z_C}
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 \\
-\sin\vartheta & C_x + D_x\cos\vartheta & C_x \\
\cos\vartheta & -C_y + D_x\sin\vartheta & -C_y \\
0 & -1 & -1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}
\dot{q}
\qquad (2.12)
\]
where the rows containing nothing but zeros correspond to the degrees of freedom (from now on DOF) that are not controllable. Thus, the kinematic screw can be reduced to the three controllable DOFs of the camera:
\[
T^{R_C}_{C/R_O} =
\begin{pmatrix}
V_{y_C} \\ V_{z_C} \\ \Omega_{x_C}
\end{pmatrix}
=
\begin{pmatrix}
-\sin\vartheta & C_x + D_x\cos\vartheta & C_x \\
\cos\vartheta & -C_y + D_x\sin\vartheta & -C_y \\
0 & -1 & -1
\end{pmatrix}
\dot{q} = J_{robot}\,\dot{q}
\qquad (2.13)
\]
The matrix relating the control vector to the kinematic screw is called the Jacobian matrix of the robot:
\[
J_{robot} =
\begin{pmatrix}
-\sin\vartheta & C_x + D_x\cos\vartheta & C_x \\
\cos\vartheta & -C_y + D_x\sin\vartheta & -C_y \\
0 & -1 & -1
\end{pmatrix}
\qquad (2.14)
\]
The determinant of this matrix is $-D_x$, which means that the Jacobian is regular and therefore invertible (since $D_x \neq 0$).
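To make the structure of this Jacobian concrete, the short sketch below builds $J_{robot}$ for a given pan angle and checks that its determinant is non-zero. The numerical values of $D_x$, $C_x$ and $C_y$ are placeholders, not Rackham's actual calibration, and the sign conventions follow the equations above.

```python
import numpy as np

# Placeholder geometric parameters (metres); Rackham's real values come from its calibration.
D_x, C_x, C_y = 0.10, 0.05, 0.02

def robot_jacobian(theta_p):
    """Jacobian J_robot of eq. (2.14) for a pan angle theta_p (rad),
    mapping [v, omega, pan_rate] to [V_yC, V_zC, Omega_xC]."""
    s, c = np.sin(theta_p), np.cos(theta_p)
    return np.array([
        [-s,  C_x + D_x * c,  C_x],
        [ c, -C_y + D_x * s, -C_y],
        [0.0, -1.0, -1.0],
    ])

J = robot_jacobian(np.deg2rad(-10.0))   # e.g. the -10 deg initial pan used in the tests
print(np.linalg.det(J))                 # proportional to D_x, hence non-zero: J is invertible
```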
2.3 Camera model
After modelling the robot, the next step is to model the camera. The main properties of the MicroPix IEEE-1394 camera are listed below:
• optic centre: $U_0 = 159.05$ pixels and $V_0 = 122.71$ pixels,
• delivers up to 15 fps in JPEG format at a resolution of up to 1024 × 768 pixels,
• square pixels of size $pix = 4.65\ \mu m$,
• 10-bit mono or 24-bit RGB colour images.
Figure 2.3: Camera pinhole model.
The camera is configured to acquire RGB full-resolution images, but they are later rescaled to a resolution of 320 × 240 pixels to be compatible with the pre-existing GenoM module visuserv (for a complete explanation of the GenoM platform see appendix A).
The pinhole camera model is adopted for this project. This means that the coordinates of the 3D points of the scene are mapped onto the 2D image plane of an ideal pinhole camera, as shown in figure 2.3. In this model it is assumed that all the light enters through a single point; in other words, the camera aperture is considered to be a single point, called the optic centre.
The projection of a 3D point $p^{R_C} = [x\ y\ z]^T$ onto a 2D point $P_{im} = [X\ Y]^T$ in metric coordinates in the image plane is given by the following equation:
\[
P = \begin{pmatrix} X \\ Y \end{pmatrix} =
\begin{pmatrix} f/z & 0 & 0 \\ 0 & f/z & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\qquad (2.15)
\]
where $f$ is the focal length of the camera. The point can also be expressed in pixels using the metric coordinates and the parameters of the camera:
\[
p = \begin{pmatrix} U \\ V \end{pmatrix} =
\begin{pmatrix} \alpha_U & 0 & U_0 \\ 0 & -\alpha_V & V_0 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}
\qquad (2.16)
\]
where $\alpha_U = \alpha_V = \frac{1}{pix}$ and $pix$ is the size of a pixel.
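As an illustration of equations (2.15) and (2.16), the sketch below projects a 3D point expressed in the camera frame onto the image plane and converts the result to pixels. The focal length value is a placeholder; the optic centre and pixel size are the ones quoted above and would need to be rescaled for 320×240 images.

```python
import numpy as np

f   = 0.004               # focal length in metres (placeholder, not the calibrated value)
pix = 4.65e-6             # pixel size (m)
U0, V0 = 159.05, 122.71   # optic centre (pixels)
alpha_U = alpha_V = 1.0 / pix

def project(p_cam):
    """Pinhole projection of a 3D point p_cam = (x, y, z) in the camera frame (eqs. 2.15-2.16)."""
    x, y, z = p_cam
    X, Y = f * x / z, f * y / z           # metric image coordinates
    U = alpha_U * X + U0                  # pixel coordinates
    V = -alpha_V * Y + V0
    return (X, Y), (U, V)

print(project((0.1, 0.05, 2.0)))
```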
2.4 Interaction Matrix
This matrix is a key element when dealing with visual servoing problems. Indeed, it relates the motion of the visual features in the image to the camera motion described by the kinematic screw $T_{C/R_O}$. Thus, the expression of this matrix mainly depends on the type of visual features considered. In this project, whether artificial or natural targets are used, the visual primitives extracted from the image are points. Therefore, we calculate the interaction matrix for this particular primitive.
For the sake of clarity we consider only one point, before extending the result to the entire set of visual primitives. It was already stated that the 3D point $p = [x\ y\ z]^T$ of figure 2.3 can be projected onto the image plane using equation (2.15). The time derivative of this equation expresses the time variation of the 2D point $P_{im} = [X\ Y]^T$ as a function of the velocity $V^{R_C}_{p/R_C}$ of the point $p$:
\[
\begin{pmatrix} \dot{X} \\ \dot{Y} \end{pmatrix} =
\begin{pmatrix}
\frac{f}{z} & 0 & \frac{-xf}{z^2} \\
0 & \frac{f}{z} & \frac{-yf}{z^2}
\end{pmatrix}
V^{R_C}_{p/R_C}
= L_1(X, Y, z)\; V^{R_C}_{p/R_C}
\qquad (2.17)
\]
In this project only fixed targets are considered; that means the point $p$ is fixed and the variation of the point $P_{im}$ is only due to the movement of the camera. Knowing that the camera moves with a linear speed $\vec{V}_{C/R_O}$ and an angular speed $\vec{\Omega}_{R_C/R_O}$, the velocity of the point $p$ can be expressed as:
\[
\vec{V}_{p/R_C} = -\vec{V}_{C/R_O} - \vec{\Omega}_{R_C/R_O} \times \overrightarrow{CP}
\qquad (2.18)
\]
Replacing this velocity in equation (2.17), the result is:
\[
\dot{s} = \begin{pmatrix} \dot{X} \\ \dot{Y} \end{pmatrix}
= L_1(X, Y, z)\begin{pmatrix} I_3 & -\overline{CP}^{R_C} \end{pmatrix} T^{R_C}_{C/R_O}
= L_{XY}(X, Y, z)\; T^{R_C}_{C/R_O}
\qquad (2.19)
\]
\[
L_{XY}(X, Y, z) = L_1(X, Y, z)\begin{pmatrix} I_3 & -\overline{CP}^{R_C} \end{pmatrix}
\qquad (2.20)
\]
where $\overline{CP}^{R_C}$ is the skew-symmetric matrix associated with the vector $\overrightarrow{CP}^{R_C}$. If the vector $s$ denotes the visual information, then the changes in the visual information are represented by $\dot{s}$. Following the above definition, the matrix $L_{XY}$ is the interaction matrix. After some computations, we obtain the following interaction matrix for one point:
\[
L_{XY}(X, Y, z) =
\begin{pmatrix}
-\frac{f}{z} & 0 & \frac{X}{z} & \frac{XY}{f} & -\left(f + \frac{X^2}{f}\right) & Y \\[1mm]
0 & -\frac{f}{z} & \frac{Y}{z} & f + \frac{Y^2}{f} & -\frac{XY}{f} & -X
\end{pmatrix}
\qquad (2.21)
\]
where $z$ is the depth. This interaction matrix is suitable for a six-DOF camera. In order to be compatible with the reduced kinematic screw defined in equation (2.13), only the three controllable DOFs are taken into account. Considering that our target is described by four points, the reduced interaction matrix for the four points $P_{im}(X_i, Y_i)$, with $i = 0, 1, 2, 3$, is presented in equation (2.22).
\[
L_r =
\begin{pmatrix}
0 & \frac{X_0}{z_0} & \frac{X_0 Y_0}{f} \\[1mm]
-\frac{f}{z_0} & \frac{Y_0}{z_0} & f + \frac{Y_0^2}{f} \\[1mm]
0 & \frac{X_1}{z_1} & \frac{X_1 Y_1}{f} \\[1mm]
-\frac{f}{z_1} & \frac{Y_1}{z_1} & f + \frac{Y_1^2}{f} \\[1mm]
0 & \frac{X_2}{z_2} & \frac{X_2 Y_2}{f} \\[1mm]
-\frac{f}{z_2} & \frac{Y_2}{z_2} & f + \frac{Y_2^2}{f} \\[1mm]
0 & \frac{X_3}{z_3} & \frac{X_3 Y_3}{f} \\[1mm]
-\frac{f}{z_3} & \frac{Y_3}{z_3} & f + \frac{Y_3^2}{f}
\end{pmatrix}
\qquad (2.22)
\]
2.5 Visual servoing control law
It now remains to design the vision based control law. To do so, we use the task function approach, which is summarized below.
Task function formalism
The task function formalism [Samson et al., 1991] was introduced to set a common control framework for manipulator robots, and was later extended to mobile robots. In this approach, the robotic application is described in terms of a particular output function called the task function $e(q(t))$. $e$ is generally an n-dimensional vector, where n is the number of DOFs of the robot. The application is then accomplished by regulating the task function to zero within a certain time. As the control law is no longer expressed in the state space but in the output space, the notion of admissibility was introduced to ensure that the problem is well posed. A sufficient condition for a task function to be admissible is that its Jacobian matrix $J_{vs} = \frac{\partial e_{vs}}{\partial q}$ is invertible [Samson et al., 1991].
Control law
In our case, the goal is to position the camera with respect to a given fixed landmark. To properly complete this task, four points are needed [Horaud, 1987]. The measurement vector is then defined by equation (2.23):
\[
s(q(t)) = [X_1\ Y_1\ X_2\ Y_2\ X_3\ Y_3\ X_4\ Y_4]^T
\qquad (2.23)
\]
It can be shown that a task function fulfilling the requested mission can be defined as follows [Chaumette, 1990]:
\[
e_{vs}(q(t)) = C\,(s(q(t)) - s^*)
\qquad (2.24)
\]
where $s^*$ is the measurement vector corresponding to the final position the robot must reach. The task function has the same dimension as the number of controllable DOFs, in this case three, whereas the measurement vector is of dimension eight. So a matrix is needed to make these two quantities compatible. That matrix, denoted here by $C$, is called the combination matrix [Chaumette, 1990].
There are several options when it comes to defining the combination matrix. The same holds for the estimation of the interaction matrix, since in reality it is impossible to know its exact value. In this project a classical approach is followed, as in [Durand Petiteville, 2012]. The estimated interaction matrix $\hat{L}_r$ is defined as the interaction matrix corresponding to the final position of the camera, and the combination matrix is defined as the constant matrix $C = \hat{L}_r^{+}$, where $\hat{L}_r^{+} = (\hat{L}_r^T \hat{L}_r)^{-1}\hat{L}_r^T$ is the left pseudoinverse of the estimated interaction matrix $\hat{L}_r$.
With the chosen combination and interaction matrices, the task function Jacobian is defined as $J_{vs} = J_{robot}$ [Durand Petiteville, 2012]. The Jacobian matrix of the robot $J_{robot}$, defined in equation (2.14), has a non-zero determinant, which means that the task function is admissible.
Having a proper task function, the next step is to make it vanish. The usual way is to impose an exponential decay of the task function:
\[
\dot{e}_{vs} = -\lambda_{vs}\, e_{vs}
\qquad (2.25)
\]
where the decay speed is defined by the positive scalar or matrix $\lambda_{vs}$. Differentiating $s(q(t))$ with respect to time through $q(t)$ and using equation (2.25), it is possible to obtain the expression (2.26) of the control vector. For the complete procedure see [Durand Petiteville, 2012].
\[
\dot{q} = -(C L_r J_{robot})^{-1}\,\lambda_{vs}\, C\,(s - s^*)
\qquad (2.26)
\]
This control law establishes a rigid virtual link [Chaumette, 1990] between the camera and the environment and regulates the task function (2.24) to zero. To distinguish this control vector from the following ones, from now on it will be called $\dot{q}_{vs}$.
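Gathering the previous elements, one control iteration can be sketched as below: the combination matrix is the left pseudoinverse of the interaction matrix estimated at the goal, and the control vector follows equation (2.26). Gains, depths and image measurements are placeholders, and the functions L_reduced and robot_jacobian refer to the sketches given earlier.

```python
import numpy as np

def visual_servoing_control(s, s_star, L_hat_r, J_robot, lambda_vs):
    """One iteration of the 2D visual servoing law of eq. (2.26).
    s, s_star : 8-vectors [X1 Y1 ... X4 Y4] of current and desired features,
    L_hat_r   : 8x3 interaction matrix estimated at the goal configuration,
    J_robot   : 3x3 robot Jacobian (eq. 2.14)."""
    C = np.linalg.pinv(L_hat_r)                       # left pseudoinverse (L^T L)^-1 L^T
    e_vs = C @ (s - s_star)                           # task function, eq. (2.24)
    q_dot = -np.linalg.inv(C @ L_hat_r @ J_robot) @ (lambda_vs * e_vs)
    return q_dot                                      # [v, omega, pan_rate]
```

In practice $\hat{L}_r$ would be computed once from the reference image (with the goal depths) and kept constant, while $s$ is refreshed at each image acquisition.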
3 Extraction of the image features
In order to perform navigation in a natural environment, the latter has to be characterized through visual features independently of the camera, the zoom, the resolution, the angle of view, etc. It was stated in chapter 2 that the visual primitives for this work are points. These points are extracted by a SURF [Lowe, 1999; Evans, 2009] algorithm when working with natural landmarks, and by a color based algorithm developed at the University Paul Sabatier when working with artificial landmarks. The main advantage of using natural landmarks is that they do not require modifications of the environment and that, in general terms, they are not constrained by size, shape or color.
First, the algorithm used to work with artificial landmarks is presented, and then the strategy used to implement the SURF detector to perform a navigation task using natural landmarks. The last section presents the test results.
3.1 Artificial landmarks
In this work, the considered artificial landmark is a set of four red circles, shown in figure 3.1.

Figure 3.1: Artificial target.

An algorithm able to find the center of each circle was already available. From the image provided by Rackham (see figure 3.1), this algorithm filters the image, creating a new binary image where the pixels with a certain value of the red channel are isolated. Then, the isolated pixels that are next to each other are grouped into clusters. Finally, the algorithm returns the centre of the clusters that lie within a certain dimension threshold. Figure 3.2 presents the simplified flux diagram of the algorithm. A pre-filter was also introduced before the image binarization, because the camera delivers red-tinted images (see figure 3.1) due to a poor white balance. With this configuration the algorithm considers only the pixels whose green and blue component values are smaller than 50 and whose red component value is higher than 120; the remaining pixels are set to zero.
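A minimal sketch of this colour filter and clustering step is given below, using scipy's connected-component labelling as a stand-in for the module's own grouping routine; the channel thresholds (50/120) are the ones quoted above, while the cluster-size bounds are placeholders.

```python
import numpy as np
from scipy import ndimage

def red_circle_centers(rgb, min_size=20, max_size=2000):
    """Binarize the image keeping 'red enough' pixels, group neighbouring pixels
    into clusters and return the centres of clusters within the size thresholds."""
    r, g, b = rgb[..., 0].astype(int), rgb[..., 1].astype(int), rgb[..., 2].astype(int)
    mask = (r > 120) & (g < 50) & (b < 50)          # pre-filter + binarization
    labels, n = ndimage.label(mask)                 # group adjacent pixels into clusters
    centers = []
    for lab in range(1, n + 1):
        size = np.sum(labels == lab)
        if min_size <= size <= max_size:            # keep clusters of plausible dimensions
            centers.append(ndimage.center_of_mass(mask, labels, lab))
    return centers                                   # list of (row, col) centres
```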
During the internship, the available algorithm was implemented in the surfvs GenoM module (for more details see section A.3 of Appendix A). There is a request in this module called Launch SURF which executes this algorithm when a parameter indicating the use of artificial landmarks is given by the user. All the module's requests are explained in detail in the next section.

Figure 3.2: Flux diagram for the artificial landmarks algorithm.
3.2 Natural landmarks
The realization of landmark based navigation tasks in unknown natural environments requires tracking natural landmarks from the scene. To this aim, the fast detection of reliable visual features is necessary. A possible solution is to use a robust local feature detector; in this work two of the most popular and efficient feature detectors have been considered:
• the Scale Invariant Feature Transform, known as SIFT [Lowe, 2004],
• and the Speeded Up Robust Features, known as SURF [Bay et al., 2008].
Finally, SURF was chosen because of its lower processing time and similar performance to SIFT [Juan and Gwon, 2006]. The implemented version of the SURF detector is the open source version OpenSURF [Evans, 2009].
3.2.1 SURF Implementation
We also implemented OpenSURF in the surfvs GenoM module. When working with OpenSURF, the module has three requests: Initialization, Launch SURF and Stop. The Initialization request is used to create the visual references. To do so, the request acquires two images (the final and initial ones) and runs the OpenSURF detector between them to find relevant matches. The best ten points are shown and numbered (from 0 to 9, with 0 being the best match) in the pictures. Then, the user has to choose four points, which are delivered to visuserv, the module responsible for the visual servoing. The Launch SURF request starts the image acquisition (at a rate of 5 frames per second), runs OpenSURF on the current frame to obtain the updated visual features and delivers them to the visual servoing module. The Stop request stops a Launch SURF request in execution.
Initialization Request
This request has two options: Online and Offline. The Online mode is used to acquire both initial and final images, which respectively correspond to the initial and final pose of the robot. After taking the Initial image, the OpenSURF detector is applied, finding and storing all the matches between the images. These matches are sorted using a matching rate that indicates the quality of the match. Once this is done, both images are displayed along with the ten points that have the best matching rate. The points are also numbered from 0 to 9, where point 0 is the best match. Figures 3.3(a) and 3.3(b) show the Final and Initial images with the matches sorted and numbered.

Figure 3.3: Images from the final and initial position of the robot: (a) Final image, (b) Initial image.

This way the user can decide which points are the most convenient to choose, considering that lower numbers are better, but also that the selected points should be separated from each other by a certain distance (no less than 20 pixels for an image resolution of 320×240).
After this, the Initialization request is called with the Online mode. During this step the user explicitly chooses the points, which are then isolated and properly stored. In this example points 0, 1, 2 and 3 are selected (see figures 3.3 and 3.5(a)). If one of the distances is too small, a message warns the user to choose the points again. Figure 3.4 presents the flux chart of this request.
Figure 3.4: Initialization request flux chart.

Launch SURF Request

This request has a more complex behaviour, as it has to perform the OpenSURF matching in real time while being able to detect and avoid mismatches. To do so, it works with the information from three images: the Final one, the current one and the previous one.
The four points from the Final image have to be matched to four points in the current image, a task that is performed by the OpenSURF detector. These matches are stored in the iSort vector, and the next step is to determine whether the matches are accurate. As a first step, the iSort vector is sorted to follow the same order as the vector containing the previous points (ipts3). The sorting algorithm takes the first point of ipts3 and compares it to the first point of iSort; if the x or the y coordinate differs by more than 10 pixels, they are considered to be different points. Then, the method tries to find a better match by comparing the first ipts3 point with the next three points in iSort. If it finds a point whose difference in both coordinates is smaller than 25 pixels, it is considered a potential match and its position is swapped with that of the first point of iSort; if no potential match is found, nothing is done. The same procedure is applied to the three other points of ipts3. Figure 3.5(a) shows the set of points stored in the vector iSort; ipts3 would contain the same type of information but from the previous iteration.

Figure 3.5: Current points stored in the iSort vector: (a) Current image, (b) Distances tagging.
Having both vectors sorted in the same order makes filtering the wrong matches easier. It is of critical importance that the wrong matches are identified and not sent to the visual servoing module; otherwise the control would fail. The implemented solution consists in identifying the mismatched points and replacing them with the corresponding previous points. The first attempt to identify a possible mismatch was a filter that works in the following way:
• six distances are calculated using the coordinates of the four points from the vector ipts3; the same is done using iSort. Figure 3.5 shows a real capture with the four points and a detail of the six calculated distances. The distances are calculated using the following equation:
\[
distance[i] = \sqrt{(x_k - x_j)^2 + (y_k - y_j)^2}
\qquad (3.1)
\]
where $i$ varies from 0 to 5, $k$ from 0 to 2 and $j$ from $k+1$ to 3; $x$ and $y$ are the coordinates of the points.
• With these distances, 15 ratios are computed using equation (3.2):
\[
ratio[i] = \frac{distance[k]}{distance[j]}
\qquad (3.2)
\]
where $i$ varies from 0 to 14, $k$ from 0 to 4 and $j$ from $k+1$ to 5. Most of the ratios involve only three points; this helps to discriminate whether the remaining point is a good match. The ratios 4 (involving distances 0 and 5), 7 (distances 1 and 4) and 9 (distances 2 and 3) are computed using the four points, which means that they do not deliver any useful information; for this reason they are left aside.
• Then, the error between corresponding ratios of the two vectors is calculated with equation (3.3), resulting in a vector of 12 errors:
\[
error[i] = ratio_{ipts3}[i] - ratio_{iSort}[i]
\qquad (3.3)
\]
where $i$ varies from 0 to 11.
• Each error is evaluated and a probability value (of being a wrong point) is assigned to each involved point. There are three criteria to assign the probability: if the error is greater than 20% the probability is increased by 2, if the error is between 10% and 20% the probability is increased by 1, and if it is smaller than 3% it is decreased by 1.
• The probability values are normalized and evaluated for each point; if the value is smaller than 30% the point is considered to be a good match. If it is greater, the previous point is used instead.
It may happen that the SURF algorithm finds 2, 3 or even 4 concurrent points as matches. Clearly, only one of the concurrent points can be a good match. A routine to detect concurrent points and possible matches was therefore introduced. Still, the method proved not to be robust enough, sometimes allowing points that were wrong matches to be considered as good ones.
Therefore another filter was developed. Its basic principle is very similar to the previous one, but three main differences can be highlighted. The first one is that it considers one point from iSort at a time instead of the complete set of points simultaneously. Table 3.1 illustrates how the method works for the first point: only the first entry is replaced by the new candidate, while the other three entries keep their previous values.

Vector ipts3: $P_0^{old}$  $P_1^{old}$  $P_2^{old}$  $P_3^{old}$
Vector iSort: $P_0^{new}$  $P_1^{old}$  $P_2^{old}$  $P_3^{old}$

Table 3.1: New distances for $P_0$.

The distances (and also the ratios and errors) are now computed with these values, which means that only the point P0 is being evaluated. The process is repeated for the other three points.
Evaluating each point separately increases the performance, has the advantage of simpler code, and does not require special treatment for concurrent points.
The second difference is the way the ratios are computed: nine ratios are computed for each point. The distances that depend on the evaluated point are used as numerators
and the others as denominators. For example, considering the point P0, the numerators would be distances 1, 2 and 3 and the denominators distances 4, 5 and 6. Computing all possible combinations returns nine ratios, presented in equation (3.4). The advantage is that errors are only introduced in the numerators, avoiding the possibility of error compensation.
\[
\begin{array}{lll}
ratio[0] = \frac{distance\,1}{distance\,4} & ratio[1] = \frac{distance\,2}{distance\,4} & ratio[2] = \frac{distance\,3}{distance\,4} \\[1mm]
ratio[3] = \frac{distance\,1}{distance\,5} & ratio[4] = \frac{distance\,2}{distance\,5} & ratio[5] = \frac{distance\,3}{distance\,5} \\[1mm]
ratio[6] = \frac{distance\,1}{distance\,6} & ratio[7] = \frac{distance\,2}{distance\,6} & ratio[8] = \frac{distance\,3}{distance\,6}
\end{array}
\qquad (3.4)
\]
The third and last difference is the filtering condition. Once all the ratios of a point are computed, they are summed; if this value is smaller than a certain threshold, the match is considered to be accurate. This method only needs one filter threshold to be configured.
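A sketch of this second filter is given below: for each point, the three distances involving it in the current set are compared, through the nine ratios of equation (3.4), with the corresponding distances of the hybrid set where only that point is replaced; if the accumulated deviation exceeds a threshold, the previous point is kept. Variable names, the threshold value and the aggregation by a sum of absolute deviations are assumptions of this sketch, not the module's actual code.

```python
import numpy as np
from itertools import combinations

def pair_distances(pts):
    """Six pairwise distances between the four points, eq. (3.1)."""
    return {(k, j): float(np.hypot(*(np.subtract(pts[k], pts[j]))))
            for k, j in combinations(range(4), 2)}

def ratios_for(pts, i):
    """Nine ratios of eq. (3.4): distances involving point i over distances that do not."""
    d = pair_distances(pts)
    num = [v for (k, j), v in d.items() if i in (k, j)]
    den = [v for (k, j), v in d.items() if i not in (k, j)]
    return [n / m for n in num for m in den]

def filter_matches(prev_pts, new_pts, threshold=0.5):
    """Second filter: each new point is accepted only if the summed ratio deviation,
    computed with that single point substituted into the previous set, stays small."""
    accepted = list(prev_pts)
    for i in range(4):
        hybrid = list(prev_pts)
        hybrid[i] = new_pts[i]                     # only point i is evaluated (table 3.1)
        dev = sum(abs(a - b) for a, b in zip(ratios_for(prev_pts, i), ratios_for(hybrid, i)))
        if dev < threshold:
            accepted[i] = new_pts[i]               # good match: keep the new point
    return accepted                                # otherwise the previous point is kept
```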
It may happen that the OpenSURF detector fails to find accurate matches over several successive iterations. In this case, the filter threshold is raised, allowing a match with a greater error (due to the larger difference with the last good match), but nevertheless accurate, to be considered as such.
Figure 3.6 presents the flux chart of this request.
Stop SURF Request
This request is used to stop the Launch SURF request in execution, closing the data files and freeing the variables.
In the next section we present the test results of visual navigation in a free environment using artificial and natural landmarks.
Figure 3.6: Launch SURF request flux chart. (Note: the flux chart of the algorithm for artificial landmarks is presented in figure 3.2.)
3.3 Visual servoing test results
We performed many tests with different initial and final configurations and also with different environmental conditions. The best results were obtained at night, because the lighting conditions during the day reduced the range in which our algorithms are able to find the visual references. We first present the results obtained with artificial landmarks and then those obtained with natural landmarks.
The tests using artificial targets were successfully performed. We present the results of a test in which the linear speed is limited to 10 cm/s and the angular speed to 0.2 rad/s. The initial position of the PTU is -10°. Figures 3.7 and 3.8 present the initial and final positions of the robot. Figures 3.9, 3.10 and 3.11 present the evolution of the three velocities, while figure 3.12 shows the evolution of the normalized error between the visual references of the current and reference images.

Figure 3.7: Initial position. Figure 3.8: Final position.
Figure 3.9: Linear speed of the base. Figure 3.10: Angular speed of the base.
Figure 3.11: Angular speed of the PTU. Figure 3.12: Normalized error.
The tests using natural landmarks revealed that when the difference in orientation between the final position of the camera and the initial one is lower than 15°, the robot reaches its final position. If this difference is greater, the openSURF implementation fails to find the same four points all along the trajectory. We tried different parameter configurations of the SURF algorithm, but none of them performed as desired. For this reason, we decided to continue our work using only artificial landmarks.
4 Vision based navigation through a
cluttered environment
The domain of vision based navigation strategies in robotics is a very prolific one [F. Bonin-Font and Oliver, 2008; De Souza and Kak, 2002]. Nevertheless, it is common to classify vision based navigation strategies into two main categories:
Global methods: where environment maps are used and the location of the obstacles, if there are any, is generally known. With all that information and the global location of the robot, a trajectory is planned before the navigation task begins. Different approaches allow dealing with modifications of the environment after the trajectory has been planned. One of these approaches is the elastic band [Quinlan, 1994], which locally deforms the original trajectory, allowing the robot to adapt to the modified environment.
Local methods: where the environment is a priori unknown (no map is available), and therefore trajectory planning cannot be performed. Instead, the information provided by the sensors (the most common ones being cameras, laser range finders, ultrasound and infrared proximity sensors) is used in real time to control the robot.
Our project belongs to the second category: the environment is unknown to the robot, and the information provided in real time by the embedded sensor (the camera in this case) is used to complete the task. Our idea is to add another local method to allow Rackham to avoid unexpected obstacles present in the scene. Note that their location, size and number are unknown. In this context, the first control law, presented in equation (2.26), is used to guide Rackham towards the target in the free space. When a risk of collision exists, a second control law prevails over the first one to modify the trajectory. The integration of both methods is performed through a global control law.
The first section of this chapter presents the chosen control strategy. The second section addresses the implementation on Rackham.
4.1 Control law during the obstacle avoidance phase
During the obstacle avoidance phase, two objectives must be reached: the robot must not collide with the obstacle, and the visual features must be kept within the field of view. To accomplish this, we control the mobile base and the pan platform separately. In the sequel, we first describe the controllers designed for each of them, before presenting the global control law which decides which controller must be applied: pure visual servoing, pure obstacle avoidance or a mix of both.
4.1.1 Mobile base control
The mobile base is controlled using two already available methods to avoid obstacles: path following [Samson, 1992] and the artificial rotating potential field [Cadenat, 1999]. We implemented both methods on the Rackham robot.
Path following
This method, proposed in [Cadenat, 1999], is supported by the path following formalism [Samson, 1992]. The idea is to guarantee non-collision by following a security path $\xi_0$ around the obstacle, keeping a certain distance $d_0$ (see figure 4.1). To do this, the couple $(\delta, \alpha)$ must be stabilized to zero: $\delta$ is the difference between the distance to collision $d_{coll}$ and $d_0$, while $\alpha$ is the angle between the robot orientation and the tangent to the security envelope. The dynamics of this couple is given by the following equation [Samson, 1992]:
\[
\begin{cases}
\dot{\delta} = \upsilon_{coll}\,\sin\alpha \\[1mm]
\dot{\alpha} = \omega_{coll} - \dfrac{\chi\,\upsilon_{coll}\,\cos\alpha}{1 + \sigma\chi\delta}
\end{cases}
\qquad (4.1)
\]
where $\chi$ is equal to $\frac{1}{R}$, with $R$ the curvature radius of the obstacle at the considered point. The value of $\sigma$ is 1 if the robot follows the obstacle's contour clockwise, -1 if it follows it counter-clockwise, and 0 if the contour is a straight line.

Figure 4.1: Path following.

The path following problem consists in finding a control law $\omega_{coll}$ allowing to stabilize the system (4.1) to (0, 0), under the assumption that the linear velocity and its derivative are bounded and that the velocity never vanishes. This last assumption guarantees the controllability of the system. Following [Folio, 2007] and [Durand Petiteville, 2012], a control law fulfilling these properties is given by:
\[
\upsilon_{coll} = \upsilon_0
\qquad
\omega_{coll} = -\upsilon_{coll}\,(k\delta + \alpha + 2k\sin\alpha - \chi\cos\alpha)
\qquad (4.2)
\]
where $k$ is a positive gain. The proof can be found in [Folio, 2007; Durand Petiteville, 2012]; here only the equations necessary for the control are presented.
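The sketch below computes the pair $(\delta, \alpha)$ and the velocities of equation (4.2) from the quantities measured on the security envelope; the gain and nominal speed values are placeholders.

```python
import numpy as np

v0, k = 0.10, 1.0   # nominal linear speed (m/s) and positive gain -- placeholder values

def path_following_control(d_coll, d0, alpha, chi):
    """Path-following avoidance law, eq. (4.2).
    d_coll : current distance to collision, d0 : security distance,
    alpha  : angle between robot heading and the tangent to the security envelope,
    chi    : curvature (1/R) of the obstacle at the closest point."""
    delta = d_coll - d0
    v_coll = v0
    w_coll = -v_coll * (k * delta + alpha + 2.0 * k * np.sin(alpha) - chi * np.cos(alpha))
    return v_coll, w_coll
```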
Artificial rotating potential field
The potential field approach is a well-known method which allows a robot to navigate safely through a cluttered environment. The basic principle of this method is to create an artificial potential field [Khatib, 1980] to guide the robot towards the goal. In this approach the target is modelled as an attractive field, the obstacle as a repulsive one, and the resulting virtual force controls the robot's motion. In this work, the vehicle is driven towards the goal using vision. There is then no need for an attractive potential, this role being played by the visual servoing. Thus, only a repulsive potential is created to ensure non-collision. To this aim, three envelopes around the obstacle are defined in figure 4.2:
1. the first envelope $\xi_-$ is located at a distance $d_- > 0$ from the obstacle and defines the region where the risk of collision with the obstacle is very high;
2. the second envelope $\xi_0$, located at a distance $d_0 > d_-$ from the obstacle, defines the security envelope the robot must follow to avoid the obstacle;
3. the third envelope $\xi_+$, located at a distance $d_+ > d_0$ from the obstacle, defines the obstacle's influence area: the robot begins to consider the obstacle when it enters this region.

Figure 4.2: Rotating potential field.
The idea, proposed in [Cadenat, 1999], is to define the modulus $F$ and the orientation $\beta$ of the repulsive force so that the robot follows the security envelope, leading to the result given hereafter:
\[
F = k_1\,\frac{1}{d_{coll}^2}\left(\frac{1}{d_{coll}} - \frac{1}{d_+}\right) - k_2\,(d_{coll} - d_+)
\qquad
\beta = \alpha - \frac{\pi}{2}\left(\frac{d_{coll}}{d_0}\right)^{2} + \frac{\pi}{2}
\qquad (4.3)
\]
Indeed, one can see that the orientation $\beta$ of the repulsive force is defined to bring the robot parallel to the obstacle, and that its modulus $F$ increases when Rackham moves towards the obstacle. Furthermore, the above definitions yield a rotating repulsive potential field, which has the advantage of limiting local minima problems [Cadenat, 1999]. Knowing these two values, it is possible to obtain the control speeds for the robot's base using the following equations [Cadenat, 1999]:
\[
\upsilon_{coll} = k_{\upsilon_{coll}}\,F\cos\beta
\qquad
\omega_{coll} = k_{\omega_{coll}}\,\frac{F}{D_x}\sin\beta
\qquad (4.4)
\]
where $k_{\upsilon_{coll}}$ and $k_{\omega_{coll}}$ are positive scalar gains.
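A direct transcription of equations (4.3) and (4.4) is sketched below; all gains and distances are placeholder values, and the force is only computed inside the influence area $d_{coll} < d_+$.

```python
import numpy as np

k1, k2, k_v, k_w = 0.1, 0.1, 1.0, 1.0     # positive gains (placeholders)
d_plus, d0, D_x = 1.0, 0.6, 0.10          # influence / security distances and PTU offset (placeholders)

def rotating_potential_control(d_coll, alpha):
    """Repulsive rotating potential field, eqs. (4.3)-(4.4)."""
    if d_coll >= d_plus:
        return 0.0, 0.0                    # outside the obstacle influence area
    F = k1 / d_coll**2 * (1.0 / d_coll - 1.0 / d_plus) - k2 * (d_coll - d_plus)
    beta = alpha - (np.pi / 2.0) * (d_coll / d0) ** 2 + np.pi / 2.0
    v_coll = k_v * F * np.cos(beta)
    w_coll = k_w * (F / D_x) * np.sin(beta)
    return v_coll, w_coll
```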
Equations (4.2) and (4.4) provide two control laws that guarantee non-collision by making the robot follow a pre-defined security envelope. We now focus on the control of the camera during the obstacle avoidance.
4.1.2 Control law for the camera
It was already stated that the visual features must be kept within the camera field of view during the whole avoidance phase. To do so, the idea is to control the camera so that it compensates for the movements of the robot's base. We implement the same approach as in [Cadenat, 1999], based on the task function formalism, where the camera pan speed is controlled to maintain the target at the center of the image using the visual features.
If the point $P(X_0, Y_0)$ is the center of mass of the visual features, then a task function to center the target in the image can be defined as:
\[
e_Y = Y_0
\qquad (4.5)
\]
To decouple the control of the camera from the control of the robot's base, the Jacobian matrix of the robot and the avoidance control vector $\dot{q}_{oa}$ are both split:
\[
J_{robot} = \begin{pmatrix} J_{base} & J_{pl} \end{pmatrix}
\qquad (4.6)
\]
\[
\dot{q}_{oa} = \begin{pmatrix} \dot{q}_{base} \\ \dot{q}_{pl} \end{pmatrix}
= \begin{pmatrix} \upsilon_{coll} \\ \omega_{coll} \\ \varpi_{coll} \end{pmatrix}
\qquad (4.7)
\]
where $J_{base}$ consists of the first two columns of $J_{robot}$ and $J_{pl}$ is the third one; $\dot{q}_{base}$ consists of the first two rows of $\dot{q}$ and $\dot{q}_{pl}$ of the last one. The control vector for obstacle avoidance is denoted by $\dot{q}_{oa}$.
Making $e_Y$ decrease exponentially to zero and separating the control of the mobile base from that of the pan platform leads to the following result [Cadenat, 1999]:
\[
\dot{q}_{pl} = \varpi_{coll} = -\frac{1}{L_{Y_0} J_{pl}}\left(\lambda_Y\, e_Y + L_{Y_0} J_{base}\,\dot{q}_{base}\right)
\qquad (4.8)
\]
where $\lambda_Y$ is a positive scalar gain and $L_{Y_0}$ is the reduced interaction matrix for the ordinate $Y_0$.
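The pan compensation of equation (4.8) can be sketched as follows, with $L_{Y_0}$ the single row of the reduced interaction matrix associated with the ordinate of the target's centre of mass; the gain value is a placeholder.

```python
import numpy as np

def pan_compensation(Y0, L_Y0, J_robot, q_dot_base, lambda_Y=1.0):
    """Camera pan speed during avoidance, eq. (4.8).
    Y0        : ordinate of the visual features' centre of mass (task e_Y = Y0),
    L_Y0      : 1x3 reduced interaction matrix row for Y0,
    J_robot   : 3x3 robot Jacobian (eq. 2.14), split into J_base (cols 0-1) and J_pl (col 2),
    q_dot_base: [v_coll, omega_coll] imposed by the avoidance controller."""
    J_base, J_pl = J_robot[:, :2], J_robot[:, 2]
    e_Y = Y0
    num = lambda_Y * e_Y + float(L_Y0 @ J_base @ np.asarray(q_dot_base))
    return -num / float(L_Y0 @ J_pl)      # pan rate (the third component of q_dot_oa)
```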
4.1.3 Global control law
Two controllers have been implemented: the pure visual servoing controller (see equation (2.26)) and the obstacle avoidance controller (see equation (4.7)). In order to guarantee that the robot is controlled with smooth velocities, we define the following global control law, already proposed in [Cadenat, 1999]:
\[
\dot{q}_{global} = (1 - \mu_{coll})\,\dot{q}_{vs} + \mu_{coll}\,\dot{q}_{oa}
\qquad (4.9)
\]
where $\mu_{coll}$ varies continuously between 0 and 1 with the distance to the obstacle $d_{coll}$ and is responsible for a smooth transition between controllers. The variation of $\mu_{coll}$ is defined by the following equations:
\[
\begin{cases}
\mu_{coll} = 0 & d_{coll} > d_0 \\[1mm]
\mu_{coll} = \dfrac{d_{coll} - d_0}{d_- - d_0} & d_- < d_{coll} < d_0 \\[1mm]
\mu_{coll} = 1 & d_{coll} < d_-
\end{cases}
\qquad (4.10)
\]
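The smooth switch between the two controllers then amounts to the few lines below, a direct transcription of equations (4.9) and (4.10); the envelope distances are placeholder values.

```python
import numpy as np

d_minus, d0 = 0.3, 0.6        # envelope distances d- and d0 (placeholders)

def mu_coll(d_coll):
    """Blending coefficient of eq. (4.10): 0 far from the obstacle, 1 inside the envelope xi-."""
    if d_coll > d0:
        return 0.0
    if d_coll < d_minus:
        return 1.0
    return (d_coll - d0) / (d_minus - d0)

def global_control(q_dot_vs, q_dot_oa, d_coll):
    """Global control law of eq. (4.9)."""
    mu = mu_coll(d_coll)
    return (1.0 - mu) * np.asarray(q_dot_vs) + mu * np.asarray(q_dot_oa)
```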
4.2 Implementation of the controller
The sensor used for detecting and locating the obstacles is a SICK laser range finder, whose field of view and starting and finishing angles are shown in figure 4.3(a). The laser provides distance measurements up to 5 meters (limited in software for this application), has a field of view of 180° and a resolution of 0.5°. It is thus able to characterize the portion of the environment within its field of view through 360 points. The sick module (see section A.3 of Appendix A) delivers a structure containing the Cartesian coordinates (x, y) of all acquired points to the visuserv module.
Several measures are needed for the implementation of the strategies mentioned in the previous section. We describe them next, together with the proposed algorithm to obtain them.

Figure 4.3: Laser field of view (a) and lateral view of Rackham (b).
4.2.1 Detecting and defining an obstacle
The first thing to do is to process the raw laser data to find all the points corresponding to the obstacle. The method we implemented finds, amongst the 360 acquired points, the point $P_0$ nearest to the robot, by simply computing the modulus of each point. Once it is found, the next step is to determine the obstacle's boundary points in order to define the complete set of points corresponding to the obstacle (see figure 4.4).

Figure 4.4: Obstacle's points determination.

Starting from the point $P_0$, the algorithm calculates the distance between successive points until this distance is greater than a certain threshold (5 cm in our case). When this occurs, the right border point $P_+$ has been found. The same procedure is executed from the point $P_0$ in decreasing order to find the left border point $P_-$.
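The boundary-point search described above can be sketched as follows on the array of 360 Cartesian laser points; the 5 cm threshold is the one quoted in the text.

```python
import numpy as np

def segment_obstacle(scan_xy, gap=0.05):
    """Find the nearest laser point P0 and grow the obstacle to both sides
    until the distance between successive points exceeds the gap threshold (5 cm).
    scan_xy: (N, 2) array of Cartesian points ordered by scan angle."""
    i0 = int(np.argmin(np.hypot(scan_xy[:, 0], scan_xy[:, 1])))   # nearest point P0
    i_plus = i0
    while i_plus + 1 < len(scan_xy) and np.linalg.norm(scan_xy[i_plus + 1] - scan_xy[i_plus]) <= gap:
        i_plus += 1                                               # right border P+
    i_minus = i0
    while i_minus - 1 >= 0 and np.linalg.norm(scan_xy[i_minus - 1] - scan_xy[i_minus]) <= gap:
        i_minus -= 1                                              # left border P-
    return i_minus, i0, i_plus
```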
With these three points, $P_-$, $P_0$ and $P_+$, the obstacle is located and delimited, but we still need two additional measures for our methods to work properly.
The first one is the curvature radius of the obstacle at $P_0$, which is needed for the path following method. To obtain it, we introduce two additional points $P_{R-}$ and $P_{R+}$ (see figure 4.5(a), where $\gamma$ has a value of 5°) and consider the radius of the circle described by these three points.
Finally, the last measure needed to completely define the obstacle is an estimate of the y-coordinate of the obstacle's center of mass. This measure is used to define the contouring sense. The estimation is obtained by calculating the y-coordinate $Y_{ObsCenter}$ of the center of the circle described by the points $P_-$, $P_0$ and $P_+$, see figure 4.5(b). If $Y_{ObsCenter}$ is negative the robot should contour the obstacle clockwise, and counter-clockwise if it is positive.

Figure 4.5: Additional measures to define an obstacle: (a) Obstacle's curvature radius at $P_0$, (b) Estimation of the obstacle's center of mass.
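Both additional measures reduce to fitting a circle through three points: the curvature radius uses $P_{R-}$, $P_0$, $P_{R+}$, and the contouring sense uses the y-coordinate of the centre of the circle through $P_-$, $P_0$, $P_+$. A minimal circumscribed-circle computation is sketched below.

```python
import numpy as np

def circle_through(p1, p2, p3):
    """Centre and radius of the circle passing through three 2D points
    (solved from the perpendicular-bisector equations)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = np.array([[x2 - x1, y2 - y1],
                  [x3 - x1, y3 - y1]], dtype=float)
    b = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                        x3**2 - x1**2 + y3**2 - y1**2])
    cx, cy = np.linalg.solve(A, b)                 # circle centre
    radius = float(np.hypot(x1 - cx, y1 - cy))
    return (cx, cy), radius

# Curvature radius at P0, and contouring sense from the sign of the centre's y-coordinate:
# (_, R) = circle_through(P_Rminus, P0, P_Rplus)
# (cx, cy), _ = circle_through(P_minus, P0, P_plus); clockwise = (cy < 0)
```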
With all these measures the avoidance controller is able to work. Nevertheless, other issues inherent to the sensor used appeared; they are presented, along with the proposed solutions, in the next section.
4.2.2 Estimation of the obstacle position
A recurrent issue during the tests was the appearance of unexpected manoeuvres while contouring the obstacle: Rackham moved towards and away from the obstacle instead of following the security envelope. The cause lies in the limited field of view of the laser (see figure 4.3). When the robot is already following the security envelope, the obstacle orientation is around 0° or 180° with respect to the laser frame (see figure 4.6).

Figure 4.6: Sudden loss of the obstacle.

With small changes of the base orientation in the right direction, the obstacle leaves the laser field of view, producing sudden changes in the rotation speed of the robot's base:
• when the obstacle goes out of sight, only the visual servoing control law is applied, which drives the robot towards the target even though the obstacle is still dangerous at this moment (see figure 4.6);
• then, when the obstacle comes back in sight, the control switches completely to the obstacle avoidance controller, which makes the robot move away from the obstacle and return to the security envelope.
These manoeuvres result in an unnatural and inefficient trajectory. We also remark that these changes in the rotation speed of the robot's base are not smooth. This happens because the smooth transition between controllers relies on a smooth variation of µ_coll, a variable that depends on the distance to the obstacle. Therefore, if the laser suddenly stops detecting the obstacle, µ_coll changes steeply. We attempt to eliminate or at least reduce these manoeuvres. The proposed solution is presented step by step below:
• store the information about the obstacle and the odometry of the robot at an instant t(k),
• detect the loss of the obstacle at t(k + 1),
• using the stored odometry, calculate the position of the robot at t(k + 1) expressed in the robot's base frame at t(k),
• using the stored obstacle data at t(k) and this calculated position of the robot, estimate the position of the obstacle at t(k + 1).
The laser measurements at the instant t(k) are thus stored in the vector P_Obs until the loss of the obstacle is detected at t(k + 1). The loss is detected through a negative step in the value of µ_coll: when an obstacle is near the robot, µ_coll has a certain value between 0 and 1, and when the obstacle goes out of the laser field of view this value drops steeply. We consider steps with an amplitude greater than 0.2. The same criterion is used to detect when the obstacle comes back in sight, except that in this case the step is positive.
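As an illustration, a minimal sketch of this step-detection test is given below. The 0.2 threshold is the one mentioned above, while the function name and the returned labels are ours.

```python
def detect_transition(mu_prev, mu_curr, step=0.2):
    """Detect a sudden change of mu_coll between two laser acquisitions.

    Returns 'lost' on a negative step (obstacle left the field of view),
    'reacquired' on a positive step, and None otherwise.
    """
    delta = mu_curr - mu_prev
    if delta < -step:
        return "lost"
    if delta > step:
        return "reacquired"
    return None
```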
On the other hand, the kinematic model of our robot is given by the following equations:

$$
\begin{cases}
\dot{X}_M(t) = \upsilon(t)\cos(\theta(t)) \\
\dot{Y}_M(t) = \upsilon(t)\sin(\theta(t)) \\
\dot{\theta}(t) = \omega(t) \\
\dot{\vartheta}(t) = \varpi(t)
\end{cases}
\qquad (4.11)
$$
Integrating this system over the interval [t_k, t_{k+1}], and knowing that the velocities remain constant during this interval, we can calculate Rackham's position at the instant t(k + 1) expressed in the robot's base frame at the instant t(k). The result of the integration is:
$$
\begin{cases}
X_M(k+1) = X_M(k) + \dfrac{\upsilon(k)}{\omega(k)}\,\big(\sin(\theta(k) + \omega(k)T_s) - \sin(\theta(k))\big) \\
Y_M(k+1) = Y_M(k) - \dfrac{\upsilon(k)}{\omega(k)}\,\big(\cos(\theta(k) + \omega(k)T_s) - \cos(\theta(k))\big) \\
\theta(k+1) = \theta(k) + \omega(k)T_s \\
\vartheta(k+1) = \vartheta(k) + \varpi(k)T_s
\end{cases}
\qquad (4.12)
$$
where T_s is the duration of the interval. With these values and the stored obstacle data P_Obs(k), we can calculate the obstacle position in the robot's frame at t(k + 1) using the following equation:

$$
P_{Obs}(k+1) = H_{k/k+1}\,P_{Obs}(k) \qquad (4.13)
$$

where H_{k/k+1} is the following transformation matrix:
$$
H_{k/k+1} =
\begin{bmatrix}
\cos(\theta) & \sin(\theta) & 0 & -X_M\cos(\theta) - Y_M\sin(\theta) \\
-\sin(\theta) & \cos(\theta) & 0 & X_M\sin(\theta) - Y_M\cos(\theta) \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\qquad (4.14)
$$
Although this estimation is not exact, because we assume that translation and rotation are not performed simultaneously, it is very good given the small interval duration and the fact that the application does not require high precision.
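The following Python sketch summarizes the estimation step, assuming constant velocities over the sampling period. It mirrors equations (4.12)–(4.14) restricted to the planar part of the transform, but it is not the actual visuserv code; in particular, the straight-line special case for ω ≈ 0 and all function names are additions of ours.

```python
import math

def integrate_pose(v, omega, Ts):
    """Pose of the robot at t(k+1) expressed in its own frame at t(k),
    assuming constant velocities over the sampling period Ts (cf. eq. 4.12).
    The pose at t(k) is the origin of that frame: X_M(k) = Y_M(k) = theta(k) = 0."""
    if abs(omega) < 1e-9:                      # straight-line motion (our addition)
        return v * Ts, 0.0, 0.0
    x = (v / omega) * math.sin(omega * Ts)
    y = -(v / omega) * (math.cos(omega * Ts) - 1.0)
    return x, y, omega * Ts

def reproject_obstacle(points_k, x_m, y_m, theta):
    """Express the obstacle points stored at t(k) in the robot frame at t(k+1)
    (cf. eqs. 4.13 and 4.14, 2-D part of the homogeneous transform)."""
    c, s = math.cos(theta), math.sin(theta)
    reprojected = []
    for (x, y) in points_k:
        xp = c * x + s * y - x_m * c - y_m * s
        yp = -s * x + c * y + x_m * s - y_m * c
        reprojected.append((xp, yp))
    return reprojected
```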
In the following section we present the results of the vision-based navigation tests through a cluttered environment using artificial landmarks.
4.3 Obstacle avoidance test results
For this controller we performed tests similar to those shown in section 3.3 for pure visual servoing, but only with the artificial target. The tests were successful, although in some cases the robot was guided to a local minimum instead of the real target.

Figure 4.7: Initial position. Figure 4.8: Final position.

Both implemented methods give very similar results, therefore we only present the results of a navigation test using the path following method. In this test the linear speed was limited to 10 cm/s and the angular speed to 0.2 rad/s. The initial position of the PTU was 0°. Figures 4.7 and 4.8 present the images obtained at the initial and final positions of the robot.
Figure 4.9: Linear speed of the base.
Figure 4.10: Angular speed of the base.
Figures 4.9, 4.10 and 4.11 present the evolution of the three velocities. We can see in figures 4.10 and 4.11 the smooth evolution of the angular velocities during the obstacle avoidance phase, between t = 1 s and t = 18 s. The evolution of the normalized error between the visual features of the current and reference images is presented in figure 4.12. A threshold value of 0.12 was set to consider the task successful.

Figure 4.11: Angular speed of the PTU.
Figure 4.12: Normalized error.
5 Conclusions
During this internship we were able to implement the desired control techniques to perform navigation through free and cluttered environments with our mobile robot. An algorithm was developed to detect obstacles using the data provided by the laser sensor and to calculate all the measures needed by the two avoidance strategies. A solution was also proposed for the problem of obstacles leaving the laser field of view while the robot is contouring them. Finally, a study was conducted to determine whether openSURF could be used to detect the visual features of natural landmarks in the images.
The results of the navigation tests using artificial landmarks were acceptable. Rackham was able to reach its destination even when obstacles were in its path, and the estimation of the obstacle's position proved to work properly, drastically reducing the manoeuvres of the robot while contouring the obstacle. Nevertheless, we believe that the trajectories can be improved; to do so, future work should aim to anticipate the obstacle in order to obtain a more fluid and natural trajectory, understanding as natural a trajectory that is short (compared with the other possibilities) and has smooth changes of direction.
Regarding the use of natural landmarks, our tests indicated that openSURF is not able to find the same visual features all along the trajectory if the difference of orientation between the final image and the current one is greater than approximately 15°, and that the velocities of the robot, especially the angular ones, must be severely limited compared with the velocities allowed when using artificial landmarks. We therefore suggest two lines of work to continue this study:
• develop an algorithm that finds several visual features (for example ten) located in the same plane, selects four of them to perform the control and, if a visual feature is lost, replaces it with one of the others without producing a discontinuity in the control law.
• although the SIFT detector needs more processing power and time to find matches between images, it is more robust with respect to orientation than SURF; we therefore propose to implement the SIFT method and compare the results with those obtained using openSURF.
Finally, and most importantly, the student gained valuable experience not only in robot modelling and visual servoing techniques, but also in modular programming for real-time systems through the use of the GenoM platform.
Appendices
A GenoM: Generator of Modules
GenoM is an open modular platform designed at LAAS in the 1990s [Alami et al., 1998] and developed over the years [Mallet et al., 2002; Fleury et al., 1997] for use in real-time applications such as autonomous robots and satellites. It was specifically designed for applications that require:
• the homogeneous integration, within the control architecture, of heterogeneous functions with real-time constraints and complex algorithms (control of sensors and actuators, servo-controls, data processing, trajectory computations, etc.). To accomplish this, coherent and predictable behaviours of the functions (starting, ending, error handling) as well as standard interfaces (control, parametrization, data flow) are required;
• parallelization management;
• portability and physical distribution of the functions;
• simple procedures to add, modify, use and reuse the functions, in order to encourage their use by non-specialists.
Other advantages¹ of GenoM are that:
• the module description does not depend on the operating system and does not require specific knowledge of distributed systems,
• it becomes a very important tool for handling large systems,
• the underlying standardization of the structure, behaviour and interface of the modules makes it possible to automate their integration.
A.1 General overview
GenoM makes it easy to encapsulate the operational functions in independent modules that manage their execution. The functions can be dynamically started, interrupted or parametrized through asynchronous requests sent to the modules. Every request has an associated final reply, which is used to qualify how the service was executed.
The modules are standard servers that control the execution of the functions on the host machine. They can contain several synchronous or asynchronous functions and execute them in parallel. The decision on how to group several functions in one module is based on the concept of resource: one module is responsible for only one physical or logical resource, and all the functions pertaining to the control of that specific resource are grouped into the same module. For example, a single module can be responsible for the control of a PTU, or of a laser measurement sensor.

¹ For more information, consult the official website; see [LAAS - OpenRobots, 2012].
A set of modules can be seen as an open, controllable and communicating system. On the one hand, there is a control flow which, based on the requests and replies, makes it possible to control all the functions; the requests can be sent by different entities: a user, other modules or an execution controller. On the other hand, there is a data flow between the elements of the architecture: the data is exported using public structures.
A.2 Module Description
The modules are automatically created by GenoM from a synthetic description of the module that does not require any deep knowledge of real-time distributed systems or of the operating system installed on the machine. Basically, the descriptor file describes the offered services, the received data and the data produced by the module.
From the module's descriptor file, GenoM creates [LAAS - Group RIA, 2012]:
• an executable module able to run on different operating systems (VxWorks, UNIX);
• interface libraries (access to services and data);
• an interactive program for preliminary test purposes.
This automatic generation of modules is based on a generic module template that complies with the requirements of real-time embedded systems: constant execution time, static memory allocation, non-preemptive critical sections, etc. GenoM can also produce the module's interface libraries in several languages: C/C++, TCL and PROPICE, amongst others. Furthermore, this module structuring allows [Ouadah, 2011]:
• each algorithm to be encapsulated in its module,
• the development processes to be separated: the module can be created even before its inner algorithms exist,
• the temporal constraints to be partitioned (specified separately for each module),
• a hierarchy to be set up in the system (module hierarchy).
GenoM allows the rapid prototyping of complex systems: once the system is specified and structured into subsystems (or modules), their description and subsequent synthesis is almost immediate. After this, the algorithm of each module can be incrementally tuned independently of the whole system. The modules can be tested on a workstation or directly on the real-time system.
A.3 Rackham’s Software Architecture
The software architecture of Rackham has been developed over many years by many researchers at LAAS, which is why so many modules are already available. For this work, the proposed software architecture is shown in figure A.1.
Figure A.1: Rackham’s software architecture.
The robot's architecture is defined by six modules:
• two modules for the perception level: viam and sick,
• two modules for the action level: rflex and platine,
• two modules in charge of processing the data: surfvs and visuserv.
All the work during the internship was focused on these last two modules.
Next, we present a brief description of each module:
rflex
This module is used as an interface to the software controllers provided by iRobot. It delivers a poster with the odometry data corrected by a gyroscope, which provides a good estimate of the robot's movements. It also imports the control data (linear and rotation speed of the base) and sends it to the motor drivers.
sick
This module is responsible for controlling the laser measurement sensor, a SICK LMS-200 2D laser range finder whose field of view and starting and finishing angles are shown in figure 4.3. The module acquires 360 points (one point every 0.5°), both in Cartesian and polar coordinates, and sends a structure containing this information to the visuserv module.
platine
This module controls the PTU, performing speed or position control of both axes, although only the pan axis is used in this work; the tilt angle is set to zero (parallel to the floor).
viam
The configuration of the camera and the image acquisition are performed by this module. The images are then made available to the other modules through a poster.
surfvs
All the previous modules were already developed. This module is new: it was developed during this internship to extract and deliver the visual features from the images provided by the viam module. Both artificial and natural targets can be used.
visuserv
This module already existed but was largely updated during this internship. It is responsible for the visual servoing and obstacle avoidance controllers. It reads the poster from the surfvs module containing the visual features and the poster containing the sick module information, performs the selected control (pure visual servoing, or visual servoing with obstacle avoidance) and sends the control values to the platine and rflex modules.
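To fix ideas, the Python sketch below summarizes one processing cycle of visuserv as described above. It deliberately does not use the real GenoM interfaces (posters and requests are accessed through generated libraries); all names are hypothetical placeholders, and the convex blending with µ_coll only schematizes the role of the switching coefficient introduced in section 4.1.3.

```python
def visuserv_cycle(read_surfvs_poster, read_sick_poster,
                   send_to_rflex, send_to_platine,
                   visual_servoing, avoidance, d_0, d_minus):
    """One processing cycle of the visuserv module (hypothetical sketch).

    The callables passed as arguments stand in for the posters exported by
    the surfvs and sick modules and for the requests of rflex and platine;
    they are placeholders, not the real GenoM interfaces.
    """
    features = read_surfvs_poster()        # visual features of the target
    scan = read_sick_poster()              # laser points (x, y)

    v_vs, w_vs, w_ptu_vs = visual_servoing(features)
    v_oa, w_oa, w_ptu_oa, d_coll = avoidance(scan, features)

    # Blend the two controllers with mu in [0, 1] (cf. eq. 4.10).
    if d_coll > d_0:
        mu = 0.0
    elif d_coll < d_minus:
        mu = 1.0
    else:
        mu = (d_coll - d_0) / (d_minus - d_0)

    v = (1.0 - mu) * v_vs + mu * v_oa
    w = (1.0 - mu) * w_vs + mu * w_oa
    w_ptu = (1.0 - mu) * w_ptu_vs + mu * w_ptu_oa

    send_to_rflex(v, w)                    # base linear and angular speed
    send_to_platine(w_ptu)                 # pan speed of the PTU
```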
Bibliography
Alami, R., Chatila, R., Fleury, S., Ghallab, M., and Ingrand, F. (1998). An architecture
for autonomy. International Journal of Robotic Research, Vol. 17:315–337.
Bay, H., Tuytelaars, T., and Van Gool, L. (2008). SURF: Speeded up robust features. Computer Vision and Image Understanding, Vol. 110(3):346–359.
Cadenat, V. (1999). Commande référencée multi-capteurs pour la navigation d’un robot
mobile. PhD Thesis, Université Paul Sabatier - Toulouse III.
Chaumette, F. (1990). La relation vision-commande : théorie et application à des tâches
robotiques. PhD thesis, Université de Rennes I.
Chaumette, F. and Hutchinson, S. (2006). Visual servo control part I: Basic approaches.
IEEE Robotics & Automation Magazine, Vol. 13:82–90.
De Souza, G. N. and Kak, A. C. (2002). Vision for mobile robot navigation: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(2):237–267.
Durand Petiteville, A. (2012). Navigation référencée multi-capteurs d’un robot mobile
en environnement encombré. PhD Thesis, Université Paul Sabatier - Toulouse III.
Evans, C. (2009). Notes on the opensurf library. Technical report, University of Bristol.
Bonin-Font, F., Ortiz, A., and Oliver, G. (2008). Visual navigation for mobile robots: a survey. Journal of Intelligent and Robotic Systems, Vol. 53(3):263.
Fleury, S., Herrb, M., and Chatila, R. (1997). GenoM: a tool for the specification and the implementation of operating modules in a distributed robot architecture. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. 2:842–848.
Folio, D. (2007). Stratégies de commande référencées multi-capteurs et gestion de la
perte du signal visuel pour la navigation d’un robot mobile. PhD Thesis, Université
Paul Sabatier - Toulouse III.
Horaud, R. (1987). New methods for matching 3-D objects with single perspective views.
IEEE Transactions on Pattern analysis and machine intelligence, Vol. 9(3):401–412.
Juan, L. and Gwon, O. (2006). A comparison of SIFT, PCA-SIFT and SURF. International Journal of Image Processing, Vol. 3(4):143–152.
Khatib, O. (1980). Commande dynamique dans l’espace opérationnel des robots ma-
nipulateurs en présence d’obstacles. PhD Thesis, École Nationale Supérieure de
l’Aéronautique et de l’Espace.
LAAS - Group RIA (2012). GenoM. URL: http://homepages.laas.fr/sara/ria/ria-
genom.html.fr.
LAAS - OpenRobots (2012). GenoM. URL: http://www.openrobots.org/wiki/genom/.
Lowe, D. (1999). Object recognition from local scale-invariant features. Proceedings of
the Seventh IEEE International Conference on Computer Vision, Vol. 2:1150–1157.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision, Vol. 60(2):91–110.
Malis, E., Chaumette, F., and Boudet, S. (1999). 2-1/2-D visual servoing. IEEE Transactions on Robotics and Automation, Vol. 15(2):234–246.
Mallet, A., Fleury, S., and Bruyninckx, H. (2002). A specification of generic robotics
software components: future evolutions of GenoM in the Orocos context.
Ouadah, N. (2011). Le contrôle visuel appliqué dans la robotique mobile. PhD Thesis, École Nationale Polytechnique.
Quinlan, S. (1994). Real-time modification of collision-free paths. PhD thesis, Stanford
University.
Samson, C., Le Borgne, M., and Espiau, B. (1991). Robot control: The task function approach. Oxford Engineering Science Series, Vol. 22.
Samson, C. (1992). Path following and time-varying feedback stabilization of a wheeled mobile robot. ICARCV.
Index
GenoM, 29
Artificial landmarks, 15
Artificial rotating potential field, 23
Camera control law, 25
Camera model, 10
Cluttered environment, 21
Conclusions, 27
Control law, 13
Global control law, 25
Image features, 15
Initialization Request, 17
Interaction Matrix, 11
Introduction, 1
Launch SURF Request, 17, 20
Module Description, 30
Natural landmarks, 16
Obstacles anticipation controller, 26
Obstacles avoidance Control Law, 22
Obstacles avoidance implementation, 26
Organization, 4
Path following, 22
Project theoretical bases, 5
Rackham’s Software Architecture, 30
Robot Description, 5
Robot model, 6
SURF Implementation, 16
Task function formalism, 13
Visual servoing control law, 12
Abstract
Obstacles avoidance strategies for visual servoing of a mobile
robot
Keywords: visual servoing, obstacle avoidance, natural landmarks, SURF.
Achieving complex navigation tasks in unknown environments is a very rich issue. Different approaches can address this problem. One of them is to control the robot using the visual information provided by an on-board camera. This technique, called visual servoing, is limited when the environment is cluttered. In such a case, it is advisable to consider information from proximetric sensors, such as a laser range finder, in addition to the visual data to perform the navigation task.
In this internship we worked with a non-holonomic mobile robot, implementing a 2D visual servoing control and two validated strategies to avoid static obstacles: path following and the artificial rotating potential field. We also explored the feasibility of using the SURF visual feature detector to perform vision-based navigation tasks using natural landmarks.
Stratégies d’évitement d’obstacles pour l’asservissement visuel
d’un robot mobile.
Mots-clés: asservissement visuel, évitement d’obstacles, amers naturels, SURF.
La réalisation de tâches de navigation complexes dans des environnements inconnus
constitue une problématique très riche. Différentes approches permettent de répondre à
ce problème. L’une d’entre elles consiste à commander le robot sur la base d’informations
visuelles fournies par une caméra embarquée. On parle alors d’asservissement visuel. Ce
type d’approche s’avère néanmoins limitée lorsque l’environnement est encombré. Dans
un tel cas, il faut considérer des informations issues de capteurs proximétriques tels
qu’un télémètre laser en plus des données visuelles pour réaliser la tâche de navigation.
Dans le cadre de ce stage, nous avons travaillé sur un robot mobile non holonome, sur lequel nous avons implémenté un asservissement visuel 2D et deux stratégies d'évitement d'obstacles : le suivi de chemin et la méthode des potentiels rotatifs. On a aussi exploré la possibilité d'utiliser le détecteur de caractéristiques visuelles SURF pour effectuer des tâches de navigation référencée vision en utilisant des amers naturels.