
2017 Indian Control Conference (ICC)

January 4-6, 2017. Indian Institute of Technology, Guwahati, India

Pose Estimation for an Autonomous Vehicle using Monocular Vision

Nikunj Kothari, Misha Gupta, Leena Vachhani and Hemendra Arya

Abstract—Vision-based pose estimation is particularly important for small-sized mobile robots and aerial vehicles that have limited payload capacity. Accurate localization is of importance for any autonomous vehicle, especially in indoor and GPS-denied (Global Positioning System) environments. The aim of this work is to provide a solution for the pose estimation of an autonomous vehicle using monocular vision alone, so that even a small-size vehicle can perform pose estimation without the help of additional sensors. Unlike Visual Odometry, which operates on a sequence of images to estimate the robot's motion, the proposed method estimates the robot's pose from individual images by constantly tracking four fixed feature points in a rectangular pattern whose positions are known a priori. Also, unlike Monocular Visual Odometry, which suffers from the scale ambiguity problem, the proposed algorithm can estimate the pose of the vehicle and can be applied to both planar robots (3-DoF) and aerial vehicles (6-DoF).

The proposed method has been validated by implementing it on a Raspberry Pi 2 Model B with an RPi camera. Simulation results of controlling a differential drive robot using the proposed pose estimation method are also presented.

Keywords—Autonomous, Localization, Pose, Estimation

I. INTRODUCTION

Autonomous vehicles with a small form factor, quick response and the ability to operate remotely in difficult and challenging environments have wide applications. To accomplish an autonomous task, the vehicle should have the capability to determine its pose using sensors mounted on the vehicle. Accurate pose estimation of the autonomous vehicle still remains a significant challenge, especially in Global Positioning System (GPS) denied environments.

Small-size vehicles like Micro Aerial Vehicles (MAVs) have the major constraint of limited payload carrying capacity. Therefore, a wise selection of sensor(s) and processing unit is important. Equally important is an on-line and real-time estimation of the pose of the vehicle.

An acoustic-based solution for pose estimation [1] has been developed, which requires multiple geometrically conditioned observations for applying lateration techniques for localization. A robust and accurate position and orientation estimation method [2] requires computationally expensive acoustic array signal processing techniques. Vision-based positioning arises as a complementary navigation subsystem and plays a supplementary role in GPS-denied environments. It also frees the system from relying on external positioning devices and provides the MAV a natural sensing method for object detection and tracking. A monocular camera, being a lightweight and inexpensive sensor providing rich information, has been used for completely autonomous flights [3] in indoor and GPS-denied environments. However, the vision algorithms are executed off-board on a ground station that establishes wireless communication to send the control commands to the vehicle. Since wireless communication is not always reliable and is time-consuming, it is essential to compute all tasks of the control framework on-board. Two separate sensing solutions are presented in [4] towards an integrated helicopter system capable of indoor flight using both a laser rangefinder and a stereo camera system on-board. The visual framework in [5] has been optimized for an embedded solution to satisfy the constraint of limited on-board processing power. A vision-based framework [6] estimates the five degrees-of-freedom pose using a camera mounted on a quadrotor helicopter. [7] proposes a vision-based state estimation approach that combines the advantages of monocular vision with those of stereo vision. [8] identifies a window as an object of interest and estimates the MAV's pose with respect to it; a safe path-planning method is developed using the information provided by the GPS and the on-board inertial and stereo vision sensors. Instead of using stereo vision to estimate the pose, we are able to estimate the pose using monocular vision. Real-time localization for the autonomous navigation of an MAV [9] has been achieved on a mobile processor using an on-board computing unit (1.6 GHz Atom processor) and multiple sensors (laser, camera, and IMU); a scanning laser range sensor retrofitted with mirrors is used as the primary source of information for position and yaw estimation, followed by a simultaneous localization and mapping (SLAM) algorithm. A landmark-based monocular localization technique [10] uses the time-consuming Scale Invariant Feature Transform (SIFT) for absolute pose estimation. A hybrid pose estimation approach [11] uses colored markers, mobile accelerometers, and multi-sensor data fusion techniques; two different fusion algorithms for the pose estimation have been proposed, based on stereo vision and monocular vision. Monocular vision [12] has been used to extract edges that are compared with a known 3D model of the environment, followed by a particle filter for localization. The autopilot in [13] uses an optic-flow-based vision system for pose estimation based on autonomous localization and scene mapping; it requires an implementation with three nested Kalman filters (3NKF) for robust estimation. Real-time visual-inertial navigation using an iterated EKF has been shown in [14], with a complexity that grows with the number of features. The more efficient approaches in [15] considered pairwise images for visual odometry and fused the output with inertial measurements in an EKF; an altimeter is also used to measure the unknown scale. IMU measurements [16], [17] have been coupled with an EKF SLAM framework. However, the computational cost of EKF SLAM is O(N²) for N features. The need is to estimate the pose of the vehicle using a lightweight sensor requiring minimal processing.

The proposed vision-based technique for pose estimation is an iterative approach that can be applied to both ground and aerial robots. Navigation of a mobile robot to a desired final position is shown using simulations in the paper. Experimental results of estimating the pose of a camera are also included in the paper.
Simulations and experiments are performed in an environment where the corner points of a square window are used as the key features to localize a differential drive robot. Although the paper presents the algorithm based on rectangular features, the proposed technique can be extended to any shape with more than three distinct features.

This approach can enable autonomous flight navigation in indoor environments using solely a monocular camera as the sensor, while all computation is done on the on-board Raspberry Pi. In the simulation of a differential drive robot, the pose estimation errors in the robot's position in the x and z directions are 3.7 × 10⁻⁴ and 7.3 × 10⁻⁵ relative units respectively. The estimation error in the robot's heading is 0.043°. In the experimental validation of this approach in 3D, a position error of less than 0.02 m is achieved. The proposed algorithm takes on average less than 0.1 s to estimate the camera position and orientation on a Raspberry Pi Model B computer. Low computational cost, a low processing-time requirement and good precision are the major advantages of the proposed technique that support on-board execution.

Our work starts by studying the effect of changes in camera pose on distortions in the image, followed by the methodology to estimate the camera pose, in Section II. The image correction algorithm and the underlying mathematics are presented in Section III. Validation of the algorithm using simulation and experimental results is presented in Section IV. Section V concludes the paper by highlighting the advantages of the proposed algorithm, along with directions for future work.
II. PROPOSED POSE ESTIMATION TECHNIQUE

We estimate the pose of the vehicle using the projection of four feature points on the image plane. The feature points are the corners of a door, window or any rectangular object. Fig. 11 describes the convention used for the window image throughout the paper. (x_j, y_j) are the pixel coordinates of the four window corner points, and ζ_j represents the internal angle formed by the window shape at the j-th vertex. The x and y axes are along the directions shown in Fig. 11, and the z axis is orthogonal to the plane of the paper, as given by the right-hand rule. For a rectangular window, j = 1, 2, 3, 4.

When the camera is rotated with respect to the window, the captured image of a square window is not a square; it is, however, a simple quadrilateral. The proposed method iteratively transforms the captured image such that the four corner points of the quadrilateral again form a square. The method draws its inspiration from studying the independent effects of camera translation and rotation on the image of a square window.

• Effect of camera translation on the captured image: Translation of the camera along the X and Y axes changes the centroid location of the window image in the image plane, while translation along the Z direction changes the scale of the window image. Translation does not distort the image: the shape of the object is preserved and the internal angles of the shape obtained by joining the adjacent feature points in the image are unchanged. Thus ζ1 = ζ2 = ζ3 = ζ4 = 90° for translation along any axis.

• Effect of non-zero camera roll on the captured image: Camera roll, similar to camera translation, does not distort the image; it rotates the window image about the centre of the image plane by an angle equal to the camera roll angle. All the internal angles are unchanged, thus ζ1 = ζ2 = ζ3 = ζ4 = 90°.

• Effect of non-zero camera yaw on the captured image: A non-zero yaw angle of the camera distorts the image and changes its internal angles. If there is a positive yaw, then ζ1 and ζ4 are less than 90°, while ζ2 and ζ3 are more than 90°.

• Effect of non-zero camera pitch on the captured image: Unlike the effects of non-zero translation and roll, pitching the camera distorts the image and changes the internal angles. If there is a positive pitch of the camera, then ζ1 and ζ2 are less than 90°, while ζ3 and ζ4 are more than 90°.

It is fairly easy to compute the pose variables when only one variable is non-zero. However, the estimation of pose is challenging when multiple non-zero variables have a coupled effect on the shape of the window image. We next describe our proposed technique to estimate the pose in real scenarios where the vehicle can have multiple non-zero position and orientation values. We assume that the camera is rigidly mounted on the vehicle; hence, estimating the pose of the camera estimates the pose of the vehicle.
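The paper does not spell out how the internal angles ζ_j are obtained from the four corner pixels; the following is a minimal illustrative sketch (our own, not the authors' implementation) that computes ζ1…ζ4 from the corners ordered as in Fig. 11, using the angle between the two edge vectors meeting at each vertex.

```python
import math

def internal_angles(corners):
    """corners: list of four (x, y) pixel tuples ordered around the
    quadrilateral as in Fig. 11 (vertex 1, 2, 3, 4).
    Returns the internal angles zeta_1..zeta_4 in degrees."""
    angles = []
    n = len(corners)
    for j in range(n):
        px, py = corners[j - 1]        # previous vertex
        cx, cy = corners[j]            # current vertex
        nx, ny = corners[(j + 1) % n]  # next vertex
        v1 = (px - cx, py - cy)        # edge towards the previous vertex
        v2 = (nx - cx, ny - cy)        # edge towards the next vertex
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        angles.append(math.degrees(math.acos(dot / norm)))
    return angles

# For an undistorted (translation/roll only) view of the square all four
# internal angles are 90 degrees, as discussed above.
print(internal_angles([(1, 1), (-1, 1), (-1, -1), (1, -1)]))  # [90.0, 90.0, 90.0, 90.0]
```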
A. Estimation of camera pose

The proposed pose estimation iteratively corrects the image of the window towards the image that would appear on the camera if the camera had no rotation with respect to the image source. The objective is to obtain the final image of the window as a square after the iterative corrections in pitch and yaw angles. The centre of the square in the final window image gives the translation along the X and Y axes, and the area of the final window image serves as an estimator of the translation of the camera along the Z axis. With the knowledge of the focal length of the camera and the global coordinates of the corners of the square window, the position is estimated with very high accuracy using the proposed technique. We next present the estimation of each variable describing the pose of the vehicle.

• z-position estimation: Since knowledge of the z position of the camera is important while estimating the camera yaw and pitch, the z position of the camera, as seen in Fig. 10, is estimated using (1):

$$z_{est} = \sqrt{\frac{f^2 s^2}{\mathrm{Area}}}, \qquad (1)$$

where s is the actual scale (side length) of the square in the global frame, f is the focal length of the camera, and Area is the area of the quadrilateral image in the image plane.

• Roll angle γ estimation: The angles that the four line segments of the quadrilateral make with the x-axis give the estimate of the roll angle. Therefore, the roll angle γ is estimated as follows. Let

$$m_j = \frac{y_j - y_{(j+1)\%4}}{x_j - x_{(j+1)\%4}}, \qquad (2)$$

for each j = 1, 2, 3, 4, where the operator % in (2) gives the remainder. The estimate of the roll angle, γ_est, is then given by (3):

$$\gamma_{est} = \frac{180^\circ - \tan^{-1}\left(\sum_{j=1}^{4} m_j\right)}{4}. \qquad (3)$$
• Yaw angle β estimation: The estimate of the yaw angle β is obtained by noting the variation of e_β described by (4):

$$e_\beta = \zeta_1 + \zeta_4 - \zeta_2 - \zeta_3. \qquad (4)$$

The choice of the variable e_β is motivated by the observation that e_β is mainly affected by changes in the yaw angle β and the z-position of the camera (Fig. 10). The variation of e_β with the camera yaw angle for a fixed z is shown in Fig. 1. The variation with z (Fig. 10) of the four coefficients¹ of the third-order polynomial fitted to this curve is simulated and stored in the form of another polynomial of suitable degree, or in the form of a look-up table. Either one can be selected to optimize memory usage or computation time. In our simulations, we use the look-up table method to estimate the camera yaw (β).

¹A 3rd-degree interpolation polynomial is of the form ax³ + bx² + cx + d.

Fig. 1: Variation of e_β with yaw (the original curve together with its second- and third-order polynomial interpolations).
• Pitch angle α estimation: The pitch angle estimation is fundamentally similar to the yaw estimation; however, the variable e_α described by (5) is used for estimating the pitch angle:

$$e_\alpha = \zeta_3 + \zeta_4 - \zeta_1 - \zeta_2. \qquad (5)$$

The choice of e_α is motivated by the observation that the variation in e_α is mainly due to the pitch angle α and the z-position. The objective is now to minimize the variables e_β and e_α described by (4) and (5) respectively by applying corrections to the image, as described in the next section. The amount of correction required to minimize e_β and e_α renders the values of β and α respectively.

• x- and y-position estimation: The centroid of the image polygon is given by

$$(x_c, y_c) = \left( \frac{\sum_{j=1}^{4} x_j}{4}, \; \frac{\sum_{j=1}^{4} y_j}{4} \right). \qquad (6)$$

Using the pinhole camera model, the corresponding x and y coordinates are given by

$$x_{est} = \frac{x_c \, z_{est}}{f}, \qquad y_{est} = \frac{y_c \, z_{est}}{f}, \qquad (7)$$

where f is the focal length of the camera. A compact sketch of these per-variable estimators is given after this list.
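The estimators above translate almost directly into code. The sketch below is an illustrative implementation under stated assumptions (vertex ordering as in Fig. 11, the quadrilateral area obtained with the shoelace formula, Eqs. (2)-(3) implemented exactly as printed); it is not the authors' code, and the yaw/pitch look-up step is deliberately left out here.

```python
import math

def quad_area(c):
    # Shoelace formula for the area of the simple quadrilateral c = [(x, y), ...].
    return 0.5 * abs(sum(c[j][0] * c[(j + 1) % 4][1] - c[(j + 1) % 4][0] * c[j][1]
                         for j in range(4)))

def estimate_z(c, f, s):
    # Eq. (1): z_est = sqrt(f^2 s^2 / Area), with s the side length of the square.
    return math.sqrt(f * f * s * s / quad_area(c))

def estimate_roll(c):
    # Eqs. (2)-(3): edge slopes m_j, then the roll estimate in degrees.
    # (A vertical edge, i.e. x_j == x_{j+1}, would need special handling.)
    m = sum((c[j][1] - c[(j + 1) % 4][1]) / (c[j][0] - c[(j + 1) % 4][0])
            for j in range(4))
    return (180.0 - math.degrees(math.atan(m))) / 4.0

def e_beta(zeta):   # Eq. (4); zeta = [zeta1, ..., zeta4] in degrees
    return zeta[0] + zeta[3] - zeta[1] - zeta[2]

def e_alpha(zeta):  # Eq. (5)
    return zeta[2] + zeta[3] - zeta[0] - zeta[1]

def estimate_xy(c, z_est, f):
    # Eqs. (6)-(7): centroid of the image polygon scaled by z_est / f.
    xc = sum(p[0] for p in c) / 4.0
    yc = sum(p[1] for p in c) / 4.0
    return xc * z_est / f, yc * z_est / f
```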
III. PROPOSED ITERATIVE ALGORITHM

Once an estimate of the camera states is obtained as described in Section II, the image is corrected according to the estimated camera states. The image is iteratively corrected to reduce the distortion in the shape of the window's image. As the distortion becomes smaller, the coupling between the several modes of distortion (camera pitch, yaw and roll) also reduces, and thereby an accurate result is obtained in a small number of iterations (shown in Section IV).

• Roll correction: Once an estimate of the roll angle is obtained, the image coordinates are multiplied by the inverse rotation matrix R_z⁻¹ such that

$$\begin{bmatrix} x^{rc}_j \\ y^{rc}_j \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\gamma_{est} & \sin\gamma_{est} & 0 \\ -\sin\gamma_{est} & \cos\gamma_{est} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_j \\ y_j \\ 1 \end{bmatrix}, \qquad (8)$$

where (x^{rc}_j, y^{rc}_j) are the roll-corrected corner coordinates and (x_j, y_j) are the vertex coordinates, for each j = 1, 2, 3, 4.

• Yaw correction: After arriving at an estimate β_est of the yaw angle, the image is corrected for yaw using equation (32), by substituting β_est in place of β:

$$x^{yc}_j = f\left(\frac{x_j\cos\beta_{est} + f\sin\beta_{est}}{f\cos\beta_{est} - x_j\sin\beta_{est}}\right), \qquad (9)$$

$$y^{yc}_j = y_j\left(\frac{x^{yc}_j}{f}\right)\sin\beta_{est} + y_j\cos\beta_{est}, \qquad (10)$$

where (x^{yc}_j, y^{yc}_j) are the yaw-corrected vertices for each j = 1, 2, 3, 4.

• Pitch correction: Similar to the yaw case, the pitch is corrected by the transformation described by equation (31), substituting α_est in place of α:

$$y^{pc}_j = f\left(\frac{y_j\cos\alpha_{est} + f\sin\alpha_{est}}{f\cos\alpha_{est} - y_j\sin\alpha_{est}}\right), \qquad (11)$$

$$x^{pc}_j = x_j\left(\frac{y^{pc}_j}{f}\right)\sin\alpha_{est} + x_j\cos\alpha_{est}, \qquad (12)$$

where (x^{pc}_j, y^{pc}_j) are the pitch-corrected vertices for each j = 1, 2, 3, 4. A sketch combining the three correction steps is given after this list.
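The three correction steps (8)-(12) can be applied to a list of corner points as in the schematic fragment below. It is a re-implementation under the stated conventions (angles in radians, focal length f in the same units as the pixel coordinates), written for illustration rather than taken from the authors' implementation.

```python
import math

def correct_roll(corners, gamma):
    # Eq. (8): rotate every vertex by -gamma about the image centre.
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [(cg * x + sg * y, -sg * x + cg * y) for x, y in corners]

def correct_yaw(corners, beta, f):
    # Eqs. (9)-(10): undo the perspective distortion caused by yaw.
    cb, sb = math.cos(beta), math.sin(beta)
    out = []
    for x, y in corners:
        x_yc = f * (x * cb + f * sb) / (f * cb - x * sb)
        y_yc = y * (x_yc / f) * sb + y * cb
        out.append((x_yc, y_yc))
    return out

def correct_pitch(corners, alpha, f):
    # Eqs. (11)-(12): same structure as the yaw case with the roles of x and y swapped.
    ca, sa = math.cos(alpha), math.sin(alpha)
    out = []
    for x, y in corners:
        y_pc = f * (y * ca + f * sa) / (f * ca - y * sa)
        x_pc = x * (y_pc / f) * sa + x * ca
        out.append((x_pc, y_pc))
    return out
```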
A. Overview of the Algorithm

The algorithm starts by capturing the pixel coordinates of the four feature points in the image, denoted by x_j, y_j for each j = 1, 2, 3, 4. The number of iterations of the corrective algorithm is counted by the variable i, and i_max fixes the total number of iterations. The algorithm has an extremely fast rate of convergence, and for all practical purposes i_max can be set to any value between 3 and 10. For the simulation results presented in Section IV, i_max is set to 5. Before entering the correction loop, i is initialized to 1 and the R_est matrix is initialized to I, the identity matrix. Fig. 2 shows the flowchart of the proposed algorithm. Once the algorithm enters the correction loop, it estimates the camera z position using (1); this estimate is referred to as z^i_est.

After an estimate of the camera z position is acquired, the algorithm estimates the camera roll angle as described in (3). This estimate of the roll angle is referred to as γ^i_est. New values of the roll-corrected pixel coordinates are generated using (8) and passed to the yaw estimation block of the algorithm. R_est is updated by multiplying R_est by Rz(γ^i_est), where Rz is the rotation matrix described in (25). Since roll correction does not change the scale of the image, the camera z is not re-estimated after correcting for roll.

Using the roll-corrected pixels, the yaw angle is estimated by the process described in (4). This estimate of yaw is referred to as β^i_est. Yaw correction is accomplished using (9) and (10). At this step, R_est is updated to R_est · Ry(β^i_est).

Since yaw correction changes the scale of the image, the camera z is again estimated by (1) to improve the subsequent estimation of the camera pitch, which is estimated by the procedure described in (5). This estimated value of pitch is referred to as α^i_est, and the image is corrected for pitch using (11) and (12). The estimate of the camera z is updated by applying (1) to the pitch-corrected pixel points. R_est is also updated to R_est · Rx(α^i_est) at this stage.

At the end of the loop, the camera x and y coordinates are estimated using (7). The estimated x and y coordinates are referred to as x^i_est and y^i_est; the value of z^i_est remains the same. The iteration counter i is incremented by 1, and the pitch-corrected pixels are fed back to the beginning of the loop.

After completing i_max correction cycles, the algorithm estimates the final values of the camera x, y and z. The final estimates are the last estimated values of x^i_est, y^i_est and z^i_est, i.e. x^{i_max}_est, y^{i_max}_est and z^{i_max}_est; these final values are referred to as x_est, y_est and z_est.

The final estimates of the camera angles use the elements of the final R_est matrix. Using (16), the final estimates of the camera orientation are given as follows:

$$\beta = \sin^{-1}\!\left(R_{est_{31}}\right), \qquad (13)$$

$$\gamma = \sin^{-1}\!\left(\frac{R_{est_{21}}}{\cos\beta}\right), \qquad (14)$$

$$\alpha = \sin^{-1}\!\left(\frac{R_{est_{32}}}{\cos\beta}\right), \qquad (15)$$

where R_est_ij refers to the element in the i-th row and j-th column of the R_est matrix. These equations have a singularity at β = 90°. In practice, β = 90° means the camera is looking perpendicularly to the window, hence no feature points will be detected.

These equations are derived by inspecting the elements of the matrix R_γ R_β R_α, since this is the order in which the image is corrected:

$$R_\gamma R_\beta R_\alpha = \begin{bmatrix}
\cos\gamma\cos\beta & -\sin\gamma\cos\alpha - \cos\gamma\sin\beta\sin\alpha & \sin\gamma\sin\alpha - \cos\gamma\sin\beta\cos\alpha \\
\sin\gamma\cos\beta & \cos\gamma\cos\alpha - \sin\gamma\sin\beta\sin\alpha & -\cos\gamma\sin\alpha - \sin\gamma\sin\beta\cos\alpha \\
\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha
\end{bmatrix} \qquad (16)$$

Fig. 2: Flowchart for the iterative image correction algorithm (initialize i = 1, α_est = β_est = γ_est = 0, R_est = I; while i < i_max, estimate z using (1), estimate and correct roll, yaw and pitch using (3)-(5) and (8)-(12) while accumulating R_est; finally estimate x and y using (7) and the angles using (13)-(15)).
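Putting Sections II and III together, the loop can be sketched as below. This is a schematic reconstruction of the flowchart, not the authors' code: it reuses the helper functions from the earlier sketches (internal_angles, estimate_z, estimate_roll, e_beta, e_alpha, estimate_xy, correct_roll, correct_yaw, correct_pitch), and the unpublished look-up table / fitted polynomial is abstracted into two user-supplied callables, yaw_from_e and pitch_from_e, assumed to return angles in radians.

```python
import numpy as np

def Rx(a): return np.array([[1, 0, 0],
                            [0, np.cos(a), -np.sin(a)],
                            [0, np.sin(a),  np.cos(a)]])
def Ry(b): return np.array([[np.cos(b), 0, -np.sin(b)],
                            [0, 1, 0],
                            [np.sin(b), 0,  np.cos(b)]])
def Rz(g): return np.array([[np.cos(g), -np.sin(g), 0],
                            [np.sin(g),  np.cos(g), 0],
                            [0, 0, 1]])

def estimate_pose(corners, f, s, yaw_from_e, pitch_from_e, i_max=5):
    """corners: four (x, y) pixel points ordered as in Fig. 11.
    yaw_from_e(e_b, z) and pitch_from_e(e_a, z) stand in for the look-up
    table mapping e_beta / e_alpha and z to angles (radians)."""
    R_est = np.eye(3)
    for _ in range(i_max):
        z = estimate_z(corners, f, s)                 # Eq. (1)

        g = np.radians(estimate_roll(corners))        # Eq. (3), degrees -> radians
        corners = correct_roll(corners, g)            # Eq. (8)
        R_est = R_est @ Rz(g)

        b = yaw_from_e(e_beta(internal_angles(corners)), z)     # Eq. (4) + look-up
        corners = correct_yaw(corners, b, f)                     # Eqs. (9)-(10)
        R_est = R_est @ Ry(b)

        z = estimate_z(corners, f, s)                 # scale changed by the yaw step
        a = pitch_from_e(e_alpha(internal_angles(corners)), z)  # Eq. (5) + look-up
        corners = correct_pitch(corners, a, f)                   # Eqs. (11)-(12)
        R_est = R_est @ Rx(a)

        z = estimate_z(corners, f, s)
        x, y = estimate_xy(corners, z, f)             # Eqs. (6)-(7)

    beta  = np.arcsin(R_est[2, 0])                    # Eq. (13)
    gamma = np.arcsin(R_est[1, 0] / np.cos(beta))     # Eq. (14)
    alpha = np.arcsin(R_est[2, 1] / np.cos(beta))     # Eq. (15)
    return (x, y, z), (alpha, beta, gamma), R_est
```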
IV. SIMULATION AND EXPERIMENTAL RESULTS

We demonstrate the working of the proposed iterative technique for estimating pose by simulating the following two cases.

• Case 1: This case considers zero translation along the X and Y axes. We consider the following pose for the camera:
Pitch (α) = 40.10°
Yaw (β) = 28.65°
Roll (γ) = −40.10°
Position (x, y, z) = (0, 0, 100)
The result of each iteration i is presented in Table I. The table entries show the position of the centroid of the quadrilateral in x and y. The z column shows the estimate of z obtained by computing the area of the quadrilateral. The internal angles of the quadrilateral are shown in columns ζ1, ζ2 and ζ3; the angle ζ4 can be computed from the other three. The estimated values of roll, yaw and pitch are shown in the columns γ, β and α respectively. Fig. 3 shows the correction in the image at each step of each iteration. The centre of the square obtained after the fourth iteration is located at the origin, which renders the estimate of (x, y) as (0, 0). The values of α_est, β_est and γ_est are 40.10°, 28.65° and −40.10° respectively.
TABLE I: Case 1: Result after each iteration

  i      x       y       z       ζ1     ζ2     ζ3    γ_est^i   β_est^i   α_est^i
  1    13.68   12.07    72.06   96.8   89.8   91.9   -50.99     22.92     25.21
  2     0.42    0.35    97.61   91.0   91.0   89.0     5.16      9.74      9.74
  3     0       0      100      90.0   90.0   89.9     0.57      0         0
  4     0       0      100      90     90     90       0         0         0
  5     0       0      100      90     90     90       0         0         0

Fig. 3: Figure illustrating the iterative correction of the image (initial image, roll correction, yaw correction, pitch correction); numbers inside the quadrilaterals show the iteration count.

• Case 2: This case considers a non-zero value for each variable. We consider the following camera pose for the simulation:
Pitch (α) = 34.38°
Yaw (β) = −22.91°
Roll (γ) = 28.66°
Position (x, y, z) = (20, 40, 150)
The results after each iteration, in the same format as Table I for Case 1, are presented in Table II. Table II shows that the image of the window is corrected to a perfect square shape after the fourth iteration. The centre of the square is located at (20, 40), which is the estimate of the position (x, y). The values of α_est, β_est and γ_est are 34.38°, −22.91° and 28.66° respectively.

TABLE II: Case 2: Result after each iteration

  i      x       y       z       ζ1     ζ2     ζ3    γ_est^i   β_est^i   α_est^i
  1    31.48   34.83   141.9    86.5   97.7   81.6    30.37    -18.33     32.66
  2    20.86   40.08   151.0    89.9   90.1   89.8     0.57     -4.58      1.15
  3    20.01   39.99   149.9    90     90     90       0.57      0         0
  4    20      40      150      90     90     90       0         0         0
  5    20      40      150      90     90     90       0         0         0

We next validate the claim of obtaining a good estimate of the pose for any orientation of the camera.

A. Validation for any orientation

To ensure that the algorithm converges for all possible orientations of the camera, we simulate the proposed algorithm over an exhaustive number of elements in the domain of interest. Our domain of interest is γ, β, α ∈ [−45°, 45°]. We simulate our proposed technique for all combinations of γ, β and α in the domain of interest with a 1° increment between the combinations. We therefore consider 90 angles for each of α, β and γ, giving a total of 90 × 90 × 90 = 729,000 combinations within the domain of interest. The L2 norm of the error between the ideal image and the final image obtained by the iterative algorithm after 10 iterations is computed using (17) for each combination:

$$e_{image} = \sum_{j=1}^{4} \left\| \left( x_{j_{ideal}} - x_j,\; y_{j_{ideal}} - y_j \right) \right\|_2, \qquad (17)$$

where (x_j, y_j) are the image coordinates obtained by the proposed algorithm and (x_{j_ideal}, y_{j_ideal}) are the ideal image coordinates. Fig. 4 shows the norm of the error e_image for each combination; the norm of the error is of the order of 10⁻⁶.

Fig. 4: Error norm for each combination of orientations (e_image remains of the order of 1.4 × 10⁻⁶ over all validation counts).
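A sweep of this kind can be scripted on top of the earlier sketches. The fragment below illustrates the error metric (17) and the triple loop over the orientation grid; it is our own illustrative harness, in which render_view (projecting the square for a given orientation) and run_algorithm (ten passes of the iterative correction) are assumed helpers that the paper does not publish.

```python
import itertools
import math

def e_image(corners, ideal):
    # Eq. (17): sum over the four corners of the 2-norm of the pixel error.
    return sum(math.hypot(x - xi, y - yi)
               for (x, y), (xi, yi) in zip(corners, ideal))

def exhaustive_validation(render_view, run_algorithm, ideal, step=1):
    """render_view(gamma, beta, alpha) -> distorted corner pixels (assumed helper);
    run_algorithm(corners) -> corrected corners after 10 iterations;
    ideal -> corner pixels of the undistorted (zero-rotation) view."""
    worst = 0.0
    angles = [math.radians(d) for d in range(-45, 46, step)]
    for g, b, a in itertools.product(angles, angles, angles):
        corrected = run_algorithm(render_view(g, b, a))
        worst = max(worst, e_image(corrected, ideal))
    return worst   # the paper reports errors of the order of 1e-6
```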
We next present experimental results obtained using the RPi camera.

B. Experimental results

This section gives details of two experiments performed using the RPi camera interfaced with the RPi 2 Model B development board, to show the repeatability of the proposed technique. These experiments are performed to estimate the three-dimensional position of the camera. In the first experiment, the RPi camera is roughly placed at a relative position (x, y, z) = (10 cm, 20 cm, 80 cm) with respect to the centre of a square-shaped mark on a paper. Fig. 5 shows the captured image (right side) and the iterative correction of the shape of the captured image (left side). Table III shows the result after each iteration. The position estimates after the fifth iteration using the proposed technique are x* = 10.49 cm, y* = 18.72 cm and z* = 80.52 cm.

Fig. 5: Experiment 1: Image correction at each iteration and captured image; numbers inside the quadrilaterals in the left image show the iteration count.

TABLE III: Experiment 1: 3-dimensional position estimation

  i      x       y       z       ζ1      ζ2      ζ3      ζ4
  1    12.05   15.99   84.73   88.50   88.18   88.41   94.90
  2    10.21   18.17   81.15   88.99   90.44   89.94   90.61
  3    10.57   18.68   80.73   89.53   90.39   89.52   90.54
  4    10.47   18.71   80.52   89.59   90.39   89.62   90.38
  5    10.49   18.72   80.52   89.60   90.39   89.60   90.39

The second experiment roughly places the RPi camera at a relative position of (x, y, z) = (10 cm, 10 cm, 60 cm) with respect to the centre of the square mark on a paper. Fig. 6 shows the captured image (right side) and the corrected shape of the image at each iteration (left side). Table IV presents the result after each iteration. The position estimates after the fifth iteration using the proposed technique are x* = 8.0 cm, y* = 10.26 cm and z* = 55.71 cm. Since the true position measurements are themselves inaccurate, the results are satisfactory.

Fig. 6: Experiment 2: Images after correction and captured image; numbers inside the quadrilaterals in the left image show the iteration count.

TABLE IV: Experiment 2: 3-dimensional camera position estimation

  i      x       y       z       ζ1      ζ2      ζ3      ζ4
  1    11.38   12.69   57.87   88.29   86.93   89.69   95.06
  2     7.65    9.85   54.29   90.84   89.88   91.31   87.95
  3     8.04   18.68   55.82   89.98   89.92   89.98   90.10
  4     8.00   18.71   55.69   90.07   89.94   90.07   89.91
  5     8.00   10.26   55.71   90.05   89.93   90.05   89.94
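The paper does not describe how the four corner pixels of the square mark are extracted from the RPi camera frames. A typical OpenCV-based sketch (our assumption, not the authors' pipeline) is given below; it picks the largest four-vertex contour and shifts it into image-centre coordinates so that the conventions of Fig. 11 apply. The returned corners may still need re-ordering to match the vertex numbering of Fig. 11.

```python
import cv2            # OpenCV >= 4
import numpy as np

def find_square_corners(image_path):
    """Return four corner pixels of the largest square-like contour.
    Illustrative only; not the feature-extraction step used in the paper."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    best = None
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:
            best = approx.reshape(4, 2).astype(float)
            break
    if best is None:
        raise RuntimeError("no quadrilateral found")
    # Shift to image-centre coordinates with y growing upwards (Fig. 11 convention).
    h, w = gray.shape
    best[:, 0] -= w / 2.0
    best[:, 1] = h / 2.0 - best[:, 1]
    return [tuple(p) for p in best]
```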
The next section simulates a controller that takes the pose estimated using the proposed technique as feedback.

C. Unicycle robot control using proposed iterative technique

The pose of a planar robot is described by three variables: position (x, z) and orientation θ. The unicycle robot is controlled by the linear velocity (ν) and the angular velocity (ω). The dynamics of the unicycle model are as follows:

$$\dot{X} = -\nu\sin\theta, \qquad (18)$$

$$\dot{Z} = \nu\cos\theta, \qquad (19)$$

$$\dot{\theta} = \omega. \qquad (20)$$

We simulate a unicycle robot that estimates its position and heading using the iterative algorithm presented in the previous section. The objective of the robot is to reach the final position described by (x_f, z_f).

Let the pose estimated using the proposed technique be (x*, z*, θ*). The linear velocity ν of the robot is controlled by a simple proportional gain k_v such that the commanded velocity has the form

$$\nu = k_v \sqrt{(x^* - x_f)^2 + (z^* - z_f)^2}. \qquad (21)$$

Similarly, the angular velocity ω is controlled by the proportional gain k_w and is given by

$$\omega = k_w \left( \tan^{-1}\!\left( \frac{x_f - x^*}{z_f - z^*} \right) - \theta^* \right). \qquad (22)$$

For the simulation, the robot starts at its initial position (x, z) = (20, −100) with θ = 0°. The final position is set at (x_f, z_f) = (0, −50).
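A discrete-time version of this controller is easy to sketch. The loop below integrates the unicycle model (18)-(20) with Euler steps and applies the proportional laws (21)-(22), with the pose feedback (x*, z*, θ*) supplied by a pose-estimator callback. The gains, time step and duration are illustrative assumptions, and the bearing term is written with atan2 and a sign chosen to be consistent with the heading convention of (18)-(19); depending on the convention this may differ by a sign from (22) as printed.

```python
import math

def simulate(get_pose_estimate, xf=0.0, zf=-50.0, kv=0.5, kw=2.0, dt=0.05, t_end=10.0):
    x, z, theta = 20.0, -100.0, 0.0                  # initial state used in the paper
    t = 0.0
    while t < t_end:
        xs, zs, ths = get_pose_estimate(x, z, theta)  # vision-based estimate (x*, z*, theta*)
        v = kv * math.hypot(xs - xf, zs - zf)         # Eq. (21)
        bearing = math.atan2(xs - xf, zf - zs)        # heading towards the goal
        w = kw * (bearing - ths)                      # proportional heading law, cf. Eq. (22)
        x     += -v * math.sin(theta) * dt            # Eq. (18)
        z     +=  v * math.cos(theta) * dt            # Eq. (19)
        theta +=  w * dt                              # Eq. (20)
        t += dt
    return x, z, theta

# With perfect feedback the robot approaches the goal (0, -50):
print(simulate(lambda x, z, th: (x, z, th)))
```

In the paper, the lambda above is replaced by the iterative pose estimator, so that the controller runs on the estimated pose rather than the true one.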
Fig. 7 shows the trajectory the robot takes to reach the desired home position when the feedback to the controller is the actual pose, and compares it with the trajectory generated by taking the pose estimated using the proposed iterative technique as feedback. This shows that the control of the robot using the proposed technique is similar to the ideal case. Fig. 8 shows the actual heading of the robot and the estimated heading obtained using the iterative algorithm as the robot moves from its starting position towards its home position. Fig. 9 shows the error in estimating the heading of the robot. The error plot in Fig. 9 shows that the error is small initially, since the heading angle is close to zero. However, when the robot turns towards the final position, the error increases to a maximum of 0.043°. The error does not increase further as the heading angle is fixed after 2 s. This small error, of the order of 0.045°, does not cause further deviation from the ideal trajectory, as shown in Fig. 7.

V. CONCLUSION

Accurate pose information is always beneficial for controlling an autonomous vehicle. This paper proposes a novel technique for estimating the pose of an autonomous vehicle using monocular vision. The iterative technique presented in this paper is aimed at use on an on-board processor in small-size vehicles with limited payload capacity. The objective of the proposed technique is to estimate the pose of the vehicle with high accuracy using a computationally inexpensive method with a low processing-time requirement. The simulation and experimental results validate these objectives.
Future work involves implementing the proposed technique on an MAV for autonomous navigation in indoor environments. Experiments such as passing through a window or door are feasible using the proposed technique.

Fig. 7: The robot trajectory: ideal and controlled using the proposed technique (actual and estimated-feedback trajectories in the X-Z plane).

Fig. 8: The robot heading: ideal and controlled using the proposed technique (actual and estimated robot heading in degrees versus time).

Fig. 9: Error in the robot heading with time (error in degrees versus time).

APPENDIX A
COORDINATE FRAMES AND IMAGE CONVENTIONS

Let the pitch, yaw and roll angles be α, β and γ respectively. We fix the camera rotation order as pitch (Rx) → yaw (Ry) → roll (Rz), where positive pitch, yaw and roll give the rotation matrices

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \qquad (23)$$

$$R_y = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}, \qquad (24)$$

$$R_z = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (25)$$

and the final rotation matrix R = R_x R_y R_z is given in (26):

$$R = \begin{bmatrix}
\cos\beta\cos\gamma & -\cos\beta\sin\gamma & -\sin\beta \\
\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \cos\alpha\cos\gamma + \sin\alpha\sin\beta\sin\gamma & -\sin\alpha\cos\beta \\
\sin\alpha\sin\gamma + \cos\alpha\sin\beta\cos\gamma & \sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \cos\alpha\cos\beta
\end{bmatrix} \qquad (26)$$

Fig. 10: Window and camera in the global frame.

Fig. 11: Camera image notations ((x1, y1) at the top-right corner, (x2, y2) at the top left, (x3, y3) at the bottom left and (x4, y4) at the bottom right, with the internal angles ζ1-ζ4 at the corresponding corners and the x and y image axes at the centre).

APPENDIX B
YAW AND PITCH CORRECTION DERIVATION

A. Pitch Correction

Consider a point P with coordinates (x_p, y_p, z_p) in the global frame, observed by a camera located at the point (x_c, y_c, z_c) in the global frame with a pitch angle α. From the pinhole camera model,

$$x_i = \frac{f\,x_r}{z_r}, \qquad y_i = \frac{f\,y_r}{z_r}, \qquad (27)$$

where x_r = (x_p − x_c), y_r = (y_p − y_c) and z_r = (z_p − z_c). Similarly, for the pitched camera,

$$x_{ip} = \frac{f\,x_{rp}}{z_{rp}}, \qquad y_{ip} = \frac{f\,y_{rp}}{z_{rp}}, \qquad (28)$$

where

$$\begin{bmatrix} x_{rp} \\ y_{rp} \\ z_{rp} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix}. \qquad (29)$$

From (27), (28) and (29) we get

$$x_{ip} = f\left(\frac{x_r}{y_r\sin\alpha + z_r\cos\alpha}\right), \qquad y_{ip} = f\left(\frac{y_r\cos\alpha - z_r\sin\alpha}{y_r\sin\alpha + z_r\cos\alpha}\right), \qquad (30)$$

$$y_i = f\left(\frac{y_{ip}\cos\alpha + f\sin\alpha}{f\cos\alpha - y_{ip}\sin\alpha}\right), \qquad x_i = x_{ip}\left(\frac{y_i}{f}\right)\sin\alpha + x_{ip}\cos\alpha. \qquad (31)$$

B. Yaw Correction

Following a similar procedure to the one described in Section A, we get the formulae for the yaw correction as

$$x_i = f\left(\frac{x_{ip}\cos\beta + f\sin\beta}{f\cos\beta - x_{ip}\sin\beta}\right), \qquad y_i = y_{ip}\left(\frac{x_i}{f}\right)\sin\beta + y_{ip}\cos\beta. \qquad (32)$$
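The derivation above can be sanity-checked numerically: applying the forward projection (27)-(30) for a pitched camera and then the correction (31) must return the undistorted pinhole coordinates. The fragment below does exactly that for an arbitrary test point; the values of f, α and the point are assumptions chosen only for illustration.

```python
import math

f, alpha = 1.0, math.radians(25.0)
xr, yr, zr = 0.4, -0.7, 3.0                # relative coordinates of a test point P

# Ideal (unpitched) pinhole projection, Eq. (27)
xi, yi = f * xr / zr, f * yr / zr

# Projection seen by the pitched camera, Eqs. (28)-(30)
den = yr * math.sin(alpha) + zr * math.cos(alpha)
xip = f * xr / den
yip = f * (yr * math.cos(alpha) - zr * math.sin(alpha)) / den

# Pitch correction, Eq. (31): recover the ideal projection from (xip, yip)
yi_rec = f * (yip * math.cos(alpha) + f * math.sin(alpha)) / (f * math.cos(alpha) - yip * math.sin(alpha))
xi_rec = xip * (yi_rec / f) * math.sin(alpha) + xip * math.cos(alpha)

print(abs(xi - xi_rec) < 1e-12, abs(yi - yi_rec) < 1e-12)   # True True
```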
REFERENCES

[1] J. Eckert, R. German, and F. Dressler, "An indoor localization framework for four-rotor flying robots using low-power sensor nodes," IEEE Transactions on Instrumentation and Measurement, vol. 60, no. 2, pp. 336-344, 2011.
[2] J. R. Gonzalez and C. J. Bleakley, "High-precision robust broadband ultrasonic location and orientation estimation," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 5, pp. 832-844, 2009.
[3] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart, "Vision based MAV navigation in unknown and unstructured environments," in Robotics and Automation (ICRA), 2010 IEEE International Conference on, pp. 21-28, IEEE, 2010.
[4] M. Achtelik, A. Bachrach, R. He, S. Prentice, and N. Roy, "Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments," in SPIE Defense, Security, and Sensing, pp. 733219-733219, International Society for Optics and Photonics, 2009.
[5] M. Achtelik, M. Achtelik, S. Weiss, and R. Siegwart, "Onboard IMU and monocular vision based control for MAVs in unknown in- and outdoor environments," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 3056-3063, IEEE, 2011.
[6] D. Eberli, D. Scaramuzza, S. Weiss, and R. Siegwart, "Vision based position control for MAVs using one single circular landmark," Journal of Intelligent & Robotic Systems, vol. 61, no. 1-4, pp. 495-512, 2011.
[7] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, "Vision-based state estimation for autonomous rotorcraft MAVs in complex environments," in Robotics and Automation (ICRA), 2013 IEEE International Conference on, pp. 1758-1764, IEEE, 2013.
[8] S. Zhou, G. Flores, E. Bazan, R. Lozano, and A. Rodriguez, "Real-time object detection and pose estimation using stereo vision. An application for a quadrotor MAV," in 2015 Workshop on Research, Education and Development of Unmanned Aerial Systems (RED-UAS), pp. 72-77, IEEE, 2015.
[9] S. Shen, N. Michael, and V. Kumar, "Autonomous multi-floor indoor navigation with a computationally constrained MAV," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 20-25, IEEE, 2011.
[10] A. Wendel, A. Irschara, and H. Bischof, "Natural landmark-based monocular localization for MAVs," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 5792-5799, IEEE, 2011.
[11] J. Li, J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "A novel system for object pose estimation using fused vision and inertial data," Information Fusion, vol. 33, pp. 15-28, 2017.
[12] A. Buyval and M. Gavrilenkov, "Vision-based pose estimation for indoor navigation of unmanned micro aerial vehicle based on the 3D model of environment," in 2015 International Conference on Mechanical Engineering, Automation and Control Systems (MEACS), pp. 1-4, IEEE, 2015.
[13] F. Kendoul, I. Fantoni, and K. Nonami, "Optic flow-based vision system for autonomous 3D localization and control of small aerial vehicles," Robotics and Autonomous Systems, vol. 57, no. 6, pp. 591-602, 2009.
[14] D. Strelow and S. Singh, "Motion estimation from image and inertial measurements," The International Journal of Robotics Research, vol. 23, no. 12, pp. 1157-1195, 2004.
[15] A. I. Mourikis, N. Trawny, S. I. Roumeliotis, A. E. Johnson, A. Ansar, and L. Matthies, "Vision-aided inertial navigation for spacecraft entry, descent, and landing," IEEE Transactions on Robotics, vol. 25, no. 2, pp. 264-280, 2009.
[16] P. Piniés, T. Lupton, S. Sukkarieh, and J. D. Tardós, "Inertial aiding of inverse depth SLAM using a monocular camera," in Proceedings 2007 IEEE International Conference on Robotics and Automation, pp. 2797-2802, IEEE, 2007.
[17] T. Bailey, J. Nieto, J. Guivant, M. Stevens, and E. Nebot, "Consistency of the EKF-SLAM algorithm," in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3562-3568, IEEE, 2006.

