Matthias Faessler, Flavio Fontana, Christian Forster, Elias Mueggler, Matia Pizzoli, and Davide Scaramuzza∗
Robotics and Perception Group, University of Zurich, 8050 Zurich, Switzerland
Received 14 June 2014; accepted 22 January 2015
The use of mobile robots in search-and-rescue and disaster-response missions has increased significantly in
recent years. However, they are still remotely controlled by expert professionals on an actuator set-point level,
and they would benefit, therefore, from any bit of autonomy added. This would allow them to execute high-
level commands, such as “execute this trajectory” or “map this area.” In this paper, we describe a vision-based
quadrotor micro aerial vehicle that can autonomously execute a given trajectory and provide a live, dense
three-dimensional (3D) map of an area. This map is presented to the operator while the quadrotor is mapping,
so that there are no unnecessary delays in the mission. Our system does not rely on any external positioning
system (e.g., GPS or motion capture systems) as sensing, computation, and control are performed fully onboard
a smartphone processor. Since we use standard, off-the-shelf components from the hobbyist and smartphone
markets, the total cost of our system is very low. Due to its low weight (below 450 g), it is also passively safe and
can be deployed close to humans. We describe both the hardware and the software architecture of our system.
We detail our visual odometry pipeline, the state estimation and control, and our live dense 3D mapping, with
an overview of how all the modules work and how they have been integrated into the final system. We report
the results of our experiments both indoors and outdoors. Our quadrotor was demonstrated over 100 times at
multiple trade fairs, at public events, and to rescue professionals. We discuss the practical challenges and lessons
learned. Code, datasets, and videos are publicly available to the robotics community. © 2015 Wiley Periodicals, Inc.
which limits the mission time drastically. Operator errors could harm the robot or, even worse, cause further damage. For these reasons, there is a large need for flying robots that can navigate autonomously, without any user intervention, or execute high-level commands, such as "execute this trajectory" or "map this area." This would bring several advantages over today's disaster-response robots. First, the robot could easily be operated by a single person, who could focus on the actual mission. Secondly, a single person could operate multiple robots at the same time to speed up the mission. Finally, rescuers could operate such systems with very little training. These advantages, in combination with the low cost of the platform, will soon make MAVs standard tools in disaster-response operations.

1.2. System Overview

Our system consists of a quadrotor and a laptop base station with a graphical user interface for the operator (see Figure 1). The quadrotor is equipped with a single, down-looking camera, an inertial measurement unit (IMU), and a single-board computer. All required sensing, computation, and control is performed onboard the quadrotor. This design allows us to operate safely even when we temporarily lose the wireless connection to the base station. It also allows us to operate the quadrotor beyond the range of the wireless link as, for instance, inside buildings. We do not require any external infrastructure such as GPS, which can be unreliable in urban areas or completely unavailable indoors.

We chose quadrotors because of their high maneuverability, their ability to hover on a spot, and their simple mechanical design. To make the system self-contained, we rely only on onboard sensors (i.e., a camera and an IMU).

For operating in areas close to humans, safety is a major concern. We aim at achieving this passively by making the quadrotor as lightweight as possible (below 450 g). Therefore, we chose passive sensors, which are typically lighter and consume less power. However, when using cameras, high computational power is required to process the huge amount of data. Due to the boost of computational power in mobile devices (e.g., smartphones, tablets), high-performance processors that are optimized for power consumption, size, and cost are available today. An additional factor for real-world applications is the cost of the overall system. The simple mechanical design and the use of sensors and processors produced millionfold for smartphones make the overall platform low cost (1,000 USD).

1.3. Contributions and Differences with Other Systems

The main contribution of this paper is a self-contained, robust, low-cost, power-on-and-go quadrotor MAV that can autonomously execute a given trajectory and provide live, dense 3D mapping without any user intervention, using only a single camera and an IMU as the main sensory modality.

The system most similar to ours is the one that resulted from the European project SFLY (Weiss et al., 2013; Scaramuzza et al., 2014). However, in SFLY, dense 3D maps were computed offline and were available only several minutes after landing.

Another difference with the SFLY project lies in the vision-based motion-estimation pipeline. Most monocular visual-odometry algorithms used for MAV navigation (see Section 2.2) rely on PTAM (Klein & Murray, 2007), which is a feature-based visual SLAM algorithm, running at 30 Hz, designed for augmented-reality applications in small desktop scenes. In contrast, our quadrotor relies on a novel visual odometry algorithm [called SVO, which we proposed in Forster, Pizzoli, & Scaramuzza (2014b)] designed specifically for MAV applications. SVO eliminates the need for costly feature extraction and matching as it operates directly on pixel intensities. This results in high precision, robustness, and higher frame rates (at least twice that of PTAM) than current state-of-the-art methods. Additionally, it uses a probabilistic mapping method to model outliers and feature-depth uncertainties, which provides robustness in scenes with repetitive and high-frequency textures.

In SFLY, a commercially available platform was used, with limited access to the low-level controller. Conversely, we have full access to all the control loops, which allows us to tune the controllers down to the lowest level. Furthermore, we propose a one-time-per-mission estimation of sensor biases, which allows us to reduce the number of states in the state estimator. In addition, the estimated biases are also considered in the low-level controller (body-rate controller), while in SFLY, they were only used in the high-level controller (position controller). Furthermore, we propose a calibration of the actually produced thrust, which ensures the same control performance regardless of environmental conditions such as temperature and air pressure.

1.4. Outline

The paper is organized as follows. Section 2 reviews the related work on autonomous MAV navigation and real-time 3D dense reconstruction. Section 3 presents the hardware and software architecture of our platform. Section 4 describes our visual odometry pipeline, while Section 5 details the state estimation and control of the quadrotor. Section 6 describes our live, dense 3D reconstruction pipeline. Finally, Section 7 presents and discusses the experimental results, and Section 8 comments on the lessons learned.

Figure 1. Our system (a) can autonomously build a dense 3D map of an unknown area. The map is presented to the operator through a graphical user interface on the base station laptop while the quadrotor is mapping (b, right inset) together with the onboard image (b, bottom left inset) and a confidence map (b, top left inset).

2. RELATED WORK

To date, most autonomous MAVs rely on GPS to navigate outdoors. However, GPS is not reliable in urban settings and
is completely unavailable indoors. Because of this, most works on autonomous indoor navigation of flying robots have used external motion-capture systems (e.g., Vicon or OptiTrack). These systems are very appropriate for testing and evaluation purposes (Lupashin et al., 2014; Michael, Mellinger, Lindsey, & Kumar, 2010), such as prototyping control strategies or executing fast maneuvers. However, they need pre-installation of the cameras, and thus they cannot be used in unknown, yet-unexplored environments. Therefore, for truly autonomous navigation in indoor environments, the only viable solution is to use onboard sensors. The literature on autonomous navigation of MAVs using onboard sensors includes range sensors (e.g., laser rangefinders or RGB-D sensors) and vision sensors.

2.1. Navigation Based on Range Sensors

Laser rangefinders have been largely explored for simultaneous localization and mapping (SLAM) with ground mobile robots (Thrun et al., 2007). Because of the heavy weight of 3D scanners [see, for instance, the Velodyne sensor (more than 1 kg)], laser rangefinders currently used on MAVs are only 2D. Since 2D scanners can only detect objects that intersect their sensing plane, they have been used for MAVs in environments characterized by vertical structures (Achtelik, Bachrach, He, Prentice, & Roy, 2009; Bachrach, He, & Roy, 2009; Shen, Michael, & Kumar, 2011, 2012b) and less in more complex scenes.

RGB-D sensors are based upon structured-light techniques, and thus they share many properties with stereo cameras. However, the primary differences lie in the range and spatial density of depth data. Since RGB-D sensors illuminate a scene with a structured-light pattern, contrary to stereo cameras, they can estimate depth in areas with poor visual texture but are range-limited by their projectors. RGB-D sensors for state estimation and mapping with MAVs have been used in Bachrach et al. (2012) as well as in Shen, Michael, & Kumar (2012a, 2012b), where a multifloor autonomous exploration and mapping strategy was presented.

2.2. Navigation Based on Vision Sensors

Although laser rangefinders and RGB-D sensors are very accurate and robust to illumination changes, they are too heavy and consume too much power for lightweight MAVs. In this case, the alternative solution is to use vision sensors. Early works on vision-based navigation of MAVs focused on biologically inspired algorithms (such as optical flow) to perform basic maneuvers, such as takeoff and landing, reactive obstacle avoidance, and corridor and terrain following (Hrabar, Sukhatme, Corke, Usher, & Roberts, 2005; Lippiello, Loianno, & Siciliano, 2011; Ruffier & Franceschini, 2004; Zingg, Scaramuzza, Weiss, & Siegwart, 2010; Zufferey & Floreano, 2006). Since optical flow can only measure the relative velocity of image features, the position estimate of the MAV will inevitably drift over time. This can be avoided using visual odometry or visual simultaneous localization and mapping (SLAM) methods, in monocular (Forster et al., 2014b; Scaramuzza et al., 2014; Weiss, Scaramuzza, & Siegwart, 2011; Weiss et al., 2013) or stereo configurations (Achtelik et al., 2009; Schmid, Lutz, Tomic, Mair, & Hirschmuller, 2014; Shen, Mulgaonkar, Michael, & Kumar, 2013). Preliminary experiments for MAV localization using a visual extended Kalman filter (EKF)-based SLAM technique were described in Ahrens, Levine, Andrews, & How (2009). However, the first use of visual SLAM to enable autonomous basic maneuvers was done within the framework of the SFLY European project, where a single camera and an IMU were used for state estimation and point-to-point navigation over several hundred meters in an outdoor, GPS-denied environment (Scaramuzza et al., 2014; Weiss et al., 2013).

Most monocular visual odometry (VO) algorithms for MAVs (Bloesch, Weiss, Scaramuzza, & Siegwart, 2010; Engel, Sturm, & Cremers, 2012; Scaramuzza et al., 2014; Weiss et al., 2013) rely on PTAM (Klein & Murray, 2007). PTAM is a feature-based SLAM algorithm that achieves robustness through tracking and mapping several hundreds of features. Additionally, it runs in real time (at around 30 Hz) by parallelizing the motion-estimation and mapping tasks and by relying on efficient keyframe-based bundle adjustment (BA) (Strasdat, Montiel, & Davison, 2010). However, PTAM was designed for augmented-reality applications in small desktop scenes, and multiple modifications (e.g., limiting the number of keyframes) were necessary to allow operation in large-scale outdoor environments (Weiss et al., 2013).

2.3. Real-time Monocular Dense 3D Mapping

In robotics, a dense reconstruction is needed to interact with the environment, as in obstacle avoidance, path planning, and manipulation. Moreover, the robot must be aware of the uncertainty affecting the measurements in order to intervene by changing the vantage point or deploying different sensing modalities (Forster, Pizzoli, & Scaramuzza, 2014a).

A single moving camera represents the most general setting for stereo vision. Indeed, in stereo settings a fixed baseline constrains the operating range of the system, while a single moving camera can be seen as a stereo camera with an adjustable baseline that can be dynamically reconfigured according to the requirements of the task.

Few relevant works have addressed real-time, dense reconstruction from a single moving camera, and they shed light on some important aspects. If, on the one hand, estimating the depth independently for every pixel leads to efficient, parallel implementations, on the other hand Gallup, Frahm, Mordohai, Yang, & Pollefeys (2007), Stühmer et al.
Table I. Weight and price of the individual components of our quadrotors.

    Component                     Weight (g)    Price (USD)
    Frame, Gears, Propellers         119             63
    Motors, Motor Controllers         70            214
    PX4FMU, PX4IOAR                   35            181
    Hardkernel Odroid-U3              49             65
    Camera, Lens                      16            326
    1,350 mAh Battery                 99             44
    Other Parts                       54            110
    Total                            442          1,003

The visual odometry pipeline (see Section 4) outputs an unscaled pose, which is then fused with the IMU readings in an extended Kalman filter framework [multisensor fusion (MSF) (Lynen, Achtelik, Weiss, Chli, & Siegwart, 2013)] to compute a metric state estimate. From this state estimate and a reference trajectory, we compute the desired body rates and collective thrust, which are then sent to the PX4. Alongside this pipeline, we send the images from the camera together with the corresponding pose estimate to the ground station.

On the ground station, we run a dense 3D reconstruction (Pizzoli et al., 2014) in real time using the camera images with their corresponding pose estimate (see Section 6).
Figure 3. System overview: the PX4 and Odroid U3 communicate with each other over a UART interface. Communication to the
laptop is over WiFi. Gray boxes are sensors and actuators; software modules are depicted as white boxes.
Figure 4. SVO system overview. Two concurrent threads are used, one for estimating the motion of the camera with respect to the
map and the second one for extending the map.
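The two-thread design summarized in Figure 4 can be illustrated with a small runnable toy in Python: one thread processes every incoming frame against the current map (the time-critical motion-estimation path), while a second thread extends the map in the background. This is only a schematic sketch with dummy data, not the actual SVO implementation.

```python
# Toy illustration of the two concurrent threads of Figure 4.
import queue
import threading
import time

frame_queue = queue.Queue()
map_points = [0.0]              # shared "map" (toy: a list of depths)
map_lock = threading.Lock()

def motion_estimation_thread():
    # Hard real-time path: track every frame against the current map.
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        with map_lock:
            n_points = len(map_points)
        print(f"tracked frame {frame} against {n_points} map points")

def mapping_thread(n_updates):
    # Decoupled from hard real-time constraints: extend the map occasionally.
    for k in range(n_updates):
        time.sleep(0.02)        # pretend a depth filter needs time to converge
        with map_lock:
            map_points.append(float(k))

t1 = threading.Thread(target=motion_estimation_thread)
t2 = threading.Thread(target=mapping_thread, args=(5,))
t1.start(); t2.start()
for i in range(10):
    frame_queue.put(i)
    time.sleep(0.01)
frame_queue.put(None)
t1.join(); t2.join()
```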
one thread, while the second thread extends the map, decoupled from hard real-time constraints. In the following, we provide an overview of both motion estimation and mapping. We refer the reader to Forster et al. (2014b) for more details.

4.1. Motion Estimation

Methods that estimate the camera pose with respect to a map (i.e., a set of key-frames and 3D points) can be divided into two classes:

(A) Feature-based methods extract a sparse set of salient image features in every image, match them in successive frames using invariant feature descriptors, and finally recover both camera motion and structure using epipolar geometry (Scaramuzza & Fraundorfer, 2011). The bottleneck of this approach is that approximately 50% of the processing time is spent on feature extraction and matching, which is the reason why most of the VO algorithms still run at 20–30 fps despite the availability and advantages of high-frame-rate cameras.

(B) Direct methods (Irani & Anandan, 1999), on the other hand, estimate the motion directly from intensity values in the image. The rigid-body transformation is found through minimizing the photometric difference between corresponding pixels, where the local intensity gradient magnitude and direction are used in the optimization. Since this approach starts directly with an optimization, increasing the frame rate means that the optimization is initialized closer to the optimum, and thus it converges much faster. Hence, direct methods in general benefit from increasing the frame rate (Handa, Newcombe, Angeli, & Davison, 2012) as the processing time per frame decreases.

In Forster et al. (2014b), we proposed a semidirect visual odometry (SVO) that combines the benefits of feature-based and direct methods. SVO establishes feature correspondence by means of direct methods rather than feature extraction and matching. The advantage is increased speed (up to 70 fps onboard the MAV and 400 Hz on a consumer laptop) due to the lack of feature extraction at every frame and increased accuracy through subpixel feature correspondence. Once feature correspondences are established, the algorithm continues using only point features.
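To make the distinction between (A) and (B) concrete, the following is a minimal, one-dimensional sketch of the idea behind direct methods: the motion (here a pure horizontal shift) is recovered by minimizing a photometric error over intensities rather than by matching extracted features. It is a toy example with synthetic data, not the optimization actually used in SVO.

```python
# Minimal sketch of a direct (photometric) alignment in 1D.
import numpy as np

def photometric_error(ref_img, cur_img, shift):
    # Sample the current image at reference locations displaced by `shift`
    # (linear interpolation) and sum the squared intensity differences.
    x = np.arange(len(ref_img), dtype=float)
    cur_sampled = np.interp(x + shift, x, cur_img)
    return np.sum((cur_sampled - ref_img) ** 2)

# Synthetic "images": the current image is the reference shifted by 2.3 pixels.
x = np.arange(100, dtype=float)
ref = np.sin(0.3 * x) + 0.2 * np.sin(1.7 * x)
cur = np.interp(x - 2.3, x, ref)

# A dense evaluation over candidate shifts; a good initialization (e.g., from a
# high frame rate) places the optimum close to the starting point.
shifts = np.linspace(0.0, 5.0, 501)
errors = [photometric_error(ref, cur, s) for s in shifts]
print("estimated shift:", shifts[int(np.argmin(errors))])   # approximately 2.3
```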
4.2. Mapping
The mapping thread estimates the depth at new 2D feature positions [we used FAST corners (Rosten, Porter, & Drummond, 2010)] by means of a depth filter. Once the depth filter has converged, a new 3D point is inserted in the map at the found depth and immediately used for motion estimation. The same depth-filter formulation is used for dense reconstruction, and it is explained in more detail in Section 6.1.

New depth filters are initialized whenever a new keyframe is selected in regions of the image where few 3D-to-2D correspondences are found. A keyframe is selected when the distance between the current frame and the previous keyframe exceeds 12% of the average scene depth. The filters are initialized with a large uncertainty in depth and with a mean at the current average scene depth.

At every subsequent frame, the epipolar line segment in the new image corresponding to the current depth confidence interval is searched for an image patch that has the highest correlation with the reference feature patch (see Figure 6). If a good match is found, the depth filter is updated in a recursive Bayesian fashion [see Eq. (31)]. Hence, we use many measurements to verify the position of a 3D point, which results in considerably fewer outliers compared to triangulation from two views.

Figure 5. Quadrotor with coordinate system and rotor forces.

4.3. Implementation Details

The source code of SVO is available open-source at https://github.com/uzh-rpg/rpg_svo. No special modifications are required to run SVO onboard the MAV. For all experiments, we use the fast parameter setting that is available online. The fast setting limits the number of features per frame to 120, always maintains at most 10 keyframes in the map (i.e., older keyframes are removed from the map), and, for processing-time reasons, performs no bundle adjustment.

5. STATE ESTIMATION AND CONTROL

In this section, we describe the state estimation and control used to stabilize our quadrotor. Furthermore, we explain how we estimate sensor biases and the actually produced thrust. The control and calibration sections are inspired by Lupashin et al. (2014).

5.1. Dynamical Model

For state estimation and control, we make use of the following dynamical model of our quadrotor:

    ṙ = v,                              (1)
    v̇ = g + R · c,                      (2)
    Ṙ = R · ω̂,                          (3)
    ω̇ = J⁻¹ · (τ − ω × Jω),             (4)

where r = [x y z]ᵀ and v = [vx vy vz]ᵀ are the position and velocity in world coordinates, R is the orientation of the quadrotor's body coordinates with respect to the world coordinates, and ω = [p q r]ᵀ denotes the body rates expressed in body coordinates. The skew-symmetric matrix ω̂ is defined as

    ω̂ = [  0  −r   q
           r   0  −p
          −q   p   0 ].                  (5)

We define the gravity vector as g = [0 0 −g]ᵀ, and J = diag(Jxx, Jyy, Jzz) is the second-order moment-of-inertia matrix of the quadrotor. The mass-normalized thrust vector is c = [0 0 c]ᵀ, with

    m · c = f1 + f2 + f3 + f4,           (6)

where fi are the four motor thrusts as illustrated in Figure 5. The torque inputs τ are composed of the single-rotor thrusts as

    τ = [ (√2/2) · l · ( f1 − f2 − f3 + f4)
          (√2/2) · l · (−f1 − f2 + f3 + f4)
                  κ · ( f1 − f2 + f3 − f4) ],   (7)

where l is the quadrotor arm length and κ is the rotor-torque coefficient.
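As an illustration of Eqs. (1)-(7), the following Python sketch performs one explicit Euler step of the model, with the parameter values of Table II. It is only a numerical illustration; the Euler integrator and the hover test case are our own choices, not part of the published system.

```python
# One Euler step of the quadrotor model in Eqs. (1)-(7).
import numpy as np

m, l, kappa = 0.45, 0.178, 0.0195
J = np.diag([0.001, 0.001, 0.002])
g_vec = np.array([0.0, 0.0, -9.81])

def skew(w):
    p, q, r = w
    return np.array([[0.0, -r, q], [r, 0.0, -p], [-q, p, 0.0]])     # Eq. (5)

def dynamics_step(r_pos, v, R, omega, f, dt):
    f1, f2, f3, f4 = f
    c = np.array([0.0, 0.0, (f1 + f2 + f3 + f4) / m])                # Eq. (6)
    tau = np.array([                                                 # Eq. (7)
        np.sqrt(2) / 2 * l * ( f1 - f2 - f3 + f4),
        np.sqrt(2) / 2 * l * (-f1 - f2 + f3 + f4),
        kappa * (f1 - f2 + f3 - f4),
    ])
    r_pos = r_pos + dt * v                                           # Eq. (1)
    v = v + dt * (g_vec + R @ c)                                     # Eq. (2)
    R = R + dt * (R @ skew(omega))                                   # Eq. (3); re-orthogonalize in practice
    omega = omega + dt * np.linalg.solve(J, tau - np.cross(omega, J @ omega))  # Eq. (4)
    return r_pos, v, R, omega

# Hover check: four equal thrusts of m*g/4 produce zero vertical acceleration.
state = (np.zeros(3), np.zeros(3), np.eye(3), np.zeros(3))
state = dynamics_step(*state, f=[m * 9.81 / 4] * 4, dt=0.002)
print(state[1])   # velocity remains ~0
```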
Table II. Parameters and control gains used for the experiments.

    Parameter   Description                         Value     Unit
    Jxx         x-axis Moment of Inertia            0.001     kg m²
    Jyy         y-axis Moment of Inertia            0.001     kg m²
    Jzz         z-axis Moment of Inertia            0.002     kg m²
    m           Quadrotor Mass                      0.45      kg
    l           Quadrotor Arm Length                0.178     m
    κ           Rotor Torque Coefficient            0.0195    m
    pxy         Horizontal Position Control Gain    5.0       s⁻²
    pz          Vertical Position Control Gain      15.0      s⁻²
    dxy         Horizontal Velocity Control Gain    4.0       s⁻¹
    dz          Vertical Velocity Control Gain      6.0       s⁻¹
    prp         Roll/Pitch Attitude Control Gain    16.6      s⁻¹
    pyaw        Yaw Attitude Control Gain           5         s⁻¹
    ppq         Roll/Pitch Rate Control Gain        33.3      s⁻¹
    pr          Yaw Rate Control Gain               6.7       s⁻¹

The used coordinate systems and rotor numbering are illustrated in Figure 5, and the used parameter values are listed in Table II.

5.2. State Estimation

To stabilize our quadrotor, we need an estimate of the metric pose as well as the linear and angular velocities. We compute this state estimate by fusing the sensor data from the IMU and the output of the visual odometry in an extended Kalman filter. To do so, we make use of an open-source multisensor fusion package (Lynen et al., 2013). Since we did not modify this package, we are not describing the sensor fusion in more detail here.

5.3. Controller

To follow reference trajectories and stabilize the quadrotor, we use cascaded controllers. The high-level controller running on the Odroid includes a position controller and an attitude controller. The low-level controller on the PX4 contains a body-rate controller. The used control gains are listed in Table II.

5.3.1. High-level Control

The high-level controller takes a reference trajectory as input and computes desired body rates that are sent to the low-level controller. A reference trajectory consists of a reference position rref, a reference velocity vref, a reference acceleration aref, and a reference yaw angle ψref. First, the position controller is described, followed by the attitude controller. The two high-level control loops are synchronized and run at 50 Hz.

Position Controller. To track a reference trajectory, we implemented a PD controller with feedforward terms on velocity and acceleration:

    ades = Ppos · (rref − r̂) + Dpos · (vref − v̂) + aref,        (8)

with gain matrices Ppos = diag(pxy, pxy, pz) and Dpos = diag(dxy, dxy, dz). Since a quadrotor can only accelerate in its body z direction, ades enforces two degrees of the desired attitude. Now we want to compute the desired normalized thrust such that the z component of ades is reached with the current orientation. To do so, we make use of the last row of Eq. (2) to compute the required normalized thrust cdes as

    cdes = (ades,z + g) / R3,3.                                  (9)

The output of the position controller is composed of the desired accelerations ades, which, together with the reference yaw angle ψref, encodes the desired orientation as well as a mass-normalized thrust cdes.

Attitude Controller. In the previous paragraph, we computed the desired thrust such that the desired acceleration in the vertical world direction is met. Since a quadrotor can only produce thrust in the body z direction, the attitude controller has to rotate the quadrotor in order to achieve the desired accelerations in the world x-y plane with the given thrust. The translational behavior of the quadrotor is independent of a rotation around the body z axis. Therefore, we first discuss the attitude controller for roll and pitch, and second we present the yaw controller.

For the x-y plane movement, we restate the first two rows of Eq. (2) and insert the normalized thrust from Eq. (9),

    [ades,x ; ades,y] = [R1,3 ; R2,3] · cdes,                    (10)

where Ri,j denotes the (i, j) element of the orientation matrix R. Solving for the two entries of R which define the x-y plane movement, we find

    [Rdes,1,3 ; Rdes,2,3] = (1/cdes) · [ades,x ; ades,y].        (11)

To track these desired components of R, we use a proportional controller on the attitude error,

    [Ṙdes,1,3 ; Ṙdes,2,3] = prp · [R1,3 − Rdes,1,3 ; R2,3 − Rdes,2,3],   (12)

where prp is the controller gain and Ṙ is the change of the orientation matrix per time step.

We can write the first two rows of Eq. (3) as

    [Ṙdes,1,3 ; Ṙdes,2,3] = [−R1,2  R1,1 ; −R2,2  R2,1] · [pdes ; qdes].   (13)

Finally, the desired roll and pitch rate can be computed by plugging Eq. (12) into Eq. (13) and solving for pdes and qdes.
The yaw-angle control does not influence the translational dynamics of the quadrotor, and thus it can be controlled independently. First, we compute the current yaw angle ψ from the quadrotor orientation R. Second, we compute the desired angular rate in the world z direction with a proportional controller on the yaw-angle error,

    rworld = pyaw (ψref − ψ).                    (14)

The resulting desired rate can then be converted into the quadrotor body frame using its current orientation,

    rdes = R3,3 · rworld.                        (15)

5.3.2. Low-level Control

The commands sent to the low-level control on the PX4 are the desired body rates ωdes and the desired mass-normalized thrust cdes. From these, the desired rotor thrusts are then computed using a feedback-linearizing control scheme with the closed-loop dynamics of a first-order system. First, the desired torques τdes are computed as

    τdes = J · [ ppq (pdes − p) ; ppq (qdes − q) ; pr (rdes − r) ] + ω × Jω.   (16)

Then, we can plug τdes and cdes into Eqs. (7) and (6) and solve them for the desired rotor thrusts, which are then applied.

5.4. Calibration

5.4.1. Sensor Bias Calibration

For the state estimation and the control of our quadrotors, we make use of their gyros and accelerometers. Both of these sensor units are prone to having an output bias that varies over time and that we, therefore, have to estimate. We noticed that the changes of these biases during one flight are negligible. This allows us to estimate them once at the beginning of a mission and then keep them constant. Thus, we do not have to estimate them online and can therefore reduce the size of the state in the state estimation. In the following, we will present a procedure that allows us to estimate the biases during autonomous hover. Note that the quadrotor can hover autonomously even with sensor biases. However, removing the biases increases the state estimation and tracking performance. For the sensor bias calibration, we look at the gyros and the accelerometers separately.

The gyro measurement equation reads

    ω̃ = ω + bω + nω,                             (17)

where ω̃ denotes the measured angular velocities, ω is the real angular velocities, bω is the bias, and nω is the noise of the gyros. This, in hover conditions, becomes

    ω̃ = bω + nω.                                 (18)

We assume the noise nω to have zero mean and can therefore average the gyro measurements over N samples to estimate the gyro bias bω as

    b̂ω = (1/N) Σ_{k=1}^{N} ω̃k.                   (19)

The accelerometer measurement equation reads

    ã = c + adist + bacc + nacc,                  (20)

where ã denotes the measured accelerations, c is the mass-normalized thrust, adist are the accelerations due to external disturbances, bacc is the bias, and nacc is the noise of the accelerometer. In hover conditions, c = −g, and we assume to have only small and zero-mean disturbances, so the equation simplifies to

    ã = −g + bacc + nacc.                         (21)

As for the gyro bias, we assume the noise nacc to have zero mean and can therefore average the accelerometer measurements over N samples to estimate the accelerometer bias bacc as

    b̂acc = (1/N) Σ_{k=1}^{N} ãk + g.              (22)

When performing the sensor bias calibration, we sample the IMU readings over a 5 s period, which is enough to provide an accurate estimate of their biases.

5.4.2. Thrust Calibration

When flying our quadrotors under very different conditions indoors and outdoors, we noticed that the produced rotor thrust can vary substantially. This can significantly reduce the control authority and hence the flight performance in situations in which the produced thrust is low. To overcome this limitation, we estimate the actually produced thrust in flight.

The thrust f of a single rotor can be computed as

    f = (1/2) · ρ · Ω² · C · A,                   (23)

where ρ is the air density, Ω is the rotor speed, C is the lift coefficient, and A is the rotor area. The rotor speed Ω is the input parameter through which we control the thrust. We assume that the rotor speed is controlled by the motor controllers such that we can neglect differences of the battery voltage. However, the density of the air ρ is not constant, as it depends on the air pressure and temperature. Additionally, wear and possible damages to the rotors might cause unexpected changes in C and A. These three values are difficult to measure, but we can estimate them together in hover flight. To do so, we first combine the three parameters and write Eq. (23) as

    f = G Ω².                                     (24)
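A minimal sketch of the hover calibration follows: the gyro and accelerometer biases are sample means, Eqs. (19) and (22). The text above does not spell out how the combined coefficient G of Eq. (24) is fitted; the least-squares fit below, balancing the collective hover thrust m·g against the commanded rotor speeds, is one straightforward possibility and is shown purely as an assumption.

```python
# Sketch of the one-time hover calibration (Eqs. 19, 22, and 24).
import numpy as np

g_vec = np.array([0.0, 0.0, -9.81])

def estimate_gyro_bias(gyro_samples):
    # Eq. (19): in hover the true body rates are ~0, so the sample mean is the bias.
    return np.mean(gyro_samples, axis=0)

def estimate_accel_bias(accel_samples):
    # Eq. (22): in hover c = -g, so the sample mean plus g is the bias.
    return np.mean(accel_samples, axis=0) + g_vec

def estimate_thrust_coefficient(rotor_speeds, mass):
    # Hypothetical fit for Eq. (24), f = G * Omega^2: in steady hover the four
    # rotor thrusts sum to the weight, so  G * sum_i(Omega_i^2) ~= m * g.
    omega_sq = np.sum(np.square(rotor_speeds), axis=1)   # per-sample sum over the four rotors
    return mass * 9.81 / np.mean(omega_sq)

# Example with synthetic 5 s of hover data at 200 Hz.
n = 1000
gyro = 0.01 + 0.001 * np.random.randn(n, 3)              # bias ~ [0.01, 0.01, 0.01] rad/s
accel = -g_vec + 0.05 + 0.02 * np.random.randn(n, 3)     # bias ~ [0.05, 0.05, 0.05] m/s^2
speeds = 450.0 + 5.0 * np.random.randn(n, 4)             # commanded rotor speeds, rad/s
print(estimate_gyro_bias(gyro))
print(estimate_accel_bias(accel))
print(estimate_thrust_coefficient(speeds, mass=0.45))
```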
cross-correlation score on a 5 × 5 patch, we search for the pixel u′i on the epipolar line segment that has the highest correlation with the reference pixel ui. A depth hypothesis d̃_i^k is generated from the observation by triangulating ui and u′i from the views r and k, respectively.

Let the sequence of d̃_i^k for k = r, …, r + n denote a set of noisy depth measurements. As proposed in Vogiatzis & Hernández (2011), we model the depth filter as a distribution that mixes a good measurement (normally distributed around the true depth d_i) and an outlier measurement (uniformly distributed in an interval [d_i^min, d_i^max], which is known to contain the depth for the structure of interest):

    p(d̃_i^k | d_i, ρ_i) = ρ_i N(d̃_i^k | d_i, (τ_i^k)²) + (1 − ρ_i) U(d̃_i^k | d_i^min, d_i^max),     (30)

where ρ_i and (τ_i^k)² are the probability (i.e., inlier ratio) and the variance of a good measurement, respectively. Assuming independent observations, the Bayesian estimation for d_i on the basis of the measurements d̃_i^{r+1}, …, d̃_i^k is given by the posterior

    p(d_i, ρ_i | d̃_i^{r+1}, …, d̃_i^k) ∝ p(d_i, ρ_i) Π_k p(d̃_i^k | d_i, ρ_i),     (31)

with p(d_i, ρ_i) being a prior on the true depth and the ratio of good measurements supporting it. A sequential update is implemented by using the estimation at time step k − 1 as a prior to combine with the observation at time step k. To this purpose, the authors of Vogiatzis & Hernández (2011) show that the posterior in Eq. (31) can be approximated by the product of a Gaussian distribution for the depth and a Beta distribution for the inlier ratio:

    q(d_i, ρ_i | a_i^k, b_i^k, μ_i^k, (σ_i^k)²) = Beta(ρ_i | a_i^k, b_i^k) N(d_i | μ_i^k, (σ_i^k)²),     (32)

where a_i^k and b_i^k are the parameters controlling the Beta distribution. The choice is motivated by the fact that Beta × Gaussian is the approximating distribution minimizing the Kullback-Leibler divergence from the true posterior (31). We refer to Vogiatzis & Hernández (2011) for an in-depth discussion and formalization of this Bayesian update step.

⁶ Although in our lab we can transmit uncompressed, full-resolution images at rates of up to 50 Hz, we observed, during public exhibitions and outdoor experiments with more than 20-m height, that a lower bound of 5 Hz can usually be assumed.

6.2. Depth Smoothing

In Pizzoli et al. (2014), we introduced a fast smoothing step that takes into account the measurement uncertainty to enforce spatial regularity and mitigate the effect of noisy camera localization.

We now detail our solution to the problem of smoothing the depth map D(u). For every pixel ui in the reference image Ir : Ω ⊂ R² → R, the depth estimation and its confidence upon the kth observation are given, respectively, by μ_i^k and σ_i^k in Eq. (32). We formulate the problem of computing a denoised depth map F(u) as the following minimization:

    min_F ∫ { G(u) ‖∇F(u)‖_ε + λ ‖F(u) − D(u)‖₁ } du,     (33)

where λ is a free parameter controlling the tradeoff between the data term and the regularizer, and G(u) is a weighting function related to the "G-weighted total variation" introduced in Bresson, Esedoglu, Vandergheynst, Thiran, & Osher (2007) in the context of image segmentation. We penalize nonsmooth surfaces by making use of a regularization term based on the Huber norm of the gradient, defined as

    ‖∇F(u)‖_ε = { ‖∇F(u)‖₂² / (2ε)    if ‖∇F(u)‖₂ ≤ ε,
                  ‖∇F(u)‖₁ − ε/2      otherwise.            (34)

We chose the Huber norm because it allows smooth reconstruction while preserving discontinuities at strong depth-gradient locations (Newcombe et al., 2011). The weighting function G(u) influences the strength of the regularization, and we propose to compute it on the basis of the measure confidence for u:

    G(u) = Eρ[q](u) · σ²(u)/σ²_max + (1 − Eρ[q](u)),        (35)

where we have extended the notation for the expected value of the inlier ratio Eρ[q] and the variance σ² in Eq. (32) to account for the specific pixel u. The weighting function (35) affects the strength of the regularization term: for pixels with a high expected value for the inlier ratio ρ, the weight is controlled by the measurement variance σ²; measurements characterized by a small variance (i.e., reliable measurements) will be less affected by the regularization; in contrast, the contribution of the regularization term will be heavier for measurements characterized by a small expected value for the inlier ratio or a higher measurement variance.

The solution to the minimization problem (33) is computed iteratively based on the work in Chambolle & Pock (2011). The algorithm exploits the primal-dual formulation of Eq. (33) and is detailed in Pizzoli et al. (2014).
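The sequential update of Eqs. (30)-(31) can be illustrated with a small numerical sketch. For clarity, the posterior over depth and inlier ratio is represented on a discrete grid; the actual filter of Vogiatzis & Hernández (2011) instead propagates the Gaussian × Beta approximation of Eq. (32), so this is an illustration of the model, not of their update formulas.

```python
# Grid-based illustration of the Bayesian depth filter, Eqs. (30)-(31).
import numpy as np

d_min, d_max = 0.5, 5.0
d_grid = np.linspace(d_min, d_max, 400)          # candidate depths
rho_grid = np.linspace(0.01, 0.99, 50)           # candidate inlier ratios
D, RHO = np.meshgrid(d_grid, rho_grid, indexing="ij")
posterior = np.ones_like(D)                      # uniform prior p(d, rho)
posterior /= posterior.sum()

def measurement_likelihood(d_meas, tau):
    # Eq. (30): inlier Gaussian around the candidate depth plus uniform outlier term.
    gauss = np.exp(-0.5 * ((d_meas - D) / tau) ** 2) / (tau * np.sqrt(2 * np.pi))
    uniform = 1.0 / (d_max - d_min)
    return RHO * gauss + (1.0 - RHO) * uniform

true_depth = 2.0
rng = np.random.default_rng(1)
for _ in range(30):
    # 80% good measurements around the true depth, 20% outliers.
    if rng.random() < 0.8:
        d_meas = true_depth + 0.05 * rng.standard_normal()
    else:
        d_meas = rng.uniform(d_min, d_max)
    posterior *= measurement_likelihood(d_meas, tau=0.05)   # Eq. (31), sequential form
    posterior /= posterior.sum()

depth_marginal = posterior.sum(axis=1)
print("MAP depth:", d_grid[np.argmax(depth_marginal)])      # close to 2.0
```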
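The two ingredients of the regularizer in Eqs. (33)-(35), the Huber norm of the depth-map gradient and the per-pixel confidence weight G(u), are likewise easy to write down directly; the toy depth map below is only an example input.

```python
# Huber norm, Eq. (34), and confidence weighting G(u), Eq. (35).
import numpy as np

def huber_norm(grad, eps):
    # Eq. (34): quadratic below eps, linear (L1-like) above, evaluated per pixel.
    mag = np.linalg.norm(grad, axis=-1)
    quad = mag ** 2 / (2.0 * eps)
    lin = np.abs(grad).sum(axis=-1) - eps / 2.0
    return np.where(mag <= eps, quad, lin)

def regularization_weight(inlier_ratio_expectation, variance, variance_max):
    # Eq. (35): confident measurements (high E[rho], low variance) get a small
    # weight and are barely smoothed; uncertain ones are smoothed more strongly.
    return inlier_ratio_expectation * variance / variance_max + (1.0 - inlier_ratio_expectation)

# Example: gradients of a toy 4x4 depth map, plus a reliable and an unreliable pixel.
depth = np.arange(16, dtype=float).reshape(4, 4)
grad = np.stack(np.gradient(depth), axis=-1)
print(huber_norm(grad, eps=0.5))
print(regularization_weight(0.9, variance=0.01, variance_max=1.0))   # reliable  -> ~0.11
print(regularization_weight(0.2, variance=0.80, variance_max=1.0))   # unreliable -> ~0.96
```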
7. RESULTS

7.1. Indoor Flight

Our flying arena is equipped with an OptiTrack motion-capture system by NaturalPoint,⁷ which we only used for ground-truth comparison. Its floor is covered with a texture-rich carpet and boxes for 3D structure, as shown in Figure 7. The quadrotor was requested to autonomously execute a closed-loop trajectory specified by waypoints. The trajectory was 20 m long, and the MAV flew on average at 1.7 m above ground.

⁷ http://www.naturalpoint.com/optitrack/
Figure 7. Our flying arena equipped with an OptiTrack motion-capture system (for ground-truth recording), carpets, and boxes
for 3D structure.
Figure 8. Comparison of estimated and ground-truth position to a reference trajectory (a) flown indoors over a scene as recon-
structed in (b).
Figures 8, 9, and 10 show the accurate trajectory following as well as the position and orientation estimation errors, respectively. For ground-truth comparison, we aligned the first 2 m of the estimated trajectory to the ground truth (Umeyama, 1991). The maximum recorded position drift was 0.5% of the traveled distance. As observed, the quadrotor was able to return very close to the start point (with a final absolute error smaller than 5 cm). This result was confirmed in more than 100 experiments run at public events, exhibitions, and fairs. These results outperform those achieved with MAVs based on PTAM. This is due to the higher precision of SVO. Comparisons between SVO and PTAM are reported in Forster et al. (2014b).
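The ground-truth comparison above aligns the estimated trajectory to the motion-capture data with the closed-form least-squares method of Umeyama (1991). A minimal sketch of such an alignment (rigid case only; the scale term of the full method is omitted for brevity) is:

```python
# Closed-form rigid alignment of an estimated trajectory to ground truth,
# in the spirit of Umeyama (1991).
import numpy as np

def align_rigid(est, gt):
    """Return R, t minimizing sum || (R @ est_i + t) - gt_i ||^2."""
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_est).T @ (gt - mu_gt)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ S @ U.T
    t = mu_gt - R @ mu_est
    return R, t

# Toy check: a rotated and shifted copy of a short trajectory is recovered exactly.
rng = np.random.default_rng(0)
traj = rng.random((20, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
gt = traj @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = align_rigid(traj, gt)
print(np.allclose(traj @ R.T + t, gt))   # True
```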
Figure 9. Error between estimated and ground-truth position for the trajectory illustrated in Figure 8.

Figure 12. Side-view of a piecewise-planar map created by SVO and PTAM. The proposed method has fewer outliers due to the depth-filter.
Figure 16. Evaluation of dense reconstruction on a synthetic dataset. (a) The reference view. (b) Ground truth depth map.
(c) Depth map based on Vogiatzis & Hernández (2011). (d) Depth map computed by REMODE. (e) Map of reliable measurements
(white pixels are classified as reliable). (f) Error of REMODE.
Figure 17. The completeness of the synthetic experiment, i.e., the percentage of ground truth measurements that are within a
certain error from the converged estimations. For color blind readers: top line, this paper – exact pose; second line from the top,
Vogiatzis and Hernandez 2011 – exact pose; third line from the top, this paper – noisy pose; bottom line, Vogiatzis and Hernandez
2011 – noisy pose.
REFERENCES

Bloesch, M., Weiss, S., Scaramuzza, D., & Siegwart, R. (2010). Vision based MAV navigation in unknown and unstructured environments. In IEEE International Conference on Robotics and Automation (ICRA).

Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J.-P., & Osher, S. (2007). Fast global minimization of the active contour/snake model. Journal of Mathematical Imaging and Vision, 28(2), 151–167.

Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145.

Engel, J., Sturm, J., & Cremers, D. (2012). Camera-based navigation of a low-cost quadrocopter. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Forster, C., Lynen, S., Kneip, L., & Scaramuzza, D. (2013a). Collaborative monocular SLAM with multiple micro aerial vehicles. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Forster, C., Pizzoli, M., & Scaramuzza, D. (2013b). Air-ground localization and map augmentation using monocular dense reconstruction. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Forster, C., Pizzoli, M., & Scaramuzza, D. (2014a). Appearance-based active, monocular, dense depth estimation for micro aerial vehicles. In Robotics: Science and Systems (RSS), 1–8.

Forster, C., Pizzoli, M., & Scaramuzza, D. (2014b). SVO: Fast semi-direct monocular visual odometry. In IEEE International Conference on Robotics and Automation (ICRA).

Gallup, D., Frahm, J.-M., Mordohai, P., Yang, Q., & Pollefeys, M. (2007). Real-time plane-sweeping stereo with multiple sweeping directions. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.

Handa, A., Newcombe, R. A., Angeli, A., & Davison, A. J. (2012). Real-time camera tracking: When is high frame-rate best? In European Conference on Computer Vision (ECCV).

Hrabar, S., Sukhatme, G. S., Corke, P., Usher, K., & Roberts, J. (2005). Combined optic-flow and stereo-based navigation of urban canyons for a UAV. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Irani, M., & Anandan, P. (1999). All about direct methods. In Proceedings of the Workshop on Vision Algorithms: Theory and Practice (pp. 267–277).

Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 225–234), Nara, Japan.

Lippiello, V., Loianno, G., & Siciliano, B. (2011). MAV indoor navigation based on a closed-form solution for absolute scale velocity estimation using optical flow and inertial data. In IEEE International Conference on Decision and Control.

Lupashin, S., Hehn, M., Mueller, M. W., Schoellig, A. P., Sherback, M., & D'Andrea, R. (2014). A platform for aerial robotics research and demonstration: The Flying Machine Arena. Mechatronics, 24(1), 41–54.

Lynen, S., Achtelik, M., Weiss, S., Chli, M., & Siegwart, R. (2013). A robust and modular multi-sensor fusion approach applied to MAV navigation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Meier, L., Tanskanen, P., Heng, L., Lee, G. H., Fraundorfer, F., & Pollefeys, M. (2012). PIXHAWK: A micro aerial vehicle design for autonomous flight using onboard computer vision. Autonomous Robots, 33(1–2), 21–39.

Michael, N., Mellinger, D., Lindsey, Q., & Kumar, V. (2010). The GRASP multiple micro UAV testbed. IEEE Robotics Automation Magazine, 17(3), 56–65.

Murphy, R. (2014). Disaster Robotics. Cambridge, MA: MIT Press.

Newcombe, R., Lovegrove, S., & Davison, A. (2011). DTAM: Dense tracking and mapping in real-time. In International Conference on Computer Vision (ICCV) (pp. 2320–2327), Barcelona, Spain.

Pizzoli, M., Forster, C., & Scaramuzza, D. (2014). REMODE: Probabilistic, monocular dense reconstruction in real time. In IEEE International Conference on Robotics and Automation (ICRA).

Rosten, E., Porter, R., & Drummond, T. (2010). Faster and better: A machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), 105–119.

Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1–4).

Ruffier, F., & Franceschini, N. (2004). Visually guided micro-aerial vehicle: Automatic take off, terrain following, landing and wind reaction. In IEEE International Conference on Robotics and Automation (ICRA).

Scaramuzza, D., Achtelik, M., Doitsidis, L., Fraundorfer, F., Kosmatopoulos, E. B., Martinelli, A., Achtelik, M. W., Chli, M., Chatzichristofis, S., Kneip, L., Gurdan, D., Heng, L., Lee, G., Lynen, S., Meier, L., Pollefeys, M., Renzaglia, A., Siegwart, R., Stumpf, J. C., Tanskanen, P., Troiani, C., & Weiss, S. (2014). Vision-controlled micro flying robots: From system design to autonomous navigation and mapping in GPS-denied environments. IEEE Robotics Automation Magazine.

Scaramuzza, D., & Fraundorfer, F. (2011). Visual odometry. Part I: The first 30 years and fundamentals. IEEE Robotics Automation Magazine, 18(4), 80–92.

Schmid, K., Lutz, P., Tomic, T., Mair, E., & Hirschmuller, H. (2014). Autonomous vision-based micro air vehicle for indoor and outdoor navigation. Journal of Field Robotics, 31, 537–570.

Shen, S., Michael, N., & Kumar, V. (2011). Autonomous multi-floor indoor navigation with a computationally constrained MAV. In IEEE International Conference on Robotics and Automation (ICRA).

Shen, S., Michael, N., & Kumar, V. (2012a). Autonomous indoor 3D exploration with a micro-aerial vehicle. In IEEE International Conference on Robotics and Automation (ICRA).

Shen, S., Michael, N., & Kumar, V. (2012b). Stochastic differential equation-based exploration algorithm for autonomous indoor 3D exploration with a micro aerial vehicle. Journal of Field Robotics, 31(12), 1431–1444.

Shen, S., Mulgaonkar, Y., Michael, N., & Kumar, V. (2013). Vision-based state estimation and trajectory control towards aggressive flight with a quadrotor. In Robotics: Science and Systems (RSS), 1–8.

Strasdat, H., Montiel, J., & Davison, A. (2010). Real-time monocular SLAM: Why filter? In IEEE International Conference on Robotics and Automation (ICRA).

Stühmer, J., Gumhold, S., & Cremers, D. (2010). Real-time dense geometry from a handheld camera. In Pattern Recognition, vol. 6376 of Lecture Notes in Computer Science (pp. 11–20). Berlin: Springer.

Szeliski, R., & Scharstein, D. (2004). Sampling the disparity space image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 419–425.

Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., Lau, K., Oakley, C., Palatucci, M., Pratt, V., Stang, P., Strohband, S., Dupont, C., Jendrossek, L., Koelen, C., Markey, C., Rummel, C., van Niekerk, J., Jensen, E., Alessandrini, P., Bradski, G., Davies, B., Ettinger, S., Kaehler, A., Nefian, A., & Mahoney, P. (2007). Stanley: The robot that won the DARPA Grand Challenge. In Buehler, M., Iagnemma, K., & Singh, S. (eds.), The 2005 DARPA Grand Challenge, vol. 36 of Springer Tracts in Advanced Robotics (pp. 1–43). Berlin: Springer.

Triggs, B., McLauchlan, P., Hartley, R., & Fitzgibbon, A. (2000). Bundle adjustment—A modern synthesis. In Triggs, W., Zisserman, A., & Szeliski, R. (eds.), Vision Algorithms: Theory and Practice, vol. 1883 of LNCS (pp. 298–372). Springer-Verlag.

Umeyama, S. (1991). Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 376–380.

Vogiatzis, G., & Hernández, C. (2011). Video-based, real-time multi view stereo. Image and Vision Computing, 29(7), 434–441.

Weiss, S., Achtelik, M. W., Lynen, S., Achtelik, M. C., Kneip, L., Chli, M., & Siegwart, R. (2013). Monocular vision for long-term micro aerial vehicle state estimation: A compendium. Journal of Field Robotics, 30(5), 803–831.

Weiss, S., Scaramuzza, D., & Siegwart, R. (2011). Monocular-SLAM-based navigation for autonomous micro helicopters in GPS-denied environments. Journal of Field Robotics, 28(6), 854–874.

Wendel, A., Maurer, M., Graber, G., Pock, T., & Bischof, H. (2012). Dense reconstruction on-the-fly. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.

Werlberger, M., Pock, T., & Bischof, H. (2010). Motion estimation with non-local total variation regularization. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.

Zingg, S., Scaramuzza, D., Weiss, S., & Siegwart, R. (2010). MAV navigation through indoor corridors using optical flow. In IEEE International Conference on Robotics and Automation (ICRA).

Zufferey, J.-C., & Floreano, D. (2006). Fly-inspired visual steering of an ultralight indoor aircraft. IEEE Transactions on Robotics, 22(1), 137–146.