
Autonomous Navigation without HD Prior Maps

by
Teddy Ort
S.B., Massachusetts Institute of Technology (2016)
S.M., Massachusetts Institute of Technology (2020)
Submitted to the
Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2022
© Massachusetts Institute of Technology 2022. All rights reserved.

Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
August 26, 2022

Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Daniela Rus
Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science
Thesis Supervisor

Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Leslie A. Kolodziejski
Professor of Electrical Engineering and Computer Science
Chair, Department Committee on Graduate Students
Autonomous Navigation without HD Prior Maps
by
Teddy Ort

Submitted to the Department of Electrical Engineering and Computer Science


on August 26, 2022, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

Abstract
Most fielded autonomous driving systems currently rely on High Definition (HD) prior
maps both to localize and to retrieve detailed geometric and semantic information
about the environment. This information is necessary to enable safe operation of
many downstream driving components, including prediction, planning, and control.
However, this requirement has raised issues with scalability, confining autonomous
systems to small test regions where such detailed maps can be maintained. Fur-
thermore, the reliance on HD maps can prevent autonomous vehicles from realizing
human-like flexibility to both explore new areas and successfully navigate in rapidly
changing environments or weather conditions. In this thesis, we present MapLite, an
autonomous navigation system using only Standard Definition (SD) prior maps, in
conjunction with onboard perception to directly infer the necessary HD map online.
We also explore the use of a Localizing Ground Penetrating Radar (LGPR) for precise
localization using stable underground features that are robust to changing weather
conditions. Together, these methods can reduce the requirement for HD prior maps
and bring autonomous navigation closer to human levels of flexibility and robustness.

Thesis Supervisor: Daniela Rus


Title: Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Com-
puter Science

Acknowledgments

This thesis is the product of many years of effort that would not have been possible
without the contributions and support of many people. Firstly, I would like to thank
Daniela Rus for welcoming me into her lab, introducing me to the world of academic
research, and guiding me along this remarkable journey. I have learned many things
over the years from Daniela, including how to identify and select important research
problems, how to investigate and expand upon existing work, and how to commu-
nicate these effectively with the research community. Daniela’s lab encompasses a
magical vision of robotics and computing that is pervasive in our homes and work-
places helping people both physically and cognitively to meet their needs and goals. I
am so grateful for her sharing this vision with me and supporting and mentoring me
over the years and look forward to continuing to work together to make this vision a
reality.

I am also grateful to the members of my thesis committee: Leslie Kaelbling, John


Leonard, and Roland Siegwart for selflessly giving of their time to share their expertise
and advice with me. I am truly humbled to work with such a distinguished group
who have had such an enormous impact in shaping the field and I am honored to
have had the opportunity to share this work with them.

I was fortunate to have been able to collaborate throughout this work with two
mentors without whom this thesis could not have taken shape. Liam Paull, and Igor
Gilitschenski both spent countless hours brainstorming the ideas and experiments
outlined here, contributing to the code, and even testing and debugging in the field.
Words cannot express how much I appreciate their willingness to share their insights,
expertise, and wisdom with me and I treasure the time we spent together.

I would also like to thank all of my co-authors and collaborators including: Sai Kr-
ishna, Alyssa Pierson, Krishna Murthy, Dhaivat Bhatt, Jeff Walls, Steven Parkison,
Stephen McGill, Guy Rosman, Sertac Karaman, Rohan Banerjee, Brandon Araki, and
Luke Fletcher. Working with real hardware, from full-size vehicles to our scale race-
cars always requires the contribution of many people and I appreciate your willingness
to share your time and efforts with me. I look forward to having the opportunity to
work together in the future.
Additionally, I would like to thank all of the members of my DRL family who
have truly made MIT feel like a second home. Aaron Ray, Alexander Amini, Andy
Spielberg, Annan Zhang, Cenk Baykal, David Dorhout, Felix Naser, Guy Rosman,
Hunter Hansen, Igor Gilitschenski, Jeff Lipton, John Romanishin, Johnson Wang,
Joseph DelPreto, Lillian Chin, Lucas Liebenwein, Murad Abu-Khalf, Noam Buckman,
Ramin Hasani, Robert Katzschmann, Brandon Araki, Thomas Balch, Tim Seyde,
Xiao Li, and Yutong Ban, I have greatly enjoyed the collaborative environment of the
lab, the many lunches, outings, and conferences we have attended together, and the
feeling that no matter what time of day or night, a walk through the lab would more
likely than not mean chancing upon one of you working on something fascinating!
I would like to give a special thanks to Amado Antonini for being such a great
friend from the very first day I arrived on the MIT campus. Nine years ago, I would
have never guessed the journey that lay in store, whether it was attending classes,
taking exams, working on side projects, or camping in the White Mountains, I will
always treasure the memories of the many experiences we’ve shared throughout these
years.
I am so lucky to have such an amazing family that has loved and supported me
unconditionally through all of the years. Whether it was editing application essays,
sharing advice, or even making the long drive to help with a move, there was always
a sibling available to help with every need and I feel so fortunate to have you all in
my corner.
No one has been a bigger supporter, mentor, guide, and my number one fan than
my mom. Thank you for always believing in me beyond my wildest dreams, sharing
your advice when I was lost, and for always encouraging me to put another tool in
my toolbox!
Last but certainly not least, perhaps no one has had a closer view of this work than
my partner, Emily. Thank you for always being there for me, supporting me, checking
in, and for hundreds of sweet treats, a.k.a. deadline antidotes. I am so grateful to have
you by my side for this journey and I’m so excited to begin our next adventure!
This work has benefited from the support of the Toyota Research Institute (TRI),
the MIT Lincoln Laboratory, and ONR grant N00014-18-1-2830. However, this thesis
solely reflects the opinions and conclusions of the author and not TRI or any other
entity.

Contents

1 Introduction 21
1.1 Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.2.1 HD Map Creation . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.2.2 Map Storage and Transmission . . . . . . . . . . . . . . . . . 28
1.2.3 Map Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.2.4 Perceptual Aliasing . . . . . . . . . . . . . . . . . . . . . . . . 30
1.2.5 Inclement Weather . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.4.1 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.4.2 Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.6 Relevant Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2 Related Work 41
2.1 Map-based Autonomous Navigation . . . . . . . . . . . . . . . . . . . 41
2.2 Mapless Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Automated Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4 Topological/Topometric Localization and Navigation . . . . . . . . . 43

3 MapLite: Autonomous Navigation with SD Prior Maps 45
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 MapLite Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.1 Topometric Map . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.2 Road Surface Segmentation . . . . . . . . . . . . . . . . . . . 51
3.3.3 Topometric registration . . . . . . . . . . . . . . . . . . . . . . 51
3.3.4 Route Planning . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.5 Trajectory Planner . . . . . . . . . . . . . . . . . . . . . . . . 55
3.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.2 Road Segmentation Results . . . . . . . . . . . . . . . . . . . 60
3.4.3 Registration Evaluation Results . . . . . . . . . . . . . . . . . 61
3.4.4 System Evaluation Results . . . . . . . . . . . . . . . . . . . . 62
3.4.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4 MapLite 2.0: Online HD Map Inference 69


4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.1 Offline map generation . . . . . . . . . . . . . . . . . . . . . . 72
4.2.2 Online map inference . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 MapLite 2.0 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 State Description . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 Bird’s Eye View (BEV) Semantic Segmentation . . . . . . . . 77
4.3.3 Online HD Map State Estimation . . . . . . . . . . . . . . . . 80
4.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4.2 Semantic Segmentation Results . . . . . . . . . . . . . . . . . 85
4.4.3 MapLite 2.0 Results . . . . . . . . . . . . . . . . . . . . . . . 86

4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5 Localizing Ground Penetrating Radar 93


5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3.1 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3.2 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3.3 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3.4 Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.4 A Localizing Ground Penetrating Radar . . . . . . . . . . . . . . . . 100
5.4.1 LGPR Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.2 Localization Algorithm . . . . . . . . . . . . . . . . . . . . . . 103
5.4.3 Sensor Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 Planning and Control Pipeline . . . . . . . . . . . . . . . . . . . . . . 107
5.5.1 Pure Pursuit Controller . . . . . . . . . . . . . . . . . . . . . 107
5.5.2 Speed Controller . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.6 LGPRNet: Learning to Localize in Weather . . . . . . . . . . . . . . 109
5.6.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 109
5.6.2 LGPRNet Architecture . . . . . . . . . . . . . . . . . . . . . . 110
5.6.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.7 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.7.2 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7.3 Correlation Method Results . . . . . . . . . . . . . . . . . . . 117
5.7.4 LGPRNet Results . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

6 The GROUNDED Dataset 125


6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

6.3 Benchmark Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.3.1 Localization in Weather Challenge . . . . . . . . . . . . . . . 130
6.3.2 Multi-lane Mapping Challenge . . . . . . . . . . . . . . . . . . 132
6.4 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4.1 Run Level Data . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.4.2 Sensor Level Data . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.5 Data Collection Platform . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.5.1 Vehicle Infrastructure . . . . . . . . . . . . . . . . . . . . . . . 137
6.5.2 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.5.3 Time Synchronization . . . . . . . . . . . . . . . . . . . . . . 140
6.5.4 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

7 Conclusion 151
7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2 Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.3 Social Responsibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.4.1 Dialable Navigation . . . . . . . . . . . . . . . . . . . . . . . . 157
7.4.2 Persistent Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.4.3 Expanded HD Map Feature Set . . . . . . . . . . . . . . . . . 159
7.4.4 MapLite with LGPR . . . . . . . . . . . . . . . . . . . . . . . 159

A Supplementary Material 161

List of Figures

1-1 A scene showing traffic congestion on a highway. This is so common-


place it may seem boring, but notice that every single vehicle has a
human operator performing the monotonous and dangerous task of
driving. If robots could pilot vehicles safely it would save enormous
amounts of time and many thousands of lives. . . . . . . . . . . . . . 22

1-2 In this thesis, the various challenges associated with autonomous nav-
igation relying on HD prior maps are addressed by various methods
along two main thrusts: 1) MapLite and 2) Localizing Ground Pene-
trating Radar (LGPR). . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3-1 MapLite: A visualization of our system operating on a rural road


in Central Massachusetts. The white lines show the topometric map
obtained from OpenStreetMap.org. The colored pointcloud is obtained
from the lidar scanner used in MapLite to navigate without the need
to localize on a detailed prior map. Note that the noisy topometric
map has been registered to the local sensor data. . . . . . . . . . . . 46

3-2 The MapLite system differs from a typical autonomous navigation


pipeline in five components: Topological Map, Road Segmentation,
Map Registration, Route Replanning, and Motion Planning . . . . . . 49

3-3 An example result from the trajectory planner. The vehicle is located
at the blue dot, and received the white path from the Route Planner to
indicate a right turn. The red (road) and blue (off-road) road segmen-
tation is used to plan the green trajectory which follows the high-level
navigation goal while safely remaining on the road. . . . . . . . . . . 56

3-4 A comparison between the trajectory autonomously driven by MapLite


(blue) with the path estimated by odometry (green). The shaded area
is the groundtruth road surface. . . . . . . . . . . . . . . . . . . . . . 59

3-5 A comparison between the RMSE of the estimated map before (blue)
and after (green) topometric registration. . . . . . . . . . . . . . . . . 61

3-6 The vehicle at the test site in rural Massachusetts. The heavily wooded
area has roads without markings or boundaries. . . . . . . . . . . . . 62

3-7 A traffic circle in the CARLA simulator. We used CARLA to test more
complex road topologies than were available at our test site. . . . . . 65

4-1 An online HD map overlaying lidar intensity as a vehicle traverses


an intersection. The white and yellow lines indicate inferred road and
lane boundaries while the red lines denote intersection entry/exit ports.
The background shows a projection of the lidar intensity data into the
birds-eye-view (BEV) that was used to help generate the HD map online. 71

4-2 In the proposed system, an SD prior map is fused with onboard sensor
data to perform an online estimation of the HD map near the vehicle. 72

4-3 The HD map model state consists of a set of parameters which define
the geometries relative to an SD prior basemap (shown in blue). The
red lines indicate intersection entry/exit ports and the black circles
indicate knot points. In this illustrative example, there would be many
more parameters required to define the full shape of the map elements
shown. However for figure clarity, we only include one of each type. . 75

4-4 The first step in the perception pipeline is to project the sensor data
into the Bird’s Eye View (BEV) frame. On the left is a lidar intensity
frame, next is an elevation map, and last is an RGB frame. The red
trace shows the path of the vehicle in each frame. . . . . . . . . . . . 78

4-5 The confusion matrix for the semantic segmentation. The true class is
on the vertical axis, with the prediction on the horizontal. . . . . . . 86

4-6 Selected examples of the online estimated HD map. Rows indicate


different scenarios while columns illustrate pipeline outputs. . . . . . 91

5-1 Our LGPR based autonomous navigation system is capable of success-


fully localizing from a single map in different challenging weather and
lighting conditions including snow, rain, and darkness. . . . . . . . . 94

5-2 The autonomous navigation system architecture showing the key sys-
tem components: Mapping, Localization, Planning, and Control. The
LGPR system used as the primary localization sensor is outlined in red. 97

5-3 The Localizing Ground Penetrating Radar (LGPR) sensor mounted


on the rear of the autonomous vehicle. The main components include
the 11 element radar array and the switch matrix which collected the
signals from the array. The processing chassis contains the onboard
computer for the sensor data processing, and the Global Positioning
System (GPS) unit used for labeling the prior maps and initializing
the search window for localization. . . . . . . . . . . . . . . . . . . . 101

5-4 The autonomous navigation pipeline which processed sensor inputs


from the wheel odometry, and LGPR sensor, and computed steering
and speed commands to send to the vehicle to autonomously drive the
vehicle along the goal path. . . . . . . . . . . . . . . . . . . . . . . . 102

5-5 LGPR data collected from a single radar element as the vehicle was
driven over a path. The colors indicate the intensity of the reflected
signal at each depth and the high degree of structure allows the subter-
ranean geological features to be coherently measured using this method.103

5-6 The pure pursuit tracking algorithm was used to smoothly steer the
vehicle along the goal path (shown in green). The blue sphere shows
the lookahead distance and the blue curve shows the path the controller
intends to take. As expected, this path tends to deviate from the goal
path in turns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5-7 LGPRNet architecture composed of traditional convolutional layers


along with Inception building blocks includes a total of 20 layers (Not
including the auxiliary outputs, and considering each inception block a
single layer). Notice that many of the kernels are not square to account
for the unusually high aspect ratio of LGPR frames. . . . . . . . . . . 111

5-8 A comparison of the LGPR estimates with ground truth in


various weather conditions. . . . . . . . . . . . . . . . . . . . . . . . . 116

5-9 A histogram showing the distribution of scan correlations between the


test run and the LGPR map in various weather conditions. . . . . . . 117

5-10 A comparison of the LGPR estimates with ground truth in


manual and autonomous driving modes. . . . . . . . . . . . . . . . . 118

5-11 The localization accuracy is affected by the target position in the win-
dow (Top) as well as the size of the search window (Bottom). . . . . . 122

6-1 The GROUNDED dataset includes four data streams for each run. 1)
Lidar scans from a roof mounted Velodyne HDL-64, 2) Camera images
from a front-facing Point Grey Grasshopper camera, 3) Base station
corrected RTK-GPS for groundtruth, and 4) LGPR data stream from
the rear-mounted radar unit. . . . . . . . . . . . . . . . . . . . . . . . 144

6-2 Left: Camera images from three different runs of the same route (route_04)
capturing LGPR data in the same location in clear weather, snow, and
rain to support the Localization in Weather Challenge. Right: Trajec-
tory of route_04 overlaid on a map. Other runs capture a variety of
environments including rural (shown here), urban, and highway. . . . 145

6-3 The Multi-lane Mapping Challenge provides LGPR frames collected
in four lane positions: 1) Left, 2) Right, and 3) Center 4) Changing.
These runs can be used to create a consistent map capable of localizing
the vehicle continuously even while switching lanes. . . . . . . . . . . 146
6-4 The dataset is organized in directories for each run. Run metadata such
as the lane and weather conditions is provided in the runs.csv file and
referenced by the corresponding {run_id}. Similarly, for each sensor,
frame metadata is provided in a frames.csv file referencing individual
frames by their {frame_id}. . . . . . . . . . . . . . . . . . . . . . . . 147
6-5 Top: The Toyota Prius vehicle platform used to collect the dataset.
Bottom: A schematic showing the positions of reference frames for
the vehicle and each of the sensors. Blue is the vehicle frame, Red is
the OXTS GPS/IMU, Green is the camera, Magenta is the lidar, and
Yellow is the LGPR sensor. The measured transforms between these
frames are included in the dataset. . . . . . . . . . . . . . . . . . . . 148
6-6 The LGPR sensor used to collect the dataset. The processing chassis
communicated with the switch matrix to control transmit and receive
on the 12 radar elements. The GPS shown here was used for time syn-
chronization, but groundtruth information was obtained by a separate
GPS unit onboard the vehicle. . . . . . . . . . . . . . . . . . . . . . . 149

List of Tables

3.1 Evaluation of Road Surface Segmentation Methods . . . . . . . . . . 60


3.2 MapLite Performance Evaluation and Ablation Study . . . . . . . . . 63

4.1 Parameter types in the HD map model . . . . . . . . . . . . . . . . . 82


4.2 MapLite 2.0 evaluation results. Values represent mean (std) . . . . . 87

5.1 The results of the LGPRNet predictions compared with the hand-designed


optimal correlation algorithm used in [99]. The mean distance error
(m) is compared as well as its components in the lateral and longitu-
dinal directions with respect to the lane direction. . . . . . . . . . . . 120

6.1 Overview of exteroceptive sensing modalities in autonomous vehicle


navigation research datasets. . . . . . . . . . . . . . . . . . . . . . . 129
6.2 The file runs.csv contains metadata for every run in the dataset orga-
nized into the columns shown. . . . . . . . . . . . . . . . . . . . . . . 134
6.3 The GROUNDED dataset contains data collected in a variety of lane
positions, and environment and weather conditions to support the pro-
posed benchmark challenges. . . . . . . . . . . . . . . . . . . . . . . . 136

A.1 Listing of map/target run pairs from GROUNDED used for training
and evaluation of the LGPRNet model. . . . . . . . . . . . . . . . . . 161

Chapter 1

Introduction

1.1 Vision
Autonomous navigation is a fundamental capability required if robots are to attain
true autonomy. Some early robot successes focused instead on robot arms operating in
factories which were fixed to a workbench and interacted only with objects within the
workspace. However, such a model precludes robots from assisting with the myriad
tasks which require navigating around an environment or among several locations.
Consider the robots used to explore the Fukushima nuclear plant to assess dam-
age and aid with planning for decommissioning [51]. These robots could enter areas
which were far too dangerous for their human operators. In manufacturing and lo-
gistics, robots are beginning to venture outside the workbench to assist with loading,
restocking, and cleaning tasks which require skillful navigation in dynamic environ-
ments. Perhaps no other mobile robotics application has attracted as much attention
recently as autonomous vehicles on road networks, and for good reason. In the United
States, over 80% of transportation takes place by road vehicles and virtually all of it
currently requires a human operator in every vehicle.
In addition to the tremendous cost of the time spent operating vehicles, driving
is also one of the most dangerous human activities in everyday life. Traffic accidents
consistently rank in the top three causes of death for persons under 45, and cause
an estimated 3,000 deaths globally every day [95, 130]. The vast majority of these

Figure 1-1: A scene showing traffic congestion on a highway. This is so commonplace
it may seem boring, but notice that every single vehicle has a human operator per-
forming the monotonous and dangerous task of driving. If robots could pilot vehicles
safely it would save enormous amounts of time and many thousands of lives.

are caused by the uniquely human shortcomings of driver distraction, impairment


from drugs or alcohol, and drowsiness. For these reasons, dozens of efforts spanning
both industry and academia have taken on the challenge to develop fully autonomous
mobile vehicles on public roads.

One common element in virtually all fielded autonomous vehicles today is a highly-
detailed world model, or map, of the environment constructed prior to autonomous
operation. While early mobile robots used maps that were created by humans, in the
last several decades Simultaneous Localization and Mapping (SLAM) has become the
dominant method relied upon by nearly all mapping and localization systems. In this
paradigm, a human first manually drives the vehicle through the desired operating
course and records data from the onboard sensors. Next, a SLAM algorithm processes
the data to build a coherent 3D map of the environment. Features of interest such as
lane lines, intersections, or traffic lights are then added to the map as annotations.
Finally, during autonomous operation, the vehicle estimates its own position in the

map – called localization – to retrieve an accurate 3D world model which is used
for planning and control. Such maps are often called High Definition (HD) maps
to differentiate them from the sparse symbolic street maps typically used by human
drivers. For comparison, an HD map of a small city could reach thousands of gigabytes
of storage, while an SD map of the same region would only be a few megabytes in
size.

From the 40 ton autonomous trucks operated by Waymo [127] to the 1 ton Nuro
[97] vehicle driving at less than 25 mph and carrying no passengers (only packages),
HD maps are ubiquitous in the vast majority of autonomous vehicles operating today.
One notable exception is the Tesla Full Self-Driving (FSD) feature currently available
in beta. This system aims to allow point-to-point navigation without requiring an
HD map created a priori. This system may be conceptually close to the approach
we take here; however, there are a number of ways in which it differs. Firstly, FSD
requires tremendous amounts of training data which are processed to train the neural
nets required for driving. These are practically quite similar to the HD prior maps
of other approaches albeit in a different representation. Secondly, at present, FSD
requires a human driver be available and responsible for taking control of the vehicle
at all times. Therefore, while it aspires to attain true autonomy, it remains to be seen
if the current approach will achieve that. Finally, since FSD is proprietary software,
it isn’t known exactly how it functions or the level of detail included in the maps it
relies upon. Thus, an accurate comparison of FSD to the current thesis isn’t possible.

To fully grasp the burden of creating HD prior maps, suppose you were planning
to deploy an autonomous vehicle and were tasked with building the mapping and
localization pipeline. First, you must manually pilot the vehicle through the region
of interest to collect mapping data. Since the vehicle sensors are sometimes occluded
(e.g. while driving near a truck or bus) multiple passes are often needed on every road
to obtain complete mapping data. Next, a SLAM flavor of choice is used to build a
consistent 3D map by estimating the precise vehicle trajectory during the mapping
run. Since these can sometimes lead to inconsistent results, humans are often used “in
the loop” to manually check and reject incorrect results or “loop closures” that result
from the algorithm incorrectly matching the same scene viewed at two different times.
Next, annotations are added to label the various semantic features in the environment,
from the precise positions of curbs and lane lines to traffic lights and intersections.
This process is once again very demanding on the human annotators who are often
relied upon to generate or check these annotations. Finally, the generated HD map,
which is typically thousands of GB even for a single city, must be loaded onto the
vehicle to enable autonomy only in that region.
Consider scaling this operation to much larger regions. The cost of building HD
maps can be prohibitive in all but the most densely populated regions. Since the maps
will be too large to store locally, high-throughput data connections may be required
to download regions on the fly. Any minor construction or lane-line repainting could
require an area be remapped - also known as map maintenance. This maintenance is
particularly challenging because as new regions are mapped, the number of maps that
must be maintained grows quadratically. Finally, if the appearance of the environment
changes, for example due to rain, snow, or fog occluding the map landmarks, the map
can be rendered useless leaving the vehicle stranded.
Now contrast this with the pre-requisites for a typical human driver. A person
usually requires nothing but a standard definition (SD) symbolic street map to com-
prehend the required navigational steps to reach the desired destination. These street
maps are readily available on standard GPS devices or for download online. In fact,
a digital SD map of the entire globe can fit on a single 50 GB thumb drive. This
map is sufficient for human drivers to navigate in a variety of dynamic weather and
lighting conditions, and doesn’t need updating unless major road construction occurs.
This results in a map lifetime that often reaches many years. Moreover, even in the
event that a human fails to localize and becomes “lost”, people are fully capable of
continuing to drive safely even without any map at all, albeit without necessarily
making progress toward their destination.
The Society of Automotive Engineers (SAE) has standardized autonomous vehicle
capabilities into five levels [115].

• Level 1 refers to vehicles that are fully manual - driven by a human operator at
all times.

• Level 2 vehicles have partial automation of both steering and acceleration of


the vehicle, but require human supervision at all times.

• Level 3 is conditional automation which means the vehicle can be fully respon-
sible for the driving task at times, but requires the human operator be ready to
resume control at the request of the conditionally autonomous system.

• Level 4 vehicles have a high degree of autonomy and are capable of unsupervised
autonomous navigation within a specified Operational Design Domain (ODD)
which could constrain the vehicle to specific locations and/or weather condi-
tions.

• Level 5 systems are fully autonomous and expected to be capable of safely


operating the vehicle in all locations and under all conditions that a human
driver could.

All fully-autonomous systems currently deployed are level 4 or below. While these
systems may appear deceptively close to level 5, in fact, there is a large gap of as-yet
unsolved problems before achieving the final level of autonomy. This author believes
that one of the primary reasons for this gap is the heavy reliance of level 4 systems
on HD prior maps. While these maps are extremely valuable for simplifying the
problem of autonomous navigation for a small testing pilot or demo, the laborious
HD mapping process outlined above will be difficult or impossible to scale to the vast
areas and dynamically changing environments required for level 5. There has to be a
better way.
Vision: In this thesis, we envision a future in which robots can navigate complex
environments as easily and flexibly as humans. Whether it be a road network, a
factory floor, or a damaged nuclear power plant, robots will use simple symbolic
priors such as street maps or floorplans to correctly find their way to their desired
destination. This will be enabled by a paradigm shift in the localization and mapping
framework. No longer will dense HD prior maps be a pre-requisite for deploying a
robot in a complex environment. Instead, onboard sensors will be used to handle
perceiving and interpreting the environment, in real-time, to provide the planning
and control modules with the necessary information needed to guide the robot to its
goal. The SD prior maps will be needed only for high-level symbolic reasoning such
as deciding which direction to take at an intersection - not for recording the precise
geometry of the lanes it must traverse.
It should be noted that the approach we propose here is diametrically opposed to
the typical HD map-based approach by advocating for an almost exclusive reliance
on online perception with very sparse prior maps. However, these extremes can exist
on a continuum. Depending on the requirements of a particular application, more
detailed data could be recorded in the map which would ease the burden on the online
perception at the cost of increasing the cost of building and maintaining the prior
map. We call this paradigm “Dialable Navigation” 1 which indicates that one could
“dial” how much of the navigation task is borne by the online perception by choosing
what to include in the prior map. In this thesis, we focused on turning the dial as far
as possible toward the online perception side of the continuum to demonstrate what
is achievable. Throughout this thesis, we will indicate design decisions where this
dial could be adjusted, and we leave it to the individual designer to choose the most
appropriate setting for their application. By leveraging the techniques we will outline
in the remainder of this thesis, we hope robots will one day be as capable as humans
of quickly acclimating to new surroundings and navigating with ease in complex and
rapidly changing environments.

1.2 Challenges

HD maps, where available, provide a valuable prior that drastically simplifies the
problem of safe autonomous navigation. With nearly perfect advance knowledge of
the environment, the downstream planning and control modules can simply look up
the necessary environment geometry to plan and execute the desired trajectory to
guide the vehicle to its destination. This leaves the perception system to focus ex-
clusively on the dynamic actors in the environment such as other vehicles, bicyclists,
and pedestrians. Detection, tracking, and prediction modules will be required to
understand these dynamic agents and ensure the vehicle interacts safely with them.
Moreover, even these modules can benefit greatly from an HD prior map. For exam-
ple, knowing that another vehicle is located in an adjacent lane can aid in predicting
its future trajectory. Similarly, pedestrians observed near crosswalks require much
more caution than those walking on sidewalks. Having all of these features annotated
in the HD map can greatly simplify these downstream modules.
On the other hand, requiring an HD prior map adds considerable challenges to
the deployment of autonomous vehicles. In particular, scaling systems beyond simple
testing pilots toward SAE level 5 can be impossible when building and maintaining
HD maps is necessary. In this section, we will outline some of the key challenges with
HD maps that could be reduced using the approaches described next.

1.2.1 HD Map Creation

One of the primary challenges with relying on HD maps is the burden of map creation
prior to deployment. In this paradigm, an autonomous vehicle cannot operate in a
region that hasn’t been mapped. However, in order to create the map, it must be
driven throughout the region to collect map data. For this reason, human drivers are
typically employed to pilot the mapping vehicle throughout every road in the desired
region. Since every part of the environment must be recorded by the vehicle sensors in
order to build a complete map, multiple passes on the same roads are often necessary
to fill in any gaps caused by occlusions. This can be an expensive and time-consuming
process requiring skilled operators to understand the mapping process and pilot the
vehicle in the correct routes to obtain a complete mapping dataset.
Even after the data for a region is collected, the map creation process is not
complete. First, a SLAM algorithm is used to stitch together the multiple mapping
runs into a single cohesive map by simultaneously estimating the vehicle trajecto-
ries. Typical approaches, such as factor graph optimization [39], are computationally
expensive as they grow exponentially with the size of the map. Furthermore, they
can be sensitive to incorrect loop closures which can cause the resulting map to be
inconsistent. For this reason, humans are often used in-the-loop to detect and delete
spurious loop closures caused by incorrect data associations. This makes the map
creation process not only computationally expensive but also human labor intensive
as well.
Finally, after a globally consistent 3D map is successfully generated, there is still
an annotation phase where all of the important elements in the map are labeled. While
this process can be partially automated, this is commonly the most labor-intensive
part of the mapping process requiring large teams of annotators to painstakingly
label detailed features such as road boundaries, traffic light locations, crosswalks, etc.
All of this is a prerequisite to the vehicle operating autonomously which completely
precludes the possibility of automating the mapping process itself.

1.2.2 Map Storage and Transmission

In addition to the intense labor required to build the map, the final HD map product is
extremely large due to the detailed data encoded within. For example, a commercial
vendor for HD maps offers a map of San Francisco, CA containing over 4,000 GB
of data. In contrast, the SD maps typically used by humans to navigate can be
downloaded for free from OpenStreetMap.org and the entire globe could fit on a
50 GB flash drive.
Considering that San Francisco only contains approximately 1,000 miles of roads,
while the United States contains over 4 million miles, it is clear that storing HD
maps onboard the vehicle would quickly become intractable. Instead, it has been
proposed that autonomous vehicles relying on HD maps would download new maps
as they near the edges of the current region, and delete old maps that are no longer
needed. However, this raises the challenge of providing high-speed data connections
to allow for the transfer of such large HD maps. Returning to the example 4 TB
map of San Francisco, current 4G connections can average approximately 50 Mbps
yielding a download time of roughly 180 hours for just one city. Even if the newer 5G
networks currently being rolled out reduce this by an order of magnitude, the time and
cost of transmitting these maps will still be prohibitive.
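
As a back-of-the-envelope check on these figures (assuming the 4,000 GB and 50 GB
sizes quoted above, rates in megabits per second, no protocol overhead, and a nominal
500 Mbps for 5G), the transfer times are easy to reproduce:

```python
# Back-of-the-envelope transfer times for the map sizes discussed above.
# Assumes 1 GB = 8,000 megabits, sustained link rates, and no protocol overhead.

def download_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to transfer size_gb gigabytes over a rate_mbps (megabit/s) link."""
    return size_gb * 8_000 / rate_mbps / 3_600

hd_city_gb = 4_000    # HD map of San Francisco (vendor figure cited above)
sd_globe_gb = 50      # SD map of the entire globe (OpenStreetMap extract)

print(f"HD city over 4G (50 Mbps):   {download_hours(hd_city_gb, 50):6.1f} h")   # ~177.8 h
print(f"HD city over 5G (500 Mbps):  {download_hours(hd_city_gb, 500):6.1f} h")  # ~17.8 h
print(f"SD globe over 4G (50 Mbps):  {download_hours(sd_globe_gb, 50):6.1f} h")  # ~2.2 h
```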

On the other hand, the SD maps used by human drivers are often pre-loaded on
many vehicles or GPS devices, so an entire country or continent can be stored locally
with no map transmission required. Furthermore, downloading maps while driving can
be accomplished on even slow 3G network connections because the map data is so small
relative to the rate at which a driving vehicle would consume it. This raises the question:
should an autonomous vehicle really need such a detailed level of prior knowledge of
its environment?

1.2.3 Map Maintenance

Perhaps an even larger issue than map creation, map maintenance is a monumental
challenge when scaling up HD map based autonomous driving pipelines. We live in
an ever-changing world and paradoxically, the fact that HD maps contain so much
detail is precisely the reason they are so hard to maintain. A map which contains the
lane boundaries must be updated every time the road is painted. On the other hand,
an SD map which only contains road centerlines might not need updating unless an
entire road is created or destroyed.

In one telling example of this problem, the author was recently riding in an au-
tonomous vehicle undergoing a pilot. The safety driver had to take over near an
intersection because the traffic lights had been replaced, causing the bulbs to shift by
a short distance. Since the HD map included the precise positions of each traffic
light, the vehicle could not navigate the intersection until the traffic light positions
were manually updated. Of course, this particular example could be mitigated by
instead using onboard sensors to detect the lights. However, this starkly illustrates
the drawback of including precise details in a prior map that must then be maintained
for posterity.

1.2.4 Perceptual Aliasing

Another challenge with HD maps is caused by the way in which localization takes
place. Typically, features in the environment are automatically selected by the SLAM
frontend and used to associate landmarks with particular scenes in the map. It is
imperative that these features be both stable and unique if the robot is to successfully
localize when returning to the same place. This works well in dense urban areas where
there is a lot of structure such as road signs, lane lines, and buildings that are both
permanent and unique in each place. However, consider a rural road winding through
a forest with little structure. One could drive for miles with nothing but trees or
farms alongside the road. Many of the features detectable in such as setting are
both temporary (as the foliage changes with the seasons) and suffer from perceptual
aliasing - the condition that arises when multiple locations appear the same to the
localization system leading to ambiguity as to where the robot is in the map.
The fact that rural areas are challenging for HD mapping solutions is particularly
unfortunate considering the demographic that autonomous vehicles can serve. While
the populations in cities have a multitude of transportation options including trains,
buses, and taxis, people living in rural areas often have no ability to travel unless they
own a personal vehicle and are physically capable of driving. Access to a personal
transportation option can be the only way a person living in a rural area can access
basic needs such as obtaining food and healthcare. Therefore, the fact that most
current autonomous vehicles will struggle in these areas means they will be unable to
help precisely the populations who stand to benefit the most from these technologies.

1.2.5 Inclement Weather

The last major challenge faced by autonomous vehicles relying on HD maps is caused
by inclement weather conditions. Fog, rain, or snow can all occlude the landmarks
recorded in the HD map which are used for localization. To be sure, humans are not
immune from the challenges of driving in inclement weather conditions. However,
human drivers are capable of enormous flexibility in perceiving and understanding

Figure 1-2: In this thesis, the challenges associated with autonomous navigation relying
on HD prior maps (HD map creation, storage and transmission, map maintenance,
perceptual aliasing, and inclement weather) are addressed by methods along two main
thrusts: 1) MapLite (MapLite and MapLite 2.0) and 2) Localizing Ground Penetrating
Radar (LGPR, LGPRNet, and the GROUNDED dataset).

the environment. For example, a person might infer the shape of a snow-covered road
from a slight change in height of the snow at the borders, from the lights of another
vehicle leading the way, or even from the tire tracks left by another vehicle passing
before. On the other hand, HD map-based solutions require matching exactly the
same landmarks that were visible when the map was recorded in order to successfully
localize. If these landmarks are covered or appear differently due to weather, the
entire solution will fail.
In the last few sections we have outlined some of the serious challenges faced by
autonomous vehicles relying on HD maps for localization and navigation. In the next
section, we will give an overview of our proposed methods and show how they can
mitigate or eliminate each of these issues.

1.3 Approach

In this thesis, we propose the use of onboard sensors to enable autonomous navigation
without relying on detailed HD prior maps. As seen in Fig. 1-2, the proposed methods
are organized in two main topics: MapLite and Localizing Ground Penetrating Radar
(LGPR). Each of these addresses some of the challenges with HD maps outlined in
the last section.

The MapLite approach uses the same SD prior maps designed for humans to en-
able autonomous navigation by leveraging onboard sensors to detect and segment the
road in real-time. The SD prior is used only as a navigational cue to determine the
correct direction to navigate at intersections. A Bayesian estimation framework is
used to integrate consecutive measurements over time and build a maximum likelihood
estimate of the pose of the vehicle in the SD map. In MapLite 2.0, the onboard
sensors are further leveraged to build an online HD map-like representation which is
necessary for navigating in more complex urban scenarios. This enables the vehicle
to navigate in new environments by downloading the SD prior on-the-fly without re-
quiring the manual map creation phase typical of HD mapping pipelines. Similarly,
the reliance on only SD maps greatly reduces the storage size which addresses the
challenges with storing and transmitting very large HD map data. The map mainte-
nance challenge is also greatly ameliorated because the SD prior maps have a much
longer lifespan than HD maps since they only require an update when a road is cre-
ated or destroyed. Lastly, the challenge with perceptual aliasing is reduced because
the MapLite framework doesn’t require precise localization for safe navigation. For
example, if perceptual aliasing causes the estimate to drift longitudinally along a
stretch of rural road, the vehicle can continue operating safely. Once an intersection
or other unique feature comes into view (i.e., the perceptual aliasing is resolved), the
Bayesian update will correct for the previous ambiguity.
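
To make the Bayesian update described above concrete, the sketch below is a minimal
one-dimensional histogram filter over arc-length along an SD road centerline, with
hypothetical numbers; it illustrates the drift-then-correct behavior described here, not
the actual MapLite estimator.

```python
import numpy as np

# Minimal 1-D histogram (Bayes) filter over arc-length position s along an
# SD-map road centerline, discretized into 1 m cells. Purely illustrative.
cells = 400                       # a 400 m stretch of mapped road
belief = np.zeros(cells)
belief[0] = 1.0                   # pose initialized once (e.g. from GPS)
intersection_at = 250             # intersection location in the SD map (m)

def predict(belief, ds, sigma=4.0):
    """Shift the belief by odometry ds and blur it to model accumulating drift."""
    shifted = np.roll(belief, int(round(ds)))   # roll wraps at the ends; harmless here
    kernel = np.exp(-0.5 * (np.arange(-15, 16) / sigma) ** 2)
    blurred = np.convolve(shifted, kernel / kernel.sum(), mode="same")
    return blurred / blurred.sum()

def update_on_intersection(belief, p_hit=0.9, p_false=0.05):
    """Bayes measurement update when an 'intersection in view' detection fires."""
    likelihood = np.full(cells, p_false)        # false-positive rate elsewhere
    likelihood[intersection_at - 5 : intersection_at + 5] = p_hit
    posterior = belief * likelihood
    return posterior / posterior.sum()

def belief_std(belief):
    s = np.arange(cells)
    mu = (s * belief).sum()
    return np.sqrt((((s - mu) ** 2) * belief).sum())

# Drive 250 m of featureless road on odometry alone: the belief drifts and spreads.
for _ in range(25):
    belief = predict(belief, ds=10.0)
print("std before the intersection: %.1f m" % belief_std(belief))       # ~20 m
# Once the intersection comes into view, a single update collapses the ambiguity.
belief = update_on_intersection(belief)
print("MAP estimate: %d m, std: %.1f m" % (np.argmax(belief), belief_std(belief)))
```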

On the other hand, the challenge posed by inclement weather is not resolved by
the MapLite system. Since it relies on onboard sensors perceiving the scene and seg-
menting the important road features, inclement weather could obscure these features
and prevent them from being correctly detected. Instead, for this challenge we pro-
pose the use of an LGPR based mapping framework which uses landmarks beneath
the road for map building. With LGPRNet we develop a learning architecture specif-
ically designed to enable localization in inclement weather by utilizing cross-weather
examples in the training data.
While LGPR maps are not quite as small as SD maps, our analysis shows that
they are smaller than typical HD maps of a similar size region. Furthermore, the
issue with maintaining HD maps is largely mitigated by using LGPR maps since our
experiments have shown that the underground features are stable over many months.
The challenge of perceptual aliasing is likewise not an issue with LGPR localization
as the underground features provide a unique set of landmarks even in rural areas
where the surface lacks structure.
One apparent drawback of the LGPR approach is the requirement to manually
collect LGPR maps before deploying since LGPR data is not readily available like
street maps. In this way, they more closely resemble HD maps which also require a
manual mapping phase. However, it should be noted that these two systems could
be used together in order to achieve a combined system with the benefits of both
which solves all of the challenges we have raised. In particular, a MapLite based
system could be used to explore large areas in fair weather when the onboard surface
sensors can successfully perceive the scene. Simultaneously, the LGPR sensor could
be collecting mapping data fully autonomously. Then, in the event snow, rain, or fog
obscures the road, the LGPR sensor could localize to the previously collected LGPR
maps to enable safe autonomy in these harsh weather conditions that even pose a
challenge to human drivers.

1.4 Contribution
This thesis includes contributions in the areas of core algorithms, autonomous sys-
tems, and experiments as follows.

1.4.1 Algorithms

1. The MapLite algorithm, described in Chapters 3–4, introduces a novel method
to fuse sparse prior information about an environment with uncertain onboard
perception to infer the maximum likelihood estimate of the environment state
online.

2. A new HD map representation, the MapLite Prior, parameterized with respect
to an SD prior map, is presented in Chapter 4. Furthermore, we
show how to initialize these parameters to include uncertainty-aware estimates
of the HD map using only an SD prior. This enables the MapLite algorithm to
estimate rich HD map models online.

3. The LGPRNet Convolutional Neural Network architecture adapts an Inception
network pre-trained for image classification to instead learn to localize a target
frame within a sample of LGPR data. In Chapter 5 we
demonstrate that the learned localization is more robust to changing weather
conditions than the previous state-of-the-art.
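
As a purely schematic companion to the LGPRNet contribution above, the snippet
below shows the kind of input/output structure such a localizer has: a small CNN
with rectangular kernels (reflecting the tall, narrow LGPR frames) that regresses the
2-D offset of a query frame relative to a map window. All sizes, channel counts, and
layers here are hypothetical; the actual LGPRNet builds on Inception blocks as
described in Chapter 5.

```python
import torch
import torch.nn as nn

class TinyLGPRLocalizer(nn.Module):
    """Schematic CNN that regresses the 2-D offset of a query LGPR frame
    relative to a map window. Not the actual LGPRNet; kernels, channel counts,
    and input sizes are hypothetical. Rectangular kernels reflect the tall,
    narrow aspect ratio of LGPR frames."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=(7, 3), padding=(3, 1)), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, kernel_size=(5, 3), padding=(2, 1)), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 4 * 4, 2)   # (along-track, cross-track) offset

    def forward(self, map_window, query):
        # Stack the map window and the query frame as two input channels.
        x = torch.stack([map_window, query], dim=1)
        return self.head(self.features(x).flatten(1))

# Example with hypothetical frame sizes: 128 depth bins x 20 along-track samples.
model = TinyLGPRLocalizer()
offset = model(torch.randn(4, 128, 20), torch.randn(4, 128, 20))
print(offset.shape)    # torch.Size([4, 2])
```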

1.4.2 Systems

1. The MapLite system enables point-to-point navigation on road networks using


only an SD prior map. In this system, even the GPS is used only during initial-
ization after which navigation is accomplished solely using on-board perception
for local planning and the SD prior map for high-level navigation decisions. We
also find in an ablation study that redundancy can be achieved by the over-
lapping responsibilities of the system components. For example, errors in the
perception or inaccuracies in the SD prior can be mitigated by the local planner.

2. We design a multi-modal perception pipeline for the MapLite system described


in Chapter 4 which generates a variable resolution stack of feature images by
aggregating six camera streams and four lidar sensors. This approach allows
the MapLite algorithm to be compatible with an arbitrary number of sensors
and sensing modalities provided they can observe the features necessary for
inference.

3. The LGPR based localization system is the first deployment of this sensing
modality for automating civilian vehicles. This system includes an EKF for
fusing odometry with the LGPR localization estimates and a pure-pursuit path
tracking algorithm for following trajectories defined by underground features,
without any surface-facing sensors (a minimal pure-pursuit sketch follows this
list).

4. The GROUNDED dataset includes over 100 driving sequences with LGPR data
along with camera, lidar, and groundtruth GPS positioning information. Since
the LGPR sensor is not yet widely available, we hope this dataset will enable
the research community to collaborate on this new and exciting sensing modality
that can enable autonomy in the most challenging ambient weather conditions.
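
For reference, the pure-pursuit tracker mentioned in item 3 steers by arcing toward a
point roughly one lookahead distance ahead on the goal path. A minimal sketch
follows; the lookahead, wheelbase, and kinematic bicycle-model assumptions are
illustrative rather than the tuning used on the vehicle.

```python
import math

def pure_pursuit_steering(pose, path, lookahead=6.0, wheelbase=2.7):
    """Steering angle (rad) that arcs the vehicle toward the first waypoint on
    `path` at least `lookahead` metres away. pose = (x, y, heading); path is an
    ordered list of (x, y) waypoints ahead of the vehicle. Kinematic bicycle
    model; the lookahead and wheelbase values here are illustrative."""
    x, y, heading = pose
    target = path[-1]                       # fall back to the final waypoint
    for wx, wy in path:
        if math.hypot(wx - x, wy - y) >= lookahead:
            target = (wx, wy)
            break
    dx, dy = target[0] - x, target[1] - y
    lateral = -math.sin(heading) * dx + math.cos(heading) * dy  # target y in vehicle frame
    curvature = 2.0 * lateral / (dx * dx + dy * dy)             # pure pursuit arc
    return math.atan(wheelbase * curvature)                     # front-wheel angle

# Example: a straight goal path offset 1 m to the vehicle's left yields a small left turn.
goal_path = [(s, 1.0) for s in range(0, 40, 2)]
print(round(pure_pursuit_steering((0.0, 0.0, 0.0), goal_path), 3))   # ~0.15 rad
```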

1.4.3 Experiments

1. In order to evaluate the MapLite navigation system we conducted real-world


trials on a full-scale autonomous Toyota Prius. These trials included over 15 km
of driving where the vehicle had to make navigation decisions at intersections
on the fly using the SD prior map. All the trials also included an RTK-GPS
system for groundtruth evaluation. An ablation study was also conducted to
learn the effect of individual components on system performance. The system
was also deployed on a simulated vehicle in the CARLA simulator to evaluate
more complex road network topologies.

2. The HD map inference component of the MapLite 2.0 system was evaluated
using data collected from a fully-autonomous vehicle operating on public roads
in Michigan and California in collaboration with the Toyota Research Institute.
Our experiments included 100 driving sequences, each of which encompassed a
single intersection traversal. The HD map inferred online using the MapLite
framework was compared with an HD map prepared in the traditional way
which included manual labeling.

3. To evaluate our LGPR solution enabling autonomous navigation in inclement


weather we deployed the sensor on a full-scale autonomous vehicle for real-world
testing. We conducted experiments in clear weather, rain, and snow, along with
tests in urban environments, highway driving, and on rural roads, to assess the
performance in a wide variety of conditions that are challenging for vision and
lidar sensors. In these tests, the vehicle relied solely on the LGPR sensor for
navigation and did not use any surface facing sensors. We found that the LGPR
sensor is capable of guiding a vehicle along a trajectory marked by underground
features and demonstrated that the LGPRNet algorithm was more robust to
inclement weather conditions than the previous state-of-the-art scan matching
method.

1.5 Thesis Outline

The remainder of this thesis is organized as follows:


Chapter 3: MapLite: Autonomous Navigation with SD Prior Maps
In Chapter 3, we introduce MapLite, a framework for enabling autonomous navigation
using only an SD prior map. The key insight developed here lies in the bifurcation
of the problem domain into a) the symbolic domain, where the SD map information
is incorporated, and b) the vehicle frame, where local metric planning takes place
using the onboard sensors. The map registration module fuses information between
these domains by maintaining a Bayesian maximum a posteriori estimate of the
robot's location in the SD map, which allows a local reference path to be generated
in the vehicle frame that guides the robot toward the goal selected in the SD map.
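
As a toy illustration of the registration idea (aligning the noisy SD-map geometry
with what the onboard sensors perceive), the snippet below computes a rigid 2-D
alignment using the standard Kabsch/SVD solution. It assumes known point
correspondences and is not the thesis's probabilistic topometric registration; it only
shows the flavor of the geometric problem being solved.

```python
import numpy as np

def rigid_align(map_pts, sensed_pts):
    """Least-squares rotation R and translation t mapping map_pts onto
    sensed_pts (Kabsch algorithm). Both arrays are Nx2 with rows paired."""
    mu_m, mu_s = map_pts.mean(axis=0), sensed_pts.mean(axis=0)
    H = (map_pts - mu_m).T @ (sensed_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t

# Toy example: an SD centerline that is rotated/offset relative to the road
# centerline extracted from the onboard road segmentation.
sd_centerline = np.stack([np.linspace(0, 30, 16), np.zeros(16)], axis=1)
theta, offset = 0.05, np.array([1.5, -2.0])       # hypothetical misalignment
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
sensed = sd_centerline @ Rot.T + offset

R, t = rigid_align(sd_centerline, sensed)
residual = np.linalg.norm(sd_centerline @ R.T + t - sensed, axis=1).max()
print(f"max residual after registration: {residual:.2e} m")   # ~0
```
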
We implement this approach on a full-size autonomous vehicle and evaluate it in
extensive real-world testing. We find that the vehicle can successfully complete point-
to-point navigation without using either an HD prior map or even a GPS localization
beyond the initialization. This demonstrates that the vehicle can successfully navigate
a novel environment using only the onboard sensors and an SD prior map in a manner
inspired by human drivers.
Chapter 4: MapLite 2.0: Online HD Map Inference
In Chapter 4, we extend MapLite to infer the HD map online. While the previous
chapter dealt with navigating without using an HD map at all, that approach is most
suitable for unstructured rural environments where the required planning behaviors
can be inferred directly from the environment (e.g. the need to navigate only on
roads). However, in more structured road environments, an HD map-like representation
can be useful for framing the semantic understanding of the scene even if it is not
required to be created in advance. For example, the hierarchical representation of
roads containing multiple segments, which in turn contain lanes, etc., is a useful
representation for trajectory planning.
To this end, we introduce MapLite 2.0 which generates a novel parametric HD
map representation directly from the SD prior on-the-fly. The parameters of this
map are then updated online using onboard sensors, once again leveraging a Bayesian
approach to maintain the most likely posterior of the map parameter values that fit
with the observed features. This online HD map-like estimate can be used in a similar
manner as typical HD maps without all of the cost of creating and maintaining this
information.
Chapter 5: Localizing Ground Penetrating Radar
In Chapter 5, we turn our attention to the challenge of inclement weather inter-
fering with traditional HD map-based localization. Since the previously introduced
MapLite system relies on onboard sensors, it too could be susceptible to problems
that require a good view of the road surface. Instead, we introduce a novel Localizing
Ground-Penetrating Radar sensor which measures unique landmarks beneath the road
for localization. While this method does require creating
the map in advance, the maps have several benefits over HD surface maps. Firstly,
they are much less susceptible to inclement weather due to the fact that the land-
marks are buried beneath the road 1-3m below the surface. Secondly, LGPR maps
are quite stable and less likely to require maintenance than surface based maps which
can change frequently. Finally, LGPR maps are smaller in storage size than typical
HD maps and can be more easily stored or transmitted to the vehicle on-the-fly.
In this chapter we also introduce LGPRNet, a learning based solution to the
LGPR localization problem that is more resilient to changing weather conditions.

Compared to our initial correlation-based method, which suffers from some degrada-
tion in particular during rain, the learning-based approach is less susceptible to this
effect. We hypothesize that including mixed-weather examples in the training data
allowed LGPRNet to learn, during training, to model the effect that changing soil
moisture content has on the LGPR signal and to compensate for it automatically.
Chapter 6: The GROUNDED Dataset
In Chapter 6, we introduce GROUNDED, a localizing ground-penetrating radar
dataset. Because the LGPR sensor is custom built, and not yet commercially avail-
able, we chose to build and distribute a large-scale dataset of LGPR data to ensure
the community could collaborate, and replicate our work. To that end, we collected
over 100 individual runs with LGPR data in a variety of different weather conditions
and environments.
We also present two benchmark challenges to the community to spur innovation
and provide a level playing field on which to evaluate different approaches. The first
is the Localization in Weather Challenge which evaluates how resilient a localization
approach is to changing weather conditions. The second is the Multi-lane Mapping
Challenge which addresses the issue that LGPR scans are typically only a single
lane wide and novel techniques will need to be developed to stitch these together to
construct large-scale coherent maps of the underground topology.
Chapter 7: Conclusion
Finally, in Chapter 7, we end with a discussion of our central thesis, the lessons
learned while developing and carrying out the work, and the implications of our
results. We conclude with some remarks regarding possible directions for future
research.

1.6 Relevant Publications


The major components of this thesis appeared in the following publications.
Chapter 3: MapLite: Autonomous Navigation with SD Prior Maps

1. Ort, T., Paull, L. and Rus, D., 2018, May. Autonomous vehicle navigation in

rural environments without detailed prior maps. In 2018 IEEE international
conference on robotics and automation (ICRA) (pp. 2040-2047). IEEE.

2. Ort, T., Murthy, K., Banerjee, R., Gottipati, S.K., Bhatt, D., Gilitschenski,
I., Paull, L. and Rus, D., 2019. Maplite: Autonomous intersection navigation
without a detailed prior map. IEEE Robotics and Automation Letters, 5(2),
pp.556-563.

Chapter 4: MapLite 2.0: Online HD Map Inference

1. Ort, T., Walls, J.M., Parkison, S.A., Gilitschenski, I. and Rus, D., 2022. MapLite
2.0: Online HD Map Inference Using a Prior SD Map. IEEE Robotics and Au-
tomation Letters.

Chapter 5: Localizing Ground Penetrating Radar

1. Ort, T., Gilitschenski, I. and Rus, D., 2020. Autonomous navigation in in-
clement weather based on a localizing ground penetrating radar. IEEE Robotics
and Automation Letters, 5(2), pp.3267-3274.

Chapter 6: The GROUNDED Dataset

1. Ort, T., Gilitschenski, I. and Rus, D., 2021, July. GROUNDED: The Localiz-
ing Ground Penetrating Radar Evaluation Dataset. In Robotics: Science and
Systems (Vol. 2).

Chapter 2

Related Work

This thesis builds on important work in the areas of map-based autonomous navi-
gation, mapless navigation, automated mapping, and topological or topometric lo-
calization and navigation. In the following sections, we briefly describe the primary
works in each of these areas and how this thesis relies on and expands upon them.

2.1 Map-based Autonomous Navigation


Most of the existing approaches to autonomous driving rely on detailed prior maps
of the environment, often constructed using precision laser scanners. Typical driving
strategies on pre-built maps involve some form of localization: lidar-based [72, 75, 133]
or vision-based [134], followed by local trajectory and motion planning. Evidently,
such approaches need extremely precise prior maps, and cannot easily adapt to envi-
ronment changes (e.g., repaved roads, detours, etc.). Furthermore, such approaches
are not suitable for driving beyond confined areas such as cities/towns.

2.2 Mapless Navigation


To overcome some of the scalability issues with map-based autonomous driving, there
have been several attempts to go mapless (at least in part). Such approaches lever-
age environment structure for local perception. While a majority of such approaches

rely on color/grayscale images to detect structures such as roads [18, 135, 2], lane
markings[83, 3], and road boundaries [119], these suffer the common pitfalls of any
image-based segmentation algorithm: poor generalization across varying illumination,
scene dynamics, and infrastructural changes. A few other approaches [98] use lidar
scans to detect drivable regions. However, most such approaches [83, 3] make assump-
tions about the local structure of the environment (e.g. availability of lane-markings,
etc.).
Another popular set of approaches takes an end-to-end learning approach to map-
less driving. These are either based on imitation learning [4, 16, 30], or on re-
inforcement learning [64]. While such approaches alleviate the need for precision
maps, they are still laborious to implement, as they require enormous amounts of
data/demonstrations from a (human) expert.

2.3 Automated Mapping

Automatically generating HD maps in full or in part as an offline inference task is


an attractive problem for its ability to reduce manual supervision. [92] introduced
a method for estimating road and lane boundaries given a coarse road graph. They
formulated the inference task as an energy minimization problem where potentials
encoded aerial image appearance and prior knowledge, e.g., lane smoothness. [78]
designed a system to estimate crosswalk geometry given accumulated LIDAR intensity
BEV rasters and approximate intersection location determined from a coarse SD
map. The authors trained a network to provide input to a structured estimator
that determines a consistent configuration of crosswalk parameters. [58] proposed a
method for inferring lane boundaries for highway scenes given accumulated LIDAR
intensity BEV rasters. They structured their estimate as a directed acyclic graphical
model where each node in the graph encoded geometric and topological information,
and the nodes were iteratively explored to expand the graph. [140] developed a
lane graph estimation engine using vision and LIDAR data aggregated within a BEV
representation. Their method is capable of handling arbitrary intersection topologies,

but does not incorporate prior SD map information.

2.4 Topological/Topometric Localization and Navigation
Another class of approaches to autonomous driving uses topometric maps (predomi-
nantly OpenStreetMaps [54]). In a series of works, Ballardini et al. [8, 6, 7] leverage
the OSM for localization and intersection detection. In a similar vein, OpenStreet-
SLAM [50] aligns OSM data to local maps as an additional cue in a Monte-Carlo
localization framework. However, all these approaches demonstrate the use of OSM
in urban areas, where the precision of OSM is significantly higher than for rural ar-
eas. In contrast, our approach can efficiently deal with missing roads, incorrect
intersection annotations, and large deviations in road shapes.

Chapter 3

MapLite: Autonomous Navigation with SD Prior Maps

3.1 Overview
At present, the majority of industry-led approaches to autonomous driving involve ei-
ther building and maintaining detailed maps, or making structure assumptions about
the environment such as the existence of road markings on highways and major roads.
The latter approach is very appealing since building and maintaining maps is time-
consuming, expensive, and suffers from perceptual aliasing in some rural environ-
ments. However, road markings are not always present. In fact, only about 2/3 of
the roads in the United States are paved [103]. Additionally, these approaches based
on local infrastructure generally only enable road following and not navigation (i.e.,
traversing intersections to reach a goal). As a result, there is a significant portion of
the road network that cannot be easily used with state-of-the-art autonomous driving
systems.
One class of approaches addressing this problem is based on using neural network-
based driving approaches [4, 16, 30, 64] as they have demonstrated lane-stable driv-
ing without detailed maps. However, those approaches lack explainability, verifi-
able robustness, and require a high amount of training data to generalize. Early
approaches incorporating topometric map data for navigation [57] usually rely on

Figure 3-1: MapLite: A visualization of our system operating on a rural road in
Central Massachusetts. The white lines show the topometric map obtained from
OpenStreetMap.org. The colored pointcloud is obtained from the lidar scanner used
in MapLite to navigate without the need to localize on a detailed prior map. Note
that the noisy topometric map has been registered to the local sensor data.

building structure information and are thus not applicable to most rural settings.
[98] demonstrated lane-following behaviors without detailed metric maps. This ap-
proach required human-driver disengagements at intersections, owing to their complex
topologies and, thus, did not allow for fully autonomous point-to-point navigation.

In this chapter, we propose a "one-click" solution for navigation using only a weak
topological prior, such as OpenStreetMap (OSM), that is available for the entire
road network and significantly less memory-intensive than a full dense map. The
main advantage of this approach is that it requires only freely existing infrastructure
(the OSM) and therefore scales to rural environments without the need for any map
maintenance or updating of neural network models.

Our proposed method comprises three main parts: First, we segment the scene
using on-board sensors. Second, we perform a topometric registration to fit the OSM
to the local sensor data frame. Finally, we solve the navigation problem at the
topological level and then the path planning problem at the metric level. To the best
of our knowledge this is the first deployment of an autonomous navigation pipeline

that can autonomously plan and execute high-level navigation tasks without the need
for a precise georeferenced localization estimate either from high-precision GPS, or
highly detailed prior maps. In summary, we claim the following contributions of our
method:

1. A novel topometric map registration algorithm that approximates the maximum


a posteriori estimate of the registration between the vehicle and the map.

2. A route planning method we call "smart replanning" that allows for efficiently
solving the topological navigation problem on a frequently updating map.

3. Extensive evaluations on a full-scale autonomous car in a rural driving setting


where the roads lack structure and through simulations in CARLA [44] with
more complex road topologies such as traffic circles.

3.2 Related Work


In this section, we briefly review some closely related approaches.
Map-based autonomous driving. Most of the existing approaches to au-
tonomous driving rely on detailed prior maps of the environment, often constructed
using precision laser scanners. Typical driving strategies on pre-built maps involve
some form of localization: lidar-based [72, 75, 133] or vision-based [134], followed by
local trajectory and motion planning. Evidently, such approaches need extremely pre-
cise prior maps, and cannot easily adapt to environment changes (e.g., repaved roads,
detours, etc.). Furthermore, such approaches are not suitable for driving beyond
confined areas such as cities/towns.
Map-less autonomous driving. To overcome some of the scalability issues
with map-based autonomous driving, there have been several attempts to go map-
less (at least in part). Such approaches leverage environment structure for local
perception. While a majority of such approaches rely on color/grayscale images to
detect structures such as roads [18, 135, 2], lane markings[83, 3], and road boundaries
[119], these suffer the common pitfalls of any image-based segmentation algorithm:

poor generalization across varying illumination, scene dynamics, and infrastructural
changes. A few other approaches [98] use lidar scans to detect drivable regions. How-
ever, most such approaches [83, 3] make assumptions about the local structure of the
environment (e.g. availability of lane-markings, etc.).
Another popular set of approaches takes an end-to-end learning approach to map-
less driving. These are either based on imitation learning [4, 16, 30], or on re-
inforcement learning [64]. While such approaches alleviate the need for precision
maps, they are still laborious to implement, as they require enormous amounts of
data/demonstrations from a (human) expert.
Road segmentation. In contrast to mapped approaches, where the driveable
road surface is taken as a prior from the map, real-time road surface segmentation is a
crucial requirement for mapless navigation. The basic problem of road segmentation
is simple: for each element in the incoming sensor data, label it as either “road” or
“off-road”. Fig. 3-2 "Road Segmentation" shows an example pointcloud that has been
segmented showing the points lying on the road surface in black and the off-road
points in blue. This problem has been studied extensively in the literature including
model-based approaches such as [2] and those that rely on camera images [135, 136],
lidar [25], and combined vision and lidar [27, 24]. However, the MapLite scenario
imposes some unique constraints. First, since a major benefit of our independence
from prior maps is the ability to operate in large-scale rural regions where roads
are missing structures such as lane markings or curbs, we abstain from utilizing
these features in our approach. Furthermore, deployment on a real-world vehicle
necessitates fast processing time. Finally, rural roads are often unlit, and have highly
varying illumination even during the daytime due to seasonal vegetation. Therefore,
we prefer methods that rely solely on lidar since these sensors are immune to changes
in ambient lighting.
Here, we implement and compare three different model choices for our road seg-
mentation module: a linear Support Vector Machine (SVM) and two convolutional
networks: SparseConvNet (SCN) [38] and PointNet [106].
Topometric localization/registration. Another class of approaches to au-

Figure 3-2: The MapLite system differs from a typical autonomous navigation pipeline
in five components: Topological Map, Road Segmentation, Map Registration, Route
Replanning, and Motion Planning

tonomous driving uses topometric maps (predominantly OpenStreetMaps [54]). In


a series of works, Ballardini et al. [8, 6, 7] leverage the OSM for localization and
intersection detection. In a similar vein, OpenStreetSLAM [50] aligns OSM data to
local maps as an additional cue in a Monte-Carlo localization framework. However,
all these approaches demonstrate the use of OSM in urban areas, where the preci-
sion of OSM is significantly higher than for rural areas. In contrast, our approach
can efficiently deal with missing roads, incorrect intersection annotations, and large
deviations in road shapes.

3.3 MapLite Method

An overview of our proposed system is shown in Fig. 3-2. Note that our approach re-
quires only a topometric map, which does not contain the detail necessary for precise
localization. However, we demonstrate that it suffices for specifying the goal location
and onboard sensors can be used to plan safe trajectories on the fly. The following
subsections describe the five main components that enable this system to safely nav-
igate to the desired location without the need for a detailed map: Topometric Map,
Road Segmentation, Map Registration, Route Planner, and Motion Planner.

3.3.1 Topometric Map

The topometric maps used by MapLite are simple graph-like data structures with the
extension that each node is associated with a two dimensional point on the surface

of the earth described by its longitude and latitude. We utilize the flat-earth approx-
imation with the UTM transform [69] which places the vertices in a plane. Thus for
each map 𝑀 we have
$$M = \{V, E\}$$
where each vertex $v_i \in \mathbb{R}^2$ describes a waypoint and each edge $e_i \in E \subseteq V \times V$ describes a road segment. However, while the connectivity of the network can be assumed to be relatively correct, the same cannot be said for either the relative or global accuracy of the specific waypoints.
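To make this concrete, the following is a minimal Python sketch of such a graph-like topometric map. The class and field names are illustrative rather than the actual MapLite implementation, and the vertices are assumed to already be projected into a local metric (e.g. UTM) frame.

```python
from dataclasses import dataclass, field

@dataclass
class TopometricMap:
    """Minimal graph-like topometric map M = {V, E}.

    Vertices are 2D waypoints in a local metric frame (e.g. after a
    UTM projection of raw latitude/longitude values), and edges are
    index pairs denoting road segments between waypoints.
    """
    vertices: list[tuple[float, float]] = field(default_factory=list)  # v_i in R^2
    edges: list[tuple[int, int]] = field(default_factory=list)         # road segments

    def add_waypoint(self, x: float, y: float) -> int:
        self.vertices.append((x, y))
        return len(self.vertices) - 1

    def connect(self, i: int, j: int) -> None:
        self.edges.append((i, j))

# Illustrative use: a short road with one bend.
m = TopometricMap()
a = m.add_waypoint(0.0, 0.0)
b = m.add_waypoint(50.0, 0.0)
c = m.add_waypoint(80.0, 30.0)
m.connect(a, b)
m.connect(b, c)
```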

Topometric maps of the roads throughout the world are readily available from a
number of sources. For this thesis, we used maps downloaded from OpenStreetMap[54]
which provides open access for the public to both download and upload mapping data.
Fig. 3-1 shows an example of the topometric map (in white) used.

These topometric maps differ from the detailed maps typically used for localization
in a number of important ways. First, detailed maps typically need to be recorded
with a human driver prior to deployment and often require further labor-intensive
manual editing to ensure the maps are accurate [72]. For this reason, detailed maps
are currently available for only a handful of cities and even those are often only useful
for a particular autonomous vehicle sensor setup. Topometric maps on the other
hand, are already freely available for most roads throughout the world.

Another difference between topometric maps and detailed maps lies in the storage
size. While a detailed map used to localize over the 20,000 miles of roads in a small
city takes about 200GB [72], a similar topometric map could be stored in about
3.5GB. Considering the US alone contains over 4,000,000 miles of roadways, this can
have a large impact on the storage and data transmission required for autonomous
navigation.

Finally, since detailed maps include many transient surface features, any changes
to the road surface, seasonal vegetation, or building construction can render a detailed
map obsolete often in just a few months. Topometric maps, on the other hand, only
need updating when the road network itself changes which means these maps are

typically accurate for many years.

3.3.2 Road Surface Segmentation

We use a linear SVM model for road surface segmentation since its extremely fast
runtime is needed for real-time operation at speed, and its accuracy was close to that
of the much more performance-heavy CNNs (see section 3.4.2 for a comparison). Our
linear SVM relies on five features extracted from the lidar data. The first feature
is the elevation 𝑧 measured directly by the sensor. Next, a measure of the surface
texture is calculated using an approach similar to [102] using a sliding window over all points $p_i$ in the neighborhood $N_p$. Then the local variance $v_p$ for each point is calculated using
$$v_p = \frac{1}{|N_p|} \sum_{i \in N_p} |p_i - \bar{p}|^2$$
where $\bar{p}$ is the mean over all points in $N_p$. This feature yields a measure of the local
surface texture that is larger for rough surfaces (e.g. grass and trees) and smaller
on driveable surfaces. Next, the intensity of each return is also included to account
for the difference in reflectivity of road surfaces. The fourth feature is a unique laser
ID accounting for physical differences in each of the 64 transmitters in the HDL-64
sensor. The laser ID is included in the training to account for systemic bias between
the transmitters in a single device. Thus, new training data would need to be collected
if switching to a new sensor. Finally, an indicator feature in {0, 1} is used to indicate
any points that did not return. This is important as the presence of an out-of-range
value (either too far or too close) contains valuable information for segmentation.
These five features are used together to classify each point as either road or not road.
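As an illustration, the sketch below assembles these five features for each point and fits a linear SVM with scikit-learn. The neighborhood radius, the handling of non-returning points, and the use of a KD-tree radius query are assumptions made for the example; the thesis implementation may differ in these details.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.svm import LinearSVC

def point_features(xyz, intensity, laser_id, returned, radius=0.5):
    """Assemble the five per-point features: elevation, local variance,
    intensity, laser id, and a no-return indicator.

    xyz: (N, 3) lidar points; intensity, laser_id: (N,) arrays;
    returned: (N,) boolean array, False for points that did not return.
    """
    tree = cKDTree(xyz)
    variance = np.zeros(len(xyz))
    for i, nbrs in enumerate(tree.query_ball_point(xyz, r=radius)):
        pts = xyz[nbrs]
        variance[i] = np.mean(np.sum((pts - pts.mean(axis=0)) ** 2, axis=1))
    no_return = (~returned).astype(float)
    return np.column_stack([xyz[:, 2], variance, intensity, laser_id, no_return])

# Training (labels: 1 = road, 0 = off-road), then prediction on a new scan:
# clf = LinearSVC(C=1.0).fit(point_features(xyz, inten, ids, ret), labels)
# road_mask = clf.predict(point_features(xyz2, inten2, ids2, ret2)) == 1
```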

3.3.3 Topometric Registration

In the map registration step, we aim to obtain an estimate of the robot location on
the map using only odometry measurements and lidar scans. Formally, our goal is to

obtain the estimate
$$\hat{x}_t = \arg\min_{x_t} \left( -\ln P\left(x_t \mid Z^o_{1:t}, Z^L_{1:t}\right) \right) \tag{3.1}$$
where $x_t = [x_t, y_t, \theta_t]$ is the robot's pose in the topometric map frame at time $t$, while $Z^o_{1:t}$ and $Z^L_{1:t}$ are the sets of all odometry and lidar measurements up to time $t$, respectively. An approximation of this likelihood is required to allow for real-time operation.

Likelihood Approximation

Using Bayes’ rule, the probability in (3.1) can be represented in terms of its prior,
the lidar observation likelihood, and the odometry-based state prediction as

$$P\left(x_t \mid Z^o_{1:t}, Z^L_{1:t}\right) \propto_x P\left(Z^L_t \mid x_t\right) \cdot P\left(x_t \mid Z^o_{1:t}, Z^L_{1:t-1}\right) = P\left(Z^L_t \mid x_t\right) \int P\left(x_t \mid x_{t-1}, Z^o_t\right) \cdot P\left(x_{t-1} \mid Z^o_{1:t-1}, Z^L_{1:t-1}\right) \mathrm{d}x_{t-1}.$$

We approximate the prior as a discrete distribution concentrated at the last estimate, $\delta(x_{t-1} - \hat{x}_{t-1})$, which results in
$$P\left(x_t \mid Z^o_{1:t}, Z^L_{1:t}\right) \propto_x P\left(Z^L_t \mid x_t\right) \cdot P\left(x_t \mid \hat{x}_{t-1}, Z^o_t\right).$$

Lidar Observation Likelihood $P\left(Z^L_t \mid x_t\right)$

Each lidar scan $Z^L_t$ is represented as a set of $n$ 3-tuples $z^i = (x^L_i, y^L_i, l_i)$ where $x^L_i, y^L_i \in \mathbb{R}$ give the position of each measured lidar point in the sensor frame and $l_i$ is a binary classification label obtained in the previously described segmentation module. We define the signed distance function $f_D(z^i, x_t, M)$ representing the distance from the point $z^i$ to the nearest point on any edge in the topological map $M$ (assuming the vehicle is in location $x_t$). We model the probability of observing each point $z^i$ at location $x_t$ using a sigmoid function
$$P(z^i \mid x_t) = \begin{cases} \dfrac{1}{1 + \exp\left(f_D(z^i, x_t, M) - r_w\right)} & l_i = 1 \\ 1 - \dfrac{1}{1 + \exp\left(f_D(z^i, x_t, M) - r_w\right)} & l_i = 0 \end{cases} \tag{3.2}$$

where $r_w$ is a tunable parameter that represents the likelihood of finding points labeled road far from the map $M$, which contains road centerlines. Notice that at the road center, where $f_D(z^i, x_t, M) = 0$, we have $P(z^i \mid x_t) \approx 1$ if $l_i = 1$ and $P(z^i \mid x_t) \approx 0$ if $l_i = 0$, while far from the road the converse is true. Also, in this thesis a constant value was sufficient for $r_w$ since all the roads traversed were of similar width (5-7 m). However, given that the lane count and roadway class are available for most roads on OpenStreetMap, this parameter could easily be extended to depend on the road type. Finally, the overall probability of an entire lidar scan is computed as the product over the probabilities of the individual measurement tuples, $P\left(Z^L_t \mid x_t\right) = \prod_i P(z^i \mid x_t)$.
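For illustration, a direct numerical transcription of this observation model might look as follows. The distances $f_D$ are assumed to be computed elsewhere, and the value of $r_w$ is only a placeholder consistent with the 5-7 m road widths mentioned above.

```python
import numpy as np

def point_likelihood(f_d, label, r_w=3.0):
    """Sigmoid observation model of Eq. (3.2).

    f_d:   f_D(z_i, x_t, M), distance of each point to the nearest map edge.
    label: 1 for points segmented as road, 0 otherwise.
    r_w:   tunable road half-width parameter (placeholder value).
    """
    p_road = 1.0 / (1.0 + np.exp(f_d - r_w))
    return np.where(label == 1, p_road, 1.0 - p_road)

def scan_log_likelihood(f_d, labels, r_w=3.0, eps=1e-12):
    # The product over points is evaluated in log space for numerical stability.
    return float(np.sum(np.log(point_likelihood(f_d, labels, r_w) + eps)))
```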

Odometry-Based Prediction $P\left(x_t \mid \hat{x}_{t-1}, Z^o_t\right)$

Each odometry measurement $Z^o_t = [\Delta x_t, \Delta y_t, \Delta\theta_t]$ with respect to the previous pose is obtained by fusing both the wheel encoder information and the IMU measurements using an Extended Kalman Filter [88]. The odometry-based likelihood of the future state is then modeled as
$$P\left(x_t \mid \hat{x}_{t-1}, Z^o_t\right) \propto_x \exp\left(-\frac{\left\|x_t - \hat{x}_{t-1} + Z^o_t\right\|}{b}\right)$$
with scale factor $b$ that is obtained, for computational reasons, through parameter tuning rather than by estimating it along with the prior.

Implementation for Real-Time Operation

The main remaining driver of computational cost is evaluating $P\left(Z^L_t \mid x_t\right)$. This is due to the fact that $f_D(z^i, x_t, M) \in O(|E|)$ (with $|E|$ being the number of edges in $M$) and thus $P\left(Z^L_t \mid x_t\right) \in O(n \cdot |E|)$. To achieve a speed-up we employ a number

of further approximations. First, we discretize the space containing the map 𝑀 into
cells of size 𝑟𝑐 (0.5m for our experiments) and precompute the distance transform.
This computation only needs to happen once for each map, and in practice takes <1s
to compute for a map of size 1 km². This approximation turns distance computations
into a simple lookup with runtime independent of number of edges or scanned points.
Next, we randomly subsample the input lidar scans using two tunable parameters
𝑠 ∈ N+ , 𝑏 ∈ [0, 1] where 𝑠 is the number of points to sample, and 𝑏 is the desired
share of points from the road (i.e. where 𝑙𝑖 = 1) to account for high label-imbalance.
Finally, we simplify (3.2) by approximating

$$\left(1 + \exp\left(f_D(z^i, x_t, M) - r_w\right)\right)^{-1} \approx 1 - \min\left(\frac{f_D(z^i, x_t, M)}{r_w}, 1\right)$$

In practice, we found these approximations to have a negligible effect on system performance, while decreasing the time required to solve the problem from 10 s to 20 ms on a standard laptop with an Intel i7 processor.
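The sketch below illustrates how these approximations could fit together: a distance transform precomputed once per map, a grid lookup in place of an exact distance computation, the piecewise-linear replacement of the sigmoid, and a numerical search for the pose that minimizes the negative log-likelihood (the odometry prior term is omitted for brevity). The grid layout, optimizer, and constants are assumptions for the example rather than the exact MapLite implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.optimize import minimize

def build_distance_lookup(centerline_mask, cell_size=0.5):
    """Precompute, once per map, the distance (in meters) from every grid
    cell to the nearest rasterized road-centerline cell."""
    return distance_transform_edt(~centerline_mask) * cell_size

def neg_log_likelihood(pose, pts, labels, dist_grid, origin, cell_size=0.5, r_w=3.0):
    """Approximate -log P(Z_t^L | x_t) for a candidate pose [x, y, theta]."""
    x, y, th = pose
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    world = pts @ R.T + np.array([x, y])               # sensor frame -> map frame
    idx = np.floor((world - origin) / cell_size).astype(int)
    idx[:, 0] = np.clip(idx[:, 0], 0, dist_grid.shape[1] - 1)
    idx[:, 1] = np.clip(idx[:, 1], 0, dist_grid.shape[0] - 1)
    d = dist_grid[idx[:, 1], idx[:, 0]]                # O(1) lookup per point
    p_road = 1.0 - np.minimum(d / r_w, 1.0)            # piecewise-linear sigmoid
    p = np.where(labels == 1, p_road, 1.0 - p_road)
    return -np.sum(np.log(p + 1e-12))

# x_hat = minimize(neg_log_likelihood, x0=odom_prediction, method="Nelder-Mead",
#                  args=(sampled_pts, sampled_labels, dist_grid, origin)).x
```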

3.3.4 Route Planning

The route planner is responsible for choosing the shortest path from the start location
of the vehicle to the goal. It takes the registered topometric map as input, converts
it into a graph structure and calculates Euclidean distances as edge weights. Next, A*
[55] is used to plan the shortest path from the start to the goal. Given the heavy
computation required by the other modules, we cannot afford to recalculate the entire
route at a high frequency. However, since the topometric registration frequently
updates the map we cannot simply reuse the original route plan. To account for
this, we use a "fast_update" method that updates the positions of the waypoints in
the path in the metric space without replanning the route in the topological space.
Furthermore, since it is possible the vehicle could deviate from the planned path (e.g.
if the map registration hasn’t updated recently) we also check for deviations from the
plan and only replan with the A* algorithm as necessary. Algorithm 1 describes how

the fast replanner determines when a replan of the route plan is necessary or simply
a "fast_update" of the existing path to the new map. In our testing, we found that
this reduced the average latency of the route planning module from 100𝑚𝑠 to 1.5𝑚𝑠.
In section 3.4.4 we show that this optimization has a substantial impact on real-world
system performance.

Algorithm 1 Fast Route Replanning


1: Inputs: (goal, map, pose, route_plan)
2: while pose not equal to goal do
3: if route_plan is empty then
4: Store a new route_plan from A*(pose, goal, map)
5: Set path using route_plan
6: else
7: Set path using fast_update(route_plan, map)
8: if distance(pose, path) > recalculate_threshold then
9: Recalculate a new route_plan with A*
10: end if
11: end if
12: return path
13: end while
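The following is an illustrative Python rendering of Algorithm 1. The A* planner and the fast_update routine are passed in as callables because their internals live in other modules; the threshold value is a placeholder and not the tuned value used on the vehicle.

```python
import numpy as np

def plan_path(goal, map_graph, pose, route_plan, a_star, fast_update,
              recalc_threshold=5.0):
    """One iteration of the fast route-replanning loop (Algorithm 1).

    a_star(pose, goal, map_graph) and fast_update(route_plan, map_graph)
    both return a list of (x, y) waypoints; route_plan carries the current
    topological route (empty when no plan exists yet).
    """
    def distance_to_path(p, path):
        return float(np.min(np.linalg.norm(np.asarray(path) - np.asarray(p), axis=1)))

    if not route_plan:
        route_plan = a_star(pose, goal, map_graph)       # full topological replan
        path = list(route_plan)
    else:
        path = fast_update(route_plan, map_graph)        # metric re-projection only
        if distance_to_path(pose, path) > recalc_threshold:
            route_plan = a_star(pose, goal, map_graph)   # vehicle drifted: replan
            path = list(route_plan)
    return path, route_plan
```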

Additionally, we utilize a Finite State Machine to break the entire route plan into
sub-tasks (e.g. drive to the next intersection) which enables the vehicle to wait for a
goal, drive, make the correct navigation choice at each intersection, and finally stop
at the goal destination.

3.3.5 Trajectory Planner

The trajectory planner generates the trajectory the vehicle must follow to reach its
destination. As seen in Fig. 3-2 it takes as input a segmented lidar scan as well as
the high-level navigation reference path from the Route Planner. We utilize a varia-
tional trajectory planner to generate safe trajectories. However, unlike typical [139]
implementations, we do not obtain road boundaries from a detailed map. Instead,
the road boundaries are obtained from the road segmentation module (Section 3.3.2)
and this planner integrates the high-level navigation information obtained from the
route planner to plan a locally safe path in the sensor frame. Fig. 3-3 shows an exam-

Figure 3-3: An example result from the trajectory planner. The vehicle is located at
the blue dot, and received the white path from the Route Planner to indicate a right
turn. The red (road) and blue (off-road) road segmentation is used to plan the green
trajectory which follows the high-level navigation goal while safely remaining on the
road.

ple of the typical problem this planner solves. The blue point is the current vehicle
location, and the red and blue represent the road/not-road segmentation result. The white path is the result from the topometric registration and route planning modules; while it reliably provides navigation information (e.g. in this case “turn right”), we cannot assume it to be precise at a metric level, since the topometric map does not contain such detail. Instead, the planner chose the green route, as it safely
remains on the road, while also obeying the high-level navigation requirement.

The variational trajectory planner utilizes B-splines [37] to parameterize smooth


trajectories. For a given reference trajectory 𝑟 the planner completes a two-stage
optimization process. The first stage local goal-point selection consists of determining
the closest point to the high-level navigation goal that is both in the vehicle field of
view and also lies on a road surface. The second stage trajectory optimization seeks an
“optimal” path from the current vehicle location to the goal-point based on a number of
factors described shortly.

Local Goal Point Selection

The reference path is first clipped to the sensor range 𝒳 ∈ R2 . Then, for a reference
path 𝑟 consisting of 𝑛 waypoints, we denote the final point 𝑟𝑛 . Note, although by
construction 𝑟𝑛 is in the sensor range, it may not lie on the road. To obtain the local
goal point we define a cost function 𝐽 (𝑥) = [𝑑 (𝑥) , 𝑑𝑒 (𝑥, 𝑟𝑛 )] for 𝑥 ∈ 𝒳 where 𝑑 (𝑥)
is the signed distance function from 𝑥 to the free space in 𝒳 and 𝑑𝑒 (𝑥, 𝑟𝑛 ) is the
Euclidean distance from 𝑥 to 𝑟𝑛 . Then the goal-point is found by

$$x_g = \arg\min_{x \in \mathcal{X}} J(x)\, W_g^T$$

where 𝑊𝑔 is a weight vector for tuning the relative importance of the cost function
terms.
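A minimal numpy version of this goal-point selection, assuming the candidate points, their signed distances to the free space, and the weights are provided, could read:

```python
import numpy as np

def select_goal_point(candidates, signed_dist, r_n, w_g=(1.0, 0.2)):
    """Pick x_g = argmin_x J(x) W_g^T over the points in the sensor range.

    candidates:  (N, 2) candidate points x in the field of view X.
    signed_dist: (N,) signed distance d(x) to the free (road) space.
    r_n:         (2,) final waypoint of the clipped reference path.
    w_g:         weight vector W_g (placeholder values).
    """
    d_e = np.linalg.norm(candidates - np.asarray(r_n), axis=1)  # d_e(x, r_n)
    J = w_g[0] * signed_dist + w_g[1] * d_e                     # J(x) W_g^T
    return candidates[np.argmin(J)]
```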

Trajectory Optimization

The second stage of the variational planner creates a B-spline trajectory that begins
at the vehicle’s location and ends at the goal-point 𝑥𝑔 . This optimization utilizes a
three part cost function composed of a road-distance cost

$$J_d(q) = \frac{1}{k} \sum_{i=1}^{k} d(q_i)^2$$

where 𝑞 is a candidate B-spline with 𝑘 waypoints and 𝑑 (𝑞𝑖 ) is, once again, the signed
distance function to the road points. Next, in order to ensure that shorter paths are
preferred to equally feasible longer ones, a relative-path-length cost is computed as
$$J_l(q) = \frac{\sum_{i=2}^{k} d_e(q_i, q_{i-1})}{d_e(x_g, x_0)}$$

where 𝑥0 is the current vehicle location. Finally, we also seek to minimize the max-
imum curvature of trajectories for ride comfort. Since B-splines allow for analytic

curvature calculation we impose a maximum curvature cost as

$$J_\kappa(q) = \max \frac{\|\sigma' \times \sigma''\|}{\|\sigma'\|^3}$$

where 𝜎 is the B-spline representation of 𝑞 and 𝜎 ′ , 𝜎 ′′ are the first and second deriva-
tives. Finally, the optimal trajectory is obtained using a numerical non-linear opti-
mization routine based on [105] to solve

$$q^{opt} = \arg\min_{q \in \mathcal{X}} \left[J_d(q),\, J_l(q),\, J_\kappa(q)\right] W_q^T$$

where 𝑊𝑞 is a weight vector for tuning the relative importance of the cost function
terms. Note that the roads utilized for testing in our rural testing environment are
unmarked and often single-width. Therefore, the cost function prioritized driving
down the center of the road. For use on roads with multiple lanes, we could easily
extend this cost function to include a cost term based on lane boundaries instead.
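As an illustration of this two-stage optimization, the sketch below evaluates the three cost terms on a sampled B-spline and minimizes their weighted sum over the interior control points with scipy. The spline construction, the optimizer (the thesis uses a routine based on [105]), and the weight values are assumptions for the example.

```python
import numpy as np
from scipy.interpolate import splev, splprep
from scipy.optimize import minimize

def spline_costs(ctrl_pts, signed_dist_fn, x0, xg, k_samples=50):
    """Evaluate [J_d, J_l, J_kappa] for a candidate cubic B-spline.

    ctrl_pts: flat array of interior control points (needs >= 2 points so
    that, with x0 and xg, at least 4 points define the cubic spline).
    signed_dist_fn: maps (k_samples, 2) waypoints to (k_samples,) distances.
    """
    pts = np.vstack([x0, np.asarray(ctrl_pts).reshape(-1, 2), xg])
    tck, _ = splprep(pts.T, s=0, k=3)
    u = np.linspace(0, 1, k_samples)
    q = np.array(splev(u, tck)).T                       # sampled waypoints
    dq = np.array(splev(u, tck, der=1)).T               # first derivative
    ddq = np.array(splev(u, tck, der=2)).T              # second derivative

    J_d = np.mean(signed_dist_fn(q) ** 2)               # stay near the road
    J_l = np.sum(np.linalg.norm(np.diff(q, axis=0), axis=1)) / np.linalg.norm(np.asarray(xg) - np.asarray(x0))
    cross = np.abs(dq[:, 0] * ddq[:, 1] - dq[:, 1] * ddq[:, 0])
    J_k = np.max(cross / np.linalg.norm(dq, axis=1) ** 3)   # maximum curvature
    return np.array([J_d, J_l, J_k])

def optimize_trajectory(init_ctrl, signed_dist_fn, x0, xg, W=(1.0, 0.3, 0.5)):
    obj = lambda c: spline_costs(c, signed_dist_fn, x0, xg) @ np.asarray(W)
    return minimize(obj, np.asarray(init_ctrl).ravel(),
                    method="Nelder-Mead").x.reshape(-1, 2)
```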

3.4 Experiments and Results

3.4.1 Experiment Setup

Ground Truth

In order to train and evaluate these road segmentation methods, we utilized a Real-
Time-Kinematic GPS Inertial Navigation System (Model: OXTS-RT3003) with a
base-station transmitting Differential GPS corrections. The base-station provides
groundtruth position and orientation with an accuracy of 2 cm within a range of 10 km.
Note that the RTK-GPS system was utilized only to obtain ground truth, and
to provide an initial position when loading the map before the vehicle begins mov-
ing. While navigating, the vehicle only had access to odometry measurements, and
corrected for drift entirely using the MapLite system. Fig. 3-4 shows an example tra-
jectory that was driven autonomously. The blue path is the one chosen by MapLite,
while the green is the path estimated by the odometry. Clearly, the drift in the odom-

Figure 3-4: A comparison between the trajectory autonomously driven by MapLite
(blue) with the path estimated by odometry (green). The shaded area is the
groundtruth road surface.

etry would quickly cause the vehicle to deviate from the road without the MapLite
corrections.

System Integration

We integrate the MapLite system described previously into a pipeline for fully au-
tonomous navigation capable of operating the vehicle from an arbitrary starting point
to a user specified goal location. To execute motion plans generated by the Variational
Planner, we implement a pure pursuit controller [33], which enables us to choose a
lookahead distance 𝑑 = 8𝑚 that was large enough to ensure rider comfort, while small
enough to closely follow the reference path.
We set the vehicle speed using a dynamic speed controller such that it conforms to
the following constraints: 1) Maximum Speed, 2) Maximum Linear Acceleration, 3)
Maximum Linear Deceleration, and 4) Maximum Centripetal Acceleration. The speed
controller first generates a predicted path by simulating the pure pursuit controller
forward a fixed distance, and then analytically solves for the maximum speed that
will remain within the constraints over the length of the predicted path.
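A compact sketch of the pure-pursuit steering law and a curvature-based speed cap (covering the maximum-speed and centripetal-acceleration constraints; the linear acceleration and deceleration limits are omitted) is shown below. The 8 m lookahead comes from the text above, while the wheelbase and limit values are illustrative placeholders.

```python
import numpy as np

def pure_pursuit_steering(path, wheelbase=2.7, lookahead=8.0):
    """Steering angle toward the path point closest to the lookahead distance.

    `path` is an (N, 2) array in the vehicle frame (x forward, y left), so
    the vehicle sits at the origin with zero heading.
    """
    dists = np.linalg.norm(path, axis=1)
    target = path[np.argmin(np.abs(dists - lookahead))]
    ld = np.linalg.norm(target)
    curvature = 2.0 * target[1] / (ld ** 2)          # pure-pursuit arc curvature
    return float(np.arctan(wheelbase * curvature))   # bicycle-model steering angle

def speed_limit_for_path(path, v_max=10.0, a_lat_max=2.0):
    """Cap speed so centripetal acceleration v^2 * kappa stays within bounds."""
    d1 = np.gradient(path, axis=0)
    d2 = np.gradient(d1, axis=0)
    kappa = np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]) / \
            (np.linalg.norm(d1, axis=1) ** 3 + 1e-9)
    return float(min(v_max, np.sqrt(a_lat_max / (kappa.max() + 1e-9))))
```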

Table 3.1: Evaluation of Road Surface Segmentation Methods
Precision Recall F1 Accuracy Runtime (s)
SparseConvNet 0.92 0.84 0.88 0.97 0.27
Linear SVM 0.91 0.84 0.87 0.96 0.19
PointNet 0.45 0.79 0.57 0.83 2.32

The outputs of the Pure Pursuit and Dynamic Speed controllers are a commanded
steering angle and velocity. In order to achieve those commands using robust closed-
loop control, a pair of PID controllers was implemented. Each of these PID controllers
was tuned to obtain fast responses to speed and steering commands while minimizing
overshoot and steady state error.

3.4.2 Road Segmentation Results

We trained and evaluated three segmentation approaches: 1) the linear SVM, (sec-
tion 3.3.2), 2) A CNN based on SparseConvNet [38], and 3) another CNN based
on PointNet [106]. Performing dense convolutions over sparse data is inefficient and
computationally expensive due to fill-in. Both SparseConvNet and PointNet are
specifically designed to achieve accuracy typical of CNNs on the sparse pointcloud
data structure of lidar sensors.
To utilize the groundtruth system to evaluate the road segmentation methods, the
vehicle was first driven manually along the border of each road included in the test
set. This included approximately 10𝑘𝑚 of roads in rural Massachusetts. Next, the
road boundary GPS traces were collected and used to create a georeferenced polygon
of the road boundaries. Finally, at test time, the location of the vehicle at the time of
each scan was used to project each scan into the georeferenced map. In this manner,
the groundtruth class of each point was calculated to enable evaluation of each road
segmentation method. Table 3.1 shows the results of evaluating the road segmentation
methods on a set of 500 representative laser scans comprising over 55𝑀 segmented
points. Notice that as expected, the SVM model is by far the fastest to segment a scan,
although it does not perform quite as well as the SparseConvNet. The vastly larger
number of parameters in the SparseConvNet network is also a consideration as that

Figure 3-5: A comparison between the RMSE of the estimated map before (blue) and
after (green) topometric registration.

potentially allows it to learn a more complex variety of environments. However, that


also makes it susceptible to overfitting. For our systems level evaluations presented
in section 3.4.4, we chose to use the Linear SVM as its low runtime is paramount for
enabling real-time operation at speed.

3.4.3 Registration Evaluation Results

We evaluate the topometric registration by comparing the location of the road cen-
terline in the registered topometric map, to the actual road center as measured by the
groundtruth. We compute the Root-Mean-Square-Error (RMSE) along the portion
of the map that is in the sensor range (as the topometric registration only considers
that portion of the map). As a baseline, we also compute the RMSE using the map
prior to registration (i.e. based solely on the odometry estimate).
Fig. 3-5 shows RMSE before (blue) and after (green) topometric registration com-
puted during 380𝑠 of driving. As expected the error based solely on odometry starts
out quite low. However, due to the inherent drift in dead-reckoning measurements,
it quickly drifts to > 5𝑚 RMSE which is much too large to use for autonomous

Figure 3-6: The vehicle at the test site in rural Massachusetts. The heavily wooded
area has roads without markings or boundaries.

operation. The maps corrected by topometric registration on the other hand, have
a consistently smaller RMSE which reduces by 85.7% on average. Furthermore, it
reduces the maximum RMSE by 79.7%. The corrected maps based on the regis-
tration alone are not sufficient for blindly following the topometric map because the
topometric map does not contain the detailed lane-level metric information needed
for autonomous driving. However, these corrected maps are sufficient to generate a
reference path that incorporates the high-level navigation information into the mo-
tion planner which generates the actual control trajectory. This incorporation of the
reference path from the topometric registration is described in section 3.3.5.

3.4.4 System Evaluation Results

We compare the MapLite system against that of an expert human driver to quantify
performance metrics. First we chose a standardized test route of 1.2𝑘𝑚 in length
requiring the traversal of a total of ten intersections (3 right-turns, 4 left-turns, and
3 straight). The test route was in a rural area near Devens, Massachusetts with
overgrown, unmarked, mostly single-lane roads. (See Fig. 3-6)
Next, we utilize the GPS groundtruth to measure a baseline traversal of the test

Table 3.2: MapLite Performance Evaluation and Ablation Study
Precision (m) Accuracy (m) Interventions/km
Human Expert 0.31 0.43 0
MapLite 0.17 0.49 0
1) Map Registration 0.34 0.81 0
2) Variational Planner 0.28 0.81 1.3
3) 1 & 2 - - >10
4) Smart Replanning 0.55 0.57 0

route. Finally, both the human driver and MapLite steer the vehicle along the test
route twice. For these runs we compute two metrics: 1) Accuracy and 2) Precision.
The accuracy metric is the RMSE of the test run compared to the baseline. The pre-
cision metric is the root mean square deviation (RMSD) of the second run compared
to the first. This gives a measure of the repeatability of the system independent of
how closely it navigates to the baseline.
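Concretely, both metrics can be computed as the root-mean-square of nearest-point distances between two trajectories, as in the small sketch below; the exact point association used in the thesis may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

def rms_deviation(run, reference):
    """Root-mean-square of nearest-point distances from `run` to `reference`.

    Used both for accuracy (run vs. ground-truth baseline) and precision
    (second run vs. first run); both inputs are (N, 2) arrays of positions.
    """
    d, _ = cKDTree(reference).query(run)
    return float(np.sqrt(np.mean(d ** 2)))

# accuracy  = rms_deviation(maplite_run, baseline_traversal)
# precision = rms_deviation(maplite_run_2, maplite_run_1)
```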

We found that the expert human driver could navigate the test route with an accuracy of RMSE = 43 cm while the MapLite system obtained RMSE = 49 cm. While this was not quite as close to the baseline as the human driver, the 6 cm difference is small with respect to the width of typical lanes. Perhaps surprisingly, when it came to precision, the MapLite system outperformed the human driver with an RMSD of only 17 cm while the human had 31 cm. We attribute
this to the unique ability of the autonomous system to precisely reproduce the same
trajectory, while human drivers typically have more variability even when navigating
safely.

Next, to exhaustively test the system, we also randomly selected GPS destinations
and for each one, MapLite autonomously planned a route and piloted the vehicle
from its starting position to the destination. Once the vehicle came to a stop, a new
destination was input, and the process repeated. We executed this test for a total
of 16.1km consisting of 107 intersection traversals. At no point throughout this test
was safety driver intervention required, and the vehicle navigated safely to every one
of the destination points.

Finally, to determine the value of each of the components described previously,

we performed an ablation test wherein we removed a single component and computed
the accuracy and precision measurements described previously by traversing the test
route twice more. We did this for each of the following four ablation scenarios:

1. We remove the Topometric Registration component leaving the map fixed

2. We remove the Variational Planner, instead directly following the reference from
the topometric map

3. We combine both 1) and 2)

4. We remove the Smart Replanning, instead recomputing a new route plan each
time the map is updated

For each test, we also recorded the required interventions/km by the safety driver
to prevent road departure. Table 3.2 compares the performance of the Human driver,
MapLite, and each of the ablation tests. For 1) Removal of Topometric Registration,
no interventions were needed. However, the performance of the system was worse
in both accuracy and precision. While it might seem surprising that the system can
operate at all with a fixed topometric map, it should be noted that in this system,
the map is used mostly as a source of high-level navigation information, while the
local trajectory plan is computed by the Variational Planner.
Next, for 2) removal of the Variational Planner, the driver was required to intervene 1.3 times per km. Furthermore, both the accuracy and precision errors increased. Next, for 3) both the Topometric Registration and Variational Planner were
removed. In this case however, the vehicle could not safely navigate autonomously,
and the test was aborted after 10 interventions. This is interesting because it pro-
vides evidence that while the Topometric Registration and Variational Planner are
supplementary (the vehicle performed best when they were both used) they are also
complementary (each of them alone could enable at least some measure of autonomous
navigation).
Finally, in 4) we remove the Smart Replanning feature of the route planner. In
that test again, there were no interventions required. However, both the precision and

Figure 3-7: A traffic circle in the CARLA simulator. We used CARLA to test more
complex road topologies than were available at our test site.

accuracy were dramatically worse. This can be explained by the increased processing
time required to recompute the route plan each time the map was updated, which
caused a lag between when new sensor data arrived and when the vehicle was able to
respond.

3.4.5 Simulations

While the testing location in Massachusetts provided a range of intersection topologies


in which to test MapLite, we leveraged autonomous driving simulation to assess the
performance of MapLite in more complex traffic topologies that were not represented
at the testing site. We chose to use the CARLA [44] autonomous driving simulator
for our analysis because it includes realistic environments with a larger diversity of
road topologies. In particular, we selected the built-in Town03 urban environment in
CARLA for testing, shown in Fig. 3-7, which includes a multi-lane traffic circle with
multi-lane incoming roads.
We assessed the performance of MapLite in this environment in the same manner
as we did the real-world vehicle. In this case, our test route included both entering
and exiting the traffic circle, and the groundtruth GPS measurement came from the

simulator itself. Using this measurement as the baseline, we obtained a precision of RMSD = 10 cm and an accuracy of RMSE = 38 cm. Both of these values are actually lower than the corresponding values for the real-world vehicle (see Table 3.2). However, while the simulation does include random noise (indeed, when the vehicle navigates solely based on odometry it quickly drifts from the road), it does not accurately model all of the various noise sources in a real vehicle, and thus its better precision and accuracy are not unexpected.

3.5 Discussion
In this chapter we have presented and demonstrated a novel autonomous navigation
system capable of using publicly sourced OpenStreetMap maps combined with on-
board sensors for autonomous navigation on rural roads without detailed prior maps
or GPS localization. This greatly increases the areas in which autonomous driving
systems can be deployed. The presented system does not assume the publicly sourced
maps to be perfectly accurate at a metric level and therefore, map maintenance will
only need to be performed when the core road structure is changed (e.g. roads created
or destroyed) rather than minor updates such as repainting or repaving.
Referring to Fig. 1-2, this approach addresses the first four major challenges with HD prior maps as follows:

1. No dedicated manual data collection process is needed for mapping because the
Standard Definition (SD) prior maps intended for humans are already freely
available for most of the world.

2. The SD maps from OpenStreetMap have a relatively small storage size and are
easy to store and transmit. In fact, for the experiments described in this chap-
ter the SD map was downloaded on the fly using a standard wireless cellular
connection.

3. Map maintenance of SD maps that only include road centerlines is easier, as
they only require updates when the underlying road structure has changed.

4. The problem of perceptual aliasing is mitigated because the vehicle does not
require a precise localization in order to navigate.

On the other hand, this system does have limitations. Firstly, the planner de-
scribed in Sec. 3.3.5 finds trajectories that ensure the vehicle remains on the road,
and makes progress toward the destination. This works well in rural areas. However,
in more complex road scenes, additional semantic information is necessary in order
to safely navigate while following traffic rules. For example, an understanding of
multiple lanes on a road, the directionality of these lanes, and the requirements for
navigating intersections would be necessary in order to navigate in more densely pop-
ulated areas such as suburban or urban settings. In the next chapter, we will discuss
an extension to this approach that will utilize the prior SD maps to infer those types
of information as well.
Additionally, the onboard sensors require a good view of the road in order to plan
safe trajectories. While HD map-based systems also require that the onboard sensors
view the scene in order to localize, once that localization is complete, the map can be
used to plan trajectories that follow the road even if the road is completely occluded.
Therefore, if the road were to be covered with snow, or the cameras / lidars occluded
by fog, the approach described here would be unable to safely navigate. Later (Chap-
ter 5), we will propose a solution for those particularly challenging situations using
an LGPR sensor.

Chapter 4

MapLite 2.0: Online HD Map Inference

4.1 Overview

In the last chapter, we presented MapLite, a framework for navigating autonomously


using only SD prior maps in simple settings. In that framework, we eschew the High
Definition (HD) maps typically required for autonomous driving and plan directly
from simple SD prior map information. However, the HD maps that are used in
typical autonomous vehicle pipelines also aid in decision making. HD maps represent
road topology, geometry, and attributes such as speed limits. Such systems require
precise knowledge of the vehicle position with respect to the map. HD map creation
is nontrivial and often requires immense resources to supervise the curation and
annotation process. Moreover, map data becomes outdated as the structure of the
environment evolves requiring costly maintenance procedures. At the same time,
reliance on HD maps presents a safety risk during the period between the structural
change and the next map update cycle.

Addressing this challenge requires a conceptually different approach to the AV


perception stack. Human drivers do not rely on highly detailed maps and directly

combine real-time scene understanding with knowledge provided by classical, SD,
street maps. Mimicking this ability requires novel perception techniques that combine
raw sensor data with coarse prior SD maps to obtain a representation similar to
current HD maps.
Existing online perception techniques focus on estimation of certain map attributes
such as lane boundaries or road topology. These often perform classification or seg-
mentation in a dense sensor representation such as pixels or pointclouds. Those that
do infer vectorized features, typically do not provide a sufficiently rich representation
comparable to modern HD map data. Generally, downstream planning and control
modules require the HD map to query important details such as the shape of the
road ahead, the location, direction, and connectivity of adjacent road lanes, and the
feasible traversals through an intersection. Feature detectors that merely detect the
features in the scene, without inferring the underlying structure cannot satisfy this
requirement.
We aim to enable the much richer HD map-like representation by leveraging an SD
prior map. While the SD map provides valuable topological information, integrating
it is particularly challenging because these maps were designed for humans and the
annotation cannot be assumed to be sufficiently precise. Thus, enabling SD map
based driving using an online computed HD map-like representation is a challenging
and largely open problem.
In this chapter, we present a system to replace or augment meticulously generated
HD maps. Our method infers road geometry and topology, e.g., lane and intersection
sizes, given a publicly available coarse road graph (in our case OpenStreetMap [53])
and online perception data. We convert the coarse road graph into an HD map
prior and a rasterized birds-eye-view (BEV) representation. Additional BEV images
are generated from a combination of sensor inputs. These are used as inputs to a
Convolutional Neural Network (CNN) trained to predict their semantic labeling using
existing HD map data for training. The semantic segmentation is converted into a set
of label-specific distance transforms. Finally, a structured estimator maintains the

Figure 4-1: An online HD map overlaying lidar intensity as a vehicle traverses an
intersection. The white and yellow lines indicate inferred road and lane boundaries
while the red lines denote intersection entry/exit ports. The background shows a
projection of the lidar intensity data into the birds-eye-view (BEV) that was used to
help generate the HD map online.

local map estimate and integrates the SD map prior. In summary, the contributions
of this chapter include:

• A new method for online HD map inference from raw sensor data and a prior
SD map.

• A fully relative HD map representation that is designed specifically to enable


both the incorporation of an SD prior map, as well as online perception-based
updates.

• Evaluation of our approach on real-world scenes with diverse road layouts and
appearance.

Figure 4-2: In the proposed system, an SD prior map is fused with onboard sensor data to perform an online estimation of the HD map near the vehicle. (Block diagram: the SD map produces the MapLite prior over segments, intersections, transitions, and lanes [Sec. III-A]; camera images and lidar pointclouds are rasterized into BEVs, passed through a CNN, and converted to distance transforms [Sec. III-B]; a structured estimator fuses the prior and perception into the MAP posterior consumed by downstream planning and control [Sec. III-C].)

4.2 Related Work

4.2.1 Offline map generation

Automatically generating HD maps in full or in part as an offline inference task is


an attractive problem for its ability to reduce manual supervision. [92] introduced
a method for estimating road and lane boundaries given a coarse road graph. They
formulated the inference task as an energy minimization problem where potentials
encoded aerial image appearance and prior knowledge, e.g., lane smoothness. [78]
designed a system to estimate crosswalk geometry given accumulated LIDAR intensity
BEV rasters and approximate intersection location determined from a coarse SD
map. The authors trained a network to provide input to a structured estimator
that determines a consistent configuration of crosswalk parameters. [58] proposed a
method for inferring lane boundaries for highway scenes given accumulated LIDAR
intensity BEV rasters. They structured their estimate as a directed acyclic graphical
model where each node in the graph encoded geometric and topological information,
and the nodes were iteratively explored to expand the graph. [140] developed a
lane graph estimation engine using vision and LIDAR data aggregated within a BEV
representation. Their method is capable of handling arbitrary intersection topologies,
but does not incorporate prior SD map information.

4.2.2 Online map inference

An alternative approach (and the one we take here) to dealing with the challenges of
annotating HD maps, is to infer lane geometry and topology online. [86] trained a
neural network to segment lanes in stereo images to produce BEV lane geometry and
topology. Their method focused on mid-roadway estimation and was not extended
into intersections or other complex topologies. [94] proposed an end-to-end system
for lane line estimation where the architecture includes image-based instance segmen-
tation followed by trained perspective transformation and a least-squares fit of a pa-
rameterized curve. They suggest that fitting parameterized lines in a pseudo-ground
plane (akin to a BEV frame) results in a more simple parameterization compared to
an image frame. PiNet [66] can produce quick and accurate lane boundary estimates
by clustering image frame keypoints into instances. HDMapNet [77] estimates road
boundaries, lane lines, and other semantic features in combined LIDAR and cam-
era data to produce a BEV semantic map around the vehicle. Note that the above
methods typically produce class instances, e.g., lane lines, but do not provide the
topological information that many AV stacks expect.

In the previous chapter, we used the topological information from an SD map to


navigate. MapLite was capable of piloting a vehicle with no prior HD map. But
it focused on simple road geometries in rural scenes – it did not create an HD map
representation to enable use on multi-lane roads or in urban areas. Here, we extend
this idea to actually infer an HD map-like representation online using the SD prior
to determine the overall parameter structure of the problem.

4.3 MapLite 2.0 Method

4.3.1 State Description

SD Map Model

In this chapter, we assume the availability of SD prior maps. Unlike HD Maps, SD


maps are widely available and have minimal storage and transmission requirements.
For comparison, the HD map of a small city with 20,000 miles of road could require
hundreds of gigabytes [73]. In contrast, an SD map of the same region could be stored
in just 3.5 gigabytes. These savings come at a cost, however: the SD map contains
much less information than the HD equivalent.
We consider an SD map with a graph-like data structure 𝑀𝑆𝐷 = {𝑁, 𝐸} where
the nodes, 𝑁 , are intersections and the edges, 𝐸, are polylines representing road
centerlines. Each edge also has associated semantic attributes such as the road name,
speed limit, and number of lanes in each direction. Note that such maps are commonly
used for navigation and OpenStreetMap.org [53] makes them freely available for much
of the globe. In this chapter, we do not require a high level of accuracy in the
road centerline geometry. For example, it is common for OpenStreetMap centerline
features to have errors on the order of 10 m, and the current system is designed to
handle that. However, we do require the topology (e.g. road connectivity) of the SD
map to be correct.
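To make the data model concrete, the following is a minimal sketch of how such a graph-like SD map could be represented in code; the class and field names are illustrative assumptions, not part of any particular implementation.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the SD map structure M_SD = {N, E}; names are hypothetical.
@dataclass
class SDNode:
    node_id: int
    x: float            # map-frame position of the intersection
    y: float

@dataclass
class SDEdge:
    edge_id: int
    start: int          # node_id of one endpoint
    end: int            # node_id of the other endpoint
    centerline: list    # polyline [(x, y), ...] approximating the road centerline
    name: str = ""
    speed_limit: float = 0.0
    lanes_fwd: int = 1
    lanes_rev: int = 1

@dataclass
class SDMap:
    nodes: dict = field(default_factory=dict)   # node_id -> SDNode
    edges: dict = field(default_factory=dict)   # edge_id -> SDEdge
```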

HD Map Model

Although HD maps have been widely adopted for autonomous driving, there is cur-
rently no universal format for HD maps [82]. We take inspiration from several widely
used standards in designing our HD map model. Similar to the Navigation Data
Standard (NDS) [93] we utilize separate layers for routing and lane level geometry
which we generate online. As in the OpenDrive [5] model we parameterize lane geom-
etry with respect to road centerlines. Since we are estimating the model online, and

Figure 4-3: The HD map model state consists of a set of parameters which define the
geometries relative to an SD prior basemap (shown in blue). The red lines indicate
intersection entry/exit ports and the black circles indicate knot points. In this illus-
trative example, there would be many more parameters required to define the full
shape of the map elements shown. However for figure clarity, we only include one of
each type.

do not assume the SD map centerlines to be highly accurate, we parameterize even the lane boundaries that coincide with the SD map centerlines. This allows the online map not only to encode lane geometries that are absent from the SD map, but also to shift the position of a road away from the SD centerline. Our primary
objective in designing the HD map model is to provide a data structure with the
geometric information needed for downstream planning and control while allowing it
to be estimated online from the SD prior and sensor data.
The HD map model consists of a hierarchy with three primary layers.

1. The road layer consists of roads and intersections. Each road may contain
multiple lanes and directions of travel and must begin and terminate with either
an intersection (node degree, 𝐷 > 2) or a road terminal (𝐷 = 1).

2. The section layer consists of road sections with a constant number of lanes.
There are also transition regions (𝐷 = 2) where two sections with different
numbers of lanes meet. Each section begins and ends at either a transition or
the end of a road.

3. The segment layer consists of road segments with a constant number of lanes in
a single direction. Thus, a section which allows travel in two directions would
contain two road segments. Every road segment begins and ends at the nodes
of its corresponding road section.

The HD map state is defined as 𝑀𝐻𝐷 = {𝑀𝑆𝐷 , Θ} which combines an SD basemap


with a set of parameters Θ that encode the HD map geometry relative to the SD map.
For example, if the SD map contains an edge representing a road centerline, the HD
map will include that edge, along with a set of parameters that encode the real-world
road geometry with respect to that SD map edge as will be described next.
Fig. 4-3 shows an example intersection with three adjacent roads. The SD map
edges are shown as blue lines, while the nodes are blue circles. Each intersection
node has a degree 𝐷 corresponding to the number of roads at the junction. The HD
map model locates the entry and exit ports of the intersection (shown as red lines
in Fig. 4-3) using a set of four parameters per port, $\theta_i^d$, $\theta_i^a$, $\theta_i^r$, and $\theta_i^l$ for $i \in [1, D]$.
These define the distance along the SD centerline to the junction port, the angle of
the road-intersection boundary, and the widths in the right and left directions. Thus,
these 4𝐷 parameters fully describe the shape and position of all of the entry and exit
ports for an intersection.
Each road segment is discretized into $K$ knot points with fixed offset distance $knot\_dist$ along the SD map edge. A set of parameters $\theta_i^{rb}$ and $\theta_i^{lb}$ for $i \in [1, K]$ define two polylines which represent the right and left boundaries of the road segment. Next, for a segment with $L$ lanes, there are an additional $L \cdot K$ parameters $\theta_{i,j}^{l}$ for $i \in [1, K],\ j \in [1, L]$. These define the perpendicular distance from each knot point to the corresponding lane line in multi-lane segments. Together, these parameters are
used to augment an SD basemap with the precise lane-level geometry and topology
typically offered by a prior HD map.
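The following is a minimal sketch of how the relative parameter set $\Theta$ for one road segment and one intersection might be stored; the class names and the zero-initialized arrays are illustrative assumptions rather than the representation used in this thesis.

```python
import numpy as np

# Hypothetical container for the relative parameters of one road segment.
class SegmentParams:
    def __init__(self, num_knots, num_lanes, knot_dist):
        self.knot_dist = knot_dist                            # fixed spacing of knot points along the SD edge
        self.right_bound = np.zeros(num_knots)                # theta_i^rb: right road boundary offsets
        self.left_bound = np.zeros(num_knots)                 # theta_i^lb: left road boundary offsets
        self.lane_offsets = np.zeros((num_knots, num_lanes))  # theta_{i,j}^l: per-lane offsets from each knot

# Hypothetical container for the four-per-port intersection parameters.
class IntersectionParams:
    def __init__(self, degree):
        self.port_dist = np.zeros(degree)    # theta_i^d: distance along the SD centerline to the port
        self.port_angle = np.zeros(degree)   # theta_i^a: angle of the road-intersection boundary
        self.width_right = np.zeros(degree)  # theta_i^r: port width to the right
        self.width_left = np.zeros(degree)   # theta_i^l: port width to the left
```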

4.3.2 Bird’s Eye View (BEV) Semantic Segmentation

The semantic segmentation pipeline combines multiple raw sensor streams and pro-
cesses them to generate features in the Bird’s Eye View (BEV) for input to the HD
map state estimation update step, as shown in Fig. 4-2 (Perception). The BEV frame
is a natural choice given that the SD and HD maps are also typically represented in
this frame.

BEV Projection

To generate a BEV feature, we first accumulate a set of $N$ consecutive pointclouds $P^V_{t-N}, \ldots, P^V_t$ in the vehicle frame, where each pointcloud combines all of the lidar sensor measurements in the indicated time interval $[t-1, t]$. Each time interval is also associated with a set of $N$ images from each of $M$ cameras whose fields of view span some (possibly overlapping) portion of the pointclouds $P^V$. Each point is then “painted” with RGB values by projecting the point into the image frame. Next, each of the colored pointclouds $P^V_{t-N}, \ldots, P^V_{t-1}$ is projected into the current vehicle frame. Although each frame requires $N-1$ transformations, each pointcloud $P^V$ need only be painted once using this scheme. Finally, the accumulated, painted pointcloud $P^V_t$ is projected into the BEV frame by discretizing the space around the vehicle into a grid in the $X$-$Y$ plane and aggregating the intensity, height, and RGB color values in each grid cell.
Not all cells in the grid will have data points due to both occlusions and the spar-
sity of lidar. Therefore, there is a trade-off between a very high resolution BEV grid
with sharp features but many empty cells, and a low resolution grid with fewer cells
missing data. To obtain a good balance, we compute the BEV at several resolutions,
and then select the highest resolution available cell at each position which yields a

Figure 4-4: The first step in the perception pipeline is to project the sensor data into
the Bird’s Eye View (BEV) frame. On the left is a lidar intensity frame, next is an
elevation map, and last is an RGB frame. The red trace shows the path of the vehicle
in each frame.

variable resolution BEV image with higher resolution in locations with more dense
sensor data.
We also choose specific targeted aggregation functions to best match the desired
feature type. Namely, for the intensity feature, we use a mean aggregation, for the
RGB features we choose the median for outlier rejection, and for the height feature,
we use the minimum which efficiently filters out moving objects and gives an ele-
vation map of the ground surface. Fig. 4-4 shows an example set of BEV features
accumulating four seconds of data from four lidar sensors and six cameras arranged
around the ego vehicle.
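As an illustration of the aggregation step, the sketch below rasterizes painted lidar points into a single-resolution BEV grid using the per-channel aggregators described above (mean intensity, minimum height, median RGB); the array layout and function name are assumptions, and the multi-resolution selection is omitted for brevity.

```python
import numpy as np

def rasterize_bev(points, extent=40.0, resolution=0.2):
    """Aggregate painted lidar points into a BEV grid (illustrative sketch).

    points: (N, 7) array of columns [x, y, z, intensity, r, g, b] in the vehicle frame.
    Returns per-cell channels: mean intensity, min height, median R/G/B.
    """
    size = int(2 * extent / resolution)
    # Map x, y coordinates to integer cell indices and drop points outside the extent.
    ij = np.floor((points[:, :2] + extent) / resolution).astype(int)
    valid = np.all((ij >= 0) & (ij < size), axis=1)
    ij, pts = ij[valid], points[valid]

    bev = np.full((size, size, 5), np.nan)   # NaN marks cells with no returns
    flat = ij[:, 0] * size + ij[:, 1]
    for cell in np.unique(flat):
        sel = pts[flat == cell]
        i, j = divmod(cell, size)
        bev[i, j, 0] = sel[:, 3].mean()      # mean lidar intensity
        bev[i, j, 1] = sel[:, 2].min()       # min height filters moving objects
        bev[i, j, 2:5] = np.median(sel[:, 4:7], axis=0)  # median RGB for outlier rejection
    return bev
```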

Semantic Segmentation

In order to estimate the HD map online we generate semantic features to infer the
location of important map objects such as road boundaries and lane lines. To this
end, we train a semantic segmentation model based on a ResNet-101 backbone [56].
The input to this network is a six channel image of the BEV frame with channels
= {𝐼, 𝑍, 𝑅, 𝐺, 𝐵, 𝑆𝐷𝑇 } where 𝐼 is the mean intensity of the lidar points, 𝑍 is the
minimum height, 𝑅,𝐺, and 𝐵 are the median color channels from camera images,
and 𝑆𝐷𝑇 is a distance transform where each pixel gives the distance to the road

centerlines in the SD prior map.
To reduce training requirements, we use transfer learning to fine-tune a DeepLabv3
[26] image segmentation model pretrained on the COCO [79] dataset. We modify this
network by increasing the input channels to match the six channels in our features
and initialize the weights using the pretrained input layer. Next, we modify the head
of the network to have an output shape of [𝐵, 𝐶, 𝑊, 𝐻] where 𝐵 is the batch size, 𝐶 is
the number of classes to predict, and 𝑊 ,𝐻 are the width and height of the features.
We use 𝐶 = 4 for classes={𝑏𝑎𝑐𝑘𝑔𝑟𝑜𝑢𝑛𝑑, 𝑟𝑜𝑎𝑑𝑠, 𝑖𝑛𝑡𝑒𝑟𝑠𝑒𝑐𝑡𝑖𝑜𝑛𝑠, 𝑙𝑖𝑛𝑒𝑠}.
We compile a training dataset using a ground truth (human annotated) HD map
to generate segmented BEV labels of the four classes. We use a cross-entropy loss with
the modification that any invalid (NaN-valued) pixels in the input BEV (corresponding
to cells occluded or out of range of the vehicle sensors) are excluded from the loss
calculation.
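The sketch below illustrates this kind of adaptation, assuming torchvision's DeepLabv3-ResNet101 model (the `pretrained=True` interface of older torchvision releases; newer versions use a `weights=` argument). The scheme for initializing the three extra input channels and the handling of invalid pixels are plausible choices, not necessarily the ones used here.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

NUM_CLASSES = 4  # background, roads, intersections, lines

# Load a pretrained DeepLabv3 and widen its stem to accept 6-channel BEV input.
model = deeplabv3_resnet101(pretrained=True)
old_conv = model.backbone.conv1
new_conv = nn.Conv2d(6, old_conv.out_channels, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight                            # reuse pretrained RGB filters
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # one plausible init for the new channels
model.backbone.conv1 = new_conv

# Replace the classifier head to predict the four map classes.
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

def masked_cross_entropy(logits, target, bev):
    """Exclude BEV cells with no data (NaN inputs) from the loss.
    (In practice the NaNs would also be replaced, e.g. with zeros, before the forward pass.)"""
    invalid = torch.isnan(bev).any(dim=1)     # (B, H, W) mask of empty cells
    target = target.clone()
    target[invalid] = -1                      # ignored by the loss below
    return nn.functional.cross_entropy(logits, target, ignore_index=-1)
```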

Signed Distance Transform

The final step in the perception pipeline entails generating signed Euclidean distance transforms [47] for each of the four classes in the semantic segmentation network
output. The distance transforms contain the metric distance to the nearest feature
of each class.
We require a modification to the standard distance transform to account for oc-
clusions. Recall that the distance transform 𝐷𝑇 (𝑓 ) for a binary image 𝑓 evaluates
to 0 everywhere 𝑓 = 0 (the background) and the Euclidean distance to the nearest
background point everywhere 𝑓 = 1 (the foreground). Note that the signed distance
transform is calculated as the sum of the forward and negative inverse distance trans-
forms 𝑆𝐷𝑇 = 𝐷𝑇 (𝑓 ) − 𝐷𝑇 (1 − 𝑓 ). This transform has exactly the same values in the
foreground as 𝐷𝑇 (𝑓 ) (the distance to the nearest background point), but has negative
values in the background giving the distance to the nearest foreground point. Simply
treating occluded points as background would have the undesired effect of spurious

edges near missing data. Instead, we use an indicator function to generate the for-
ward distance transform treating the missing data as foreground and add the negative
inverse distance transform treating the missing values as background. This ensures
the final distance transform only contains measurements that represent distances to
other valid data and removes the artifacts missing data would otherwise introduce.
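A minimal sketch of this occlusion-aware signed distance transform, using SciPy's Euclidean distance transform, is shown below; the function and argument names are ours, and the resolution scaling and clipping values are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def occlusion_aware_sdt(foreground, missing, delta_max=10.0, resolution=0.2):
    """Signed distance transform that ignores cells with no sensor data (sketch).

    foreground: boolean mask of the class (e.g. road region) from the segmentation.
    missing:    boolean mask of occluded / out-of-range cells.
    """
    # Forward transform: treat missing cells as foreground so distances only
    # measure the gap to *valid* background cells.
    fwd = distance_transform_edt(foreground | missing)
    # Inverse transform: treat missing cells as background of f (foreground of the
    # inverted image) so background distances only measure the gap to valid foreground.
    inv = distance_transform_edt(~foreground | missing)
    sdt = (fwd - inv) * resolution        # convert cell units to metres
    sdt[missing] = np.nan                 # discard cells with no data
    return np.clip(sdt, -delta_max, delta_max)
```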

Discarding the background and missing data cells, we ultimately construct three
output signed distance transforms: road regions, intersections regions, and lane lines.
These are the inputs used to estimate the state of the HD map online as described
next.

4.3.3 Online HD Map State Estimation

In the online HD map state estimation step, the HD map model parameters are
updated to incorporate new information from the vehicle sensors. We formulate this
problem as a maximum likelihood estimation in which we seek to find Θ* such that

\[
\Theta^* = \arg\max_{\Theta} P(\Theta \mid Z_{1:t}) \qquad (4.1)
\]

Where Θ is the set of decision variables defining an HD map, and 𝑍1:𝑡 is the set
of all sensor measurements. With Bayes rule we can decompose this as

\[
P(\Theta \mid Z_{1:t}) \propto_{\Theta} P(Z_t \mid \Theta, Z_{1:t-1}) \cdot P(\Theta \mid Z_{1:t-1}) \qquad (4.2)
\]

The second term represents our prior belief over the state given our prior mea-
surements, while the first term is an update step that integrates a new measurement
to obtain an updated belief. We design a tracking algorithm to perform the state
estimation updates online. In the next subsection, we describe how we initialize the
prior 𝑃 (Θ) for the first time before obtaining any measurements, and then we discuss
how the update term is calculated to integrate new sensor data.

HD Map Initialization

We model the prior map parameters as Gaussians, which requires not only an initial estimate for the value of each parameter, but also an estimate of its uncertainty: 𝑃 (Θ) = 𝒩 (𝜇Θ , ΣΘ ). While subsequent iterations can build on the result of the previous one, the initial estimate, formed before the first sensor measurement is integrated, must rely on the SD map alone. Thus, we construct the HD map model and initialize all of the parameters solely from the SD prior. We call this
initial HD map estimate the MapLite Prior and it is constructed in an offline process
without requiring any online sensor data. To obtain it, we first collect a training set
of human annotated HD maps. Next, we compare these ground truth maps to the
corresponding SD maps and calculate the parameter values that would generate an
HD map that best matches the ground truth. Finally, we regress initial values and
uncertainties from the distribution of these parameters over the training region. For
example, we regress the value of the parameter $\theta^{rb}$, a road boundary parameter, in
our ground truth dataset as a function of the number of lanes in each segment. We
then initialize the parameter values at the mean of this distribution, and the variance
as the sum of the squared residuals.
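As an illustration, the sketch below fits such a prior for a single parameter type as a linear function of lane count, with the variance taken from the residuals as described above; the helper name and the linear-model choice are assumptions.

```python
import numpy as np

def fit_parameter_prior(num_lanes, observed_offsets):
    """Fit a simple linear prior for one parameter type (e.g. theta^rb) vs. lane count (sketch).

    num_lanes:        (N,) lane counts of the annotated training segments.
    observed_offsets: (N,) offsets measured against the ground-truth HD map.
    Returns predict(lanes) -> (mean, variance) used to initialize the Gaussian prior.
    """
    slope, intercept = np.polyfit(num_lanes, observed_offsets, deg=1)
    residuals = observed_offsets - (slope * num_lanes + intercept)
    variance = np.sum(residuals ** 2)   # follows the text: sum of squared residuals

    def predict(lanes):
        return slope * lanes + intercept, variance
    return predict
```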

In this way, we can generate an initial guess for the full HD map state. As
expected, we found that parameters that represent features such as lane width which
are fairly consistent initialize with relatively low uncertainty, while those that encode
features like intersection shape, which are much more varied, initialize with relatively
high uncertainty.

Note that since all of the parameters are relative measurements, the HD map is
agnostic to the reference frame (e.g. transforming the basemap geometry transforms
all of the HD map geometry but importantly, the parameter values are unchanged).
This allows the inference to be performed in the local vehicle frame, which is con-
stantly changing with respect to the map frame, without requiring any transformation
of the estimated parameters and uncertainties in the update step.

Parameter           Notation              Unit    Distance Transform
Segment Boundary    $\theta_i^{rb/lb}$    [m]     Road Regions
Junction Distance   $\theta_i^{d}$        [m]     Intersections
Junction Angle      $\theta_i^{a}$        [rad]   Intersections
Junction Width      $\theta_i^{r/l}$      [m]     Intersections
Lane Boundary       $\theta_{i,j}^{l}$    [m]     Lane Lines

Table 4.1: Parameter types in the HD map model

Belief Update

In order to update the state estimate, we need to obtain the sensor likelihood from
measurements encoded as the set of distance transforms (roads, intersections, and
lane lines) obtained in the Semantic Segmentation step (Sec. 4.3.2). We can simplify
the expression in Eq. (4.2) by assuming that the measurements are conditionally
independent,
\[
P(Z_t \mid \Theta, Z_{1:t-1}) = P(Z_t \mid \Theta) \qquad (4.3)
\]

First, note that most of the decision variables are not in view in a given update
step. Therefore we define Θ𝑡 as:

\[
\Theta_t = \{\, \theta \in \Theta \;\; \text{s.t.} \;\; \theta \text{ is in\_view at time } t \,\}
\]

The update step is then reduced to only updating Θ𝑡 as our estimate of all other state
variables is unchanged.

To find the sensor update distribution, we pair each of the five parameter types
found in our HD map representation with an associated distance transform as shown
in Table 4.1. Let 𝑋 (𝜃) be a rendering function that returns one or more points in
the vehicle frame representing the geometry encoded in each parameter (see Fig. 4-3).
For example, $X(\theta_i^d)$ returns the point on the given junction line at the distance $\theta_i^d$ from the intersection node. Similarly, $X(\theta_i^a)$ returns a set of points distributed along the junction line rotated by the angle $\theta_i^a$. Next, let $\delta(X)$ be the mean distance of
the points in 𝑋 (𝜃) to the associated distance transform in Table 4.1. The likelihood
component of the objective function is obtained as

\[
P(Z_t \mid \mu_{\Theta_t}) = 1 - \frac{\delta(\theta)}{\delta_{max}}
\]

where 𝛿𝑚𝑎𝑥 is the value at which the distance transforms are clipped (in our case
10 m). Therefore, as the distance between the encoded geometry and its observation
increases, the likelihood term will become smaller. Integrating this estimate to obtain
its normalization constant would be computationally prohibitive. However, since its
purpose in the objective function is to balance the observation with the prior, we
instead multiply these two terms with parameter weights which we tuned for best
performance.

We then solve for the optimal mean by combining Eqs (4.2, 4.3) to approximate
Eq (4.1) as:
\[
\mu_{\Theta_t} = \arg\max_{\mu_{\Theta_t}} \Big[\, P(Z_t \mid \mu_{\Theta_t}),\; P(\mu_{\Theta_t} \mid \Theta_{t-1}) \,\Big]\, W_\mu^\top
\]

where $W_\mu^\top$ is a weight vector that balances between the importance of the prior and
the update.

The associated uncertainty is approximated as

\[
\Sigma_{\Theta_t} = \Big[\, P(Z_t \mid \mu_{\Theta_t}),\; P(\mu_{\Theta_t} \mid Z_{1:t-1}) \,\Big]\, W_\Sigma^\top
\]

where $W_\Sigma^\top$ balances between confidence in the prior and confidence in the measurement.

While we cannot empirically measure this uncertainty, we tune this function to


strike a balance between our confidence in new measurements versus prior measure-
ments. Numerical optimization is then used online to obtain updated parameter
means and uncertainties at each measurement update.
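The sketch below illustrates one possible form of this online update for the in-view parameters, using a generic numerical optimizer; the rendering and distance-lookup callbacks, the Gaussian prior term, and the weights are placeholders rather than the exact objective used in this chapter.

```python
import numpy as np
from scipy.optimize import minimize

def update_in_view(theta0, sigma0, render_points, lookup_distance, delta_max=10.0,
                   w_meas=1.0, w_prior=1.0):
    """One online belief update over the in-view parameters Theta_t (illustrative).

    theta0, sigma0:  prior means and variances of the in-view parameters.
    render_points:   theta -> list of per-parameter point sets X(theta) in the vehicle frame.
    lookup_distance: points -> mean distance delta(X) read from the matching distance transform.
    """
    def objective(theta):
        # Measurement term: 1 - delta/delta_max, larger when the geometry matches observations.
        deltas = np.array([lookup_distance(pts) for pts in render_points(theta)])
        likelihood = 1.0 - np.clip(deltas, 0.0, delta_max) / delta_max
        # Prior term: penalize deviation from the previous mean, scaled by its uncertainty.
        prior = np.exp(-0.5 * ((theta - theta0) ** 2) / sigma0)
        return -(w_meas * likelihood.sum() + w_prior * prior.sum())

    result = minimize(objective, theta0, method="Nelder-Mead")
    return result.x
```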

4.4 Experiments and Results

4.4.1 Dataset

We evaluate our approach by collecting a dataset of 100 real-world autonomous ve-


hicle sequences in Michigan and California. The vehicle platform includes four lidar
sensors and six cameras giving a 360-degree view around the vehicle. Throughout
each sequence, the ground truth vehicle pose is obtained using a factor-graph based
SLAM system implemented in Ceres [1] that computes an optimized trajectory by
associating landmark observations to a prior fixed landmark map. We estimate the
trajectory error of this system to be < 10 cm. This prior map was also
the reference for the manually annotated ground truth.
Intersections represent the most complex component of the HD map inference
problem, as merely detecting lanes on a straight road is comparatively simple. For
this reason, we focus each sequence on an intersection traversal, beginning the se-
quence ten seconds prior to the intersection entrance, and terminating it five seconds
after the exit. Each sequence is composed of a series of frames at 10Hz where each
frame includes:

1. Six synchronized camera images (1936x1216) giving a surround view of the


vehicle.

2. An accumulated lidar pointcloud from the four lidar sensors over a four second
interval.

3. A local region from the SD prior map downloaded from OpenStreetMap.org.

4. A local region in the hand annotated HD map used for evaluation purposes
only.

We collected a total of 100 sequences, ranging in duration from 10 to 25 seconds.


These include traversals through 18 unique intersections. All sequences are non-
overlapping such that any two sequences that traverse the same intersection do so

in either a different direction, and/or at a different time. We randomly select 10
sequences to hold back for validation, with the remaining 90 used for training. Each
frame is processed sequentially, by first projecting to BEV, performing the semantic
segmentation to generate feature distance transforms, and then updating the HD map
model parameters that are within the sensor view at each time step. The final HD
map model obtained after a single traversal is then evaluated as described below.
Since we are interested in online estimation in this thesis, we do not save the updated
maps between sequences; instead, each sequence begins with a new model derived
only from the SD prior map.

4.4.2 Semantic Segmentation Results

While semantic segmentation is not the main focus of this thesis, we report the
results of our segmentation pipeline here to evaluate the role of this component in
the overall system. We train the semantic segmentation CNN using the 90 training
sequences on a single Nvidia Titan XP GPU for 2.5 hours. The input features are 6
channel 1936x1216 BEV images obtained by concatenating {Red, Green, Blue, Lidar
Intensity, Height, OSM Distance Transform}. The output includes per-pixel labels
in {Background, Road, Intersection, Lane Line, Out of Range} where out of range
refers to pixels in the BEV that were empty (e.g. due to occlusion).

We evaluate both overall and per-class accuracy on both the training and valida-
tion splits of the dataset. We achieve an overall accuracy of 93% on the validation
sequences, with per-class accuracies of Background: 97%, Road: 82%, Intersection
96%, and Lane Line: 90%. Fig. 4-5 shows the confusion matrix over these classes.
Notably, although the network performs well at predicting most classes, it does show
some confusion between road and lane markings. We also compare the results on the training split and find them to be similar, at 95% overall. This indicates that the
training dataset is diverse enough to avoid significant overfitting.

                 Background   Road    Intersection   Lane Line
Background         0.97       0.02       0.0044        0.0078
Road               0.016      0.82       0.012         0.15
Intersection       0.0081     0.014      0.96          0.015
Lane Line          0.0048     0.082      0.015         0.9

Figure 4-5: The confusion matrix for the semantic segmentation. The true class is on
the vertical axis, with the prediction on the horizontal.

4.4.3 MapLite 2.0 Results

In evaluating the performance of MapLite 2.0 we compare the HD map state esti-
mate after a single traversal to the hand annotated ground truth map. We limit the
evaluation to a 30 m swath around the vehicle to include only the portion of the map
likely to have passed within range of the onboard sensors. Since the HD map is based
on the underlying SD map prior, while the hand annotated ground truth map was
generated entirely independently, there is no guaranteed one-to-one correspondence
between features. For example, the decision to consider a divided road as two separate one-way roads, as opposed to a single two-lane road, is somewhat ambiguous. Therefore,
we propose two metrics for evaluating the overall quality of the HD map estimate.

1. Vehicle Trace Accuracy is a coarse metric that relies on the fact that the vehicle always drove on the road. We compute the portion of the driven path that falls within the road boundaries in the HD map estimate.

2. Road-region Intersection Over Union (IOU) is a finer metric that compares the regions defined as road in the annotated map to those in the HD map.

                     Vehicle Trace Accuracy   IOU Road Regions
Raw OSM              0.55 (0.40)              0.59 (0.13)
MapLite Prior        0.87 (0.18)              0.77 (0.04)
MapLite 2.0 (Ours)   0.98 (0.04)              0.88 (0.04)
Oracle               0.99 (0.01)              0.89 (0.05)

Table 4.2: MapLite 2.0 evaluation results. Values represent mean (std)

We compute these metrics over all the validation sequences and also calculate the
standard deviation to measure the consistency of the results.
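For concreteness, the two metrics can be computed from rasterized masks roughly as in the sketch below; the function names and mask inputs are illustrative.

```python
import numpy as np

def vehicle_trace_accuracy(inside_road):
    """Fraction of driven poses falling inside the estimated road boundaries (sketch).
    inside_road: (N,) booleans, one per pose along the driven path."""
    return np.mean(inside_road)

def road_region_iou(pred_mask, gt_mask):
    """Intersection-over-union of rasterized road regions within the evaluation swath (sketch)."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0
```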
For comparison, we also compute these metrics on a “Raw OSM” map, which
is a naively generated road region obtained by inflating the OpenStreetMap road
centerlines a fixed nominal lane width. Additionally, we run the same analyses on the
“MapLite Prior” which is the HD map structure, derived from the OSM prior, along
with the initial estimates of the state parameters, but before any onboard sensor
data is integrated. Finally, to better understand how the semantic segmentation
accuracy affects our results, we also show an “Oracle” result. This evaluates the
same structured estimator algorithm, but using ground truth semantic segmentation
instead of the CNN.
Table 4.2 shows the results of these evaluations. It is interesting that simply
applying the MapLite Prior already improves the Vehicle Trace Accuracy from 55%
to 87% without requiring any additional real-world data. This indicates that applying
the HD map structure with the data driven initialization is sufficient to obtain a
strong prior for the HD map inference. The MapLite 2.0 Posterior, which includes

the inference from the onboard perception, increases this accuracy to 98%. Compared to the Vehicle Trace Accuracy metric, the IOU metric is finer-grained as it compares the entire observed road-region rather than only the vehicle trace. Here, we once
again see increasing accuracy with each step as well as decreasing standard deviation
which indicates more consistency.
The reason the IOU does not reach unity is likely due to three factors: 1) The
HD map is derived from the SD prior OpenStreetMap while the ground truth map
is hand annotated independently. Therefore, there is some inherent ambiguity as to
what should be labeled “road” (e.g. paved dividers in semi-divided roads, private
driveways, etc.) 2) Model limitations that do not exactly represent reality. For ex-
ample, our intersection model, links each entry/exit port with a straight line segment
for simplicity, which is only a coarse approximation of real-world intersection regions
used for evaluation. 3) The inference optimization could get caught in a local min-
imum. Computational resources preclude exhaustive search of the entire parameter
space, therefore, a local minimum could prevent the model from reaching the optimal
state.
While the oracle does outperform the MapLite 2.0 model slightly, it demonstrates
that the majority of the remaining error lies in the inference step. Therefore, to have the greatest impact on these metrics, future work should focus on the three sources of inference error described previously, rather than on improving the semantic segmentation step.
Fig. 4-6 shows several illustrative real-world examples. The first two columns
include the components that can be run offline as the MapLite prior is derived from
the SD prior before integrating sensor data. The last three columns show how the
sensor data is segmented to create the distance transforms used to update the HD
Map state estimate. One issue of note occurs in row (4) where the road includes a
center turn lane which our model does not explicitly handle. This causes the yellow
centerlines to appear jagged. In contrast, in row (2) the divided center is correctly
inferred online, even though this is not represented in the prior. In future work, we

could expand our model to explicitly account for special lane scenarios such as this. In
row (5) an error in the road edge at a single point in the road exiting the intersection
at the top leads to a “kink” in the otherwise smooth road estimate. Adding a term to encode a “smoothness” constraint may produce more realistic results.

4.5 Discussion
In this chapter, we have presented a novel HD map model representation that is
fully relative to an underlying SD map prior. We have shown that it can provide a
reasonable HD map prior directly from an available SD map. Furthermore, we have
designed a perception and inference system that can be used to estimate the HD map
model parameters to generate an online HD map estimate from only an SD prior and
onboard perception.
This can be useful not only for allowing autonomous vehicles to navigate in pre-
viously unmapped regions, but also for at least two other important applications.
1) An online HD map could be run in the background of a fully mapped solution
to detect changes and suggest map maintenance, and 2) MapLite 2.0 could be used
online to jumpstart mapping of unexplored regions autonomously, which could then
be combined to create an optimized offline map for future use.
There are also some important limitations to this system that should be consid-
ered. The map model is inevitably an approximation and cannot capture all real-world
complexity. While we found that our model is sufficient to represent a large variety
of road types in multiple cities in our dataset, care must be taken to account for new
road structures. For example, there is some judgement required when choosing the
information that should be gleaned from the SD map, versus inferred online. In this
chapter, we utilize the lane number attribute in the SD map as a prior, but detect
the actual lane boundaries from onboard sensors. Conversely, we don’t use crosswalk
information, even though that is available in some regions. Therefore, care should
be taken to select what to include in the prior and what to infer online based on the

availability in the region of interest.
In the previous chapter, we focused on probabilistic estimation of the vehicle
localization to a coarse SD map. In this chapter, we instead consider the problem
of inferring the HD map online, while relying on our SLAM system for localization.
This allows us to directly evaluate the HD map inference problem in isolation. For a
full application, these systems will need to be combined to enable both localization
of the vehicle to an SD map, and online inference of the HD map simultaneously.
Additionally, we chose a 2D map representation for simplicity which precludes the
ability to handle multi-level roads (e.g. overpasses). Handling multiple levels through
a map-splitting approach, or an extension to 3D would be needed for deployment on
such roads.
Finally, this approach requires the onboard sensors be able to parse and segment
the scene semantically in order to build the online HD map. If the sensors were to be
occluded by ambient weather such as snow, rain, or fog, the HD map updates would
fail. In the next chapter we will introduce a novel sensing technique – LGPR – which could be used to augment the onboard sensor suite and enable autonomy even in those challenging inclement weather conditions.

[Figure 4-6 panels: columns show SD Prior, MapLite Prior, Camera, BEV, Distance Transform, and MapLite 2.0 outputs for scenarios (1)-(5).]

Figure 4-6: Selected examples of the online estimated HD map. Rows indicate differ-
ent scenarios while columns illustrate pipeline outputs.

Chapter 5

Localizing Ground Penetrating Radar

5.1 Overview

Robust localization in diverse conditions is a key challenge to enable the widespread


deployment of autonomous vehicles. Since relying purely on a global navigation satellite system (GNSS), such as GPS, does not provide sufficient precision, research
and industry efforts have focused primarily on utilizing cameras and laser scanners for
the navigation task. These systems typically use Simultaneous Localization and Map-
ping (SLAM) algorithms [22] for creating and maintaining maps of the environment
that allow for highly precise localization and navigation.

Using vision and laser sensors as localization sources comes with its own set of
challenges. For instance, localization from purely visual maps needs to account for
the fact that, even in the absence of occlusions, the appearance of the environment
strongly varies depending on weather, season, time of day, and a potentially changing
environment [84]. While laser scanners do not suffer from changing illumination
conditions, they are still highly affected by weather, occlusions, and environment
dynamics.

In the previous chapters, we introduced MapLite, which enables autonomous nav-


igation without relying on precise localization to HD prior maps using SLAM. How-

Figure 5-1: Our LGPR based autonomous navigation system is capable of successfully
localizing from a single map in different challenging weather and lighting conditions
including snow, rain, and darkness.

ever, that method is still susceptible to failures due to inclement weather. In fact,
because the MapLite system requires all of the scene geometry and semantics to be
obtained online from the onboard sensors, rather than from an HD prior map, it could
be even more susceptible to inclement weather failures that obstruct the view of these
features. While HD map-based localization could fail if enough of the key landmarks
are occluded, if localization succeeds, even the parts of the scene that are occluded
by snow or fog would be accessible in the HD map. Thus, neither the traditional ap-
proach using HD prior maps, nor the MapLite framework which relies more heavily
on online perception, are immune to failures caused by sensor occlusions in inclement
weather conditions.

Recently, the use of a Ground Penetrating Radar (GPR) was proposed for local-

ization [32]. Similar to the aforementioned approaches, this LGPR can be combined
with GPS to constrain the search space. The main advantages of using GPRs for the localization task are their robustness to dynamic surface environments, illumination
changes, and visibility conditions. This is made possible by a ground facing sensor
array that creates maps of the underlying structure of the soil and uses these maps
for subsequent localization. However, the LGPR system has never been evaluated on
an autonomous vehicle in adverse weather conditions which include some of the key
applications for this technology.
In this chapter, we deploy the LGPR on a full-scale autonomous Toyota Prius (as
shown in Fig. 5-1). We describe its incorporation into a novel autonomous driving
pipeline and justify all of our design choices. Through extensive evaluations, we
demonstrate the practical capability of LGPR to enable an autonomous vehicle to
drive without relying on any visual or surface features. We design a novel inception-based CNN architecture for learning LGPR localization that is resilient to changing weather conditions. As a result, the method works even when the driving surface
is covered with rain or snow. While the LGPR performs vehicle localization, it does
not enable dynamic obstacle detection in inclement weather. However, in this thesis
chapter we focus on the localization problem and leave the detection of dynamic
obstacles in challenging weather conditions for future work.
Overall, our contributions can be summarized as follows:

• Design of a new autonomous driving pipeline based on LGPR.

• Demonstrate mapping, localization, planning, and control of an autonomous


vehicle without relying on any visible surface features.

• LGPRNet: an inception based CNN architecture for learning localization that


is resilient to changing weather conditions.

• Extensive evaluation of our system localizing on the same map in a variety of


weather conditions including clear weather, rain, and snow.

5.2 Related Work

Radar-based Perception. Radars serve as a key component for advanced driver


assistance systems and autonomous vehicles [41]. Past research in that space fo-
cused on fusing radar data with various other sensor types particularly Lidars [108]
and cameras [90] for applications such as pedestrian detection [80] and tracking [17].
Radar-based SLAM approaches [125] usually model structural features in the sur-
rounding environment and have also been investigated in the context of vehicle self-
localization [118, 126]. However, these approaches may fail without special consider-
ations to account for dynamic objects and changing surroundings.
Appearance and Persistence Modelling. Particularly for visual SLAM [22,
84], an extensive body of research explicitly focuses on addressing these types of
problems. The breadth of approaches involves inpainting and removal of dynamic
objects [11, 13], selecting particularly persistent landmarks for map storage [45], and
selecting appearance specific landmarks [19] or map segments [29]. We avoid the
problems addressed in this line of work altogether by using sub-surface features that
do not suffer from frequent appearance changes and occlusions.
GPRs in Robotics. Sub-surface sensing tasks are usually carried out using
GPRs. So far, corresponding research work mainly focused on applications such
as measuring the makeup and content of the soil [35, 109], locating underground
structures [96], mapping archaeological features [34], or landmine detection [36, 28].
Similarly, GPR deployment on robots mainly focused on data collection tasks such as
autonomous surveys [131, 71], inspection [68], or lunar crust analysis [121]. Contrary
to these approaches, we do not focus on classifying the content in the GPR images
but rather on using them to improve the navigation capabilities of the autonomous
system.
Only a limited number of works focuses on consideration of occluded structures in
radar-based SLAM such as [60]. The first use of a GPR for localization was reported
in [32] focusing on describing the design of a low-profile low power GPR system and

first demonstrations of real-time localization. In contrast, this thesis demonstrates a
fully-integrated LGPR system for autonomous driving. Moreover, we establish and
evaluate the sensor’s suitability for successful localization across different weather
conditions even when using the same map.
Learning-based Localization. Learning-based localization methods such as [63, 81] aim to learn a model for localizing a robot on a prior dataset of surface im-
ages. Similarly, our proposed learning-based approach, LGPRNet extends these to
underground features. However, unlike surface features which can be viewed from
many different poses, underground radar features are visible only from the region
immediately above them which substantially affects the localization problem.

5.3 System
An autonomous vehicle requires a large system architecture to enable safe navigation.
The key components of this system include: mapping, localization, planning, and
control. Fig. 5-2 shows a diagram of the various system components. In the following
subsections, we describe the function of these components with an emphasis on how
the design differs in an LGPR system.

Figure 5-2: The autonomous navigation system architecture showing the key system
components: Mapping, Localization, Planning, and Control. The LGPR system used
as the primary localization sensor is outlined in red.

5.3.1 Mapping

During the mapping phase, the vehicle is driven by a human operator and the data
from the vehicle’s sensors is recorded. Once this step is completed, the saved map
enables the vehicle to localize in that area. There are two key differences between
the dense 3D maps required in typical vision or LiDAR systems, and those created
using an LGPR sensor. Firstly, because vision and LiDAR maps record surface fea-
tures, they require maintenance when any of those features change which can result in
burdensome repeated map updates. LGPR maps, on the contrary, record subsurface
features which are unlikely to change frequently. Secondly, dense sensor maps typi-
cally take up large amounts of space which makes them difficult to store and transmit.
For example, [100] compares the size of a typical (20,000 mile) topological map such
as OpenStreetMap with that of a highly compressed 2D sensor map of the same area.
The topological map only requires 3GB while the sensor map requires about 200GB
of storage space. However, the topological map does not contain enough information
for precise localization. To store an LGPR map of the same area we require approx-
imately 160GB. The data structure used to store the LGPR maps is described in
Sec 5.4.2. Intuitively, these are smaller because the sensor measures only a thin slice
directly below the vehicle, while typical 3D maps contain a detailed view of the entire
environment including surrounding buildings and vegetation. Thus, LGPR maps can
provide precise localization in changing surface conditions without requiring as much
storage space.

5.3.2 Localization

During operation, the autonomous vehicle must have precise localization information
in order to ensure that it closely follows the desired path. Standard freeway lane-
widths in the United States are 3.6 m [49] while the width of a typical passenger vehicle
is 1.7 m. This gives only about 1 m of clearance on each side which is beyond the
accuracy of GPS systems even when augmented with wheel odometers [65]. LiDAR

or vision based maps provide localization with sufficient accuracy. However, they
are susceptible to ambient lighting conditions and fail if road surface features such
as lane markings are obscured by rain or snow. The LGPR sensor provides precise
localization to the vehicle by matching the sensor data during operation to that stored
during the mapping phase. Finally, this localization update is then probabilistically
fused with the onboard proprioceptive sensors including the wheel odometry and
the Inertial Measurement Unit (IMU). These are combined to provide a robust and
precise localization estimate at a rate of 40 Hz. Note that distance travelled during
a single cycle is given by 𝑑 = 𝑣/𝑓 where 𝑣 is the vehicle velocity and 𝑓 is the loop
frequency. Therefore, 𝑓 = 40 Hz is enough to ensure that even at highway speeds of
30 m/s the vehicle would travel <1 m before receiving a new localization estimate.

5.3.3 Planning

The planning component determines the path the vehicle is required to take to safely
navigate to its destination. It begins by determining the current location of the vehicle, which it receives from the localization system (see Sec. 5.3.2). Next, it uses the LGPR map from Sec. 5.3.1 to determine which roads must be traversed, and dynamic obstacle detection systems to plan a collision-free path toward the goal. Choosing which roads to take is performed using a shortest-path algorithm such as Dijkstra [42] and is similar to that performed on standard vehicle navigation devices, so we will not expound on it here. However, autonomously following the desired path is not trivial. To do so, we utilize a pure pursuit controller (see Sec. 5.5.1 for details). This
algorithm is capable of planning paths that follow a reference trajectory even when the
vehicle cannot exactly follow the reference as is typical in real-world scenarios. This
algorithm is a challenging test for the LGPR localization system, which in previous
work [32] was only evaluated while being manually driven exactly over the same path
as that in the map.

5.3.4 Control

The control component is required to determine the actual control values to transmit
to the vehicle, including accelerator, brake, and steering wheel angle. These values
are computed in order to ensure that the vehicle will follow the planned trajectory. In
our system, we utilize two PID controllers to obtain the necessary control values. One
of these controls the accelerator and the brake, while the other controls the steering.
The gains were tuned to allow the vehicle to quickly achieve a desired configuration
while avoiding any instabilities due to lag in the system.

5.4 A Localizing Ground Penetrating Radar

In this section, we describe how the LGPR sensor is used both to build maps during
the mapping phase, as well as to provide localization during autonomous operation
(see Fig. 5-4). The LGPR sensor uses electromagnetic radiation in the 100 MHz to
400 MHz range which is much lower than typical (1 GHz to 3 GHz) GPR devices used
for surveying [114]. This allows the sensor to resolve subsurface geological features
in the 20 cm to 30 cm range with the additional benefit of improving the penetration
depth. Higher-frequency sensors, with their shallower penetration, would not reach below the road as is necessary for observing unique underground features. In general, the penetration depth is dependent on the relative permeability of the soil [32]. In our testing region (Devens, Massachusetts), skin depths of 𝐷 ≈ 100 cm are typical, leading to penetration depths in the 2 m to 3 m range. Road materials and soil are
transparent to radar signals at these frequencies. However, changes in the dielectric
properties of the subsurface geology due to variations in soil type and density, rocks,
roots, or layer boundaries cause the signals to be reflected back to the LGPR sensor.
During each element pulse, the neighboring element is used to measure the intensity
of the reflected signal and record an “image” of the subterranean content beneath the
element at the time the pulse occurred. In the following subsections we describe the

key features of A) the LGPR sensor utilized, and B) the localization algorithm used
both to record the prior map, as well as to localize a new scan.

Figure 5-3: The LGPR sensor mounted on the rear of the autonomous vehicle. The
main components include the 11 element radar array and the switch matrix which
collected the signals from the array. The processing chassis contains the onboard
computer for the sensor data processing, and the GPS unit used for labeling the prior
maps and initializing the search window for localization.

5.4.1 LGPR Sensor

The LGPR sensor used in this thesis was custom designed by MIT Lincoln Laboratory
for the purpose of localization. Fig. 5-3 shows the key sensor hardware components.
The sensor measures 152 cm x 61 cm x 7.6 cm which requires it to be mounted outside
the vehicle. Because the measurement range resolution depends on the width of the
array, the sensor cannot be easily miniaturized. Therefore, the sensor was mounted
on the rear of the test vehicle at 32 cm above the ground (See Fig. 5-1). This distance
was chosen because it is close to the ground - ensuring a greater penetration depth
- while still remaining high enough for adequate road clearance. Note that in [32]
the sensor was able to be lowered to 15.24 cm which may have an effect on our

Figure 5-4: The autonomous navigation pipeline which processed sensor inputs from
the wheel odometry, and LGPR sensor, and computed steering and speed commands
to send to the vehicle to autonomously drive the vehicle along the goal path.

results. Furthermore, we believe the placement on the rear of the vehicle is most
stable since the vehicle begins to depart from the recorded path front first. This
ensures the localization estimates will be available as long as possible to correct for
any departures. However, due to hardware limitations, the sensor location remained
fixed so we cannot know empirically how strong these effects are.

Here we will describe the key features of the sensor; see [32] for more detailed
specifications. The LGPR array is composed of 12 radar elements. During a single
sensor sweep, each element transmits for a fixed period while its neighboring element
receives the reflected signal. Thus, although there are 12 elements, only 11 channels of
data are returned. The data is then passed through a number of post-processing steps
including an Infinite Impulse Response filter, online calibration, as well as factory
measured calibration corrections to ensure the data is free of noise and sensor biases.
The resulting measurement from each channel is an array of 369 bytes representing
the relative intensities of the reflections at each depth. Each channel pulses for 8 ms which gives a maximum sensor rate of 126 Hz. Thus, a complete sweep of the sensor
can be considered an image of single-byte pixels with dimension 11x369 where each
pixel color gives the intensity of the reflected wave at a particular location. Fig. 5-
5 shows an example of the values of a single element taken as the vehicle moved
along a path. The coherent structures shown provide a picture of the content of the

Figure 5-5: LGPR data collected from a single radar element as the vehicle was driven
over a path. The colors indicate the intensity of the reflected signal at each depth
and the high degree of structure allows the subterranean geological features to be
coherently measured using this method.

subterranean environment measured by the sensor.

5.4.2 Localization Algorithm

The onboard computer in the processing chassis receives the sensor data from the
switch matrix, which collects the signals from each of the 12 radar elements. The
sensor has three modes of operation: 1) Calibration, 2) Map Creation, and 3) Local-
ization.

Calibration

During sensor calibration, each element is pulsed for a short period and the mean
values are recorded to ensure the resulting signals are mean centered at 0 and to
compensate for variations in temperature. The calibration routine is fully automated,
only takes a few seconds, and [32] found that it allowed the sensor to operate robustly
at temperatures ranging from −5 °C to 50 °C. In our experiments, we recalibrate daily,
which is necessary to compensate for thermal expansion of the sensor components.
However, since the recalibration always zeros the mean of the signal, we found it does
not preclude localizing to a prior map created with a prior calibration.

Map Creation

During the map creation phase, a human driver operates the vehicle and each scan
from the sensor is recorded in sequence building a 3D tunnel of dimension 11×369×𝑁
for the 11 channels, 369 discrete depth bins, and 𝑁 sensor sweeps. Additionally, the
GPS coordinates of the vehicle are collected and associated with each scan. It is
important to notice that a standard GPS device could be used for this labeling process,
and in fact, the LGPR localization would still outperform even the GPS device that
was used in the making of the map itself. Consider that the key information the
LGPR sensor must provide during localization is the pose of the vehicle with respect
to the original data. There is no real requirement to use GPS coordinates at all when
labeling the scans. While it is convenient to use GPS coordinates in order to allow
the sensor localization results to have some real-world meaning, even if the mapping
GPS is not accurate to the “true” GPS coordinates, the vehicle will still be able to
follow the desired path. For this reason, the extrinsic calibration between the GPS
and LGPR is also not needed. Since the GPS is only being used for labeling, a fixed
offset will not affect system performance. We will call these coordinates LGPR-GPS
coordinates to differentiate them from the true GPS coordinates that are fixed to the
earth frame.

Localization

During the localization phase, a single sweep of the sensor is compared to the prior
map database. In order to keep memory costs constant, the map data is pulled from
the database in square grids in the local area where the vehicle is operating. Next, for
a given hypothetical pose for the current scan in the map, the correlation is calculated
as

\[
r_{A,B} = \frac{\sum_{i,d} A_{i,d}\, B_{i,d}}{\sqrt{\sum_{i,d} A_{i,d}^{2}\, B_{i,d}^{2}}} \qquad (5.1)
\]
where $A$ and $B$ represent the current scan and the scan data from the prior map, respectively, $i$ spans the number of channels, and $d$ spans the number of depth bins. Note that, rather than assuming the vehicle sensor will directly overlap a previous
scan, the scan from the prior map is generated for any 5 degrees of freedom (DOF)
pose (x, y, roll, pitch, yaw) of the vehicle by interpolating in the 3D tunnel from
the dataset. (Even though the vehicle does change altitude, since its height with respect to the map remains constant, the $z$ value is unnecessary. However, the roll
and pitch are included because small variations in the orientation of the sensor due to
vehicle suspension could lead to large errors in the position of underground features.)
In this way, any hypothetical pose that overlaps the data is considered. To find
the optimal match that maximizes 𝑟𝐴,𝐵 in (5.1), a particle swarm optimization is
performed to balance the need for quickly finding solutions when they exist in the
expected location, with the ability to search over a large area. As described in [32]
the number of particles, the size of the search region, and the number of iterations,
are tuned online to ensure fast operation when the correlations are high, with a more
exhaustive search if the correlations indicate a poor match. Importantly, although
the sensor itself, polls at a constant 126 Hz, the particle swarm optimization can
limit the actual localization frequency. With the computational resources available
(see Sec. 5.7.1), we found that the localization results could keep up with the sensor
frequency only when the vehicle was following closely to the path in the prior map.
However, when correlations were lower due to noise, or the sensor not fully overlapping
with the prior map, the localization frequency would drop.
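A minimal sketch of the correlation in Eq. (5.1) for a single hypothesized pose is shown below; interpolating the map slice at that pose, and the particle swarm search itself, are assumed to happen elsewhere.

```python
import numpy as np

def lgpr_correlation(scan, map_slice):
    """Normalized correlation r_{A,B} between the live scan and an interpolated map slice.

    scan, map_slice: (11, 369) arrays of radar intensities (channels x depth bins),
    following Eq. (5.1) as reconstructed above.
    """
    a = scan.astype(float)
    b = map_slice.astype(float)
    denom = np.sqrt(np.sum((a ** 2) * (b ** 2)))
    return np.sum(a * b) / denom if denom > 0 else 0.0
```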

Each localization solution includes a 5 DOF pose composed of LGPR-GPS coor-


dinates and orientation, the correlation value 𝑟𝐴,𝐵 ∈ [−1, 1] , and an overlap value
𝑢𝐴,𝐵 ∈ [1..11] indicating how many of the 11 channels overlapped with the prior map
in the optimal pose. In the next section, we detail how these localization estimates
were integrated into the autonomy pipeline.

5.4.3 Sensor Fusion

We utilize multiple instances of the Extended Kalman Filter (EKF) algorithm for
fusing the various sensors. One instance fuses the wheel encoders and IMU sensor
measurements to obtain a filtered odometry estimate, and a second instance fuses
these with the LGPR measurements to obtain an estimate in the LGPR-GPS reference
frame. The motivation for using two separate EKF instances is twofold. Firstly, the
proprioceptive sensors run at a consistently fast frequency while the LGPR sensor
is sometimes slower if the correlation quality is low. Secondly, this gives access to a
smooth position estimate in the local frame of the robot, while the global estimate
can have discrete jumps if an LGPR correction comes in after a relatively long pause.
The EKF algorithm has been exhaustively explored in the literature [61, 123, 15]
so we will only cover the key parts that are important for our implementation. We
utilize the EKF architecture as described in [89] for both instances. The state vector
is 8 dimensional containing the 3 DOF pose of the robot, its velocities, and linear
accelerations:
\[
x = \left[\, x,\; y,\; \theta,\; \dot{x},\; \dot{y},\; \dot{\theta},\; \ddot{x},\; \ddot{y} \,\right]^\top
\]

We utilize the wheel encoders only to measure the linear velocity $\dot{x}$ and the IMU to measure $\theta$ and $\dot{\theta}$. Note the IMU is paired with a magnetometer giving it an absolute measurement of $\theta$ rather than solely a relative one. We extrinsically calibrate the
IMU using the procedure described in [101]. Each LGPR measurement contains

\[
z = \left[\, \text{latitude},\; \text{longitude},\; \theta \,\right]^\top
\]

which is transformed into the Universal Transverse Mercator (UTM) frame using
the transform in [62]. The process noise covariance matrix $w \in \mathbb{R}^{8\times 8}$ was tuned in the field to match observed uncertainties. It contains only diagonal entries with the values:

\[
\mathrm{diag}(w) = \left[\, 0.5,\; 0.5,\; 6,\; 2.5,\; 2.5,\; 2,\; 1,\; 1 \,\right] \times 10^{-2}
\]

Finally, since each of the sensors directly measures a state variable, the associated
observation matrix 𝐻 (the Jacobian of the observation function) is all 0’s except for
1’s in the diagonal positions of the measured state variables.
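As an illustration, the observation matrices for the three measurement sources could be constructed as in the sketch below; the state ordering and helper name are assumptions consistent with the state vector defined above.

```python
import numpy as np

# State ordering: [x, y, theta, x_dot, y_dot, theta_dot, x_ddot, y_ddot]
N_STATE = 8

def observation_matrix(measured_indices):
    """Build H with a 1 in the column of each directly measured state variable."""
    H = np.zeros((len(measured_indices), N_STATE))
    for row, idx in enumerate(measured_indices):
        H[row, idx] = 1.0
    return H

H_wheel = observation_matrix([3])        # wheel encoders measure x_dot only
H_imu = observation_matrix([2, 5])       # IMU (with magnetometer) measures theta and theta_dot
H_lgpr = observation_matrix([0, 1, 2])   # LGPR gives x, y (via UTM) and theta
```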

5.5 Planning and Control Pipeline

5.5.1 Pure Pursuit Controller

The pure pursuit controller [33] is a path tracking controller for Ackermann steered
autonomous vehicles. It was used by the MIT entry in the DARPA Urban Challenge
in 2007 [70] and has been widely adopted for autonomous driving applications. One
key feature of the algorithm is its pursuit of a “lookahead point” on the goal path at
some distance 𝑑 ahead of the vehicle to smoothly adapt to deviations of the vehicle
from the reference path. The pure pursuit steering equation is
\[
\delta = \tan^{-1}\!\left(\frac{2Lx}{d^2}\right)
\]

Where 𝛿 is the required steering angle, 𝑥 is the offset to the goal point, 𝐿 is the vehicle
length (measured between the front and rear axles) and 𝑑 is the lookahead distance.
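A minimal sketch of this steering law is shown below; the function name and argument conventions are illustrative.

```python
import math

def pure_pursuit_steering(x_offset, lookahead, wheelbase):
    """Pure pursuit steering angle delta = atan(2 L x / d^2) (sketch).

    x_offset:  lateral offset of the lookahead point from the vehicle heading [m].
    lookahead: distance d to the lookahead point on the goal path [m].
    wheelbase: distance L between the front and rear axles [m].
    """
    return math.atan2(2.0 * wheelbase * x_offset, lookahead ** 2)
```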
Fig. 5-6 shows the vehicle driving along the goal path (in green) and approaching a left
turn. The blue path shows how the pure-pursuit algorithm will steer the vehicle. As
expected, it begins by oversteering in the beginning of the turn and then overshoots
at the end. The amount of path deviation depends on the lookahead parameter 𝑑.
However, when utilizing LGPR, deviations from the path cause the sensor overlap
value 𝑢𝐴,𝐵 to drop as discussed previously. The EKF implementation described in
the previous section is crucial to allow the vehicle to continue navigation even when
the overlap is low.

Figure 5-6: The pure pursuit tracking algorithm was used to smoothly steer the vehicle
along the goal path (shown in green). The blue sphere shows the lookahead distance
and the blue curve shows the path the controller intends to take. As expected, this
path tends to deviate from the goal path in turns.

5.5.2 Speed Controller

For the purpose of evaluating the LGPR’s capability to provide localization for au-
tonomous navigation, the speed of the vehicle is not a critical element. However,
a controller to regulate the speed was still required to maintain the vehicle cruising
speed and appropriately decelerate in turns. To this end, we designed a speed control
law:
\[
v = \min\left\{ v_{max},\; \frac{\alpha\, v_{max}}{|\delta|} \right\}
\]
Where 𝑣𝑚𝑎𝑥 is the maximum speed, 𝛼 is a tunable parameter, 𝛿 is the steering angle,
and 𝑣 is the commanded speed sent to the vehicle controls. [32] evaluated the performance of the localization with a human operator at speeds of up to 65 mph. For our tests, however, using a fully autonomous vehicle, we set 𝑣𝑚𝑎𝑥 = 10 to 20 mph for
safety reasons. Finally, we found that 𝛼 = 0.1 provided an appropriate rate of speed
reduction while turning.
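A sketch of this control law follows (Python); the small epsilon guard against a zero steering angle is an implementation detail added here, not part of the equation above.

```python
def speed_command(steering_angle, v_max, alpha=0.1, eps=1e-6):
    """Speed law: v = min(v_max, alpha * v_max / |delta|), slowing in turns."""
    return min(v_max, alpha * v_max / max(abs(steering_angle), eps))
```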

5.6 LGPRNet: Learning to Localize in Weather

In the previous section, we implemented a correlation-based localization algorithm
that relies on the detected LGPR features remaining stable between the mapping and
localization phases. While this assumption largely holds, there is some concern that if
the weather changes significantly, it could affect the signal in ways that are difficult
to model. For example, if the moisture content of the soil were to increase due to
rain, the changed dielectric properties of the soil could alter the penetration depth
of the signal, leading to a scaling of the received image. Since this effect would be
difficult to model explicitly, we propose using a learning-based method that uses
data from various weather conditions to learn a localization model that is resilient
to changing weather.

5.6.1 Problem Formulation

To address the Localization in Weather benchmark, we define the localization problem
as comprising a prior map 𝑀 𝑊 in the world frame consisting of a set of 𝑁 LGPR frames
𝐹𝑖 for 𝑖 ∈ {1..𝑁 } where each frame 𝐹𝑖 ∈ R𝑁𝐷 ×𝑁𝐶 and 𝑁𝐷 is the number of depth bins
and 𝑁𝐶 is the number of channels in the radar array. Thus, the value 𝐹𝑖,𝑑,𝑐 represents
the intensity of the radar return in the 𝑖th frame at a particular depth and channel.
Furthermore, each frame 𝐹𝑖 is associated with a pose 𝑃𝑖 which contains the pose of
the sensor in the world frame at the time the frame was measured. Next, we have a
target frame 𝐹𝑇 ∈ R𝑁𝐷 ×𝑁𝐶 which is measured when the vehicle revisits the mapped
region. Our goal is to find the target pose 𝑃𝑇 which gives the pose of the sensor at
the time 𝐹𝑇 was measured.
We simplify the problem by noting that even coarse GPS estimates enable focusing
on a Region of Interest (ROI) which can significantly narrow the size of the map that
must be searched. To this end, we replace the full map 𝑀 𝑊 with a target specific
window 𝑊𝑇 of length 𝑁𝑊 , a fixed hyper-parameter, which selects linearly-spaced
frames from 𝑀 𝑊 in the region of 𝑃𝐶 , a coarse estimate of the target pose 𝑃𝑇 . Since
the LGPR collects frames at a fixed frequency, the spatial density of the map frames
varies by the speed of the mapping vehicle. Therefore, the spatial size of the window
is variable, and can depend on the uncertainty in the coarse pose estimate.
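The sketch below illustrates one way such a window could be selected (Python/NumPy); the function name, the use of a Euclidean search radius, and the array layouts are assumptions for illustration rather than the exact procedure used here.

```python
import numpy as np

def select_window(map_positions, coarse_xy, radius_m, n_window):
    """Pick n_window linearly spaced map frame indices whose positions lie
    within radius_m of the coarse pose estimate P_C.
    map_positions: (N, 2) array of map frame positions in the world frame."""
    dists = np.linalg.norm(map_positions - np.asarray(coarse_xy), axis=1)
    candidates = np.flatnonzero(dists < radius_m)
    if len(candidates) <= n_window:
        return candidates
    picks = np.linspace(0, len(candidates) - 1, n_window).round().astype(int)
    return candidates[picks]
```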

5.6.2 LGPRNet Architecture

We propose a network architecture for LGPR localization, LGPRNet, based on the
Inception architecture [124] which excels at detecting features at varying scales and
uses dimensionality reduction to allow for a very deep architecture while reducing the
computational cost. The problem of detecting underground radar features is quite
distinct from the ImageNet [40] object detection task for which pre-trained models
are available. Additionally, the dimensionality of the input features is unique in this
problem: the number of channels in the radar array is quite low compared to
the number of depth bins, creating a very narrow aspect ratio (33:1), while the number
of frames in each window region can be larger than in the typical 3-channel, low-aspect-
ratio RGB images used as input to those networks. For these reasons, we design
a custom network architecture (see Fig. 5-7) where the layer parameters and kernel
shapes are specifically designed for the unique LGPR feature shapes and train it using
GROUNDED data.
The input feature shape is (𝐵, 𝑁𝐹 , 𝑁𝐶 , 𝑁𝐷 ) where 𝐵 is the batch size, 𝑁𝐹 =
𝑁𝑊 + 1 is the number of frames in the input (the window frames plus the target),
𝑁𝐶 is the number of radar channels, and 𝑁𝐷 is the number of depth
bins in each frame. Note that while the batch size is variable, 𝑁𝐹 , 𝑁𝐶 , and 𝑁𝐷 are fixed
for a given trained model. For our implementation, 𝑁𝐹 = 10, 𝑁𝐶 = 11, 𝑁𝐷 = 369.
Due to the unusually large aspect ratio (33:1), we design rectangular kernels such as
(3,11) in the first convolution, and (3,7) in the second, as well as rectangular max
pooling windows such as (2,3) in the first MaxPool, which balances the sparsity in
the channel dimension with the density in the depth dimension. Finally, after the last
fully connected layer in each of the three outputs (Main, Aux1, and Aux2) we add a
sigmoid activation to transform to the output range of each pose dimension.

Figure 5-7: LGPRNet architecture composed of traditional convolutional layers along
with Inception building blocks includes a total of 20 layers (not including the auxiliary
outputs, and considering each inception block a single layer). Notice that many of
the kernels are not square to account for the unusually high aspect ratio of LGPR
frames.
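To make the kernel shapes concrete, the following PyTorch sketch reproduces the first few layers acting on a (𝐵, 𝑁𝐹 , 𝑁𝐶 , 𝑁𝐷 ) = (B, 10, 11, 369) input; the channel widths (64, 128) are illustrative placeholders, not the exact LGPRNet values.

```python
import torch
import torch.nn as nn

# The N_F radar frames are treated as input channels; kernels are rectangular
# to balance the narrow channel axis (11) against the deep depth axis (369).
stem = nn.Sequential(
    nn.Conv2d(in_channels=10, out_channels=64, kernel_size=(3, 11)),  # -> (64, 9, 359)
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=(2, 3)),                                 # -> (64, 4, 119)
    nn.Conv2d(64, 128, kernel_size=(3, 7)),                           # -> (128, 2, 113)
    nn.ReLU(inplace=True),
)

x = torch.randn(4, 10, 11, 369)   # a random batch of window + target frames
print(stem(x).shape)              # torch.Size([4, 128, 2, 113])
```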
We choose an output representation which is relative to the pose of the input win-
dow. This obviates the need to include the real-world pose as an input and simplifies
training. Instead, the predicted pose is a 3-tuple of [longitudinal, lateral, rotation]
where longitudinal is a float in [0,1] which represents the longitudinal position with
respect to the input window. The lateral position is a float in [0,1] which indicates
the lateral offset from the window with respect to the sensor width. Since the sensor
must be overlapping at least partially with the map in order to localize, the range
[0, 1] represents the full range of [-50, 50]% overlap in the lateral direction. Finally,
the rotation is the angle in radians, normalized to the distribution in the dataset,
where 0 indicates the target pose is aligned with the center of the map window. In
this way, the model can predict an output pose relative only to the window, and the
final vehicle pose prediction is then computed using the known window frame poses.
Using window-relative output poses in this way simplifies training. However, it
does impose a limitation on the possible predictions of the network. For example, the
prediction will always lie within the input window longitudinally, and will always
assume at least 50% overlap with the map frame laterally. Therefore, it is important
to choose a large enough window to capture the uncertainty in the initial pose esti-
mate. It would be interesting to also predict an output reflecting the likelihood that
the window doesn’t contain the target scan, however, we leave that for future work.
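The conversion from a window-relative prediction back to a world-frame pose can be sketched as follows (Python/NumPy); the normalization constants and the simple linear interpolation of heading (ignoring angle wrapping) are assumptions for illustration only.

```python
import numpy as np

def window_relative_to_world(pred, window_poses, sensor_width_m, angle_scale):
    """Convert a (longitudinal, lateral, rotation) prediction into a world pose.
    window_poses: (N_W, 3) array of (x, y, yaw) for the window frames.
    sensor_width_m, angle_scale: assumed normalization constants."""
    lon, lat, rot = pred
    # Longitudinal in [0, 1]: interpolate along the window frame poses.
    idx = lon * (len(window_poses) - 1)
    lo, hi = int(np.floor(idx)), int(np.ceil(idx))
    frac = idx - lo
    x, y, yaw = (1.0 - frac) * window_poses[lo] + frac * window_poses[hi]
    # (Linear yaw interpolation ignores wrapping at +/- pi for brevity.)
    # Lateral in [0, 1]: maps to [-50%, +50%] of the sensor width, applied
    # perpendicular to the local heading.
    offset = (lat - 0.5) * sensor_width_m
    x, y = x - np.sin(yaw) * offset, y + np.cos(yaw) * offset
    # Rotation: de-normalized and added to the local window heading.
    return np.array([x, y, yaw + rot * angle_scale])
```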

5.6.3 Training

We train the LGPRNet on 72 runs from the GROUNDED dataset. Since the goal of
this benchmark is to localize in inclement weather conditions, we choose a selection
of map/target pairs that include different weather conditions between the mapping
and localization. (Please see Supplementary Material for the complete list.)
Each map/target pair consists of two runs collected on the same geographic route.
To create labeled features, we use the GPS measurements provided in the dataset.

Since the GPS measurements are provided at 100 Hz while the LGPR measurements
are provided at 126 Hz, we use linear interpolation between the GPS measurements
to obtain a groundtruth pose at each LGPR frame. Note that extra care must be
given to angular measurements to ensure interpolation properly wraps around ±𝜋.
Next, for each target scan, a set of training examples is created by selecting a set
of possible map window regions that contain the target scan. Since each target
can be located anywhere in the window, and the windows can vary in size, there is a
combinatorial number of training examples that can be obtained from each map/target
pair. For example, in a 5 minute drive containing ≈ 50𝑘 radar frames, ≈ 100𝑀
training examples can be generated. While this is more than enough data for training
the network, using only data from a single run would quickly overfit to that location.
Instead, we randomly subsample 100𝑘 examples from each of the 72 runs to obtain a
training set of 7.2𝑀 labeled examples.
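A minimal sketch of this label-generation step is shown below (Python/NumPy); it assumes ground-truth poses given as positions plus yaw and handles only the ±𝜋 wrapping issue mentioned above.

```python
import numpy as np

def interpolate_labels(t_lgpr, t_gps, xy, yaw):
    """Interpolate ground-truth poses (positions + yaw) at each LGPR frame time.
    Yaw is interpolated on its unwrapped representation and wrapped back so the
    result behaves correctly around +/- pi."""
    x = np.interp(t_lgpr, t_gps, xy[:, 0])
    y = np.interp(t_lgpr, t_gps, xy[:, 1])
    yaw_i = np.interp(t_lgpr, t_gps, np.unwrap(yaw))
    yaw_i = (yaw_i + np.pi) % (2.0 * np.pi) - np.pi
    return np.stack([x, y, yaw_i], axis=-1)
```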
We further hold back 100𝑘 randomly selected examples for validation and train
the network on a single Nvidia 1080Ti GPU until convergence (approx 24 hours).
During training, we perform back-propagation at both the main output layer as well
as two auxiliary outputs, which aids in training the earlier layers. We weight these
losses 50% on the main output and 25% on each of the auxiliary heads. We also apply a loss
weight on the regressed pose outputs to account for the different units between the
translation and rotation components as described in Localization in Weather Metric.
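The combined training objective can be sketched as follows (PyTorch); the mean-squared form and the per-dimension weight vector are assumptions, and only the 50%/25%/25% head weighting is taken from the text above.

```python
import torch

def lgprnet_loss(main, aux1, aux2, target, pose_weights):
    """Weighted loss over the three heads: 50% main, 25% per auxiliary head.
    pose_weights is an assumed 3-vector balancing the translation and rotation
    terms of the regressed pose."""
    def head(pred):
        return (pose_weights * (pred - target) ** 2).mean()
    return 0.5 * head(main) + 0.25 * head(aux1) + 0.25 * head(aux2)
```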

5.7 Experiments and Results


We performed extensive evaluation of the system in order to verify that the LGPR
sensor could provide a localization estimate precise enough for autonomous navi-
gation. We measure the localization accuracy while the system is being operated
autonomously and compare it to the localization accuracy when a human is driving.
Finally, because the key benefit of LGPR localization over existing localization tech-
niques is its capability in inclement weather, we compare the localization accuracy in

different weather conditions.

5.7.1 Setup

The testing site was a closed rural area in Devens, Massachusetts. It includes ∼7 km
of unmarked roads and 9 intersections. Testing took place over a period of six months
to compare a variety of weather and lighting conditions including clear weather, snow,
rain, and darkness (See Fig. 5-1).
The LGPR sensor was mounted on the trailer hitch of a Toyota Prius which was
modified to drive by wire. Two computers ran in parallel to enable the autonomous
system. Onboard the LGPR sensor the LGPR-PC carried out all localization related
computation. The sensor broadcast localization estimates via Ethernet to the PRIUS-
PC onboard the vehicle. The PRIUS-PC then incorporated that information into the
autonomous driving system. Note that the PRIUS-PC was a consumer grade laptop
computer with an Intel Xeon E3-1505Mv5 Quad Core 2.80GHz processor, 32GB of
RAM, and running Ubuntu 16.04. The only other sensors used during testing were
two wheel encoders on each of the rear wheels and an IMU.
To evaluate the performance of the system, an additional RTK-GPS unit with
Differential GPS (DGPS) corrections from a fixed base station was installed on the
vehicle for ground truth. The setup gives the precise vehicle position with ∼2 cm
accuracy. While the LGPR system already uses an onboard GPS for creating maps,
and initializing the search radius, that device did not have access to the base station
correction in order to simulate real-world operation. The base station only has a
broadcast radius of up to 9 km, which renders it impractical for use in general purpose
autonomous vehicles. The measurements from the base station were used only for
analysis and evaluation of the LGPR system.
We chose to use the uncorrected GPS unit for the LGPR device even during the
mapping phase as requiring an extremely accurate GPS sensor could make it very
difficult to create maps. Furthermore, since the saved GPS coordinates only need to
be locally accurate for navigation purposes, it is not necessary that the
LGPR-GPS coordinates match the true GPS coordinates.

5.7.2 Testing

During testing, each test was composed of two runs each of which yielded two tra-
jectory measurements for a total of four trajectories per test. The first run was a
mapping run with the LGPR system in map creation mode and the vehicle driven by
a human driver. The trajectory measured by the LGPR sensor during map creation
will be denoted $T^{map}_{LGPR}$. Additionally, the RTK-GNSS system was used to precisely
track the true trajectory of the vehicle during map creation and that trajectory will
be denoted $T^{map}_{GNSS}$. Next, the vehicle was driven over the same track again, with the
LGPR sensor set in tracking mode. The localization estimates during the testing run
will be denoted $T^{test}_{LGPR}$. Again, the RTK-GNSS was used to measure the true trajec-
tory of the vehicle and that trajectory will be denoted $T^{test}_{GNSS}$. Note that during the
testing run the vehicle could be piloted by either a human driver, or the autonomous
system.
The LGPR sensor only needs to localize the vehicle within the coordinates of the
created map, rather than in the fixed ground-truth GPS frame. Therefore, we cannot
simply compare the $T^{test}_{LGPR}$ estimate to $T^{test}_{GNSS}$ because the coordinates of the
LGPR system are limited by the accuracy of the onboard GPS unit used for mapping
(which does not receive base station corrections). Hence, there is no expectation
for the $T^{test}_{LGPR}$ values to match the “correct” ones, rather they need to be consistent

with those measured in the mapping phase in order to allow accurate path following.
Consequently, as in [32], we utilize a relative metric to obtain the accuracy of the
LGPR localization by computing the mean relative difference. That is, the mean
error over a run with 𝑛 scans is computed as:

\frac{1}{n} \sum_{i=1}^{n} \left\| \left( T^{test}_{GNSS,i} - T^{map}_{GNSS,i} \right) - \left( T^{test}_{LGPR,i} - T^{map}_{LGPR,i} \right) \right\|

where the $i$th estimate of $T^{map}_{GNSS}$ is denoted $T^{map}_{GNSS,i}$, etc. Note that the sub-
tractions shown are the vector differences between the offset from the map to the
test runs as measured by the LGPR sensor, and the same vector as measured by the
RTK-GNSS system. This method provides a measurement of how closely the relative
position estimate of the LGPR system matches the ground truth RTK-GNSS system
which is necessary for autonomous path following.

Figure 5-8: A comparison of the LGPR estimates to ground truth in various
weather conditions.
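A direct translation of this metric is sketched below (Python/NumPy), assuming the four trajectories have already been time-aligned and index-matched as described next.

```python
import numpy as np

def mean_relative_error(T_gnss_test, T_gnss_map, T_lgpr_test, T_lgpr_map):
    """Mean relative difference between the map-to-test offset measured by the
    RTK-GNSS ground truth and the same offset measured by the LGPR system.
    Inputs are (n, 2) arrays of matched positions for the n scan points."""
    diff = (T_gnss_test - T_gnss_map) - (T_lgpr_test - T_lgpr_map)
    return np.linalg.norm(diff, axis=1).mean()
```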

One concern with calculating the mean error is aligning the timing of the trajecto-
ries. Each trajectory is measured using individual sensor clocks yielding four separate
time series. The two trajectories for each run occurred at the same real world time,
and therefore alignment only requires finding the scalar offset between the clocks.
However, the time steps between the map and test runs cannot be as easily aligned
because the runs took place over different intervals. To align these four trajectories
to a unified time axis, we first align the estimates in $T^{map}_{GNSS}$ and $T^{test}_{GNSS}$ by aligning
each point in the test trajectory with the closest point in the map. Next, the two
scalar offsets that best align the LGPR and RTK-GNSS trajectories respectively were
found through numerical optimization.

Finally, after synchronizing the timing and calculating the total mean error, the
error vector e at each scan point was decomposed into a cross-track error, e⊥ , and an
along-track error, e‖ for additional insight. First, the unit tangent vector at each point
was calculated using a centered finite difference along $T^{test}_{GNSS}$ and then the components

along the track and perpendicular to the track were found such that e = e⊥ + e‖ .
For autonomous navigation purposes, the requirements for low cross-track error are
typically more stringent than the along-track error due to lane tolerances.
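The decomposition can be sketched as follows (Python/NumPy), using np.gradient for the centered finite difference; the function name and array layouts are assumptions for illustration.

```python
import numpy as np

def decompose_errors(errors, gnss_test):
    """Split each 2D error vector e into along-track and cross-track parts
    using the unit tangent of the ground-truth test trajectory, so that
    e = e_par + e_perp. Both inputs are (n, 2) arrays."""
    tangents = np.gradient(gnss_test, axis=0)                 # centered differences
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    along = np.sum(errors * tangents, axis=1)                 # signed along-track error
    e_par = along[:, None] * tangents
    e_perp = errors - e_par                                   # cross-track component
    return np.abs(along), np.linalg.norm(e_perp, axis=1)
```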

Figure 5-9: A histogram showing the distribution of scan correlations between the
test run and the LGPR map in various weather conditions.

5.7.3 Correlation Method Results

The system was evaluated over a total of 17 km composed of both manual and au-
tonomous driving modes in clear weather, rain, and snow.

Weather Conditions

Fig. 5-8 shows the results from comparing the effect of weather on localization accu-
racy. In clear weather, the system achieved 0.34 m mean total error with a cross-track

error of only 0.26 m. This value is significantly better than GPS error which is typi-
cally >1 m even when fused with inertial and wheel odometry sensors [65]. Comparing
the results from clear weather and snow, there is very little degradation in localization
accuracy: in snow, the mean total error was 0.39 m and the mean cross-track error
was 0.29 m.

Figure 5-10: A comparison of the LGPR estimates to ground truth in manual and
autonomous driving modes.

In the tests during rain, a degradation in the localization quality was observed:
the mean total error jumped to 0.77 m and the mean cross-track error was 0.40 m. How-
ever, these values are still significantly lower than utilizing a GPS/INS solution and
were not large enough to require disengagement of the autonomous system. This
degradation in localization accuracy can potentially be attributed to the water in the
ground changing the dielectric properties of the soil and inhibiting the penetration
depth of the signal. Fig. 5-9 shows a histogram of the correlation values of the local-
ization measurements as calculated in Eq. (5.1) in various weather conditions. It is
interesting that rather than the rain causing a high incidence of very low correlations,
it only shifts the histogram left, which indicates that the rain is not fully obscuring
the signal but rather “blurring” it and making the measured scans more difficult to

match to the map. It is possible that these signals could be cleaned in post-processing
to achieve accuracy in rain closer to that in clear weather. However, we leave that
for future work.

Driving Mode

Finally, Fig. 5-10 shows the evaluation results when comparing autonomous and man-
ual driving. During manual driving, the system achieved 0.27 m mean total error and
0.20 m mean cross-track error. However, in autonomous mode, the mean total error
went up to 0.5 m and the mean cross-track error was 0.36 m. (Note that this includes
all weather conditions.) The degraded performance in autonomous mode can be at-
tributed to the fact that the autonomous system does not follow the mapped path
perfectly. However, the magnitude of the error was not significant enough to preclude
following the reference path, and over the course of more than 8 km of autonomous
testing not a single unplanned disengagement was necessary.

5.7.4 LGPRNet Results

To evaluate the trained LGPRNet model, we select 10 distinct map/target run pairs
none of which were included in the 72 runs used for training. Additionally, we ensured
that there was no geographic overlap between any of the evaluation runs and the
training runs. This ensures that the model is learning the relative pose between the
target and the map and not learning the map itself. To avoid requiring LGPR maps
to be built specifically for a particular weather condition, we only use maps collected
in clear weather for the Localization in Weather benchmark. In this evaluation, we
compare the localization quality when the target is in a variety of weather conditions
including clear weather, snow, and rain, while localizing to the same clear-weather
map. (Please see Supplementary Material for the complete list of runs used for
evaluation.)
Table 5.1 shows the evaluation results compared with the results from [99], which
used a hand-designed correlation-based optimization to localize. We see that the
learned model performs similarly in clear weather and snow, while significantly im-
proving the localization quality in rain. Importantly, the metrics reported for the
correlation-based approach were computed for localization in each weather condition
against various maps, including both matching and non-matching weather maps. Here,
in keeping with the specification for the benchmark, we evaluate rain and snow condi-
tions using only maps collected in clear weather, which is a more challenging task.

                Method           Clear   Snow   Rain
Mean Error      Correlation      0.34    0.39   0.77
                LGPRNet (Ours)   0.32    0.44   0.47
Lateral         Correlation      0.26    0.29   0.40
                LGPRNet (Ours)   0.16    0.26   0.26
Longitudinal    Correlation      0.17    0.21   0.57
                LGPRNet (Ours)   0.24    0.30   0.33

Table 5.1: The results of the LGPRNet predictions compared with the hand-designed
optimal correlation algorithm used in [99]. The mean distance error (m) is compared
as well as its components in the lateral and longitudinal directions with respect to
the lane direction.

We interpret the degradation in localization quality in rain as due to the changing


dielectric properties of the soil when it is saturated. This could impact the radar
signal by effectively scaling the entire frame along the depth dimension. Thus, while
the previous correlation-based approach could not account for this variation in scale,
LGPRNet learns to correct for this through the inclusion of varying weather conditions
in the training set. On the other hand, the lower error found in the snowy weather
conditions is likely not caused by a systemic change in the ground dielectric content
since the snow sits mostly above the ground and doesn’t saturate it like rain does.
Instead, this layer of snow between the sensor and the ground could introduce some
signal attenuation, which would appear as sensor noise. Similarly, the error in clear

weather localization is likely due in part to sensor noise, as well as imperfect overlap
between the target and map data. One possible solution that could refine this estimate
in practice would be to fuse these absolute measurements with a proprioceptive state
estimation system including an IMU and/or wheel encoders as was done in [99].
However, here we focus on evaluation of the LGPRNet prediction directly.

Also important to note is that for autonomous vehicles, error in the longitudinal
(along lane) direction is often less critical than the lateral direction (across lanes).
Table 5.1 shows that the LGPRNet performs particularly well in the lateral direction
where it has a maximum of 26 cm of deviation even in rain and outperforms the
correlation approach in all weather types. This is accounted for by the weights in the
Localization in Weather benchmark described previously. Using Eq. (6.1) we compute
benchmark values of:

𝑠𝑠𝑛𝑜𝑤 =0.585

𝑠𝑟𝑎𝑖𝑛 =0.595

Finally, we also investigate how the position of the target in the window, and the
size of the search window affect the localization accuracy. Fig. 5-11 shows that the
optimal position for the target is in the center of the window, as the error increases
by about 1% at each end. This is likely due to the center of the window containing
the smallest average distance to the frames in the window. Interestingly, with respect
to window size, there appears to be an optimal size at approximately 3.5 m. This
is likely a factor of the size of the features in the signal, as very small windows do
not contain enough variation, while very large windows induce error due to the larger
search space. Therefore, these data indicate that the optimal window size should be
the minimum of 3.5 m and the uncertainty of the coarse pose estimate.

Figure 5-11: The localization accuracy is affected by the target position in the window
(Top) as well as the size of the search window (Bottom).

5.8 Discussion
In this chapter we have presented a solution for autonomous driving using only under-
ground features measured with an LGPR sensor. This solution permits safely driving
in inclement weather including rain and snow, and in complete darkness at night.
Furthermore, we have obtained measurements of the system performance during fully
autonomous navigation, which, to the best of our knowledge, has never been pub-
lished previously. While safety requirements at our testing facility did not allow for
testing autonomously at high speeds, [32] tested a similar setup (albeit with a human
driver) at highway speeds and demonstrated successful localization. Therefore, we
believe our results will readily extend to high-speed driving as well.
Referring to Fig. 1-2, the current system fully addresses three of the challenges with
HD prior maps:

1. Map Maintenance is solved because the LGPR maps are much more stable than
surface features used in typical HD maps which can change frequently.

2. Perceptual aliasing is not an issue with LGPR maps because we have found
that the features detected are unique enough that even a coarse GPS prior, or
tracking the vehicle over time using odometry, is sufficient to disambiguate any
perceptual aliasing.

3. Inclement weather is addressed because the LGPR sensor is much less suscep-
tible to surface weather conditions, particularly when using the learning-based
method described here.

This method also partially addresses the issue with data storage and transmission.
As described in Sec. 5.3.1 the LGPR maps are smaller than typical HD maps, albeit
not quite as small as the SD priors used in the MapLite method (Chapter 3).
Finally, this method does not address the issue with requiring a dedicated map-
creation phase because like typical HD map-based localization systems, the LGPR

requires the sensor be used to collect a map prior to autonomous operation. However,
because this method is most valuable in inclement weather conditions, we propose that
a single system with both MapLite (using cameras and lidar) and LGPR localization
would provide the best of both worlds. The MapLite system could be used to explore
new areas during fair weather when the surface sensors are operational, and the LGPR
sensor passively used to create maps at the same time. Then, in the event of inclement
weather, the LGPR sensor could be used as a fallback. This would reduce the cases
where the vehicle cannot safely operate to ones where it is both in a new area that
has not been mapped yet, and during inclement weather where the cameras or lidars
are occluded.

Chapter 6

The GROUNDED Dataset

6.1 Overview

The ability to localize in the environment is critical to enable the widespread deploy-
ment of autonomous vehicles. While the GPS is often used to obtain approximate
global localization, it lacks the accuracy necessary to meet the stringent requirements
of autonomous driving [132]. For this reason, most fielded autonomous vehicle solu-
tions currently localize on HD maps with either lidar sensors [72, 74, 133], cameras
[91], or both [134]. Localization with these sensors can provide accuracy in the range
needed for autonomous operation. However, sensors that rely heavily on surface
features in the environment have an inherent failure mode should the environment
change between the mapping and localization phases. Some approaches aim to filter
out dynamic objects during mapping [12]. Others seek to identify and map only stable
features or landmarks in the environment [46, 20]. Robustly dealing with inclement
weather such as snow is particularly challenging as snowfall can dramatically alter the
surface appearance. Solving this problem remains one of the open challenges to enable
human-level (or above) performance of autonomous vehicles in diverse environments.
Recently, LGPR [32] has been proposed to address the localization task in such

environments. By mapping and localizing using features beneath the ground, LGPR
can avoid the instability of surface-based maps. LGPR was used as the sole local-
ization sensor in [99] to navigate an automated vehicle in clear weather, rain, and
snow conditions. However, the sensor demonstrated in that work is not commercially
available. Typical GPR sensors utilize frequencies in the 0.5 GHz to 2.5 GHz range
[10] to obtain a high-resolution image of objects close to the surface. However, a
lower frequency of 100 MHz to 400 MHz is ideal for localization because the greater
penetration depth enables mapping of deeper, more stable features [32]. Several com-
panies have recently announced plans to commercialize LGPR technology [113, 120].
Currently however, there are still no options to purchase LGPR sensors suitable for
localization. The lack of publicly available sensors or datasets has left the technical
community in a holding pattern.

In this chapter, we aim to address the lack of access to LGPR systems and to
enable algorithmic development for localization of autonomous vehicles in a wide
range of weather conditions and illumination. We describe and release an open dataset
of LGPR frames collected using one of the prototype sensors described in [32]. We also
propose two challenges aimed to accelerate the development of solutions for mapping
and localization under challenging driving conditions such as difficult illumination,
heavy rain, and snow. We believe the dataset will enable the research community
to replicate and improve upon the current state-of-the-art, and to tackle new open
problems for autonomous driving in difficult weather. For example, [99] found a
degradation in localization performance when localizing in rain or snow, perhaps due
to the unmodeled changes to the moisture content of the soil. Furthermore, both [32]
and [99] conducted both mapping and localization in a single lane. Since practical
autonomous vehicles will need to change lanes, it is an important area of research to
stitch together multiple lanes to form a coherent map.

The dataset consists of 108 runs amounting to a total of 450 km and 12 hours of

driving. For each route, there is data associated with clear, rainy, and snowy weather
(see Fig. 6-2). The data includes groundtruth GPS location, odometry, and scans
from the LGPR sensor, the camera, and the lidar system on the vehicle. Because
the LGPR sensor has a data collection footprint equal to its width, which is smaller
than the width of a road, a single path of the LGPR does not provide a complete
map of the ground features for that road. We address this limitation by providing
multiple paths for each segment, with the vehicle driving left, center, and right on
the road (see Fig. 6-3), along with a challenge to align and stitch different paths into
a complete map.

In summary, this chapter describes the following contributions:

• The first publicly available dataset of Ground Penetrating Radar data for local-
ization and mapping collected in a variety of weather conditions, and multiple
adjacent lanes,

• Two challenge benchmarks, 1) Localization in Weather and 2) Multi-lane Map-
ping, to enable comparison of LGPR localization and mapping research,

• Additional lidar and camera data streams to enable comparison with existing
visual and lidar navigation approaches in driving.

The remainder of this chapter is organized as follows: In the next section we


review related work with LGPR sensors and autonomous driving datasets. In Sec. 6.3
we present the challenge benchmarks. Next, in Sec. 6.4 we describe the dataset
organization and software development kit. Then, in Sec. 6.5 we describe the research
platform, sensor suite, calibration, and time synchronization of the sensor streams.

6.2 Related Work

Localization: Over the last two decades, the robotics community extensively con-
sidered the problem of localization and mapping [22] involving a diverse set of sensors,
most prominently cameras [84] and lidars [72, 74, 133]. Particularly for autonomous
driving, a considerable amount of work focused on dealing with challenging and chang-
ing appearance conditions [20] such as weather [43] or occlusions [12, 48]. To improve
robustness, radar has also been considered as a localization modality for autonomous
driving [21, 107, 128, 129]. Even with this additional modality, robust localization
remains challenging due to phenomena such as occlusions. The goal of this dataset is
to overcome or completely avoid some of these challenges. It enables wider research
on a complementary localization modality which does not suffer from occlusion by
dynamic objects and changes in appearance conditions.
Ground Penetrating Radars: Only a few works considered the use of ground
penetrating radars in robotics such as for landmine detection [36] or for autonomous
surveys [131]. Using GPRs for localization has so far been considered only in [32, 99].
Consequently, most GPR datasets are targeted at very different application domains,
e.g., for research on soil structure characterization [110] or meteorology [67]. There is
currently no dataset allowing for widespread localization research with GPRs. The
high cost of GPRs, and the mere prototype availability of GPRs specifically designed
for localization, make research in this field impossible for many groups. By
making this data publicly available, we aim to overcome this limitation and
simplify research on radiogeological navigation.
Datasets: Because of the high cost of a retrofitted autonomous vehicle, and to
compare results on an equal footing, much autonomous driving research is already driven
by benchmark datasets. In that context, the KITTI [52] dataset is one of the earliest
and most popular in autonomous driving research. In recent years, numerous institu-
tions made the data from their research vehicles publicly available [31, 59, 85, 104, 122],
some of which also involve radar data [9, 23].

Year  Dataset           Location                    Weather          Camera  Lidar  Radar  Other
2013  KITTI [52]        Karlsruhe                   dry              ✓       ✓      -
2016  Cityscapes [31]   50 cities                   dry              ✓       -      -
2017  DDD17 [14]        Switzerland & Germany       dry, rain        ✓       -      -      DVS
2017  Oxford [85, 9]    Oxford                      dry, rain, snow  ✓       ✓      ✓
2018  ApolloScape [59]  4 x China                   dry, rain, snow  ✓       -      -
2018  MVSEC [138]       Philadelphia                -                ✓       ✓      -      DVS
2019  Astyx [87]        -                           -                ✓       ✓      ✓
2020  CADC [104]        Waterloo                    dry, snow        ✓       ✓      -
2020  nuScenes [23]     Boston, Singapore           dry, rain        ✓       ✓      ✓
2020  Waymo [122]       San Francisco, Phoenix,     dry, rain, snow  ✓       ✓      -
                        Detroit, Seattle, Los
                        Angeles, Mountain View
2021  Delft [117]       Delft                       -                ✓       -      -      Mic. array
2021  Ours              Massachusetts               dry, rain, snow  ✓       ✓      LGPR

Table 6.1: Overview of exteroceptive sensing modalities in autonomous vehicle navi-
gation research datasets.

A dataset specifically focusing on radar
perception is presented in [87]. More recently, several datasets covering novel sensing
modalities have been made publicly available focusing on acoustic detection [117] and
dynamic vision sensors (DVS) [14, 138]. In the same spirit, this thesis contributes
ground penetrating radars as a new sensing modality to dataset-driven perception
research.

6.3 Benchmark Challenges

Prior localization results with LGPR have looked promising, yet there are two im-
portant limitations that must be overcome before LGPR sensors can be practically
useful. The first requirement is to devise algorithms that can localize even when
the prior map was recorded in different weather conditions. This can be challenging
because LGPR data can be affected by the moisture content and temperature of the
underground soil which can vary with surface weather conditions. In [99] a degrada-
tion in localization performance in rain and snow was measured, but their algorithm

did not explicitly account for weather changes. The second requirement is to build
maps that can localize a vehicle while it is changing between multiple lanes. Since
the LGPR sensor only records data directly beneath it, and the sensor only spans
the width of the vehicle, prior work [32, 99] only used maps consisting of a single
lane. Since practical autonomous vehicles will need the ability to maintain a seamless
localization as they traverse multiple lanes, it will be necessary to devise mapping
algorithms that can stitch together data from multiple passes in different lanes to
obtain a cohesive road map. To ensure solutions to these limitations can be com-
pared on an equal footing, we propose the following two challenge benchmarks. Our
dataset specifically includes data to address these challenges including data collected
in a variety of weather conditions and in multiple lanes as shown in Table 6.3.

6.3.1 Localization in Weather Challenge

Mapping and Localization Runs

In the provided dataset, every run was collected as a pair to enable mapping and
localization using the same environmental conditions. For example, run_0001 and
run_0002 were both collected driving the same route, in an urban environment, in
clear weather and in the right lane. All of this information can be found in the runs.csv
file as described in Sec. 6.4.1. However, for the Localization in Weather Challenge,
we aim to evaluate localization using maps that were created in different weather
to demonstrate weather resilience. Therefore, we could instead evaluate localization
using run_0037 or run_0038 which were both collected along the same route, but in
snowy weather while still using run_0001 or run_0002 to build the map. In short,
each run in the dataset includes a weather condition label [clear, rain, snow ]. For this
challenge, the mapping and localization runs should be along the same geographical
route, but in different weather conditions.
In the event learning-based algorithms are utilized for mapping, we would like to

clarify here that it is acceptable to include runs which were collected in inclement
weather in the training phase, as long as they were collected at a different location.
This mirrors the real-world constraints where one could envision training an algorithm
to localize to single weather maps in all weather by including a small sample of varying
weather in the training phase. Finally, while other onboard sensors (e.g. odometry,
camera, or lidar) may be used to enhance the localization estimate, the estimate at
each time 𝑡𝑖 may only use measurements taken at a prior time 𝑡𝑗 such that 𝑗 ≤ 𝑖.

Localization in Weather Metric

The metric used for localization in weather is an Absolute Trajectory Evaluation.


Since the LGPR data is labeled with ground truth RTK-GPS data, we can directly
evaluate the trajectory error rather than implementing a relative metric as is often
needed with SLAM or VIO solutions [137]. Additionally, we limit the benchmark to
the two translation degrees of freedom and single rotation (yaw) on the surface of
the ground as these are the critical values necessary for autonomous driving. Thus,
the first two metrics are the Root Mean Square Error (RMSE) of the translation and
orientation of the vehicle t𝑟𝑚𝑠𝑒 , 𝜃𝑟𝑚𝑠𝑒 evaluated over the entire run. Next, because for
driving purposes, the lateral translation error (with respect to the lane) is often far
more important than the longitudinal, we further decompose the error into its lateral
and longitudinal components to obtain two additional error metrics: 𝑡𝑙𝑎𝑡 , 𝑡𝑙𝑜𝑛𝑔 . While
all four of these error metrics should be reported, for the purpose of providing an
overall score we propose a weighting:

𝑠 = 𝑡𝑙𝑎𝑡 + 0.1𝑡𝑙𝑜𝑛𝑔 + 10𝜃𝑟𝑚𝑠𝑒 (6.1)

Intuitively, this assigns equal cost to 10 cm of error in the lateral lane direction,
1 m of error in the longitudinal lane direction, and 0.57° of orientation error which
have similar real-world importance.
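For reference, the overall score of Eq. (6.1) is a one-line computation (Python), with translation errors in meters and the orientation RMSE in radians.

```python
def weather_score(t_lat, t_long, theta_rmse):
    """Overall Localization in Weather score (Eq. 6.1).
    t_lat, t_long in meters, theta_rmse in radians; 10 cm lateral, 1 m
    longitudinal, and 0.01 rad (0.57 deg) of heading error cost the same."""
    return t_lat + 0.1 * t_long + 10.0 * theta_rmse
```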

6.3.2 Multi-lane Mapping Challenge

Mapping and Localization Runs

In the provided dataset, each run is labeled with a lane attribute in [Left, Center,
Right, Changing] as shown in Fig. 6-3. These lanes are overlapping since the center
lane is not a separate lane, but rather the sensor centerline is passing over the lane
divider and including partial data from each of the left and right lanes. The purpose
of these runs is specifically to support building cohesive maps that can track a vehicle
even while it is changing lanes, or only partially overlapping a lane. For this challenge,
multiple runs where the lane is in [Left, Center, Right] in the same route should be
used for map creation. In runs where the lane is marked Changing, the vehicle was
driven along the route while randomly choosing any of the [Left, Center, Right] lane
positions and changing between them. Those runs should only be used for localization
evaluation and not included in mapping or training data.

Multi-lane Mapping Metric

The metric used for the Multi-lane Mapping benchmark is similar to that described
for the previous benchmark. One important difference is that here we explicitly do
not follow a single lane. Therefore, lateral and longitudinal lane errors are not needed
and only the absolute trajectory errors t𝑟𝑚𝑠𝑒 , 𝜃𝑟𝑚𝑠𝑒 are necessary. The overall score is
then calculated as:
𝑠 = 𝑡𝑟𝑚𝑠𝑒 + 10𝜃𝑟𝑚𝑠𝑒 (6.2)

6.4 Dataset

The dataset is available for download at http://lgprdata.com. The data is stored
hierarchically as shown in Fig. 6-4.

6.4.1 Run Level Data

At the highest level are runs. Each run is a single data collection instance often
taking the form of a loop. To avoid potential issues with overlapping sensor data,
the start and end points of the loop always have a small gap of ≈10 m. For each
run, we provide run level information in a file called runs.csv. This file contains
columns as shown in Table 6.2. Each row in runs.csv has a unique run_id which
corresponds to a directory such as run_0001 in the runs directory (see Fig. 6-4).
Note that the route_id corresponds to a unique physical route, but duplicate values
do exist because the same route was driven in multiple runs. For example, run_0001
and run_0002 represent two unique runs, but have the same route_id because they
traversed the same physical route. Furthermore, a single route can be traversed in
any of the lanes, or in either direction. One final caveat to bear in mind is that the
direction of traversal reverses the semantic meaning of the lane column. Thus, two
runs with the same route_id but different direction values would imply traversing
the same lane in opposite directions if the lane value in one was equal to right while
in the other was left. In general, the purpose of grouping runs by route, lane, and
direction, is to provide data for the Multi-lane Mapping Challenge (Sec. 6.3.2). For
the simple case of mapping and localizing on the same trajectory, one can simply use
runs with identical route, lane, and direction.
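As an example of how this metadata can be used, the sketch below (Python/pandas, using the column names from Table 6.2; the exact file location on disk is an assumption) pairs clear-weather mapping runs with snowy target runs on the same route, lane, and direction.

```python
import pandas as pd

runs = pd.read_csv("runs.csv")

# Pair clear-weather mapping runs with snowy localization runs collected on the
# same route, lane, and direction (Localization in Weather setting).
maps = runs[runs.weather == "clear"]
targets = runs[runs.weather == "snow"]
pairs = maps.merge(targets, on=["route_id", "lane", "direction"],
                   suffixes=("_map", "_target"))
print(pairs[["run_id_map", "run_id_target", "road_type_map"]].head())
```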
The dataset contains runs in a variety of lanes, and environmental and weather
conditions to support the benchmark challenges proposed earlier. Table 6.3 shows
the splits in the data for each of the relevant data conditions.

6.4.2 Sensor Level Data

Within each run directory, there are several directories, one for each element in the
sensors field in the corresponding row of runs.csv. Currently, every run includes at
a minimum [lgpr, gps, odom]. Many runs additionally include [camera, lidar ]. Next,
we briefly describe the data formats of each of these sensor streams; for additional
details, please see the dataset documentation.

Column       Data Type   Description
run_id       Integer     A unique value for each run in the dataset
date         Date        The date when the run was recorded
road_type    String      One of {urban, rural, highway}
route_id     Integer     A unique value for each route
weather      String      One of {clear, rain, snow}
direction    Integer     {−1, 1} = {clockwise, anticlockwise}
lane         String      One of {left, center, right, changing}
length       Float       The total length (km) of the run
duration     Float       The total duration (s) of the run
sensors      List        A list of comma separated sensor names

Table 6.2: The file runs.csv contains metadata for every run in the dataset organized
into the columns shown.

lgpr

The LGPR sensor frames are arrays with shape (11, 369) corresponding to the 11
radar channels (11 Tx and Rx pairs for the 12 radar elements in the array) and the
369 depth bins. Each value in the array is an int8 ranging from [-128, 127]. The lgpr
directory contains a file frames.csv which includes a table containing the frame_id
for each LGPR frame in the run, along with a timestamp providing the synchronized
time (see Sec. 6.5.3) for that frame. The frames directory includes two files
for each frame in frames.csv : 1) frame_id.gpr contains the raw LGPR array in csv
format while 2) frame_id.gmr contains the processed LGPR scan with mean removed
as described in Sec. 6.5.4. A script for reading and visualizing these scans is provided
in the SDK.
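A minimal reader might look like the following (Python/NumPy); the directory layout and the row/column orientation of the csv are assumptions here, and the SDK script mentioned above should be treated as the authoritative reference.

```python
import numpy as np
import pandas as pd

run_dir = "runs/run_0001/lgpr"                  # assumed directory layout
frames = pd.read_csv(f"{run_dir}/frames.csv")   # frame_id + synchronized timestamp
frame_id = frames.frame_id.iloc[0]

# One raw LGPR frame: 11 radar channels x 369 depth bins, int8 in [-128, 127].
frame = np.loadtxt(f"{run_dir}/frames/{frame_id}.gpr", delimiter=",", dtype=np.int8)
print(frame.shape)                              # expected (11, 369)
```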

gps

The ground truth GPS data includes position and velocity. Each GPS reading
contains sixteen values including [longitude, latitude, altitude], position and orien-
tation in the UTM frame [x, y, z, qx, qy, qz, qw ], and velocity in the vehicle frame
[$\dot{x}$, $\dot{y}$, $\dot{z}$, $\dot{r}$, $\dot{p}$, $\dot{y}$]. Note that most GPS measurements include DGPS corrections received
from a fixed base station. However, because our base station has a range limited to
≈10 km, runs with road_type equal to highway in the runs.csv file do not have base
station corrections. All sixteen of these columns are included, along with a timestamp,
in a file called gps.csv which includes all GPS measurements for the run.

odom

The odometry data includes measurements from the wheel encoders on each of the
rear wheels and the IMU. For each wheel encoder measurement we include the total
distance in meters traversed by the left and right rear wheels, respectively, along with
a timestamp. These measurements are provided in encoder.csv. Next, the IMU data is
a vector of ten values including 3-vectors for linear acceleration and angular velocity,
and an orientation quaternion. These values, along with a timestamp, are provided
for each IMU measurement. Finally, for convenience, we also provide odom.csv which
contains the vehicle position [x, y, z ] and orientation quaternion [qx, qy, qz, qw ] with
respect to the start frame calculated by accumulating the wheel encoder and IMU
measurements using a Kalman filter.

camera

The LGPR sensor is particularly useful in situations where camera sensors can fail
such as glare, darkness, or inclement weather. For this reason, we include camera
data for comparison purposes. We use a PointGrey Grasshopper front-facing camera
mounted behind the windshield. Each frame has a resolution of 1928x1448. The file
camera.csv contains a table with columns for the frame_id of each camera frame
along with the timestamp. The directory frames/ contains camera images of the
form frame_id.png where each image corresponds to the frame_id in the frames.csv
file.

Type                       Total Length (km)
Road Type    highway       316.2
             rural         115.2
             urban         17.6
Weather      clear         151.8
             rain          145.4
             snow          151.8
Lane         center        21.6
             changing      16.8
             left          38.4
             right         372.2

Table 6.3: The GROUNDED dataset contains data collected in a variety of lane po-
sitions, and environment and weather conditions to support the proposed benchmark
challenges.

lidar

Similar to the camera sensor, we provide lidar data primarily as a baseline comparison
tool. The lidar sensor is a Velodyne HDL-64 mounted above the roof of the vehicle
offset toward the front (see Fig. 6-5). It provides pointcloud data where each scan
contains ≈191,000 data points, and each point is a 4-vector of [x, y, z, intensity]
giving the geometric position of the point with respect to the sensor frame, and the
intensity of the laser reflection. We accumulate measurements until a 360° rotation
is completed at a rate of 10 Hz and include one scan/revolution. The file lidar.csv
contains a table with a column frame_id along with a timestamp. The directory
frames/ contains files frame_id.pcd which includes the pointclouds in PCD format
[112].

6.5 Data Collection Platform

6.5.1 Vehicle Infrastructure

The dataset was collected using a Toyota Prius research platform shown in Fig. 6-5.
The sensors mounted on the vehicle include wheel encoders, an IMU, GPS, lidar,
camera, and the LGPR radar array. The origin frame for the vehicle is located
in the center of the rear axle, with the x-axis parallel to the ground and oriented
toward the front of the vehicle, and the z-axis oriented upward. Fig. 6-5 shows the
reference frames for all of the other sensors as well. Note that all of the sensors are
located on the central plane of the vehicle (XZ-plane) except for the GPS unit which
is slightly offset. The actual values for all of these transforms are provided in the
calibration/extrinsic directory as shown in Fig. 6-4. In the following subsections, we
describe the details of each sensor and how the data was collected, synchronized, and
calibrated.

The data was collected on two computers. The first one, CAR_PC was connected
to a vehicle-wide LAN which received data from the GPS, IMU, wheel-encoders, and
lidar. It also utilized a USB hub to receive images from the camera. Due to technical
limitations of the LGPR sensor prototype, it could not be configured to stream the raw
data to the vehicle LAN in real time. Instead, a second computer, LGPR_PC was
used to receive and store the LGPR frames separately. In Sec. 6.5.3 we describe how
we utilize an accurate GPS time reference to provide time-synchronized data streams.
The CAR_PC is a Dell Laptop with an i7 processor running Ubuntu 18.04. The
LGPR_PC is a Single Board Computer (SBC) onboard the LGPR sensor prototype
and accessed via Ethernet for data retrieval.

6.5.2 Sensors

LGPR Sensor

The LGPR sensor was mounted on the rear of the vehicle by attaching to the trailer
hitch. It is a completely self-contained unit only connected to the vehicle for power.
Here we describe the main components of the LGPR sensor, for more details see [32]
which describes the design of the sensor. This sensor is not commercially available
and there are only a few prototypes which is the primary motivation for sharing the
data in this thesis research.

As seen in Fig. 6-6 the main LGPR sensor components include a 12 element radar
array, a switch matrix, an OXTS RTK-GPS unit, and a processing chassis. The radar
array transmits on a single element at a time, while receiving on the next element.
Each pair of elements thus provides a single channel of data which produces images 11
pixels wide. Note that the array is 152 cm wide (the same as the width of the vehicle)
and 61 cm from front to back. While the GPS unit contained in the LGPR sensor
is typically used for onboard mapping, here we use it only for time-synchronization,
instead using the more accurate base-station corrected GPS measurements taken
onboard the vehicle for groundtruth, as described in Sec. 6.5.2.

At runtime, the array completes a sweep of all 11 channels at 126 Hz and these
data are binned into 369 depth bins to provide an 11x369 pixel image that spans the
width of the vehicle and penetrates approximately 3 m beneath the ground (the actual
depth can vary by region and soil content). The switch matrix receives these signals
and transmits them to the processing chassis where the radar images are stored for
later retrieval.

RTK-GPS

An OXTS RT3003 Inertial Navigation System (INS) was used to provide a groundtruth
vehicle position for all runs. To achieve an accuracy of ≈2 cm, the GPS sensor re-
quires corrections to the raw GPS signal from a fixed base station. Note that for
all runs with road_type equal to rural or urban we placed and calibrated the base
station in the test region as described in Sec. 6.5.4. However, since the base station
range is limited to 10 km, the runs with road_type equal to highway do not include
base station corrections.

Odometry

The odometry sensors include two magnetic hall effect wheel encoders, one in each of
the rear wheels. These provide counts as each pole passes the sensor. Additionally,
we utilize the IMU built into the RT3003 to obtain acceleration and angular rates.
Together, the wheel odometry and IMU data can be used to obtain an interoceptive
estimate of the vehicle motion independent from the exteroceptive measurements of
the GPS and LGPR sensors.

Camera

We utilize a front-facing Point Grey Grasshopper camera with a resolution of 1928x1448
at 6 fps. The camera is mounted behind the windshield to protect it from rain or snow.
Windshield wipers were utilized when required to ensure the windshield remained
clear. The main motivation for providing camera imagery in this dataset is to serve
as a baseline showing how weather affects vision sensors compared to LGPR.

Lidar

A Velodyne HDL-64 sensor is mounted on the roof of the vehicle. We mount the sensor
on the center plane of the vehicle, slightly forward to obtain a better view of the road

in front of the vehicle at the expense of some occlusion caused by the vehicle itself in
the rear. The Velodyne scanner spins at 600 RPM yielding 360-degree scans at 10 Hz.
The scans are labeled with the synchronized time at the end of each revolution; to
obtain the time of individual points, one can linearly interpolate between the time
stamps for each azimuth.

6.5.3 Time Synchronization

As noted in Sec. 6.5.1, the LGPR sensor data is collected in isolation from the other
vehicle sensors. We utilize the GPS time reference on the CAR_PC to obtain a single
reference that is accurate to within a few nanoseconds [76]. First, the GPS position
for each LGPR frame is recorded based on the GPS device incorporated within the
LGPR sensor. Next, the GPS sensor within the vehicle records a data-stream of pairs
of time stamps (𝑡𝑠𝑦𝑠 , 𝑡𝑔𝑝𝑠 ) at 100 Hz. Lastly, for each vehicle sensor (GPS, odometry,
camera, and lidar), each data point is recorded with the corresponding system time
𝑠𝑠𝑦𝑠 .
Then, to synchronize all of the onboard sensor data in post-processing, we inter-
polate the GPS reference time for each sensor data point as:

s_{gps} = t^{i}_{gps} + \frac{s_{sys} - t^{i}_{sys}}{t^{i+1}_{sys} - t^{i}_{sys}} \left( t^{i+1}_{gps} - t^{i}_{gps} \right)          (6.3)

where $(t^{i}_{sys}, t^{i}_{gps})$ is the closest time reference pair with $t^{i}_{sys} < s_{sys}$, and $(t^{i+1}_{sys}, t^{i+1}_{gps})$
is the next consecutive time reference pair.
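In practice, Eq. (6.3) is ordinary piecewise-linear interpolation and can be sketched in a single call (Python/NumPy).

```python
import numpy as np

def to_gps_time(s_sys, t_sys_refs, t_gps_refs):
    """Map system timestamps onto the GPS clock by piecewise-linear
    interpolation over the recorded (t_sys, t_gps) reference pairs."""
    return np.interp(s_sys, t_sys_refs, t_gps_refs)
```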


The data stream provided by the LGPR sensor contains geo-referenced radar
frames. However, since the position estimates are obtained with the standard quality
GPS device integrated in the LGPR sensor, rather than the RTK-GPS system on-
board the vehicle used for ground truth, we cannot simply match the LGPR frames
to the ground truth positions using their recorded positions. Instead, to obtain ac-

curate positions for LGPR frames, we first obtain timestamps by noting that the
radar frames are collected at a fixed frequency of ≈126 Hz; the LGPR stream can
therefore be aligned with the GPS data stream. We first dif-
ferentiate the positions from both the LGPR scans and the RTK-GPS systems using
a centered Savitzky-Golay filter [116]. Next, we obtain the alignment offset between
the velocities by maximizing the Pearson correlation coefficient between the velocities
using numerical optimization. The result of this maximization, combined with the
fixed frequency of the LGPR scans allows every LGPR scan to be labeled with a
corresponding timestamp on the GPS clock.
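A simplified version of this alignment step is sketched below (Python/NumPy); it searches integer sample offsets only, whereas the actual alignment optimizes a continuous offset numerically.

```python
import numpy as np

def best_offset(v_lgpr, v_gnss, max_shift):
    """Grid search over integer sample offsets, keeping the one that maximizes
    the Pearson correlation between the two differentiated (speed) signals."""
    best_s, best_r = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        a = v_lgpr[max(0, s):len(v_lgpr) + min(0, s)]
        b = v_gnss[max(0, -s):len(v_gnss) + min(0, -s)]
        n = min(len(a), len(b))
        r = np.corrcoef(a[:n], b[:n])[0, 1]
        if r > best_r:
            best_s, best_r = s, r
    return best_s
```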

6.5.4 Calibration

Each of the sensors included in the data stream was calibrated as needed before each
run. Here we describe the various calibration processes for each sensor. Additionally,
when possible we include the calibration files in the dataset (see Fig. 6-4).

LGPR Sensor Calibration

The LGPR sensor needs to be calibrated to ensure changing environmental conditions
do not unduly affect the sensor readings. The sensor includes a calibration routine
which pulses each element for a short time and measures and records the mean values.
This enables the intensity data to be stored in a mean-centered format which helps
remove any biases due to ambient conditions. This allows the device to operate
reliably in temperatures ranging from −5 °C to 50 °C [32]. We calibrated the LGPR
sensor at the start of each day. Additionally, in the event the temperature changed
drastically throughout the day, the calibration routine was repeated in between runs.
For maximum flexibility, we include for each frame both the raw LGPR frame as
frame_id.gpr, and the mean-centered frame as frame_id.gmr. The transform between
the vehicle frame and LGPR frame is provided in vehicle_to_lgpr.yaml (see Fig. 6-4).

RTK-GPS Calibration

The RTK-GPS system requires calibration in two ways. Firstly, because the GPS
is receiving base station corrections, all measurements are with respect to the fixed
location of the base station. Any errors in the measurement of the location of the
base station itself would propagate to the vehicle measurements and reduce the system
accuracy. To mitigate this, we fix the location of the base station by mounting it in
a permanent position. For each of the road types rural and urban we selected a fixed
base station antenna mount and recorded and averaged GPS positions for one hour.
Once that averaging period was complete, we recorded the mean position and used it
for all future runs using that base station position. This ensures that no errors are
introduced between runs due to incorrect measurement of the base station position.
Note that for road_type highway we do not use corrections because of the limited
base station range.

The second calibration necessary for the GPS groundtruth is a built-in calibration
routine in which the vehicle is driven in several loops and ∞-paths. The manufacturer
provides a software tool for using these drives to fine-tune the position and orientation
of the sensor within the vehicle, as well as the relative positions of each of the two
roof-mounted GPS antennas. We store those values onboard the device, and provide
them with the dataset in vehicle_to_gps.yaml (see Fig. 6-4).

Camera Calibration

To calibrate the Point Grey Grasshopper front-facing camera, we utilize the camera
calibration method described in [111]. The intrinsic calibration file is included in
the dataset as intrinsic/camera.yaml. The measured extrinsic calibration between
the camera and the velodyne is provided in velodyne_to_camera.yaml as shown in
Fig. 6-4.
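
For reference, a short sketch of how the stored intrinsics might be used to undistort a dataset
image is shown below; the YAML key names follow the common ROS camera calibration layout
and are assumptions rather than a specification of the file, and the image path is a placeholder.

import cv2
import numpy as np
import yaml

with open("intrinsic/camera.yaml") as f:
    cam = yaml.safe_load(f)

# Assumed keys, following the usual ROS camera calibration YAML layout.
K = np.array(cam["camera_matrix"]["data"], dtype=float).reshape(3, 3)
D = np.array(cam["distortion_coefficients"]["data"], dtype=float)

img = cv2.imread("some_frame.png")              # placeholder image path
undistorted = cv2.undistort(img, K, D)          # remove lens distortion
cv2.imwrite("some_frame_rect.png", undistorted)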

Velodyne Calibration

The Velodyne sensor includes a factory-provided calibration file that accounts for
small differences in manufacturing and assembly and the effect they have on the
conversion between the measured azimuth/angle of each point and its position in the
sensor frame. We apply this calibration file in order to produce the Velodyne frames
found in frame_id.pcd. We also include the calibration file in intrinsic/velodyne.xml.
Additionally, we provide the transform between the Velodyne sensor frame and the
vehicle frame in vehicle_to_velodyne.yaml as shown in Fig. 6-4.
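
A minimal sketch of consuming such a transform file to express lidar points in the vehicle
frame is shown below; the YAML keys "rotation" and "translation" are assumed names, not a
specification of the file format.

import numpy as np
import yaml

def load_transform(path: str) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a YAML file.
    The 'rotation' (3x3, row-major) and 'translation' keys are assumed names."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    T = np.eye(4)
    T[:3, :3] = np.asarray(cfg["rotation"], dtype=float).reshape(3, 3)
    T[:3, 3] = np.asarray(cfg["translation"], dtype=float)
    return T

def to_vehicle_frame(points_xyz: np.ndarray, T_vehicle_velodyne: np.ndarray) -> np.ndarray:
    """Transform Nx3 lidar points from the Velodyne frame to the vehicle frame."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    return (T_vehicle_velodyne @ homo.T).T[:, :3]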

6.6 Discussion
In this chapter, we presented a novel dataset for localization and mapping research
in autonomous driving using a ground penetrating radar. Our goal is to unlock the
potential of radiogeological navigation as this sensing modality holds the promise to
enable autonomous vehicle localization even in the most challenging weather condi-
tions. Together with the dataset, this chapter proposes evaluation protocols
and additionally provides camera and lidar data to simplify comparisons with estab-
lished algorithms on these sensing modalities.

Figure 6-1: The GROUNDED dataset includes four data streams for each run.
1) Lidar scans from a roof mounted Velodyne HDL-64, 2) Camera images from a
front-facing Point Grey Grasshopper camera, 3) Base station corrected RTK-GPS for
groundtruth, and 4) LGPR data stream from the rear-mounted radar unit.

Figure 6-2: Left: Camera images from three different runs of the same route
(route_04) capturing LGPR data in the same location in clear weather, snow, and
rain to support the Localization in Weather Challenge. Right: Trajectory of route_04
overlaid on a map. Other runs capture a variety of environments including rural
(shown here), urban, and highway.

Figure 6-3: The Multi-lane Mapping Challenge provides LGPR frames collected in
four lane positions: 1) Left, 2) Right, 3) Center, and 4) Changing. These runs can be
used to create a consistent map capable of localizing the vehicle continuously even
while switching lanes.

Figure 6-4: The dataset is organized in directories for each run. Run metadata such
as the lane and weather conditions is provided in the runs.csv file and referenced by
the corresponding {run_id}. Similarly, for each sensor, frame metadata is provided
in a frames.csv file referencing individual frames by their {frame_id}.

Figure 6-5: Top: The Toyota Prius vehicle platform used to collect the dataset.
Bottom: A schematic showing the positions of reference frames for the vehicle and
each of the sensors. Blue is the vehicle frame, Red is the OXTS GPS/IMU, Green
is the camera, Magenta is the lidar, and Yellow is the LGPR sensor. The measured
transforms between these frames are included in the dataset.

Figure 6-6: The LGPR sensor used to collect the dataset. The processing chassis
communicated with the switch matrix to control transmit and receive on the 12 radar
elements. The GPS shown here was used for time synchronization, but groundtruth
information was obtained by a separate GPS unit onboard the vehicle.

Chapter 7

Conclusion

7.1 Summary

In this thesis, we have proposed several methods for addressing the issues with au-
tonomous systems that rely on HD prior maps. These issues prevent HD map-based
systems from achieving the resilience and flexibility of human drivers.
None of the methods taken individually can fully solve the autonomy problem to
reach an SAE Level 5 qualification of fully independent autonomy in all regions and
operating conditions where a human would be capable of driving. However, taken
together, the methods we have outlined can help bring autonomous navigation systems
closer toward this goal. MapLite could enable leveraging SD prior maps to guide
high-level behavior while also building HD map-like representations that encode the
behaviors needed to follow the rules of the road. An LGPR sensor, equipped with a
learned model for weather-resilient localization, could serve as a backup localization
solution when inclement weather degrades the quality of the onboard vision and lidar
sensors.
Finally, the concept of “dialable navigation” we introduced provides a framework
for balancing the need to leverage all available prior knowledge of an environment
with the flexibility and resilience provided by the ability to explore unknown areas. For
a particular application, the “dial” could be set far toward the side of using only the
sparsest SD priors (as in the work described here) during the initial exploration of
new environments. Next, as the vehicle builds up prior knowledge, and subject to
constraints on storage and transmission, the prior experiences could be leveraged to
provide additional confidence in operating in the region, much like a human driver
can more confidently drive in familiar places. However, the dial need not be set so
far toward the prior-knowledge side that the system is incapable of any autonomy
without a detailed HD prior map of the world. Experience has shown that such systems
are brittle and difficult to scale beyond small test sites and pilots.

7.2 Lessons Learned

Through the process of completing this thesis, a number of lessons were learned
that could be of interest to others working on related problems. Firstly, the
astute reader will notice that a number of the methods described here use Bayesian
inference to maintain a belief about model variables under repeated observations.
This is a key insight that is widely used in the field but bears emphasizing. Even
when an instantaneous measurement leads to the correct estimate most of the time,
it is often more robust to consider the uncertainty, or variance, in order to obtain a
more stable estimate over multiple observations rather than restarting the estimate
with each step. This is especially true when the sensor is known to be noisy. When
doing so, a good sensor noise model is ideal for calculating the variance. However,
we have found that even when such a model is unavailable, there is often a good
proxy that can substitute for the variance estimate. For example, in MapLite we use
the optimization cost at the optimal value as a proxy for the measurement variance,
and it performs well at assigning higher weight to optimization results that find a
good minimum. Of course, when proxies such as these are used, tuning is necessary
to determine the appropriate weight to assign to the proxy variance estimate. Even so,
this approach tends to far outperform simply taking the optimal value at each step
just because the true variance cannot be measured.
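
As a toy illustration of this idea for a scalar state, the following sketch performs a
precision-weighted (Kalman-style) update using a tuned multiple of the optimization cost as
a stand-in for the unknown measurement variance; it illustrates the principle rather than
the exact MapLite implementation.

def fuse(belief_mean: float, belief_var: float,
         measurement: float, opt_cost: float, k: float = 1.0):
    """One precision-weighted update using the optimization cost as a
    stand-in for the (unknown) measurement variance."""
    meas_var = k * opt_cost                      # proxy variance (tuned via k)
    gain = belief_var / (belief_var + meas_var)  # standard Kalman gain
    new_mean = belief_mean + gain * (measurement - belief_mean)
    new_var = (1.0 - gain) * belief_var
    return new_mean, new_var

A well-converged optimization (low cost) yields a low proxy variance and therefore a high
gain, while a poor minimum is largely discounted.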

Another insight gleaned over the course of this thesis is the power even a simple
prior has in constraining a large-scale problem to reduce the space to a more tractable
size. One example of this is the way the SD prior is used to initialize the HD map-
like representation in MapLite 2.0. Previously, similar attempts were made to obtain
not only the model parameter values, but also the model parameters themselves
directly from the sensor measurements, essentially starting with nothing. However,
these attempts were largely limited to very simple models because the search space
quickly grows intractable when every possible combination of parameters must be
considered. Here, we instead use the SD prior to fix the model structure and use the
sensor measurements only to determine the most likely values of its parameters, which
is much more likely to succeed even with the very rich HD map model representations
used here. We have used this idea throughout this thesis with good results.
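
A simplified sketch of this idea is shown below: the SD prior fixes the model structure
(here, the number of lanes), so the estimation only needs to fit continuous parameter values
to the sensor detections; the model, cost, and initial guess are illustrative and not the
MapLite 2.0 formulation.

import numpy as np
from scipy.optimize import minimize

def lane_centers(params, n_lanes):
    """Lateral positions of lane centers for a given (width, offset)."""
    width, offset = params
    return offset + width * (np.arange(n_lanes) - (n_lanes - 1) / 2.0)

def fit_lanes(detections, n_lanes_from_sd_prior):
    """Fit lane width and lateral offset to lane-like detections (meters)."""
    dets = np.asarray(detections, dtype=float)

    def cost(params):
        centers = lane_centers(params, n_lanes_from_sd_prior)
        # Squared distance from each detection to its nearest modeled center.
        d = np.abs(dets[:, None] - centers[None, :])
        return float(np.sum(d.min(axis=1) ** 2))

    res = minimize(cost, x0=np.array([3.5, 0.0]), method="Nelder-Mead")
    return res.x, res.fun  # fitted (width, offset) and residual cost

The residual cost returned here is also the kind of quantity that could serve as the proxy
variance discussed above.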

Next, notice that all of the methods proposed in this thesis were implemented
and tested on a full-size autonomous vehicle in the real world. While this is valuable
for ensuring the proposed solutions were not overly simplified or impractical, it also
meant a very large, complex, and interconnected system was required to obtain ex-
perimental results. Since the complexity of the system grows exponentially with the
number of interconnected modules, we found it to be very important to isolate and
evaluate individual components whenever possible. This was often accomplished via
simulation, unit tests, and data-driven testing, where the methods are run on
real-world data collected previously.

We also found that resilience through redundancy was an important concept for
ensuring that a single point of failure would not cause a long, complex pipeline to fail,
as such pipelines otherwise fail often due to the sheer number of possible failure modes.
For example, in the MapLite ablation study, we found that the Map Registration and
Variational Planner components had significant overlap: removing either one of those
modules degraded but did not completely break the overall system, while including
both modules produced the best overall result. This overlap in functionality is key when
dealing with a large system of many modules, which becomes brittle when each module
has little overlap with the others and the system as a whole requires every module to
function well in order to succeed.
Finally, it is worth mentioning the balance found between using data-driven,
learning-based components and breaking the system up into modules with expert-
designed inputs and outputs. While the former has the benefit of higher generality
(assuming enough data is readily available) and often better results, the learned mod-
els are black boxes that can lack the explainability and interpretability of the modular,
expert-designed processes. In both the MapLite and LGPR methods, we selected a
hybrid approach in which certain modules are learned from data, but the overall pipeline
is modular and designed with meaningful inputs and outputs between components.
This balances the benefit of learning from data the parts of the system that would be
difficult or impossible to model against ensuring that the overall system is partitioned
into modular components that can be inspected to determine the source of any errors
should they occur.

7.3 Social Responsibility

It is my hope that the ideas described in this thesis will advance the ways in which
autonomous vehicles can navigate in and around human environments to better assist
people with their daily tasks. There are many benefits that this technology could
enable, including reducing the health risks associated with traffic accidents caused
by human error, along with improving the accessibility of transportation solutions
for people currently unable to drive a vehicle themselves. However, before deploying
a new technology, it is important to take a close look at the possible unintended
consequences it could have in order to ensure that appropriate safeguards are in
place to meet our social responsibility. Here I will discuss several possible challenges
associated with autonomous navigation in general, with a particular emphasis on the
ways in which the approaches presented in this thesis can affect those challenges.

Firstly, as with all data driven AI systems, the possibility of bias in the training
data imparts a risk that the final deployed system will not have equal performance
across the diverse circumstances in which it may operate. In the context of our work,
this can manifest itself as a degradation in performance when the system is deployed
in a novel environment that deviates from the environments included in the training
data. It is noteworthy that while this can be challenging for any autonomous robotic
system, in autonomous vehicles that rely on prior HD mapping, this can be somewhat
ameliorated by the requirement that the HD map be created for a region before the
vehicle can operate there. If a new operating region is encountered with different
characteristics, this will likely be caught during the mapping phase. Furthermore,
since manual human annotation is an integral part of the mapping process, the human
annotators could interpret and fix any novel environmental features by integrating
them into the HD map. However, in the MapLite approach proposed here, the vehicle
can operate in completely new regions fully autonomously without any human in the
loop. Therefore, any deviation between the appearance, standards, and conventions
encountered during the training phase, and those in the deployed environment may
cause the system to react unexpectedly.

This issue is not unique to autonomous vehicles. Human-piloted vehicles can also
face challenges when a driver encounters a new environment, such as when driving in
an unfamiliar city or navigating an unfamiliar road feature. Similarly, an autonomous
system using this approach to navigate a new environment for the first time could exercise
greater caution by, for example, driving more slowly until the system builds experience
and confidence.

Next, there can be a challenge with equity when the deployment of autonomous
systems focuses solely on commercial factors in determining where and when to roll
out these systems. For example, consider the number of transportation options available
to residents of densely populated urban areas. These typically include public trans-
port like buses, trains, and trolleys, along with individual vehicles for hire like taxis
and ride-share vehicles. In contrast, residents of rural areas often have no transporta-
tion options at all other than personally operating a private vehicle. If one cannot
afford or is otherwise unable to do so, it can pose a significant challenge to meet basic
needs like access to healthcare and food. However, the vast majority of fielded au-
tonomous driving systems currently operate exclusively in cities mainly because the
economics of a dense population along with the reduced complexity of a structured
urban environment render these areas more attractive. We hope that the approach
proposed here can address some of the challenges of rural areas by removing the requirement
to map such large and distributed regions. Additional incentives provided by public
policy may also be necessary to ensure that these technologies are provided to those
who would benefit from them the most.

Finally, there are many thousands of individuals who currently earn their living
by piloting vehicles, from taxicabs to commercial trucks. In my view these jobs should
be, and eventually will be, automated. The risks associated with human drivers
impart a moral obligation to do so as soon as the technology has matured to the
point where it is the safer option. However, this does not imply that we should
not consider the impact this will have on those that have built a career through
driving. In fact, there are likely to be many new jobs created by this technology from
manually labeling data to building and maintaining maps. Additionally, many jobs
could change rather than be eliminated once the necessity for a dedicated human
driver is removed. For example, a bus driver could spend the time looking out for the
needs of the bus passengers, answering questions, or giving a tour if they didn’t have
to focus on the driving task. Thus, governments can and should ease this transition
by providing opportunities for retraining, smoothing the shift to autonomous
vehicles and ensuring the benefits are spread more evenly so that a small
segment of the population does not bear the brunt of the negative effects for the benefit of
the rest.

7.4 Future Work

In this thesis, we have proposed methods for expanding the capabilities of autonomous
vehicles to operate without relying on HD prior maps created in advance. However,
there are still some interesting areas of research that we leave for future work.

7.4.1 Dialable Navigation

We have presented the concept of “Dialable Navigation,” which views the balance
between obtaining information about the static world model from prior maps and
obtaining it from onboard perception as a continuum, where the specific setpoint is
determined by the particular application. In this thesis, we have focused on pushing
the dial as far as possible toward onboard perception as a way to answer the question,
“What is the minimum amount of prior information that is sufficient to enable
autonomous navigation?” However, more investigation is needed to determine the best
way to balance the benefits of using prior knowledge, essentially leveraging experience,
with the costs of doing so, including the risk of information becoming outdated and
the requirement to curate the data in advance. Far from being a single-dimensional dial,
this decision appears throughout the mapping and autonomy pipeline. For example,
should a sparse prior map include the number of lanes on a road? The location of
crosswalks? The geometry of those crosswalks? What about traffic lights, speed
limits, and stop signs?
Rather than determining a single answer for each of these questions, investigation
is needed to determine the axes upon which they should be evaluated. For example,
in this thesis we chose to rely on the lane count information from prior maps, but
measure the individual lane geometry online. The factors in this decision include the
availability of lane count information, as well as the frequency that this value changes
compared to relatively more frequent changes to lane geometry. More investigation
will be necessary to determine the various map elements that must be chosen, the
relevant factors that should be evaluated, and a rigorous process by which to set
the dial to maximize the overall system robustness while minimizing the costs of
building and maintaining maps.

7.4.2 Persistent Maps

In this thesis, we have focused on minimizing the prior information in the map and
demonstrating that onboard perception can be sufficient for solving most of the en-
vironment perception problem. Therefore, in all of the systems described here, we
have not persisted the perceived maps of the environment from one run to the next.
Instead, each navigation task is viewed as a blank slate on which the sparse SD priors
are the only source of information about the environment a priori. However, as an
expansion on the issue of Dialable Navigation just discussed, another area for future
research is how to best persist some of these dense features for use when the same
or another autonomous vehicle re-visits the same area. It would seem quite useful
to save the perceived map features, at least as a strong prior, to ease the percep-
tion task and improve the reliability of future navigation. This could resemble an
automated and continuously updating mapping framework in which the autonomous
vehicle begins by relying exclusively on sparse SD priors, and perhaps must navigate
more cautiously as a result, but over time builds up a persistent HD map of the
environment autonomously which can in turn improve reliability and allow for more
confident driving behaviors.

7.4.3 Expanded HD Map Feature Set

In this work, we have proposed methods to build HD map-like representations online
to enable navigation without HD prior maps. To do so, we selected a subset of the fea-
tures found in HD maps as the ones that are most critical for driving. These included
the road surface boundaries, lane dividers, intersection topology, and intersection
boundaries. However, these are by no means a complete set of all of the features a
vehicle would require for autonomous navigation in any environment. Depending on
the particular location, additional features such as crosswalks, traffic lights and signs,
bike lanes, etc. could be necessary to safely navigate alongside human drivers while
obeying traffic laws.
The methods presented here could be extended with minor modification to those
features as well. For example, we showed how to model lane line dividers as linestrings
and train the perception detector using an annotated dataset containing lane lines.
A similar approach could be used to detect and model bike lanes as well. However,
including all of the map features that could be found in the many environments in
which autonomous vehicles will need to operate is beyond the scope of this work.

7.4.4 MapLite with LGPR

In this thesis, we proposed using MapLite to enable autonomous navigation with
onboard cameras and lidar sensors that detect the road features rather than relying on
HD prior maps. This approach addresses many of the costs of creating and maintaining
HD prior maps in advance. However, it does not address the scenario where the
road features themselves are occluded due to inclement weather such as rain or snow.
To address that scenario, we proposed using an LGPR to localize using underground
features that are robust to changes in the surface environment. However, this
approach does require that the underground features be mapped in advance. Thus, these
approaches are complementary; however, in this work we study them only in isolation.
Future work could explore combining these systems to enable the vehicle to explore
and build LGPR maps using the MapLite system in fair weather conditions, and then
fall back on the LGPR localization during inclement weather. Such a solution would
need to create associations between the SD prior maps and the LGPR maps, as well
as have a mechanism for detecting the challenging weather conditions to determine
when to trust the parallel systems.

Appendix A

Supplementary Material

Table A.1: Listing of map/target run pairs from GROUNDED used for training and
evaluation of the LGPRNet model.

Data Split Map ID Run ID Map Weather Run Weather


Train 6 44 snow clear
Train 7 44 snow clear
Train 6 45 snow clear
Train 7 45 snow clear
Train 6 78 snow rain
Train 7 78 snow rain
Train 6 79 snow rain
Train 7 79 snow rain
Train 44 6 clear snow
Train 44 7 clear snow
Train 45 6 clear snow

Train 45 7 clear snow
Train 78 6 rain snow
Train 78 7 rain snow
Train 79 6 rain snow
Train 79 7 rain snow
Train 44 78 clear rain
Train 44 79 clear rain
Train 45 78 clear rain
Train 45 79 clear rain
Train 78 44 rain clear
Train 79 44 rain clear
Train 78 45 rain clear
Train 79 45 rain clear
Train 10 48 snow clear
Train 10 49 snow clear
Train 10 82 snow rain
Train 10 83 snow rain
Train 11 48 snow clear
Train 11 49 snow clear
Train 11 82 snow rain
Train 11 83 snow rain
Train 48 10 clear snow
Train 49 10 clear snow
Train 82 10 rain snow
Train 83 10 rain snow

Train 48 11 clear snow
Train 49 11 clear snow
Train 82 11 rain snow
Train 83 11 rain snow
Train 48 82 clear rain
Train 48 83 clear rain
Train 49 82 clear rain
Train 49 83 clear rain
Train 82 48 rain clear
Train 83 48 rain clear
Train 82 49 rain clear
Train 83 49 rain clear
Train 20 58 snow clear
Train 20 59 snow clear
Train 20 92 snow rain
Train 20 93 snow rain
Train 21 58 snow clear
Train 21 59 snow clear
Train 21 92 snow rain
Train 21 93 snow rain
Train 58 20 clear snow
Train 59 20 clear snow
Train 92 20 rain snow
Train 93 20 rain snow
Train 58 21 clear snow

Train 59 21 clear snow
Train 92 21 rain snow
Train 93 21 rain snow
Train 58 92 clear rain
Train 59 92 clear rain
Train 92 58 rain clear
Train 93 58 rain clear
Train 58 93 clear rain
Train 59 93 clear rain
Train 92 59 rain clear
Train 93 59 rain clear
Val 52 53 clear clear
Val 52 14 clear snow
Val 52 15 clear snow
Val 52 86 clear rain
Val 52 87 clear rain
Val 62 63 clear clear
Val 62 24 clear snow
Val 62 25 clear snow
Val 62 96 clear rain
Val 62 97 clear rain

Bibliography

[1] Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver, 3 2022.
84

[2] José M Alvarez, A Lopez, and Ramon Baldrich. Illuminant-invariant model-based
road segmentation. In IV, 2008. 42, 47, 48

[3] M. Aly. Real time detection of lane markers in urban streets. In IV, 2008. 42,
47, 48

[4] Alexander Amini, Guy Rosman, Sertac Karaman, and Daniela Rus. Variational
end-to-end navigation and localization. In ICRA, pages 8958–8964, 2019. 42,
45, 48

[5] ASAM OpenDRIVE. https://www.asam.net/standards/detail/opendrive/, 2022.
Accessed: 2022-01-26. 74

[6] A. L. Ballardini, D. Cattaneo, S. Fontana, and D. G. Sorrenti. Leveraging the
OSM building data to enhance the localization of an urban vehicle. In ITSC,
2016. 43, 49

[7] A. L. Ballardini, D. Cattaneo, S. Fontana, and D. G. Sorrenti. An online
probabilistic road intersection detector. In ICRA, 2017. 43, 49

[8] A. L. Ballardini, S. Fontana, A. Furlan, D. Limongi, and D. G. Sorrenti. A
framework for outdoor urban environment estimation. In ITSC, 2015. 43, 49

[9] Dan Barnes, Matthew Gadd, Paul Murcutt, Paul Newman, and Ingmar Posner.
The oxford radar robotcar dataset: A radar extension to the oxford robotcar
dataset. In 2020 IEEE International Conference on Robotics and Automation
(ICRA), pages 6433–6438. IEEE, 2020. 129

[10] Andrea Benedetto, Fabio Tosti, Luca Bianchini Ciampoli, and Fabrizio D’amico.
An overview of ground-penetrating radar signal processing techniques for road
inspections. Signal processing, 132:201–209, 2017. 126

[11] Berta Bescos, Jose M. Facil, Javier Civera, and Jose Neira. DynaSLAM: Track-
ing, Mapping, and Inpainting in Dynamic Scenes. RA-L, 2018. 96

[12] Berta Bescos, José M Fácil, Javier Civera, and José Neira. DynaSLAM: Track-
ing, mapping, and inpainting in dynamic scenes. IEEE Robotics and Automation
Letters, 3(4):4076–4083, 2018. 125, 128

[13] Berta Bescos, Jose Neira, Roland Siegwart, and Cesar Cadena. Empty Cities:
Image Inpainting for a Dynamic-Object-Invariant Space. In ICRA, pages 5460–
5466, 2019. 96

[14] Jonathan Binas, Daniel Niel, Shih-Chii Liu, and Tobi Delbruck. DDD17: End-
To-End DAVIS Driving Dataset. In ICML’17 Workshop on Machine Learning
for Autonomous Vehicles (MLAV 2017), 2017. 129

[15] Gary Bishop, Greg Welch, et al. An introduction to the kalman filter. SIG-
GRAPH, 8(27599-23175):41, 2001. 106

[16] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner,
Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller,
Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv:1604.07316,
2016. 42, 45, 48

[17] Mohamed Essayed Bouzouraa and Ulrich Hofmann. Fusion of occupancy grid
mapping and model based object tracking for driver assistance systems using
laser and radar sensors. In IV, 2010. 96

[18] Alberto Broggi and Simona Bertè. Vision-based road detection in automotive
systems: A real-time expectation-driven approach. JAIR, 3:325–348, 1995. 42,
47

[19] Mathias Bürki, Cesar Cadena, Igor Gilitschenski, Roland Siegwart, and Juan
Nieto. Appearance-based landmark selection for visual localization. JFR,
36(6):1041–1073, 2019. 96

[20] Mathias Bürki, Cesar Cadena, Igor Gilitschenski, Roland Siegwart, and Juan
Nieto. Appearance-based landmark selection for visual localization. Journal of
Field Robotics, 36(6):1041–1073, 2019. 125, 128

[21] K. Burnett, A. P. Schoellig, and T. D. Barfoot. Do We Need to Compensate for
Motion Distortion and Doppler Effects in Spinning Radar Navigation? IEEE
Robotics and Automation Letters, 6(2), 2021. 128

[22] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid,
and J. J. Leonard. Past, Present, and Future of Simultaneous Localization and
Mapping: Toward the Robust-Perception Age. IEEE Transactions on Robotics,
32(6), 2016. 93, 96, 128
[23] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong,
Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom.
nuScenes: A Multimodal Dataset for Autonomous Driving. In (CVPR), 2020.
129
[24] Luca Caltagirone, Mauro Bellone, Lennart Svensson, and Mattias Wahde.
Lidar–camera fusion for road detection using fully convolutional neural net-
works. Robotics and Autonomous Systems, 111:125–131, 2019. 48
[25] Luca Caltagirone, Samuel Scheidegger, Lennart Svensson, and Mattias Wahde.
Fast lidar-based road detection using fully convolutional neural networks. In
IV, 2017. 48
[26] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam.
Rethinking atrous convolution for semantic image segmentation. arXiv preprint
arXiv:1706.05587, 2017. 79
[27] Zhe Chen, Jing Zhang, and Dacheng Tao. Progressive lidar adaptation for road
detection. Journal of Automatica Sinica, 6(3):693–702, 2019. 48
[28] Richard J Chignell. Ground penetrating radar-a sensor for mine detection. In
Proceedings EUREL International Conference on the Detection of Abandoned
Landmines, pages 103–108, 1996. 96
[29] Winston Churchill and Paul Newman. Experience-based navigation for long-
term localisation. IJRR, 2013. 96
[30] Felipe Codevilla, Matthias Miiller, Antonio López, Vladlen Koltun, and Alexey
Dosovitskiy. End-to-end driving via conditional imitation learning. In ICRA,
2018. 42, 45, 48
[31] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus
Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele.
The Cityscapes Dataset for Semantic Urban Scene Understanding. In (CVPR),
June 2016. 128, 129
[32] Matthew Cornick, Jeffrey Koechling, Byron Stanley, and Beijia Zhang. Localiz-
ing Ground Penetrating RADAR: A Step Toward Robust Autonomous Ground
Vehicle Localization. JFR, 33(1):82–102, 2016. 95, 96, 99, 100, 101, 102, 103,
105, 108, 115, 123, 125, 126, 128, 130, 138, 141

[33] R Craig Coulter. Implementation of the pure pursuit path tracking algorithm.
Technical Report CMU-RI-TR-92-01, Carnegie-Mellon, 1992. 59, 107

[34] Jeffrey J Daniels et al. Ground penetrating radar for imaging archeological
objects. In Proceedings of the New Millennium International Forum on Con-
servation of Cultural Property, pages 247–265, 2000. 96

[35] J L Davis and A Peter Annan. Ground-penetrating radar for high-resolution
mapping of soil and rock stratigraphy. Geophysical Prospecting, 37(5):531–551,
1989. 96

[36] Kenneth M Dawson-Howe and Tomos G Williams. The detection of buried
landmines using probing robots. RAS, 1998. 96, 128

[37] Carl De Boor. On calculating with b-splines. Journal of Approximation Theory,
6(1):50–62, 1972. 56

[38] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional
neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
48, 60

[39] Frank Dellaert, Michael Kaess, et al. Factor graphs for robot perception. Foun-
dations and Trends® in Robotics, 6(1-2):1–139, 2017. 28

[40] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet:
A large-scale hierarchical image database. In 2009 IEEE conference on computer
vision and pattern recognition, pages 248–255. Ieee, 2009. 110

[41] Juergen Dickmann, Jens Klappstein, Markus Hahn, Nils Appenrodt, Hans-
Ludwig Bloecher, Klaudius Werber, and Alfons Sailer. Automotive radar the
key technology for autonomous driving: From detection and ranging to envi-
ronmental understanding. In RadarConf, pages 1–6, 2016. 96

[42] E. W. Dijkstra. A Note on Two Problems in Connexion with Graphs. Numerische
Mathematik, 1:269–271, 1959. 99

[43] Anh-Dzung Doan, Yasir Latif, Tat-Jun Chin, Yu Liu, Thanh-Toan Do, and Ian
Reid. Scalable Place Recognition Under Appearance Change for Autonomous
Driving. In Proceedings of the IEEE/CVF International Conference on Com-
puter Vision (ICCV), October 2019. 128

[44] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen
Koltun. CARLA: An open urban driving simulator. In CoRL, pages 1–16, 2017.
47, 65

[45] M. Dymczyk, E. Stumm, J. Nieto, R. Siegwart, and I. Gilitschenski. Will it
last? Learning stable features for long-term visual localization. In 3DV, 2016.
96

[46] Marcin Dymczyk, Elena Stumm, Juan Nieto, Roland Siegwart, and Igor
Gilitschenski. Will it last? Learning stable features for long-term visual local-
ization. In 2016 Fourth International Conference on 3D Vision (3DV), pages
572–581. IEEE, 2016. 125

[47] Ricardo Fabbri, Luciano Da F Costa, Julio C Torelli, and Odemir M Bruno.
2D Euclidean distance transform algorithms: A comparative survey. ACM
Computing Surveys (CSUR), 40(1):1–44, 2008. 79

[48] M. Fehr, F. Furrer, I. Dryanovski, J. Sturm, I. Gilitschenski, R. Siegwart, and
C. Cadena. TSDF-based change detection for consistent long-term dense recon-
struction and dynamic object discovery. In 2017 IEEE International Conference
on Robotics and Automation (ICRA), 2017. 128

[49] Federal Highway Administration, 2019. 98

[50] Georgios Floros, Benito Van Der Zander, and Bastian Leibe. Openstreetslam:
Global vehicle localization using openstreetmaps. In ICRA, 2013. 43, 49

[51] No job for humans: The robot assault on fukushima - wired. https://www.
wired.com/story/fukushima-robot-cleanup/. Accessed: 2022-07-31. 21

[52] A Geiger, P Lenz, C Stiller, and R Urtasun. Vision meets robotics: The KITTI
dataset. The International Journal of Robotics Research, 32(11), 2013. 128, 129

[53] Mordechai (Muki) Haklay and Patrick Weber. OpenStreetMap: User-Generated
Street Maps. IEEE Pervasive Computing, October 2008. 70, 74

[54] Mordechai (Muki) Haklay and Patrick Weber. OpenStreetMap: User-generated
street maps. IEEE Pervasive Computing, 7(4):12–18, October 2008. 43, 49, 50

[55] Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the
heuristic determination of minimum cost paths. SSC, 4(2):100–107, 1968. 54

[56] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual
learning for image recognition. In Proc. IEEE Conf. Comput. Vis. Pattern
Recog., pages 770–778, 2016. 78

[57] Matthias Hentschel and Bernardo Wagner. Autonomous robot navigation based
on OpenStreetMap geodata. In ITSC, 2010. 45

[58] Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, and
Raquel Urtasun. DAGMapper: Learning to Map by Discovering Lane Topology.
In Proc. IEEE Int. Conf. Comput. Vis., October 2019. 42, 72
[59] Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng
Wang, Yuanqing Lin, and Ruigang Yang. The ApolloScape Dataset for Au-
tonomous Driving. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition Workshops, pages 954–960, 2018. 128, 129
[60] E Jose and M D Adams. An augmented state SLAM formulation for multiple
line-of-sight features with millimetre wave RADAR. In IROS, 2005. 96
[61] Rudolph Emil Kalman. A new approach to linear filtering and prediction prob-
lems. Journal of Basic Engineering, 1960. 106
[62] Charles F. F. Karney. Algorithms for geodesics. Journal of Geodesy, 87(1):43–
55, 2013. 106
[63] Alex Kendall, Matthew Grimes, and Roberto Cipolla. Posenet: A convolutional
network for real-time 6-dof camera relocalization. In Proceedings of the IEEE
international conference on computer vision, pages 2938–2946, 2015. 97
[64] Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda,
John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, and Amar Shah. Learning to
drive in a day. In ICRA, 2019. 42, 45, 48
[65] Sandy Kennedy and Jim Rossi. Performance of a deeply coupled commercial
grade gps/ins system from kvh and novatel inc. In Position, Location and
Navigation Symposium, 2008. 98, 118
[66] Yeongmin Ko, Younkwan Lee, Shoaib Azam, Farzeen Munir, Moongu Jeon,
and Witold Pedrycz. Key points estimation and point instance segmentation
approach for lane detection. Proc. Int. Conf. on Intell. Transportation Syst.,
2021. 73
[67] Takuji Kubota, Tomoo Ushio, Shoichi Shige, Satoshi Kida, Misako Kachi, and
Ken’ichi Okamoto. Verification of High-Resolution Satellite-Based Rainfall Esti-
mates around Japan Using a Gauge-Calibrated Ground-Radar Dataset. Journal
of the Meteorological Society of Japan. Ser. II, 87A, 2009. 128
[68] H M La, R S Lim, B Basily, N Gucunski, J Yi, A Maher, F A Romero, and
H Parvardeh. Autonomous robotic system for high-efficiency non-destructive
bridge deck inspection and evaluation. In CASE, 2013. 96
[69] Richard B Langley. The utm grid system. GPS world, 9(2):46–50, 1998. 50

[70] John Leonard et al. Team mit urban challenge technical report. Technical
Report MIT-CSAIL-TR-2007-058, MIT CSAIL, 2007. 107

[71] J. H. Lever, A. J. Delaney, L. E Ray, E. Trautmann, L. A. Barna, and A. M
Burzynski. Autonomous GPR Surveys using the Polar Rover Yeti. JFR,
30(2):194–215, 2013. 96

[72] J. Levinson, M. Montemerlo, and S. Thrun. Map-Based Precision Vehicle Lo-
calization in Urban Environments. In RSS, 2007. 41, 47, 50, 125, 128

[73] Jesse Levinson, Michael Montemerlo, and Sebastian Thrun. Map-based pre-
cision vehicle localization in urban environments. In Robotics: science and
systems, volume 4, page 1. Citeseer, 2007. 74

[74] Jesse Levinson and Sebastian Thrun. Robust Vehicle Localization in Urban
Environments Using Probabilistic Maps. In ICRA, 2010. 125, 128

[75] Jesse Levinson and Sebastian Thrun. Robust vehicle localization in urban en-
vironments using probabilistic maps. In ICRA, 2010. 41, 47

[76] Wlodzimierz Lewandowski and Claudine Thomas. GPS time transfer. Proceed-
ings of the IEEE, 79(7):991–1000, 1991. 140

[77] Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. HDMapNet: A Local Semantic
Map Learning and Evaluation Framework. arXiv preprint arXiv:2107.06307,
2021. 73

[78] Justin Liang and Raquel Urtasun. End-to-End Deep Structured Models for
Drawing Crosswalks. In Proceedings of the European Conference on Computer
Vision (ECCV), September 2018. 42, 72

[79] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva
Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common
objects in context. In Proc. European Conf. Comput. Vis. Springer, 2014. 79

[80] D.T. Linzmeier, M. Skutek, M. Mekhaiel, and K.C.J. Dietmayer. A pedestrian
detection system based on thermopile and radar sensor data fusion. In FUSION,
pages 1272–1279, 2005. 96

[81] Kanglin Liu, Qing Li, and Guoping Qiu. PoseGAN: A pose-to-image translation
framework for camera localization. ISPRS Journal of Photogrammetry and
Remote Sensing, 166:308–315, 2020. 97

[82] Rong Liu, Jinling Wang, and Bingqi Zhang. High definition map for automated
driving: Overview and analysis. The Journal of Navigation, 73(2):324–341,
2020. 74

[83] W. Liu, H. Zhang, B. Duan, H. Yuan, and H. Zhao. Vision-based real-time lane
marking detection and tracking. In ITSC, 2008. 42, 47, 48

[84] S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and
M. J. Milford. Visual Place Recognition: A Survey. IEEE Transactions on
Robotics, 32(1), 2016. 93, 96, 128

[85] Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 Year, 1000
Km: The Oxford RobotCar Dataset. The International Journal of Robotics
Research, 36(1):3–15, 2017. 128, 129

[86] Annika Meyer, N Ole Salscheider, Piotr F Orzechowski, and Christoph Stiller.
Deep semantic lane segmentation for mapless driving. In Proc. IEEE/RSJ Int.
Conf. Intell. Robots and Syst., pages 869–875. IEEE, 2018. 73

[87] M. Meyer and G. Kuschk. Automotive Radar Dataset for Deep Learning Based
3D Object Detection. In European Radar Conference (EuRAD), 2019. 129

[88] T. Moore and D. Stouch. A generalized extended Kalman filter implementation
for the robot operating system. In IAS-13, July 2014. 53

[89] T. Moore and D. Stouch. A generalized extended Kalman filter implementation
for the robot operating system. In IAS, 2014. 106

[90] Kenji Mori, Tomokazu Takahashi, Ichiro Ide, Hiroshi Murase, Takayuki Miya-
hara, and Yukimasa Tamatsu. Recognition of foggy conditions by in-vehicle
camera and millimeter wave radar. In IV, 2007. 96

[91] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. ORB-
SLAM: a versatile and accurate monocular SLAM system. IEEE transactions
on robotics, 31(5):1147–1163, 2015. 125

[92] Gellért Máttyus, Shenlong Wang, Sanja Fidler, and Raquel Urtasun. Enhancing
Road Maps by Parsing Aerial Images Around the World. In Proc. IEEE Int.
Conf. Comput. Vis., pages 1689–1697, 2015. 42, 72

[93] Navigation Data Standard (NDS). https://nds-association.org/, 2022.
Accessed: 2022-01-26. 74

[94] Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, and
Luc Van Gool. Towards end-to-end lane detection: An instance segmentation
approach. In IEEE Intell. Vehicle Symposium, 2018. 73

[95] Traffic safety facts research note. https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812927,
2020. Accessed: 2022-07-31. 21

[96] Sheng-Huoo Ni, Yan-Hong Huang, Kuo-Feng Lo, and Da-Ci Lin. Buried pipe
detection by ground penetrating radar using the discrete wavelet transform.
Computers and Geotechnics, 37(4):440–448, 2010. 96

[97] Nuro. https://www.nuro.ai/vehicle, 2022. Accessed: 2022-07-26. 23

[98] T. Ort, L. Paull, and D. Rus. Autonomous vehicle navigation in rural environ-
ments without detailed prior maps. In ICRA, 2018. 42, 46, 48

[99] Teddy Ort, Igor Gilitschenski, and Daniela Rus. Autonomous Navigation in
Inclement Weather Based on a Localizing Ground Penetrating Radar. IEEE
Robotics and Automation Letters, 5(2):3267–3274, 2020. 19, 119, 120, 121, 126,
128, 129, 130

[100] Teddy Ort, Krishna Murthy, Rohan Banerjee, Sai Krishna Gottipati, Dhaivat
Bhatt, Igor Gilitschenski, Liam Paull, and Daniela Rus. Maplite: Autonomous
intersection navigation without a detailed prior map. IEEE RA-L, 2019. 98

[101] Oxts rt user manual, 2019. 106

[102] Mark Pauly, Markus Gross, and Leif P Kobbelt. Efficient simplification of
point-sampled surfaces. In VIS, 2002. 51

[103] Trading economics: Paved roads in the united states (as share of total roads).
, 2008. Accessed: 2019-09-10. 45

[104] Matthew Pitropov, Danson Evan Garcia, Jason Rebello, Michael Smart, Carlos
Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian Adverse Driving
Conditions Dataset. The International Journal of Robotics Research, 2020. 128,
129

[105] Michael JD Powell. An efficient method for finding the minimum of a func-
tion of several variables without calculating derivatives. The computer journal,
7(2):155–162, 1964. 58

[106] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep
learning on point sets for 3d classification and segmentation. In CVPR, 2017.
48, 60

[107] M. Rapp, M. Hahn, M. Thom, J. Dickmann, and K. Dietmayer. Semi-Markov
Process Based Localization Using Radar in Dynamic Environments. In 2015
IEEE 18th International Conference on Intelligent Transportation Systems,
2015. 128

[108] R. H. Rasshofer and K. Gresser. Automotive Radar and Lidar Systems for Next
Generation Driver Assistance Functions. Advances in Radio Science, 3:205–209,
2005. 96

[109] Jane Rea and Rosemary Knight. Geostatistical analysis of ground-penetrating
radar data: A means of describing spatial variation in the subsurface. Water
Resources Research, 34(3):329–339, 1998. 96

[110] Alejandro Romero-Ruiz, Niklas Linde, Thomas Keller, and Dani Or. A Re-
view of Geophysical Methods for Soil Structure Characterization. Reviews of
Geophysics, 56(4):672–697, 2018. 128

[111] Camera calibration - ros wiki, 2021. 142

[112] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL).
In IEEE International Conference on Robotics and Automation (ICRA), Shang-
hai, China, May 9-13 2011. 136

[113] Dorothy Ryan. Lincoln Laboratory enters licensing agreement to produce its
localizing ground-penetrating radar. https://news.mit.edu/2017/lincoln-
laboratory-enters-licensing-agreement-to-produce-localizing-
ground-penetrating-radar-0718, 2017. 126

[114] Timo Saarenketo and Tom Scullion. Road evaluation with ground penetrating
radar. Journal of Applied Geophysics, 2000. 100

[115] SAE levels of driving automation. https://www.sae.org/blog/sae-j3016-update,
2021. Accessed: 2022-07-31. 24

[116] Abraham Savitzky and Marcel JE Golay. Smoothing and differentiation of data
by simplified least squares procedures. Analytical chemistry, 36(8):1627–1639,
1964. 141

[117] Y. Schulz, A. K. Mattar, T. Hehn, and J. Kooij. Hearing What You Cannot See:
Acoustic Vehicle Detection Around Corners. Robotics and Automation Letters,
2021. 129

[118] F Schuster, M Wörner, C G Keller, M Haueis, and C Curio. Robust localization
based on radar signal clustering. In IV, 2016. 96

[119] Y. W. Seo and R. R. Rajkumar. Detection and tracking of boundary of un-
marked roads. In FUSION, pages 1–6, 2014. 42, 47

[120] Keith Shaw. Meet WaveSense: Providing Accuracy for Vehicles Via Un-
derground Radar. https://www.roboticsbusinessreview.com/unmanned/
unmanned-ground/meet-wavesense-providing-accuracy-for-vehicles-
via-underground-radar, 2020. 126

[121] Y. Su, S. Xing, J. Feng, S. Dai, Y. Xiao, L. Zheng, and C. Li. The preliminary
results of lunar penetrating radar on board the chinese chang’e-3 rover. In GPR,
2014. 96

[122] Pei Sun et al. Scalability in Perception for Autonomous Driving: Waymo Open
Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2020. 128, 129

[123] Shu-Li Sun and Zi-Li Deng. Multi-sensor optimal information fusion kalman
filter. Automatica, 40(6):1017–1023, 2004. 106

[124] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabi-
novich. Going deeper with convolutions. In Proceedings of the IEEE conference
on computer vision and pattern recognition, pages 1–9, 2015. 110

[125] Damien Vivet, Franck Gérossier, Paul Checchin, Laurent Trassoudaine, and
Roland Chapuis. Mobile ground-based radar sensor for localization and map-
ping: An evaluation of two approaches. IJARS, 10(8):307, 2013. 96

[126] Erik Ward and John Folkesson. Vehicle localization with low cost radar sensors.
In IV, 2016. 96

[127] Waymo via. https://waymo.com/waymo-via/, 2022. Accessed: 2022-07-26. 23

[128] K. Werber, J. Klappstein, J. Dickmann, and C. Waldschmidt. Point group as-
sociations for radar-based vehicle self-localization. In International Conference
on Information Fusion (FUSION), 2016. 128

[129] K. Werber, J. Klappstein, J. Dickmann, and C. Waldschmidt. Association of
Straight Radar Landmarks for Vehicle Self-Localization. In Intelligent Vehicles
Symposium (IV), 2019. 128

[130] Global status report on road safety 2018. https://www.who.int/publications/i/item/9789241565684,
2018. Accessed: 2022-07-31. 21

[131] R M Williams, L E Ray, and J Lever. An autonomous robotic platform for
ground penetrating radar surveys. In IGARSS, 2012. 96, 128

[132] Michael G Wing, Aaron Eklund, and Loren D Kellogg. Consumer-grade
global positioning system (GPS) accuracy and reliability. Journal of Forestry,
103(4):169–173, 2005. 125

[133] R. W. Wolcott and R. M. Eustice. Fast LIDAR localization using multiresolu-
tion Gaussian mixture maps. In ICRA, 2015. 41, 47, 125, 128

[134] Ryan W Wolcott and Ryan M Eustice. Visual localization within lidar maps
for automated urban driving. In IROS, 2014. 41, 47, 125

[135] Geng Zhang, Nanning Zheng, Chao Cui, Yuzhen Yan, and Zejian Yuan. An
efficient road detection method in noisy urban environment. In IV, 2009. 42,
47, 48

[136] Jinyou Zhang and H-H Nagel. Texture-based segmentation of road images. In
IV, 1994. 48

[137] Zichao Zhang and Davide Scaramuzza. A tutorial on quantitative trajectory
evaluation for visual (-inertial) odometry. In (IROS), pages 7244–7251. IEEE,
2018. 131

[138] A. Z. Zhu, D. Thakur, T. Özaslan, B. Pfrommer, V. Kumar, and K. Daniilidis.
The Multivehicle Stereo Event Camera Dataset: An Event Camera Dataset for
3D Perception. Robotics and Automation Letters, 3(3), 2018. 129

[139] Julius Ziegler et al. Making bertha drive—an autonomous journey on a historic
route. ITSM, 6(2):8–20, 2014. 55

[140] Jannik Zürn, Johan Vertens, and Wolfram Burgard. Lane Graph Estimation
for Scene Understanding in Urban Driving. IEEE Robot. and Automation Lett.,
6(4):8615–8622, 2021. 42, 72
