TABLE I
SENSOR SETUP AND CHARACTERISTICS FOR OBJECT TRACKING
II. RELATED WORK

Most existing commercial driver-assistance systems with environmental perception are designed for longitudinal traffic in well-structured environments (e.g., adaptive cruise control [6]). Driver-assistance systems, which work in more complex environments and use a multisensor fusion approach, are an active area of research. Examples include systems for intersection assistance or systems that assist drivers in construction sites on a highway [7].

Research in vehicle tracking has been accomplished using a wide variety of different sensors that include, but are not limited to, computer vision [8], [9], LIDAR [10], [11], and RADAR [12], [13]. In our approach, we make use of and fuse information from our vehicle's onboard LIDAR and RADAR sensors to track objects around the vehicle. This fusion is necessary to provide the proper blend of long-range detection with short-range shape estimation. In particular, one of our sensor modules for the SICK LMS 291 (see Table I) uses a vehicle shape-interpretation algorithm that was originally used for vehicle collision warnings for buses [14].

To robustly track vehicles, even in the face of sensor and environment noise, we use a Bayesian-filtering approach [15]. Object tracking using a Bayesian filter formalism relies on an a priori model of the object's motion that allows the algorithm to predict the object motion, given noisy observations. One of the most widely used methods for state estimation is the Kalman filter [16], [17], in which the system model is assumed to be linear and the noise is assumed to be Gaussian. Our system uses a more numerically stable variant of the Kalman filter called the square root filter [18].

In our application, the autonomous vehicle must interpret many different sensor returns, potentially from multiple targets, and continually decide which new sensor reading corresponds to an existing tracked object [19], [20]. We make use of an approach known as multiple hypothesis tracking [21], [22], where multiple independent state estimators are used to estimate a multimodal probability density. At each step of the estimation process, new sensor data are associated with the multiple tracks to determine whether the sensor reading corresponds to an existing target or whether a new target should be instantiated in the tracker. Old targets that do not receive additional sensor information to support them will eventually be pruned from the tracked list.

Our approach to vehicle tracking makes use of multiple vehicle models, depending on the kind and quality of information obtained about the object from the vehicle's sensors. A traditional mechanism for incorporating multiple models in tracking is the interacting multiple model filter [23], which uses a weighted mixture of different process models. A more general approach is the switching Kalman filter model [24], which represents multiple independent system state dynamics models and switches between them (or linearly combines them) to best fit the observed (or predicted) nonlinear dynamics of the system being modeled. In our work, model switching is done at a higher level than where the vehicle state is estimated. We propose a novel voting scheme where a tracked object will be represented by the most informative model, depending on the quality of the sensors.

An important and cost-determining part in the process of designing the perception system is the development of the system architecture [25], [26]. Multiple publications that describe general architecture concepts for sensor data-fusion systems exist [20], [25], [27]–[29]. The architecture that we use for object tracking was originally developed for driver-assistance systems [30] and is designed for an efficient development process, allowing a cost-effective extension of the system with new sensors.

III. PERCEPTION SYSTEM

In our system, vehicle tracking is an integral part of a larger perceptual system, which is responsible for providing a coherent and consistent model of the world. The process of detecting and tracking vehicles is directly influenced by the context in which those vehicles are observed. Fig. 2 shows the flow of how a world model is computed. Raw measurements received by the individual sensors are processed in the measurement phase to be passed to the perception phase, where features and object interpretations are computed. These object hypotheses are passed to the understanding phase, where the larger context in which the object is detected is used to generate an understanding about the global situation surrounding that object.

A. Modeling the Environment

To properly define the robot's model of the world, we divide the world into three specific domains.

1) Road structure: The road structure is a logical interpretation of the environment and defines where and how vehicles are allowed to drive.
DARMS et al.: OBSTACLE DETECTION AND TRACKING FOR THE URBAN CHALLENGE 477
Fig. 2. General perception process that illustrates how raw energy detected from the environment is converted into a situational understanding of the world around the vehicle. This figure also illustrates where noise and errors can be introduced into the system and the modeling assumptions that specify how the information is altered at each step. (Note that the categories of errors marked in the figure are generally referred to throughout this paper as artifacts.)
2) Static obstacles: Static obstacles are obstacles that are assumed not to move during the observation period (on or off road).

3) Dynamic obstacles: Dynamic obstacles are obstacles that potentially move during the observation period. For the Urban Challenge, only cars fall into this category. Note, however, that a car does not necessarily have to be a dynamic obstacle, as it can also be a parked vehicle that will never move.

B. Perception System Architecture

The architecture of the perception system, as shown in Fig. 3, is divided analogously to the world model into the following three subsystems: 1) a road estimation subsystem, which generates information about the road structure; 2) a tracking subsystem, which is responsible for generating dynamic obstacle hypotheses; and 3) a static obstacle estimation subsystem, which estimates the location of static obstacles.

1) Road Structure: The road structure is represented as a topological network of segments, intersections, and zones. A segment contains a number of road lanes, and each lane has a specified width and direction. The shape and curvature of a particular lane are determined by a set of points that are spaced roughly 12 m apart from each other. Intersections are junctions that explicitly connect lanes from different segments. Zones are free-form open areas, such as parking lots, which have no explicit restrictions on where vehicles can travel.

In our system, the road structure was derived from the given road network definition file and information from a high-precision satellite image. With this, the data were known with high confidence. The information was used inside the tracking system for computational efficiency and to reduce the number of sensor artifacts (see Section IV). The road-estimation system, however, also included a road shape-estimation algorithm, which could deliver information about the road in cases where the map was not sufficient. This, however, was not necessary on race day.

2) Dynamic Obstacle Hypothesis List: The classification of an obstacle as a dynamic obstacle requires scene understanding (e.g., to distinguish a parked from a temporarily stopped vehicle). The perception system therefore only provides a list of dynamic obstacle hypotheses, i.e., all obstacles that potentially belong to the class of dynamic obstacles. Dynamic obstacles are represented by a shape model and state variables, such as position, velocity, and acceleration (see Section IV-A).

For every dynamic obstacle hypothesis, two flags are provided: 1) the current movement state, i.e., moving or not moving, and 2) the movement history, i.e., observed moving or not observed moving. The flag moving is set once the tracking subsystem decides that the object is currently in motion. The flag observed moving is set once the tracking system decides that the object has changed its position. With this definition of a dynamic obstacle, it is obvious that, whenever the flags moving and observed moving are both set, the obstacle hypothesis belongs to the class of dynamic obstacles. If only the flag observed moving is set, the obstacle may belong to the class of static obstacles (e.g., a stalled vehicle). However, in our system, all objects that have only the observed moving flag set are directly treated as dynamic obstacles. Testing showed that this is a good approximation for short observation periods.

In certain situations, sensors cannot detect an object. This holds true, for example, if an object is not within the field of view of a sensor or if part of the field of view is occluded. (Sensor occlusions were not modeled within our system.) However, this can also occur due to sensor artifacts. Dynamic obstacle hypotheses are only maintained by the tracking system as long as the sensor data can support the estimation of the state variables.
478 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 10, NO. 3, SEPTEMBER 2009
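As an illustration, the two movement flags and the rule that a hypothesis with only the observed moving flag set is still treated as dynamic can be sketched as follows. This is a hypothetical Python container; the field and method names are ours, not the actual Tartan Racing interface:

```python
from dataclasses import dataclass

@dataclass
class DynamicObstacleHypothesis:
    """Illustrative container for one obstacle hypothesis (names invented)."""
    position: tuple          # (x, y) in vehicle coordinates
    velocity: tuple          # (vx, vy)
    moving: bool = False          # current movement state
    observed_moving: bool = False # movement history

    def treat_as_dynamic(self) -> bool:
        # Per the text: any hypothesis with the observed-moving flag set is
        # treated as a dynamic obstacle, even if it is not currently moving
        # (a good approximation for short observation periods).
        return self.observed_moving

# A car queued at an intersection: currently stopped, but it has moved before.
queued_car = DynamicObstacleHypothesis((12.0, 0.5), (0.0, 0.0),
                                       moving=False, observed_moving=True)
# A parked car: never observed moving.
parked_car = DynamicObstacleHypothesis((3.0, -4.0), (0.0, 0.0))
print(queued_car.treat_as_dynamic(), parked_car.treat_as_dynamic())  # True False
```

The queued car keeps its dynamic status while briefly stopped, while the parked car falls back to the static-obstacle pipeline.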
This means that only short-term sensor artifacts (measurement dropouts) are bridged and that no scene understanding is used to reason about the lifetime of an obstacle hypothesis. Situations where an object is temporarily lost for a longer amount of time are explicitly handled by the situation assessment algorithms (see Section V).

3) Static Obstacle Map and Instantaneous Obstacle Map: Static obstacles are represented in a map decomposed into a regular array of cells. Within this map, dynamic obstacle hypotheses that are flagged as observed moving clear out any cells within their footprint. The static obstacle map has a memory, remembering obstacles that are no longer visible.

Evidence of obstacles is calculated as a function of the number of LIDAR points classified as an obstacle in each cell. Evidence decays over time at a rate that is less than the rate at which it accumulates for a reasonably supported obstacle. In this way, the obstacle map is filtered to reduce the effect of spurious sensor measurements.

The perception system also maintains an instantaneous obstacle map. This map contains all obstacles around the robot; dynamic obstacle hypotheses are not removed from it. The map is used to validate features from planar LIDAR and RADAR obstacles in the tracking subsystem (see Section IV). For this purpose, the information does not have to be as accurate as that for the motion-planning algorithms, which use the static map, so a much smaller time constant can be used to filter the sensor input data.

C. Sensor Setup

Table I shows the autonomous vehicle's sensor configuration used for tracking moving objects and summarizes the characteristics of the sensors [3]. Data are asynchronously collected from each sensor but are timestamped to a common system time. The combination of these sensors provides complete sensor coverage around the vehicle. As will be described, the fusion of the different sensors allows Boss to exploit the advantages of each sensor type to increase overall performance. The redundancy provided by overlapping fields of view enables the system to be robust against false readings or sensor failures. The sensors are configured to maximize redundancy in front of the vehicle (with between four and seven sensors providing coverage) while also providing redundant sensing around as much of the vehicle as possible (e.g., two sensors, i.e., a LIDAR and a RADAR, provide coverage behind Boss).

Radars are used for detecting objects in the near and far ranges. Doppler shift provides a measure of the relative speed of an obstacle, which gives a low-latency and accurate velocity estimate, which, in turn, can be used to efficiently distinguish static from moving objects. The measurement accuracy of these sensors is sufficient for associating vehicles to lanes on the road up to a detection range of 200 m. While long range and direct velocity measurement are clear advantages, RADAR sensors provide no notion of the height of obstacles; instead, they report RADAR cross section. Objects with a small angular extent can commonly return amounts of energy comparable with those of a vehicle. Without complementary sensing, it can be difficult to distinguish between irrelevant clutter on the road and relevant vehicles.

Planar scanning lasers provide information about the geometric cross section of an object. While this information is generally useful, a single cross section can be insufficient for distinguishing between an obstacle and a benign terrain feature. For example, the side of a car may provide a contour that looks very similar to a sloped road. Furthermore, given a fixed angular resolution, the sensors provide sparse spatial information and, thus, insufficient geometric information beyond some range. This motivates the use of a dual-mode representation (bounding boxes and points, as will be shown next).

The Velodyne HDL-64E was the only sensor with a wide enough vertical sensing angle to deliver reliable 3-D information about objects around the vehicle. However, the effective range of the sensor is not sufficient for all the required autonomous maneuvers, particularly merging and passing maneuvers at 30 mi/h. Despite these shortcomings, the sensor is important for object validation in the near range, as the 3-D data allow a ground-plane estimation and the creation of the instantaneous obstacle map. In this way, the sensor can be used to effectively reject artifacts caused by the limited vertical field of view of other sensors (e.g., false obstacles generated by perceiving the ground).

The last class of sensors used on the vehicle are fixed-beam LIDAR sensors. Similar to the RADAR sensors, they provide 2-D coordinates of the detection centers of obstacles; however, there is no direct measurement of velocity. A fixed-beam LIDAR is mounted redundant with a RADAR on a rotating mount on both sides of the vehicle. By appropriately pointing these sensors, it is possible to situationally focus and extend the vehicle's perception capability.

It is clear that no single sensor is sufficient to provide the situational awareness that is necessary for safe urban driving. The variety of sensors provides not only redundant coverage but also complementary detection and processing characteristics that yield significant benefits beyond the use of a single sensor.

IV. TRACKING SUBSYSTEM

The tracking subsystem was responsible for the identification of dynamic obstacle hypotheses for use by the rest of the system. Its design was influenced by both the available sensors and the ways that the obstacle hypotheses are used to implement various traffic-interactive behaviors. The robust implementation of each of these behaviors requires complex situation-specific reasoning that, if applied to the tracking subsystem, would lead to a rigid system that is difficult to adapt or extend to implement new behaviors. Instead, all possible vehicles are tracked the same way, regardless of the scenario, and situation-specific reasoning is isolated in another subsystem.

It is important to note that there was no attempt to do vehicle classification in the tracking subsystem. Traditional classification algorithms [31], [32] are generally based on camera data and rely either on heuristics or on training data to accurately classify vehicles in the environment. Both approaches work well in relatively constrained scenarios, such as highway driving, but they break down somewhat in urban environments, where traffic can come from arbitrary directions.
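The evidence accumulation and decay scheme for the static obstacle map (Section III-B3) can be sketched as a toy grid filter. The gain, decay rate, and threshold values below are invented for illustration; the paper does not give concrete numbers:

```python
import numpy as np

class StaticObstacleMap:
    """Toy evidence grid: accumulation outpaces decay (rates invented)."""
    def __init__(self, shape, gain=1.0, decay=0.2, threshold=2.0):
        self.evidence = np.zeros(shape)
        self.gain, self.decay, self.threshold = gain, decay, threshold

    def update(self, obstacle_cells):
        # Decay everywhere at a rate lower than the accumulation rate,
        # so consistently supported obstacles survive while spurious
        # single-scan returns fade out.
        self.evidence = np.maximum(self.evidence - self.decay, 0.0)
        for (i, j) in obstacle_cells:
            self.evidence[i, j] += self.gain   # LIDAR points classified as obstacle
        return self.evidence >= self.threshold  # filtered obstacle mask

m = StaticObstacleMap((10, 10))
for _ in range(4):                    # a consistently supported obstacle...
    mask = m.update([(5, 5)])
spurious = m.update([(2, 2)])         # ...versus a one-off spurious return
print(mask[5, 5], spurious[2, 2])     # True False
```

With these rates, four consecutive hits push a cell over the threshold, while a single spurious return never reaches it. An instantaneous obstacle map, by contrast, would use a much smaller time constant (faster decay).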
Fig. 4. Tracking subsystem architecture. (a) Two layers that comprise the tracking subsystem. (b) Processing flow between the two layers.
A. Modeling Dynamic Obstacles

An extended Kalman filter [16] is used to estimate the state variables of dynamic obstacle hypotheses. The data available for tracking depend on the type of sensor (e.g., radar versus scanning laser) and the position of an observed object in the sensor's field of view (e.g., increased sensor noise and/or decreased resolution at longer ranges).

To cope with these issues, we use an adaptive model-switching approach [3], wherein the tracking model is selected from different discrete models based on the available sensor information. The tracking subsystem uses the following two models: 1) a box model and 2) a point model. Note, however, that the approach can be extended as necessary to include other models.

The box model describes a dynamic obstacle hypothesis as a rectangular box with a fixed length and width; its dynamics are governed by a simple bicycle model with constant noise parameters, which is similar to that in [33]. The point model, on the other hand, does not include any shape information; its dynamics are governed by a constant-acceleration model with adaptive noise, which is similar to that in [20].

B. Architecture

To support the use of multiple heterogeneous sensors, the tracking subsystem is divided into two layers, as shown in Fig. 4. The Sensor Layer is meant to hide sensor-specific details, providing sensor-independent data to the Fusion Layer, which is responsible for forming and maintaining dynamic obstacle hypotheses. For each type of sensor, a unique sensor-layer module is implemented, and a separate instance of that module is used for each physical device on the robot. Sections IV-C and D discuss the modules for scanning lasers and radar sensors, respectively.

Every time a sensor generates a new measurement, its corresponding Sensor Layer instance first requests a prediction of the current set of dynamic obstacle hypotheses from the Fusion Layer. These predictions are cached for possible use in feature extraction and validation and are eventually used to associate validated features to existing hypotheses. Features are extracted from the raw sensor data according to the algorithm encapsulated within the Sensor Layer module, which searches for features that may be used to describe a dynamic obstacle hypothesis according to either the point model or the box model.

Thereafter, all extracted features go through a two-step validation process to reduce the number of false positives in the extracted feature set. These may be caused by issues such as ground detection by planar laser scanners, where the intersection of the scanning plane with the ground plane can be misinterpreted as the edge of a vehicle (see, e.g., [4] and [34]). All features that are explicitly validated in at least one of these two steps are used to generate dynamic obstacle hypotheses, reducing the chances of a false negative.

In the first validation step, a sensor-specific validation algorithm that exploits the known properties of the sensor is used to filter out commonly encountered artifacts and to immediately validate particularly strong features, such as when a radar sensor indicates a strong velocity return, as discussed in Section IV-D. In the second validation step, the feature validation interface provided by the Fusion Layer uses the road structure to reject features that are too far away from the road. This reduces the computation time needed to process the data and can reduce the number of artifacts caused by objects that are close to roads, such as bushes or roadside barriers. Then, the instantaneous obstacle map is used to check whether the remaining features are likely to be caused by a ground return; within the range of the map, features are only validated if they are also found in the map.

All valid features are subsequently used in the next step in the Sensor Layer, i.e., data association. Here, the system attempts to associate the features with the currently predicted dynamic obstacle hypotheses from the Fusion Layer. Sensor-specific heuristics, designed according to the characteristics of the features extracted from a given sensor, are used for data association. Features that cannot be associated to an existing hypothesis are used to generate new proposals, as described next.

An observation is generated for each feature associated with a given hypothesis, encapsulating all information that is necessary to update the estimation of dynamic obstacles in the Fusion Layer. This includes the nonlinear equations for mapping the measurements from measurement space to state space, the corresponding Jacobian matrices, and the measurement vector for the Kalman filter. This approach is the key to decoupling the sensor-specific processing from the sensor-independent state estimation in the Fusion Layer.
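The sensor-independent observation interface just described — a measurement vector, a nonlinear state-to-measurement mapping, and its Jacobian bundled together so that the Fusion Layer can run a generic EKF update — might look roughly like the following sketch. The class layout and the range-only radar-like example are our own illustration, not the paper's actual interface:

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class Observation:
    """Sensor-independent observation bundle (a sketch of the idea)."""
    z: np.ndarray                                  # measurement vector
    h: Callable[[np.ndarray], np.ndarray]          # state -> measurement space
    jacobian: Callable[[np.ndarray], np.ndarray]   # dh/dx at linearization point
    R: np.ndarray                                  # measurement noise covariance

def ekf_update(x, P, obs):
    """Generic EKF update: the Fusion Layer never sees sensor specifics."""
    H = obs.jacobian(x)
    S = H @ P @ H.T + obs.R          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (obs.z - obs.h(x))
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Example: a range-only measurement of a 2-D position state [x, y]
def h_range(x):
    return np.array([np.hypot(x[0], x[1])])

def H_range(x):
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r]])

obs = Observation(z=np.array([5.2]), h=h_range, jacobian=H_range,
                  R=np.array([[0.1]]))
x, P = ekf_update(np.array([3.0, 4.0]), np.eye(2), obs)
```

Starting from a prior position with range 5.0 and a measured range of 5.2, the update pulls the estimate partway toward the measurement; a new sensor only needs to supply its own `h`, Jacobian, and `R`.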
cannot be explained as an artifact of an existing hypothesis, a proposal set is generated analogously, similar to the process of alternate proposal selection that was previously described. Edge targets do not include a direct measurement of velocity, and therefore, all movement observations that are associated with the hypotheses or proposals are of the form no data available.

D. Sensor Layer: Radar Sensor

The features extracted from raw radar data are called point targets. They represent the center of the reflected radar energy and include the relative velocity of the target, as measured by the Doppler shift in the signal. The extraction algorithm is optimized to extract reflections from vehicles but is also susceptible to spurious returns from features such as manhole covers and metallic road signs.

Fig. 5 includes sample point targets, where, in this case, the highlighted artifacts originate from dust. Any feature with a substantially nonzero absolute velocity, which is measured by adding the Doppler measurement to Boss's velocity, is immediately validated, as it can be assumed to originate from a moving object. All features that are not validated by their velocity are checked in the same way as the edge targets via the generic validation interface provided by the Fusion Layer.

For the association step, a set of possible point targets is generated from the predicted dynamic obstacle hypotheses. For a point model, a single point target is generated. For a box model, a point target is generated for each corner and for the center of each edge. The measured point targets are then associated to the predicted point targets using a nearest-neighbor approach. In contrast to the data association done for scanning laser sensors, multiple extracted point targets may be associated with a single predicted hypothesis. Note, however, that each extracted point target is still associated with, at most, one predicted target, and vice versa.

For each associated pair of targets, a heuristic checks whether the predicted velocity and the measured velocity are compatible. If this is the case, an observation is generated. If not, a new point model proposal for the associated obstacle hypothesis is set up. If the associated obstacle hypothesis is a box model, then a point model proposal is set up regardless of the compatibility, as the sensor cannot support the box model.

The remaining unassociated point targets are used to generate new point model proposals. Similar to the laser scanner Sensor Layer, a heuristic checks beforehand whether the unassociated target could be an artifact caused by an existing object. Movement observations are generated according to the magnitude of, and the confidence in, the Doppler velocity measurement.

E. Best Model Selection

Based on the proposals and observations provided by the Sensor Layer, the Fusion Layer selects the best tracking model (see [3]) for each moving obstacle hypothesis. As with all algorithms in the Fusion Layer, the algorithm is formulated to be sensor independent.

A sensor supports a model if the model is observable with observations from that particular sensor only. The laser scanners (planar and 3-D) support both the box and point models, whereas the radar and fixed-beam laser sensors only support the point model. A model counts as currently supported by a sensor if it is either the current model and the sensor provides a consistent observation, or a model proposed by the sensor as an alternative to the current model. In the case of proposed models, a minimum number of consecutive cycles with identical proposals is required before a proposal is considered in the model-selection process. This increases the system's robustness to spurious misinterpretations.

The best model is selected from the currently supported models according to preference and confidence; the box model is generally preferred over the point model, and confidence is determined by a voting scheme, as will be discussed next. (Only sensors that are currently detecting the object are considered.)

First, the relative support for each possible model is determined as the ratio of the number of sensors that are currently supporting the model to the total number of sensors that support the model in principle. A model switch is considered only when the relative support for at least one model is above a configurable minimum support threshold; otherwise, the current model is retained from the previous cycle. Higher threshold values ensure that a switch to a model with higher accuracy will be performed only if there are enough sensors currently supporting it. In the Tartan Racing system, the number of sensors supporting the box model increases as the vehicle gets closer to the robot. For example, at a range closer than 30 m, up to four sensors can support the box model, but at longer ranges, primarily radar sensors, which only support the point model, detect objects.

The best candidate model for the cycle is the most preferred model with relative support above the minimum threshold. If it differs from the current model, the model is immediately switched and initialized with values from the supporting proposal. If, on the other hand, it coincides with the current model, the current model is still evaluated to handle the case of a valid model with grossly invalid state variables. For example, consider a box model whose heading is orthogonal to the direction of travel. In this case, the ratio of sensors proposing alternate models to those providing observations of the current model is evaluated, and the model is reinitialized if this ratio exceeds a configurable threshold.

F. Movement Classification

The Fusion Layer combines the movement observations from all sensors with statistical tests of the motion of the hypothesis to compute the movement properties (moving and observed moving).

1) Moving Versus Not Moving: The determination of whether an obstacle hypothesis is moving begins with an accumulating count of the number of times that the hypothesis has been reported by any sensor to be confirmed moving. Each individual movement observation increments this count, and when it exceeds a configurable threshold th_moving, the hypothesis is considered to be potentially moving.

If the dynamic obstacle hypothesis is not classified as potentially moving based on the movement observations, a statistical test on the estimated velocity attempts to reject the hypothesis H0: the absolute velocity is smaller than v_min. If H0 is rejected, the obstacle is classified as potentially moving.
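The relative-support voting of Section IV-E can be sketched as follows. The sensor names, the 0.5 threshold, and the dictionary interface are illustrative assumptions; only the ratio test and the preference ordering come from the text:

```python
def select_model(current, support, in_principle, preference, min_support=0.5):
    """Voting-scheme sketch (threshold value and names are invented).

    support[m]      -- sensors currently supporting model m
    in_principle[m] -- sensors that support model m in principle
    preference      -- models ordered most-preferred first (box before point)
    """
    # Relative support: currently-supporting sensors over sensors that
    # could support the model in principle.
    ratios = {m: len(support.get(m, ())) / len(in_principle[m])
              for m in in_principle}
    # A switch is only considered if some model clears the threshold;
    # otherwise the previous cycle's model is retained.
    if all(r < min_support for r in ratios.values()):
        return current
    for m in preference:                      # most preferred first
        if ratios[m] >= min_support:
            return m
    return current

in_principle = {"box": {"planar_lidar", "velodyne"},
                "point": {"planar_lidar", "velodyne", "radar", "fixed_beam"}}
# Near range: the lasers detect the object and support the box model.
print(select_model("point",
                   {"box": {"planar_lidar", "velodyne"}, "point": {"radar"}},
                   in_principle, ["box", "point"]))   # -> box
# Long range: only point-model sensors detect the object.
print(select_model("box", {"point": {"radar", "fixed_beam"}},
                   in_principle, ["box", "point"]))   # -> point
```

Note how the second call demotes the hypothesis to the point model once only long-range sensors see it, while a model with insufficient relative support simply leaves the previous choice in place.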
Fig. 6. Example of an occupancy polygon. Dynamic obstacle hypotheses A and B cause the polygon to be considered occupied and unoccupied, respectively (see [2]).
This statistical test is entirely based on the state variables threshold is reached, the observed moving property is retained
of the dynamic obstacle hypothesis; therefore, it may be per- for a time period tobm
max after the object is no longer moving. This
formed independent of the sensors that detect the hypothesis. accounts for traffic that briefly stops, such as when queueing on
This test is necessary in situations where none of the detecting approach to an intersection, at the potential cost of retaining an
sensors can provide a direct movement confirmation, such as invalid observed moving classification for up to tobm
max .
when the hypothesis is exclusively detected by laser scanners.
The significance level for rejecting hypothesis H0 and vmin
V. S ITUATIONAL R EASONING
can be tuned, with a higher vmin and a lower making more
conservative estimates. Lowering leads to longer times until In this section, we describe two different scenarios that were
enough data are available to reject H0 , delaying the identifica- commonly encountered during the Urban Challenge (see also
tion of a moving obstacle hypothesis. Similarly, increasing vmin [2]). First, we discuss intersection handling, where Boss must
leads to less false positives at low speeds, trading off the ability determine whether to enter an intersection or to wait for other
to properly classify slow-moving objects. In terms of hardware, vehicles to proceed first. Then, we discuss distance keeping,
the test can be influenced by the increase in the number of where Boss tracks a vehicle in front of it to match its speed
sensors or the increased accuracy of a single sensor. and maintain a minimum safety distance. Both demonstrate
If a dynamic obstacle hypothesis is classified as potentially the application of situation-specific reasoning to the situation-
moving, the system performs a cross check against the no- independent dynamic obstacle hypotheses.
movement vectors provided by the Sensor Layers. The dot
product between the normalized current velocity (unit vector)
estimate and the no-movement vectors (unit vectors) is checked against a configurable threshold. If this check indicates a strong inconsistency between the estimated velocity vector and any no-movement vector, the obstacle is classified as not moving, providing the opportunity for a sensor with information about confinement in a particular direction, such as a RADAR sensor, to veto the potentially moving state. Otherwise, all remaining potentially moving hypotheses are marked as moving.

2) Observed Moving Versus Not Observed Moving: The determination of whether a dynamic obstacle hypothesis has been observed moving begins with an evaluation of the distance that it has traveled since it was first classified as not observed moving. If this distance is above a configurable threshold and the hypothesis would be classified as moving solely based on the movement observations from the Sensor Layer, then the obstacle is classified as observed moving. If it is not directly supported by movement observations, then the obstacle hypothesis must be continuously classified as moving for a time period t^obm_min,1 to be classified as observed moving.

The check against the distance traveled increases the robustness against short-term artifacts, particularly during the initialization phase of the filters. It is possible to skip this check for objects that are confirmed to be moving by the sensors alone; however, this did not significantly increase the performance, and therefore, the team favored the more conservative approach.

The observed moving property is immediately cleared if the object is not classified as moving during the time period leading up to a second threshold t^obm_min,2 > t^obm_min,1. Once this second

A. Intersection Handling

As the robot approaches and waits at an intersection, the precedence order among vehicles at that intersection must be determined to identify the point in time when the robot may proceed. Other vehicles may be static on arrival or for long periods of time after arrival, and therefore, all dynamic obstacle hypotheses are used to determine the precedence order, regardless of their movement flags. This requires a situational treatment of false positives, such as a parked vehicle or other obstacles close to the intersection, and false negatives caused by measurement dropout or transient occlusion. These possible errors prevent the assignment of precedence to specific vehicles, and therefore, the robot instead uses an intersection-centric precedence estimation algorithm.

For each lane entering the upcoming intersection, an occupancy polygon is generated based on the road structure, encompassing the entry point and the area backward along that lane for some configurable distance, as shown in Fig. 6. A given polygon is considered occupied when the estimated front bumper of either Boss or any dynamic obstacle hypothesis is inside. For polygons in the occupied state, the system maintains two pieces of temporal data, i.e., the time of the first occupancy and the time of the most recent occupancy. The former is used in the determination of the precedence order, which is a simple matter of sorting the occupied polygons in ascending order by first occupancy. The latter is used to implement a temporal delay on when a polygon becomes unoccupied, such that it must persistently be empty over some span of time to transition
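The occupancy-polygon bookkeeping just described can be sketched in a few lines. This is an illustrative sketch, not the team's implementation: the class and function names, the vacancy-delay value, and the externally supplied point-in-polygon test are all assumptions; the paper says only that the backward distance and the vacancy span are configurable.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative value; the paper only says a polygon must be persistently
# empty "over some span of time" before it transitions to unoccupied.
VACANCY_DELAY_S = 1.0


@dataclass
class OccupancyPolygon:
    """One polygon per lane entering the intersection (geometry elided)."""
    lane_id: str
    first_occupied: Optional[float] = None   # time of first occupancy
    last_occupied: Optional[float] = None    # time of most recent occupancy

    def update(self, t: float, bumper_inside: bool) -> None:
        """bumper_inside: is the estimated front bumper of Boss or of any
        dynamic obstacle hypothesis inside this polygon at time t?"""
        if bumper_inside:
            if self.first_occupied is None:
                self.first_occupied = t      # first occupancy
            self.last_occupied = t           # most recent occupancy
        elif (self.last_occupied is not None
              and t - self.last_occupied > VACANCY_DELAY_S):
            # persistently empty: transition back to unoccupied
            self.first_occupied = None
            self.last_occupied = None

    @property
    def occupied(self) -> bool:
        return self.first_occupied is not None


def precedence_order(polygons: List[OccupancyPolygon]) -> List[OccupancyPolygon]:
    """Occupied polygons sorted in ascending order of first occupancy."""
    return sorted((p for p in polygons if p.occupied),
                  key=lambda p: p.first_occupied)
```

A measurement dropout shorter than the vacancy delay leaves both timestamps intact, which is how the scheme tolerates the transient false negatives discussed above.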
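Returning to the movement classification at the top of this section, the no-movement-vector veto and the observed-moving hysteresis can be sketched as follows. All names and numeric values are illustrative assumptions (the paper states only that the thresholds are configurable and that t^obm_min,2 > t^obm_min,1), and the angular form of the consistency check is an assumed reading of the text.

```python
import math
from typing import List, Optional, Tuple

# Illustrative, configurable thresholds (values are assumptions).
ANGLE_THRESHOLD = math.radians(30.0)  # velocity vs. no-movement-vector check
DIST_THRESHOLD_M = 2.0                # distance traveled before "observed moving"
T_OBM_MIN_1 = 0.5                     # s, continuous "moving" without direct support
T_OBM_MIN_2 = 1.5                     # s, clearing window; must exceed T_OBM_MIN_1


def classify_moving(velocity: Tuple[float, float],
                    no_movement_vectors: List[Tuple[float, float]]) -> bool:
    """A sensor that can rule out motion along a direction (e.g., a RADAR)
    supplies unit no-movement vectors.  If the velocity estimate lies too
    close to any such vector, the estimate is inconsistent with that sensor
    and the veto applies: the obstacle is classified as not moving."""
    speed = math.hypot(*velocity)
    if speed == 0.0:
        return False
    for nx, ny in no_movement_vectors:
        cos_angle = (velocity[0] * nx + velocity[1] * ny) / speed
        if abs(cos_angle) > math.cos(ANGLE_THRESHOLD):
            return False  # vetoed: classified as not moving
    return True  # remaining potentially moving hypotheses are marked moving


class ObservedMovingFilter:
    """Hysteresis for the observed-moving flag of one obstacle hypothesis."""

    def __init__(self) -> None:
        self.observed_moving = False
        self.distance_traveled = 0.0          # since first "not observed moving"
        self.moving_since: Optional[float] = None

    def update(self, t: float, moving: bool, directly_observed: bool,
               step_distance: float = 0.0) -> bool:
        self.distance_traveled += step_distance
        if moving:
            if self.moving_since is None:
                self.moving_since = t
            if directly_observed and self.distance_traveled > DIST_THRESHOLD_M:
                self.observed_moving = True   # supported by Sensor Layer observations
            elif t - self.moving_since >= T_OBM_MIN_1:
                self.observed_moving = True   # continuously moving long enough
        else:
            # cleared immediately when not moving within the window up to T_OBM_MIN_2
            if self.moving_since is None or t - self.moving_since < T_OBM_MIN_2:
                self.observed_moving = False
            self.moving_since = None
        return self.observed_moving
```

The sketch covers only the behavior described up to this point; what happens once the second threshold has elapsed is specified in the continuation of the text.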
DARMS et al.: OBSTACLE DETECTION AND TRACKING FOR THE URBAN CHALLENGE 483
Another issue was that no one sensor could provide all of the information necessary to track the vehicles around Boss. Intelligent mechanisms for fusing the data from RADAR and LIDAR were critical for the successful operation of the whole system. In the fusion process, we proposed an adaptive model switching approach where the tracking model for an object is selected based on the available sensor information.

The vehicle-tracking system takes the raw data returned from the sensors and interprets them at multiple layers. The final interpretation is left to the robot's higher level decision processes, which use the situational context to understand the scene and form a situation hypothesis. We found that this greatly improved the performance of the overall system, since the tracking system stays independent of the situational context and the decision algorithms of the robot can be adapted to the quality of the perception data and the specific situation.

The general tracking architecture that separates the Sensor Layer from the Fusion Layer allowed individual new sensors to be folded into the entire system as they were brought online while minimizing the need to modify the other components of the subsystem. Additionally, new scenarios can easily be added to the system without requiring additional code.

In future work, we see that the most beneficial work can be accomplished by focusing on improving the low-level sensor details. For instance, more intelligent feature-extraction algorithms for LIDARs could help to more robustly reject spurious returns caused, e.g., by bushes near the roads. In addition, sensors with a wide vertical field of view could be employed to better identify the location of the ground plane.

ACKNOWLEDGMENT

Parts of this paper were initially published in local German conference proceedings, i.e., passages about the perception architecture in [1], the environment model and situational reasoning in [2], and the model switching approach in [3]. The movement classification was first published in [4]. This work would not have been possible without the dedicated efforts of the Tartan Racing team.

REFERENCES

[1] M. Darms, P. Rybski, and C. Urmson, "Vehicle detection and tracking for the urban challenge," in Automatisierungs-, Assistenz- und Eingebettete Systeme für Transportmittel Symp., Braunschweig, Germany, 2008.
[2] M. Darms, C. Baker, P. Rybski, and C. Urmson, "Vehicle detection and tracking for the urban challenge," in Proc. 5. Workshop Fahrerassistenzsysteme FAS2008, Apr. 2008, pp. 57–67.
[3] M. Darms, P. Rybski, and C. Urmson, "An adaptive model switching approach for a multisensor tracking system used for autonomous driving in an urban environment," in Steuerung und Regelung von Fahrzeugen und Motoren (AUTOREG). Düsseldorf, Germany: VDI-Verlag, 2008.
[4] M. Darms, P. Rybski, and C. Urmson, "Classification and tracking of dynamic objects with multiple sensors for autonomous driving in urban environments," in Proc. IEEE Intell. Vehicles Symp., Eindhoven, The Netherlands, Jun. 2008, pp. 1197–1202.
[5] C. Urmson, C. Ragusa, D. Ray, J. Anhalt, D. Bartz, T. Galatali, A. Gutierrez, J. Johnston, S. Harbaugh, H. Y. Kato, W. Messner, N. Miller, K. Peterson, B. Smith, J. Snider, S. Spiker, J. Ziglar, W. R. Whittaker, M. Clark, P. Koon, A. Mosher, and J. Struble, "A robust approach to high-speed navigation for unrehearsed desert terrain," J. Field Robot., vol. 23, no. 8, pp. 467–508, Aug. 2006.
[6] H. Winner, "Adaptive cruise control," in Automotive Electronics Handbook, R. K. Jurgen, Ed. New York: McGraw-Hill, 1999.
[7] R. Bishop, Intelligent Vehicle Technology and Trends. London, U.K.: Artech House, 2005.
[8] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 694–711, May 2006.
[9] C. Tzomakas and W. von Seelen, "Vehicle detection in traffic scenes using shadows," Institut für Neuroinformatik, Ruhr Univ. Bochum, Bochum, Germany, Tech. Rep. IRINI 98-06, 1998.
[10] A. Kirchner and C. Ameling, "Integrated obstacle and road tracking using a laser scanner," in Proc. IEEE Intell. Vehicles Symp., Dearborn, MI, 2000, pp. 675–681.
[11] D. Streller, K. Fürstenberg, and K. Dietmayer, "Vehicle and object models for robust tracking in traffic scenes using laser range images," in Proc. IEEE Int. Conf. Intell. Transp. Syst., 2002, pp. 118–123.
[12] M.-S. Lee and Y.-H. Kim, "An efficient multitarget tracking algorithm for car applications," IEEE Trans. Ind. Electron., vol. 50, no. 2, pp. 397–399, Apr. 2003.
[13] R. Möbus and U. Kolbe, "Multi-target multi-object tracking, sensor fusion of radar and infrared," in Proc. IEEE Intell. Vehicles Symp., 2004, pp. 732–737.
[14] R. MacLachlan and C. Mertz, "Tracking of moving objects from a moving vehicle using a scanning laser rangefinder," in Proc. IEEE Intell. Transp. Syst. Conf., Sep. 2006, pp. 301–306.
[15] A. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic, 1970.
[16] R. E. Kalman and R. Bucy, "New results in linear filtering and prediction theory," Trans. ASME, J. Basic Eng., vol. 83, pp. 95–108, 1961.
[17] Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation With Applications to Tracking and Navigation: Theory, Algorithms and Software. New York: Wiley-Interscience, 2001.
[18] G. J. Bierman, Factorization Methods for Discrete Sequential Estimation. New York: Academic, 1977.
[19] L. Stone, C. Barlow, and T. Corwin, Bayesian Multiple Target Tracking. Norwood, MA: Artech House, 1999.
[20] Y. Bar-Shalom and X.-R. Li, Multitarget-Multisensor Tracking: Principles and Techniques. Storrs, CT: YBS, 1995.
[21] D. B. Reid, "An algorithm for tracking multiple targets," IEEE Trans. Autom. Control, vol. AC-24, no. 6, pp. 843–854, Dec. 1979.
[22] I. J. Cox and S. L. Hingorani, "An efficient implementation and evaluation of Reid's multiple hypothesis tracking algorithm for visual tracking," in Proc. Int. Conf. Pattern Recog., 1994, pp. 437–442.
[23] Y. Bar-Shalom, K. C. Chang, and H. A. P. Blom, "Tracking a maneuvering target using input estimation vs. the interacting multiple model algorithm," IEEE Trans. Aerosp. Electron. Syst., vol. 25, no. 2, pp. 296–300, Mar. 1989.
[24] K. Murphy, "Learning switching Kalman filter models," Compaq Cambridge Res. Lab., Cambridge, MA, Tech. Rep. 98-10, 1998.
[25] D. L. Hall, Handbook of Multisensor Data Fusion. Boca Raton, FL: CRC, 2001.
[26] I. Jacobson, G. Booch, and J. Rumbaugh, The Unified Software Development Process. Reading, MA: Addison-Wesley, 1999.
[27] L. A. Klein, Sensor and Data Fusion Concepts and Applications. Bellingham, WA: SPIE, 1999.
[28] R. R. Brooks and S. S. Iyengar, Multi-Sensor Fusion. Upper Saddle River, NJ: Prentice-Hall, 1998.
[29] S. S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Boston, MA: Artech House, 1999.
[30] M. Darms, "Eine Basis-Systemarchitektur zur Sensordatenfusion von Umfeldsensoren für Fahrerassistenzsysteme," Fortschrittberichte VDI, Reihe 12, no. 653, Düsseldorf, Germany, 2007.
[31] F. Dellaert, D. Pomerleau, and C. Thorpe, "Model-based car tracking integrated with a road follower," in Proc. IEEE Int. Conf. Robot. Autom., Leuven, Belgium, May 1998, vol. 3, pp. 1889–1894.
[32] S. Gupte, O. Masoud, R. Martin, and N. P. Papanikolopoulos, "Detection and classification of vehicles," IEEE Trans. Intell. Transp. Syst., vol. 3, no. 1, pp. 37–47, Mar. 2002.
[33] N. Kaempchen, K. Weiss, M. Schaefer, and K. Dietmayer, "IMM object tracking for high dynamic driving maneuvers," in Proc. IEEE Intell. Vehicles Symp., 2004, pp. 825–830.
[34] J. Effertz, "Tracking und Datenfusion im urbanen Fahrzeugumfeld," in Automatisierungs-, Assistenz- und Eingebettete Systeme für Transportmittel Symp., Braunschweig, Germany, 2008, pp. 181–197.
[35] R. MacLachlan, "Tracking moving objects from a moving vehicle using a laser scanner," Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-RI-TR-05-07, Jun. 2005.
Michael S. Darms was born in Berlin, Germany, on October 19, 1976. He received the Dipl.-Wirtsch.-Ing. degree in business administration and electrical engineering and the Dr.-Ing. degree (with highest distinction) from Technische Universität Darmstadt, Darmstadt, Germany, in 2002 and 2007, respectively.

From 2002 to 2006, he was a Scientific Assistant with the Chair of Automotive Engineering, Technische Universität Darmstadt, where he worked on project PRORETA, which is a joint research project with Continental Automotive Systems. New approaches for automotive collision avoidance and collision mitigation applications were developed. He was responsible for the environment perception and sensor data-fusion algorithms. In 2006, he joined Continental Automotive Systems, Lindau, Germany. From 2006 to 2007, he joined the Tartan Racing team as Continental's Embedded Engineer and worked on the perception system of the autonomous robot winning the 2007 DARPA Urban Challenge. He is currently a Manager with the Department of Advanced Engineering, Division Chassis and Safety, Continental Automotive Group. His research interests include sensor data fusion and software architecture design.

Dr. Darms is a member of Verein Deutscher Ingenieure. He was the recipient of the Outstanding Paper Award at the 2005 IEEE Intelligent Vehicles Symposium and the 2008 Best Poster Award at the IEEE Intelligent Vehicles Symposium.

Paul E. Rybski was born in Austin, TX, on June 25, 1974. He received the interdisciplinary B.A. degree in mathematics/computer science from Lawrence University, Appleton, WI, in 1995 and the M.S. and Ph.D. degrees in computer science and engineering with a minor in cognitive science from the University of Minnesota, Minneapolis, in 2001 and 2003, respectively.

From 2003 to 2005, he was with the Robotics Institute, Carnegie Mellon University (CMU), Pittsburgh, PA, as a Postdoctoral Fellow, working on human activity recognition, perception and modeling for human/robot interaction, and shared state estimation for multirobot teams. In 2005, he joined the faculty of the Robotics Institute as a Systems Scientist, continued his work on human-robot interaction, and joined the Tartan Racing team to work on perception for autonomous driving. He is also the Perception Lead for the General Motors/CMU Autonomous Driving Collaborative Research Laboratory.

Dr. Rybski is a member of the IEEE Robotics and Automation Society, the Association for the Advancement of Artificial Intelligence (AAAI), and the Association for Computing Machinery. He has been an organizer for the AAAI Mobile Robotics Competition, the RoboCup@Home League, the Semantic Robot Vision Challenge, and the IEEE International Conference on Robotics and Automation Robotics Challenge.

Christopher Baker received the B.S. degree in computer science and the M.S. degree in robotics from Carnegie Mellon University, Pittsburgh, PA, in 2002 and 2004, respectively. He is currently working toward the Ph.D. degree with the Robotics Institute, Carnegie Mellon University.

He is currently researching the challenges of the design and integration of complex robotic systems, under the supervision of Dr. J. Dolan. He has participated in the development of robotic systems for autonomous subterranean exploration and mapping. He was a Subsystem Architect for Carnegie Mellon University's Tartan Racing team, which won the 2007 DARPA Urban Challenge.

Chris Urmson received the B.Sc. degree in computer engineering from the University of Manitoba, Winnipeg, MB, in 1998 and the Ph.D. degree in robotics from Carnegie Mellon University, Pittsburgh, PA, in 2005.

He is currently an Assistant Research Professor with the Field Robotics Center, Carnegie Mellon University. He has served as the Director of Technology for the Urban Challenge with Carnegie Mellon University and as a Robotics Research Scientist with the Science Applications International Corporation (SAIC).

Dr. Urmson was the Technical Leader for the Tartan Racing team, which won the 2007 DARPA Urban Challenge. He has been the recipient of several technical awards, including the SAIC Research, Development, Test, and Evaluation Technology Award and the Boeing Red Phantom Award. He is also a Siebel Scholar.