Professional Documents
Culture Documents
Abstract
Action anticipation, intent prediction, and proactive behavior are all desirable characteristics for autonomous driving pol-
icies in interactive scenarios. Paramount, however, is ensuring safety on the road: a key challenge in doing so is account-
ing for uncertainty in human driver actions without unduly impacting planner performance. This article introduces a
minimally interventional safety controller operating within an autonomous vehicle control stack with the role of ensuring
collision-free interaction with an externally controlled (e.g., human-driven) counterpart while respecting static obstacles
such as a road boundary wall. We leverage reachability analysis to construct a real-time (100 Hz) controller that serves
the dual role of (i) tracking an input trajectory from a higher-level planning algorithm using model predictive control, and
(ii) assuring safety by maintaining the availability of a collision-free escape maneuver as a persistent constraint regard-
less of whatever future actions the other car takes. A full-scale steer-by-wire platform is used to conduct traffic-weaving
experiments wherein two cars, initially side-by-side, must swap lanes in a limited amount of time and distance, emulating
cars merging onto/off of a highway. We demonstrate that, with our control stack, the autonomous vehicle is able to avoid
collision even when the other car defies the planner’s expectations and takes dangerous actions, either carelessly or with
the intent to collide, and otherwise deviates minimally from the planned trajectory to the extent required to maintain
safety.
Keywords
Probabilistic planning, safety-preserving controller, backward reachability analysis, vehicle model predictive con-
trol, human–robot interaction
due to the uncertainty in how humans may behave. To short time horizons otherwise it will lead to overly conser-
quantify this uncertainty, robots often rely on generative vative robot behaviors or even planning problem infeasi-
models of human behavior in order to inform their plan- bility as the set to avoid grows. In general, approaches
ning algorithms (Sadigh et al., 2016; Schmerling et al., that use forward reachability results in very conservative
2018), thereby enabling more efficient and communicative results, especially for interactive scenarios where freedom
interactions. In general, under nominal operating condi- of motion (e.g., nudging) is necessary to convey intent.
tions that reflect the modeling assumptions, these model- These methods, as well as methods that enforce safety as
based probabilistic planners can offer high performance an objective, are often subjected to model simplification,
(e.g., minimizing time and control effort for the robot). such as using a linear dynamics model instead of a non-
However, these planners alone are typically insufficient linear one, or making simplifying assumptions for compu-
for ensuring absolute safety because (i) they depend on tational tractability. However, model simplifications can
probabilistic models of human behavior and, thus, safety lead to overly conservative or imprecise results which
is not enforced deterministically, (ii) dangerous but low- may impede performance.
likelihood events may not be adequately captured in the Safety can also be introduced at the low-level which
human behavior prediction model, and (iii) reasoning often obeys higher-fidelity dynamics models. A common
about these probabilistic behavior models is typically too approach in ensuring safe low-level controls is to use reac-
computationally expensive for the planners to react in real tive collision avoidance techniques: the robot is normally
time when humans strongly defy expectations and/or allowed to apply any control, but switches to an avoidance
diverge from modeling assumptions. controller when near safety violation. Examples of this
In this work we implement a control stack for a full- general approach include HJ backward reachability-based
scale autonomous car (the ‘‘robot’’) engaging in close controllers which have been applied in such a switched
proximity interactions with a human-controlled vehicle fashion (Bajcsy et al., 2019; Chen and Tomlin, 2018;
(the ‘‘human’’). Our control stack aims to stay true to Fisac et al., 2018) and have proven to be effective for
planned trajectories from a high-level planner because avoiding other interacting agents and static obstacles. HJ
freedom of motion is essential for the planner to carry out reachability has been studied extensively and applied suc-
the driving task while conveying future intent to the other cessfully in a variety of safety-critical interactive settings
vehicle. At the same time, we allow the robot to deviate (Bokanowski et al., 2010; Chen et al., 2017; Dhinakaran
from the desired trajectory to the point that is necessary to et al., 2017; Gattami et al., 2011; Margellos and Lygeros,
maintain safety. Our primary tool for designing a control- 2011; Mitchell et al., 2005) owing to its flexibility with
ler that does not needlessly impinge upon the planner’s respect to system dynamics, and its optimal (i.e., non-
choices is Hamilton–Jacobi (HJ) backward reachability. overly conservative) avoidance maneuvers stemming from
We provide a brief overview of the reachability analysis its equivalence to an exhaustive search over joint system
literature relevant to our work in Section 3. dynamics. Other approaches include using a precomputed
emergency maneuver library (Arora et al., 2015). A key
drawback to switching controllers, however, is that the
2. Related work performance goals considered by the high-level planner
are completely ignored when the reactive controller steps
For high-level planners, safety is often incentivized, but in. For some cases, this may be acceptable and necessary,
not strictly enforced as a hard constraint. For example, but in general, it is desirable to ensure safety without
safety is often part of the objective function when selecting unduly impacting planner performance where possible.
optimal plans, or represented via artificial potential fields Instead of switching to a different controller, Gray et al.
(Sadigh et al., 2016; Schmerling et al., 2018; Wolf and (2013), Funke et al. (2017), and Brown et al. (2017) adapt
Burdick, 2008). Although these approaches are designed to the low-level online optimal controller to incorporate con-
account for interactive scenarios in which another sentient straints that avoid static obstacles. The approach taken by
agent is a key environmental consideration, they contain Gray et al. (2013) aims to be minimally interventional,
competing objectives (i.e., collision avoidance versus goal- however, they only consider cases with disturbance uncer-
oriented performance), often do not plan at a sufficiently tainty rather than environmental uncertainty stemming
high rate to account for rapidly changing environments, from the system interacting with other human agents.
and ultimately provide no theoretical framework for ensur- These approaches are able to strike the optimal balance
ing safety for both the human and the robot. between tracking performance and safety (i.e., optimizing
Another common approach to finding collision-free performance subject to safety as a hard constraint), but
plans is to use forward reachability. The idea is to fix a these approaches as presented are effective for avoiding
time horizon and compute the set of states where the other static obstacles only.
agents could possibly be in the future and plan trajectories Inspired by the non-conservative nature and optimality
for the robot that avoid this set (Althoff and Dolan, 2011; of HJ reachability, and the performance of online optimal
Liu et al., 2017; Lorenzetti et al., 2018). Although this control for trajectory tracking, this work aims to infuse
gives a stronger sense of safety, this is only practical for reachability-based safety assurance into the low-level
1328 The International Journal of Robotics Research 39(10-11)
controller such that when the robot is near safety violation, A preliminary version of this work appeared at the
the robot is able to simultaneously maintain safety and fol- 2018 International Symposium on Experimental Robotics
low the high-level plans without unduly affecting perfor- (ISER). In this revised and extended version, we provide
mance. To the best of the authors’ knowledge there has the following additional contributions: (i) an extension to
not been any work that explicitly addresses the integration the control framework to account for static obstacles, (ii)
of reachability-based safety controllers as a component more exposition on the vehicle dynamics models used and
within a robot’s control stack, i.e., with safety as a con- a deeper discussion on the underlying modeling assump-
straint upon a primary planning objective. tions, (iii) an empirical study of the safety–efficiency
trade-off for our approach, and (iv) additional experimen-
tal results including trials with a static road boundary wall
2.1. Statement of contributions and a comparison with a baseline approach.
The contributions of this article are twofold. First, we pro-
pose a method for formally incorporating reachability- 3. Background: HJ reachability
based safety within an existing optimization-based control
framework. The main insight that enables our approach is In this section, we provide a brief overview of reachability
the recognition that, near safety violation, the set of safety- analysis as a tool for constructing safe trajectory plans, and
preserving controls often contains more than just the opti- introduce HJ backward reachability concepts relevant to
mal avoidance control. Instead of directly applying this this work.
optimal avoidance control when prompted by reachability
considerations, as in a switching control approach, we 3.1. Overview: reachability analysis
quantify the set of safety-preserving controls and pass it to
the broader control framework as a constraint. Our intent Given a dynamics model governing a robotic system
is to enable minimal intervention against the direction of a incorporating control and disturbance inputs, reachability
higher-level planner when evasive action is required. analysis is the study of the set of states that the system can
Second, we evaluate the benefits, performance, and trade- reach from its initial conditions. It is often used for formal
offs of this safe control methodology in the context of a verification as it can give guarantees on whether or not the
probabilistic planning framework for the traffic-weaving evolution of the system will be safe, i.e., whether the
scenario studied at a high level in Schmerling et al. (2018), reachable set includes undesirable outcomes. Reachability
wherein two cars, initially side-by-side, must swap lanes in analysis can be divided into two main paradigms: (i) for-
a limited amount of time and distance. Experiments with a ward reachability and (ii) backward reachability.
full-scale steer-by-wire vehicle reveal that our combined
control stack achieves better safety than applying a track- 3.1.1. Forward reachability analysis. The forward reach-
ing controller alone to the planner output, and smoother able set (FRS) is the set of states that the system could
operation (with similar safety) compared with a switching potentially be in after some time horizon t. This is com-
control scheme; in our discussion, we provide a roadmap puted by propagating the dynamics combined with all fea-
towards improving the level of safety assurance in the face sible control sequences and disturbances forward in time.
of practical considerations such as unmodeled dynamics, When considering the interaction between two agents, for
as well as towards generalizations of the basic traffic- example a human and a robot, the FRS is computed for the
weaving scenario. human and the robot plans to avoid this set to ensure
(a) (b)
Fig. 1. Illustration of forward and backward reachability, and how they are commonly used to ensure safe planning and controls.
The robot and human are depicted in red and blue, respectively. (a) The human’s dynamics are propagated forward in time. The
robot plans trajectories that stay outside of the human’s FRS (shaded region). (b) Joint dynamics are propagated backwards in time.
The BRS (shaded region) is not overly conservative even for long time horizons because the robot can react if the human was to
swerve (see right). The robot is in an unsafe state if the human enters the BRS, defined in the relative state space.
Leung et al. 1329
collision-free trajectories (see Figure 1a). This open-loop on safe controls which can be evaluated at high operating
mentality, however, leads to an overly conservative robot frequencies to react to split-second threats.
outlook. That is, while considering actions in the present, Moreover, we compute the BRS using HJ reachability
the robot does not incorporate the possibility that its future analysis, a particular approach to computing reachable
observations of where the human goes might influence sets. There are many existing approaches (Althoff and
how much it actually needs to take avoidance actions; in Krogh, 2014; Frehse et al., 2011; Greenstreet and
short, the robot’s plan to avoid the FRS does not incorpo- Mitchell, 1998; Kurzhanski and Varaiya, 2000; Majumdar
rate closed-loop feedback. To reduce the over-conservative et al., 2014), but there is always a trade-off between mod-
nature of forward reachability, the time horizon t for which eling assumptions, scalability, and representation fidelity
the FRS is computed over is typically kept small and is (i.e., whether the method computes over- or under-
recomputed frequently. This approach has been found to approximations of the reachable set). Compared with alter-
be effective in finding collision-free trajectories (Althoff natives approaches, HJ reachability is the most computa-
and Dolan, 2014), but it is difficult to extend to interactive tionally expensive, but it is able to compute the BRS
scenarios where there are more uncertainties in the rapidly exactly1 for any general nonlinear dynamics with control
changing environment. Aside from the overly conservative and disturbance inputs because it essentially uses a brute
nature of FRS, the key drawback with using forward reach- force computation via dynamic programming. As a result,
ability is that safety (i.e., avoiding the FRS) hinges on the the BRS solution to the HJ reachability problem represents
planner’s capabilities and operating frequency. Even if a constructive proof of the existence of a safety-preserving
computing the FRS is instantaneous, the planner may still control policy (i.e., a safety certificate) from states outside
be unable to react to split-second threats. the BRS. Despite the apparent computational drawbacks,
we note that the BRS may be computed offline, and
requires only a near-instant lookup during runtime, allow-
3.1.2. Backward reachability analysis. Let the target set ing it to be used in controllers or planners that run at a
represent a set of undesirable states (e.g., collision states very high operational frequency.
of the system). The backward reachable set (BRS) is the
set of states that could result in the system being in the tar-
get set, assuming worst-case disturbances, after some time 3.2. HJ backward reachability analysis
horizon t. Specifically, the BRS represents the set of states We briefly review relevant HJ backward reachability defi-
from which there does not exist a controller that can pre- nitions for the remainder of this section; see Chen and
vent the robot dynamics from being driven into the target Tomlin (2018) for a more in-depth treatment. HJ reach-
set under worst-case disturbances within a time horizon t. ability casts the reachability problem as an optimal control
As such, to rule out such an eventuality, the BRS is treated problem and thus computing the reachable set is equiva-
as the ‘‘avoid set.’’ Critical differences between the FRS lent to solving the Hamilton–Jacobi–Isaacs (HJI) partial
and BRS are that (i) the BRS is computed backwards in differential equation (PDE).
time, and (ii) the BRS is computed assuming closed-loop The general HJ reachability formulation is as follows.
reactions to the disturbances. Illustrated in Figure 1b, for Let the system dynamics be given by x_ = f (x, u, d) where
the case of relative dynamics between human- and robot- x 2 Rn is the state, u 2 U Rm is the control, and
controlled vehicles, computation of the BRS takes into d 2 D Rp is the disturbance. For example, x could be
account the fact that the robot is able to react to the human the state of a robot, u be the robot’s controls, d be environ-
at any time and in any state configuration (if the human mental disturbances such as wind, and f be the dynamics
was to swerve into the robot, the robot can swerve too to of the robot. The system dynamics f : Rn × U × D ! Rn
avoid a collision). This leads to the BRS being less overly are assumed to be uniformly continuous, bounded, and
conservative than the FRS. In practice, the results of the Lipschitz continuous in x for a fixed u and d. Let T Rn
BRS computation are cached via a look-up table, and at be the target set that the system wants to avoid at the end
run-time, the optimal robot policy can be computed via a of a time horizon jtj (note that t\0 when propagating
near-instant lookup of the reachability cache. As such, we backwards in time). For collision avoidance, T typically
can always compute optimal actions for the robot at any represents the set of states that are in collision with an
state configuration and regardless of the high-level planner obstacle. For brevity, the following description of HJ
used. backward reachability will use notation relevant to the rest
We elect to use backward reachability because (i) its of the article, that is, we are interested in collision avoid-
non-overly conservative nature stemming from the closed- ance for human–robot interactions. Despite the notation,
loop computation ensures that safe controls will be used the following theory is still applicable to general systems
only when necessary and not unduly affect planner perfor- x_ = f (x, u, d) outside the domain of human–robot
mance, (ii) safety is defined intrinsically in the BRS com- interactions.
putation whereas safety using FRS-based approaches In the context of human–robot interactions, f ()
depends on whether the robot’s planned trajectory inter- describes the relative dynamics between the human and
sect the FRS, and (iii) it provides a computational handle the robot (denoted by frel () where the subscript rel
1330 The International Journal of Robotics Research 39(10-11)
Fig. 2. Contour plots of slices of the HJI value function V computed from the relative dynamics (see Equation (8)) and choice of
signed distance as the terminal value function. Slices show V as a function of relative x and y position (see Figure 3 for an
illustration of the human–robot relative pose) with all other states held fixed. The zero-level set, outlined with a thick black line, is
the backwards reachable set from collision states (i.e., the avoid set).
Fig. 4. Decision-making and control stack for human–robot pairwise vehicle interactions. Our contribution in this work is the
integration of safety-ensuring control constraints, derived from a HJ BRS computed and cached offline, into a model predictive
controller’s tracking optimization problem. A high-level interaction planner produces nominal trajectories for the robot car and the
low-level safe tracking controller executes controls that minimally deviate from the planner’s choice if the vehicles approach the set
of unsafe relative states.
unavoidable if the human brakes abruptly. In Figure 2 describe in detail how to infuse reachability-based safety
(right), if the human car is traveling faster, in this case at assurance within a multi-tiered control framework that
an angle, it is unsafe for the robot to be in the region in consists of different planning objectives.
front of the human car because the robot car may not
be able to maneuver out of the way before the human
catches up. 4. Control stack architecture
We can compute the optimal robot collision avoidance In this section, we propose using a safety-preserving HJI
control controller, rather than switching to the optimal HJI control-
ler defined in Equation (2), and describe how to incorpo-
uR = arg max min rV (xrel )T frel (xrel , uR , uH ) ð2Þ rate it within an existing control stack, a high-level planner
uR uH
feeding desired trajectories to a low-level tracking control-
which offers the greatest increase in V (xrel ) assuming opti- ler, to enable safe human–robot interactions that minimally
mal (worst-case) actions by the human. For general non- impinges on the high-level planning performance objec-
linear systems, computing the optimal collision avoidance tive. The control stack architecture is illustrated in Figure
control in Equation (2) may be non-trivial as there could 4. The proposed control stack is applicable to general
be multiple local maxima. For control/disturbance affine human–robot interactions (e.g., an autonomous car inter-
systems, however, the solutions to the optimal control/dis- acting with a human-driven car, an autonomous manipula-
turbance are bang-bang. tor arm working alongside a human, wheeled mobile
Recall from Equation (1) that when the system is out- robots navigating a crowded sidewalk). However, in this
side of the avoid set A(t) (i.e., V .0), there exists a robot work, we focus on the traffic-weaving scenario, wherein
policy (e.g., Equation (2)) that keeps the robot safe over a two cars initially side-by-side must swap lanes in a limited
time horizon jtj regardless of any (including adversarial) amount of time and distance, because it is a representative
policy taken by the human. Previous applications of HJI interactive scenario that encapsulates many challenging
solutions switch to the optimal control (Equation (2)) characteristics inherent to human–robot interaction.
when near the boundary of the BRS, i.e., when safety is Successful and smooth traffic-weaves rely on action antici-
nearly violated (Bajcsy et al., 2019; Fisac et al., 2018) pation, intent prediction, and proactive behavior from each
This reflects the goal of HJ reachability-based safety vehicle, and ensuring safety is critical because collisions
which is to ensure that the system stays outside of the may lead to life-threatening injury.
avoid set A(t), that is, the value function should always
stay positive.
In an interactive scenario where, for example, we may 4.1. Interaction planner
want to let a robot planner convey intent by nudging Our proposed control framework is agnostic to the interac-
towards the human car to the extent that is safe, we prefer tion planner used. The only assumption for the planner is
a less-extreme control strategy. In the next section, we that it outputs desired trajectories (which presumably
1332 The International Journal of Robotics Research 39(10-11)
reflect high-performance goal-oriented nominal behavior) propose adding containment in the set of safety-preserving
for the robot car to track. In general, high-level planners controls as an additional constraint to the low-level MPC
often optimize objectives that weigh safety considerations tracking controller. The set of safety-preserving controls
(e.g., distance between cars) relative to other concerns
(e.g., control effort), and typical to human–robot interac- U R (xrel ) = fuR : min rV (xrel )T frel (xrel , uR , uH ) ø 0g ð3Þ
uH
tive scenarios, they may reason anticipatively with respect
to a probabilistic interaction dynamics model. That is, represent the set of robot controls that ensure the value
although the planner is encouraged to select safer plans, function is non-decreasing. This constraint can be com-
safety is not enforced as a deterministic constraint at the puted online since the BRS is computed offline and the
planning level. value function V and its gradient rV are cached. Online
In this work, we use the traffic-weaving interaction we employ a safety buffer e.0 so that when the condition
planner from Schmerling et al. (2018). It uses a predictive V (xrel ) ł e holds, indicating that the robot is nearing safety
model of future human behavior to select desired trajec- violation, we add the constraint uR 2 U R (xrel ) to the list of
tories for the robot car to follow, updated at ;3 Hz. We tracking controller constraints. By adding this additional
extend this work by using a hindsight optimization policy safety-preserving constraint when near safety violation,
(Yoon et al., 2008) instead of the limited-lookahead action the MPC tracking controller selects control actions that
policy in order to encourage more information-seeking prevent the robot from further violating the safety thresh-
actions from the robot. old while simultaneously optimizing for tracking perfor-
mance and obeying additional constraints. This results in a
minimally interventional safety controller: the MPC track-
4.2. Tracking controller ing controller will only minimally deviate from the desired
Given a desired trajectory from the interaction planner, trajectory to the extent necessary to maintain safety for the
the tracking controller computes optimal controls to track robot car.
the desired trajectory. We assume that the outputs of the For the traffic-weaving scenario investigated in this
low-level tracking controller are directly usable by the article, the MPC problem for vehicle trajectory tracking
robot’s actuators (e.g., steering and longitudinal force (Brown et al., 2017) is formulated as a quadratic program
commands for a vehicle, torque commands for each joint (QP) (to enable fast solve time amenable to a 100 Hz
on a manipulator arm). As such, the tracking controller operating frequency) and, hence, requires the constraints
typically uses a more accurate dynamics model and oper- to be linear. As such, we instead apply the constraint
ates at a much higher frequency (;100 Hz) than the inter- uR 2 U e R (xrel ) where
action planner, often at the expense of environmental
awareness. To ensure safety with respect to a dynamic e R (xrel ) = fuR : M T uR + bHJI ø 0g
U ð4Þ
HJI
obstacle (i.e., a human-driven vehicle) we incorporate
additional constraints computed from HJ reachability anal- is a linearized approximation of U R (xrel ). Specifically, for
ysis into the tracking optimization problem. These con- the current relative state ~xrel , current robot control ~uR , and
straints are designed to ensure that at each control step the optimal, i.e., worst-case, human action defined analo-
robot car does not enter an unsafe set of relative states that gously to Equation (2) uH , the terms in the linearization
may lead to collision. are
In this work, we are concerned with vehicle trajectory
∂
tracking. We adapt the real-time model predictive control MHJI = rV T frel (xrel , uR , uH ) (~xrel , u , ~uR )
(MPC) tracking controller from Brown et al. (2017) by ∂uR H
modifying it to include an additional invariant set con- bHJI = rV T frel (~xrel , uH , ~uR ) MHJI
T
~uR
straint detailed in the next section. This MPC tracking con-
troller, operating at 100 Hz, computes optimal controls to In general, U R (xrel ) may not be a half-space, leading to
track a desired trajectory by solving an optimization prob- the linear approximation U e R (xrel ) including controls out-
lem at each iteration. The optimization problem is based side of U R (xrel ). However, because we bound the change
on a single-track vehicle model (also known as the bicycle in the control inputs across each time step, feasible con-
model) and incorporates friction and stability control con- trols remain close to the linearization point where the
straints (in addition to control and state constraints) while approximation error is small.
minimizing a combination of tracking error and control
derivatives. A more in-depth treatment of this combined
5. Dynamics
MPC and HJI controller is given in Section 6.
In this section, we detail the vehicle dynamics model used
to model the robot and human car, the relative dynamics
4.3. Safety-preserving HJI control model between the human and the robot necessary for
Rather than switching to the optimal avoidance controller computing the BRS, and the tracking dynamics used for
defined in Equation (2) when nearing safety violation, we the MPC tracking controller. We use a six-state nonlinear
Leung et al. 1333
(Fxdrag = Cd0 Cd1 UxR Cd2 Ux2R ) and for vehicle mass angles and normal forces (accounting for weight transfer
(m) and moment of inertia (Izz ), and the distances from the due to Fxi ) for the front and rear tires can be computed
center of mass to the front and rear axles (df , dr ), the equa- using the following equations
tions of motion for the robot car are
UyR + df rR UyR dr rR
2 3 af = tan1 d, ar = tan1
UxR cos cR UyR sin cR Ux R UxR
6 UxR sin cR + UyR cos cR 7 ex ex
6 7 mgdr h F mgdf + h F
6 r 7 F zf = , Fzr =
x_ R = 6 7
6 m1 (Fxf cos d Fyf sin d + Fxr + Fxdrag ) + rR UyR 7 L L
6 7
4 1
m (Fyf cos d + Fyr + Fxf sin d) rR UxR
5 where h is the distance from the center of mass to the
1
(d F
Izz f yf cos d + d F
f xf sin d d F
r yr ) ground, L = df + dr , and Fe x = Fxf cos d Fyf sin d + Fxr is
ð5Þ the total longitudinal force in the vehicle’s body frame.
chosen such that the robot and human car share the same lateral error e be the lateral distance along direction ^e and
power, steering, and friction limits. The equations of Dc = cR cpath be the robot car’s heading relative to the
motion for the human car are path computed from this closest point. Using this projec-
2 3 tion and the robot car’s dynamics from Equation (5), we
vH cos cH can compute the error dynamics relative to the desired
6 vH sin cH 7 path. Using the same notation defined previously in
x_ H = 6
4
7
5 ð7Þ
v Equation (5), the state for the robot car tracking a desired
T
a path is ^x = ½ s UxR UyR rR Dc e and the con-
T
trols u = uR = ½ d Fx . The tracking dynamics are
Owing to its simpler dynamics representation, the
human car has a transient advantage in control authority 2 3
UxR cos Dc UyR sin Dc
over the robot car (it may change its path curvature dis- 6 m1 (Fxf cos d Fyf sin d + Fxr + Fxd ) + rR UyR 7
continuously, while the robot may not), but by equating 6 1 7
6 yf cos d + Fyr + Fxf sin d) rR UxR 7
the steady-state control limits we ensure that the infinite _^x = 6 m (F 7 ð9Þ
6 1
Izz f Fyf cos d + df Fxf sin d dr Fyr )
(d 7
time horizon BRS computation converges. 6 7
4 r UxR cos Dc UyR sin Dc k(s) 5
UxR sin Dc + UyR cos Dc
5.3. Relative dynamics
The relative state (denoted by subscript rel) between the 6. The MPC + HJI trajectory tracking
robot car and human car is defined with respect to a coor- controller
dinate system centered on and aligned with the robot car’s
vehicle frame. We define the relative position (pxrel , pyrel ) as In this section, we provide details of the optimization
problem used in the MPC + HJI tracking controller,
pxrel cos cR sin cR pxH pxR describe how this problem can be adapted to accommo-
= date collision avoidance for static obstacles, and provide
pyrel sin cR cos cR pyH pyR
numerical details about the BRS computation used in this
and the relative heading crel as crel = cH cR . formulation. Central to an MPC controller is an optimiza-
Since the velocity states are defined with respect to the tion problem; at each time step, the controller solves an
vehicle frame, we cannot define analogous relative velo- optimization problem to find an optimal control sequence,
city states and must include the individual velocity states passes the first control input to the actuator, and then
of each vehicle. As such, the relative state for the human– repeats this process. To be amenable to real-time applica-
robot vehicle system is xrel = ½pxrel pyrel crel UxR UyR vH rR T . tions, the optimization problem requires a fast solve time
In the language of HJ reachability, the disturbance input of (;0:01 s). In this work, the MPC tracking problem is for-
the system is the human car’s control d = uH = ½ v a T mulated as a convex optimization problem, namely a QP,
and the control input is the robot car’s control enabling the use of efficient solvers (Mattingley and
u = uR = ½ d Fx T . Combining Equations (5) and (7), the Boyd, 2012; Stellato et al., 2017) that are capable of sol-
equations of motion for the relative system are ving the QP within the tight operating frequency.
2 3
vH cos crel UxR + pyrel rR
6 vH sin crel UyR pxrel rR 7 6.1. Optimization problem
6 7
6 v r R 7
61 7 Both the trajectory tracking objective and safety-
6 (F
x_ rel = 6 m xf cos d F yf sin d + F xr + Fxdrag ) + r R yR 7
U 7 preserving control constraint rely on optimizing over the
6 1
m (Fyf cos d + Fyr + Fxf sin d) rR UxR
7
6 7 robot steering and longitudinal force inputs simultane-
4 a 5
1 ously. Let qk = ½ Dsk UxR , k UyR , k rR, k Dck ek T
(d F
Izz f yf cos d + d F
f xf sin d d F
r yr )
be the state of the robot car with respect to a nominal tra-
ð8Þ jectory at discrete time step k. Here Dsk , ek , and Dck
denote longitudinal, lateral, and heading error; UxR , k ,
UyR , k , and rR, k are body-frame longitudinal and lateral
5.4. Tracking dynamics velocity, and yaw rate, respectively, as defined in Figure
The tracking MPC controller relies on an error dynamics 5. Let uk = ½ dk Fx, k T be the controls at step k and let
model. Define a path (see Figure 5) through space where s Ak qk + B +
k uk + Bk uk + 1 + ck = qk + 1 denote linearized
is the distance along the path and at any distance s, we first-order-hold dynamics of Equation (9). We adopt the
know the path heading c(s), the curvature k(s), and a varying time steps method (Nshort time steps of size Dtshort
coordinate system (^s, ^e) tangential and normal to the path. and Nlong time steps of size Dtlong ) and stable handling
Given the position and heading of the robot car, we can envelope constraint from Brown et al. (2017) (expressed
project the car to the closest point on the path. Let the as Hk and Gk in the following problem formulation). To
Leung et al. 1335
ensure the existence of a feasible solution, we use slack tracking dynamics as well as the HJI relative dynamics for
variables sb, k , sr, k , and sHJI , k on the stability and HJI the safety-preserving constraint. We call the Operator
constraints. The HJI reachability constraint Splitting Quadratic Program (OSQP) solver (Stellato et al.,
MHJI uk + bHJI ø sHJI is activated only when V (xrel ) ł e. 2017) through the Parametron.jl modeling frame-
Although HJI theory suggests that applying this constraint work (Koolen and contributors, 2018); this combination of
on the next action alone is sufficient, we apply it over the software enables us to solve the following MPC optimiza-
next NHJI = 3 time steps (30 ms lookahead) to account for tion problem at 100 Hz. The MPC code, including optimi-
the approximations inherent in our QP formulation. The zation parameters and vehicle parameters (also included in
MPC tracking problem is a QP of the form Tables 1 and 2), can be found here: https://github.com/
StanfordASL/Pigeon.jl.
þNlong
NshortP
minimize DsTk QDs Dsk + DcTk QDc Dck
q, u, s, sHJI , Dd, DFx k =1
T 6.2. Adding static obstacles
+ eTk Qe ek + Dd k
Dtk R Dd
Ddk
Dtk The current formulation does not prevent the robot from
T driving very far from its nominal trajectory (e.g., com-
DFx, k DFx, k
+ Dtk RDFx Dtk
pletely off the road) in order to avoid the human. In more
+ Wb sb, k + Wr sr, k + WHJI sHJI , k Dtk realistic road settings, there could be environmental con-
subject to q1 = qcurr , u1 = ucurr , straints such as a concrete road boundary. In the same way
dk + 1 dk = Ddk , that collision avoidance with a human-controlled vehicle
d_ min Dtk ł Ddk ł d_ max Dtk , is formulated as an additional constraint to the robot’s
dmin ł dk ł dmax , MPC problem, we can add another safety constraint to
Fx, k + 1 Fx, k = DFx, k , account for a static wall to prevent the robot from swer-
Fx,min ł Fx, k ł Fx,max , ving completely off the road. We consider two approaches
Ux,min ł UxR , k ł Ux,max , for deriving this constraint: the first based on an additional
sb, k ø 0, sr, k ø 0, HJI computation and giving rise to a similar control con-
Ak qk + B +
kuk + Bk uk + 1 +ck = qk + 1 , straint applied over the first NHJI time steps, and the sec-
U s ond treating the wall as a static obstacle with associated
Hk y, k Gk ł b, k ,
rk sr, k state constraints that apply over the whole duration of the
for k = 1, . . . , Nshort + Nlong MPC trajectory optimization.
sHJI , j ø 0 (if V (xrel ) ł e), For the first approach, we can compute a value function
MHJI uj + bHJI ø sHJI (if V (xrel ) ł e), VWALL (xR ) describing the interaction between the robot and
for j = 1, . . . , NHJI the wall by using Equation (5) without the pxR state (we
ð10Þ only care about the distance from the wall and not how far
along the wall the robot is). Two slices of the robot–wall
The objective strives to minimize a combination of BRS projected on the pyR cR plane are illustrated in
tracking error (longitudinal, lateral, and angular), control Figure 7. Intuitively, if the robot is moving towards the
rates (steering and longitudinal tire forces), and magnitude wall, the faster it is moving, the further away it needs to be
of the slack variables. The constraints ensure (i) continuity away from the wall in order for there to exist an optimal
with the current and next state and control, (ii) the change collision avoidance control. Note that because the wall is
in controls across each time step is bounded, (iii) the posi- static, there is no disturbance input. The robot–wall HJI
tivity of slack variables, (iv) the (linearized) dynamics are control constraint MWALL uR + bWALL ø 0 becomes active
satisfied, (v) the vehicle stability constraints are satisfied, when VWALL (xR ) ł e. This means that it is possible for both
and (vi) the HJI safety-preserving half-plane constraint is HJI-safety constraints to be active simultaneously. We
satisfied when V (xrel ) ł e. note that theoretically we could consider the robot, human,
The time-discretization and linearizations (dynamics and wall simultaneously by computing the BRS for the
and constraints) we apply amount to an approximate var- joint system. However, naı̈ve increasing the state size with-
iant of sequential quadratic programming (SQP). In partic- out any decomposition would be computationally undesir-
ular, however, we solve one QP at each MPC step rather able even offline. Thus, we treat the human–robot and
than the usual iteration until convergence. As the tracking robot–wall systems separately, but will discuss later the
problems are so similar from one MPC step to the next, effect of this design choice.
we find that this approach yields sufficient performance Alternatively, we can account for a static wall by add-
for our purposes. We interpolate along each solution tra- ing lateral error bound constraints into the MPC tracking
jectory to compute the linearization nodes for the QP at problem: this is the approach taken by Brown et al. (2017).
the next MPC step. This involves always adding a left and right lateral error
We use the ForwardDiff.jl automatic differentia- constraint (emin , k ł ek ł emax , k ) on each node point along
tion (AD) package implemented in the Julia programming the MPC trajectory such that the lateral deviation from the
language (Revels et al., 2016) to linearize the trajectory desired trajectory does not exceed the lateral distance to
1336 The International Journal of Robotics Research 39(10-11)
Table 1. X1 parameters relevant to defining the equations of Table 2. Parameter values relevant for the MPC tracking
motion in Equation (5) optimization in Equation (10)
g Standard Earth gravity 9.80665 ms2 QDs Quadratic cost on 1.0 m2 s1
m Total mass 1,964 kg longitudinal error
Izz Yaw moment of inertia 2,900 kg m2 QDc Quadratic cost on heading 1.0 s1
h Vertical height of CG 0.47 m error
df Distance from center of 1.4978 m Qe Quadratic cost on lateral 1.0 m2 s1
gravity to front axle error
dr Distance from center of 1.3722 m RDd Quadratic cost on change in 0.1 s
gravity to rear axle steering angle
Caf Front cornering stiffness 150 kN rad1 RDFx Quadratic cost on change in 0.5 N2 s
Car Rear cornering stiffness 220 kN rad1 longitudinal tire force
900 1
Cd0 Drag coefficient constant 241 N Wb Linear cost on sideslip p s
Cd1 Drag coefficient linear 25.1 N m1 s stability slack variable
Cd2 Drag coefficient quadratic 0.0 N m2 s2 Wr Linear cost on yaw rate 50.0
Fxf Front wheel drive 0 stability slack variable
fraction (Fx ø 0) WHJI Linear cost on HJI 500.0 m1
Fxr Rear wheel drive 1 slack variable
fraction (Fx ø 0) We Linear cost on lateral 500.0 m1 s1
Fxf Front wheel brake 0.6 bound slack variable
fraction (Fx \0) dmin Minimum steering angle 18 × 180p
Fxr Rear wheel brake 0.4 dmax Maximum steering angle 18 × 180
p
7.2. Experiments
To evaluate our proposed control stack, a synthesis of a
Fig. 7. The BRS of the robot–wall system where zero heading high-level probabilistic interaction planner with the
corresponds to the robot car being parallel to the wall. MPC + HJI tracking controller, we performed full-scale
human-in-the-loop traffic-weaving trials with X1 taking
on the role of the robot car. To ramp up towards testing
½1, 1; computing the BRS with this system and discreti- with two full-scale vehicles in the near future, we investi-
zation takes approximately 40 minutes. gated two types of human car: (i) a virtual human-driven
car and (ii) a 1/10-scale LiDAR-visible human-driven RC
car. We scaled the highway traffic-weaving scenario
7. Results (mean speed ;28 m/s) in Schmerling et al. (2018) down
7.1. Experimental vehicle platform to a mean speed of ;8 m/s by shortening the track (reduc-
ing longitudinal velocity by a constant) and scaling time
X1 is a flexible steer-by-wire, drive-by-wire, and brake-
by a factor of 4/3 (with the effect of scaling speeds by 3/4
by-wire experimental vehicle developed by the Stanford
and accelerations by 9/16). The value of the parameters in
Dynamic Design Lab (see Figure 8). It is equipped with
the MPC tracking problem are listed in Table 2.
three LiDARs (one 32-beam and two 16-beam), a differ-
We investigate and evaluate the effectiveness of our
ential GPS/INS which provides 100 Hz pose estimates
control stack by allowing the human car to act carelessly
accurate to within a few centimeters as well as high fide-
(i.e., swerving blindly towards the robot car) during the
lity velocity, acceleration, and yaw rate estimates. To con-
experiments. In Sections 7.2.1 and 7.2.2 we compare our
trol X1, desired steering (d) and longitudinal tire force
proposed controller (MPC + HJI) against a tracking-only
(Fxf , Fxr ) commands are sent to the dSpace MicroAutoBox
MPC controller (MPC) and a controller that switches to
(MAB) which handles all sensor inputs except LiDAR
the HJI optimal avoidance controller when near safety vio-
(handled by the onboard PC) and implements all low-level
lation (switching). In Section 7.2.3 we compare the two
actuator controllers. Similarly, state information about the
static obstacle avoidance approaches developed in Section
vehicle is obtained from the MAB. We use the Robot
6.2 to a scenario with a virtual wall constraining the
Operating System (ROS) to communicate with the MAB
robot’s trajectory. We investigate applying an extension of
which handles sensing and control at the hardware level;
the same state-constraint-based static obstacle avoidance
the planning/control stack described in this work is run-
strategy to the problem of collision with a dynamic obsta-
ning onboard X1 on a consumer desktop PC running
cle (the human) in Section 7.2.4 and discuss its shortcom-
Ubuntu 16.04 equipped with a quadcore Intel Core i7-
ings. Finally, in Section 7.2.5 we present experiments with
6700K CPU and an NVIDIA GeForce GTX 1080 GPU. the human car physically embodied by an RC car.
X1 parameters used in the equations of motion (Equation
(5)) are listed in Table 1.
We also performed experiments using a LiDAR-visible 7.2.1. Virtual human-driven vehicle. To ensure a com-
1/10-scale RC car (Figure 8 right) as the human-driven car pletely safe experimental environment, our first tier of
to investigate the robustness of our proposed control stack experimentation uses a joystick-controlled virtual vehicle
with perception uncertainty. for the human car and allows the robot control stack to
The code used to run the experiments can be found here: have perfect observation of the human car state.
https://github.com/StanfordASL/safe_traffic_weaving. Experimental trials of the probabilistic planning
1338 The International Journal of Robotics Research 39(10-11)
Fig. 9. Controller comparison on planner trajectories from X1-virtual human-driven vehicle experiments: (a) Controller comparison
on experimental data where the robot car uses the MPC + HJI controller. (b) Controller comparison on experimental data where the
robot car uses the switching controller. Left: Simulations of using the MPC, MPC + HJI, and switching controllers when V (xrel (t))
first drops below e = 0:05 are shown, and are compared against the executed trajectory (from experiments) and the desired trajectory
(from the interaction planner). Right: The corresponding evolutions of V (xrel (t)). Bottom: Illustration of the traffic-weaving
interaction. The human controlled car (blue car) carelessly drives into the path of the robot car (red car), causing the robot car to
react by swerving. The transparency of the rectangles corresponds to the speed of the car (higher transparency corresponds to higher
speed).
framework using (i) our proposed approach (MPC + HJI) the switching controller which arguably overreacts with a
and (ii) switching to the optimal HJI controller (switching) large excursion outside the lane boundaries. Evidently, our
are shown in Figure 9, along with a simulated comparison proposed controller tries to be minimally interventional:
between the two safety controllers and the tracking-only the robot car swerves/brakes but only to an extent that is
controller (MPC). For comparative purposes, the control- necessary.
lers were simulated with the displayed nominal trajectory Looking at the value function, we see that, as expected,
held fixed, but in reality, the nominal trajectory in these the MPC + HJI controller aims to keep the value positive,
experiments was updated at ;3 Hz. As expected, we see but does not necessarily strive to increase it, while the
that when safety violation occurs, the MPC + HJI control- switching controller aims to increase the value as much as
ler represents a middle ground between the tracking-only possible. The MPC (tracking only) controller fails to
MPC that does not react to the human car’s intrusion, and increase the value at all when safety violation occurs. All
Leung et al. 1339
Fig. 12. Trajectory comparison between using lateral error constraints (top), a robot–wall HJI control constraint (middle), and using
no wall constraints (bottom). The human car is in the left (top) lane and is constant across all three trials, and the robot car is in the
right (bottom) lane. The transparency of the car corresponds to the speed of the car (higher transparency corresponds to higher
speed).
we see that as the human trajectory increases its curvature controller is able to arrest its fall; we discuss this behavior
towards the middle of the lane change, the lateral error in the next section.
bounds (based on an assumption of constant velocity) are
not stringent enough to maintain robot safety. The utility
of the MPC + HJI approach is that it distills safety consid- 8. Discussion
erations with respect to all possible realizations of the Beyond the qualitative and quantitative confirmation of
human trajectory into a single control constraint. our design goals, our experimental results reveal three
Moreover, this constraint is not overly conservative (to main insights.
improve the state-constraint-based approach one might
imagine defining lateral error constraints to avoid the
entire FRS of the human), because it incorporates the con- 8.1. Takeaway 1: The reachability cache is
cept of closed-loop feedback into its BRS computation. underly conservative with respect to robot car
dynamics and overly conservative with respect to
7.2.5. 1/10-scale human-driven vehicle. To begin investi- human car dynamics
gating the effects of perception uncertainty on our safety In all cases, hardware experiments as well as simulation
assurance framework, we use three LiDARs onboard X1 to results, the HJI value function V dips below zero, indicat-
track a human-driven RC car, and implement a Kalman fil- ing that neither the HJI + MPC nor even the optimal
ter for human car state estimation (position, velocity, and avoidance switching controller are capable of guaranteeing
acceleration). Even with imperfect observations, we show safety in the strictest sense. The root of this apparent para-
some successful preliminary results (an example is shown dox is in the computation of the reachability cache used by
in Figure 14) at mean speeds of ;4 m/s, close to the limits both controllers as the basis of their safety assurance.
of the RC car and LiDAR-visible mast in crosswinds at the Though the seven-state relative dynamics model (Equation
test track. We observe similar behavior as in the virtual (8)) subsumes a single-track vehicle model that has proven
human car experiments, including the fact that the value successful in predicting the evolution of highly dynamic
function dips briefly below zero before the MPC + HJI vehicle maneuvers (Brown et al., 2017; Funke et al.,
1342 The International Journal of Robotics Research 39(10-11)
Fig. 13. Comparison between the MPC + HJI controller (left) and a baseline approach not based on any reachability analysis (right)
for the task of dynamic obstacle avoidance. (a) Simulation trial of using MPC + HJI for human car collision avoidance (including
lateral error constraints for wall collision avoidance). (b) Simulation trial of using lateral error constraints for both human car and
wall collision avoidance. The trajectory of the human car starting in the left (top) lane is visualized in gray; the trajectory of the
robot car starting in the right (bottom) lane is visualized in color. In each close-up view, error bars accompanying each snapshot of
the robot illustrate the first (k = 1) lateral state constraint of the MPC problem at that step. To help correspond the positions of the
cars over the trials, each pair of matching-shape markers represents a common time instant.
2017), the way it is employed in computing the value to eight or nine, however, might not be computationally
function V omits relevant components of the dynamics. In feasible (even offline) without devising more efficient HJI
particular, when computing the optimal avoidance control solution techniques or choosing an extremely coarse dis-
(Equation (2)) as part of solving the HJI PDE, we assume cretization over the additional states. By literature stan-
total freedom over the choice of robot steering angle d and dards we already use a relatively coarse discretization grid
longitudinal force command Fx , up to maximum control for solving the HJI PDE; associated numerical inaccura-
limits. This does not account for, e.g., limits on the steer- cies may be another source of the observed safety mis-
ing slew rate (traversing ½dmin , dmax takes approximately 2 match and alternate grid choices are a possible subject of
seconds), and thus the value function is computed under future investigation. We believe that simulation, account-
the assumption that the robot can brake/swerve far faster ing for slew rates, could be a good tool to prototype such
than it actually can. efforts, noting that as it stands we have relatively good
We note that simply tuning the safety buffer e is insuffi- agreement between simulation and the experimental plat-
cient to account for these unmodeled dynamics. In Figure form in our testing.
9a we see that V may drop from approximately 0.5 (the
value when the two cars traveling at 8 m/s start side-by-
side in lanes) to 0.3 in the span of a few tenths of a sec- 8.2. Takeaway 2: Interpretability of the value
ond. Selecting e.0:5 might give enough time for the steer- function V should be a key consideration in
ing to catch up, but such a selection would prevent the future work
robot car from accomplishing the traffic-weaving task even
under nominal conditions, i.e., when the human car is In this work the terminal value function V (0, xrel ) is speci-
equally concerned about collision avoidance. Even for fied as the separation/penetration distance between the
e = 0:05, we see that the human car can cause the robot to bounding boxes of the two vehicles, a purely geometric
swerve by just staying in its lane but with a small offset quantity dependent only on pxrel , pyrel , and crel . Recalling
towards the robot’s lane (see Figure 12). This is because that V (: = V‘ ) represents the worst-case eventual out-
the safety controller would push the robot car outside of its come of a differential game assuming optimal actions
lane from the outset to maintain the buffer. This behavior from both robot and human, we may interpret the above
follows from wide level sets associated with the transient results through the lens of worst-case outcomes, i.e., a
control authority asymmetry (recall that in the HJI relative value of 0.3 may be thought of as 30 cm of collision
dynamics the human car may adjust its trajectory curvature penetration assuming optimal collision seeking/avoidance
discontinuously), assumed as a conservative safety mea- from human/robot. When extending this work to cases
sure as well as a way to keep the relative state dimension with environmental obstacles (e.g., concrete highway
manageable. boundaries that preclude large deviations from the lane),
The simplest remedy for both of these issues is to or multi-agent settings where the robot must account for
increase the fidelity of the relative dynamics model by the uncertainty in multiple other parties’ actions, for many
incorporating additional integrator states d_ for the robot common scenarios it may be the case that guaranteeing
and v_ for the human. Naı̈ve increasing the state dimension absolute safety is impossible. Instead of avoiding a BRS
Leung et al. 1343
Fig. 14. Timelapse of pairwise vehicle interaction: X1 with 1/10-scale human-driven RC car. The RC car (green trajectory) nudges
into X1 (red trajectory), which swerves gently to avoid. Inset: The value function over time of the X1-human driven RC car
experiment.
of states that might lead to collision, we should instead interactions. By essentially projecting the planner’s
treat the value function inside the BRS as a cost. In partic- desired trajectory into the set of safety-preserving controls
ular, we should specify more contextually relevant values whenever safety is threatened, we preserve more of the
of V (0, xrel ) for states in collision, e.g., negative kinetic planner’s intent than would be achieved by adopting the
energy or another notion of collision severity as a function optimal control with respect to separation distance. Our
of the velocity states UxR , UyR , vH , and rR in addition to experiments show that with our proposed minimally inter-
the relative pose. This would lead to a controller that pre- ventional safety controller, we accomplish the high-level
fers, in the worst case, collisions at lower speed, or per- objective (traffic-weaving) despite the human car swer-
haps ‘‘glancing blows’’ where the velocities of the two ving directly onto the path of the robot car, and accom-
cars are similar in magnitude and direction. plish this relatively smoothly compared with using a
switching controller that results in the robot car swerving
more violently off the road. Further, we investigate the
8.3. Takeaway 3: Static obstacles should be addition of a road boundary wall to our formulation to pre-
accounted for using MPC tracking state vent the robot car from swerving completely out of the
constraints and not HJI control constraints lane which could be dangerous in realistic road settings,
and compare against a baseline approach using state con-
It is natural to use HJI when reasoning about potential
straints to avoid dynamic obstacles to show that our
dangers from a dynamic, unpredictable agent, but for sta-
framework is better designed to keep the robot car safe
tic obstacles there is no uncertainty in how the obstacle
even in the face of worst-case human car behaviors.
may act. That is, when the other agent is deterministic, We note that this work represents only a promising first
there is no notion of optimizing robot safety policies over step towards the integration of reachability-based safety
all worst-case actions by the other agent. Indeed, in this guarantees into a probabilistic planning framework. Future
case it is possible to confidently extend the safety con- work includes investigating this framework for cases
straint along the entire MPC horizon, as demonstrated in where the human and robot have very different dynamics,
Section 7.2.3. This is as opposed to deriving a safety con- such as a pedestrian or cyclist interacting with a car, or a
straint to be applied at the first step (on the present control human interacting with a robotic manipulator, and for
action; the HJI + MPC approach), as must be done with a interactions involving multiple (more than two) agents.
non-deterministic agent with unknown future trajectory. We may also consider adapting our approach to ensure
Spreading the collision avoidance constraint over the high planning performance while guaranteeing satisfaction
problem horizon allows for the MPC optimization to more of constraints other than safety, e.g., task requirements
smoothly accomplish its objectives (i.e., control smooth- specified by temporal logic constraints as considered in
ness and tracking performance) while maintaining safety. Chen et al. (2018). In the context of human–robot vehicle
We note, however, that naı̈ve extensions of this approach interactions, we have already discussed the concrete modi-
to non-deterministic agents (recall Section 7.2.4) do not fications to this controller we believe are necessary to
similarly provide strong safety assurances. improve the practical effect of our theoretical guarantees;
further study should also consider better fitting of the
planning objective at the controller level. That is, instead
9. Conclusions
of performing a naı̈ve projection, i.e., the one that mini-
We have investigated a control scheme for providing real- mizes trajectory tracking error, it is likely that a more
time safety assurance to underpin the guidance of a prob- nuanced selection informed by the planner’s prediction
abilistic planner for human–robot vehicle–vehicle model would represent a better ‘‘backup choice’’ in the
1344 The International Journal of Robotics Research 39(10-11)
case that safety is threatened. We recognize that ulti- Bajcsy A, Bansal S, Bronstein E, Tolani V and Tomlin CJ (2019)
mately, guaranteeing absolute safety on a crowded road- An efficient reachability-based framework for provably safe
way may not be realistic, but we believe that in such autonomous navigation in unknown environments. In: Pro-
situations value functions derived from reachability may ceedings IEEE Conference on Decision and Control.
provide a useful metric for near-instantly evaluating the Bokanowski O, Forcadel N and Zidani H (2010) Reachability
and minimal times for state constrained nonlinear problems
future implications of a present action choice.
without any controllability assumption. SIAM Journal on Con-
trol and Optimization 48(7): 4292–4316.
Acknowledgements Brown M, Funke J, Erlien S and Gerdes JC (2017) Safe driving
The authors would like to thank the X1 team, in particular Matt envelopes for path tracking in autonomous vehicles. Control
Brown, Larry Cathey, and Amine Elhafsi, Matteo Zallio, and the Engineering Practice 61: 307–316.
Thunderhill Raceway Park for accommodating testing. Chen M, Hu Q, Fisac J, Akametalu K, Mackin C and Tomlin C
(2017) Reachability-based safety and goal satisfaction of
unmanned aerial platoons on air highways. AIAA Journal of
Funding Guidance, Control, and Dynamics 40(6): 1360–1373.
The author(s) disclosed receipt of the following financial support Chen M, Tam Q, Livingston SC and Pavone M (2018) Signal
for the research, authorship, and/or publication of this article: temporal logic meets Hamilton–Jacobi reachability: Connec-
This work was supported by the Office of Naval Research (grant tions and applications. In: Workshop on Algorithmic Founda-
number N00014-17-1-2433), by Qualcomm, and by the Toyota tions of Robotics.
Research Institute (‘‘TRI’’). This article solely reflects the opi- Chen M and Tomlin CJ (2018) Hamilton–Jacobi reachability:
nions and conclusions of its authors and not ONR, Qualcomm, Some recent theoretical advances and applications in
TRI, or any other Toyota entity. unmanned airspace management. Annual Review of Control,
Robotics, and Autonomous Systems 1(1): 333–358.
Dhinakaran A, Chen M, Chou G, Shih JC and Tomlin CJ (2017)
Notes A hybrid framework for multi-vehicle collision avoidance. In:
1. With precision dependent on parameters of the numerical sol- Proceedings IEEE Conference on Decision and Control.
ver, e.g., discretization choices in mesh size/time step. Fisac JF, Akametalu AK, Zeilinger MN, Kaynama S, Gillula J
2. See Section 8 for a discussion of specific choices of and Tomlin CJ (2018) A general safety framework for
V (0, xrel ). learning-based control in uncertain robotic systems. IEEE
3. For ease of notation going forward we will often write Transactions on Automatic Control 64(7): 2737–2752.
V : = V‘ . Fisac JF, Chen M, Tomlin CJ and Sastry SS (2015) Reach-avoid
4. Since the dynamic bicycle model is ill-defined for low speeds problems with time-varying dynamics, targets and constraints.
(thus, explaining the oscillations in Fx ), the experiment In: Hybrid Systems: Computation and Control.
(when using the robot–wall HJI control constraint) is essen- Frehse G, Le Guernic C, Donzé A, et al. (2011) SpaceEx: Scal-
tially over around t = 9. In general, the MPC problem can able verification of hybrid systems. In: Proceedings Interna-
switch to the kinematic model at low speeds (Patterson et al., tional Conference Computer Aided Verification.
2018), which is well defined in that speed region. Funke J, Brown M, Erlien SM and Gerdes JC (2017) Collision
5. Alternatively, one could use a sample or a maximum a poster- avoidance and stabilization for autonomous vehicles in emer-
iori estimate from the interaction planner’s prediction model. gency scenarios. IEEE Transactions on Control Systems Tech-
nology 25(4): 1204–1216.
Gattami A, Al Alam A, Johansson KH and Tomlin CJ (2011)
ORCID iDs Establishing safety for heavy duty vehicle platooning: A
Karen Leung https://orcid.org/0000-0002-3033-8761 game theoretical approach. IFAC World Congress 44(1):
Mo Chen https://orcid.org/0000-0001-8506-3665 3818–3823.
Gray A, Gao Y, Hedrick JK and Borrelli F (2013) Robust predic-
tive control for semi-autonomous vehicles with an uncertain
References driver model. In: IEEE Intelligent Vehicles Symposium.
Althoff M and Dolan J (2014) Online verification of automated Greenstreet MR and Mitchell I (1998) Integrating projections. In:
road vehicles using reachability analysis. IEEE Transactions Hybrid Systems: Computation and Control.
on Robotics 30(4): 903–918. Koolen Tcontributors (2018) Parametron.jl: Efficiently solving
Althoff M and Dolan JM (2011) Set-based computation of vehi- instances of a parameterized family of optimization problems in
cle behaviors for the online verification of autonomous vehi- Julia. Available at: https://github.com/tkoolen/Parametron.jl.
cles. In: Proceedings IEEE International Conference on Kurzhanski A and Varaiya P (2000) Ellipsoidal techniques for
Intelligent Transportation Systems. reachability analysis: Internal approximation. System and
Althoff M and Krogh BH (2014) Reachability analysis of non- Control Letters 41(3): 201–211.
linear differential-algebraic systems. IEEE Transactions on Liu SB, Roehm H, Heinzemann C, Lütkebohle I, Oehlerking J
Automatic Control 59(2): 371–383. and Althoff M (2017) Provably safe motion of mobile robots
Arora S, Choudhury S, Althoff D and Scherer S (2015) Emer- in human environments. In: IEEE/RSJ International Confer-
gency maneuver library – ensuring safe navigation in partially ence on Intelligent Robots and Systems.
known environments. In: Proceedings IEEE Conference on Lorenzetti J, Chen M, Landry B and Pavone M (2018) Reach-
Robotics and Automation. avoid games via mixed-integer second-order cone
Leung et al. 1345
programming. In: Proceedings IEEE Conference on Decision Revels J, Lubin M and Papamarkou T (2016) Forward-mode
and Control. automatic differentiation in Julia. arXiv preprint
Majumdar A, Vasudevan R, Tobenkin MM and Tedrake R arXiv:1607.07892.
(2014) Convex optimization of nonlinear feedback controllers Sadigh D, Sastry S, Seshia SA and Dragan AD (2016) Planning
via occupation measures. The International Journal of for autonomous cars that leverage effects on human actions.
Robotics Research 33(9): 1209–1230. In: Robotics: Science and Systems.
Margellos K and Lygeros J (2011) Hamilton–Jacobi formulation Schmerling E, Leung K, Vollprecht W and Pavone M (2018)
for reach-avoid differential games. IEEE Transactions on Multimodal probabilistic model-based planning for human–
Automatic Control 56(8): 1849–1861. robot interaction. In: Proceedings IEEE Conference on
Mattingley J and Boyd S (2012) CVXGEN: A code generator for Robotics and Automation.
embedded convex optimization. Optimization and Engineer- Stellato B, Banjac G, Goulart P, Bemporad A and Boyd S (2017)
ing 12(1): 1–27. OSQP: An operator splitting solver for quadratic programs.
Mitchell IM, Bayen AM and Tomlin CJ (2005) A time-dependent arXiv preprint arXiv:1711.08013.
Hamilton–Jacobi formulation of reachable sets for continuous Tanabe K and Chen M (2018) BEACLS: Berkeley Efficient API
dynamic games. IEEE Transactions on Automatic Control in C + + for Level Set methods. Available at: https://github.-
50(7): 947–957. com/HJReachability/beacls.
Pacejka H (2002) Tire and Vehicle Dynamics. Amsterdam: Wolf MT and Burdick JW (2008) Artificial potential functions
Elsevier. for highway driving with collision avoidance. In: Proceedings
Patterson V, Thornton S and Gerdes JC (2018) Tire modeling to IEEE Conference on Robotics and Automation.
enable model predictive control of automated vehicles from Yoon S, Fern A, Givan R and Kambhampati S (2008) Probabilis-
standstill to the limits of handling. In: International Sympo- tic planning via determinization in hindsight. In: Proceedings
sium on Advanced Vehicle Control. AAAI Conference on Artificial Intelligence.