
A Review on Intention-aware and Interaction-aware Trajectory Prediction for Autonomous Vehicles


This paper was downloaded from TechRxiv (https://www.techrxiv.org).

LICENSE

CC BY 4.0

SUBMISSION DATE / POSTED DATE

10-03-2022 / 16-03-2022

CITATION

Gomes, Iago; Wolf, Denis (2022): A Review on Intention-aware and Interaction-aware Trajectory Prediction
for Autonomous Vehicles. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.19337447.v1

DOI

10.36227/techrxiv.19337447.v1
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

A Review on Intention-aware and Interaction-aware Trajectory Prediction for Autonomous Vehicles
Iago Pachêco Gomes and Denis Fernando Wolf

Abstract—Autonomous vehicles should improve urban transport scenarios, since they use a wide range of components to provide a rich representation of the surroundings and improve driving decision-making. One of these components is trajectory prediction, which estimates the future states of traffic participants and allows hazardous traffic scenarios to be predicted. There are different approaches to trajectory prediction, among which intention-aware and interaction-aware approaches represent the state of the art, since they use a better representation of the surroundings. This paper presents a literature review on intention-aware and interaction-aware trajectory prediction, highlighting the techniques applied, datasets, evaluation metrics, and open issues.

Index Terms—trajectory prediction, intention, interaction, autonomous vehicles.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and grant 88887.500344/2020-00, and the São Paulo Research Foundation (FAPESP) under grant 2019/27301-7.
I. P. Gomes and D. F. Wolf are with the Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil (e-mail: iagogomes@usp.br, denis@icmc.usp.br).

I. INTRODUCTION

Autonomous vehicles are intelligent and robotic vehicles that navigate in traffic without human intervention, dealing with all driving scenarios and respecting traffic rules [1]. They combine a wide range of sensors and software components in order to create a rich representation of a scene and understand their surroundings, make decisions, and take actions similar to or even better than those of human drivers. Therefore, they are expected to transform urban traffic by improving mobility, safety, and accessibility, and by reducing pollutant emissions, among other benefits [2].

Although they are not yet present in all streets and cities, their viability is clear, given the great advances in research and development achieved in recent years [2], [3]. However, there are still many challenges in the research field to fulfill the requirements of a fully autonomous vehicle, which must be able to handle any situation in any traffic scenario without the need for human intervention. An important requirement for an autonomous system is the safety of passengers and other traffic participants. For this purpose, obstacle detection alone is not enough for safe navigation. Hence, it is also necessary to track and predict obstacle trajectories and other behaviors. This information enriches the representation of the environment [4], and allows autonomous and intelligent vehicles to redesign their trajectories and actions to, for example, prevent accidents when conflicting trajectories are detected.

Lefèvre, Vasquez and Laugier [5] classified trajectory prediction approaches into three categories: physical-based; maneuver-based or intention-aware; and interaction-aware. Physical-based approaches were widely studied because of their simplicity; however, they provide only short-term trajectory prediction, since the imprecision of the result increases with the prediction horizon. Nonetheless, reliable long-term predictions are necessary for more robust systems [6]. Long-term trajectory prediction methods need a better representation of each target agent's surroundings to improve the accuracy of their results. This is accomplished by intention-aware and interaction-aware trajectory prediction algorithms, since they use better road geometry representations, maneuver intentions of traffic participants, and traffic rules, and also take into account the interaction and interdependency of traffic participants' decisions.

This paper presents a literature review on intention-aware and interaction-aware trajectory prediction for autonomous vehicles, analyzing primary studies published since 2008, a year after the DARPA Urban Challenge. The major concerns of this analysis are identifying how maneuver intention and interaction improve the performance of trajectory prediction, the techniques used, the datasets, and the evaluation metrics. Thus, we defined the following research questions:

• RQ1: What are the existing solutions for trajectory prediction that take into account the intention of traffic participants and/or interaction models? This question aims to identify the algorithms, techniques, methodologies, and technological resources applied for implementing trajectory prediction, intention prediction, and interaction modelling of traffic participants.
• RQ2: How does the intention prediction of traffic participants improve their trajectory prediction? This question aims to understand how the intention influences the improvement of trajectory prediction.
• RQ3: How does the interaction model of traffic participants improve their trajectory prediction? This question aims to understand how interaction models influence and improve trajectory prediction.

The remainder of this paper is organized as follows: Section II presents related works that carried out a literature review on trajectory prediction; Section III details a general and formal definition of the trajectory prediction problem; Sections IV, V, and VI analyze the research questions; Section VII presents the most relevant datasets used by the primary studies; Section VIII describes the evaluation metrics; Section IX highlights some remarks based on the analysis of the primary studies; and, finally, Section X concludes this review.

II. RELATED WORKS

Trajectory prediction is an important component of an autonomous system. It is responsible for estimating future states of traffic participants, which enhances the surroundings representation and improves the performance of several other autonomous driving tasks. However, there are many challenges in predicting the state of traffic participants, especially with respect to the noise in the observation data, the large number of different traffic scenarios, the variety of traffic participants with different motion dynamics (e.g., pedestrians, cyclists, cars, trucks, and motorcyclists), and the stochastic behavior of the traffic participants' decision-making [5], [7], [8]. Therefore, there is an effort in the scientific community to develop better trajectory prediction frameworks. In addition to coping with the aforementioned challenges, it is also important to take into account the requirements of an autonomous system software/hardware architecture, especially the real-time and memory constraints. There are some reviews published in the literature that summarize the results achieved so far.

Ridel et al. [9] presented a literature review on pedestrian behavior and trajectory prediction. This review is an expansion of the taxonomy proposed by Brouwer et al. [10], and highlights the importance of the individual characteristics of each pedestrian for the prediction result, such as age and the objects that the pedestrian is carrying, among others. In turn, Rudenko et al. [11] presented a survey of human motion prediction for a wide range of applications (e.g., service robots and surveillance systems). This survey presents a new taxonomy based on the modeling approach and the type of contextual cues. The first class, modeling approach, focuses on how the motion is represented. Meanwhile, the contextual cue class emphasizes the awareness of the target agent of its surroundings.

Alternatively, Bighashdel and Dubbelman [12] surveyed the prediction of Vulnerable Road Users (VRUs), which include pedestrians, cyclists, and motorcyclists. They presented a taxonomy that divides the techniques into interaction-based, path-planning, and intention-based. Interaction-based approaches consider target agents as interactive entities in an environment with other agents and objects. In turn, path-planning methods inherit characteristics of path-planning techniques in robotics, which take into account either the goal or the motion reward of the target agent. Finally, the intention-based techniques take into account the motion or behavior intention of the target agent, e.g., the intention to cross the road.

Lefèvre, Vasquez and Laugier [5] surveyed trajectory prediction and risk assessment techniques for vehicles, and also proposed a taxonomy that divides the techniques into physical-based, maneuver-based, and interaction-aware. Physical-based techniques only take into account the dynamic and kinematic equations of the target vehicle motion. Hence, they have the lowest computational cost, but their accuracy degrades as the prediction horizon increases. Therefore, these techniques are only used for short-term predictions, usually around one second.

Maneuver-based techniques, also known as intention-aware, use the maneuvering intention of the target vehicle as a priori information for the trajectory prediction framework, for example, whether the vehicle intends to perform a left or right lane change. However, these techniques assume that vehicles are independent entities, that is, that their actions do not interfere with the decisions of other agents. This is not a reliable assumption, although it allows long-term predictions. In turn, interaction-aware approaches consider traffic participants as interactive agents, such that their actions are interdependent. This assumption better represents the dynamics of traffic scenarios, which allows long-term predictions that are more reliable than those of intention-aware methods [5]. However, the complexity of traffic scenarios and interactive agents increases the computational cost of the prediction task [13]. In addition, interaction modeling is still an open challenge in autonomous driving [7], [14]–[16]. The review presented by Lefèvre, Vasquez and Laugier [5] does not provide a further analysis of interaction-aware trajectory prediction because there were not many primary studies that followed this approach in the review coverage period. However, in recent years this approach has become more prevalent, especially with the use of deep-learning techniques.

Other papers have presented an update on the effort in trajectory prediction, especially using deep-learning techniques [6], [8], [13]. However, these reviews are limited to a small scope of techniques compared to what is already available in the literature. Hu and Zheng [17] also surveyed behavior and trajectory prediction, and highlighted the primary studies that applied these tasks in the context of the Internet-of-Vehicles (IOV), which allows communication between vehicles. The communication between traffic participants, often referred to as Vehicle-To-Anything (V2X), reduces the complexity of the prediction task, since the agents can exchange information about their states or intentions, which is either noise-free or more reliable than when measured by third-party sensors (i.e., sensors belonging to the ego-vehicle).

Alternatively, Mozaffari et al. [18] surveyed behavior and trajectory prediction, and proposed a new taxonomy to classify the primary studies. This taxonomy divides the techniques by input representation, output type, and prediction method. Thus, it allows identifying the technologies that have been used for trajectory prediction, especially the deep-learning techniques. However, there are other state-of-the-art techniques in the research field that were not detailed by the authors in their survey.

Therefore, in this paper, we extend the classification of prediction methods, and contextualize it with intention-aware and interaction-aware trajectory prediction approaches for vehicles. We also analyze the datasets and evaluation metrics used by the primary studies to develop and evaluate their approaches. In addition, we discuss how maneuver intention and vehicle interaction are used. Finally, we identify some challenges and future directions for the research area.

III. PROBLEM DEFINITION AND TERMINOLOGY

A traffic scenario is characterized by a set of traffic participants $V = \{v_0, v_1, \ldots, v_n\}$, each defined by a state vector $\tau_i \in \mathbb{R}^{t \times m}$, for $i \in \{0, 1, \ldots, n\}$, where $n \in \mathbb{N}^*$ is the number of traffic participants in the current traffic scenario (except the ego vehicle).

[Fig. 1: Traffic Scenario. Vehicles labeled EV, TV, SV1–SV4, and NV1–NV4.]

At a given prediction time $k$, each traffic participant can assume one of the roles defined below [18]. Figure 1 shows a traffic scenario on a highway with ten traffic participants and their roles.

• Ego Vehicle (EV): The ego vehicle ($i = 0$) is the term used to refer to the autonomous vehicle. It is important to note that the observations are generally relative to the ego vehicle, since they are obtained from its sensors.
• Target Vehicle (TV): The target vehicle is the one whose future states the trajectory predictor aims to estimate.
• Surrounding Vehicles (SV): The surrounding vehicles are within an influence distance from the target vehicle. Therefore, they are assumed to affect the motion of the target vehicle.
• Non-Effective Vehicles (NV): The non-effective vehicles are the remaining vehicles of the current traffic scenario. They are outside the target vehicle's influence radius. Therefore, they should not affect the target vehicle's motion.

The state vector $\tau_i$ is a time series defined in the interval $[0, t]$, where $t$ is the size of the series. The state vector is also known as the observation vector, and is subject to noise. The dimension $m$ of the state vector is arbitrary, and depends on several aspects of each application, such as which information is provided to the predictor, the sensors available to observe the surroundings, and the model complexity, among others. For example, if the state vector for each vehicle is its historical trajectory defined by its position over time, the state vector $\tau_i \in \mathbb{R}^{t \times 2}$ is defined as:

$$\tau_i = \{(x_i, y_i)_0, (x_i, y_i)_1, \ldots, (x_i, y_i)_t\}, \quad t \in \mathbb{N}^* \tag{1}$$

in which the tuple $(x_i, y_i)_k$ is the position of traffic participant $i$ at time $k$, and $t$ is the length of the observation.

The goal of the trajectory predictor ($\Im$) is to estimate the trajectory sequences ($\zeta$) of each traffic participant $v_i$ (except the ego vehicle) given the features of the model. In this sense, the output of the predictor is all future trajectories of traffic participants, such that $\zeta_i$ is defined as follows:

$$\zeta_i^{t:t+\Delta t} = \{q_i^{t}, \ldots, q_i^{t+\Delta t}\} \tag{2}$$

in which $i$ is the index of the target vehicle, $t$ is the current time frame, $\Delta t$ is the time horizon, and $q_i^k$ is the predicted state of the target vehicle at time frame $k \in \{t, \ldots, t + \Delta t\}$.

The predicted state $q_i$ is usually defined by the position of the vehicle. However, it is also possible to enhance the output with the prediction of other information, such as orientation, velocity, and acceleration.

Finally, the predictor $\Im$ is a function of many parameters. These parameters are the input of the model, and are related to the proposed prediction approach. Usually the inputs of intention-aware and interaction-aware trajectory prediction methods are the historical states of the vehicles ($V$), road geometrical features ($G$), maneuver intention ($M$), and interaction modeling features ($I$). Therefore, $\Im$ is defined by:

$$\Im(V, G, M, I) = \{\zeta_1, \ldots, \zeta_n\}, \quad n \in \mathbb{N}^* \tag{3}$$

It is important to highlight that there are some variations in the input parameters of the predictor. For example, if the maneuver intention is not explicitly considered, as in some interaction-aware approaches, the predictor can be defined as $\Im(V, G, I)$. The same happens in intention-aware predictors that do not consider the interaction between traffic agents, which can be defined as $\Im(V, G, M)$.

IV. INTENTION-AWARE TRAJECTORY PREDICTION

Intention-aware trajectory prediction is a category of approaches that takes into account the maneuver intention of the target vehicle to predict its future state. Intention or maneuver prediction is an important component of the perception system of autonomous vehicles [19]. It provides the likelihood of traffic participants performing maneuvers belonging to a finite set of possibilities [20]. This information contributes to safety in traffic scenarios and has been integrated into many automotive systems such as Advanced Driver Assistance Systems (ADAS) and autonomous vehicles [19]–[21].

According to Katrakazas et al. [7], a maneuver is a high-level abstraction of motion that behaves similarly whenever performed. Maneuvers are characterized by a continuous sequence of vehicle states. Therefore, some maneuvers are considered the primitive motions of each vehicle during its interaction in traffic, such as lane change, going straight, and turning left/right, among others. In this sense, the maneuver intention is the driver's consideration towards performing a maneuver [19]. There is a wide range of factors that influence this decision and the performance of maneuvers. According to AbuAli and Abou-zeid [21], the following stand out: individual resources, which are physical, social, psychological, and mental features of the driver; knowledge or skills, which represent the driver's experience and knowledge about the driving task; environmental factors, which are surrounding features such as weather, traffic conditions, and vehicle status; and workload and risk awareness, which address features of defensive driving such as risk assessment, situation awareness or traffic attention, and the effort expended for the task.

As stated in Xing et al. [19] and Doshi and Trivedi [22], drivers' decisions are naturally hierarchical, and road-level decisions can be divided into strategical, tactical, and operational sub-levels. Strategical decisions are high-level motion plans such as routes and comfort parameters. Alternatively, tactical decisions are short-term decisions that lead to general goal satisfaction, which can also be regarded as lateral and longitudinal maneuvers. Finally, the operational level, also known as the control level, is responsible for estimating control commands.

Intention prediction mainly focuses on the tactical level of decisions. Thus, there are basically two classes of maneuvers: lateral (e.g., left/right lane-change and keep-lane) and longitudinal (e.g., accelerate, decelerate, and keep-speed) [19]. However, in intention-aware trajectory prediction, each maneuver is a task independent of other agents. Thus, for example, if the vehicle is performing a lane change, its movement is assumed not to be altered by the presence of other vehicles in the environment. This is an unrealistic assumption in most cases, since during the lane change, the movements of other vehicles must be constantly monitored to avoid collisions. In addition, Xing et al. [19] highlighted that drivers are part of an interactive environment; therefore, their actions are also results of this interaction. That is, the actions might depend on each other as a result of this interaction.

There are different methodologies for using maneuver intentions. We describe below some of the approaches that stood out in the primary studies listed in the review.

• Trajectory Similarity: It uses algorithms to classify trajectories according to the similarity of motion patterns using reference trajectories. Unsupervised learning algorithms, such as clustering algorithms, are often used. It is also possible to use probabilistic models to learn the representation of each maneuver motion pattern.
• Individualized Motion Models: It builds motion models for each maneuver class, either by using curve regressions (e.g., Splines or Bezier) to represent each maneuver, or by applying motion equations in filters (e.g., Kalman Filter) or sampling techniques (e.g., Monte Carlo simulation).
• Input Features: The intention prediction can also be used as an input feature of trajectory predictors. It is possible to use it as a conditional variable of probabilistic or autoencoder frameworks, in the feature vector of learning-based models, or to select specific features to feed the predictor.
• Specialized Learning Models: The future trajectory of the vehicle is estimated using learning models that specialize in each motion pattern. For instance, in multi-modal encoder-decoder architectures, different decoders predict the trajectory for each intention.

V. INTERACTION-AWARE TRAJECTORY PREDICTION

Interaction between traffic participants is one of the main and most challenging components of autonomous systems, because autonomous vehicles are present in traffic alongside other vehicles, whether autonomous or driven by humans [14]. In this way, all the actions of these agents might influence the actions of others, creating a system of closely interconnected actions [5]. In addition, there are several ways in which traffic participants interact, since vehicles interact with each other, with vulnerable road users (e.g., pedestrians and cyclists), and also with the surroundings, which means that they need to consider navigation rules, social behaviors, traffic signs, and other traffic rules.

Moreover, interaction-aware trajectory prediction methods extract more reliable and realistic information from the surroundings, which allows them to predict long-term trajectories with more accuracy than other methods [5], [13]. However, creating these methods is far more challenging than creating the others, because interaction modeling is still an open challenge in the autonomous vehicle research field [13].

Many factors increase the difficulty of interaction modeling, including: the complexity of predicting interaction-aware maneuver intention; the wide range of traffic scenarios; the number of traffic rules; the heterogeneity of traffic participants (e.g., cars, buses, trucks, motorcyclists, cyclists, and pedestrians); traffic participants that disobey traffic rules; the stochastic behavior of traffic participants; the real-time constraints of the application; the complexity of defining spatio-temporal relationships between traffic participants; and the different contributions that each traffic participant makes to the actions of the target vehicle [5], [23], [24].

Therefore, there are also different interaction-aware frameworks. We describe below some of the approaches that stood out in the primary studies listed in the review.

• Spatial Relationship: This is a representation of how the position of each traffic participant influences the future position of the target vehicle. It can be represented by either a dense or a sparse data structure. Examples of dense models are Occupancy Grids, Bird's Eye View images, and frontal images. Meanwhile, graph structures are a sparse spatial representation of the traffic scenario.
• Risk Assessment: Risk assessment metrics, such as time-to-collision, time-to-react, and others, are also used to represent an interaction between traffic participants. They can be used either as an input feature of the predictor, or as an interaction heuristic to assess trajectory samples.
• Traffic Rules Compliance: Compliance with traffic rules can also be used either as an input feature of a framework or as a heuristic to assess trajectory samples. There are many traffic rules, but the models presented in the literature are limited to a few of them, such as speed limits, stop signs, and traffic lights.
• Social Behaviors: These are navigation rules that are widely followed by drivers, although they are not strictly mandatory. An example of a social rule is the spacing kept between a following vehicle and its leader. Other social navigation rules are average driving speed between traffic participants, maneuver execution steps, and priorities. These rules provide features either for the prediction step or for evaluating trajectory samples.
• Interdependent Actions: Another important interaction approach is the interdependence of the traffic participants' actions. For instance, it can be measured by probabilistic models, which estimate the conditional probability distribution of the maneuver intention of traffic participants. Game theory techniques are also applied for modeling the influence of each agent's action on the traffic scenario representation.

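The risk assessment cue from the list above, time-to-collision (TTC), can be sketched for two vehicles in the same lane. This is a minimal sketch under a constant-speed assumption; the function names, the tuple format used for candidate samples, and the 3 s threshold are hypothetical and are not drawn from any reviewed study. It shows TTC used as a heuristic to filter and rank trajectory samples.

```python
import math

def time_to_collision(gap_m, follower_speed, leader_speed):
    """Constant-speed TTC between a follower and its leader in the same lane.

    gap_m: bumper-to-bumper distance in meters (must be non-negative).
    Speeds are in m/s. Returns math.inf when the follower is not closing in.
    """
    if gap_m < 0:
        raise ValueError("gap_m must be non-negative")
    closing_speed = follower_speed - leader_speed
    if closing_speed <= 0:
        return math.inf   # gap is constant or growing: no collision predicted
    return gap_m / closing_speed

def rank_samples_by_risk(samples, ttc_threshold_s=3.0):
    """Keep samples whose TTC is at or above the threshold, safest first.

    samples: list of (label, gap_m, follower_speed, leader_speed) tuples,
             a stand-in for candidate trajectories (hypothetical format).
    """
    scored = [(time_to_collision(g, vf, vl), label) for label, g, vf, vl in samples]
    kept = [(ttc, label) for ttc, label in scored if ttc >= ttc_threshold_s]
    return [label for ttc, label in sorted(kept, key=lambda item: -item[0])]

candidates = [
    ("keep_speed", 20.0, 25.0, 20.0),   # closing at 5 m/s
    ("accelerate", 20.0, 30.0, 20.0),   # closing at 10 m/s
    ("decelerate", 20.0, 18.0, 20.0),   # not closing in
]
ranked = rank_samples_by_risk(candidates)
```

The same score could instead be appended to the feature vector of a learned predictor, which corresponds to the other use of risk metrics named in the list.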
TABLE I: Analytical Methods - Primary Studies Summary into a joint motion models. The interaction module estimates
Technique Type Primary Study Maneuver/Interaction non-collision states for all traffic participants from a set of the
Cosine or
Sine Functions
Intention [25]–[31] lane change, lane-keeping possible future positions predicted using the motion models.
Interacting Multiple Intention [27], [28] lane change, lane-keeping Schreier, Willert, and Adamy [30], [31] employed a discrete
Models Interaction [32], [33] risk assessment
lane change, lane-keeping, and static Bayesian Network (BN) for intention prediction
[30], [31], [34]
Intention
[35], [36]
turn left/right, follow vehicle, using road structure features such as the existence of neigh-
Path Following brake, overtaking
Models risk assessment, boring lanes or the time to an intersection, and also physical
Interaction [37]–[40]
spatial relationship features (e.g, velocity and acceleration). This BN is able to
lane change, lane-keeping,
Regression Intention [36], [41]–[45]
evasion, brake estimate the likelihood of six maneuvers (i.e. left/right lane
Curves
risk assessment change, turn to the left/right, no maneuver, follow vehicle,
Interaction [37], [46]–[48]
spatial relationship
Non-linear spatial relationship lane-keeping, and brake). After this, a specific path-following
Interaction [49]
Optimization traffic rules compliance function performs the trajectory estimation for each maneuver.
There are many path-following techniques from both trajectory
VI. T ECHNIQUES FOR T RAJECTORY P REDICTION planning and microscopic traffic simulation perspective [39].
The choice of techniques in each work takes into account
Several techniques have been used for either intention-aware characteristics such as comfort, simplicity, computational cost,
or interaction-aware trajectory prediction, due to so many as- model inputs, interaction between traffic participants, road
pects related to intention prediction, interaction modeling, and geometry, or traffic scenarios.
time series analysis. These techniques extend by analytical, Annell, Gratner and Svensson [35] proposed an intention-
probabilistic, and machine learning models. In addition, there aware trajectory prediction by using a weighted function
are also architectural design, and auxiliary techniques, which that combines the result of prediction using motion equation
aim to improve the performance of other approaches. Hereby, and reference trajectories of the desired maneuver. Other
in order to answer RQ1, we highlight the main approaches approaches also adopt weighted functions to combine the
and topics for intention-aware or interaction-aware trajectory result of different predictions [36], [38], making it possible
prediction. to manually adjust the weights of each module, or learn them.
In turn, Jeong and Yi [40] applied a Bidirectional Long
A. Analytical Methods Short-Term Memory (Bi-LSTM) to predict lane change inten-
Trajectory prediction using analytical methods is inspired by tion, and path-following models for trajectory prediction. The
studies on trajectory planning, in which mathematical models path-following models are parametrized as the maximum yaw
are used to design the trajectory of vehicles. Parametric curves rate for lateral motion and desired velocity for longitudinal
are generally used, such as Splines, Bezier, and Clothoids. In motion. An interaction module eliminates unrealistic trajecto-
addition to the regression of curves, it is possible to use the ries by using a cost function that considers the maneuver like-
motion equations of the agents, which can be used alone or lihood and collision estimations. Alternatively, Song et al. [37]
together with filtering techniques, such as the Kalman Filter. used Gaussian Mixture Models (GMM) with Hidden Markov
Table I shows a summary of primary studies that applied Models (HMM) to predict lateral (i.e., lane change and lane-
analytical methods for trajectory prediction. keeping) and longitudinal (i.e., yield and not yield) maneuver
He et al. [25] used a Dynamic Bayesian Network (DBN) intentions. To estimate the future states of interacting vehicles,
for predicting lane change and lane-keeping intention on they used motion equations that take into account no-collision
highways, and a cosine function for trajectory estimation. and comfort criteria for longitudinal maneuvers, and fifth-order
Similarly, Woo et al. [26] predicted maneuver intentions using polynomial profile for the lateral maneuvers.
multi-class Support Vector Machine (SVM), and applied a si- Other approaches have also used curves to estimate the
nusoidal and potential field functions to generate, respectively, trajectories according to the prediction of intention [41]–
lateral and longitudinal trajectories. Sinusoidal functions are [43], [46], such as in Lienke et al. [45], which predicts lane
often used for lane change path estimation [26]–[31]. change intention using multi-class SVM, and for each class it
Xie et al. [27] also combined physical-based and maneuver- estimates coefficients of a cubic polynomial function. Another
based approaches for trajectory prediction. They used a DBN possibility for trajectory prediction is to use optimization
for estimating driving behavior and a sine function for es- techniques to estimate the future states of vehicles, subject to
timating the long-term trajectory of lane-change maneuvers. model constraints. For instance, Ding and Shen [49] proposed
In addition, they used Interacting Multiple Models (IMM) to a non-linear optimization subject to cost functions that take
combine the long-term prediction with a short-term estimation into account interaction features, such as spatial relationship,
of an Unscented Kalman Filter (UKF). Zhang et al. [28] also traffic rules compliance (i.e., red light and speed limit), and
applied sine function for lane-change prediction, which was road geometry.
combined with Constant Turn Rate and Acceleration (CTRA)
motion model using IMM. Alternatively, Lefkopoulos et al.
[32] employed IMM with Kalman Filter (KF) to combine B. Sampling Methods
physical-based, maneuver-based, and interaction properties. These techniques draw a sampling of future states of the
This framework has a motion model for each specific lon- traffic participants. Thus, they are more robust to noise and
gitudinal and lateral driving intention, which are integrated uncertainty, since instead of predicting a single trajectory, the
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 6

output of the methods is a distribution of the future states of the vehicle. In addition, it is also possible to rank each sample according to different metrics, for example, taking into account the interaction (e.g., spatial relationship, risk assessment, and traffic rules compliance) or the maneuver intention.

Fig. 2: Example of Sampling Methods for Trajectory Prediction. Panels: (a) Trajectory Segment Sampling; (b) Particle Sampling. Each panel shows vehicles labeled EV, TV, A, and C and the maneuvers Left Lane Change, Lane Keeping, and Right Lane Change.

There are basically two kinds of sampling: generating multiple trajectory segments or generating particle states. Figure 2 shows an illustrative example of both sampling techniques according to the maneuver intention. These samples are usually generated by path following models or motion equations [50], [51], but can also be the result of learning or probabilistic techniques [52]. For instance, Houenou et al. [34] performed trajectory prediction by combining a model-based approach assuming Constant Yaw Rate and Acceleration (CYRA) and a maneuver-based method that matches the past trajectory with the centerline of the current driving lane. After detecting the maneuver, a set of trajectories is generated and the best one is selected by minimizing a cost function. Alternatively, Tran and Firl [53] used a normalized three-dimensional Gaussian Process (GP) regression model to learn motion patterns of vehicle behavior at an intersection, and applied the Monte Carlo simulation method for multimodal trajectory prediction. Similarly, Wissing et al. [39] proposed an interaction-aware trajectory prediction, which relies on Monte Carlo simulation (MCS) to model interactions and predict the distribution of possible future positions of the target vehicle. The approach uses the Intelligent Driver Model (IDM) [54], [55] to take into account the interactive behaviors of traffic participants. At each iteration of the MCS, a lane-change driving model that considers three possible lateral maneuvers (i.e., lane-change left/right and lane-keep) spreads the particles, and also addresses specific characteristics of the driving scenario.

C. Probabilistic Models

Probabilistic frameworks are also used for trajectory prediction. Similar to sampling models, they can produce a probability distribution over the future positions of the traffic participants. They are especially used when prediction frameworks are designed to model the noise and other uncertainties in the observations and prediction models. These methods are also powerful tools for modeling conditional relationships, for example, the future state of the target vehicle conditioned on its maneuver intention. Table II shows a summary of primary studies that applied probabilistic models for trajectory prediction.

TABLE II: Probabilistic Models - Primary Studies Summary

Technique | Type | Primary Studies | Maneuver/Interaction
Gaussian Mixture Models | Intention | [28], [44], [56]–[59] | lane change, lane-keeping
Gaussian Mixture Models | Interaction | [33], [37], [60], [61] | risk assessment, interdependent actions
Gaussian Process | Intention | [53], [62], [63] | turn left/right, lane change, lane keeping, go-straight, stop-and-go
Gaussian Process | Interaction | [39], [64] | risk assessment, spatial relationship
Markov Chains | Interaction | [65] | interdependent actions
Hidden Markov Models | Intention | [36], [43] | lane change, lane keeping, turn left/right, overtaking
Hidden Markov Models | Interaction | [33], [37], [60] | interdependent actions
Dynamic Bayesian Network | Intention | [25], [62] | lane change, lane keeping
Dynamic Bayesian Network | Interaction | [66]–[68] | interdependent actions, spatial relationship
Partially Observable Markov Decision Process | Interaction | [37], [69], [70] | interdependent actions

Hu, Zhan and Tomizuka [56] proposed a Semantic-based Intention and Motion Prediction (SIMP), which uses multiple 2D Gaussian Mixture Models (GMMs) to model the probability distribution of motion patterns in driving scenarios, and a Deep Neural Network (DNN) to estimate the probability of entering the intersection area. Other approaches also used GMM to model individual motion patterns [44], [57]. For instance, Deo, Rangesh and Trivedi [33] presented a framework built with three layers. The first layer is maneuver recognition using a Hidden Markov Model (HMM). The second layer is a trajectory prediction using Interacting Multiple Model (IMM) and Variational Gaussian Mixture Models (VGMM). Finally, the third layer is a vehicle interaction module that takes into account the context of the surroundings by minimizing an energy function that considers the maneuver likelihoods and the state of all surrounding vehicles.

In turn, Liu et al. [62] presented a Driver Characteristic and Intention Estimation (DCIE) using a Dynamic Bayesian Network (DBN), with vehicle trajectory prediction using a Gaussian Process (GP). The DCIE method is able to estimate the driver intention and also the driving style (i.e., prudent, stable, and aggressive driving), using a two-layered DBN. The vehicle trajectory is then predicted using GP according to each combination of intention and driving style. Alternatively, Tran and Firl [63] used multiple GPs for maneuver intention prediction, and combined them with an Unscented Kalman Filter (UKF) for long-term trajectory prediction. In turn, Guo et al. [64] applied GP to model different driving patterns, and also considered spatial interaction among traffic participants.

Reachable sets are another stochastic framework for motion prediction, which considers different driving situations or patterns simultaneously. Althoff, Stursberg and Buss [65] proposed a trajectory prediction with two stages. The first stage discretizes the reference trajectory with all possible routes of
the driving scenario. The second stage models the vehicle dynamics with hybrid automata (i.e., with discrete and continuous variables), whose discrete states are longitudinal maneuvers (i.e., acceleration, deceleration, standstill and speed limit). This framework takes into account the interdependence among vehicle actions, and uses a Markov Chain (MC) to represent the behavior of each traffic participant.

Hidden Markov Models (HMMs) are another probabilistic framework used for trajectory prediction. Li et al. [60] presented a two-layer framework for interaction-aware trajectory prediction using HMMs, which takes into account the conditional dependence of vehicle behaviors. This technique can also be applied for intention prediction [33], [36], [37], [43], or to model specific motion patterns.

Schulz et al. [66], [67] developed an interaction-aware trajectory prediction using DBN. Their approach also takes into account the interdependent actions of traffic participants based on a Markov process, where the agents' decision-making was divided into three hierarchical stages: the route intention, the maneuver intention, and the continuous action. Similarly, Gill, Pisu and Schmid [68] proposed a DBN for trajectory prediction, considering driving styles (i.e., aggressive and passive), maneuver intention, driver states, and spatial interaction among traffic participants.

Partially Observable Markov Decision Process (POMDP) is another probabilistic framework that can model the interdependence between vehicle actions. This technique is a variation of the standard Markov Decision Process (MDP) in which the state of the agents is not fully observable; therefore, it maps states to actions while handling uncertainty [71]. Rhinehart et al. [69] proposed a multi-agent trajectory prediction system using POMDP (PRECOG), which considered the vehicle interactions and was conditioned on goals.

D. Encoder-Decoder Deep Learning Architectures

This is the main approach for trajectory prediction that uses deep learning, both for sequential (e.g., Recurrent Neural Networks) [15], [59], [72]–[74] and static (e.g., Convolutional Neural Network) models. It was first introduced in sequence-to-sequence deep learning models, and has been widely applied to machine translation, speech recognition, and time series analysis [75].

An encoder-decoder architecture is composed of two modules, an encoder and a decoder [76]. The first module (encoder) is responsible for extracting features from the input vector, and can be built using different techniques, such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), or Graph Neural Network (GNN). It is also possible to combine techniques to create the feature vector, for instance, using static and sequential networks for spatial and temporal features, respectively. Finally, a decoder network estimates trajectories using the feature vector as input. There are also approaches that use a multi-modal decoder, that is, an architecture with multiple decoder networks. Each decoder is specialized in estimating specific trajectory classes, for example, decoders that specialize in estimating trajectories for each class of maneuver intention [61], [73], [77].

E. Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are variations of Artificial Neural Networks (ANNs) with recurrent connections between neurons, developed to process sequential data that might have temporal dependencies [83]. According to Salehinejad et al. [83], recurrent units update their states using previous states, which are stored in a memory-like structure. This characteristic allows RNNs to learn long-term patterns. Moreover, trajectory prediction inherits part of its theoretical background from time series prediction. Thus, an inherent characteristic is the temporal dependence between features from different time frames. Therefore, sequential models are very important because they can handle temporal dependence and also extract temporal features from the input data. There are many approaches that use these methods and variations to build reliable prediction models for both intention-aware and interaction-aware trajectory predictions. Table III shows a summary of the recurrent neural networks found through the analysis of the primary studies, which were applied either for feature extraction or for trajectory regression.

Xin et al. [95] employed two LSTMs for intention and trajectory prediction. In this architecture, the first LSTM estimates the likelihood of each maneuver using a sequence of lateral features, and the second LSTM uses the maneuver likelihoods and motion features to forecast the trajectory. A similar approach was adopted by other primary studies, either using recurrent neural networks for intention prediction [95], clustering algorithms [48], or other machine learning techniques [96]. For instance, Deo and Trivedi [72] applied an encoder-decoder deep learning architecture with two LSTMs to predict the intent of six maneuvers of vehicles on a highway. The architecture predicts three lateral maneuvers (i.e., left lane change, right lane change, and lane-keeping) with two longitudinal motion variations (i.e., brake and normal), using the historical trajectory as the input of the encoder network.

Multi-modal encoder-decoder architecture is another approach for trajectory prediction [58], [77], [94]. For instance, Xing, Lv and Cao [58] presented an intention-aware trajectory prediction based on driving style recognition, in which a Gaussian Mixture Model (GMM) clustering classifies styles into ‘Conservative’, ‘Moderate’, and ‘Aggressive’. For each class, a specific MLP (Multi-layer Perceptron) predicts the trajectory, also using the features extracted from an LSTM. Alternatively, Fei, He and Ji [77] proposed an interaction-aware framework with decoders for each maneuver intention (i.e., lane change and lane keeping), where the predicted trajectory is a linear combination of the decoders' outputs.

A widely adopted interaction-aware approach estimates a prior trajectory and then refines it using different methods. For instance, Wang et al. [47] combined knowledge reasoning and LSTM networks for interaction-aware trajectory prediction. This framework is divided into two layers. Initially, reference trajectories are estimated using cubic Bézier curves and the intention of the target vehicle. This intention is estimated through rule-based reasoning using a knowledge base built from traffic rules and driving experiences (i.e., social navigation rules). The second layer predicts the final trajectory based

TABLE III: Summary of Recurrent Neural Networks for Trajectory Prediction

Technique | Description
Long Short-Term Memory (LSTM) | A recurrent neural network architecture that introduced the concept of gates to control the amount of information that either is stored or passes through each cell (unit component of the network). Each cell has three gates, named input, output and forget gates, in addition to a memory cell (c_t) [78], [79].
Gated Recurrent Unit (GRU) | It is a gated recurrent neural network with two gates, called reset and update gates. Unlike LSTM, it does not have a memory cell (c_t). Therefore, it has fewer parameters than LSTM. However, the literature has not yet shown relevant differences in the performance of the two networks for trajectory prediction [80], [81].
Relational Recurrent Neural Network | It is an augmented-memory based recurrent neural network, which allows interaction between memories using a Multi-head Attention Mechanism. This mechanism helps to reduce the locality bias of vanilla RNNs [74], [82].
Stacked LSTM | This is an architectural approach to adding depth to RNNs by hierarchically organizing LSTM networks into layers. The input of a hidden layer is the hidden state vector of its previous layer [15], [83].
Minimal Recurrent Neural Network | It is a gated recurrent neural network that prioritizes minimal operations within its cell, which has only the update gate. To do that, it first maps the input to a latent space. Finally, it updates the hidden state using its previous state and the latent representation of the input [84], [85].
Convolutional LSTM | This is another variation of the LSTM network, which uses the convolution operation instead of the standard matrix multiplication of LSTM networks. Therefore, it can process sequential images or other grid representations [86]–[90].
Bidirectional LSTM | It is a variation of LSTM networks that estimates forward and backward hidden sequences. This feature improves the performance of the network in many tasks that deal with sequence data and temporal dependencies. However, Bi-LSTM has a higher computational cost than standard LSTM [40], [83], [91], [92].
Spatio-Temporal LSTM | It is based on Structural-RNN [93] and can handle spatio-temporal interactions using LSTM networks. The first step extracts temporal features for each traffic participant using an LSTM. These features are concatenated and another network extracts the interactive spatial features. An attention mechanism is used to differentiate the contribution of different traffic participants. In addition, the architecture also uses residual connections between LSTM cells [94].
Structural LSTM | It is composed of multiple LSTM cells, one for each vehicle, which process the past trajectory of each vehicle separately. However, each LSTM cell receives as input the states of a specific vehicle and those of its surrounding vehicles. The output is a context vector encoding the interaction of the vehicles [15].
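The gate structure summarized in Table III can be made concrete with a minimal sketch. The code below is not taken from any of the cited studies: it assumes the common textbook formulation of the LSTM and GRU cells, and the helper names (`affine`, `init_layer`, `lstm_step`, `gru_step`), the layer sizes, and the toy input track are all hypothetical choices made for illustration. It shows one LSTM step (input, forget, and output gates plus the memory cell c_t) and one GRU step (update and reset gates, no memory cell), and counts the parameters of each.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_layer(rng, n_out, n_in):
    """Small random weight matrix and zero bias (illustrative initialization)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def count_params(layers):
    return sum(len(W) * len(W[0]) + len(b) for W, b in layers.values())


def lstm_step(p, x, h, c):
    """One LSTM step: three gates plus a memory cell c."""
    z = h + x  # concatenate previous hidden state and current input
    i = [sigmoid(a) for a in affine(*p["input"], z)]
    f = [sigmoid(a) for a in affine(*p["forget"], z)]
    o = [sigmoid(a) for a in affine(*p["output"], z)]
    g = [math.tanh(a) for a in affine(*p["candidate"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def gru_step(p, x, h):
    """One GRU step: update and reset gates, no separate memory cell."""
    z = h + x
    u = [sigmoid(a) for a in affine(*p["update"], z)]
    r = [sigmoid(a) for a in affine(*p["reset"], z)]
    gated = [rj * hj for rj, hj in zip(r, h)] + x
    cand = [math.tanh(a) for a in affine(*p["candidate"], gated)]
    return [(1 - uj) * hj + uj * cj for uj, hj, cj in zip(u, h, cand)]


rng = random.Random(0)
D, H = 2, 4  # hypothetical sizes: (x, y) input, 4 hidden units
lstm = {k: init_layer(rng, H, H + D)
        for k in ("input", "forget", "output", "candidate")}
gru = {k: init_layer(rng, H, H + D)
       for k in ("update", "reset", "candidate")}

track = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.3]]  # toy past positions
h_lstm, c = [0.0] * H, [0.0] * H
h_gru = [0.0] * H
for x in track:
    h_lstm, c = lstm_step(lstm, x, h_lstm, c)
    h_gru = gru_step(gru, x, h_gru)

print(count_params(lstm), count_params(gru))  # prints "112 84"
```

With these sizes each gate or candidate layer has 4 × 6 weights plus 4 biases, so the LSTM holds four such layers and the GRU three, which is the parameter difference the table refers to.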

on temporal features and the reference trajectories. The rule-based reasoning divides rule instances into safety, legitimacy, and reasonableness, and uses an ontology to model the driving scenario. In turn, Ju et al. [97] proposed an Interaction-aware Kalman Neural Network (IaKNN), which uses LSTM networks to learn the process and measurement noises used to update trajectories previously predicted by the interaction and motion layers.

A matrix representation is another way to model interaction [98]–[100], where features can be extracted by either Convolutional Neural Networks (CNNs), CNNs followed by LSTMs, or Convolutional LSTM (Conv-LSTM) [24], [86], [89], [90]. Khakzar et al. [87] modeled the interaction of the traffic participants using an Occupancy Grid (OG) and a Risk Map (RM). A risk map is a grid that encodes the remaining time until a collision between vehicles using Time-To-Collision (TTC). Convolutional LSTM is applied to process the risk map. Similarly, Mukherjee, Wang, and Wallace [88] presented a framework which uses an OG for spatio-temporal interaction, where Conv-LSTM layers extract spatio-temporal features. Chandra et al. [24] designed an image-based end-to-end trajectory predictor using Conv-LSTM for dense traffic situations with heterogeneous agents (e.g., cars, buses, and pedestrians).

There are some variations of networks based on LSTM for trajectory prediction, such as Bidirectional LSTM [40], [92], Stacked LSTM [15], [101], Structural LSTM [15], and Spatio-Temporal LSTM [94]. Hou et al. [15] proposed a hierarchical RNN using an encoder-decoder architecture with two-layered LSTMs (Stacked Long Short-Term Memory) to model the interaction of traffic participants in a highway scenario. They used Structural-LSTM instead of standard LSTM networks. In turn, Dai, Li and Li [94] presented a long-term interaction-aware trajectory prediction in dense traffic using the historical trajectories of each vehicle and Spatio-Temporal LSTMs.

Other recurrent neural networks applied for trajectory prediction are the Relational Recurrent Neural Network (Relational-RNN) [82] and the Minimal Recurrent Neural Network (Minimal-RNN) [84]. Messaoud et al. [74] used an Occupancy Grid to encode the position of the vehicles, and an encoder-decoder Relational-RNN for trajectory prediction. In Relational-RNN, the input feature and memory are related through an attention mechanism [74], [82]. Alternatively, Min et al. [85] proposed an interaction-aware trajectory prediction using a Deep Ensemble with Minimal-RNN for spatial relationship modeling among traffic participants.

F. Convolutional Neural Networks

Convolutional Neural Network (CNN) is a successful deep learning technique, which extracts features from matrix data representations using convolutional operations [102]. When applied for trajectory prediction, it can extract spatial features from Occupancy Grids (OG) and other data representations (e.g., bird's eye view image of the surroundings). In addition, CNNs have been especially used for vision-based trajectory prediction frameworks, which are approaches that use images as input features.

Deo and Trivedi [103] presented an encoder-decoder architecture using Long Short-Term Memory (LSTM) and Convolutional Social Pooling (CSP). The encoder is composed of LSTMs with shared weights for each vehicle of the traffic scene. The interaction and spatial features are extracted through a CNN called Convolutional Social Pooling (CSP), which receives as input a spatial grid representation (i.e., a 13×3×m matrix relative to the position of the target vehicle) encoding the states of the LSTM. The decoder network combines the output of the CSP and the target vehicle LSTM, and predicts the distribution of future motion and also the future
vehicle’s behavior (i.e., lateral and longitudinal maneuvers). Song et al. [104] also applied CSP for interaction-aware trajectory prediction, but they also considered the planned trajectory of the ego-vehicle within their framework. Alternatively, Mo, Xing and Lv [105] used a 3 × 3 × m grid, in which the embedded features of the target vehicle and surrounding vehicles are stored, respectively, in the center and edge cells of the grid. In addition, convolutional neural networks have also been used to extract geometrical features from the surroundings [23], [51], [106], [107].

A grid structure can represent other interaction features besides spatio-temporal properties. In Krüger et al. [108], a module called traffic scene context evaluation was built to extract features from a 3D spatio-temporal context tensor using a 3D convolutional kernel. This context tensor represents several interaction features (i.e., road geometry, spatial relationship, and traffic compliance rules), which were modeled using a potential field. Voxel representation is another way to deal with 3D features (e.g., point cloud). For instance, Ye, Cao and Chen [109] applied sparse convolution to learn spatial features from a voxel representation.

Vision-based prediction is one of the major applications of CNNs for trajectory prediction in autonomous vehicles, in which the network extracts features from images, usually from frontal cameras. Du et al. [110] proposed a three-stage motion prediction with obstacle detection, optical flow estimation, and trajectory prediction. The input of the predictor is a six-channel image with the detections and optical flow results. Another data representation used with CNNs is the Bird's Eye View (BEV) image [111], [112], which shows a top view of the traffic scenario. A BEV image can be built from several sources, such as projected frontal images, stereo camera images, aerial images, LiDAR point clouds, Occupancy Grid (OG), or High Definition Maps (HD-Maps). Similar to grids, these images can also represent road geometry, spatial relationship, and other interaction features. Cui et al. [113] used CNNs to learn features from a BEV image of HD-Maps with road geometry and the positions of other traffic participants. Similarly, Sadeghian et al. [114] combined top view images from the traffic scenario with the past trajectory of the agents. In addition, the authors proposed a visual attention module that predicts important areas of the image, which improves the performance of the framework.

G. Generative Adversarial Networks

Another successful deep learning framework is the Generative Adversarial Network (GAN), which is also applied for trajectory prediction. This framework has two components, called generator and discriminator. The generator learns how to create patterns that follow a data distribution from a given input, e.g., trajectories. In turn, the discriminator learns to distinguish the generator's synthetic output from real data [115].

There is a wide range of possibilities to build GAN architectures. Roy et al. [116] employed a GAN for trajectory prediction of vehicles using aerial images. The training procedure consists of generating trajectories similar to the ground-truth (real trajectories), until the generator produces trajectories indistinguishable from the real ones. Similarly, Hegde, Dash and Agarwal [117] used GAN in a three-component architecture: the encoder-decoder generator, a pooling module responsible for modeling the interaction, and the discriminator. Li, Ma and Tomizuka [118] proposed a framework that uses as input the latent noise with the past trajectory of the vehicles. Li et al. [119] applied Signal Temporal Logic (STL) to model the navigation and traffic rules compliance in a trajectory predictor. The proposed architecture used a syntax tree created with the STL formulas to assess the degree of traffic rules satisfaction of synthetic trajectories in the discriminator of a GAN architecture. The rules used took into account navigation near the center of the lane, stopping area, and driving speed. Another application of GANs is Imitation Learning [120], [121]; for example, Si, Wei and Liu [122] proposed a trajectory predictor with online adaptation using Parameter Sharing Generative Adversarial Imitation Learning (PS-GAIL) [123].

H. Graphs

Graphs are versatile data structures capable of modeling both simple and complex relationships between entities. Entities are called vertices, which relate to each other by means of edges. Thus, the vertices and edges create a relational topological representation [138]. Moreover, topological graphs have also been applied to sparsely represent road geometry, waypoints, and directions, which play an important role in trajectory prediction. Similarly, graphs can also model spatial interaction between traffic participants by creating an agent-centric spatial structure, where the agents are the vertices and the relationships between them are the edges.

TABLE IV: Graph Neural Networks - Primary Studies Summary

Technique | Type | Primary Studies | Maneuver/Interaction | Graph Type
Graph | Intention | [124] | turn left/right, go-straight | digraph
Graph | Interaction | [85], [92], [125]–[127] | spatial relationship | spatio-temporal, undirected graph, heterogeneous digraph
Graph Convolutional Model | Interaction | [128] | spatial relationship | spatio-temporal
Spectral Graph Convolutional Network | Interaction | [106], [129], [130], [131] | spatial relationship | spatio-temporal, undirected graph, digraph, heterogeneous digraph
Spatial Graph Convolutional Network | Interaction | [132] | spatial relationship | undirected graph
Spectral Temporal Graph Neural Network | Interaction | [107] | spatial relationship | spatio-temporal
Graph Attention Network (GAT) | Interaction | [51], [106], [131] | spatial relationship | spatio-temporal, digraph, heterogeneous digraph
Edge-enhanced Graph Convolutional Network | Interaction | [133] | spatial relationship | weighted digraph
Graph Self-Attention Network | Interaction | [134] | spatial relationship | weighted digraph
Spectrum Graph-LSTM | Interaction | [135] | spatial relationship | undirected weighted graph
Semantic Graph Network | Interaction | [136] | spatial relationship, traffic compliance | heterogeneous digraph (semantic graph), spatio-temporal
Spatio-Temporal Graph Neural Network | Interaction | [137] | spatial relationship | heterogeneous digraph, spatio-temporal

As stated in Chami et al. [138], because of the versatility of graphs, there are also many machine learning techniques specialized in learning features from graphs. Among the techniques, the Graph Neural Network (GNN) is a deep learning
architecture based on neural networks for graph embedding. That is, GNNs are deep learning techniques that learn how to represent vertices or the entire graph structure, taking into account the feature vector of each vertex, the edge connections, and the dynamism of the graph structures [139]. Table IV shows a summary of primary studies with GNN architectures applied for trajectory prediction.

The main concerns in graph-based architectures are the type of graph (e.g., undirected, digraph, heterogeneous, spatio-temporal, or others), and how to construct and extract features from them. The most common approach is to use graphs to represent only the spatial interaction between traffic participants [92], [129], [131], [132], [135]. For instance, in Diehl et al. [131], Spectral Graph Convolutional Network (GCN) and Graph Attention Network (GAT) were evaluated. The authors compared different methodologies for the use of graph networks, such as the use of residual weights and types of vertex connections (e.g., self connections, fully-connected, among others). Alternatively, Zhao et al. [129] combined different social features for trajectory prediction. They created an interaction feature vector by concatenating features from a Convolutional Social Pooling (CSP) [103], a two-layer Spectral GCN, and the historical trajectory of the target vehicle.

Chandra et al. [135] proposed an interaction-aware trajectory prediction that also predicts maneuver intentions (i.e., overspeeding, underspeeding, or neutral). This framework uses undirected weighted graphs to model the spatial relationship among traffic participants, and a rule-based algorithm to predict maneuver intention using the graph spectrum prediction from a Spectrum Graph-LSTM network. In turn, Weng, Yuan and Kitani [132] presented a 3D Multi-Object Tracking (MOT) and trajectory prediction using the interaction features embedded by Spectral GCN using undirected graphs. They predicted a multi-modal trajectory by using Conditional Variational Autoencoder (CVAE) and LSTM.

It is also possible to model the spatial interaction through time. In this case, the primary studies often use spatio-temporal graphs. A spatio-temporal graph is a graph structure defined by a set of vertices and two sets of edges, spatial and temporal. The spatial edges connect vertices at a specific time-slice, while the temporal edges connect vertices across different time-slices. For instance, Li, Ying and Chuah [128] proposed a framework for trajectory prediction using a spatio-temporal graph, in which a Graph Convolutional Model (GCM) extracts spatio-temporal features through a sequence of convolutional layers and graph operations. Similarly, Chen et al. [125] presented an encoder-decoder architecture, where the encoder has three LSTM networks for spatial-edge, temporal-edge, and node embedding. The features from each network are concatenated and fed to an LSTM decoder. In turn, Li et al. [51] applied an attention mechanism with a graph network, where two attention layers extract features from, respectively, the topological and temporal representations. In addition, they applied a kinematic constraint after the decoder to ensure feasible trajectory outputs using either Extended Kalman Filter (EKF) or Monte Carlo Simulation (MCS).

Alternatively, Cao et al. [107] proposed a framework that applies spectral convolutional operations in space and time, called Spectral Temporal Graph Neural Network (ST-GNN). They used Inverse Graph Fourier Transforms (IGFT) to combine both results, and also a Multi-Head Attention Mechanism (MHA) to reduce the propagation error for long time horizons. In addition to the interaction graph, the authors also modeled the environment using graphs built using Convolutional Neural Networks (CNNs). Ma et al. [127] proposed a framework called TrafficPredict that applied a heterogeneous spatio-temporal graph, which differentiates the class of each traffic participant (i.e., pedestrian, car, or cyclist) within the vertices. This approach allows the predictor to consider different characteristics in the movement of each class. There are other primary studies that applied graph neural networks using spatio-temporal graphs [92], [130], [137].

In addition to spatial interaction, some primary studies also used graphs to represent road geometry [90], [107], [124], [136], [137]. Quehl et al. [124] modeled the road network using a digraph with transition probabilities that express the tendencies of a target vehicle to choose the next waypoint. They used sample trajectories to build the probabilistic digraph based on features such as velocity and time of the day. In turn, Hu, Zhan and Tomizuka [136] designed a prediction framework using semantic graphs built using waypoints of a reference path from either a High Definition Map (HD-Map) or trajectory clustering. A semantic graph is a heterogeneous graph structure that represents the road geometry and also regulatory elements, e.g., traffic signs. Similarly, Pan et al. [137] also used a heterogeneous graph with vertices that represent vehicles and lane waypoints. However, they built a spatio-temporal graph with two types of temporal edges, i.e., vehicle-to-vehicle and vehicle-to-lane edges. In addition, an attention mechanism was introduced to differentiate the target vehicle in consideration towards its surrounding lanes.

The attention mechanism is another deep learning technique that is often used with graph-based trajectory prediction, which differentiates the importance of the adjacent vertices in the target vertex embedding [51], [106], [107], [127], [130], [131], [133], [134], [136], [137]. Ye et al. [134] proposed a Graph Self-Attention Network (GSAN) with a dynamic weighted digraph to model the interaction of traffic participants. The GSAN extracts spatial features of the traffic scenario, while a GRU extracts temporal features. In turn, Mo, Xing and Lv [106] presented a prediction framework built upon an encoder-decoder design, where GAT extracts spatial interaction features from a heterogeneous digraph. This graph structure has two types of vertices, i.e., vehicles and local road map. Alternatively, Jeon, Choi and Kum [133] applied a network called Edge-enhanced Graph Convolutional Network (EGCN), where the attention score estimation uses edge features. In this framework, the edge features are the relative state between two nodes, such as position, velocity, and heading angle.

I. Autoencoders

This is an unsupervised machine learning technique, based on artificial neural networks, that learns how to represent a data distribution. It is composed of two modules, the encoder and the decoder. The encoder maps the input to an intermediate data
representation (i.e., latent space), while the decoder maps this representation into the output pattern [140]. In this sense, a Conditional Variational Autoencoder (CVAE) is a modification of the Variational Autoencoder (VAE) that conditions both the encoder and the decoder on a conditional variable. This modification allows greater control over the data generation process, since there will be a distribution for each possible value of the conditional variable [140]. In trajectory prediction, this technique can also estimate the distribution of future vehicle positions [92], [132], [141]. Figure 3 shows a general framework for trajectory prediction, where the conditional variable is the maneuver intention. In this case, the encoder maps interaction and motion dynamic features to a latent space, and the decoder takes into account the maneuver intention to estimate future trajectories.
Fig. 3: A general framework for trajectory prediction using Conditional Variational Autoencoder. (Diagram blocks: Observed Trajectory, Encoder, Latent Space, Maneuver Intention, Decoder, Future Trajectory.)
In Lee et al. [23], an encoder-decoder framework using CVAE and Gated Recurrent Unit (GRU) is proposed. This system uses a CVAE to produce samples of the future positions of the target vehicle, and a ranking module to assess the most likely samples. Similarly, Cho et al. [142] used CVAE and LSTM to estimate multiple hypotheses about the future position of vehicles. However, besides the spatial relationships modeled through multiple hypotheses, they also used Signal Temporal Logic (STL) to exclude unreasonable scenarios based on compliance with traffic rules (e.g., speed limit), collision avoidance, and other social navigation rules. Dulian and Murray [112] applied CNN networks to extract spatial features from Bird’s Eye View (BEV) images of a High Definition Map (HD-Map), and predicted the future trajectories using a CVAE where the conditional variable was sampled from a prior distribution during the testing phase.
A common approach for trajectory prediction using CVAE found in the analysis of the primary studies was trajectory prediction conditioned on the maneuver intention [59], [61], [143]. Hu, Zhan and Tomizuka [61] designed a hierarchical probabilistic framework with two modules, the upper and lower modules. In the upper module, SIMP [56] was employed to predict the intention of traffic participants. It predicts the likelihood of performing a sequence of actions to occupy a specific space in the scenario. Meanwhile, the lower module was built using CVAE to obtain the distribution of the future states of the vehicle.
Choi et al. [126] proposed a goal-oriented multi-modal trajectory prediction based on the cause/effect implication of drivers’ intentions. They used spatio-temporal graphs to model the interaction among traffic participants, and used the interaction features to predict the maneuver intention and the trajectory. In addition, a heatmap was built using multiple possible trajectories for each vehicle drawn by the CVAE. Alternatively, Sriram et al. [90] presented a framework that predicts the multi-modal trajectories of all traffic participants at the same time. This architecture uses Convolutional LSTM and CVAE to perform, respectively, scene context feature extraction and trajectory prediction. Moreover, the trajectory prediction using the autoencoder network is also conditioned on the maneuver intention. In turn, Hu et al. [144] also proposed a multi-modal trajectory prediction framework based on CVAE. However, this framework only considered interacting scenarios between a pair of vehicles. Furthermore, Dynamic Time Warping (DTW) was used for intention prediction, comparing the sequence of historical trajectories of the target vehicle with reference trajectories for each maneuver intention.
Other autoencoder variations were also found in the analysis of the primary studies. Fei et al. [145] used a Conditional Wasserstein Autoencoder (CWAE), which replaces the KL-divergence of the CVAE with the Wasserstein distance [146] for adversarial training, i.e., similar to Generative Adversarial Networks (GAN). Furthermore, this framework has six modules: two for feature embedding (i.e., the context embedder and the single embedder), a prior generator, a posterior generator, a discriminator, and a decoder. The context embedder module is responsible for extracting spatial interaction features from the historical trajectories of the surrounding and target vehicles. Meanwhile, a Mixture Density Network (MDN) estimates the prior and posterior distributions, and recurrent neural networks and fully-connected neural networks build the discriminator and decoder.
Zhang et al. [147] proposed Stacked Sparse Auto-Encoders (SSAE) to deal with a multi-modal high-dimensional input vector with motion and interaction features. This technique is composed of several auto-encoders that use Convolutional Neural Networks (CNNs) to process a road occupancy grid of size 9 × 5 × m, which represents the target vehicle’s lane and its neighbor lanes. They also applied dilated convolutional social pooling to extract global and local interaction features.

J. Reinforcement Learning
This is another machine learning framework that has been applied for intention-aware and interaction-aware trajectory prediction. In a nutshell, this technique learns a transition policy between states, given that an agent performs some action in the environment. This is a highly interactive learning strategy, in which the agent must continuously interact with the environment. Each action causes a transition between states and is evaluated by a reward function. The purpose of the learning process is to maximize the reward function [148]–[150]. In trajectory prediction, this technique has been applied especially for learning human-like behaviors regarding interactions and decision-making. For instance, González, Dibangoye
and Laugier [70] proposed a framework for long-term traffic scene prediction using Inverse Reinforcement Learning (IRL) that learns, through demonstrations, a cost function that encodes the interaction and behaviors of traffic participants. This approach uses a grid representation to model the spatial relationship of the vehicles on highways, and uses a weighted linear combination of static features (time-invariant) and dynamic features (time-dependent) to evaluate states. The reward of states also penalizes situations that lead to dangerous scenarios.
Sun, Zhan and Tomizuka [151] employed Hierarchical Inverse Reinforcement Learning (HIRL) and game theory for long-term trajectory prediction considering intention, interaction, and also the future decisions of the ego vehicle. The decisions of a human driver are modeled using discrete (e.g., maneuvers) and continuous (e.g., smoothness, velocity, and acceleration) driving decisions. In addition, game theory is responsible for the discrete decision and explicitly considers the interdependency between the agents’ decisions, which is a consequence of their interactions. Finally, the future trajectories of the target vehicle are also conditionally dependent on the future plans of the ego vehicle. Similarly, Hu, Sun and Tomizuka [143] proposed a planning-based framework using Continuous IRL, in which the results are also conditionally dependent on the future trajectory of the ego vehicle.
Schwarting et al. [152] also applied game theory with IRL for trajectory prediction. However, they modeled a system that also took into account social behaviors and driving styles (e.g., altruistic, prosocial, individualistic/egoistic, and competitive). Moreover, the interaction was represented by a non-cooperative dynamic game, where Social Value Orientation (SVO) quantified the level of cooperation of each traffic participant (i.e., their driving style). These driving styles indicate how likely a driver is to restrain their own reward in favor of the reward of other agents.

K. Attention Mechanism
The attention mechanism is a machine learning technique first employed in Natural Language Processing (NLP) to cope with the drawbacks of recurrent neural networks (RNN) in processing long-term sequences. This mechanism is inspired by the selective process of human cognition in data processing [153]. According to Niu, Zhong and Yu [153], humans tend to selectively prioritize some information over others during the perception and cognition process. In summary, this technique estimates weights from the input information, which represent the contribution of each input to generating a desired output. Thus, as stated in Vaswani et al. [76], the self-attention mechanism, also known as intra-attention, is the attention function that relies on the input sequence itself to estimate the weights and the new context vector. This approach is used in many applications due to its simplicity and efficiency. Another type of attention mechanism is known as Multi-Head Attention (MHA), which performs self-attention K times in parallel, each over a different learned projection of the input. That is, the context vector is estimated through the combination of K attention estimations.
There are many applications of attention for trajectory prediction. For instance, it can learn the most relevant temporal and spatial features for prediction [134], [154], [155], or differentiate the contribution of each surrounding vehicle to the target vehicle trajectory [24], [51], [156]. In intention-aware frameworks, it can also differentiate the importance of features for each maneuver intention. In addition, this technique has also been applied in image processing to highlight features in specific areas of the image [157], [158].
Hao et al. [73] proposed an encoder-decoder deep learning architecture using GRU and a self-attention mechanism. This approach shares an encoder module for both intention estimation and trajectory prediction, and takes only the historical trajectory as input of the model. Yan et al. [154] explored an architecture with two different types of self-attention mechanisms, one for the driving context and one for the driving lane. The attention mechanism for the driving context is responsible for differentiating the importance of each vehicle present in the scenario. Meanwhile, the second attention mechanism takes into account that the trajectory of the target vehicle is more influenced by the vehicles in its target lane. Similarly, Kim et al. [159] also used a self-attention mechanism to focus on features from the target vehicle’s desired lane.
Kim, Kum and Choi [160] presented a recursive prediction framework, where the predicted trajectory is also used as an input feature in an LSTM-based encoder-decoder architecture. Similar to other approaches, the attention mechanism appears between the encoder and decoder components to selectively highlight specific features of the context vector before the decoder takes it as input [101], [161]. However, an attention mechanism can also extract features from specific contexts, such as in Zhang et al. [162], who proposed a module for social features only, which extracts spatial and temporal interaction features that are concatenated with other context vectors from other encoders before being fed to the decoder.
Furthermore, some approaches also applied attention before the encoder. For instance, Wang et al. [111] developed a framework for interaction-aware trajectory prediction using historical trajectories, a bird’s eye view (BEV) image of the road geometry, and raw sensor data (e.g., camera and point cloud images). In this framework, a prior module with GRUs and a CNN extracts features from the input, which are highlighted by an attention model. Finally, the output of the attention model feeds an encoder-decoder predictor.
Alternatively, Messaoud et al. [156] proposed an encoder-decoder framework using a multi-head attention mechanism. This framework builds a grid over the surrounding area, centered on the target vehicle, with an unrestricted number of vehicles. The attention module uses the grid to select the surrounding vehicles to pay attention to while predicting the target vehicle’s future trajectory. In turn, Mercat et al. [155] combined two multi-head self-attention layers in an encoder-decoder architecture. This architecture encodes the input trajectory using an LSTM network, and feeds it to the first attention layer. Thereafter, an LSTM network generates intermediate time sequences using the output of the attention layer. These sequences are then used by the second attention layer, and finally, fully-connected linear layers decode and generate the outputs. Messaoud et al. [163] presented a modified version of Convolutional Social Pooling (CSP)
[103] with MHA to extract local and non-local interaction features. This approach receives a grid centered on the target vehicle, which also represents the neighbor lanes (i.e., three lanes). Convolutional layers extract local features, while the multi-head attention module extracts non-local features.
Visual Attention is another approach with a function similar to that of self-attention [157], [158]. According to Zhao and Koch [157], it uses machine learning techniques to learn a mapping function that produces a scalar mask matrix representing the salient regions of an image. Therefore, it can reduce the computational cost and increase the performance of image processing [157], [158]. For instance, Li, Ma and Tomizuka [141] proposed a deep generative trajectory prediction framework that applies visual attention to extract features from a bird’s eye view image of the road geometry and traffic scenario. In turn, Park et al. [164] only applied the attention over the BEV image of the road geometry to extract agent-to-scene interaction features.
Moreover, attention in images can also be achieved by hand-made masks called Region of Interest (ROI), such as drawing a geometrical figure mask or cropping the image. In the image processing research field, some auxiliary techniques can help to improve the creation of these regions (e.g., fuzzy sets); however, this combination was not found in the primary studies. Among the primary studies, Chandra et al. [24] drew a semi-elliptical region in the field of view of the target vehicle for an image-based interaction-aware trajectory prediction in dense traffic situations.
Besides learned attention scores, it is also possible to use heuristics to achieve selective focus on either features or surrounding vehicles. Apoorva et al. [165] used a function over the velocity of the surrounding vehicles to differentiate their contribution to the target vehicle. According to the authors, this approach accentuates features of vehicles with high collision probabilities. In turn, Dai, Li and Li [94] exploited the concept of safe distance to estimate the attention scores. This heuristic takes into account the distance and the time-to-brake between the vehicles.

L. Transformers
The Transformer is a different type of neural network architecture built upon the attention mechanism concept [166]. It was first proposed for machine translation in Natural Language Processing (NLP) [76], and showed better performance than recurrent neural networks for long-term sequences in some tasks. Moreover, it has been applied in many other tasks such as object detection, image segmentation, pose estimation, tracking, and trajectory prediction [166].
In a nutshell, a transformer network is an encoder-decoder deep learning architecture with multi-head self-attention mechanisms and feed-forward networks [166]. The encoder learns the long-term relationship between the input features through positional encoding and attention score matrices. Meanwhile, the decoder also uses attention to map the encoder’s output into the transformer’s output space in a sequence-to-sequence model fashion [76].
Quintanar et al. [167] modified a vanilla transformer for trajectory prediction using historical trajectories as input features, which were extracted from datasets of bird’s eye view images. Alternatively, Liu et al. [168] proposed a multi-modal prediction architecture based on stacked transformers, in which a couple of transformer networks extracted features from historical trajectories, road information, and social interaction.
Zhao et al. [169] applied a transformer network with residual layers for interaction-aware trajectory prediction. In this framework, the transformer learns interaction features from the spatial data, which was embedded by an earlier module using pooling operations and fully connected feed-forward networks. In turn, Li et al. [170] presented an end-to-end architecture for object detection and trajectory prediction. The detection module exploits sensor fusion for bounding-box estimation, and a recurrent prediction module extracts interaction features and predicts the trajectory using transformers, i.e., the output of the prediction is also an input of the transformer.

M. Multi-Task Learning
Multi-Task Learning (MTL) is a machine learning paradigm focused on simultaneously learning two or more different and correlated tasks [171]. According to Zhang and Yang [171], the main idea behind this paradigm is that the learning process of one task improves the generalization of all tasks, since they are correlated. There are a couple of approaches to achieve this goal, such as sharing parameters between models and combining different loss functions. In any case, the learning process of the overall model should carefully update its parameters to avoid bias toward specific tasks, since the goal is to achieve similar performance in all tasks [172].
In trajectory prediction, this paradigm is especially interesting since autonomous driving has many correlated tasks. For example, intention prediction and trajectory prediction are two different tasks, which can be coupled into a multi-task learning architecture for a joint learning process. Similarly, multi-target tracking and trajectory prediction are two other related tasks that can improve each other’s performance. In addition, the prediction framework can use a multi-objective loss function to explicitly optimize different goals while learning to predict trajectories.
Strohbeck et al. [173] presented a prediction framework that uses a Temporal Convolution Network (TCN) to extract features from a bird’s eye view image of the traffic scenario, the target vehicle state vector, and the historical trajectories of the surrounding vehicles. Moreover, the loss function is a weighted sum of five other loss functions for, respectively, the position, velocity, acceleration, orientation, and yaw rate. Each loss function is defined as the Mean Square Error (MSE) over its variable, with the exception of the position loss, which also takes into account the uncertainty of the position estimation from the tracking system (i.e., which is defined by the standard deviation).

N. Continual Learning
Continual Learning (CL) is a research field in artificial intelligence that works with the study, development, and evaluation of techniques, methodologies, frameworks, and applications for problems of learning over time [174]. That is, problems that never stop learning, whether learning new concepts or tasks,
or improving knowledge and performance on tasks already learned [175].
There are many applications for continual learning in trajectory prediction. In the case of interaction-aware frameworks, the possibilities are even greater, because they have more components that can have their performance improved by using continual learning algorithms. For example, continual learning algorithms might enhance behavior or intention prediction, which is a classification problem, to learn new maneuvers that were not previously known [66]. Alternatively, these algorithms might also improve interaction modeling, learning new interaction behaviors and consequences. Another application is using feedback from the tracking algorithm to improve predictor performance, while learning individual characteristics of each target vehicle.
Si, Wei and Liu [122] presented a framework that uses Parameter Sharing Generative Adversarial Imitation Learning (PS-GAIL) with the Recursive Least Square Parameter Adaptation Algorithm (RLS-PAA). The latter technique (RLS-PAA) is responsible for adapting a policy model to consider the individual characteristics of each target vehicle in an online fashion. The trajectory prediction module is built using a Generative Adversarial Network (GAN) and trained offline to learn policies for drivers’ actions using a parameter sharing approach. According to the authors, a limitation of this method is the time the RLS-PAA takes to converge, which may be impractical for real-time applications [122]. Alternatively, Vasquez, Fraichard and Laugier [176], [177] applied Growing Hidden Markov Models (GHMM) for vehicle and pedestrian trajectory prediction. This technique has the ability to incrementally learn the parameters and adapt the structure of the model. Therefore, when a new motion pattern is detected, the model can integrate this knowledge into its knowledge base. A GHMM is a time-evolving HMM in which the number of discrete states, the transition structure, and the probability parameters are updated at each time iteration. The problem with this approach lies in the complexity of modeling interaction-aware trajectory prediction using only Hidden Markov Models, which leads to many variables and increases the complexity of the model and its online adaptation.

VII. DATASETS
The evaluation of software components is a very important step in the development of an autonomous and intelligent vehicle. Before testing in real scenarios, it is important to verify the performance of each algorithm using an appropriate methodology. For trajectory prediction, either simulation platforms or datasets with real traffic information are very important tools during the development and for posterior evaluation of the system [14]. For scientific research, it is also important that the data used in the evaluation are publicly available, to allow different authors to compare their work with each other.
This section presents public datasets available for trajectory prediction gathered from the primary studies presented in the previous sections. It also extends the presentation to newly published datasets that have a wide range of data from sensors and traffic scenarios. Table V shows a summary of the datasets found in the analysis of the primary studies and recently published datasets for autonomous vehicles.

VIII. EVALUATION METRICS
To evaluate the performance of each approach, different methodologies need to be applied according to the techniques used and the information to be evaluated. For intention prediction, machine learning metrics for classification problems are used, generally based on a confusion matrix. In addition, an important factor to be considered is the unbalanced nature of the data, since the datasets present different amounts of samples for each class of maneuver, and in some situations this difference is quite considerable. Therefore, specific metrics for classification problems with unbalanced data should also be used [192].
In turn, trajectory prediction is modeled as a regression problem which outputs continuous values (e.g., position and velocity). Therefore, metrics for machine learning regression models were adapted to evaluate the performance of these predictors [193], [194].
• Average Displacement Error (ADE): calculates the average of the Euclidean distances between corresponding points of the real trajectory (ground-truth) and the result of the prediction algorithm.
• Final Displacement Error (FDE): calculates the Euclidean distance between the end point of the real trajectory (ground-truth) and the end point of the predicted one.
• Root Mean Square Error (RMSE): measures how far the output of the predictor is from the expected output (ground-truth).

IX. INTENTION-AWARE AND INTERACTION-AWARE TRAJECTORY PREDICTION REMARKS
There are different frameworks that address the problem of trajectory prediction, taking into account the intention and/or interaction model of the traffic participants. They explore a wide range of technologies and methodologies, from analytical and probabilistic models to machine learning. In addition, the frameworks also differ from each other in the input and output representation: whether they use historical trajectories, road geometry, traffic law features, risk assessment metrics, or a representation of the surroundings using either frontal camera images, bird’s eye view images, or other grid representations, among other input features. Moreover, the frameworks also output either future trajectories (i.e., multi-modal or unimodal trajectories), grid maps, or control inputs (i.e., that are used by other techniques to estimate kinematically feasible trajectories).
The variety is especially present in approaches that consider the interaction between traffic participants. This is because, as already mentioned, interaction is one of the challenges of autonomous navigation that is still open, and involves many variables. Thus, in this section, we highlight some remarks from the analysis of the primary studies that show potential in the interaction-aware trajectory prediction research field.
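As an illustration of the displacement metrics listed in Section VIII, the sketch below computes ADE, FDE, and RMSE for a single predicted trajectory. It is a minimal example under stated assumptions, not the evaluation code of any reviewed work: trajectories are assumed to be equal-length lists of (x, y) positions, the function names are hypothetical, and RMSE is taken here as the square root of the mean squared Euclidean displacement, which is one common reading of the definition above.

```python
import math


def displacements(predicted, ground_truth):
    """Euclidean distance between matching (x, y) points of two trajectories."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def ade(predicted, ground_truth):
    """Average Displacement Error: mean distance over all time steps."""
    d = displacements(predicted, ground_truth)
    return sum(d) / len(d)


def fde(predicted, ground_truth):
    """Final Displacement Error: distance at the last time step only."""
    return displacements(predicted, ground_truth)[-1]


def rmse(predicted, ground_truth):
    """Root Mean Square Error over the per-step displacements (assumed form)."""
    d = displacements(predicted, ground_truth)
    return math.sqrt(sum(x * x for x in d) / len(d))


# Hypothetical three-step example: per-step errors are 0, 3 and 4 meters.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.0), (1.0, 3.0), (2.0, 4.0)]

print(ade(predicted, ground_truth))   # mean of 0, 3, 4
print(fde(predicted, ground_truth))   # error at the final point
print(rmse(predicted, ground_truth))  # sqrt of the mean of 0, 9, 16
```

For multi-modal predictors, these quantities are often reported for the best of several sampled trajectories; that variant would simply take the minimum of each metric over the samples.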
TABLE V: Trajectory Prediction Datasets


Lyft ApolloScape Argoverse Waymon Open Interaction
NGSIM HighD VisDrone Nuscene BLVD InD TRAF RounD
Dataset (Trajectory) (Motion Forecasting) (Motion) Dataset
intersections,
Traffic urban, urban
highway highway urban urban urban urban urban roundabouts, intersections urban roundabouts
Scenarios intersections highway
highway
cameras,
cameras,
LiDAR, cameras, cameras,
LiDAR, cameras, trajectory, trajectory,
trajectory, trajectory, trajectory, GPS (trajectory), GPS (trajectory), LiDAR, LiDARs, trajectory, trajectory,
Sensors GPS (trajectory), LiDAR, BEV images, BEV images,
BEV images BEV images BEV images HD-Map, HD-Map GPS (trajectory), GPS (trajectory), cameras BEV images
IMU, GPS (trajectory) HD-Map HD-Map
RADAR, HD-Map BEV images
RADAR
BEV images
Number of
- - 400 1 000 170 000 - 323 557 103 354 40 054 6 004 - 50 -
Sequences
16.5 h
Sequence 45 min (I-80) 265 228 20 sec 25 sec 53 min (train) 5 sec 20 sec
(≈ 13.6 sec 991 min - 589 min - 6h
Length 45 min (US-101) total frames each sequence each sequence 50 min (test) each sequence each sequence
per vehicle)
vehicles,
vehicles, vehicles, vehicles, vehicles, vehicles, vehicles, vehicles,
Traffic 23 object vehicles, pedestrian,
vehicles vehicles pedestrians, pedestrians, pedestrians, vehicles pedestrians, pedestrians, pedestrian, pedestrians,
Participants classes pedestrians riders,
riders riders riders riders riders riders riders
rickshaw
lane-change,
lane-change,
lane-change, free-driving, lane-keeping,
moving, lane-change, lane-keeping, left/right turn,
lane-keeping, vehicle following, stopping,
Maneuvers - stopped, - - lane-keeping, - merging, merging, - -
merging, critical maneuver, accelerating,
parked left/right turn yield, yield
yield lane-change decelerating,
left/right turn
left/right turn
References [178] [179] [180] [181] [182] [183] [184] [185] [186] [187] [188] [189] [190]
various weather various weather it includes various lighting various weather highly various lighting various lighting
very noisy sunny and good weather good weather
Observations and lighting and lighting semantic and conditions and - and lighting interactive conditions and conditions and
[191] windless weather conditions conditions
conditions conditions aerial map traffic densities conditions scenarios traffic densities traffic densities

• Traffic Rules: There are several variables that influence drivers’ decision-making, both in the interaction with other traffic participants and with the surrounding space. Traffic rules are undoubtedly one of those factors that exert great influence. Therefore, these features might be important for either intention prediction or interaction modeling. One remark worth mentioning is that, for more realistic interaction models, a driver may disobey some traffic rules.
• Graphs: Graphs are one of the most versatile data structures that can model both road geometry and the interaction between traffic participants. Moreover, spatio-temporal and heterogeneous graphs are especially interesting for representing the temporal dependency of time-series and the interaction between different traffic participants (e.g., pedestrians, cyclists, cars, and trucks).
• Attention Mechanism: Attention mechanisms have improved the performance of deep learning architectures on many tasks. In trajectory prediction, they also present significant results. They can either highlight important features or differentiate the contribution of each surrounding vehicle to the target vehicle.
• Continual Learning: Continual learning is another machine learning paradigm that presents many possibilities for trajectory prediction. Among other approaches, it is possible to use the tracking system’s feedback to improve the predictor performance, or to learn new interaction models or maneuver intentions.
• Driving Styles: Driving styles can be an important feature for trajectory prediction, in addition to tactical decision-making. The literature has already shown considerable differences in the trajectories of drivers with different driving styles, for example, between an aggressive and a calm driver. However, in trajectory prediction there is the challenge of estimating the driving style based only on a few seconds of observation, as well as the lack of labeled datasets.
• Multimodal Trajectory Prediction: Multi-modal trajectory predictions are more robust to noise and to the difficulty of properly modeling the interaction between traffic participants. It is also possible to increase the versatility of the predictor by estimating a probability distribution of the future position of the target vehicle. For instance, one of the possibilities is to rank the results or eliminate trajectories by performing heuristic analyses on the results.
• Traffic Heterogeneity: Traffic heterogeneity is an interesting challenge to be explored by trajectory predictors. Each category of traffic participant has a different way of making decisions and different motion dynamics. These characteristics can influence the trajectory of the target vehicle. It is possible to explicitly address this problem through some technologies such as heterogeneous graphs, multi-task learning, or attention mechanisms.
• Traffic Conditions: The traffic condition is another characteristic that can influence the trajectory of the target vehicle, in addition to the geometry of the roads. Examples include weather, navigable areas (e.g., the existence of a hole or puddle), traffic jams, and roadworks in progress.
• Road Geometry: This is an important feature for either trajectory or maneuver intention prediction. However, there are many variations of road geometries, especially at intersections. This variety ends up making prediction tasks and vehicle interaction modeling even more challenging. Directed graphs are an interesting structure for representing some road geometry variations by including the direction of the lanes in their representation. Bird’s eye view images obtained from High-Definition Maps (HD-Maps) also represent many details of these roads. However, as there is still no consensus on the best way to represent the roads, there is still room to be explored by future research.
In addition to the aforementioned features that can be exploited by trajectory predictors, some final considerations are also important to mention. The first one is the inclusion of memory assessment and inference time in published scientific works. As trajectory prediction is a real-time constrained system task, it is important to know the memory cost and inference time of models, especially those that exploit many deep learning technologies. However, few studies make these considerations so far.
Finally, it is also important to bring some more in-depth discussions of how these frameworks can be included in an autonomous vehicle architecture. For example, how much
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 16

modular are the proposals, which components can be shared [11] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila,
with other tasks, and how autonomous vehicle can use the and K. O. Arras, “Human motion trajectory prediction: A survey,” The
International Journal of Robotics Research, vol. 39, no. 8, pp. 895–935,
results obtained. 2020.
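The reporting of memory cost and inference time discussed above can be sketched with standard-library tools. The following Python sketch is only an illustration under stated assumptions: `constant_velocity_predictor` is a hypothetical stand-in for a real trajectory predictor (it is not taken from any of the reviewed works), and `profile_predictor` is a hypothetical helper name. Note that `tracemalloc` only tracks memory allocated through the Python interpreter, so a deep learning model would additionally need the memory counters of its own framework.

```python
import time
import tracemalloc


def constant_velocity_predictor(history, horizon=5, dt=0.1):
    """Hypothetical stand-in predictor: extrapolates the last observed velocity.

    history: list of (x, y) positions sampled every dt seconds (at least two).
    Returns `horizon` future (x, y) positions.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon + 1)]


def profile_predictor(predictor, history, runs=100):
    """Return (mean inference time in seconds, peak traced memory in bytes)."""
    predictor(history)  # warm-up call, excluded from the measurement

    # Time and memory are measured in separate passes, because tracing
    # allocations slows the code down and would distort the timing.
    start = time.perf_counter()
    for _ in range(runs):
        predictor(history)
    mean_time = (time.perf_counter() - start) / runs

    tracemalloc.start()
    predictor(history)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return mean_time, peak_bytes


history = [(0.0, 0.0), (1.0, 0.5)]
prediction = constant_velocity_predictor(history, horizon=3, dt=1.0)
mean_time, peak_bytes = profile_predictor(constant_velocity_predictor, history)
print(f"mean inference time: {mean_time * 1e3:.4f} ms, peak memory: {peak_bytes} B")
```

Reporting both numbers together with the hardware used and the prediction horizon would make such measurements comparable across studies; the exact protocol is a design choice and not something prescribed by the reviewed works.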

X. CONCLUSION
To increase safety in autonomous navigation, it is highly important not only to detect obstacles but also to predict their future trajectories. This information improves the environment representation and allows predicting hazardous situations such as collisions. There are different approaches for trajectory prediction. This paper presents a review of methods that take into account the intention of the traffic participants and also their interaction models.
The analysis of the primary studies highlights the efforts already made in this research field and provides insights for the evolution of the state-of-the-art. There is also an open field for research on approaches that take into account both intention and interaction, more diverse traffic situations, and interaction among heterogeneous traffic participants, and that also consider traffic laws (e.g., signalized intersections, speed limits, and permission to change lanes).

ACKNOWLEDGMENT
We thank the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES) for the financial support under grant 88887.500344/2020-0, and the São Paulo Research Foundation (FAPESP) for the financial support under grant 2019/27301-7.
and intent inference: A review,” in 2011 14th International IEEE
Conference on Intelligent Transportation Systems (ITSC). IEEE, 2011,
R EFERENCES pp. 1892–1897.
[23] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker,
[1] K. Bimbraw, “Autonomous cars: Past, present and future a review “Desire: Distant future prediction in dynamic scenes with interacting
of the developments in the last century, the present scenario and the agents,” in Proceedings of the IEEE Conference on Computer Vision
expected future of autonomous vehicle technology,” in Informatics in and Pattern Recognition, 2017, pp. 336–345.
Control, Automation and Robotics (ICINCO), 2015 12th International [24] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, “Traphic:
Conference on, vol. 1. IEEE, 2015, pp. 191–198. Trajectory prediction in dense and heterogeneous traffic using weighted
[2] D. J. Fagnant and K. Kockelman, “Preparing a nation for autonomous interactions,” in Proceedings of the IEEE/CVF Conference on Com-
vehicles: opportunities, barriers and policy recommendations,” Trans- puter Vision and Pattern Recognition, 2019, pp. 8483–8492.
portation Research Part A: Policy and Practice, vol. 77, pp. 167–181, [25] G. He, X. Li, Y. Lv, B. Gao, and H. Chen, “Probabilistic intention
2015. prediction and trajectory generation based on dynamic bayesian net-
[3] J. M. Lutin, “Not if, but when: Autonomous driving and the future of works,” in 2019 Chinese Automation Congress (CAC). IEEE, 2019,
transit,” Journal of Public Transportation, vol. 21, no. 1, p. 10, 2018. pp. 2646–2651.
[4] D. Ferguson, M. Darms, C. Urmson, and S. Kolski, “Detection, [26] H. Woo, Y. Ji, Y. Tamura, Y. Kuroda, T. Sugano, Y. Yamamoto,
prediction, and avoidance of dynamic obstacles in urban environments,” A. Yamashita, and H. Asama, “Advanced adaptive cruise control based
in 2008 IEEE Intelligent Vehicles Symposium. IEEE, 2008, pp. 1149– on collision risk assessment,” in 2018 21st International Conference on
1154. Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 939–944.
[5] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction [27] G. Xie, H. Gao, L. Qian, B. Huang, K. Li, and J. Wang, “Vehicle trajec-
and risk assessment for intelligent vehicles,” ROBOMECH journal, tory prediction by integrating physics-and maneuver-based approaches
vol. 1, no. 1, p. 1, 2014. using interactive multiple models,” IEEE Transactions on Industrial
[6] Q. Zou, Y. Hou, and K. Xiong, “An overview of the motion prediction Electronics, vol. 65, no. 7, pp. 5999–6008, 2017.
of traffic participants for host vehicle,” in 2019 Chinese Control [28] L. Zhang, W. Xiao, Z. Zhang, and D. Meng, “Surrounding vehicles
Conference (CCC). IEEE, 2019, pp. 7872–7877. motion prediction for risk assessment and motion planning of au-
[7] C. Katrakazas, M. Quddus, W.-H. Chen, and L. Deka, “Real-time tonomous vehicle in highway scenarios,” IEEE Access, vol. 8, pp.
motion planning methods for autonomous on-road driving: State-of- 209 356–209 376, 2020.
the-art and future research directions,” Transportation Research Part [29] C. Otto and F. P. Leon, “Long-term trajectory classification and
C: Emerging Technologies, vol. 60, pp. 416–442, 2015. prediction of commercial vehicles for the application in advanced driver
[8] S. Paravarzar and B. Mohammad, “Motion prediction on self-driving assistance systems,” in 2012 American Control Conference (ACC).
cars: A review,” arXiv preprint arXiv:2011.03635, 2020. IEEE, 2012, pp. 2904–2909.
[9] D. Ridel, E. Rehder, M. Lauer, C. Stiller, and D. Wolf, “A literature [30] M. Schreier, V. Willert, and J. Adamy, “Bayesian, maneuver-based,
review on the prediction of pedestrian behavior in urban scenarios,” long-term trajectory prediction and criticality assessment for driver as-
in 2018 21st International Conference on Intelligent Transportation sistance systems,” in 17th International IEEE Conference on Intelligent
Systems (ITSC). IEEE, 2018, pp. 3105–3112. Transportation Systems (ITSC). IEEE, 2014, pp. 334–341.
[10] N. Brouwer, H. Kloeden, and C. Stiller, “Comparison and evaluation of [31] ——, “An integrated approach to maneuver-based trajectory predic-
pedestrian motion models for vehicle safety systems,” in 2016 IEEE tion and criticality assessment in arbitrary road environments,” IEEE
19th International Conference on Intelligent Transportation Systems Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp.
(ITSC). IEEE, 2016, pp. 2207–2212. 2751–2766, 2016.
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 17

[32] V. Lefkopoulos, M. Menner, A. Domahidi, and M. N. Zeilinger, [53] Q. Tran and J. Firl, “Online maneuver recognition and multimodal
“Interaction-aware motion prediction for autonomous driving: A mul- trajectory prediction for intersection assistance using non-parametric
tiple model kalman filtering scheme,” IEEE Robotics and Automation regression,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings.
Letters, vol. 6, no. 1, pp. 80–87, 2021. IEEE, 2014, pp. 918–923.
[33] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround vehicles [54] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in
move? a unified framework for maneuver classification and motion empirical observations and microscopic simulations,” Physical review
prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2, E, vol. 62, no. 2, p. 1805, 2000.
pp. 129–140, 2018. [55] S. Albeaik, A. Bayen, M. T. Chiri, X. Gong, A. Hayat, N. Kardous,
[34] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao, “Vehicle trajectory A. Keimer, S. T. McQuade, B. Piccoli, and Y. You, “Limitations and
prediction based on motion model and maneuver recognition,” in 2013 improvements of the intelligent driver model (idm),” arXiv preprint
IEEE/RSJ international conference on intelligent robots and systems. arXiv:2104.02583, 2021.
IEEE, 2013, pp. 4363–4369. [56] Y. Hu, W. Zhan, and M. Tomizuka, “Probabilistic prediction of vehicle
[35] S. Annell, A. Gratner, and L. Svensson, “Probabilistic collision estima- semantic intention and motion,” in 2018 IEEE Intelligent Vehicles
tion system for autonomous vehicles,” in 2016 IEEE 19th International Symposium (IV). IEEE, 2018, pp. 307–313.
Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016, [57] F. Wirthmüller, J. Schlechtriemen, J. Hipp, and M. Reichert, “Teaching
pp. 473–478. vehicles to anticipate: A systematic study on probabilistic behavior
[36] D. Richardos, B. Anastasia, D. Georgios, and A. Angelos, “Vehicle prediction using large data sets,” IEEE Transactions on Intelligent
maneuver-based long-term trajectory prediction at intersection cross- Transportation Systems, 2020.
ings,” in 2020 IEEE 3rd Connected and Automated Vehicles Symposium [58] Y. Xing, C. Lv, and D. Cao, “Personalized vehicle trajectory prediction
(CAVS). IEEE, pp. 1–6. based on joint time-series modeling for connected vehicles,” IEEE
[37] W. Song, B. Su, G. Xiong, and S. Li, “Intention-aware decision making Transactions on Vehicular Technology, vol. 69, no. 2, pp. 1341–1352,
in urban lane change scenario for autonomous driving,” in 2018 IEEE 2019.
International Conference on Vehicular Electronics and Safety (ICVES). [59] X. Feng, Z. Cen, J. Hu, and Y. Zhang, “Vehicle trajectory prediction
IEEE, 2018, pp. 1–8. using intention-based conditional variational autoencoder,” in 2019
[38] M. Bahari, I. Nejjar, and A. Alahi, “Injecting knowledge in data- IEEE Intelligent Transportation Systems Conference (ITSC). IEEE,
driven vehicle trajectory predictors,” Transportation Research Part C: 2019, pp. 3514–3519.
Emerging Technologies, vol. 128, p. 103010, 2021. [60] J. Li, H. Ma, W. Zhan, and M. Tomizuka, “Generic probabilistic
[39] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, interactive situation recognition and prediction: From virtual to real,”
“Interaction-aware long-term driving situation prediction,” in 2018 21st in 2018 21st international conference on intelligent transportation
International Conference on Intelligent Transportation Systems (ITSC). systems (ITSC). IEEE, 2018, pp. 3218–3224.
IEEE, 2018, pp. 137–143. [61] Y. Hu, W. Zhan, and M. Tomizuka, “A framework for probabilistic
[40] Y. Jeong and K. Yi, “Bidirectional long shot-term memory-based generic traffic scene prediction,” in 2018 21st International Conference
interactive motion prediction of cut-in vehicles in urban environments,” on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2790–
IEEE Access, vol. 8, pp. 106 183–106 197, 2020. 2796.
[41] C. Wissing, T. Nattermann, K.-H. Glander, and T. Bertram, “Trajectory
[62] J. Liu, Y. Luo, H. Xiong, T. Wang, H. Huang, and Z. Zhong, “An
prediction for safety critical maneuvers in automated highway driving,”
integrated approach to probabilistic vehicle trajectory prediction via
in 2018 21st International Conference on Intelligent Transportation
driver characteristic and intention estimation,” in 2019 IEEE Intelligent
Systems (ITSC). IEEE, 2018, pp. 131–136.
Transportation Systems Conference (ITSC). IEEE, 2019, pp. 3526–
[42] K. Gillmeier, F. Diederichs, and D. Spath, “Prediction of ego vehicle
3532.
trajectories based on driver intention and environmental context,” in
[63] Q. Tran and J. Firl, “Modelling of traffic situations at urban inter-
2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp.
sections with probabilistic non-parametric regression,” in 2013 IEEE
963–968.
Intelligent Vehicles Symposium (IV). IEEE, 2013, pp. 334–339.
[43] P. Liu, A. Kurt, and Ü. Özgüner, “Trajectory prediction of a lane chang-
[64] Y. Guo, V. V. Kalidindi, M. Arief, W. Wang, J. Zhu, H. Peng, and
ing vehicle based on driver behavior estimation and classification,”
D. Zhao, “Modeling multi-vehicle interaction scenarios using gaussian
in 17th international IEEE conference on intelligent transportation
random field,” in 2019 IEEE Intelligent Transportation Systems Con-
systems (ITSC). IEEE, 2014, pp. 942–947.
ference (ITSC). IEEE, 2019, pp. 3974–3980.
[44] D. Augustin, M. Hofmann, and U. Konigorski, “Prediction of highway
lane changes based on prototype trajectories,” Forschung im Ingenieur- [65] M. Althoff, O. Stursberg, and M. Buss, “Stochastic reachable sets
wesen, vol. 83, no. 2, pp. 149–161, 2019. of interacting traffic participants,” in 2008 IEEE Intelligent Vehicles
[45] C. Lienke, C. Wissing, M. Keller, T. Nattermann, and T. Bertram, Symposium. IEEE, 2008, pp. 1086–1092.
“Predictive driving: Fusing prediction and planning for automated [66] J. Schulz, C. Hubmann, J. Löchner, and D. Burschka, “Interaction-
highway driving,” IEEE Transactions on Intelligent Vehicles, vol. 4, aware probabilistic behavior prediction in urban environments,” in 2018
no. 3, pp. 456–467, 2019. IEEE/RSJ International Conference on Intelligent Robots and Systems
[46] C. Dong and J. M. Dolan, “Continuous behavioral prediction in lane- (IROS). IEEE, 2018, pp. 3999–4006.
change for autonomous driving cars in dynamic environments,” in 2018 [67] ——, “Multiple model unscented kalman filtering in dynamic bayesian
21st International Conference on Intelligent Transportation Systems networks for intention estimation and trajectory prediction,” in 2018
(ITSC). IEEE, 2018, pp. 3706–3711. 21st International Conference on Intelligent Transportation Systems
[47] S. Wang, P. Zhao, B. Yu, W. Huang, and H. Liang, “Vehicle trajectory (ITSC). IEEE, 2018, pp. 1467–1474.
prediction by knowledge-driven lstm network in urban environments,” [68] J. S. Gill, P. Pisu, and M. J. Schmid, “A probabilistic framework
Journal of Advanced Transportation, vol. 2020, 2020. for trajectory prediction in traffic utilizing driver characterization,” in
[48] T. Zhang, W. Song, M. Fu, Y. Yang, and M. Wang, “Vehicle motion 2019 IEEE 2nd Connected and Automated Vehicles Symposium (CAVS).
prediction at intersections based on the turning intention and prior IEEE, 2019, pp. 1–5.
trajectories model,” IEEE/CAA Journal of Automatica Sinica, 2021. [69] N. Rhinehart, R. McAllister, K. Kitani, and S. Levine, “Precog:
[49] W. Ding and S. Shen, “Online vehicle trajectory prediction using Prediction conditioned on goals in visual multi-agent settings,” in
policy anticipation network and optimization-based context reasoning,” Proceedings of the IEEE/CVF International Conference on Computer
in 2019 International Conference on Robotics and Automation (ICRA). Vision, 2019, pp. 2821–2830.
IEEE, 2019, pp. 9610–9616. [70] D. S. González, J. S. Dibangoye, and C. Laugier, “High-speed highway
[50] K. Okamoto, K. Berntorp, and S. Di Cairano, “Driver intention-based scene prediction based on driver models learned from demonstrations,”
vehicle threat assessment using random forests and particle filtering,” in 2016 IEEE 19th International Conference on Intelligent Transporta-
IFAC-PapersOnLine, vol. 50, no. 1, pp. 13 860–13 865, 2017. tion Systems (ITSC). IEEE, 2016, pp. 149–155.
[51] J. Li, H. Ma, Z. Zhang, J. Li, and M. Tomizuka, “Spatio-temporal [71] M. T. Spaan, “Partially observable markov decision processes,” in
graph dual-attention network for multi-agent prediction and tracking,” Reinforcement Learning. Springer, 2012, pp. 387–414.
arXiv preprint arXiv:2102.09117, 2021. [72] N. Deo and M. M. Trivedi, “Multi-modal trajectory prediction of
[52] J. Li, W. Zhan, Y. Hu, and M. Tomizuka, “Generic tracking and surrounding vehicles with maneuver based lstms,” in 2018 IEEE
probabilistic prediction framework and its application in autonomous Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. 1179–1184.
driving,” IEEE Transactions on Intelligent Transportation Systems, [73] Z. Hao, X. Huang, K. Wang, M. Cui, and Y. Tian, “Attention-based
vol. 21, no. 9, pp. 3634–3649, 2019. gru for driver intention recognition and vehicle trajectory prediction,”
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 18

in 2020 4th CAA International Conference on Vehicular Control and on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 1441–
Intelligence (CVCI). IEEE, 2020, pp. 86–91. 1446.
[74] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, [96] A. Benterki, M. Boukhnifer, V. Judalet, and C. Maaoui, “Artificial
“Relational recurrent neural networks for vehicle trajectory prediction,” intelligence for vehicle behavior anticipation: Hybrid approach based
in 2019 IEEE Intelligent Transportation Systems Conference (ITSC). on maneuver classification and trajectory prediction,” IEEE Access,
IEEE, 2019, pp. 1813–1818. vol. 8, pp. 56 992–57 002, 2020.
[75] H. Yousuf, M. Lahzi, S. A. Salloum, and K. Shaalan, “A systematic [97] C. Ju, Z. Wang, C. Long, X. Zhang, and D. E. Chang, “Interaction-
review on sequence-to-sequence learning with neural network and its aware kalman neural networks for trajectory prediction,” in 2020 IEEE
models.” International Journal of Electrical & Computer Engineering Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 1793–1800.
(2088-8708), vol. 11, no. 3, 2021. [98] H. Cheng and M. Sester, “Modeling mixed traffic in shared space using
[76] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. lstm with probability density mapping,” in 2018 21st International
Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018,
preprint arXiv:1706.03762, 2017. pp. 3898–3904.
[77] C. Fei, X. He, and X. Ji, “Multi-modal vehicle trajectory prediction [99] ——, “Mixed traffic trajectory prediction using lstm–based models in
based on mutual information,” IET Intelligent Transport Systems, shared space,” in The Annual International Conference on Geographic
vol. 14, no. 3, pp. 148–153, 2019. Information Science. Springer, 2018, pp. 309–325.
[78] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural [100] M. Hasan, A. Solernou, E. Paschalidis, H. Wang, G. Markkula, and
computation, vol. 9, no. 8, pp. 1735–1780, 1997. R. Romano, “Maneuver-aware pooling for vehicle trajectory predic-
[79] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: tion,” arXiv preprint arXiv:2104.14079, 2021.
Continual prediction with lstm,” Neural computation, vol. 12, no. 10, [101] M. Fu, T. Zhang, W. Song, Y. Yang, and M. Wang, “Trajectory
pp. 2451–2471, 2000. prediction-based local spatio-temporal navigation map for autonomous
[80] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of driving in dynamic highway environments,” IEEE Transactions on
gated recurrent neural networks on sequence modeling,” arXiv preprint Intelligent Transportation Systems, 2021.
arXiv:1412.3555, 2014. [102] K. O’Shea and R. Nash, “An introduction to convolutional neural
[81] ——, “Gated feedback recurrent neural networks,” in International networks,” arXiv preprint arXiv:1511.08458, 2015.
Conference on Machine Learning, 2015, pp. 2067–2075. [103] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle
[82] A. Santoro, R. Faulkner, D. Raposo, J. Rae, M. Chrzanowski, T. Weber, trajectory prediction,” in Proceedings of the IEEE Conference on
D. Wierstra, O. Vinyals, R. Pascanu, and T. Lillicrap, “Relational Computer Vision and Pattern Recognition Workshops, 2018, pp. 1468–
recurrent neural networks,” in Advances in Neural Information Pro- 1476.
cessing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, [104] H. Song, W. Ding, Y. Chen, S. Shen, M. Y. Wang, and Q. Chen, “Pip:
N. Cesa-Bianchi, and R. Garnett, Eds. Curran Associates, Inc., 2018, Planning-informed trajectory prediction for autonomous driving,” in
pp. 7299–7310. European Conference on Computer Vision. Springer, 2020, pp. 598–
[83] H. Salehinejad, S. Sankar, J. Barfett, E. Colak, and S. Valaee, 614.
“Recent advances in recurrent neural networks,” arXiv preprint
[105] X. Mo, Y. Xing, and C. Lv, “Interaction-aware trajectory prediction of
arXiv:1801.01078, 2017.
connected vehicles using cnn-lstm networks,” in IECON 2020 The 46th
[84] M. Chen, “Minimalrnn: Toward more interpretable and trainable recur-
Annual Conference of the IEEE Industrial Electronics Society. IEEE,
rent neural networks,” arXiv preprint arXiv:1711.06788, 2017.
2020, pp. 5057–5062.
[85] K. Min, H. Kim, J. Park, D. Kim, and K. Huh, “Interaction aware
[106] ——, “Recog: A deep learning framework with heterogeneous
trajectory prediction of surrounding vehicles with interaction network
graph for interaction-aware trajectory prediction,” arXiv preprint
and deep ensemble,” in 2020 IEEE Intelligent Vehicles Symposium (IV).
arXiv:2012.05032, 2020.
IEEE, 2020, pp. 1714–1719.
[86] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. [107] D. Cao, J. Li, H. Ma, and M. Tomizuka, “Spectral temporal graph neu-
Woo, “Convolutional lstm network: A machine learning approach for ral network for trajectory prediction,” arXiv preprint arXiv:2106.02930,
precipitation nowcasting,” in Advances in neural information process- 2021.
ing systems, 2015, pp. 802–810. [108] M. Krüger, A. S. Novo, T. Nattermann, and T. Bertram, “Interaction-
[87] M. Khakzar, A. Rakotonirainy, A. Bond, and S. G. Dehkordi, “A dual aware trajectory prediction based on a 3d spatio-temporal tensor
learning model for vehicle trajectory prediction,” IEEE Access, vol. 8, representation using convolutional–recurrent neural networks,” in 2020
pp. 21 897–21 908, 2020. IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 1122–
[88] S. Mukherjee, S. Wang, and A. Wallace, “Interacting vehicle trajectory 1127.
prediction with convolutional recurrent neural networks,” in 2020 IEEE [109] M. Ye, T. Cao, and Q. Chen, “Tpcn: Temporal point cloud networks
International Conference on Robotics and Automation (ICRA). IEEE, for motion forecasting,” in Proceedings of the IEEE/CVF Conference
2020, pp. 4336–4342. on Computer Vision and Pattern Recognition, 2021, pp. 11 318–11 327.
[89] R. Chandra, U. Bhattacharya, C. Roncal, A. Bera, and D. Manocha, [110] L. Du, Z. Wang, L. Wang, Z. Zhao, F. Su, B. Zhuang, and N. V.
“Robusttp: End-to-end trajectory prediction for heterogeneous road- Boulgouris, “Adaptive visual interaction based multi-target future state
agents in dense traffic with noisy sensor inputs,” in ACM Computer prediction for autonomous driving vehicles,” IEEE Transactions on
Science in Cars Symposium, 2019, pp. 1–9. Vehicular Technology, vol. 68, no. 5, pp. 4249–4261, 2019.
[90] N. Sriram, B. Liu, F. Pittaluga, and M. Chandraker, “Smart: Si- [111] J. Wang, P. Wang, C. Zhang, K. Su, and J. Li, “F-net: Fusion neural
multaneous multi-agent recurrent trajectory prediction,” in European network for vehicle trajectory prediction in autonomous driving,”
Conference on Computer Vision. Springer, 2020, pp. 463–479. in ICASSP 2021-2021 IEEE International Conference on Acoustics,
[91] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 4095–4099.
deep recurrent neural networks,” in 2013 IEEE international conference [112] A. Dulian and J. C. Murray, “Multi-modal anticipation of stochastic
on acoustics, speech and signal processing. Ieee, 2013, pp. 6645– trajectories in a dynamic environment with conditional variational
6649. autoencoders,” arXiv preprint arXiv:2103.03912, 2021.
[92] B. Ivanovic, K.-H. Lee, P. Tokmakov, B. Wulfe, R. McAllister, [113] H. Cui, V. Radosavljevic, F.-C. Chou, T.-H. Lin, T. Nguyen, T.-K.
A. Gaidon, and M. Pavone, “Heterogeneous-agent trajectory forecast- Huang, J. Schneider, and N. Djuric, “Multimodal trajectory predictions
ing incorporating class uncertainty,” arXiv preprint arXiv:2104.12446, for autonomous driving using deep convolutional networks,” in 2019
2021. International Conference on Robotics and Automation (ICRA). IEEE,
[93] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-rnn: 2019, pp. 2090–2096.
Deep learning on spatio-temporal graphs,” in Proceedings of the ieee [114] A. Sadeghian, F. Legros, M. Voisin, R. Vesel, A. Alahi, and S. Savarese,
conference on computer vision and pattern recognition, 2016, pp. “Car-net: Clairvoyant attentive recurrent network,” in Proceedings of
5308–5317. the European Conference on Computer Vision (ECCV), 2018, pp. 151–
[94] S. Dai, L. Li, and Z. Li, “Modeling vehicle interactions via modified 167.
lstm models for trajectory prediction,” IEEE Access, vol. 7, pp. 38 287– [115] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-
38 296, 2019. Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial
[95] L. Xin, P. Wang, C.-Y. Chan, J. Chen, S. E. Li, and B. Cheng, networks,” 2014.
“Intention-aware long horizon trajectory prediction of surrounding ve- [116] D. Roy, T. Ishizaka, C. K. Mohan, and A. Fukuda, “Vehicle trajectory
hicles using dual lstm networks,” in 2018 21st International Conference prediction at intersections using interaction based generative adversarial
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 19

networks,” in 2019 IEEE Intelligent Transportation Systems Conference [138] I. Chami, S. Abu-El-Haija, B. Perozzi, C. Ré, and K. Murphy, “Ma-
(ITSC). IEEE, 2019, pp. 2318–2323. chine learning on graphs: A model and comprehensive taxonomy,”
[117] C. Hegde, S. Dash, and P. Agarwal, “Vehicle trajectory prediction arXiv preprint arXiv:2005.03675, 2020.
using gan,” in 2020 Fourth International Conference on I-SMAC (IoT [139] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A
in Social, Mobile, Analytics and Cloud)(I-SMAC). IEEE, 2020, pp. comprehensive survey on graph neural networks,” IEEE transactions
502–507. on neural networks and learning systems, 2020.
[118] J. Li, H. Ma, and M. Tomizuka, “Interaction-aware multi-agent tracking [140] P. Baldi, “Autoencoders, unsupervised learning, and deep architec-
and probabilistic behavior prediction via adversarial learning,” in 2019 tures,” in Proceedings of ICML workshop on unsupervised and transfer
international conference on robotics and automation (ICRA). IEEE, learning. JMLR Workshop and Conference Proceedings, 2012, pp.
2019, pp. 6658–6664. 37–49.
[119] X. Li, G. Rosman, I. Gilitschenski, C.-I. Vasile, J. A. DeCastro, [141] J. Li, H. Ma, and M. Tomizuka, “Conditional generative neural system
S. Karaman, and D. Rus, “Vehicle trajectory prediction using generative for probabilistic trajectory prediction,” in 2019 IEEE/RSJ International
adversarial network with temporal logic syntax tree features,” IEEE Conference on Intelligent Robots and Systems (IROS). IEEE, 2019,
Robotics and Automation Letters, vol. 6, no. 2, pp. 3459–3466, 2021. pp. 6150–6156.
[120] A. Hussein, M. M. Gaber, E. Elyan, and C. Jayne, “Imitation learning: [142] K. Cho, T. Ha, G. Lee, and S. Oh, “Deep predictive autonomous driving
A survey of learning methods,” ACM Computing Surveys (CSUR), using multi-agent joint trajectory prediction and traffic rules.” in IROS,
vol. 50, no. 2, pp. 1–35, 2017. 2019, pp. 2076–2081.
[121] Y. Huang and Y. Chen, “Autonomous driving with deep learning: A
[143] Y. Hu, L. Sun, and M. Tomizuka, “Generic prediction architecture
survey of state-of-art technologies,” arXiv preprint arXiv:2006.06091,
considering both rational and irrational driving behaviors,” in 2019
2020.
IEEE Intelligent Transportation Systems Conference (ITSC). IEEE,
[122] W. Si, T. Wei, and C. Liu, “Agen: Adaptable generative prediction
2019, pp. 3539–3546.
networks for autonomous driving,” in 2019 IEEE Intelligent Vehicles
[144] Y. Hu, W. Zhan, L. Sun, and M. Tomizuka, “Multi-modal probabilistic
Symposium (IV). IEEE, 2019, pp. 281–286.
prediction of interactive behavior via an interpretable model,” in 2019
[123] R. P. Bhattacharyya, D. J. Phillips, B. Wulfe, J. Morton, A. Kuefler,
IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 557–563.
and M. J. Kochenderfer, “Multi-agent imitation learning for driving
simulation,” in 2018 IEEE/RSJ International Conference on Intelligent [145] C. Fei, X. He, S. Kawahara, N. Shirou, and X. Ji, “Conditional
Robots and Systems (IROS). IEEE, 2018, pp. 1534–1539. wasserstein auto-encoder for interactive vehicle trajectory prediction,”
[124] J. Quehl, H. Hu, S. Wirges, and M. Lauer, “An approach to vehicle in 2020 IEEE 23rd International Conference on Intelligent Transporta-
trajectory prediction using automatically generated traffic maps,” in tion Systems (ITSC). IEEE, 2020, pp. 1–6.
2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. [146] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative
544–549. adversarial networks,” in International conference on machine learning.
[125] G. Chen, L. Hu, Q. Zhang, Z. Ren, X. Gao, and J. Cheng, “St- PMLR, 2017, pp. 214–223.
lstm: Spatio-temporal graph based long short-term memory network for [147] H. Zhang, Y. Wang, J. Liu, C. Li, T. Ma, and C. Yin, “A multi-modal
vehicle trajectory prediction,” in 2020 IEEE International Conference states based vehicle descriptor and dilated convolutional social pooling
on Image Processing (ICIP). IEEE, 2020, pp. 608–612. for vehicle trajectory prediction,” arXiv preprint arXiv:2003.03480,
[126] C. Choi, S. Malla, A. Patil, and J. H. Choi, “Drogon: A trajectory 2020.
prediction model based on intention-conditioned behavior reasoning,” [148] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement
arXiv preprint arXiv:1908.00024, 2019. learning: A survey,” Journal of artificial intelligence research, vol. 4,
[127] Y. Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha, pp. 237–285, 1996.
“Trafficpredict: Trajectory prediction for heterogeneous traffic-agents,” [149] M. Wiering and M. Van Otterlo, “Reinforcement learning,” Adaptation,
in Proceedings of the AAAI Conference on Artificial Intelligence, learning, and optimization, vol. 12, no. 3, 2012.
vol. 33, no. 01, 2019, pp. 6120–6127. [150] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.
[128] X. Li, X. Ying, and M. C. Chuah, “Grip: Graph-based interaction-aware MIT press, 2018.
trajectory prediction,” in 2019 IEEE Intelligent Transportation Systems [151] L. Sun, W. Zhan, and M. Tomizuka, “Probabilistic prediction of interac-
Conference (ITSC). IEEE, 2019, pp. 3960–3966. tive driving behavior via hierarchical inverse reinforcement learning,”
[129] Z. Zhao, H. Fang, Z. Jin, and Q. Qiu, “Gisnet: Graph-based information in 2018 21st International Conference on Intelligent Transportation
sharing network for vehicle trajectory prediction,” in 2020 International Systems (ITSC). IEEE, 2018, pp. 2111–2117.
Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–7. [152] W. Schwarting, A. Pierson, J. Alonso-Mora, S. Karaman, and D. Rus,
[130] S. Carrasco, D. F. Llorca, and M. Á. Sotelo, “Scout: Socially-consistent “Social behavior for autonomous vehicles,” Proceedings of the National
and understandable graph attention network for trajectory prediction of Academy of Sciences, vol. 116, no. 50, pp. 24 972–24 978, 2019.
vehicles and vrus,” arXiv preprint arXiv:2102.06361, 2021. [153] Z. Niu, G. Zhong, and H. Yu, “A review on the attention mechanism
[131] F. Diehl, T. Brunner, M. T. Le, and A. Knoll, “Graph neural networks of deep learning,” Neurocomputing, 2021.
for modelling traffic participant interaction,” in 2019 IEEE Intelligent
[154] J. Yan, Z. Peng, H. Yin, J. Wang, X. Wang, Y. Shen, W. Stechele, and
Vehicles Symposium (IV). IEEE, 2019, pp. 695–701.
D. Cremers, “Trajectory prediction for intelligent vehicles using spatial-
[132] X. Weng, Y. Yuan, and K. Kitani, “Ptp: Parallelized tracking and
attention mechanism,” IET Intelligent Transport Systems, vol. 14,
prediction with graph neural networks and diversity sampling,” IEEE
no. 13, pp. 1855–1863, 2020.
Robotics and Automation Letters, vol. 6, no. 3, pp. 4640–4647, 2021.
[155] J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and
[133] H. Jeon, J. Choi, and D. Kum, “Scale-net: Scalable vehicle trajectory
G. P. Gil, “Multi-head attention for multi-modal joint vehicle motion
prediction network under random number of interacting vehicles via
forecasting,” in 2020 IEEE International Conference on Robotics and
edge-enhanced graph convolutional neural network,” in 2020 IEEE/RSJ
Automation (ICRA). IEEE, 2020, pp. 9638–9644.
International Conference on Intelligent Robots and Systems (IROS).
IEEE, 2020, pp. 2095–2102. [156] K. Messaoud, I. Yahiaoui, A. Verroust, and F. Nashashibi, “Attention
[134] L. Ye, Z. Wang, X. Chen, J. Wang, K. Wu, and K. Lu, “Gsan: Graph based vehicle trajectory prediction,” IEEE Transactions on Intelligent
self-attention network for interaction measurement in autonomous Vehicles, 2020.
driving,” in 2020 IEEE 17th International Conference on Mobile Ad [157] Q. Zhao and C. Koch, “Learning saliency-based visual attention: A
Hoc and Sensor Systems (MASS). IEEE, 2020, pp. 274–282. review,” Signal Processing, vol. 93, no. 6, pp. 1401–1407, 2013.
[135] R. Chandra, T. Guan, S. Panuganti, T. Mittal, U. Bhattacharya, A. Bera, [158] W. Wang and J. Shen, “Deep visual attention prediction,” IEEE
and D. Manocha, “Forecasting trajectory and behavior of road-agents Transactions on Image Processing, vol. 27, no. 5, pp. 2368–2378, 2017.
using spectral clustering in graph-lstms,” IEEE Robotics and Automa- [159] B. Kim, S. H. Park, S. Lee, E. Khoshimjonov, D. Kum, J. Kim, J. S.
tion Letters, vol. 5, no. 3, pp. 4882–4890, 2020. Kim, and J. W. Choi, “Lapred: Lane-aware prediction of multi-modal
[136] Y. Hu, W. Zhan, and M. Tomizuka, “Scenario-transferable semantic future trajectories of dynamic agents,” in Proceedings of the IEEE/CVF
graph reasoning for interaction-aware probabilistic prediction,” arXiv Conference on Computer Vision and Pattern Recognition, 2021, pp.
preprint arXiv:2004.03053, 2020. 14 636–14 645.
[137] J. Pan, H. Sun, K. Xu, Y. Jiang, X. Xiao, J. Hu, and J. Miao, “Lane- [160] S. Kim, D. Kum, and J. won Choi, “Recup net: Recursive predic-
attention: Predicting vehicles’ moving trajectories by learning their tion network for surrounding vehicle trajectory prediction with future
attention over lanes,” in 2020 IEEE/RSJ International Conference on trajectory feedback,” in 2020 IEEE 23rd International Conference on
Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 7949–7956. Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–6.

[161] J. Yu, M. Zhou, X. Wang, G. Pu, C. Cheng, and B. Chen, “A dynamic and static context-aware attention network for trajectory prediction,” ISPRS International Journal of Geo-Information, vol. 10, no. 5, p. 336, 2021.
[162] T. Zhang, M. Fu, W. Song, Y. Yang, and M. Wang, “Trajectory prediction based on constraints of vehicle kinematics and social interaction,” in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2020, pp. 3957–3963.
[163] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, “Non-local social pooling for vehicle trajectory prediction,” in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 975–980.
[164] S. H. Park, G. Lee, J. Seo, M. Bhat, M. Kang, J. Francis, A. Jadhav, P. P. Liang, and L.-P. Morency, “Diverse and admissible trajectory forecasting through multimodal context understanding,” in European Conference on Computer Vision. Springer, 2020, pp. 282–298.
[165] K. Apoorva, R. Dhanya, A. K. Anjana, and S. Natarajan, “Trajectory forecasting of entities using advanced deep learning techniques,” in Advanced Computational and Communication Paradigms. Springer, 2018, pp. 745–754.
[166] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu et al., “A survey on visual transformer,” arXiv preprint arXiv:2012.12556, 2020.
[167] A. Quintanar, D. Fernández-Llorca, I. Parra, R. Izquierdo, and M. Sotelo, “Predicting vehicles trajectories in urban scenarios with transformer networks and augmented information,” arXiv preprint arXiv:2106.00559, 2021.
[168] Y. Liu, J. Zhang, L. Fang, Q. Jiang, and B. Zhou, “Multimodal motion prediction with stacked transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7577–7586.
[169] J. Zhao, X. Li, Q. Xue, and W. Zhang, “Spatial-channel transformer network for trajectory prediction on the traffic scenes,” arXiv preprint arXiv:2101.11472, 2021.
[170] L. L. Li, B. Yang, M. Liang, W. Zeng, M. Ren, S. Segal, and R. Urtasun, “End-to-end contextual perception and prediction with interaction transformer,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 5784–5791.
[171] Y. Zhang and Q. Yang, “A survey on multi-task learning,” IEEE Transactions on Knowledge and Data Engineering, 2021.
[172] S. Vandenhende, S. Georgoulis, W. Van Gansbeke, M. Proesmans, D. Dai, and L. Van Gool, “Multi-task learning for dense prediction tasks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[173] J. Strohbeck, V. Belagiannis, J. Müller, M. Schreiber, M. Herrmann, D. Wolf, and M. Buchholz, “Multiple trajectory prediction with deep temporal and spatial convolutional neural networks,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 1992–1998.
[174] T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz-Rodríguez, “Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges,” Information fusion, vol. 58, pp. 52–68, 2020.
[175] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019.
[176] D. Vasquez, T. Fraichard, and C. Laugier, “Incremental learning of statistical motion patterns with growing hidden markov models,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 403–416, 2009.
[177] D. Vasquez, T. Fraichard, and C. Laugier, “Growing hidden markov models: An incremental tool for learning and predicting human and vehicle motion,” The International Journal of Robotics Research, vol. 28, no. 11-12, pp. 1486–1506, 2009.
[178] J. C. John Halkias, “Next generation simulation (ngsim),” 2020. [Online]. Available: http://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm
[179] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, “The highd dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 2118–2125.
[180] P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and tracking meet drones challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[181] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” arXiv preprint arXiv:1903.11027, 2019.
[182] J. Houston, G. Zuidhof, L. Bergamini, Y. Ye, L. Chen, A. Jain, S. Omari, V. Iglovikov, and P. Ondruska, “One thousand and one hours: Self-driving motion prediction dataset,” arXiv preprint arXiv:2006.14480, 2020.
[183] P. Wang, X. Huang, X. Cheng, D. Zhou, Q. Geng, and R. Yang, “The apolloscape open dataset for autonomous driving and its application,” IEEE transactions on pattern analysis and machine intelligence, 2019.
[184] M.-F. Chang, J. W. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, “Argoverse: 3d tracking and forecasting with rich maps,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[185] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[186] W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kummerle, H. Konigshof, C. Stiller, A. de La Fortelle et al., “Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,” arXiv preprint arXiv:1910.03088, 2019.
[187] J. Xue, J. Fang, T. Li, B. Zhang, P. Zhang, Z. Ye, and J. Dou, “Blvd: Building a large-scale 5d semantics benchmark for autonomous driving,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 6685–6691.
[188] J. Bock, R. Krajewski, T. Moers, S. Runde, L. Vater, and L. Eckstein, “The ind dataset: A drone dataset of naturalistic road user trajectories at german intersections,” in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 1929–1934.
[189] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, “Traphic: Trajectory prediction in dense and heterogeneous traffic using weighted interactions,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[190] R. Krajewski, T. Moers, J. Bock, L. Vater, and L. Eckstein, “The round dataset: A drone dataset of road user trajectories at roundabouts in germany,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–6.
[191] B. Coifman and L. Li, “A critical evaluation of the next generation simulation (ngsim) vehicle trajectory dataset,” Transportation Research Part B: Methodological, vol. 105, pp. 362–377, 2017.
[192] A. Ali, S. M. Shamsuddin, A. L. Ralescu et al., “Classification with class imbalance problem: a review,” Int. J. Advance Soft Compu. Appl, vol. 7, no. 3, pp. 176–204, 2015.
[193] Y. Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha, “Trafficpredict: Trajectory prediction for heterogeneous traffic-agents,” arXiv preprint arXiv:1811.02146, 2018.
[194] N. Nikhil and B. Tran Morris, “Convolutional neural network for trajectory prediction,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 0–0.

Iago Pachêco Gomes is a PhD candidate in the Institute of Mathematics and Computer Science at the University of São Paulo (ICMC/USP). He obtained his MS degree in Computer Science from the University of São Paulo (ICMC/USP) in 2019. His current research interests are Machine Learning, Autonomous Vehicles, Trajectory and Behavior Prediction.

Denis Fernando Wolf received a PhD degree in Computer Science at the University of Southern California – USC in 2006. He is currently an Associate Professor in the Department of Computer Systems at the University of São Paulo (ICMC-USP). His current research interests are Mobile Robotics, Machine Learning, Computer Vision and Embedded Systems.
