
724 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012

Driver Behavior Classification at Intersections and Validation on Large Naturalistic Data Set
Georges S. Aoude, Member, IEEE, Vishnu R. Desaraju, Member, IEEE, Lauren H. Stephens, Student Member, IEEE,
and Jonathan P. How, Senior Member, IEEE

Abstract—The ability to classify driver behavior lays the foundation for more advanced driver assistance systems. In particular, improving safety at intersections has been identified as a high priority due to the large number of intersection-related fatalities. This paper focuses on developing algorithms for estimating driver behavior at road intersections and validating them on real traffic data. It introduces two classes of algorithms that can classify drivers as compliant or violating. They are based on 1) support vector machines and 2) hidden Markov models, which are two very popular machine learning approaches that have been used successfully for classification in multiple disciplines. However, existing work has not explored the benefits of applying these techniques to the problem of driver behavior classification at intersections. The developed algorithms are successfully validated using naturalistic intersection data collected in Christiansburg, VA, through the U.S. Department of Transportation Cooperative Intersection Collision Avoidance System for Violations initiative. Their performances are also compared with those of three traditional methods, and the results show significant improvements with the new algorithms.

Index Terms—Driver behavior, driver warning systems, intention prediction.

I. INTRODUCTION

THE FIELD of road safety and safe driving has witnessed rapid advances due to improvements in sensing and computation technologies. Active safety features like antilock braking systems and adaptive cruise control have widely been deployed in automobiles to reduce road accidents [1]. However, the U.S. Department of Transportation (DOT) still classifies road safety as “a serious and national public health issue.” In 2008, road accidents in the U.S. caused 37 261 fatalities and about 2.35 million injuries. A particularly challenging driving task is negotiating traffic intersection safely; an estimated 45% of injury crashes and 22% of roadway fatalities in the U.S. are intersection related [2]. A main contributing factor in these accidents is the driver’s inability to correctly assess and/or observe the danger involved in such situations [3]. These data suggest that driver assistance or warning systems may have an appropriate role in reducing the number of accidents, improving the safety and efficiency of human-driven ground transportation systems. Such systems typically augment the driver’s situational awareness and can also act as collision mitigation systems [4].

Research on intersection decision support systems has become quite active in both academia and the automotive industry. In the US, the federal DOT, in conjunction with the California, Minnesota, and Virginia DOTs, as well as several U.S. research universities, is sponsoring the Intersection Decision Support project [3], [5] and, more recently, the Cooperative Intersection Collision Avoidance Systems (CICAS) project [6]. In Europe, the InterSafe project was created by the European Commission to increase safety at intersections. The partners in the InterSafe project include European vehicle manufacturers and research institutes [7]. Both projects try to explore the requirements, tradeoffs, and technologies required to create an intersection collision avoidance system and demonstrate its applicability on selected dangerous scenarios [3], [7].

This paper is focused on developing algorithms that infer driver behaviors at road intersections and validating them using naturalistic data. The resulting algorithms can be applied to either vehicle-based systems or infrastructure-based systems. Inferring driver intentions has been the subject of extensive research. For example, reference [8] introduced a mind-tracking approach that extracts the similarity of driver data to several virtual drivers created probabilistically using a cognitive model. Reference [9] used graphical models and hidden Markov models (HMMs) to create and train models of different driver maneuvers using experimental driving data.

More specifically, the modeling of behavior at intersections has been studied using different statistical models [10]–[16]. These studies showed that the stopping behavior depends on several factors including driver profile (e.g., age and perception-reaction time) and yellow-onset kinematic and geometric parameters (e.g., vehicle speed and distance to intersection). Reference [12] developed red light running predictors based on estimating the time-to-arrival at intersections and the different stop-and-go maneuvers. It used speed measurements at two discrete point sensors, but the performance of their approach is limited by the complexity of the multidimensional optimization problem that must be solved. Closely related to the focus of this paper is the work presented in [15] and [16]. Reference [15] discusses the use of time-to-intersection (TTI) and its advantages over time-to-collision (TTC) for intersection safety systems.

Manuscript received July 8, 2011; revised October 16, 2011; accepted November 13, 2011. Date of publication February 3, 2012; date of current version May 30, 2012. This work was supported in part by the Ford-MIT Alliance, whose initial funding led to the seedling of the ideas in this paper, and the continued funding of Scientific Systems Company, Inc. (SSCI) under Grants N68335-09-C-0472 and Grant N68335-09-C-0590, and by Le Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT) Graduate Award. The Associate Editor for this paper was J. Zhang.
G. S. Aoude, V. R. Desaraju, and J. P. How are with the Department of Aeronautics and Astronautics, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139 USA (e-mail: gaoude@alum.mit.edu; rajeswar@alum.mit.edu; jhow@mit.edu).
L. H. Stephens is with the Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 USA (e-mail: lhs@mit.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TITS.2011.2179537

1524-9050/$31.00 © 2012 IEEE


AOUDE et al.: DRIVER BEHAVIOR CLASSIFICATION AT INTERSECTION AND VALIDATION ON LARGE DATA SET 725

Reference [16] developed different warning algorithms for signalized and stop intersections based on the required deceleration parameter RDP, TTI, and speed–distance regression (SDR) models. These algorithms will be used as a baseline for comparison in our work since they have been widely used in the driver behavior classification literature and have also been validated on a large traffic data set [17]. Note, however, that those authors only consider simple relationships between the driving parameters, while the algorithms developed in this paper have the flexibility to combine many parameters in the same model.

This paper develops two novel classes of algorithms based on distinct branches of classification in machine learning to model driver behaviors at signalized intersections. It also successfully validates these algorithms on a large naturalistic data set. First, it describes the driver behavior inference problem and the different factors involved in the decision making. Then, it introduces the two classes of algorithms, a discriminative approach based on support vector machines (SVM), and a generative approach based on HMMs, along with the traditional approaches that they are compared with. Next, it describes the implementation process of the different algorithms. Finally, it evaluates their performance on intersection data collected in Christiansburg, VA, as part of the DOT CICAS for Violations (CICAS-V) initiative [6].

II. PROBLEM STATEMENT

Consider an intersection controlled by a traffic signal, as shown in Fig. 1. As a vehicle approaches the intersection, the objective is to predict from a set of observations whether the driver will stop safely if the signal indicates to do so. Drivers who do not stop before the stop bar are considered to be violators, whereas those who do stop are considered to be compliant. Naturally, drivers behave differently, and the variation in the resulting observations must be taken into account in the classification process.

Fig. 1. Red light violation at signalized intersection. Adapted from www.drivingschool.ca.

The ability to classify drivers lays the foundation for more advanced driver assistance systems. In particular, these systems would be able to warn drivers of their own potential violations as well as detect other potential violators approaching the intersection. Integrating the classifier into a driver assistance system imposes performance constraints that balance violator detection accuracy with driver annoyance.

This requirement can be encoded in terms of signal detection theory (SDT), which provides a framework for evaluating decisions made in uncertain situations [18]. Table I shows the mapping between classifier output and SDT categories. To meet this performance constraint, the classifier must maximize the number of true positives (to correctly identify violators) while maintaining a low ratio of false positives (to minimize driver annoyance).

TABLE I
CLASSIFICATION CATEGORIES

Fig. 2. Target vehicle approaching intersection. Classification is performed in the $T_w$ time window, and warning is potentially sent to the host vehicle at $t_{warn}$ measured as the expected time to reach the intersection. In some cases, the time corresponding to $d_{min}$ can be larger than $TTI_{min}$.

An underlying assumption for this classification is the availability of communication or sensing infrastructure to provide the observations needed to classify the driver’s behavior and enable the detection of traffic signal phase. Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication systems would provide exactly this functionality and are an active area of research [19], [20]. Alternatively, onboard sensors could be used to make these observations, particularly when warning drivers of their own impending violations [21]–[23].

While several scenarios could be considered for this problem, this paper focuses on the case consisting of one host vehicle and several target vehicles. The goal is to warn the host vehicle when any of the target vehicles is predicted not to comply with the traffic lights. To further specify the problem, the following assumptions are made.

1) The host vehicle has the right of way and is compliant. Only the target vehicles that do not have the right of way are considered in the problem; the other vehicles (i.e., with right of way) are ignored. In other words, the focus is on warning compliant drivers from the danger created by other potentially violating drivers. An implicit assumption is the existence of V2V and V2I systems to detect the traffic signal phase and to share position, speed, and acceleration information among vehicles.
2) The host vehicle is warned at $t_{warn}$ only when a target vehicle is classified as violating. Fig. 2 illustrates the different warning-related variables. $t_{warn}$ corresponds to the time when a target vehicle’s estimated time to arrive at the intersection, also known as TTI [24], reaches $TTI_{min}$ seconds, or when the distance of a target vehicle to the intersection is equal to $d_{min}$ meters, whichever

condition happens first. The time and distance thresholds are chosen such that the host driver has enough time to react to the warning. A detailed analysis of the choice of $TTI_{min}$ and $d_{min}$ is presented in Section V-B.
3) The target vehicles are tracked as early as possible, but their classification as violating or compliant is based on measurements taken in the $T_w$ time window (Fig. 2). Different values of $T_w$ are analyzed in the developed algorithms; a larger $T_w$ brings a longer measurement “memory” at the expense of an additional computation requirement. A large $T_w$ might also include irrelevant measurements when the vehicle is very far from the intersection. Finally, note that a target vehicle that stops in or before the $T_w$ window is directly labeled as compliant.

III. ALGORITHMS

Classifying human drivers is a very complex task because of the various nuances and peculiarities of human behaviors [25]. Researchers have shown that the state of a vehicle driver lies in some high-dimensional feature space [26].

Basic classification is traditionally performed by identifying simple relationships or trends in data that define each class. This includes using techniques such as model fitting and regression to identify classification criteria [27]. However, by only considering simple relationships, these approaches are limited in their ability to accurately classify complex data where the classes may be defined by a variety of factors. To overcome this limitation, two approaches to classification have been developed in the machine learning community.

Discriminative approaches, such as SVMs, are typically used in binary classification problems, which makes them appropriate for the classification of compliant versus violating drivers. SVMs have several useful theoretical and practical characteristics [28]. We highlight two of them: 1) training SVMs involves an optimization problem of a convex function, thus the optimal solution is a global solution (i.e., no local optima); 2) the upper bound on the generalization error does not depend on the dimensionality of the problem.

Classification is often also performed using generative approaches, such as HMMs, to model the underlying patterns in a set of observations and explicitly compute the probability of observing a set of outputs for a given model [29]. HMMs are well suited to the classification of dynamic systems [9], such as a vehicle approaching an intersection. The states of the HMM define different behavioral modes based on observations, and the transitions between these states capture the temporal relationship between observations.

A. Traditional Methods

This section presents three notable techniques from the literature for classifying drivers based on simple relationships between observations: static TTI [24], static RDP [15], and SDR [16]. These algorithms are widely used and provide a baseline for the performance analysis in Section VI.

1) Static TTI: One of the most intuitive approaches to classification is to use the vehicle’s TTI. It is thought that this type of temporal property is what humans use to decide whether a vehicle will be compliant [30]. The vehicle’s TTI is defined simply as

$$ TTI = \frac{r}{v} \tag{1} $$

where $v$ is the vehicle’s current speed, and $r$ is its distance to the stop line. For this classifier, the TTI value is computed when the vehicle’s deceleration crosses some predefined threshold, indicating the onset of braking. Then, the driver is classified as a violator if $TTI < TTI_{req}$, where $TTI_{req}$ is the time given for a driver to stop safely after the onset of braking [24]. This static parameter can be adjusted to change how conservative the algorithm is in its classifications.

2) Static RDP: Other work has used a classifier based on RDP [15], [16]. RDP gives the deceleration (in g) needed for the vehicle to stop safely given its current distance and speed. It is defined as

$$ RDP = \frac{v^2}{2rg} \tag{2} $$

where $r$ and $v$ are as previously defined, and $g$ is the gravitational acceleration constant. For a given RDP threshold $RDP_{warn}$ (expressed in g, like RDP), a warning distance is computed as

$$ r_{warn} = \frac{v^2}{2g\,RDP_{warn}}. \tag{3} $$

The vehicle is then classified as a violator if at any time $r < r_{warn}$, i.e., if its required deceleration is greater than the selected RDP threshold.

3) SDR: A more complex classification strategy based on fitting regression curves is described in [16] and [31]. This approach takes a set of speed and distance measurements from vehicles that are known to be compliant, discretizes the speeds, and collects the distance measurements at each speed into a set of bins based on percentiles. A regression curve of the form

$$ r_{warn} = a v^{b} + c \tag{4} $$

is then fit for each bin. These SDR curves attempt to identify relationships between speed and distance that can discriminate between compliant and violating drivers. For a given curve (corresponding to a certain percentile of compliant driver trajectories), the vehicle is classified as a violator if at any time $r < r_{warn}$, i.e., if it is closer to the stop line than expected for a compliant vehicle at its current speed. Selecting a curve corresponding to higher percentile bins yields a more conservative classifier.

The version of this algorithm in [16] includes two additional layers that declare a vehicle to be compliant if its deceleration is below some fixed threshold (e.g., due to braking) or if its velocity is below some fixed threshold (e.g., to permit rolling stops). However, only the deceleration layer is considered here as rolling stops are typically not associated with signalized intersections.
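As a rough illustration, the three baseline rules of (1)-(4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in [16] or in this paper: the function names, the SI units, the value of g, and the example thresholds and curve coefficients are all hypothetical, and the RDP threshold is assumed to be expressed in units of g like RDP in (2), so g appears in the warning distance. The deceleration layer of the SDR algorithm is omitted.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed SI units)


def tti(r, v):
    """Time-to-intersection, Eq. (1): distance to stop line over speed."""
    return r / v if v > 0 else math.inf


def rdp(r, v, g=G):
    """Required deceleration parameter in units of g, Eq. (2)."""
    return v ** 2 / (2.0 * r * g)


def static_tti_violator(r, v, tti_req):
    """Static TTI rule: violator if TTI (taken at braking onset) < tti_req."""
    return tti(r, v) < tti_req


def static_rdp_violator(r, v, rdp_warn, g=G):
    """Static RDP rule: violator if r is inside the warning distance, Eq. (3)."""
    r_warn = v ** 2 / (2.0 * g * rdp_warn)
    return r < r_warn


def sdr_violator(r, v, a, b, c):
    """SDR rule: violator if r is below the regression curve of Eq. (4)."""
    r_warn = a * v ** b + c
    return r < r_warn


# Hypothetical measurement: 30 m from the stop line at 15 m/s.
r, v = 30.0, 15.0
flags = {
    "tti": static_tti_violator(r, v, tti_req=2.5),   # made-up threshold
    "rdp": static_rdp_violator(r, v, rdp_warn=0.3),  # made-up threshold
    "sdr": sdr_violator(r, v, a=0.1, b=2.0, c=5.0),  # made-up coefficients
}
```

Each rule reduces to a single comparison on the current range and speed, which is what makes these baselines cheap to evaluate but unable to combine several parameters in one model.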

B. SVM-BF

The first developed algorithm, which is denoted as SVM-BF, combines SVM and Bayesian filtering. It was introduced by the authors in [32] and [33] and is extended in this paper. The core of the algorithm is the SVM, a popular supervised machine learning technique based on the margin-maximization principle [28]. SVM has been successfully applied to several applications, including text categorization, bioinformatics, and database marketing [34]. It has also been used recently in active safety research, including lane departure warning systems [25], driver distraction detection algorithms [35], [36], and driver skill characterization methods [37]. This work develops a novel architecture that combines SVM with a Bayesian filter (BF) that enables it to perform well on the driver behavior classification problem. The following sections introduce the architecture of the SVM-BF algorithm and provide additional theoretical and practical details about each of its components.

1) SVM-BF Architecture: The architecture of the SVM-BF algorithm is shown in Fig. 3. At the beginning of each measurement cycle inside the $T_w$ window, the SVM module (Section III-B2) extracts the relevant features from sensor observations. It then outputs a single classification (violator versus compliant) per cycle to the BF component (Section III-B3). Then, at the end of the $T_w$ window, i.e., at time $t_{warn}$, the BF uses the current and previous SVM outputs to estimate the probability that the driver is compliant. Using a threshold detector, the SVM-BF outputs a final classification at $t_{warn}$ specifying whether the driver is estimated as violator or compliant. To speed up the convergence of the BF, a discount function is added to the SVM-BF designed to deemphasize earlier classifications in $T_w$ and therefore put more weight on the measurements of the vehicles that are closer to $t_{warn}$.

Fig. 3. SVM-BF architecture.

2) SVM Component: This section gives a brief introduction to SVMs and their implementation in the SVM-BF framework. The reader is encouraged to refer to [38] for a detailed description of the SVM.

Given a set of binary labeled training data $\{x_i, y_i\}$, where $i = 1, \dots, N$, $y_i \in \{+1, -1\}$, $x_i \in \mathbb{R}^d$, $N$ is the number of training vectors, and $d$ is the size of the input vector, a new test vector $z$ is classified into one class ($y = +1$) or the other ($y = -1$) by evaluating the following decision function:

$$ D(z) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, z) + B \right). \tag{5} $$

$K(x_i, x_j)$, which is known as the kernel function, is the inner product between the mapped pairs of points in the feature space, and $B$ is the bias term [38]. $\alpha$ is the argmax of the following optimization problem:

$$ \max_{\alpha} W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{6} $$

subject to the constraints

$$ \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \alpha_i \ge 0. \tag{7} $$

Appropriate kernel selection and feature choice are essential to obtaining satisfactory results using SVM. Based on experimenting with different kernel functions and several combinations of features, the best results for this problem were obtained using the Gaussian radial basis function and combining the following three features: 1) range to intersection; 2) speed; and 3) longitudinal acceleration. At each measurement cycle, the output of the SVM block is a classification $y = +1$ (compliant) or $y = -1$ (violator). This output is then fed into the Bayesian filtering component (Section III-B3), which uses additional logic before making a final classification.

3) Bayesian Filtering Component: The BF component views the outputs of the SVM component as samples of a random variable $y \in \{\text{violator}, \text{compliant}\}$ that is controlled by a parameter $\theta$ such that

$$ p(y = \text{compliant}|\theta) = \theta. \tag{8} $$

The parameter $\theta$ is unknown. It represents the probability that the driver belongs to the compliant class. The role of the BF component is to compute the expected value of $\theta$ given a sequence of previous outputs from the SVM component.

To infer the value of the hidden variable, a standard Bayesian formulation is used [39]. We choose a beta distribution prior for $\theta$, which is a function of some hyperparameters $a$ and $b$, i.e.,

$$ \mathrm{beta}(\theta|a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1 - \theta)^{b-1} \tag{9} $$

where $\Gamma(x)$ is the gamma function [39]. The values of $a$ and $b$ have an intuitive interpretation; they represent the initial “confidence” given for each class, respectively. In other words, they reflect the number of observations corresponding to each behavior, which were accumulated in previous measurement cycles.

Given a sequence of SVM outputs $\mathbf{y} = [y_1, \dots, y_N]$, the posterior distribution of $\theta$, i.e., $p(\theta|\mathbf{y})$, is computed by multiplying the beta distribution prior by the binomial likelihood function given by

$$ \mathrm{bin}(m|N, \theta) = \binom{N}{m} \theta^{m} (1 - \theta)^{N-m} \tag{10} $$

where $m$ and $l$ represent the number of SVM outputs corresponding to $y = \text{compliant}$ and $y = \text{violator}$, respectively. The variable $N$ is the total number of SVM classifications: $N = m + l$. By normalizing the resulting function, we obtain

$$ p(\theta|\mathbf{y}) = \frac{\Gamma(m + a + l + b)}{\Gamma(m + a)\,\Gamma(l + b)}\, \theta^{m+a-1} (1 - \theta)^{l+b-1}. \tag{11} $$

The expected value of $\theta$ given the sequence $\mathbf{y}$, which is the output of the BF component, can then be simply expressed as

$$ E(\theta|\mathbf{y}) = \int_{0}^{1} \theta\, p(\theta|\mathbf{y})\, d\theta = \frac{m + a}{m + a + l + b}. \tag{12} $$
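The closed form in (12) is simple enough to check numerically. The following is a minimal sketch, assuming the SVM outputs are encoded as +1 (compliant) and -1 (violator) as in Section III-B2; the function name and the uniform prior a = b = 1 used in the example are illustrative assumptions, not values taken from this paper.

```python
def posterior_mean_compliant(svm_outputs, a=1.0, b=1.0):
    """Expected value of theta from Eq. (12) for a beta(a, b) prior.

    svm_outputs: iterable of +1 (compliant) / -1 (violator) labels.
    a, b: prior pseudo-counts for the compliant and violator classes.
    """
    outputs = list(svm_outputs)
    m = sum(1 for y in outputs if y == +1)  # number of compliant outputs
    l = sum(1 for y in outputs if y == -1)  # number of violator outputs
    return (m + a) / (m + a + l + b)


# Hypothetical sequence of per-cycle SVM outputs inside the T_w window.
outputs = [+1, +1, -1, +1]
estimate = posterior_mean_compliant(outputs)  # (3 + 1) / (3 + 1 + 1 + 1)
```

With no SVM outputs at all, the estimate falls back to the prior mean a / (a + b), which is one way to read the "initial confidence" interpretation of a and b.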

4) Discount Function: To improve the accuracy of the expected value computed in (12), earlier classifications in the $T_w$ window should be given less weight compared with later classifications. The following discount function achieves the desired purpose:

$$ d_k = C^{N-k}, \quad \text{with } d_0 = C^{N} \tag{13} $$

where $k = 1 \dots N$ is the index of the SVM output in the $T_w$ window, $N$ represents the index of the last output in $T_w$, i.e., at time $t_{warn}$, and $C$ is a constant discount factor ($0 < C \le 1$) used to discount exponentially the weight of the output at time $k$. Note that $C = 1$ is equivalent to no discounting. The value of $C$ affects the performance of the SVM-BF significantly. Section V-D investigates different values for $C$ in the search for the best combination of the SVM-BF parameters. The variables $m$ and $l$ also need to be indexed by $k$, where $m_k$ and $l_k$ are the binary outputs of SVM at step $k$, and $m_k + l_k = 1$. Given these changes, (12) can be rewritten as

$$ E(\theta|\mathbf{y}) = \frac{\sum_{k=1}^{N} d_k m_k + d_0 a}{\sum_{k=1}^{N} d_k m_k + d_0 a + \sum_{k=1}^{N} d_k l_k + d_0 b} \tag{14} $$

where $a$ and $b$ are the same hyperparameters defined in (9).

5) Threshold Detector: Given $E(\theta|\mathbf{y})$, the SVM-BF algorithm outputs the final classification based on the threshold detector specified value $\tau_S$. The driver is classified as compliant if $E(\theta|\mathbf{y}) > \tau_S$; otherwise, it is classified as violating. A large threshold value $\tau_S$ is equivalent to a more conservative algorithm (catching more violators) but at the expense of an increased number of wrong warnings (i.e., false positives). The choice of the value of $\tau_S$ is analyzed in Section V-D.

6) Sliding Window: An extension to the original SVM-BF algorithm is the introduction of a sliding window over the features, which proved to be essential in improving the performance of the SVM-BF on road traffic data. To elaborate, each feature consists of the means and variances of the last $K$ different measurements. This change replaces the individual measurements (range, velocity, and acceleration) with their means and variances computed over the window. This addition indirectly adds time dependency to the sequence of outputs of the SVM component without affecting computation times, thus improving the SVM-BF model. The choice of the value of $K$ is analyzed in Section V-D.

C. HMM Based

An alternative approach is based on the idea of learning generative models from a set of observations. HMMs have been used extensively to develop such models in many fields, including speech recognition [29], and part-of-speech tagging [40]. The application of HMMs to isolated word detection is particularly relevant to the task of driver classification. In isolated word detection, one HMM is generated for each word in the vocabulary, and new words are tested against these models to identify the maximum likelihood model for each test word [29]. HMMs have also been used to recognize different driver behaviors, such as turning and braking [9]. This motivates the use of HMMs in this paper to detect patterns that characterize compliant and violating behaviors.

1) HMM-Based Architecture: Suppose two sets of observations are available: one known to be from compliant drivers and the other from violators. Each set of observations can be considered an emission sequence produced by an HMM modeling vehicle behavior. Using the expectation-maximization (EM) algorithm (see Section III-C3), two models $\lambda_c$ and $\lambda_v$ are learned from the compliant driver and violator training data, respectively. Then, given a new sequence of observations $\mathbf{z}$, the forward algorithm (see Section III-C2) is used with $\lambda_c$ and $\lambda_v$ to estimate the probability that the driver is compliant. As in the SVM-BF algorithm, a threshold detector (see Section III-C4) uses this result to output a final classification, labeling the driver as either violating or compliant. Again, this classification occurs at $t_{warn}$ based on the observations from the $T_w$ window. Fig. 4 summarizes this architecture.

Fig. 4. HMM-based classification architecture.

2) HMMs and the Forward Algorithm: This section provides a brief introduction to HMMs and the forward algorithm as a technique for determining how well a model fits a set of observations. See [29] for additional details on HMMs and the forward algorithm.

An HMM $\lambda(T, t, e)$ consists of a set of $n$ discrete states and a set of observations at each state, as shown in Fig. 5. At any given time $k$, the system being modeled will be in one of these states $q_k = s_i$, and the transition probability matrix $T$ gives the probability of transitioning to any other state at the next time step $q_{k+1} = s_j$. Specifically

$$ T_{ij} = P(q_{k+1} = s_j | q_k = s_i). \tag{15} $$

The probability of the system starting in each state is given by the initial state distribution $t$, where $t_i = P(q_1 = s_i)$. Due to these probabilistic transitions, the current state is typically not known. Instead, a set of observations is assumed to be available. The probability of a state $s_i$ emitting a certain observation $z_k$ is given by $e_i(z_k)$. The emission distribution for each type of observation is assumed to be Gaussian with unique mean $\mu_i$ and variance $\sigma_i^2$ for every state $s_i$. This design decision ensures that each state corresponds to one specific mode of driving, which is characterized by a set of observations normally

distributed around some typical values (specified by the means and variances).

Fig. 5. Basic HMM $\lambda(T, t, e)$ with $n = 3$ states, transition probabilities $T_{ij}$, and emissions $e_i$.

A common task with HMMs is determining how well a given model $\lambda(T, t, e)$ fits a sequence of observations $\mathbf{x} = x_1, \dots, x_K$. This can be quantified as the probability of observing $\mathbf{x}$ given $\lambda$, $P(\mathbf{x}|\lambda)$. The forward algorithm is an efficient method for computing this probability [29] and is defined as follows. Let $\alpha_i(k)$ be given by

$$ \alpha_i(k) = P(x_1, \dots, x_k, q_k = s_i | \lambda) \tag{16} $$

which is the probability of observing the partial sequence $x_1, \dots, x_k$ and having the current state $q_k$ at time $k$ equal to $s_i$ given the model $\lambda$. Then, the forward algorithm is initialized using the initial state distribution $t$, i.e.,

$$ \alpha_i(1) = t_i e_i(x_1), \quad i = 1, \dots, n. \tag{17} $$

The probability of each subsequent partial sequence of observations for $k = 1, \dots, K - 1$ is given by

$$ \alpha_j(k + 1) = \left[ \sum_{i=1}^{n} \alpha_i(k) T_{ij} \right] e_j(x_{k+1}), \quad j = 1, \dots, n. \tag{18} $$

Upon termination at $k = K$, the algorithm returns the desired probability

$$ P(\mathbf{x}|\lambda) = \sum_{i=1}^{n} \alpha_i(K). \tag{19} $$

3) EM Algorithm for HMMs: These observations can also be used to learn an HMM that captures the behavior of the underlying system. A standard technique for doing so, i.e., the EM algorithm, is subsequently summarized. The complete algorithm is detailed in [41].

Given a set of $N$ observation sequences (training data) $\mathbf{x}_1, \dots, \mathbf{x}_N$, the EM algorithm computes the maximum-likelihood estimates of the HMM parameters, i.e.,

$$ \lambda^{*}(T, t, e) = \arg\max_{\lambda} P(\mathbf{x}_1, \dots, \mathbf{x}_N | \lambda(T, t, e)). \tag{20} $$

To do so, it uses the forward algorithm, as defined earlier, as well as the backward algorithm [29], which is defined similarly to the forward algorithm. Let

$$ \beta_i(k) = P(x_{k+1}, \dots, x_K | q_k = s_i, \lambda) \tag{21} $$

be the probability of observing the rest of the partial sequence of observations at time $k$ for $k \le K$. Then, the backward algorithm follows as

$$ \beta_i(K) = 1 \tag{22} $$

$$ \beta_i(k) = \sum_{j=1}^{n} T_{ij}\, e_j(x_{k+1})\, \beta_j(k + 1). \tag{23} $$

Using the terms $\alpha_i(k)$ from the forward algorithm and $\beta_i(k)$ from the backward algorithm, the probability of being in state $s_i$ at time $k$ given the observations $\mathbf{x}$ is given by

$$ \gamma_i(k) = P(q_k = s_i | \mathbf{x}, \lambda) = \frac{\alpha_i(k) \beta_i(k)}{\sum_{j=1}^{n} \alpha_j(k) \beta_j(k)}. \tag{24} $$

Then, the probability of being in state $s_i$ at time $k$ and state $s_j$ at time $k + 1$ is given by

$$ \xi_{ij}(k) = P(q_k = s_i, q_{k+1} = s_j | \mathbf{x}, \lambda) = \frac{\alpha_i(k) T_{ij} e_j(x_{k+1}) \beta_j(k + 1)}{\sum_{p=1}^{n} \sum_{q=1}^{n} \alpha_p(k) T_{pq} e_q(x_{k+1}) \beta_q(k + 1)}. \tag{25} $$

From these terms, the parameters of an updated HMM $\bar{\lambda}$ are computed with the following update equations:

$$ \bar{t}_i = \gamma_i(1) \tag{26} $$

$$ \bar{T}_{ij} = \frac{\sum_{k=1}^{K-1} \xi_{ij}(k)}{\sum_{k=1}^{K-1} \gamma_i(k)} \tag{27} $$

$$ \bar{\mu}_i = \frac{\sum_{k=1}^{K} \gamma_i(k)\, x_k}{\sum_{k=1}^{K} \gamma_i(k)} \tag{28} $$

$$ \bar{\sigma}_i^2 = \frac{\sum_{k=1}^{K} \gamma_i(k) (x_k - \bar{\mu}_i)^2}{\sum_{k=1}^{K} \gamma_i(k)}. \tag{29} $$

These maximum-likelihood estimates reflect the relative frequencies of the state transitions and emissions in the training data.

Repeating this procedure with $\lambda$ replaced by $\bar{\lambda}$ is guaranteed to converge to a local maximum [41], i.e., as the number of iterations increases, $P(\mathbf{x}_1, \dots, \mathbf{x}_N | \bar{\lambda}) - P(\mathbf{x}_1, \dots, \mathbf{x}_N | \lambda) \to 0$. The resulting $\bar{\lambda}$ is the maximum likelihood model $\lambda^{*}(T, t, e)$. Since the EM algorithm is only guaranteed to converge to a local maximum, several sets of random initializations can be tested to reduce the effects of local maxima on the final model parameters.

As with the choice of features in the SVM, the observations used for the HMM can have a dramatic impact on its performance. After testing several combinations of observations, the following five parameters were identified to give the best results in terms of high detection accuracy and low false positive rates: 1) range to intersection; 2) speed; 3) longitudinal acceleration; 4) TTI; and 5) RDP. In addition, the observations can be normalized to remove any bias introduced by differences in the order of magnitude of the observations.
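The forward recursion (16)-(19) can be sketched as below. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: observations are scalar (the paper uses several observation types), the helper names and all numerical model parameters are hypothetical, and the recursion is rescaled at every step, a standard device to avoid numerical underflow that is not described in the text. Comparing the log-likelihoods under two models is shown only to illustrate how the probability of a sequence given a model can be used; the decision rule actually applied is the threshold detector of Section III-C4.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a scalar Gaussian emission with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_log_likelihood(x, T, t, mu, var):
    """Return log P(x | lambda) for an HMM with scalar Gaussian emissions.

    x       : list of observations x_1..x_K
    T       : n-by-n matrix, T[i][j] = P(q_{k+1} = s_j | q_k = s_i)
    t       : initial state distribution
    mu, var : per-state emission means and variances
    """
    n = len(t)
    log_lik = 0.0
    alpha = None
    for k, xk in enumerate(x):
        if k == 0:
            # Initialization, Eq. (17).
            alpha = [t[i] * gaussian_pdf(xk, mu[i], var[i]) for i in range(n)]
        else:
            # Recursion, Eq. (18).
            alpha = [
                sum(alpha[i] * T[i][j] for i in range(n))
                * gaussian_pdf(xk, mu[j], var[j])
                for j in range(n)
            ]
        # Rescale alpha to sum to one; the logs of the scale factors add up
        # to log P(x | lambda), i.e., Eq. (19) in log form. This raises a
        # ValueError if the scale underflows to zero.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a_i / scale for a_i in alpha]
    return log_lik


# Hypothetical two-state models (made-up numbers, for illustration only).
T = [[0.9, 0.1], [0.2, 0.8]]
t = [0.5, 0.5]
lambda_c = dict(T=T, t=t, mu=[0.0, -2.0], var=[1.0, 1.0])  # "compliant" stand-in
lambda_v = dict(T=T, t=t, mu=[0.0, 2.0], var=[1.0, 1.0])   # "violator" stand-in

z = [-0.1, -1.5, -2.2, -1.9]  # new observation sequence
ll_c = forward_log_likelihood(z, **lambda_c)
ll_v = forward_log_likelihood(z, **lambda_v)
more_likely = "compliant" if ll_c > ll_v else "violating"
```

The cost of the recursion is on the order of n^2 K operations, compared with n^K terms if the sum over all state sequences were evaluated directly.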
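The re-estimation formulas (28) and (29) are occupancy-weighted averages, which the short sketch below makes explicit. It assumes that the state occupancy probabilities $\gamma_i(k)$ for one state $i$ have already been computed by the forward-backward pass (not shown) and that observations are scalar; `reestimate_gaussian` is a hypothetical name.

```python
def reestimate_gaussian(obs, gamma_i):
    """Weighted mean and spread for one HMM state, following (28) and (29).

    obs     -- observations x_1, ..., x_K (scalars, assumed)
    gamma_i -- occupancy probabilities gamma_i(1), ..., gamma_i(K) for state i
    """
    total = sum(gamma_i)
    # (28): occupancy-weighted mean of the observations
    mu = sum(g * x for g, x in zip(gamma_i, obs)) / total
    # (29): occupancy-weighted squared deviation about that mean
    sigma = sum(g * (x - mu) ** 2 for g, x in zip(gamma_i, obs)) / total
    return mu, sigma
```

When all weights are equal, the estimates coincide with the ordinary sample mean and (biased) sample variance.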

Fig. 6. Satellite image of the Peppers Ferry intersection (U.S. 460 and Peppers Ferry Rd, Christiansburg, Montgomery, Virginia 24073) taken from Google Earth. CICAS-V data from vehicles at Peppers Ferry intersection were used to test the algorithms presented in this paper.

4) Threshold Detector: Using the EM algorithm, two models, i.e., $\lambda_c$ and $\lambda_v$, are learned from the compliant driver and violator training data, respectively. Then, given a new sequence of observations $z$, the forward algorithm [29] is used with $\lambda_c$ and $\lambda_v$ to find the probability of observing that sequence given each model, $P(z \mid \lambda_c)$ and $P(z \mid \lambda_v)$. The prior over the models is assumed to be uniform, $P(\lambda_c) = P(\lambda_v) = 0.5$, since nothing is known beforehand about whether the driver is compliant or violating. Then, the likelihood ratio

$$\frac{P(z, \lambda_c)}{P(z, \lambda_v)} = \frac{P(z \mid \lambda_c)}{P(z \mid \lambda_v)} > e^{-\tau_H} \tag{30}$$

determines whether the driver is more likely to comply with or violate the stop bar and assigns the corresponding classification. Note that this ratio is typically computed using log probabilities, which introduces the $e$ term in (30). The threshold $\tau_H$ can be selected to adjust the conservatism of the classifier and is discussed in greater detail in Section V-E.

Since states have one emission distribution per observation, each state in the HMM represents a coupling between specific ranges of values for each observation. It is this coupling and the transitions between different coupled ranges that allow the HMM-based classifier to distinguish between compliant drivers and violators.

IV. DATA COLLECTION AND FILTERING

The roadside data used in this paper were collected as part of the CICAS-V project [6]. The CICAS-V project collected data on over 5 500 000 approaches across three intersections. In this paper, the data from the Peppers Ferry intersection at U.S. 460 Business and Peppers Ferry Rd in Christiansburg, VA (see Fig. 6), were used to evaluate the algorithms, providing a total of 3 018 456 car approaches. The method of collection is detailed in [42]. A brief description is subsequently given.

At the Peppers Ferry intersection, a custom data acquisition system was installed to monitor real-time vehicle approaches. This system included four radar units that identified vehicles and measured vehicle speed, range, and lateral position at a rate of 20 Hz beginning approximately 150 m away from the intersection, a GPS antenna to record the current time, four video cameras to record each of the four approaches, and a phase sniffer to record the signal phase of the traffic light. These devices collected data on drivers who were unaware of the experiment as they moved through the intersection. The information from these units then underwent postprocessing, including smoothing and filtering to remove noise such as erroneous radar returns. In addition, the geometric intersection description (a detailed plot of the intersection accurate to within 30 cm) was used to derive new values such as acceleration, lane id, and a unique identifier for each vehicle. Information on each of the car approaches was then uploaded onto the SQL database [43], which was used to obtain the data analyzed in this paper.

The data were further processed at the Aerospace Controls Laboratory at MIT for the purposes of this research. Microsoft SQL Server 2008 Developer Edition was used to filter individual trajectories from the data collected in the CICAS-V project. To maintain tractable offline runtimes for the learning phases of the algorithms, the first 300 000 trajectories out of the 3 018 456 car approaches were extracted. They were classified as compliant or violating based on whether they committed a traffic light violation. Violating behaviors included drivers that committed a traffic violation at the intersection, defined as crossing over the stop bar after the presentation of the red light and continuing into the intersection for at least 3 m within 500 ms. Compliant behaviors included vehicles that stopped before the crossbar at the yellow or red light. Out of the extracted trajectories, 1673 violating and 13 724 compliant trajectories were found and then used in the classification algorithms.

V. IMPLEMENTATION

This section highlights several decisions made in implementing the different algorithms introduced in Section III. First, Section V-A describes the training and testing procedures used for data validation and the rationale that motivates them. It also introduces an analysis tool that is frequently used to compare algorithm performance against parameter choice. Then, Section V-B discusses the parameters that are common to all the algorithms. More specifically, it explains the values of the variables affecting the warning timing and the maximum driver annoyance levels. Finally, Sections V-C–E detail the choice of parameters that are specific to the traditional, SVM-BF, and HMM algorithms, respectively.

A. Training/Testing Approaches

Using the trajectories selected from the CICAS-V database, as described in Section IV, the algorithms are tested in pseudo real time, i.e., by running them on the trajectories of the database as if the observations of the target vehicle were arriving in real time. The observations from each trajectory are downsampled from 20 to 10 Hz to reduce the computational load. The training and testing are performed using two different approaches: 1) basic generalization test (see Section V-A1), and 2) m-fold cross validation (see Section V-A2). Both approaches aim at evaluating the generalization property of the algorithms.
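Returning to the threshold detector of Section III-C4, taking logarithms of (30) gives the equivalent test $\log P(z \mid \lambda_c) - \log P(z \mid \lambda_v) > -\tau_H$. The sketch below applies that test, reading (30) as the condition for a compliant classification. It assumes that the two log-likelihoods are produced elsewhere (e.g., by one forward pass per model); the function name and the example numbers are hypothetical.

```python
def classify_by_likelihood_ratio(loglik_compliant, loglik_violator, tau_h):
    """Apply the log form of (30).

    loglik_compliant -- log P(z | lambda_c), computed elsewhere (assumed)
    loglik_violator  -- log P(z | lambda_v), computed elsewhere (assumed)
    tau_h            -- decision threshold tau_H
    """
    log_ratio = loglik_compliant - loglik_violator
    # (30) holds when the log ratio exceeds -tau_H
    return "compliant" if log_ratio > -tau_h else "violating"


# Hypothetical log-likelihoods for one observation sequence
label = classify_by_likelihood_ratio(-150.0, -120.0, tau_h=54.4)
```

Under this reading, a larger $\tau_H$ requires stronger evidence for the violator model before a violation is declared, which is one way to trade detections against false alarms.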

To evaluate the results of these tests, the receiver operating characteristic (ROC) curve is used to display the true positive and false positive rates of each set of algorithm parameters [18]. The curve is generated by varying a parameter of interest (or set of parameters), which is referred to as the beta parameter in the SDT terminology [18]. Each point on the ROC curve then corresponds to a different value of the beta parameter. The choice of beta for each algorithm is subsequently detailed in its respective section.

1) Basic Generalization Test: The first approach is a straightforward test of generalization. This consists of training the algorithms on a randomly selected subset that is some small fraction $p$ of the data and testing on the remaining $1 - p$. This approach demonstrates the generalization property (or lack thereof) of the algorithms. This property is essential for any warning algorithm to perform successfully when deployed on driver assistance systems, particularly given the number of vehicles encountered in everyday driving. The value of $p$ is chosen to be 0.2. The total number of trajectories used for this approach is 10 000 compliant and 1000 violating. In other words, 2000 compliant and 200 violating trajectories are used in the training phase, whereas the testing phase consists of 8000 compliant and 800 violating trajectories.

2) m-Fold Cross Validation: The second approach uses the standard m-fold cross-validation technique for testing generalization [27]. This involves randomly dividing the training set into $m$ disjoint and equally sized parts. The classification algorithm is trained $m$ times while leaving out, each time, a different set for validation. The mean over the $m$ trials estimates the performance of the algorithm in terms of its ability to classify any given new trajectory. The advantage of m-fold cross validation is that, by cycling through the $m$ parts, all the available training data can be used while retaining the ability to test on a disjoint set of test data. A total of 5000 compliant and 1000 violating trajectories are used in the m-fold approach with $m = 4$. First, each algorithm is run once on these data with the same ratio of training and testing data, producing a classifier with fixed parameters. This classifier is then tested using the m-fold cross-validation approach.

B. Shared Parameters

1) Minimum Time Threshold $TTI_{\min}$: For each trajectory, as shown in Fig. 2, the final output of the algorithms is given at time $t_{\text{warn}}$, which is computed as

$$t_{\text{warn}} = \min\left(TTI_{\min},\, t(d_{\min})\right). \tag{31}$$

In other words, $t_{\text{warn}}$ corresponds to the time when the estimated remaining time for the target vehicle to arrive at the intersection is $TTI_{\min}$ seconds, or when the distance to the intersection is equal to $d_{\min}$ meters, whichever happens first.

The choice of $TTI_{\min}$ is important. It represents the amount of time the host vehicle is given to react after being warned that a violating target vehicle is approaching its intersection. Choosing one single mean value for $TTI_{\min}$ provides little information about the performance of the warning algorithms for response times away from the mean. Instead, we base the choice of $TTI_{\min}$ on the cumulative human response time distribution presented in [44]. This distribution answers the following question: given a specific driver response time, what is the percentage of the population that is able to react to a potential collision? The larger $TTI_{\min}$, the larger the percentage of the population able to react in time to the warning. However, a larger $TTI_{\min}$ is expected to degrade the performance of the warning algorithms because the final classification would be given earlier and after fewer measurements. To address this problem, the different algorithms were developed and evaluated for three different values of $TTI_{\min}$ summarized in Table II. They are 1.0, 1.6, and 2.0 s, corresponding to 45%, 80%, and 90% of the population, respectively. Therefore, the engineer deciding which algorithm to implement has a clearer understanding of the tradeoffs for each choice. Note that the host vehicle is assumed to be at rest or moving with a negligible speed in this analysis. This is typically the case at $t_{\text{warn}}$, the time at which it is warned of the target vehicle's possible violation.

TABLE II
CUMULATIVE POPULATION PERCENTILE VERSUS DRIVER RESPONSE TIME [44]

2) Minimum Distance Threshold $d_{\min}$: The $d_{\min}$ distance plays the role of a safety net. In most intersection approaches, the $TTI_{\min}$ condition happens first. But for some cases where the target vehicle approaches the intersection with a low speed, the $TTI_{\min}$ condition is met too close to the intersection. The $d_{\min}$ condition ensures that such cases are captured, and a warning (if needed) is given with enough time for the driver to react. For $TTI_{\min}$ of 1.6 s, $d_{\min}$ is chosen to be 10 m. This is equivalent to situations where vehicles cross the $d_{\min}$ mark with speeds lower than 6.25 m/s or 22.5 km/h, consistent with the low-speed assumption. For $TTI_{\min}$ of 1.0 and 2.0 s, $d_{\min}$ is scaled to 6.25 and 12.5 m, respectively. These values are summarized in Table III. Note that in the case of a warning, the driver will have a period of time larger than $TTI_{\min}$ to react, ensuring that the percentage of drivers responding on time to the warning is consistent with the Table II numbers.

TABLE III
MINIMUM $TTI_{\min}$ AND MINIMUM DISTANCE $d_{\min}$ PAIRS

3) Maximum FP Rate: Warning algorithms must take into consideration driver tolerance levels, i.e., they should try to ensure that the rate of false alarms is below a certain “annoyance” level that is acceptable to most drivers. In this paper, the maximum false positive rate is chosen to be 5%, in accordance with automotive industry recommendations [16]. Therefore, the developed algorithms are designed and tuned under the constraint of keeping false positive rates below 5% while trying to maximize true positive rates.
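To make the interplay of the time and distance conditions of Section V-B concrete, the sketch below scans a trajectory and returns the first sample at which either condition is met. It assumes that TTI is estimated as range divided by speed and that samples arrive as (time, range, speed) tuples; both are assumptions, as are the function names. It also derives $d_{\min}$ by scaling with the 6.25 m/s speed quoted in the text, which reproduces the 6.25, 10, and 12.5 m values.

```python
LOW_SPEED_MPS = 6.25  # speed implied by d_min = 10 m at TTI_min = 1.6 s


def d_min_for(tti_min):
    """Scale d_min with TTI_min using the 6.25 m/s low-speed bound."""
    return LOW_SPEED_MPS * tti_min


def warning_time(samples, tti_min, d_min):
    """Return the time of the first sample meeting either condition.

    samples -- iterable of (time_s, range_m, speed_mps) tuples (assumed layout)
    Returns None if neither condition is ever met.
    """
    for t, rng, speed in samples:
        # Assumed TTI estimate: remaining range divided by current speed
        tti = rng / speed if speed > 0 else float("inf")
        if tti <= tti_min or rng <= d_min:
            return t
    return None
```

For a fast vehicle the time condition fires first; for a slow vehicle the estimated TTI stays large until very close to the intersection, so the distance condition acts as the safety net described above.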
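The ROC-based tuning described in Sections V-A and V-B3 amounts to sweeping the beta parameter, recording the false and true positive rates for each value, and retaining the value with the highest true positive rate among those that respect the 5% false positive limit. The following sketch illustrates this under stated assumptions: each trajectory has a scalar score where higher means more likely violating, labels are 1 for violators (positives) and 0 for compliant drivers, and the limit is applied as "at most 5%". All names are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos


def best_threshold(scores, labels, thresholds, max_fpr=0.05):
    """Pick the threshold with the highest TPR subject to FPR <= max_fpr.

    Returns (threshold, fpr, tpr), or None if no threshold is feasible.
    """
    best = None
    for th in thresholds:
        fpr, tpr = roc_point(scores, labels, th)
        if fpr <= max_fpr and (best is None or tpr > best[2]):
            best = (th, fpr, tpr)
    return best
```

Returning `None` when no threshold is feasible mirrors the situation in which an algorithm cannot reach the required false positive rate for any choice of its beta parameter.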

C. Traditional Algorithm Parameters

1) Static TTI: The static $TTI$ algorithm has two key parameters: 1) the safety threshold $TTI_{\text{req}}$ and 2) the deceleration threshold that indicates the onset of braking. The $TTI_{\text{req}}$ parameter is a natural choice for controlling how conservative the classifier should be and thus is used as the beta parameter for the ROC curve analysis. Values for $TTI_{\text{req}}$ were selected between 1.0 and 10.0 s. The onset of braking is taken to be the time at which the vehicle's acceleration drops below $-0.075g$ (i.e., its deceleration exceeds $0.075g$). Earlier work has shown that this is a realistic threshold for identifying brake activation [17]. In the event that a vehicle never crosses this threshold (i.e., does not brake), the classification is instead performed at $t_{\text{warn}}$.

2) Static RDP: The static $RDP$ algorithm is also very straightforward. The only adjustable parameter is the warning threshold $RDP_{\text{warn}}$. Therefore, this is taken as the beta for the ROC curve, with values ranging from $0.01g$ to $3.0g$.

3) SDR: For the SDR classifier, several regression curves must first be fit to a set of observations from compliant drivers. Each curve models some percentage of compliant driver behaviors, and the choice of these percentages determines the classification thresholds. Fig. 7 shows one such set of curves. The bottom curve corresponds to 0%, that is, none of the measurements from the compliant driver training data were below this curve. The top curve corresponds to the 80th percentile of compliant drivers. The other curves cover the range, with a higher density of curves at lower percentages to allow for more precise threshold selection. Each choice of curve for the classification threshold is treated as a beta parameter for the ROC analysis.

Fig. 7. Regression curves used as decision thresholds for the SDR algorithm.

D. SVM-BF Parameters

There are four key parameters for the SVM-BF classifier: 1) the $T_w$ window size; 2) the discount factor $C$; 3) the decision threshold $\tau_S$; and 4) the sliding window size $K$ (see Section III-B). The threshold variable is selected as the beta parameter as it was introduced specifically to tune the performance of the algorithm. Models with $T_w$ varying from 5 to 15 observations were considered, whereas $C$ varied from 0.5 to 1.0 and $K$ ranged from three to ten measurements. All combinations of these parameters were tested, and Fig. 8 shows the ten combinations that produced the highest rates of true positives while maintaining a false positive rate below 5% for one basic generalization test. The results of this test (see Section VI) were obtained using the best combination of parameters in Fig. 8: $T_w = 15$, $K = 7$, $C = 0.9$, and $\tau_S = 0.9$. The hyperparameters $a$ and $b$ in (9) are both set to 0.5, specifying no bias toward either behavior. These values could be changed to reflect a bias toward one driving behavior if the system is given prior knowledge of the target vehicle's driving history.

Fig. 8. Ten best parameter combinations for the SVM-BF classifier with corresponding true positive rates for a basic generalization test with $TTI_{\min} = 1.6$ s and $d_{\min} = 10$ m.

E. HMM Parameters

There are three key parameters for the HMM-based classifier: 1) the number of states in the HMM; 2) the $T_w$ window size; and 3) the decision threshold $\tau_H$. As in the previous methods, the threshold is selected as the beta parameter. The number of states determines how many different modes the HMMs can capture, and as a result, the range of behaviors that can be classified accurately. However, increasing the number of states also increases the complexity of the model and the risk of overfitting the training data. Models with between 6 and 15 states were considered, whereas $T_w$ was varied from 10 to 20 observations. All combinations of these parameters were tested, and Fig. 9 shows the ten combinations that produced the highest rates of true positives while maintaining a false positive rate below 5% for one basic generalization test. The results for this test (see Section VI) were obtained using the best combination of parameters in Fig. 9: $T_w = 15$, eight states, and $\tau_H = 54.4$. Recall that $\tau_H$ defines a threshold on the likelihood ratio (Section III-C4) and is distinct from $\tau_S$, which is a threshold on the probability of being classified as compliant (Section III-B5). Monte Carlo testing was used to learn multiple models for each set of parameters to reduce the effects of local maxima on the algorithm.
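The exhaustive parameter search described in Sections V-D and V-E can be sketched as follows. Here `evaluate` is a hypothetical callback standing in for training and testing a classifier at one parameter combination and returning its false and true positive rates; the training code used in this paper is not reproduced. The HMM ranges (6 to 15 states, $T_w$ from 10 to 20) come from the text, whereas the integer step size and the dictionary keys are assumptions.

```python
from itertools import product


def top_combinations(evaluate, grid, max_fpr=0.05, keep=10):
    """Rank parameter combinations by TPR among those with FPR below max_fpr.

    evaluate -- callable mapping a parameter dict to (fpr, tpr) (hypothetical)
    grid     -- dict of parameter name -> iterable of candidate values
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fpr, tpr = evaluate(params)
        if fpr < max_fpr:  # keep only combinations under the annoyance limit
            results.append((tpr, params))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:keep]


# Ranges from Section V-E; unit steps are an assumption
hmm_grid = {"n_states": range(6, 16), "t_w": range(10, 21)}
```

Passing the evaluation step as a callback keeps the search independent of the classifier, so the same routine could be reused with a grid over $T_w$, $C$, and $K$ for the SVM-BF case.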
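The braking-onset rule of Section V-C1, which fixes the time at which the static TTI classifier makes its decision, can be illustrated with a short sketch. It assumes that longitudinal acceleration is available in m/s² for each sample, with negative values denoting deceleration, and takes $g = 9.81$ m/s². The text does not say how braking that begins after $t_{\text{warn}}$ is handled; the sketch treats it the same as no braking, which is an assumption. The function name is hypothetical.

```python
G = 9.81                     # m/s^2, assumed value of g
BRAKE_THRESHOLD = -0.075 * G  # acceleration below this marks braking onset


def static_tti_decision_time(times, accels, t_warn):
    """Return the time at which the static TTI classification is made.

    times  -- sample times in seconds, increasing
    accels -- longitudinal accelerations in m/s^2 (negative = decelerating)
    t_warn -- fallback decision time when no braking onset is detected
    """
    for t, a in zip(times, accels):
        if t > t_warn:
            break  # assumption: braking after t_warn counts as no braking
        if a < BRAKE_THRESHOLD:
            return t
    return t_warn
```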

Fig. 9. Ten best parameter combinations for the HMM-based classifier with corresponding true positive rates for a basic generalization test with $TTI_{\min} = 1.6$ s and $d_{\min} = 10$ m.

VI. RESULTS

This section presents the results of the traditional and the developed SVM-BF and HMM-based algorithms described in Section III. The algorithms were implemented in MATLAB using the LIBSVM toolbox [45] for SVM-BF and the PMTK toolkit [46] for the HMM-based classifier. Although these classifiers rely on more complex techniques than the traditional methods, most of the computational complexity is in the offline training phase. For online classification of a new trajectory, the computation time for the testing phase must be small. For the results subsequently presented, the SVM-BF algorithm has an average runtime of 5 ms per trajectory evaluation, whereas the HMM-based classifier averages 2 ms per trajectory evaluation. These tests were run on a 2.5-GHz quad-core computer, but transitioning to an embedded implementation, rather than a MATLAB implementation, would greatly speed up the computation, allowing it to be run on less powerful hardware while retaining these small runtimes.

Fig. 10 shows the ROC curves for the five algorithms previously described. The algorithms were tested with both the basic generalization test and the m-fold cross-validation test [e.g., compare Fig. 10(a) and (b)]. Each test also considered three different decision thresholds: $TTI_{\min} = 1.0$, 1.6, and 2.0 s [e.g., compare Fig. 10(a) versus Fig. 10(c) versus Fig. 10(e)]. These values of $TTI_{\min}$ are also coupled with $d_{\min} = 6.25$, 10.0, and 12.5 m (refer to Section II). The inset in each plot shows the region of interest around a false positive rate of 5%, whereas the precise true positive rates at this point are detailed in Table IV.

Fig. 10. ROC curves for all five algorithms with insets showing area of interest around 5% false positives. (a) ROC curves for 10 000/1000 basic test at $TTI_{\min} = 1.0$ s. (b) ROC curves for 5000/1000 m-fold test at $TTI_{\min} = 1.0$ s. (c) ROC curves for 10 000/1000 basic test at $TTI_{\min} = 1.6$ s. (d) ROC curves for 5000/1000 m-fold test at $TTI_{\min} = 1.6$ s. (e) ROC curves for 10 000/1000 basic test at $TTI_{\min} = 2.0$ s. (f) ROC curves for 5000/1000 m-fold test at $TTI_{\min} = 2.0$ s.

As these results illustrate, both the SVM-BF classifier and the HMM-based classifier outperform the traditional methods by a significant margin in each set of tests. Although there is a difference between the true positive rates of the SVM-BF and HMM-based classifiers, this difference is consistent across tests, typically in the range of 5%–7%. In contrast, the performance difference between these new approaches and the leading traditional approach ($RDP$) increases with $TTI_{\min}$. For example, in the basic generalization tests, the SVM-BF algorithm has roughly 10% higher true positive rates than the static $RDP$ algorithm for a $TTI_{\min}$ of 1.0 s. However, as this decision time is increased to 1.6 and 2.0 s, the difference between SVM-BF and static $RDP$ grows to nearly 18% and 21%, respectively (see Table IV). The m-fold tests exhibit a similar pattern. This reflects the fact that making a decision earlier, i.e., with a larger $TTI_{\min}$, is a more challenging problem. The ability to perform well with larger $TTI_{\min}$ is of practical importance to provide an earlier warning to the driver.

As Fig. 10 illustrates, the SVM-BF algorithm produces the highest rate of true positives while keeping the false positive rate at 5%. This follows from the fact that the SVM algorithm uses convex optimization to perform accurate binary classification. With the radial basis kernel, it is able to capture the nuances characterizing each class of driver behavior. Note that the ROC curves for SVM-BF in Fig. 10 are densely computed for false positive rates less than 5% as this is the area of interest. The HMM-based classifier also yields a higher rate of true positives than the traditional methods because it is a rich model that couples observations into modes (states) that characterize driver behavior. The state transitions also capture the time dependencies that are inherent in the evolution of driver behavior while approaching an intersection. However, the HMM-based classifier does not perform as well as the SVM-BF classifier as it is trying to fit general models to two sets of behaviors that may include a few drastic outliers. In contrast, the SVM is only trying to find the separating boundary or hyperplane between these two models.

There are also several interesting trends in the traditional methods. The $RDP$-based classifier appears to be much more accurate than the other two methods across all the test combinations. Since $RDP$ can be thought of as an effort-based parameter, it provides richer information than the observations used in the other two traditional algorithms.

Although the $TTI$-based classifier may resemble how humans classify approaching vehicles, the results also show that it does not perform well. This is in part due to the fact that the decision is made at the onset of braking. In general, the onset of braking occurs well before $TTI_{\min}$, so the challenge of correctly classifying a vehicle is comparable to classifying a vehicle with a large $TTI_{\min}$. This is reflected in the nearly identical $TTI$ curves in Fig. 10.

The SDR classifier exhibits a unique behavior as $TTI_{\min}$ is varied. For low $TTI_{\min}$, it outperforms the $TTI$ algorithm. However, as $TTI_{\min}$ increases, its performance drops severely, and it is unable to reach a false positive rate of 5% for any choice of threshold curve [e.g., compare Fig. 10(a) and (e)]. Thus, the SDR algorithm does not yield a true positive rate for $TTI_{\min}$ of 1.6 and 2.0 s, as indicated in Table IV.

In much of the existing literature, classification is performed when the driver arrives at the intersection, i.e., with $TTI_{\min} = 0.0$ s. This is the case in [16], which shows very accurate classification using the SDR algorithm. Since this is a less challenging scenario, most other algorithms, including the SVM-BF and HMM-based classifiers, would also see a substantial performance improvement. However, since the objective is to provide an early warning, $TTI_{\min}$ must be selected to be larger than the typical driver response time, as described in Section V-B.

The general trends observed with these traditional methods are also consistent with the behaviors observed in previous work [16]. This ensures that the new algorithms developed in this work are evaluated accurately against the baseline performance set by these traditional methods. The basic generalization and m-fold cross-validation tests show the consistent improvement the SVM-BF and HMM-based algorithms provide in classifying a new set of trajectories. This validates the two approaches and demonstrates the flexibility embedded

in their design. Furthermore, the ROC analysis and testing of multiple $TTI_{\min}$ values give the designer or engineer the flexibility to select the parameters that best match the requirements for any real-world implementation of these algorithms.

TABLE IV
TRUE POSITIVE RATES FOR EACH ALGORITHM FOR 5% FALSE POSITIVES

VII. CONCLUSION

This paper has introduced two new approaches for classifying driver behaviors at road intersections. The first approach, which is denoted as SVM-BF, combines an SVM classifier with a BF to discriminate between compliant drivers and violators based on vehicle speed, acceleration, and distance to intersection. The second approach, which is an HMM-based classifier, uses the EM algorithm to develop two distinct HMMs for compliant and violating behaviors. To optimize safety while respecting driver acceptance levels, the algorithms were designed to maximize true positive rates while keeping the false alarm rates below the 5% threshold. The two algorithms were successfully validated on more than 10 000 intersection approaches collected in Christiansburg, VA, as part of the U.S. DOT CICAS-V initiative. The performances of the algorithms were also compared with three popular traditional approaches consisting of TTI-based, RDP-based, and SDR-based algorithms. The results of several generalization tests showed consistent and significant improvements with the developed algorithms, ranging from a minimum of 10% increase in true positive rates to more than 20% increase when issuing a warning 1 and 2 s in advance, respectively.

ACKNOWLEDGMENT

The authors would like to thank Dr. T. Pilutti, Dr. W. Najm, Prof. J. Leonard, and Dr. L. Fletcher for their valuable input.

REFERENCES

[1] W. D. Jones, “Keeping cars from crashing,” IEEE Spectrum, vol. 38, no. 9, pp. 40–45, Sep. 2001. [Online]. Available: http://dx.doi.org/10.1109/6.946636
[2] Fatality analysis reporting system encyclopedia, National Highway Traffic Safety Administration, Washington, DC. [Online]. Available: http://www-fars.nhtsa.dot.gov/Crashes/CrashesLocation.aspx
[3] B. Bougler, D. Cody, and C. Nowakowski, “California intersection decision support: A driver-centered approach to left-turn collision avoidance system design,” Univ. Calif. Berkeley, Berkeley, CA, Tech. Rep., 2008.
[4] G. S. Aoude, B. D. Luders, D. S. Levine, K. K. H. Lee, and J. P. How, “Threat assessment design for driver assistance system at intersections,” in Proc. IEEE Conf. Intell. Transp. Syst., Madeira, Portugal, Sep. 2010, pp. 1855–1862.
[5] C. Y. Chan and B. Bougler, “Evaluation of cooperative roadside and vehicle-based data collection for assessing intersection conflicts,” in Proc. IEEE Intell. Veh. Symp., 2005, pp. 165–170.
[6] M. Maile, F. A. Zaid, L. Caminiti, J. Lundberg, and P. Mudalige, “Cooperative intersection collision avoidance system limited to stop sign and traffic signal violations,” Nat. Highway Traffic Safety Admin., Washington, DC, Midterm Phase 1 Rep., 2008.
[7] K. Fuerstenberg, “New European approach for intersection safety - The EC-project INTERSAFE,” in Proc. IEEE Intell. Transp. Syst., 2005, pp. 432–436.
[8] D. D. Salvucci, “Inferring driver intent: A case study in lane-change detection,” in Proc. 48th Annu. Meeting Human Factors Ergonom. Soc., 2004, pp. 2228–2231.
[9] N. Oliver and A. P. Pentland, “Graphical models for driver behavior recognition in a smartcar,” in Proc. IEEE Intell. Veh. Symp., 2000, pp. 7–12.
[10] T. Gates, D. Noyce, L. Laracuente, and E. Nordheim, “Analysis of driver behavior in dilemma zones at signalized intersections,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2030, no. 1, pp. 29–39, 2007.
[11] H. Rakha, I. El-Shawarby, and J. R. Sett, “Characterizing driver behavior on signalized intersection approaches at the onset of a yellow-phase trigger,” IEEE Trans. Intell. Transp. Syst., vol. 8, no. 4, pp. 630–640, Dec. 2007.
[12] L. Zhang, K. Zhou, W. Zhang, and J. A. Misener, “Prediction of red light running based on statistics of discrete point sensors,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2128, pp. 132–142, 2009.
[13] J. Bonneson and H. Son, “Prediction of expected red-light-running frequency at urban intersections,” Transp. Res. Rec. J. Transp. Res. Board, vol. 1830, pp. 38–47, 2003.
[14] N. Elmitiny, X. Yan, E. Radwan, C. Russo, and D. Nashar, “Classification analysis of driver’s stop/go decision and red-light running violation,” Accident Anal. Prev., vol. 42, no. 1, pp. 101–111, Jan. 2010.
[15] V. Neale, M. Perez, Z. Doerzaph, S. Lee, S. Stone, and T. Dingus, Intersection Decision Support: Evaluation of a Violation Warning System to Mitigate Straight Crossing Path Crashes (report no. vtrc 06-cr10). Charlottesville, VA: Virginia Trans. Res. Council, 2006.
[16] Z. Doerzaph, V. Neale, and R. Kiefer, “Cooperative intersection collision avoidance for violations: Threat assessment algorithm development and evaluation method,” presented at the Transportation Research Board 89th Annu. Meeting, Washington, DC, 2010, Paper 10-2748.
[17] Z. Doerzaph, “Development of a threat assessment algorithm for intersection collision avoidance systems,” Ph.D. dissertation, Virginia Polytechnic Inst. State Univ., Blacksburg, VA, 2007.
[18] D. McNicol, A Primer of Signal Detection Theory. Hillsdale, NJ: Lawrence Erlbaum, 2004.
[19] Car-2-Car Communication Consortium. [Online]. Available: http://www.car-to-car.org
[20] Vehicle infrastructure integration consortium (VIIC). [Online]. Available: http://www.vehicle-infrastructure.org
[21] J. Leonard, J. P. How, S. Teller, M. Berger, S. Campbell, G. Fiore, L. Fletcher, E. Frazzoli, A. Huang, S. Karaman, O. Koch, Y. Kuwata, D. Moore, E. Olson, S. Peters, J. Teo, R. Truax, M. Walter, D. Barrett, A. Epstein, K. Maheloni, K. Moyer, T. Jones, R. Buckley, M. Antone, R. Galejs, S. Krishnamurthy, and J. Williams, “A perception-driven autonomous urban vehicle,” J. Field Robot., vol. 25, no. 10, pp. 727–774, Oct. 2008.
[22] L. Malta, C. Miyajima, and K. Takeda, “A study of driver behavior under potential threats in vehicle traffic,” IEEE Trans. Intell. Transp. Syst., vol. 10, no. 2, pp. 201–210, Jun. 2009.
[23] S. Thrun, What We’re Driving at Google’s Official Press Release Announcing the Success of Their Driverless Cars, Oct. 2010. [Online]. Available: http://googleblog.blogspot.com/2010/10/what-were-driving-at.html

[24] R. J. Kiefer, J. Salinger, and J. J. Ference, “Status of NHTSA’s rear-end crash prevention research program,” presented at the 19th Int. Tech. Conf. Enhanced Safety Vehicles, Washington, DC, 2005, Paper 05-0282.
[25] H. M. Mandalia and D. D. Salvucci, “Using support vector machines for lane-change detection,” in Proc. Human Factors Ergonom. Soc. Annu. Meeting, 2005, vol. 49, pp. 1965–1969.
[26] D. Wipf and B. Rao, “Driver intent inference annual report,” Univ. California, San Diego, CA, Tech. Rep., 2003.
[27] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley-Interscience, 2001.
[28] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[29] L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[30] A. Van der Horst, “A time-based analysis of road user behavior in normal and critical encounters,” Ph.D. dissertation, Delft Univ. Technol., Delft, The Netherlands, 1990.
[31] H. Berndt, S. Wender, and K. Dietmayer, “Driver braking behavior during intersection approaches and implications for warning strategies for driver assistant systems,” in Proc. Intell. Veh. Symp., Istanbul, Turkey, 2007, pp. 245–251.
[32] G. S. Aoude and J. P. How, Using support vector machines and Bayesian filtering for classifying agent intentions at road intersections, MIT, Cambridge, MA, Tech. Rep. ACL09-02. [Online]. Available: http://hdl.handle.net/1721.1/46720
[33] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How, “Behavior classification algorithms at intersections and validation using naturalistic data,” in Proc. IEEE Intell. Veh. Symp., Baden-Baden, Germany, Jun. 2011, pp. 601–606.
[34] SVM Application List, 2006. [Online]. Available: http://www.clopinet.com/isabelle/Projects/SVM/applist.html
[35] Y. Liang, M. L. Reyes, and J. D. Lee, “Real-time detection of driver cognitive distraction using support vector machines,” IEEE Trans. Intell. Transp. Syst., vol. 8, no. 2, pp. 340–350, Jun. 2007.
[36] T. Ersal, H. Fuller, O. Tsimhoni, J. Stein, and H. Fathy, “Model-based analysis and classification of driver distraction under secondary tasks,” IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 692–701, Sep. 2010.
[37] Y. Zhang, W. Lin, and Y.-K. Chin, “A pattern-recognition approach for

Georges S. Aoude (M’11) received the B.Eng. degree in computer engineering from McGill University, Montreal, QC, Canada, in 2005 and the S.M. and Ph.D. degrees in aeronautics and astronautics from the Massachusetts Institute of Technology (MIT), Cambridge, in 2007 and 2011, respectively.
From 2005 to 2008, he was part of the SPHERES team at MIT, where he designed spacecraft reconfiguration maneuvers that were performed onboard the ISS. He is with the Department of Aeronautics and Astronautics, MIT. He received the NSERC and FQRNT Ph.D. (2007–2010) and Masters (2005–2007) scholarships. He also received the 2003 Kenneth Young Lochhead and the 2002 CMC Electronics scholarships. His research interests include intention prediction, threat assessment, and real-time path planning under uncertainty.
Dr. Aoude is a member of the American Institute of Aeronautics and Astronautics.

Vishnu R. Desaraju (M’11) received the B.S.E. degree in electrical engineering with Engineering Global Leadership honors from the University of Michigan, Ann Arbor, in 2008 and the S.M. degree in aeronautics and astronautics from the Massachusetts Institute of Technology (MIT), Cambridge, in 2010.
From 2009 to 2010, he worked on the Agile Robotics for Logistics project with MIT to develop semi-autonomous field robotics for the U.S. Army. He is with the Department of Aeronautics and Astronautics, MIT. His current research interests include real-time motion planning, multiagent control, adaptive control, and collision avoidance strategies.
Mr. Desaraju is a member of Eta Kappa Nu.

Lauren H. Stephens (S’11) is currently an undergraduate student majoring in computer science and electrical engineering with the Massachusetts Insti-
driving skill characterization,” IEEE Trans. Intell. Transp. Syst., vol. 11, tute of Technology (MIT), Cambridge.
no. 4, pp. 905–916, Dec. 2010. She worked in the undergraduate research op-
[38] C. Cortes and V. Vapnik, “Support vector networks,” Mach. Learn., portunities program (UROP) at the MIT Aerospace
vol. 20, no. 3, pp. 273–297, Sep. 1995. Controls Labs. She contributed to the filtering,
[39] C. M. Bishop, Pattern Recognition and Machine Learning (Information processing, and extraction of vehicle trajectories
Science and Statistics). New York: Springer-Verlag, Aug. 2006. from the CICAS-V real-traffic database.
[40] D. Cutting, J. Kupiec, J. Pedersen, and P. Sibun, “A practical part-of- Ms. Stephens was one of the recipients of the
speech tagger,” in Proc. 3rd Conf. Appl. Nat. Lang. Process., 1992, Anita Borg Google Scholarship in 2010.
pp. 133–140.
[41] J. Bilmes, “A gentle tutorial on the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,”
Int. Comput. Sci. Inst., Berkeley, CA, Tech. Rep. ICSI-TR-97-021, 1997. Jonathan P. How (SM’05) received the B.A.Sc.
[42] Z. R. Doerzaph and V. Neale, “Data acquisition method for developing degree from the University of Toronto, Toronto, ON,
crash avoidance algorithms through innovative roadside data collection,” Canada, in 1987 and the S.M. and Ph.D. degrees in
presented at the Transportation Research Board 89th Annu. Meeting, aeronautics and astronautics from the Massachusetts
Washington, DC, 2010, Paper 10-2762. Institute of Technology (MIT), Cambridge, in 1990
[43] M. Maile and L. Delgrossi, “Cooperative intersection collision avoidance and 1993, respectively.
system for violations (CICAS-V) for avoidance of violation-based inter- He was an Assistant Professor with the Depart-
section crashes,” presented at the Enhanced Safety Vehicles, Stuttgart, ment of Aeronautics and Astronautics, Stanford Uni-
Germany, 2009, Paper 09-0118. versity, Stanford, CA. He is a Richard C. Maclaurin
[44] S. McLaughlin, J. Hankey, and T. Dingus, “A method for evaluating col- Professor with the Department of Aeronautics and
lision avoidance systems using naturalistic driving data,” Accident Anal. Astronautics, MIT. He studied for two years at MIT
Prev., vol. 40, no. 1, pp. 8–16, Jan. 2008. as a postdoctoral associate for the Middeck Active Control Experiment that flew
[45] C. Chang and C. Lin, “LIBSVM: A library for support vector machines,” onboard the Space Shuttle Endeavour, in March 1995. His current research
ACM Trans. Intell. Syst. Technol. (TIST), vol. 2, no. 3, p. 27, 2011. interests include robust coordination and control of autonomous vehicles in
[46] M. Dunham and K. Murphy, PMTK3—Probabilistic modeling toolkit dynamic uncertain environments.
for matlab/octave, ver. 3. [Online]. Available: http://code.google.com/p/ Dr. How received of the 2002 Institute of Navigation Burka Award and is an
pmtk3/ Associate Fellow of American Institute of Aeronautics and Astronautics.
