
Accident Analysis and Prevention 92 (2016) 256–264

Contents lists available at ScienceDirect

Accident Analysis and Prevention


journal homepage: www.elsevier.com/locate/aap

Macroscopic hotspots identification: A Bayesian spatio-temporal interaction approach
Ni Dong a,b, Helai Huang b,∗, Jaeyoung Lee c, Mingyun Gao d, Mohamed Abdel-Aty c
a School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, 610031, China
b Urban Transport Research Center, School of Traffic and Transportation Engineering, Central South University, Changsha, 410075, China
c Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL, 32816-2450, United States
d School of Science, Wuhan University of Technology, Wuhan, Hubei, 430063, China

Article info

Article history:
Received 30 August 2015
Received in revised form 5 March 2016
Accepted 3 April 2016
Available online 23 April 2016

Keywords:
Bayesian spatio-temporal interaction model
Hotspot identification
Ranking criteria

Abstract

This study proposes a Bayesian spatio-temporal interaction approach for hotspot identification by applying the full Bayesian (FB) technique in the context of macroscopic safety analysis. Compared with the emerging Bayesian spatial and temporal approach, the Bayesian spatio-temporal interaction model contributes to a detailed understanding of differential trends through analyzing and mapping probabilities of area-specific crash trends as differing from the mean trend and highlights specific locations where crash occurrence is deteriorating or improving over time. With traffic analysis zones (TAZs) crash data collected in Florida, an empirical analysis was conducted to evaluate the following three approaches for hotspot identification: FB ranking using a Poisson-lognormal (PLN) model, FB ranking using a Bayesian spatial and temporal (B-ST) model and FB ranking using a Bayesian spatio-temporal interaction (B-ST-I) model. The results show that (a) the models accounting for space-time effects perform better in safety ranking than does the PLN model, and (b) the FB approach using the B-ST-I model significantly outperforms the B-ST approach in correctly identifying hotspots by explicitly accounting for the space-time variation in addition to the stable spatial/temporal patterns of crash occurrence. In practice, the B-ST-I approach plays key roles in addressing two issues: (a) how the identified hotspots have evolved over time and (b) the identification of areas that, whilst not yet hotspots, show a tendency to become hotspots. Finally, it can provide guidance to policy decision makers to efficiently improve zonal-level safety.
© 2016 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail addresses: dongni@home.swjtu.edu.cn (N. Dong), huanghelai@csu.edu.cn (H. Huang), jaeyoung@knight.ucf.edu (J. Lee), wh14 gao@126.com (M. Gao), mabdel@mail.ucf.edu (M. Abdel-Aty).

http://dx.doi.org/10.1016/j.aap.2016.04.001
0001-4575/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Identifying a crash hotspot is a vital task for a safety improvement program, especially when highway agencies can afford to examine and improve only a limited number of road sites. Numerous studies have been conducted on investigating the suitability of various statistical methodologies for the development of effective hotspot identification and ranking criteria. A naïve statistical approach based on the estimation of safety by using historical traffic crash records, such as the crash frequency method (Deacon et al., 1975), the crash rate method, the rate quality control method (Stokes and Mutabazi, 1996), the crash severity methods and the safety index method (Tamburri et al., 1970), is often found to have serious limitations. This includes the fact that the approach is quite sensitive to random variations due to the random fluctuation and rarity of crash events, the absence of consideration of a phenomenon known as regression-to-the-mean (Hauer, 1997), the lack of examining crash dispersion (Elvik, 2007), the false assumption of a linear relation between crash count and traffic volume, etc.

Model-based approaches have offered several advantages to these estimation problems due to an increased precision by “borrowing strengths” across similar sites based on available auxiliary variables, especially in the case of small sample sizes (Miaou and Song, 2005; Cheng and Washington, 2005, 2008; Huang et al., 2009; Lee et al., 2015). Recently, the approaches based on empirical Bayesian (EB) adjusted safety performance measures have become popular. Such approaches make joint use of two clues to the safety performance of a road entity by accounting for both crash history and the predicted crash frequency of similar sites (Carlin and Louis, 1996). Thus, it is clear that the EB method is expected to be more reliable as compared to the traditional method as shown by several researchers (Hauer and Persaud, 1984; Hauer, 1996, 1997; Hauer
et al., 2002; Elvik, 2007). Nevertheless, the EB method may be criticized for implicitly requiring a large sample size of data to develop the reliability of safety performance functions and for ignoring the “uncertainty” of associations of covariates and safety (Lan and Persaud, 2011; El-Basyouny and Sayed, 2013).

Within the model-based approaches, the full Bayesian (FB) method has been explored in terms of whether it more reliably identifies hotspots as compared to EB (Huang et al., 2009; Jiang et al., 2014). Specifically, the advantage of the FB approach is its explicit use of probability for quantifying the uncertainty of the associations of crash risk and various risk factors, which could be accommodated to estimate a posterior distribution representing the final safety estimate for a specific site on which various ranking criteria could be based (Carlin and Louis, 1996).

A further merit of FB is that it allows probabilistic and functional structures of complex space-time heterogeneities of crash occurrence to be more realistically considered and tested by use of hierarchical model specification in statistical analysis. Several recent studies have indicated that crash rates may be better fitted by explicitly accounting for the spatial and temporal effects by using a hierarchical modeling technique (Miaou and Song, 2005; Huang and Abdel-Aty, 2010; Dong et al., 2014, 2015; Xu and Huang, 2015; Wang and Huang, 2016). As demonstrated by Levine et al. (1995), spatial variations in crashes exist among observations over space as crash data are typically collected with reference to a location dimension. In a similar vein, there can be a correlation over time because many of the observed effects associated with a specific site may remain the same over time or fluctuate periodically (Congdon, 2003; Lord and Mannering, 2010). Hence, the consideration of both space and time effects is of fundamental importance in the study of hotspot identification as evidenced both empirically and theoretically (Huang et al., 2009; Jiang et al., 2014).

Unfortunately, the aforementioned FB hierarchical modeling approaches for hotspot identification deal with the spatial and temporal effects in crash data as distinct entities, thus ignoring the necessary interaction of space and time in crash occurrence. Most previous studies assumed the impact of temporal effects on crash occurrence was constant over space; that is, a time effect, which represents the variation of crash rates in time, might be stationary from area to area. But this latent assumption has been rejected in many recent studies (Law et al., 2014; Li et al., 2014) in which a spatio-temporal interaction effect arises when events located relatively close in geographic space occur at about the same time. In other words, it is possible that the variation of the rate in time has larger impacts in certain spatial units and smaller impacts in others. Consequently, the possibility of accounting for this spatio-temporal interaction by allowing the time-trend to vary from area to area has considerable potential (Richardson et al., 2006).

Variations of the time-trend across areas could be referred to as space-time variation. Bernardinelli et al. (1995) proposed a Bayesian hierarchical spatio-temporal interaction model to address this issue by adapting ideas from time series and spatial modeling to a joint analysis of space-time variation. This leads to a more precise estimation of the variability in parameters by accounting for random variability, especially in the case of small sample sizes. Particularly in the context of hotspot identification, Bayesian spatio-temporal interaction analysis contributes to a detailed understanding of differential trends through analyzing and mapping probabilities of site-specific crash trends differing from the mean trend. It could be adopted to highlight specific locations where crash occurrence is deteriorating or improving over time. Rather than simply identifying hotspots that occurred in the past at one period, the Bayesian spatio-temporal interaction approach provides insight into the processes influencing changing crash rates over time and enables engineering evaluation and safety improvement to predict where crashes might be increasing in the future.

Thus, from a traffic planning and operational point of view, there is a need to explore the use of the Bayesian spatio-temporal interaction model in ranking and hotspot identification in the context of macroscopic safety analysis. This can provide guidance to policy decision makers to efficiently improve zonal-level safety.

This study aims at employing the Bayesian spatio-temporal interaction approach to analyze area-specific crash trends and to identify crash hotspots in regional safety analysis. We propose a Bayesian spatio-temporal interaction model in which both an area-specific intercept and trends over time are modeled as random effects and interaction between them is allowed for. Comparison is conducted with the Bayesian spatial and temporal model with respect to safety ranking and hotspot screening to provide an empirical evaluation of alternative hotspot identification approaches.

2. Methodology

This section presents the alternative FB hierarchical approaches (i.e., the Poisson-lognormal model, the Bayesian spatial and temporal model, the Bayesian spatio-temporal interaction model), including model specification, comparison and assessment and a decision parameter in ranking and identifying hotspots.

2.1. Full Bayesian hierarchical modeling in hotspot identification

The essential characteristic of the FB approach in hotspot identification is the establishment of a probabilistic functional form of crash frequency and area-specific diverse risk factors (i.e., a safety performance function and a crash prediction model). The Poisson model has been applied for fitting the crash rate. However, the underlying assumption of the Poisson distribution of variance equal to mean is often violated in the crash count data. To account for this issue of over-dispersion, the over-dispersed Poisson model has generally been used as follows:

$$Y_{it} \mid \theta_{it} \sim \mathrm{Poisson}(\theta_{it}) = \mathrm{Poisson}(\mu_{it} e_{it})$$

$$\log(\mu_{it}) = \alpha + X_{it}\beta + v$$

where $Y_{it}$ denotes the crash count at the $i$-th area ($i = 1, \ldots, N$) in the $t$-th time period ($t = 1, \ldots, T$), Poisson distributed with the mean crash frequency $\theta_{it}$. Given the exposure component $e_{it}$ and the observed risk factors $X_{it}$ at a specific area, the mean crash rate $\mu_{it}$ varies by a stochastic component $v$.

Conveniently, $v$ could be assumed independent of $X_{it}$ and also independent of each other for different observations, denoted as $\varepsilon_{it}$. Standard Poisson-lognormal (PLN) models are obtained by giving $\exp(\varepsilon_{it})$ a lognormal distribution, that is, by taking $\varepsilon_{it}$ to be normal.

Model 1. Poisson-lognormal model:

$$\log(\mu_{it}) = \alpha + X_{it}\beta + \varepsilon_{it}$$

$$\varepsilon_{it} \sim \mathrm{normal}(0, \sigma_{\varepsilon}^{2})$$

Although the PLN model presented above is capable of capturing unstructured over-dispersion, it largely ignores any structured heterogeneities due to the spatial and temporal effects of crash data (Miaou and Song, 2005). To explicitly model the structured heterogeneities introduced in the data collection and clustering process, a hierarchical modeling technique has been found to be a better alternative in several traffic safety studies (Huang and Abdel-Aty, 2010; Xu et al., 2014; Zeng and Huang, 2014). For this purpose, a Bayesian spatial and temporal (B-ST) model could be proposed by replacing the cross-observation component $\varepsilon_{it}$ with the spatial and temporal random effects component $\vartheta_{it}$ in the link function. Possible spatial and temporal dependence can be reflected in the
model by a serial variation $\vartheta_{it}$. There are various options to specify the distribution of $\vartheta_{it}$ (Congdon, 2003). A common choice is to assume a lag-1 dependence in the errors based on the stationarity assumption, with $\gamma$ as the autocorrelation coefficient, which is,

Model 2. The Bayesian spatial and temporal model:

$$\log(\mu_{it}) = \alpha + X_{it}\beta + \vartheta_{it}$$

$$\vartheta_{i,1} \sim \mathrm{normal}\left(0, \frac{\sigma_{\vartheta}^{2}}{1 - \gamma^{2}}\right)$$

$$\vartheta_{i,t} \sim \mathrm{normal}\left(\gamma\,\vartheta_{i,t-1}, \sigma_{\vartheta}^{2}\right) \quad \text{for } t > 1$$

Despite the spatial and temporal effects being incorporated into the modeling framework through an error term in the above Bayesian spatial and temporal model, it ignores the spatio-temporal interaction of traffic crashes among zones in regional safety analysis. Spatio-temporal interactive analyses have additional benefits over a purely spatial and temporal analysis as they allow us to simultaneously study crash trends over time and to highlight unusual patterns among zones. The crash trends help in the interpretation of spatial variations as they point towards risk factors and environmental or economic effects that are stable over time. Alternatively, spatial-temporal interaction can be represented by incorporating an error term in the Bayesian model framework, leading to the B-ST-I model,

Model 3. Bayesian spatio-temporal interaction model:

$$\log(\mu_{it}) = \alpha + X_{it}\beta + \varphi_i + \psi_i + (\delta_0 + \delta_i)\,t$$

where $\varphi_i$ is a random effect to account for unstructured over-dispersion errors, which is specified via an ordinary exchangeable normal prior:

$$\varphi_i \sim \mathrm{normal}(0, \sigma_{\varphi}^{2})$$

where $\tau_{\varphi} = 1/\sigma_{\varphi}^{2}$ is the precision (reciprocal of the variance) of $\varphi_i$.

A pair of areas is considered neighboring if they are adjacent. $\psi_i$ is the spatial correlation term reflecting a shared border, which is specified with a CAR prior as suggested by Besag (1974):

$$\psi_i \mid \psi_j, j \neq i, \sigma_{\psi}^{2} \sim \mathrm{normal}(\bar{\psi}_i, \sigma_i^{2})$$

where

$$\bar{\psi}_i = \frac{1}{\sum_{j \neq i} \omega_{ij}} \sum_{j \neq i} \psi_j\,\omega_{ij}$$

and

$$\sigma_i^{2} = \frac{\sigma_{\psi}^{2}}{\sum_{j \neq i} \omega_{ij}}$$

in which $\omega_{ij}$ is an entry on the proximity matrix and generally reflects the spatial association of two analysis zones. Clearly, the values of $\tau_{\varphi}$ and $\tau_{\psi} = 1/\sigma_{\psi}^{2}$ control the amount of extra-Poisson variability allocated to area-wide heterogeneity and clustering effects among adjacent zones respectively. The interaction term $\delta_i$ is given a CAR prior of the same form:

$$\delta_i \mid \delta_j, j \neq i, \sigma_{\delta}^{2} \sim \mathrm{normal}(\bar{\delta}_i, \sigma_i^{2})$$

where

$$\bar{\delta}_i = \frac{1}{\sum_{j \neq i} \omega_{ij}} \sum_{j \neq i} \delta_j\,\omega_{ij}$$

and

$$\sigma_i^{2} = \frac{\sigma_{\delta}^{2}}{\sum_{j \neq i} \omega_{ij}}$$

The spatio-temporal interaction term $\delta_i$ represents the difference between the area-specific log-rate trend and the overall mean time-trend $\delta_0$ and, as such, can be considered the differential trend of area $i$. As a consequence, a value of $\delta_i < 0$ implies that the area-specific trend is less steep than the mean trend whilst a value of $\delta_i > 0$ implies that the area-specific trend is steeper than the mean trend. Thus, according to Model 3, each area is allowed to have its own risk profile over time. Its intercept is given by the sum $(\alpha + \varphi_i + \psi_i)$, and its trend is given by the sum $(\delta_0 + \delta_i)$.

As shown above, different models could be specified to explicitly accommodate various structured heterogeneities according to specific crash data structures. In this study, the impacts of different risk models are compared with each other with respect to safety ranking and hotspot screening.

2.2. Model comparison and assessment

To evaluate the overall model fit and predictive performance, mean absolute deviance (MAD), mean squared prediction error (MSPE) and R-square are used. The measures of effectiveness (MOEs) can be described as:

$$\mathrm{MAD} = \frac{1}{n} \sum_{\forall i,t} \left| Y_{it}^{pred} - Y_{it}^{obs} \right|$$

$$\mathrm{MSPE} = \frac{1}{n} \sum_{\forall i,t} \left( Y_{it}^{pred} - Y_{it}^{obs} \right)^{2}$$

where $Y_{it}^{obs}$ is the observed crash number at area $i$ in time period $t$ while $Y_{it}^{pred}$ denotes the model prediction of the expected crash number for a specific area.

R-square is known as the coefficient of determination, indicating how well a particular combination of covariates explains the crash frequency. In Bayesian inference using MCMC algorithms, the R-square values could be directly expressed as:

$$R^{2} = 1 - \frac{\sum_{\forall i,t} \left( Y_{it}^{obs} - Y_{it}^{pred} \right)^{2}}{\sum_{\forall i,t} \left( Y_{it}^{obs} - \bar{Y} \right)^{2}}$$

where $\bar{Y}$ is the global mean of crash frequencies. Clearly, higher R-square values are preferred.

2.3. Hotspot identification performance evaluation

The decision parameter provides a link between the model, and thus the data, and the decision to be made. Therefore, the choice of decision parameters needs to take into consideration the context under which the rank is to be used, especially the range of treatments to be performed. In this study, two decision parameters are illustrated:

$$\eta_{i,1} = \sum_{t=1}^{T} \hat{\theta}_{it}$$

$$\eta_{i,2} = \sum_{t=1}^{T} \left( \hat{\theta}_{it} - \hat{\lambda}_{it}^{normal} \right), \qquad \hat{\lambda}_{it}^{normal} = e_{it} \times \exp(X_{it}\hat{\beta})$$

While the first decision parameter ($\eta_{i,1}$) is usually some function of $\theta_{it}$, which can include an exposure component, a covariate, space, time and exchangeable effects, the second decision parameter ($\eta_{i,2}$) is useful to identify areas that exhibit abnormally high
observed random effects (i.e., a higher potential for safety improvement as compared to similar areas).

Given a decision parameter, two statistical ranking and selection standards have been used: an absolute standard and a relative standard. In this study, we only focus on the relative standard ($\eta_{i,2}$), i.e., ranking and selecting among a predetermined group of areas based on their relative risk levels, thus, ranking by the posterior mean (PM) of the decision parameter. Symbolically, this is expressed as $E_{post}[\eta_{i,2} \mid Y]$, i.e., the expected value of $\eta_{i,2}$ taken over the posterior distribution of $\eta_{i,2}$ given all data $Y$, i.e., $p(\eta_{i,2} \mid Y)$.

The performance of these models in identifying hotspots was investigated by conducting a series of tests: the site consistency test (SCT), the total performance difference test (TPDT), the method consistency test (MCT) and the total rank difference test (TRDT) (Cheng and Washington, 2008). Recently, Montella (2010) developed a total score test (TST), which is a weighted combination of the SCT, TPDT, MCT and TRDT test methods. Five diagnostic criteria have been established in previous studies (Jiang et al., 2014): SCT_j, TPDT_j, MCT_j, TRDT_j and TST_j.

The SCT method is used to measure the ability of a method to consistently identify an area as a hotspot over subsequent observation periods. This test rests on the premise that a true hotspot in period 1 should also have poor safety performance in period 2 given that the crash determinants have not significantly changed. The test statistic SCT_j is given as:

$$\mathrm{SCT}_j = \sum_{k=1}^{n} \eta_{k,2,\mathrm{method}=j,\,i+1}$$

where $\eta_{k,2,\mathrm{method}=j,\,i+1}$ represents the potential for safety improvement values as compared to similar areas in period $i+1$ for an area that ranked $k$ in time period $i$ as identified by hotspot identification method $j$.

TPDT_j is a total safety performance measure difference test, which assumes that the hotspots identified by method $j$ with all years of crash data are true hazard areas. For the top $k$ true hotspots, the difference of potential safety improvement estimated in time periods $i$ and $i+1$ was computed. The score of the TPDT test is the sum of the differences for the top $k$ true hotspots, which is shown as:

$$\mathrm{TPDT}_j = \sum_{k=1}^{n} \left( \eta_{k,2,\mathrm{method}=j,\,i+1} - \eta_{k,2,\mathrm{method}=j,\,i} \right)$$

where $\eta_{k,2,\mathrm{method}=j,\,i}$ represents the potential for safety improvement values as compared to similar areas in period $i$ for an area that ranked $k$ in time period $i$ as identified by hotspot identification method $j$.

MCT evaluates a method’s performance by measuring the number of the same hotspots identified in both time period $i$ and subsequent time period $i+1$. The greater the MCT score, the more reliable and consistent the method is. The test statistic MCT_j can be expressed as:

$$\mathrm{MCT}_j = \{k_1, k_2, \ldots, k_n\}_i \cap \{k_1, k_2, \ldots, k_n\}_{i+1}$$

The TRDT test method measures the difference of rankings for hotspot identification in two successive time periods. The smaller the total rank difference score, the more consistent the hotspot identification method. The test statistic is given as:

$$\mathrm{TRDT}_j = \sum_{k=1}^{n} \left| R(k_{j,i}) - R(k_{j,i+1}) \right|$$

where $R(k_{j,i})$ is the rank of area $k$ in time period $i$ identified by hotspot identification method $j$.

TST combines the site consistency test, the total performance difference test, the method consistency test and the total rank difference test in order to provide a synthetic index. The test statistic is given as:

$$\mathrm{TST}_j = \frac{100}{4} \times \left[ \frac{\mathrm{SCT}_j}{\max_j \mathrm{SCT}} + \left( 1 - \frac{\mathrm{TPDT}_j - \min_j \mathrm{TPDT}}{\max_j \mathrm{TPDT}} \right) + \frac{\mathrm{MCT}_j}{\max_j \mathrm{MCT}} + \left( 1 - \frac{\mathrm{TRDT}_j - \min_j \mathrm{TRDT}}{\max_j \mathrm{TRDT}} \right) \right]$$

Clearly, SCT_j, TPDT_j, MCT_j, and TRDT_j provide an absolute measure of effectiveness whereas TST_j gives a synthetic index relative to the methods being compared. Thus, the methods with higher values of the test statistic are preferred.

A series of tests are conducted to compare the alternative methods using five diagnostic criteria (i.e., SCT_j, TPDT_j, MCT_j, TRDT_j and TST_j) by giving a cutoff percentage of the total number of areas.

3. Data

In this study, data sets were collected from five counties in the state of Florida during the years 2010–2013. The counties include Citrus, Hernando, Hillsborough, Pasco and Pinellas. Traffic analysis zones (TAZs) are special areas delineated by state and/or local transportation officials for tabulating traffic-related data and are defined as part of the Census Transportation Planning Package (US Census Bureau, web link). As cited by Abdel-Aty et al. (2013), TAZs are thought to currently be the only traffic-related zone system and are superior in being easily integrated with the transportation planning process as they have the most important criteria used to define TAZs, including spatial contiguity, homogeneity, compactness, etc. Thus, TAZs have commonly been considered as base units, and the following data sets were aggregated at the TAZ level. This study was based on 2370 TAZs in total. The shape file of the TAZ boundary was collected from the Florida Department of Transportation (FDOT) District Seven’s Intermodal Systems Development Unit.

The data sets contain historical crash data maintained by the Florida Department of Transportation Crash Analysis Reporting System and major descriptive factors associated with each TAZ, including traffic characteristics and demographic and socioeconomic data. During the period of 2010–2013, a total of 173,305 crashes were recorded, including fatalities, severe injuries, slight injuries and property damage only (PDO). Traffic related data were collected primarily from the FDOT Roadway Characteristics Inventory. The traffic exposure measure (i.e., $e_{it}$) was reflected by daily vehicle miles traveled (DVMT) in an individual TAZ. In addition to the traffic volume, a variety of self-explanatory risk factors are considered as covariates ($X_{it}$). This includes the proportion of people aged 5–14, 15–19, 20–24 and 65 or older, the unemployment rate, the proportion of commuters using cars/public transportation, the proportion of walking commuters, the proportion of people working at home and the median family income. These demographic and socioeconomic data were obtained from the US Census Bureau.

The variables used for model development, as well as their descriptive statistics, are shown in Table 1. To obtain the most parsimonious model, preliminary multicollinearity tests and backward stepwise methods were employed in selecting covariates. It should be noted that the proportion of people aged 20–24 and the proportion of commuters using public transportation were found to be collinear with other covariates. Hence, these variables were finally omitted from the models. The median family income variable was also excluded as it is highly correlated with DVMT.
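The consistency tests of Section 2.3 can be illustrated with a minimal Python sketch. It assumes that each period's decision-parameter values are held in a plain list indexed by area, and that the hotspots of a period are simply the `k` highest-scoring areas. The function names (`top_k`, `rank_of`, `sct`, `mct`, `trdt`) and the toy values are hypothetical and are not taken from the study. MCT is returned as the size of the intersection, and TPDT is left out because it additionally needs the "true" hotspots obtained from all years of data.

```python
def top_k(scores, k):
    """Indices of the k areas with the highest decision-parameter values."""
    order = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    return order[:k]


def rank_of(scores):
    """Rank of every area (1 = highest decision-parameter value)."""
    order = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    ranks = [0] * len(scores)
    for position, area in enumerate(order, start=1):
        ranks[area] = position
    return ranks


def sct(period_i, period_next, k):
    """Site consistency: next-period values summed over the period-i top k."""
    return sum(period_next[a] for a in top_k(period_i, k))


def mct(period_i, period_next, k):
    """Method consistency: number of areas in the top k of both periods."""
    return len(set(top_k(period_i, k)) & set(top_k(period_next, k)))


def trdt(period_i, period_next, k):
    """Total rank difference over the period-i top k (smaller is better)."""
    r_i, r_next = rank_of(period_i), rank_of(period_next)
    return sum(abs(r_i[a] - r_next[a]) for a in top_k(period_i, k))


# Toy decision-parameter values for four areas in two successive periods.
period_i = [5.0, 3.0, 1.0, 0.5]
period_next = [4.0, 0.2, 2.5, 0.1]

sct_value = sct(period_i, period_next, k=2)
mct_value = mct(period_i, period_next, k=2)
trdt_value = trdt(period_i, period_next, k=2)
print(sct_value, mct_value, trdt_value)
```

In this toy case only area 0 stays in the top two, so the MCT score is 1, and area 1 drops one rank, so the TRDT score is 1.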
Table 1
Summary of variables and descriptive statistics.

Variable        Definition                              Year   Mean   S.D.   Min    Max
Predictor variable
Total crash     Total number of crashes per TAZ         2010   19.3   21.9   0.0   173.0
                                                        2011   16.0   18.5   0.0   161.0
                                                        2012   17.2   19.5   0.0   196.0
                                                        2013   20.6   23.7   0.0   228.0
Exposure variable
DVMT            Daily vehicle miles traveled            2010   47.2   71.4   0.0   747.0
                (in thousand)                           2011   64.6   76.3   0.1   803.0
                                                        2012   66.6   76.0   0.1   782.1
                                                        2013   65.4   75.7   0.1   806.5
Explanatory variable
P aged 5–14     Proportion of people aged 5–14          2010   10.9    4.8   0.0    28.1
                                                        2011   10.9    4.9   0.0    27.3
                                                        2012   10.8    4.8   0.0    26.5
                                                        2013   10.7    4.6   0.0    29.8
P aged 15–19    Proportion of people aged 15–19         2010    6.2    4.3   0.0    58.2
                                                        2011    6.4    5.0   0.0    74.0
                                                        2012    6.2    5.0   0.0    74.9
                                                        2013    6.1    5.0   0.0    75.2
P aged 65       Proportion of people aged 65 or older   2010   18.4   12.0   0.0    84.0
                                                        2011   18.3   11.9   0.0    82.5
                                                        2012   18.7   12.0   0.0    83.2
                                                        2013   19.0   11.7   0.0    83.0
Unemployment    Unemployment rate                       2010    0.4    0.1   0.0     0.9
                                                        2011    0.4    0.1   0.0     1.0
                                                        2012    0.4    0.1   0.0     1.0
                                                        2013    0.4    0.1   0.0     1.0
P car           Proportion of commuters using cars      2010    0.9    0.1   0.0     1.0
                                                        2011    0.9    0.1   0.0     1.0
                                                        2012    0.9    0.1   0.0     1.0
                                                        2013    0.9    0.1   0.0     1.0
P walking       Proportion of walking commuters         2010    0.0    0.0   0.0     0.5
                                                        2011    0.0    0.0   0.0     0.6
                                                        2012    0.0    0.0   0.0     0.3
                                                        2013    0.0    0.0   0.0     0.3
P home          Proportion of people working at home    2010    0.0    0.0   0.0     0.3
                                                        2011    0.1    0.0   0.0     0.4
                                                        2012    0.1    0.0   0.0     0.4
                                                        2013    0.1    0.0   0.0     0.4

Note: S.D. is the abbreviation of standard deviation; Min and Max refer to the minimum and maximum values of the variable respectively.
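To make concrete how the exposure and covariates of Table 1 enter the B-ST-I model (Model 3), the following minimal Python sketch evaluates the log-rate for one area and year and scales it by the exposure. The function names, the argument names (`phi_i` for the unstructured effect, `psi_i` for the spatially structured effect, `delta0` for the mean time-trend, `delta_i` for the differential trend) and all numeric values are hypothetical, chosen only for illustration. In the study these quantities are sampled by MCMC rather than plugged in.

```python
import math


def bsti_log_rate(alpha, beta, x, phi_i, psi_i, delta0, delta_i, t):
    """Log crash rate of one area in period t under the Model 3 structure:
    intercept + covariate effects + area effects + (mean + differential) trend.
    """
    covariate_effect = sum(b * v for b, v in zip(beta, x))
    return alpha + covariate_effect + phi_i + psi_i + (delta0 + delta_i) * t


def expected_crashes(log_rate, exposure):
    """Mean crash frequency: exposure (e.g. DVMT) times the crash rate."""
    return exposure * math.exp(log_rate)


# Hypothetical values for a single area with one covariate.
log_rate_t3 = bsti_log_rate(
    alpha=0.0, beta=[0.5], x=[2.0],
    phi_i=0.1, psi_i=-0.1,
    delta0=-0.16, delta_i=0.06, t=3,
)
mean_count_t3 = expected_crashes(log_rate_t3, exposure=10.0)
print(log_rate_t3, mean_count_t3)
```

Because `delta_i` is positive here, the area's trend (`delta0 + delta_i`) declines less steeply than the mean trend `delta0`, which is the kind of differential trend the interaction term is meant to expose.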

4. Results

4.1. Model calibration and diagnostics

The FB analysis was implemented in the freeware WinBUGS package (Spiegelhalter et al., 2003) using an MCMC algorithm. In the absence of strong prior information for factor effects and dispersion parameters, uninformative priors were assumed with a normal distribution (0, 1000) for all regression coefficients ($\beta$) and with a gamma distribution (0.001, 0.001) for hyperparameters associated with the disturbance terms. In the model calibration, three chains of 30,000 iterations were set up for each model based on the convergence speed and the magnitude of the data set. A four-fold cross-validation (CV), where the dataset was divided into four groups by year (i.e., 2010–2013), was applied to evaluate the models’ predictive performance. This division was developed for the PLN, B-ST and B-ST-I models. Each time, the sub-dataset of any three years was input for the training of the models, and the estimated parameters were subsequently used to predict the crash frequencies for the removed year.

Results of parameter estimation and a Bayesian credible interval (95% BCI) of significant factors in the final models are summarized in Table 2. In all three models, seven identical factors are identified as significant in terms of affecting crash risk. The results of covariate coefficients appear to be very robust regardless of which model is used, which implies that there would not be a significant difference between selected models if they were used only for identifying crash-contributing factors. In particular, it was found that the proportion of people aged 65 or older, the unemployment rate and the proportion of walking commuters are associated with a higher crash rate. However, the proportion of people aged 5–14, the proportion of people aged 15–19, the proportion of commuters using cars and the proportion of people working at home may help reduce the crash rate of a TAZ.

The comparison results of model diagnostic criteria estimated from the models are also shown in Table 2. The results indicate that the consideration of spatial and temporal effects, resulting in the B-ST and B-ST-I models, can significantly improve the goodness of fit in comparison with the PLN model. Specifically, judged by the MAD, MSPE and R-square results in the PLN and B-ST-I models, the overall goodness of fit and average prediction accuracy are improved by about 62%, 88% and 15%, respectively.

The results also show that the B-ST model provides a slightly better model fit for the subject data set than does the PLN model (MAD from 14.23 to 13.05; MSPE from 632.2 to 596.3; R-square from 0.75 to 0.73). Although it is very difficult to theoretically distinguish among within-area correlations, the results imply that modeling spatial and temporal effects significantly improves the model fit. Furthermore, the significant serial correlation coefficient ($\gamma$ = 0.63)
Table 2
Parameter estimation and model diagnostics.

                 PLN model                  B-ST model                 B-ST-I model
Parameter        Mean    95% BCI            Mean    95% BCI            Mean    95% BCI
Intercept        −0.44   (−0.58, −0.31)     −0.18   (−0.32, −0.01)     −0.49   (−0.79, −0.19)
P aged 5–14      −0.12   (−0.39, −0.11)     −0.11   (−0.23, −0.01)     −0.29   (−0.46, −0.20)
P aged 15–19     −0.59   (−0.95, −0.24)     −0.13   (−0.66, −0.05)     −0.82   (−1.28, −0.39)
P aged 65         0.15   (0.13, 0.43)        0.28   (0.10, 0.59)        0.26   (0.07, 0.55)
Unemployment      0.45   (0.27, 0.63)        0.99   (0.80, 1.24)        0.40   (0.06, 0.73)
P car            −0.45   (−0.60, −0.31)     −0.51   (−0.69, −0.34)     −0.60   (−0.95, −0.25)
P walking         2.85   (2.36, 3.36)        2.28   (1.82, 2.76)        0.59   (0.22, 0.75)
P home           −0.97   (−1.25, −0.67)     −0.94   (−1.23, −0.68)     −0.69   (−0.92, −0.42)
$\gamma$          –      –                   0.63   (0.61, 0.65)        –      –
$\delta_0$        –      –                   –      –                  −0.16   (−0.17, −0.15)
$\tau_\delta$     –      –                   –      –                   5.30   (4.85, 5.78)
MAD              14.23   (13.95, 14.55)     13.05   (12.83, 13.30)      5.32   (5.28, 5.36)
MSPE             632.2   (591.7, 633.4)     596.3   (505.2, 756.1)     69.90   (68.83, 71.04)
R-square          0.75   (0.73, 0.80)        0.73   (0.70, 0.82)        0.84   (0.84, 0.85)
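The MAD, MSPE and R-square rows of Table 2 follow the definitions in Section 2.2. As a minimal sketch of those definitions, assuming observed and predicted crash counts are available as two equal-length Python lists (one entry per area and year), they can be computed as below. The function names and the toy numbers are hypothetical and are not data from the study.

```python
def mad(observed, predicted):
    """Mean absolute deviance between predicted and observed counts."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def mspe(observed, predicted):
    """Mean squared prediction error."""
    n = len(observed)
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n


def r_square(observed, predicted):
    """1 minus residual sum of squares over total sum of squares,
    the latter taken about the global mean of the observed counts."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy example: four area-year observations and their model predictions.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

mad_value = mad(observed, predicted)
mspe_value = mspe(observed, predicted)
r2_value = r_square(observed, predicted)
print(mad_value, mspe_value, r2_value)
```

Lower MAD and MSPE and higher R-square indicate a better fit, which is how the three models are compared in Table 2.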

Table 3
The performance evaluation of different hotspot identification methods.

Criteria   Rank        PLN        B-ST       B-ST-I
SCT        Top 1%       86.146     93.481     97.222
           Top 2.5%    145.255    152.176    144.670
           Top 5%      215.141    221.299    197.078
TPDT       Top 1%       25.351     25.836     22.670
           Top 2.5%     39.596     38.973     26.432
           Top 5%       60.811     59.305     29.625
MCT        Top 1%       10         11         18
           Top 2.5%     26         29         47
           Top 5%       57         63         97
TRDT       Top 1%     5298       1313        212
           Top 2.5%   8460       6502        956
           Top 5%    22753      17522       3630
TST        Top 1%       59.45      81.06     100
           Top 2.5%     57.21      66.12      98.77
           Top 5%       55.16      63.77      97.26
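The TST rows of Table 3 are a function of the other four rows. As a hedged check, the following Python sketch applies the TST formula of Section 2.3 to the top 1% values of Table 3, with methods ordered PLN, B-ST, B-ST-I. The function name `total_score` and the list layout are assumptions made for illustration only.

```python
def total_score(sct, tpdt, mct, trdt):
    """Total score test: equally weighted combination of four criteria.

    SCT and MCT are scaled by their maximum across methods (higher is
    better); TPDT and TRDT are shifted by their minimum and scaled by
    their maximum, then subtracted from 1 (lower is better).
    """
    scores = []
    for j in range(len(sct)):
        combined = (
            sct[j] / max(sct)
            + 1 - (tpdt[j] - min(tpdt)) / max(tpdt)
            + mct[j] / max(mct)
            + 1 - (trdt[j] - min(trdt)) / max(trdt)
        )
        scores.append(100 / 4 * combined)
    return scores


# Top 1% rows of Table 3, in the order PLN, B-ST, B-ST-I.
tst_top1 = total_score(
    sct=[86.146, 93.481, 97.222],
    tpdt=[25.351, 25.836, 22.670],
    mct=[10, 11, 18],
    trdt=[5298, 1313, 212],
)
print([round(value, 2) for value in tst_top1])
```

Rounded to two decimals, the result agrees with the top 1% TST entries of Table 3 (59.45, 81.06 and 100). The B-ST-I method scores 100 because it is the best method on all four component tests at this cutoff.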

estimated in the B-ST model implies that autocorrelation of crash counts may really exist in the subject data set.

The B-ST-I model performs best, with the highest R-square as well as the lowest MAD and MSPE, which implies that considering spatio-temporal interaction improves the goodness of fit. In other words, modeling the space-time variation benefits the model estimation, as confirmed by previous studies (Bernardinelli et al., 1995). The results show a statistically significant (p < 0.05) space-time variation (i.e., the precision of the spatio-temporal effects is 5.3). This may be due to the fact that the space-time variation of crash counts has been well explained by the specific effects of the residuals included in the model form.

4.2. Results of empirical evaluation in hotspot identification

To evaluate alternative Bayesian models in hotspot identification, the decision parameter ( i,2 ) value for four years (2010–2013) was computed for each TAZ. The ranking of hotspots was based on the decision parameter values: the top hotspot has the highest decision parameter value, and so on.

Using a continuous scale of cutoff numbers (i.e., the top 1%, 2.5% and 5%), Table 3 shows the performance of the three alternative approaches in terms of these test statistics. Specifically, the B-ST method outperforms the other two hotspot identification methods on the SCT test in identifying the top 2.5% and 5% of hotspots, followed closely by the PLN method. However, the B-ST-I method performs best in identifying the top 1% of hotspots. Compared with the SCT test method, the other four test methods (i.e., TPDT, MCT, TRDT and TST) reveal that the B-ST-I method outperforms all competing hotspot identification methods by a wide margin on these criteria, showing great consistency in ranking areas across observation periods. Overall, the five test methods reveal that the B-ST-I method is the most consistent and reliable for almost all levels of ranking. In particular, the B-ST-I method performs best with the highest synthetic index of TST_j in identifying the top 1% of hotspots, demonstrating the best consistency. Although it is not the best on all measures, in the cases in which it is not the best, it is always similar in performance to the best. This indicates that the model that accounts for area-specific spatial and time interactive effects is superior to the non-spatio-temporal models. The discrepancy becomes larger when the cutoff numbers increase. In usual practice, because of the constraint of resources or budgets, only areas with a very high crash risk will be selected for further investigation and remedial treatment. Hence, the outperformance of the B-ST-I approach becomes more important, particularly for practical purposes.

In hotspot identification, several test methods, including SCT, TPDT, MCT, TRDT and TST, were conducted to evaluate these three approaches. Specifically, it was found that the B-ST approach performs significantly better in safety ranking than does the PLN approach. This confirms the necessity of taking into account various heterogeneities of crash occurrence due to spatial and temporal effects on traffic safety. Furthermore, the B-ST-I model was found to outperform the B-ST approach in correctly identifying hotspots. This finding indicates that the flexibility in the model specification of the FB approach has great potential to improve the accuracy of identifying hazardous areas by explicitly accounting for space-time variation in addition to the stable spatial/temporal patterns of crash occurrences.

5. Discussion

Based on the above evaluation results in hotspot identification, integrated screening based on the Bayesian spatio-temporal interaction method is developed to provide a broad-spectrum perspective on the processes influencing changes in crashes over time. Areas of high risk are identified by the posterior mean PM_i of the decision parameter i,2 and the posterior probability Phot_i of area-specific differential trends of being a hotspot. This reveals hotspots based on crash trends, i.e., where a crash trend has a high probability of being greater than the overall trend for a study region. Thus, all areas could be classified into three categories based on their PM_i: high, median and low. High risk zones are defined as zones in the top 5%, median risk zones refer to zones with a PM_i between 0 and the top 5%, and low risk zones are those with a PM_i less than 0. Then, all zones in the “high”, “median” and “low” categories are again classified by the area-specific differential probability of being a hotspot over time, “Phot”. If the value of “Phot” is between 0.5 and 1, this implies that the area-specific trend is steeper than the mean trend, whilst a value of “Phot” between 0 and 0.5 indicates that the area-specific trend is less steep than the mean trend. Likewise, there are two categories based on the value of “Phot”: high and low. Table 4 exhibits a part of the network integrated screening and ranking results. There are six overall combination classifications: “high-high”, “high-low”, “median-high”, “median-low”, “low-high” and “low-low”. Specifically, 74 “high-high” TAZs (3%) in our case are identified as hotspots because these zones not only have a high crash risk but also exhibit an increasing trend in crashes over the time periods. In contrast, there are 754 “low-low” TAZs that are relatively safe. Overall, rather than simply identifying hotspots that occurred in the past at one time period, the B-ST-I approach highlights areas that exhibit a high probability of the area-specific crash trend being greater than the overall mean trend.

Table 4
Example of network integrated screening results.

Rank   PM        TAZ    Category   Phot     TAZ    Integrated category
1      11.78     1339   high       1        1339   high-high
2      8.629     916    high       1        916    high-high
...
74     1.03      1832   high       0.5024   815    high-high
75     1.008     1831   high       0.4842   2088   high-low
...
119    0.7528    1122   high       0        654    high-low
120    0.745     68     median     1        777    median-high
...
463    0.1655    1197   median     0.5012   1481   median-high
464    0.1653    794    median     0.493    2279   median-low
...
770    0.0001    759    median     0        1337   median-low
771    −0.0009   1461   low        1        1523   low-high
...
1615   −0.2438   1021   low        0.5011   1946   low-high
1616   −0.2443   900    low        0.4996   469    low-low
...
2370   −0.585    990    low        0        2142   low-low

As illustrated in Fig. 1a, the posterior means of area-specific crash trends are mapped. Without accounting for the spatio-temporal variance of area-specific trends, areas exhibiting high risk as measured by the posterior mean of the decision parameter are located in the south of the study region. Areas that have high probabilities of being hotspots, as modeled by the posterior probability of area-specific trends being greater than the overall trend, tend to be located in the northern and southwestern sections of the study region (see Fig. 1b). The hotspots identified by the B-ST-I approach could be areas with a high crash risk that exhibit large increases over the time period, as shown in Fig. 1c. This result is unsurprising given that the B-ST-I provides the area-specific trends over time by accounting for the space-time interaction effects (cf. Fig. 1c), while the PLN and B-ST make no spatio-temporal assumption and only show the high risk as measured by the posterior mean of the decision parameter (cf. Fig. 1a). Therefore, the hotspot areas identified in Fig. 1a and c are not identical. For instance, TAZ 84 has a high crash risk from 2010 to 2013 and is ranked number 8 based on the higher potential for safety improvement; it will likely be considered a hotspot. However, this zone exhibits little increase in crashes over the time periods analyzed; the integrated screening method may not identify it as a hotspot. In addition, TAZ 88 has a negative potential for safety improvement (i.e., PM_88 = −0.585) as well as being classified as “low” based on the “Phot” value (see Fig. 1b), implying that it is relatively safe. Hence, the addition of a temporal lens, as provided by the B-ST-I approach, is particularly informative for engineering analysts and enforcement professionals who seek to best manage resources.

The study’s findings suggest that the proposed approach using the B-ST-I model can successfully identify the high-risk zones and provide incentives to reduce the number of traffic casualties in a region’s safety program. Moreover, identifying hazardous zones at the macro level contributes to proactive safety planning. For developed urban areas, it is desirable to formulate specific traffic safety management strategies that account for zone-level characteristics. For example, for areas with a very high crash risk according to integrated screening and ranking and a higher percentage of adolescents, it is desirable to adopt more pedestrian- and bicycle-friendly designs during the transportation planning process.
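The two-stage screening rule described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes posterior draws of the decision parameter and of the area-specific differential trend are already available for each zone (for example, exported from an MCMC run), estimates PM as the mean of the draws and Phot as the share of draws in which the differential trend is positive, and treats a Phot of exactly 0.5 as “high”. The function name `integrated_screening` and its arguments are hypothetical.

```python
from math import ceil


def integrated_screening(dp_draws, trend_draws, top_share=0.05):
    """Classify zones into combined risk/trend categories.

    dp_draws[i]    -- posterior draws of the decision parameter for zone i
    trend_draws[i] -- posterior draws of the area-specific differential
                      trend for zone i (area trend minus overall trend)
    top_share      -- share of zones labelled "high" by posterior mean
    """
    # Stage 1 inputs: posterior mean of the decision parameter (PM).
    pm = [sum(d) / len(d) for d in dp_draws]
    # Stage 2 inputs: probability that the area trend exceeds the mean trend.
    phot = [sum(1 for x in d if x > 0) / len(d) for d in trend_draws]

    # Zones with the largest PM make up the "high" risk group.
    n_high = ceil(top_share * len(pm))
    ranked = sorted(range(len(pm)), key=lambda i: pm[i], reverse=True)
    high = set(ranked[:n_high])

    labels = []
    for i in range(len(pm)):
        if i in high:
            risk = "high"
        elif pm[i] >= 0:
            risk = "median"
        else:
            risk = "low"
        # Assumption: the boundary value 0.5 is assigned to "high".
        trend = "high" if phot[i] >= 0.5 else "low"
        labels.append(f"{risk}-{trend}")
    return pm, phot, labels


# Toy example with four zones and two draws per zone.
pm, phot, labels = integrated_screening(
    dp_draws=[[2.0, 4.0], [0.5, 0.7], [-0.3, -0.5], [-0.1, -0.3]],
    trend_draws=[[0.1, 0.2], [-0.2, -0.1], [0.3, 0.1], [-0.1, -0.3]],
    top_share=0.25,
)
print(labels)
```

With 2370 zones and a 5% share, rounding up gives 119 “high” zones, which agrees with rank 119 being the last “high” row in Table 4; the rounding rule itself is an assumption.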

Fig. 1. (a) Screening results based on the posterior mean of the decision parameter, (b) Screening results based on the posterior probability of area-specific differential trends
of being a hotspot, (c) Integrated screening results.

6. Conclusions

This study proposed a Bayesian spatio-temporal interaction approach for hotspot identification by applying the FB technique. Using 2370 TAZs from data collected in Florida, a comparison study of alternative approaches was conducted to evaluate the performance of model fit and hotspot identification. With respect to model fit, the results showed that the B-ST-I model, which accommodates the spatio-temporal interaction effect, has a better goodness of fit than do the other FB models (i.e., the PLN model and the B-ST model). Compared with the Bayesian spatial and temporal approach, an essential theoretical advantage of the B-ST-I method is the capability to partition the space-time data into a spatial pattern common to all time periods, a temporal trend pattern common to all spatial units and a space-time interaction term that allows local time trends to be different. Particularly in hotspot identification studies, two issues have been dealt with: (a) how the identified hotspots have evolved over time and (b) the identification of areas that, whilst not yet hotspots, show a tendency to become hotspots. Moreover, by use of a two-stage classification method, areas can be classified as hotspots (areas with a higher crash risk than average), coldspots (areas with a persistently lower crash risk than average) or neither (areas with a crash risk that is persistently similar to the average), and the temporal behaviors of these areas can subsequently be identified.

This study applied the B-ST-I approach for estimating area-specific crash trends in order to identify areas where the crash risk situation is improving or deteriorating over time. By studying both the spatial and temporal pattern of crash occurrence, the proposed classification method could directly assist in the prioritization of areas in practice because it is particularly informative for traffic planning and operation management based on locally available enforcement resources. For instance, areas with the highest probability of being a hotspot can be targeted with the most resources, and areas with a moderately high probability of being a hotspot are not overlooked, providing a more comprehensive understanding of crash changes over time.

Although this study has dealt with a number of aspects in applying the B-ST-I approach for hotspot identification, three limitations are worth mentioning and future research expanding on this study is recommended. First, although the area-specific crash trends over time were analyzed for hotspot location, the specific mechanisms driving crash occurrences, i.e., what may influence each crash trend, were unavailable. If appropriate data are available, future research could consider analyzing individual crash types to further inform an understanding of crash trends. Second, the potential benefit of the B-ST-I approach in better accounting for the uncertainties associated with the parameter estimation of SPF was not adequately demonstrated in this empirical evaluation. Third, the results presented in this paper were based only on crash frequencies, and additional research involving both crash frequency and injury severity by employing multivariate models is required to confirm the paper’s findings. Hence, future research could be conducted using simulated data to experimentally investigate the application potential of the B-ST-I approach in traffic safety evaluation.

Acknowledgements

This work was jointly supported by 1) the Natural Science Foundation of China (No. 71371192), 2) the Research Fund for the Fok Ying Tong Education Foundation of Hong Kong (142005), and 3) the Joint Research Scheme of National Natural Science Foundation of China/Research Grants Council of Hong Kong (No. 71561167001 & N HKU707/15).

References

Abdel-Aty, M., Lee, J., Siddiqui, C., Choi, K., 2013. Geographical unit based analysis in the context of transportation safety planning. Transp. Res. Part A 49, 62–75.

Bernardinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghislandi, M., Songini, M., 1995. Bayesian analysis of space-time variation in disease risk. Stat. Med. 14, 2433–2443.
Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems. J. Royal Stat. Soc. 36, 192–236.
Carlin, B., Louis, T., 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall.
Cheng, W., Washington, S.P., 2005. Experimental evaluation of hotspot identification methods. Accid. Anal. Prev. 37, 870–881.
Cheng, W., Washington, S.P., 2008. New criteria for evaluating hotspot identification methods. Transp. Res. Rec. 2083, 76–85.
Congdon, P., 2003. Applied Bayesian Modelling. John Wiley and Sons Inc., New York.
Deacon, J.A., Zegeer, C.V., Deen, R.C., 1975. Identification of hazardous rural highway locations. Transp. Res. Rec. 543, 16–33.
Dong, N., Huang, H., Xu, P., Ding, Z., Wang, D., 2014. Evaluating spatial proximity structures in crash prediction models at the level of traffic analysis zones. Transp. Res. Rec. 2432, 46–52.
Dong, N., Huang, H., Zheng, L., 2015. Support vector machine in crash prediction at the level of traffic analysis zones: assessing the spatial proximity effects. Accid. Anal. Prev. 82, 192–198.
El-Basyouny, K., Sayed, T., 2013. Depth-based hotspots identification and multivariate ranking using the full Bayes approach. Accid. Anal. Prev. 50, 1082–1089.
Elvik, R., 2007. State-of-the-art Approaches to Road Accident Black Spot Management and Safety Analysis of Road Networks. Institute of Transport Economics, Oslo, Report 883.
Hauer, E., Persaud, B.N., 1984. Problem of identifying hazardous road locations using accident data. Transp. Res. Rec. 975, 36–43.
Hauer, E., Kononov, J., Allery, B., Griffith, M.S., 2002. Screening the road network for sites with promise. Transp. Res. Rec. 1784, 27–32.
Hauer, E., 1996. Identification of sites with promise. Transp. Res. Rec. 1542, 54–60.
Hauer, E., 1997. Observational Before–After Studies in Road Safety. Pergamon/Elsevier Science, Inc., Tarrytown, New York.
Huang, H., Abdel-Aty, M., 2010. Multilevel data and Bayesian analysis in traffic safety. Accid. Anal. Prev. 42, 1556–1565.
Huang, H., Chin, H.C., Haque, M.M., 2009. Empirical evaluation of alternative approaches in identifying crash hot spots. Transp. Res. Rec. 2103, 32–41.
Jiang, X., Abdel-Aty, M., Alamili, S., 2014. Application of Poisson random effect models for highway network screening. Accid. Anal. Prev. 63, 74–82.
Lan, B., Persaud, B., 2011. Full Bayesian approach to investigate and evaluate ranking criteria for black spot identification. Transp. Res. Rec. 2237, 117–125.
Law, J., Quick, M., Chan, P.W., 2014. Analyzing hotspots of crime using a Bayesian spatiotemporal modeling approach: a case study of violent crime in the greater Toronto area. Geog. Anal., 1–19.
Lee, J., Abdel-Aty, M., Choi, K., Huang, H., 2015. Multi-level hot zone identification for pedestrian safety. Accid. Anal. Prev. 76, 64–73.
Levine, N., Kim, K.E., Nitz, L.H., 1995. Spatial analysis of Honolulu motor vehicle crashes: I. Spatial patterns. Accid. Anal. Prev. 27, 663–674.
Li, G., Haining, R., Richardson, S., Best, N., 2014. Space-time variability in burglary risk: a Bayesian spatio-temporal modelling approach. Spat. Stat. 9, 180–191.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. Part A 44, 291–305.
Miaou, S.P., Song, J.J., 2005. Bayesian ranking of sites for engineering safety improvement: decision parameter, treatability concept, statistical criterion and spatial dependence. Accid. Anal. Prev. 37, 699–720.
Montella, A., 2010. A comparative analysis of hotspot identification methods. Accid. Anal. Prev. 42, 571–581.
Richardson, S., Abellan, J.J., Best, N., 2006. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire (UK). Stat. Methods Med. Res. 15, 385–407.
Spiegelhalter, D.J., Thomas, A., Best, N., Lunn, D.J., 2003. WinBUGS Version 1.4.1 User Manual. MRC Biostatistics Unit, Cambridge University, United Kingdom.
Stokes, R.W., Mutabazi, I.M., 1996. Rate-quality control method of identifying hazardous road locations. Transp. Res. Rec. 1542, 44–48.
Tamburri, T.N., Smith, R.N., Mills, J.P., Perini, V.J., 1970. The safety index: a method of evaluating and rating safety benefits. In: Highway Research Record 307, HRB. National Research Council, Washington, DC, USA.
Wang, J., Huang, H., 2016. Road network safety evaluation using Bayesian hierarchical joint model. Accid. Anal. Prev. 90, 152–158.
Xu, P., Huang, H., 2015. Modeling crash spatial heterogeneity: random parameter versus geographically weighting. Accid. Anal. Prev. 75, 16–25.
Xu, P., Huang, H., Dong, N., Abdel-Aty, M., 2014. Sensitivity analysis in the context of regional safety modeling: identifying and assessing the modifiable areal unit problem. Accid. Anal. Prev. 70, 110–120.
Zeng, Q., Huang, H., 2014. Bayesian spatial joint modeling of traffic crashes on an urban road network. Accid. Anal. Prev. 67, 105–112.
