
Joint Modeling of Pedestrian and Bicycle Crashes

Copula-Based Approach

Tammam Nashad, Shamsunnahar Yasmin, Naveen Eluru, Jaeyoung Lee, and Mohamed A. Abdel-Aty

T. Nashad, S. Yasmin, N. Eluru, and M. A. Abdel-Aty, Department of Civil, Environmental, and Construction Engineering, College of Engineering and Computer Science, and J. Lee, Center for Advanced Transportation Systems Simulation, Department of Civil, Environmental, and Construction Engineering, College of Engineering and Computer Science, University of Central Florida, Suite 211, 12800 Pegasus Drive, Orlando, FL 32816-2450. Corresponding author: N. Eluru, naveen.eluru@ucf.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2601, Transportation Research Board, Washington, D.C., 2016, pp. 119–127. DOI: 10.3141/2601-14

This study contributes to the literature on active mode transportation safety by using a copula-based model for crash frequency analysis at a macro level. Most studies in the transportation safety area identify a single count variable (such as vehicular, pedestrian, or bicycle crash counts) for a spatial unit at a specific period and study the impact of exogenous variables. Although the traditional count models perform adequately in the presence of a single count variable, these approaches must be modified to examine multiple dependent variables for each study unit. The presented research developed a multivariate model by adopting a copula-based bivariate negative binomial model for pedestrian and bicycle crash frequency analysis. The proposed approach accommodates potential heterogeneity (across zones) in the dependency structure. The formulated models were estimated with pedestrian and bicycle crash count data at the statewide traffic analysis zone level for the state of Florida for 2010 through 2012. The statewide traffic analysis zone level variables considered in the analysis included exposure measures, socioeconomic characteristics, road network characteristics, and land use attributes. A policy analysis was conducted, along with a representation of hot spot identification, to illustrate the applicability of the proposed model for planning purposes. The development of such spatial profiles allows planners to identify high-risk zones for screening and subsequent treatment identification.

Urban regions in North America are encouraging the adoption of active modes of transportation by proactively developing infrastructure for these modes. According to data from the 2009 National Household Travel Survey, about 37.6% of trips by private vehicles in the United States are of less than 2 mi (1). Even if a small proportion of the shorter private vehicle trips (around dense urban cores) are substituted with public transportation and active transportation trips, individuals, cities, and the environment gain substantial benefits. However, a strong impediment to the growing adoption of active modes of transportation is the risk associated with these modes. In the United States between 2004 and 2013, bicycle and pedestrian fatalities as a percentage of total traffic crash–related fatalities increased from 1.7% to 2.3% and 11% to 14%, respectively (1). To increase the adoption of active transportation, risk to pedestrians and bicyclists on roadways must be reduced. The safety risk posed to active transportation users is greater in Florida than in the United States as a whole. The national average for pedestrian (bicyclist) fatalities per 100,000 population is 1.50 (2.35), whereas the corresponding number for the state of Florida is 2.56 (6.80). An important tool for determining the critical factors affecting occurrence of pedestrian and bicycle crashes and identifying vulnerable locations is the application of planning-level crash prediction models.

Traffic crashes aggregated at a certain spatial scale are nonnegative integer valued random events. These integer counts are examined with count regression approaches that quantify the influence of exogenous factors on crash counts. Most studies in the transportation safety area identify a single count variable (such as vehicular, pedestrian, or bicycle crash counts) for a spatial unit at a specific period and study the impact of exogenous variables. In this context, the crash prediction model structures considered include Poisson (2, 3), Poisson–lognormal, Poisson–Gamma regression [also known as negative binomial (NB)], Poisson–Weibull, and generalized Waring models (4–10). Among these model structures, the NB model, which offers a closed-form expression while relaxing the mean–variance equality constraint of Poisson regression, is the workhorse of crash count modeling.

Multiple Dependent Variables

Although the univariate count models perform adequately in the presence of a single count variable, these approaches must be modified to examine multiple dependent variables for each study unit. For a study unit, if multiple dependent variables are available, common observed and unobserved factors that affect one dependent variable could also affect the other dependent variables. Accommodating the impact of observed factors is relatively straightforward within count models by estimating distinct count models for every dependent variable. Incorporating the impact of unobserved factors poses methodological challenges. Accommodating the impact of unobserved factors recognizes that the multiple dimensions of interest have common error terms that affect the dependent variables. In traditional discrete choice models, there are three ways that such joint processes are examined. The first approach considers the dependent variables being investigated as marginal distributions within a


bivariate (or multivariate) distribution by developing a joint error distribution. The estimated distribution parameters permit evaluation of the correlation between the dependent variables. The approach usually results in closed form parametric formulations. These formulations thus allow for analytical computation of log likelihood and offer more stable inference conclusions. Examples of such approaches are bivariate normal or logistic distributions, bivariate NB distributions, and the flexible bivariate copula-based approaches (11–13). The flexibility of the approach is restricted by the potential parametric alternatives available. In the transportation safety area, to the authors’ knowledge, no count models have been developed that use this approach.

The second approach to addressing multiple dependent variables involves developing a multivariate function as described in the first approach. However, as the estimation of the multivariate approach is computationally intractable, an approximation approach to evaluating the multivariate function is considered. The approach, referred to as the composite marginal likelihood (CML) approach, has received considerable attention in the transportation literature in recent years (14–16). For crash count modeling, the approach has been used by Narayanamoorthy et al. for bicycle and pedestrian crash counts by severity type (17).

The third approach to accommodating the dependency between the dependent variables allows for stitching by considering unobserved error components that jointly affect the dependent variables. The approach usually partitions the error components of the dependent variables into a common term and an independent term across dependent variables. The common error term across the dependent variables allows for possible unobserved effects. The common term is considered with a distribution that has a zero mean. Thus, any computation of probability requires an integral across the error term distribution. The probability computation is dependent on the distributional assumption and no longer has a closed form expression. Thus, the estimation procedure requires adoption of the maximum simulated likelihood (MSL) approach or the Markov chain Monte Carlo (MCMC) approach in the Bayesian realm. MSL and MCMC methods provide substantial flexibility in accommodating unobserved heterogeneity. However, in the MSL and MCMC methods, the probability computation is sensitive to the number of draws as well as random number generation procedures. Further, these approaches are more prone to efficiency loss because of inaccuracy in retrieving the variance–covariance parameters that are critical for inference. [Bhat provided a detailed discussion on the issue with MSL approaches (18).] Most count modeling approaches used in the safety area have adopted the third approach. The model structures used in the literature include the multivariate–Poisson model (19), Poisson–lognormal models (7, 20–22), and simultaneous equation models (23, 24).

Present Study in Context

The transportation safety literature on count modeling has focused on the third approach to examining multivariate count variables. The present research contributes to the literature on the first approach by developing a multivariate model that adopts a copula-based bivariate NB model for analysis of pedestrian and bicycle crash frequency. The proposed approach has three major advantages relative to existing methods. First, in earlier research (related to the second and third groups), a particular distributional assumption on the nature of the correlation across the multiple dependent variables was considered. However, the distributional assumption could influence the results. In a copula-based approach, the dependency structures can be empirically compared, enhancing the flexibility of the multivariate approach. Thus the copula approach subsumes any bivariate modeling approach. Second, the copula-based approach for count modeling results in an analytical formulation rather than an approximation (as in CML methods) or simulation (in the MSL or MCMC approaches). Thus, the parameter estimates are likely to be more accurate. Finally, several exogenous factors could affect the dependency profile between the multiple variables. Here, these impacts are accommodated by parameterizing the dependency profile to allow for such potential heterogeneity (across zones).

A simpler version of the proposed approach has been employed in econometrics by Cameron et al. (25). In their study, the copula dependency is considered to be the same across the entire data set. To the best of the authors’ knowledge, this was the first attempt to use such copula-based bivariate count models to examine crash count events. Copula models for ordered and unordered discrete outcome variables have been adopted in the safety literature (12, 13, 26, 27). In this paper, copula-based models are used for count events analysis. Empirically, the study examined the influence of several exogenous variables (exposure measures, socioeconomic characteristics, road network characteristics, and land use attributes) on pedestrian and bicycle crash count events at the statewide traffic analysis zone (STAZ) level for the state of Florida.

Model Framework

This study used a copula-based bivariate NB modeling framework to jointly model pedestrian and bicycle crash frequency at the zonal level. The econometric framework for the joint model is presented in this section.

Let $i$ be the index for STAZ ($i = 1, 2, 3, \ldots, N$) and $y_{qi}$ be the count of crashes occurring over a period of time in STAZ $i$, where $q$ takes the value of 1 for pedestrian crashes and 2 for bicycle crashes. The NB probability expression for random variable $y_{qi}$ can be written as follows (25):

$$P_{qi}(y_{qi} \mid \mu_{qi}, \alpha_q) = \frac{\Gamma(y_{qi} + \alpha_q^{-1})}{\Gamma(y_{qi} + 1)\,\Gamma(\alpha_q^{-1})} \left(\frac{1}{1 + \alpha_q \mu_{qi}}\right)^{1/\alpha_q} \left(1 - \frac{1}{1 + \alpha_q \mu_{qi}}\right)^{y_{qi}} \qquad (1)$$

where

$\Gamma(\cdot)$ = Gamma function,
$\alpha_q$ = NB dispersion parameter specific to road user group $q$, and
$\mu_{qi}$ = expected number of crashes occurring in STAZ $i$ over a given period for vulnerable road user group $q$.

One can express $\mu_{qi}$ as a function of explanatory variables ($x_{qi}$) by using a log link function as $\mu_{qi} = E(y_{qi} \mid x_{qi}) = \exp(b_q x_{qi})$, where $b_q$ is a vector of parameters to be estimated specific to road user group $q$.

The correlation or joint behavior of random variables $y_{1i}$ and $y_{2i}$ is explored with a copula-based approach. A copula is a mathematical device that identifies dependency among random variables with prespecified marginal distributions (11, 28). In constructing the copula dependency, assume that $\Lambda_1(y_{1i})$ and $\Lambda_2(y_{2i})$ are the marginal distribution functions of the random variables $y_{1i}$ and $y_{2i}$, respectively, and $\Lambda_{12}(y_{1i}, y_{2i})$ is the joint distribution for the bivariate case with corresponding marginal distribution. Subsequently, the

bivariate distribution $\Lambda_{12}(y_{1i}, y_{2i})$ can be generated as a joint cumulative probability distribution of uniform [0, 1] marginal variables $U_1$ and $U_2$ (11):

$$\begin{aligned}
\Lambda_{12}(y_{1i}, y_{2i}) &= \Pr(U_1 \le y_{1i},\; U_2 \le y_{2i}) \\
&= \Pr[\Lambda_1^{-1}(U_1) \le y_{1i},\; \Lambda_2^{-1}(U_2) \le y_{2i}] \\
&= \Pr[U_1 < \Lambda_1(y_{1i}),\; U_2 < \Lambda_2(y_{2i})]
\end{aligned} \qquad (2)$$

The joint distribution (of the uniform marginal variable) in Equation 2 can be generated by a function $C_{\theta_i}(\cdot,\cdot)$ (29), such that

$$\Lambda_{12}(y_{1i}, y_{2i}) = C_{\theta_i}(U_1 = \Lambda_1(y_{1i}),\; U_2 = \Lambda_2(y_{2i})) \qquad (3)$$

where $C_{\theta_i}(\cdot,\cdot)$ is a copula function and $\theta_i$ is the dependence parameter defining the link between $y_{1i}$ and $y_{2i}$. In the case of two continuous random variables, the bivariate density (or joint density) can be derived from partial derivatives for the continuous case. However, in this study, $y_{1i}$ and $y_{2i}$ are nonnegative integer valued events. For such count data, following Cameron et al. (25), the probability mass function ($\zeta_{\theta_i}$) is presented (instead of continuous derivatives) by use of finite differences of the copula representation as follows:

$$\begin{aligned}
\zeta_{\theta_i}(\Lambda_1(y_{1i}), \Lambda_2(y_{2i})) ={}& C_{\theta_i}(\Lambda_1(y_{1i}), \Lambda_2(y_{2i}); \theta_i) \\
&- C_{\theta_i}(\Lambda_1(y_{1i} - 1), \Lambda_2(y_{2i}); \theta_i) \\
&- C_{\theta_i}(\Lambda_1(y_{1i}), \Lambda_2(y_{2i} - 1); \theta_i) \\
&+ C_{\theta_i}(\Lambda_1(y_{1i} - 1), \Lambda_2(y_{2i} - 1); \theta_i)
\end{aligned} \qquad (4)$$

Given the above setup, $\Lambda_1(y_{1i})$ and $\Lambda_2(y_{2i})$ are specified as the cumulative distribution function (CDF) of the NB distribution. The CDF of the NB probability expression (as presented in Equation 1) for $y_{qi}$ can be written as

$$\Lambda_q(y_{qi} \mid \mu_{qi}, \alpha_q) = \sum_{k=0}^{y_{qi}} P_{qi}(k \mid \mu_{qi}, \alpha_q) \qquad (5)$$

Thus, the log likelihood function with the joint probability expression in Equation 4 can be written as

$$LL = \sum_{i=1}^{N} \ln\left[\zeta_{\theta_i}(\Lambda_1(y_{1i}), \Lambda_2(y_{2i}))\right] \qquad (6)$$

The empirical analysis uses six copula structures: (a) Gaussian, (b) Farlie–Gumbel–Morgenstern (FGM), (c) Clayton, (d) Gumbel, (e) Frank, and (f) Joe. [A detailed discussion of these copulas is available elsewhere (11).] The Gaussian, FGM, and Frank copulas represent symmetric dependency structures that ensure higher dependency for unobserved variables around the mean of the distribution. The Clayton copula allows for stronger dependency between the unobserved variables for the lower tails of the distribution. The Gumbel and Joe copulas offer a mirror image to the Clayton copula by allowing for stronger dependency toward the positive tails of the distribution. Of the Joe and Gumbel copulas, Joe allows for a stronger positive tail dependency. [Figure 1 in the work by Bhat and Eluru provides more detail (11).]

The level of dependence between the random variables can vary across STAZs. Therefore, in this study, the dependence parameter $\theta_i$ is parameterized as a function of observed attributes as follows:

$$\theta_i = fn(g_q s_{qi}) \qquad (7)$$

where

$s_{qi}$ = column vector of exogenous variables,
$g_q$ = row vector of unknown parameters (including a constant) specific to road user group $q$, and
$fn$ = functional form of parameterization.

On the basis of the dependency parameter permissible ranges, alternate parameterization forms for the six copulas are considered in the analysis. For Gaussian, FGM, and Frank copulas, $\theta_i = g_q s_{qi}$ is used; for the Clayton copula, $\theta_i = \exp(g_q s_{qi})$; and for Joe and Gumbel copulas, $\theta_i = 1 + \exp(g_q s_{qi})$. The parameters to be estimated in the model of Equation 6 are $b_q$, $\alpha_q$, and $g_q$. The parameters are estimated with maximum likelihood approaches. The model estimation is achieved through the log likelihood functions programmed in GAUSS (30).

Data Description

This study is focused on pedestrian and bicycle crashes at the STAZ level. There are 8,518 STAZs in the state of Florida (Figure 1). Data for the empirical study were obtained from Florida for 2010 through 2012. The pedestrian and bicycle crash records were collected and compiled from Florida Department of Transportation (DOT) crash analysis reporting and Signal Four Analytics databases. These databases contain long and short forms of crash reports, respectively, for Florida. The long-form crash report includes crashes with a higher injury severity level and crashes related to criminal activities (such as hit-and-run or driving under the influence). Crash data records from short- and long-form databases were compiled for more complete information on road crashes and hence were used for the analysis in this study.

In addition to the crash database, the explanatory attributes considered in the empirical study were aggregated at the STAZ level accordingly. For the empirical analysis, the selected explanatory variables were grouped into four broad categories: exposure measures, socioeconomic characteristics, road network characteristics, and land use attributes. Exposure measures, socioeconomic characteristics, and land use attributes were obtained from the U.S. Census Bureau and Florida DOT district offices and metropolitan planning organizations (MPOs) or from the Florida DOT central office. The road network characteristics and traffic related attributes were collected from the Florida DOT Transportation Statistics Office. STAZ data were collected from Florida DOT district offices and MPOs or the Florida DOT central office, the U.S. Census Bureau, and the Florida Geographic Data Library. Table 1 provides a summary of the sample characteristics of the count and exogenous variables and the definitions of variables considered for final model estimation along with the zonal minimum, maximum, average, and standard deviation values for Florida. Table 1 shows that for the three studied years, Florida had a record of 16,240 pedestrian crashes with an average of 1.90 crashes (ranging from 0 to 39 crashes) per STAZ. The state had an average of 1.79 crashes (ranging from 0 to 88) per STAZ with a total record of 15,307 bicycle crashes for the 3-year period.
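The model framework described earlier (Equations 1 through 7) can be sketched in standard-library Python. This is only an illustrative sketch, not the authors' GAUSS code: the function names, the `zones` tuple layout, and the use of one shared covariate vector for both crash types are assumptions made for brevity. Only the Clayton copula is shown, with its dependence parameter taken as the exponential of a linear index, and the log likelihood sums the logarithm of each zone's joint probability.

```python
import math

def nb_pmf(y, mu, alpha):
    """Equation 1: NB probability of count y given mean mu and dispersion alpha."""
    r = 1.0 / alpha
    p = 1.0 / (1.0 + alpha * mu)
    log_prob = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                + r * math.log(p) + y * math.log1p(-p))
    return math.exp(log_prob)

def nb_cdf(y, mu, alpha):
    """Equation 5: cumulative NB probability up to y (zero when y < 0)."""
    if y < 0:
        return 0.0
    return min(1.0, sum(nb_pmf(k, mu, alpha) for k in range(y + 1)))

def clayton(u1, u2, theta):
    """Clayton copula C(u1, u2) for theta > 0; zero when either margin is zero."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def joint_pmf(y1, y2, mu1, mu2, a1, a2, theta):
    """Equation 4: finite differences of the copula evaluated at the NB CDFs."""
    f1, f1_prev = nb_cdf(y1, mu1, a1), nb_cdf(y1 - 1, mu1, a1)
    f2, f2_prev = nb_cdf(y2, mu2, a2), nb_cdf(y2 - 1, mu2, a2)
    return (clayton(f1, f2, theta) - clayton(f1_prev, f2, theta)
            - clayton(f1, f2_prev, theta) + clayton(f1_prev, f2_prev, theta))

def log_likelihood(zones, b1, b2, a1, a2, g):
    """Sum of log joint probabilities over zones.

    Each zone is a hypothetical tuple (y1, y2, x, s): pedestrian count,
    bicycle count, covariates for the means, covariates for the dependence.
    """
    total = 0.0
    for y1, y2, x, s in zones:
        mu1 = math.exp(sum(b * v for b, v in zip(b1, x)))   # log link
        mu2 = math.exp(sum(b * v for b, v in zip(b2, x)))
        theta = math.exp(sum(c * v for c, v in zip(g, s)))  # Clayton form of Eq. 7
        total += math.log(joint_pmf(y1, y2, mu1, mu2, a1, a2, theta))
    return total
```

A function such as `log_likelihood` could be passed (negated) to any general-purpose optimizer to obtain maximum likelihood estimates. Swapping `clayton` for another copula function and changing the parameterization of `theta` would give the other five structures.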

FIGURE 1   STAZs in Florida.

Empirical Analysis

Model Specification and Overall Measures of Fit

The empirical analysis involved estimation of models with six copula structures: Gaussian, FGM, Clayton, Gumbel, Frank, and Joe. The empirical analysis involved a series of model estimations. First, independent copula models (separate NB models for pedestrian and bicycle crash counts) were estimated to establish a benchmark for comparison. Second, the six models were estimated by considering the dependency parameter in the copula model to be the same across all STAZs. Third, the copula models were also estimated by considering the parameterization for the copula dependency profile. Finally, to determine the most suitable copula model (including the independent copula model), a comparison exercise was undertaken. The alternative copula models estimated were nonnested and hence could not be tested with the traditional log likelihood ratio test. The Bayesian information criterion (BIC) was therefore used to determine the best model from all the copula models (26, 28, 29, 31). The BIC for a given empirical model is as follows:

$$\mathrm{BIC} = -2LL + K \ln(Q) \qquad (8)$$

where

LL = log likelihood value at convergence,
K = number of parameters, and
Q = number of observations.

The model with the lower BIC is the preferred copula model. The BIC value for the independent copula model was 48747.45. The following copula models (BIC) without parameterization offered improved data fit: Clayton (48343.15), FGM (48388.16), and Frank (48340.05). The Gaussian, Gumbel, and Joe copulas collapsed to the independent copula model. For copula dependency profile parameterization, the variable effects were significant only for the Clayton copula. Overall, the Clayton copula with dependency profile parameterization (48271.85) outperformed the other copula models, as well as the independent model. The copula model BIC comparisons confirmed the need to accommodate dependence between pedestrian and bicycle crash count events in a macro-level analysis.

Estimation Results

The presentation of the effects of exogenous variables in the joint model specification is restricted to a discussion of the Clayton copula

TABLE 1   Sample Statistics for State of Florida

| Variable | Variable Description | Zonal Minimum | Zonal Maximum | Zonal Average | Zonal SD | 20th Percentile | 80th Percentile |
|---|---|---|---|---|---|---|---|
| Dependent Variable | | | | | | | |
| Pedestrian crashes per STAZ | Total pedestrian crashes per STAZ | 0.000 | 39.000 | 1.907 | 3.315 | 0.000 | 3.000 |
| Bicycle crashes per STAZ | Total bicycle crashes per STAZ | 0.000 | 88.000 | 1.797 | 3.309 | 0.000 | 3.000 |
| Exposure Measure | | | | | | | |
| VMT | Natural log of VMT in STAZ | 0.000 | 13.437 | 9.039 | 2.659 | 7.978 | 10.870 |
| Proportion of heavy vehicles | Total heavy-vehicle VMT in STAZ or total vehicles VMT in STAZ | 0.000 | 0.519 | 0.067 | 0.052 | 0.031 | 0.095 |
| Total population | Natural log of total population in STAZ | 0.000 | 10.571 | 6.437 | 2.144 | 4.990 | 8.233 |
| Population density | Natural log of population density (mi2) | 0.000 | 11.052 | 6.481 | 2.257 | 4.542 | 8.267 |
| Proportion of families with no vehicle | Total families with no vehicle in STAZ or total families in STAZ | 0.000 | 1.000 | 0.095 | 0.123 | 0.020 | 0.133 |
| Socioeconomic Characteristic | | | | | | | |
| Public transit commuters | Commuters using public transportation | 0.000 | 934.000 | 18.812 | 54.273 | 0.000 | 18.000 |
| Bicycle commuters | Commuters using bicycle | 0.000 | 775.000 | 5.844 | 19.263 | 0.000 | 6.000 |
| Walk commuters | Commuters by walking | 0.000 | 1,288.000 | 14.352 | 34.681 | 0.000 | 20.000 |
| Total employment | Natural log of total employment in STAZ | 0.000 | 10.371 | 5.857 | 2.017 | 4.382 | 7.523 |
| Proportion of service employment | Proportion of service employment | 0.000 | 1.000 | 0.525 | 0.257 | 0.294 | 0.760 |
| Proportion of industrial employment | Proportion of industrial employment | 0.000 | 1.000 | 0.176 | 0.232 | 0.000 | 0.333 |
| Proportion of commercial employment | Proportion of commercial employment | 0.000 | 1.000 | 0.299 | 0.235 | 0.072 | 0.498 |
| School enrollment density | Natural log of total school enrollment per square mile in STAZ | 0.000 | 12.450 | 2.715 | 3.143 | 0.000 | 6.278 |
| Road Network Characteristic | | | | | | | |
| Proportion of urban area | Total urban area in STAZ or total area in STAZ | 0.000 | 1.000 | 0.722 | 0.430 | 0.007 | 1.000 |
| Proportion of local roads | Total length of local roads in STAZ or total length of all roads in STAZ | 0.000 | 1.000 | 0.572 | 0.329 | 0.177 | 0.858 |
| Proportion of collector roads | Proportion of collectors | 0.000 | 1.000 | 0.191 | 0.246 | 0.000 | 0.323 |
| Proportion of arterial roads | Total length of arterial roads in STAZ or total length of all roads in STAZ | 0.000 | 1.000 | 0.221 | 0.275 | 0.006 | 0.369 |
| Traffic signal density | Natural log of total traffic signals per miles of road in STAZ | 0.000 | 8.756 | 0.227 | 0.578 | 0.000 | 0.269 |
| Bike lane length | Bike lane length | 0.000 | 28.637 | 0.303 | 1.096 | 0.000 | 0.030 |
| Sidewalk length | Sidewalk length | 0.000 | 25.683 | 0.993 | 1.750 | 0.000 | 1.735 |
| Land Use Attribute | | | | | | | |
| Density of hotel, motel, or time-share room | Natural log of total hotel, motel, or time-share room per square mile in STAZ | 0.000 | 10.392 | 1.549 | 2.365 | 0.000 | 3.924 |
| Distance to nearest urban area | Distance of the STAZ to nearest urban area (mi) | 0.000 | 44.101 | 2.140 | 5.441 | 0.000 | 1.606 |

Note: VMT = vehicle miles traveled.

specification. Table 2 presents the estimation results of the joint model. For ease of presentation, the pedestrian crash count component results and bicycle crash count component results are discussed together in the following section by variable groups. The copula parameters are presented in Table 2 notes.

Exposure Measures

For exposure measures, the estimates indicate that both pedestrian and bicycle crashes are positively associated with higher vehicle miles traveled (VMT) at the zonal level. The result related to VMT represents the higher crash risk faced by nonmotorized (pedestrian and bicycle) road user groups with increasing VMT (32). The results in Table 2 indicate reduced crash propensity for both pedestrians and bicyclists with a higher proportion of heavy-vehicle VMT at the zonal level. For total population, the joint model estimation results revealed that both pedestrian and bicycle crashes are positively associated with higher zonal population.

As expected, both pedestrian and bicycle crash risk were found to be higher for STAZs with a higher proportion of households without access to private vehicles (33, 34), but the magnitude of

TABLE 2   Pedestrian–Bicycle Joint Model Estimation Results, Clayton Copula

| Variable | Pedestrian Estimate | Pedestrian t-Statistic | Bicycle Estimate | Bicycle t-Statistic |
|---|---|---|---|---|
| Constant | −4.238 | −38.738 | −4.272 | −41.469 |
| Exposure Measure | | | | |
| VMT | 0.118 | 20.646 | 0.128 | 20.775 |
| Proportion of heavy vehicles | −0.902 | −2.444 | −3.145 | −8.786 |
| Total population | 0.137 | 17.447 | 0.138 | 15.339 |
| Proportion of families with no vehicle | 1.323 | 12.040 | 0.244 | 1.976 |
| Socioeconomic Characteristic | | | | |
| Bicycle commuters | 0.036 | 3.841 | 0.144 | 16.754 |
| Public transit commuters | 0.171 | 21.750 | 0.097 | 11.480 |
| Walk commuters | 0.070 | 7.286 | 0.081 | 8.129 |
| Total employment | 0.172 | 16.812 | 0.136 | 14.087 |
| Proportion of industrial employment | −0.242 | −3.632 | −0.191 | −2.794 |
| School enrollment density | 0.012 | 3.022 | 0.011 | 2.638 |
| Road Network Characteristic | | | | |
| Proportion of urban area | 0.272 | 5.146 | 0.658 | 11.170 |
| Proportion of local roads | 0.564 | 8.752 | 0.565 | 8.157 |
| Proportion of arterial roads | 0.306 | 3.949 | 0.422 | 5.040 |
| Traffic signal density | 0.289 | 12.716 | 0.184 | 7.281 |
| Sidewalk length | 0.272 | 12.963 | 0.309 | 14.754 |
| Land Use Attribute | | | | |
| Density of hotel, motel, or time-share room | 0.029 | 5.943 | 0.018 | 3.429 |
| Distance to nearest urban area | −0.039 | −7.031 | −0.084 | −9.363 |

Note: Copula parameters are as follows: constant estimate = −0.973; constant t-statistic = insignificant in the model; public transit commuters estimate = 0.141; public transit commuters t-statistic = 4.373; school enrollment density estimate = 0.049; school enrollment density t-statistic = 2.728.
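As a rough illustration of how the copula parameters in the Table 2 note translate into zone-specific dependence, the sketch below evaluates the Clayton parameterization (the exponential of the linear index) and converts the result to Kendall's tau through the standard Clayton relation tau = theta / (theta + 2). The function names and example zones are hypothetical. The sketch also assumes that covariates are supplied on the scale used in estimation, which the text does not state for commuter counts, so only school enrollment density (natural-log scale per Table 1) is varied here.

```python
import math

# Copula parameter estimates taken from the Table 2 note (Clayton copula).
G_CONSTANT = -0.973  # reported as statistically insignificant
G_TRANSIT = 0.141    # public transit commuters
G_SCHOOL = 0.049     # school enrollment density

def clayton_theta(transit, school):
    """Clayton dependence parameter exp(g's); positive for any covariate values."""
    return math.exp(G_CONSTANT + G_TRANSIT * transit + G_SCHOOL * school)

def kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta."""
    return theta / (theta + 2.0)

# Hypothetical zones: all dependence covariates at zero, then school
# enrollment density at its Table 1 average (2.715) with transit held at zero.
examples = [("baseline", 0.0, 0.0), ("average school density", 0.0, 2.715)]
for label, transit, school in examples:
    theta = clayton_theta(transit, school)
    print(f"{label}: theta = {theta:.3f}, tau = {kendall_tau(theta):.3f}")
```

Because both slope estimates are positive, the implied dependence strengthens as either covariate grows. The tau value describes the copula itself; with count margins, the rank correlation of the observed crash counts will generally differ.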

the impact is more pronounced for pedestrian crashes relative to bicycle crashes. The results reflect that members of households with no access to private vehicles use alternate modes of transportation for daily activities, resulting in higher pedestrian and bicycling exposure in these STAZs.

Socioeconomic Characteristics

The results for the number of commuters by commute mode also showed significant influence on pedestrian and bicycle crash risk. An increase in the number of transit commuters increases the likelihood of pedestrian and bicycle crashes at the STAZ level. The results of the pedestrian crash model intuitively suggest higher demand and supply of public transit in zones with more transit commuters, which are determinants of pedestrian activities (35). For walk and bicycle commuters, the results reveal that STAZs with more walk and bike commuters have an increased likelihood of both pedestrian and bicycle crashes. These variables can be considered as proxy measures for pedestrian and bicycle exposure in the zones. Both nonmotorized commute variables have a larger impact for bicycle crash count events relative to pedestrian crash count events. As also found in previous studies (33), more employment within a STAZ leads to a higher probability of bicycle crashes. However, an increase in the proportion of industrial employment has a negative association with pedestrian and bicycle crashes at the STAZ level. Also, an increase in school enrollment density in a STAZ increases the likelihood of crash risk in count model components for both nonmotorized road user groups.

Road Network Characteristics

Proportion of urban area, a proxy for nonmotorized activity, reflects that an increase in the proportion of urban area in a zone increases the likelihood of both pedestrian and bicycle crash risks. The results associated with the functional class of roadways show that pedestrian and bicycle crash risk are positively correlated with a higher proportion of arterial and local roads. Consistent with several previous studies (36, 37), this study’s results show that a higher density of signalized intersections is positively associated with more nonmotorized crashes. With respect to sidewalk length, the model estimation results indicate a higher likelihood of pedestrian and bicycle crashes with increasing length of sidewalk in a zone.

Land Use Attributes

The results show that an increase in hotel, motel, and time-share room density in a STAZ increases the likelihood of both pedestrian and bicycle crash risks, indicating a higher level of nonmotorized road user activity near these facilities in a zone (38). Moreover, tourists

and visitors may be unfamiliar with local driver behavior and road TABLE 3   Elasticity Effects
regulations (39), which could exacerbate crash risk for these non-
motorized road user groups. The possibilities of pedestrian and Variable Pedestrian Bicycle
bicycle crash risks increase with increasing distance to the nearest Exposure Measure
urban area from the STAZ. STAZs close to urban areas are associated
VMT 25.076 26.318
with shorter, more walkable or bikeable travel distances, which in turn
increase the exposure of nonmotorized road user groups, resulting in Proportion of heavy vehicles −0.938 −2.887
increased likelihood of crash risks. Total population 22.014 21.407
Proportion of families with no vehicle 2.973 0.442
Socioeconomic Characteristic
Dependence Effects Bicycle commuters 1.147 5.097
Public transit commuters 9.831 5.018
The estimated Clayton copula–based bivariate NB model provided Walk commuters 3.760 4.257
the best fit for incorporating the correlation between pedestrian and Total employment 25.730 19.239
bicycle crash count events. The copula parameters highlight the pres-
Proportion of industrial employment −0.582 −0.421
ence of common unobserved factors affecting pedestrian and bicycle
School enrollment density 1.034 0.916
crash frequencies. The exogenous variables that contribute to the
dependency include school enrollment density and public transit Road Network Characteristic
commuters, supporting this study’s hypothesis that the dependency Proportion of urban area 0.208 0.505
structures are not constant across all STAZs. For the Clayton copula, Proportion of local roads 7.198 7.016
the dependency is entirely positive, and the coefficient sign and Proportion of arterial roads 0.944 1.214
magnitude reflect whether a variable increases or reduces the depen- Traffic signal density 1.809 0.922
dency and by how much. The proposed framework, by allowing for Sidewalk length 4.840 5.538
such parameterizations, can improve data fit. Land Use Attribute
Density of hotel, motel, or time-share room 1.207 0.691
Distance to nearest urban area −0.224 −0.210
Policy Analysis

Elasticity Effects and Implications

The parameter effects of exogenous variables in Table 2 do not show
the magnitude of the effects on zonal-level crash counts. For this
purpose, aggregate-level elasticity effects of exogenous variables
were computed for both pedestrian and bicycle crash events. The
percentage change in the expected total zonal crash counts caused
by the change in an exogenous variable was computed separately for
pedestrian and bicycle crashes to identify policy measures based on
the most critical contributory factors. The computed elasticities are
presented in Table 3. [Eluru and Bhat discussed the methodology for
computing elasticities (40).] The results in Table 3 represent the
percentage change in the number of crashes for a 100% change in the
independent variable, other characteristics being equal. For example,
the elasticity estimate for the VMT variable indicates that a 100%
increase in VMT will result in a 25.1% and a 26.3% increase in
pedestrian and bicyclist crashes, respectively.

The following observations were based on the elasticity effects
presented in Table 3. First, the results in Table 3 indicate that there
are differences in the elasticity effects across the expected number
of pedestrian and bicycle crash counts. Second, the most significant
variables with respect to an increase in the expected number of both
pedestrian and bicycle crash counts are VMT, total population, and
total employment. Third, pedestrian crashes have higher elasticities
relative to bicycle crashes for total population; total employment;
public transit commuters; proportion of families with no vehicle;
traffic signal density; and density of hotel, motel, or time-share rooms.
Finally, the elasticity estimates show that the influence of exposure
and socioeconomic characteristics is substantially larger than the
influence of roadway and land use characteristics.

These results have important implications for improving the safety
of nonmotorized road users and promoting active modes of
transportation. For instance, results indicating auto-oriented (VMT) and
public transit–oriented (public transit commuters) neighborhoods
have implications for engineering measures. Traffic calming measures
should be used in these zones to reduce road crashes involving
pedestrians and bicyclists. Engineering infrastructure (such as
overpasses, shaded walkways for pedestrian traffic, bike boxes at
intersections, and bike paths for bicycle traffic) that separates
nonmotorized traffic flow from motorized traffic flow in the road network
system should be installed and regulated in those zones having a bigger
population and more employment. Public awareness efforts and traffic
education for safe walking and cycling are needed for both
nonmotorists and motorists in zones with more transit, bike, and walk
commuters. Moreover, education campaigns in communities with
less access to private vehicles are needed to improve nonmotorists’
safety. Targeted enforcement strategies should be regulated in zones
with more local roads and sidewalks to make such neighborhoods
more walkable and bikeable. This elasticity analysis illustrates how
the proposed model can be applied to determine critical factors
contributing to increases in pedestrian and bicycle crash counts.

Spatial Distribution of Hot Spots

The model findings have important implications for identifying hot
spots at the zonal level for safety planning for nonmotorized road
users. The Highway Safety Manual approach, which computes the excess
predicted average crash frequency (defined as observed frequency
minus predicted crash frequency), was used to identify hot spots.
The measure showed that 10% of the zones are hot zones and the others
are normal.

The identified hot spots are shown in Figure 2. The figure shows
that hot spots for both pedestrian and bicycle crashes are dispersed
126 Transportation Research Record 2601

FIGURE 2   Spatial distribution of hot spots for (a) pedestrian crash risk and (b) bicycle crash risk.

throughout Florida. In addition, the risk of pedestrian or bicycle
crashes is higher in most urban zones. This spatial illustration can
be used to prioritize STAZs for enhancing nonmotorized road users’
safety in zones of high crash risk.

Conclusions

This study formulated and estimated a multivariate count model
by adopting a copula-based bivariate negative binomial model for
pedestrian and bicycle crash frequency analysis. To the authors’
knowledge, this was the first attempt to use such copula-based
bivariate count models in the safety literature. Moreover, the study
contributes to the safety literature by examining the influence of several
exogenous variables (exposure measures, socioeconomic
characteristics, road network characteristics, and land use attributes) on
pedestrian and bicycle crash count events at the STAZ level for the
state of Florida. The empirical analysis estimated models by using
six copula structures: Gaussian, FGM, Clayton, Gumbel, Frank, and
Joe. The comparison between the copula and the independent models,
based on information criterion metrics, confirmed the importance of
accommodating dependence between pedestrian and bicycle crash
count events in macro-level analysis. The most suitable copula
model was obtained for the Clayton copula with parameterization
of the dependence profile. The model estimates were augmented with
a policy analysis that included elasticity analysis and a spatial
representation of hot spots for pedestrians and bicycles separately. The
spatial distribution of hot spots indicated that zones more prone to
pedestrian and bicycle crashes are dispersed throughout Florida,
with evidence of clustering along the urban zones. The policy
analysis illustrated how the proposed model can be used to determine the
critical factors contributing to increases in pedestrian and bicycle
crash counts.

In modeling pedestrian and bicycle crashes, the study did not
have access to nonmotorized exposure. To adjust for this, surrogate
measures such as population density, VMT, and proportion of heavy
vehicles were used. It would be useful to compile pedestrian and
bicycle exposure data to enhance the model frameworks developed
in this work.

References

1. Traffic Safety Facts 2013: A Compilation of Motor Vehicle Crash Data from the Fatality Analysis Reporting System and the General Estimates System. DOT HS 812 139. NHTSA, U.S. Department of Transportation, 2015.
2. Jovanis, P. P., and H. Chang. Modeling the Relationship of Accidents to Miles Traveled. In Transportation Research Record 1068, TRB, National Research Council, Washington, D.C., 1986, pp. 42–51.
3. Miaou, S.-P., and H. Lum. A Statistical Evaluation of the Effects of Highway Geometric Design on Truck Accident Involvements. In Transportation Research Record 1407, TRB, National Research Council, Washington, D.C., 1993, pp. 11–23.
4. Abdel-Aty, M., and E. Radwan. Modeling Traffic Accident Occurrence and Involvement. Accident Analysis and Prevention, Vol. 32, 2000, pp. 633–642.
5. Miaou, S.-P., J. J. Song, and B. K. Mallick. Roadway Traffic Crash Mapping: A Space-Time Modeling Approach. Journal of Transportation and Statistics, Vol. 6, 2003, pp. 33–58.
6. Aguero-Valverde, J., and P. P. Jovanis. Analysis of Road Crash Frequency with Spatial Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 2061, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 55–63.
7. Lord, D., and L. F. Miranda-Moreno. Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter of Poisson-gamma Models for Modeling Motor Vehicle Crashes: A Bayesian Perspective. Safety Science, Vol. 46, No. 5, 2008, pp. 751–770.
Nashad, Yasmin, Eluru, Lee, and Abdel-Aty 127

8. Maher, M., and L. Mountain. The Sensitivity of Estimates of Regression to the Mean. Accident Analysis and Prevention, Vol. 41, No. 4, 2009, pp. 861–868.
9. Cheng, L., S. R. Geedipally, and D. Lord. The Poisson–Weibull Generalized Linear Model for Analyzing Motor Vehicle Crash Data. Safety Science, Vol. 54, 2013, pp. 38–42.
10. Peng, Y., D. Lord, and Y. Zou. Applying the Generalized Waring Model for Investigating Sources of Variance in Motor Vehicle Crash Analysis. Accident Analysis and Prevention, Vol. 73, 2014, pp. 20–26.
11. Bhat, C. R., and N. Eluru. A Copula-Based Approach to Accommodate Residential Self-Selection Effects in Travel Behavior Modeling. Transportation Research Part B, Vol. 43, No. 7, 2009, pp. 749–765.
12. Yasmin, S., N. Eluru, A. R. Pinjari, and R. Tay. Examining Driver Injury Severity in Two Vehicle Crashes: A Copula Based Approach. Accident Analysis and Prevention, Vol. 66, 2014, pp. 120–135.
13. Wang, K., S. Yasmin, K. C. Konduri, N. Eluru, and J. N. Ivan. A Copula-Based Joint Model of Injury Severity and Vehicle Damage in Two-Vehicle Crashes. Presented at 94th Annual Meeting of the Transportation Research Board, Washington, D.C., 2015.
14. Sener, I. N., N. Eluru, and C. R. Bhat. On Jointly Analyzing the Physical Activity Participation Levels of Individuals in a Family Unit. Journal of Choice Modeling, Vol. 3, No. 3, 2010, pp. 1–38.
15. Ferdous, N., N. Eluru, C. R. Bhat, and I. Meloni. A Multivariate Ordered Response Model System for Adults’ Weekday Activity Episode Generation by Activity Purpose and Social Context. Transportation Research Part B, Vol. 44, No. 8–9, 2010, pp. 922–943.
16. Bhat, C. R. The Composite Marginal Likelihood (CML) Inference Approach with Applications to Discrete and Mixed Dependent Variable Models. Foundations and Trends in Econometrics, Vol. 7, No. 1, 2014, pp. 1–117.
17. Narayanamoorthy, S., R. Paleti, and C. R. Bhat. On Accommodating Spatial Dependence in Bicycle and Pedestrian Injury Counts by Severity Level. Transportation Research Part B, Vol. 55, 2014, pp. 245–264.
18. Bhat, C. R. The Maximum Approximate Composite Marginal Likelihood (MACML) Estimation of Multinomial Probit-Based Unordered Response Choice Models. Transportation Research Part B, Vol. 45, No. 7, 2011, pp. 923–939.
19. Ma, J., and K. M. Kockelman. Bayesian Multivariate Poisson Regression for Models of Injury Count by Severity. In Transportation Research Record: Journal of the Transportation Research Board, No. 1950, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 24–34.
20. Park, E. S., and D. Lord. Multivariate Poisson-Lognormal Models for Jointly Modeling Crash Frequency by Severity. In Transportation Research Record: Journal of the Transportation Research Board, No. 2019, Transportation Research Board of the National Academies, Washington, D.C., 2007, pp. 1–6.
21. El-Basyouny, K., and T. Sayed. Collision Prediction Models Using Multivariate Poisson-Lognormal Regression. Accident Analysis and Prevention, Vol. 41, No. 4, 2009, pp. 820–828.
22. Lee, J., M. Abdel-Aty, and X. Jiang. Multivariate Crash Modeling for Motor Vehicle and Non-Motorized Modes at the Macroscopic Level. Accident Analysis and Prevention, Vol. 78, 2015, pp. 146–154.
23. Kim, D. G., and S. Washington. The Significance of Endogeneity Problems in Crash Models: An Examination of Left-Turn Lanes in Intersection Crash Models. Accident Analysis and Prevention, Vol. 38, No. 6, 2006, pp. 1094–1100.
24. Ye, X., R. M. Pendyala, S. P. Washington, K. Konduri, and J. Oh. A Simultaneous Equations Model of Crash Frequency by Collision Type for Rural Intersections. Safety Science, Vol. 47, No. 3, 2009, pp. 443–452.
25. Cameron, A. C., T. Li, P. K. Trivedi, and D. M. Zimmer. Modelling the Differences in Counted Outcomes Using Bivariate Copula Models with Application to Mismeasured Counts. Econometrics Journal, Vol. 7, No. 2, 2004, pp. 566–584.
26. Eluru, N., R. Paleti, R. M. Pendyala, and C. R. Bhat. Modeling Multiple Vehicle Occupant Injury Severity: A Copula-Based Multivariate Approach. In Transportation Research Record: Journal of the Transportation Research Board, No. 2165, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 1–11.
27. Rana, T. A., S. Sikder, and A. R. Pinjari. Copula-Based Method for Addressing Endogeneity in Models of Severity of Traffic Crash Injuries: Application to Two-Vehicle Crashes. In Transportation Research Record: Journal of the Transportation Research Board, No. 2147, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 75–87.
28. Trivedi, P. K., and D. M. Zimmer. Copula Modeling: An Introduction for Practitioners. Foundations and Trends in Econometrics. Now Publishers, Inc., Boston, Mass., 2007.
29. Sklar, A. Random Variables, Joint Distribution Functions, and Copulas. Kybernetika, Vol. 9, No. 6, 1973, pp. 449–460.
30. GAUSS. Aptech Systems, Inc., Chandler, Ariz., 2012.
31. Quinn, C. The Health-Economic Applications of Copulas: Methods in Applied Econometric Research. Health Econometrics and Data Group, Department of Economics, University of York, United Kingdom, 2007.
32. Elvik, R. The Non-Linearity of Risk and the Promotion of Environmentally Sustainable Transport. Accident Analysis and Prevention, Vol. 41, No. 4, 2009, pp. 849–855.
33. Cottrill, C. D., and P. V. Thakuriah. Evaluating Pedestrian Crashes in Areas with High Low-Income or Minority Populations. Accident Analysis and Prevention, Vol. 42, No. 6, 2010, pp. 1718–1728.
34. Siddiqui, C., M. Abdel-Aty, and K. Choi. Macroscopic Spatial Analysis of Pedestrian and Bicycle Crashes. Accident Analysis and Prevention, Vol. 45, 2012, pp. 382–391.
35. Wier, M., J. Weintraub, E. H. Humphreys, E. Seto, and R. Bhatia. An Area-Level Model of Vehicle-Pedestrian Injury Collisions with Implications for Land Use and Transportation Planning. Accident Analysis and Prevention, Vol. 41, No. 1, 2009, pp. 137–145.
36. Wei, F., and G. Lovegrove. An Empirical Tool to Evaluate the Safety of Cyclists: Community Based, Macro-Level Collision Prediction Models Using Negative Binomial Regression. Accident Analysis and Prevention, Vol. 61, 2013, pp. 129–137.
37. Pulugurtha, S. S., and V. Thakur. Evaluating the Effectiveness of On-Street Bicycle Lane and Assessing Risk to Bicyclists in Charlotte, North Carolina. Accident Analysis and Prevention, Vol. 76, 2015, pp. 34–41.
38. Qin, X., and J. N. Ivan. Estimating Pedestrian Exposure Prediction Model in Rural Areas. In Transportation Research Record: Journal of the Transportation Research Board, No. 1773, Transportation Research Board of the National Academies, Washington, D.C., 2001, pp. 89–96.
39. Lee, J., M. Abdel-Aty, and X. Jiang. Development of Zone System for Macro-Level Traffic Safety Analysis. Journal of Transport Geography, Vol. 38, 2014, pp. 13–21.
40. Eluru, N., and C. R. Bhat. A Joint Econometric Analysis of Seat Belt Use and Crash-Related Injury Severity. Accident Analysis and Prevention, Vol. 39, No. 5, 2014, pp. 1037–1049.

The Standing Committee on Safety Data, Analysis, and Evaluation peer-reviewed this paper.
