
Journal of Traffic and Transportation Engineering (English Edition) xxxx; xxx(xxx): xxx

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.keaipublishing.com/jtte

Original Research Paper

Partial proportional odds model for analyzing pedestrian crashes, threshold heterogeneity by scale and proportional odds factor

Mahdi Rezapour*, Khaled Ksaibati


Wyoming Technology Transfer Center, University of Wyoming, Laramie, WY 82071, USA

Highlights

• More flexibility is given to the model by relaxing the assumptions of proportional odds and of constant scale, or spread rate.
• The results highlight a significant improvement in model fit compared with the standard model.
• Some of the factors contributing to the severity of pedestrian crashes are discussed.
• Almost all identified results are linked to drivers' lack of attention and caution.

Article info

Article history:
Received 2 February 2021
Received in revised form 20 October 2021
Accepted 26 October 2021
Available online xxx

Keywords:
Partial proportional odds model
Pedestrian crashes
Scale heterogeneity
Proportional odds factor
Vulnerable road users
Drivers' lack of attention

Abstract

Despite low traffic in Wyoming, pedestrian crashes account for a high number of fatalities in the state. Thus, this study was conducted to highlight factors contributing to the severity of those crashes. The results highlighted that drivers under the influence, type of vehicle, location of crashes, estimated speed of vehicles, and driving over the recommended speed are some of the factors contributing to the severity of crashes. In this study, we used the proportional odds model, which assumes that the impact of each attribute is consistent, or proportional, across the various threshold values. However, it has been argued that this assumption might be unrealistic, especially in the presence of extreme values. Thus, the assumption was relaxed in this study by shifting the thresholds based on some explanatory attributes, or proportional odds effects. In addition, we accounted for the spread rate, or scale, of the model's latent distribution of pedestrian crashes. The results highlighted that the partial proportional odds model, through proportional odds factor and scale effects, results in a significant improvement in model fit compared with the standard proportional odds model. Comparisons were also made across the standard model, the simple partial ordinal model, and the partial ordinal model accounting for scale heterogeneity. In addition, various potential threshold structures, such as symmetric and flexible, were considered, but similar goodness of fit was observed across all those models. The formulation of the implemented methodology and its implications are discussed extensively.

© 2022 Periodical Offices of Chang'an University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author. Tel: +1 307 766 6230; fax: +1 307 766 6784.
E-mail addresses: mrezapou@uwyo.edu (M. Rezapour), khaled@uwyo.edu (K. Ksaibati).
Peer review under responsibility of Periodical Offices of Chang'an University.
https://doi.org/10.1016/j.jtte.2021.10.006
2095-7564/© 2022 Periodical Offices of Chang'an University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co.
Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Please cite this article as: Rezapour, M., Ksaibati, K., Partial proportional odds model for analyzing pedestrian crashes, threshold heterogeneity by scale and proportional odds factor, Journal of Traffic and Transportation Engineering (English Edition), https://doi.org/10.1016/j.jtte.2021.10.006

1. Introduction

Vehicle crashes are one of the leading causes of death around the world, annually killing more than a million road users and injuring more than 20 million (WHO, 2007). In addition, crashes are ranked 7th in terms of life lost (Mark and Kerrigan, 2013; Moomen et al., 2020), which is equivalent to more than $230 billion in crash costs every year. A significant proportion of these crashes are pedestrian crashes, where a pedestrian is defined as any person who is not in or on a motor vehicle (NHTSA, 2009).

Despite the efforts and success in reducing crash fatalities in general, the share of pedestrian fatalities increased from 14% of all fatalities in 2009 to 20% in 2018, which is equivalent to 6283 deaths (NHTSA, 2018). To highlight the importance of studying pedestrian crashes, it should be noted that, when travel distance is considered, the fatality rate per kilometer traveled of pedestrians is 1.4 times and 23 times that of motorcyclists and vehicle occupants, respectively (Pucher and Dijkstra, 2003). The disproportionate number of deaths has also been acknowledged in the literature, which reports that 66% of all people killed on the roads were pedestrians (Afukaar et al., 2008).

As pedestrian crashes are very likely to result in injury, and one of the main objectives of policy makers in the states is to reduce the number of injury and fatal crashes, extensive efforts have been made to tackle those crashes. The first step in addressing pedestrian crashes is to identify the factors contributing to them. It is also important to identify those factors in a more reliable way. Thus, efforts have been made to give more flexibility to the estimation of the models' parameters, or to the models' predefined assumptions. For instance, flexibility can be given to the parameter estimates by varying them across observations according to some continuous or discrete distribution: the mixed model has been employed to vary the parameters across observations based on a predefined continuous distribution, while the latent class model could be employed to assign observations using a discrete distribution.

On the other hand, research has been done to relax predefined model assumptions that might be unrealistic. For instance, it has been argued that the proportional odds assumption might be unrealistic, and that more flexibility might be needed in the estimation of the thresholds by varying the parameters based on some attributes, including proportional odds factors or scale heterogeneity. Despite the likelihood of violating the proportional odds assumption, not much emphasis has been given to employing a non- or partial-proportional odds assumption.

Only a limited number of studies in the traffic safety literature have given flexibility to the proportional odds assumption, and these focused only on the proportional odds factors through a single attribute, while ignoring the importance of accounting for scale heterogeneity (Sasidharan and Menéndez, 2014). The justification for the scaled threshold might be that the extreme response categories tend to be used more frequently relative to the central categories (Christensen et al., 2011). Thus, the cumulative link model, or the proportional odds model, could be extended to a more flexible model by allowing the thresholds to be shifted or scaled based on some attributes.

The next few paragraphs outline studies that modeled pedestrian crash severity through the partial proportional odds model and other techniques. The partial proportional odds model was used for analyzing pedestrian crash injury severities (Sasidharan and Menéndez, 2014). The results highlighted a better performance of the partial proportional odds model compared with the standard ordinal or multinomial models.

A study was conducted with the help of the multinomial logit model to identify factors contributing to the severity of pedestrian-vehicle crashes (Tay et al., 2011). The results highlighted that alcohol involvement, weather condition, vehicle weight, pedestrians over the age of 65, and gender of the drivers were some of the factors that significantly impact the severity of pedestrian crashes.

Pedestrian crash severity was evaluated with the help of the mixed logit model (Haleem et al., 2015). Different variables, such as geometric and environmental characteristics, posted speed limit and pedestrian age, were found to be important in predicting crash severity. Pedestrian crash severity at signalized intersections was evaluated by a Bayesian technique (Munira et al., 2020). The results highlighted that a higher speed limit increases pedestrian crash severity, while the presence of a bus stop decreases the severity. The latent class model was used for modeling pedestrian crashes (Sun et al., 2019). The results highlighted that the pedestrian's alcohol or drug involvement, crossing, entering the roadway, dark lighting condition, and high speed are some of the factors increasing the severity of crashes. Other techniques have also been employed to study pedestrian crash severity. For instance, pedestrian crash injury was evaluated by a geographic information system (GIS) technique (Hu et al., 2020). The results found that darkness, lighting conditions and road isolation facilities are some of the significant factors contributing to pedestrian crashes.

Studies also employed other techniques for modeling pedestrian crashes. The severity of pedestrian injury was linked to the impact of pedestrian crashes (Davis, 2001). The results showed a similar pattern for children aged 0-14 and adults aged 15-59, while for pedestrians aged over 60 the experienced injuries tended to be more severe. A hierarchical ordered model was used for modeling the injury severity of pedestrian crashes (Kim et al., 2017). The factors increasing pedestrian crash severity included intoxicated drivers, pedestrians crossing the road, elderly pedestrians, type of vehicle and weather conditions. Also, semi- and non-parametric conditional probability density was used for modeling pedestrian crashes (Rezapour and Ksaibati, 2021). The results highlighted that alcohol and drug involvement and driving on a non-level grade are some of the factors impacting the severity of pedestrian crashes.

Despite the efforts to give more flexibility to the analyses, the application of the partial ordinal model is very limited: not many studies have employed it to consider the effects of both proportional odds factors and scale heterogeneity.


Thus, this study was conducted by giving more flexibility to the predefined assumptions of the ordinal model to see whether the altered model would result in a better fit and, consequently, provide more reliable parameter estimates. The analysis is especially important for studying pedestrian crashes in Wyoming, given its mountainous topography, very low traffic and high traffic fatality rate. Due to the unique characteristics of Wyoming, pedestrian crashes in the state are expected to differ from those in other regions of the U.S. This manuscript is structured as follows: the method section extensively discusses the theory behind the employed method and the estimation of the model's parameters. The data section discusses the important predictors, while the results and discussion sections outline and discuss the results.

2. Method

This section is mainly based on the work in the literature (Christensen and Brockhoff, 2012), and is presented in two subsections. First a general background is presented, followed by the estimation of the model's parameters.

2.1. General background

The cumulative model has been used for modeling ordinal responses and carries the assumption of proportional odds. The cumulative logit model could be written as below.

$$\log\left(\frac{P(Y \le j)}{P(Y > j)}\right) = \log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \log\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right) \tag{1}$$

The cumulative odds are called cumulative because they use cumulative probabilities up to a threshold. The measure shown in the equation above highlights how likely the response is to be in category $j$ or below compared with a higher category.

The term in the numerator of Eq. (1) is as below.

$$P(Y_i \le j) = F(\eta_{ij}) = \gamma_{ij} = \pi_1 + \cdots + \pi_j \tag{2}$$

for response categories $j = 1, \ldots, J$, with $i$ indexing the observations (rows), where $F$ is the cumulative distribution function (CDF) of the assigned distribution family, $\pi_j$ is the probability of an observation belonging to response category $j$, and $\eta_{ij}$ accommodates the predictors and thresholds. The probability $\pi$ of a category would be estimated as the difference of adjacent cumulative probabilities, e.g., $\gamma_2 - \gamma_1$, or $F(\eta_2) - F(\eta_1)$.

Now, the parameter $\eta_{ij}$ in Eq. (2) could be written as below.

$$\eta_{ij} = \theta_j - x_i^{T}\beta \tag{3}$$

where the ordinal outcomes are distinguished by the $J-1$ thresholds, or intercepts, $\theta_j$; $x_i^{T}$ is the transpose of the vector of explanatory variables; and $\beta$ is the vector of coefficient estimates. In our case there are $J-1 = 2$ thresholds, as the response has 3 categories, which correspond to the levels of pedestrian crash severity.

The cumulative link model term $\eta_k$ in Eq. (3) could also be represented by a set of matrices as below.

$$\eta_k = A_k\zeta - X\beta + o_k \tag{4}$$

where $\zeta$ accommodates the $J-1$ elements of the thresholds $\theta$, and $\beta$ is the $p$-vector of regression coefficients, with $X$ having dimension $n \times p$. On the other hand, $A_k$ is a representation of $y$, of dimension $n \times J$, where the $k$th element of $y$ is filled with a value of 1, and 0 otherwise. $o_k$ are offset constants that adjust the threshold values at the higher and lower end points.

Eq. (4) could also be written in a more compact form that accommodates the values in fewer matrices.

$$\eta_k = B_k\varphi + o_k \tag{5}$$

where $\varphi$ accommodates the coefficients and thresholds to be estimated, starting from their initial values, as $[\zeta^{T}, \beta^{T}]^{T}$, and $B_k$ accommodates the observed data as $[A_k, -X]$, with a dimension of $n \times [(J-1) + p]$.

The log-likelihood based on Eq. (5) could then be written as below.

$$LL = \sum_{i=1}^{n} w^{T}\log(\pi_i) \tag{6}$$

where the sum is over all observations, and $w$ would be set to 1 when grouping is ignored. Based on the above equation, instead of optimizing the likelihood of each observation, the sum over all observations is optimized. Also, instead of maximizing the expression in Eq. (6), its negative would be minimized.

The parameters would be estimated with the help of the gradient and Hessian through Newton iterations as follows.

$$\varphi^{(i+1)} = \varphi^{(i)} - \alpha h_i \tag{7}$$

where $h_i$ is the step of the $i$th iteration, which is a function of the Hessian and gradient, and $\alpha$ is set to make sure that the algorithm moves in the right direction for optimizing the log-likelihood.

The standard restriction on the thresholds is that their values should be in increasing order. However, it is sometimes necessary to give more flexibility to the model. That could be done, for instance, by setting symmetric or equidistant thresholds. Those could be achieved by incorporating various Jacobian matrices, which depend on a particular assumption about the underlying structure of the thresholds.

While the standard ordinal model has the assumption of proportional odds, the assumption could be relaxed by incorporating the scale factor or the proportional odds (nominal) factor. The modified model has been called the partial proportional odds model, which changes the model from $\operatorname{logit}(P(Y_i \le j)) = \theta_j - \hat{\beta}(X_i)$ to a model incorporating the nominal effect as below.

$$\tilde{\theta}_j = \theta_j - w_i^{T}\hat{\beta}_J \tag{8}$$

where $w_i^{T}\hat{\beta}_J$ is an extension of $x_i^{T}\beta$ accommodating the nominal effect, or equivalently an extension of the standard $\theta_j$ that makes it dependent on $w_i^{T}\hat{\beta}_J$; $w_i^{T}$ accommodates the nominal variables. The necessity of incorporating a nominal variable, that is, the violation of the proportional odds assumption, might be judged based on its significance in the threshold part of the model. It should be noted that considering the nominal effect would result in a shift of the threshold values to the left or right along the horizontal axis.

In this study we also considered scale effects by allowing the scale of the latent variable distribution to vary based on certain attributes, so that Eq. (3) is modified as given in Eq. (9).
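As a minimal numerical sketch of Eqs. (2), (3) and (6), the snippet below computes category probabilities and the log-likelihood for a three-category response under a logit link. The function names, threshold values, coefficients and toy data are illustrative assumptions and are not taken from the study; the weights `w` are assumed to be 1 unless supplied.

```python
import numpy as np

def logistic_cdf(eta):
    """CDF F of the standard logistic distribution (logit link)."""
    return 1.0 / (1.0 + np.exp(-eta))

def category_probabilities(theta, X, beta):
    """Return an (n, J) array of category probabilities pi_ij."""
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Eq. (3): eta_ij = theta_j - x_i^T beta, one column per threshold
    eta = theta[None, :] - (X @ beta)[:, None]
    # Eq. (2): gamma_ij = F(eta_ij) are cumulative probabilities
    gamma = logistic_cdf(eta)
    n = X.shape[0]
    # Pad with gamma_i0 = 0 and gamma_iJ = 1, then difference adjacent columns
    gamma_full = np.hstack([np.zeros((n, 1)), gamma, np.ones((n, 1))])
    return np.diff(gamma_full, axis=1)

def log_likelihood(theta, X, beta, y, w=None):
    """Eq. (6): weighted sum of log-probabilities of the observed categories."""
    pi = category_probabilities(theta, X, beta)
    y = np.asarray(y, dtype=int)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * np.log(pi[np.arange(len(y)), y])))

# Hypothetical inputs: J - 1 = 2 thresholds, 2 predictors, 3 observations
theta = [-0.5, 1.0]
beta = [0.4, -0.2]
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 2]  # severity levels coded 0, 1, 2

pi = category_probabilities(theta, X, beta)
ll = log_likelihood(theta, X, beta, y)
```

Under the proportional odds assumption the two cumulative logits of every observation differ by the same constant, the gap between the two thresholds, which is what the nominal and scale effects are later allowed to relax.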


$$\eta_{ij} = \frac{\theta_j - x_i^{T}\beta}{\exp\left(z_i^{T}\xi\right)} \tag{9}$$

where $z_i^{T}$ is a vector of the explanatory variables related to the scale parameter $\xi$. Now, considering the effects of both scale and proportional odds factor, we would have Eq. (10).

$$\eta_{ij} = \frac{g(\theta_i) + w_i^{T}\hat{\beta}_J - x_i^{T}\beta}{\exp\left(z_i^{T}\xi\right)} \tag{10}$$

where $g$ works as a restriction function that restricts the thresholds, e.g., to be symmetric or equidistant, based on a Jacobian matrix. $w_i^{T}\hat{\beta}_J$ accounts for the nominal effect, creating a partial or non-proportional odds model (Peterson and Harrell, 1990). For the equidistant threshold, all threshold locations are determined by identifying the first threshold and the spacing between adjacent thresholds. The location-scale family of the logit model was used for the link function in this study.

In this study we considered various structured thresholds, with the objective of further relaxing the assumption that all thresholds have similar slopes. The semi-proportional odds model is built by implementing a Jacobian matrix for the structure of the thresholds. We considered the equidistant threshold, which employs the transposed equidistant Jacobian $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, and also the symmetric threshold, with $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. For instance, the dimension of the equidistant Jacobian, $2 \times 2$, is related to $J-1$, where $J$ is the number of response categories. The second column of the matrix is related to the spacing, and the first value of that column corresponds to a spacing of zero.

2.2. Model parameters' estimation

The estimation of the model parameters is based on the likelihood function $\sum_{i=1}^{n} w^{T}\log(\pi_i)$, Eq. (6), which in turn is based on the estimation of $\pi_i$. On the other hand, $\pi_i$ would be estimated by the difference of the two gammas, $\gamma_2 - \gamma_1$, where gamma itself is the CDF of $\eta_{ij}$, $F(\eta_{ij})$ in Eq. (2).

This means that we need $\eta$ for the estimation of $\gamma$, and consequently of the LL. The estimation of $\eta$ is based on Eq. (5). It should be noted that the process based on Eq. (4) is employed to create the necessary matrices and to come up with the required matrices $B_1$ and $B_2$ in Eq. (5) as $B_k$. $\varphi$ accommodates the initial values of all the model's parameters to be estimated, except for scale. The process below summarizes the preprocessing steps for creating $B_k$ in Eq. (5) with the help of Eqs. (4) and (9).

$B$ incorporates the vectors of $A_k$, related to $y$ for the $J-1$ thresholds. $A_1$ is defined as matrix $A$ without the $J$th column, and $A_2$ as matrix $A$ without the first column, each corresponding to the $J-1$ thresholds. The $B$ matrices, on the other hand, accommodate the $A$ matrices. $B_1$ and $B_2$ times the Jacobian matrix $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ are rewritten to account for the structure of the thresholds. Next, $B_1$ and $B_2$ times the matrix of nominal variables are created as $TT_1$ and $TT_2$. $B_1$ and $B_2$ are rewritten again to bind $TT_1$ and $TT_2$. $B_1$ and $B_2$ then bind the matrix $X$ as well, Eq. (4). To this point, $B_1$ and $B_2$ incorporate all needed vectors, including the thresholds (by using $y$), the proportional odds factors, and the vector of all explanatory variables ($X$). There are two matrices related to $B_k$ in Eq. (5), namely $B_1$ and $B_2$. Those matrices are multiplied by the values of the parameters to be estimated, $\varphi$ (starting from the initial values), the offset terms are added (Eq. (5)), and the result is divided by the scale term $\exp(z_i^{T}\xi)$; see the denominator of Eq. (10). Although no matrix contains the scale parameter, the scale parameter with its own function is considered in the denominator of $\eta$ as in Eq. (10) or (9).

The CDF of the resulting $\eta_1$ and $\eta_2$ gives $\gamma_1$ and $\gamma_2$. The difference between those values for each observation is $\pi_i$. By taking the log of $\pi_i$, based on Eq. (6), the log-likelihood is created. Consequently, the log-likelihood is summed up, and optimization is employed to maximize the sum or, equivalently, to minimize its negative.

In summary, and based on Eq. (4), $\zeta$ accommodates the $J-1$ elements of the thresholds $\theta$, and $\beta$ is the $p$-vector of regression coefficients, with $X$ having dimension $n \times p$. $A_k$, on the other hand, is a representation of $y$, of dimension $n \times J$, where the $k$th element of $y$ is filled with a value of 1, and 0 otherwise. $A_k$ was created for the threshold parameters, and $o_k$ are offset constants that adjust the threshold values at the higher and lower end points. The resulting vector is then divided by $\exp(z_i^{T}\xi)$, based on the denominator of Eq. (10), to account for the scale effect.

For the optimization process, the step of each parameter is estimated from the Hessian and the gradient. Each parameter is then updated based on its own step, until the changes in the steps for all parameters are less than an assigned tolerance. During the process, $\eta_1$ and $\eta_2$ are updated, as they are functions of the parameters to be estimated (Eq. (4) or (5)), and consequently in each step the sum of the log-likelihood over all observations is re-estimated and the parameters are updated by Eq. (7). The process is iterated in a loop.

Any optimization process requires starting values. So, starting values of zero, for instance, would be given to all of our parameters, except for the threshold values. Convergence is reached when the gradient with respect to the parameters becomes small enough and the Hessian remains positive definite.

3. Data

The data used in this study were obtained from the Wyoming Department of Transportation (WYDOT). Three sources of data, covering vehicles, drivers and pedestrians during the period of 2010-2019, were originally available and were aggregated based on a common crash ID column. The dataset contained missing values, which were removed. The final dataset contains 810 crashes involving pedestrians. The summary statistics of the variables found to be important in the statistical analysis are presented in Table 1.

The response was broken up into three categories: no physical damage as the reference, minor injury coded as 1, and functional injury, disability injury and fatality coded as 2. The means of various attributes highlight the distribution of


those attributes. For instance, the mean of 0.82 for the response category highlights that most crashes resulted in some form of injury or fatality to the pedestrian. A threshold of 60 mph was considered for the posted speed limit because we found this cut point to be important. On the other hand, a cut point of 39 years was considered for pedestrian age, as it divided the data into two almost equal proportions (mean of 0.48). The straight-ahead maneuver was compared with maneuvers such as turning right and left. Location of the first harmful event indicates whether the vehicle hit the pedestrian on the roadway or off the roadway, such as on the shoulder.

From Table 1, it should be noted that other vehicle maneuvers include factors such as “turning left” or “backing”. On the other hand, variables related to grade include categories such as “level”, “uphill” or “downhill”. The included variables could be divided into three main categories: roadway, driver and pedestrian characteristics. Fig. 1 highlights the structure.

Fig. 1 - Structure of the included predictors, pedestrian crashes.

4. Results

The results are presented in two subsections: first, the results of a comparison across the considered models are presented. Then, based on goodness of fit, the next subsection presents and discusses the results obtained by the best performing model.

4.1. Models comparison

Models considering various structured thresholds, applications of scale and nominal effects, and a simple ordinal model were compared. The results highlighted that considering only scale or only nominal effects results in an improvement compared with the standard threshold model. On the other hand, considering both of those effects resulted in the best performing model in terms of the AIC goodness of fit measure (Table 2).

The likelihood ratio test highlighted that there is a significant gain at the cost of the added 5 parameters due to the scale and proportional odds factors, so it would be recommended to retain both factors in the model (standard model: log-likelihood (LL) = -758, degrees of freedom (DOF) = 19; model with nominal and scale: LL = -752, DOF = 22; p-value < 0.05).

Also, it was found that imposing a symmetric or equidistant structure would result in almost identical model fit compared with the flexible method. As the complexity of all the models is identical, here we present the

Table 1 - Descriptive statistics of important variables.

Attribute                                                                 Mean   Variance  Min (Freq)  Max (Freq)
Dependent attribute
Response, no physical damage 0, minor injury 1, functional, disability 2  0.820  0.695     0 (369)     2 (221)
Independent attribute
Downhill area                                                             0.090  0.090     0 (733)     1 (78)
Non-level grade as 1 versus others*                                       0.170  0.141     0 (674)     1 (137)
Straight ahead maneuver as 1 versus others*                               0.460  0.249     0 (435)     1 (376)
Alcohol involvement as 1 versus others*                                   0.180  0.145     0 (669)     1 (142)
Location of first harmful event being on shoulder versus others*          0.150  0.127     0 (690)     1 (121)
Drug was involved in the crash versus others*                             0.028  0.055     0 (764)     1 (47)
Pedestrian age, categorical, split at 39 years versus others*             0.480  0.250     0 (422)     1 (389)
Pedestrian gender, female as 1 versus male*                               0.430  0.245     0 (464)     1 (347)
Lighting condition, night as 1 versus others*                             0.360  0.231     0 (518)     1 (293)
Estimated speed, continuous                                               63     39,384    5           80
Pickup truck, pickup versus others*                                       0.270  0.196     0 (595)     1 (216)
Overspeed, vehicle speed was within the speed limit* versus others        0.130  0.114     0 (688)     1 (61)
Distracted driver, driver was distracted versus normal*                   0.100  0.080     0 (738)     1 (73)
Day of a week, weekend versus weekdays*                                   0.190  0.160     0 (655)     1 (156)
Posted speed, higher than 60 mph versus others*                           0.110  0.100     0 (694)     1 (90)

Note: * Reference category.
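The model comparison in Section 4.1 can be checked with a short sketch that recomputes the AIC and the likelihood ratio test from the log-likelihoods and degrees of freedom reported for the standard model and for the model with nominal and scale effects (Table 2). The function names are hypothetical; the log-likelihoods are taken as negative values (as a log-likelihood of a discrete-outcome model must be), and the test uses the difference in the reported degrees of freedom (22 - 19 = 3), which is an assumption of this sketch.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2LL."""
    return 2 * n_params - 2 * log_lik

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite series
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                 for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

def likelihood_ratio_test(ll_restricted, ll_full, k_restricted, k_full):
    """Return the LR statistic, its degrees of freedom and the p-value."""
    stat = 2.0 * (ll_full - ll_restricted)
    df = k_full - k_restricted
    return stat, df, chi2_sf(stat, df)

# Values as reported for the standard model and the nominal-plus-scale model
ll_po, k_po = -758.0, 19
ll_full, k_full = -752.0, 22

aic_po = aic(ll_po, k_po)        # 1554.0
aic_full = aic(ll_full, k_full)  # 1548.0
stat, df, p_value = likelihood_ratio_test(ll_po, ll_full, k_po, k_full)
```

With these inputs the statistic is 12 on 3 degrees of freedom, giving a p-value of roughly 0.007, in line with the reported p-value below 0.05.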


Table 2 - Performance of considered models.

ID  Model                          Log-likelihood  DF  AIC
1   Proportional odds model        -758            19  1554
2   Model with only scale          -755            20  1549
3   Model with only nominal        -756            21  1553
4   Model with nominal and scale   -752            22  1548

result of the equidistant structured threshold in the next subsection. The degrees of freedom (DF), log-likelihood and Akaike information criterion (AIC) of the considered models are presented in Table 2.

The best performing model, based on Table 2, is the model considering both scale and proportional odds factors. Besides the better fit of the model, the significance of the scale and proportional odds parameters is an indication of violations of the proportional odds, or equal slopes, assumption (Table 3). It should be highlighted that the differences between the models in terms of parameter estimates were minor. For instance, while the point estimate of pickup truck is 0.65 for the final model, the point estimate changes to 0.70 for the standard model.

Table 3 presents the results of the equidistant model, including the threshold and the spacing of adjacent thresholds. For the equidistant model, given the location of the first threshold and the spacing, the thresholds' locations are identified irrespective of the number of response categories.

Table 3 - Finalist model considering the proportional odds and scale effects.

Attribute                                         Point estimate  Standard error  p-value
Downhill                                          0.7200          0.3060          0.030
Non-level grade                                   0.4300          0.2470          0.080
Straight-ahead maneuver                           0.5400          0.1330          <0.050
Alcohol involvement                               0.5900          0.2000          <0.050
Location of 1st harmful event being on shoulder   0.4600          0.1900          0.020
Drug was involved in the crash                    1.0800          0.2990          <0.050
Pedestrian age                                    0.4600          0.2440          0.060
Pedestrian gender                                 0.1800          0.1270          0.100
Lighting condition                                0.5000          0.1600          <0.050
Estimated speed                                   0.0009          0.0004          0.030
Pickup truck                                      0.6500          0.1570          <0.050
Overspeed                                         0.3500          0.2430          0.149
Distracted driver                                 0.2200          0.2720          0.430
Day of a week                                     0.0300          0.1670          0.870
Posted speed, greater than 60                     1.0700          0.2170          <0.050
Pickup truck × overspeed                          0.8600          0.4000          0.031
Distracted drivers × day of a week                1.0400          0.4920          0.033
Log-scale parameter
Lighting condition                                -0.3200         0.1190          0.020
Threshold coefficients
Intercept                                         0.2900          0.2170          0.180
Intercept spacing                                 1.2400          0.1870          <0.050
Driver age                                        0.0200          0.0070          0.020
Driver age spacing                                0.0010          0.0040          0.800

The nominal effect $w_i^{T}\hat{\beta}_J$ (Eq. (10)) was considered to extend the regression part of the model so that it varies across the categories $j$. At the same time, $w_i^{T}\hat{\beta}_J$ impacts the threshold parameters by giving more flexibility to the thresholds. Here we also incorporate the scale parameter to let the scale, or variance, of the latent variable distribution depend on some explanatory variable. The scale parameter is significant with an estimate of -0.32, meaning that during night conditions the scale of the latent distribution is $\exp(-0.32) \approx 0.73$ times that under light conditions, or 27% lower. In other words, the distribution of the latent variable for light conditions is wider than for dark conditions.

The mathematical formulation of the cumulative link model could be written as

$$\operatorname{logit}\left(P(Y_i \le j)\right) = \frac{\overbrace{\theta_j + DA_i \times \hat{\beta}_{J-1}}^{\text{Modified threshold part}} - \overbrace{\left(HT_i\beta_1 + \cdots + DW_i\beta_8\right)}^{\text{Regression part}}}{\underbrace{\exp(\xi\, LC_i)}_{\text{Scale part}}} \tag{11}$$

where $DA_i$ means driver's age, $HT_i$ means heavy truck, $DW_i$ means day of a week, and $LC_i$ means lighting condition. Eq. (11) highlights the various parts of our model. The modified threshold part adjusts the threshold $\theta_j$ by the values of $DA_i\hat{\beta}_{J-1}$, so more flexibility is given to the threshold and it is not constrained to follow the proportional odds assumption. On the other hand, we accounted for the change in the scale of the model based on the binary variable $LC_i$. In summary, compared with the proportional odds model, where there would be only a numerator, the partial model adjusts both parts of Eq. (11).

We also found that the structured threshold models through the Jacobian matrix are not needed, and the flexible model could provide a similar goodness of fit. In simple words, after creating the matrices based on Eq. (11), the parameters were estimated by the Newton-Raphson algorithm with step-halving or line search. The results are highlighted in Table 3.

4.2. Results of the best fit model

Across all parameters in Table 3, the highest contributory impact is related to the driver being under the influence of drugs, with $\hat{\beta}_{\text{Drug}} = +1.08$. This is expected, as drugs or alcohol slow coordination, judgment, and reaction times, which is expected to impact drivers' reaction times in seeing a pedestrian. Although a driver under the influence of alcohol also has a contributory effect, its impact is lower than the impact of drugs, $\hat{\beta}_{\text{Alcohol}} = +0.59$. The results are in line with the literature, which highlighted that almost half

(47%) of crashes that resulted in a pedestrian death were due to the involvement of alcohol for the driver and/or the pedestrian (NHTSA, 2017).

Two seemingly similar predictors, downhill grade and non-level grade, were found to contribute to the severity of pedestrian crashes. Although these two predictors seem similar, non-level grade, compared with downhill, includes all grades that are not flat. Both factors have contributory effects, with a higher impact for crashes that occurred on downhill areas. The results are intuitive, as higher speed is expected on non-level areas, especially downhill, which is in line with the previous work (Eriksson et al., 2019).

Moving to the location of the first harmful event, the results highlighted that when the pedestrian is hit on the shoulder, the severity of crashes increases, $\hat{\beta}_{1st} = 0.46$ for the first harmful event. The result might be due to the fact that vehicles going off the road and hitting a pedestrian on the shoulder have already lost control, and the factors behind that loss of control might explain the higher crash severity. The result is in line with work in the literature highlighting that the locations of the crash and of the pedestrian are important factors for the severity of pedestrian crashes (Sze and Wong, 2007).

Moving to demographic characteristics of pedestrians, including gender and age, a cut point of 39 years old was used to create a dummy variable of passenger age. That divides passenger age into almost equal categories. The results are somewhat in line with a previous study reporting strength and muscle mass loss with the aging process (Keller and Engelhardt, 2014). As a result, older pedestrians are likelier to sustain severe crashes compared with their younger counterparts. The results contrast with work in the literature reporting that crashes involving male pedestrians are less likely to be severe (Tay et al., 2011). The differences are likely due to variation across observations that the non-mixed models could not account for.

On the other hand, counterintuitively, the increased estimated speed of vehicles was found to have a slightly negative effect on the severity of crashes. A few points should be highlighted about this impact. First, the estimated vehicle speed was approximated by the highway patrol based on brake marks or other factors, so there might be some uncertainty regarding this measure. Second, the impact of this predictor is very small.

Moving to the interaction terms, we considered and checked all pairwise interaction terms across our identified parameters. Across all considered interaction terms, the terms between pickup truck and posted speed limit, and between day of the week and distracted drivers, were found to be important. Due to the significance of pairwise interaction terms across predictors, their main effects would not be interpreted. There would be two scenarios for every interaction term, of which a few are discussed here. For instance, it was found that when the vehicle is a pickup truck and the vehicle is over-speeding, the severity of pedestrian crashes increases. Or when a distracted driver is driving on a weekend, the severity of pedestrian crashes is higher.

Finally, for the results included in Table 3, we relax the assumption that the model's parameters have the same impacts across all threshold parameters, that is, thresholds with similar slopes. The equidistant assumption states that the thresholds $\theta_j$ are equally spaced. The thresholds exposed by the transpose of the Jacobian matrix could be written as (0.27, 1.59, 1.86, 2.13). The adjustment of using only the first threshold and the spacing is due to the use of fewer parameters for the model fit.

5. Discussion

One of the highest priorities of policy makers in the state of Wyoming is to reduce severe crashes. Pedestrian crashes account for one of the highest proportions of severe crashes in the state, so this group has received special attention. This study was funded to identify factors contributing to pedestrian crashes so that appropriate countermeasures could be employed. In this study we employed multiple techniques to find the most reliable method for identifying factors contributing to crashes. The standard ordinal model has been employed extensively in the literature for modeling crash severity due to the ordinal nature of the crash severity response. However, the proportional odds assumption might be too restrictive, as it constrains all observations to use the same set of thresholds. Thus, flexibility could be given to that assumption by letting the thresholds vary across observations based on some of their attributes. The model flexibility in this study was given by two means: scale and proportional odds factors. In summary, while the proportional odds factor was added to the threshold factors in the numerator of the model, the scale was considered in the denominator, affecting the whole value in the numerator and hence the whole equation.

Here we allow the threshold to vary based on the attribute of driver age. Besides the proportional odds factor, we gave more flexibility to the latent variable distribution by incorporating the scale variable of lighting condition in the equation's denominator. With the scale consideration, the scale of the latent variable varies across the categories of dark and light conditions. In other words, the distribution has a longer or shorter tail.

The majority of the identified results are intuitive and expected. For instance, we found that older pedestrians are likelier to undergo severe crashes, which might be linked to the physical structure of the elderly. It was also found that when drivers were under the influence of drugs or alcohol, pedestrian crashes were likelier to be more severe, which is likely due to the drivers' lack of reaction time. The same explanation goes for the lighting condition: at night drivers have less reaction time, so they are expected to react in a less optimal way to the hazard ahead.

Besides the main effects, we considered and checked all pairwise interaction terms. We found that the interaction between pickup truck and over-speeding, for instance, is significant, and so the interaction between these two factors should be considered instead of their main effects only. Here, the importance of considering over-speeding along with the vehicle type might be linked to the reaction time. In summary, the majority of the findings highlight the importance of reaction time and attention on the roadway to prevent hitting pedestrians.

The obtained results are specific to the data used in this study. It is especially important to note that over the 10-year period we incorporated only about 800 crashes. So, the lack of observations for some categories, or the insignificance of variables due to the nature of the state, should be taken into consideration. Other important predictors, such as types of intersection or other driver or pedestrian characteristics, should be considered where available.

Also, it is worth discussing that variables such as type of vehicle are expected to mainly account for confounding factors that were not recorded at the time of crashes. For instance, type of vehicle could provide information regarding various characteristics of drivers, such as income or social status, which were not observed at the time of crashes and whose inclusion could be linked to the psychology of drivers.

More studies are needed to evaluate the performance of model fit while accounting for scale and nominal effects. Additional flexibility by considering grouping factors might be needed to account for resemblance across similar groups in the dataset.

6. Recommendations

Based on the identified results, especially those for driving under the influence and distracted drivers, the following recommendations could be given to the policy makers in the state.

• Drivers under the influence, especially under the influence of drugs, showed the highest impact on pedestrian crash severity. That possibly highlights that impaired drivers do not have appropriate reaction time to respond swiftly to a change in the situation. Stricter regulations and law enforcement are recommended for drivers under the influence, especially of drugs.
• Crashes with distracted drivers are more likely to result in increased severity of pedestrian crashes. This, again, highlights the importance of attention to hazards on the roadway. More studies, and thus educational programs, are needed to educate drivers regarding attention and the removal of distractions in vehicles.
• In summary, almost all results are in some way linked to the lack of attention and caution of the drivers. More efforts could be made to improve the attention of drivers. That could be done, for instance, by employing traffic control or warnings at hazardous locations.

Conflict of interest

The authors do not have any conflict of interest with other entities or researchers.

Acknowledgments

This is to acknowledge that this project is funded and supported by the WYDOT.

References

Afukaar, F.K., Agyemang, W., Most, I., 2008. Accident Statistics 2007. Building and Road Research Institution Council for Scientific and Industrial Research, Kumasi.
Christensen, R.H.B., Brockhoff, P.B., 2012. Sensometrics: Thurstonian and Statistical Models. IMM-PHD-2012; No. 271. Technical University of Denmark, Lyngby.
Christensen, R.H.B., Cleaver, G., Brockhoff, P.B., 2011. Statistical and Thurstonian models for the A-not A protocol with and without sureness. Food Quality and Preference 22 (6), 542–549.
Davis, G.A., 2001. Relating severity of pedestrian injury to impact speed in vehicle-pedestrian crashes: simple threshold model. Transportation Research Record 1773, 108–113.
Eriksson, J., Forsman, Å., Niska, A., et al., 2019. An analysis of cyclists' speed at combined pedestrian and cycle paths. Traffic Injury Prevention 20 (S3), 56–61.
Haleem, K., Alluri, P., Gan, A., 2015. Analyzing pedestrian crash injury severity at signalized and non-signalized locations. Accident Analysis & Prevention 81, 14–23.
Hu, L., Wu, X.H., Huang, J., et al., 2020. Investigation of clusters and injuries in pedestrian crashes using GIS in Changsha, China. Safety Science 127, 104710.
Keller, K., Engelhardt, M., 2014. Strength and muscle mass loss with aging process. Age and strength loss. Muscles, Ligaments and Tendons Journal 3 (4), 346–350.
Kim, M., Kho, S.Y., Kim, D.K., 2017. Hierarchical ordered model for injury severity of pedestrian crashes in South Korea. Journal of Safety Research 61, 33–40.
Mark, S., Kerrigan, J.R., 2013. Motor vehicle traffic crashes as a leading cause of death in the United States, 2008 and 2009. Annals of Emergency Medicine 61 (4), 484.
Moomen, M., Rezapour, M., Raja, M.N., et al., 2020. Predicting injury severity and crash frequency: insights into the impacts of geometric variables on downgrade crashes in Wyoming. Journal of Traffic and Transportation Engineering (English Edition) 7 (3), 375–383.
Munira, S., Sener, I.N., Dai, B.Y., 2020. A Bayesian spatial Poisson-lognormal model to examine pedestrian crash severity at signalized intersections. Accident Analysis & Prevention 144, 105679.
National Highway Traffic Safety Administration (NHTSA), 2009. Traffic Safety Facts 2007 Data: Pedestrians. NHTSA, Washington DC.
National Highway Traffic Safety Administration (NHTSA), 2017. Traffic Safety Facts 2017 Data: Pedestrians. NHTSA, Washington DC.

National Highway Traffic Safety Administration (NHTSA), 2018. Fatal Motor Vehicle Crashes Overview. NHTSA, Washington DC.
Peterson, B., Harrell, F.E., 1990. Partial proportional odds models for ordinal response variables. Applied Statistics 39 (2), 205.
Pucher, J., Dijkstra, L., 2003. Promoting safe walking and cycling to improve public health: lessons from The Netherlands and Germany. American Journal of Public Health 93 (9), 1509–1516.
Rezapour, M., Ksaibati, K., 2021. Semi and Nonparametric Conditional Probability Density, a Case Study of Pedestrian Crashes. The Open Transportation Journal 15 (1), 280–288.
Sasidharan, L., Menéndez, M., 2014. Partial proportional odds model-an alternate choice for analyzing pedestrian crash injury severities. Accident Analysis & Prevention 72, 330–340.
Sun, M., Sun, X.D., Shan, D.H., 2019. Pedestrian crash analysis with latent class clustering method. Accident Analysis & Prevention 124, 50–57.
Sze, N.N., Wong, S.C., 2007. Diagnostic analysis of the logistic model for pedestrian injury severity in traffic crashes. Accident Analysis & Prevention 39 (6), 1267–1278.
Tay, R., Choi, J., Kattan, L., et al., 2011. A multinomial logit model of pedestrian-vehicle crash severity. International Journal of Sustainable Transportation 5 (4), 233–249.
World Health Organization (WHO), 2007. Association for Safe International Road Travel, Faces behind the Figures: Voices of Road Traffic Victims and Their Families. WHO, Geneva.

Mahdi Rezapour obtained his PhD from University of Wyoming. He is passionate about statistics, machine learning and psychology. He has written papers about transportation, fraud detection, health psychology, fear of death, and sense of loneliness.

Khaled Ksaibati, PhD, P.E., obtained his BS degree from Wayne State University and his MS and PhD degrees from Purdue University. Dr. Ksaibati worked for the Indiana Department of Transportation for a couple of years prior to coming to the University of Wyoming in 1990. He was promoted to an associate professor in 1997 and full professor in 2002. Dr. Ksaibati has been the director of the Wyoming Technology Transfer Center since 2003.
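As a supplementary numerical illustration of the structure of Eq. (11), the following minimal Python sketch computes severity-category probabilities from a threshold part, a regression part and a scale part. It is not the authors' estimation code. The threshold values, the threshold shifts, the five-category layout and the 0/1 coding of the dummies are hypothetical assumptions made only for illustration; only the scale coefficient (−0.32) and the drug coefficient (+1.08) are taken from the text.

```python
import math


def cumulative_link_probs(thetas, nominal_shifts, driver_age, eta, xi, lighting):
    """Return P(Y = j), j = 1..J, for a cumulative logit model shaped like Eq. (11).

    thetas          baseline thresholds theta_j in increasing order
    nominal_shifts  per-threshold coefficients multiplying the driver-age dummy
    driver_age      0/1 dummy DA_i (assumed coding)
    eta             regression part, i.e. the summed linear predictor
    xi              scale coefficient
    lighting        0/1 dummy LC_i, with 1 = dark (assumed coding)
    """
    scale = math.exp(xi * lighting)  # denominator of Eq. (11)
    cumulative = [
        1.0 / (1.0 + math.exp(-((theta + driver_age * shift - eta) / scale)))
        for theta, shift in zip(thetas, nominal_shifts)
    ]
    cumulative.append(1.0)  # P(Y <= J) is 1 by definition
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


# Hypothetical thresholds and threshold shifts (NOT the fitted values).
THETAS = [-1.0, 0.5, 1.5, 2.5]
SHIFTS = [0.1, 0.0, -0.1, 0.0]
XI = -0.32        # scale coefficient reported in the text
BETA_DRUG = 1.08  # drug coefficient reported in the text

sober = cumulative_link_probs(THETAS, SHIFTS, driver_age=1, eta=0.0, xi=XI, lighting=1)
drugged = cumulative_link_probs(THETAS, SHIFTS, driver_age=1, eta=BETA_DRUG, xi=XI, lighting=1)
scale_ratio = math.exp(XI)  # dark-to-light ratio of the latent scale
```

Under these assumptions, `scale_ratio` rounds to the 0.73 quoted in the text, and a positive regression coefficient such as the one for drugs moves probability mass toward the most severe category, which is the direction of effect described in Section 4.2.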
