1 author:
Darcin Akin
University of Hafr Al Batin
All content following this page was uploaded by Darcin Akin on 27 November 2014.
Abstract. This study evaluates the influence of roadway, weather and accident
conditions, and type of traffic control on accident severity (number of persons
killed) using Negative Binomial and Poisson regression models. Information
on accident severity and roadway and weather conditions was obtained from the
Michigan Department of Transportation Accident Database. Negative Binomial
(NB) and Poisson regression models were used to measure the association
between accident severity and roadway, weather and accident conditions. The NB
regression results indicated that monthly, daily, hourly and weekday variations
do not have a statistically significant effect on accident severity (number of
persons killed). However, the Poisson regression results were largely the reverse
with respect to these variables. Type of traffic control was also found not to be
statistically significant. Number of vehicles involved, crash type (overturn,
rear-end, side-swipe, head-on, hit object, and so on), injury types (A, B, C),
number of uninjured, number of occupants and weather conditions are statistically
significant at the 0.05 level. Light and surface conditions were also statistically
significant at the 0.10 level. The findings of the Poisson regression are very
similar to those of the NB regression, but the parameter estimates differ slightly
from those determined by the NB regression. The results agree with professional
judgment regarding the factors affecting accident severity in highway crashes.
Keywords: crash data, Negative Binomial regression, Poisson regression,
crash properties, road and weather factors
1 Introduction
Accident prediction models are important tools for estimating road safety with regard
to roadway, weather and accident conditions. Different empirical equations have been
developed for accident prediction models, and new regression techniques have recently
found application in this area. The model development, and subsequently the model
results, are strongly affected by the choice of regression technique. This study
evaluates the influence of roadway, weather and accident conditions, and type of
traffic control on accident severity (number of persons killed) using regression
models. Information on accident severity and roadway and weather conditions was
obtained from the Michigan Department of Transportation Accident Database. Negative
Binomial (NB) and Poisson regression models were used to measure the association
between accident severity and roadway, weather and accident conditions.
1.1 Literature Review
Statistical models are used to examine the relationships between accidents and features
of accidents as well as accident sites. Many past studies illuminating the numerous
problems with linear regression models (Joshua and Garber, 1990; Miaou and Lum, 1993)
have led to the adoption of more appropriate regression models, such as Poisson
regression, which is used to model data that are Poisson distributed, and the negative
binomial (NB) model, which is used to model data that have gamma-distributed Poisson
means across crash sites, allowing for additional dispersion (variance) of the crash
data. Although the Poisson and NB regression models possess desirable distributional
properties to describe motor vehicle accidents, these models are not without
limitations. One problem that often arises with crash data is that of 'excess' zeroes,
which often leads to dispersion above that described by even the negative binomial
model. 'Excess' does not mean 'too many' in the absolute sense; it is a relative
comparison that merely suggests that the Poisson and/or negative binomial distributions
predict fewer zeroes than are present in the data. As discussed in Lord et al. (2004),
a preponderance of zero crashes results from low exposure (i.e. train frequency
and/or traffic volumes), high heterogeneity in crashes, observation periods that are
relatively short, and/or under-reporting of crashes, and not necessarily from a
'dual state' process, which underlies the 'zero-inflated' model. Thus, the motivation
to fit zero-inflated probability models accounting for excess zeroes often arises from
the need to find better-fitting models, which is justified from a statistical
standpoint; unfortunately, the zero-inflated model also comes with "excess theoretical
baggage" that lacks theoretical appeal (see Lord et al., 2004). Another problem, not
often observed with crash data, is underdispersion, where the variance of the data is
less than the expected variance under an assumed probability model (e.g. the Poisson).
One manifestation might be "too few zeroes", but this is not a formal description.
Underdispersion has been less convenient to model directly than overdispersion, mainly
because it is less commonly observed. Winkelmann's gamma probability count model offers
an approach for modeling underdispersed (or overdispersed) count data (Winkelmann and
Zimmermann, 1995), and therefore may offer an alternative to the zero-inflated family
of models for modeling overdispersed data, as well as a tool for modeling
underdispersion.
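The relative meaning of 'excess' zeroes can be illustrated by comparing the observed share of zero counts with the share that a Poisson distribution with the same mean would predict, which is exp(-mean). The following is only a minimal sketch: the function name and the toy counts are hypothetical and are not taken from the MDOT data.

```python
import math

def zero_shares(counts):
    """Return (observed zero share, Poisson-predicted zero share) for count data."""
    n = len(counts)
    mean = sum(counts) / n
    observed = counts.count(0) / n      # share of zeroes actually present
    predicted = math.exp(-mean)         # P(Y = 0) under Poisson with the same mean
    return observed, predicted

# Hypothetical toy sample: 90 zeroes, four 1s and six 3s (not real crash data)
toy_counts = [0] * 90 + [1] * 4 + [3] * 6
observed_zero, poisson_zero = zero_shares(toy_counts)

# 'Excess' zeroes in the relative sense: more zeroes than the Poisson predicts
has_excess_zeroes = observed_zero > poisson_zero
```

In this toy sample the observed zero share exceeds the Poisson prediction, which is the situation the text calls 'excess' zeroes; it says nothing by itself about whether a dual-state process generated the data.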
2 Data and Methodology
In this section, the data used in this paper and the methodology are described. Crash
data were obtained from the Michigan Department of Transportation
(www.michigan.gov/MDOT) for the years 2000 and 2004. The data files include all
crashes reported by the police in all counties, towns and townships. The data are
analyzed to determine the influence of roadway, weather and accident conditions, and
type of traffic control on accident severity (number of persons killed) using Negative
Binomial and Poisson regression models. The following section presents descriptive
statistics of the crash data.
2.1 Descriptive Statistics of Crash Data
Some important descriptive statistics of the crash data used in the analyses are given
in the following tables. Table 1 shows accident occurrences in the MDOT regions. Metro
is the most urbanized region and therefore has the highest share of crashes in the
State of Michigan (39.8% in 2000 and 37.9% in 2004). The two upper regions (Superior
and North) are the least urbanized and have the lowest shares of accident occurrences
(4.3% and 7.3% in 2000, and 4% and 7% in 2004). The other four regions (Grand, Bay,
Southwest and University) have comparable shares, and the share of the Metro Region is
more than double that of any of these four regions. The Metro Region includes the city
of Detroit (the motor capital of the world) and the most urbanized counties, such as
Macomb, Wayne and Oakland (see Figure 1).
Table 3 presents the accident occurrences by more detailed area type. The most
interesting number relates to "straight road" sections, which account for over 40%
of all crashes. It is also interesting to note that railroad crossing crashes are only
0.2 or 0.3% of all crashes.
Table 5 presents the relationship of accident occurrences to road types. The majority
of crashes occurred on county roads and city streets (about 60%). M-routes also account
for a considerable share of crashes (19%). Interstate and US routes account for smaller
shares (about 9% and 7%, respectively).
Table 5. Accident occurrences by route class

                                         2004                     2000
Route Class                       Frequency  Valid %      Frequency  Valid %
Not Located                            4777      1.3          39402      9.2
Interstate Route                      34820      9.3          31638      7.4
US Route                              27737      7.4          27725      6.5
M Route                               72152     19.2          78107     18.3
I-State Business Loop or Spur          5731      1.5           6842      1.6
US Business Route                      4098      1.1           4746      1.1
M Business Route                        105      0.0            120      0.0
Connector                               721      0.2            742      0.2
County Road or City Street           224843     60.0         237520     55.6
Total                                374984    100.0         426842    100.0
Table 6 shows the relationship of accident occurrences to speed limits. It should be
noted that this table does not reflect the risk associated with speed limits, since
roads with speed limits higher than 55 mph show fewer accident occurrences. City streets
with speed limits of 25 mph or less account for about 20% of all accidents. M, US and
Interstate routes with speed limits higher than 50 mph have the highest share (over 30%).
Poisson and Negative Binomial (NB) regression models are used to model the influence
of roadway, weather and accident conditions, and type of traffic control on accident
severity (number of persons killed).
The negative binomial (NB) regression model is a member of the exponential family
of discrete probability distributions. The nature of the distribution is itself well
understood, but its contribution to regression modeling, in particular as a generalized
linear model (GLM), was not appreciated until recently. The mathematical properties of
the negative binomial have been derived and GLM algorithms developed for both the
canonical and log forms. The log form may be effectively used to model types of
Poisson-overdispersed count data (Hilbe, 1993). It is not recommended that negative
binomial models be applied to small samples, although what constitutes a small sample
does not seem to be clearly defined in the literature (UCLA, 2011). Poisson regression,
also a member of the class of generalized linear models (GLM), is the standard method
used to analyze count data. However, many real data situations violate the assumptions
upon which the Poisson model is based. For instance, the Poisson model assumes that the
mean and variance of the response are identical. This implies that events occur within
a period of observation at a constant rate; an event is equally likely at any point
within the period. When there is heterogeneity in the data, it is likely that the
Poisson model is overdispersed. Such overdispersion is indicated if the variance of
the response is greater than its mean. One may also check for model overdispersion by
fitting a Poisson model to the data and observing the chi-square-based or
deviance-based dispersion statistic. The model is Poisson-overdispersed if the
dispersion value is greater than unity. Log negative binomial regression can rather
effectively be used to model count data in which the response variance is greater than
the mean (Hilbe, 1993).
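For reference, the assumptions described above can be written out as follows. This is a sketch only: it uses the common log-link form with the quadratic ("NB2") variance function, and the symbols $\mu_i$, $\mathbf{x}_i$, $\boldsymbol{\beta}$, $\alpha$, $n$ and $p$ are notation introduced here. The paper does not state which NB parameterization was fitted.

```latex
\begin{align}
  % Poisson regression: equal mean and variance, log link
  \Pr(Y_i = y) &= \frac{e^{-\mu_i}\,\mu_i^{y}}{y!}, \qquad
  \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad
  \operatorname{E}(Y_i) = \operatorname{Var}(Y_i) = \mu_i \\
  % Negative binomial (gamma-mixed Poisson): extra dispersion through alpha
  \operatorname{E}(Y_i) &= \mu_i, \qquad
  \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2}, \qquad \alpha \ge 0 \\
  % Pearson chi-square dispersion statistic for n cases and p estimated parameters
  \hat{\phi} &= \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^{2}}{\hat{\mu}_i}
\end{align}
```

Under this form the NB model reduces to the Poisson model as $\alpha \to 0$, and a value of $\hat{\phi}$ greater than unity corresponds to the Poisson-overdispersion criterion stated in the text.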
Model Results. Before applying the Poisson regression model, we first need to check
whether there is heterogeneity in the data. Table 7 presents the descriptive statistics,
and it is seen that the variance of the response variable (number of persons killed in
a crash) is slightly higher than the mean. Thus, the results of the Poisson regression
must be interpreted cautiously.
Table 8 presents the case processing summary. As seen, about 90% of the cases were
included in the regression modeling. Regarding goodness of fit, the value of
deviance/df is 0.012, which is lower than 1. This means that the model is not
Poisson-overdispersed.
Table 8. Case processing summary

                    2004                  2000
Cases            n   Percent           n   Percent
Included    349268     93.1%      382367     89.6%
Excluded     25716      6.9%       44475     10.4%
Total       374984    100.0%      426842    100.0%
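The two overdispersion checks described in this section (comparing the variance of the response with its mean, and inspecting a dispersion statistic) can be sketched for the simplest case of an intercept-only Poisson model, where the fitted mean is just the sample mean. This is an illustration under stated assumptions, not the model fitted in the paper: the function name and the toy counts are hypothetical, and the actual models include the covariates listed in the parameter tables.

```python
import math
import statistics

def poisson_dispersion(counts):
    """Pearson and deviance dispersion for an intercept-only Poisson model."""
    n = len(counts)
    mu = sum(counts) / n                 # fitted mean of the intercept-only model
    df = n - 1                           # n cases minus one estimated parameter
    pearson = sum((y - mu) ** 2 / mu for y in counts)
    # Poisson deviance; the term y*log(y/mu) is taken as 0 when y == 0
    deviance = 2.0 * sum(
        (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu) for y in counts
    )
    return mu, pearson / df, deviance / df

# Hypothetical toy sample of fatalities per crash (mostly zeroes, not MDOT data)
toy = [0] * 95 + [1] * 4 + [2]
mu, pearson_disp, deviance_disp = poisson_dispersion(toy)

# For the intercept-only model, Pearson chi-square / df equals variance / mean
variance_to_mean = statistics.variance(toy) / mu
```

For this toy sample the Pearson-based statistic is above 1 while the deviance-based statistic is below 1, so with very sparse counts the two statistics need not agree; it may be worth looking at both before drawing a conclusion.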
Parameter estimates, Negative Binomial regression model
(Sig. = p-value of the Wald test; 0.10? = significant at the 0.10 level)

                            2004                                      2000
Parameter          B Std.Err  Wald Chi2  Sig.  0.10?         B Std.Err  Wald Chi2  Sig.  0.10?
(Intercept)   -4.518   .2127    451.365  .000    yes    -5.552   .2317    574.247  .000    yes
Month           .000   .0098       .001  .976     no      .000   .0100       .000  .996     no
Day             .003   .0038       .460  .498     no      .012   .0040      8.848  .003    yes
Hour           -.008   .0051      2.614  .106     no      .004   .0051       .765  .382     no
Weekday        -.011   .0169       .426  .514     no      .003   .0173       .025  .875     no
NumVeh          .818   .0362    510.613  .000    yes      .734   .0417    309.797  .000    yes
CrshType       -.112   .0065    294.156  .000    yes     -.079   .0063    154.388  .000    yes
NumInj        -3.687   .0591    3890.27  .000    yes    -3.385   .0579    3422.32  .000    yes
NumUnInj      -4.312   .0494    7626.77  .000    yes    -4.304   .0531    6565.80  .000    yes
NumOcc         3.673   .0412    7935.97  .000    yes     3.425   .0412    6912.91  .000    yes
Weather        -.092   .0273     11.320  .001    yes     -.067   .0313      4.574  .032    yes
Light           .041   .0197      4.347  .037    yes      .066   .0204     10.458  .001    yes
Surface         .057   .0310      3.359  .067    yes      .022   .0341       .433  .510     no
TControl       -.018   .0317       .334  .563     no      .054   .0341      2.558  .110     no
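The Wald Chi2 and Sig. columns above can be reproduced approximately from the B and Std. Error columns: for a single coefficient the Wald statistic is (B / SE)^2, and the p-value is the upper tail of a chi-square distribution with one degree of freedom. The sketch below assumes this single-parameter Wald test; the function name is hypothetical, and small differences from the printed values are to be expected because B and SE are rounded in the table.

```python
import math

def wald_test(b, se):
    """Wald chi-square (1 df) and its p-value for a single coefficient."""
    wald = (b / se) ** 2
    # Upper tail of chi-square with 1 df: P(Z^2 > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Values copied from the 2004 columns of the table above
wald_numveh, p_numveh = wald_test(0.818, 0.0362)     # NumVeh: table reports 510.613
wald_surface, p_surface = wald_test(0.057, 0.0310)   # Surface: table reports 3.359, Sig. .067

# Surface falls between the two thresholds used in the paper
surface_sig_at_010_only = 0.05 < p_surface < 0.10
```

This also shows why Surface (2004) is flagged "yes" in the table although it would not pass a 0.05 threshold.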
Table 10 presents the parameter estimates of the Poisson regression model. The
significance of the parameters is similar for 2004 and 2000, except for the variables
month, day, weather and surface. These variables are significant at the 0.05 level with
the 2004 data but not with the 2000 data.
Table 10. Parameter estimates, Poisson regression model
(Sig. = p-value of the Wald test; 0.10? = significant at the 0.10 level)

                            2004                                      2000
Parameter          B Std.Err  Wald Chi2  Sig.  0.10?         B Std.Err  Wald Chi2  Sig.  0.10?
(Intercept)   -4.410   .1979    496.542  .000    yes    -4.691   .2231    442.253  .000    yes
Month          -.044   .0093     22.089  .000    yes      .014   .0098      2.064  .151     no
Day             .025   .0036     50.400  .000    yes     -.001   .0038       .030  .863     no
Hour           -.034   .0050     46.191  .000    yes     -.019   .0048     15.263  .000    yes
Weekday         .005   .0157       .099  .753     no     -.016   .0168       .933  .334     no
NumVeh          .867   .0284    934.627  .000    yes      .841   .0310    737.370  .000    yes
CrshType       -.102   .0059    301.378  .000    yes     -.079   .0059    178.777  .000    yes
NumInj        -2.731   .0388   4949.289  .000    yes    -2.483   .0482   2652.439  .000    yes
NumUnInj      -3.585   .0400   8038.972  .000    yes    -3.465   .0456   5782.059  .000    yes
NumOcc         2.898   .0280   10728.93  .000    yes     2.513   .0243   10733.83  .000    yes
Weather        -.083   .0259     10.140  .001    yes     -.037   .0303      1.464  .226     no
Light           .036   .0188      3.633  .057    yes      .098   .0203     23.039  .000    yes
Surface         .074   .0294      6.244  .012    yes     -.058   .0361      2.556  .110     no
TControl       -.045   .0300      2.255  .133     no     -.022   .0303       .515  .473     no
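If the Poisson model uses the usual log link (an assumption here, since the paper does not state the link function explicitly), each coefficient B can be read as a rate ratio exp(B): the multiplicative change in the expected number of persons killed per one-unit increase in the variable, with the other variables held fixed. A minimal sketch with three coefficients copied from the 2004 columns above; the variable names are illustrative.

```python
import math

# B values from the 2004 Poisson columns of the table above
poisson_2004 = {"NumVeh": 0.867, "NumOcc": 2.898, "Hour": -0.034}

# exp(B): multiplicative effect on the expected count under a log link
rate_ratios = {name: math.exp(b) for name, b in poisson_2004.items()}

for name, ratio in rate_ratios.items():
    direction = "increases" if ratio > 1 else "decreases"
    print(f"{name}: expected count {direction} by a factor of {ratio:.3f} per unit")
```

A ratio above 1 corresponds to a positive B and a ratio below 1 to a negative B; how meaningful a "one-unit increase" is depends on how each variable is coded in the database.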
3 Findings and Conclusion
The NB regression results indicate that monthly, daily, hourly and weekday variations
do not have a statistically significant effect on accident severity (number of persons
killed). However, with the year 2000 data, the NB regression indicates that daily
variations are significant at the 0.05 level. Similarly, surface conditions are
statistically significant (at the 0.10 level) with the year 2004 data but not with the
2000 data. Type of traffic control was also found not to be statistically significant.
Number of vehicles involved, crash type (overturn, rear-end, side-swipe, head-on, hit
object, and so on), injury types (A, B, C), number of uninjured, number of occupants
and weather conditions are statistically significant at the 0.05 level. Light and
surface conditions were also statistically significant at the 0.10 level. The findings
of the Poisson regression are very similar to those of the NB regression with respect
to number of vehicles, number of injured and uninjured persons, number of occupants,
weather, light and surface conditions, but the parameter estimates differ slightly
from those determined by the NB regression. Another important difference between the
NB and Poisson regression results is that, in the Poisson regression, monthly, daily
and hourly variations are statistically significant with the year 2004 data, whereas
with the year 2000 data only the hour of accident occurrence is significant; weekday
variations are not significant in either year. The results agree with professional
judgment regarding the factors affecting accident severity in highway crashes.
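The comparison between the two models can be tabulated directly from the Sig. columns of the parameter tables. The sketch below uses the 2004 p-values as printed in those tables (values shown as .000 are entered as 0.0); the dictionary and function names are illustrative only.

```python
# Sig. values copied from the 2004 columns of the NB and Poisson parameter tables
nb_2004 = {
    "Month": 0.976, "Day": 0.498, "Hour": 0.106, "Weekday": 0.514,
    "NumVeh": 0.0, "CrshType": 0.0, "NumInj": 0.0, "NumUnInj": 0.0,
    "NumOcc": 0.0, "Weather": 0.001, "Light": 0.037, "Surface": 0.067,
    "TControl": 0.563,
}
poisson_2004 = {
    "Month": 0.0, "Day": 0.0, "Hour": 0.0, "Weekday": 0.753,
    "NumVeh": 0.0, "CrshType": 0.0, "NumInj": 0.0, "NumUnInj": 0.0,
    "NumOcc": 0.0, "Weather": 0.001, "Light": 0.057, "Surface": 0.012,
    "TControl": 0.133,
}

def significant(p_values, alpha):
    """Names of the variables whose p-value is below alpha."""
    return {name for name, p in p_values.items() if p < alpha}

# Variables whose 0.05-level significance differs between the two models
differs_at_005 = significant(nb_2004, 0.05) ^ significant(poisson_2004, 0.05)
# Variables that are not significant at the 0.10 level in either model
never_significant = set(nb_2004) - significant(nb_2004, 0.10) - significant(poisson_2004, 0.10)
```

With the 2004 data, the 0.05-level verdicts differ for Month, Day, Hour, Light and Surface, while Weekday and TControl are not significant at the 0.10 level under either model.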
Acknowledgement
The author acknowledges Dr. Dale Lighthizer of the Michigan Department of
Transportation and Prof. Dr. Richard Lyles of Michigan State University for providing
the crash data for the sole purpose of doing academic research and writing papers.
References
Biography
Dr. Darçın Akın – Born in Germany in 1966. He received his BS degree in civil engineering
from Dokuz Eylul University, Izmir, Turkey in 1987 as valedictorian. He received his MS
and Ph.D. degrees in transportation engineering from Dokuz Eylul University and Michigan
State University (E. Lansing, MI, USA) in 1992 and 2000, respectively.
During his Ph.D. study, he was involved in traffic engineering research projects including
a work zone speed study, a study of pedestrian behavior at signalized and unsignalized
crossings, and a simulation of the campus network. He also worked part-time at the
TriCounty Regional Planning Commission and the Michigan Department of Transportation
(MDOT), where he worked on bicycle network planning and the statewide long-range plan.
Lastly, before he graduated, he worked as a full-time transport planner at MDOT, where he
maintained highway networks for city's long-range plans. He completed his Ph.D. in May
2000 and returned home to accept a faculty position at Gebze Institute of Technology (GIT)
in the Department of City and Regional Planning. At GIT he has been teaching urban
transport policies, urban transport systems, urban planning and travel modeling. Outside
of GIT, he has taught at several well-known institutions, including Bogazici and Istanbul
Technical Universities, in graduate as well as undergraduate programs. Dr. Akın received
his tenure and the title of associate professor in planning in Feb 2010. Dr. Akın has
many papers and publications at several national and international conferences and
symposiums. He has been a reviewer for several periodicals and international conferences,
including TRB of USA.
Dr. Akın has been a member of the Turkish Road Association since 2005 and of the Chambers
of Civil Engineers since 1987. He was a member of the Institute of Transportation
Engineers (ITE) of USA between 1996 and 2000, the American Society of Civil Engineers
(ASCE) between 1996 and 2000, and the World Conference on Transport Research Society
(WCTRS) from 2004-6 to 2010-13, and has been a Member of the Board of Trustees of
EMIT Research Platform since 2011.