
Accident Analysis and Prevention 32 (2000) 633-642

www.elsevier.com/locate/aap

Modeling traffic accident occurrence and involvement


Mohamed A. Abdel-Aty *, A. Essam Radwan
Department of Civil Engineering, University of Central Florida, Orlando, FL 32816-2450, USA

Received 26 February 1999; received in revised form 10 June 1999; accepted 24 June 1999

Abstract

The Negative Binomial modeling technique was used to model the frequency of accident occurrence and involvement. Accident
data over a period of 3 years, accounting for 1606 accidents on a principal arterial in Central Florida, were used to estimate the
model. The model illustrated the significance of the Annual Average Daily Traffic (AADT), degree of horizontal curvature, lane,
shoulder and median widths, urban/rural classification, and section length for the frequency of accident occurrence. Several Negative
Binomial models of the frequency of accident involvement were also developed to account for the demographic characteristics of
the driver (age and gender). The results showed that heavy traffic volume, speeding, narrow lane width, larger number of lanes,
urban roadway sections, narrow shoulder width and reduced median width increase the likelihood for accident involvement.
Subsequent elasticity computations identified the relative importance of the variables included in the models. Female drivers
experience more accidents than male drivers in heavy traffic volume, reduced median width, narrow lane width, and larger number
of lanes. Male drivers have a greater tendency to be involved in traffic accidents while speeding. The models also indicated that
young and older drivers experience more accidents than middle-aged drivers in heavy traffic volume, and with reduced shoulder and
median widths. Younger drivers have a greater tendency to be involved in accidents on roadway curves and while speeding.
© 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Accident occurrence; Accident involvement; Negative Binomial models; Roadway geometric characteristics; Driver characteristics;
Traffic safety

* Corresponding author. Tel.: +1-407-8235657; fax: +1-407-8233315.
E-mail address: mabdel@mail.ucf.edu (M.A. Abdel-Aty)

0001-4575/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0001-4575(99)00094-9

1. Introduction

Safety and efficiency are the two primary goals of transportation engineering. The effort that public
agencies put into reducing traffic accidents is highly justifiable. Traffic accidents place a huge
financial burden on society. Two major factors usually play an important role in traffic accident
occurrence. The first is related to the driver, and the second is related to the roadway design. Many of
the important road user factors in traffic safety depend strongly on the gender and the age of the driver
(Miaou and Lum, 1993). This study investigates the factors that affect accident occurrence on highway
segments, and also the variables that affect the accident involvement of the different driver gender and
age groups.

1.1. Accident prediction methodology

Researchers have attempted three approaches to relate accidents to geometric characteristics and traffic
related explanatory variables: Multiple Linear regression, Poisson regression and Negative Binomial
regression. However, recent research shows that multiple linear regression suffers some undesirable
statistical properties when applied to accident analysis, some of which have been discussed by Jovanis
and Chang (1986). To overcome the problems associated with multiple linear regression models, Jovanis and
Chang proposed Poisson regression for modeling accident frequencies. They argued that Poisson regression
is a superior alternative to conventional linear regression for applications related to highway safety.
In addition, it could be used with generally smaller sample sizes than linear regression.

Joshua and Garber (1990) studied the relationship between highway geometric factors and truck accidents
in Virginia using both linear and Poisson regression
models. They also concluded that the linear regression techniques used in their research did not describe
the relationship between truck accidents and the independent variables adequately but that the Poisson
models did.

Miaou et al. (1992) used a Poisson regression model to establish the empirical relationship between truck
accidents and highway geometrics on a rural interstate in North Carolina. The estimated Poisson model
suggested that Average Annual Daily Traffic (AADT) per lane, horizontal curvature, and vertical gradient
were significantly correlated with truck accident likelihood. During their work, a limitation of the
Poisson model was uncovered. Using the Poisson model necessitates that the mean and variance of the
accident frequency variable (the dependent variable) be equal. In most accident data, the variance of the
accident frequency exceeds the mean and, in such a case, the data would be over dispersed. They noted
that, although over dispersion was present, it did not change the conclusion about the relationship
between truck accidents and the examined traffic and highway geometric design variables. However, they
did suggest a correction to overcome the problem of over dispersion.

A follow-up study was completed by Miaou and Lum (1993). While this study was similar in scope to the
first, the main purpose was to evaluate the statistical properties of two conventional linear regression
models and two Poisson regression models. The models studied by Miaou and Lum were comparable to those
developed in previous studies to explore the relationship between vehicle accidents and highway geometric
design. The four types of models considered were (1) an additive linear regression model; (2) a
multiplicative linear regression model; (3) a multiplicative Poisson regression with exponential rate
function; and (4) a multiplicative Poisson regression with non-exponential rate function. The authors
found that Poisson regression models outperformed linear regression models. Furthermore, the Poisson
regression model with the exponential rate function was the favored model. Miaou and Lum also attempted
to address over dispersion in their frequency data. When over dispersion exists in the data and a Poisson
model is used, the variance of the estimated model coefficients tends to be underestimated. They
attempted to relax the Poisson constraint of the mean being equal to the variance by using Wedderburn's
over dispersion parameter. They found that with such over dispersed data, using the Poisson model may not
be appropriate for making probabilistic statements about vehicle accidents because the model may under-
or overestimate the likelihood of occurrence. Because of the over dispersion difficulties, the authors
suggested the use of a more general probability distribution such as the Negative Binomial.

Miaou (1994) studied the relationship between highway geometrics and accidents using Negative Binomial
regression. In this study, Miaou evaluated the performance of the Poisson regression, zero-inflated
Poisson regression, and Negative Binomial regression. Maximum likelihood was used to estimate the
coefficients of the models. As an initial step in developing a model, Miaou suggested that the Poisson
regression model should be used to establish the relationship between highway geometrics and accidents.
If over dispersion exists and is found to be moderate or high, both the Negative Binomial and
zero-inflated Poisson regression models can be explored. He suggested that the zero-inflated Poisson
regression model appears to be appropriate when the data exhibit a high number of zero frequency
observations.

Ivan and O'Mara (1997) applied Poisson regression for the prediction of traffic accidents using the
Connecticut Department of Transportation's accident data. Results of the model suggest that the posted
speed limit and the annual average daily traffic of the highway are critical accident prediction
variables, leading to the conclusion that the Poisson regression model is preferred to the linear
regression model.

Shankar et al. (1995) used both the Poisson and Negative Binomial distributions (Poisson when the data
were not significantly over dispersed and Negative Binomial when they were) to evaluate the effects of
roadway geometrics and environmental factors on rural accident frequency in Washington State. In addition
to the overall accident frequency on sections of highway, they modeled the frequency of specific types of
accidents. The authors concluded that separate regression models for specific types of accidents would
have a greater explanatory power, and that this was statistically confirmed.

Poch and Mannering (1996) applied the Negative Binomial regression to predict the accident frequency on
sections of principal arterials in Washington State. They concluded that the Negative Binomial regression
is a powerful predictive tool and one that should be increasingly applied in future accident frequency
studies.

Fridstrom et al. (1995) measured the contribution of randomness, exposure, weather, and daylight to the
variation in road accident counts. They stated that the formulation of the generalized Poisson regression
models for accident counts allows for the decomposition of the total variation in the dependent variable
into one part due to normal random (inexplicable) variation, and another part due to systematic, causal
factors. They concluded also that the simple Poisson regression models can come very close to explaining
almost all the systematic variation in a cross-section/time series accident data set. However, when the
events analyzed are not independent, it would be strongly advisable to use a Negative Binomial rather
than a pure Poisson specification, as a certain amount of over dispersion must always be expected in such
cases.
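
The over dispersion condition discussed in the studies above (variance of the accident counts exceeding their mean) can be checked directly on a set of section counts. The following is a minimal sketch, not the procedure used by any of the cited authors: the function name `dispersion_summary` and the sample counts are hypothetical, and the method-of-moments value of alpha simply inverts the variance form Var = mean * (1 + alpha * mean) that this paper gives later as Eq. (5).

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize over dispersion in a list of accident counts.

    Returns (mean, sample variance, variance-to-mean ratio, alpha), where
    alpha is a method-of-moments value solving
    variance = mean * (1 + alpha * mean), floored at zero.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    ratio = var / mu        # equals 1 in expectation for Poisson counts
    alpha = max(0.0, (var - mu) / mu ** 2)
    return mu, var, ratio, alpha


# Hypothetical accident counts per highway section (illustrative only,
# not data from this study).
counts = [0, 0, 1, 0, 3, 7, 2, 0, 12, 1, 0, 5]

mu, var, ratio, alpha = dispersion_summary(counts)
print(f"mean={mu:.2f} variance={var:.2f} ratio={ratio:.2f} alpha={alpha:.2f}")
```

A ratio well above 1 (and hence alpha above 0) points toward the Negative Binomial specification; a ratio near 1 is consistent with the Poisson assumption. A formal decision would rest on the significance of the estimated alpha rather than on this descriptive check.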

In summary, from a methodological perspective, previous researchers have shown that multiple linear
regression is not a suitable method for modeling the relationship between accident occurrence and the
geometric and traffic factors. Poisson regression and, in case of over dispersion, Negative Binomial
regression are more appropriate approaches for accident modeling.

1.2. Factors affecting highway accidents

A number of studies have attempted to quantify the effects of highway geometric design variables and
traffic volume on accident rates or frequencies. For example, Jovanis and Chang (1986) estimated Poisson
regression models using accident, travel mileage, and environmental data. Their models revealed that
accident occurrence increases with the vehicle miles of travel (VMT). Agent and Deen (1975) attempted to
identify high-accident locations with respect to the functional type and geometry of the highway, using
accident and volume data from rural highways in Kentucky collected from 1970 through 1972. They found
that four-lane undivided highways had the highest accident, injury and fatality rates. Also, two-lane
highways had the highest percentage of accidents that involved curvature.

Milton and Mannering (1996) attempted to develop a model for an arterial street in Washington State. They
found that narrow shoulder width, sharp horizontal curves, reduced lane width and high volume of traffic
all have a potential effect on increasing accident frequency. They also found that the number of lanes is
a highly significant factor in predicting accident frequency. More lanes tend to increase accident
frequency.

Knuiman et al. (1993) studied the effect of median width on accident rates using a Negative Binomial
regression model. For a median without barrier, they found that the accident rate declines rapidly when
median width exceeds about 7.6 m (25 ft). The decreasing trend seemed to become level at median widths of
approximately 18.9-24.4 m (60-80 ft).

Several studies have presented accident relationships for design elements of horizontal curves. In
general, accident rate increases as a function of increasing degree of curvature, although the
relationship is affected by other variables, including the lane and shoulder widths, roadside design, and
the length of curve (McGee et al., 1995).

A common shortfall of many of the previous studies is that they did not consider the effect of the
driver's characteristics. Sabey and Taylor (1980) showed that human factors are involved in around 95
percent of all traffic accidents, either alone or in combination with other factors. If motorists were
cognizant of every geometric deficiency encountered and warned to be careful of these deficiencies,
accident potential would be reduced. However, because this is an impossible task, correcting geometric
deficiencies is an important step toward reducing accidents.

1.3. Research objective

The primary objective of this research was to develop a mathematical model that explains the relationship
between the frequency of accidents and highway geometric and traffic characteristics. Other objectives
include developing models of accident involvement for different gender and age groups using the Negative
Binomial regression technique. Previous research has shown significant differences in accident
involvement between the different gender and age groups (see for example Abdel-Aty et al., 1999a,b;
Mostofa, 1998; Chen, 1997). An elasticity method was applied to the developed models in an attempt to
identify the most critical variables that contribute to accident occurrence and involvement and their
relative significance.

2. Data collection

In order to develop a mathematical model that correlates accident frequencies to the roadway geometric
and traffic characteristics, one needs to select a roadway that possesses a wide variety of geometric and
traffic characteristics. The goal of this data collection exercise is to divide this roadway into
segments with homogeneous characteristics. After reviewing several roadways in Central Florida, it was
decided that State Road 50 (SR 50) is most appropriate for this task.

SR 50 is a 227 km major principal arterial that connects the east and west coasts of Central Florida,
passing through the center of Orlando. Parts of SR 50 are rural, and the number of lanes varies between
2, 4 and 6 lanes. This roadway also experiences high accident rates, and had very limited changes during
the 3-year study period (1992-94). This arterial is also long enough to produce an adequate number of
segments to develop the model.

Traffic and roadway data were obtained from the Roadway Characteristics Inventory (RCI) database
maintained by the Florida Department of Transportation (FDOT). This database may be used to process,
store, and report information that describes all of the state's highway system in Florida. Information on
roadways includes geometric characteristics such as horizontal curves, shoulder widths, median widths,
and traffic characteristics such as traffic volumes and speed limits. SR 50 was divided into 566 highway
segments defined by any change in the geometric and/or roadway variables (e.g. a new section would be
identified when the median changes from 3 to 6 m). Therefore, each highway segment is uniform with
respect to all the possible
geometric and traffic features recorded by the FDOT database. The data included the following variables:
AADT, degree of horizontal curvature, shoulder type, divided/undivided, rural/urban classification,
posted speed limit, number of lanes, road surface and shoulder types, and lane, median, and shoulder
widths. Data extracted from the RCI system were coded into a new database based on the identified
sections.

Accident data were obtained from an electronic accident database for three years from 1992 to 1994. This
is a relational database maintained by the Florida Department of Highway Safety and Motor Vehicles
(DHSMV). The DHSMV's accident database is the most complete accident data available in the state of
Florida. DHSMV assembles all the accident reports in the state (from counties, cities, police
departments, etc.). In all, 1606 accidents were available for SR 50. This relational database contains
specific information about each accident including the driver characteristics.

Finally, the accident data and the database developed from the RCI system were merged based on the
milepost of each accident and the beginning and ending milepost of each segment. The resulting database
contained information about the accidents occurring on each segment together with the geometric and
traffic characteristics of this segment.

3. Modeling methodology

The Poisson regression methodology was initially attempted. However, the Poisson distribution was
rejected because the mean and variance of the dependent variables are different, indicating substantial
over dispersion in the data. Such over dispersion suggests a Negative Binomial model. The Negative
Binomial modeling approach is an extension of the Poisson regression methodology and allows the variance
of the process to differ from the mean. The Negative Binomial model arises from the Poisson model by
specifying:

$$\ln \lambda_i = \beta x_i + \varepsilon \qquad (1)$$

where $\lambda_i$ is the expected mean number of accidents on highway section $i$; $\beta$ is the vector
representing parameters to be estimated; $x_i$ is the vector representing the explanatory variables on
highway segment $i$; and $\varepsilon$ is the error term, where $\exp(\varepsilon)$ has a gamma
distribution with mean 1 and variance $\alpha^2$.

The resulting probability distribution is as follows:

$$\mathrm{Prob}(n_i \mid \varepsilon) = \frac{\exp[-\lambda_i \exp(\varepsilon)]\,[\lambda_i \exp(\varepsilon)]^{n_i}}{n_i!} \qquad (2)$$

where $n_i$ is the number of accidents on highway section $i$ over a time period $t$.

Integrating $\varepsilon$ out of this expression produces the unconditional distribution of $n_i$. The
formulation of this distribution is:

$$\mathrm{Prob}(n_i) = \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\,(1 - u_i)^{n_i} \qquad (3)$$

where $u_i = \theta/(\theta + \lambda_i)$ and $\theta = 1/\alpha$.

The Negative Binomial model can be estimated by standard maximum likelihood methods. The corresponding
likelihood function is:

$$L(\lambda_i) = \prod_{i=1}^{N} \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\,(1 - u_i)^{n_i} \qquad (4)$$

where $N$ is the total number of highway sections. This function is maximized to obtain coefficient
estimates for $\beta$ and $\alpha$. Compared with the Poisson model, this model has an additional
parameter $\alpha$, such that

$$\mathrm{Var}[n_i] = E[n_i]\,\{1 + \alpha E[n_i]\} \qquad (5)$$

The choice between the Negative Binomial model and the Poisson model can largely be determined by the
statistical significance of the estimated coefficient $\alpha$. If $\alpha$ is not significantly
different from zero (as measured by t-statistics), the Negative Binomial model simply reduces to a
Poisson regression with $\mathrm{Var}[n_i] = E[n_i]$. If $\alpha$ is significantly different from zero,
then the Negative Binomial is the correct approach.

It is worth mentioning that if a model has several variables, there is a possibility that some of the
explanatory variables would be related, causing the property known as multicollinearity. Although
multicollinearity would not cause the estimators to be biased, inefficient, or inconsistent, and does not
affect the forecasting performance of the model (Ramanathan, 1995), it might increase the standard errors
of the coefficients, thus making coefficients less significant. Multicollinearity could be identified by
low values of the t-statistics, high values for correlation coefficients between variables, and the
sensitivity of the estimated coefficients to specification (Ramanathan, 1995). None of these symptoms
were identified in the models presented in this paper. Pairwise correlations among explanatory variables
did not have high values, and the estimated coefficients were not observed to change drastically when
variables were added or dropped. Furthermore, the coefficients in the estimated models were significant
and had meaningful signs and magnitudes. Therefore, there is no need to be concerned about
multicollinearity.

3.1. Goodness of fit

In order to decide which subset of independent variables should be included in an accident estimation
model, AIC (Akaike's information criterion) was used. AIC identifies the best approximating model among a
class of competing models with different numbers of parameters. AIC is defined as follows:

$$\mathrm{AIC} = -2\,ML + 2k \qquad (6)$$
where $ML$ is the maximum log-likelihood $L(\beta)$ and $k$ is the number of variables in the model. The
smaller the value of AIC, the better the model. Starting with the full set of independent variables and
their interactions, a stepwise procedure has been used to select the best model based on minimizing the
AIC value.

To measure the overall goodness of fit, the deviance value $2(LL(\beta) - LL(0))$, which follows a
$\chi^2$ distribution, has been used as suggested by Agresti (1990). The log-likelihood ratio index
$\rho^2$ ($= 1 - LL(\beta)/LL(0)$) of the model (analogous to the R-square test in linear regression
models)¹, which is an indication of the additional variation in accident frequency explained by the
obtained model relative to the constant term alone, was also used (for a thorough discussion of the
goodness-of-fit measures for generalized Poisson regression models, the reader is referred to Fridstrom
et al., 1995).

¹ There are a variety of measures which are analogous to $R^2$. However, none of them produce a true
$R^2$ except under restricted conditions.

3.2. Relative significance of variables

A simple plot of the mean number of accidents estimated using the Negative Binomial regression models
against the different variables may be thought of as a method of evaluating model quality. However, the
slope of this type of plot does not indicate the relative effects of variables with respect to the
accident frequency or accident involvement (Shankar et al., 1995). To resolve this issue, one may apply
the partial derivative of $E(y)$ or $\lambda$ with respect to the independent variables:

$$\frac{\partial \lambda}{\partial x} = \frac{\partial \exp(x\beta)}{\partial (x\beta)}\,\frac{\partial (x\beta)}{\partial x} = \exp(x\beta)\,\beta = \lambda\beta \qquad (7)$$

Since the Negative Binomial regression is nonlinear, the value of the marginal effect depends on both the
coefficient for independent variable $x$ and the expected value of $y$. The larger the value of
$\lambda$ or $E(y)$, the larger the rate of change in $E(y)$, that is, the probability of accidents. So,
any conclusion regarding the relative effects of the independent variables drawn from plotting accident
frequencies versus an independent variable would be misleading. As an example, if we plot the accident
involvement of male and female drivers against the median width, any conclusion drawn from the slope of
the plotted line regarding the relative effect of the median width on these two different groups of
drivers would be misleading. The fact that the slope of the line is influenced by the accident
involvement of male and female drivers supports the decision not to use this method of assessment.

To overcome this problem and to examine the true relative effects of the variables included in the
models, Shankar et al. (1995) suggested the computation of an elasticity parameter, which would measure
the true relative effect of the variable on accident frequency. In general, elasticity is computed as

$$E(y) = \frac{\partial \lambda}{\partial x}\,\frac{x}{\lambda} \qquad (8)$$

where $\lambda$ is the mean number of accidents and $x$ are the explanatory variables.

4. Estimation results

4.1. Modeling accident frequency

Table 1
Negative binomial model of accident frequency

Independent variable                                     Coefficient   t-statistics
Constant                                                    4.182          3.78
Log of the section length (km)                              0.325          7.62
Log of AADT per lane                                        0.622          5.59
Degree of horizontal curve (degrees/100 m arc)              0.124          4.46
Shoulder width (m)                                         -0.122         -2.63
Median width (m)                                           -0.024         -1.58
Lane width (m)/no. of lanes                                -0.364         -2.09
Urban section dummy variable (1 if urban, 0 otherwise)      0.302          3.78
Over dispersion parameter ($\alpha$)                        0.235          5.45

Summary statistics
Number of sections                                            566
Log-likelihood at zero                                      -1210
Log-likelihood at convergence                               -1077
$\rho^2 = 1 - LL(\beta)/LL(0)$                               0.11
$2(LL(\beta) - LL(0))$                                        266

The Negative Binomial results for arterial accident frequency are presented in Table 1. This table shows
that all the variables have the expected sign (with a positive sign indicating an increase in the
accident frequency and a negative sign indicating a decrease). The deviance value
$2(LL(\beta) - LL(0))$, which follows a $\chi^2$ distribution, has been used for testing the overall
goodness of fit. The $\chi^2$ test of the deviance value (266, df = 7) rejects the null hypothesis that
the obtained model has explanatory power equal to that of the model with the constant term only.
Therefore, the model shows an overall good statistical fit. The $\rho^2$ value of the model, which is an
indication of the additional variation in accident frequency explained by the model relative to the
constant term alone, is relatively low. This low value is usual for accident estimation because there are
many variables (e.g. human factors) which are
not measurable² (Jovanis and Chang, 1986; Poch and Mannering, 1996).

² Note that even in a full model explaining 100% of the variation of the expected number of accidents in
the population, the log-likelihood ratio usually obtains a very low value if the average expected number
of accidents in the data is low (as is the case usually). This is attributed to the relatively large pure
random variation of the observed accident numbers around the expected numbers of accidents in each unit
of the population. This can be accounted for by applying one of the approaches proposed by Fridstrom et
al. (1995).

Turning to the specific variables entered in the model, two exposure variables were found to be
significant. The first is the log of the section length. The longer the roadway section, the more likely
accidents would occur on it. A similar conclusion was reached for the log of the AADT per lane. An
increase in AADT per lane has a positive impact on the likelihood of accidents.

The sharpness of the horizontal curve has a positive effect on the likelihood of accidents. Accidents
increase with the increase of the degree of curve. An increase in shoulder width or median width reduces
the frequency of accidents. Whether the roadway is divided or not is accounted for implicitly in the
median width variable. If the roadway is undivided, then the median width would be equal to zero. There
is an interaction effect between the lane width and the number of lanes. When the lane width increases
and at the same time the number of lanes decreases, the frequency of accidents declines. No effect of
vertical alignment entered the model, possibly because Florida has relatively flat topography (i.e.
little variation in slopes). Also, the fact that urban areas experience higher accident frequency than
rural areas, as depicted by the model, may be explained by the larger number of access points and the
higher level of congestion. Finally, the significance of the over dispersion parameter ($\alpha$)
indicates that the Negative Binomial formulation is preferred to the more restrictive Poisson
formulation.

To examine the relative effects of the variables included in the model, the average elasticities of all
continuous variables are presented in Table 2. The results show that AADT per lane has the greatest
relative effect (0.62) on the accident frequency among all the independent variables. The interaction
between lane width and number of lanes has the next relative effect on accident frequency. Shoulder width
and median width have the same relative effect, which is greater than the effect of the degree of
horizontal curve.

Table 2
Elasticity estimates for the accident frequency model

Variable                             Elasticity
Section length (km)                     0.33
AADT per lane                           0.62
Degree of horizontal curve              0.07
Shoulder width (m)                     -0.13
Median width (m)                       -0.12
Lane width (m)/number of lanes         -0.38

4.2. Modeling the accident involvement by gender

Two Negative Binomial models of accident involvement were estimated. The first model represents the
males' accident involvement, while the second represents the females'. The models are presented in Table
3. All the variables that entered in these models are similar to the accident frequency model presented
before in Table 1. Only one additional variable was significant in the estimation of the accident
involvement of male drivers. The variable is a speeding indicator ((estimated traveling speed - posted
speed limit)/posted speed limit). This variable represents the extent of speeding at the time of the
accident. The coefficient of this variable is positive and significant, indicating that as male drivers
speed, their likelihood of being involved in an accident increases. This variable was not significant and
did not enter in the females' model, showing less variation in speed among female drivers, which probably
indicates that male drivers have a tendency to be involved in accidents while speeding.

The deviance value $2(LL(\beta) - LL(0))$, which follows a $\chi^2$ distribution, is significant for both
the male involvement model (2014, df = 8) and the female model (1088, df = 7) (at the 95% confidence
level, the critical $\chi^2$ value is 14 for df = 7 and 15.5 for df = 8). Therefore, both models show
good statistical fit. Furthermore, the value of the likelihood ratio index is higher for both the male
model ($\rho^2 = 0.275$) and the female model ($\rho^2 = 0.16$) than for the general accident frequency
model ($\rho^2 = 0.11$)³. Therefore, the predictability of accident frequency is improved when accident
involvement models are estimated for each gender group. This could be attributed to the fact that some
behavioral variables explained by gender are implicitly considered in the model.

³ Note that the higher $\rho^2$ obtained for separate gender equations is a reflection of non-additivity
or interaction. That is, the relationship between the explanatory variables and accidents differs for
males and females. In the absence of interaction, one would not expect the separate equations to yield
substantially different predictive power than the single equation.

The over dispersion parameters in both models are significant, indicating that the mean varies from the
variance, which confirms the appropriateness of the Negative Binomial relative to the Poisson formulation
for predicting accident involvement for both male and female drivers.

Elasticity values computed from both models are depicted in Table 4. Although both male and female driver
models show that an increase in AADT per lane

Table 3
Negative binomial models of male and female drivers' accident involvement

                                        Male accident model         Female accident model
Variables                               Coefficient  t-statistics   Coefficient  t-statistics
Constant                                   0.323         0.93          2.52          3.43
Log of section length (km)                 0.096         4.25          0.092         3.21
Log of AADT per lane                       0.128         2.50          0.375         4.87
Degree of horizontal curve                 0.119         7.29          0.107         6.11
Shoulder width (m)                        -0.108        -4.74         -0.077        -2.63
Median width (m)                          -0.025        -3.77         -0.063        -6.56
Lane width (m)/no. of lanes               -0.356        -4.88         -0.800        -8.11
Speed difference/speed limit               0.095         5.40
Urban (1 if urban, 0 if rural)             0.367         8.64          0.317         6.31
Over dispersion parameter ($\alpha$)       0.094         4.94          0.137         4.10

Summary statistics
Number of sections                           566                        566
Log-likelihood at zero                     -3657                      -3408
Log-likelihood at convergence              -2650                      -2864
$\rho^2 = 1 - LL(\beta)/LL(0)$              0.275                      0.16
$2(LL(\beta) - LL(0))$                      2014                       1088
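
The summary statistics reported for the three models can be cross-checked from the log-likelihoods alone, using the deviance 2(LL(β) − LL(0)) and the likelihood ratio index ρ² = 1 − LL(β)/LL(0) defined in Section 3.1. The sketch below is only an arithmetic check under stated assumptions: the log-likelihoods are entered as negative numbers, the male model's value at convergence is taken as −2650 (the value implied by its reported deviance of 2014 and ρ² of 0.275), and the critical χ² values are the rounded 14 and 15.5 quoted in the text. The function and variable names are illustrative.

```python
def fit_statistics(ll_zero, ll_conv):
    """Return (deviance, rho_squared) from two log-likelihoods.

    deviance    = 2 * (LL(beta) - LL(0))
    rho_squared = 1 - LL(beta) / LL(0)
    """
    deviance = 2.0 * (ll_conv - ll_zero)
    rho_squared = 1.0 - ll_conv / ll_zero
    return deviance, rho_squared


# (LL at zero, LL at convergence, degrees of freedom).
# Assumption: values are negative log-likelihoods as reconstructed from
# Tables 1 and 3; -2650 is inferred from the reported deviance and rho^2.
models = {
    "accident frequency": (-1210, -1077, 7),
    "male involvement": (-3657, -2650, 8),
    "female involvement": (-3408, -2864, 7),
}

# Critical chi-square values at the 95% level, as quoted in the text.
critical = {7: 14.0, 8: 15.5}

for name, (ll_zero, ll_conv, df) in models.items():
    deviance, rho_squared = fit_statistics(ll_zero, ll_conv)
    significant = deviance > critical[df]
    print(f"{name}: deviance={deviance:.0f}, rho^2={rho_squared:.3f}, "
          f"exceeds critical value (df={df}): {significant}")
```

Under these assumptions the computed deviances are 266, 2014 and 1088, and the ρ² values round to 0.11, 0.275 and 0.16, matching the reported figures.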

has a positive effect on accident involvement, the relative effect of AADT per lane on
accident involvement is higher for female drivers than for male drivers. This shows a
tendency toward more accident involvement by females during heavy traffic. A decrease in
median width increases accident involvement frequencies for both male and female drivers,
but the relative effect of median width is more pronounced for female drivers than for male
drivers. The negative correlation between the interaction of lane width and number of lanes
and accident involvement is stronger for females than for males. It can therefore be
concluded that narrow lane width and a larger number of lanes have a larger effect on
accident involvement for female than for male drivers. For male drivers, there is a positive
correlation between the percentage of speeding and accident involvement, which is not
significant for female drivers. This indicates that male drivers have a tendency to be
involved in accidents that are related to speeding.

4.3. Modeling the accident involvement by age

Three Negative Binomial models of young, middle age, and old driver accident involvement
were estimated. Young drivers were defined as drivers between 15 and 25 years old, middle
age drivers as those between 26 and 75, and old drivers as those above the age of 75. Based
on results from a previous study (Abdel-Aty et al., 1999a), it was decided that for the
purpose of this study it is adequate to divide age into these three categories with the
aforementioned cut-off values. Young drivers (25 or below) and old drivers (above 75) have
higher risk than middle aged drivers (Abdel-Aty et al., 1999a). Including a wide middle age
category also solves the problem of double counting accidents for the two groups of interest
(i.e. the young and the old). Screening the data showed that in most multiple-vehicle
accidents that involved an old or a young driver, the other driver was from the middle age
group.

The models are presented in Table 5. All the variables that entered these models are similar
to those in the accident frequency model presented before in Table 1. Two additional
variables were significant: a speeding indicator variable ((estimated traveling speed −
posted speed limit)/posted speed limit), which entered the young and the middle age drivers'
involvement models, and a dummy variable for shoulder pavement, which entered the old age
involvement model. The speeding variable indicated an increase in the probability of an
accident as the estimated speed of the accident increases, for both the young and middle age
drivers. Also, older drivers' likelihood of accident involvement decreases when the shoulder
is paved.

Table 4
Elasticity estimates for male and female accident involvement models

Variables                       Elasticity (male model)   Elasticity (female model)

Section length (km)             0.09                      0.09
AADT per lane                   0.12                      0.37
Degree of horizontal curve      0.08                      0.08
Shoulder width (m)              0.18                      0.13
Median width (m)                0.13                      0.34
Lane width (m)/no. of lanes     0.35                      0.79
Speed difference/speed limit    0.07
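As an illustrative aside, the link between the coefficients in Table 3 and the elasticities in Table 4 can be sketched under the usual log-link form of a count model, λ = exp(β0 + β ln z). Under that assumption the elasticity of the expected accident count λ with respect to a log-transformed variable z equals its coefficient β. The sketch below checks this numerically for the section length and AADT per lane coefficients printed in Table 3. The function names, the zero intercept and the evaluation point z = 10000 are hypothetical choices made only for illustration, and the elasticity formula actually used in the paper is not restated in this section.

```python
import math


def expected_count(beta0, beta, z):
    """Log-link mean with one log-transformed regressor: exp(beta0 + beta * ln z)."""
    return math.exp(beta0 + beta * math.log(z))


def numeric_elasticity(beta0, beta, z, rel_step=1e-5):
    """Percent change in the expected count per percent change in z (central difference)."""
    dz = z * rel_step
    lam = expected_count(beta0, beta, z)
    dlam = expected_count(beta0, beta, z + dz) - expected_count(beta0, beta, z - dz)
    return (dlam / (2.0 * dz)) * (z / lam)


# Coefficients of the log-transformed variables as printed in Table 3 (male, female).
TABLE3_LOG_COEFFICIENTS = {
    "section length": (0.096, 0.092),
    "AADT per lane": (0.128, 0.375),
}

# Intercept 0.0 and z = 10000 are arbitrary (hypothetical) evaluation choices.
elasticities = {
    name: tuple(numeric_elasticity(0.0, b, 10000.0) for b in coefficients)
    for name, coefficients in TABLE3_LOG_COEFFICIENTS.items()
}

for name, (male, female) in elasticities.items():
    print(f"{name}: male {male:.3f}, female {female:.3f}")
```

For log-transformed variables the result does not depend on the intercept or on the value of z, which is why the arbitrary choices above are harmless. For variables entered in levels the same definition gives β·x, so the value depends on where it is evaluated.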

Table 5
Negative binomial models of young, middle, and old drivers' accident involvement

Variables                                     Young age model           Middle age model          Old age model
                                              Coefficient  t-statistic  Coefficient  t-statistic  Coefficient  t-statistic

Constant                                      3.152        4.04         0.321        4.01         3.020        2.69
Log of section length (km)                    0.099        3.23         0.105        4.60         0.218        4.47
Log of AADT per lane                          0.373        4.74         0.165        2.81         0.342        3.01
Degree of horizontal curve                    0.312        16.47        0.123        8.91
Shoulder width (m)                            0.087        2.66         0.074        2.98         0.162        2.87
Median width (m)                              0.030        2.98         0.036        5.08         0.094        6.92
Lane width (m)/no. of lanes                   0.706        5.71         0.448        5.37         0.725        5.21
Shoulder pavement (1 if paved, 0 otherwise)                                                       0.236        2.10
Urban (1 if urban, 0 if rural)                0.534        9.48         0.174        4.56         0.458        4.40
Speed difference/speed limit                  0.113        14.42        0.039        2.01
Over-dispersion parameter (α)                 0.28         1.1          0.195        9.60         0.211        2.13

Summary statistics
Number of sections                            566                       566                       566
Log-likelihood at zero, LL(0)                 −2763                     −3994                     −2518
Log-likelihood at convergence, LL(β)          −2313                     −3599                     −1557
ρ² = 1 − LL(β)/LL(0)                          0.16                      0.09                      0.385
2(LL(β) − LL(0))                              900                       790                       1922
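The summary rows of Table 5 can be cross-checked from the two log-likelihoods alone. The following is a minimal sketch, not the authors' estimation code: it recomputes ρ² = 1 − LL(β)/LL(0) and the likelihood ratio statistic 2(LL(β) − LL(0)), compares the latter with the 95 percent χ² critical value of 15.5 for 8 df quoted in the text, and flags whether the over-dispersion parameter looks significant. The log-likelihoods are entered as negative numbers (a log-likelihood of a count model cannot be positive), and the 1.96 threshold for the t-statistic is an assumption introduced here for illustration.

```python
# Values read from Table 5. Log-likelihoods are entered with a minus sign
# because the log-likelihood of a count model cannot be positive.
MODELS = {
    "young": {"ll_zero": -2763.0, "ll_conv": -2313.0, "alpha_t": 1.1},
    "middle": {"ll_zero": -3994.0, "ll_conv": -3599.0, "alpha_t": 9.60},
    "old": {"ll_zero": -2518.0, "ll_conv": -1557.0, "alpha_t": 2.13},
}

CHI2_CRITICAL_8DF = 15.5  # 95 percent critical value for 8 df, as quoted in the text
T_CRITICAL = 1.96         # assumed two-sided 5 percent threshold (not from the paper)


def fit_summary(ll_zero, ll_conv, alpha_t):
    """Return goodness-of-fit measures derived from the two log-likelihoods."""
    lr_statistic = 2.0 * (ll_conv - ll_zero)
    return {
        "rho_squared": 1.0 - ll_conv / ll_zero,
        "lr_statistic": lr_statistic,
        "lr_significant": lr_statistic > CHI2_CRITICAL_8DF,
        # If alpha is not significant, the model is close to a Poisson regression.
        "overdispersed": abs(alpha_t) > T_CRITICAL,
    }


SUMMARIES = {name: fit_summary(**values) for name, values in MODELS.items()}

for name, s in SUMMARIES.items():
    print(
        f"{name}: rho2={s['rho_squared']:.3f}, LR={s['lr_statistic']:.0f}, "
        f"LR significant={s['lr_significant']}, overdispersed={s['overdispersed']}"
    )
```

Under these assumptions the likelihood ratio statistics come out as 900, 790 and 1922, matching the last row of Table 5, and the recomputed ρ² values are close to the tabulated ones. Only the young drivers' over-dispersion t-statistic (1.1) falls below the assumed threshold.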

The models have an overall good statistical fit. The deviance values (= 2(LL(β) − LL(0)))
for the young (900; df = 8), middle (790; df = 8) and old (1922; df = 8) models, which follow
a χ² distribution, are significant (at the 95 percent confidence level the critical χ² value
is 15.5 for 8 df). The value of ρ² is higher for the young (0.16) and old (0.385) accident
involvement models than for the general accident frequency model (0.11).

For both the middle and old drivers' accident involvement models, the over-dispersion
parameters were found significant. Therefore, the Negative Binomial formulation is preferred
to the more restricted Poisson. However, the over-dispersion parameter for the young drivers'
accident involvement model was insignificant, indicating that the Negative Binomial
formulation simply reduces to a Poisson regression with Var[n_i] = E[n_i].

To measure the relative effects of the different variables on accident involvement of young,
old and middle age drivers, elasticities were evaluated. Table 6 presents a comparison of the
relative effects of the different independent variables on the different age groups' accident
involvement.

The results in Table 6 show that an increase in AADT per lane increases the likelihood of
accident involvement. The relative effect of AADT per lane on accident involvement is higher
for young and old drivers than for middle age drivers. Therefore, younger and older drivers
have more problems than middle aged drivers with heavy traffic volume.

The elasticity values for both the degree of horizontal curve and the speeding indicator
variable show that the effect of these variables is higher for young drivers than for middle
age drivers. These two variables are sometimes related because speeding on a horizontal curve
increases the likelihood of an accident. These two variables did not enter the old drivers'
accident involvement model, which suggests less variation in the speed variable for this
group, and also indicates no safety problem on curves for this group of drivers. The above
findings confirm previous results (Abdel-Aty et al., 1999b).

Shoulder and median widths negatively affect the frequency of accident involvement; however,
the elasticity values indicate that these effects are larger for older drivers. The
interaction variable (lane width/no. of lanes) also negatively affects the accident
involvement frequency.

5. Summary and conclusion

This paper presents a model of accident frequency as well as models of accident involvement
for two driver demographic factors: age and gender. The literature suggests that the normal
distribution, which underlies the traditional multiple linear regression method, should be
used with caution because of the problems associated with non-negativity and error terms. If
the underlying accident process is one in which the mean accident frequency is functionally
related to the variance (e.g. Poisson distribution), then parameters in a linear regression
model would have correct signs but incorrect confidence limits. The literature also suggests
that the Poisson regression and the Negative Binomial models possess most of the desirable
statistical properties in describing vehicle accident

events. However, one of the stated limitations of the Poisson regression model is that the
variance of the accident frequency is constrained to be equal to the mean. Most accident
frequency data are over-dispersed (having a variance greater than the mean), pointing to the
need for a correction to the Poisson formulation. To overcome this problem the Negative
Binomial modeling methodology is used in this paper.

Negative Binomial models were developed to estimate accident frequencies and accident
involvement on a major principal arterial (SR 50) in Central Florida. A massive data
collection effort was carried out for SR 50 from two data sources: the Roadway Characteristic
Inventory database and the State Accident database. The former contains the geometric
characteristics of the roadway and the latter contains the accidents that occurred on this
roadway.

The results showed that several roadway design and traffic factors affect the safety of an
arterial. Among those, AADT is the most critical factor. As AADT per lane increases, accident
frequency significantly increases. Lane width and the number of lanes are the next potential
geometric parameters that contribute to accident occurrence. Narrow lane width and a larger
number of lanes increase accident occurrence. Moreover, narrow shoulder width, reduced median
width and a larger degree of horizontal curve (i.e. sharper curves) are also potential
geometric features that increase the frequency of accidents. Urban roadway segments have
higher potential for accident occurrence than rural sections.

From the male and female accident involvement models, it could be concluded that female
drivers experience a higher probability of accidents than male drivers during heavy traffic
volume and with reduced median width. Moreover, narrow lane width and a larger number of
lanes have more effect on accident involvement for female drivers than for male drivers. Male
drivers have a greater tendency to be involved in accidents while speeding.

Young and older drivers have a larger possibility of accident involvement than middle aged
drivers when experiencing heavy traffic volume. There is no effect of horizontal curve on
older drivers' accident involvement. Older drivers have a greater tendency toward accident
occurrence than middle age and young drivers with reduced shoulder and median widths.
Decreasing lane width and increasing the number of lanes create more problems for older and
younger drivers than for middle age drivers. Older drivers experience fewer accidents if the
shoulder is paved. Also, the likelihood of younger drivers' accident involvement increases
with speeding.

This paper confirmed some of the results reached in previous studies. Shoulder and lane
widths and sharp horizontal curves were found to affect the safety of a roadway, as in Milton
and Mannering (1996), and median width was significant, as in Knuiman et al. (1993). Posted
speed was also found to be a significant accident predictor in Ivan and O'Mara (1997), while
AADT was significant in the studies by Miaou (1994) and Ivan and O'Mara (1997). This study,
however, showed that the log of the section length and the log of the AADT are significant
explanatory variables in modeling accident frequency or involvement. The study also showed
that the interaction between the lane width and the number of lanes is significant, and that
the type of shoulder pavement affects the probability of accident involvement for older
drivers. Speed was introduced in this study in a different form to capture the magnitude of
speeding relative to the posted speed limit. This speeding indicator variable was shown to
affect the accident involvement of male and young drivers. The exact degree of horizontal
curve was an important variable in all the estimated models, which might indicate that the
continuous relationship is preferred to the more restrictive categorization of this variable
(Milton and Mannering, 1996).

This paper supports conclusions from previous literature that the Negative Binomial
formulation is superior to the more restricted Poisson regression. The paper attempted to add
to the literature the dimension of including the effect of age and gender in modeling
accident involvement. It is worth mentioning that the estimation of a model in which a
multiple vehicle accident is represented by two or more involvements gives rise to a possible
correlation of disturbances, which refers to variations in unobserved contributing

Table 6
Elasticity estimates for age accident involvement models

Variables                       Elasticity                Elasticity                Elasticity
                                (young age model)         (middle age model)        (old age model)

Section length (km)             0.10                      0.10                      0.21
AADT per lane                   0.31                      0.16                      0.34
Degree of horizontal curve      0.18                      0.07
Shoulder width (m)              0.70                      0.12                      0.26
Median width (m)                0.22                      0.19                      0.50
Lane width (m)/no. of lanes     0.71                      0.44                      0.72
Speed difference/speed limit    0.075                     0.02

factors. Although it is unlikely that this problem is present in the female, young age, or
old age accident involvement models, it is likely that it is present in the male and middle
age models of accident involvement. An important direction for future research, which is
methodological in nature, would be to account for this correlation. This task will not be
easy because of the complexity of the error term structure in Negative Binomial models.

Acknowledgements

The authors wish to acknowledge the comments and suggestions of the anonymous referees. Their
recommendations resulted in a substantially improved paper.

References

Abdel-Aty, M., Chen, C., Radwan, E., Brady, 1999a. Analysis of accident-involvement trends by drivers age in Florida. ITE Journal on the Web (Feb. 1999), pp. 69-74.
Abdel-Aty, M., Chen, C., Radwan, E., 1999b. Using conditional probabilities to explore the driver age effect in accidents. ASCE Journal of Transportation Engineering 125(6).
Agent, K., Deen, R., 1975. Relationship between roadway geometrics and accidents. Transportation Research Record 541, 1-11.
Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Chen, C., 1997. Statistical Analysis of the Effect of Demographic and Roadway Factors on Traffic Crash Involvement. M.S. thesis, Department of Civil Engineering, University of Central Florida.
Fridstrom, L., Ifver, J., Ingebrigtsen, S., Kulmala, R., Thomsen, L., 1995. Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accident Analysis and Prevention 27 (1), 1-20.
Ivan, J., O'Mara, P., 1997. Prediction of Traffic Accident Rates Using Poisson Regression. Presented at the 76th Annual Meeting of the Transportation Research Board.
Joshua, S., Garber, N., 1990. Estimating truck accident rate and involvement using linear and Poisson regression models. Transportation Planning and Technology 15, 41-58.
Jovanis, P., Chang, H., 1986. Modeling the relationship of accidents to miles traveled. Transportation Research Record 1068.
Knuiman, M., Council, F., Reinfurt, D., 1993. The effect of median width on highway accident rates. Transportation Research Record 1401, 70-80.
McGee, H., Hughes, W., Daily, K., 1995. Effect of Highway Standards on Safety. NCHRP Report 374, Transportation Research Board.
Miaou, S., 1994. The relationship between truck accidents and geometric design of road section: Poisson versus Negative Binomial regression. Accident Analysis and Prevention 26(4).
Miaou, S., Lum, H., 1993. Modeling vehicle accidents and highway geometric design relationships. Accident Analysis and Prevention 25 (6), 689-709.
Miaou, S., Hu, P., Wright, T., Rathi, A., Davis, S., 1992. Relationship between truck accidents and highway geometric design: a Poisson regression approach. Transportation Research Record 1376, 10-18.
Milton, J., Mannering, F., 1996. The Relationship Between Highway Geometrics, Traffic Related Elements and Motor Vehicle. Washington State Dept. of Transportation.
Mostofa, H., 1998. Modeling of Traffic Accidents on Principal Arterial. M.S. thesis, Department of Civil Engineering, University of Central Florida.
Poch, M., Mannering, F., 1996. Negative binomial analysis of intersection accident frequencies. Journal of Transportation Engineering 122(2).
Ramanathan, R., 1995. Introductory Econometrics with Applications. The Dryden Press, Fort Worth, TX.
Sabey, B., Taylor, H., 1980. The Known Risks we Run: The Highway. Supplementary Report SR 567, Transport and Road Research Laboratory, UK.
Shankar, V., Mannering, F., Barfield, W., 1995. Effect of roadway geometric and environment factors on rural freeway accident frequencies. Accident Analysis and Prevention 27(30).
