
Journal of Safety Research - Traffic Records Forum proceedings

36 (2005) 485 – 487


www.elsevier.com/locate/jsr www.nsc.org

Traffic Records Forum Proceedings Paper

Advantages and disadvantages of different crash modeling techniques


Thobias Sando *, Renatus Mussa 1, John Sobanjo 2, Lisa Spainhour 3
Department of Civil Engineering and Environmental Engineering, FAMU-FSU College of Engineering, 2525 Pottsdamer Street, Room 129,
Tallahassee, FL 32310, USA
Available online 18 November 2005

Keywords: Modeling; Crashes; Advantages; Disadvantages; Techniques

1. Problem Statement

Modeling of traffic crashes is a complex undertaking. Previous studies have used a variety of techniques to analyze crashes, and choosing the technique for a specific modeling problem can be challenging. The most commonly used methods include regression analysis, artificial neural networks, and pattern recognition methods such as the nearest neighbor rule and the Bayesian belief network technique. Knowing the advantages and disadvantages of each method helps safety analysts decide on the most appropriate method for a particular analysis. This paper provides insight into the advantages and disadvantages of each of these groups of methods.

2. Modeling techniques

2.1. Regression analysis

Traditionally, crashes have been modeled using regression analysis methods. Several studies have shown that linear regression has some undesirable statistical properties when applied to crash modeling. Jovanis and Chang (1986) indicated potential limitations of linear regression and Poisson regression models pertaining to their underlying distributional assumptions, estimation procedures, functional form of the accident rate, and sensitivity to short road sections.

Miaou, Hu, Wright, Rathi, and Davis (1992) used a Poisson regression model to establish the empirical relationship between truck collisions and highway geometrics on a rural interstate in North Carolina. Poisson regression requires the mean and variance of the dependent variable to be equal. In most crash data, however, the variance of the crash frequency exceeds the mean, in which case the data are said to be overdispersed.

Fridstrom, Ifver, Ingebrightsen, Kulmala, and Thomsen (1995) measured the contribution of randomness, exposure, weather, and daylight to the variation in road crash counts. They concluded that simple Poisson regression models can come very close to explaining almost all the systematic variation in a crash data set. However, when the events are not independent, it is strongly advisable to use a negative binomial rather than a pure Poisson specification, as a certain amount of overdispersion must always be expected in such cases. Because of the highly nonlinear internal relationships among the variables that influence crash occurrence, Abdelwahab and Abdel-Aty (2001) suggested the use of artificial neural networks (ANN) over regression methods.
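The choice between a Poisson and a negative binomial specification hinges on whether the crash counts are overdispersed. The following is a minimal sketch of one common diagnostic, assuming a single covariate: fit a Poisson regression, log(mu) = b0 + b1*x, by Newton-Raphson, then divide the Pearson chi-square statistic by its residual degrees of freedom. The counts, the covariate, and the names `fit_poisson` and `pearson_dispersion` are hypothetical and are not drawn from any of the cited studies.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Poisson regression log(mu) = b0 + b1*x fitted by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall mean rate
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T diag(mu) X (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system H d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    chi2 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        chi2 += (yi - mu) ** 2 / mu
    return chi2 / (len(y) - 2)


# Hypothetical crash counts on eight segments; x = 1 marks segments with some feature present
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 8, 0, 1, 2, 13]
b0, b1 = fit_poisson(x, y)
print(f"b0={b0:.3f} b1={b1:.3f} dispersion={pearson_dispersion(x, y, b0, b1):.2f}")
# b0=0.693 b1=0.693 dispersion=8.58
```

A dispersion value near 1 is consistent with the Poisson equal mean-variance assumption; a value as large as the one in this toy example would point toward a negative binomial specification. In practice a statistics package would perform the fit; the hand-written Newton step only keeps the sketch free of dependencies.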
* Corresponding author. Tel.: +1 850 410 6233; fax: +1 850 410 6142.
E-mail addresses: sando@eng.fsu.edu (T. Sando), mussa@eng.fsu.edu (R. Mussa), sobanjo@eng.fsu.edu (J. Sobanjo), spainhou@eng.fsu.edu (L. Spainhour).
1 Tel.: +1 850 410 6191; fax: +1 850 410 6142.
2 Tel.: +1 850 410 6153; fax: +1 850 410 6142.
3 Tel.: +1 850 410 6123; fax: +1 850 410 6142.

0022-4375/$ - see front matter © 2005 National Safety Council and Elsevier Ltd. All rights reserved.
doi:10.1016/j.jsr.2005.10.006

2.2. Artificial neural networks

Artificial neural networks (ANN) consist of a network of many simple processors (units, nodes, or neurons). A few previous studies have used artificial neural networks in crash prediction. Vogt and Bared (1998) presented an artificial neural network (ANN) concept in crash modeling. According to the study, the most delicate
part of neural network modeling is generalization, that is, the development of a model that reliably predicts future crashes. The study also suggests that overfitting (i.e., obtaining weights for which the training-set error is so small that even random variation is accounted for) can be minimized by using two validation samples in addition to the training sample.

Musone, Ferrari, and Oneta (1999) used ANN to analyze urban crashes in the city of Milan, Italy. The study applied feed-forward neural networks with a back-propagation learning paradigm. Abdelwahab and Abdel-Aty (2001) developed ANN models to predict driver injury severity in traffic accidents at signalized intersections. The study investigated two well-known neural network paradigms, the multilayer perceptron (MLP) and fuzzy adaptive resonance theory (ART) neural networks. The MLP neural network gave the better generalization performance, 65.6% and 60.4% for the training and testing phases, respectively. The performance of the MLP was compared with that of an ordered logit model, which correctly classified only 58.9% and 57.1% for the training and testing phases, respectively.

The ANN approach has the following advantages: (a) there is no need to assume an underlying data distribution; (b) neural networks are applicable to multivariate nonlinear problems; and (c) transformations of the variables are automated in the computational process. However, the ANN technique has several disadvantages, including: (a) minimizing overfitting requires a great deal of computational effort, and (b) the individual relations between the input and output variables are not developed by engineering judgment, so the model tends to be a black box without an analytical basis.

3. Pattern recognition methods

3.1. Nearest neighbor rule

Nearest neighbor analysis is a classification method in which the class of an unknown record is assigned after comparing the unknown record with all known records (training data) in the data repository. The degree of similarity between records is determined by a function called the distance function. Nukoolkit and Chen (2001) used two different distance functions, Euclidean Distance (ED) and the Value Difference Metric (VDM), each combined with k-mode clustering, to predict whether a car crash would have an injury or a non-injury outcome, using a subset of year 2000 Alabama interstate alcohol-related crashes. Prediction errors of 33% and 45% were observed using the ED and VDM methods, respectively. The study further proposed an improved technique that combines the distance function with decision tree clustering, which reduced the prediction error to 19%.

Hattori and Takahashi (1999) reported that the k-nearest neighbor (k-NN) rule is effective when the probability distributions of the feature variables are not known and the Bayes decision rule therefore cannot be used. It should be noted, however, that defining the distance measurement for crash variables is difficult and subjective. Variables that differ in form and magnitude make it difficult to establish the distance function: some variables are continuous, while others are discrete. In addition, even within the continuous and discrete variable groups, the range of magnitudes and the number of categories differ from variable to variable. This reduces the suitability of the nearest neighbor technique for crash prediction.

3.2. Bayesian belief networks technique

Most of the techniques used for modeling crashes require prior knowledge of the distribution of crash parameters. Sometimes the distribution is not directly known, but the statistical dependencies or independencies among the variables are. For example, intuition suggests dependencies between sideswipe crashes and lane width, between vehicle speed and crash severity, and between traffic volume and crash rate, to name a few. The dependency between crash occurrence and traffic factors such as AADT, geometric factors such as number of lanes, and design factors such as speed could be established. These internal dependencies could then be represented by conditional probabilities, which could be used to determine the likelihood of a given crash magnitude on a particular roadway segment under certain conditions.

The Bayesian belief network technique is fairly new. It is being researched in areas with complex dependencies among variables, such as medical diagnostic systems, real-time weapons scheduling, computer processor fault diagnosis, generator monitoring expert systems, and software troubleshooting. Bayesian belief networks are appropriate for modeling crashes because the dependencies between factors are known and can be used to construct the belief network structure.

4. Summary

The review of several studies on models used in highway safety modeling indicates that each method has its own advantages and disadvantages, several of which have been discussed in this paper. The suitability of a method depends on the desired output and the model inputs. The output of the model could include the type of crash (fatal, injury, or property damage only), number of crashes, crash rate, segment/intersection severity rating, and so forth; the inputs may include various environmental, traffic, and roadway variables. Further review and analysis of different models is
underway. Documenting the limitations of each modeling technique will help analysts decide on the best method for each particular modeling problem.
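Returning to the generalization problem raised for artificial neural networks in Section 2.2, the sketch below trains a one-hidden-layer feed-forward network by back-propagation and keeps two validation samples apart from the training sample: one decides when to stop training (early stopping), and the other, never used for any decision, checks the stopped model. Reading "two validation samples" this way is an assumption; the network size, learning rate, patience, synthetic data, and the names `TinyMLP` and `train_with_early_stopping` are likewise hypothetical and chosen only for illustration.

```python
import copy
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer feed-forward network with a sigmoid output (illustrative)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def mse(self, data):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train_epoch(self, data, lr):
        """One pass of back-propagation (per-record gradient descent on 0.5 * squared error)."""
        for x, y in data:
            h, out = self.forward(x)
            delta_out = (out - y) * out * (1.0 - out)
            for j, hj in enumerate(h):
                # Error signal for hidden unit j, computed before w2[j] is updated
                delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
                self.w2[j] -= lr * delta_out * hj
                for i, xi in enumerate(x):
                    self.w1[j][i] -= lr * delta_h * xi
                self.b1[j] -= lr * delta_h
            self.b2 -= lr * delta_out


def train_with_early_stopping(train, stop_set, check_set, n_hidden=4, lr=0.5,
                              max_epochs=300, patience=20):
    """Keep the weights that minimize error on stop_set; report error on check_set."""
    net = TinyMLP(len(train[0][0]), n_hidden)
    best_err, best_net, best_epoch = float("inf"), copy.deepcopy(net), -1
    for epoch in range(max_epochs):
        net.train_epoch(train, lr)
        err = net.mse(stop_set)
        if err < best_err:
            best_err, best_net, best_epoch = err, copy.deepcopy(net), epoch
        elif epoch - best_epoch >= patience:
            break  # stopping-sample error has not improved for `patience` epochs
    return best_net, best_epoch, best_net.mse(check_set)


# Hypothetical synthetic data: two scaled features, label 1 when their sum exceeds 1
rng = random.Random(42)
records = [(rng.random(), rng.random()) for _ in range(200)]
data = [((a, b), 1.0 if a + b > 1.0 else 0.0) for a, b in records]
train, stop_set, check_set = data[:120], data[120:160], data[160:]
net, epoch, check_err = train_with_early_stopping(train, stop_set, check_set)
print(f"stopped at epoch {epoch}, check-sample MSE = {check_err:.3f}")
```

Keeping the weights from the epoch with the lowest stopping-sample error is what limits overfitting here; because the check sample played no part in that choice, its error is a fairer estimate of how the model generalizes. Even this toy network needs many passes over the data, a small-scale view of the computational effort listed among the ANN disadvantages.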
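The distance-function difficulty discussed in Section 3.1 can be made concrete with a small k-NN classifier over mixed crash variables. This sketch uses one common heuristic, a heterogeneous Euclidean-overlap distance (range-normalized differences for continuous variables, a 0/1 mismatch for categorical ones); it is not the ED or VDM setup of Nukoolkit and Chen (2001). The records, variables, labels, k = 3, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_ranges(records, kinds):
    """Range (max - min) of each continuous column; None for categorical columns."""
    ranges = []
    for i, kind in enumerate(kinds):
        if kind == "num":
            values = [r[i] for r in records]
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges


def mixed_distance(a, b, kinds, ranges):
    """Euclidean combination of range-normalized numeric gaps and 0/1 categorical mismatches."""
    total = 0.0
    for ai, bi, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            d = abs(ai - bi) / rng if rng else 0.0
        else:
            d = 0.0 if ai == bi else 1.0
        total += d * d
    return math.sqrt(total)


def knn_classify(query, records, labels, kinds, k=3):
    """Assign the majority label among the k known records closest to the query."""
    ranges = column_ranges(records, kinds)
    distances = (mixed_distance(query, r, kinds, ranges) for r in records)
    nearest = sorted(zip(distances, labels))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical crash records: (speed limit in mph, AADT, light condition, road surface)
kinds = ["num", "num", "cat", "cat"]
records = [
    (65, 30000, "dark", "wet"),
    (70, 42000, "dark", "dry"),
    (55, 12000, "day", "dry"),
    (45, 8000, "day", "dry"),
    (60, 25000, "day", "wet"),
    (40, 6000, "day", "wet"),
]
labels = ["injury", "injury", "non-injury", "non-injury", "non-injury", "non-injury"]

print(knn_classify((68, 35000, "dark", "wet"), records, labels, kinds))  # injury
print(knn_classify((42, 7000, "day", "dry"), records, labels, kinds))    # non-injury
```

Dividing each continuous gap by its range keeps AADT, measured in tens of thousands, from swamping speed; the weight of 1 for a categorical mismatch, however, is an arbitrary choice, which is exactly the subjectivity in defining the distance function that Section 3.1 points out.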
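As a minimal sketch of the idea in Section 3.2, known dependencies can be encoded as conditional probability tables and queried by enumerating the joint distribution. The network (speed and lane width as parents of crash occurrence), every probability value, and the names `joint` and `query` are hypothetical, chosen only to show the mechanics.

```python
from itertools import product

# Hypothetical prior probabilities for the two parent variables
P_SPEED = {"high": 0.3, "low": 0.7}
P_LANE = {"narrow": 0.4, "wide": 0.6}
# Hypothetical conditional probability table: P(crash = True | speed, lane width)
P_CRASH = {
    ("high", "narrow"): 0.08,
    ("high", "wide"): 0.05,
    ("low", "narrow"): 0.03,
    ("low", "wide"): 0.01,
}


def joint(speed, lane, crash):
    """Joint probability factorized along the network: P(S) * P(L) * P(C | S, L)."""
    p_crash = P_CRASH[(speed, lane)]
    return P_SPEED[speed] * P_LANE[lane] * (p_crash if crash else 1.0 - p_crash)


def query(var, value, evidence):
    """P(var = value | evidence), computed by enumerating every full assignment."""
    num = den = 0.0
    for speed, lane, crash in product(P_SPEED, P_LANE, (True, False)):
        world = {"speed": speed, "lane": lane, "crash": crash}
        if any(world[name] != v for name, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(speed, lane, crash)
        den += p
        if world[var] == value:
            num += p
    return num / den


# Predictive query: crash probability on a narrow-lane segment
print(f"{query('crash', True, {'lane': 'narrow'}):.3f}")  # 0.045
# Diagnostic query: probability that speed was high given that a crash occurred
print(f"{query('speed', 'high', {'crash': True}):.3f}")   # 0.596
```

Enumeration is exponential in the number of variables, so it only suits toy networks like this one; larger belief networks would rely on dedicated inference algorithms, but the conditional probability tables play the same role.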
References

Abdelwahab, H. T., & Abdel-Aty, M. A. (2001). Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Journal of the Transportation Research Board, 1746, 6–13.
Fridstrom, L., Ifver, J., Ingebrightsen, S., Kulmala, R., & Thomsen, L. (1995). Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accident Analysis and Prevention, 27(1), 1–20.
Hattori, K., & Takahashi, M. (1999). A new nearest-neighbor rule in the pattern classification problem. Pattern Recognition, 32, 425–432.
Jovanis, P., & Chang, H. (1986). Modeling the relationship of accidents to miles traveled. Journal of the Transportation Research Board, 1068, 42–51.
Miaou, S., Hu, P., Wright, T., Rathi, A. K., & Davis, S. (1992). Relationship between truck accidents and highway geometric design: A Poisson regression approach. Journal of the Transportation Research Board, 1376, 10–18.
Musone, L., Ferrari, A., & Oneta, M. (1999). An analysis of urban collisions using an artificial intelligence model. Accident Analysis and Prevention, 31, 705–718.
Nukoolkit, C., & Chen, H. C. (2001). Improving accuracy of nearest neighbor algorithm in highway accident prediction. ANNIE '2001 for the proceedings of smart engineering system design (vol. 11) (pp. 763–768).
Vogt, A., & Bared, J. G. (1998). Accident models for two-lane rural roads: Segments and intersections. Publication FHWA-RD-98-133. FHWA, U.S. Department of Transportation.

Thobias Sando is a Ph.D. candidate at Florida State University. He is involved in a research project entitled "Implementation of GIS for Crash Data Management." His research interests include the use of pattern recognition techniques and GIS in crash modeling. He also performs research on the usability of GPS receivers for collecting field crash data.

Renatus Mussa is an Associate Professor and Director of the Traffic Engineering Laboratory at the FAMU-FSU College of Engineering. Dr. Mussa has been teaching with the Department of Civil and Environmental Engineering of the FAMU-FSU College of Engineering since 1998. He has excelled in research in different areas of transportation engineering, including intelligent transportation systems, highway safety, and traffic studies.

John Sobanjo is an Associate Professor at the FAMU-FSU College of Engineering. His research interests include infrastructure management, implementation of new technology in highway applications, construction management, and highway safety. Apart from academics, Dr. Sobanjo has extensive industrial experience gained from working with the California Department of Transportation (CALTRANS) and the Texas State Department of Highways and Public Transportation (TxDOT).

Lisa Spainhour is an Associate Professor at the FAMU-FSU College of Engineering. Her research interests include field performance of roadside barriers, civil engineering applications of composite materials, and engineering data modeling and management.
