To cite this article: Chenye Qiu, Chunlu Wang, Binxing Fang & Xingquan Zuo (2014) A Multiobjective
Particle Swarm Optimization-Based Partial Classification for Accident Severity Analysis, Applied
Artificial Intelligence: An International Journal, 28:6, 555-576
Downloaded by [Carnegie Mellon University] at 23:05 05 November 2014
Applied Artificial Intelligence, 28:555–576, 2014
Copyright © 2014 Taylor & Francis Group, LLC
ISSN: 0883-9514 print/1087-6545 online
DOI: 10.1080/08839514.2014.923166
1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
2 Key Laboratory of Trustworthy Distributed Computing and Service of the Ministry of Education of China, Beijing University of Posts and Telecommunications, Beijing, China
Reducing accident severity is an effective way to improve road safety. In this article, a novel
multiobjective particle swarm optimization (MOPSO)-based partial classification method is employed
to identify the contributing factors that impact accident severity. The accident dataset contains
only a few fatal accidents but the patterns of fatal accidents are of great interest to traffic agen-
cies. Partial classification can deal with the unbalanced dataset by producing rules for each class.
The rules can be evaluated by several conflicting criteria such as accuracy and comprehensibility.
A MOPSO is applied to discover a set of Pareto optimal rules. The accident data of Beijing between
2008 and 2010 are used to build the model. The proposed approach is compared with several rule-
learning algorithms. The results show that the proposed approach can generate a set of accurate
and comprehensible rules, which can indicate the relationship between risk factors and accident
severity.
INTRODUCTION
Traffic accidents have been one of the leading causes of death and injury worldwide, resulting in an estimated 1.2 million deaths and 50 million injuries each year (World Health Organization 2009). Reducing
accident severity is an effective way to improve road safety. This study
employs a novel multiobjective particle swarm optimization (MOPSO)-
based partial classification technique to identify the risk factors that can
significantly influence accident severity.
Up to now, many researchers have attempted to develop accident sever-
ity analysis models. Regression analysis has been widely used to determine
the risk factors that can influence injury severity. Among these regression
models, the logistic regression model and the ordered outcome model have
been the most commonly used models in traffic accident severity analy-
sis (Al-Ghamdi 2002; Milton, Shankar, and Mannering 2008; Bédard et al.
2002; Yau, Lo, and Fung 2006; Yamamoto and Shankar 2004; Kockelman and
Kweon 2002). However, most regression models have their own assumptions
and predefined underlying relationships between dependent and indepen-
dent variables (i.e., linear relations between the variables; Chang and Wang
2006). If these assumptions are violated, the model could lead to erroneous
estimations of the likelihood of severe injury.
In order to deal with this problem, some nonparametric methods without such assumptions have been applied. Partial classification, also known as
nugget discovery, is the problem of finding simple rules that represent strong
descriptions of a specified class, even when that class has few cases in
the database (Ali, Manganaris, and Srikant 1997). This data mining task
is particularly useful when some of the classes in a database are minority
classes.
Association rules can be used to solve partial classification problems (Ali, Manganaris, and Srikant 1997). Previous studies have employed association rules to find meaningful relationships between crash characteristics and accident severity, in which the Apriori algorithm is used to extract a set of classification rules. In association rule algorithms, the choice of the support and confidence thresholds is very important, but remains underinvestigated.
● The K -means algorithm is used to find the global best guide from the
nondominated rules during each run of the algorithm. This global best
selection method can ensure that the rules spread over the nondominated
front.
Attribute           Value                     Code
                    [50,100)                  2
                    [100,200]                 3
                    >200                      4
Seat belt           not wear                  1
                    wear                      2
Driver              professional              1
                    nonprofessional           2
                    without license           3
Vehicle             small passenger car       1
                    large passenger car       2
                    small truck               3
                    large truck               4
                    other                     5
Collision type      sideswipe                 1
                    pedestrian-vehicle        2
                    rear-end                  3
                    head-on                   4
                    fixed-object              5
                    rollover                  6
Accident cause      follow too close          1
                    fail to brake             2
                    drunk driving             3
                    fail to yield             4
                    drive in the wrong lane   5
                    speeding                  6
Weather             sunny                     1
                    rain                      2
                    other                     3
Gender              male                      1
                    female                    2
Age                 <26                       1
                    [26,55]                   2
                    >55                       3
Accident severity   nonfatal                  0
                    fatal                     1
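As an illustration, the coding scheme above can be expressed as a simple lookup table. This is a sketch: the attribute keys and the `encode` helper are illustrative names, and the attribute whose name is truncated in this copy is omitted.

```python
# Categorical coding scheme from the attribute table (subset shown).
CODING = {
    "seat_belt": {"not wear": 1, "wear": 2},
    "driver": {"professional": 1, "nonprofessional": 2, "without license": 3},
    "weather": {"sunny": 1, "rain": 2, "other": 3},
    "gender": {"male": 1, "female": 2},
    "severity": {"nonfatal": 0, "fatal": 1},
}

def encode(record):
    """Map a raw crash record to its integer-coded form."""
    return {attr: CODING[attr][value] for attr, value in record.items()}

crash = {"seat_belt": "wear", "weather": "rain", "gender": "female", "severity": "nonfatal"}
print(encode(crash))  # {'seat_belt': 2, 'weather': 2, 'gender': 2, 'severity': 0}
```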
There are two main reasons for using partial classification to analyze the
risk factors and accident severity. First, partial classification can deal with the
unbalanced dataset: in the accident data, there are relatively few fatal accidents, and traditional classification methods can achieve high overall classification accuracy even when the fatal accidents are misclassified (Chang and Wang 2006).
However, the patterns of the fatal accidents are very important in order
for traffic departments to take measures to reduce accident severity. Partial
classification targets finding rules for a particular class. Second, the rules
generated by partial classification are easy to understand. This technique
doesn’t seek to find all the rules, but a set of comprehensible rules for users.
In accident severity analysis, the data were divided into two classes: fatal acci-
dents and nonfatal accidents. Partial classification was used to discover rules
for both classes. These rules can represent strong descriptions of each class.
PARTIAL CLASSIFICATION
Basic Concepts
In data mining, classification is a task of building a model that distin-
guishes items belonging to different classes, on the basis of a set of training
data whose class label is known. There are various kinds of classification
techniques, such as logistic regression, neural networks, decision trees, and
so forth. These models are basically evaluated according to their predictive
accuracy on the unseen test dataset. They don’t emphasize producing knowl-
edge that can be easily understood by end users. However, in many real-life
problems, complete classification may be infeasible and undesirable because
of the dataset characteristics or the user's preference (Ali, Manganaris, and Srikant 1997).
In contrast, partial classification aims at discovering knowledge of data
classes, but might not cover all the classes or all the examples of a given
class. In partial classification, the main goal is to learn accurate rules that
can indicate characteristics of data classes. Partial classification can provide
valuable knowledge about the dataset and help the user’s decision making.
Association rule mining is similar to partial classification. They both aim
to obtain a set of simple and comprehensible rules. The main difference
between them is that the consequent of association rule mining is not a par-
ticular class, but can be any conjunction of attributes. By fixing the class
attribute as the consequent, association rule mining can be used to solve
partial classification problems (Ali, Manganaris, and Srikant 1997).
Simple Rules
The task of partial classification is to find a set of simple rules that can
indicate some relationship between the risk factors and accident severity.
Partial classification can produce rules for each class that represent a strong description of that class, even when the class, such as the fatal accidents, has very few cases in the database.
Rules produced by partial classification are of the form: “if A then C,”
where A is the rule antecedent that comprises a conjunction of attributes,
and C is the rule consequent, specifying the value of the goal attribute.
In partial classification, the consequent is a fixed value (i.e., the class attribute), and samples with the same goal attribute value are grouped together. In accident severity analysis, the goal attribute is accident severity, which can take the value "fatal accident" or "nonfatal accident." By fixing the consequent on each class in turn, rules indicating an increased chance of that accident severity can be obtained.
Rule Evaluation
The discovered rules should have high predictive accuracy and high
comprehensibility. A very common way to measure rule accuracy is to calculate the proportion of cases satisfying both the antecedent and the consequent among all cases in which the antecedent holds. This metric is simple and easy to implement.
It is defined as:
PA = \frac{|A \& C| - 1/2}{|A|},   (1)
where |A| is the number of instances satisfying all the conditions in the
antecedent A and |A&C| is the number of records that satisfy both the
antecedent A and consequent C.
There are many ways to measure a rule’s comprehensibility. In gen-
eral, the shorter the rule, the more comprehensible it is. The standard
way to measure comprehensibility is to count the number of attributes in
the antecedent part. The comprehensibility decreases when the number
increases. The comprehensibility (COM) of a rule is calculated as follows:
COM = 1 - \frac{num}{m},   (2)
where num is the number of attributes in the antecedent part of the rule and
m is the number of total attributes.
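Both evaluation metrics are straightforward to compute once the match counts for a rule are known. The sketch below assumes the counts |A| and |A&C| have already been obtained by scanning the dataset; the function names are illustrative.

```python
def predictive_accuracy(n_antecedent: int, n_both: int) -> float:
    """Eq. (1): PA = (|A&C| - 1/2) / |A|."""
    if n_antecedent == 0:
        return 0.0  # a rule covering no records carries no evidence
    return (n_both - 0.5) / n_antecedent

def comprehensibility(num_attributes: int, m_total: int) -> float:
    """Eq. (2): COM = 1 - num / m."""
    return 1.0 - num_attributes / m_total

# A rule matched by 40 records, 38 of which also satisfy the consequent,
# with 3 of 12 attributes in its antecedent:
print(predictive_accuracy(40, 38))   # 0.9375
print(comprehensibility(3, 12))      # 0.75
```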
These two metrics measure different features of a rule, and they are
noncommensurable and conflicting. Rules with high accuracy might not be
comprehensible, whereas rules with high comprehensibility might not be
accurate. Traditional ways to deal with the conflicting criteria are to weigh
all the objectives or give preference to objectives (lexicographic). These
methods all need some a priori information, which is difficult to decide
beforehand, and they can obtain only one solution at a time. By contrast,
the Pareto dominance approach can generate a set of optimal solutions that
approximate the Pareto front in a single run of the algorithm.
where x is the decision vector, X is the decision space, y is the objective vector,
and Y is the objective space.
A vector x_k is said to dominate another vector x_l, denoted x_k \succ x_l, if and only if

\forall i \in \{1, 2, \ldots, n\}: f_i(x_k) \ge f_i(x_l) \quad \text{and} \quad \exists i \in \{1, 2, \ldots, n\}: f_i(x_k) > f_i(x_l).   (4)
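Under the maximization convention of Equation (4), the dominance test can be sketched as follows (objective vectors here are illustrative (PA, COM) pairs):

```python
def dominates(fk, fl):
    """True if objective vector fk Pareto-dominates fl (Eq. 4):
    fk is at least as good in every objective and strictly better in one."""
    return (all(a >= b for a, b in zip(fk, fl))
            and any(a > b for a, b in zip(fk, fl)))

print(dominates((0.9, 0.8), (0.9, 0.6)))  # True
print(dominates((0.9, 0.5), (0.8, 0.6)))  # False: the two rules are incomparable
```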
v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1^{t} (pbest_{id}^{t} - x_{id}^{t}) + c_2 r_2^{t} (gbest_{d}^{t} - x_{id}^{t}),   (5)

x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1},   (6)

where v_{id}^{t} is the dth dimension of the velocity of particle i in cycle t; x_{id}^{t} is the dth dimension of the position of particle i in cycle t; pbest_{id}^{t} is the dth dimension of the personal best position of particle i in cycle t; gbest_{d}^{t} is the dth dimension of the global best position in cycle t; and \omega is the inertia weight, which plays an important role in balancing global and local search.
A large inertia weight promotes global search, and a small inertia weight is more appropriate for local search. The value is typically set between 0 and 1. The cognitive weight is c_1 and the social weight is c_2; r_1^t and r_2^t are two random numbers.
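A minimal sketch of one update cycle per Equations (5) and (6). Clamping positions back into [0, 1] is an assumption added here so that the rule encoding described below stays valid; the original article may handle boundary violations differently.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.0, c2=1.0, rng=random):
    """Apply Eqs. (5)-(6) to one particle; returns (new position, new velocity)."""
    new_v, new_x = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        xd = min(1.0, max(0.0, x[d] + vd))  # clamp to the encoding range [0, 1]
        new_v.append(vd)
        new_x.append(xd)
    return new_x, new_v
```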
Rule Representation
Each particle in the population represents a partial classification rule
in the form of “if A then C.” Only the antecedent is shown in the particle.
The consequent part of the rule is predefined in each run of the algo-
rithm. Hence, it is not represented in the encoding. If there are m decision
attributes, the size of a particle is m. The encoding is shown in Figure 2. The
ith attribute is A_i, and each A_i lies in the range [0, 1].
In order to form a rule, the bit string should be translated into the
original information. The translation is as follows:
where V_i is the value translated from the particle for the ith attribute, Count_i is the total number of distinct values of the ith attribute, and int(x) denotes the biggest integer smaller than x. If V_i is 0, the ith attribute is absent from the rule. Hence, the rules in this study are variable-length, and the antecedent part consists of at least one attribute.
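Equation (7) itself did not survive in this copy, so the decoding below is only a plausible reconstruction of the description above (a floor-style truncation, with a decoded value of 0 meaning the attribute is absent); the exact formula should be checked against the original article.

```python
import math

def decode(particle, counts):
    """Translate a particle in [0,1]^m into antecedent conditions.

    counts[i] is the number of distinct values of attribute i. A decoded
    value of 0 means attribute i is absent, so rules are variable-length.
    """
    conditions = {}
    for i, (a, count) in enumerate(zip(particle, counts)):
        # "biggest integer smaller than x", clamped to the valid value range
        v = max(0, min(count, math.ceil(a * (count + 1)) - 1))
        if v > 0:
            conditions[i] = v
    return conditions

print(decode([0.0, 0.5, 1.0], [4, 4, 4]))  # {1: 2, 2: 4}
```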
E = \sum_{i=1}^{K} \sum_{f \in C_i} \lVert f - m_i \rVert^2,   (8)

where E is the sum of the squared error over all solutions in the archive; f is the point in objective space representing a given solution; and m_i is the mean of cluster C_i.
In each cluster, find the solution nearest to the centroid and consider it as
the representative solution of the cluster.
After the clustering process, the clusters are separated from each other and
the solutions in the same cluster are similar. The representative solutions lie
in diverse regions of the nondominated front. The next step is to choose
gbest from the K representative particles for each particle in the population.
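The clustering step can be sketched as a plain K-means over the archive's objective vectors, returning each cluster's member nearest the centroid as its representative. Helper names and the seeded initialization are illustrative, not taken from the article.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def kmeans_representatives(points, k, cycles=20, seed=0):
    """Cluster nondominated solutions and return one representative per
    non-empty cluster: the member nearest its centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(cycles):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # empty clusters keep their previous centroid
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return [min(c, key=lambda p: sq_dist(p, centroids[j]))
            for j, c in enumerate(clusters) if c]
```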
p_i = \frac{1/num_i}{\sum_{j=1}^{K} 1/num_j},   (9)
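With Equation (9), a cluster containing fewer particles is more likely to supply the gbest, pushing the swarm toward sparsely populated regions of the front. A sketch, assuming num_i counts the particles associated with representative i:

```python
import random

def choose_gbest(representatives, nums, rng=random):
    """Roulette-wheel selection per Eq. (9): weight 1/num_i, normalized."""
    weights = [1.0 / n for n in nums]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for rep, w in zip(representatives, weights):
        acc += w
        if r <= acc:
            return rep
    return representatives[-1]  # guard against floating-point round-off
```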
w^{t} = w_{max} - \frac{t}{maxgen} (w_{max} - w_{min}),   (10)
where wmax and wmin are the maximum and minimum values of the inertia
weight; t is the current iteration and maxgen is the maximum iteration of
the algorithm. In this study, we adopted this time-decreasing inertia weight.
Therefore, the algorithm explores the search space initially and later focuses
on the most promising regions.
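The schedule in Equation (10) is a one-liner; with the experiment's settings it decays from 0.9 at t = 0 to 0.4 at t = maxgen.

```python
def inertia(t, maxgen, w_max=0.9, w_min=0.4):
    """Eq. (10): inertia weight decreasing linearly from w_max to w_min."""
    return w_max - (t / maxgen) * (w_max - w_min)
```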
EXPERIMENT RESULTS
Comparative Study
KMOPSO uses 100 particles in the population and runs 500 generations.
The maximum size of the archive is 100. The learning factors c_1 and c_2 are both set to 1.0; w is time decreasing in order to balance global and local search, with w_max = 0.9 and w_min = 0.4. The number of cycles in the
K -means algorithm is 20. The number of clusters (K ) is time varying. It is
shown as follows:
K = \begin{cases} |A| & \text{when } 0 < |A| \le 3 \\ 3 & \text{when } 3 < |A| \le 10 \\ 5 & \text{when } 10 < |A| \le 30 \\ 10 & \text{when } 30 < |A| \le 100 \end{cases}   (11)
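Equation (11) maps the current archive size |A| to a cluster count (the archive is capped at 100 solutions); as a sketch:

```python
def num_clusters(archive_size):
    """Eq. (11): time-varying K as a function of archive size |A|."""
    if archive_size <= 3:
        return archive_size
    if archive_size <= 10:
        return 3
    if archive_size <= 30:
        return 5
    return 10  # 30 < |A| <= 100

print(num_clusters(2), num_clusters(7), num_clusters(20), num_clusters(64))  # 2 3 5 10
```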
Class      KMOPSO   PART   RIPPER   C4.5
Nonfatal   7        90     1        33
Fatal      5        52     6        26
Total      12       142    7        59
Among the comparison algorithms, PART has previously been used in accident severity analysis based on accident data from Ethiopia (Beshah and Hill 2010). That research focused only on road-related factors; it did not consider the
Metric   Statistic   KMOPSO   PART     RIPPER   C4.5
PA       Best        0.9737   1        NA       1
         Average     0.8753   0.827    0.6471   0.7052
         Worst       0.7607   0.7317   NA       0.5
C        Best        0.9167   0.9167   NA       0.8333
         Average     0.7857   0.625    1        0.691
         Worst       0.75     0.1667   NA       0.5
PA(T)    Best        1        1        NA       1
         Average     0.7956   0.754    0.6314   0.5933
         Worst       0.6696   0.6861   NA       0
For each metric, the best, average, and worst values are shown.
Because RIPPER produced only one default rule for the nonfatal class, it
is not considered here. In the rules generated by the other three algorithms,
PART and C4.5 both produce rules with best predictive accuracy for the
training set, but the average performance of KMOPSO is better than PART
and C4.5. In the testing set, KMOPSO also shows the best predictive accu-
racy. With respect to comprehensibility, the rules generated by KMOPSO
are much more understandable, which can be illustrated by the higher C
value than PART and C4.5. It should be noted that even the worst C value of KMOPSO is 0.75, which is higher than the average of PART and C4.5; the worst C value of PART is only 0.1667. A typical KMOPSO rule, by contrast, has only three attributes in its antecedent, which makes it much easier to understand.
With regard to the rules for fatal accidents, C4.5 places first in terms
of accuracy in the training set. KMOPSO is behind C4.5 and PART, only
slightly better than RIPPER. However, C4.5 and PART show poor generaliza-
tion ability; their predictive accuracies in the testing set are both very low.
KMOPSO shows good generalization ability, because the predictive accuracy
in the testing set is even higher than in the training set. With respect to
comprehensibility, KMOPSO does considerably better than the other three
algorithms, because the C value of KMOPSO is much larger.
From these results, we can see that KMOPSO performs well on the traffic
accident dataset in terms of accuracy and comprehensibility. There are three
main advantages of KMOPSO compared with the other algorithms:
● KMOPSO mined a small set of rules with high accuracy and comprehensibility. Too many rules would hinder the users' decision making; it is difficult for them to find useful knowledge in a large set of rules. In contrast, a small set of rules is easy for users to understand. RIPPER also obtained a small set of rules, but it produced only a default rule for the nonfatal class.
● The rules mined by KMOPSO show good generalization ability. The rules show good predictive accuracy in the testing set; in the fatal-accident dataset, the accuracy in the testing set is even higher than in the training set. In contrast, the generalization ability of PART and C4.5 is relatively poor.
● The rules obtained by KMOPSO show good comprehensibility. This advantage can be attributed to the adoption of multiobjective optimization: comprehensibility is chosen as one of the objectives to be optimized.
In the five rules for class 1, two rules show good accuracy, over 80%.
Rule 8 has only two predictors in its antecedent. The remaining three rules, though relatively lower in predictive accuracy, all show great comprehensibility, with only one attribute in the antecedent.
Based on these rules, a detailed discussion on the risk factors that
affect injury severity is given below. From the rules for the nonfatal acci-
dents, we can find that the following factors can effectively reduce accident
severity:
Road dividers are a key factor affecting accident severity. This attribute is present
in four rules. Three rules with very high accuracy show that when the road has a median strip and motor and nonmotor traffic are divided, the accidents
are less severe. A median strip can divide high-volume traffic flow in
opposite lanes. It can restrict turns and reduce the number of head-on
crashes. This is consistent with the research of the U.S. Federal Highway
Administration (1993).
Mixed traffic is very common in Beijing, and this study shows that mixed traffic is dangerous because bicycles are vulnerable under mixed traffic conditions.
Road protection facilities show a significant effect on accident severity. Two rules show that with protection facilities, accident severity decreases.
Driver’s gender. Four rules show that accidents caused by female drivers tend
to be less severe. In the original dataset, there are many more accidents involving male drivers than female drivers, but the partial classification technique can still discover that women's accidents are less likely
to be fatal accidents. This is because men, especially young men, tend to
drive more aggressively than women, which makes them more likely to
be involved in fatal accidents.
Seat belts. Two rules show that seat belts can be helpful to reduce accident
severity. Safety belts can prevent death in many accidents.
From the rules for the fatal accidents, we can find that the following
factors are associated with fatal accidents:
In rainy weather, because of blurred vision and the wet road surface, speeding is extremely dangerous.
CONCLUSIONS
This article proposed a novel KMOPSO-based partial classification tech-
nique to analyze traffic accident data in order to find the contributing
factors that influence accident severity. Unlike many other accident analy-
sis models, which aim at building classifiers, partial classification seeks to
find knowledge that can indicate some relationships between risk factors
and accident severity. The traffic accident data of Beijing were collected
to build the model. The experiment results show that the rules extracted by KMOPSO have higher accuracy than those of the PART algorithm. Also, the
rules generated by KMOPSO are much simpler and easier for end users
to understand. Thus, the proposed approach can provide useful decision support for accident severity analysis tasks. From the extracted rules,
it can be seen that road dividers, protection facilities, weather, gender,
and accident causes are the main contributing factors in determining accident severity. This knowledge can be used to help take preventive measures in order to reduce accident severity and thereby improve road traffic safety.
FUNDING
The authors acknowledge support from the High Technology Research and Development Program of China (863 Program) (Grant No. 2009AA04Z120).
REFERENCES
Alatas, B., and E. Akin. 2009. Multi-objective rule mining using a chaotic particle swarm optimization
algorithm. Knowledge-Based Systems 22(6):455–460.
Ali, K., S. Manganaris, and R. Srikant. 1997. Partial classification using association rules. In Proceedings
of the third international conference on knowledge discovery and data mining, 115–118. Menlo Park, CA: AAAI Press.
Al-Ghamdi, A. S. 2002. Using logistic regression to estimate the influence of accident factors on accident
severity. Accident Analysis and Prevention 34(6):729–741.
Bédard, M., G. H. Guyatt, M. J. Stones, and J. P. Hirdes. 2002. The independent contribution of driver,
crash, and vehicle characteristics to driver fatalities. Accident Analysis and Prevention 34(6):717–727.
Beshah, T., and S. Hill. 2010. Mining road traffic accident data to improve safety: Role of road-related
factors on accident severity in Ethiopia. In Proceedings of AAAI artificial intelligence for development,
Witten, I. H., and E. Frank. 2005. Data mining: Practical machine learning tools and techniques (2nd ed.). San
Francisco, CA: Morgan Kaufmann.
Yamamoto, T., and V. N. Shankar. 2004. Bivariate ordered-response probit model of driver’s and passen-
ger’s injury severities in collisions with fixed objects. Accident Analysis and Prevention 36(5):869–876.
Yang, J., J. Zhou, L. Liu, and Y. Li. 2009. A novel strategy of Pareto-optimal solution searching in
multi-objective particle swarm optimization (MOPSO). Computers and Mathematics with Applications
57(11–12):1995–2000.
Yau, K. K. W., H. P. Lo, and S. H. H. Fung. 2006. Multiple-vehicle traffic accidents in Hong Kong. Accident
Analysis and Prevention 38(6):1157–1161.