This article was downloaded by: [Carnegie Mellon University]

On: 05 November 2014, At: 23:05


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Applied Artificial Intelligence: An International Journal
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/uaai20

A Multiobjective Particle Swarm Optimization-Based Partial Classification for Accident Severity Analysis

Chenye Qiu(a), Chunlu Wang(b), Binxing Fang(b) & Xingquan Zuo(b)

(a) School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
(b) Key Laboratory of Trustworthy Distributed Computing and Service of the Ministry of Education of China, Beijing University of Posts and Telecommunications, Beijing, China
Published online: 14 Jul 2014.

To cite this article: Chenye Qiu, Chunlu Wang, Binxing Fang & Xingquan Zuo (2014) A Multiobjective
Particle Swarm Optimization-Based Partial Classification for Accident Severity Analysis, Applied
Artificial Intelligence: An International Journal, 28:6, 555-576

To link to this article: http://dx.doi.org/10.1080/08839514.2014.923166


Applied Artificial Intelligence, 28:555–576, 2014
Copyright © 2014 Taylor & Francis Group, LLC
ISSN: 0883-9514 print/1087-6545 online
DOI: 10.1080/08839514.2014.923166

A MULTIOBJECTIVE PARTICLE SWARM OPTIMIZATION-BASED PARTIAL CLASSIFICATION FOR ACCIDENT SEVERITY ANALYSIS

Chenye Qiu1 , Chunlu Wang2, Binxing Fang2, and Xingquan Zuo2



1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
2 Key Laboratory of Trustworthy Distributed Computing and Service of the Ministry of Education of China, Beijing University of Posts and Telecommunications, Beijing, China

Reducing accident severity is an effective way to improve road safety. In this article, a novel
multiobjective particle swarm optimization (MOPSO)-based partial classification method is employed
to identify the contributing factors that impact accident severity. The accident dataset contains
only a few fatal accidents but the patterns of fatal accidents are of great interest to traffic agen-
cies. Partial classification can deal with the unbalanced dataset by producing rules for each class.
The rules can be evaluated by several conflicting criteria such as accuracy and comprehensibility.
A MOPSO is applied to discover a set of Pareto optimal rules. The accident data of Beijing between
2008 and 2010 are used to build the model. The proposed approach is compared with several rule-
learning algorithms. The results show that the proposed approach can generate a set of accurate
and comprehensible rules, which can indicate the relationship between risk factors and accident
severity.

INTRODUCTION
Traffic accidents have been one of the leading causes of death and
injury worldwide, resulting in an estimated 1.2 million deaths and 50 million
injuries each year (World Health Organization 2009). Reducing
accident severity is an effective way to improve road safety. This study
employs a novel multiobjective particle swarm optimization (MOPSO)-
based partial classification technique to identify the risk factors that can
significantly influence accident severity.
Up to now, many researchers have attempted to develop accident sever-
ity analysis models. Regression analysis has been widely used to determine

Address correspondence to Chenye Qiu, Nanjing University of Posts and Telecommunications,


No.66 Xin Mofan Road, Nanjing 210046, China. E-mail: qiuchenye@gmail.com

the risk factors that can influence injury severity. Among these regression
models, the logistic regression model and the ordered outcome model have
been the most commonly used models in traffic accident severity analy-
sis (Al-Ghamdi 2002; Milton, Shankar, and Mannering 2008; Bédard et al.
2002; Yau, Lo, and Fung 2006; Yamamoto and Shankar 2004; Kockelman and
Kweon 2002). However, most regression models have their own assumptions
and predefined underlying relationships between dependent and indepen-
dent variables (i.e., linear relations between the variables; Chang and Wang
2006). If these assumptions are violated, the model could lead to erroneous
estimations of the likelihood of severe injury.
In order to deal with this problem, some nonparametric methods without
any predefined underlying relationship between the target (dependent)
variable and the predictors (independent variables) are proposed in order
to identify the risk factors affecting injury severity levels in traffic accidents;
these include neural network, Bayesian network, classification and regres-
sion tree (CART), and a rule induction algorithm based on partial decision
trees (PART) (Dursun, Ramesh, and
Max 2006; Oña et al. 2011; Chang and Wang 2006; Beshah and Hill 2010).
These classification models focus on building an accurate classifier, as mea-
sured by classification error rate. One drawback of these models is that
they may still achieve high overall accuracy even when a large number of
instances in the minority class (i.e., the class with few representative cases in
the database) are misclassified. This may cause a serious problem when the
minority class is of particular interest. Accident severity analysis is a prob-
lem in which the original dataset contains only a very small number of
fatal accident examples although the patterns of those examples are impor-
tant to traffic management departments. Rules indicating an increased
chance of fatal accidents are therefore very useful for understanding accident
severity. However, the models mentioned previously fail to produce patterns of
the fatal accidents, which are of great importance to this problem.
Additionally, some classification models (e.g., the neural network
model) are “black box” models. They cannot represent knowledge in
a way that end users can understand (Beshah and Hill 2010). The PART
and CART algorithms can generate rules, but in the case of large databases,
unless some special techniques are adopted, these algorithms may generate
too many rules and most of the rules cover very few cases, hence their use
as descriptive patterns is limited (Iglesia et al. 2006). In an accident sever-
ity analysis task, the user prefers a set of rules of high accuracy and good
comprehensibility that can indicate the relationship between risk factors and
injury severity, rather than all the rules. Too many rules would even hinder
the user’s decision making (Freitas 1999).
These problems can be addressed by specifically targeting each class,
using partial classification techniques. Partial classification, also known as
nugget discovery, is the problem of finding simple rules that represent strong
descriptions of a specified class, even when that class has few cases in
the database (Ali, Manganaris, and Srikant 1997). This data mining task
is particularly useful when some of the classes in a database are minority
classes.
Association rules can be used to solve partial classification problems (Ali,
Manganaris, and Srikant 1997). Previous studies have employed association
rules to find meaningful relationships between crash characteristics and
accident severity, in which the Apriori algorithm is used to extract a set of
classification rules. In association rule algorithms, the choice of the support
and confidence thresholds is very important, but it remains under-investigated.
In fact, partial classification can be modeled as a multiobjective (MO)
problem because the rules can be evaluated according to several different
or even conflicting criteria. Classification rules can be highly accurate but
not comprehensible or can be comprehensible but less accurate. This trade-
off between accuracy and comprehensibility motivates the application of
MO metaheuristic algorithms to solve partial classification. By employing
the Pareto approach, a set of optimal rules can be obtained automatically,
without any predefined threshold.
Several MO algorithms have been applied to partial classification prob-
lems (Reynolds and Iglesia 2009; Alatas and Akin 2009; Dehuri and Mall
2006). MO-based partial classification has been shown to be a powerful tool in
many types of rule mining tasks. Results of applying the MO metaheuris-
tic algorithms across a range of datasets from the UCI repository showed
that the MO algorithm can generate accurate and comprehensible rules,
but there are few MO algorithm-based partial classification methods for solv-
ing real-world problems. In this study, we use the MO algorithm to solve the
accident severity analysis problem and investigate whether this technique is
suitable for this problem.
A novel MO particle swarm optimization algorithm with K -means guide
selection strategy (KMOPSO) is proposed to identify risk factors affecting
accident severity. To the best of our knowledge, we are the first to use an
MO algorithm-based partial classification technique to analyze the accident
severity problem. The goal of this work is to discover a set of accurate and
comprehensible rules that can indicate the relationship between risk factors
and accident severity. The proposed algorithm has several distinctive features:

● Rule accuracy and comprehensibility are considered simultaneously by adopting the MO algorithm.
● The Pareto approach is employed to choose only the nondominated rules for each class. Hence, this study does not need any predefined threshold such as support or confidence.
● The K-means algorithm is used to find the global best guide from the nondominated rules during each run of the algorithm. This global best selection method can ensure that the rules spread over the nondominated front.

This article is organized as follows. “Materials and Methods” introduces
the accident data and the methods used in this study. “Partial Classification”
describes the partial classification technique. “A KMOPSO Approach for
Accident Severity Analysis” presents the algorithm used for accident severity
analysis. Next, “Experimental Results” are reported. “Conclusions” are
presented in the final section.

MATERIALS AND METHODS


Accident Data
Accident data were obtained from the Beijing Traffic Management
Bureau, which included the reported accidents in Beijing between 2008 and
2010. Only accidents that happened on the nonintersection roads (i.e., the
accidents not related to intersections), were chosen to build the model
because the characteristics of nonintersection roads and intersections are
quite different, so they cannot be mixed together. The total number of
accidents obtained on nonintersection roads during this period was 3,651.
A total of 523 accidents were screened out because of missing values; each of
these records lacked at least one variable used for accident analysis. Another
23 accidents were deleted because of questionable information (i.e., records
with conflicting variables). Finally, the database contained 3,105 accident
instances, with 2,653 nonfatal accidents and 452 fatal accidents.
Twelve variables and the class variable were used to identify the risk fac-
tors contributing to the accident and injury severity. Accident data include
information on accident severity, time of accident, involved driver (e.g., age,
gender, if the driver wears seat belt), accident type, accident cause, and
involved vehicle (e.g., vehicle type). Roadway data include characteristics of
the roadway on which the accidents occurred (e.g., sight distance, divided
or undivided). Weather data include the weather conditions when the acci-
dent occurred. The description and levels of these variables are given in
Table 1.

Accident Severity Analysis by Partial Classification


Many accident severity analysis models are built to identify factors
that are associated with accident severity, using roadway variables, driver
characteristics, weather data, and accident data as predictors. Models based

TABLE 1 Description of the Variables

Variables Values Codes

Accident time day 1


night 2
Protection facilities no 1
yes 2
Road divider none 1
median strip 2
motor/nonmotor divided 3
median strip & motor/ 4
nonmotor divided
Sight distance <50 1
[50,100) 2
[100,200] 3
>200 4
Seat belt not wear 1
wear 2
Driver professional 1
nonprofessional 2
without license 3
Vehicle small passenger car 1
large passenger car 2
small truck 3
large truck 4
other 5
Collision type sideswipe 1
pedestrian-vehicle 2
rear-end 3
head-on 4
fixed-object 5
rollover 6
Accident cause follow too close 1
fail to brake 2
drunk driving 3
not yield as it has to 4
drive in the wrong lane 5
speeding 6
Weather sunny 1
rain 2
other 3
Gender male 1
female 2
Age <26 1
[26,55] 2
>55 3
Accident severity nonfatal 0
fatal 1

on the partial classification technique have not been developed. This
method focuses on finding useful knowledge that can easily be understood
by users.

There are two main reasons for using partial classification to analyze the
risk factors and accident severity. First, partial classification can deal with the
unbalanced dataset. As in the accident data, there are relatively fewer fatal
accidents. Traditional classification methods can achieve high classification
accuracy when the fatal accidents are misclassified (Chang and Wang 2006).
However, the patterns of the fatal accidents are very important in order
for traffic departments to take measures to reduce accident severity. Partial
classification targets finding rules for a particular class. Second, the rules
generated by partial classification are easy to understand. This technique
doesn’t seek to find all the rules, but a set of comprehensible rules for users.
In accident severity analysis, the data were divided into two classes: fatal acci-
dents and nonfatal accidents. Partial classification was used to discover rules
for both classes. These rules can represent strong descriptions of each class.

PARTIAL CLASSIFICATION
Basic Concepts
In data mining, classification is a task of building a model that distin-
guishes items belonging to different classes, on the basis of a set of training
data whose class label is known. There are various kinds of classification
techniques, such as logistic regression, neural networks, decision trees, and
so forth. These models are basically evaluated according to their predictive
accuracy on the unseen test dataset. They don’t emphasize producing knowl-
edge that can be easily understood by end users. However, in many real-life
problems, complete classification may be infeasible or undesirable because
of the dataset characteristics or the user’s preference (Ali, Manganaris, and
Srikant 1997).
In contrast, partial classification aims at discovering knowledge of data
classes, but might not cover all the classes or all the examples of a given
class. In partial classification, the main goal is to learn accurate rules that
can indicate characteristics of data classes. Partial classification can provide
valuable knowledge about the dataset and help the user’s decision making.
Association rule mining is similar to partial classification. They both aim
to obtain a set of simple and comprehensible rules. The main difference
between them is that the consequent of association rule mining is not a par-
ticular class, but can be any conjunction of attributes. By fixing the class
attribute as the consequent, association rule mining can be used to solve
partial classification problems (Ali, Manganaris, and Srikant 1997).

Simple Rules
The task of partial classification is to find a set of simple rules that can
indicate some relationship between the risk factors and accident severity.

Partial classification can produce rules for each class that can represent a
strong description of the particular class, even when the class has very few
fatal accidents in the database.
Rules produced by partial classification are of the form: “if A then C,”
where A is the rule antecedent that comprises a conjunction of attributes,
and C is the rule consequent, specifying the value of the goal attribute.
In partial classification, the consequent is a fixed value (i.e., the class
attribute). Samples with the same goal attribute are associated. In accident
severity analysis, the goal attribute is accident severity, which can take the
value “fatal accident” or “nonfatal accident.” By fixing each class value in turn, rules can be
obtained. These rules indicating an increased chance of accident severity
can be used to reduce accident severity.

Rule Evaluation
The discovered rules should have high predictive accuracy and high
comprehensibility. A very common way to measure rule accuracy is to cal-
culate how many cases both antecedent and consequent hold out of all cases
in which the antecedent holds. This metric is simple and easy to implement.
It is defined as:

PA = (|A&C| − 1/2) / |A| ,    (1)

where |A| is the number of instances satisfying all the conditions in the
antecedent A and |A&C| is the number of records that satisfy both the
antecedent A and consequent C.
There are many ways to measure a rule’s comprehensibility. In gen-
eral, the shorter the rule, the more comprehensible it is. The standard
way to measure comprehensibility is to count the number of attributes in
the antecedent part. The comprehensibility decreases when the number
increases. The comprehensibility (COM) of a rule is calculated as follows:
COM = 1 − num/m ,    (2)
where num is the number of attributes in the antecedent part of the rule and
m is the number of total attributes.
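As a concrete illustration, the two metrics above can be computed as follows. This is a Python sketch, not the authors' implementation; the function names, the dict-based rule encoding, and the toy records are my own:

```python
# Sketch of the rule-evaluation metrics of Equations (1) and (2).
# A rule antecedent is modeled as a dict {attribute_index: value_code};
# each record is a tuple of attribute codes with the class label last.

def predictive_accuracy(antecedent, target_class, records):
    """PA = (|A & C| - 1/2) / |A|."""
    matches_a = [r for r in records
                 if all(r[i] == v for i, v in antecedent.items())]
    if not matches_a:
        return 0.0
    matches_ac = sum(1 for r in matches_a if r[-1] == target_class)
    return (matches_ac - 0.5) / len(matches_a)

def comprehensibility(antecedent, m):
    """COM = 1 - num / m, where m is the total number of attributes."""
    return 1.0 - len(antecedent) / m

# Toy records: (accident_time, seat_belt, severity)
records = [(1, 1, 1), (1, 1, 1), (1, 2, 0), (2, 1, 1)]
rule = {1: 1}  # "if seat belt = not wear then fatal"
print(predictive_accuracy(rule, 1, records))  # (3 - 0.5) / 3 ≈ 0.833
print(comprehensibility(rule, 2))             # 1 - 1/2 = 0.5
```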
These two metrics measure different features of a rule, and they are
noncommensurable and conflicting. Rules with high accuracy might not be
comprehensible, whereas rules with high comprehensibility might not be
accurate. Traditional ways to deal with conflicting criteria are to weight
all the objectives into a single measure or to rank them by preference
(the lexicographic approach). These
methods all need some a priori information, which is difficult to decide

beforehand, and they can obtain only one solution at a time. By contrast,
the Pareto dominance approach can generate a set of optimal solutions that
approximate the Pareto front in a single run of the algorithm.

Multiobjective Rule Optimization


Many real-world optimization problems involve multiple objectives that
should be optimized simultaneously. Sometimes, these objectives are even
conflicting: improving one objective could worsen at least one other. Contrary
to a single-objective (SO) optimization problem, there is no single optimal
solution in a multiobjective (MO) optimization problem, but rather a set of
trade-off solutions known as Pareto optimal solutions.


Without loss of generality, we consider a multiobjective maximization
problem. It can be stated as:

Max y = f(x) = (f_1(x), . . . , f_n(x)),
x = (x_1, . . . , x_m) ∈ X,  y = (y_1, . . . , y_n) ∈ Y,    (3)

where x is the decision vector, X is the decision space, y is the objective vector,
and Y is the objective space.
A vector x_k is said to dominate another vector x_l (denoted x_k ≻ x_l) if:

∀i ∈ {1, 2, . . . , n} : f_i(x_k) ≥ f_i(x_l),
∃i ∈ {1, 2, . . . , n} : f_i(x_k) > f_i(x_l).    (4)

A solution xk is called Pareto optimal if there does not exist another xl ∈ X


that dominates it.
In the accident severity analysis task, the algorithm aims to maximize
both rule accuracy and comprehensibility. The concept of Pareto dominance
is illustrated in Figure 1. Of these three rules, Rule 1 is not dominated by any
other rule because it has the largest value on comprehensibility. Rule 2 is
also not dominated by any other rule because of its highest accuracy. Rule
3 is not dominated by Rule 1 for its higher accuracy but it is dominated by
Rule 2. Therefore, Rule 1 and Rule 2 form the Pareto optimal rule set.
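The dominance test of Equation (4) and the resulting Pareto filtering can be sketched as follows. This is a minimal illustration for the two-objective rule case; the numeric values are invented, not taken from Figure 1:

```python
# Pareto dominance and nondominated filtering for a maximization problem.
# Each solution is a tuple of objective values: (accuracy, comprehensibility).

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and \
           any(x > y for x, y in zip(a, b))

def nondominated(solutions):
    """The Pareto optimal set: solutions not dominated by any other."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

rules = [(0.60, 0.90),   # Rule 1: most comprehensible
         (0.95, 0.40),   # Rule 2: most accurate
         (0.80, 0.40)]   # Rule 3: dominated by Rule 2
print(nondominated(rules))  # Rules 1 and 2 survive
```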

A KMOPSO APPROACH FOR ACCIDENT SEVERITY ANALYSIS


Particle swarm optimization (PSO) is a relatively new algorithm, which is
inspired by the social behavior of bird flocking (Kennedy and Eberhart
1995). PSO has been proved to be very effective in a wide variety of optimiza-
tion problems because of its fast convergence and ease of implementation.

FIGURE 1 Concept of Pareto optimality.

The original PSO was proposed to solve single-objective problems.


Extending original PSO to multiobjective PSO requires a redefinition of the
global best (gbest) in order to obtain a set of nondominated solutions. The
gbest in the PSO has a great impact on convergence and diversity of solu-
tions. Researchers have proposed many methods to choose gbest, such as the
crowding distance (Raquel and Naval 2005), the sigma method (Mostaghim
and Teich 2003), and so forth. These methods all have some drawbacks.
For example, the crowding distance method chooses gbest only from the
sparse regions, which might decrease the convergence speed. The sigma
method may lead to premature convergence when the initial particles are
badly distributed (Yang et al. 2009). This article proposes a novel gbest selec-
tion method based on a K -means algorithm and proportional distribution
in order to lead to a diverse and uniformly distributed set of solutions.

Particle Swarm Optimization


PSO is a heuristic technique. A standard PSO includes a swarm of
particles that represent solutions of the problem. Particles fly in a multidi-
mensional search space looking for the optimal position according to their
own flying experience and the experience of the best particle in the swarm.
PSO is easy to implement and converges very fast. It has been successfully
used in many areas.
Let x_i = (x_i1, x_i2, . . . , x_iD) be the ith particle in the swarm. D is the dimen-
sion of the search space. Its current velocity is v_i = (v_i1, v_i2, . . . , v_iD). In the
basic PSO algorithm, the positions of particles are updated by the following
equations:

v_id^{t+1} = ω × v_id^t + c_1 × r_1^t × (pbest_id^t − x_id^t) + c_2 × r_2^t × (gbest_d^t − x_id^t),    (5)

x_id^{t+1} = x_id^t + v_id^{t+1},    (6)
where v_id^t is the dth dimension of the velocity of particle i in cycle t; x_id^t is
the dth dimension of the position of particle i in cycle t; pbest_id^t is the dth
dimension of the personal best position of particle i in cycle t; gbest_d^t is the
dth dimension of the global best position in cycle t; and ω is the inertia
weight, which plays an important role in balancing global and local search.
A large inertia weight promotes global search and a small inertia weight is
more appropriate for local search. The value is typically set between 0 and
1. The cognitive weight is c_1, and c_2 is the social weight; r_1^t and r_2^t are two
random numbers.
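A minimal sketch of one update step combines Equations (5) and (6) with the boundary handling used later in the KMOPSO algorithm (clamp the position and reverse the velocity). The parameter values below are illustrative only, not the experimental settings:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One velocity/position update (Equations (5) and (6)) for a particle
    whose position components live in [0, 1], as in the rule encoding."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        xd = x[d] + vd
        if xd < 0.0 or xd > 1.0:          # out of the search boundaries:
            xd = min(1.0, max(0.0, xd))   # set to the violated bound...
            vd = -vd                      # ...and search in the opposite direction
        new_x.append(xd)
        new_v.append(vd)
    return new_x, new_v

x, v = pso_step([0.5, 0.5], [0.0, 0.0], pbest=[0.8, 0.2], gbest=[0.9, 0.1])
```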

Rule Representation
Each particle in the population represents a partial classification rule
in the form of “if A then C.” Only the antecedent is encoded in the particle.
The consequent part of the rule is predefined in each run of the algo-
rithm. Hence, it is not represented in the encoding. If there are m decision
attributes, the size of a particle is m. The encoding is shown in Figure 2. The
ith attribute is A_i. Each A_i is in the range [0, 1].
In order to form a rule, the bit string should be translated into the
original information. The translation is as follows:

V_i = int(A_i × (Count_i + 1)),    (7)

where V_i is the value translated from the particle for the ith attribute,
Count_i is the total number of distinct values of the ith attribute, and int(x)
is the largest integer smaller than x. If V_i is 0, the ith attribute is absent
from the rule. Hence, the rules in this study are variable-length; the
antecedent part consists of at least one attribute.
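Equation (7) can be sketched as follows, assuming int(·) truncates (with the A_i = 1.0 edge case clamped). The decode function, the dict encoding of the rule, and the example values are my own illustration:

```python
def decode(particle, counts):
    """Translate a particle position into a rule antecedent (Equation (7)).
    counts[i] is the number of distinct values of the ith attribute."""
    rule = {}
    for i, (a, count) in enumerate(zip(particle, counts)):
        v = min(int(a * (count + 1)), count)  # truncate; clamp the a = 1.0 case
        if v > 0:               # V_i = 0 means attribute i is absent
            rule[i] = v         # otherwise attribute i takes its vth value code
    return rule

# First three attributes of Table 1: accident time (2 values),
# protection facilities (2 values), road divider (4 values)
print(decode([0.9, 0.2, 0.75], [2, 2, 4]))  # {0: 2, 2: 3}
```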

Multiobjective Particle Swarm Optimization


In the beginning of the algorithm, the positions of all the particles are
randomly initialized. Their speed is set as 0. Their pbest are their present
positions. All the nondominated solutions are stored in an external archive.
A gbest selection strategy based on a K-means algorithm is used to choose
the gbest from the archive in order to spread the particles along the entire
Pareto front. The archive is updated after each cycle.

FIGURE 2 Rule representation of a particle.
The algorithm of KMOPSO is as follows:

% M is the population size,


% x is the position of particle,
% v is the speed of each particle,
% P is the population,
% t is the iteration counter,
% K is the number of clusters.


Initialize the population:
For i =1 to M ,
Initialize xi randomly.
Initialize vi = 0.
Evaluate all the particles.
Store the nondominated vectors in P into the external archive A.
Initialize the personal best of each particle i: pbesti = xi .
While t < maximum number of iterations,
DO
For each particle, gbest is chosen from the archive with the K -means guide
selection strategy. The method will be discussed later.
Compute the speed of each particle with Equation (5).
Compute the new position of each particle with Equation (6).
If x_i goes beyond its search boundaries, we take two measures: (1) set the
decision variable to the value of its corresponding lower or upper bound-
ary; (2) multiply its velocity by −1 in order to make it search in the
opposite direction.
Evaluate all the particles in the population.
Update the external archive A. Insert all the currently nondominated
solutions into A. Delete any dominated solution from A.
Update pbest for each particle. If the current solution is dominated by the
current pbest, the pbest is kept; if the current solution dominates the pbest,
the pbest is replaced by the current position. If neither of them is dominated
by the other, one of them is randomly selected as the new pbest.
Increment the loop counter: t = t + 1.
End while.

gbest Selection Strategy Based on K-means Algorithm


In MOPSO, gbest is very important in guiding the entire population
to move toward the true Pareto front. Unlike a single-objective optimization
problem, there exists a set of nondominated solutions. In our algorithm, we
implement elitism by incorporating an external archive.

FIGURE 3 Divide particles into K clusters.
A fixed-size external archive is used to store all the nondominated solu-
tions found along the search process in order to prevent the loss of good
solutions. In each cycle, we need to choose gbest from the archive for each
particle in the population in order to guide their fly. The archive is updated
in each cycle. If the candidate solution is not dominated by any solution in
the archive, it will be added to the archive. Any archive members dominated
by this solution will be removed from the archive.
This article introduces a gbest selection strategy based on a K-means algo-
rithm in order to obtain a diverse and uniformly distributed set of solutions.
As shown in Figure 3, the first step of the gbest selection method is to divide
the particles in the archive into K clusters according to their corresponding
objective function values. It operates as follows:

1. Randomly choose K solutions, each of which represents a cluster mean or
   center.
2. Assign each of the remaining solutions to the cluster to which it is most
   similar; similarity is measured by the Euclidean distance between the
   solution and the cluster center.
3. Compute the new center of each cluster.
4. Iterate Steps 2 and 3 until the objective function converges:


E = Σ_{i=1}^{K} Σ_{f∈C_i} |f − m_i|² ,    (8)
where E is the sum of the square error for all solutions in the archive; f is
the point in space representing a given solution; mi is the mean of the
cluster Ci .
In each cluster, find the solution nearest to the centroid and consider it as
the representative solution of the cluster.

After the clustering process, the clusters are separated from each other and
the solutions in the same cluster are similar. The representative solutions lie
in diverse regions of the nondominated front. The next step is to choose
gbest from the K representative particles for each particle in the population.

The K clusters contain different numbers of particles. The more
particles a cluster contains, the more crowded it is. If a cluster has a large
number of particles, it means this region is very crowded. We should encour-
age fewer particles to move toward these crowded regions. A proportional
distribution method is used to encourage particles in the population to move
to the sparse regions. The probability of a representative particle i being
chosen as the global guide is calculated as follows:

p_i = (1/num_i) / Σ_{j=1}^{K} (1/num_j) ,    (9)

where pi is the probability of the ith representative particle being chosen


as the gbest, and num_i is the number of particles the ith cluster contains. As is
shown in Equation (9), the representative particle i has more opportunity to
be chosen as the global best if the corresponding cluster has fewer particles.
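The proportional distribution of Equation (9) amounts to roulette-wheel selection with weights 1/num_i. A sketch (function names and cluster sizes are my own illustration):

```python
import random
from collections import Counter

def choose_gbest(representatives, cluster_sizes):
    """Pick a gbest among the K representative particles with probability
    inversely proportional to cluster size (Equation (9)).
    random.choices normalizes the weights, giving p_i exactly."""
    weights = [1.0 / n for n in cluster_sizes]
    return random.choices(representatives, weights=weights, k=1)[0]

# Three clusters containing 2, 5, and 10 particles, with reps 'a', 'b', 'c':
random.seed(1)
picks = Counter(choose_gbest(['a', 'b', 'c'], [2, 5, 10])
                for _ in range(10000))
# 'a', from the sparsest cluster, is chosen most often (p_a = 0.625)
```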
This gbest selection algorithm considers both the global and local infor-
mation of the nondominated front. The gbest chosen by this method can
represent the distribution of all the nondominated solutions and encourage
the solutions in the sparse regions. The selected gbest can guide the parti-
cles in the population to move toward different regions in order to acquire a
uniformly distributed Pareto front. The particles in the crowded regions also
have the opportunity to be chosen as the global guide. Hence, this algorithm
would not decrease the speed of convergence.

Time-Decreasing Inertia Weight


Inertia weight plays an important role in balancing global and local
search. A large inertia weight promotes global search and a small iner-
tia weight is more appropriate for local search. Eberhart and Shi (2000)
proposed a time-decreasing inertia weight. The inertia weight is shown as
follows:

wt = wmax − (t/maxgen)(wmax − wmin),    (10)

where wmax and wmin are the maximum and minimum values of the inertia
weight; t is the current iteration and maxgen is the maximum iteration of
the algorithm. In this study, we adopted this time-decreasing inertia weight.
Therefore, the algorithm explores the search space initially and later focuses
on the most promising regions.
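Equation (10) is a one-liner; with the parameter values used later in this article (wmax = 0.9, wmin = 0.4), it can be sketched as:

```python
def inertia_weight(t, maxgen, w_max=0.9, w_min=0.4):
    """Linearly decrease the inertia weight from w_max to w_min (Eq. 10)."""
    return w_max - (t / maxgen) * (w_max - w_min)
```

At t = 0 the weight is wmax (broad global search); it decays linearly to wmin at the final generation (fine local search).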

EXPERIMENT RESULTS
Comparative Study
KMOPSO uses 100 particles in the population and runs 500 generations.
The maximum size of the archive is 100. The learning rates r1 and r2 are
1.0 and 1.0, respectively; w is time decreasing in order to balance global and
local search; wmax is 0.9, and wmin is set as 0.4. The number of cycles in the
K-means algorithm is 20. The number of clusters (K) is time varying. It is
shown as follows:

K = |A| when 0 < |A| ≤ 3
K = 3 when 3 < |A| ≤ 10
K = 5 when 10 < |A| ≤ 30    (11)
K = 10 when 30 < |A| ≤ 100

where |A| is the size of the archive. In our experiment, the number of nondominated rules in the archive seldom exceeds 30. Therefore, in most situations, K is 5, a proper value validated by many repeated experiments.
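The schedule in Equation (11) maps the archive size |A| to K; a direct transcription (illustrative naming):

```python
def num_clusters(archive_size):
    """Time-varying number of clusters K as a function of |A| (Eq. 11)."""
    if archive_size <= 3:
        return archive_size   # K = |A| itself when 0 < |A| <= 3
    if archive_size <= 10:
        return 3
    if archive_size <= 30:
        return 5
    return 10                 # 30 < |A| <= 100
```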
To evaluate the partial classification model, the dataset is divided
into two parts: training set (3/4 of the dataset) and test set (1/4 of the
dataset). The algorithm runs on the training set to obtain a set of rules,
and the rules are evaluated on the test set. The algorithm produces rules of
the ith class in the ith run of the algorithm (Dehuri and Mall 2006). Hence,
we need to run the algorithm separately in order to acquire rules for both
classes and to discover how different combinations of risk factors affect the
accident severity.
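A minimal sketch of the holdout protocol described above (Python standard library; the seed and function name are our own assumptions):

```python
import random

def split_train_test(records, train_frac=0.75, seed=42):
    """Shuffle the dataset and hold out 1/4 of it as a test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

The rule miner is then run once per class on the training portion, and the resulting rules are scored on the held-out quarter.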
Three rule-learning algorithms are chosen to compare with KMOPSO:
PART (a rule-induction algorithm based on a partial decision tree; Frank and
Witten 1998), repeated incremental pruning to produce error reduction
(RIPPER, a propositional rule learner; Cohen 1995), and C4.5 (a decision
tree algorithm; Quinlan 1993). All these algorithms were run with the Weka
tool using the default parameters (Witten and Frank 2005).

TABLE 2 Number of Rules Obtained by all Four Algorithms

KMOPSO PART RIPPER C4.5

Nonfatal 7 90 1 33
Fatal 5 52 6 26
Total 12 142 7 59

Among the three algorithms, PART has been used in accident severity analysis based on accident data from Ethiopia (Beshah and Hill 2010). That research focused only on road-related factors; it did not consider drivers' characteristics, weather conditions, and other accident-related information.
Table 2 presents the number of rules obtained by the four algorithms. PART and C4.5 mined many rules, whereas KMOPSO and RIPPER generated far fewer. In real-world problems, users prefer a set of accurate and comprehensible rules: they can combine useful rules with their own experience to make decisions, and too many rules make it hard to find useful patterns in the rule set. Between KMOPSO and
RIPPER, RIPPER found only one rule for the nonfatal class, and it was a
default rule (it simply classifies data into the majority class, i.e., the nonfatal
class), which means it did not find any useful pattern for the nonfatal acci-
dents. In contrast, KMOPSO mined a moderate number of rules for both
classes.
The quality of the rules generated by the four algorithms can be seen in
Tables 3 and 4. Table 3 shows the comparison of the rules generated for the
nonfatal accidents. The obtained rules for the fatal accidents are compared
in Table 4. The tables show predictive accuracy for the training set (PA),
comprehensibility (C), and predictive accuracy for the test set (PA(T)). For

TABLE 3 Comparison of the Rules for the Nonfatal Accidents

KMOPSO PART RIPPER C4.5

PA Best 0.9737 1 NA 1
Average 0.8753 0.827 0.6471 0.7052
Worst 0.7607 0.7317 NA 0.5
C Best 0.9167 0.9167 NA 0.8333
Average 0.7857 0.625 1 0.691
Worst 0.75 0.1667 NA 0.5
PA(T) Best 1 1 NA 1
Average 0.7956 0.754 0.6314 0.5933
Worst 0.6696 0.6861 NA 0

Bold entries indicate the best average results.



TABLE 4 Comparison of the Rules for the Fatal Accidents

KMOPSO PART RIPPER C4.5

PA Best 0.9583 1 0.6875 1
Average 0.6873 0.7244 0.637 0.82
Worst 0.4011 0.5 0.5128 0.6
C Best 0.9167 0.8846 0.9167 0.8333
Average 0.8667 0.5223 0.8056 0.6991
Worst 0.75 0.3889 0.75 0.5
PA(T) Best 0.8571 0.6667 0.8182 1
Average 0.716 0.4502 0.6042 0.3951
Worst 0.6326 0.3243 0.4615 0

Bold entries indicate the best average results.

each metric, the best, average, and worst values are shown. The best average
results obtained are shown in boldface.
Because RIPPER produced only one default rule for the nonfatal class, it is not considered here. Among the other three algorithms, PART and C4.5 both produce rules with the best predictive accuracy on the training set, but the average performance of KMOPSO is better than that of PART and C4.5. On the test set, KMOPSO also shows the best predictive accuracy. With respect to comprehensibility, the rules generated by KMOPSO are much more understandable, as illustrated by their higher C values compared with PART and C4.5. Notice that even the worst C value of KMOPSO is 0.75, which is higher than the averages of PART and C4.5.
The worst C value of PART is only 0.1667. For example, the mined rule with comprehensibility of 0.1667 is shown below:

If(weather = sunny)∧ (vehicle = small passenger car)∧ (collision type =


head on)∧ (age = [26,55])∧ (accident cause = drive in the wrong
lane)∧ (protection facilities = no)∧ (road divider = none)∧ (gender =
male)∧ (driver = non-professional)∧ (sight distance >200) ⇒ (accident
severity = non-fatal)

This rule has ten attributes in its antecedent part, which makes it hard to understand and to extract useful patterns from. In contrast, a rule with comprehensibility of 0.75 is shown below:

If(protection facilities = 2)∧ (road divider = 4)∧ (gender = 2) ⇒ (accident severity = non-fatal)

This rule has only three attributes, which makes it much easier to
understand.
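The excerpt does not state the formula for C, but every C value reported in Tables 3 and 4 is consistent with a normalized rule-length measure of the form C = 1 − n/M, where n is the number of antecedent attributes and M = 12 is the total attribute count; treat this reconstruction as an assumption rather than the paper's definition:

```python
def comprehensibility(n_antecedent_attrs, max_attrs=12):
    """Assumed form: shorter rules score higher, C = 1 - n/M.

    max_attrs=12 is inferred from the reported values, not stated
    explicitly in this excerpt.
    """
    return 1.0 - n_antecedent_attrs / max_attrs
```

Under this reading, the ten-attribute rule above scores 1 − 10/12 ≈ 0.1667 and the three-attribute rule scores 1 − 3/12 = 0.75, matching the reported values.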
With regard to the rules for fatal accidents, C4.5 places first in terms of accuracy on the training set. KMOPSO is behind C4.5 and PART, and only slightly better than RIPPER. However, C4.5 and PART show poor generalization ability; their predictive accuracies on the test set are both very low. KMOPSO shows good generalization ability, because its predictive accuracy on the test set is even higher than on the training set. With respect to comprehensibility, KMOPSO does considerably better than the other three algorithms, because its C values are much larger.
From these results, we can see that KMOPSO performs well on the traffic
accident dataset in terms of accuracy and comprehensibility. There are three
main advantages of KMOPSO compared with the other algorithms:
First, KMOPSO mined a small set of rules with high accuracy and comprehensibility. Too many rules would hinder users' decision making, because it is difficult to find useful knowledge in a large rule set; in contrast, a small set of rules is easy to understand. RIPPER also obtained a small set of rules, but it produced only a default rule for the nonfatal class.
Second, the rules mined by KMOPSO show good generalization ability: they have good predictive accuracy on the test set, and for the fatal accidents the accuracy on the test set is even higher than on the training set. In contrast, the generalization ability of PART and C4.5 is relatively poor.
Third, the rules obtained by KMOPSO show good comprehensibility. This advantage can be attributed to the multiobjective formulation, in which comprehensibility is one of the objectives being optimized.

Discussion on the Rules Mined by KMOPSO


Table 5 shows the rules generated by KMOPSO. The table shows
class (CL), mined rules, predictive accuracy for training set (PA),
comprehensibility (C), and predictive accuracy for test set (PA(T)). Figure 4
and Figure 5 show the performance metrics of KMOPSO in the form of a
histogram.
As shown in Table 5, seven rules were generated for class 0 (nonfatal accidents) and five rules for class 1 (fatal accidents). All the rules within a class are nondominated with respect to training-set accuracy and comprehensibility: some rules show better predictive accuracy whereas others are easier to understand. On the test set, most of the rules show good generalization ability.
Among the seven rules for class 0, four are very accurate, with accuracy over 80%. The remaining three rules' accuracies are acceptable; even the lowest is 66.96%. Rule 1 has the highest test accuracy, in that the entire test set supports it, and it also has good comprehensibility, with only two predictors in its antecedent. Rule 2 shows good accuracy on both the training and test sets, but its comprehensibility is relatively low. Rules 3 and 4 both show high accuracy and good comprehensibility.

TABLE 5 Rules Generated for the Accident Dataset

Training Set Test Set

Rule ID CL Mined Rules PA C PA(T)

1 0 If(road divider=4)∧ (gender=2) 0.8654 0.8333 1
2 0 If(protection facilities=2)∧ (road divider=4)∧ (seat belt=2)∧ (driver=1)∧ (age=2) 0.9737 0.5833 0.8947
3 0 If(gender=2) 0.7607 0.9167 0.8143
4 0 If(protection facilities=2)∧ (road divider=4)∧ (gender=2) 0.9583 0.75 0.8243
5 0 If(sight distance=2)∧ (accident cause=3) 0.8684 0.8333 0.6909
6 0 If(road divider=1)∧ (seat belt=2) 0.9375 0.75 0.6696
7 0 If(accident cause=3)∧ (gender=2) 0.7632 0.8333 0.6752
8 1 If(collision type=6)∧ (weather=2) 0.9444 0.8333 0.8571
9 1 If(accident cause=6)∧ (weather=2) 0.9583 0.75 0.8
10 1 If(weather=2) 0.6046 0.9167 0.6538
11 1 If(collision type=6) 0.5282 0.9167 0.6326
12 1 If(collision type=5) 0.4011 0.9167 0.6364

FIGURE 4 Rules generated by KMOPSO for nonfatal accidents.

Among the five rules for class 1, two show good accuracy, over 80%. Rule 8 has only two predictors in its antecedent. The remaining three rules, with relatively lower predictive accuracy, all show great comprehensibility, with only one attribute in the antecedent.
Based on these rules, a detailed discussion on the risk factors that
affect injury severity is given below. From the rules for the nonfatal acci-
dents, we can find that the following factors can effectively reduce accident
severity:

FIGURE 5 Rules generated by KMOPSO for fatal accidents.

Road dividers are a key factor affecting accident severity. This attribute is present in four rules. Three rules with very high accuracy show that when the road has a median strip and motor/nonmotor traffic is divided, accidents are less severe. A median strip separates high-volume traffic flowing in opposite directions; it can restrict turns and reduce the number of head-on crashes. This is consistent with the research of the U.S. Federal Highway Administration (1993).
Mixed traffic is very common in Beijing, and this study shows that mixed traffic is dangerous because bicycles are vulnerable under mixed traffic conditions.
Road protection facilities show a significant effect on accident severity. Two rules show that with protection facilities, accident severity decreases.
Driver's gender. Four rules show that accidents caused by female drivers tend to be less severe. In the original dataset, there are many more accidents involving male drivers than female drivers, but the partial classification technique can still discover that women's accidents are less likely to be fatal. This is because men, especially young men, tend to drive more aggressively than women, which makes them more likely to be involved in fatal accidents.
Seat belts. Two rules show that seat belts can help reduce accident severity; safety belts prevent death in many accidents.

From the rules for the fatal accidents, we can find that the following
factors are associated with fatal accidents:

Foul weather. According to Rules 8, 9, and 10, accidents occurring on rainy days tend to be fatal crashes. It is more difficult to drive safely on rainy days than on normal days: the road gets wet, visibility is poor, and the windscreen fogs easily, which makes it difficult for the driver to see other cars or traffic signals. Drivers should be more careful and prepared in this kind of weather, because rain not only makes an accident more likely but also increases its severity.
Speeding and rainy weather. Rule 9 indicates that the combination of rain and speeding leads to severe accidents. This rule has high predictive accuracy on both the training and test sets. In rainy weather, because of blurred vision and a wet road surface, speeding is extremely dangerous.

These interpretations of the generated rules can provide the relevant authorities with straightforward knowledge about risk factors and accident severity, and they can be used to promote road safety.

CONCLUSIONS
This article proposed a novel KMOPSO-based partial classification technique to analyze traffic accident data in order to find the contributing factors that influence accident severity. Unlike many other accident analysis models, which aim at building classifiers, partial classification seeks knowledge that indicates relationships between risk factors and accident severity. Traffic accident data from Beijing were collected to build the model. The experiment results show that the rules extracted by KMOPSO have higher accuracy than those extracted by the PART algorithm, and that the rules generated by KMOPSO are much simpler and easier for end users to understand. Thus, the proposed approach can provide useful decision support for accident severity analysis tasks. From the extracted rules, it can be seen that road dividers, protection facilities, weather, gender, and accident causes are the main contributing factors in determining accident severity. This knowledge can be used to help take preventive measures to reduce accident severity and thereby improve road traffic safety.

FUNDING
The authors acknowledge support from High Technology Research and
Development Program (863 Project) (Grant No. 2009AA04Z120).
MOPSO for Accident Severity Analysis 575

REFERENCES
Alatas, B., and E. Akin. 2009. Multi-objective rule mining using a chaotic particle swarm optimization
algorithm. Knowledge-Based Systems 22(6):455–460.
Ali, K., S. Manganaris, and R. Srikant. 1997. Partial classification using association rules. In Proceedings
of the third international conference on knowledge discovery and data mining , 115–118. Menlo Park, CA:
ACM Press.
Al-Ghamdi, A. S. 2002. Using logistic regression to estimate the influence of accident factors on accident
severity. Accident Analysis and Prevention 34(6):729–741.
Bédard, M., G. H. Guyatt, M. J. Stones, and J. P. Hirdes. 2002. The independent contribution of driver,
crash, and vehicle characteristics to driver fatalities. Accident Analysis and Prevention 34(6):717–727.
Beshah, T., and S. Hill. 2010. Mining road traffic accident data to improve safety: Role of road-related
factors on accident severity in Ethiopia. In Proceedings of AAAI artificial intelligence for development,
1–8, Stanford, CA, USA: AAAI Press.


Chang, L.Y., and H. W. Wang. 2006. Analysis of traffic injury severity: An application of non-parametric
classification tree techniques. Accident Analysis and Prevention 38(5):1019–1027.
Cohen, W. W., 1995. Fast effective rule induction. In Proceedings of the twelfth international conference on
machine learning , 115–123. Tahoe City, CA, USA: ACM Press.
Dehuri, S., and R. Mall. 2006. Predictive and comprehensible rule discovery using a multi-objective
genetic algorithm. Knowledge-Based Systems 19(6):413–421.
Dursun, D., S. Ramesh, and B. Max. 2006. Identifying significant predictors of injury severity in traffic
accidents using a series of artificial neural networks. Accident Analysis and Prevention, 38(3):434–444.
Eberhart, R. C., and Y. H. Shi. 2000. Comparing inertia weights and constriction factors in particle swarm
optimization. In Proceedings of IEEE congress on evolutionary computation, 84–88. La Jolla, CA, USA:
IEEE Press.
Federal Highway Administration; Highway Safety Research Center. 1993. The association of median width
and highway accident rate (Summary Report, Highway Safety Information System, University of North
Carolina).
Frank, E., and I. H. Witten.1998. Generating accurate rule sets without global optimization. In Proceedings
of the 15th international conference on machine learning (ICML ’98), 144–151. San Francisco, CA: ACM
Press.
Freitas, A. A. 1999. A genetic algorithm for generalized rule induction. In Advances in soft computing-
engineering design and manufacturing , ed. R. Roy et al., 340–353. New York, NY: Springer-Verlag.
Iglesia, B. de la, G. Richards, M. S. Philpott, and V. J. Rayward-Smith. 2006. The application and effec-
tiveness of a multi-objective metaheuristic algorithm for partial classification. European Journal of
Operational Research 169(3):898–917.
Kennedy, J., and R. C. Eberhart. 1995. Particle swarm optimization, In Proceedings of IEEE international
conference on neural networks, 1942–1948. Perth, Australia: IEEE Press.
Kockelman, K. M., and Y. J. Kweon. 2002. Driver injury severity: An application of ordered probit models.
Accident Analysis and Prevention 34(3):313–321.
Milton, J. C., V. N. Shankar, and F. L. Mannering. 2008. Highway accident severities and the mixed logit
model: An exploratory empirical analysis. Accident Analysis and Prevention 40(1):260–266.
Mostaghim, S., and J. Teich. 2003. Strategies for finding good local guides in multi-objective particle
swarm optimization (MOPSO). In Proceedings of 2003 IEEE swarm intelligence symposium, 26–33.
Indianapolis. IN: IEEE Press.
Oña, J. De, R. O. Mujalli, and F. J. Calvo. 2011. Analysis of traffic accident injury severity on Spanish rural
highways using Bayesian networks. Accident Analysis and Prevention 43(1):402–411.
Quinlan, R. 1993. C4.5: Programs for machine learning . San Mateo, CA, USA: Morgan Kaufmann.
Raquel, C. R., and P. C. Naval. 2005. An effective use of crowding distance in multiobjective particle
swarm optimization. In Proceedings of genetic and evolutionary computation conference, (GECCO’2005),
257–264. Washington, D.C., USA: ACM Press.
Reynolds, A. P., and B. de la Iglesia. 2009. A multi-objective GRASP for partial classification. Soft Computing
13(3):227–243.

Witten, I. H., and E. Frank. 2005. Data mining: Practical machine learning tools and techniques (2nd ed.). San
Francisco, CA: Morgan Kaufmann.
Yamamoto, T., and V. N. Shankar. 2004. Bivariate ordered-response probit model of driver’s and passen-
ger’s injury severities in collisions with fixed objects. Accident Analysis and Prevention 36(5):869–876.
Yang, J., J. Zhou, L. Liu, and Y. Li. 2009. A novel strategy of Pareto-optimal solution searching in
multi-objective particle swarm optimization (MOPSO). Computers and Mathematics with Applications
57(11–12):1995–2000.
Yau, K. K. W., H. P. Lo, and S. H. H. Fung. 2006. Multiple-vehicle traffic accidents in Hong Kong. Accident
Analysis and Prevention 38(6):1157–1161.