Mining Techniques For Analysing Road Accidents: S.R.K.R Engineering College, Chinna Amiram, Bhimavaram - 534204, India

ISSN(Online): 2320-9801
ISSN (Print) : 2320-9798
International Journal of Innovative Research in Computer and Communication Engineering

An ISO 3297: 2007 Certified Organization Volume 6, Special Issue 1, April 2018
A NATIONAL LEVEL CONFERENCE ON CURRENT TRENDS OF INFORMATION TECHNOLOGY [TECHFLEET 2K18]

20TH-21ST FEB’18
Organized by
S.R.K.R ENGINEERING COLLEGE, CHINNA AMIRAM, BHIMAVARAM – 534204, INDIA
Mining Techniques for Analysing Road

Accidents
P.D.S.S.Lakshmi Kumari 1, S.Suresh Kumar 2
Assistant Professor, Department of Information Technology, SRKR Engineering College, Bhmiavarm, India1
Assistant Professor, Department of Information Technology, SITE College, Tadepalligudem, India2
ABSTRACT :Due to globalization, the economic activities are carried across the globe.This causes the increase of
vehicle traffic, which leads to critical accidents.There is a heterogenous amount of critical accidents data which is
difficult to analyze for finding the constraints causing the road accidents.In this paper, we apply mining algorithms on
critical accident dataset to address this problem.The relationship between critical rate and other attributes combining
weather condition, road type, sunlight condition, speed limit, and drunk driver are considered.Apriori Algorithm for
finding an association between attributes.Naive based approach for classifying how attributes are conditionally
independent.K-means is used to form clusters and to analyze them based on attributes.By using these Statistics
government agencies can take decisions in developing new roads and taking additional safety measures.
KEYWORDS: Critical Accidents, Association, Classification, Clustering
I. INTRODUCTION
There are a lot of vehicles driving on the roadway every day, and traffic accidents could happen at any time
anywhere. Some accident involves fatality, means people die in that accident. As human being, we all want to avoid
accident and stay safe. To find out how to drive safer, data mining technique could be applied on the traffic accident
dataset to find out some valuable information, thus give driving suggestion .Data mining uses many different
techniques and algorithms to discover the relationship in large amount of data. It is considered one of the most
important tool in information technology in the previous decades. Association rule mining algorithm is a popular
methodology to identify the significant relations between the data stored in large database and also plays a very
important role in frequent item set mining. A classical association rule mining method is the Apriori algorithm whose
main task is to find frequent item sets, which is the method we use to analyse the roadway traffic data. Classification in
data mining methodology aims at constructing a model (classifier) from a training data set that can be used to classify
records of unknown class labels. The Naive Bayes technique is one of the very basic probability-based methods for
classification that is based on the Bayes hypothesis with the presumption of independence between each pair of
variables. We used the FARS dataset for our study. The Fatal Accidents Dataset contains all fatal accidents on public
roads in 2007 reported to the National Highway Transportation Safety Administration (NHTSA).The dataset is
downloaded from California Polytechnic State University and all data originally came from FARS (Fatal Accident
Reporting System).The dataset contains 37,248 records and 55 attributes. The data description can be found in the
document.
1.1SCOPE
The scope of this project is meant to analyse the database and take preventive measures to avoid fatal accidents using
data mining.
India has second largest road network in the world. Road accidents happen quite frequently and they claim too many
lives every year. It is necessary to find the root cause for road accidents in order to avoid them. Suitable data mining
approach has to be applied on collected datasets representing occurred road accidents to identify possible hidden
relationships and connections between various factors affecting road accidents with fatal consequences. The results
obtained from data mining approach can help understand the most significant factors or often repeating patterns. The
generated pattern identifies the most dangerous roads in terms of road accidents and necessary measures can be taken to
avoid accidents in those roads. The objective of this project is to reduce the number of accidents caused during road
Copyright to IJIRCCE www.ijircce.com 158

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
accidents due to many reasons or conditions. In this we extract interesting patterns and make analysis using data mining
algorithms. This paper deals with the some of classification models to predict the severity of injury that occurred during
traffic accidents. Using the methods of traffic data analysis can improve the road traffic safety management level
effectively.
II. LITERATURE SURVEY
The cost of deaths and injuries due to traffic accidents has a great impact on society. In recent years,
researchers have paid a great attention at determining the factor that significantly affects accident severity in traffic
system. The author in presents a random forest & rough set theory to identify the factors significantly influencing
single vehicle crash severity. The author in presents a decision tree which predicts cause of accident and accident prone
locations. Project predict traffic accident duration of incident and driver information system. The author in used
various data mining techniques and tells the random forest outperforms than other classification algorithms. In project
we talk about the significance of data mining classification algorithm in predicting the factor which influences road
traffic accident. The author in used to explore the possible application of data mining technology for developing a
classification model and the result shows that developed model could classify accidents within a reasonable accuracy. It
is important to analyze these datasets to extracts useful knowledge. Data mining is an effective tool for analyzing data
to extract useful knowledge. Table 1 shows a sample of different data mining technique and their influential factor used
in traffic accident severity. The severity of injuries measured for crash records has both continuous and categorical
characteristics. Hence many previous studies have used models with ordered structure to analyze risk factor and their
effect on severity of injuries sustained in traffic crashes.
III. RELATED WORKS
Jayasudha analysed the traffic accident using data mining technique that could possibly reduce the fatality rate. Using a
road safety database enables to reduce the fatality by implementing road safety programs at local and national levels.
Those database schemes describes the road accident via roadway condition, person involved and other data would be
useful for case evaluation, collecting additional evidences, settlement decision and subrogation. The International Road
Traffic and Accident Database (IRTAD), GLOBESAFE, website for ARC networks are the best resources to collect
accident data. Using web data a self-organizing map for pattern analysis was generated. It could classify information
and provide warning as an audio or video. It was also identified that accident rates highest in intersections then other
portion of road the effect on road speed on accident in the state of Washington was investigated by Eric.
Some researches claim that those states which increased speed limit from 55mph to 65mph after 1974 had the fatality
rate go up by 27% compared to increase in 10% in the states that did not increase the speed limit. It is claimed that as
the effect of change in maximum speed is varies between urban and rural areas. After 1987 accident in rural areas
increased while urban areas stayed relatively constant but clash rate in urban intersection is twice as high as in rural
intersection. Accident is dependent on area (urban/rural), type of street (intersection, highway). It is assumed that the
fatality rate of an accident might be reduced with the introduction of an express emergency system. Reducing the time
of delivery for emergency medical services (EMS) accident victims could be treated in time saving their lives. Accident
notification time, the difference between crash and EMS notification time is the most crucial. Trauma is time dependent
disease and trauma victims could be saved if treated on time. Trauma victims could stabilized if treated soon. The first
10 minutes is called golden hour. The time was 12.3 minutes for the one who died and 8.4 minutes for the one who
survived. The fatality rate was also much higher in rural areas because of unavailability of rapid EMS response in those
areas. There are also several other factors affecting the fatality such as vehicle kilometers traveled, alcohol
consumption, driver age distribution, accident notification time, personal income per capita and so on Solaiman et. al.
describes various ways accident data could be collected, placed in a centralized database server and visualized the
accident. Data could be collected via different sources and the more the number of sources the better the result. This is
because the data could be validated with respect to one another few could be discarded thus helping to clean up the
data. Different parameters such as junction type, collision type, location, month, time of occurrence, vehicle type could
be visualized in a certain time strap to see the how those parameters change and behave with respect to time. Based on
those attributes one could also classify the type of accident. Using map API the system could be made more flexible
such that it could find the safest and dangerous roads.

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
Partition based clustering and density based clustering were performed by Kumar to group similar accidents
together. Based on the categorical nature of most of the data K-modes algorithm was used. To find the correlation
among various sets of attributes association rule mining was performed. First the data set is classified into 6 clusters
and each of them are studied to predict some patterns. Among the various rules that are generated those which seemed
interesting were considered based on support count and confidence. The experiments showed that the accidents were
dependent of location and most of the accident occurred in populated areas such as markets, hospitals, local colonies.
Type of vehicle was also a factor to determine the nature of accident; two wheelers met with an accident the most in
intersections and involved two or more victims. Blind turn on road was the most crucial action responsible for those
accidents and main duration of accidents were on morning time 4.00 a.m. to 6 a.m. on hills and 8 p.m. to 4 a.m. on
other roads.
IV. PROBLEM STATEMENT
The costs of fatalities and injuries due to road traffic accidents have a tremendous impact on societal well-being and
socioeconomic development. Road transport is the major transportation system in the country. The Ethiopian traffic
control system archives data on various aspects of the traffic system, such as traffic volume, concentration, and vehicle
accidents. With more vehicles and traffic. The basic hypothesis of this research is that accidents are not randomly
scattered along the road network, and that drivers are not involved in accidents at random. There are complex
circumstantial relationships between several characteristics (driver, road, car, etc.) and the accident occurrence. As
such, one cannot improve safety without successfully relating accident frequency and severity to the causative
variables. We will attempt to extend by generating additional attributes and focusing on the contribution of road-related
factors to accident Severity. This will help to identify the parts of a road that are risky, thus supporting traffic accident
data analysis in decision-making processes. The general objective of the research is to investigate the role of road-
related factors in accident severity, using RTA data from Ethiopia and predictive models.
Our three specific objectives include:
 Exploring the underlying variables (especially road-related ones) that impact car accident severity
 Predicting accident severity using different data mining techniques
 Comparing Standard classification models for this task.
In recent years the collection of information on traffic volumes has become a significant portion of the work of road
planning programs in terms of both cost and personnel. Initial methods we have manually collecting volume of traffic
data in TOD (time of day) manner for every 24 hours, manually summarizing volume of traffic data for every 24 hours.
Manually analyzing the traffic data and making Decision. Developing TOD timing plane to control the traffic problem
.There is no special tool helping to analyzing and making decisions.
Manually collecting volume of

traffic data in TOD manner (24
hours)
Manually summarizing volume of
traffic data in TOD There is no special
tool helping to
) analysis and
Manually analyzing the traffic data making decision
and making Developing TOD timing
decision
plane to control the traffic problem
Fig: 4.1 Data flow diagram for analysing existing system of Road Accidents

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
Today we are living in a data-driven world. Developments in data generation, gathering and storing technology have
empowered organizations to gather data sets of massive size. Data mining is a term that blends traditional data analysis
methods with cultured algorithms to handle the tasks stood by these new forms of data sets. a methodology which may
be used to extract knowledge from such databases in an automated manner using data mining techniques Data mining is
the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant
structure from large amounts of data stored in databases, data warehouses, or other information repositories. In this
project we consider different attributes nearly (37248) but performing analysis on these attributes leads to difficulty, so
we reduce the attributes , perform data cleaning and apply data mining algorithms such as apriori , naïve bayes
classification, k means algorithm and generate association rules.
Table 4.1: Attribute name and it’s description

Sr.No ATTRIBUTE NAME DESCRIPTION
01 Sub City Name of sub city where accident occurred
02 Particular Area Whether accident occurred in school or market areas
03 Road Separation How road segments are separated
04 Road Orientation How road is oriented
05 Road Junction Type of road junction
06 Road Surface Type Whether the road surface is asphalt or ground
07 Road Surface Condition Whether the road surface is dry, muddy or wet
08 Weather Condition The weather condition
09 Light Condition The light condition
10 Accident Severity The severity of the accident
5. SYSTEM ARCHITECTURE
DATA SET
DATA PRE-PROCESSING
CLUSTERING ALGORITHM
CLUSTER CLUSTER CLUSTER
ASSOCIATION RULE MINING ALGORITHM
ASSOCIATION RULES ASSOCIATION RULES ASSOCIATION RULES
Fig: 5.1 System architecture for Analysing Road Accidents

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
DATA PREPARATION
MISSING VALUES NUMERIC -> NOMINAL
MODELING
APRIOR: NAVIE BAYES: K – MEANS:

ASSOCIATION CLASSIFICATION CUSTERING
VISUALIZATION/RESULT ANALYSIS
Fig: 5.2 Data flow diagram for Analysing Road Accidents
VI. IMPLEMENTATION OF MODULES
Data pre-processing
Data pre-processing is one of the important tasks in data mining. Data pre-processing mainly deals with
removing noise, handle missing values, removing irrelevant attributes in order to make the data ready for the analysis.
In this step, our aim is to pre-process the accident data in order to make it appropriate for the analysis.
Clustering algorithm
There are several clustering algorithms exist in the literature. The objective of clustering algorithm is to divide
the data into different clusters or groups such that the objects within a group are similar to each other whereas objects
in other clusters are different from each other. Hierarchical clustering technique (e.g. Ward method, single linkage,
complete linkage, etc.), K means and latent class clustering (LCC) have been used in road accident analysis.
Distance measure
Given a data set D, the distance between two objects X and Y, where X and Y are described by N categorical
variables, can be computed as follows
d(X,Y)=
(Xi,YI) ={0, XI=YI 1, Xi≠Yi
In the above equations, Xi and Yi are the attribute i values in object X and Y. This distance measure is often referred as
simple matching dissimilarity measure. The more the number of differences in categorical values of X and Y, more the
different two objects are.
K-mode clustering procedure

In order to cluster the data set D into k cluster, K-modes clustering algorithm perform the following steps:
1. Initially select k random objects as cluster centers or modes.
2. Find the distance between every object and the cluster centre using distance measure defined in Eq. 1.
3. Assign each object to the cluster whose distance with the object is minimum.
4. Select a new center or mode for every cluster and compare it with the previous value of centre or mode; if
the values are different, continue with step 2.

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
Association rules
Association rule mining is a very popular data mining technique that extracts interesting and hidden relations
between various attributes in a large data set. Association rule mining produces a set of rules that define the underlying
patterns in the data set. The associativity of two characteristics of accident is determined by the frequency of their
occurrence together in the data set. A rule A → B indicates that if A occurs then B will also occur.
Given a data set D of n transactions where each transaction TЄ D. Let I = {I1, I2,…. In} is a set of items. An item set A
will occur in T if and only if
A ⊆ T. A → B is and association rule, provided that A ⊂ I, B ⊂ I and A ∩ B = Ø.
Agrawal and Srikanth proposed an algorithm known as Apriori algorithm to find the association rules from large
datasets. The pseudo-code for traditional association rule mining algorithm for frequent item set generation is
Lk={Frequent item set of size k}
Ck={candidate item set of size k}
L1={frequent 1 item sets};
K=1;
While(Lk-1≠φ) then
Ck+1=candidates generated from Lk
For each transaction tϵ D do
Increment the count of candidates in Ck+1 that also contained in t
Lk+1=candidates in Ck+1 with minimum support
K=k+1;
Return UkLk
Increment the count of candidates in Ck+1 that
Further association rules are generated from the frequent item sets and strong rules based on interestingness measures
are taken for the analysis
VII. SYSTEM IMPLEMENTATION
7.1 Apiori Algorithm

Descriptive or predictive mining applied on previous road accidents data in combination with other important
information as weather, speed limit or road conditions creates an interesting alternative with potentially useful and
helpful outcome for all involved stakeholders. Association rule mining is used to analyze the previous data and obtain
the patterns between road accidents. The two criterion used for association rule mining are support and confidence.
Apriori algorithm is one of the techniques to implement association rule mining. In the proposed system, we use apriori
algorithm to predict the patterns of road accidents by analyzing previous road accidents data.
The steps for the apriori algorithm:

 Scan the data set and find the support(s) of each item.
 Generate L1 (Frequent one item set).Use Lk-1, join Lk-1 to generate the set of candidate k - item set.
 Scan the candidate k item set and generate the support of each candidate k – item set.
 Add to frequent item set, until C=Null Set.
 For each item in the frequent item set generate all non empty subsets.
 For each non empty subset determine the confidence. If confidence is greater than or equal to this specified
confidence .Then add to Strong Association Rule.

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
Fig: 7.1 Aprior clean data for Analysing Road Accidents
7.2 Association Rules
Fig: 7.2 Aprior rules for Analysing Road Accidents

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
7.3 Naïve Bayesian Classifier
A Naive Bayesian classifier is a simple probabilistic classifier based on applying Bayesian theorem (from Bayesian
statistics) with strong (naive) independence assumptions. By the use of Bayesian theorem we can write
p(C | F1....Fn) p(C)p(F1.....Fn |C)
p(F1.....Fn)
Fig: 7.3 Naive bayes clean data for Analysing Road Accidents
Fig: 7.4: Naive bayes clean classifier for Analysing Road Accidents

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
7.4 K-Means
K-means is one of the simplest unsupervised learning algorithms that solve the well known clustering
problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters
(assume k clusters) fixed apriori. The main idea is to define k-centers, one for each cluster. These centers should be
placed in a cunning way because of different location causes different result. So, the better choice is to place them as
much as possible far away from each other. The next step is to take each point belonging to a given data set and
associate it to the nearest center. When no point is pending, the first step is completed and an early group age
is done. At this point we need to re-calculate k new centroids as bary center of the clusters resulting from the previous
step. After we have these k new centroids , a new binding has to be done between the same data set points and the
nearest new center. A loop has been generated. As a result of this loop we may notice that the k centers change their
location step by step until no more changes are done or in other words centers do not move any more. Finally, this
algorithm aims at minimizing an objective function know as squared error function given by
where,
‘||xi - vj||’ is the Euclidean distance between xi and vj.
‘ci’ is the number of data points in ith cluster.
‘c’ is the number of cluster centers.
Algorithmic steps for k-means clustering

Let X = {x1,x2,x3,…….., xn} be the set of data points and V = {v1,v2,…….,vc} be the set of centers.
1) Randomly select ‘c’ cluster centers.
2) Calculate the distance between each data point and cluster centers.
3) Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluster
centers..
4) Recalculate the new cluster center using: where, ‘ci ’ represents the number of data points in
it h cluster.
5) Recalculate the distance between each data point and new obtained cluster centers.
6) If no data point was reassigned then stop, otherwise repeat from step 3).
7.6 Clustering In K-Means
Fig: 7.5 Clusters of states by fatality of per million people in states

ISSN (Print) : 2320-9798


20TH-21ST FEB’18
Organized by
7.5 Correlation In K-Means
Fig: 7.6 Correlation between traffic-person and fatal rate
VIII. CONCLUSION AND FUTURE SCOPE
As seen in statistics, association rule mining, and the classification, the environmental factors like roadway surface,
weather, and light condition do not strongly affect the fatal rate, while the human factors like being drunk or not, and
the collision type, have stronger effect on the fatal rate. From the clustering result we could see that some states/regions
have higher fatal rate, while some other slower. We may pay more attention when driving within those risky
states/regions. Through the task performed, we realized that data seems never to be enough to make a strong decision.
If more data, like non-fatal accident data, weather data, mileage data, and so on, are available, more test could be
performed thus more suggestion could be made from the data.
The next step in the modeling will be to combine road-related factors with driver information for better
predictions, and to find interactions between the different attributes. We also plan to develop a decision support tool. In
this work, we extended the research to three different cases such as Accident, Casualty and Vehicle for finding the
cause of accident and the severity of accident.
REFERENCES
[1]. Poojitha Shetty, Sachin P C, Supreeth V Kashyap, Venkatesh Madi. Analysis of road accidents using data mining techniques International
Research Journal of Engineering and Technology April 2017.
[2]. Kumar S, Toshniwal D (2016) A data mining approach to characterize road accident locations. J Mod Transp 24(1):62–72
[3]. Sachin Kumar and Durga Toshniwal. Analysing road accident data using association rule mining. In Proceedings of International Conference
on Computing, Communication and Security, pages 1–6, 2015.
[4]. Amira A El Tayeb, Vikas Pareek, and Abdelaziz Araar. Applying association rules mining algorithms for traffic accidents in dubai.
International Journal of Soft Computing and Engineering, September 2015.
[5]. Kumar S, Toshniwal D (2015) A data mining framework to analyze road accident data. J Big Data 2(1):1–18. doi:10.1186/ s40537-015-0035-y
[6]. Tan PN, Steinbach M, Kumar V. Introduction to data mining. Pearson Addison-Wesley; 2006. 15. Abellan J, Lopez G, Ona J. Analyis of traffic
accident severity using decision rules via decision trees, vol. vol. 40. Expert System with Applications: Elsevier; 2013.
[7]. Rovsek V, Batista M, Bogunovic B. Identifying the key risk factors of traffic accident injury severity on Slovenian roads using a non-
parametric classification tree, transport. UK: Taylor and Francis; 2014.
[8]. Kashani T, Mohaymany AS, Rajbari A. A data mining approach to identify key factors of traffic injury severity, promettraffic & transportation,
vol. 23; 2011.
[9]. S. Krishnaveni and M. Hemalatha. A perspective analysis of traffic accident using data mining techniques. International Journal of Computer
Applications, 23(7):40–48, June 2011
[10]. K Jayasudha and C Chandrasekar. An overview of data mining in road traffic and accident analysis. Journal of Computer Applications,
2(4):32–37, 2009.

Mining Techniques For Analysing Road Accidents: S.R.K.R Engineering College, Chinna Amiram, Bhimavaram - 534204, India

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Mining Techniques For Analysing Road Accidents: S.R.K.R Engineering College, Chinna Amiram, Bhimavaram - 534204, India

Uploaded by

Copyright:

Available Formats

ISSN(Online): 2320-9801

ISSN (Print) : 2320-9798

International Journal of Innovative Research in Computer and Communication Engineering

A NATIONAL LEVEL CONFERENCE ON CURRENT TRENDS OF INFORMATION TECHNOLOGY [TECHFLEET 2K18]