Professional Documents
Culture Documents
Mining Techniques For Analysing Road Accidents: S.R.K.R Engineering College, Chinna Amiram, Bhimavaram - 534204, India
Mining Techniques For Analysing Road Accidents: S.R.K.R Engineering College, Chinna Amiram, Bhimavaram - 534204, India
ABSTRACT :Due to globalization, the economic activities are carried across the globe.This causes the increase of
vehicle traffic, which leads to critical accidents.There is a heterogenous amount of critical accidents data which is
difficult to analyze for finding the constraints causing the road accidents.In this paper, we apply mining algorithms on
critical accident dataset to address this problem.The relationship between critical rate and other attributes combining
weather condition, road type, sunlight condition, speed limit, and drunk driver are considered.Apriori Algorithm for
finding an association between attributes.Naive based approach for classifying how attributes are conditionally
independent.K-means is used to form clusters and to analyze them based on attributes.By using these Statistics
government agencies can take decisions in developing new roads and taking additional safety measures.
I. INTRODUCTION
There are a lot of vehicles driving on the roadway every day, and traffic accidents could happen at any time
anywhere. Some accident involves fatality, means people die in that accident. As human being, we all want to avoid
accident and stay safe. To find out how to drive safer, data mining technique could be applied on the traffic accident
dataset to find out some valuable information, thus give driving suggestion .Data mining uses many different
techniques and algorithms to discover the relationship in large amount of data. It is considered one of the most
important tool in information technology in the previous decades. Association rule mining algorithm is a popular
methodology to identify the significant relations between the data stored in large database and also plays a very
important role in frequent item set mining. A classical association rule mining method is the Apriori algorithm whose
main task is to find frequent item sets, which is the method we use to analyse the roadway traffic data. Classification in
data mining methodology aims at constructing a model (classifier) from a training data set that can be used to classify
records of unknown class labels. The Naive Bayes technique is one of the very basic probability-based methods for
classification that is based on the Bayes hypothesis with the presumption of independence between each pair of
variables. We used the FARS dataset for our study. The Fatal Accidents Dataset contains all fatal accidents on public
roads in 2007 reported to the National Highway Transportation Safety Administration (NHTSA).The dataset is
downloaded from California Polytechnic State University and all data originally came from FARS (Fatal Accident
Reporting System).The dataset contains 37,248 records and 55 attributes. The data description can be found in the
document.
1.1SCOPE
The scope of this project is meant to analyse the database and take preventive measures to avoid fatal accidents using
data mining.
India has second largest road network in the world. Road accidents happen quite frequently and they claim too many
lives every year. It is necessary to find the root cause for road accidents in order to avoid them. Suitable data mining
approach has to be applied on collected datasets representing occurred road accidents to identify possible hidden
relationships and connections between various factors affecting road accidents with fatal consequences. The results
obtained from data mining approach can help understand the most significant factors or often repeating patterns. The
generated pattern identifies the most dangerous roads in terms of road accidents and necessary measures can be taken to
avoid accidents in those roads. The objective of this project is to reduce the number of accidents caused during road
The cost of deaths and injuries due to traffic accidents has a great impact on society. In recent years,
researchers have paid a great attention at determining the factor that significantly affects accident severity in traffic
system. The author in presents a random forest & rough set theory to identify the factors significantly influencing
single vehicle crash severity. The author in presents a decision tree which predicts cause of accident and accident prone
locations. Project predict traffic accident duration of incident and driver information system. The author in used
various data mining techniques and tells the random forest outperforms than other classification algorithms. In project
we talk about the significance of data mining classification algorithm in predicting the factor which influences road
traffic accident. The author in used to explore the possible application of data mining technology for developing a
classification model and the result shows that developed model could classify accidents within a reasonable accuracy. It
is important to analyze these datasets to extracts useful knowledge. Data mining is an effective tool for analyzing data
to extract useful knowledge. Table 1 shows a sample of different data mining technique and their influential factor used
in traffic accident severity. The severity of injuries measured for crash records has both continuous and categorical
characteristics. Hence many previous studies have used models with ordered structure to analyze risk factor and their
effect on severity of injuries sustained in traffic crashes.
Jayasudha analysed the traffic accident using data mining technique that could possibly reduce the fatality rate. Using a
road safety database enables to reduce the fatality by implementing road safety programs at local and national levels.
Those database schemes describes the road accident via roadway condition, person involved and other data would be
useful for case evaluation, collecting additional evidences, settlement decision and subrogation. The International Road
Traffic and Accident Database (IRTAD), GLOBESAFE, website for ARC networks are the best resources to collect
accident data. Using web data a self-organizing map for pattern analysis was generated. It could classify information
and provide warning as an audio or video. It was also identified that accident rates highest in intersections then other
portion of road the effect on road speed on accident in the state of Washington was investigated by Eric.
Some researches claim that those states which increased speed limit from 55mph to 65mph after 1974 had the fatality
rate go up by 27% compared to increase in 10% in the states that did not increase the speed limit. It is claimed that as
the effect of change in maximum speed is varies between urban and rural areas. After 1987 accident in rural areas
increased while urban areas stayed relatively constant but clash rate in urban intersection is twice as high as in rural
intersection. Accident is dependent on area (urban/rural), type of street (intersection, highway). It is assumed that the
fatality rate of an accident might be reduced with the introduction of an express emergency system. Reducing the time
of delivery for emergency medical services (EMS) accident victims could be treated in time saving their lives. Accident
notification time, the difference between crash and EMS notification time is the most crucial. Trauma is time dependent
disease and trauma victims could be saved if treated on time. Trauma victims could stabilized if treated soon. The first
10 minutes is called golden hour. The time was 12.3 minutes for the one who died and 8.4 minutes for the one who
survived. The fatality rate was also much higher in rural areas because of unavailability of rapid EMS response in those
areas. There are also several other factors affecting the fatality such as vehicle kilometers traveled, alcohol
consumption, driver age distribution, accident notification time, personal income per capita and so on Solaiman et. al.
describes various ways accident data could be collected, placed in a centralized database server and visualized the
accident. Data could be collected via different sources and the more the number of sources the better the result. This is
because the data could be validated with respect to one another few could be discarded thus helping to clean up the
data. Different parameters such as junction type, collision type, location, month, time of occurrence, vehicle type could
be visualized in a certain time strap to see the how those parameters change and behave with respect to time. Based on
those attributes one could also classify the type of accident. Using map API the system could be made more flexible
such that it could find the safest and dangerous roads.
The costs of fatalities and injuries due to road traffic accidents have a tremendous impact on societal well-being and
socioeconomic development. Road transport is the major transportation system in the country. The Ethiopian traffic
control system archives data on various aspects of the traffic system, such as traffic volume, concentration, and vehicle
accidents. With more vehicles and traffic. The basic hypothesis of this research is that accidents are not randomly
scattered along the road network, and that drivers are not involved in accidents at random. There are complex
circumstantial relationships between several characteristics (driver, road, car, etc.) and the accident occurrence. As
such, one cannot improve safety without successfully relating accident frequency and severity to the causative
variables. We will attempt to extend by generating additional attributes and focusing on the contribution of road-related
factors to accident Severity. This will help to identify the parts of a road that are risky, thus supporting traffic accident
data analysis in decision-making processes. The general objective of the research is to investigate the role of road-
related factors in accident severity, using RTA data from Ethiopia and predictive models.
Our three specific objectives include:
Exploring the underlying variables (especially road-related ones) that impact car accident severity
Predicting accident severity using different data mining techniques
Comparing Standard classification models for this task.
In recent years the collection of information on traffic volumes has become a significant portion of the work of road
planning programs in terms of both cost and personnel. Initial methods we have manually collecting volume of traffic
data in TOD (time of day) manner for every 24 hours, manually summarizing volume of traffic data for every 24 hours.
Manually analyzing the traffic data and making Decision. Developing TOD timing plane to control the traffic problem
.There is no special tool helping to analyzing and making decisions.
Fig: 4.1 Data flow diagram for analysing existing system of Road Accidents
Today we are living in a data-driven world. Developments in data generation, gathering and storing technology have
empowered organizations to gather data sets of massive size. Data mining is a term that blends traditional data analysis
methods with cultured algorithms to handle the tasks stood by these new forms of data sets. a methodology which may
be used to extract knowledge from such databases in an automated manner using data mining techniques Data mining is
the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant
structure from large amounts of data stored in databases, data warehouses, or other information repositories. In this
project we consider different attributes nearly (37248) but performing analysis on these attributes leads to difficulty, so
we reduce the attributes , perform data cleaning and apply data mining algorithms such as apriori , naïve bayes
classification, k means algorithm and generate association rules.
5. SYSTEM ARCHITECTURE
DATA SET
DATA PRE-PROCESSING
CLUSTERING ALGORITHM
DATA PREPARATION
MODELING
VISUALIZATION/RESULT ANALYSIS
Data pre-processing
Data pre-processing is one of the important tasks in data mining. Data pre-processing mainly deals with
removing noise, handle missing values, removing irrelevant attributes in order to make the data ready for the analysis.
In this step, our aim is to pre-process the accident data in order to make it appropriate for the analysis.
Clustering algorithm
There are several clustering algorithms exist in the literature. The objective of clustering algorithm is to divide
the data into different clusters or groups such that the objects within a group are similar to each other whereas objects
in other clusters are different from each other. Hierarchical clustering technique (e.g. Ward method, single linkage,
complete linkage, etc.), K means and latent class clustering (LCC) have been used in road accident analysis.
Distance measure
Given a data set D, the distance between two objects X and Y, where X and Y are described by N categorical
variables, can be computed as follows
d(X,Y)=
(Xi,YI) ={0, XI=YI 1, Xi≠Yi
In the above equations, Xi and Yi are the attribute i values in object X and Y. This distance measure is often referred as
simple matching dissimilarity measure. The more the number of differences in categorical values of X and Y, more the
different two objects are.
A Naive Bayesian classifier is a simple probabilistic classifier based on applying Bayesian theorem (from Bayesian
statistics) with strong (naive) independence assumptions. By the use of Bayesian theorem we can write
p(C | F1....Fn) p(C)p(F1.....Fn |C)
p(F1.....Fn)
Fig: 7.3 Naive bayes clean data for Analysing Road Accidents
Fig: 7.4: Naive bayes clean classifier for Analysing Road Accidents
K-means is one of the simplest unsupervised learning algorithms that solve the well known clustering
problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters
(assume k clusters) fixed apriori. The main idea is to define k-centers, one for each cluster. These centers should be
placed in a cunning way because of different location causes different result. So, the better choice is to place them as
much as possible far away from each other. The next step is to take each point belonging to a given data set and
associate it to the nearest center. When no point is pending, the first step is completed and an early group age
is done. At this point we need to re-calculate k new centroids as bary center of the clusters resulting from the previous
step. After we have these k new centroids , a new binding has to be done between the same data set points and the
nearest new center. A loop has been generated. As a result of this loop we may notice that the k centers change their
location step by step until no more changes are done or in other words centers do not move any more. Finally, this
algorithm aims at minimizing an objective function know as squared error function given by
where,
‘||xi - vj||’ is the Euclidean distance between xi and vj.
‘ci’ is the number of data points in ith cluster.
‘c’ is the number of cluster centers.
As seen in statistics, association rule mining, and the classification, the environmental factors like roadway surface,
weather, and light condition do not strongly affect the fatal rate, while the human factors like being drunk or not, and
the collision type, have stronger effect on the fatal rate. From the clustering result we could see that some states/regions
have higher fatal rate, while some other slower. We may pay more attention when driving within those risky
states/regions. Through the task performed, we realized that data seems never to be enough to make a strong decision.
If more data, like non-fatal accident data, weather data, mileage data, and so on, are available, more test could be
performed thus more suggestion could be made from the data.
The next step in the modeling will be to combine road-related factors with driver information for better
predictions, and to find interactions between the different attributes. We also plan to develop a decision support tool. In
this work, we extended the research to three different cases such as Accident, Casualty and Vehicle for finding the
cause of accident and the severity of accident.
REFERENCES
[1]. Poojitha Shetty, Sachin P C, Supreeth V Kashyap, Venkatesh Madi. Analysis of road accidents using data mining techniques International
Research Journal of Engineering and Technology April 2017.
[2]. Kumar S, Toshniwal D (2016) A data mining approach to characterize road accident locations. J Mod Transp 24(1):62–72
[3]. Sachin Kumar and Durga Toshniwal. Analysing road accident data using association rule mining. In Proceedings of International Conference
on Computing, Communication and Security, pages 1–6, 2015.
[4]. Amira A El Tayeb, Vikas Pareek, and Abdelaziz Araar. Applying association rules mining algorithms for traffic accidents in dubai.
International Journal of Soft Computing and Engineering, September 2015.
[5]. Kumar S, Toshniwal D (2015) A data mining framework to analyze road accident data. J Big Data 2(1):1–18. doi:10.1186/ s40537-015-0035-y
[6]. Tan PN, Steinbach M, Kumar V. Introduction to data mining. Pearson Addison-Wesley; 2006. 15. Abellan J, Lopez G, Ona J. Analyis of traffic
accident severity using decision rules via decision trees, vol. vol. 40. Expert System with Applications: Elsevier; 2013.
[7]. Rovsek V, Batista M, Bogunovic B. Identifying the key risk factors of traffic accident injury severity on Slovenian roads using a non-
parametric classification tree, transport. UK: Taylor and Francis; 2014.
[8]. Kashani T, Mohaymany AS, Rajbari A. A data mining approach to identify key factors of traffic injury severity, promettraffic & transportation,
vol. 23; 2011.
[9]. S. Krishnaveni and M. Hemalatha. A perspective analysis of traffic accident using data mining techniques. International Journal of Computer
Applications, 23(7):40–48, June 2011
[10]. K Jayasudha and C Chandrasekar. An overview of data mining in road traffic and accident analysis. Journal of Computer Applications,
2(4):32–37, 2009.