Analysis and predict the nature of road traffic accident using data mining
techniques in Maharashtra, India
3 authors, including:
Baye Atnafu
Symbiosis International University
All content following this page was uploaded by Baye Atnafu on 29 May 2018.
Abstract—Traffic accidents are a major cause of death and serious injury worldwide. India is among the emerging countries where the rate at which traffic accidents occur exceeds the critical limit. Everyone wants to avoid traffic accidents and stay safe. To that end, careful analysis of roadway traffic accident data is important for identifying the nature of the road traffic accidents that result in fatal, grievous-injury, minor-injury, and non-injury outcomes. A number of classification and association rule mining algorithms exist for this purpose. In the proposed method, the Random tree, J48, and Naïve Bayes algorithms, which showed better performance in previous studies, are selected and applied to a data set of road accidents in Maharashtra state, India. The results of the three algorithms are compared, and the prediction model is built using the algorithm that proves to be the best. Further, the Apriori association rule mining algorithm is applied to find the relationship between the independent variables and the nature of accidents. This study also investigates the influence of other contributing factors (such as road-related, driver-related and environment-related factors) on the likelihood of severe accidents in Maharashtra, India.

Keywords—road accident, data mining, random tree, J48, Naïve Bayes, association rule mining

I. INTRODUCTION
Annually, road traffic accidents kill 1.25 million people and cause non-fatal injuries to a further 20-50 million [1]. According to the road traffic accident data provided by states, Maharashtra records the third highest number of fatal accidents (13,212) [2]. However, this trend can change in the future, because it is hard to predict the rate at which road traffic accidents occur: they can occur in any situation. Therefore, we need to investigate the hidden patterns that influence traffic accident severity levels using data mining techniques.

A number of data mining classification algorithms are available (such as Random tree, J48, Random forest, CART and Naïve Bayes) to predict the target class. They analyze the training dataset to find boundary conditions that can be used to determine each target class; the subsequent task is to predict the target class based on those boundary conditions.

A number of data mining algorithms are also available to find associations between independent variables in large data sets. Association rule mining is among the most popular methodologies for detecting significant associations between the data stored in a large database. Of the association rule mining algorithms available, Apriori, predictive Apriori and FP-growth are the most common methods for finding associations between the various factors that influence traffic accident severity levels in Maharashtra state, India.

I. RELATED WORK
Sachin Kumar & Durga Toshniwal [3] first applied three popular classification algorithms (CART, Naïve Bayes, and SVM) to a powered two-wheeler (PTW) accident data set and compared the results. The accuracy of the CART classification algorithm was found to be superior to that of the other two algorithms. Hence they used the CART classification algorithm to find the various factors that influence the severity of powered two-wheeler accidents in the entire Uttarakhand state and its
13 districts separately [3]. The results show that each district has different factors associated with the severity of powered two-wheeler accidents.

Sachin Kumar & Durga Toshniwal [4] used the k-means clustering algorithm to investigate high- and low-frequency accident locations. Further, they used association rule mining to identify the associations between the various factors related to road traffic accidents at places with differing accident frequencies. The results show that more accidents occur on highways; pedestrians are more vulnerable to road accidents on roads that have intersections; curves on roads bordered by agricultural land are risky for multi-vehicle crashes; and intersections on roads that pass through marketplaces are more vulnerable to severe accidents.

Wang, Yubian, and Wei Zhang [5] used a logistic regression model to isolate and quantify the impact of various roadway and environmental factors on traffic crash severity and to predict accident severity levels. The authors found that factors such as crash location, road function class, road alignment, light condition, road surface condition, and speed limit have a significant impact on crash severity. The results show that higher crash severity is linked with rural roadways, major arterials, locations without intersections, locations with curves, night-time, dry roadway conditions, and high speed limits.

L. Mussone, M. Bassani and P. Masci [6] evaluated the factors that affect accident severity levels at urban road intersections using a back-propagation neural network and a generalized linear mixed model. Both methods demonstrate that traffic flows play a significant role in predicting severity; this role is not limited to the flow at the time of the crash, but also extends to the flow before and after the crash.

Yina Wu, Mohamed Abdel-Aty & Jaeyoung Lee [7] used a logistic regression model to identify the various factors that contribute to increased vehicle crash risk during fog and to investigate the situations in which crashes are more likely to occur. The analysis results show that drivers are more careful when fog is present and that crash risk is more likely to increase near ramp areas.

Sachin Kumar, Durga Toshniwal, Manoranjan Parida [8] used the Latent Class Clustering and k-modes clustering algorithms to form homogeneous clusters from heterogeneous road accident data. Further, the FP-growth algorithm was applied to the resulting clusters to determine which clustering algorithm performs better at reducing the heterogeneity of traffic accident data [8]. The results indicate that neither clustering algorithm is superior to the other; both techniques perform well at reducing the heterogeneity of accident data [8].

Yannis George, Theofilatos Athanasios & Pispiringos George [9] investigated road accident severity per vehicle type using log-normal regression techniques. The results of this study show that bad weather and night-time accidents increase accident severity. Furthermore, the authors concluded that crash type has a major impact on accident severity.

Hao, Wei, Camille Kamga, Xianfeng Yang, JiaQi Ma, Ellen Thorson, Ming Zhong, and Chaozhong Wu [10] explored the factors affecting the injury severity of truck drivers in the United States using an ordered probit regression model. The analysis shows that bad weather and poor visibility increase the chances of high injury severity for truck drivers.

Bhaven Naik, Li-Wei Tung, Shanshan Zhao & Aemal J. Khattak [11] combined two diverse sources of data and applied a random parameters (mixed) ordered logit model to account for the heterogeneity in the data. The results showed that rain, air temperature, humidity, and wind speed were factors in injury severity. Of these, warmer air temperatures and rain were associated with more severe injuries, while higher levels of humidity were associated with less severe injuries.

Dursun Delen, Leman Tomak, Kazim Topuz & Enes Eryarsoy [12] focused on identifying the person-, vehicle-, and accident-related risk factors that significantly affect the variation in injury severity levels sustained by a driver in a car crash. The authors used a number of predictive analytics algorithms to find the complex associations between the various levels of injury severity and the crash-related risk factors. The authors also found the importance of crash-related risk factors after
applying a systematic series of information fusion-based sensitivity analyses to the trained predictive models. The sensitivity analysis results indicate that the use of a preventive system (i.e., seatbelt), the manner of the crash, and drug use are the main predictors of injury severity.

Liling Li, Sharad Shrestha & Gongzhu Hu [13] applied statistical analysis and data mining algorithms, such as Apriori rule mining, Naïve Bayes and k-means clustering, to the FARS fatal accident dataset to investigate the relationship between the fatality rate and other attributes such as weather condition, collision manner, light condition, drunk driving and surface condition. The analysis shows that environmental factors such as roadway surface, weather, and light conditions do not strongly affect the fatality rate, while human factors such as whether the driver was drunk and the collision type have a stronger effect on it.

Hasan Mehdi Naqvi & Geetam Tiwari [14] used a binomial logistic regression model to identify the various causes that influence fatal motorcycle crashes. The study results show that, for the variable "collision type", rear-end, sideswipe and head-on collisions are 42 times, 35 times and 25 times more likely, respectively, than hit-pedestrian collisions; for the variable "number of vehicle", the probability of a fatal crash is higher in single-vehicle crashes than in crashes involving two or more vehicles; and, for the variable "number of lane", the probability of a fatal crash is higher on two-lane national highways than on four-lane national highways.

II. METHODOLOGY
1. CLASSIFICATION ALGORITHMS
Classification is a data analysis method used to construct models that predict future data. The type of classification algorithm used varies according to the target variable. The target variable for this survey paper is a categorical variable with six possible outcomes (overturning, head-on collision, rear-end collision, right-turn collision, left-turn collision and others). Accordingly, the problem is characterized as a nominal classification problem and, based on the extant literature, various data mining techniques are available that can handle this type of classification problem, such as Random tree, J48 and Naïve Bayes.

A. RANDOM TREE
The random tree (RndTree) operator works just like the decision tree operator except that, for each split, only a random subset of attributes is available [16]. A RndTree is a tree drawn at random from the set of possible trees. In this context "at random" means that every tree has an equal chance of being selected from the set of trees.

B. J48
J48 is an advanced version of ID3; it decides the target value of a new test sample based on the attribute values of the training dataset [15]. The internal nodes of a decision tree represent the different attributes, while the branches represent the possible values of these attributes. The terminal nodes denote the values of the dependent variable [15]. Increasing the number of trees provides a more intelligent learner, just as a large, varied group is capable of reaching an intelligent conclusion [17].

C. NAÏVE BAYES
The naïve Bayesian classifier is one of the most effective and widely used supervised learning algorithms for classifying road accident data. It is a statistical model that predicts class membership probabilities based on Bayes' theorem, under the assumption that each pair of variables is independent given the class.

2. ASSOCIATION RULE MINING ALGORITHMS
Association rule mining is among the most popular methodologies for detecting significant associations between the data stored in a large database. A number of association rule mining algorithms exist for this purpose; of these, Apriori, predictive Apriori and FP-growth are the most commonly used in the area of road traffic accident analysis to generate the best rules showing the associations between various attributes in large datasets.
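To make the idea of association rule mining concrete, the following is a minimal pure-Python sketch of the Apriori approach: frequent itemsets are grown level by level, and rules are kept when their confidence passes a threshold. It is an illustration under stated assumptions, not the tool or configuration used in this study. The function names (`apriori`, `rules`), the thresholds and the five toy accident records are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of records that contain the candidate itemset.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules passing min_confidence."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, support, confidence))
    return found

# Hypothetical toy records (NOT data from the study).
records = [
    {"weather=rain", "road=curve", "nature=skidding"},
    {"weather=rain", "road=curve", "nature=skidding"},
    {"weather=fine", "road=straight", "nature=rear_end"},
    {"weather=rain", "road=straight", "nature=skidding"},
    {"weather=fine", "road=curve", "nature=overturning"},
]
frequent = apriori(records, min_support=0.4)
found = rules(frequent, min_confidence=0.9)
for lhs, rhs, support, confidence in found:
    print(sorted(lhs), "->", sorted(rhs), f"support={support:.2f}", f"confidence={confidence:.2f}")
```

In this toy data every record with rain also involves skidding, so the rule from `weather=rain` to `nature=skidding` has confidence 1.0. Production tools add further rule-ranking measures that this sketch omits.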
Table I: Comparison of data mining tools used for road accident data analysis
No. | Criterion | R | WEKA | RAPID MINER | KNIME
1 | Memory usage | More memory | Less memory | More memory | -
2 | Language | C, Fortran and R | Java | Language independent | Java
3 | Speed | Works faster on any machine | Works faster on any machine | Requires more memory to operate | -
4 | Usage | Complicated to use | Easiest to use | Easy to use | Easy to use
5 | Interface type supported | CLI | GUI/CLI | GUI | GUI
A J48 classifier was also built on the preprocessed data. Of 19,166 total records, 18,681 instances were correctly classified, giving a 97.5% accuracy rate. This indicates that the random tree classifier performs better than J48 at predicting the nature of road traffic accidents; for this reason, the prediction model was built using the random tree classifier. The various evaluation measures for the J48 classifier are given in Table III.
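As a quick check of the reported figure, and to illustrate how evaluation measures of this kind are commonly computed, here is a short Python sketch. The accuracy calculation uses the two counts stated above. The precision, recall and F-measure part runs on five hypothetical labels, because the per-class results are not reproduced here. The helper names are assumptions, not part of any tool used in the study.

```python
def accuracy_from_counts(correct, total):
    """Accuracy = correctly classified instances / all instances."""
    return correct / total

def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F-measure for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts reported for J48 in the text: 18,681 correct out of 19,166.
j48_accuracy = accuracy_from_counts(18_681, 19_166)
print(f"J48 accuracy: {j48_accuracy:.1%}")

# Hypothetical labels, for illustration only (NOT results from the study).
y_true = ["skidding", "skidding", "overturning", "rear_end", "skidding"]
y_pred = ["skidding", "overturning", "overturning", "rear_end", "skidding"]
precision, recall, f_measure = per_class_metrics(y_true, y_pred, "skidding")
print(f"skidding: precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

The two counts reproduce the stated 97.5% after rounding to one decimal place.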
Table IV: Best 10 rules generated by Apriori association rule mining algorithms.
Attribute | No. of values | Value | Code
Nature of Accident | 7 | Overturning | 1
 | | Head on collision | 2
 | | Rear end collision | 3
 | | Collision brush or side swipe | 4
 | | Left turn collision | 5
 | | Right turn collision | 6
 | | Skidding | 7
Classification of Accident | 4 | Fatal | 1
 | | Grievous injury | 2
 | | Minor injury | 3
 | | Non injured | 4
Causes of Accident | 5 | Drunk | 1
 | | Over-speeding | 2
 | | Vehicle out of control | 3
 | | Fault of driver of motor vehicle / driver of other vehicle / cyclist / pedestrian / passenger | 4
 | | Defect in mechanical condition of motor vehicle / road condition | 5
Road Feature | 4 | Single lanes | 1
 | | Two lanes | 2
 | | Three or more lanes without central divider | 3
 | | Four or more lanes with central divider | 4
Road Condition | 8 | Straight road | 1
 | | Slight curve | 2
 | | Sharp curve | 3
 | | Flat road | 4
 | | Gentle incline | 5
 | | Steep incline | 6
 | | Hump | 7
 | | Dip | 8
Weather Condition | 12 | Fine | 1
 | | Mist/fog | 2
 | | Cloud | 3
 | | Light rain | 4
 | | Heavy rain | 5
 | | Hail/sleet | 6
 | | Snow | 7
 | | Strong wind | 8
 | | Dust storm | 9
 | | Very hot | 10
 | | Very cold | 11
 | | Other weather condition | 12
Intersection Type and Control | 8 | T-junction | 1
 | | Y-junction | 2
 | | Four arm junction | 3
 | | Staggered junction | 4
 | | Junction with more than four arms | 5
 | | Roundabout junction | 6
 | | Manned rail crossing | 7
 | | Unmanned rail crossing | 8
Time of day | 6 | 7:00am - 10:59am | T1
 | | 11:00am - 2:59pm | T2
 | | 3:00pm - 6:59pm | T3
 | | 7:00pm - 10:59pm | T4
 | | 11:00pm - 2:59am | T5
 | | 3:00am - 6:59am | T6
Date | Many | 09-09-2015 | 09-09-2015
Season | 4 | Jan - Feb | Winter
 | | Mar - May | Summer
 | | Jun - Sept | Rainy
 | | Oct - Dec | Monsoon
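The coding scheme in the table above can be applied mechanically during preprocessing. The sketch below shows one possible way to derive the time-of-day slot (T1 to T6) and the season from raw values. The function names are hypothetical, the accident time is assumed to be a 24-hour "HH:MM" string, and the date is assumed to be in day-month-year order. Only the slot boundaries and the month-to-season mapping come from the table.

```python
from datetime import datetime

def time_slot(clock):
    """Map a 24-hour 'HH:MM' string to a slot code T1..T6.

    The slots are four hours wide and T1 starts at 07:00, so shifting the
    hour by 7 (modulo 24) and dividing by 4 gives the slot index.
    """
    hour = datetime.strptime(clock, "%H:%M").hour
    return f"T{((hour - 7) % 24) // 4 + 1}"

def season(month):
    """Map a month number (1-12) to the season label used in the table."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if month <= 2:
        return "Winter"
    if month <= 5:
        return "Summer"
    if month <= 9:
        return "Rainy"
    return "Monsoon"

# Example with the sample date from the table; day-month-year order is an
# assumption, and the time 23:30 is a made-up value.
accident_date = datetime.strptime("09-09-2015", "%d-%m-%Y")
print(season(accident_date.month), time_slot("23:30"))
```

Deriving the codes arithmetically avoids a lookup table for the slot that wraps past midnight (T5, 11:00pm to 2:59am).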
as season and time in which the accident frequently occurred. This leads to challenges in the field of accident analysis and prediction. Some of the challenges include modeling accidents to find suitable algorithms for detecting accident severity levels, data preparation, transformation, and processing time. Therefore, in order to fill some of the gaps, we are motivated to study road traffic accident data to find out the influence of the nature of road accidents on road accident severity levels in Maharashtra, India. As seen in the classification and Apriori association rule mining, the road condition, weather condition, road feature, and road intersection type and control strongly affect the nature of accidents that lead to fatal and serious injury severity.

IV. REFERENCES
1. WHO. Road traffic safety. Available from http://www.who.int/mediacentre/factsheets/fs358/en/ Accessed on 21 September 2017.
2. 400 road deaths per day in India; up 5% to 1.46 lakh in 2015. Available from http://timesofindia.indiatimes.com/lifestyle/health-fitness/health-news/400-road deaths-per-day-in-India-up-5-to-1-46-lakh-in-2015/articleshow/51920988.cms Accessed on 21 September 2017.
3. Kumar, S., & Toshniwal, D. (2017). Severity analysis of powered two-wheeler traffic accidents in Uttarakhand, India. European Transport Research Review, 9(2), 24.
4. Kumar, S., & Toshniwal, D. (2016). A data mining approach to characterize road accident locations. Journal of Modern Transportation, 24(1), 62-72.
5. Wang, Y., & Zhang, W. (2017). Analysis of Roadway and Environmental Factors Affecting Traffic Crash Severities. Transportation Research Procedia, 25, 2124-2130.
6. Mussone, L., Bassani, M., & Masci, P. (2017). Analysis of factors affecting the severity of crashes in urban road intersections. Accident Analysis & Prevention, 103, 112-122.
7. Wu, Y., Abdel-Aty, M., & Lee, J. (2017). Crash risk analysis during fog conditions using real-time traffic data. Accident Analysis & Prevention.
8. Sachin Kumar, Durga Toshniwal, Manoranjan Parida. "A comparative analysis of heterogeneity in road accident data using data mining techniques", Evolving Systems, 2016.
9. George, Y., Athanasios, T., & George, P. (2017). Investigation of road accident severity per vehicle type. Transportation Research Procedia, 25, 2081-2088.
10. Hao, W., Kamga, C., Yang, X., Ma, J., Thorson, E., Zhong, M., & Wu, C. (2016). Driver injury severity study for truck involved accidents at highway-rail grade crossings in the United States. Transportation research part F: traffic psychology and behavior, 43, 379-386.
11. Naik, B., Tung, L. W., Zhao, S., & Khattak, A. J. (2016). Weather impacts on single-vehicle truck crash injury severity. Journal of safety research, 58, 57-65.
12. Delen, D., Tomak, L., Topuz, K., & Eryarsoy, E. (2017). Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods. Journal of Transport & Health, 4, 118-131.
13. Li, L., Shrestha, S., & Hu, G. (2017, June). Analysis of road traffic fatal accidents using data mining techniques. In Software Engineering Research, Management and Applications (SERA), 2017 IEEE 15th International Conference on (pp. 363-370). IEEE.
14. Naqvi, H. M., & Tiwari, G. (2017). Factors Contributing to Motorcycle Fatal Crashes on National Highways in India. Transportation Research Procedia, 25, 2089-2102.
15. Classification, prediction, and clustering. Available from http://shareengineer.blogspot.in/2012/09/classification-and-clustering.html Accessed on 22 September 2017.
16. Sewaiwar, Purva, and Kamal Kant Verma. "Comparative Study of Various Decision Tree Classification Algorithm Using WEKA." International Journal of Emerging Research in Management & Technology 4 (2015): 2278-9359.
17. Rapidminer. Randomtree. Available from https://docs.rapidminer.com/studio/operators/modeling/predictive/trees/random_tree.html Accessed on 22 September 2017.
18. Zhao, Yongheng, and Yanxia Zhang. "Comparison of decision tree methods for finding active objects." Advances in Space Research 41.12 (2008): 1955-1959.
19. Lai, K., & Cerpa, N. (2001, October). Support vs. confidence in association rule algorithms. In Proceedings of the OPTIMA Conference, Curicó.
20. Aher, S. B., & Lobo, L. M. R. J. (2012). A comparative study of association rule algorithms for course recommender system in e-learning. International Journal of Computer Applications, 39(1), 48-52.
21. L. Zhichun and Y. Fengxin, “An improved frequent pattern tree growth algorithm,” Applied Science and Technology, vol. 35, no. 6, pp. 47–51, 2008.
22. C. Jun and G. Li, “An improved FP-growth algorithm based on item head table node,” Information Technology, vol. 12, pp. 34–35, 2013.
23. B. Zheng and J. Li, “An improved algorithm based on FP-growth,” Journal of Pingdingshan Institute of Technology, vol. 17, no. 4, pp. 9–12, 2008.
24. S.K. David, Amr T.M. Saeb, K.A. Rubeaan, Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics, Computer Engineering and Intelligent System, 4(13), 2013.
25. Rangra et al. “Comparative Study of Data Mining Tools”. In Proceedings of International Conference on Advanced Research in Computer Science and Software Engineering, vol. 4, no. 6, pp. 216-223, June 2014.
26. Bhinge, A. V. (2015). A Comparative Study on Data Mining Tools (Doctoral dissertation, California State University, Sacramento).
27. Patel, P. S., & Desai, S. G. (2015). A Comparative Study on Data Mining Tools. International Journal of Advanced Trends in Computer Science and Engineering, 4(2).
28. National Highway Authority of India. Regional Office – Mumbai. Available from http://nhai.org.in Accessed on July 30, 2017.