
Analysis of Road Traffic Accidents to Identify Major Causes of Accidents using Machine Learning Techniques: In the case of Addis Ababa City


MSc Thesis Research Proposal
By
TARIKWA TESFA
Advisor: Dr. Beakal Gizachew
DEPARTMENT OF SOFTWARE ENGINEERING
COLLEGE OF ELECTRICAL AND MECHANICAL
ENGINEERING
ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY

February 27, 2019


Approval Page
Title: Analysis of Road Traffic Accidents to Identify Major Causes of
Accidents using Machine Learning Techniques: In the case of Addis Ababa
City

Student Name: Tarikwa Tesfa Signature, Date: ______________________

Approved by the examining committee members:


Name Academic Rank Signature Date
Advisor: ________________ ________________ ___________ ___________
Co-Advisor: ________________ ________________ ___________ ___________
Examiner: ________________ ________________ ___________ ___________
Examiner: ________________ ________________ ___________ ___________

Name Signature Date


DGC Chairperson: ________________ ________________ ________________
Associate Dean for Graduate Programs: ________________ ________________ ________________
Table of Contents
Introduction
Statement of the Problem
Purpose and Research Question
Literature Review and Related Work
Objectives of the Study
    General Objective
    Specific Objectives
Scope and Limitations
Significance of the Research
Research Methodology
    Review of Related Literature
    Data collection
    Data processing and Machine Learning Creation
Budget Plan and Work Breakdown
    Schedule
    Budget
Reference
Introduction
Road traffic accidents are a major worldwide threat. They cause casualties, injuries, and fatalities on roads every day, resulting in huge economic and social losses. In Ethiopia, the number of deaths due to traffic accidents is reported to be among the highest in the world: over 3,000 people die in traffic accidents every year. There are an estimated 700,000 vehicles in the country, and per capita car ownership stands at three cars per 1,000 people. The Global Status Report on Road Safety 2015 indicated that the total number of road traffic deaths worldwide has plateaued at 1.25 million per year, with the highest road traffic fatality rates registered in low-income countries. According to the WHO, the road crash fatality rate in Ethiopia in 2013 was 4,984.3 deaths per 100,000 vehicles per year, compared to 574 across sub-Saharan African countries. In addition, the number of people injured or killed in a single crash in Ethiopia is about 30 times higher than in the US [1].

Some places contribute more to accidents than others. Addis Ababa takes the lion's share of the risk because it has more vehicles and more traffic. The cost of these fatalities and injuries has a huge impact on the socio-economic development of society [2]. Every year, around 300 people are killed on Addis Ababa's roads and about 1,500 are slightly or seriously injured.
The government has started several campaigns, such as "Think!" and the Road Safety Campaign (RSC), to raise awareness of road safety issues and reduce road accidents [3].

Accidents have various causes, such as disregard of traffic rules, but road conditions and traffic are considered among the prime causes of fatalities and casualties across the globe. Accidents are also attributed to the rapid design and development of the automobile industry. A traffic crash may involve two vehicles, a vehicle and a pedestrian, an animal, or another natural obstacle, and it can result in injury, property damage, and death. Traffic accident analysis therefore requires studying the various factors behind accidents.

A road traffic accident is defined as a collision or incident involving at least one road vehicle in motion on a public road, or on a private road to which the public has the right of access. A road traffic accident can therefore be a collision among vehicles, between vehicles and pedestrians, between vehicles and animals, or between vehicles and geographical or architectural obstacles [1]. Single-vehicle accidents, in which one vehicle alone (and no other road user) is involved, are also included.

Various sectors deal with huge amounts of data available in different formats from disparate sources. Because of the progressive use of technology, this data is becoming easily available and accessible. Governments and companies realize the insights that can be gained by tapping into big data, but they lack the resources and time required to sift through its wealth of information. As a result, different industries are using artificial intelligence to gather, process, communicate, and share useful information from data sets. One branch of artificial intelligence that is increasingly used for big data processing is machine learning [4].

Evaluating and analyzing data stored in large databases requires machine learning techniques that can search large quantities of data and discover new patterns and relationships hidden in it. Machine learning generally delivers faster and more accurate results for identifying profitable opportunities or dangerous risks. However, it may also require additional time and resources to train properly. Integrating machine learning with other artificial intelligence and cognitive technologies can make it even more effective at processing large volumes of information.

The traffic control system is where important data about society is recorded and kept. Using this data, we can identify the risk factors and causes of road traffic accidents, injuries, and fatalities, and take preventive measures to save people's lives. Road traffic accident analysis, a part of criminology, is a law enforcement function that involves methodically identifying and analyzing patterns and trends in accidents.

Machine learning holds the promise of making it easy, convenient, and practical for organizations and users to explore very large databases [6]. In practice, road traffic accident analysis includes exploring and detecting accidents and their relationships with those who caused them. Large accident datasets and their many variables can be used to identify the major causes of accidents with machine learning techniques [5].

Statement of the Problem


The underlying problem that initiated this research is that road traffic accidents are becoming a complex social phenomenon. Their cost is increasing due to a number of societal and technological changes. The costs of deaths and injuries due to road traffic accidents have a great effect on society. Road traffic injuries cause more than a million deaths around the world each year, and millions more people suffer injury and long-term disability. Moreover, road traffic accidents affect the most productive members of a society and result in large-scale economic losses for a country. Ethiopia has one of the highest rates of road traffic accidents, since road transport is the country's main mode of transportation.

Research on road traffic accidents has been conducted for several years, mainly in developed countries, with a few studies carried out locally. Tibebe [7] studied historical road traffic accident data comprising 4,658 accident records from the Addis Ababa Traffic Office to investigate the application of data mining technology to the analysis of accident severity. Following Tibebe, Zelalem [8] conducted research to classify drivers' responsibility for a given accident in Addis Ababa. In addition, Tibebe and Hill [9] later studied the role of road-related factors in accident severity.
Previous studies have focused on single attributes that help to predict traffic accidents in Addis Ababa. This leaves a gap for further research that combines drivers' information, road characteristics, and other related attributes to predict the causes of accidents. Moreover, traffic rules and regulations in the capital have changed since these studies were conducted, and these changes have their own effect on road safety.
Different studies document a large number of road accidents, and the Addis Ababa traffic control and investigation department gathers road traffic accident data periodically. However, this accumulated historical data has not been used for analysis because appropriate data analysis tools are lacking.
The recorded data is a major resource for analyzing the factors that contribute to a problem causing great loss of life. One way to help prevent road traffic accidents is to research their main causes and attack the problem at its root.
In this research, the researcher will construct a model that predicts the major causes of road traffic accidents. The model will use drivers' information, road characteristics, and other related attributes, taken from traffic accident data held by the police departments of Addis Ababa's sub-cities.
Purpose and Research Question
In this thesis, machine learning techniques will be used in a knowledge discovery process to identify and predict the major causes of road traffic accidents. The research will address the following three main research questions:

• What are the main determinant factors (attributes) that cause traffic accidents?
• Which machine learning techniques perform well in identifying the main causes of road traffic accidents?
• What are the most interesting patterns or rules generated from the causal factors of road traffic accidents that can inform traffic rules and policies?

Literature Review and Related Work


In this section, we present related work that applies machine learning and data mining techniques to road traffic accident data to analyze and predict the major causes of accidents.

In [10], several supervised machine learning methods were applied to an accident dataset: Logistic Regression, K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forests. The goal was to discover how each component affects the accident variables, and the findings were used to give safe-driving recommendations to limit accidents. The results show that the Decision Tree is the best model for predicting the causes of accidents. The experiments used Anaconda, a free and open-source distribution of the Python and R programming languages that includes Jupyter Notebook, for large-scale data processing, prediction, and analysis. The Decision Tree performed better on all the components (Weather Condition, Causes, Road Features, Road Condition, and Type of Accident), with an accuracy of 99.4%.
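The model comparison in [10] can be illustrated with a short scikit-learn sketch that cross-validates the same five classifier families. This is not the authors' code. The data are synthetic (generated by `make_classification`), and the record count, number of attributes and classes, `random_state`, and default hyperparameters are all assumptions chosen only so the example runs.

```python
# Sketch: compare the five classifier families named in [10] with cross-validation.
# Assumption: synthetic data stands in for a real, encoded accident dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5):
    """Return the mean cross-validated accuracy of each classifier."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        # cross_val_score fits a fresh clone of the model on each training fold
        scores[name] = cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
    return scores


# Hypothetical stand-in for an encoded accident table: 500 records, 8 attributes, 3 cause classes
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
results = compare_classifiers(X, y)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

Which model comes out on top depends on the data. The ranking from this toy setup says nothing about the 99.4% reported in [10].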

In [11], three classification algorithms (Decision Tree, ANN, and SVM) were implemented to detect the influential environmental features of road traffic accidents (RTAs) that can be used to build prediction classification rules. The classifiers were trained and tested in the WEKA tool on a dataset obtained from the Department for Transport of the United Kingdom. The R tool was also used to apply sampling techniques to handle the class imbalance of the dataset. The Decision Tree achieved the highest scores: an accuracy of 80.650%, a precision of 0.814, a recall of 0.806, and an F-measure of 0.801. The PART algorithm was used to present the knowledge in the form of rules. Evaluated with 10-fold cross-validation, PART reached an accuracy of 76.570% on the traffic accident dataset. Java was used to build the PART rule list for the prediction model. Rules were generated from the Urban or Rural Area, Speed Limit, Light Conditions, and Number of Vehicles attributes.
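The evaluation protocol in [11] combines 10-fold cross-validation with accuracy, precision, recall, and F-measure. The sketch below reproduces that protocol in scikit-learn as an analogue of the WEKA setup, not as a reproduction of the study. Three choices are assumptions: the synthetic, imbalanced data, the weighted averaging of per-class scores, and the decision-tree settings. The printout also shows that all four metrics are fractions between 0 and 1.

```python
# Sketch: 10-fold cross-validation reporting accuracy, precision, recall and F-measure.
# Assumption: synthetic, imbalanced data replaces the UK Department for Transport records.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           weights=[0.8, 0.2], random_state=1)

# Stratified folds keep the 80/20 class ratio in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
scores = cross_validate(DecisionTreeClassifier(random_state=1), X, y, cv=cv, scoring=scoring)

for metric in scoring:
    # Each score is a fraction in [0, 1]; multiplying by 100 expresses it as a percentage
    mean = scores[f"test_{metric}"].mean()
    print(f"{metric:20s} {mean:.3f} ({mean * 100:.1f}%)")
```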

The authors of [12] applied different machine learning classification algorithms to a United Kingdom road traffic accident dataset from 2016. They discussed the six algorithms with the highest accuracy and best classification performance: Fuzzy-FARCHD, Random Forest, Hierarchical LVQ, RBF Network (Radial Basis Function Network), Multilayer Perceptron, and Naïve Bayes. The results show that the Fuzzy-FARCHD algorithm classified the dataset effectively, achieving an accuracy of 85.94%. In this work, Lighting Conditions, 1st Road Class & No., and Number of Vehicles were the key features selected among the attributes.
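[12] identifies Lighting Conditions, 1st Road Class & No., and Number of Vehicles as key features. One common way to rank attributes in this way is Random Forest impurity-based importance, sketched below. It is not necessarily the selection method used in [12]. The feature names, value codes, and label rule are hypothetical, constructed so that lighting and vehicle count are the only informative attributes.

```python
# Sketch: rank attributes by Random Forest impurity-based importance.
# Assumption: column names and synthetic data are hypothetical, not the UK 2016 dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 600
features = ["lighting_conditions", "road_class", "number_of_vehicles", "day_of_week"]
X = np.column_stack([
    rng.integers(0, 3, n),  # lighting: 0 = daylight, 1 = dark lit, 2 = dark unlit
    rng.integers(0, 4, n),  # road class code (noise in this example)
    rng.integers(1, 5, n),  # number of vehicles involved
    rng.integers(0, 7, n),  # day of week (noise in this example)
])
# Hypothetical severity label driven only by lighting and vehicle count
y = ((X[:, 0] == 2) | (X[:, 2] >= 3)).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(features, forest.feature_importances_), key=lambda p: p[1], reverse=True)
for name, importance in ranking:
    print(f"{name:20s} {importance:.3f}")
```

Impurity-based importances can favour attributes with many distinct values. Permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.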

In [13], four machine learning techniques (Naïve Bayes, k-Nearest Neighbors, Decision Trees, and Support Vector Machines) were used to evaluate road accidents in Punjab. A key challenge of this work was performing a parametric evaluation to extract the parameters most important for Punjab specifically. The study identified the 12 most suitable parameters, and the Decision Tree classifier achieved the highest performance at 86.25%. The three factors contributing most to road accidents in Punjab were the mental state of the driver, alcohol consumption, and vehicle speed.
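The parametric evaluation in [13] can be loosely illustrated with univariate feature selection: each candidate parameter is scored against the label, and the top k are kept. This sketch is not the procedure used in [13]. Several elements are assumptions: the chi-squared scoring function, k = 3 (the study kept 12 parameters), the parameter names, and the synthetic data.

```python
# Sketch: score candidate parameters and keep the k most informative ones.
# Assumption: chi-squared scoring, k=3, parameter names and data are illustrative.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(42)
n = 400
names = ["driver_mental_state", "alcohol_consumption", "vehicle_speed_band",
         "vehicle_age_band", "weekday", "road_surface"]
X = rng.integers(0, 4, size=(n, len(names)))  # non-negative codes, as chi2 requires
# Hypothetical label that depends on the first three parameters only
y = (X[:, 0] + X[:, 1] + X[:, 2] >= 6).astype(int)

selector = SelectKBest(score_func=chi2, k=3).fit(X, y)
kept = [name for name, keep in zip(names, selector.get_support()) if keep]
print("selected parameters:", kept)
```

`chi2` treats each value as a count. Purely nominal attributes are therefore better one-hot encoded before scoring.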

The study in [14] used various machine learning techniques to select a set of influential factors and build a model that classifies the severity of injuries. Supervised algorithms, namely AdaBoost, Logistic Regression, Naive Bayes, and Random Forests, were applied to traffic accident data, and the SMOTE algorithm was used to handle data imbalance. The results show that the Random Forest model is the best tool for predicting the injury severity of traffic accidents. RF achieved 75.5% accuracy, compared with 74.5% for LR, 73.1% for NB, and 74.5% for AdaBoost.
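SMOTE, used in [14], balances the classes by creating synthetic minority-class records rather than simply duplicating them. The sketch below is a simplified re-implementation for illustration only. It assumes numeric features and hypothetical data. A maintained implementation, such as the one in the imbalanced-learn package, would normally be used instead.

```python
# Sketch of the core SMOTE idea: interpolate between a minority sample and one of
# its k nearest minority-class neighbours to create synthetic samples.
# Assumption: simplified illustration with numeric features and hypothetical data.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic rows generated from the minority-class matrix X_min."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point is returned as its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))
        neighbour = idx[base, rng.integers(1, k + 1)]  # skip column 0 (the point itself)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return synthetic


# Hypothetical imbalanced data: 90 "slight" vs 10 "fatal" injury records, 2 numeric features
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(3.0, 0.5, size=(10, 2))
X_new = smote(X_minor, n_new=80, k=5)
X_balanced = np.vstack([X_major, X_minor, X_new])
y_balanced = np.array([0] * 90 + [1] * (10 + 80))
print(X_balanced.shape, np.bincount(y_balanced))
```

Oversampling should be applied only to the training folds. If synthetic records leak into the test data, the evaluation will be optimistic.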

In [15], Decision Tree and Naïve Bayes algorithms were used in the WEKA tool to determine the severity of accidents. The results show that the J48 classifier (WEKA's implementation of the C4.5 decision tree) is more accurate than the other algorithms at determining accident severity.
Table 1: Summary of related work

[10] Analysis of Road Accidents to Identify Major Causes and Influencing Factors of Accidents - A Machine Learning Approach
    Description: Develop a model for characterizing the reasons for accidents.
    Dataset: Taken from a government site.
    Attributes: Weather condition, Causes, Road Features, Road Condition, and Type of Accident.
    Techniques used: Logistic Regression, K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forests.
    Performance: Decision Tree demonstrated better performance on all the components, with an accuracy of 99.4%.

[11] Data Mining Methods for Traffic Accident Severity Prediction
    Description: Classification techniques were used to detect the influential environmental features of RTAs that can be used to build prediction classification rules.
    Dataset: Obtained from the Department for Transport of the United Kingdom.
    Attributes: Urban or Rural Area, Speed limit, Light Conditions, and Number of Vehicles.
    Techniques used: Decision tree (Random Forest, Random Tree, J48/C4.5, and CART), ANN (back-propagation), and SVM (polynomial kernel).
    Performance: Decision Tree achieved the highest accuracy of 80.650%.

[12] Classification of Road Traffic Accident Data Using Machine Learning Algorithms
    Description: Analyze the road accident data, predict the severity level of the accidents, and summarize the information in a useful format using machine learning techniques.
    Dataset: UK road traffic accident data for the year 2016.
    Attributes: Lighting Conditions, 1st Road Class & No., and Number of vehicles.
    Techniques used: Fuzzy-FARCHD, Random Forest, Hierarchical LVQ, RBF Network (Radial Basis Function Network), Multilayer Perceptron, and Naïve Bayes.
    Performance: The Fuzzy-FARCHD algorithm classified the dataset effectively and achieved an accuracy of 85.94%.

[13] Evaluation and Classification of Road Accidents Using Machine Learning Techniques
    Description: Used machine learning algorithms for the evaluation and classification of road accidents.
    Dataset: Taken from the Punjab government's authentic organization, the Punjab Road Safety Organization.
    Attributes: Mental state of driver, alcohol consumption, and speed of vehicle.
    Techniques used: Naïve Bayes, k-Nearest Neighbors, Decision Trees, and Support Vector Machines.
    Performance: The study yielded the 12 most suitable parameters and a maximum performance of 86.25% for the Decision Tree classifier.

[14] Comparison of Machine Learning Algorithms for Predicting Traffic Accident Severity
    Description: Establishes models to select a set of influential factors and to build a model for classifying the severity of injuries.
    Dataset: Provided by the Office of Highway Safety Planning (OHSP).
    Attributes: Characteristics of the driver, passenger, and pedestrian, along with traffic condition.
    Techniques used: AdaBoost, Logistic Regression, Naive Bayes, and Random Forests.
    Performance: The Random Forests algorithm showed better performance, with 75.5% accuracy.

[15] Comparative Study on Data Mining Classification Algorithms for Predicting Road Traffic Accident Severity
    Description: Used classification techniques to establish models that identify accident factors and predict traffic accident severity.
    Dataset: Collected from the UK traffic accident repository.
    Attributes: Speed limit, weather condition, number of lanes, and lighting condition.
    Techniques used: Decision Tree and Naïve Bayes.
    Performance: The J48 classifier gave better accuracy than the other algorithms.

As the studies above show, machine learning techniques play a large role in analyzing road accident records, predicting future accidents, and identifying patterns among the factors that determine accidents. Machine learning therefore has great potential for monitoring and preventing road accidents.

Objectives of the Study


General Objective
The general objective of the study is to develop a model that predicts the major causes of road traffic accidents in Addis Ababa using machine learning classification techniques.

Specific Objectives
To accomplish the general objective stated above, the following specific objectives will be carried out:
• Conduct a thorough review of the literature on existing machine learning techniques and methods and their application to road traffic accidents.
• Identify appropriate machine learning algorithms, assess machine learning software suited to the problem domain, and select the most suitable software.
• Select and extract the dataset required for analysis from the databases of the Addis Ababa sub-city police departments.
• Prepare the data for analysis, which includes fixing inconsistent data encoding, accounting for missing values, and deriving new fields from existing ones.
• Train and test the predictive models using the newly prepared dataset.
• Compare the models and suggest the best one for prediction.
• Interpret and analyze the results of the selected model and forward recommendations.
Scope and Limitations
The scope of this research is limited to identifying and predicting the main causes of road traffic accidents in Addis Ababa City.
The study faces the following data-related limitations:

• Accident records are kept in hard-copy, handwritten form, so they need additional time and effort to encode and process.
Significance of the Research
The Ethiopian government is implementing several new traffic rules. These measures aim to reduce the growing number of traffic accidents, which cause thousands of deaths and hundreds of millions of dollars of property damage every year. This study will support the government by improving understanding of the risk factors that contribute to road traffic accidents and related injuries in Addis Ababa. Road safety authorities can use the results for planning and evaluating road safety measures. The results will also pave the way for developing better parameters in all aspects of the traffic control system. Specifically, the study will help the Traffic Control Division of Addis Ababa take proper action against road traffic accidents, such as revising the existing traffic rules. Citizens, NGOs, and the media can also take necessary action with the help of the local government. If adopted, the recommendations will benefit the public at large by helping to prevent road accidents and improve safety performance.

Research Methodology
The methodologies to be used in conducting this research are described as follows.

Review of Related Literature


A review of the relevant literature has been conducted and will be continued. It assesses machine learning concepts and techniques and the research in this field. Books, journals, and articles and papers from the Internet will be reviewed to understand the practice of accident assessment, particularly road traffic accident assessment, and the potential applicability of machine learning technology to road traffic accidents.
Data collection
The primary source of data for this research will be the police departments of Addis Ababa's sub-cities. Additional information will be collected through methods such as interviews with experts in the area and document analysis.
Data processing and Machine Learning Creation
In the age of digital information, databases have become so immense that analyzing them and discovering new knowledge from them manually is difficult. Traditional techniques for analyzing and visualizing data are no longer feasible at this scale, so researchers have begun to look for ways to automate the analysis. A new generation of computational techniques and tools is required to extract useful knowledge from the rapidly growing volumes of data. These techniques and tools are the subject of the emerging field of knowledge discovery in databases (KDD) [16]. The machine learning process in this study will follow the KDD process framework, which covers the whole process of turning low-level data into high-level knowledge. The KDD process model is briefly described in the following five major steps.
Data Selection- Creating a target dataset includes selecting a dataset or focusing on a subset of
variables or data samples on which discovery is to be performed.
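Data selection amounts to choosing the attributes and records to mine. The small pandas sketch below illustrates this. The table, column names, and values are hypothetical placeholders, because the Addis Ababa sub-city police records have not yet been encoded. The date cut-off is also an assumed example.

```python
# Sketch: create a target dataset by selecting a subset of variables and records.
# Assumption: all columns and values are hypothetical placeholders.
import pandas as pd

records = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-02", "2018-07-15", "2018-11-30", "2019-01-04"]),
    "sub_city": ["Bole", "Arada", "Bole", "Yeka"],
    "driver_age": [24, 41, 35, 19],
    "light_condition": ["Daylight", "Darkness", "Daylight", "Darkness"],
    "cause": ["Speeding", "Failure to yield", "Speeding", "Drunk driving"],
    "officer_name": ["A", "B", "C", "D"],  # administrative field, not useful for mining
})

# Focus on a subset of variables ...
attributes = ["date", "sub_city", "driver_age", "light_condition", "cause"]
# ... and a subset of samples, here accidents recorded from 2018 onwards
recent = records["date"] >= pd.Timestamp("2018-01-01")
target = records.loc[recent, attributes].reset_index(drop=True)
print(target)
```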
Data Preprocessing- Data cleaning and preprocessing include basic operations such as removing noise or outliers where appropriate, collecting the information needed to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes. The recorded traffic accident data is in Amharic, so it needs language transformation. Subject experts in the area will help translate the important fields and their values into English. After this, we will perform the basic data preprocessing activities.
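The preprocessing activities above can be sketched with pandas: translating category values, unifying inconsistent encodings, filling missing values, and removing duplicates. Several elements are assumptions: the two-entry translation table, which stands in for the mapping the subject experts will provide, the column names, and the median-filling strategy.

```python
# Sketch of the preprocessing step on a tiny hypothetical extract of the records.
# Assumption: the translation table is illustrative; the real one comes from subject experts.
import pandas as pd

raw = pd.DataFrame({
    "light_condition": ["ቀን", "ሌሊት", "ቀን", None, "ቀን"],
    "driver_sex": ["M", "male", " F", "M", "M"],
    "driver_age": [24, None, 35, 52, 24],
})

AMHARIC_TO_ENGLISH = {"ቀን": "Daylight", "ሌሊት": "Darkness"}  # hypothetical mapping
SEX_CODES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

clean = raw.copy()
# Translate values; anything unmapped or missing becomes "Unknown"
clean["light_condition"] = clean["light_condition"].map(AMHARIC_TO_ENGLISH).fillna("Unknown")
# Unify inconsistent encodings such as "M", "male" and " F"
clean["driver_sex"] = clean["driver_sex"].str.strip().str.lower().map(SEX_CODES)
# Fill a missing numeric value with the median of the observed values
clean["driver_age"] = clean["driver_age"].fillna(clean["driver_age"].median())
# Remove records that became exact duplicates
clean = clean.drop_duplicates().reset_index(drop=True)
print(clean)
```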
Data Transformation- Depending on the goal of the task, this step finds useful features to represent the data. It also applies dimensionality reduction or transformation methods to remove features that have no effect on model performance.
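A minimal transformation step first encodes categorical attributes as numeric features. It then drops features that carry no information because they never vary. The sketch below uses one-hot encoding and scikit-learn's `VarianceThreshold`, which is only one simple choice among many reduction methods. The columns and values are hypothetical.

```python
# Sketch: one-hot encode categorical attributes, then drop zero-variance features.
# Assumption: columns and values are hypothetical; VarianceThreshold is one simple option.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

data = pd.DataFrame({
    "light_condition": ["Daylight", "Darkness", "Daylight", "Darkness"],
    "road_surface": ["Dry", "Wet", "Dry", "Dry"],
    "vehicle_type": ["Minibus", "Minibus", "Minibus", "Minibus"],  # constant column
})

encoded = pd.get_dummies(data, dtype=int)  # one 0/1 column per category value
selector = VarianceThreshold(threshold=0.0).fit(encoded)  # keeps variance > 0
reduced = encoded[encoded.columns[selector.get_support()]]
print(list(reduced.columns))
```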
Choosing Machine Learning Algorithms and Approaches- This step decides the machine learning algorithms and the approach (supervised, semi-supervised, or hybrid) used for the thesis. In this study, the supervised machine learning approach will be used to build the models.
Machine Learning Model Evaluation- This is the final step in the KDD process framework. It includes two basic components:

• Interpreting the extracted patterns: visualizing them where possible, removing redundant or irrelevant patterns, and translating the useful ones into terms that users can understand.
• Consolidating and analyzing the discovered knowledge: incorporating it into the performance system and applying and deploying it in real scenarios.
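The two evaluation components can be illustrated with a decision tree. The tree is printed as readable if/then rules, and its errors are checked on held-out data with a confusion matrix and per-class precision and recall. This is a sketch under assumptions. The decision tree and scikit-learn's `export_text` stand in for whatever model the study finally selects. The attributes, label rule, and data are hypothetical.

```python
# Sketch: turn a fitted model into readable rules and evaluate it on held-out data.
# Assumption: hypothetical attributes and label; a decision tree stands in for the final model.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
n = 500
speed_over_limit = rng.integers(0, 2, n)    # 1 = driver exceeded the speed limit
night = rng.integers(0, 2, n)               # 1 = accident happened at night
driver_experience = rng.integers(0, 20, n)  # years of driving experience
X = np.column_stack([speed_over_limit, night, driver_experience])
# Hypothetical label: 1 = severe accident (speeding at night)
y = ((speed_over_limit == 1) & (night == 1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# 1) Interpretation: express the learned patterns as if/then rules with readable names
rules = export_text(tree, feature_names=["speed_over_limit", "night", "driver_experience"])
print(rules)

# 2) Evaluation: confusion matrix and per-class precision/recall on unseen records
pred = tree.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred, target_names=["not severe", "severe"]))
```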
Budget Plan and Work Breakdown
Schedule
The following figure shows the activities and schedules of the study.

Figure 1: Schedule of the project

Budget
The budget was allocated according to the plan, considering the scope of the project from start to completion. It covers all expenses from the start of the proposal to the completion of the project work. The estimated cost of the research is 25,000 ETB (Table 2), which is needed to obtain adequate and appropriate data and information.
Table 2: Budget Plan

Resource          Quantity   Expected Price   Total Price
Pen               10         50 birr          500 birr
Printing          -          6,000 birr       6,000 birr
Paper             2 packs    250 birr         500 birr
Flash disk        2          250 birr         500 birr
Hard disk         1          3,500 birr       3,500 birr
Transportation    -          2,500 birr       2,500 birr
Mobile card       -          500 birr         500 birr
Data collection   -          5,000 birr       5,000 birr
Unexpected cost   -          6,000 birr       6,000 birr

Total cost                                    25,000.00 birr
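As a quick consistency check on Table 2, the quantities and prices listed there can be tallied. Where no quantity is given, the expected price is treated as the line total.

```python
# Tally the Table 2 line items (quantity x expected price, or the lump sum where no quantity is given)
budget = {
    "Pen": 10 * 50,
    "Printing": 6000,
    "Paper": 2 * 250,
    "Flash disk": 2 * 250,
    "Hard disk": 1 * 3500,
    "Transportation": 2500,
    "Mobile card": 500,
    "Data collection": 5000,
    "Unexpected cost": 6000,
}
total = sum(budget.values())
print(f"Total cost: {total:,} birr")  # Total cost: 25,000 birr
```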

Reference
[1] "Ethiopia introduces pedestrian penalties to cut road traffic accident." [Online]. Available: https://newbusinessethiopia.com/health/ethiopia-introduces-pedestrian-penalties-to-cut-road-trafficaccident/. [Accessed: 16-Dec-2019].
[2] A. Greasley, "A redesign of a road traffic accident reporting system using business process simulation," Business Process Management Journal, vol. 10, no. 6, pp. 635–644, 2004.
[3] F. Samson, "Analysis of Traffic Accident in Addis Ababa: Traffic Simulation," MSc thesis, Department of Mechanical Engineering, Faculty of Technology, Addis Ababa University, 2006.
[4] J. Frankenfield, "Machine Learning," Investopedia, 18-Nov-2019. [Online]. Available: https://www.investopedia.com/terms/m/machine-learning.asp. [Accessed: 20-Dec-2019].
[5] R. A. Bolla, "Crime pattern detection using online social media," Thesis, 2014.
[6] M. R. Keyvanpour, "Detecting and investigating crime by means of data mining: a general crime matching framework," vol. 3, 2011.
[7] "Rule Mining and Classification of Road Traffic Accidents Using Adaptive Regression Trees."
[8] Z. Regassa, "Determining the degree of driver's responsibility for car accident: the case of Addis Ababa traffic office," Master's thesis, Addis Ababa University, 2009.
[9] T. Beshah and S. Hill, "Mining Road Traffic Accident Data to Improve Safety: Role of Road-Related Factors on Accident Severity in Ethiopia," in AAAI Spring Symposium: Artificial Intelligence for Development, 2010.
[10] T. Ketha, "Analysis of Road Accidents to Identify Major Causes and Influencing Factors of Accidents - A Machine Learning Approach," International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 6, pp. 3492–3497, 2019.
[11] Q. A. Al-Radaideh and E. J. Daoud, "Data Mining Methods for Traffic Accident Severity Prediction," International Journal of Neural Networks and Advanced Applications, vol. 5, 2018.
[12] B. Kumeda, F. Zhang, F. Zhou, S. Hussain, A. Almasri, and M. Assefa, "Classification of Road Traffic Accident Data Using Machine Learning Algorithms," in 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN), 2019.
[13] J. Singh, G. Singh, P. Singh, and M. Kaur, "Evaluation and Classification of Road Accidents Using Machine Learning Techniques," in Emerging Research in Computing, Information, Communication and Applications, Advances in Intelligent Systems and Computing, pp. 193–204, 2019.
[14] R. E. Almamlook, K. M. Kwayu, M. R. Alkasisbeh, and A. A. Frefer, "Comparison of Machine Learning Algorithms for Predicting Traffic Accident Severity," in 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), 2019.
[15] T. K. Bahiru, D. K. Singh, and E. A. Tessfaw, "Comparative Study on Data Mining Classification Algorithms for Predicting Road Traffic Accident Severity," in 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018.
[16] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "The KDD process for extracting useful knowledge from volumes of data," Communications of the ACM, vol. 39, no. 11, pp. 27–34, 1996.
