Abstract
In Ghana, traffic safety has always been a major concern for sustainable transportation development, and
predicting the severity of traffic accidents remains an important and challenging issue. Predicting crash
injury severity is an important part of reducing the consequences of traffic crashes. This study developed
machine learning (ML) models, including Random Forest (RF), Logistic Regression (LR) and Artificial Neural
Network (ANN), to predict crash injury severity using several crash-related parameters. The input parameters
mainly include vehicle attributes, road condition and light condition attributes. This study employed the
crash database of Ghana for the years 1998–2011. The performance of the various algorithms was measured and
compared based on accident severity prediction accuracy, precision, recall, F1-scores, Receiver Operating
Characteristics (ROC) scores, and the confusion matrix, while the relevance of the feature attributes was
determined using a feature selection technique. The RF and ANN classifiers performed beyond the acceptable
threshold of 70% for precision, recall, F1-score and accuracy. The RF classifier achieved an accuracy of
87.97%. The ANN classifier was the second-best performer, followed by the LR classifier, with overall
accuracies of 70.80% and 48.68%, respectively. The study has demonstrated the potential of machine learning
(ML) as a reliable accident forecasting technique, based on predicted performance and accuracy. The findings
of this study are expected to be useful in the establishment or improvement of an effective traffic safety
system within a sustainable transportation system, which is critical in assisting government managers in
developing timely, proactive traffic accident prevention strategies and effectively improving road traffic
safety.
Keywords: machine learning; logistic regression model; random forest model; artificial neural network; accident severity; feature importance.
Background

Road traffic accidents have been one of the leading causes of injuries and deaths. More than 1.2
million people die each year on the world's roads, according to the World Health Organization (2015).

In Ghana, studies have shown that road traffic crashes, most of which occur in urban areas, are one
of the leading causes of death and injury. Over the last decade, 72 people per 100,000 have
experienced a serious bodily injury, 2,080 people have died in traffic accidents, and over 8% of the
population has died as a result of traffic accidents Blankson and Lartey (2020). According to the
Statista Research Department, there were almost 12,100 road traffic accidents in Ghana from January
to October 2020, involving over 20,400 vehicles.

Furthermore, the collisions resulted in 2,080 deaths and 12,380 injuries. According to the source,
more males than females were involved in traffic accidents in 2016. The source further indicated that
buses and minibuses were the leading vehicles involved in accidents of this nature, after cars. In
Ghana, road accidents are still a major public safety concern. Traffic crashes impose enormous costs
on people and society, including productivity loss, medical bills, legal and judicial costs,
emergency costs, insurance fees, property damage, congestion costs, and employment loss Blincoe et
al. (2015).

Road traffic accidents affect a huge number of countries each year, including Ghana. Several factors
influence road accidents in Ghana and, in turn, the severity of the incidents to be predicted.
Statistical modeling techniques have traditionally been used to forecast crashes and categorize their
severity Savolainen et al. (2011), Kidando et al. (2019). However, estimating the severity of a road
traffic accident using statistical modeling techniques is not very accurate Wahab and Jiang (2019).
For example, the assumptions related to data distribution and a linear relationship between
explanatory and dependent variables can be untrue and lead to inaccurate inferences. An innovative
approach (machine learning and deep learning) based on supervised learning is therefore proposed to
improve the performance of accident severity prediction and to overcome such limitations.

The National Road Safety Commission (NRSC) and the Motor Traffic and Transport Unit (MTTU) have taken
many measures and have made significant commitments to improve travel safety. However, a traffic
accident may occur at any time and in any
location. Drivers, on the other hand, might be given important information to help them prevent or
lessen their chances of being involved in an accident. For preventing and reducing the incidence of
traffic accidents, forecasting and identifying associated components under varied situations are
critical. As a result, traffic accident prediction models have been developed to disclose the
important factors that influence traffic accidents so that traffic safety can be enhanced.

The rest of this paper is organized as follows: Chapter 2 introduces some previous studies which are
closely related to traffic accident severity prediction. Chapter 3 identifies the key crash types and
investigates the impacts of risk factors on different types of crashes using the data resources
obtained from the Motor Traffic and Transport Unit. Also, the proposed model is introduced for
predicting the crash severity. Chapter 4 presents results and discussion on the proposed model and
compares it with some other models. Chapter 5 outlines the main conclusions and explains the
limitations and recommendations for future study.

Literature Reviews

Road safety managers and researchers have looked at a variety of strategies and data to improve road
traffic safety. Effective road safety management demands a thorough understanding of the factors
that contribute to road traffic accidents. Over the years, experts have worked hard to uncover some
of the elements that influence the severity and frequency of accidents. This chapter covers a summary
of the adopted models, such as statistical models, machine learning models, and the progress of deep
learning, as well as accident severity prediction.

Statistical Model in Accident Severity Prediction

According to previous studies, the most prevalent methods of linear and nonlinear regression analysis
utilized for traffic severity prediction include linear regression modeling, logistic regression
modeling, and negative binomial regression. A statistical model is a mathematical model that
encapsulates a set of statistical assumptions about the generation of sample data, with the modeling
processes largely depicted in idealized form Cox (2006). These methods are limited in their
applicability because some are not always viable, such as when the outcome is discrete, and they
also demand strong assumptions about data distribution James and Kim (1996).

Models such as simple multiple linear regression and negative binomial regression were some of the
early models of traffic accident prediction, which were based on the approach that assumes a normal
distribution of errors. The general form of the linear accident prediction model can be expressed as
follows:

\[ y \mid \theta \sim \mathrm{Dist}(\theta) \quad \text{with} \quad \theta = f(X, \beta, \epsilon) \]

where,
y: the response variable (i.e. accident frequency).
θ: the accident dataset.
Dist(θ): the model distribution.
X: the vector of different explanatory variables.
β: the vector of regression coefficients.
ϵ: the error (disturbance) term.

Machine Learning Model in Accident Severity Prediction

Machine learning models have been employed for traffic accident severity prediction and have proved
to have some advantages over statistical models.

ML can model the non-linear relationship that exists between the target variable and related
explanatory variables Assi et al. (2020). Also, statistical models make assumptions about the
relationship between the explanatory variables and the target variable, and the model fails if those
assumptions are violated, whereas machine learning methods do not depend on such inherent
assumptions. Machine learning models can help model the complexity between the explanatory variables
and are able to capture non-linear relationships, which can be difficult to achieve using statistical
models Chang (2005). The studies of Lord and Mannering (2010) and Mannering and Bhat (2014) indicate
that the growth of transportation research could be greatly accelerated by new data resources
provided by the rise of current technologies.

Logistic Regression (LR), Random Forest (RF), K Nearest Neighbor (KNN), Support Vector Machine (SVM),
and Decision Tree models are known to be the most widely employed models in studies conducted to
uncover the advantage of machine learning (ML) models over statistical models in predicting crash
severity Delen et al. (2017).

Neural Network Model in Accident Severity Prediction

Neural networks have been utilized in the past as computer-based models for knowledge processing and
prediction in a variety of domains Mussone et al. (1999).

Neural networks have been successfully used to learn and memorize feature datasets, analyze data, and
draw comparisons between new and old data Ertugrul and Hizal (2005), and to learn the dynamics of
non-linear systems without any form of mathematical modeling Singh and Deo (2007). Many studies have
employed neural network models to predict traffic accidents and severity in transportation research
Alkheder et al. (2017), and these models have demonstrated high accuracy in predicting accident
severity as compared to statistical models Abdel-Aty and Abdelwahab (2004).

A comprehensive study conducted by Chang (2005) compared a 3-layer Artificial Neural Network (ANN)
model with the Negative Binomial model for the prediction of crash frequency. The results indicated
that the ANN model performed better than the Negative Binomial model. Research conducted by Xie et
al. (2007) compared the Back Propagation Neural Network (BPNN), the Bayesian Neural Network (BNN) and
the Negative Binomial model for predicting the frequency of traffic accidents on rural roads. The
study indicated that both of the neural network models had better prediction performance than the
Negative Binomial model. Convolutional Neural Networks (CNN) can examine spatial information and have
been extensively employed in image classification problems Wenqi et al. (2017). The study of Ren et
al. (2018) used a traffic accident dataset and a GPS dataset to construct a deep learning model in
order to better understand the relationship between human movement and traffic accidents. Given
real-time GPS data, the model could assess the likelihood of accidents and their position on a map.

While the multiple studies and the significant contribution of the surveyed models to road safety
should not be overlooked, it is very important to undertake different types of accident prediction so
that specific countermeasures can be made.
Borkor et al. 3
Figure 2 Confusion Matrix of Random Forest Algorithm
Figure 4 Confusion Matrix of Artificial Neural Network Algorithm
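
The study reports accuracy, precision, recall and F1-scores alongside confusion matrices. As a reading aid, the sketch below shows one common way these scores can be derived from a confusion matrix. It is a minimal illustration, not the study's code: the function name `metrics_from_confusion`, the row/column convention (rows are true classes, columns are predicted classes) and the example counts are all assumptions made here.

```python
def metrics_from_confusion(confusion):
    """Return (accuracy, per-class metrics) for a square confusion matrix.

    Assumed convention (not taken from the paper): confusion[i][j] is the
    number of samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    accuracy = correct / total if total else 0.0

    per_class = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Hypothetical two-class counts, for illustration only (not results from the study).
example = [[50, 10],
           [5, 35]]
accuracy, per_class = metrics_from_confusion(example)
print(f"accuracy = {accuracy:.4f}")
for label, scores in enumerate(per_class):
    print(label, {name: round(value, 4) for name, value in scores.items()})
```

Per-class scores can be averaged (for example, an unweighted macro average) when a single figure per model is needed; which averaging the study used is not stated in this section.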