
PROGNOSTICATING ROAD ACCIDENT BY USING

MACHINE LEARNING
A Project Report
Submitted in partial fulfillment of requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by

M. Priyanka (19761A12A0)

Under the Esteemed Guidance of

Mr. M. Rajesh Reddy


Assistant Professor
Dept of IT, LBRCE

DEPARTMENT OF INFORMATION TECHNOLOGY
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING
(Autonomous)
L.B.REDDY NAGAR, MYLAVARAM, NTR (District)-521230

(Affiliated to JNTUK Kakinada & Approved by AICTE, New Delhi, NAAC Accredited
with “A” grade, Accredited by NBA under Tier-I (CSE, IT, EEE, ECE, ME) & Certified by
ISO 9001:2015)
2022-2023
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING

(AUTONOMOUS)

(Affiliated to JNTUK Kakinada & Approved by AICTE, New Delhi, NAAC Accredited
with “A” grade, Accredited by NBA under Tier-I (CSE, IT, EEE, ECE, ME) & Certified by ISO
9001:2015)

L.B.REDDY NAGAR, MYLAVARAM, NTR Dist.

DEPARTMENT OF INFORMATION TECHNOLOGY

CERTIFICATE

This is to certify that M. PRIYANKA (19761A12A0) of IV B.Tech
(Information Technology) has successfully completed the project titled
“PROGNOSTICATING ROAD ACCIDENT BY USING MACHINE LEARNING” at
Lakireddy Bali Reddy College of Engineering during the Academic Year 2022-2023. This
project report is submitted in partial fulfillment of the requirements for the award of the
degree of B.Tech (Information Technology).

Project Guide: Mr. Rajesh Reddy, Asst. Professor, Department of IT
Head of Department: Dr. B. Srinivasa Rao, Professor, Department of IT

Internal Examiner                    External Examiner


ACKNOWLEDGEMENT

Behind every achievement lies an unfathomable sea of gratitude to those who made it
possible, without whom it would never have come into existence. To them I offer these
words of gratitude.

I express my special gratitude to Dr. K. Appa Rao, Principal of Lakireddy Bali Reddy
College of Engineering, who made this endeavor possible. I also thank all the teaching
and non-teaching staff members of the Information Technology department, who extended
their full cooperation.

I am highly obliged to Dr. B. Srinivasa Rao, HOD of Information Technology, for his
support and encouragement.

It is with the utmost pleasure that I take this opportunity to express heartfelt gratitude to
my guide Mr. M. Rajesh Reddy, Assistant Professor of Information Technology, for
his masterly supervision through all the phases of the main project work, with valuable
suggestions and support. His friendly and informal talks helped me to work under excellent
working conditions.

I am highly thankful to our project coordinator Dr. A. V.N Reddy, Associate Professor of
Information Technology, for providing the knowledge to deal with the project at every phase in
a systematic manner.

Finally, a word of gratitude to my PARENTS, who have been a constant source of
encouragement and love.

PROJECT ASSOCIATE
M. Priyanka (19761A12A0)
DECLARATION
I hereby declare that the project report entitled “PROGNOSTICATING ROAD ACCIDENT BY
USING MACHINE LEARNING”, submitted to JNTUK in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology (B. Tech), is an original work carried out by me. The
matter embodied in this project is genuine work by the student and has not been submitted earlier to this
university or any other university for the award of any degree, diploma, or prize.

PROJECT ASSOCIATE
M. Priyanka (19761A12A0)
INDEX
TABLE OF CONTENTS PAGE NO
LIST OF FIGURES I
LIST OF TABLES II
ACRONYMS AND ABBREVIATIONS III
ABSTRACT 1
1. INTRODUCTION 3
2. LITERATURE SURVEY 5-6
3. SYSTEM ANALYSIS 8-9
3.1 Problem statement 8
3.2 Objective 8
3.3 Existing System 8
3.4 Proposed System 9
4. SYSTEM SPECIFICATION 11-12
4.1 Hardware Requirements 11
4.2 Software Requirements 11
4.2.1 Python 11
4.2.2 OpenCV 11
4.2.3 NumPy 11
4.2.4 Google Colab 11
4.2.5 TensorFlow 11-12
4.2.6 Keras 12
4.2.7 Matplotlib 12
4.2.8 Pandas 12
4.3 Functional Requirements 12
4.4 Non-Functional Requirements 12
4.4.1 Performance Requirements 12
4.4.2 Security Requirements 12
5. DESIGN VIEW
5.1 UML Diagrams
5.1.1 Use Case Diagram 14-15
5.1.2 Class Diagram 16
5.1.3 Sequence Diagram 17-18
5.1.4 Activity Diagram 19-20
6. IMPLEMENTATION
6.1 Dataset Information 22
6.2 Methodology
6.2.1 CNN 22-23
6.2.2 Decision tree 23
6.3 Evaluation metrics 24
7. SOURCE CODE 26-36
8. RESULTS AND DISCUSSION 38-39
9. CONCLUSION 41
10. FUTURE SCOPE 43
REFERENCES 44-46
LIST OF FIGURES

Figure 1: Use case diagram 15
Figure 2: Class diagram 16
Figure 3: Sequence diagram 18
Figure 4: Activity diagram 20
Figure 5: Architecture of CNN 23
Figure 6: Decision tree 23
Figure 7: Single lane data model accuracy 38
Figure 8: Double lane data model accuracy 39
Figure 9: Three lane data model accuracy 39
LIST OF TABLES

Table 1: The initial phase of intersection grouping based on prior probability 38
ACRONYMS AND ABBREVIATIONS

DL Deep Learning
ML Machine Learning
CNN Convolutional Neural Networks
SVM Support Vector Machine
DT Decision Tree
UML Unified Modeling Language
DFD Data Flow Diagram
CV Computer Vision
FC Fully Connected
ABSTRACT
Rural mayday systems can shorten the "accident notification time", the interval
between an accident and the alerting of emergency personnel. Cutting this time may
reduce the number of fatalities, and the quantifiable association between deaths and the
accident notification time is estimated using statistical analysis. It is unacceptable and
saddening for a nation to allow its residents to die in traffic accidents, so a thorough study
is needed to manage this dire scenario. In this project, a deeper analysis of traffic accidents
is conducted in order to quantify the severity of accidents in our nation using machine
learning techniques. We also identify the crucial factors that distinctly influence traffic
accidents and offer some recommendations in relation to this problem. The analysis is
carried out using grid search, random search, or Bayesian optimization, all applied to an
Indian dataset. The approach classifies the severity of accidents into High, Medium, and
Low injury of the victims.

CHAPTER 1
1. INTRODUCTION
One of the most important topics in the field of traffic research is traffic speed
prediction. Successful traffic speed prediction benefits both road users and traffic control
organisations. In essence, speed estimation belongs to the family of
traffic information estimation. Thanks to the data made available by Intelligent
Transportation Systems (ITS), traffic scientists have created a range of traffic information
prediction approaches, including data-driven statistical and machine learning models. How
to select the best prediction approach is a key problem in this area.
Injuries and fatalities from road accidents occur on a massive scale every year, with
severe social and psychological effects. Motor vehicle accidents are the biggest cause of
death for teenagers and young people worldwide. Men account for more than seventy
percent of all road accident fatalities in developing nations. Serious
traffic safety difficulties have plagued the Kingdom of Saudi Arabia (KSA), especially since
the oil boom of the early 1970s. The KSA records more annual road accidents than any
other G20 nation, with more than 300,000. Road traffic accidents (RTAs) cost the KSA
SAR 13 billion yearly, and 30% of hospital capacity is affected.
Qassim is the least populated of Saudi Arabia's top five provinces in terms of RTA
frequency; its Traffic Police Bureau recorded more than 18,000 accidents in 2010.
These accidents affected around 23,000 persons, resulting in 2000 injuries and nearly 370
fatalities. Few studies exploring crash severity in various KSA cities are currently
available in the literature. This study intends to close this gap by creating a model for the
severity of traffic accidents in Qassim Province and identifying the factors that influence
severity.
Traffic accidents can occur anywhere and at any time, and there are a large number of
vehicles on the road every day. Fatal accidents claim many lives, and every human
being wants to be safe and avoid accidents. Data mining techniques can be applied to a
dataset of traffic accidents to extract useful information and provide driving
recommendations. We analyse the roadway traffic data using the Apriori algorithm, a
traditional association rule mining technique whose primary aim is to identify frequent
itemsets.

Chapter 2
2. LITERATURE SURVEY
Chen et al. (2019) proposed a research framework based on key feature selection from
a vehicle trajectory dataset and risk prediction of lane-changing (LC) behavior
on expressways [14]. The study applied fault tree analysis and K-means clustering,
based on the Crash Potential Index (CPI), to determine the risk level of vehicle lane changes.
The key feature selection showed that the interaction between the vehicle that
changes lanes and the surrounding vehicles, along with changes in acceleration
among the surrounding vehicles in the target lane, is a significant factor in the risk
assessment of lane-change behavior.

Wang et al. (2019) established a collision tendency prediction model based on the
characteristics of vehicle groups collected by the floating car method on a highway in
Shanghai [15]. Binary logistic regression and support vector machine (SVM) models
were used to build prediction models; the SVM reached an accuracy of 85%, as opposed
to 60% for the binary logistic regression model.
Wang et al. (2019) studied real-time collision prediction [16], considering not only
traffic and environmental factors but also the impact of socio-demographic
data and trip generation parameters on immediate collision risk. They used
traffic, geometric, socio-demographic and trip generation predictors to analyze the
immediate collision risk of highway ramps. Two Bayesian logistic regression models were
used to identify collision precursors and their impact on ramp collision risk. At the same
time, four support vector machine (SVM) models were used to predict collision occurrence.
The results showed that the inclusion of sociodemographic and trip generation predictors
improved prediction performance.

Zheng & Sayed (2020) proposed a generalized extreme value (GEV) model based on
the Bayesian hierarchical structure to predict collision risk based on three indicators: traffic
volume, shock wave area and platoon ratio [17]. The proposed method was applied to four
signalized intersections in Surrey, British Columbia.

Cai et al. (2020) sought to address extremely unbalanced traffic data in collision and
non-collision cases, using a Deep Convolutional Generative Adversarial Network (DCGAN)
[18], balancing the dataset through the synthetic minority oversampling technique (SMOTE)
and random undersampling technique. The experimental design used logistic regression,
support vector machine (SVM), artificial neural networks (ANN) and convolutional neural
networks (CNN) to develop twelve models for performance evaluation. The results showed
that the DCGAN provides the best prediction accuracy.

Yu et al. (2021) presented a novel Deep Spatio-Temporal Graph Convolutional
Network (DSTGCN) to construct a traffic accident model [19]. Experimental results showed
the proposed DSTGCN model outperformed the LR, LASSO, SVM, and DT methods.
Meanwhile, Fang et al. (2021) proposed a semantic context induced attentive fusion network
(SCAFNet) that first segments the RGB video frames into many individual images, which
can then be used as the inputs for the graph convolution network (GCN) based on different
semantic regions. Experimental results showed the proposed model is useful for prediction
of driver attention.

Chapter 3
3. SYSTEM ANALYSIS

3.1 PROBLEM STATEMENT
The problem addressed in this project is the need for an effective and accurate
method to predict high accident risk locations at intersections, which can assist traffic
management departments in identifying priority intersections for improvement and reducing
the likelihood of traffic accidents. Traditional methods for predicting traffic accidents have
limitations in terms of accuracy and effectiveness, making it difficult for traffic management
departments to proactively identify and address accident-prone areas. Therefore, there is a
need for a high accident risk prediction model based on deep learning techniques that can
accurately analyze traffic accident data and identify the key factors that contribute to the
occurrence of accidents at intersections.

3.2 OBJECTIVE
The objective of prognosticating road accidents using machine learning is to develop a
computer algorithm that can accurately analyze accidents that occur at high-risk
intersections. The aim is to improve the accuracy and efficiency of detecting high-risk
intersections, which can help to reduce the number of accidents, and to recommend safety
measures for road users.

3.3 EXISTING SYSTEM
The use of data mining techniques on traffic accident data could potentially lower the
death rate. A road safety database makes it possible to lower the fatality rate by supporting
road safety programmes at the municipal and federal levels. Existing work uses
classification techniques to forecast the seriousness of injuries sustained in road accidents.
Apriori and Predictive Apriori association rule algorithms were applied to a dataset on road
accidents obtained from the Government Traffic Office in order to study the relationship
between recorded incidents and the factors that affect accident severity.
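
The two measures that association rule mining relies on, support and confidence, can be illustrated with a short sketch. The accident records and attribute names below are hypothetical and are not taken from the Government Traffic Office dataset; the functions show only how a single rule is scored, not a full Apriori implementation.

```python
# Hypothetical accident records: each record is a set of attribute=value items.
records = [
    {"weather=rain", "light=night", "severity=high"},
    {"weather=rain", "light=day", "severity=low"},
    {"weather=clear", "light=night", "severity=high"},
    {"weather=rain", "light=night", "severity=high"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    joint = support(set(antecedent) | set(consequent), records)
    return joint / support(antecedent, records)


# Score the rule {rain, night} -> {high severity}.
rule_support = support({"weather=rain", "light=night", "severity=high"}, records)
rule_confidence = confidence({"weather=rain", "light=night"}, {"severity=high"}, records)
print(rule_support, rule_confidence)
```

Apriori keeps only itemsets whose support exceeds a chosen threshold and then reports the rules whose confidence is high enough; both thresholds are tuning choices.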

DISADVANTAGES
• The model does not identify accidents that occur at high-risk intersections.
• It requires a large amount of data.

3.4 PROPOSED SYSTEM
Owing to growth in the number of cars, road design and management have become
complex. The issue of road accidents has been identified, and studies on a
remedy have been done. This has improved public health and the nation's economy. Due to
technical advancements and more affordable data retention, big data integration has
increased. The demand for retrieving information from data at this scale has made data
mining a cornerstone technique. This work aims to identify the most
suitable machine learning classification approach for data mining-based road accident
quantification.

ADVANTAGES
• High accuracy: Deep learning algorithms can achieve high accuracy levels in
detecting accidents.
• Cost-effective: Automated analysis with deep learning can be more cost-
effective than manual analysis, which may require highly trained professionals
and specialized equipment.
• Scalability: Deep learning algorithms can analyze large amounts of accident
data in a short amount of time, making them highly scalable for large
programs. Overall, using deep learning for accident prediction can improve
accuracy and efficiency.

Chapter 4
4. SYSTEM SPECIFICATIONS
4.1 HARDWARE REQUIREMENTS
Processor : Intel i3 or above
Hard Disk : 120GB
RAM : 4GB or higher

4.2 SOFTWARE REQUIREMENTS


4.2.1 Python
Python is a popular general-purpose programming language that supports a variety
of programming paradigms. It can be used for web applications, general software
development, and communication with database systems. It was designed to be highly
extensible through modules, and its syntax is simple and uncluttered.
4.2.2 OpenCV
OpenCV is an open-source computer vision library, available in Python through its
bindings, which is used to solve computer vision problems. It is used to analyze the content
of a digital image and extract a description from it, which may be a text description, an
object, etc. With the OpenCV library we can perform many tasks such as face detection,
image filtering, face recognition, and template matching.
4.2.3 NumPy
NumPy is a Python library used in data science to analyze data and derive
information from it. It is used to work with arrays and includes matrix, Fourier transform,
and linear algebra functions.
4.2.4 Google Colab
Google Colab is a Python development environment in which notebooks stored in
Google Drive can be run. Colab is a free notebook service that runs in the browser on
Google Cloud. Because it runs on a server, it can be used to interact with online databases,
and the code can be kept private. Multiple editors can work collaboratively on the same
cloud file. Once Colab is connected to Google Drive, the environment opens for writing
and running code.

4.2.5 TensorFlow
TensorFlow is an open-source software library developed by Google Brain for
machine learning and artificial intelligence applications. It provides a platform for building
and training neural networks, including deep learning models, for a wide range of tasks such
as image and speech recognition, natural language processing, and more. TensorFlow
supports both CPU and GPU computations, making it highly scalable and efficient for large-
scale data processing.
4.2.6 Keras
Keras is an open-source software library that provides a Python interface for artificial
neural networks and serves as an interface to the TensorFlow library. Keras includes
implementations of commonly used neural-network building blocks, such as layers,
objectives, activation functions, and optimizers, along with various tools that make working
with image and text data easier and simplify the code needed for deep neural networks.
4.2.7 Matplotlib
Matplotlib is a comprehensive Python library for creating static, animated, and
interactive visualizations. It simplifies both simple and complex plotting tasks and
produces consistent visuals across plots.
4.2.8 Pandas
Pandas is an open-source Python package that is widely used in data science, data
analysis, and machine learning. It is built on top of another package called
NumPy, which provides support for multi-dimensional arrays.

4.3 FUNCTIONAL REQUIREMENTS


For the proper functioning of the system, we need to have all libraries installed.

4.4 NON-FUNCTIONAL REQUIREMENTS


4.4.1 Performance Requirements
• The device should respond immediately.
4.4.2 Security Requirements
• Login to Google Colab using Google Drive should be valid and secure.

CHAPTER 5
5. DESIGN VIEW
5.1 UML DIAGRAMS
5.1.1 USE CASE DIAGRAM
A use case diagram serves as a representation of the dynamic behavior of a system. It
comprises actors, use cases, and their interactions, which collectively encapsulate the
system's functionality. These actors can be human users, internal or external applications.
The main objective of a use case diagram is to demonstrate the system's dynamic nature
while taking into consideration its requirements, including internal and external factors.

Some of the notations and symbols used in a use case diagram are as follows:

System (labelled with the system name)

Use case

Actor

Relationships (<<include>>, <<extend>>)
The proposed system's use case diagram, depicted in Figure 1, displays the actors
involved, namely the User and the System. Each actor has a set of distinct actions to execute
within the system.

Fig. 1: Use case diagram
5.1.2 CLASS DIAGRAM:
A class diagram is a specific kind of static structural diagram that illustrates a
system's classes, properties, operations, and relationships between the objects to describe the
structure of the system. It displays the static structure of the system’s classifiers. Class diagrams are
a tool that business analysts can use to model systems from a business perspective. Some of
the notations and symbols used in class diagram are as follows:

Class

Association

Dependency

The class diagram of the proposed system is shown in Fig. 2 below. The classes
that are used in the system are Fruit, Fresh fruit, and Rotten fruit.

Fig 2: Class Diagram

5.1.3 SEQUENCE DIAGRAM:
Sequence diagrams are interaction diagrams that detail how operations are carried
out. They capture the interaction between objects in the context of a collaboration. In a
sequence diagram, each instance is represented by a lifeline. It is also called an event
diagram. Some of the notations and symbols used in a sequence diagram are as follows:

Lifeline

Message

Self-message

The sequence diagram of the proposed system is shown below:

Fig. 3: Sequence diagram
5.1.4 ACTIVITY DIAGRAM:

An activity diagram is a type of behavioral diagram used in UML to depict the
control flow of a system from a starting point to an ending point. It bears a close
resemblance to a flowchart, with a focus on the conditions and sequence of flow within the
system. In an activity diagram, an "activity" represents a specific function or action
performed by the system. Some of the notations and symbols used in an activity diagram
are as follows:

Start

Finish

Activity

Decision node

The activity diagram for the proposed model is shown below:

Fig. 4: Activity diagram

Chapter 6
6. IMPLEMENTATION
Implementation is the phase in which the theoretical design is transformed into a
working system. It is the most important phase for realizing a successful new system that
classifies new data with high accuracy. A system can be put into use only
after it has been tested and verified to be working to specification. This involves examining
existing systems, careful planning, and designing methods to achieve and assess the
changeover. The two main tasks are collecting
relevant data and training and scoring a model on that data.
6.1 DATA SET INFORMATION
• Source of the dataset — Click Here
• Kaggle dataset link — Click Here
6.2 METHODOLOGY
6.2.1 CNN
The use of Convolutional Neural Networks (CNNs) has been prevalent in accident
prediction, specifically in transportation engineering. CNNs can be employed to scrutinize
various aspects that may lead to accidents, such as traffic images and videos, road
conditions, and weather patterns. A technique for utilizing CNNs in accident prediction is to
extract features from traffic images and videos. By training the CNNs to identify patterns in
the images that indicate potential hazards, such as a pedestrian crossing the street or a
vehicle changing lanes suddenly, these patterns can be utilized to predict the possibility of an
accident happening at a particular time of day or in a specific location. In a study, a CNN
predicted traffic accident risk based on various factors, including weather conditions, road
surface conditions, and traffic volume, achieving a 92% accuracy rate.
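
The feature-extraction step that a convolutional layer performs can be sketched in plain NumPy. This is an illustrative example, not the network used in this project: the 4x4 "image" and the hand-written edge kernel are assumptions made for the sketch, whereas a trained CNN learns its kernel weights from data.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in most CNN libraries, this is a cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 4x4 image with a vertical edge between columns 1 and 2 (hypothetical data).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)

# ReLU keeps only the positive responses, as a convolutional layer typically would.
feature_map = np.maximum(conv2d_valid(image, kernel), 0)
print(feature_map)
```

The feature map is largest where the kernel's pattern appears in the image, which is how stacked convolutional layers come to respond to progressively more complex visual patterns.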

Fig. 5: Architecture of CNN

6.2.2 DECISION TREE


A decision tree is a popular machine learning algorithm that is used for both
classification and regression tasks. It is a graphical model that represents all possible
decisions and outcomes based on a set of conditions or attributes. The decision tree is
constructed recursively by splitting the nodes based on the attribute that maximally reduces
the impurity or uncertainty of the dataset.
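
The splitting criterion can be made concrete with a small sketch that uses Gini impurity, one common impurity measure (entropy is another). The rows, attribute names, and labels below are hypothetical and serve only to show how the attribute with the largest impurity reduction is chosen.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(rows, labels, attribute):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical training rows and severity labels.
rows = [
    {"lanes": "single", "weather": "rain"},
    {"lanes": "single", "weather": "clear"},
    {"lanes": "multi", "weather": "rain"},
    {"lanes": "multi", "weather": "clear"},
]
labels = ["high", "high", "low", "low"]

# The root split is the attribute that reduces impurity the most.
best = max(["lanes", "weather"], key=lambda a: impurity_reduction(rows, labels, a))
print(best)
```

In this toy data, splitting on "lanes" separates the classes perfectly while "weather" does not help at all, so the tree would split on "lanes" first and then recurse on each child node.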

Fig. 6: Decision tree

6.3 EVALUATION METRICS
The performance of the proposed architecture is evaluated using several statistical
measures.
These measures are defined in terms of the following quantities:
• True Positive (TP): the number of positive cases that are correctly predicted
as positive.
• False Positive (FP): the number of negative cases that are incorrectly predicted
as positive.
• True Negative (TN): the number of negative cases that are correctly predicted
as negative.
• False Negative (FN): the number of positive cases that are incorrectly predicted
as negative.

Accuracy: Accuracy is the proportion of correctly classified instances among all
instances. The measure of accuracy is calculated as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Recall: Recall is the proportion of actual positive cases that the algorithm
recognizes correctly. The measure of recall is calculated as
$$\text{Recall} = \frac{TP}{TP + FN}$$

Precision: Precision is the ratio of correctly predicted positive cases to all cases
predicted as positive. The measure of precision is calculated as
$$\text{Precision} = \frac{TP}{TP + FP}$$

Specificity: Specificity is the ratio of correctly predicted negatives to all actual negative
observations. The measure of specificity is calculated as
$$\text{Specificity} = \frac{TN}{TN + FP}$$
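
The four formulas above can be checked with a short sketch that computes them from confusion-matrix counts. The counts passed in the example are made up for illustration, and returning 0.0 when a denominator is zero is a convention assumed here rather than something specified in this report.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, precision, and specificity from confusion counts."""

    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when the denominator is zero.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts for a binary classifier evaluated on 20 cases.
metrics = classification_metrics(tp=8, tn=6, fp=2, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a three-class problem such as the High, Medium, and Low severity labels, these counts would be taken per class in a one-versus-rest fashion and then averaged.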

Chapter 7
7. SOURCE CODE
# Mount Google Drive so that the dataset stored there can be read from Colab.
from google.colab import drive
drive.mount('/content/drive')

import numpy as np
import pandas as pd
from numpy import interp  # scipy.interp was a deprecated alias of numpy.interp
from scipy import stats

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(font_scale=1)

from itertools import cycle
from functools import reduce

from sklearn import neighbors, preprocessing, tree
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.feature_selection import f_classif, SelectKBest, RFE, RFECV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     cross_validate, GridSearchCV)
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold
from sklearn.svm import LinearSVC, SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import (make_scorer, balanced_accuracy_score, recall_score,
                             roc_auc_score, roc_curve, auc)
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

import tensorflow as tf
import keras
from tensorflow.keras.utils import to_categorical
from keras.layers import (Embedding, Dense, LSTM, Bidirectional,
                          GlobalMaxPooling1D, Input, Dropout)
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer

from imblearn.metrics import specificity_score

# Load the lane-wise accident table and treat missing entries as zero.
pid = pd.read_csv('/content/drive/MyDrive/INDIAN_ROAD/Databases/laneAccidents.csv')
pid.fillna(value=0, inplace=True)
pid.head()

# Drop the identifier columns and keep the first 35 rows.
data_sno = pid.drop(['S.No', 'State/UT'], axis=1)
main_data = data_sno[0:35][:]
# main_data.head()

print(main_data.shape)

# Single-lane columns: each absolute count is followed by its rate per 1L people.
single_lane_data = main_data[[
    'Single Lane - Accident - 2014', 'Single Lane - Accident - 2014 per 1L people',
    'Single Lane - Killed - 2014', 'Single Lane - Killed - 2014 per 1L people',
    'Single Lane - Injured - 2014', 'Single Lane - Injured - 2014 per 1L people']]

sld = single_lane_data.values
sld_ = np.zeros([sld.shape[0], sld.shape[1]])  # column-wise normalised values
sld_x = np.zeros([sld.shape[0], 3])            # features: accident, killed, injured
sld_y = np.zeros([sld.shape[0], 3])            # per-feature band: -1, 0 or 1
lab_y = []                                     # final class per row: 0, 1 or 2
count = 0

# Scale every column by its maximum value.
for i in range(0, sld.shape[0]):
    for j in range(0, sld.shape[1]):
        sld_[i, [j]] = sld[i, [j]] / (sld[:, [j]].max())

for xi in range(0, sld_x.shape[0]):
    for xj in range(0, sld_x.shape[1]):
        # xj + count walks the column pairs (0, 1), (2, 3), (4, 5): each feature
        # is the normalised count plus the normalised per-1L rate.
        sld_x[xi, [xj]] = sld_[xi, [xj + count]] + sld_[xi, [xj + count + 1]]
        if sld_x[xi, [xj]] > 0 and sld_x[xi, [xj]] < 0.33:
            sld_y[xi, [xj]] = -1   # lowest band
        elif sld_x[xi, [xj]] > 0.33 and sld_x[xi, [xj]] < 0.68:
            sld_y[xi, [xj]] = 0    # middle band
        else:
            sld_y[xi, [xj]] = 1    # highest band (also reached for exactly 0 or 0.33)
        count = count + 1
        if xj == 2:
            count = 0

    # Count how many of the three features fall in each band for this row.
    y__1 = np.count_nonzero(sld_y[xi, :] == -1)
    y_0 = np.count_nonzero(sld_y[xi, :] == 0)
    y_1 = np.count_nonzero(sld_y[xi, :] == 1)

    # Class 0 if the lowest band is strictly most common, class 2 if the
    # highest band is, otherwise class 1.
    if y__1 > y_0 and y__1 > y_1:
        lab_y.append(0)
    elif y_1 > y_0 and y_1 > y__1:
        lab_y.append(2)
    else:
        lab_y.append(1)
# One-hot encode the three classes.
y_train = to_categorical(lab_y, num_classes=3)
print(lab_y)

# Small fully connected network: 3 inputs -> 256 ReLU units -> 3 outputs.
model = Sequential()
model.add(Input(shape=(3,)))
# model.add(Dropout(0.3))
model.add(Dense(256, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(3, activation='sigmoid'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
print(y_train)

r = model.fit(sld_x, y_train, epochs=100, batch_size=2)
yy = model.predict(sld_x)
# print(y_train)
# print(yy)

# Threshold each output at 0.5 to obtain a 0/1 indicator per class.
for xi in range(0, yy.shape[0]):
    for xj in range(0, yy.shape[1]):
        if yy[xi, [xj]] < 0.5:
            yy[xi, [xj]] = 0
        else:
            yy[xi, [xj]] = 1
print(yy)
# Two-lane columns, processed in the same way as the single-lane data.
double_lane_data = main_data[[
    'Two Lanes - Accident - 2014', 'Two Lanes - Accident - 2014 per 1L people',
    'Two Lanes - Killed - 2014', 'Two Lanes - Killed - 2014 per 1L people',
    'Two Lanes - Injured - 2014', 'Two Lanes - Injured - 2014 per 1L people']]

dld = double_lane_data.values
dld_ = np.zeros([dld.shape[0], dld.shape[1]])
dld_x = np.zeros([dld.shape[0], 3])
dld_y = np.zeros([dld.shape[0], 3])
dlab_y = []
count = 0

# Scale every column by its maximum value.
for i in range(0, dld.shape[0]):
    for j in range(0, dld.shape[1]):
        dld_[i, [j]] = dld[i, [j]] / (dld[:, [j]].max())

for xi in range(0, dld_x.shape[0]):
    for xj in range(0, dld_x.shape[1]):
        # Sum each normalised count with its normalised per-1L rate.
        dld_x[xi, [xj]] = dld_[xi, [xj + count]] + dld_[xi, [xj + count + 1]]
        if dld_x[xi, [xj]] > 0 and dld_x[xi, [xj]] < 0.33:
            dld_y[xi, [xj]] = -1
        elif dld_x[xi, [xj]] > 0.33 and dld_x[xi, [xj]] < 0.68:
            dld_y[xi, [xj]] = 0
        else:
            dld_y[xi, [xj]] = 1
        count = count + 1
        if xj == 2:
            count = 0

    # Count how many of the three features fall in each band for this row.
    y__1 = np.count_nonzero(dld_y[xi, :] == -1)
    y_0 = np.count_nonzero(dld_y[xi, :] == 0)
    y_1 = np.count_nonzero(dld_y[xi, :] == 1)

    if y__1 > y_0 and y__1 > y_1:
        dlab_y.append(0)
    elif y_1 > y_0 and y_1 > y__1:
        dlab_y.append(2)
    else:
        dlab_y.append(1)

y_train_d = to_categorical(dlab_y, num_classes=3)
print(len(dlab_y))

# The same model instance continues training on the two-lane data.
r_d = model.fit(dld_x, y_train_d, epochs=100, batch_size=2)
yy_d = model.predict(dld_x)

# Threshold each output at 0.5 to obtain a 0/1 indicator per class.
for xi in range(0, yy_d.shape[0]):
    for xj in range(0, yy_d.shape[1]):
        if yy_d[xi, [xj]] < 0.5:
            yy_d[xi, [xj]] = 0
        else:
            yy_d[xi, [xj]] = 1
print(yy_d)
# Select the six "3 lanes or more without median" columns for 2014
three_lane_data = main_data[[
    '3 Lanes or more w.o Median - Accident - 2014',
    '3 Lanes or more w.o Median - Accident - 2014 per 1L people',
    '3 Lanes or more w.o Median - Killed - 2014',
    '3 Lanes or more w.o Median - Killed - 2014 per 1L people',
    '3 Lanes or more w.o Median - Injured - 2014',
    '3 Lanes or more w.o Median - Injured - 2014 per 1L people',
]]

tld = three_lane_data.values
tld_ = np.zeros([tld.shape[0], tld.shape[1]])  # column-wise scaled values
tld_x = np.zeros([tld.shape[0], 3])            # model features
tld_y = np.zeros([tld.shape[0], 3])            # per-feature risk band (-1, 0, 1)
tlab_y = []                                    # per-row class label (0, 1, 2)
count = 0

# Scale every column by its maximum
for i in range(tld.shape[0]):
    for j in range(tld.shape[1]):
        tld_[i, j] = tld[i, j] / tld[:, j].max()

for xi in range(tld_x.shape[0]):
    for xj in range(tld_x.shape[1]):
        # Feature xj = count column + per-1L-people column (columns 2*xj and 2*xj + 1)
        tld_x[xi, xj] = tld_[xi, xj + count] + tld_[xi, xj + count + 1]
        # Values that are exactly 0 or exactly on a band edge fall into the else branch
        if tld_x[xi, xj] > 0 and tld_x[xi, xj] < 0.33:
            tld_y[xi, xj] = -1
        elif tld_x[xi, xj] > 0.33 and tld_x[xi, xj] < 0.68:
            tld_y[xi, xj] = 0
        else:
            tld_y[xi, xj] = 1
        count = count + 1
        if xj == 2:
            count = 0

    # Count how many of the three features fall in each band
    y__1 = int(np.sum(tld_y[xi, :] == -1))
    y_0 = int(np.sum(tld_y[xi, :] == 0))
    y_1 = int(np.sum(tld_y[xi, :] == 1))

    # Majority vote; a three-way tie defaults to the middle class
    if y__1 > y_0 and y__1 > y_1:
        tlab_y.append(0)
    elif y_1 > y_0 and y_1 > y__1:
        tlab_y.append(2)
    else:
        tlab_y.append(1)

y_train_t = to_categorical(tlab_y, num_classes=3)
print(len(tlab_y))

# `model` is reused here, so this fit continues from the weights left by the previous fit
r_t = model.fit(tld_x, y_train_t, epochs=100, batch_size=2)
yy_t = model.predict(tld_x)

# Binarise every predicted class score at 0.5
for xi in range(yy_t.shape[0]):
    for xj in range(yy_t.shape[1]):
        if yy_t[xi, xj] < 0.5:
            yy_t[xi, xj] = 0
        else:
            yy_t[xi, xj] = 1
print(yy_t)
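The scaling, pairing, banding and majority-vote steps above are repeated for every lane type, so they can be collected into one function. The following is a minimal vectorised sketch, not the report's own implementation: the function name `label_rows` and the demo numbers are hypothetical, and it assumes the six columns arrive in the same order as in the listing (count, then per-1L-people rate, for accidents, killed and injured) and that no column has a maximum of zero.

```python
import numpy as np

def label_rows(values):
    """Return (features, labels) for an (n_rows, 6) array of lane statistics.

    Mirrors the loop-based listing: scale each column by its maximum, add each
    count column to its per-1L-people column, band the sums, then majority-vote.
    """
    values = np.asarray(values, dtype=float)
    scaled = values / values.max(axis=0)           # column-wise max scaling
    features = scaled[:, 0::2] + scaled[:, 1::2]   # accident, killed, injured

    # Band each feature: -1 (low), 0 (middle), 1 (everything else, as in the loop)
    bands = np.ones_like(features)
    bands[(features > 0) & (features < 0.33)] = -1
    bands[(features > 0.33) & (features < 0.68)] = 0

    low = (bands == -1).sum(axis=1)
    mid = (bands == 0).sum(axis=1)
    high = (bands == 1).sum(axis=1)

    # Strict majority gives class 0 (low) or 2 (high); anything else is class 1
    labels = np.ones(len(features), dtype=int)
    labels[(low > mid) & (low > high)] = 0
    labels[(high > mid) & (high > low)] = 2
    return features, labels

# Made-up numbers for three regions (illustrative only, not from the dataset)
demo = np.array([
    [10, 1.0, 2, 0.2, 20, 2.0],
    [25, 2.5, 5, 0.5, 30, 3.0],
    [100, 10.0, 20, 2.0, 100, 10.0],
])
features, labels = label_rows(demo)
print(features)
print(labels)
```

With a helper of this kind, each lane type would need only its column selection followed by one call, instead of a copy of the nested loops.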
# Select the six "4 lanes with median" columns for 2014
four_lane_data = main_data[[
    '4 Lanes with Median - Accident - 2014',
    '4 Lanes with Median - Accident - 2014 per 1L people',
    '4 Lanes with Median - Killed - 2014',
    '4 Lanes with Median - Killed - 2014 per 1L people',
    '4 Lanes with Median - Injured - 2014',
    '4 Lanes with Median - Injured - 2014 per 1L people',
]]

fld = four_lane_data.values
fld_ = np.zeros([fld.shape[0], fld.shape[1]])  # column-wise scaled values
fld_x = np.zeros([fld.shape[0], 3])            # model features
fld_y = np.zeros([fld.shape[0], 3])            # per-feature risk band (-1, 0, 1)
flab_y = []                                    # per-row class label (0, 1, 2)
count = 0

# Scale every column by its maximum
for i in range(fld.shape[0]):
    for j in range(fld.shape[1]):
        fld_[i, j] = fld[i, j] / fld[:, j].max()

for xi in range(fld_x.shape[0]):
    for xj in range(fld_x.shape[1]):
        # Feature xj = count column + per-1L-people column (columns 2*xj and 2*xj + 1)
        fld_x[xi, xj] = fld_[xi, xj + count] + fld_[xi, xj + count + 1]
        # Values that are exactly 0 or exactly on a band edge fall into the else branch
        if fld_x[xi, xj] > 0 and fld_x[xi, xj] < 0.33:
            fld_y[xi, xj] = -1
        elif fld_x[xi, xj] > 0.33 and fld_x[xi, xj] < 0.68:
            fld_y[xi, xj] = 0
        else:
            fld_y[xi, xj] = 1
        count = count + 1
        if xj == 2:
            count = 0

    # Count how many of the three features fall in each band
    y__1 = int(np.sum(fld_y[xi, :] == -1))
    y_0 = int(np.sum(fld_y[xi, :] == 0))
    y_1 = int(np.sum(fld_y[xi, :] == 1))

    # Majority vote; a three-way tie defaults to the middle class
    if y__1 > y_0 and y__1 > y_1:
        flab_y.append(0)
    elif y_1 > y_0 and y_1 > y__1:
        flab_y.append(2)
    else:
        flab_y.append(1)

y_train_f = to_categorical(flab_y, num_classes=3)
print(len(flab_y))

# `model` is reused here, so this fit continues from the weights left by the previous fit
r_f = model.fit(fld_x, y_train_f, epochs=100, batch_size=2)
yy_f = model.predict(fld_x)

# Binarise every predicted class score at 0.5
for xi in range(yy_f.shape[0]):
    for xj in range(yy_f.shape[1]):
        if yy_f[xi, xj] < 0.5:
            yy_f[xi, xj] = 0
        else:
            yy_f[xi, xj] = 1
print(yy_f)
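The listing prints the binarised predictions but does not show how an accuracy figure is derived from them. One common way, sketched below with made-up scores and labels, is to compare the highest-scoring class of each row with the integer label; the function name `accuracy` is hypothetical. Because the listing predicts on the same rows it was fitted on, a number computed this way would be training accuracy; estimating performance on unseen data would require a held-out split.

```python
import numpy as np

def accuracy(predicted_scores, true_labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predicted = np.asarray(predicted_scores).argmax(axis=1)
    return float(np.mean(predicted == np.asarray(true_labels)))

# Hypothetical scores and labels (illustrative only)
scores = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = [0, 1, 2, 1]

print(accuracy(scores, labels))
```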

Chapter 8

8. RESULTS AND DISCUSSION
The objective of conducting an accident risk analysis is to anticipate the likelihood of
accidents happening at different intersections. By examining past data on the frequency of
accidents and resulting casualties, an estimation can be made of the risk level associated with
each intersection. Additionally, by identifying the critical environmental factors that
contribute to accidents at intersections, it is possible to develop a risk prediction model
specifically for intersections. This model can then be utilized to forecast the level of accident
risk for intersections that have not yet experienced an accident.

Grouping    Number of intersections    Probability
Low         13,881                     72.58%
Middle      4,324                      22.62%
High        910                        4.8%

Table 1: The initial phase of intersection grouping involves assigning various risk levels
based on their prior probability.
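The prior probability of a group is its share of all intersections. The sketch below recomputes the shares from the counts listed in Table 1; the variable names are illustrative, and the recomputed values may differ slightly from the percentages printed in the table.

```python
# Group counts as listed in Table 1
counts = {"Low": 13881, "Middle": 4324, "High": 910}

total = sum(counts.values())

# Prior probability of each group, as a percentage of all intersections
priors = {group: 100.0 * n / total for group, n in counts.items()}

for group, pct in priors.items():
    print(f"{group:<7}{counts[group]:>7,}{pct:>8.2f}%")
```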

Fig. 7 Single lane data model accuracy

Fig. 8 Double lane data model accuracy

Fig. 9 Three lane data model accuracy

Chapter 9

9. CONCLUSION
In conclusion, this end-to-end data science and machine learning study demonstrates the
feasibility of using data analysis and prediction to help prevent traffic fatalities. By
thoroughly examining the available data and building a machine learning model that forecasts
the severity of possible incidents, the investigating agency can direct its efforts and
resources towards the highest-risk circumstances. This report emphasises the need for
data-driven approaches to difficult problems.

Chapter 10

10. FUTURE SCOPE
In the future, this project aims to develop more advanced deep learning models, such as
convolutional neural networks (CNN) or long short-term memory (LSTM) networks, to further
improve the accuracy of the model. Other possible developments include a real-time prediction
model that predicts accidents as they happen and a driver alert system that warns drivers of
high-risk areas. Integrating the model with IoT devices to enhance its prediction capabilities
could also be explored. Finally, deploying the model in various cities and evaluating its
effectiveness in reducing the number of accidents would be a significant step in demonstrating
its potential impact.

REFERENCES
1. International Transport Forum (ITF). Road Safety Annual Report 2021: The Impact of
COVID-19; ITF: Paris, France, 2021.
2. World Health Organization (WHO). Global Status Report on Road Safety; WHO: Geneva,
Switzerland, 2018.
3. Al-Atawi, A.M.; Kumar, R.; Saleh, W. A framework for accident reduction and risk
identification and assessment in Saudi Arabia. World J. Sci. Technol. Sustain. Dev. 2014, 11,
214–223.
4. Memish, Z.A.; Jaber, S.; Mokdad, A.H.; AlMazroa, M.A.; Murray, C.J.; Al Rabeeah, A.A.;
Saudi Burden of Disease Collaborators. Peer reviewed: Burden of disease, injuries, and risk
factors in the Kingdom of Saudi Arabia, 1990–2010. Prev. Chronic Dis. 2014, 11, E169.
5. Barrimah, I.; Midhet, F.; Sharaf, F. Epidemiology of road traffic injuries in Qassim region,
Saudi Arabia: Consistency of police and health data. Int. J. Health Sci. 2012, 6, 31.
6. FHWA. Highway Safety Manual; American Association of State Highway and
Transportation Officials: Washington, DC, USA, 2010; Volume 19192.
7. Hosseinzadeh, A.; Moeinaddini, A.; Ghasemzadeh, A. Investigating factors affecting
severity of large truck-involved crashes: Comparison of the SVM and random parameter
logit model. J. Saf. Res. 2021, 77, 151–160.
8. Al-Moqri, T.; Haijun, X.; Namahoro, J.P.; Alfalahi, E.N.; Alwesabi, I. Exploiting Machine
Learning Algorithms for Predicting Crash Injury Severity in Yemen: Hospital Case Study.
Appl. Comput. Math 2020, 9, 155–164.
9. Panicker, A.K.; Ramadurai, G. Injury severity prediction model for two-wheeler crashes at
mid-block road sections. Int. J. Crashworthiness 2022, 27, 328–336.
10. Tang, J.; Liang, J.; Han, C.; Li, Z.; Huang, H. Crash injury severity analysis using a two-
layer Stacking framework. Accid. Anal. Prev. 2019, 122, 226–238.
11. Bahiru, T.K.; Singh, D.K.; Tessfaw, E.A. Comparative study on data mining
classification algorithms for predicting road traffic accident severity. In Proceedings of the
2018 Second International Conference on Inventive Communication and Computational
Technologies (ICICCT), Coimbatore, India, 20–21 April 2018.
12. Prati, G.; Pietrantoni, L.; Fraboni, F. Using data mining techniques to predict the severity
of bicycle crashes. Accid. Anal. Prev. 2017, 101, 44–54.
13. Özden, C.; Acı, Ç. Analysis of injury traffic accidents with machine learning methods:
Adana case. Pamukkale Univ. J. Eng. Sci. 2018, 24, 266–275.

44
14. Beshah, T.; Ejigu, D.; Abraham, A.; Snasel, V.; Kromer, P. Mining Pattern from Road
Accident Data: Role of Road User’s Behaviour and Implications for improving road safety.
Int. J. Tomogr. Simul. 2013, 22, 73–86.
15. Zhang, S.; Khattak, A.; Matara, C.M.; Hussain, A.; Farooq, A. Hybrid feature selection-
based machine learning Classification system for the prediction of injury severity in single
and multiple-vehicle accidents. PLoS ONE 2022, 17, e0262941.
16. Arhin, S.A.; Gatiba, A. Predicting crash injury severity at unsignalized intersections
using support vector machines and naïve Bayes classifiers. Transp. Saf. Environ. 2020, 2,
120–132.
17. Candefjord, S.; Muhammad, A.S.; Bangalore, P.; Buendia, R. On Scene Injury Severity
Prediction (OSISP) machine learning algorithms for motor vehicle crash occupants in US. J.
Transp. Health 2021, 22, 101124.
18. Mokhtarimousavi, S.; Anderson, J.C.; Azizinamini, A.; Hadi, M. Improved support
vector machine models for work zone crash injury severity prediction and analysis. Transp.
Res. Rec. 2019, 2673, 680–692.
19. Ma, Z.; Mei, G.; Cuomo, S. An analytic framework using deep learning for prediction of
traffic accident injury severity based on contributing factors. Accid. Anal. Prev. 2021, 160,
106322.
20. AlMamlook, R.E.; Kwayu, K.M.; Alkasisbeh, M.R.; Frefer, A.A. Comparison of
machine learning algorithms for predicting traffic accident severity. In Proceedings of the
2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information
Technology (JEEIT), Amman, Jordan, 9–11 April 2019.
21. Chen, M.-M.; Chen, M.-C. Modeling road accident severity with comparisons of logistic
regression, decision tree and random forest. Information 2020, 11, 270.
22. Niyogisubizo, J.; Liao, L.; Lin, Y.; Luo, L.; Nziyumva, E.; Murwanashyaka, E. A Novel
Stacking Framework Based On Hybrid of Gradient Boosting-Adaptive Boosting-Multilayer
Perceptron for Crash Injury Severity Prediction and Analysis. In Proceedings of the 2021
IEEE 4th International Conference on Electronics and Communication Engineering
(ICECE), Xi’an, China, 17–19 December 2021.
23. Shibata, A.; Fukuda, K. Risk factors of fatality in motor vehicle traffic accidents. Accid.
Anal. Prev. 1994, 26, 391–397.
24. Duncan, C.S.; Khattak, A.J.; Council, F.M. Applying the ordered probit model to injury
severity in truck-passenger car rear-end collisions. Transp. Res. Rec. 1998, 1635, 63–71.

45
25. Al-Turaiki, I.; Aloumi, M.; Aloumi, N.; Alghamdi, K. Modeling traffic accidents in Saudi
Arabia using classification techniques. In Proceedings of the 2016 4th Saudi International
Conference on Information Technology (Big Data Analysis)(KACSTIT), Riyadh, Saudi
Arabia, 6–9 November 2016.
26. Taamneh, M.; Alkheder, S.; Taamneh, S. Data-mining techniques for traffic accident
modeling and prediction in the United Arab Emirates. J. Transp. Saf. Secur. 2017, 9, 146–
166. [CrossRef] 27. Alkheder, S.; Taamneh, M.; Taamneh, S. Severity prediction of traffic
accident using an artificial neural network. J. Forecast. 2017, 36, 100–108.
28. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
29. Zhang, J.; Li, Z.; Pu, Z.; Xu, C. Comparing prediction performance for crash injury
severity among various machine learning and statistical methods. IEEE Access 2018, 6,
60079–60087.
30. Krishnaveni, S.; Hemalatha, M. A perspective analysis of traffic accident using data
mining techniques. Int. J. Comput. Appl. 2011, 23, 40–48.
31. Jiang, H. A comparative study on machine learning based algorithms for prediction of
motorcycle crash severity. PLoS ONE 2019, 14, e0214966.
32. Mafi, S.; Abdelrazig, Y.; Doczy, R. Machine learning methods to analyze injury severity
of drivers from different age and gender groups. Transp. Res. Rec. 2018, 2672, 171–183.
33. Wang, X.; Kim, S.H. Prediction and factor identification for crash severity: Comparison
of discrete choice and tree-based models. Transp. Res. Rec. 2019, 2673, 640–653.
34. Santos, K.; Dias, J.P.; Amado, C. A literature review of machine learning algorithms for
crash injury severity prediction. J. Saf. Res. 2022, 80, 254–269.
35. Khan, M.U.A.; Shukla, S.K.; Raja, M.N.A. Load-settlement response of a footing over
buried conduit in a sloping terrain: A numerical experiment-based artificial intelligent
approach. Soft Comput. 2022, 26, 6839–6856.
