KNN Paper
4th
Amit Kumar(2K22/EE/45)
Electrical Engineering
Delhi Technological University
Delhi, India
Akk71864@gamil.com
Abstract—Traffic safety is a significant concern for governments, with reducing road accidents and their consequences being a top priority. This study focuses on identifying the critical factors influencing road accidents and predicting their severity using machine learning techniques. Models such as K-Nearest Neighbor (KNN), Decision Tree, and Random Forest were implemented and evaluated for accuracy. Among these, KNN emerged as the most accurate model for predicting accident severity. The insights gained from this analysis can play a crucial role in improving traffic management and reducing the risks associated with road accidents. Furthermore, the system can be enhanced to automatically generate and send accident reports to relevant authorities, including hospitals, ambulances, and insurance providers. This feature could significantly reduce response times, improve emergency services, and ultimately help lower fatality rates. The study highlights the potential of machine learning to address critical traffic safety challenges effectively.

Keywords—Machine Learning, KNN Algorithm, Severity, Weather Conditions.

I. INTRODUCTION
Road accidents are becoming an increasingly severe global concern, posing a significant threat to human lives. According to the latest reports, road traffic accidents are one of the leading causes of death worldwide, claiming over 1.35 million lives annually and leaving millions injured or disabled. In India, the situation is particularly alarming, with approximately 403,116 road accidents recorded in 2022, resulting in 153,972 fatalities and countless injuries. This accounts for around 11% of all accident-related deaths globally, making India the country with the highest number of fatal road accidents among 199 nations. These statistics highlight the urgent need for a comprehensive approach to road safety.

The primary causes of road accidents in India are well-documented and include speeding, distracted driving due to mobile phone use, driving under the influence of alcohol or drugs, failure to use seat belts or helmets, poor vehicle maintenance, and hazardous road conditions such as potholes, cracks, and uneven surfaces. The condition of roads plays a crucial role, as poorly maintained roads increase the likelihood of accidents significantly. According to the World Health Organization (WHO), highways, which comprise only 2% of India’s road network, are responsible for 37% of road accident fatalities. This disproportionate impact underscores the critical importance of improving road infrastructure and monitoring road safety measures effectively.

Efforts to address road safety challenges have increasingly relied on technological advancements. One promising approach is the use of machine learning (ML) and data mining techniques to analyze accident data, identify high-risk areas, and predict the severity of potential accidents. By processing historical accident data, these models can reveal patterns that are not immediately apparent, enabling authorities to take proactive measures. For example, data on factors such as weather conditions, traffic volume, road structure, lighting, and driver behavior can be fed into machine learning algorithms to predict accident-prone locations and evaluate risks associated with different scenarios.

Traditional methods of road condition monitoring, such as manual inspections, are labor-intensive, time-consuming, and often lack consistency in data collection and coverage. Additionally, the costs of maintaining roads, especially in regions with heavy traffic or extreme weather conditions, present significant challenges for authorities. To overcome these limitations, a reliable and automated road monitoring system is essential. Such a system can help identify problematic areas quickly, prioritize repairs, and allocate resources more effectively.

The proposed research leverages advanced machine learning algorithms to develop a traffic accident prediction and prevention system. Specifically, the K-Nearest Neighbor (KNN) algorithm is used due to its simplicity, efficiency, and high accuracy in analyzing accident data. The key objectives of this system are as follows:

1. Designing a Cost-Effective Model: Develop a model that is affordable, user-friendly, and capable of predicting accident severity accurately.

2. Analyzing Historical Data: Use past accident records to identify high-risk zones and evaluate the contributing factors.

3. Making Predictions Based on Key Variables: Incorporate factors such as weather conditions, road
structure, traffic density, and pollution levels to forecast potential accidents.

4. Facilitating Resource Allocation: Provide actionable insights to help authorities allocate resources effectively for road maintenance and emergency response.

5. Enhancing Emergency Response: Enable automatic alerts to hospitals, ambulances, and other relevant authorities in case of accidents, reducing response times and potentially saving lives.

By integrating advanced analytics with real-time monitoring capabilities, this system aims to significantly improve road safety. It not only predicts where accidents are likely to occur but also helps authorities understand the underlying causes, enabling targeted interventions. Furthermore, automating the reporting process ensures that accident data is accurate and actionable, providing a robust foundation for traffic safety initiatives.

In conclusion, as road accidents continue to claim thousands of lives annually, the need for innovative and scalable solutions is more critical than ever. This research emphasizes the potential of machine learning models like KNN to revolutionize traffic safety management. By focusing on predictive analysis and proactive measures, the system can contribute to reducing fatalities and injuries, ultimately creating safer roads for everyone.

II. RELATED WORKS
Currently, most road inspections are performed manually, following standard protocols issued by state transportation departments. Manual inspection involves visually assessing road conditions and recording data about their state. This approach, while straightforward to implement, has significant limitations in terms of precision, consistency, and efficiency. Moreover, manual inspections are time-consuming and labor-intensive, making it difficult to maintain large road networks effectively.

To address these challenges, researchers have explored the use of machine learning models to analyze and predict road conditions. These models can significantly reduce the time and effort required for inspections while improving the accuracy of the results. Machine learning techniques enable automated analysis of vast amounts of data, making it possible to identify road anomalies and prioritize maintenance more efficiently.

Several studies have proposed different methods to enhance road inspection and monitoring. For instance, Benedetto et al. investigated the use of Ground Penetrating Radar (GPR) to evaluate road conditions. Their research demonstrated that GPR is an effective tool for assessing road quality, with a false alarm rate of less than 20%. Building on these findings, Benedetto et al. developed a GPR-based algorithm that optimizes the inspection process by reducing the volume of data to be analyzed and the time required for evaluation. This algorithm has been successfully tested on portable devices, making it a practical solution for real-world applications.

Another approach to road condition monitoring involves analyzing images of pavement deterioration. These techniques focus on identifying surface irregularities, such as cracks and potholes, through visual data analysis. Many studies have used threshold-based heuristics to classify road conditions into binary categories (e.g., "damaged" or "not damaged"). However, these binary classifications are limited in scope and fail to address the varying degrees and types of road deterioration.

Multiclass classification, which categorizes road conditions into multiple levels of severity or types of damage, remains an underexplored area. Additionally, most existing studies rely on vibration signals derived from acceleration data in the direction of gravity to detect road anomalies. While this method is effective to some extent, it suppresses information from other directional data that could provide valuable insights into road conditions.

There is also a noticeable gap in research regarding the application of advanced neural networks for multiclass classification of road deterioration. Neural networks, with their ability to model complex patterns and relationships in data, have the potential to revolutionize road inspection methods. By leveraging neural networks, it would be possible to classify various types of road damage more accurately and efficiently, providing a comprehensive understanding of road conditions.

Furthermore, recent advancements in smartphone-based monitoring systems have opened new avenues for road inspection. Smartphones equipped with accelerometers, gyroscopes, and cameras can be used to detect road irregularities in real time. However, most studies using smartphone-based methods have focused on binary classifications, overlooking the potential of more granular multiclass classification. Additionally, the use of smartphone data for identifying road anomalies remains limited to threshold-based approaches, which may not fully capture the complexity of road deterioration patterns.

In summary, while significant progress has been made in automating road condition monitoring, several areas remain ripe for further research and development. These include:

• Expanding the use of multiclass classification to capture the diverse nature of road conditions.
• Incorporating orthogonal data, such as directional vibrations, to improve anomaly detection accuracy.
• Leveraging neural networks to enhance the precision and reliability of road condition analysis.
• Exploring the full potential of smartphone-based methods for real-time road monitoring.

By addressing these gaps, future research can build more robust and efficient systems for road inspection and maintenance, ultimately improving road safety and reducing the risk of accidents.

III. PROPOSED SYSTEM
The primary goal of this research is to identify the most accurate machine learning algorithm for classifying road surface characteristics using available datasets. Traffic accidents, which result in a significant loss of life and cause
numerous injuries, have a profound impact on society. This has led to growing interest in understanding the factors that influence the severity of injuries sustained by drivers in road accidents. By gaining deeper insights into these factors, it is possible to develop effective strategies for reducing accident-related risks.

Accurate and detailed accident records form the foundation of any accident investigation. These records provide critical data that can be analyzed to identify patterns and predict future accidents. Using historical accident data, machine learning models can be trained to recognize various road surface characteristics, such as cracks, potholes, or smooth surfaces, which may influence accident rates. Other important variables, such as weather conditions (rainy, sunny, foggy, etc.), lighting, and road design, can also be included in the analysis to create a comprehensive model.

In this study, a machine learning technique known as the K-Nearest Neighbor (KNN) algorithm is employed to build and train the model. The KNN algorithm is particularly well-suited for this purpose because of its simplicity and effectiveness in classifying data based on patterns and similarities. By analyzing past accident records and road surface data, the KNN algorithm is used to forecast the severity of accidents.

This predictive model focuses on identifying statistically significant features that can help predict the likelihood of crashes and injuries. For example, it can analyze the impact of weather conditions on accident severity or determine how specific road characteristics, such as cracks or potholes, contribute to accidents. Once these factors are identified, the model can also suggest ways to reduce associated risks.

The advantages of using a machine learning approach like KNN are numerous. First, it allows for the processing of large datasets quickly and accurately, reducing the time and effort required for manual analysis. Second, the insights generated by the model can assist policymakers, transportation authorities, and urban planners in implementing targeted interventions to improve road safety. For instance, areas with frequent accidents due to poor road conditions can be prioritized for maintenance or redesign.

Moreover, this model can be further enhanced to include real-time data collection and analysis. By integrating data from sources such as traffic cameras, weather monitoring systems, and road sensors, the system can provide dynamic predictions and warnings. For example, during adverse weather conditions, the model could alert drivers about high-risk areas, helping them avoid accidents.

In conclusion, this research aims to build a machine learning model that accurately classifies road surface characteristics and predicts accident severity. By leveraging the KNN algorithm, the system can identify critical factors contributing to road accidents and provide actionable insights to reduce risks. This approach not only enhances the understanding of accident causes but also contributes to the development of safer road networks, ultimately reducing injuries and saving lives.

The suggested system includes the following solution procedure, shown in Fig. 1:

• Data Acquisition: Extraction and importing of data.
• Data pre-processing: Cleaning of data and feature extraction/selection.
• Machine Learning Training: Decision Tree, Neural Network and Regression Algorithms.
• Model Evaluation: Testing.
• Output: Prediction of severity.

Fig. 1. Proposed System Architecture

The recommended approaches are as follows. A decision tree is a graphical representation of possible solutions reached by making decisions under specified conditions. KNN is a feature similarity-based classification algorithm: it compares data points by their distances and similarities and assigns each new point to the class most common among its K nearest neighbors. The distance can be determined in a variety of ways; this research uses the Euclidean distance. Logistic regression is a statistical technique for modeling the probability of a discrete outcome given one or more input variables.

A. Data Acquisition
Data acquisition is the process of gathering information, which involves determining what data needs to be collected, how it will be collected, and the purpose behind collecting it. This process is the first step in effective data management. From the moment data is acquired, it becomes the responsibility of the organization, individual, or unit to handle it carefully and in compliance with applicable laws, policies, and ethical standards. Proper data acquisition is essential to ensure that the data collected is relevant, accurate, and managed responsibly.

Effective data acquisition involves several important guidelines to ensure transparency, ethical practices, and accountability in handling data. These guidelines include:

1. Providing Notice of Data Collection: Informing individuals or organizations about the purpose and scope of data collection before it begins. Transparency ensures that those providing data understand why it is being collected and how it will be used.

2. Obtaining Consent for Data Collection: Collecting data should always be done with the consent of the individuals or entities involved. This consent should be explicit, ensuring that participants agree to share their information willingly.

3. Limiting Data Collection to Relevant Information: Only the data necessary for the specific purpose should be collected. Collecting excessive or irrelevant data not only wastes resources but also raises ethical and privacy concerns.

4. Establishing Contracts or Agreements: Before collecting data, it is important to have formal agreements in
place. These contracts outline the terms and conditions of data collection, usage, and management, ensuring that all parties involved are aware of their responsibilities.

5. Tracking Data Production: Maintaining a record of how data is generated, processed, and stored is crucial for ensuring data integrity. This tracking also helps in identifying errors, gaps, or inconsistencies in the data.

B. Data Pre-processing
Data pre-processing is an essential step in preparing raw data for analysis by machine learning algorithms. It involves a series of steps to clean, transform, and organize the data into a format suitable for training machine learning models. Without proper pre-processing, raw data, which is often incomplete, inconsistent, or noisy, cannot be effectively utilized for analysis. This step is critical to ensure accurate results and meaningful insights from machine learning and deep learning algorithms.

The process of data pre-processing encompasses various techniques that aim to enhance the quality of the data and make it suitable for modeling. Machine learning models have specific requirements for data structure and format, and pre-processing ensures that these requirements are met. Properly pre-processed data improves the performance of the algorithm, reduces the chances of errors, and leads to more reliable predictions.

Steps in Data Pre-processing:

1. Data Cleaning:
Handling Missing Values: Missing data can lead to inaccurate results or hinder the training process. Techniques such as imputation (filling missing values with mean, median, or mode) or removal of incomplete records are used to address this issue.
Removing Noise: Noise in data, such as outliers or irrelevant information, can negatively affect the model's performance. Data cleaning involves identifying and eliminating such noise.
Dealing with Duplicate Records: Duplicate entries in datasets are removed to avoid redundancy and ensure data consistency.

2. Data Transformation:
Scaling and Normalization: Raw data often contains features with varying ranges and units. Scaling (e.g., Min-Max scaling) and normalization ensure that all features contribute equally to the model, preventing bias.
Encoding Categorical Variables: Many machine learning algorithms require numerical input. Categorical variables are converted into numerical formats, using schemes such as one-hot encoding or label encoding, to make them compatible with the model.
Feature Engineering: Creating new features or modifying existing ones to highlight important patterns in the data.

3. Data Integration:
Data from multiple sources is often combined into a unified dataset. This process requires ensuring consistency across data formats, resolving schema conflicts, and eliminating redundant information.

4. Data Reduction:
High-dimensional datasets can slow down processing and make the model prone to overfitting. Techniques such as Principal Component Analysis (PCA) or feature selection methods are used to reduce the dimensionality of the data while retaining the most critical information.

5. Data Formatting and Structuring:
The dataset must be organized to align with the specific requirements of the machine learning model being used. This includes arranging the data into training, validation, and testing sets, ensuring proper labels, and adhering to input-output specifications.

C. Feature Extraction
Feature extraction is a critical process in road analysis, where specific attributes or characteristics are identified and utilized to better understand and classify different road conditions. Features play a central role in training machine learning models, as they represent the key aspects of the data that the model uses to learn and make predictions. By extracting meaningful features, we can improve the model's accuracy and efficiency.

In road condition analysis, features can be derived from various domains, including the time domain, frequency domain, and wavelet domain. Each domain provides unique insights into the characteristics of road conditions, making it essential to consider features from all relevant perspectives.

Time-Domain Features:
These features represent the raw data in its original form, focusing on aspects such as signal amplitude, mean, variance, and standard deviation. Time-domain features are often the starting point for analysis, as they capture the basic patterns and trends in the data.

Frequency-Domain Features:
Frequency-domain features are obtained by transforming the raw data into the frequency spectrum using techniques such as the Fast Fourier Transform (FFT). These features highlight periodicities and patterns in the data that are not immediately apparent in the time domain, such as vibrations or oscillations caused by road irregularities.

Wavelet-Domain Features:
Wavelet transforms provide a multi-resolution analysis of the data, capturing both time and frequency information. Wavelet-domain features are particularly useful for analyzing non-stationary signals, such as those generated by vehicles traveling over varying road conditions. These features help detect localized anomalies like potholes or cracks.

D. Machine Learning Approaches
Machine learning is a branch of Artificial Intelligence (AI) that enables computers to learn and improve from experience
without being explicitly programmed. In machine learning, algorithms are trained using data, allowing them to recognize patterns and make decisions based on past experiences. Once trained, these algorithms can apply their knowledge to solve similar problems without human intervention.

In the context of this study, we explore various machine learning techniques to analyze and predict road accidents based on data obtained from various databases. By applying these techniques, we aim to develop models that can identify key factors influencing accidents, helping to improve road safety and reduce accident rates.

Machine learning approaches in this study include both supervised learning methods, such as classification and regression, and potentially unsupervised learning methods, depending on the structure of the data. These techniques are particularly useful for extracting meaningful insights from large datasets, such as traffic accident reports, road conditions, and environmental factors.

Fig. 2. General workflow of ML Algorithm

Fig. 2 illustrates the typical workflow used in machine learning techniques like classification and regression. The process starts with the collection of raw data that includes labeled information. These labels represent the known outcomes or categories that the model will learn to predict.

The first step is data processing, where the raw data is cleaned and transformed into a format that is ready for analysis. During this stage, feature extraction occurs, where key attributes (or features) from the data are identified. These features could include road conditions, weather patterns, traffic volume, or other relevant variables. By extracting these features, the data becomes more structured and useful for machine learning algorithms to analyze.

Once the features are extracted, the data is split into three sets: training, validation, and test sets. This step is critical for ensuring the model is trained effectively and can generalize well to unseen data. The distribution of the data across these sets is carefully managed to ensure that the proportion of different categories or outcomes is consistent in each set, allowing the model to learn and test on a balanced dataset.

1. The training set is used to develop the model and train the algorithm. It is the primary dataset from which the machine learning model learns the relationship between the features and the labels.

2. The validation set is used to tune the model and adjust hyperparameters. It helps assess the model's performance during the training process, ensuring that the model is not overfitting or underfitting the data.

3. The test set is used to evaluate the final model's performance after training is complete. It provides an unbiased assessment of the model's accuracy and its ability to make predictions on new, unseen data.

Through this iterative process, the model becomes increasingly accurate and effective at making predictions based on the learned patterns.

E. KNN Algorithm
The K-Nearest Neighbors (KNN) algorithm is a widely used supervised machine learning technique that is particularly effective for classification and prediction tasks. It is based on the principle of feature similarity, meaning that data points that are similar to each other are likely to belong to the same category or have similar outcomes. This method is often employed in various industries for categorizing data and solving classification problems.

The key idea behind KNN is straightforward: when a new data point is introduced, the algorithm compares it to the points in the training dataset and classifies it based on its proximity to the nearest neighbors. These neighbors are determined by measuring the distance between the new point and all the points in the training data, using distance metrics such as Euclidean distance, Manhattan distance, or others. The new data point is then assigned a value or category based on the majority vote of its nearest neighbors.

For example, in road accident analysis, KNN can be used to classify accident severity based on factors such as weather, road conditions, time of day, and other features. If a new accident occurs, KNN will compare the new accident’s features to past accidents and predict its severity based on the outcomes of similar accidents.

Fig. 3. KNN Flowchart

Fig. 3 illustrates the general process of how the KNN algorithm works. It starts with a training dataset where each data point is already labeled with its corresponding class or value. When a new data point (such as a new accident record) is introduced, the algorithm follows these steps:
1. Calculate the distance: Measure the distance between the new data point and all the points in the training set. Common distance metrics include Euclidean distance, which measures the straight-line distance between points, or Manhattan distance, which sums the absolute differences of their coordinates.

2. Identify the nearest neighbors: Once the distances are calculated, the KNN algorithm identifies the K nearest points (neighbors) to the new data point. The value of K is a parameter that is set beforehand and can affect the model’s performance. Typically, smaller values of K make the model more sensitive to noise, while larger values make the model more general.

3. Assign a class label: The algorithm then assigns the class or category of the new data point based on the majority class of its K nearest neighbors. For example, if 3 out of 5 neighbors belong to the "severe" accident category, the new accident will be classified as "severe."

4. Prediction: Finally, the KNN algorithm predicts the value or class for the new data point based on the majority of its closest neighbors.

Key Characteristics of KNN:
Simplicity: KNN is easy to implement and understand, making it a popular choice for classification tasks.
Non-parametric: KNN does not make assumptions about the underlying data distribution, unlike other algorithms such as linear regression. This makes it a flexible method that can be applied to a wide range of problems.
Instance-based learning: KNN is an instance-based learning algorithm, meaning it doesn't learn a model during training. Instead, it stores the training data and uses it directly during prediction. This can be an advantage or disadvantage, depending on the application.

F. Logistic Regression
Logistic regression is one of the most widely used techniques in machine learning, particularly when the data is labeled and the goal is to classify the data into distinct categories. It is a statistical method that models the relationship between a set of independent (predictor) variables and a categorical dependent (outcome) variable. Unlike linear regression, which is used for continuous data, logistic regression is specifically designed for predicting binary outcomes or probabilities.

In logistic regression, the model predicts an output that is either a "Yes" or "No," "True" or "False," or 0 or 1, representing two possible classes. For example, in the context of road accident prediction, logistic regression could be used to predict whether an accident will result in a fatality (1) or not (0). Although the output is categorical, the algorithm produces a probability that the data record belongs to a particular class, and these probabilities fall between 0 and 1. This probabilistic nature makes logistic regression suitable for binary classification tasks.

Logistic Regression Model:
The logistic regression model works by applying the logistic function (also known as the sigmoid function) to the output of a linear equation formed by the independent variables. The sigmoid function maps any input to a value between 0 and 1, making it ideal for classification tasks. In its standard form, the logistic regression equation is as follows:

$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

where $x_1, \dots, x_n$ are the independent variables and $\beta_0, \dots, \beta_n$ are the model coefficients.

Applications of Logistic Regression
Logistic regression is highly effective in various applications, particularly in situations where the outcome is categorical. In road safety analysis, logistic regression can be applied to predict:

• Whether a specific set of road conditions will lead to an accident (binary classification).
• The likelihood of different levels of accident severity, such as minor, moderate, or severe (multinomial or ordinal classification).

G. Model Evaluation
Model evaluation is a crucial step in assessing the performance of machine learning algorithms. After training a model, it is essential to measure how well it performs on unseen data to ensure that the model generalizes well to new, real-world examples. Evaluation metrics allow us to understand the effectiveness of a classifier and to compare the performance of different models. This helps in selecting the best model for a given task, such as predicting road accidents or classifying accident severity.

To evaluate the performance of machine learning classifiers, we use a variety of performance metrics. These metrics provide insight into how well the model is performing across different aspects, such as accuracy, precision, recall, and F1 score. The evaluation process involves comparing the predicted values from the model to the actual values in the test dataset.

Factors in Model Evaluation
The primary factors we consider when evaluating a classifier’s performance include:

1. Accuracy:
Accuracy is the most basic performance metric. It measures the proportion of correctly predicted instances (both positive and negative) out of all instances in the test set. While simple, accuracy may not be the best indicator if the data is imbalanced (e.g., predicting whether an accident will result in a fatality might have many more "non-fatal" cases).

2. Precision:
Precision measures the proportion of true positive instances (correctly predicted accidents) out of all the instances predicted as positive by the model. High precision means that when the model predicts
an accident will result in a fatality, it is usually Decision Tree 78.01
correct.
3. Recall (Sensitivity):
Recall, or sensitivity, measures the proportion of true
positive instances out of all the actual positive
instances in the data. It tells us how well the model
identifies all actual accidents, including those that
may result in fatalities.
4. F1 Score:
The F1 score is the harmonic mean of precision and
recall. It provides a balance between the two,
especially when the classes are imbalanced. A higher
F1 score indicates better overall performance when
Fig. 4. Final result of accuracy comparison
considering both precision and recall together.
From Fig. 4, we can conclude that the KNN algorithm provides more accurate results than the other machine learning algorithms for road accident analysis.

H. Evaluation Metrics
1) Precision: The fraction of observations predicted as positive that actually belong to the positive class. Precision is calculated using (1):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

2) Recall: The proportion of observations that actually belong to the positive class and are correctly predicted as positive. It shows how well the model can recognize a random positive-class observation. Recall is calculated using (2):

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

3) F1 Score: The harmonic mean of precision and recall, calculated using (3):

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

Here TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
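The metrics defined above can be computed directly from the counts of true positives, false positives, and false negatives. The following is a minimal sketch in plain Python; the function name `classification_metrics` and the two label lists are hypothetical illustrations and are not taken from the study's dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical labels: 1 = fatal accident, 0 = non-fatal accident.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

In this made-up example the accuracy is 0.8 while precision, recall, and F1 are each 2/3, which illustrates why accuracy alone can look optimistic when the non-fatal class dominates.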
IV. RESULTS

This section presents the analysis of the collected data and evaluates how accurately each machine learning model predicts road accident severity. The metrics used to measure performance were described in the preceding section.

The model accuracies and the accident death counts obtained from the collected data under various parameters are visualized, and the findings are summarized in Table I. The table reports the accuracy of each model used, such as KNN and Logistic Regression.

TABLE I. IMPLEMENTATION RESULTS
Models                 Accuracy (%)
KNN                    88.25
Logistic Regression    88.24
Random Forest          86.84
Decision Tree          78.01

Fig. 6. Accident count vs driver’s age

From Fig. 6, the number of accidents occurring in a day can be visualized by the age of the people involved.

V. CONCLUSION

According to the findings of this study, machine learning algorithms have proven to be effective in classifying accident severity based on factors such as cracks and potholes, road conditions, weather conditions, and other relevant variables. However, there are certain limitations in the current research that will be addressed in future studies. One such limitation is the small size of our training dataset, which may affect the
precision and accuracy of the model's predictions. A larger and more diverse dataset could potentially improve these performance metrics.

Despite these limitations, machine learning offers a promising approach for making more accurate predictions based on historical data. By analyzing past accident data, the system can predict accident severity and identify potential risk factors more reliably. The results of this analysis can then be used to recommend actions to the relevant authorities, such as road maintenance agencies, in order to reduce the number of accidents. The insights gained from this model can help improve decision-making processes related to traffic safety and road maintenance.

Given the proven effectiveness of machine learning in forecasting road accidents and fatalities, we strongly recommend adopting these techniques for future traffic safety applications. Moving forward, we aim to develop an advisory system that can predict traffic accidents and alert road users in real-time. This system could be integrated with various technologies, such as mobile applications, to enhance its reach and effectiveness. The goal is to create a user-friendly mobile app that not only provides accurate accident predictions but also serves as a valuable tool for road users, helping them make safer driving decisions.

In the future, we envision expanding this system’s capabilities to include real-time data collection and analysis, incorporating live traffic conditions, weather updates, and other dynamic factors. By utilizing advanced machine learning algorithms and large-scale datasets, we aim to build a more robust and accurate traffic prediction system that can make a significant impact on reducing road accidents and saving lives.
Code Implementation
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# time_intervals_15min and accident_counts_15min are assumed to be built beforehand,
# as described under Data Preparation (decimal hours, e.g. 9.25 means 9:15 AM).
# scikit-learn expects a 2D feature array, so reshape to a single column.
time_intervals_15min = np.array(time_intervals_15min).reshape(-1, 1)
accident_counts_15min = np.array(accident_counts_15min)

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    time_intervals_15min, accident_counts_15min, test_size=0.2, random_state=42)

# Create and train the KNN model using Manhattan distance (L1 norm)
knn = KNeighborsRegressor(n_neighbors=3, metric='manhattan')
knn.fit(X_train, y_train)

# Predict the accident count for a new time value (10.0 means 10:00 AM)
test_value = np.array([[10.0]])
predicted_count = knn.predict(test_value)
print(f"Predicted accident count at 10:00 AM: {predicted_count[0]:.2f}")

# Evaluate on the held-out test set with mean squared error (lower is better)
y_pred = knn.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.2f}")
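The listing above uses `time_intervals_15min` and `accident_counts_15min` without showing how they are built. The following is a minimal sketch of that preparation step, assuming each hourly count is simply repeated across its four 15-minute slots. The hourly counts are made-up placeholder values, not data from the study, and `knn_predict` is a hypothetical pure-Python stand-in that illustrates what a 3-neighbor regressor with Manhattan distance computes (here on the full data, without a train/test split).

```python
# Hypothetical hourly accident counts for hours 0..23 (placeholder values only).
initial_hours = [[h] for h in range(24)]
accident_counts = [2, 1, 1, 0, 1, 2, 4, 7, 9, 8, 6, 5,
                   6, 5, 6, 7, 9, 11, 10, 8, 6, 5, 4, 3]

# Expand each hour into four 15-minute slots (h, h+0.25, h+0.5, h+0.75),
# repeating the hourly count for every slot.
time_intervals_15min = []
accident_counts_15min = []
for (hour,), count in zip(initial_hours, accident_counts):
    for offset in (0.0, 0.25, 0.5, 0.75):
        time_intervals_15min.append([hour + offset])
        accident_counts_15min.append(count)


def knn_predict(X, y, query, k=3):
    """Average the targets of the k points closest to `query` (Manhattan distance)."""
    distances = sorted(
        (sum(abs(a - b) for a, b in zip(row, query)), target)
        for row, target in zip(X, y)
    )
    nearest_targets = [target for _, target in distances[:k]]
    return sum(nearest_targets) / k


# Predict for 10:00 AM (decimal hour 10.0).
predicted = knn_predict(time_intervals_15min, accident_counts_15min, [10.0])
print(f"{len(time_intervals_15min)} intervals, prediction at 10:00 AM: {predicted:.2f}")
```

The two lists produced here have the shapes the listing expects, so they can be passed to `np.array(...)` unchanged.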
This code demonstrates how to use the K-Nearest Neighbors (KNN) regression algorithm to predict accident counts for specific time intervals. It preprocesses the data, trains the KNN model, makes a prediction for a specific time interval, and evaluates the model's performance. Here's a detailed explanation:

1. Data Preparation
Hourly Data
initial_hours: Represents the hours from 0 (midnight) to 23 (11 PM), formatted as a 2D array where each row is an hour.
accident_counts: A 1D array representing the accident counts corresponding to each hour.
Convert to 15-minute Intervals
Reason: Each hour is divided into four 15-minute intervals to create finer-grained data.
Loop Explanation:
For each hour and its corresponding accident count:
Four 15-minute intervals are created by adding 0, 0.25, 0.5, and 0.75 to the hour.
The accident count for each hour is duplicated across its four intervals.
Final Data:

train_test_split: Splits the data into 80% training and 20% testing.
X_train and X_test: Features (time intervals).
y_train and y_test: Targets (accident counts).

3. K-Nearest Neighbors (KNN) Regression
Model: KNeighborsRegressor is used to predict continuous values (regression task).
Hyperparameters:
n_neighbors=3: Considers the 3 closest neighbors to make a prediction.
metric='manhattan': Uses the Manhattan distance (sum of absolute differences) to measure distances between points.
Training: knn.fit(X_train, y_train) fits the model on the training data.

4. Making Predictions
Example Prediction: For a specific time interval (10.00, representing 10:00 AM), the model predicts the accident count using knn.predict(test_value).

5. Model Evaluation
Predicted vs Actual Values:
The model predicts accident counts for the test set using knn.predict(X_test).
Mean Squared Error (MSE):
Measures the average squared difference between predicted and actual values.
Lower MSE indicates better model performance.

6. Results
Prediction for a Specific Time: The accident count for 10:00 AM is predicted and displayed.
Model Performance: MSE for the test set is calculated to assess