Accident Zones
S. Subhashini
Department of Computer Applications
Hindustan Institute of Technology and Science
Chennai, Tamil Nadu, India
21248005@student.hindustanuniv.ac.in

R. Maruthi
Department of Computer Applications
Hindustan Institute of Technology and Science
Chennai, Tamil Nadu, India
rmaruthi@hindustanuniv.ac.in
Abstract—As the use of vehicles has increased throughout the world, the number of accidents has also increased globally. The number of traffic accidents is recognized as one of the world's major concerns, and it grows every year. Consequently, it has a significant impact on a nation's society, economy, and progress. As accidents occur in various places and at various times, identifying where they are more likely to occur can be complicated. This paper presents a predictive model to analyze and visualize road traffic accident zones in Tamil Nadu. The zones are categorized into four levels: low, medium, high and very high. Data analysis is done in order to make this determination. Real-time data are acquired and interpreted using Latent Class Clustering Analysis (LCA). The outcome is represented through the cartogram visualization technique. Accident analysis is done to identify the causes of an accident in an attempt to prevent similar instances from recurring. As a result, the zones are simpler to recognize and the data are easier for users to absorb.

Keywords—Road accident zone analysis, Road Accidents, Predictive model, Data analysis, Data visualization, Latent class clustering analysis, Cartogram

I. INTRODUCTION
Road accidents are one of the leading causes of death around the world at present. To minimize the consequences of accidents, some measures are required. Studies that can identify and analyze the factors behind accidents are also necessary to choose the most effective measures. Analyzing accident data can also help identify road-, vehicle-, and driver-related accident causes. Human error, driver fatigue, poor road sense, vehicle mechanical failure, speeding and racing in violation of traffic laws, traffic jams, road encroachment, etc. are crucial factors in accidents. To pinpoint the areas with the highest concentration of accidents, data from road accidents are analyzed. Accidents occur under a variety of circumstances, including the time of day, the type of vehicle, the age of the rider, the number of passengers, the type of road, and the weather. Young people between the ages of 15 and 25 seem more likely to be affected. They use bikes for entertainment and do not have much control. Some of them are in the age group that cannot hold a license: government regulations prohibit them from driving a vehicle until they are 18 years old. The busiest regions in Tamil Nadu are where accidents occur most frequently. A busy city like Chennai is a major metropolitan area with a huge population and access to all forms of transportation. Accidents happen less frequently in places like Nilgiris, as fewer people live there; people choose such places for vacations, not as permanent residences. In 2020, more than 45,000 accidents were recorded in Tamil Nadu. The number of accidents in the state has been increasing each year, and it is expected to keep increasing in the coming decades. Visitors from other states and from abroad are frequent in Tamil Nadu. Through this study, they can learn about the accidents that occur in each district they visit, view the accident zones, and see how often accidents occur there, so that they can travel safely and be aware of how busy a region is. This paper supports locating the zones through a Google Maps representation. The data are collected, cleaned, integrated, verified, and analyzed. Latent Class Clustering Analysis (LCA) is used for data analysis.

II. RELATED WORKS
A descriptive model has been studied to find road traffic accident zones. That study also uses infographic visualization techniques to identify locations efficiently, but it presents the data only in a simple format and gives no details about the future [1]. Another study detects hotspots (road accident zones) using kernel density and hotspot analysis. It does not provide statistical significance and it often gives the same results [2]. A logistic regression model for analyzing and visualizing road accidents in accident-prone areas was studied. This model is not applicable if the number of observations is smaller than the number of features [3]. Using the fuzzy clustering technique, the probability of road accidents in Medan city was investigated. It performs badly on datasets with clusters of various sizes [4]. The K-Means technique was applied in a study to cluster accident-prone highway zones. It cannot deal with outliers and noisy data [5].

Traffic Accident Evaluation and Prediction through Machine Learning Techniques has been examined. The algorithms for logistic regression, decision tree, and random forests were discussed in this table. The one with
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on October 17,2023 at 14:26:49 UTC from IEEE Xplore. Restrictions apply.
2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS)
III. METHODOLOGY
A predictive model is proposed to analyze accident zones using road traffic data. Road accident zone data analysis is conducted using an unsupervised machine learning clustering technique. The data are clustered into different subgroups, analyzed based on their characteristics, and visualized using maps. The accident data set for the initial process is derived from the public domain.

The data set contains instances and locations of traffic accidents in a specific region. Before the data set is put through pre-processing, it must be validated. Data cleaning should be done to remove any null or redundant data that might be present. The cleaned data are used as the pre-processed data and fed into the algorithm as input. To classify the data, the algorithm generates features from various dataset properties. The method is used to evaluate the data, and based on the features that are retrieved, a probability is produced for the processed data. The data are then visualized using a data visualization technique. Fig. 1 represents the steps for processing the data.

Fig. 3. Process of data pre-processing method

Data pre-processing is a necessary step of data analysis. It converts unstructured data into a form that computers can understand, as represented in Fig. 3. Raw data may be incomplete, may contain errors, and do not have a regular format. The first step is cleaning, which removes unwanted and duplicate data. Data integration is the process of combining data from different sources. Data transformation converts data from one form to another. The final step is data reduction, which reduces the storage the data require.

C. Data Coding
Fig. 4. Process of data coding
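The four pre-processing steps named for Fig. 3 (cleaning, integration, transformation, reduction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the field names (`district`, `year`, `accidents`), the function names, and the sample rows are all assumptions made up for the example, and only the Python standard library is used.

```python
from collections import defaultdict

def clean(records):
    """Cleaning: drop rows with missing values and exact duplicate rows."""
    seen, kept = set(), []
    for row in records:
        if any(value is None or value == "" for value in row.values()):
            continue  # incomplete record
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        kept.append(row)
    return kept

def integrate(*sources):
    """Integration: combine records from several sources into one list."""
    return [row for source in sources for row in source]

def transform(records):
    """Transformation: normalise district names and parse numeric fields."""
    return [
        {
            "district": row["district"].strip().title(),
            "year": int(row["year"]),
            "accidents": int(row["accidents"]),
        }
        for row in records
    ]

def reduce_records(records):
    """Reduction: keep a single accident total per district."""
    totals = defaultdict(int)
    for row in records:
        totals[row["district"]] += row["accidents"]
    return dict(totals)

# Made-up sample rows (hypothetical values, not the paper's data set).
source_a = [
    {"district": "chennai ", "year": "2020", "accidents": "120"},
    {"district": "chennai ", "year": "2020", "accidents": "120"},  # duplicate
    {"district": "Nilgiris", "year": "2020", "accidents": ""},     # missing value
]
source_b = [
    {"district": "Chennai", "year": "2019", "accidents": "100"},
    {"district": "nilgiris", "year": "2020", "accidents": "7"},
]

merged = integrate(clean(source_a), clean(source_b))
totals = reduce_records(transform(merged))
print(totals)
```

Each source is cleaned before integration, following the order given in the text. Duplicates that only appear once the sources are combined would need a second cleaning pass after `integrate`.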
D. Data Summarization
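Latent class analysis, which this part of the methodology goes on to describe, treats each record as belonging to one of T unobserved classes, with the measured variables independent within a class. The sketch below fits such a model to binary indicators with the expectation-maximization (EM) algorithm and computes the Bayesian Information Criterion used for model selection. It is an illustration under stated assumptions only: the paper does not say how its model was estimated, the indicators are assumed to be binary, the names `fit_lca` and `bic` are hypothetical, and the toy data are made up.

```python
import math
import random

def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary indicators with EM.

    data: list of equal-length lists of 0/1 values.
    Returns (class weights, item probabilities, responsibilities,
    log-likelihood evaluated in the last E-step).
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # probs[t][j] = P(indicator j = 1 | class t), random starting values
    probs = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(n_classes)]
    eps = 1e-6  # keeps probabilities away from exactly 0 or 1
    resp, loglik = [], 0.0
    for _ in range(n_iter):
        # E-step: posterior class membership for every record
        resp, loglik = [], 0.0
        for row in data:
            joint = []
            for t in range(n_classes):
                p = weights[t]
                for j, x in enumerate(row):
                    p *= probs[t][j] if x else 1.0 - probs[t][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        # M-step: re-estimate class weights and item probabilities
        for t in range(n_classes):
            n_t = sum(r[t] for r in resp)
            weights[t] = n_t / n
            for j in range(m):
                num = sum(r[t] * row[j] for r, row in zip(resp, data))
                probs[t][j] = min(1.0 - eps, max(eps, num / n_t))
    return weights, probs, resp, loglik

def bic(loglik, n_obs, n_classes, n_items):
    """BIC = -2 log L + k ln(n); k counts the free parameters."""
    k = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + k * math.log(n_obs)

# Toy data (made up): two clearly different response patterns.
data = [[1, 1, 1]] * 20 + [[0, 0, 0]] * 20
weights, probs, resp, loglik = fit_lca(data, n_classes=2)
labels = [max(range(2), key=lambda t: r[t]) for r in resp]
print(labels)
print(bic(loglik, len(data), 2, 3))
```

Fitting the model for several values of `n_classes` and comparing the resulting BIC values is one way to approach the choice of the number of clusters; a lower value is preferred.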
distal outcomes will then be examined. The bias-adjusted three-step methodology and multiple-group latent class analysis are the best methods for evaluation.

Within each latent class, the measured variables are statistically independent of one another. The latent classes account for the associations between the measured variables [16]. In one formulation, the latent class model is expressed as follows:

$$p_{i_1, i_2, \ldots, i_N} \approx \sum_{t=1}^{T} p_t \prod_{n=1}^{N} p^{n}_{i_n, t} \qquad (1)$$

where $T$ represents the total number of latent classes, and $p_t$ denotes the unconditional probabilities, which should add up to one. The conditional or marginal probabilities are denoted by $p^{n}_{i_n, t}$.

For a two-way latent class model, the form is as follows:

$$p_{ij} \approx \sum_{t=1}^{T} p_t \, p_{it} \, p_{jt} \qquad (2)$$

Non-negative matrix factorization and probabilistic latent semantic analysis are also connected to this two-way model.

Model selection involves two difficulties: the first is deciding the number of clusters, and the second is the structure of the model given that number of clusters. A model with a lower Bayesian Information Criterion (BIC) is generally preferable; it is considered together with goodness-of-fit statistics, multivariate regression latent variables, bootstrap samples, probability tests, and Wald tests. The separation of the clusters, or the uncertainty of classification, is the basis for another set of methods for analyzing LC cluster models [17].

Model fit, cluster separation, and partition stability are the selection criteria for LCA. For the final selection of the number of clusters, additional factors such as parsimony, the size of the population shares, and evaluation of the clusters must be taken into account. The chosen clusters can be explored and visualized. A range of factors can be applied to determine the number of classes, and no single factor is generally accepted as the best [18].

Visualization means representing data and information as pictures, graphs, charts, maps, plots and animations. Here, a cartogram, which is a map-based visualization technique, is used to plot the regions. A cartogram, or value-by-area map, illustrates a region's geography, with the values scaled relative to one another so that the map conveys the information. The size of each area is determined by its value. The regions can be coloured or shaded according to the user's preferences. Cartograms are typically used to illustrate data pertaining to countries and regions, for example election results or population. Here the regions are marked with symbols whose specifications are based on the values.

IV. EXPERIMENTAL RESULTS

TABLE 1. KEY SYMBOLS USED TO MARK THE ZONES
Symbol            Description
Green square      Low
Yellow diamond    Medium
Blue circle       High
Red star          Very High

The map in Fig. 9 shows the cartogram of road traffic accident zones in Tamil Nadu. This real-time map shows part of Tamil Nadu. Four different pointers are marked on the districts according to the dataset. Different colours and shapes are used to highlight the districts according to the accident rate predicted from the previous years' records. Table 1 shows the key symbols that are used to mark the zones on the map. The green square represents the districts with a low accident rate. The yellow diamond represents the districts with a medium accident rate. The blue circle represents the districts with a high accident rate. The red star represents the districts with a very high accident rate. These different symbols and colours also make it simple to understand the category.

V. CONCLUSION
This study examines data on road traffic accidents that occurred in Tamil Nadu. A real-time dataset is used to present this predictive model in a realistic way. The visualization is also presented in Google My Maps, which is related to Google Maps. The collected real-time data have been processed, analyzed and visualized. The visualized data can be applied to prevent accidents. Users can get essential information on the zones. By knowing the
marked zones, users can become aware of how safe driving is in each district, so that they can drive safely in those places. The major objective of this paper is to use visualisation to forecast the zones based on records from earlier years. It is concluded that the busiest city has seen the largest number of accidents. Individuals need to learn more about safety and road regulations, especially in the busiest metropolis. For future work, accident data for interior zones would be preferred, so that users of this model can learn more about the accident zones within each district.

REFERENCES
[1] Rabbani, Muhammad & Musarat, Muhammad Ali & Alaloul, Wesam & Ayub, Saba & Bukhari, Hamna & Altaf, Muhammad. (2022). Road Accident Data Collection Systems in Developing and Developed Countries: A Review. International Journal of Integrated Engineering. 14. 336-352. 10.30880/ijie.2022.14.01.031.
[2] Mesquitela, J.; Elvas, L.B.; Ferreira, J.C.; Nunes, L. Data Analytics Process over Road Accidents Data—A Case Study of Lisbon City. ISPRS Int. J. Geo-Inf. 2022, 11, 143. https://doi.org/10.3390/ijgi11020143
[3] Sreedhar, Megna. (2021). Road Traffic Accident Analysis and Visualization of Accident Prone Areas. International Journal for Research in Applied Science and Engineering Technology. 9. 552-561. 10.22214/ijraset.2021.33280
[4] Syahputri, Khalida & Sari, Rachida & Rizkya, Indah & Tarigan, Ukurta & Siregar, Ikhsan & Farhan, Tengku. (2020). Clustering the vulnerability of traffic accidents in Medan city with a fuzzy c-means algorithm. IOP Conference Series: Materials Science and Engineering. 801. 012030. 10.1088/1757-899X/801/1/012030.
[5] Puspitasari, Diah & Wahyudi, Mochamad & Rizaldi, Muhammad & Nurhadi, Acmad & Ramanda, Kresna & Sumanto. (2020). K-Means Algorithm for Clustering The Location Of Accident-Prone On The Highway. Journal of Physics: Conference Series. 1641. 012086. 10.1088/1742-6596/1641/1/012086.
[6] Vyshnavi K G, Dr. Nalini N. (2022). Machine Learning Algorithms for Road Accident Analysis and Forecasting. International Journal of Research in Engineering and Science (IJRES). Volume 10, Issue 7, July 2022, pp. 283-288.
[7] Dipanshu Gupta, Vagisha Goel, Rithik Gupta, Mohd Shariq, Rajesh Singh. (2022). Road Accident Predictor Using Machine Learning. International Research Journal of Modernization in Engineering Technology and Science. Volume 04, Issue 05, May 2022.
[8] Anik Vega Vitianingsih, Nanna Suryana, Zahriah Othman. (2021). Spatial analysis model for traffic accident-prone roads classification: a proposed framework. IAES International Journal of Artificial Intelligence (IJ-AI). Vol. 10, No. 2, June 2021.
[9] Santos, D.; Saias, J.; Quaresma, P.; Nogueira, V.B. Machine Learning Approaches to Traffic Accident Analysis and Hotspot Prediction. Computers 2021, 10, 157. https://doi.org/10.3390/computers10120157
[10] Syed Saqib Ali Kazmi, Mehreen Ahmed, Rafia Mumtaz, Zahid Anwar. Spatiotemporal Clustering and Analysis of Road Accident Hotspots by Exploiting GIS Technology and Kernel Density Estimation. The Computer Journal, Volume 65, Issue 2, February 2022, Pages 155–176. https://doi.org/10.1093/comjnl/bxz158
[11] Asghar Pasha, Vijayalakshmi, MD Atique, MD Hussain, Harsh Narnot, Bipin. Road Accident Prediction using Machine Learning. International Research Journal of Engineering and Technology (IRJET), Volume 08, Issue 07, July 2021.
[12] Khanh Giang Le, Pei Liu and Liang-Tay Lin. Determining the road traffic accident hotspots using GIS-based temporal-spatial statistical analytic techniques in Hanoi, Vietnam. Geo-spatial Information Science 2020, Vol. 23, No. 2, 153–164. https://doi.org/10.1080/10095020.2019.1683437
[13] Sodikov, Jamshid. (2018). Road Traffic Accident Data Analysis and Visualization in R. International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR). 8. 25-32. 10.24247/ijcseierdjun20184.
[14] Maya John, Hadil Shaiba. Apriori-Based Algorithm for Dubai Road Accident Analysis. Procedia Computer Science, Volume 163, 2019, Pages 218-227, ISSN 1877-0509. https://doi.org/10.1016/j.procs.2019.12.103 (https://www.sciencedirect.com/science/article/pii/S1877050919321428)
[15] Kumar, S., Toshniwal, D. A data mining approach to characterize road accident locations. J. Mod. Transport. 24, 62–72 (2016). https://doi.org/10.1007/s40534-016-0095-5
[16] https://en.wikipedia.org/wiki/Latent_class_model
[17] Jeroen K. Vermunt (Tilburg University), Jay Magidson (Statistical Innovations Inc). Latent Class Cluster Analysis. https://jeroenvermunt.nl/hagenaars2002b.pdf
[18] Olga Lezhnina, Gábor Kismihók. Latent Class Cluster Analysis: Selecting the number of clusters. MethodsX, Volume 9, 2022, 101747, ISSN 2215-0161. https://doi.org/10.1016/j.mex.2022.101747 (https://www.sciencedirect.com/science/article/pii/S2215016122001273)