
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 00 (2020) 000–000
www.elsevier.com/locate/procedia

The 5th International Conference on Computer Science and Computational Intelligence 2020.

Big Data and Analytics for Safer Transportation


Andrew Julisar*, Wenny Firstiary*, Jasson Adhiputra*, Erica Simson*, Dr. Zulfany Erlisa Rasjid, B.Sc., MMSI a
*School of Computer Science, Bina Nusantara University, Tangerang, Indonesia
*School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Abstract

Research in big data and analytics offers great opportunities to use evidence in decision making across many domains, one
of which is transportation. This paper presents the results of an analysis of a dataset of accident records from the USA.
We expect it to contribute to the government's decision-making process and to be useful for the public.

Keywords: Big data; Data science; Collision analytics

1. Introduction

Research in big data brings opportunities to apply it in various fields. One of them is transportation, where big data
can provide accurate information that is useful for people.1 Big data systems store, process and mine data efficiently
to produce information that enhances different services, especially in transportation. 2 The rapid advancement of
information and communication technology has brought a revolution in the domain of public transport. 3 Transportation
is a complex world. It is a blend of technologies, social practices, decisions of individual users and stochastic
events, embedded within a geographical, natural and economic setting. Consequently, the design of the transportation
system and its regional governance structure draws on expertise from engineering, geography, environmental sciences,
economics and the social sciences. 4 At present, a lot of equipment is installed on roads to monitor activity and
traffic, which is needed to better understand traffic flow in each region.5 One important task is to build an
effective traffic accident risk prediction model by collecting all available information about traffic. 4 This
information is obtained in real time and then analyzed to reveal meaningful traffic patterns. 6 Traffic patterns can
identify congestion and help in understanding accidents. Various analyses and approaches use big data and machine
learning to filter large amounts of traffic data and explore

1877-0509 © 2017 The Authors. Published by Elsevier B.V.


Peer-review under responsibility of the scientific committee of the 2nd International Conference on Computer Science and
Computational Intelligence 2017.
2 Author name / Procedia Computer Science 00 (2017) 000–000

useful knowledge that can be used to take preventive actions or make appropriate decisions. 7 Such big data can be
useful in identifying hazardous zones, a task that is complicated and unreliable today: without sufficient data, past
studies had to focus mostly on micro-level networks. 9 Future vehicle warning systems need a local (instead of global)
analysis of real-time information transmitted between vehicles and infrastructure, to provide local warnings matching
the instantaneous driving context. This information can also be shared with the authorities responsible for managing
and solving problems that occur in the area. Traffic data matches the characteristics of big data because the data
generated by traffic varies greatly and may be structured or unstructured. The equipment installed to monitor traffic
produces data, and the volume grows significantly when connected vehicles communicate and exchange information with
road infrastructure and other vehicles. 10 Connected vehicles produce 30 GB of data per day, so the resulting data
traffic is very large and consumes considerable bandwidth and cloud computing resources. The main objective of this
paper is to make the government's decision-making process easier through the graphs we produced from accident records
in the USA.
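The 30 GB per vehicle per day figure is easier to appreciate when multiplied out for a fleet. The sketch below does that arithmetic; the fleet sizes are hypothetical examples, and decimal units (1 TB = 1,000 GB) are assumed.

```python
# Per-vehicle figure quoted in the text; the fleet sizes used below are hypothetical.
GB_PER_VEHICLE_PER_DAY = 30


def fleet_volume_tb(vehicles: int, days: int = 1) -> float:
    """Total data volume in terabytes (decimal units: 1 TB = 1,000 GB)."""
    return vehicles * GB_PER_VEHICLE_PER_DAY * days / 1_000


for fleet in (1_000, 100_000):
    print(f"{fleet:>7} vehicles: {fleet_volume_tb(fleet):,.0f} TB/day, "
          f"{fleet_volume_tb(fleet, days=365):,.0f} TB/year")
```

At 100,000 vehicles this is already about 3 PB per day, which is why bandwidth and cloud computing cost become a concern.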

2. Related Works

This section focuses on big data transport projects, including projects that optimize taxi usage, and on big data
infrastructures and applications for transport data events. Transdec (Demiryurek et al. 2010) is a University of
California project that creates a big data infrastructure adapted to transportation. It is built on three tiers
comparable to the MVC (Model, View, Controller) model for transport data: the presentation tier, based on Google Maps,
provides an interface to create queries and display the results; the query interface provides standard queries for the
presentation tier; and the data tier is a spatiotemporal database built from sensor data and traffic data. This work
provides an interesting query system that takes into account the dynamic nature of city data and returns time-relevant
results in real time. Urban Insight (Artikis et al. 2013) is a European project studying European town planning. In
Dublin, the project works on event detection through big data, in particular an accident detection system using video
streams from CCTV (Closed Circuit Television) and crowdsourcing. Using data analysis, they detect anomalies in the
traffic and determine whether or not they are accidents; when there is ambiguity, they rely on crowdsourcing to get
further information. The RITA (Thompson et al. 2014) project in the United States is trying to identify new sources of
data provided by connected infrastructure and connected vehicles, proposing more data sources usable for transport
analysis. (Jian et al. 2008) propose a service-oriented model to handle the data heterogeneity of several Chinese towns.
Each town maintains its own data and a service that allows other towns to understand that data; these services are
aggregated to provide a global data sharing service. These papers propose methodologies to address data veracity and
integrate heterogeneous data into one query system. A promising line of work would be to produce predictions from this
data to build decision support systems.

(Jagadish et al. 2014) propose a big data infrastructure based on five steps: data acquisition, data cleaning and
information extraction, data integration and aggregation, big data analysis, and data interpretation. (Chen et al.
2014) use Hadoop-GIS to obtain data on demographic composition and health from spatial data. (Lin & Ryaboy 2013)
present their experience at Twitter extracting information from log data. They conclude that an efficient big data
infrastructure is a balance between speed of development, ease of analysis, flexibility and scalability. Proposing a
big data infrastructure on the cloud would make big data infrastructures more accessible to small businesses for
several reasons: little initial investment, ease of development through Service-Oriented Architecture (SOA), and the
use of services developed by specialists in each service.

(Yuan et al. 2013), (Ge et al. 2010) and (Lee et al. 2004) worked on transport projects to help taxi companies optimize
their taxi usage. They work on increasing the odds that a client needing a taxi meets an empty taxi and on optimizing
travel time from taxis to clients, based on historical data collected from running taxis. Using knowledge from
experienced taxi drivers, they built a map of the odds of passenger presence at collection points and direct taxis
based on that map. These works do not use real-time data, which makes it hard to make accurate predictions and react
to unexpected events. They also use data limited to GPS and taxi usage, whereas other data sources could be accessed
and used. The state of the art reveals a limited use of predictions from big data analytics in transportation-oriented
systems. The heavy storage and processing infrastructures needed for big data, together with currently available
data-oriented cloud services, make continuous access to and processing of real-time events possible, providing
constant awareness and enabling big-data-based decision support systems that can help take immediate, informed actions.
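The taxi studies above build a map of the odds of finding a passenger at each collection point from historical data. A minimal sketch of that idea follows, assuming a hypothetical log of (collection point, hour, passenger found) observations: it estimates the empirical probability per point and hour and recommends the best point. It illustrates the general idea only and is not the cited authors' algorithm.

```python
from collections import defaultdict


def passenger_odds(log):
    """Empirical P(passenger present) per (point, hour) from historical visits.

    `log` is an iterable of (point, hour, found) tuples, where `found` is True
    when a waiting passenger was picked up at that point during that hour.
    """
    visits = defaultdict(int)
    pickups = defaultdict(int)
    for point, hour, found in log:
        visits[(point, hour)] += 1
        pickups[(point, hour)] += int(found)
    return {key: pickups[key] / visits[key] for key in visits}


def best_point(odds, hour):
    """Collection point with the highest estimated odds at the given hour, or None."""
    candidates = {p: o for (p, h), o in odds.items() if h == hour}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical historical observations.
history = [
    ("station", 8, True), ("station", 8, True), ("station", 8, False),
    ("mall", 8, True), ("mall", 8, False), ("mall", 8, False),
    ("mall", 18, True), ("mall", 18, True), ("station", 18, False),
]
odds = passenger_odds(history)
print(best_point(odds, 8))   # station (2/3 vs 1/3)
print(best_point(odds, 18))  # mall (2/2 vs 0/1)
```

As the text notes, a map built only from historical data cannot react to unexpected events; updating the counts with real-time observations would be the natural extension.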

2.1 Big Data Architectures

11 In the context of transportation, big data platforms such as those discussed in this paper can be used to
implement an Intelligent Transportation System (ITS). 12 ITS have been developed since the early 1970s, which marked
the beginning of the sophistication of the transportation sector. 13 ITS already improve transportation in many ways,
for example by reducing accidents, supporting traffic control and reducing infrastructure damage. 14 Rapid and dynamic
modelling with big data will provide better simulation capabilities for ITS. 11 An ITS is a transportation management
system that integrates technologies such as information, sensor and electronic systems with transportation
infrastructure, making the system more efficient, safer and more environmentally friendly. For example, the Big Data
Architecture of Smart Cities, which can be used as an ITS, has seven main layers: Data Source, Data Normalization, Data
Intermediary, Data Storage, Data Analysis, Data Visualization and Decision. These layers can be broadly grouped into
data collection and preparation on the one hand, and data analysis and utilization to support decision making on the
other. 15 Data can be collected from various sources, such as mobile phone call detail records, smart card data and
geocoded social media records.16 All the collected data contains valuable information that is very useful for
governments for analysis and monitoring. 11 The data source and data normalization layers collect data from various
sources, prepare it and load it into a database. The intermediary (brokering) layer handles this plurality of data,
deals with the variety of formats, sources and update frequencies, and combines and integrates the right data. The
data storage layer manages integrated data storage and retrieval to support data analytics, while the data analysis
layer extracts useful knowledge and meaningful patterns from the data. The data visualization layer presents the
results of the analysis in graphical form to help users in the decision-making process. 17 Big data can also be
collected and processed to improve the functionality of transit systems in developing countries.
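The layers described above can be sketched as a chain of small functions. Assumptions: two hypothetical sources report accidents in different formats (a CSV feed and JSON-like records with different field names). The normalization and intermediary layers map both onto one schema, the storage layer is an in-memory list standing in for a database, and the analysis layer counts accidents per state, which is what the visualization and decision layers would consume.

```python
import csv
import io
from collections import Counter

# Data source layer: two hypothetical feeds with different formats and field names.
CSV_FEED = "state,severity\nCA,2\nTX,3\nCA,4\n"
JSON_FEED = [{"region": "tx", "level": "2"}, {"region": "ca", "level": "1"}]


def normalize_csv(text):
    """Normalization layer: parse CSV rows into the common schema."""
    return [{"state": r["state"].upper(), "severity": int(r["severity"])}
            for r in csv.DictReader(io.StringIO(text))]


def normalize_json(records):
    """Normalization layer: rename fields and fix types for the JSON feed."""
    return [{"state": r["region"].upper(), "severity": int(r["level"])}
            for r in records]


def integrate(*batches):
    """Intermediary layer: combine already-normalized batches into one stream."""
    return [row for batch in batches for row in batch]


STORAGE = []  # Data storage layer (stand-in for a real database).


def analyze(rows):
    """Data analysis layer: accident count per state."""
    return Counter(row["state"] for row in rows)


STORAGE.extend(integrate(normalize_csv(CSV_FEED), normalize_json(JSON_FEED)))
per_state = analyze(STORAGE)
print(per_state.most_common())  # [('CA', 3), ('TX', 2)]
```

Keeping normalization per source means a new feed only needs its own normalizer; the storage and analysis layers stay unchanged.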

2.2 Big Data Utilizations

• Query analysis
By running analytical queries on a dataset, we can produce graphs that summarize it.
• Real-time Traffic Information
11 Having relevant and valid information allows people to make decisions. Traffic jams caused by collisions,
roadworks or bad weather can be anticipated and avoided by drivers, who can find alternative routes to their
destination. Providing real-time information is very useful here, and such data can be distributed through various
communication devices. An ITS can add real-time traffic information together with predictive analysis of traffic
conditions. 18 Real-time methods are very useful for ITS because they let the system run efficiently, reducing both
traffic accidents and traffic jams.
• Analysis of Near Misses and Collisions
11 Analysis of near misses and collisions can improve the safety of the transportation system. Visualizing the
location and type of collisions can help drivers identify accident-prone areas and the causes of accidents. Data
attributes such as the frequency and causes of collisions and the types of vehicles involved can be visualized on a
map to help transport authorities assess whether changes to traffic safety policies or infrastructure are needed to
prevent unwanted events. Video analysis can also reveal the mistakes that drivers commonly make that lead to
accidents.
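One simple way to turn collision records into the accident-prone areas described above is to bin their coordinates into a grid and rank the cells by count, a common first step before plotting them on a map. The sketch below assumes a 0.01-degree cell size and uses hypothetical coordinates; a production system would more likely use a proper spatial index or projected coordinates.

```python
import math
from collections import Counter


def grid_cell(lat, lng, cell_deg=0.01):
    """Index of the grid cell (in units of `cell_deg` degrees) containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def hotspots(points, cell_deg=0.01, top=3):
    """Rank grid cells by number of collisions; return (cell_center, count) pairs."""
    counts = Counter(grid_cell(lat, lng, cell_deg) for lat, lng in points)
    return [(((i + 0.5) * cell_deg, (j + 0.5) * cell_deg), n)
            for (i, j), n in counts.most_common(top)]


# Hypothetical collision coordinates (latitude, longitude).
collisions = [
    (29.7604, -95.3698), (29.7611, -95.3672), (29.7655, -95.3610),
    (34.0522, -118.2437), (34.0525, -118.2431),
    (35.2271, -80.8431),
]
for (lat, lng), n in hotspots(collisions):
    print(f"cell centred at ({lat:.3f}, {lng:.3f}): {n} collisions")
```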

2.3 Proposed Big Data Analytics Architecture

• Architecture for Real-Time Traffic Control


4 When designing the architecture, several aspects serve as measures for real-time data processing and decision
making. In a transport system, data consumers issue various types of queries. Therefore, when developing a platform
for data analytics, we should take the variability of the data into account and consider the following:

1. Predictive information: based on information collected every day, a pattern is formed to predict the number of
vehicles per day or even per hour. This predictive information can be used in making traffic decisions at any time.
Real-time information can also be used to estimate the number of vehicles that will arrive at a location by detecting
signals from the preceding locations that lead to it.
2. Analytics: events are analyzed over time and turned into important information, such as the cause of accidents in
an area or the cause of congestion at certain hours and how to overcome it, which helps in making decisions for city
development.10 Analyzing geo-location, time and velocity data makes it possible to identify problematic areas and
abnormal traffic behavior, and thus to map problematic roads and locations throughout the area.
3. Security: because of security and accessibility issues, not all sources are available for all purposes. For
example, a signal control system may rely only on data obtained from loop detectors, while route recommendations might
use several sources, including crowd-sourced data. In addition, personal data will not be accessed or used if it is
not needed.
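The predictive-information idea in item 1 can be illustrated with the simplest possible pattern model: an hour-of-day profile that predicts the vehicle count for an hour as the average of past counts observed at that hour. The historical counts below are hypothetical, and a real system would likely add further factors such as day of week or weather.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(history):
    """Average vehicle count per hour of day from (hour, count) observations."""
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in by_hour.items()}


def predict(profile, hour, default=0.0):
    """Predicted count for `hour`; falls back to `default` for hours never observed."""
    return profile.get(hour, default)


# Hypothetical counts from a loop detector on three previous days.
history = [(8, 1200), (8, 1100), (8, 1300), (13, 600), (13, 640), (13, 620)]
profile = hourly_profile(history)
print(predict(profile, 8))
print(predict(profile, 13))
```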
4 An architecture for traffic control that relies on Big Data analytics has a number of requirements:

1. Support analysis of data in streaming mode to achieve low latencies, and analysis of historical data in batch
mode.
2. Provide an easy way to specify a data analytics query and its triggering policy.
3. Provide an easy way to plug in the analysis of different data sources, even as they become available.
4. Provide intuitive mechanisms for considering multiple data sources in answering a single query.
5. Provide an easy way to plug in advanced data analysis algorithms.
Considering the safety-critical nature of traffic control, the architecture should also be resilient and always on.
In particular, it should be able to accommodate:
6. A large number of data sources and consumers, scaling linearly with these numbers.
7. Faults (hardware failures and network disconnections), by operating continuously without loss of data.
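Requirement 1 distinguishes streaming analysis (low latency over recent events) from batch analysis of historical data. A minimal sketch of the difference follows, assuming events are accident reports timestamped in seconds and arriving in order: the streaming side keeps only a sliding window in memory and answers immediately, while the batch side scans the full stored history. The class and the 60-second window are illustrative, not part of any particular framework.

```python
from collections import deque


class SlidingWindowCounter:
    """Streaming mode: count events whose timestamp is in the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = deque()

    def add(self, ts):
        self.events.append(ts)  # assumes events arrive in timestamp order

    def count(self, now):
        # Evict events that have left the window (now - window, now], then count the rest.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)


def batch_count(history, start, end):
    """Batch mode: scan the full stored history for events in [start, end)."""
    return sum(1 for ts in history if start <= ts < end)


history = [0, 30, 65, 90, 95, 400]
stream = SlidingWindowCounter(window=60)
for ts in history[:5]:
    stream.add(ts)
print(stream.count(now=100))         # 3 events in (40, 100]
print(batch_count(history, 0, 100))  # 5 events in [0, 100)
```

The streaming counter uses memory proportional to the window, not to the history, which is what makes low-latency answers feasible at scale.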

Several challenges hinder access to and use of big data and must be considered:
1. 18 Ownership: because of the commercial sensitivity of shared data, operators may not be willing to share it.
2. Legal constraints on privacy: big data projects can capture large amounts of personal data.
3. 19 Volume: the amount of data being collected is increasing rapidly.
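For the privacy constraint in item 2, and the earlier rule that personal data is not used unless needed, one common mitigation is to drop fields an analysis does not need and to replace direct identifiers with keyed hashes. A minimal sketch using Python's standard `hmac` and `hashlib` modules follows; the field names, the analysis schema and the secret key are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"   # hypothetical; keep out of source control
NEEDED_FIELDS = {"plate", "state", "severity"}  # hypothetical analysis schema


def pseudonymize(record, key=SECRET_KEY):
    """Keep only needed fields and replace the license plate with a keyed hash."""
    kept = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    if "plate" in kept:
        digest = hmac.new(key, kept["plate"].encode(), hashlib.sha256).hexdigest()
        kept["plate"] = digest[:16]  # stable pseudonym; linking it back requires the key
    return kept


raw = {"plate": "ABC1234", "driver_name": "Jane Doe", "phone": "555-0100",
       "state": "CA", "severity": 2}
print(pseudonymize(raw))
```

Because the hash is deterministic for a given key, records of the same vehicle can still be linked across datasets without exposing the plate itself.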

• Architecture for Analysis of Near Misses and Collisions


4 Factors that can help transportation planners incorporate safety into transport planning include crashes by injury
level (total, fatal and injury), non-motorized crashes (pedestrian and bicycle) and collision types (rear-end, angle
and sideswipe). Alternative crash risk measures (crashes per square mile, crashes per mile and crashes per MVMT, i.e.
per million vehicle miles traveled) were also examined.
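The three alternative crash risk measures normalize crash counts by area, road length and traffic exposure. For the exposure-based rate, this sketch assumes the usual form in which vehicle miles traveled are computed from average annual daily traffic (AADT), segment length and the number of days in the study period; the segment values are hypothetical.

```python
def crashes_per_square_mile(crashes, area_sq_mi):
    """Crash density over an area."""
    return crashes / area_sq_mi


def crashes_per_mile(crashes, road_miles):
    """Crash density along a road."""
    return crashes / road_miles


def crashes_per_mvmt(crashes, aadt, length_mi, years=1.0):
    """Crashes per million vehicle miles traveled (MVMT).

    Assumes VMT = AADT x segment length (miles) x 365 days x years.
    """
    vmt = aadt * length_mi * 365 * years
    return crashes * 1_000_000 / vmt


# Hypothetical segment: 2.5 miles, 20,000 vehicles/day, 3 years, 55 crashes.
rate = crashes_per_mvmt(55, aadt=20_000, length_mi=2.5, years=3)
print(f"{rate:.2f} crashes per MVMT")  # 1.00 crashes per MVMT
```

A busy segment can have many crashes simply because it carries a lot of traffic; the MVMT rate corrects for that, whereas crashes per mile does not.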

3. Method

We collected accident data for the US from a dataset downloaded from the Kaggle website, which provides about 3.0
million accident records. The CSV file covers 49 states of the United States, with records from 2015 to 2020,
accompanied by several factors that may contribute to accidents, such as temperature and weather. As reported by
asirt.org, around 38,000 people die in traffic accidents in the U.S. every year, and a further 4.4 million are injured
seriously enough to require medical attention. The economic and societal impact of road crashes costs U.S. citizens
$871 billion, and road crashes cost the U.S. more than $380 million in direct medical costs. We applied the Big Data
knowledge we learned during our fourth semester, using Jupyter notebooks and Python packages such as PySpark and
Pandas to produce data visualizations in graphical form. To produce the graphs from the CSV, we ran a series of
analytical queries to retrieve the data to be presented. We also conducted small experiments analyzing records related
to factors that support or inhibit the occurrence of accidents in the US. The table below describes the features of the
CSV that we use. The dataset has 49 features; features that hold Boolean values are not described in the table.

Feature Description

Attribute            Description
ID                   Unique identifier of the accident record
Source               Source of the accident report
TMC                  Traffic Message Channel (TMC) code, if the accident report has one
Severity             Severity of the accident (1-4)
Start_Time           Start time of the accident
End_Time             End time of the accident
Start_Lat            Latitude in GPS coordinates (start point)
Start_Lng            Longitude in GPS coordinates (start point)
End_Lat              Latitude in GPS coordinates (end point)
End_Lng              Longitude in GPS coordinates (end point)
Distance(mi)         Length of the road extent affected by the accident, in miles
Description          Description of the accident
Number               Street number in the address field
Street               Street name
Side                 Side of the street (right or left)
City                 City where the accident occurred
Country              Country where the accident occurred
State                State
Zipcode              Zip code
County               County
Timezone             Time zone based on the location of the accident
Weather_Timestamp    Timestamp of the weather observation record
Temperature          Temperature
Wind_Chill           Wind chill
Humidity             Humidity
Pressure             Air pressure
Visibility           Visibility
Wind_Direction       Wind direction
Wind_Speed           Wind speed
Precipitation        Precipitation
Weather_Condition    Weather condition (e.g. Clear)
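Each countplot that follows is the result of a group-and-count query over one column. A minimal, dependency-free sketch of such a query using Python's `csv` module is shown below on a tiny hypothetical sample in the dataset's column layout (the real file has about 3 million rows). With Pandas the equivalent is `df["City"].value_counts()`, and with PySpark `df.groupBy("City").count().orderBy("count", ascending=False)`.

```python
import csv
import io
from collections import Counter


def count_by(csv_file, column, top=None):
    """Group-and-count query: number of accident records per value of `column`."""
    counts = Counter(row[column] for row in csv.DictReader(csv_file) if row[column])
    return counts.most_common(top)


# Tiny hypothetical sample using a subset of the dataset's columns.
SAMPLE = """ID,Severity,City,State,Traffic_Signal
A-1,2,Houston,TX,False
A-2,3,Charlotte,NC,True
A-3,2,Houston,TX,False
A-4,4,Los Angeles,CA,False
"""

print(count_by(io.StringIO(SAMPLE), "City"))
print(count_by(io.StringIO(SAMPLE), "Traffic_Signal"))
```

For the real file, the same function can be called on a handle opened with `open(path, newline="")`, passing `top=10` to keep only the most frequent values for plotting.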

3.1. Graph Results

3.1.1. Graphs

This section presents the data on accidents that occurred in the US, together with graphs of features that appear to
contribute to accidents and features that do not.

Fig. 1. City countplot
Fig. 2. State countplot
Fig. 3. Weather condition countplot
Fig. 4. Bump countplot
Fig. 5. Wind direction countplot
Fig. 6. Crossing countplot
Fig. 7. Traffic signal countplot
Fig. 8. Side countplot
Fig. 9. Sunrise/sunset countplot
Fig. 10. TMC countplot

4. Results and Discussion

The graphs above show which factors appear to be associated with accidents and which do not. The accident counts are
dominated by the city of Houston (93,289 records), followed by Charlotte and Los Angeles, and by the state of
California (CA). The Bump feature does not appear to have any noticeable association with the occurrence of accidents,
and neither does the Crossing feature: only about 207,590 records have a true Crossing value, while the rest are false.
Most accidents occurred on the right side of the street. In the TMC chart, the most frequent code is TMC 201. For
weather conditions, the largest count is for Clear weather (808,171 records), which suggests that adverse weather is
not the dominant factor in the recorded accidents; however, clear weather is likely also the most common driving
condition, so this count alone does not measure risk. Similarly, the most frequent wind direction is Calm. Traffic
signals are present in 503,383 accident records; although false values still dominate, this number is large enough
that we recommend the government review traffic signals.
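A raw count cannot by itself separate a risky condition from a common one: Clear weather may top the chart simply because most driving happens in clear weather. The sketch below shows the normalization that would be needed, assuming a hypothetical exposure measure (hours during which each weather condition was observed at the same locations) that the dataset does not provide. All numbers are made up for illustration.

```python
def rate_per_hour(accidents, exposure_hours):
    """Accidents per hour of exposure, for conditions with nonzero exposure."""
    return {c: accidents[c] / exposure_hours[c]
            for c in accidents if exposure_hours.get(c)}


# Hypothetical figures for one region: accident counts and hours each condition was observed.
accidents = {"Clear": 800, "Rain": 150, "Snow": 40}
exposure_hours = {"Clear": 6000, "Rain": 600, "Snow": 100}

rates = rate_per_hour(accidents, exposure_hours)
for condition, r in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{condition:<6} {r:.3f} accidents/hour")
```

In this made-up example Clear has by far the most accidents but the lowest rate, the opposite of what the raw count suggests.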

Because more complete information about the dataset is lacking, we could not conduct further research and query
analysis on the features it provides. Another obstacle was our limited experience with data science for deeper
analysis and with graphical modeling. These results should therefore be interpreted with caution: they may be affected
by errors in our analysis or by the accuracy of the dataset itself. In closing, we affirm how vital big data is for the
transportation domain. We have shown several examples of using big data to analyze accidents, and the results can
serve as a comparative study that we hope will support the development of better urban planning. We welcome reviews
and discussion from readers.

5. Conclusions

11 In this era, big data plays an important role in making technology more sophisticated, and it has helped many
fields such as government, business, education, healthcare, financial management, smart farming and, of course,
transportation. 20 Transportation infrastructure has also become a critical component of a nation's economy, security
and wellbeing. 8 Traffic covers a broad variety of components, including private cars, trucks and public
transportation, and its main problems are accidents and traffic jams.11 To reduce them, big data has been widely
applied in the transportation domain, specifically to traffic. In traffic, big data can support real-time traffic
control, which helps people choose other roads when there are traffic jams, bad weather or roadworks. In addition,
people can choose other routes to avoid accident-prone areas. Traffic development is also related to urban planning:
changes in traffic flow can be reported so that city planning and building construction can be more accurate.
Developing real-time traffic control requires several kinds of data, such as predictive information, analytics,
queries related to urban planning, and security.

References

[1] Hashem, I. A. T., Chang, V., Anuar, N. B., Adewole, K., Yaqoob, I., Gani, A., ... & Chiroma, H. (2016). The role of big data in smart city.
International Journal of Information Management, 36(5), 748-758.
[2] Zannat, K., & Choudhury, C. F. (2019). Emerging Big Data Sources for Public Transport Planning: A Systematic Review on Current State of Art
and Future Research Directions. Journal of the Indian Institute of Science, 99, 601–619.
[3] Gennaro, M. de, Paffumi, E., & Martini, G. (2016). Big Data for Supporting Low-Carbon Road Transport Policies in Europe: Applications,
Challenges and Opportunities. Big Data Research, 6, 11–25. doi.org/10.1016/j.bdr.2016.04.003
[4] Guerreiro, G., Figueiras, P., Silva, R., Costa, R., & Jardim-Goncalves, R. (2016, September). An architecture for big data processing on
intelligent transportation systems. An application scenario on highway traffic flows. In 2016 IEEE 8th International Conference on Intelligent
Systems (IS) (pp. 65-72). IEEE.
[5] Ren, H., Song, Y., Wang, J., Hu, Y., & Lei, J. (2018). A Deep Learning Approach to the Citywide Traffic Accident Risk Prediction. 2018 21st
International Conference on Intelligent Transportation Systems (ITSC). doi:10.1109/itsc.2018.8569437
[6] Kemp, G., Vargas-Solar, G., Silva, C. F. D., Ghodous, P., & Collet, C. (2015). Aggregating and managing big realtime data in the cloud:
Application to intelligent transport for smart cities. In Proceedings of the 1st International Conference on Vehicle Technology and Intelligent
Transport Systems (pp. 107–112).
[7] Jiang, X., Abdel-Aty, M., Hu, J., & Lee, J. (2016). Investigating macro-level hotzone identification and variable importance using big data: A
random forest models approach. Neurocomputing, 181, 53–63. doi.org/10.1016/j.neucom.2015.08.097
[8] Xiong, G., Zhu, F., Dong, X., Fan, H., Hu, B., Kong, Q., … Teng, T. (2016). A Kind of Novel ITS Based on Space-Air-Ground Big-Data.
IEEE Intelligent Transportation Systems Magazine, 8(1), 10–22. doi:10.1109/mits.2015.2503200
[9] Liu, J., Wang, X., Khattak, A. J., Hu, J., Cui, J. X., & Ma, J. (2016). How big data serves for freight safety management at highway-rail grade
crossings? A spatial approach fused with path analysis. Neurocomputing, 181, 38–52. doi.org/10.1016/j.neucom.2015.08.098
[10] UN Global Pulse. (2017). Using Big Data Analytics for Improved Public Transport. Project Series, no. 25.
[11] Neilson, A., Indratmo, Daniel, B., & Tjandra, S. (2019). Systematic Review of the Literature on Big Data in the Transportation Domain:
Concepts and Applications. Big Data Research, 10(1), 35–44.
[12] Zhu, L., Yu, F. R., Wang, Y., Ning, B., & Tang, T. (2018). Big Data Analytics in Intelligent Transportation Systems: A Survey. IEEE
Transactions on Intelligent Transportation Systems, 1–16. doi:10.1109/tits.2018.2815678
[13] Dinh, T. P. (2018). Smart Transportation: The Role of Big Data and Internet of Things. Logistics and Supply Chain Management.
[14] Zeyu, J., Shuiping, Y., Mingduan, Z., Yongqiang, C., & Yi, L. (2017). Model Study for Intelligent Transportation System with Big Data.
Procedia Computer Science, 107, 418–426. doi:10.1016/j.procs.2017.03.132
[15] Anda, C., Erath, A., & Fourie, P. J. (2017). Transport modelling in the age of big data. International Journal of Urban Sciences, 21(sup1),
19-42.
[16] Ben Ayed, A., Ben Halima, M., & Alimi, A. M. (2015). Big data analytics for logistics and transportation. 2015 4th International
Conference on Advanced Logistics and Transport (ICALT). doi:10.1109/icadlt.2015.7136630
[17] Lantz, K., Khan, S., Ngo, L. B., Chowdhury, M., Donaher, S., & Apon, A. (2015). Potentials of Online Media and Location-Based Big Data
for Urban Transit Networks in Developing Countries. Transportation Research Record: Journal of the Transportation Research Board, 2537,
52–61.
[18] Chaolong, J., Hanning, W., & Lili, W. (2016). Research on Visualization of Multi-Dimensional Real-Time Traffic Data.
[19] Giest, S. (2017). Big data analytics for mitigating carbon emissions in smart cities: Opportunities and challenges. European Planning
Studies, 25(6), 941–957. doi.org/10.1080/09654313.2017.1294149
[20] Costin, A., Adibfar, A., Hu, H., & Chen, S. S. (2018). Building Information Modeling (BIM) for transportation infrastructure–Literature
review, applications, challenges, and recommendations. Automation in Construction, 94, 257-281.
