
Big Data Analytics in Intelligent Transportation Systems: A Survey

This paper gives a brief survey of how transportation can be improved with machine learning
algorithms such as SVM, Linear Regression, Deep Learning, Decision Tree, and many more.
Nowadays roadside sensors or built-in vehicle sensors sense the current traffic situation and
report the sensed data to centralized servers. The servers process that data and then apply
machine learning algorithms to the processed data to predict whether the traffic condition is
normal or congested. The main problem in an ITS is that the various sensors generate a huge
amount of data, which traditional machine learning algorithms may not be able to process and
predict from. To overcome this problem, the author suggests using distributed machine learning
technologies such as SPARK or HADOOP, because these technologies support big data processing.

In the proposed work I use the SPARK processing package ‘pyspark’ from Python to train
distributed machine learning algorithms such as Linear Regression and Decision Tree. To implement
this project I downloaded road traffic data from the website below.

The dataset contains the weather condition, temperature, and traffic status obtained from sensors.
We use this data to train the SPARK machine learning algorithms. A trained algorithm can then take
the current weather and temperature from a sensor and predict the traffic status as a value from
1 to 5. A predicted value of 1 means the traffic is normal, a value of 2, 3, or 4 means congested
traffic, and a value of 5 means heavily congested traffic. The last column of the dataset holds the
class label, which is the traffic condition from 1 to 5. The weather codes are stored as integers,
and the names of the weather codes can be seen in the screen below.
In the above screen, weather code 0 means tornado.
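
The mapping from a predicted value to a traffic status described above can be sketched as a small Python helper. The function name `describe_traffic` is a hypothetical name chosen for illustration and is not taken from the project code.

```python
def describe_traffic(level: int) -> str:
    """Map a predicted class label (1 to 5) to the traffic status described in the text."""
    if level == 1:
        return "normal"
    if level in (2, 3, 4):
        return "congested"
    if level == 5:
        return "heavily congested"
    raise ValueError(f"traffic level must be between 1 and 5, got {level}")


# Print the status for every valid class label.
for level in range(1, 6):
    print(level, describe_traffic(level))
```

Rejecting values outside 1 to 5 is a design choice of this sketch, so that an unexpected model output is noticed rather than silently labelled.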

The base paper describes various fields where sensor data accumulates and where this Spark
processing can be applied, such as monitoring a power grid to predict power supply, monitoring
patients with sensors to predict their health condition, and many more.

The paper describes many algorithms, and we can implement whichever one gives the best
performance. However, the paper does not describe any feature selection algorithm that removes
dummy or garbage values from the dataset and keeps only the important attributes, which helps
to improve prediction accuracy. So, as extension work, we added PCA (principal component
analysis) as a feature selection step that keeps only the important information in the dataset
before the ML model is built.
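
To make the PCA step more concrete, here is a minimal pure-Python sketch of what PCA computes: the direction along which the centered data varies most, found here by power iteration on the covariance matrix. This is not the Spark implementation and not the project code; the function names and the toy rows are assumptions for illustration. Strictly speaking, PCA builds new combined attributes rather than picking a subset of the original columns.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of greatest variance) for a list of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered attributes (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize
    # until the vector settles on the dominant eigenvector.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # constant data or an unlucky start; a library routine handles this properly
        vec = [x / norm for x in nxt]
    return means, vec


def project(row, means, component):
    """Reduce one row to a single number: its coordinate along the component."""
    return sum((row[j] - means[j]) * component[j] for j in range(len(row)))


# Hypothetical toy rows: the second attribute is exactly twice the first,
# so a single component carries all of the variation.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, component = first_principal_component(rows)
reduced = [project(row, means, component) for row in rows]
print(component, reduced)
```

In a real project a library routine would normally be used instead, since it also returns further components and handles degenerate inputs.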

The screen below shows the code, with comments, that uses Spark to build the traffic monitoring ML model.
The above screen loads all the Spark packages.

In the above screen, read the comments to see how we train SPARK ML to build the traffic prediction model.

SCREEN SHOTS

To run the project, double-click the ‘run.bat’ file to get the screen below.


In the above screen, click the ‘Upload Road Traffic Dataset’ button to load the traffic dataset and get the screen below.

In the above screen, select the ‘TrafficDataset.csv’ file and click the ‘Open’ button to
load the dataset and get the screen below.
In the above screen the dataset is loaded. Now click the ‘Initialize Spark Session’ button to create
the Spark object, initialize it, and start reading the dataset.

In the above screen the Spark object is created, and the screen below shows the data read by Spark.
In the above screen Spark displays a few rows of the dataset, and the column type of the dataset is
double. We normalize the data to values between 0 and 1. The dataset is now ready, so click the
‘Run Linear Regression Algorithm’ button to apply the Linear Regression algorithm to the dataset
and calculate its accuracy.
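
The normalization to values between 0 and 1 mentioned above is commonly done by min-max scaling. The sketch below shows the arithmetic on a single column in plain Python. It assumes min-max scaling is what the project uses, which the text does not state, and the temperature values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a column linearly so the smallest value becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: there is no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical temperature readings, for illustration only.
temperatures = [10.0, 20.0, 30.0]
print(min_max_normalize(temperatures))
```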

In the above screen the SPARK Linear Regression accuracy is 0.20%. The accuracy is this low
because no feature selection algorithm was applied. Now click the ‘Run Decision Tree Algorithm’
button to apply the decision tree to the dataset and calculate its accuracy.
In the above screen the decision tree accuracy is 0.2028%. Now click ‘Extension Decision Tree with
PCA Features Selection’ to filter the dataset using PCA, apply the decision tree to it, and
calculate its prediction accuracy.
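
The text does not say how accuracy is computed. One plausible reading, sketched below under that assumption, is the fraction of predictions that match the true class label, with the continuous output of Linear Regression rounded to the nearest class from 1 to 5 first. The function names and sample values are hypothetical.

```python
import math


def to_class(value, lowest=1, highest=5):
    """Round a continuous prediction to the nearest class label, clamped to [lowest, highest]."""
    return max(lowest, min(highest, math.floor(value + 0.5)))


def accuracy(predicted, actual):
    """Fraction of predictions that equal the true class label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical regression outputs and true labels, for illustration only.
raw = [1.2, 2.6, 4.9, 7.3]
truth = [1, 3, 4, 5]
print(accuracy([to_class(v) for v in raw], truth))
```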

In the above screen, the decision tree with PCA reaches a much higher accuracy of 0.90%, which is
better than the algorithms proposed in the paper. Now click the ‘Accuracy Comparison Graph’ button
to get the graph below.
In the above graph the x-axis represents the algorithm name and the y-axis represents the accuracy
of each algorithm. From the graph we can conclude that the extension work yields higher accuracy
than the other algorithms.
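
Since the graph itself is not reproduced here, the comparison can be approximated with a dependency-free text bar chart. The sketch below takes the accuracy figures quoted in the walkthrough at face value. The function name `render_bars` and the layout are assumptions, and the project itself presumably draws the graph with a plotting library.

```python
def render_bars(scores, width=40, mark="#"):
    """Return one text line per algorithm, with bar length proportional to its score."""
    top = max(scores.values())
    label_width = max(len(name) for name in scores)
    lines = []
    for name, score in scores.items():
        bar = mark * round(score / top * width) if top > 0 else ""
        lines.append(f"{name.ljust(label_width)} | {bar} {score}")
    return lines


# Accuracy figures as quoted in the walkthrough, taken at face value.
scores = {
    "Linear Regression": 0.20,
    "Decision Tree": 0.2028,
    "Decision Tree with PCA": 0.90,
}
for line in render_bars(scores):
    print(line)
```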

To run this project, install the package below:

pip install pyspark==3.0.1
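
After installing, it can be useful to confirm which version is actually present. The following is a small sketch that uses only the Python standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata

PINNED = "3.0.1"  # version named in the install command


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("pyspark")
if found is None:
    print("pyspark is not installed")
elif found != PINNED:
    print(f"pyspark {found} is installed, but the project pins {PINNED}")
else:
    print(f"pyspark {PINNED} is installed")
```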

Modules:
Upload Road Traffic Dataset

Initialize Spark Session

Run Linear Regression Algorithm

Run Decision Tree Algorithm

Extension Decision Tree with PCA Features Selection

Accuracy Comparison Graph

DATA FLOW DIAGRAM:


1. The DFD is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the
system, the various processing carried out on this data, and the output data
generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling
tools. It is used to model the system components. These components
are the system process, the data used by the process, the external entities
that interact with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that
depicts information flow and the transformations that are applied as
data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction.
A DFD may be partitioned into levels that represent increasing
information flow and functional detail.
[Figure: data flow diagram. Node labels: User; Check (Yes / No); Unauthorized user;
Upload Road Traffic Dataset; Initialize Spark Session; Run Linear Regression Algorithm;
Run Decision Tree Algorithm; Extension Decision Tree with PCA Features Selection;
Accuracy Comparison Graph; Exit; End process]

UML:

Introduction to UML:

The Unified Modeling Language (UML) is a standard language for specifying,
visualizing, constructing, and documenting the artifacts of software systems, as well as for
business modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems. The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects. Using the UML helps project teams communicate, explore
potential designs, and validate the architectural design of the software.

Goals of UML

The primary goals in the design of the UML were:

• Provide users with a ready-to-use, expressive visual modeling language so they
can develop and exchange meaningful models.
• Provide extensibility and specialization mechanisms to extend the core concepts.
• Be independent of particular programming languages and development processes.
• Provide a formal basis for understanding the modeling language.
• Encourage the growth of the OO tools market.
• Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
• Integrate best practices.

Why we use UML?

As the strategic value of software increases for many companies, the industry looks
for techniques to automate the production of software and to improve quality and reduce
cost and time-to-market. These techniques include component technology, visual
programming, patterns and frameworks. Businesses also seek techniques to manage the
complexity of systems as they increase in scope and scale. In particular, they recognize the
need to solve recurring architectural problems, such as physical distribution, concurrency,
replication, security, load balancing and fault tolerance. Additionally, the development for
the World Wide Web, while making some things simpler, has exacerbated these
architectural problems. The Unified Modeling Language (UML) was designed to respond
to these needs.

4.1 UML Diagram

The underlying premise of UML is that no one diagram can capture the different
elements of a system in its entirety. Hence, UML is made up of nine diagrams that can be
used to model a system at different points of time in the software life cycle of a system.

The nine UML diagrams are:

Use case diagram:

The use case diagram is used to identify the primary elements and processes that form
the system. The primary elements are termed as "actors" and the processes are called "use
cases." The use case diagram shows which actors interact with each use case.
Class diagram:

The class diagram is used to refine the use case diagram and define a detailed design
of the system. The class diagram classifies the actors defined in the use case diagram into a
set of interrelated classes. The relationship or association between the classes can be either an
"is-a" or "has-a" relationship. Each class in the class diagram may be capable of providing
certain functionalities. These functionalities provided by the class are termed "methods" of
the class. Apart from this, each class may have certain "attributes" that uniquely identify the
class.
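
As a rough illustration of the two relationship kinds named above, the Python sketch below shows an "is-a" relationship as inheritance and a "has-a" relationship as composition. All class names are hypothetical examples chosen to fit the traffic domain. They are not classes from the project.

```python
class Sensor:
    """Base class: anything that reports a reading."""

    def __init__(self, sensor_id):
        self.sensor_id = sensor_id  # attribute

    def read(self):  # method that subclasses are expected to override
        raise NotImplementedError


class TemperatureSensor(Sensor):
    """'is-a': a TemperatureSensor is a Sensor (inheritance)."""

    def __init__(self, sensor_id, celsius):
        super().__init__(sensor_id)
        self.celsius = celsius

    def read(self):
        return self.celsius


class RoadStation:
    """'has-a': a RoadStation has Sensors (composition)."""

    def __init__(self, name):
        self.name = name
        self.sensors = []

    def add(self, sensor):
        self.sensors.append(sensor)

    def readings(self):
        return {s.sensor_id: s.read() for s in self.sensors}


station = RoadStation("junction-1")
station.add(TemperatureSensor("t1", 21.5))
print(station.readings())
```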
Object diagram:

The object diagram is a special kind of class diagram. An object is an instance of a
class. This essentially means that an object represents the state of a class at a given point of
time while the system is running. The object diagram captures the state of different classes in
the system and their relationships or associations at a given point of time.
State diagram:

A state diagram, as the name suggests, represents the different states that objects in
the system undergo during their life cycle. Objects in the system change states in response to
events. In addition to this, a state diagram also captures the transition of the object's state
from an initial state to a final state in response to events affecting the system.
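
The idea of states changing in response to events can be sketched as a small transition table in Python. The states and events below are hypothetical and only loosely follow the order of the modules in this project. They are not taken from an actual state diagram of the system.

```python
# Hypothetical states and events, loosely following the order of the modules.
TRANSITIONS = {
    ("start", "upload_dataset"): "dataset_loaded",
    ("dataset_loaded", "initialize_spark"): "spark_ready",
    ("spark_ready", "run_algorithm"): "model_evaluated",
    ("model_evaluated", "run_algorithm"): "model_evaluated",
    ("model_evaluated", "show_graph"): "graph_shown",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`; reject undefined transitions."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


# Walk from the initial state to the final state through a sequence of events.
state = "start"
for event in ["upload_dataset", "initialize_spark", "run_algorithm", "show_graph"]:
    state = next_state(state, event)
print(state)
```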
Activity diagram:
The process flows in the system are captured in the activity diagram. Similar to a state
diagram, an activity diagram also consists of activities, actions, transitions, initial and final
states, and guard conditions.
Sequence diagram:

A sequence diagram represents the interaction between different objects in the system.
The important aspect of a sequence diagram is that it is time-ordered. This means that the
exact sequence of the interactions between the objects is represented step by step. Different
objects in the sequence diagram interact with each other by passing "messages".

Collaboration diagram:

A collaboration diagram groups together the interactions between different objects.
The interactions are listed as numbered interactions that help to trace the sequence of the
interactions. The collaboration diagram helps to identify all the possible interactions that each
object has with other objects.

Component diagram:

The component diagram represents the high-level parts that make up the system. This
diagram depicts, at a high level, what components form part of the system and how they are
interrelated. A component diagram depicts the components culled after the system has
undergone the development or construction phase.
Deployment diagram:

The deployment diagram captures the configuration of the runtime elements of the
application. This diagram is by far the most useful when a system is built and ready to be
deployed.
