
ABSTRACT

Vehicular ad hoc networks (VANETs) can help reduce traffic collisions at intersections by sending warning messages to vehicles. As wireless networks evolve towards high mobility and better support for connected vehicles, a number of new challenges arise from the resulting high dynamics of vehicular environments, which motivates a rethinking of traditional wireless design methodologies. Machine learning, as an effective approach to artificial intelligence, provides a rich set of tools to exploit the data generated in such environments for the benefit of the network. After a brief introduction to the major concepts of machine learning and VANETs, our main concern is to implement a VANET application using machine learning techniques. The proposed system uses simulated data collected from vehicular ad hoc networks (VANETs), and the implementation is carried out with a random forest classifier.
TABLE OF CONTENTS

1. Introduction
2. Literature Review
3. Proposed Methodology
4. Implementation Details
5. Results and Discussions
6. Conclusion and Future Work
References
1. INTRODUCTION

Nowadays, Intelligent Transportation Systems (ITSs) are receiving more attention because of the growing number of vehicles, which causes traffic congestion, bottlenecks and incidents. In an ITS, Information and Communication Technologies (ICT) can be integrated with transport networks, vehicles and users to improve the safety and management of transport networks. Rapid developments in mobile computing and communication technologies have created new study areas within ITS. Distributed traffic information systems, in which vehicles exchange data with each other, are one such area, and the vehicular ad hoc network (VANET) is a result of these studies. Equipped vehicles can exchange information about congestion or hazardous situations. VANETs are a special form of ad hoc network and a promising technology for advancing roadway safety and efficiency by supplying drivers and roadway administrations with timely information about road and traffic conditions.

VANETs were developed to provide vehicular communications with reliable and cost-efficient data distribution. Vehicular communications can be used to reduce road accidents, traffic congestion, travelling time, fuel consumption and so on. They allow road users to be informed, by exchanging information, about the critical and dangerous situations that may occur in their surrounding environment. Therefore, VANETs can play a vital role in ensuring safer urban environments for road users. VANETs are employed by Intelligent Transportation Systems (ITS) for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. Communications in a VANET rely on the standards and protocols defined in Dedicated Short Range Communication (DSRC) and Wireless Access in Vehicular Environments (WAVE); IEEE 802.11p and IEEE 1609 are the two WAVE standards.

Machine learning methods attempt to determine the class of a new sample using patterns learned from previously classified examples. In the literature, the term supervised learning is also used for classification, since the classification algorithm is provided with training samples and their corresponding outcomes, each outcome being the class of that sample. Three widely used classification methods for analysing traffic data are Support Vector Machines, Artificial Neural Networks and Random Forests. In this study, our main aim is to study VANETs and to design a VANET application using the Random Forest technique of machine learning, which can be readily implemented to help ensure road safety.

Random forest is essentially an improvement on top of the decision tree algorithm. The core idea is to generate many small decision trees from random subsets of the data (hence the name "Random Forest"). Each decision tree on its own is a biased classifier, as it only considers a subset of the data, and each captures different trends in that subset. The ensemble of trees acts like a team of experts, each with limited knowledge of the overall subject but thorough in its own area of expertise.
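As an illustration of this idea, the following minimal sketch builds a small ensemble of decision trees on bootstrap samples and combines them by majority vote. It uses scikit-learn and synthetic data purely for demonstration; it is not the implementation used later in this report.

# Minimal sketch of the random-forest idea: many small trees trained on
# random subsets of the data, combined by majority vote.
# Synthetic data and parameters are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

trees = []
for _ in range(25):                                       # 25 small trees
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    tree = DecisionTreeClassifier(max_depth=4, max_features="sqrt")
    tree.fit(X[idx], y[idx])                              # each tree sees a random subset
    trees.append(tree)

# Majority vote across the ensemble
votes = np.array([t.predict(X) for t in trees])           # shape: (n_trees, n_samples)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Ensemble training accuracy:", (ensemble_pred == y).mean())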
2. LITERATURE REVIEW

VANET implementation in a real-time system is a challenging task. Many such implementations have been deployed in recent years, and implementing such projects in a real-time system requires complete simulation, with careful measurement of the system's performance. Many car manufacturers, such as BMW, Audi, Ford, General Motors, Daimler and Nissan, are using ITS systems for passenger safety.
As noted in the introduction, VANETs support drivers and roadway administrations with timely information about road and traffic conditions, and their implementation makes driver-assistance and traffic-safety applications possible. Safety applications aim to decrease accident risk; cooperative collision warning, traffic violation warning, lane change assistance and pre-accident sensing are example applications for traffic safety. The authors of [1] presented Cooperative Active Safety Systems (CASS), which enable a vehicle to monitor its neighbouring vehicles in order to avoid possible collisions. They explored a communication scheme which assumes that a GPS receiver, a DSRC device and sensors are installed on the vehicles. Microscopic vehicle parameters, such as position, heading, speed and other values, are exchanged with all neighbouring vehicles [1].
High mobility vehicular networks exhibit distinctive characteristics, which have posed
significant challenges to wireless network design. In this section, we identify such challenges
and then discuss the potential of leveraging machine learning to address them.
The constant mobility of vehicles also causes frequent changes in the network topology, affecting channel allocation and routing protocol design. Another source of dynamics in high-mobility networks is the changing vehicle density, which varies dramatically with location (remote suburban versus dense urban areas) and time (peak versus off-peak hours of the day). Traditional rigorous mathematical theories and methods for wireless networks are mostly based on static or low-mobility assumptions and are usually not designed to handle such varying conditions effectively.
In high mobility vehicular networks, there exist different types of connections, which we
broadly categorize into V2I and V2V links. The V2I links enable vehicles to communicate
with the base station to support various traffic efficiency and information and entertainment
(infotainment) services. They generally require frequent access to the Internet or remote servers for media streaming, high-definition (HD) map downloading, and social networking, which involve a considerable amount of data transfer and are thus more bandwidth intensive [3]. On the other hand, V2V links are mainly intended for sharing safety-critical information, such as the basic safety messages (BSMs) of DSRC, among vehicles in close proximity, in either a periodic or an event-triggered manner [2]. Such safety-related messages are strictly delay sensitive and require very high reliability.
Machine learning allows computers to find hidden insights by iteratively learning from data, without being explicitly programmed. It has revolutionized computer science by enabling learning from large datasets, allowing machines to change, restructure and optimize algorithms by themselves.
3. PROPOSED METHODOLOGY

High mobility networks exhibit strong dynamics in many facets, e.g., wireless channels,
network topologies, traffic dynamics, etc., that heavily influence the network performance.
In this section, we discuss how to exploit machine learning to efficiently learn and robustly
predict such dynamics based on data from a variety of sources.

3.1 VANET and Assumptions

The purpose of this study is to show that if vehicles are equipped with trained classifiers, they can distinguish accident situations from normal situations. It is assumed that vehicles regularly broadcast some of their microscopic variables and can classify received information using machine learning tools to determine whether an accident is occurring. The proposed use of classification methods aims to enable a vehicle to detect accidents in its neighbourhood; based on this, the sender stopping distance and receiver stopping distance are recorded and analysed. The effectiveness of classification techniques that use microscopic variables for accident detection is evaluated in this study. The following assumptions have been made throughout this study (a code sketch of the implied probe-vehicle record follows the list):
• Only probe vehicles are equipped with V2V communication devices.
• Probe vehicles broadcast their position and speed every second.
• Probe vehicles are able to calculate the position of a transmitting vehicle by using signal processing and antenna techniques together with their own location.
• Probe vehicles can aggregate microscopic traffic values for each vehicle over the last 10 seconds.
• Probe vehicles can execute the trained classification method every second.
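The per-vehicle record and the 10-second aggregation implied by these assumptions could be sketched as follows; the field names, the Python classes and the chosen aggregates are hypothetical illustrations rather than part of the original study.

# Hypothetical sketch of the probe-vehicle data implied by the assumptions above:
# each vehicle broadcasts position and speed every second, and a receiver keeps
# the last 10 seconds of values per neighbouring vehicle.
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class ProbeMessage:          # one broadcast per vehicle per second (assumed fields)
    vehicle_id: int
    timestamp: float         # seconds
    x: float                 # estimated position, metres
    y: float
    speed: float             # m/s

class NeighbourTable:
    """Keeps roughly the last 10 messages (one per second) for each neighbour."""
    def __init__(self, window=10):
        self.window = window
        self.history = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, msg: ProbeMessage):
        self.history[msg.vehicle_id].append(msg)

    def aggregate(self, vehicle_id):
        msgs = self.history[vehicle_id]
        if not msgs:
            return None
        speeds = [m.speed for m in msgs]
        # Simple aggregates a classifier could be fed every second (illustrative)
        return {"mean_speed": sum(speeds) / len(speeds),
                "speed_drop": msgs[0].speed - msgs[-1].speed,
                "samples": len(msgs)}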

3.2 Machine Learning and VANET Implementation

Traditional static mathematical models are not good at capturing and tracking such dynamic changes in a VANET. In general, machine learning involves two stages: training and testing. In the training stage, a model is learned from the training data; in the testing stage, the trained model is applied to produce predictions.
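A minimal sketch of these two stages, assuming a toy feature table and using scikit-learn; the features, labels and parameters here are illustrative placeholders, not the ones used in this project.

# Sketch of the two machine-learning stages described above: a model is
# fitted on training data, then applied to unseen test data.
# Features, labels and parameters are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.random.rand(1000, 4)              # e.g. speed, distance, rate, ... (toy values)
y = (X[:, 1] < 0.2).astype(int)          # toy label: "accident" when distance is small

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)                                               # training stage
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))    # testing stage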
As discussed in the introduction, supervised classification methods determine the class of a new sample from previously labelled examples [2]. In this study, three widely used classification methods are considered: Support Vector Machines, Artificial Neural Networks and Random Forests. Only the Random Forest technique has been implemented for the VANET application.

i) Artificial Neural Networks

Artificial neural networks (ANNs) aim to mimic the behaviour of a real neural network, which consists of a large number of interconnected neurons. An ANN is a collection of nodes, each of which transforms a weighted sum of its inputs into an output value of "0" or "1". Usually, a sigmoidal transfer function in each node converts the weighted sum into the output. A neural network has as many input nodes as there are inputs, while the number of output nodes is determined by the number of classes; if only two classes exist, one output node is sufficient. An ANN with multiple layers is called a multi-layer perceptron: input nodes are fully connected to the hidden-layer nodes, and the hidden-layer nodes are linked to a single output node. Besides these connections, one of the input nodes is directly linked to the output node. Each connection between nodes has a weight [2].
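A single node of the kind described, a sigmoid transfer function applied to a weighted sum of the inputs, can be sketched as follows; the weights and inputs are arbitrary illustrative values.

# Sketch of a single ANN node: a sigmoid transfer function applied to the
# weighted sum of the inputs, producing an output between 0 and 1.
# Weights and inputs are arbitrary illustrative values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_output(inputs, weights, bias):
    return sigmoid(np.dot(inputs, weights) + bias)   # weighted sum -> sigmoid

inputs = np.array([0.5, 1.2, -0.3])                  # e.g. three microscopic traffic variables
weights = np.array([0.8, -0.4, 0.2])
print(node_output(inputs, weights, bias=0.1))        # value in (0, 1)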
ii) Support Vector Machine

Support vector machines (SVMs) are a comparatively recent development and have attracted attention from researchers because of their notable accuracy and their ability to handle large, high-dimensional data sets. The SVM algorithm has been employed in various areas, such as illumination analysis, haptic data prediction and financial forecasting. There are also many applications of SVMs in ITS, such as incident detection, traffic speed and flow prediction, travel time estimation and eye movement detection [1]. A complex curve may be needed to separate two classes in the original sample space, but after mapping the original samples onto a more appropriate feature space, the two classes can be separated by a linear decision boundary. The classification problem therefore becomes one of finding a suitable transformation.
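The following hedged illustration uses scikit-learn on a synthetic two-ring dataset: a kernel (RBF) implicitly maps the samples into a feature space where a linear decision boundary suffices, whereas a linear kernel in the original space does not. The dataset and parameters are illustrative only.

# Sketch of the SVM idea described above: data that needs a curved boundary in the
# original space (two concentric rings) becomes separable after a kernel mapping.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # struggles: no straight line separates the rings
rbf_svm = SVC(kernel="rbf").fit(X, y)         # kernel mapping makes a linear boundary possible

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))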

iii) Random Forest

Random forest (RF) is a data mining tool for solving classification and regression problems. Growing an ensemble of trees and deciding the class by voting improves classification accuracy significantly. Random vectors are constructed to grow these ensembles, and each tree is generated from one of the random vectors. An RF consists of classification and regression trees; classification problems are solved by analysing the outputs of the trees, and the majority of class votes determines the RF prediction. The generalization error converges to a limiting value as more trees are added to the RF, since over-fitting does not occur in large RFs. Low bias and low correlation between trees are crucial for achieving higher accuracy: to obtain low bias, trees are grown without pruning, and to obtain low correlation, the variables considered at each node are randomized. In this work, random forest (RF) is implemented and explored for the VANET application.
4. IMPLEMENTATION DETAILS

Vehicular ad hoc networks are created by applying the principles of mobile ad hoc networks (MANETs), the spontaneous creation of a wireless network for data exchange, to the domain of vehicles. It has been shown that vehicle-to-vehicle and vehicle-to-roadside communication architectures will co-exist in VANETs to provide road safety, navigation and other roadside services. VANETs are a key part of the intelligent transportation systems (ITS) framework and are sometimes referred to as intelligent transportation networks. In this section we describe how the Random Forest technique of machine learning is applied to design and implement the VANET application and how the algorithm can be used to help ensure road safety.

4.1 PARAMETERS OF RANDOM FOREST

The parameters taken into consideration for the random forest are explained as follows:
Number of Vehicle Trees: In general, more trees result in better accuracy. However, more trees also mean a higher computational cost, and after a certain number of trees the improvement is negligible.
Max Depth of the Vehicle Trees: The maximum depth is the depth of each tree in the forest. The deeper the tree, the more splits it has and the more information it captures about the data. The depth of a VANET tree depends on the number of vehicular attributes included in the tree.
Minimum Sample Split for a Vehicle Tree: The minimum samples split is the minimum number of samples required to split an internal node. This can vary from considering at least one sample at each node to considering all of the samples at each node. The samples used to split an internal node of a VANET tree include the positions of vehicles, the speeds of vehicles, the traffic condition, the accuracy of the position, and so on.
Minimum Sample Leaf for a Vehicular Tree: The minimum sample leaf is the minimum number of samples required at a leaf node. This parameter is similar to the minimum sample split; however, it describes the minimum number of samples at the leaves, the base of the vehicular tree. The leaves decide the final outcome of the classifier tree, for example whether the position of a vehicle is accurate or not.
Number of Random Features of a Vehicular Tree: The maximum features parameter is the number of features to consider when looking for the best split. Features considered include the current GPS position and speed of the car, the distance between cars, and the traffic condition.
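For reference, a possible mapping of these parameters onto scikit-learn's RandomForestClassifier is sketched below; the numeric values are hypothetical and are not the settings of the from-scratch implementation described in Section 4.3.

# Hypothetical mapping of the parameters above onto scikit-learn's
# RandomForestClassifier; the values shown are illustrative only.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,        # number of vehicle trees in the forest
    max_depth=10,            # max depth of each vehicle tree
    min_samples_split=2,     # minimum samples required to split an internal node
    min_samples_leaf=1,      # minimum samples required at a leaf node
    max_features="sqrt",     # number of random features considered at each split
    random_state=0,
)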

4.2 DATASET INFORMATION

The dataset contains traces for a VANET; it consists of a number of requests sent from different cars (senders) to one car (the receiver), each requesting a specific data transmission rate with a specific severity. Each request also has a start and end time. The generated dataset is based on simulations of the city of Erlangen, Germany. The dataset has specific parameters which are essential for the VANET implementation.

The dataset under consideration contains the following fields (a loading sketch follows the list):


Start time: Time in seconds when the request arrives.
End time: Time in seconds when the request is done.
Time Period: End time - Start time
Packets: Number of packets required by this request.
Rate: Number of packets divided by time period, packets per second.
Sender Stopping Distance: In meters.
Receiver Stopping Distance: In meters.
Actual Distance: Distance in meters between sender and receiver.
Severity: Severity of the request.
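Assuming the dataset is available as a CSV with columns matching the field list above (the exact file name and headers are assumptions based on that list), it could be loaded and inspected as follows.

# Sketch of loading the VANET trace described above with pandas.
# The file name and exact column names are assumptions; the real headers may differ.
import pandas as pd

df = pd.read_csv("Erlang.csv")           # file name used later in the case study

# Derived quantities, following the field definitions above
df["Time Period"] = df["End time"] - df["Start time"]
df["Rate"] = df["Packets"] / df["Time Period"]        # packets per second

print(df[["Time Period", "Packets", "Rate", "Severity"]].describe())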

4.3 IMPLEMENTATION

The implementation is broken down into two steps:


1. Calculating Splits.
2. Dataset Case Study.

4.3.1 Calculating Splits

In a decision tree, split points are chosen by finding the attribute and the value of that attribute that result in the lowest cost. The Gini index measures the purity of the groups of data created by the split point: a Gini index of 0 indicates perfect purity, where the class values are perfectly separated into two groups in the case of a two-class classification problem.

The get_split() function takes a dataset and a fixed number of input features to evaluate as input arguments, where the dataset may be a sample of the actual training dataset.

test_split() is used to split the dataset by a candidate split point and gini_index() is used to
evaluate the cost of a given split by the groups of rows created.

A list of features is created by randomly selecting feature indices and adding them to a list (called features); this list of features is then enumerated and specific values in the training dataset are evaluated as split points.
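A possible sketch of these helper functions, following the from-scratch structure described above and in [6]; the exact code used in this project may differ in details.

# Sketch of the split-calculation helpers described above.
from random import randrange

def test_split(index, value, dataset):
    """Split a dataset into two groups based on an attribute and a threshold value."""
    left = [row for row in dataset if row[index] < value]
    right = [row for row in dataset if row[index] >= value]
    return left, right

def gini_index(groups, classes):
    """Gini impurity of a candidate split; 0.0 means the groups are perfectly pure."""
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        gini += (1.0 - score) * (size / n_instances)   # weight by relative group size
    return gini

def get_split(dataset, n_features):
    """Pick the best split point, considering only a random subset of features."""
    class_values = list(set(row[-1] for row in dataset))
    b_index, b_value, b_score, b_groups = 999, 999, 999, None
    features = []
    while len(features) < n_features:                  # random feature selection
        index = randrange(len(dataset[0]) - 1)
        if index not in features:
            features.append(index)
    for index in features:
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if gini < b_score:
                b_index, b_value, b_score, b_groups = index, row[index], gini, groups
    return {'index': b_index, 'value': b_value, 'groups': b_groups}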

4.3.2 Dataset Case Study

1. The example assumes that a CSV copy of the dataset is in the current working directory
with the file name Erlang.csv.

2. The dataset is first loaded, the string values are converted to numeric, and the output column is converted from strings to the integer values 0 and 1. This is achieved with the helper functions load_csv(), str_column_to_float() and str_column_to_int(), which load and prepare the dataset.

3. We will use n-fold cross validation to estimate the performance of the learned model on
unseen data. This means that we will construct and evaluate n models and estimate the
performance as the mean model error. Classification accuracy will be used to evaluate each
model. These behaviours are provided in the cross_validation_split(), accuracy_metric()
and evaluate_algorithm() helper functions.
4. We will also use an implementation of the Classification and Regression Trees (CART) algorithm adapted for bagging. This includes the helper functions test_split() to split a dataset into groups, gini_index() to evaluate a split point, the modified get_split() function discussed in the previous step, to_terminal(), split() and build_tree() used to create a single decision tree, predict() to make a prediction with a decision tree, subsample() to draw a subsample of the training dataset, and bagging_predict() to make a prediction with a list of decision trees.

5. A new function named random_forest() is developed that first creates a list of decision trees from subsamples of the training dataset and then uses them to make predictions (a sketch of these functions appears after this list).

6. As stated above, the key difference between Random Forest and bagged decision trees is one small change to the way trees are created, namely in the get_split() function.
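A possible sketch of the bagging and forest functions from steps 4-6, again following the structure of [6]; build_tree() and predict() are the CART helpers referred to in step 4 and are not repeated here.

# Sketch of the bagging/forest functions from steps 4-6 above.
# build_tree() and predict() are assumed to be the CART helpers from step 4.
from random import randrange

def subsample(dataset, ratio):
    """Random subsample (with replacement) of the training dataset."""
    n_sample = round(len(dataset) * ratio)
    return [dataset[randrange(len(dataset))] for _ in range(n_sample)]

def bagging_predict(trees, row):
    """Majority vote over the predictions of a list of decision trees."""
    predictions = [predict(tree, row) for tree in trees]
    return max(set(predictions), key=predictions.count)

def random_forest(train, test, max_depth, min_size, sample_size, n_trees, n_features):
    """Build n_trees trees on subsamples and predict the test rows by voting."""
    trees = []
    for _ in range(n_trees):
        sample = subsample(train, sample_size)
        tree = build_tree(sample, max_depth, min_size, n_features)  # uses get_split() above
        trees.append(tree)
    return [bagging_predict(trees, row) for row in test]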
5. RESULTS AND DISCUSSIONS

The following points can be drawn from the analysis of the given data using random forest for the VANET implementation:

1. A value of n = 5 was used for cross-validation, giving each fold 882/5 = 176.4, or just over 176 records, to be evaluated upon each iteration.
2. Deep trees were constructed with a max depth of 10 and a minimum number of training
rows at each node of 1. Samples of the training dataset were created with the same size as the
original dataset, which is a default expectation for the Random Forest algorithm.
3. The number of features considered at each split point was set to sqrt(num_features) or
sqrt(9)=3 features.
4. A suite of 6 different numbers of trees was evaluated for comparison, showing increasing skill as more trees are added.
5. Running the algorithm on the dataset prints the scores for each fold and the mean score for each configuration; a sketch of this evaluation loop appears after Figure 2.

Figure 1: Random forest accuracy on VANET implementation

6. A scatter plot depicting the correlation between Time Period, Packets and Rate is shown in Figure 2.

Figure 2: Scatter plot representation
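For completeness, the evaluation configuration described in points 1-5 could be driven by a loop of the following form; the list of tree counts is illustrative (the report evaluates six settings whose exact values are not stated), and dataset, evaluate_algorithm() and random_forest() are the objects and helpers prepared in Section 4.3.

# Sketch of the evaluation driver matching the configuration described above.
# 'dataset', evaluate_algorithm() and random_forest() come from Section 4.3.
from math import sqrt
from random import seed

seed(2)
n_folds = 5                                       # 882/5 = 176.4 records per fold
max_depth = 10
min_size = 1
sample_size = 1.0                                 # subsamples the same size as the dataset
n_features = int(sqrt(len(dataset[0]) - 1))       # sqrt of the number of input features

for n_trees in [1, 5, 10, 20, 50, 100]:           # hypothetical suite of 6 settings
    scores = evaluate_algorithm(dataset, random_forest, n_folds, max_depth,
                                min_size, sample_size, n_trees, n_features)
    print('Trees: %d' % n_trees)
    print('Scores: %s' % scores)
    print('Mean Accuracy: %.3f%%' % (sum(scores) / float(len(scores))))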


6. CONCLUSION AND FUTURE WORK

Applying machine learning to address problems in high-mobility vehicular networks turned out to be of great advantage. Machine learning is believed to be a promising solution to this challenge because of its remarkable performance in various AI-related areas. In the proposed work, the V2V data communication model is evaluated: a mean accuracy of 91.477% is achieved by applying random forest to the collected vehicle dataset. Moreover, the proposed method can provide an estimated geographical location of a possible accident, which could be useful for highway administrations to respond immediately or to prevent secondary accidents. In future work, the VANET implementation can be carried out with other supervised machine learning approaches in order to compare their accuracy. This work can further be enhanced by treating accidents or other incidents as outliers in traffic data and employing machine learning algorithms to detect those outliers. Upon detection of a traffic incident, other probe vehicles can be warned about it, helping drivers take action to prevent further incidents and redirect their routes.
REFERENCES

[1] N. Dogru and A. Subasi, "Traffic accident detection using random forest classifier,"
2018 15th Learning and Technology Conference (L&T), Jeddah, 2018, pp. 40-45.
doi: 10.1109/LT.2018.8368509

[2] N. Taherkhani and S. Pierre, "Centralized and Localized Data Congestion Control Strategy for Vehicular Ad Hoc Networks Using a Machine Learning Clustering Algorithm," in IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 11, pp. 3275-3285, Nov. 2016. doi: 10.1109/TITS.2016.2546555

[3] Y. Wang, Z. Ding, F. Li, X. Xia and Z. Li, "Design and implementation of a VANET
application complying with WAVE protocol," 2017 International Conference on
Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai,
2017, pp. 2333-2338. doi: 10.1109/WiSPNET.2017.8300177

[4] K. Golestan et al., "Vehicular ad-hoc networks (VANETs): Capabilities, challenges in information gathering and data fusion," in Autonomous and Intelligent Systems. Berlin, Germany: Springer-Verlag, 2012.

[5] A. Paul, D. P. Mukherjee, P. Das, A. Gangopadhyay, A. R. Chintha and S. Kundu, "Improved Random Forest for Classification," in IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 4012-4024, Aug. 2018. doi: 10.1109/TIP.2018.2834830

[6] https://machinelearningmastery.com/implement-random-forest-scratch-python/
