You are on page 1of 33

VEHICLE COUNT PREDICTION USING SENSOR DATA

A PROJECT REPORT

Submitted in Partial Fulfillment of the requirements for the Award of the Degree of

BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

By

D. VAMSI (171FA07056)

G. SRAVAN KUMAR (171FA07058)

K.SRI HARISH (171FA07062)

K.PAVAN (171FA07066)

in

INTERNET OF THINGS

Under the Esteemed Guidance of

Dr. N. Veeranjeneyulu

Professor

Department of Information Technology

Vadlamudi,Guntur,A.P

India
CERTIFICATE
This is to certify that the dissertation entitled “Vehicle Count Prediction Using Sensor Data” is
submitted by D.VAMSI (171FA07056), G.SRAVANKUMAR (171FA07058), K.SRIHARISH
(171FA07062), K.PAVAN (171FA07066) in their partial fulfillment of the requirement of the
award of the degree of bachelor of technology, Vignan’s foundation for science technology and
research University,Guntur is a record of bonafide work carried out by them under my guidance
and supervision. The result embodied in this thesis has not been submitted to any other university
or institution for the award of any degree or diploma.

INTERNAL GUIDE: HEAD OF DEPARTMENT:

Dr. N. Veeranjeneyulu Dr. K. V. Krishna Kishore

Professor, Professor,

Department of IT. Department of IT.

EXTERNAL SIGNATURE

i
ACKNOWLEDGEMENT

It is indeed with a great pleasure and immense sense of gratitude that we acknowledge
the help of these individuals.

We would like to thank our Dr.K.V.Krishna Kishore, Head of the Department of


Information Technology ,VFSTR ,for this constructive criticism throughout our project.

We feel elated in manifesting our sense of gratitude tour internal project guide Dr.N.
Veeranjeneyulu, Professor, Department of information Technology,VFSTR. He has been a
constant source of inspiration for us and we are very deeply thankful to him for his support and
valuable advice.

We extremely grateful to our Departmental staff members, Lab technicians and Non-teaching
staff members for their extreme help throughout our project.

Finally we wxpress our heartful thanks to all of our friends who helped us in successful
completion of this project.

D.VAMSI (171FA07056)

G.SRAVAN KUMAR (171FA07058)

K.SRI HARISH (171FA07062)

K.PAVAN (171FA07066)

ii
DECLARATION
We hereby declare that project titled “Vehicle count Prediction using Sensor data” is a
bonafide original record done by us at VFSTR, vadlamudi towards the partial fulfillment of
requirement for the award of degree of Bachelor of technology in Information Technology in
VFSTR, vadlamudi and also we state that this project has not been submitted anywhere in the
partial fulfillment for any degree of this or any other University.

Date

Place Signature

ii
ii
TABLE OF CONTENTS

S.No Content Page No

1 Objective 1-1
2 Abstract 2-2
3 Problem Statement 3-3
4 Introduction 4-4
5 Dataset Description 5-5
5 Requirements 6-6
6 Methodology 7-7
7 Algorithm 8-8
8 Source Code 9-23
9 Evaluation Metric & Results 24-24
10 Conclusion 25-25
11 References 26-26
IOT Minor Project

OBJECTIVE

The main objective of this project is to apply different Machine Learning Algorithms on
IoT Sensor Data to predict the count of vehicles passed at particular junction.
 Analyzing the Data
 Finding the Hidden trends
 Applying the Machine Learning algorithms

Dept of IT Page 1
IOT Minor Project

ABSTRACT

IoT devices are becoming popular nowadays. The widespread use of IoT yields huge amounts of
raw data. This data can be effectively processed by using machine learning to derive many useful
insights that can become game changers and affect our lives deeply. ML is becoming an essential
player in a growing array of process areas involving image recognition, natural language
processing, forecasting, prediction, and process optimization. ML is evolving to the point of
being able to draw interesting patterns and inferences from these real time data streams, and
make those results available to analysts as well as to embed them directly in business processes.
We are going to predict traffic patterns in each of these four junctions for the next 4 months
using Ensembling Techniques (Regression Analysis) and other Algorithms.

Dept of IT Page 2
IOT Minor Project

PROBLEM STATEMENT

You are working with the government to transform your city into a smart city. The vision is to
convert it into a digital and intelligent city to improve the efficiency of services for the citizens.
One of the problems faced by the government is traffic. You are a data scientist working to
manage the traffic of the city better and to provide input on infrastructure planning for the future.

The government wants to implement a robust traffic system for the city by being prepared for
traffic peaks. They want to understand the traffic patterns of the four junctions of the city. Traffic
patterns on holidays, as well as on various other occasions during the year, differ from normal
working days. This is important to take into account for your forecasting.

Your task

To predict traffic patterns in each of these four junctions for the next 4 months.

The sensors on each of these junctions were collecting data at different times, hence you will see
traffic data from different time periods. To add to the complexity, some of the junctions have
provided limited or sparse data requiring thoughtfulness when creating future projections.
Depending upon the historical data of 20 months, the government is looking to you to deliver
accurate traffic projections for the coming four months. Your algorithm will become the
foundation of a larger transformation to make your city smart and intelligent.

Dept of IT Page 3
IOT Minor Project

INTRODUCTION

Sensor data is the output of a device that detects and responds to some type of input from the
physical environment. The output may be used to provide information or input to another system
or to guide a process. An IoT system consists of sensors/devices which “talk” to the cloud
through some kind of connectivity. Once the data gets to the cloud, software processes it and
then might decide to perform an action, such as sending an alert or automatically adjusting
the sensors/devices without the need for the user.

With a sensor, a machine observes the environment and information can be collected.
A sensor measures a physical quantity and converts it into a signal. Sensors translate
measurements from the real world into data for the digital domain.

Dept of IT Page 4
IOT Minor Project

DATA SET DESCRIPTION

Train.csv (48120 X 4)
Variable Description
ID Unique ID
DateTime Hourly Datetime Variable
Junction Junction Type
Number of Vehicles
Vehicles
(Target)

Test.csv (11808 X 3)

Variable Description
ID Unique ID
DateTime Hourly Datetime Variable
Junction Junction Type

Dept of IT Page 5
IOT Minor Project

REQUIREMENTS

Software Requirements
 Windows

 Anaconda 3. X (Jupyter Notebook)

 Intel i3 processor

Hardware Requirements

 Hard disk

 Processor: Minimum 1 GHz; Recommended 2GHz or more.

 Ethernet connection (LAN) OR a wireless adapter (Wi-Fi)

 Hard Drive: Minimum 32 GB; Recommended 64 GB or more.

 Memory (RAM): Minimum 1 GB; Recommended 4 GB or above.

Dept of IT Page 6
IOT Minor Project

METHDOLOGY

We are predicting the vehicle count at particular junction using date generated by a sensor.

The vehicle count is of numeric type. So we are going to apply Regression Techniques on the
data.

Regression:
A regression problem is when the output variable is a real or continuous value, such as “salary” or
“weight”. Many different models can be used, the simplest is the linear regression. It tries to fit
data with the best hyperplane which goes through the points.
Regression Analysis is a statistical process for estimating the relationships between the dependent
variables or criterion variables and one or more independent variables or predictors. Regression
analysis explains the changes in criterions in relation to changes in select predictors. The
conditional expectation of the criterions based on predictors where the average value of the
dependent variables is given when the independent variables are changed. Three major uses for
regression analysis are determining the strength of predictors, forecasting an effect, and trend
forecasting.

Regression Techniques Applied:

 Ensembling Techniques:

Bagging and Boosting are two of the most commonly used techniques in machine learning.

Bagging algorithms:

 RandomForest Regressor
 Bagging Regressor

Boosting algorithms:

 AdaBoost Regressor
 Light GBM (LGBM Regressor)
 CatBoost Regressor
 Gradient Boosting Regressor

 Decision Tree

Dept of IT Page 7
IOT Minor Project

ALGORITHM

Import necessary modules

Train
data
Load the data

Test
data Analyse the data

Perform Feature Engineering to


get all data from datetime

Build a Regression
model

Train Train the model


data

Test Test the model and


data predict the results

Performance
Evaluation (RMSE)

Dept of IT Page 8
In [11]: import pandas as pd
import numpy as np

In [12]: df=pd.read_csv("C:/sravan//train.csv")
df1=pd.read_csv("C:/sravan//test.csv")

In [13]: df.head()

Out[13]:
DateTime Junction Vehicles ID

0 01-11-2015 00:00 1 15 20151101001

1 01-11-2015 01:00 1 13 20151101011

2 01-11-2015 02:00 1 10 20151101021

3 01-11-2015 03:00 1 7 20151101031

4 01-11-2015 04:00 1 9 20151101041

In [14]: def Create(df):

df['Year'] = pd.to_datetime(df['DateTime']).dt.year

df['Month'] = pd.to_datetime(df['DateTime']).dt.month

df['Day'] = pd.to_datetime(df['DateTime']).dt.day

df['Dayofweek'] = pd.to_datetime(df['DateTime']).dt.dayofweek

df['DayOfyear'] = pd.to_datetime(df['DateTime']).dt.dayofyear

df['Week'] = pd.to_datetime(df['DateTime']).dt.week

df['Quarter'] = pd.to_datetime(df['DateTime']).dt.quarter

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
df['Is_month_start'] = pd.to_datetime(df['DateTime']).dt.is_month_s
tart

df['Is_month_end'] = pd.to_datetime(df['DateTime']).dt.is_month_end

df['Is_quarter_start'] = pd.to_datetime(df['DateTime']).dt.is_quart
er_start

df['Is_quarter_end'] = pd.to_datetime(df['DateTime']).dt.is_quarter
_end

df['Is_year_start'] = pd.to_datetime(df['DateTime']).dt.is_year_sta
rt

df['Is_year_end'] = pd.to_datetime(df['DateTime']).dt.is_year_end

df['Semester'] = np.where(df['Quarter'].isin([1,2]),1,2)

df['Is_weekend'] = np.where(df['Dayofweek'].isin([5,6]),1,0)

df['Is_weekday'] = np.where(df['Dayofweek'].isin([0,1,2,3,4]),1,0)

df['Days_in_month'] = pd.to_datetime(df['DateTime']).dt.days_in_mon
th

df['Hour'] = pd.to_datetime(df['DateTime']).dt.hour

return df

In [15]: df=Create(df)

In [16]: df1=Create(df1)

In [17]: df.columns

Out[17]: Index(['DateTime', 'Junction', 'Vehicles', 'ID', 'Year', 'Month', 'Da


y',
'Dayofweek', 'DayOfyear', 'Week', 'Quarter', 'Is_month_start',

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
'Is_month_end', 'Is_quarter_start', 'Is_quarter_end', 'Is_year_s
tart',
'Is_year_end', 'Semester', 'Is_weekend', 'Is_weekday', 'Days_in_
month',
'Hour'],
dtype='object')

In [18]: df1.columns

Out[18]: Index(['DateTime', 'Junction', 'ID', 'Year', 'Month', 'Day', 'Dayofwee


k',
'DayOfyear', 'Week', 'Quarter', 'Is_month_start', 'Is_month_en
d',
'Is_quarter_start', 'Is_quarter_end', 'Is_year_start', 'Is_year_
end',
'Semester', 'Is_weekend', 'Is_weekday', 'Days_in_month', 'Hou
r'],
dtype='object')

In [19]: target=df['Vehicles']
df=df.drop(['DateTime','Vehicles'],axis=1)
df1=df1.drop(['DateTime'],axis=1)
df['Year'].hist(figsize=(8,8),color="green")

Out[19]: <matplotlib.axes._subplots.AxesSubplot at 0x1d682e7e348>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [20]: df['DayOfyear'].hist(figsize=(8,8))

Out[20]: <matplotlib.axes._subplots.AxesSubplot at 0x1d6840f9508>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [21]: df['Dayofweek'].hist(figsize=(8,8),color="yellow")

Out[21]: <matplotlib.axes._subplots.AxesSubplot at 0x1d6840a2a48>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [22]: df['Year'].hist(figsize=(8,8),color="red")

Out[22]: <matplotlib.axes._subplots.AxesSubplot at 0x1d684573f88>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [23]: df['Month'].hist(figsize=(12,8))

Out[23]: <matplotlib.axes._subplots.AxesSubplot at 0x1d68508c848>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [24]: df['Year'].hist(figsize=(12,8))

Out[24]: <matplotlib.axes._subplots.AxesSubplot at 0x1d6824b70c8>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [25]: df['Day'].hist(figsize=(13,8))

Out[25]: <matplotlib.axes._subplots.AxesSubplot at 0x1d685005188>

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [26]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48120 entries, 0 to 48119
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Junction 48120 non-null int64
1 ID 48120 non-null int64
2 Year 48120 non-null int64
3 Month 48120 non-null int64
4 Day 48120 non-null int64
5 Dayofweek 48120 non-null int64
6 DayOfyear 48120 non-null int64
7 Week 48120 non-null int64
8 Quarter 48120 non-null int64
9 Is_month_start 48120 non-null bool

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
10 Is_month_end 48120 non-null bool
11 Is_quarter_start 48120 non-null bool
12 Is_quarter_end 48120 non-null bool
13 Is_year_start 48120 non-null bool
14 Is_year_end 48120 non-null bool
15 Semester 48120 non-null int32
16 Is_weekend 48120 non-null int32
17 Is_weekday 48120 non-null int32
18 Days_in_month 48120 non-null int64
19 Hour 48120 non-null int64
dtypes: bool(6), int32(3), int64(11)
memory usage: 4.9 MB

In [27]: df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11808 entries, 0 to 11807
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Junction 11808 non-null int64
1 ID 11808 non-null int64
2 Year 11808 non-null int64
3 Month 11808 non-null int64
4 Day 11808 non-null int64
5 Dayofweek 11808 non-null int64
6 DayOfyear 11808 non-null int64
7 Week 11808 non-null int64
8 Quarter 11808 non-null int64
9 Is_month_start 11808 non-null bool
10 Is_month_end 11808 non-null bool
11 Is_quarter_start 11808 non-null bool
12 Is_quarter_end 11808 non-null bool
13 Is_year_start 11808 non-null bool
14 Is_year_end 11808 non-null bool
15 Semester 11808 non-null int32
16 Is_weekend 11808 non-null int32
17 Is_weekday 11808 non-null int32
18 Days_in_month 11808 non-null int64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
19 Hour 11808 non-null int64
dtypes: bool(6), int32(3), int64(11)
memory usage: 1.2 MB

In [28]: target

Out[28]: 0 15
1 13
2 10
3 7
4 9
..
48115 11
48116 30
48117 16
48118 22
48119 12
Name: Vehicles, Length: 48120, dtype: int64

In [ ]:

In [45]: from sklearn.ensemble import RandomForestRegressor


a=RandomForestRegressor()
a.fit(df,target)
r1=a.predict(df1)
r1=pd.DataFrame(r1)
print(a.score(df,target)*100)

99.5207692247877

In [46]: r1.head()

Out[46]:
0

0 63.97

1 51.26

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0

2 39.55

3 35.77

4 32.64

In [47]: from sklearn.ensemble import AdaBoostRegressor


a=AdaBoostRegressor()
a.fit(df,target)
r2=a.predict(df1)
r2=pd.DataFrame(r2)
print(a.score(df,target)*100)

62.00277919615114

In [48]: from sklearn.ensemble import BaggingRegressor


a=BaggingRegressor()
a.fit(df,target)
r3=a.predict(df1)
r3=pd.DataFrame(r3)
print(a.score(df,target)*100)

99.30552540068676

In [49]: from lightgbm import LGBMRegressor


a=LGBMRegressor()
a.fit(df,target)
r4=a.predict(df1)
r4=pd.DataFrame(r4)
print(a.score(df,target)*100)

94.23985890770552

In [ ]: from catboost import CatBoostRegressor


a=CatBoostRegressor()
a.fit(df,target)
r5=a.predict(df1)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
r5=pd.DataFrame(r5)
print(a.score(df,target)*100)

In [51]: r5.head()

Out[51]:
0

0 78.128751

1 68.017779

2 59.221489

3 52.589897

4 47.134557

In [ ]: from sklearn.tree import DecisionTreeRegressor


from sklearn import tree
from matplotlib import pyplot as plt
a= DecisionTreeRegressor()
a.fit(df,target)
r6=a.predict(df1)
r6=pd.DataFrame(r6)
print(a.score(df,target)*100)

In [53]: from sklearn.naive_bayes import GaussianNB


a= GaussianNB()
a.fit(df,target)
r7=a.predict(df1)
r7=pd.DataFrame(r7)
print(a.score(df,target)*100)

4.258104738154613

In [54]: from sklearn.ensemble import GradientBoostingRegressor


a= GradientBoostingRegressor()
a.fit(df,target)
r8=a.predict(df1)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
r8=pd.DataFrame(r8)
print(a.score(df,target)*100)

88.64108507601487

In [62]: df1['Random Forest regressor']=r1


df1['Ada boost regressor']=r2
df1['Bagging regressor']=r3
df1['lightgbm regressor']=r4
df1['catboost regressor']=r5
df1['Decision tree regressor']=r6
df1['GaussianNB']=r7
df1['GradientBoostingRegressor']=r8

In [63]: df1.to_csv(r"C:/sravan//results.csv")

In [ ]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
IOT Minor Project

EVALUATION METRIC & RESULTS

The evaluation metric for this competition is Root Mean Squared Error (RMSE).

Predicted Results:

RMSE Scores:
IOT Minor Project

CONCLUSION

Most of the Regression Algorithms performed well. After Feature Engineering RMSE error was
decreased. Among all, Random Forest Regressor is given high accuracy in 25 % test (Public)
data. Gradient Boosting regressor given best accuracy in 75% test (Private) data.

So Gradient Boosting Regressor is the best Regression Technique for this problem
IOT Minor Project

REFERENCES

1. https://datahack.analyticsvidhya.com/contest/janatahack-machine-learning-for-
iot/#ProblemStatement
2. https://www.google.com/search?safe=strict&rlz=1C1JZAP_enIN913IN913&sxsrf=ALe
Kk00HjvD6Aj8yCreDdhHxPgQYZVFTEA%3A1608041780996&ei=NMXYX6aoPLLj
z7sPiMmO0AU&q=machine+lenring%27+regression+paers+vehicle+sensor+data+iot&
oq=machine+lenring%27+regression+paers+vehicle+sensor+data+iot&gs_lcp=CgZwc3k
tYWIQAzIHCCEQChCgATIHCCEQChCgAToECAAQRzoJCAAQyQMQDRAeOgYI
ABANEB46CQgAEMkDEBYQHjoGCAAQFhAeOggIABAWEAoQHjoLCAAQyQMQ
CBANEB46CAghEBYQHRAeOgQIIRAKUKQ0WORsYNNtaAFwAXgDgAHAA4gBg
TKSAQowLjE3LjkuMS4ymAEAoAEBqgEHZ3dzLXdpesgBCMABAQ&sclient=psy-
ab&ved=0ahUKEwimko-5ltDtAhWy8XMBHYikA1oQ4dUDCA0&uact=5

You might also like