
Data Analytics for IoT Use Cases

Submitted By

RADHA MOHAN (18BEC1148)

A Laboratory report submitted to

Dr. VELMATHI G.

SCHOOL OF ELECTRONICS ENGINEERING

in partial fulfillment of the requirements for the course of

ECE3502 – IoT Domain Analyst

Vandalur – Kelambakkam Road

Chennai – 600127

May 2021
ACKNOWLEDGEMENT

I wish to express my sincere thanks and deep sense of gratitude to our Lab
guide, Dr. Velmathi G., Associate Professor, School of Electronics
Engineering, for her immense and consistent encouragement and the valuable
guidance offered to me in a pleasant manner throughout the course of the lab
work.

I am extremely grateful to Dr. Sivasubramanian A., Dean of the School of
Electronics Engineering (SENSE), VIT Chennai, for extending the facilities of
the School towards our Lab and for his support.

I express my thanks to our Head of the Department, Dr. Vetrivelan P., for his
support throughout the course of this Lab.

I also take this opportunity to thank all the faculty of the School for their
support and their wisdom imparted to us throughout the course.

I thank my parents, family, and friends for bearing with me throughout the
course of this Lab and for the opportunity they provided me to undergo this
course in such a prestigious institution.

RADHA MOHAN
TABLE OF CONTENTS

Sl. No.   Date      Title
1         17-Feb    Data Visualization using Tableau
2         3-Mar     Predictive Analysis using WEKA
3         2-Apr     Predictive Analysis using KNIME
4         16-Apr    Predictive Analysis using PyTorch
5         30-Apr    Predictive Analysis using scikit-learn
6         13-May    Predictive Analysis using Orange
7         20-May    Predictive Analysis using RapidMiner


Experiment 1a
AIM: To perform data visualization and derive insights using Tableau for a healthcare dataset.
Dataset: The dataset comprises information about the health sector of the United
States of America, including attributes such as patient satisfaction, staff per division, etc.

Tableau Sheets:
Result Dashboard:
Experiment 1b
AIM: To perform data visualization and derive insights using Tableau for traffic-related data.
Dataset: The dataset comprises the I-94 westbound traffic volume recorded at the
Minnesota Department of Transportation's Automatic Traffic Recorder Station 301.

Tableau Sheets:
Result Dashboard:
Experiment 2
AIM: To perform predictive analysis using WEKA on the leaf dataset.
Dataset: The dataset describes individual leaves using 15 parameters such as
aspect ratio, eccentricity, elongation, etc.

WEKA screenshots:
Result:
The following is the analysis of the given dataset using a logistic regression classifier.
Experiment 3
AIM: To perform predictive analysis using WEKA on the weather dataset.
Dataset: The dataset comprises weather information based on parameters such as
temperature, humidity, and wind.

WEKA Screenshots:
Result:
The analysis of the given dataset is performed using a random-forest classification
method.
Experiment 4
AIM: To perform predictive analysis using the KNIME Analytics Platform on the
IMDb dataset.

Dataset:

KNIME screenshots:
This KNIME workflow focuses on creating a scoring model based on historical
data. As with all data mining modeling activities, it is unclear in advance which
analytic method is most suitable. This workflow therefore uses three different
methods simultaneously – Decision Trees, Neural Networks and SVM – then
automatically determines which model is most accurate and writes that model
out for further use.

The workflow prepares the data for a variety of modeling techniques by
converting nominal values to numeric ones, and the data was enhanced so that
understandable labels are used. It uses metanodes to "package" each technique
for reuse. Each model uses a train/test split with cross-validation to ensure
accuracy. Finally, the workflow writes out the winning model in the official PMML
format, so that other applications can use it; a rough scripted sketch of this
selection logic is given below.
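KNIME performs this selection visually; the following is a minimal Python sketch of the same "train several candidates, keep the most accurate" logic, assuming scikit-learn and using a bundled stand-in dataset, since the IMDb data itself is not reproduced here:

from sklearn.datasets import load_iris  # stand-in for the IMDb data
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
candidates = {
    'Decision Tree': DecisionTreeClassifier(),
    'Neural Network': MLPClassifier(max_iter=1000),
    'SVM': SVC(),
}
# cross-validate each candidate and keep the most accurate one
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, '-> best:', best_name)
best_model = candidates[best_name].fit(X, y)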
Results:
Experiment 5
AIM: To perform predictive analysis using PyTorch on a stock dataset.
Dataset: The dataset comprises historical daily stock prices for Apple (AAPL)
from 2006 to 2018.

Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
import torch
import torch.nn as nn
from google.colab import drive
drive.mount('/content/drive')
filepath = '/content/drive/MyDrive/stock_data/AAPL_2006-01-01_to_2018-01-01.csv'
data = pd.read_csv(filepath)
data = data.sort_values('Date')
data.head()
sns.set_style("darkgrid")
plt.figure(figsize = (15,9))
plt.plot(data[['Close']])
plt.xticks(range(0,data.shape[0],100),data['Date'].loc[::100],rotation=45)
plt.title("Amazon Stock Price",fontsize=18, fontweight='bold')
plt.xlabel('Date',fontsize=18)
plt.ylabel('Close Price (USD)',fontsize=18)
plt.show()
price = data[['Close']].copy()  # .copy() avoids pandas' SettingWithCopyWarning below
print(price)
scaler = MinMaxScaler(feature_range=(-1, 1))
price['Close'] = scaler.fit_transform(price['Close'].values.reshape(-1,1))
def split_data(stock, lookback):
    data_raw = stock.to_numpy()  # convert to numpy array
    data = []

    # create all possible sequences of length lookback
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])

    data = np.array(data)
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - test_set_size

    # the last step of each window is the target; the rest are the input
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]
    x_test = data[train_set_size:, :-1, :]
    y_test = data[train_set_size:, -1, :]

    return [x_train, y_train, x_test, y_test]


lookback = 20 # choose sequence length
x_train, y_train, x_test, y_test = split_data(price, lookback)
print('x_train.shape = ',x_train.shape)
print('y_train.shape = ',y_train.shape)
print('x_test.shape = ',x_test.shape)
print('y_test.shape = ',y_test.shape)
x_train = torch.from_numpy(x_train).type(torch.Tensor)
x_test = torch.from_numpy(x_test).type(torch.Tensor)
y_train_lstm = torch.from_numpy(y_train).type(torch.Tensor)
y_test_lstm = torch.from_numpy(y_test).type(torch.Tensor)

input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
num_epochs = 100
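The report jumps from these hyperparameters straight to evaluating model(x_test), so the model definition and training loop are missing. Below is a minimal sketch of what the later code assumes; the class name, optimiser, and learning rate are reconstructions, and hist, lstm, training_time, predict, and original are defined here only because the plotting code that follows uses them:

import time

class LSTMModel(nn.Module):
    # assumed architecture: stacked LSTM followed by a linear read-out
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        out, _ = self.lstm(x, (h0, c0))
        # predict from the hidden state of the last time step
        return self.fc(out[:, -1, :])

model = LSTMModel(input_dim, hidden_dim, num_layers, output_dim)
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)

hist = np.zeros(num_epochs)  # per-epoch training loss, plotted later
lstm = []                    # collects train/test RMSE and training time

start_time = time.time()
for epoch in range(num_epochs):
    y_train_pred = model(x_train)
    loss = criterion(y_train_pred, y_train_lstm)
    hist[epoch] = loss.item()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
training_time = time.time() - start_time

# DataFrames consumed by the seaborn training-prediction plot below
predict = pd.DataFrame(scaler.inverse_transform(y_train_pred.detach().numpy()))
original = pd.DataFrame(scaler.inverse_transform(y_train_lstm.detach().numpy()))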
import math, time
from sklearn.metrics import mean_squared_error

# make predictions
y_test_pred = model(x_test)
# invert predictions
y_train_pred = scaler.inverse_transform(y_train_pred.detach().numpy())
y_train = scaler.inverse_transform(y_train_lstm.detach().numpy())
y_test_pred = scaler.inverse_transform(y_test_pred.detach().numpy())
y_test = scaler.inverse_transform(y_test_lstm.detach().numpy())

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(y_train[:,0], y_train_pred[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(y_test[:,0], y_test_pred[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
lstm.append(trainScore)
lstm.append(testScore)
lstm.append(training_time)
sns.set_style("darkgrid")

fig = plt.figure()
fig.subplots_adjust(hspace=0.2, wspace=0.2)

plt.subplot(1, 2, 1)
ax = sns.lineplot(x = original.index, y = original[0], label="Data", color='royalblue')
ax = sns.lineplot(x = predict.index, y = predict[0], label="Training Prediction (LSTM)", color='tomato')
ax.set_title('Stock price', size = 14, fontweight='bold')
ax.set_xlabel("Days", size = 14)
ax.set_ylabel("Cost (USD)", size = 14)
ax.set_xticklabels('', size=10)

plt.subplot(1, 2, 2)
ax = sns.lineplot(data=hist, color='royalblue')
ax.set_xlabel("Epoch", size = 14)
ax.set_ylabel("Loss", size = 14)
ax.set_title("Training Loss", size = 14, fontweight='bold')
fig.set_figheight(6)
fig.set_figwidth(16)
# shift train predictions for plotting
trainPredictPlot = np.empty_like(price)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lookback:len(y_train_pred)+lookback, :] = y_train_pred

# shift test predictions for plotting
testPredictPlot = np.empty_like(price)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(y_train_pred)+lookback-1:len(price)-1, :] = y_test_pred

original = scaler.inverse_transform(price['Close'].values.reshape(-1,1))

# stack train predictions, test predictions and the actual series side by side
predictions = np.append(trainPredictPlot, testPredictPlot, axis=1)
predictions = np.append(predictions, original, axis=1)
result = pd.DataFrame(predictions)
import plotly.express as px
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=result.index, y=result[0],
                         mode='lines',
                         name='Train prediction'))
fig.add_trace(go.Scatter(x=result.index, y=result[1],
                         mode='lines',
                         name='Test prediction'))
fig.add_trace(go.Scatter(x=result.index, y=result[2],
                         mode='lines',
                         name='Actual Value'))
fig.update_layout(
    xaxis=dict(
        showline=True,
        showgrid=True,
        showticklabels=False,
        linecolor='white',
        linewidth=2
    ),
    yaxis=dict(
        title_text='Close (USD)',
        titlefont=dict(
            family='Rockwell',
            size=12,
            color='white',
        ),
        showline=True,
        showgrid=True,
        showticklabels=True,
        linecolor='white',
        linewidth=2,
        ticks='outside',
        tickfont=dict(
            family='Rockwell',
            size=12,
            color='white',
        ),
    ),
    showlegend=True,
    template='plotly_dark'
)

annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                        xanchor='left', yanchor='bottom',
                        text='Results (LSTM)',
                        font=dict(family='Rockwell',
                                  size=26,
                                  color='white'),
                        showarrow=False))
fig.update_layout(annotations=annotations)

fig.show()

Result:
The following is the analysis of the given dataset: the plots compare the LSTM's train and test predictions with the actual closing prices.
Experiment 6
AIM: To perform predictive analysis using scikit-learn on the crop production
dataset.

Dataset: The dataset comprises information about particular crops based on the
parameters on which crop cultivation depends.

Code:
from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report
from sklearn import metrics
from sklearn import tree
import warnings
warnings.filterwarnings('ignore')

# the original omits these initialisations, but both lists are appended to below
acc = []    # accuracy of each classifier
model = []  # classifier names, used for the comparison plot

# assumes Google Drive is already mounted in Colab, as in the previous experiment
df = pd.read_csv('/content/drive/MyDrive/CropDataset/Crop_recommendation.csv')
df.head()
features = df[['N', 'P','K','temperature', 'humidity', 'ph', 'rainfall']]
target = df['label']
labels = df['label']
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(features, target, test_size=0.2, random_state=2)
from sklearn.tree import DecisionTreeClassifier

DecisionTree = DecisionTreeClassifier(criterion="entropy", random_state=2, max_depth=5)

DecisionTree.fit(Xtrain,Ytrain)

predicted_values = DecisionTree.predict(Xtest)
x = metrics.accuracy_score(Ytest, predicted_values)
acc.append(x)
model.append('Decision Tree')
print("DecisionTrees's Accuracy is: ", x*100)

print(classification_report(Ytest,predicted_values))

#-----------------------------------------
from sklearn.naive_bayes import GaussianNB

NaiveBayes = GaussianNB()

NaiveBayes.fit(Xtrain,Ytrain)
predicted_values = NaiveBayes.predict(Xtest)
x = metrics.accuracy_score(Ytest, predicted_values)
acc.append(x)
model.append('Naive Bayes')
print("Naive Bayes's Accuracy is: ", x)

print(classification_report(Ytest,predicted_values))

#----------------------------------------------------------
from sklearn.svm import SVC

SVM = SVC(gamma='auto')

SVM.fit(Xtrain,Ytrain)

predicted_values = SVM.predict(Xtest)

x = metrics.accuracy_score(Ytest, predicted_values)
acc.append(x)
model.append('SVM')
print("SVM's Accuracy is: ", x)

print(classification_report(Ytest,predicted_values))

#---------------------------------------------------
from sklearn.linear_model import LogisticRegression

LogReg = LogisticRegression(random_state=2)

LogReg.fit(Xtrain,Ytrain)

predicted_values = LogReg.predict(Xtest)

x = metrics.accuracy_score(Ytest, predicted_values)
acc.append(x)
model.append('Logistic Regression')
print("Logistic Regression's Accuracy is: ", x)

print(classification_report(Ytest,predicted_values))

#----------------------------------------------------

from sklearn.ensemble import RandomForestClassifier

RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(Xtrain,Ytrain)

predicted_values = RF.predict(Xtest)

x = metrics.accuracy_score(Ytest, predicted_values)
acc.append(x)
model.append('RF')
print("RF's Accuracy is: ", x)
print(classification_report(Ytest,predicted_values))

#------------------------------------------------------

plt.figure(figsize=[10,5],dpi = 100)
plt.title('Accuracy Comparison')
plt.xlabel('Accuracy')
plt.ylabel('Algorithm')
sns.barplot(x = acc,y = model,palette='dark')
data = np.array([[104,18, 30, 23.603016, 60.3, 6.7, 140.91]])
prediction = RF.predict(data)
print(prediction)

Result:
The following is the analysis of the given dataset: an accuracy comparison across the five classifiers and a sample prediction from the random forest model.
Experiment 7
AIM: To perform predictive analysis using the Orange analytics platform on corn
production in the United States.

Dataset: The data consist of overall corn production together with parameters
such as area harvested, area planted, and yield as columns of the dataset.
Below is a screenshot of the dataset.

Orange screenshots:
Orange is an open-source data visualization and analysis tool in which data
mining is done through visual programming or Python scripting. The tool has
components for machine learning, add-ons for bioinformatics and text mining,
and is packed with features for data analytics.

For this dataset we use a linear regression model to predict the values; Orange
provides plenty of built-in ML models to start with. The following are snapshots
of the Orange working window, followed by a scripted sketch of the same idea.
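Orange wires this model up visually; the following is a minimal scripted sketch of the same linear regression idea using scikit-learn, where the file name and column names are hypothetical stand-ins for the corn dataset's actual headers:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# hypothetical file and column names standing in for the real dataset
df = pd.read_csv('corn_production.csv')
X = df[['area_planted', 'area_harvested', 'yield']]
y = df['production']

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=2)
reg = LinearRegression().fit(Xtrain, ytrain)
print('R^2 on held-out data:', r2_score(ytest, reg.predict(Xtest)))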
Results

The following is the analysis of the given dataset.


Experiment 8
AIM: To perform predictive analysis using the RapidMiner analytics platform on
the weather dataset of India.

Dataset: The dataset is based on the monthly weather report of India over the
years. Below is a snapshot of the data used.

RapidMiner Screenshots

We select the analysis type as predictive here; the other option is to cluster
the data into groups.
After selecting the predictive model type, we choose the target column; in
other words, we divide the dataset into features and a label.

Here we select the features and check for any missing or corrupt data in our
dataset.
Here we select the models on which we want to train our dataset. As we are
focused on predictive analysis, we choose Generalized Linear Model, Decision
Tree, Random Forest, and Gradient Boosted Trees; a scripted sketch of this
comparison is given below.
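RapidMiner runs this model comparison automatically; the following is a rough Python sketch of the same idea using scikit-learn, where the model choices mirror the report, LinearRegression stands in for the Generalized Linear Model, and the file and column names are hypothetical:

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

df = pd.read_csv('india_monthly_weather.csv')  # hypothetical file name
X = df.drop(columns=['target'])                # placeholder label column
y = df['target']

models = {
    'Generalized Linear Model': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(random_state=0),
    'Random Forest': RandomForestRegressor(random_state=0),
    'Gradient Boosted Trees': GradientBoostingRegressor(random_state=0),
}
# score each model by cross-validated RMSE; lower is better
for name, m in models.items():
    rmse = -cross_val_score(m, X, y, cv=5,
                            scoring='neg_root_mean_squared_error').mean()
    print(f'{name}: RMSE = {rmse:.3f}')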

Results
As we can observe from the screenshot below, the Generalized Linear Model
works best here, as it gives the lowest error rate and standard deviation.
Here we can see how strongly our label depends on the various features of our
dataset.

Here we can simulate various features of our model and simultaneously see the
factors on which our prediction depends.

In the graph below we can observe that most of the predicted values are close
to the true values, so we can state that our model is capable of predicting new
values correctly.
BIODATA

Name : Radha Mohan

Mobile Number : 7667913028

E-mail : radha.mohan2018@vitstudent.ac.in
