
SCHOOL OF ELECTRONICS ENGINEERING

Course: ECE3502 (IoT Domain Analyst)


Experiment/Task No.: 01
Date: 15-04-2023
Name of the student: B. Karthik Reddy
Regd. No.: 20BEC0512

1. Aim of the Experiment/Simulation:

Perform the following things and predict using Time series analysis (Write the code
using Python and explain every step) [4 marks]

1. Plot and visualize the data (First and last 5 rows)


2. Evaluate and plot the Rolling Statistics (mean and standard deviation)
3. Check stationarity of the dataset (Dickey-Fuller Test, Augmented Dickey-Fuller Test)
4. Make the data stationary by
(a) taking Log
(b) subtracting Rolling Average
(c) subtracting Exponential Rolling Average
(d) subtracting previous values
(e) Seasonal Decomposition
5. Make the prediction of passengers for the next 8 years using the ARIMA model and check the
confidence interval.
6. Check ARIMA (2,1,0), (0,1,2), (2,1,2)

2. Write a python code to identify the clusters by applying the k-means algorithm. Represent the
Clusters and centroids after each passthrough k-means algorithm. Data Set: Income. [3 marks]

3. Explain Decision Tree Algorithm in Machine learning using an example. Mention the dataset
link and python programming details in your answer sheet. [3 marks]

Name of the Simulation Platform: PYTHON

Programs and Outputs:


(Code can be pasted directly from the IDE, with your Name & Regd. No. highlighted)
(Snapshot of the Output should be pasted after the code)
1. Plot and visualize the data (First and last 5 rows)

The first and last 5 rows have been printed using the head() and tail() commands.
The data has then been visualised using the matplotlib plot command.
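A minimal sketch of this step is given below. It assumes the classic AirPassengers CSV with "Month" and "#Passengers" columns; the file name and column names are assumptions and may need to be adjusted to the actual dataset used.

# Minimal sketch: load the data, print the first/last 5 rows and plot the series.
# "AirPassengers.csv", "Month" and "#Passengers" are assumed names.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")

print(df.head())   # first 5 rows
print(df.tail())   # last 5 rows

df["#Passengers"].plot(figsize=(10, 4), title="Monthly Air Passengers")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.show()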
2. Evaluate and plot the Rolling Statistics (mean and standard deviation)
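A short sketch of this step follows, reusing the same assumed AirPassengers CSV; the 12-month window is an assumption that matches the monthly sampling of the data.

# Sketch: 12-month rolling mean and standard deviation plotted against the data.
import pandas as pd
import matplotlib.pyplot as plt

ts = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")["#Passengers"]

rolling_mean = ts.rolling(window=12).mean()
rolling_std = ts.rolling(window=12).std()

plt.plot(ts, label="Original")
plt.plot(rolling_mean, label="Rolling mean (12)")
plt.plot(rolling_std, label="Rolling std (12)")
plt.legend()
plt.title("Rolling Statistics")
plt.show()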

3. Check stationarity of the dataset (Dickey-Fuller Test, Augmented Dickey-Fuller Test)

The p-value is greater than 0.05, so we cannot reject the null hypothesis. This suggests that the
time series is non-stationary.
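A sketch of the Augmented Dickey-Fuller test is shown below; the helper function adf_test is introduced here only for illustration (it is not part of any library) and the same assumed CSV is used.

# Sketch: Augmented Dickey-Fuller test using statsmodels.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_test(series):
    # Run the ADF test and print the statistic, p-value and critical values.
    result = adfuller(series.dropna(), autolag="AIC")
    print(f"ADF statistic : {result[0]:.4f}")
    print(f"p-value       : {result[1]:.4f}")
    for key, value in result[4].items():
        print(f"Critical value ({key}): {value:.4f}")

ts = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")["#Passengers"]
adf_test(ts)   # p-value > 0.05 here, as reported above, so the series looks non-stationary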
4. Make the data stationary by (a) taking Log.

The p-value is greater than 0.05, indicating that the null hypothesis of non-stationarity cannot be rejected.
The time series is still non-stationary.
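A short sketch of the log transform under the same file and column name assumptions.

# Sketch: log-transform the series and re-run the ADF test.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

ts = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")["#Passengers"]
ts_log = np.log(ts)   # the log compresses the growing variance but keeps the trend

print("p-value:", adfuller(ts_log.dropna(), autolag="AIC")[1])   # still > 0.05, as reported above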
(b) Make the data stationary by subtracting Rolling Average
The p-value is less than 0.05, indicating that the null hypothesis of non-stationarity can be rejected. The
time series is now stationary.
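A sketch of this step under the same assumptions; the 12-month rolling mean is subtracted from the log series and the NaN values produced by the first window are dropped.

# Sketch: subtract the 12-month rolling mean from the log series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

ts_log = np.log(pd.read_csv("AirPassengers.csv",
                            parse_dates=["Month"], index_col="Month")["#Passengers"])

rolling_mean = ts_log.rolling(window=12).mean()
ts_detrended = (ts_log - rolling_mean).dropna()   # the first 11 values are NaN

print("p-value:", adfuller(ts_detrended, autolag="AIC")[1])   # < 0.05, as reported above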

(c) Make the data stationary by subtracting Exponential Rolling Average

The p-value is greater than 0.05, indicating that the null hypothesis of non-stationarity cannot be rejected.
The time series is still non-stationary.
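A sketch of this step; the 12-month half-life used for the exponentially weighted average is an assumption.

# Sketch: subtract the exponentially weighted moving average from the log series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

ts_log = np.log(pd.read_csv("AirPassengers.csv",
                            parse_dates=["Month"], index_col="Month")["#Passengers"])

ewma = ts_log.ewm(halflife=12).mean()
ts_minus_ewma = ts_log - ewma

print("p-value:", adfuller(ts_minus_ewma.dropna(), autolag="AIC")[1])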

(d) Make the data stationary by subtracting previous values


The p-value is greater than 0.05, indicating that the null hypothesis of non-stationarity cannot be rejected.
The time series is still non-stationary.
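A sketch of first-order differencing (subtracting each previous value) under the same assumptions.

# Sketch: first-order differencing of the log series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

ts_log = np.log(pd.read_csv("AirPassengers.csv",
                            parse_dates=["Month"], index_col="Month")["#Passengers"])

ts_diff = ts_log.diff().dropna()   # each value minus the previous value

print("p-value:", adfuller(ts_diff, autolag="AIC")[1])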
(e) Make the data stationary by Seasonal Decomposition

The p-value is very small, indicating that the null hypothesis of non-stationarity can be rejected. The residual
component is now stationary.
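A sketch of the seasonal decomposition step; model="additive" on the log series and period=12 are assumptions that match monthly data (older statsmodels versions use freq instead of period).

# Sketch: decompose the log series and test the residual component.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

ts_log = np.log(pd.read_csv("AirPassengers.csv",
                            parse_dates=["Month"], index_col="Month")["#Passengers"])

decomposition = seasonal_decompose(ts_log, model="additive", period=12)
residual = decomposition.resid.dropna()   # trend and seasonality removed

print("p-value:", adfuller(residual, autolag="AIC")[1])   # very small, as reported above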

5. Make the prediction of passengers for the next 8 years using the ARIMA model and check the
confidence interval.
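A sketch of the forecasting step is given below. The model is fitted on the log series and the (2,1,2) order is an assumption; 8 years of monthly data corresponds to 96 forecast steps, and a 95% confidence interval is drawn around the forecast.

# Sketch: ARIMA forecast for the next 8 years (96 monthly steps) with a 95% CI.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

ts = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")["#Passengers"]
ts_log = np.log(ts)

model = ARIMA(ts_log, order=(2, 1, 2))   # order is an assumption
fitted = model.fit()

forecast = fitted.get_forecast(steps=96)
mean_log = forecast.predicted_mean
ci_log = forecast.conf_int(alpha=0.05)

# Convert back from the log scale to passenger counts before plotting
plt.plot(ts, label="Observed")
plt.plot(np.exp(mean_log), label="Forecast")
plt.fill_between(ci_log.index,
                 np.exp(ci_log.iloc[:, 0]),
                 np.exp(ci_log.iloc[:, 1]),
                 alpha=0.3, label="95% confidence interval")
plt.legend()
plt.show()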
6. Check ARIMA (2,1,0), (0,1,2), (2,1,2)

The results show that the ARIMA(0,1,2) model has the lowest AIC value, indicating that it provides the best fit of the three for this data.
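A short sketch of the comparison is given below; each of the three requested orders is fitted on the log series and the AIC values are printed.

# Sketch: compare ARIMA(2,1,0), ARIMA(0,1,2) and ARIMA(2,1,2) by AIC.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

ts_log = np.log(pd.read_csv("AirPassengers.csv",
                            parse_dates=["Month"], index_col="Month")["#Passengers"])

for order in [(2, 1, 0), (0, 1, 2), (2, 1, 2)]:
    fitted = ARIMA(ts_log, order=order).fit()
    print(f"ARIMA{order}: AIC = {fitted.aic:.2f}")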
2. Write a python code to identify the clusters by applying the k-means algorithm. Represent the
Clusters and centroids after each passthrough k-means algorithm. Data Set: Income. [3 marks]

STEP-1: Importing the necessary libraries.

STEP-2: Loading the dataset.

STEP-3: Plotting the data points.

STEP-4: Performing k-means clustering and plotting the clusters and centroids for three different passes of the algorithm (see the sketch below).
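A minimal sketch of these steps is given below. It assumes an "income.csv" file with "Age" and "Income($)" columns (the file name and column names are assumptions); limiting max_iter lets the clusters and centroids be shown after each pass of the algorithm.

# Sketch: k-means on the Income dataset, showing clusters and centroids per pass.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv("income.csv")
X = df[["Age", "Income($)"]]      # assumed column names

# STEP-3: plot the raw data points
plt.scatter(X["Age"], X["Income($)"])
plt.xlabel("Age")
plt.ylabel("Income($)")
plt.title("Raw data")
plt.show()

# STEP-4: limit max_iter so the centroids can be shown after each pass
for n_passes in (1, 2, 3):
    km = KMeans(n_clusters=3, init="random", n_init=1,
                max_iter=n_passes, random_state=0)
    labels = km.fit_predict(X)
    plt.scatter(X["Age"], X["Income($)"], c=labels)
    plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
                marker="*", s=200, c="red", label="Centroids")
    plt.title(f"Clusters after {n_passes} pass(es)")
    plt.legend()
    plt.show()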
3. Explain Decision Tree Algorithm in Machine learning using an example. Mention the dataset link
and python programming details in your answer sheet. [3 marks]

A decision tree is a popular machine learning algorithm used for both classification and regression tasks. It is a tree-like
model that makes decisions based on the input features to arrive at a prediction. Each internal node in the decision
tree corresponds to a feature, while the edges represent the decision rules based on that feature. The leaf nodes
represent the output or the predicted value.

Here's an example of a dataset from Kaggle that we can use to demonstrate the decision tree algorithm:

We will use the famous Iris dataset that contains measurements for 150 iris flowers from three different species. The
dataset can be found at https://www.kaggle.com/uciml/iris

Here are the columns in the dataset:


SepalLengthCm: sepal length in cm
SepalWidthCm: sepal width in cm
PetalLengthCm: petal length in cm
PetalWidthCm: petal width in cm
Species: the species of the iris flower, which can be 'Iris-setosa', 'Iris-versicolor', or 'Iris-virginica'

We will use Python and the scikit-learn library to implement the decision tree algorithm.

STEP-1: Importing the necessary libraries and loading the dataset.

STEP-2: Splitting the dataset into features and target variables.

STEP-3: Splitting the dataset into training and testing sets.

STEP-4: Creating the decision tree classifier and fitting it on the training data.

STEP-5: Using the trained classifier to make predictions on the testing data.

STEP-6: Finally, evaluating the performance of the classifier using various metrics (a combined sketch of these steps is given below).
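A combined sketch of STEP-1 to STEP-6 follows, assuming the Kaggle Iris CSV is saved locally as "Iris.csv" (the file name is an assumption); the column names follow the dataset listing above.

# Sketch: decision tree classifier on the Iris dataset (steps 1-6).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# STEP-1: load the dataset
df = pd.read_csv("Iris.csv")

# STEP-2: split into features and target
X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
y = df["Species"]

# STEP-3: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# STEP-4: create the decision tree classifier and fit it on the training data
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# STEP-5: make predictions on the testing data
y_pred = clf.predict(X_test)

# STEP-6: evaluate the classifier
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))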

SAMPLE PREDICTIONS :

B. Karthik Reddy
Signature of the Student
