
University Institute of Engineering

Department of Computer Science & Engineering

Experiment: 2.1

Student Name: Pankaj Kumar UID: 21BCS11537


Branch: Computer Science & Engineering Section/Group: 515 B
Semester: 1 Date of Performance: 08/01/2022
Subject Name: Disruptive Technology

1. Aim of the practical: To build a prediction model based on linear regression.
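The worksheet below lets PyCaret train the regression models automatically. For reference, the idea behind the simplest of them, ordinary least-squares linear regression with a single feature, can be sketched in plain Python. The function names and toy data are illustrative only and are not part of PyCaret.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared errors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, xs):
    """Apply the fitted line to new x values."""
    return [intercept + slope * x for x in xs]


# Toy data lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)                 # 1.0 2.0
print(predict(b0, b1, [5]))   # [11.0]
```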

2. Tools Used: Google Colab and Google Chrome

3. Code/simulation result screenshots:

3.1 Install PyCaret for regression


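# Note: the setup(..., silent=True) calls below follow the PyCaret 2.x API;
# the silent argument was removed in PyCaret 3.0.
!pip install pycaret &> /dev/null
print("PyCaret installed successfully")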

3.2 Get the installed PyCaret version


from pycaret.utils import version

version()

3.3 Import the dataset loader


from pycaret.datasets import get_data

3.4 Get the list of datasets available in PyCaret


datasets = get_data('index')

3.6 Get the Titanic dataset


titanicDataSet = get_data("titanic")

3.7 Read data from a local CSV file (alternative to get_data)


#import pandas as pd

#titanicDataSet = pd.read_csv("myFile.csv")

3.8 Import the regression module and set up the experiment


from pycaret.regression import *

s = setup(data=titanicDataSet, target='Survived', silent=True)



3.9 Train all available models and compare their performance


cm = compare_models()
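compare_models() trains every available regressor with cross-validation and ranks them in a table of error metrics (MAE, MSE, RMSE, R2, RMSLE and MAPE). The sketch below shows how four of those scores are computed from actual and predicted values; it is a plain-Python illustration with a hypothetical helper name, not PyCaret's implementation.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for two equal-length lists of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    # R2 is undefined when every actual value is identical
    r2 = 1 - ss_residual / ss_total if ss_total else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
```

For this toy input the errors are 1, 0 and -1, so MAE = MSE = 2/3 and R2 = 1 - 2/8 = 0.75.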

3.10 Three lines of code for model comparison for “Insurance” dataset
from pycaret.datasets import get_data

from pycaret.regression import *

insuranceDataSet = get_data("insurance")

s = setup(data = insuranceDataSet, target='charges', silent=True)

cm = compare_models()

3.11 Three lines of code for model comparison for “House” dataset
from pycaret.datasets import get_data

from pycaret.regression import *

houseDataSet = get_data("house")

s = setup(data = houseDataSet, target='SalePrice', silent=True)

cm = compare_models()

Regression: Advanced-1
3.12 Model performance using data normalisation
s = setup(data=titanicDataSet, target='Survived', normalize=True, normalize_method='zscore', silent=True)

cm = compare_models()
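With normalize_method='zscore', each numeric feature x is rescaled to z = (x - mean) / standard deviation, so that features measured on very different scales contribute comparably. A minimal plain-Python sketch of that rescaling follows; the helper name and the Fare-like sample values are illustrative assumptions, not taken from PyCaret or from the experiment's results.

```python
import statistics


def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


fares = [7.25, 71.28, 7.92, 53.1, 8.05]  # illustrative values only
scaled = zscore(fares)
print([round(z, 2) for z in scaled])
```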

3.13 Model performance using “Feature Selection”


s = setup(data = titanicDataSet, target = 'Survived', feature_selection = True, feature_selection_threshold = 0.9, silent=True)

cm = compare_models()
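feature_selection=True keeps only a subset of the input features, with feature_selection_threshold controlling how many survive. PyCaret's internal selection procedure is more involved; the sketch below assumes one simple reading of a threshold such as 0.9: rank features by an importance score and keep the top ones until their cumulative share of total importance reaches 90%. The function name, feature names and scores are hypothetical.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Return feature names, most important first, until their cumulative
    share of total importance reaches `threshold` (a value in (0, 1])."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances.values())
    # sorted() is stable, so ties keep their original order
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores, not results from this experiment
scores = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.0625, "e": 0.0625}
print(select_by_cumulative_importance(scores, 0.9))  # ['a', 'b', 'c', 'd']
```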

3.14 Model performance using outlier removal


s = setup(data=titanicDataSet, target='Survived', remove_outliers=True, outliers_threshold=0.05, silent=True)

cm = compare_models()
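outliers_threshold=0.05 asks PyCaret to drop about 5% of the training rows as outliers. PyCaret's own detector works on a decomposition of the whole training matrix; the sketch below swaps in a simpler rule for illustration: drop the given fraction of rows whose value in one numeric column has the largest absolute z-score. The helper name and data are hypothetical.

```python
import statistics


def drop_top_fraction_outliers(rows, key, fraction=0.05):
    """Drop roughly `fraction` of the rows: those whose `key` value is farthest
    from the column mean, measured in standard deviations (z-score)."""
    if not rows:
        return []
    values = [r[key] for r in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # constant column: avoid dividing by zero
    n_drop = round(len(rows) * fraction)
    # Row indices ordered from most to least extreme
    ranked = sorted(range(len(rows)),
                    key=lambda i: abs(values[i] - mean) / std,
                    reverse=True)
    dropped = set(ranked[:n_drop])
    return [r for i, r in enumerate(rows) if i not in dropped]


# 20 illustrative rows: 19 ordinary fares and one extreme value
rows = [{"Fare": 10.0} for _ in range(19)] + [{"Fare": 500.0}]
kept = drop_top_fraction_outliers(rows, "Fare", 0.05)
print(len(kept))  # 19
```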

3.15 Model performance using “Transformation”


s = setup(data=titanicDataSet, target='Survived', transformation=True, transformation_method='yeo-johnson', silent=True)

cm = compare_models()
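transformation_method='yeo-johnson' applies the Yeo-Johnson power transform, which makes skewed features more Gaussian-like and, unlike Box-Cox, also accepts zero and negative values. PyCaret estimates the parameter λ for each column from the data; in the sketch below λ is simply passed in, an assumption made to keep it short. The function name is illustrative.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of a single value for a fixed lambda."""
    if y >= 0:
        if lam != 0:
            return ((y + 1) ** lam - 1) / lam
        return math.log1p(y)          # limit case lambda = 0
    if lam != 2:
        return -((1 - y) ** (2 - lam) - 1) / (2 - lam)
    return -math.log1p(-y)            # limit case lambda = 2


print([round(yeo_johnson(v, 0.5), 4) for v in [-3.0, 0.0, 3.0]])
```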

3.16 Model Performance using “PCA”


s = setup(data = titanicDataSet, target = 'Survived', pca = True, pca_method = 'linear', silent=True)

cm = compare_models()
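With pca=True and pca_method='linear', setup() replaces the original features with linear principal components, i.e. directions of greatest variance in the data. The sketch below is limited, as an assumption for brevity, to two numeric features and one component; for a 2x2 covariance matrix the leading eigenvector lies at angle θ = ½·atan2(2·s_xy, s_xx - s_yy). The function name and points are illustrative.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (direction, scores): a unit vector along the direction of greatest
    variance and the 1-D coordinate of each centred point along it."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    return direction, scores


direction, scores = pca_2d([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction)  # about (0.447, 0.894): the points lie on y = 2x
```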

3.17 Model performance using “Outlier Removal” + “Normalization”


s = setup(data=titanicDataSet, target='Survived', remove_outliers=True, outliers_threshold=0.05, normalize=True, normalize_method='zscore', silent=True)

cm = compare_models()

3.18 Model performance using “Outlier Removal” + “Normalization” + “Transformation”


s = setup(data=titanicDataSet, target='Survived', remove_outliers=True, outliers_threshold=0.05, normalize=True, normalize_method='zscore', transformation=True, transformation_method='yeo-johnson', silent=True)

cm = compare_models()

Regression: Advanced-2
3.19 Build a single model: Random Forest
from pycaret.datasets import get_data

from pycaret.regression import *

titanicDataSet = get_data("titanic")

s = setup(data = titanicDataSet, target = 'Survived', silent = True)

rfModel = create_model('rf')

3.20 Save the trained model


sm = save_model(rfModel, 'rfModel')
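save_model() writes the trained model, together with PyCaret's preprocessing pipeline, to a pickle file (here rfModel.pkl in the working directory), and load_model() reads it back. The same save/restore round trip can be sketched with Python's standard pickle module. The helper names and the dictionary standing in for a model are hypothetical; note that unpickling files from untrusted sources is unsafe.

```python
import os
import pickle
import tempfile


def save_object(obj, name, folder):
    """Pickle `obj` to <folder>/<name>.pkl and return the file path."""
    path = os.path.join(folder, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_object(name, folder):
    """Load an object previously written by save_object."""
    with open(os.path.join(folder, name + ".pkl"), "rb") as f:
        return pickle.load(f)


# Stand-in "model": hypothetical fitted coefficients, not a real pipeline
model = {"intercept": 1.0, "slope": 2.0}
with tempfile.TemporaryDirectory() as folder:
    save_object(model, "rfModel", folder)
    restored = load_object("rfModel", folder)
print(restored == model)  # True
```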

3.21 Load the saved model


rfModel = load_model('rfModel')

3.22 Prepare a new dataset (here, the first 10 rows of the Titanic data)


newDataSet = get_data("titanic").iloc[:10]

3.23 Make predictions on the new dataset


newPredictions = predict_model(rfModel, data = newDataSet)

newPredictions

3.24 Scatter plot of actual vs. predicted values


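import matplotlib.pyplot as plt

# Select columns by name rather than position: in PyCaret 2.x, predict_model()
# appends the prediction as a 'Label' column (named 'prediction_label' in 3.x),
# while the true value stays in the target column 'Survived'.
actual = newPredictions['Survived']
predicted = newPredictions['Label']

plt.scatter(actual, predicted)
plt.xlabel('Actual')      # x axis holds the actual values
plt.ylabel('Predicted')   # y axis holds the predicted values
plt.title('Actual vs Predicted')
plt.savefig("result-scatter-plot.jpg", dpi=300)
plt.show()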

3.25 Save the prediction results to CSV


newPredictions.to_csv('MyPrediction.csv')

Plot the Model

3.26 Create a Random Forest (or any other) model



rf = create_model('rf')

3.27 Plot Error (Scatter Plot)


plot_model(rf, plot='error')

3.28 Plot Learning Curve



plot_model(rf, plot='learning')

3.29 Plot Validation Curve


plot_model(rf, plot='vc')

Feature Importance

3.30 Feature Importance using Random Forest


rfModel = create_model('rf', verbose=False)

plot_model(rfModel, plot='feature')
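plot_model(..., plot='feature') charts importance scores exposed by the fitted model; for tree ensembles such as Random Forest these are typically impurity-based. The sketch below swaps in a different, model-agnostic technique, permutation importance: shuffle one feature column at a time and measure how much the mean squared error rises. The function names, toy model and data are hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean squared error when each feature column is shuffled.

    predict: function mapping one feature row (a list) to a number."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild rows with column j replaced by its shuffled values
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(shuffled) - baseline)
    return importances


def toy_model(row):
    """Hypothetical fitted model that only uses feature 0."""
    return 3 * row[0]


rows = [[x, (x * 7) % 5] for x in range(20)]
targets = [3 * x for x in range(20)]
print(permutation_importance(toy_model, rows, targets, 2))
```

Feature 1, which the toy model ignores, gets a score of exactly 0.0, while shuffling feature 0 raises the error.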

3.31 Feature Importance using Extra Trees Regressor


etModel = create_model('et', verbose=False)

plot_model(etModel, plot='feature')

3.32 Feature Importance using Decision Tree


dtModel = create_model('dt', verbose=False)

plot_model(dtModel, plot='feature')

4. Additional Creative Inputs (If Any):

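Learning outcomes (What I have learnt):

1. Working with the PyCaret package.

2. Various ways of improving data quality before modelling: normalisation, feature selection, outlier removal, transformation and PCA.

3. Working with PyCaret's regression module.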

Evaluation Grid (To be filled by Faculty):

Sr. No. | Parameters | Marks Obtained | Maximum Marks
1. | Worksheet completion including writing learning objectives/outcomes (to be submitted at the end of the day) | | 10
2. | Post Lab Quiz Result | | 5
3. | Student Engagement in Simulation/Demonstration/Performance and Controls/Pre-Lab Questions | | 5
 | Total Marks Obtained | | 20

Signature of Faculty (with Date):
