
Experiment: 5

Student Name: Mohammad Arslan Mansoori    UID: 21BCS11169
Student Name: Yared Alemayehu             UID: 21BCS10676
Student Name: Yogesh Joshi                UID: 21BCS11520
Student Name: Md. Sanaulla                UID: 21BCS10953
Student Name: Naman Gupta                 UID: 21BCS6373

Branch: Computer Science & Engineering    Section/Group: PH21AML-129
Semester: 1st                             Date of Performance:
Subject Name: Disruptive Technology       Subject Code: 21ECP-102

1. Aim of the practical: Understand supervised learning by training and developing a
classifier model using PyCaret.

2. Tool Used: Google Colab.

3. Basic Concept/ Command Description: In this experiment we installed PyCaret, imported
one of its predefined datasets, and trained a classifier model on that dataset.
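
To make the idea of supervised learning concrete, here is a minimal sketch that learns from labelled rows and predicts the label of a new row. It uses a 1-nearest-neighbour rule written in plain Python, not PyCaret, and the function name and toy data are hypothetical and chosen only for illustration.

```python
import math

def nearest_neighbour_predict(train_rows, train_labels, query):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    best_index = min(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )
    return train_labels[best_index]

# Hypothetical toy data: two numeric features per row and one known label per row.
train_rows = [(1.0, 1.0), (1.5, 2.0), (8.0, 9.0), (9.0, 8.5)]
train_labels = [0, 0, 1, 1]

print(nearest_neighbour_predict(train_rows, train_labels, (2.0, 1.5)))  # 0
print(nearest_neighbour_predict(train_rows, train_labels, (8.5, 9.5)))  # 1
```

The labelled rows play the role of the training data, and the query row plays the role of unseen data whose class the model has to guess.
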
4. Code with screenshot of output:

1.1) Installing PyCaret

!pip install pycaret &> /dev/null


print("Pycaret installed")

Pycaret installed

1.2) Getting the version of PyCaret

from pycaret.utils import version


version()

'2.3.5'
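
The version matters because code written for one PyCaret release may not run unchanged on another. As a hedged alternative to the call above, the sketch below reads the installed version through the standard library without importing PyCaret itself. The helper names `installed_version` and `version_tuple` are our own, and `version_tuple` assumes a plain dotted version string such as '2.3.5'.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

def version_tuple(version_string):
    """Turn a plain dotted version such as '2.3.5' into (2, 3, 5) for comparison."""
    return tuple(int(part) for part in version_string.split("."))

found = installed_version("pycaret")
if found is None:
    print("pycaret is not installed")
else:
    print("pycaret", found)

# Tuples compare element by element, so 2.3.5 sorts before 3.0.0.
print(version_tuple("2.3.5") < version_tuple("3.0.0"))  # True
```
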

2) Classification: Basics

2.1) Loading Dataset from PyCaret

from pycaret.datasets import get_data

2.2) Get the list of all the datasets available in PyCaret

dataset = get_data('index') #This will give the index of all the available datasets in PyCaret

    Dataset                           Data Types    Default Task                 Target Variable 1  Target Variable 2
0   anomaly                           Multivariate  Anomaly Detection            None               None
1   france                            Multivariate  Association Rule Mining      InvoiceNo          Description
2   germany                           Multivariate  Association Rule Mining      InvoiceNo          Description
3   bank                              Multivariate  Classification (Binary)      deposit            None
4   blood                             Multivariate  Classification (Binary)      Class              None
5   cancer                            Multivariate  Classification (Binary)      Class              None
6   credit                            Multivariate  Classification (Binary)      default            None
7   diabetes                          Multivariate  Classification (Binary)      Class variable     None
8   electrical_grid                   Multivariate  Classification (Binary)      stabf              None
9   employee                          Multivariate  Classification (Binary)      left               None
10  heart                             Multivariate  Classification (Binary)      DEATH              None
11  heart_disease                     Multivariate  Classification (Binary)      Disease            None
12  hepatitis                         Multivariate  Classification (Binary)      Class              None
13  income                            Multivariate  Classification (Binary)      income >50K        None
14  juice                             Multivariate  Classification (Binary)      Purchase           None
15  nba                               Multivariate  Classification (Binary)      TARGET_5Yrs        None
16  wine                              Multivariate  Classification (Binary)      type               None
17  telescope                         Multivariate  Classification (Binary)      Class              None
18  titanic                           Multivariate  Classification (Binary)      Survived           None
19  us_presidential_election_results  Multivariate  Classification (Binary)      party_winner       None
20  glass                             Multivariate  Classification (Multiclass)  Type               None
21  iris                              Multivariate  Classification (Multiclass)  species            None
22  poker                             Multivariate  Classification (Multiclass)  CLASS              None
23  questions                         Multivariate  Classification (Multiclass)  Next_Question      None
24  satellite                         Multivariate  Classification (Multiclass)  Class              None
25  CTG                               Multivariate  Classification (Multiclass)  NSP                None
26  asia_gdp                          Multivariate  Clustering                   None               None
27  elections                         Multivariate  Clustering                   None               None
28  facebook                          Multivariate  Clustering                   None               None
29  ipl                               Multivariate  Clustering                   None               None
30  jewellery                         Multivariate  Clustering                   None               None
31  mice                              Multivariate  Clustering                   None               None
32  migration                         Multivariate  Clustering                   None               None
33  perfume                           Multivariate  Clustering                   None               None
34  pokemon                           Multivariate  Clustering                   None               None
35  population                        Multivariate  Clustering                   None               None
36  public_health                     Multivariate  Clustering                   None               None
37  testseeds                         Multivariate  Clustering                   None               None
38  wholesale                         Multivariate  Clustering                   None               None
39  tweets                            Text          NLP                          tweet              None
(output continues)

2.3 Get Diabetes dataset

dataset_of_diabetes = get_data("diabetes") #This will give the dataset of 'diabetes'
print(type(dataset_of_diabetes))

First rows of the output, with the columns listed top to bottom (the last visible
column header is cut off at "(ye"):

Column                                                                     Row 0   Row 1   Row 2
Number of times pregnant                                                   6       1       8
Plasma glucose concentration a 2 hours in an oral glucose tolerance test   148     85      183
Diastolic blood pressure (mm Hg)                                           72      66      64
Triceps skin fold thickness (mm)                                           35      29      0
2-Hour serum insulin (mu U/ml)                                             0       0       0
Body mass index (weight in kg/(height in m)^2)                             33.6    26.6    23.3
Diabetes pedigree function                                                 0.627   0.351   0.672

3) Random Forest Algorithm and Confusion Matrix

3.1) Using the Random Forest algorithm, with a confusion matrix to measure the
performance of the classification.

from pycaret.classification import *

s = setup(data=dataset_of_diabetes, target='Class variable', silent=True)


rfModel = create_model('rf')
plot_model(rfModel, plot='confusion_matrix')
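
The plot summarises how often the predicted class agrees with the actual class. As a rough sketch of what a binary confusion matrix counts, the plain-Python example below tallies the four cells by hand. The label lists are hypothetical and are not taken from the diabetes run, and the function names are our own.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical labels, for illustration only.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)            # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(accuracy(counts))  # 0.75
```

The diagonal cells (tp and tn) are the correct predictions, so a good classifier concentrates its counts there.
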

3.2) A few other model IDs accepted by create_model:

'ada' - Ada Boost Classifier


'dt' - Decision Tree Classifier
'et' - Extra Trees Classifier
'gbc' - Gradient Boosting Classifier
'knn' - K Neighbors Classifier
'lightgbm' - Light Gradient Boosting Machine
'lda' - Linear Discriminant Analysis
'lr' - Logistic Regression
'nb' - Naive Bayes
'qda' - Quadratic Discriminant Analysis
'rf' - Random Forest Classifier
'ridge' - Ridge Classifier
'svm' - SVM - Linear Kernel
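
Since every ID in this list is passed to the same function, several models can be trained in a loop. The sketch below shows one possible way to do that. `train_models` is a hypothetical helper of our own, and `fake_create_model` is a stand-in so the example runs without PyCaret installed.

```python
def train_models(model_ids, create_fn):
    """Call `create_fn(model_id)` for each ID and collect the results in a dict."""
    return {model_id: create_fn(model_id) for model_id in model_ids}

# Stand-in for PyCaret's create_model, used only so this sketch is self-contained.
def fake_create_model(model_id):
    return f"trained<{model_id}>"

models = train_models(["rf", "dt", "lr"], fake_create_model)
print(models["dt"])  # trained<dt>
```

In the notebook, after `setup(...)` has run, the same helper could presumably be called as `train_models(['rf', 'dt', 'lr'], create_model)` to get one trained model per ID.
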

3.3) Save the trained model

save = save_model(rfModel, 'rfModelFile')

Transformation Pipeline and Model Successfully Saved
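
A saved model is only useful if it can be loaded again. The sketch below outlines how that might look with `load_model` and `predict_model` from `pycaret.classification`. It assumes that `save_model` wrote the file with a `.pkl` extension and that `load_model` takes the name without that extension. The helper names are our own, and `load_and_predict` needs PyCaret and the saved file to be present.

```python
def saved_model_filename(model_name):
    """File name expected on disk, assuming save_model appended '.pkl'."""
    return model_name + ".pkl"

def load_and_predict(model_name, new_data):
    """Reload a saved pipeline and score new rows (requires PyCaret)."""
    # Imported here so the helper above still works without PyCaret installed.
    from pycaret.classification import load_model, predict_model

    pipeline = load_model(model_name)  # name without the file extension
    return predict_model(pipeline, data=new_data)

print(saved_model_filename("rfModelFile"))  # rfModelFile.pkl
```

In the notebook this could be used as `load_and_predict('rfModelFile', dataset_of_diabetes)`, which scores the same rows the model was trained on and is therefore only a smoke test, not an evaluation.
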


A few other operations on the dataset index

dataset.columns
#Gives all the columns

Index(['Dataset', 'Data Types', 'Default Task', 'Target Variable 1',
       'Target Variable 2', '# Instances', '# Attributes', 'Missing Values'],
      dtype='object')

dataset.size #Gives the size of the data (number of rows x number of columns)

448
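
The size of a DataFrame is its number of cells, that is, rows multiplied by columns. As a small worked check, the snippet below recovers the number of rows from the size of 448 and the eight column names printed above.

```python
# Column names copied from the Index(...) output above.
columns = ['Dataset', 'Data Types', 'Default Task', 'Target Variable 1',
           'Target Variable 2', '# Instances', '# Attributes', 'Missing Values']
size = 448  # value reported by .size

# size = rows * columns, so dividing by the column count gives the row count.
rows, remainder = divmod(size, len(columns))
print(len(columns), rows, remainder)  # 8 56 0
```

So the index lists 56 datasets, each described by 8 attributes.
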
5. Additional Creative Inputs (If Any):
Learning outcomes (What I have learnt):

1. We learnt to install PyCaret.
2. We learnt to use Google Colab.
3. We learnt to load a dataset from PyCaret.

Evaluation Grid (To be filled by Faculty):


Sr. No.  Parameters                                        Marks Obtained  Maximum Marks
1.       Worksheet completion including writing learning                   10
         objectives/Outcomes. (To be submitted at the
         end of the day)
2.       Post Lab Quiz Result.                                             5
3.       Student Engagement in Simulation/Demonstration/                   5
         Performance and Controls/Pre-Lab Questions.
         Signature of Faculty (with Date):                 Total Marks     20
                                                           Obtained:
