19CSE304 - FOUNDATIONS OF DATA SCIENCE

LAB ASSIGNMENT-1
INTRODUCTION TO WEKA EXPLORER

CH SURYA UMA SHANKAR

CH.EN.U4CSE19101

1. Load diabetes.arff dataset.

a. What is the smallest, largest and average age of the patients in the
diabetes dataset?
The smallest age is 21, the largest is 81, and the average is 33.241.

b. Pre-process the dataset to remove instances more than 50 years old.

Report what filters were used to achieve this. Show screenshot of how the data
changed at each step.
Use the RemoveWithValues filter: under Filter, choose filters > unsupervised > instance > RemoveWithValues. Set attributeIndex to 8 (the age attribute) and invertSelection to True. Change splitPoint to 51. Click Apply to remove all instances more than 50 years old.
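The selection rule described in this step can be sketched in plain Python. This is only an illustration of the effect stated above (with the split point at 51 and the selection inverted, instances aged 51 or more are dropped). The function name and the sample ages are hypothetical; this is not WEKA code and the values are not taken from diabetes.arff.

```python
def remove_at_or_above(ages, split_point):
    """Keep only values strictly below split_point (hypothetical helper)."""
    return [age for age in ages if age < split_point]


# Made-up sample ages, not taken from diabetes.arff
sample_ages = [21, 33, 50, 51, 67, 81]
kept = remove_at_or_above(sample_ages, split_point=51)
print(kept)  # only ages of 50 and below remain
```

A split point of 51 rather than 50 is what keeps patients who are exactly 50 in the dataset.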
c. Compare the number of values for age attribute before and after applying
the filters.
Before applying the filter, the age attribute has 52 distinct values; after applying it, 29 distinct values remain.

2. Load credit-g.arff dataset.

a. Run J48 classifier with default properties on this dataset and report the
accuracy. Comment about the misclassification and the confusion matrix.

Accuracy is 70.5%. For class good, 588 instances were classified correctly and 112 incorrectly. For class bad, more instances were classified incorrectly (183) than correctly (117), which is a poor result.

➢ Classification of class bad is not accurate.
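As a quick check, the accuracy and the per-class hit rates can be recomputed from the four counts quoted above. This is a minimal sketch; the variable names are illustrative and the numbers are simply those reported in the answer.

```python
# Counts quoted in the answer above (grouped by actual class)
good_correct, good_wrong = 588, 112
bad_correct, bad_wrong = 117, 183

total = good_correct + good_wrong + bad_correct + bad_wrong
accuracy = (good_correct + bad_correct) / total

# Fraction of each actual class that was classified correctly
good_rate = good_correct / (good_correct + good_wrong)
bad_rate = bad_correct / (bad_correct + bad_wrong)

print(f"accuracy  = {accuracy:.3f}")
print(f"good rate = {good_rate:.3f}")
print(f"bad rate  = {bad_rate:.3f}")
```

The overall accuracy hides how weak the classifier is on class bad, where fewer than half of the instances are recognised.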

b. Run the 'InfoGainAttribute' evaluator using the 'Ranker' search method and find out which are the most irrelevant attributes.
Open the Select attributes tab, choose InfoGainAttributeEval as the attribute evaluator, and select Ranker as the search method.

➢ num_dependents, installment_commitment, residence_since, and existing_credits are the most irrelevant attributes.
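To show what ranking by information gain means, here is a minimal Python sketch of the measure on a tiny made-up dataset. It handles nominal values only, and the attribute names and data are hypothetical, not taken from credit-g.arff. It is not WEKA's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(attribute_values, labels):
    """Class entropy minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [lab for v, lab in zip(attribute_values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Toy data: one attribute separates the classes, the other does not
labels = ["good", "good", "bad", "bad"]
informative = ["a", "a", "b", "b"]
irrelevant = ["x", "y", "x", "y"]

gains = {
    "informative": info_gain(informative, labels),
    "irrelevant": info_gain(irrelevant, labels),
}
# Rank attributes from highest to lowest gain
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

Attributes at the bottom of such a ranking, with gain close to zero, are the candidates for removal.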

c. After removing the most irrelevant attributes, run J48 classifier again and
comment on the results obtained.
Accuracy has increased to 72% compared to the previous result. For class good, 586 instances were classified correctly and 114 incorrectly. For class bad, 166 instances were classified correctly and 134 incorrectly, which is much better than before. The misclassification rate has decreased.
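The two runs can be compared directly from the counts quoted in answers (a) and (c). This is a small sketch with illustrative names. Note that the counts quoted for the second run add up to 752 correct out of 1000, which would be 75.2% rather than the 72% stated, so one of the two figures may have been misrecorded.

```python
# (correct, incorrect) per actual class, as quoted in answers (a) and (c)
before = {"good": (588, 112), "bad": (117, 183)}
after = {"good": (586, 114), "bad": (166, 134)}


def summarize(counts):
    """Return total correct, total instances, and per-class hit rate."""
    correct = sum(c for c, _ in counts.values())
    total = sum(c + w for c, w in counts.values())
    per_class = {name: c / (c + w) for name, (c, w) in counts.items()}
    return correct, total, per_class


for label, counts in (("before", before), ("after", after)):
    correct, total, per_class = summarize(counts)
    print(label, correct, total, {k: round(v, 3) for k, v in per_class.items()})
```

Most of the change comes from class bad, while class good is almost unchanged.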

3. Load weather.nominal dataset. Remove instances in the dataset where humidity attribute has high value.
After removing the instances where humidity is high, the number of instances dropped from 14 to 7.
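The same count can be reproduced in a few lines of Python. The humidity column below is typed in from the weather.nominal dataset as it is commonly distributed with WEKA. Treat it as an assumption and check it against your own copy of the file.

```python
# Humidity values of the 14 instances (assumed copy of weather.nominal)
humidity = [
    "high", "high", "high", "high", "normal", "normal", "normal",
    "high", "normal", "normal", "normal", "high", "normal", "high",
]

# Drop every instance whose humidity is "high"
remaining = [h for h in humidity if h != "high"]
print(len(humidity), "->", len(remaining))
```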

4. Run J48 classifier on the glass dataset.

a. What is the top node on the decision tree?
Ba is the top node of the decision tree.

b. How many instances of the tableware class were misclassified? What are
the misclassified instances?
Two instances of the tableware class were misclassified: instances 171 and 192.
c. Comment about true positives and false positives of 'build wind float' class.
The 'build wind float' class has 50 true positives and 25 false positives.
TP rate = 0.714 and FP rate = 0.174.
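The two rates follow from the counts once the class size and the dataset size are known. The sketch below assumes 70 'build wind float' instances out of 214 in total. These two numbers are not stated in the answer and should be checked against the dataset summary in WEKA.

```python
tp, fp = 50, 25      # counts quoted in the answer above
class_size = 70      # assumption: instances whose actual class is 'build wind float'
total = 214          # assumption: instances in the glass dataset

# TP rate: share of the class that was recognised
tp_rate = tp / class_size
# FP rate: share of all other instances wrongly assigned to the class
fp_rate = fp / (total - class_size)

print(round(tp_rate, 3), round(fp_rate, 3))
```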

d. From the decision tree, come up with a rule to identify headlamps.

If Ba > 0.27 and Si > 70.16, the instance is classified as headlamps.
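Written as code, the rule is a simple predicate. The function name and the sample values are hypothetical; only the two thresholds come from the rule above.

```python
def is_headlamp(ba, si):
    """Rule read off the decision tree: Ba > 0.27 and Si > 70.16."""
    return ba > 0.27 and si > 70.16


# Made-up measurements, for illustration only
print(is_headlamp(ba=0.50, si=72.00))  # both conditions hold
print(is_headlamp(ba=0.10, si=72.00))  # Ba is too low
```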

e. From the dataset, remove the instances that were misclassified.

Under Filter, select RemoveMisclassified. Right-click on the filter, choose J48 as the classifier, and click OK. Then click Apply.

The number of instances is reduced to 194 after applying the filter.
