
DATA MINING LAB

Lab Manual

Computer Science & Engineering

Heni.R.Vyas-190305105729
Practical 1
Aim: Perform preprocessing on a dataset. Apply various filters and discuss
the effect of each filter applied.
A. Handle missing values.
B. Handle infrequent nominal values.
C. Derive an attribute from the existing attribute.

About dataset: -
We are using the Weather dataset from Kaggle in our
task. Data source: https://www.kaggle.com/c/weather

Data Dictionary: -

Variable      Definition               Key

survival      Survival                 0 = No, 1 = Yes
pclass        weather
outlook                                Sunny, Rainy, Overcast
Temperature   Temperature in Celsius
Humidity      Humidity
Windy         Windy
Play          Play

Open the Weather dataset in the Weka tool:
Task A: Handle missing values
Missing value: - Missing data are values that are not recorded in a dataset. They can be
a single value missing from a single cell or an entire missing observation (row). Missing
data can occur in both continuous and categorical variables.

Step 1: 4 missing values are created in the data set.

Step 2: A missing value is found.


Step 3: Another missing value is found.

Step 4: Applying a filter to check for missing values.


Step 5: Apply the remove missing filter to the dataset to remove the rows with missing values.
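
The effect of Task A can be sketched outside Weka in a few lines of Python. This is a minimal illustration, not the Weka filter itself: the rows are made-up toy values in the spirit of the weather data, `None` stands in for a missing cell, and the helper names `count_missing` and `drop_rows_with_missing` are hypothetical.

```python
# Toy rows; the values are illustrative only, not the lab's actual data.
# None marks a missing cell.
rows = [
    {"outlook": "sunny", "temperature": 85, "humidity": 85, "windy": False, "play": "no"},
    {"outlook": None, "temperature": 80, "humidity": 90, "windy": True, "play": "no"},
    {"outlook": "overcast", "temperature": None, "humidity": 86, "windy": False, "play": "yes"},
    {"outlook": "rainy", "temperature": 70, "humidity": 96, "windy": False, "play": "yes"},
]


def count_missing(rows):
    """Return {attribute: number of missing cells} (hypothetical helper)."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def drop_rows_with_missing(rows):
    """Keep only the rows in which every cell has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


missing = count_missing(rows)
clean = drop_rows_with_missing(rows)
print(missing)
print(len(rows), "rows before,", len(clean), "rows after")
```

Dropping rows is the simplest option; it loses the other values recorded in each removed row, which matters on a small data set.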
Task B: Handle Infrequent Nominal values.
Infrequent Data: - The MLMS (Multiple Level Minimum Supports) model, which
uses multiple level minimum supports to discover infrequent itemsets and
frequent itemsets simultaneously, is proposed in our previous work. The reason
to discover infrequent itemsets is that there are many valued negative
association rules in them.
Step 1: The data set with infrequent nominal values.

Step 2: Handled the infrequent nominal values of the outlook attribute by clicking on the outlook attribute.


Step 3: Handled the infrequent nominal values of the play attribute by clicking on the play attribute.

Step 4: Handled the infrequent nominal values of the windy attribute by clicking on the windy attribute.
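
One common way to handle infrequent nominal values is to merge every value that occurs fewer than a chosen number of times into a single catch-all category. The sketch below shows that idea only; it is an assumption about the technique, not a description of the filter used in the steps above. The `min_count` threshold, the `"other"` label, and the `"foggy"` value are all hypothetical.

```python
from collections import Counter


def merge_infrequent(values, min_count, merged_label="other"):
    """Replace every value seen fewer than min_count times with merged_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else merged_label for v in values]


# Toy outlook column; "foggy" is a made-up rare value for illustration.
outlook = ["sunny", "sunny", "rainy", "rainy", "rainy", "overcast", "foggy"]
merged = merge_infrequent(outlook, min_count=2)

print(Counter(outlook))
print(Counter(merged))
```

After merging, the attribute has fewer distinct values, each backed by enough rows to be useful to a learner.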
Task C: Derive an attribute from the existing attribute.

Step 1: Before applying the AddExpression filter.

Step 2: Deriving an attribute from existing attributes using the AddExpression filter.
Step 3: After applying the AddExpression filter, a new attribute named
temperature+humidity is created, derived from the temperature and humidity attributes of the
data set.
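
The derived attribute in Step 3 is a row-wise sum of two existing attributes. A minimal Python sketch of the same derivation follows; the rows are illustrative toy values and the helper name `add_expression` is hypothetical, chosen only to echo the filter's purpose.

```python
# Toy rows; the values are illustrative only.
rows = [
    {"temperature": 85, "humidity": 85},
    {"temperature": 80, "humidity": 90},
    {"temperature": 83, "humidity": 86},
]


def add_expression(rows, name, expression):
    """Return copies of the rows with a new attribute computed per row."""
    return [{**row, name: expression(row)} for row in rows]


derived = add_expression(
    rows,
    "temperature+humidity",
    lambda r: r["temperature"] + r["humidity"],
)

for row in derived:
    print(row)
```

The original rows are left unchanged, so the before and after views from Step 1 and Step 3 can be compared side by side.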
Practical 2
Aim: Perform binning on the dataset.
Binning: Data binning, or bucketing, is a data pre-processing method used to minimize
the effects of small observation errors. The original data values are divided into small
intervals known as bins, and each value is then replaced by a general value calculated for
its bin.

We use the Discretize filter to bin the value ranges in the data set.


Step 1: Dataset bin range set to 5.
Result of binning step 1.

Step 2: Dataset bin range set to 2.


Result of binning step 2.

Step 3: Dataset bin range set to 8.


Result of binning step 3.
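
The effect of changing the number of bins can be reproduced with a short equal-width binning sketch. It assumes equal-width bins (the range between the minimum and maximum is cut into intervals of the same size) and uses made-up temperature values; the function name `equal_width_bins` is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0] * len(values)
    labels = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value sits on the upper edge; keep it in the last bin.
        labels.append(min(index, n_bins - 1))
    return labels


# Toy temperature column; the values are illustrative only.
temperature = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]

for n in (5, 2, 8):
    print(n, "bins:", equal_width_bins(temperature, n))
```

Fewer bins give coarser groups with more rows each; more bins keep more detail but can leave some bins empty on a small data set.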
Practical 3
Aim: Perform clustering on your data set.
Clustering: - Clustering is an unsupervised machine learning technique that organizes a
group of data points into clusters so that similar objects belong to the same group. Each of these
subsets contains data similar to each other, and these subsets are called clusters.

Step 1: Applied EM clustering on class “temperature” on our data set.


Step 2: Applied EM clustering on class “humidity” on our data set.

Step 3: Applied EM clustering on class “windy” on our data set.


Step 4: Applied EM clustering on class “play” in our data set.
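
To show what EM (expectation-maximization) clustering does, here is a minimal sketch that fits a mixture of two one-dimensional Gaussians. It is a simplified illustration under stated assumptions, not Weka's implementation: the number of clusters is fixed at two, only one numeric attribute is used, and the humidity values are made up.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(values, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    # Deterministic start: one mean at the minimum, one at the maximum.
    means = [float(min(values)), float(max(values))]
    overall = sum(values) / len(values)
    var0 = sum((x - overall) ** 2 for x in values) / len(values)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: how strongly each component "claims" each value.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numeric underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component from its weighted values.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # keep the variance away from zero
            weights[k] = nk / len(values)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels


# Toy humidity column with two visible groups; values are illustrative only.
humidity = [64, 65, 66, 68, 70, 90, 91, 93, 95, 96]
means, labels = em_1d(humidity)
print("cluster means:", [round(m, 1) for m in means])
print("cluster labels:", labels)
```

Unlike a hard assignment, EM gives every value a probability of belonging to each cluster; the labels printed here simply pick the more probable one.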
Practical 4
Aim: Perform association rule mining on your data set.
Association: - Association rule mining, at a basic level, involves the use of machine
learning models to analyze data for patterns, or co-occurrences, in a
database. Association rules are created by searching data for frequent if-then
patterns and using the criteria support and confidence to identify the most important
relationships.

Step 1: Applying Apriori association on our data set with 5 rules.


Result of Step 1:

Step 2: Applying Apriori association on our data set with 14 rules.


Result of Step 2:

Step 3: Applying Apriori association on our data set with 4 rules.


Result of Step 3:

Step 4: Applying Apriori association on our data set with 18 rules.


Result of Step 4:
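
The support and confidence criteria behind the Apriori results can be made concrete with a small sketch. It is a simplified level-wise search written for illustration, not Weka's implementation; the transactions are toy rows encoded as `attribute=value` items, and the thresholds 0.3 and 0.9 are arbitrary choices.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            # Support = fraction of rows that contain every item of the candidate.
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Build size-k candidates by joining frequent itemsets of size k-1.
        current = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                # Confidence = support(lhs and rhs) / support(lhs).
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), support, confidence))
    return found


# Toy transactions; the rows are illustrative only.
transactions = [
    {"outlook=sunny", "windy=false", "play=no"},
    {"outlook=sunny", "windy=true", "play=no"},
    {"outlook=overcast", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=true", "play=no"},
    {"outlook=overcast", "windy=true", "play=yes"},
]

frequent = apriori(transactions, min_support=0.3)
found = rules(frequent, min_confidence=0.9)
for lhs, rhs, support, confidence in found:
    print(sorted(lhs), "=>", sorted(rhs), round(support, 2), round(confidence, 2))
```

Lowering either threshold lets more rules through, which is one reason the number of reported rules varies between runs with different settings.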
Practical 5
Aim: Apply classifiers to your data set.
Classifiers: - A classifier is a supervised function (machine learning tool) where the
learned (target) attribute is categorical (“nominal”). It is used after
the learning process to classify new records (data) by giving them the best target
attribute (prediction). Rows are classified into buckets.
Step 1: Applying the J48 classifier with cv=12 (12-fold cross-validation) on the survived column in our data set.

Result of Step 1.
Visualize results of Step 1.
Tree view:

Visualize classifier error:


Step 2: Applying the J48 classifier with percentage split = 89% on the play column in our data set.

Result of applying the J48 classifier with percentage split = 89% on the play column.
Visualize Result of Step 2.
Trees:

Visualize classification error:


Step 3: Applying the J48 classifier with percentage split = 55% on the “windy” column in our data set.

Result of applying the J48 classifier with percentage split = 55% on the “windy” column.
Visualize Result of Step 3.
Tree:

Visualize classification error:
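
The two evaluation modes used in this practical, cross-validation and percentage split, differ only in how rows are divided between training and testing. The sketch below illustrates that division on row indices; it does not train a tree, does not stratify by class, and assumes a 14-row data set purely for illustration. The function names and the `seed` parameter are hypothetical.

```python
import random


def percentage_split(n_rows, train_percent, seed=1):
    """Shuffle the row indices and cut them into a training and a test part."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_percent / 100)
    return indices[:cut], indices[cut:]


def k_fold(n_rows, k, seed=1):
    """Yield (train, test) index lists; each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


n_rows = 14  # assumed size, for illustration only

train, test = percentage_split(n_rows, 89)
print("89% split:", len(train), "train rows,", len(test), "test rows")

train55, test55 = percentage_split(n_rows, 55)
print("55% split:", len(train55), "train rows,", len(test55), "test rows")

folds = list(k_fold(n_rows, 12))
print("12-fold CV test sizes:", [len(t) for _, t in folds])
```

With so few rows, an 89% split leaves only a couple of rows for testing, so the reported accuracy can swing widely; cross-validation uses every row for testing once and gives a steadier estimate.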
