Lab Manual Computer Science & Engineering

DATA MINING LAB
Lab Manual
Computer Science & Engineering
Heni.R.Vyas-190305105729
Practical 1
Aim: Perform preprocessing on a dataset. Apply various filters and discuss
the effect of each filter applied.
A. Handle missing values.
B. Handle infrequent nominal values.
C. Derive an attribute from an existing attribute.
About the dataset:
We use the Weather dataset from Kaggle for this task.
Data source: https://www.kaggle.com/c/weather
Data dictionary:
Variable     Definition              Key

survival     Survival                0 = No, 1 = Yes
pclass       weather
outlook      Outlook                 Sunny, Rainy, Overcast
Temperature  Temperature in Celsius
Humidity     Humidity
Windy        Windy
Play         Play
Open the Weather dataset in the Weka tool:
Task A: Handle missing values
Missing values: Missing data are values that are not recorded in a dataset. A
single value may be missing from a single cell, or an entire observation (row)
may be missing. Missing data can occur in both continuous and categorical
variables.
Step 1: Four missing values are created in the dataset.
Step 2: A missing value is found.

Step 3: Another missing value is found.
Step 4: Apply a filter to check for missing values.

Step 5: Apply the remove-missing filter to the dataset to remove the rows that
contain missing values.
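The effect of Steps 4 and 5 can be sketched outside Weka in a few lines of Python. This is only an illustration of the idea (count the missing cells, then drop every row that has one); the rows are made up, and the helper names count_missing and drop_incomplete are hypothetical, not Weka functions.

```python
# Illustrative rows only; these values are made up, not taken from the lab's dataset.
# None marks a missing cell (Weka displays a missing cell as "?").
rows = [
    {"outlook": "sunny", "temperature": 85, "humidity": 85, "windy": False, "play": "no"},
    {"outlook": None, "temperature": 80, "humidity": 90, "windy": True, "play": "no"},
    {"outlook": "overcast", "temperature": None, "humidity": 86, "windy": False, "play": "yes"},
    {"outlook": "rainy", "temperature": 70, "humidity": 96, "windy": False, "play": "yes"},
]


def count_missing(rows):
    """Return the number of missing cells per attribute."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def drop_incomplete(rows):
    """Keep only the rows in which every attribute has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


missing_counts = count_missing(rows)
complete_rows = drop_incomplete(rows)
print(missing_counts)
print(len(rows), "rows before,", len(complete_rows), "rows after")
```

Dropping rows is the simplest treatment; it loses data, so on a small dataset replacing the missing cell with a typical value may be preferable.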
Task B: Handle infrequent nominal values
Infrequent data: An infrequent nominal value is a category that occurs only
rarely in an attribute. In association mining, the MLMS (Multiple Level Minimum
Supports) model uses multiple levels of minimum support to discover infrequent
and frequent itemsets simultaneously. Infrequent itemsets are worth discovering
because they contain many valuable negative association rules.
Step 1: The dataset with infrequent nominal values.
Step 2: The infrequent nominal values of the outlook attribute are handled by
clicking on the outlook attribute.

Step 3: The infrequent nominal values of the play attribute are handled by
clicking on the play attribute.
Step 4: The infrequent nominal values of the windy attribute are handled by
clicking on the windy attribute.
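One common way to handle infrequent nominal values is to merge every rare category into a single label. The sketch below shows that idea in Python. The manual does not name the filter used, so this is an assumption about the general technique (similar in spirit to Weka's MergeInfrequentNominalValues filter), with made-up values and a hypothetical helper name.

```python
from collections import Counter


def merge_infrequent(values, min_count, merged_label="other"):
    """Replace every value that occurs fewer than min_count times with merged_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else merged_label for v in values]


# Illustrative values only; "foggy" is a made-up rare category.
outlook = ["sunny", "sunny", "rainy", "rainy", "rainy", "overcast", "foggy"]
merged = merge_infrequent(outlook, min_count=2)

print(Counter(outlook))
print(Counter(merged))
```

The threshold min_count is a choice for the analyst: set too high, it hides real categories; set too low, it leaves noise in the attribute.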
Task C: Derive an attribute from an existing attribute
Step 1: The dataset before applying the AddExpression filter.
Step 2: Derive an attribute from existing attributes with the AddExpression filter.
Step 3: After the AddExpression filter is applied, a new attribute named
temperature+humidity is created, derived from the temperature and humidity
attributes of the dataset.
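The derived attribute of Step 3 is simply the row-wise sum of two existing attributes. A minimal Python sketch of the same operation follows; the rows are made up and add_derived_attribute is a hypothetical helper, not part of Weka.

```python
# Illustrative rows only; the numbers are made up, not taken from the lab's dataset.
rows = [
    {"temperature": 85, "humidity": 85},
    {"temperature": 70, "humidity": 96},
    {"temperature": 64, "humidity": 65},
]


def add_derived_attribute(rows, name, expression):
    """Return new rows with one extra attribute computed from each row."""
    return [{**row, name: expression(row)} for row in rows]


derived = add_derived_attribute(
    rows,
    "temperature+humidity",
    lambda row: row["temperature"] + row["humidity"],
)
for row in derived:
    print(row)
```

The original rows are left unchanged, which mirrors the before and after views in Steps 1 and 3.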
Practical 2
Aim: Perform binning on the dataset.
Binning: Data binning, or bucketing, is a data preprocessing method used to
minimize the effects of small observation errors. The original data values are
divided into small intervals known as bins and are then replaced by a general
value calculated for that bin.
We use the Discretize filter to bin the value ranges in the dataset.

Step 1: Number of bins set to 5.
Result of binning in Step 1.
Step 2: Number of bins set to 2.

Step 3: Number of bins set to 8.
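To see what changing the number of bins does, the following Python sketch performs equal-width binning on a made-up temperature column with 5, 2, and 8 bins. It assumes equal-width intervals; the values and the helper name equal_width_bins are illustrative only.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index (0 .. n_bins - 1) of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land on index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Illustrative values only, not taken from the lab's dataset.
temperature = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]

for n_bins in (5, 2, 8):
    print(n_bins, "bins:", equal_width_bins(temperature, n_bins))
```

With fewer bins more values share a label and detail is lost; with more bins some intervals may end up empty.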

Practical 3
Aim: Perform clustering on your dataset.
Clustering: Clustering is an unsupervised machine learning technique that
arranges a group of data points into clusters so that the objects belong to the
same group ... Each of these subsets contains data points that are similar to
each other, and these subsets are called clusters.
Step 1: EM clustering applied to the dataset with “temperature” as the class.

Step 2: EM clustering applied to the dataset with “humidity” as the class.
Step 3: EM clustering applied to the dataset with “windy” as the class.

Step 4: EM clustering applied to the dataset with “play” as the class.
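The EM (expectation-maximization) idea behind these steps can be sketched for a single numeric attribute. The code below fits two Gaussian clusters to made-up humidity values. It is a simplified illustration under stated assumptions (one attribute, exactly two clusters, fixed iteration count), not a reproduction of Weka's EM clusterer; em_two_clusters is a hypothetical name.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_clusters(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(values)
    overall_mean = sum(values) / n
    start_var = sum((v - overall_mean) ** 2 for v in values) / n
    means = [min(values), max(values)]  # start the clusters at the extremes
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: probability that each value belongs to each cluster.
        resp = []
        for v in values:
            p = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate each cluster from its weighted members.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(spread, 1e-3)  # floor avoids a zero variance
            weights[k] = nk / n
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


# Illustrative values only, not taken from the lab's dataset.
humidity = [65, 70, 70, 75, 85, 86, 90, 90, 91, 95, 96]
means, variances, weights, labels = em_two_clusters(humidity)
print("cluster means:", [round(m, 1) for m in means])
print("labels:", labels)
```

Unlike a hard assignment method, EM keeps a probability for every point and cluster; the final labels simply pick the more probable cluster.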
Practical 4
Aim: Perform association rule mining on your dataset.
Association: Association rule mining, at a basic level, involves the use of
machine learning models to analyze data for patterns, or co-occurrences, in a
database. ... Association rules are created by searching data for frequent
if-then patterns and using the criteria of support and confidence to identify
the most important relationships.
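Step 1: Apply Apriori association to the dataset with 5 rules.

Result of Step 1:

Step 2: Apply Apriori association to the dataset with 14 rules.

Result of Step 2:

Step 3: Apply Apriori association to the dataset with 4 rules.

Result of Step 3:

Step 4: Apply Apriori association to the dataset with 18 rules.

Result of Step 4: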
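Support and confidence, the two criteria mentioned above, can be computed directly on a small example. The Python sketch below enumerates candidate itemsets by brute force (it does not implement Apriori's level-wise pruning) and keeps at most max_rules rules, loosely mirroring the rule-count setting changed in Steps 1 to 4. The transactions, thresholds, and function names are assumptions made for illustration.

```python
from itertools import combinations

# Illustrative transactions only; each row is written as attribute=value items.
transactions = [
    {"outlook=sunny", "windy=false", "play=no"},
    {"outlook=sunny", "windy=true", "play=no"},
    {"outlook=overcast", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=true", "play=no"},
    {"outlook=overcast", "windy=true", "play=yes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def find_rules(transactions, min_support, min_confidence, max_rules):
    """Brute-force rule search over itemsets of size 2 and 3."""
    items = sorted(set().union(*transactions))
    found = []
    for size in (2, 3):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            for consequent in itemset:
                antecedent = itemset - {consequent}
                confidence = s / support(antecedent, transactions)
                if confidence >= min_confidence:
                    found.append((sorted(antecedent), consequent, s, confidence))
    # Best rules first: highest confidence, then highest support.
    found.sort(key=lambda rule: (-rule[3], -rule[2]))
    return found[:max_rules]


top = find_rules(transactions, min_support=0.3, min_confidence=0.9, max_rules=5)
for antecedent, consequent, s, confidence in top:
    print(antecedent, "=>", consequent, "support", round(s, 2), "confidence", confidence)
```

Asking for more rules than pass the thresholds returns only the rules that exist, so the number of rules requested is an upper bound rather than a guarantee.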
Practical 5
Aim: Apply classifiers to your dataset.
Classifiers: A classifier is a supervised function (machine learning tool) in
which the learned (target) attribute is categorical (“nominal”). It is used
after the learning process to classify new records (data) by assigning them the
best target attribute value (prediction). Rows are classified into buckets.
Step 1: Apply the J48 classifier with cross-validation (cv = 12) on the survived column of the dataset.
Result of Step 1.
Visualize the results of Step 1.
Tree view:
Visualize classifier error:

Step 2: Apply the J48 classifier with a percentage split of 89% on the play column of the dataset.
Result of applying the J48 classifier with a percentage split of 89% on the play column.
Visualize the results of Step 2.
Tree view:
Visualize classifier error:

Step 3: Apply the J48 classifier with a percentage split of 55% on the “windy” column of the dataset.
Result of applying the J48 classifier with a percentage split of 55% on the “windy” column.
Visualize the results of Step 3.
Tree view:
Visualize classifier error:
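The two evaluation modes used above, cross-validation and percentage split, differ only in how rows are divided into training and test sets. The Python sketch below shows both divisions on 14 made-up rows. A majority-class guesser stands in for J48 so the example stays short; it is not a decision tree, the rows are not shuffled, and all names are hypothetical.

```python
from collections import Counter

# Illustrative labels only, not taken from the lab's dataset.
plays = ["no", "no", "yes", "yes", "yes", "no", "yes",
         "no", "yes", "yes", "yes", "yes", "yes", "no"]
rows = [{"play": p} for p in plays]


def percentage_split(rows, percent):
    """Use the first percent% of the rows for training and the rest for testing."""
    cut = round(len(rows) * percent / 100)
    return rows[:cut], rows[cut:]


def k_folds(rows, k):
    """Yield k (train, test) pairs; row i is tested in fold i % k."""
    for fold in range(k):
        test = [r for i, r in enumerate(rows) if i % k == fold]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        yield train, test


def baseline_accuracy(train, test, target):
    """Accuracy of always predicting the most common training label."""
    guess = Counter(r[target] for r in train).most_common(1)[0][0]
    return sum(r[target] == guess for r in test) / len(test)


train89, test89 = percentage_split(rows, 89)
train55, test55 = percentage_split(rows, 55)
folds = list(k_folds(rows, 12))

print("89% split:", len(train89), "train,", len(test89), "test")
print("55% split:", len(train55), "train,", len(test55), "test")
cv_scores = [baseline_accuracy(train, test, "play") for train, test in folds]
print("12-fold mean accuracy:", sum(cv_scores) / len(cv_scores))
```

On a dataset this small an 89% split leaves only a couple of test rows, so a single split can give a very unstable accuracy figure; cross-validation uses every row for testing exactly once.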

