External PPT - Animesh Singh-200301120038

CENTURION UNIVERSITY OF TECHNOLOGY AND MANAGEMENT 2020-2024
Data warehouse and data mining
Let's get visual with the experiment!
GUIDED BY
DR. SANGRAM KESHARI SWAIN
Team
1 Animesh Singh-200301120038
2 Avinash Kumar-200301120002
3 Abhishek Raj-200301120025
4 Aditya Raj-200301120020
5 Vishal Mandal-200301120055
Experiment 1
Aim - Demonstration of preprocessing on dataset student.arff
Result - We have successfully preprocessed and discretized the student data set.
LET'S BEGIN!
What is data preprocessing?
Data preprocessing is the process of converting raw data into a clean, usable form so that useful information can be derived from it.
What is data discretization?
Data discretization is a method of converting a large number of data values into a smaller number of intervals (bins) so that the evaluation and management of data become easier.
What is the difference between correlation and association?
Association means that one variable provides information about another. Correlation is the narrower case in which two variables show an increasing or decreasing trend together. For ex -
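As one illustration (a minimal Python sketch; the numbers are made up for this example and do not come from the experiments), y = x * x is completely determined by x, so the two are associated, yet their Pearson correlation is zero because y does not rise or fall steadily with x. Doubling x, in contrast, gives a perfectly increasing trend.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data on a symmetric range.
xs = [-2, -1, 0, 1, 2]
squares = [x * x for x in xs]   # depends on x, but has no overall up or down trend
doubles = [2 * x for x in xs]   # depends on x and increases with it

r_squares = pearson(xs, squares)   # association without correlation
r_doubles = pearson(xs, doubles)   # association with correlation
```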
First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for student.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the student.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Open file' and choose the .arff file to analyse and display the preprocessed student data.
Visualisation of dataset
On the right side, click the 'Visualize All' button to see the whole data set visualised.
Discretization
Choose the age attribute. In the filter settings, set the attribute index to 1 and the number of bins to 3, then click OK and apply the filter. This produces a new working relation in which the attribute is partitioned into three bins.
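What binning into three intervals means can be sketched in a few lines of Python. This is only an illustration of equal-width binning, not Weka's own code: the function name and the ages are hypothetical, and the handling of values that fall exactly on a bin boundary is an assumption that may differ from Weka's filter.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if width == 0:
            labels.append(0)  # all values identical: everything shares one bin
            continue
        index = int((v - lo) / width)
        labels.append(min(index, bins - 1))  # the maximum value goes in the last bin
    return labels


# Hypothetical ages; the range 18..45 is cut into three intervals of width 9.
ages = [18, 20, 25, 31, 34, 40, 45]
age_bins = equal_width_bins(ages, 3)
```

Each age is replaced by the index of its interval, so seven distinct numbers collapse into three categories.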
Experiment 2
Aim - Demonstration of preprocessing on dataset labor.arff
Result - We have successfully preprocessed and discretized the labor data set.
LET'S BEGIN!

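First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for labor.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the labor.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Open file' and choose the .arff file to analyse and display the preprocessed labor data.
Visualisation of dataset
On the right side, click the 'Visualize All' button to see the whole data set visualised.
Discretization
Choose the age attribute. Set the attribute index to 1 and the number of bins to 1, then click OK to apply the filter. This produces a new working relation in which the attribute is partitioned into one bin.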
Experiment 3
Aim - Demonstration of association rule process on dataset contactlenses.arff using the Apriori algorithm
Result - This program has been successfully executed.
LET'S BEGIN!
Association rule
Association rules in data mining are used to identify interesting relationships or patterns within large datasets. These rules are mainly applied in transactional databases where transactions consist of items purchased by customers. The goal of association rule mining is to discover interesting relationships between different items in the data.
Association rule mining is widely used in various applications such as market basket analysis, recommendation systems, and inventory management.

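First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for contact lenses.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the contactlenses.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Associate' to analyse the contact lens data.
Output
On the left side, click the Start button to see the association rules generated when the Apriori algorithm is applied to the given dataset.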
Experiment 4
Aim - Demonstration of association rule process on dataset test.arff using the Apriori algorithm
Result - This program has been successfully executed.
LET'S BEGIN!

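First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for test.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the test.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Open file' and choose the .arff file to analyse and display the preprocessed test data.
Associate
On the left side, click the Start button to see the Apriori algorithm applied to the test data.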
Experiment 5
Aim - Demonstration of classification rule process on dataset student.arff using the J48 algorithm
Result - We have successfully classified the student dataset using the J48 algorithm.
LET'S BEGIN!

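First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for student.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the student.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Classify' to analyse the student data.
Generating
On the left side, click the Start button to see the classification rules generated when the J48 algorithm is applied to the given dataset.
Tree view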
Experiment 6
Aim - Demonstration of classification rule process on dataset employee.arff using the J48 algorithm
Result - We have successfully classified the employee dataset using the J48 algorithm.
LET'S BEGIN!



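First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for employee.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the employee.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Open file' and choose the .arff file to analyse and display the preprocessed employee data.
Generating
On the left side, click the Start button to see the classification rules generated when the J48 algorithm is applied to the given dataset.
Tree view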
Experiment 7
Aim - Demonstration of association rule process on dataset test.arff using the Apriori algorithm
Result - We have successfully generated association rules using the Apriori algorithm.
LET'S BEGIN!
7: Demonstration of association rule process on dataset contactlenses.arff using the Apriori algorithm.
• The Apriori algorithm is used to derive association rules between objects.
• It is an influential algorithm that is widely used in data mining and association rule learning.
• It identifies the frequent itemsets in a dataset and generates association rules from those itemsets.
• An association rule describes how two or more objects are related to one another.
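The frequent-itemset step described above can be sketched in plain Python. This is a minimal illustration of the Apriori idea and not Weka's implementation; the function name, the shopping baskets, and the use of an absolute count as the support threshold are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its k-1 item subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if every smaller itemset inside it is already known to be frequent.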


Associate
• Clicking the Associate tab brings up the interface for the association rule algorithms.
• We will use the Apriori algorithm, which is the default.
• To change the parameters for the run (for example support and confidence), we click the text box immediately to the right of the Choose button.
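The two parameters named above can be made concrete with a short Python sketch. It uses the standard definitions of support and confidence; the function names and the baskets are hypothetical and are not taken from Weka or from the experiment data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule "butter => bread": how often it applies, and how often it holds when it applies.
rule_support = support(baskets, {"butter", "bread"})
rule_confidence = confidence(baskets, {"butter"}, {"bread"})
```

Raising the minimum support discards rules that rest on few transactions; raising the minimum confidence discards rules that often fail when their left-hand side is present.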
Experiment 8
Aim - Demonstration of classification rule process on dataset employee.arff using the J48 algorithm
Result - We have successfully classified the employee dataset using the J48 algorithm.
LET'S BEGIN!
8: Demonstration of classification rule process on dataset employee.arff using the J48 algorithm
• The J48 algorithm is a classification algorithm that produces decision trees based on information theory.
• J48 is Weka's implementation of C4.5 (the J stands for Java), which is an extension of Ross Quinlan's earlier ID3 algorithm.
• The decision trees generated by C4.5 are used for classification, and for this reason C4.5 is often referred to as a statistical classifier.
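The information-theoretic idea behind these trees can be sketched as follows. The snippet computes entropy and information gain, the measure used by ID3; C4.5 refines it into a gain ratio, which is not shown here. The function names and the toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute_index, label_index=-1):
    """Entropy reduction obtained by splitting `rows` on one nominal attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute_index]].append(row[label_index])
    all_labels = [row[label_index] for row in rows]
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(all_labels) - remainder


# Hypothetical toy rows: (outlook, windy, play)
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "yes"),
    ("rainy", "no", "no"),
    ("rainy", "yes", "no"),
]
gain_outlook = information_gain(rows, 0)  # outlook separates the classes completely
gain_windy = information_gain(rows, 1)    # windy says nothing about the class
```

A tree builder of this kind places the attribute with the highest score at the root and repeats the calculation on each branch.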


Associate
In notepad
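@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
% The data rows contain a middle column that the header did not declare; the name "salary" is assumed.
@attribute salary {10k, 15k, 17k, 20k, 25k, 32k, 35k}
@attribute performance {good, avg, poor}
@data
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 35k, good
48, 32k, good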
STEP 4:
Under 'Test options' in the main panel, we select 10-fold cross-validation as our evaluation approach.
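How a 10-fold split partitions the data can be sketched as below. This is a plain, unshuffled split written for illustration; it is not a reproduction of Weka's own procedure, and the function name and the instance count of 11 are assumptions.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists for k folds, without shuffling."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_instances) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical data set of 11 instances evaluated with 10 folds.
folds = list(k_fold_indices(11, 10))
```

Every instance is used for testing exactly once and for training in the other nine rounds, and the reported accuracy is computed over all ten test folds.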
STEP 5:
We now click 'Start' to generate the model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right panel when the model construction is complete.
Experiment 9
Aim - Demonstration of clustering rule process on dataset iris.arff using simple k-means
Result - We have successfully demonstrated the clustering rule process on dataset iris.arff using simple k-means.
LET'S BEGIN!
Step 1
Open the Weka Explorer and, in the preprocessing interface, load the required dataset. Here we use the iris.arff dataset.
Step 2
Find the 'Cluster' tab in the Explorer and press the Choose button to pick a clustering algorithm. A drop-down list of the available clustering algorithms appears; select the simple k-means algorithm.
Step 3
Then click the text box to the right of the Choose button to bring up the pop-up window shown in the screenshots. In this window we enter 3 for the number of clusters and leave the seed value unchanged. The seed is used to generate the random numbers that determine the initial assignment of instances to clusters.
Step 4
Once the options have been specified, we check the settings in the 'Cluster mode' panel before running the clustering algorithm. The 'Use training set' option is selected, and then the 'Start' button is pressed. The screenshots below display the process and the resulting window.
Step 5
The centroid of each cluster is shown in the result window, along with statistics on the number and percentage of instances allocated to each cluster. Each cluster centroid is represented by a mean vector. This centroid can be used to describe the cluster.
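The role of the seed, the centroids, and the cluster sizes can be sketched with a small k-means written in plain Python. It is an illustration under stated assumptions rather than Weka's implementation: the function names, the default seed value, and the six two-dimensional points are hypothetical.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, seed=10, max_iterations=100):
    """Lloyd's k-means on tuples of numbers; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed fixes the random starting centroids
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean vector of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical two-dimensional points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
cluster_sizes = Counter(assignments)  # number of instances allocated to each cluster
```

Changing the seed changes the starting centroids, which is why different seeds can lead to different final clusters on less clearly separated data.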
Step 6
Another way to grasp the characteristics of each cluster is to visualize them. To do so, right-click the result set in the result list and select 'Visualize cluster assignments'.
Experiment 10
Aim - Demonstration of clustering rule process on dataset student.arff using simple k-means
Result - We have successfully demonstrated the clustering rule process on dataset student.arff using simple k-means.
LET'S BEGIN!

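First we have to load the dataset
Open Start > Programs > Accessories > Notepad and type the training data set for student.
Saving the .arff file
Next, save the file in .arff format and minimize it.
Open the .arff file in Weka
With the .arff file minimized, open Start > Programs > weka-3-4. The dialog box that appears offers four applications; click Explorer and open the student.arff file in Weka.
Load the dataset for DPP
While still on the Preprocess page, click 'Open file' and choose the .arff file to analyse and display the preprocessed student data.
Visualisation of dataset
On the right side, click the 'Visualize All' button to see the whole data set visualised.
Discretization
Choose the age attribute. Set the attribute index to 1 and the number of bins to 1, then click OK to apply the filter. This produces a new working relation in which the attribute is partitioned into one bin.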
