Till 10 021 Done

Faculty of Engineering & Technology
Subject Name :- Data mining & Business Intelligence

Subject Code:- 203105454
B.Tech. IT Year 3rd Semester 6th
PRACTICAL-10
Aim:- Perform Clustering using WEKA tool.
(1). Percentage Split:-

(A). Using SimpleKMeans Clusterer:- K-means clustering is a simple unsupervised learning
algorithm. The ‘n’ data objects are grouped into a total of ‘k’ clusters, with each
observation belonging to the cluster with the closest mean. The algorithm defines ‘k’
centroids, one for each cluster (a centroid can be thought of as the center of its cluster).
Ideally, the initial centroids are placed far apart from each other.
Each data point is then assigned to its nearest centroid. When no point is left
unassigned, the first stage is complete and an early grouping has been made. The ‘k’
centroids must then be recalculated as the barycenters of the clusters from the
previous stage.
The same data points are then re-assigned to the nearest of these ‘k’ new centroids.
This creates a loop: the ‘k’ centroids change their position step by step until no
further changes occur.
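The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not WEKA's SimpleKMeans implementation: the function name `kmeans`, the random choice of initial centroids, and the toy points are all assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k randomly chosen data points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: bind every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids no longer move
        assignment = new_assignment
        # Update step: move each centroid to the mean (barycenter) of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Toy data (hypothetical): two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```

When the loop stops, every point sits with its nearest centroid, which is the "no further changes" condition in the description.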
Clustering:- Clustering is the method of dividing a set of abstract objects into groups of
similar objects.
Points to Keep in Mind:- A cluster of data objects can be treated as a single group. When
performing cluster analysis, we first divide the data set into groups based on data
similarity, then assign labels to the groups.
Step 1: Open the labor.arff dataset.
(Figure 1.1 labor.arff dataset)
Ankit Pandey(200303108159) Data Mining & Business Intelligence (203105454)

Step 2: Go to the Cluster tab, choose the SimpleKMeans clusterer, set Percentage split
to 50% and click Start.
(Figure 1.2 Clustering for 50%)
Step 3: Go to the Cluster tab, choose the SimpleKMeans clusterer, set Percentage split
to 70% and click Start.
(Figure 1.3 Clustering for 70%)
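As a rough illustration of what a percentage split does, the sketch below divides a list of instances into a training part and a held-out part. The function name `percentage_split`, the shuffling with a fixed seed, and the truncating rounding rule are assumptions for the example; they are not taken from WEKA's documentation.

```python
import random


def percentage_split(instances, percent, seed=1):
    """Return (train, test): the first `percent`% of a shuffled copy, and the remainder."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)          # assumption: shuffle before splitting
    train_size = len(shuffled) * percent // 100    # assumption: truncate to a whole instance
    return shuffled[:train_size], shuffled[train_size:]


rows = list(range(20))  # stand-in for 20 dataset instances
train, test = percentage_split(rows, 70)
print(len(train), len(test))  # 14 6
```

The clusterer is built on the training part, and the instance counts reported per cluster refer to the held-out part.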

❖ ACCURACY TABLE FOR CLUSTERING:

                 Accuracy (%)              Clustered Instances
                 SimpleKMeans              SimpleKMeans
Split
Percentage (%)   Cluster 0   Cluster 1     Cluster 0   Cluster 1
30               45%         55%           10          22
60               24%         76%           7           22
70               67%         33%           12          6
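The percentage columns appear to be each cluster's share of the clustered instances. The small sketch below shows that arithmetic, using the instance counts from the 60% and 70% rows above; the helper name `cluster_percentages` is hypothetical.

```python
def cluster_percentages(counts):
    """Convert per-cluster instance counts into rounded percentages of the total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


print(cluster_percentages([12, 6]))   # 70% split row -> [67, 33]
print(cluster_percentages([7, 22]))   # 60% split row -> [24, 76]
```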
(B). Using EM Clusterer:- Expectation Maximization (EM) is another popular, though
somewhat more complicated, clustering algorithm that relies on maximizing the likelihood
to find the statistical parameters of the underlying sub-populations in the dataset. The
probabilistic theory behind EM is not covered here. In brief, the EM algorithm alternates
between two steps (E-step and M-step).
In the E-step the algorithm finds a lower-bound function on the original likelihood
using the current estimate of the statistical parameters. In the M-step the algorithm finds
new estimates of those statistical parameters by maximizing the lower-bound function (i.e.
it determines the MLE of the statistical parameters). Since each step maximizes the
lower-bound function, the algorithm never produces estimates with lower likelihood
than the previous iteration and ultimately converges to a local maximum.
EM is an iterative method that alternates between two steps, expectation (E) and
maximization (M). For clustering, EM uses a finite Gaussian mixture model and
estimates a set of parameters iteratively until a desired convergence value is achieved.
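The E-step and M-step alternation can be sketched for the simplest case, a one-dimensional Gaussian mixture. This is an illustrative sketch only and not WEKA's EM clusterer: the function names, the starting parameters, the variance floor, and the toy data are all assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, tol=1e-6, max_iter=200):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point, plus the log-likelihood.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # Convergence check: stop once the log-likelihood gain drops below `tol`.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumption: floor to stop a component collapsing
    return means, variances, weights, ll


# Toy data (hypothetical): two groups of values, around 1 and around 5.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, ll = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

The `tol` parameter plays the role of the "desired convergence value": iteration stops when another pass no longer improves the log-likelihood by more than that amount.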
The EM algorithm extends the basic k-means approach to clustering: instead of
assigning examples to clusters to maximize the differences in means for continuous
variables, the EM clustering algorithm computes probabilities of cluster membership based
on one or more probability distributions.
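The difference between a hard assignment and a membership probability can be seen on a single point. The sketch below assumes a hypothetical two-cluster model with equal weights and unit variances; the function name `memberships` is made up for the example.

```python
import math


def memberships(x, means, variances, weights):
    """Posterior probability that point x belongs to each Gaussian component."""
    joint = [
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for m, v, w in zip(means, variances, weights)
    ]
    total = sum(joint)
    return [p / total for p in joint]


# Hypothetical model: two clusters centred at 0 and 4.
means, variances, weights = [0.0, 4.0], [1.0, 1.0], [0.5, 0.5]

x = 2.5
hard = min(range(len(means)), key=lambda j: abs(x - means[j]))  # k-means style: nearest mean
soft = memberships(x, means, variances, weights)                # EM style: probabilities
print(hard, soft)
```

A k-means style rule puts the point wholly in cluster 1, while the probabilistic view still leaves roughly 12% of its membership with cluster 0.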
Step 1: Go to the Cluster tab, choose the EM clusterer, set Percentage split
to 70% and click Start.
(Figure 1.2 Clustering for 70%)
Step 2: Go to the Cluster tab, choose the EM clusterer, set Percentage split
to 30% and click Start.
(Figure 1.3 Clustering for 30%)

❖ ACCURACY TABLE FOR CLUSTERING:

                 Accuracy (%)                          Clustered Instances
                 EM                                    EM
Split
Percentage (%)   Cluster 0   Cluster 1   Cluster 2     Cluster 0   Cluster 1   Cluster 2
30               100%        -           -             40          -           -
60               24%         76%         -             7           22          -
70               28%         22%         50%           5           44          9
