
EXP- 5

AIM:
To study clustering by applying the k-means algorithm in MATLAB.

Description:

The k-means algorithm belongs to a family of clustering algorithms known as centroid-based
methods. At its core, it aims to minimize the within-cluster variance, meaning it seeks to
create clusters where the data points within each group are as similar as possible to each other
and as different as possible from data points in other clusters.
Key Concepts:

Centroids: The k-means algorithm defines a centroid for each cluster. This is a representative
point that acts as the "center" of the cluster, calculated as the average of all data points within
that cluster.
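
The centroid calculation can be sketched on a small example. The matrix X and the label vector below are hypothetical toy values chosen for illustration; they are not part of the experiment.

```matlab
% Hypothetical toy data: four 2-D points, one point per row
X = [1 2; 3 4; 10 10; 12 14];
% Hypothetical cluster label for each row of X
labels = [1; 1; 2; 2];

% Centroid of a cluster = column-wise mean of its member points
c1 = mean(X(labels == 1, :), 1);   % [2 3]
c2 = mean(X(labels == 2, :), 1);   % [11 12]
```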

Euclidean Distance: The most common distance metric used in k-means is the Euclidean
distance. It measures the "straight-line" distance between two data points. Other distance
metrics can be used, but Euclidean distance is often chosen for its simplicity and efficiency.
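
As a small illustration of the distance calculation, the sketch below uses two hypothetical points p and q (not taken from the dataset). It also shows the squared form, which is the default distance used by MATLAB's kmeans function.

```matlab
% Two hypothetical points in 2-D
p = [1 2];
q = [4 6];

% Euclidean distance: square root of the sum of squared differences
d = sqrt(sum((p - q).^2));    % 5

% Squared Euclidean distance (the 'sqeuclidean' option of kmeans)
d2 = sum((p - q).^2);         % 25
```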

Minimizing Within-Cluster Variance: The core objective of k-means is to minimize the
within-cluster variance. This is measured by the squared distance between each data point
and its assigned cluster's centroid, summed over all data points. Intuitively, smaller
within-cluster variance implies tighter, more cohesive clusters.
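
The quantity being minimized can be computed by hand for a fixed assignment. The following is a minimal sketch on hypothetical toy data (X, labels and k are illustrative values only). It assumes MATLAB R2016b or later, because the subtraction relies on implicit expansion.

```matlab
% Hypothetical toy data and cluster assignment
X = [1 2; 3 4; 10 10; 12 14];
labels = [1; 1; 2; 2];
k = 2;

% Accumulate the within-cluster sum of squared distances
wcss = 0;
for j = 1:k
    members  = X(labels == j, :);          % points assigned to cluster j
    centroid = mean(members, 1);           % centroid of cluster j
    wcss = wcss + sum(sum((members - centroid).^2));
end
% For this toy data wcss is 4 + 10 = 14
```

A different assignment of the same points would give a different total; k-means searches for an assignment and set of centroids that make this total small.
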
Procedure:
1. Open MATLAB.
2. Load the fisheriris dataset.
3. Enter the following code:

% Load the Fisher Iris dataset
load fisheriris.mat

% Use all four features (sepal length, sepal width, petal length, petal width)
data = meas(:, 1:end);

% Set the number of clusters (k) to 3
k = 3;

% Create options structure for k-means algorithm
opts = statset('Display', 'final');

% Perform k-means clustering using squared Euclidean distance, 5 replicates, and final display
[idx, c] = kmeans(data, k, "Distance", "sqeuclidean", "Replicates", 5, "Options", opts);

% Create a scatter plot using the first two features (sepal length and sepal width)
figure;
scatter(data(:, 1), data(:, 2), 10, idx, 'filled');

% Add cluster centers to the plot as large, bold 'x' markers
hold on;
plot(c(:, 1), c(:, 2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);

4. In the Workspace panel, click on meas to view the dataset.
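
One possible way to inspect the clustering after running the script is to compare the cluster indices with the known species labels that ship with the dataset. The sketch below is a suggestion rather than part of the original procedure. It assumes the Statistics and Machine Learning Toolbox is installed, and the fixed random seed is an arbitrary choice made only so that repeated runs give the same result.

```matlab
% Load the dataset: meas is 150x4, species is a 150x1 cell array of names
load fisheriris

% Fix the random seed (arbitrary value) so the run is repeatable
rng(1);

% Cluster all four features; sumd holds the within-cluster sums of distances
[idx, c, sumd] = kmeans(meas, 3, 'Replicates', 5);

% Cross-tabulate cluster index (rows) against species (columns)
tbl = crosstab(idx, species);
disp(tbl);

% Total within-cluster sum of squared distances for the best replicate
totalSumd = sum(sumd);
disp(totalSumd);
```

Each row of the table corresponds to one cluster and each column to one species, so a row dominated by a single column indicates a cluster that matches one species closely.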

Results:
