
Assessment 11

Artificial Intelligence and Machine Learning

Question 1:
K-Means Clustering:

Introduction
K-Means clustering is a popular unsupervised machine learning algorithm used for
partitioning a dataset into K distinct, non-overlapping subsets or clusters. The goal is
to group similar data points together and assign them to clusters, making it a useful
technique for exploratory data analysis and pattern discovery.

Basic Idea:
The algorithm iteratively assigns each data point to one of K clusters based on feature
similarity, measured as the distance to each cluster center. The mean (centroid) of the
points in each cluster then becomes the new cluster center. This process repeats until
convergence, when the assignment of data points to clusters no longer changes.
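
The idea above is usually formalized as minimizing the within-cluster sum of squared
distances. The notation here is an assumption introduced for illustration and does not
appear in the original notes: C_k is the set of points in cluster k, mu_k is its centroid,
and x_i is a data point.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Neither reassigning points to their nearest centroid nor recomputing the centroids as
means can increase J, which is why the iteration settles, although possibly at a local
rather than a global minimum.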

Algorithm Steps:
1. Initialization: Randomly select K data points as initial cluster centroids.
2. Assignment: Assign each data point to the cluster whose centroid is closest (typically
using Euclidean distance).
3. Update: Recalculate the centroids as the mean of the points in each cluster.
4. Repeat: Repeat steps 2 and 3 until convergence, that is, until the assignments (and
hence the centroids) stop changing.
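
The four steps above can be sketched in a few lines of NumPy. This is a minimal
illustration rather than a reference implementation: the function name kmeans, the
max_iter and seed parameters, the toy data, and the choice to leave an empty cluster's
centroid where it was are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means following the four steps listed above."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: squared Euclidean distance from every point to every centroid,
        # then assign each point to the nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid (an assumption of this sketch).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the
other; which cluster receives index 0 depends on the random initialization.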

Types of K-Means Clustering:

1. Hard/Traditional K-Means:
- Each data point is assigned exclusively to one cluster.
- The assignment of points to clusters is based on the nearest centroid.

2. Fuzzy K-Means:
- Allows data points to belong to multiple clusters with different degrees of
membership.
- Assigns each point a membership value indicating its degree of belonging to each
cluster.

3. K-Medoids:
- Uses the medoid (the most centrally located point in a cluster) instead of the mean
as the cluster center.
- Less sensitive to outliers compared to traditional K-Means.

4. Kernel K-Means:
- Applies the kernel trick to measure distances in an implicit higher-dimensional
feature space.
- Enables the clustering of non-linearly separable data.
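
Two of the variants above differ from hard K-Means in ways that are easy to show in
code. The sketch below is illustrative only: the helper names fuzzy_memberships and
medoid, the fuzzifier value m=2.0, the small eps guard, and the sample points are
assumptions for this example. The membership formula is the one commonly used in fuzzy
c-means; the kernel variant is not shown.

```python
import numpy as np

def fuzzy_memberships(X, centroids, m=2.0, eps=1e-12):
    """Fuzzy K-Means memberships: one row per point, each row sums to 1."""
    # Euclidean distance from every point to every centroid; eps avoids division by zero.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    # Membership is proportional to d ** (-2 / (m - 1)), then normalized per point.
    w = d ** (-2.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True)

def medoid(points):
    """K-Medoids center: the member with the smallest total distance to all members."""
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[pairwise.sum(axis=1).argmin()]

# A cluster of four nearby points plus one outlier at (50, 50).
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [50.0, 50.0]])
mean_center = cluster.mean(axis=0)   # pulled toward the outlier
medoid_center = medoid(cluster)      # remains an actual point inside the dense group

# Memberships of two points with respect to two centroids.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
u = fuzzy_memberships(np.array([[5.0, 0.0], [1.0, 0.0]]), centroids)
print(mean_center, medoid_center)
print(u)
```

The point halfway between the two centroids receives equal membership in both clusters,
while the point next to the first centroid belongs almost entirely to it. The mean of the
sample cluster is dragged far from the dense group by the single outlier, whereas the
medoid is not.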

Advantages of K-Means:
- Simplicity and ease of implementation.
- Scalable to large datasets: each iteration costs roughly O(nKd) for n points in d
dimensions.
- Applicable to a wide range of data, provided the features can be represented
numerically.

Disadvantages of K-Means:
- Sensitive to the initial placement of centroids.
- Assumes spherical clusters of similar sizes.
- May converge to local optima.

Use Cases:
- Image segmentation.
- Customer segmentation in marketing.
- Anomaly detection in cybersecurity.
- Document clustering in natural language processing.

Tips for Practical Use:
- Preprocess data to handle outliers.
- Scale features (for example, standardize them) so that no single feature dominates
the distance calculation.
- Run the algorithm multiple times with different initializations.
- Choose the number of clusters (K) carefully; use techniques like the elbow method.
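
The scaling, restart, and elbow-method tips can be combined in one short experiment.
The following is a sketch under stated assumptions: the helper names kmeans_inertia and
best_inertia, the synthetic three-blob data, the seeds, and n_init=20 are all invented
for this example. "Inertia" here means the within-cluster sum of squared distances.

```python
import numpy as np

def kmeans_inertia(X, k, seed, max_iter=100):
    """Run plain K-Means once and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Empty clusters keep their previous centroid (an assumption of this sketch).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def best_inertia(X, k, n_init=20):
    """Restart with different seeds and keep the lowest inertia found."""
    return min(kmeans_inertia(X, k, seed) for seed in range(n_init))

# Synthetic data: three Gaussian blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Feature scaling: standardize each column to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Elbow method: compute the best inertia for each candidate K and look for the bend.
inertias = {k: best_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

In the printed values the inertia should drop sharply up to K = 3 and only slowly after
that; that bend is the "elbow" suggesting three clusters for this synthetic data. Real
data rarely shows such a clean bend, so the elbow is a guide rather than a rule.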

K-Means clustering is a versatile algorithm with various extensions, and its
effectiveness depends on the nature of the data and the problem at hand.
