
Task: Clustering Research

1. In groups of 4 to 6, identify 2 machine learning clustering algorithms

Clustering is a type of unsupervised learning. An unsupervised learning method is one in which we draw
inferences from datasets consisting of input data without labeled responses. It is generally used to find
meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a
set of examples.

Clustering Algorithms:
K-means clustering algorithm: It is one of the simplest unsupervised learning algorithms for solving the
clustering problem. The K-means algorithm partitions n observations into k clusters, where each observation
belongs to the cluster with the nearest mean, and that mean serves as the prototype of the cluster.
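
The partitioning described above can be illustrated with a minimal sketch of the standard assign-and-update loop (often called Lloyd's algorithm). The function names, the sample points, and the hand-picked starting centroids are assumptions made for illustration only; practical implementations usually choose the starting centroids randomly or with a scheme such as k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, labels) after running the assign/update loop."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical sample data: two small, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each returned centroid is the mean of the points assigned to it, which is the "prototype" role mentioned above.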

BIRCH

Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a clustering algorithm that can cluster large
datasets by first generating a small and compact summary of the large dataset that retains as much information as
possible. This smaller summary is then clustered instead of clustering the larger dataset.

BIRCH is often used to complement other clustering algorithms by creating a summary of the dataset that the
other algorithm can then cluster. However, BIRCH has one major drawback: it can only process metric
attributes. A metric attribute is any attribute whose values can be represented in Euclidean space, i.e., no
categorical attributes should be present.
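
To make the idea of a compact summary more concrete, the sketch below keeps one "clustering feature" per group of points: the count N, the per-dimension linear sum LS, and the sum of squared values SS. From those three numbers the centroid and radius of a group can be recovered without storing the points themselves. This is a simplified, flat illustration only: real BIRCH organizes these summaries in a height-balanced tree, and the class name, function name, threshold value, and sample points here are all assumptions.

```python
import math


class ClusteringFeature:
    """Summary (N, LS, SS) of a group of numeric points."""

    def __init__(self, dim):
        self.n = 0               # number of points absorbed
        self.ls = [0.0] * dim    # linear sum of the points, per dimension
        self.ss = 0.0            # sum of squared coordinate values

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_if_added(self, point):
        """Radius the group would have after absorbing `point`."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        variance = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def summarize(points, threshold):
    """Absorb each point into the nearest summary if its radius stays small."""
    summaries = []
    for p in points:
        nearest = min(
            summaries, key=lambda cf: math.dist(cf.centroid(), p), default=None
        )
        if nearest is not None and nearest.radius_if_added(p) <= threshold:
            nearest.add(p)
        else:
            cf = ClusteringFeature(len(p))
            cf.add(p)
            summaries.append(cf)
    return summaries


# Hypothetical sample data; the threshold of 1.0 is an arbitrary choice.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
summaries = summarize(points, threshold=1.0)
for cf in summaries:
    print(cf.n, cf.centroid())
```

The centroids of the summaries could then be handed to another clustering algorithm in place of the full dataset. The arithmetic on sums and squares also shows why only numeric (metric) attributes can be processed this way.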

The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is based on an intuitive
notion of “clusters” and “noise”. The key idea is that for each point of a cluster, the neighborhood of a given
radius has to contain at least a minimum number of points. Partitioning methods (K-means, PAM clustering) and
hierarchical clustering work well for finding spherical-shaped or convex clusters. In other words, they are
suitable only for compact and well-separated clusters. Moreover, they are severely affected by the presence of
noise and outliers in the data.
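
The neighborhood rule described above can be sketched in a few lines. The version below is a minimal, brute-force illustration under stated assumptions: the parameter names `eps` (the radius) and `min_pts` (the minimum number of points), the sample points, and the chosen parameter values are not from the original task. Points whose neighborhood is too sparse, and which are not reachable from any dense point, are labeled as noise.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep growing
        cluster += 1
    return labels


# Hypothetical sample data: two dense groups plus one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.3),
    (4.5, 15.0),
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Unlike K-means, this sketch needs no cluster count up front, and the isolated point is reported as noise rather than being forced into a cluster.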

Real-life data may contain irregularities, such as:

• Clusters can be of arbitrary shape, such as those shown in the figure below.
• Data may contain noise.

2. List the best use for each of the algorithms you have found
Applications of clustering in different fields:

• Marketing: It can be used to discover and characterize customer segments for marketing purposes.
• Biology: It can be used for classification among different species of plants and animals.
• Libraries: It is used to cluster books on the basis of topics and information.
• Insurance: It is used to understand customers and their policies and to identify fraud.
• City Planning: It is used to group houses and to study their values based on their geographical
locations and other factors.
• Earthquake studies: By studying earthquake-affected areas, we can identify dangerous zones.

I was given the task of downloading a dataset from Kaggle.com and displaying:

The shape of the dataset

The first 30 rows of the dataset

A description of the dataset

I was also asked to create a histogram for elements of the dataset.

I was asked to conduct research on machine learning clustering algorithms and to list the best use for
each of the algorithms found. This was then uploaded to the e-portfolio.
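
As a rough illustration of these steps, the sketch below uses pandas and NumPy. The Kaggle dataset is not named in the task, so a small synthetic table stands in for it. The column names `age` and `income`, the generated values, and the file name in the comment are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded Kaggle file. With a real download,
# this would be something like: df = pd.read_csv("dataset.csv")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=100),
    "income": rng.normal(50_000, 12_000, size=100).round(2),
})

# Shape of the dataset as (rows, columns).
print(df.shape)

# The first 30 rows of the dataset.
print(df.head(30))

# A description of the dataset: count, mean, std, min, quartiles, max.
print(df.describe())

# Histogram of one column as raw bin counts.
counts, bin_edges = np.histogram(df["age"], bins=10)
print(counts)
print(bin_edges)

# To draw the histogram as a chart instead, df["age"].hist(bins=10) can be
# used; it relies on matplotlib being installed.
```

The bin counts give the same information as the plotted histogram and can be inspected without a plotting backend.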
