
IJECT Vol. 5, Issue 3 SPL-1, July-Sept 2014
ISSN: 2230-7109 (Online) | ISSN: 2230-9543 (Print)
International Journal of Electronics & Communication Technology, www.iject.org
A Comprehensive Review on k-Means Clustering
Algorithm in Neural Networks
¹Neha, ²Neelam Chaudhary, ³Tanvir Singh
¹,²,³Centre for Development of Advanced Computing, Mohali, Punjab, India
Abstract
Neural networks are known for their ability to process complicated data and extract complex information in the form of patterns that cannot be noticed by human beings or by other computing techniques. In neural networks there are various approaches to pattern recognition, which are listed in this paper; among them, the k-Means clustering algorithm is discussed with simulation results, pros and cons, and applications.
Keywords
k-Means Clustering, Neural Networks, Pattern Recognition,
MATLAB.
I. Introduction
A neural network is inspired by the human brain, which performs certain computations very fast compared with computers. The basic component behind this fast computation is the neuron. The brain acts as a parallel computer, organizing its computing components, the neurons, to perform such computations much faster than a digital computer. One of the main advantages of a neural network is its ability to adapt its weights according to changes in the surrounding environment; fault tolerance is another of its main advantages.
II. Various Pattern Recognition Techniques
1. k-Means clustering
2. Kohonen maps (self-organizing maps)
3. Back-propagation algorithm
4. Hopfield model
5. Perceptron model
k-Means clustering, one of the oldest and most widely used pattern recognition techniques, is the one discussed in this paper.
III. k-Means Clustering Algorithm
Clustering is a way of grouping a given set of patterns into clusters such that patterns in the same cluster are homogeneous and patterns in different clusters are unlike each other. k-Means clustering is an algorithm that clusters objects based on some characteristics; as shown in Fig. 1 and Fig. 2, objects are grouped based on colour. The K in the name of the algorithm denotes the number of clusters and is a positive integer supplied as an input: the value of K for Fig. 1 is 6 and for Fig. 2 is 3. Broadly, there are two types of learning: (A) supervised and (B) unsupervised. The k-Means algorithm comes under the category of unsupervised learning, which means the procedure does not use any prior knowledge of class labels; it simply follows the logic described below [1-2].
Fig. 1: Clusters for k-Means Algorithm
Fig. 2: k-Means Cluster
Algorithm:
1. Initialize the clusters. Any number of clusters can be initialized.
2. Calculate the Euclidean distance of each object from each cluster centre. For an object x = (x₁, x₂, …, xₙ) and a cluster centre c = (c₁, c₂, …, cₙ), the Euclidean distance is given by:
d = √((x₁ − c₁)² + (x₂ − c₂)² + … + (xₙ − cₙ)²)
3. Place the object in the cluster with minimum distance d.
4. Calculate the mean of the newly grouped data points; this mean becomes the respective updated cluster centre [14].
5. Repeat steps 2-4 until convergence [3-5].
All these steps are represented in Fig. 4-7, while Fig. 3 shows the flow chart of the algorithm with the steps involved in k-Means clustering; a minimal MATLAB sketch of the same steps is given after the figures.
Fig. 3: Flow Chart for Algorithm
Fig. 4: Initial Representation
Fig. 5: Grouping Under Clusters
Fig. 6: Re-Calculate the Cluster
Fig. 7: Final Representation
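For clarity, the steps listed above can also be sketched directly in MATLAB. The following is a minimal illustrative implementation written for this review (the function and variable names simpleKMeans, X, K and maxIter are our own, and pdist2 assumes the Statistics and Machine Learning Toolbox); it is a sketch of the procedure, not the exact code used in the simulations of Section V.

function [idx, C] = simpleKMeans(X, K, maxIter)
% Minimal k-Means sketch: X is an N-by-D data matrix, K the number of clusters.
N = size(X, 1);
C = X(randperm(N, K), :);            % Step 1: initialize K cluster centres from the data
for iter = 1:maxIter
    D = pdist2(X, C);                % Step 2: Euclidean distance of every object to every centre
    [~, idx] = min(D, [], 2);        % Step 3: assign each object to the nearest centre
    for k = 1:K                      % Step 4: recompute each centre as the mean of its members
        if any(idx == k)             % (skip a centre if it lost all of its members)
            C(k, :) = mean(X(idx == k, :), 1);
        end
    end
end                                  % Step 5: repeat steps 2-4 (here for a fixed number of iterations)
end

Calling [idx, C] = simpleKMeans(data, 2, 100) on the data of Section V should give a partition comparable to MATLAB's built-in kmeans(data, 2).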
IV. Hierarchical vs Non-Hierarchical Approach
Hierarchical clustering is a method of clustering that produces overlapping groups, i.e. a cluster can be a subset of another cluster [13]; it consists of a nesting of clusters.
Non-hierarchical clustering produces non-overlapping groups with no hierarchy, and it involves much less computation than the hierarchical method. k-Means clustering is categorised under non-hierarchical clustering, and non-hierarchical methods are typically used on large, high-dimensional datasets.
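As a brief illustration of the difference, both approaches can be tried on the same toy data in MATLAB (a sketch added for this review; linkage, cluster and kmeans assume the Statistics and Machine Learning Toolbox):

X = [randn(40,2) + 2; randn(40,2) + 5];   % toy two-group data, similar to Section V
% Hierarchical (agglomerative): builds a nested tree of clusters
Z = linkage(X, 'ward');                   % merge history of the nested clusters
hIdx = cluster(Z, 'maxclust', 2);         % cut the tree into 2 non-overlapping groups
% Non-hierarchical: k-Means partitions the data directly into K flat clusters
kIdx = kmeans(X, 2);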
V. Simulation and Result
We generated data divided into two groups and then into four groups. k-Means determines to which group each data point belongs. The k-Means algorithm works on two inputs: the raw data we provide and the number of cluster groups into which the data is to be divided.
close all; clear all; clc;
% Let us consider some random data with two groups
n = 40;                               % sample size per group
x = [randn(n,1)+2; randn(n,1)+5];
y = [randn(n,1)+2; randn(n,1)+5];
% The group identity
groups = [ones(n,1); ones(n,1)+1];
% Plot the data
scatter(x, y, 70, groups, 'filled')
Fig. 8 shows the MATLAB plot of the random data with two groups.
Fig.8: MATLAB plot showing random data with two groups
k-Means divides the given data into different cluster groups by calculating their centroids. Here we divide our data into two clusters, as shown in Fig. 9.
data = [x, y];
IDX = kmeans(data, 2);                % divide the data into two cluster groups
Fig. 9: Plot Showing Data Divided Into Two Clusters
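If desired, the centroids themselves can also be retrieved and overlaid on the clustered data (a small optional extension added here for illustration; the second output of kmeans holds one centroid per row):

[IDX, C] = kmeans(data, 2);               % C contains the two cluster centroids
scatter(x, y, 70, IDX, 'filled'); hold on
plot(C(:,1), C(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)   % mark the centroids
hold off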
k-Means always divides the data distribution into K groups, even when the data itself contains fewer natural groups.
% Random data drawn from two overlapping groups (we will still ask for four clusters)
n = 80;                               % sample size per group
x = [randn(n,1)+3; randn(n,1)+5];
y = [randn(n,1)+3; randn(n,1)+5];
% Plot the original data
subplot(1,2,1)
plot(x, y, 'ok', 'MarkerFaceColor', 'k')
% Divide into four clusters using k-Means and plot the results
data2 = [x, y];
IDX = kmeans(data2, 4);
subplot(1,2,2)
scatter(x, y, 50, IDX, 'filled')
Fig. 10 shows the original data alongside the data clustered into four groups.
Fig. 10: Plot Showing Original Data and Clustered Data Into
Four Groups
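Since k-Means will always return the requested number of clusters, it can be useful to check how well a chosen K actually fits the data. MATLAB's silhouette function (Statistics and Machine Learning Toolbox) gives one such check; the short sketch below is added here for illustration only and is not part of the original simulation.

% Silhouette values near 1 indicate well-separated clusters; values near 0 or
% negative values suggest the requested K does not match the structure of the data.
figure
silhouette(data2, IDX)                    % per-point silhouette plot for the K = 4 result
meanSil = mean(silhouette(data2, IDX));   % one overall quality score for this K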
VI. Pros and Cons
Pros:
1. It is inexpensive compared with other clustering techniques.
2. It is fast and easy to understand.
3. The algorithm is more responsive when the data patterns are well separated [11, 12].
4. It produces tighter clusters, especially when the clusters are globular; globular clusters have well-defined centers and a circular or elliptical shape.
5. In the case of a large number of variables, it works faster if the value of K is small.
Cons:
1. It is difficult to handle noisy patterns.
2. It is difficult to predict what K should be, since the number of clusters must be fixed in advance (see the sketch after this list).
3. It does not work well for non-globular clusters. Non-globular clusters are those which do not have well-defined centers and have a chain-like shape, as shown in Fig. 11.
Fig. 11: Non-Globular Clusters
4. Different initial representations of the data will give different results.
5. A prior specification of the number of cluster centers must be provided.
6. A random choice of cluster centers does not always produce good results [5].
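In practice, the last two drawbacks can be softened: MATLAB's kmeans accepts a 'Replicates' option that reruns the algorithm from several random starting centres and keeps the best run, and comparing the total within-cluster distance for several values of K gives an elbow-style guide for choosing K. The loop below is our own illustrative sketch (assuming the Statistics and Machine Learning Toolbox), not part of the original simulation.

X = [randn(40,2)+2; randn(40,2)+5];      % toy data in the style of Section V
totalDist = zeros(1, 6);
for k = 1:6
    % 'Replicates' reruns k-Means from 5 random initializations and keeps the best run
    [~, ~, sumd] = kmeans(X, k, 'Replicates', 5);
    totalDist(k) = sum(sumd);            % total within-cluster distance (squared Euclidean by default)
end
plot(1:6, totalDist, '-o')               % look for the "elbow" when choosing K
xlabel('K'), ylabel('Total within-cluster distance')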
VII. Application of k-Means
The k-Means clustering algorithm is in use by Covenant University for prediction of the academic performance of students [8]. It is used for identification of fraudulent credit card transactions and risky loan applications, i.e. for detection of abnormal data [9]. It is also used for pattern and image recognition, and in search engines, identification of cancerous data, wireless sensor networks, and drug activity prediction [10, 15].
VIII. Conclusion
Automatic (machine) recognition, description, classification, and grouping of patterns are important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. Pattern recognition is essentially the study of how machines can analyze their environment and learn the different patterns of interest, upon which they finalize a decision. The k-Means clustering algorithm is efficient for large data sets and reduces computation time compared with hierarchical clustering methods. Moreover, the method is fast, easy to understand, and inexpensive.
References
[1] k-Means Clustering Tutorial, [Online] Available: http://sigitwidiyanto.staff.gunadarma.ac.id/Downloads/files/38034/M8-Note-kMeans.pdf
[2] Suman Tatiraju, Avi Mehta,"Image Segmentation using
k-Means clustering, EM and Normalized Cuts", [Online]
Available: http://www.ics.uci.edu/~dramanan/teaching/
ics273a_winter08/projects/avim_report.pdf
[3] OnMyPhD, K-Means Clustering, [Online] Available: http://
www.onmyphd.com/?p=k-Means.clustering
[4] Andrew W. Moore,"K-Means and Hierarchical Clustering",
[Online] Available: http://www.cs.cmu.edu/~cga/ai-course/
kmeans.pdf
[5] Ios, K-Means Clustering, [Online] Available: http://www.
improvedoutcomes.com/docs/WebSiteDocs/Clustering/K-
Means_Clustering_Overview.htm
[6] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu,"An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, July 2002.
[7] A Tutorial on Clustering Algorithms, [Online] Available:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_
html/
[8] [Online] Available: http://covenantuniversity.edu.ng/
[9] James McCaffrey, Detecting Abnormal Data Using k-Means
Clustering,[Online] Available: http://msdn.microsoft.com/
en-us/magazine/jj891054.aspx
[10] Clustering Algorithm Applications,[Online] Available:
https://sites.google.com/site/dataclusteringalgorithms/
clustering-algorithm-applications
[11] Simon Haykin, Neural Networks, Second Edition, Pearson
Education.
[12] Kardi Teknomo, [Online] Available: http://people.revoledu.
com/kardi/tutorial/kMean/index.html
[13] Schikuta,"Grid Clustering: An Efficient Hierarchical Clustering Method for Very Large Data Sets", Proc. 13th Intl. Conference on Pattern Recognition, Vol. 2, 1996.
[14] R. C. Dubes, A. K. Jain,"Algorithms for Clustering Data",
Prentice Hall, 1988.
[15] K. Mehrotra, C. Mohan, S. Ranka,"Elements of Artificial Neural Networks", MIT Press, 1996.
Neha is pursuing her Masters degree
in Embedded Systems from Centre
for Development of Advanced
Computing, Mohali, Punjab. She has
received her B.Tech (Electronics and
Communication) from Punjab Technical
University. She has one year experience
as a Quality Engineer in Cenzer
Industries Limited, Baddi. Her areas of interest are Computer Architecture and Embedded Systems.
Neelam Chaudhary is pursuing
her Masters degree in Embedded
Systems from Centre for Development
of Advanced Computing, Mohali,
Punjab. She has received her bachelor's degree (B.Tech, Electronics & Communication Engineering) from
Meerut Institute of Engineering
and Technology, Meerut. She has
worked as a Lecturer in electronics
and communication department for 1
year. She has published review/research papers in International
Journals/Conferences. Her area of interest includes environmental
sustainability and embedded system designing.
Tanvir Singh is pursuing his Masters
degree in Embedded Systems from
Centre for Development of Advanced
Computing, Mohali, Punjab. He
received his bachelor's degree (Electronics and Communication Engineering) from IET Bhaddal Technical Campus, Punjab. His areas of interest include Environmental Sustainability in Wireless Communication Networks and Electromagnetic Radiations, with a dream to create a technically advanced and eco-friendly world.
He has published 50+ review/research papers in International
Journals/Conferences. He has started a group named Green
Thinkerz to promote Environmental Sustainability (facebook.
com/greenthinkerz).