
K-Medoid Algorithms

1A. Rajarajeswari, 2R. Malathi Ravindran

1Research Scholar of Computer Science,
2Assistant Professor of Computer Science,
NGM College, Pollachi.

ABSTRACT

Clustering plays a vital role in data mining research. Clustering is the process of partitioning a set of data into meaningful subclasses called clusters. It helps users understand the natural grouping within a data set. It is unsupervised classification, meaning it has no predefined classes. Data are grouped into clusters in such a way that data of the same group are similar and those in different groups are dissimilar; that is, clustering aims to maximize intra-cluster similarity and inter-cluster dissimilarity. Clustering is useful for obtaining interesting patterns and structures from large data sets and can be applied in many areas, such as marketing studies, DNA analysis, city planning, text mining, and web document classification. Large data sets with many attributes make the task of clustering complex, and many methods have been developed to deal with these problems. A number of techniques have been proposed by researchers to analyze the performance of clustering algorithms, but no single technique yields good results for every data set and every algorithm. The choice of clustering algorithm depends both on the type of data available and on the particular purpose and application; some clustering algorithms suit only certain kinds of input data. In this paper, three well-known partitioning-based methods, k-means, K-medoid, and enhanced K-medoid, are compared, and the behavior of these three methods is explored. Our experimental results show that enhanced K-medoid performed better than k-means and K-medoid in terms of cluster quality and elapsed time.

Index Terms: Clustering, Classification, Partition Clustering, K-means, K-medoid, Enhanced K-medoid.

I. INTRODUCTION

This study aims to give a comparative review of three partitioning-based clustering methods. Clustering is the division of data objects into groups of similar objects; such groups are called clusters. Objects in the same cluster tend to be similar, while dissimilar objects belong to different clusters. Clusters represent groups of data and provide simplification by representing many data objects with fewer clusters, which helps to model the data by its clusters.

Clustering is similar to classification in that data are grouped. A cluster is therefore a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters. A large number of clustering algorithms exist in the literature; the choice of algorithm depends both on the type of data available and on the particular purpose and application. Data clustering is a significant problem in a wide variety of areas, such as pattern recognition and bioinformatics.

Clustering is a data description method in data mining which collects the most similar data. The purpose is to organize a collection of data items into clusters such that items within a cluster are more similar to each other than to items in other clusters. Clustering analysis is one of the main analytical methods in data mining. K-means is the most popular partition-based clustering algorithm, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids and on the dimensionality of the data. Several methods have been proposed in the literature for improving the performance of the K-means clustering algorithm. In this research, the representative algorithms K-means, K-medoid, and enhanced K-medoid were examined and analyzed based on their basic approach, and the best among them was identified based on performance.

Clustering methods are mainly suitable for investigating interrelationships between samples in order to make a preliminary assessment of the sample structure. Clustering techniques are needed because it is very difficult for humans to intuitively understand data in a high-dimensional space.

www.ijafrc.org | Volume 2, Issue 8, August - 2015. ISSN 2348 4853 | 2015, IJAFRC All Rights Reserved

II. PARTITION CLUSTERING

A partitioning method constructs k (k < n) clusters from a set of n data objects, where each cluster is also known as a partition. It classifies the data into k groups while satisfying the following conditions: each object belongs to exactly one group, and each group contains at least one object. Given the number of partitions to be constructed (k), this type of clustering method creates an initial partition and then moves objects from one group to another using an iterative relocation technique, seeking the globally optimal partition. A good partitioning is one in which the distance between objects in the same partition is small (they are related to each other), whereas the distance between objects in different partitions is large (they are very different from each other).
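The "good partitioning" criterion above can be checked numerically. The following sketch compares the average distance between objects in the same partition with the average distance between objects in different partitions; the `partition_quality` helper and the sample points are illustrative, not from the paper.

```python
from itertools import combinations

def partition_quality(points, labels):
    """Average distance between objects in the same partition (small is
    good) versus between objects in different partitions (large is good)."""
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    intra, inter = [], []
    # Examine every unordered pair of objects exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(euclidean(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Two tight groups far apart: intra-partition distances stay small,
# inter-partition distances are large.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
intra, inter = partition_quality(pts, [0, 0, 1, 1])
```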

A. K-Means Clustering

This section describes the original K-Means clustering algorithm. The idea is to classify a given set of data into k disjoint clusters, where the value of k is fixed in advance. The algorithm consists of two separate phases: the first is to define k centroids, one for each cluster; the next is to take each point belonging to the given data set and associate it with the nearest centroid. Euclidean distance is generally used to determine the distance between data points and centroids. When all the points have been assigned to some cluster, the first step is completed and an early grouping is done. At this point the centroids must be recalculated, as the inclusion of new points may change the cluster centroids. Once the k new centroids are found, a new binding is created between the same data points and the nearest new centroid, generating a loop. As a result of this loop, the k centroids may change their positions step by step. Eventually a situation is reached where the centroids no longer move; this is the convergence criterion for the clustering.

The process, called K-Means, appears to give partitions which are reasonably efficient in the sense of within-class variance, corroborated to some extent by mathematical analysis and practical experience. The K-Means procedure is also easily programmed and computationally economical, so it is feasible to process very large samples on a digital computer. The K-Means algorithm is one of the first that a data analyst will use to investigate a new data set, because it is algorithmically simple, relatively robust, and gives good enough answers over a wide variety of data sets.
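The two-phase loop described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the sample points and the use of `random.sample` for the initial centroids are assumptions, since the text leaves the initialization open.

```python
import random

def k_means(points, k, max_iter=100):
    """Sketch of the two-phase K-Means loop: choose k initial centroids,
    assign every point to its nearest centroid (Euclidean distance),
    recompute each centroid as the mean of its cluster, and repeat until
    the centroids stop moving."""
    centroids = random.sample(points, k)  # initial centroids (an assumption)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids no longer move: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups: K-Means settles on one centroid per group.
random.seed(0)
pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0)]
centroids, clusters = k_means(pts, 2)
```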

B. K-Medoid Clustering

K-medoid is a partition clustering algorithm which selects k clustering centers from the data objects and establishes an initial partition by assigning every other object to its nearest clustering center, before iteratively moving the clustering centers until an optimal partition is reached. Because of the randomness of the k value and of the initial selection of clustering centers, its efficiency and accuracy can be very low. An improved K-medoid algorithm which selects the k clustering centers under constraint conditions and needs only one iteration to obtain the clustering results can not only remove the randomness of clustering-center selection but also improve efficiency on complicated data, helping to achieve global optimization.

The k-means method uses the centroid to represent a cluster and is therefore sensitive to outliers: a data object with an extremely large value may distort the distribution of the data. The K-medoid method overcomes this problem by using medoids rather than centroids to represent clusters. A medoid is the most centrally located data object in a cluster. Here, k data objects are selected randomly as medoids to represent k clusters, and each remaining data object is placed in the cluster whose medoid is nearest (or most similar) to it. After all data objects have been processed, a new medoid which represents its cluster better is determined for each cluster, and the entire process is repeated: all data objects are again bound to the clusters based on the new medoids. In each iteration the medoids change their locations step by step; in other words, the medoids move in each iteration. This process continues until no medoid moves. As a result, k clusters are found representing a set of n data objects.
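The medoid-update loop described above can be sketched as follows. This is illustrative code, not the authors' implementation; for determinism the sketch takes the first k points as initial medoids, replacing the random selection mentioned in the text.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_medoids(points, k, dist=euclidean):
    """Sketch of the K-medoid loop: start from k objects as medoids,
    place every object in the cluster of its nearest medoid, then pick
    each cluster's most centrally located object as the new medoid;
    repeat until no medoid moves."""
    medoids = list(points[:k])  # deterministic start for the sketch
    while True:
        # Bind every object to its nearest (most similar) medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, medoids[i]))].append(p)
        # The new medoid of a cluster is the object with the smallest total
        # distance to the other members, i.e. the most central object.
        new_medoids = [
            min(c, key=lambda cand: sum(dist(cand, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:  # no medoid moved: converged
            return medoids, clusters
        medoids = new_medoids

pts = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
medoids, clusters = k_medoids(pts, 2)
```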

C. Enhanced K-Medoid Clustering

K-medoid has high accuracy of pattern matching and is not sensitive to dirty or abnormal data. However, it requires an enormous number of iterations and a lot of time, which reduces the efficiency of clustering; therefore it can only be used to handle small amounts of data. With the development of information technology and the internet, data in databases increase rapidly not only in amount but also in complexity. In view of this, attention should be paid not only to the efficiency of information acquisition but also to its accuracy. Under these conditions, an enhanced K-medoid clustering algorithm is established which can meet the needs of complex data sets.

This new enhanced K-medoid algorithm considers all the data points as candidate medoids and calculates the cost for each individual point. After calculating the total cost of every data point, the number of clusters into which the original data are to be grouped is specified; since K-medoid is an unsupervised algorithm, the number of clusters must be supplied by the user. The medoids are then selected as the data points with the lowest costs: for example, if there are ten clusters, the ten points with the lowest costs are selected as medoids. This avoids having to repeatedly check every data point as a candidate medoid. The Manhattan distance metric is used to calculate the distance between the cluster points.
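As described, the enhanced procedure scores every point by its total Manhattan cost and takes the k lowest-scoring points as medoids in a single pass. A minimal sketch follows; the helper names and sample data are assumptions, since the paper gives no code.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def enhanced_k_medoids(points, k):
    """Sketch of the enhanced procedure: every data point is treated as a
    candidate medoid and scored by its total Manhattan cost to all other
    points; the k lowest-cost points become the medoids in a single pass,
    and every point is then assigned to its nearest medoid."""
    # Score each candidate medoid by its total Manhattan distance to all points.
    costs = {p: sum(manhattan(p, q) for q in points) for p in points}
    # Select the k least-cost points as medoids (no repeated iteration needed).
    medoids = sorted(points, key=costs.__getitem__)[:k]
    # Bind every point to its nearest medoid.
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: manhattan(p, m))].append(p)
    return medoids, clusters

# Two groups of three points; with k = 2 the least-cost points happen to
# fall one in each region for this data.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = enhanced_k_medoids(pts, 2)
```

Note that scoring candidates globally is what lets the method finish in one pass, in contrast to the iterative medoid swaps of the basic K-medoid.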

III.CONCLUSION

From the above study it can be concluded that partitioning-based clustering methods are suitable for small to medium-sized data sets. K-means, K-medoid, and enhanced K-medoid all require the number of desired clusters, k, to be specified in advance, and for all of these methods the result and the runtime depend on the initial partition. The advantage of K-means is its low computation cost, while its drawback is sensitivity to noisy data and outliers. In contrast, K-medoid is not sensitive to noisy data and outliers, but it has a high computation cost. When enhanced K-medoid is compared with K-means and K-medoid, it is more efficient, is not sensitive to noisy data and outliers, and its performance on large data sets is also competent.

