
Novel Methods for Initial Seed Selection in K-Means Clustering for Image Compression

K. Somasundaram (1) and M. Mary Shanthi Rani (2)
Department of Computer Science & Applications,
Gandhigram Rural Institute, Deemed University, Tamil Nadu, India
e-mail: (1) ksomasundaram@hotmail.com, (2) shanthifrank@yahoo.com

Abstract: In this paper, we propose two methods to construct the initial codebook for K-means clustering based on covariance and spectral decomposition. Experimental results with standard images show that the proposed methods produce better quality reconstructed images, measured in terms of Peak Signal to Noise Ratio (PSNR).

Keywords: Codebook, Variance, Spectral Decomposition, Eigenvalue

I. INTRODUCTION

Clustering is an important partitioning technique which organizes information in the form of clusters such that patterns within a cluster are more similar to each other than patterns belonging to different clusters [1]. Traditionally, clustering techniques are broadly divided into hierarchical and partitioning methods. Hierarchical algorithms build clusters hierarchically, whereas partitioning algorithms determine all clusters at once. Partitioning methods generally result in a set of K clusters, with each object belonging to one cluster. Each cluster may be represented by a centroid or a cluster representative, which is some sort of summary description of all the objects contained in the cluster. Hierarchical algorithms can be agglomerative (bottom-up) or divisive (top-down). Agglomerative algorithms begin with each object as a separate cluster and merge them successively into larger clusters until the termination criterion is reached. Divisive algorithms begin with the whole set and proceed to recursively divide it into smaller clusters.

The Pairwise Nearest Neighbor (PNN) method [2] belongs to the class of agglomerative clustering methods. It generates a hierarchical clustering using a sequence of merge operations until the desired number of clusters is obtained. The main drawback of the PNN method is its slowness: the time complexity of even the fastest implementation is lower bounded by the number of data objects.

K-means [3] is one of the most popular partitioning techniques and has a great number of applications in the fields of image and video compression [4], [5], image segmentation [6], pattern recognition and data mining [7], [8]. The rest of the paper is organized as follows: Section II gives an overview of the K-means clustering technique, Section III describes the proposed methods, Section IV presents the results and performance analysis of the proposed methods, and Section V concludes our work.

II. K-MEANS CLUSTERING

The K-means algorithm is a broadly used vector quantization (VQ) technique and was graded as one of the top ten algorithms in data mining [9]. It is iterative in nature and generates a codebook from the training data using a distortion measure appropriate for the given application [3], [10].

It is simple and easy to implement, and its computation time mainly depends on the amount of training data, the codebook size, the vector dimension, and the distortion measure used for convergence. It clusters the given objects, based on their attributes, into K partitions. K-means comprises four steps: initialization, classification, computation, and a convergence check.

There are two issues in creating a K-means clustering model:

1. Determining the optimal number of clusters to create
2. Determining the center of each cluster

Determining the number of clusters (K) is specific to the problem domain. The overall quality of a clustering is the average distance from each data point to its associated cluster center. Given the number of clusters K, the second part of the problem is determining where to place the center of each cluster. Often, points are scattered and do not fall into easily recognizable groups.

The algorithm starts by partitioning the input points into K initial sets, either at random or using some heuristic. It then calculates the centroid of each set and constructs a new partition by associating each point with its closest centroid. The centroid of each set is recalculated, and the algorithm repeats these two steps alternately until convergence is reached, i.e., the points no longer switch clusters or the overall squared error falls below the convergence threshold.

Although the K-means procedure will always terminate, it does not necessarily find the optimal configuration corresponding to the global minimum of the objective function. The algorithm is also significantly sensitive to the initial randomly selected cluster centers. A simple approach to reduce this effect is to make multiple runs of the algorithm with different initial K centers and choose the best result according to a given criterion.

An important component of any clustering algorithm is the distance measure used to find the similarity between data objects. Typically, the K-means algorithm uses the squared Euclidean distance to determine the distance between an object and its cluster centroid.

The conventional K-means algorithm for designing a codebook C = {c_i, i = 1,…,N} of size N from the given set of M training vectors T = {x_m, m = 1,…,M} is as follows:

Step 1: Initialization
Set the iteration number n = 0 and the codebook at iteration n,
  C_n = {c_i^n ; i = 1,…,N}.   (1)
Let ε be the convergence threshold.

Step 2: Partitioning
Find the nearest-neighbour partition
  V(c_j^n) = {x_m : Q(x_m) = c_j^n},   (2)
where Q denotes the vector quantization operation and is defined as
  Q(x) = c_j^n if d(x, c_j^n) ≤ d(x, c_i^n), i = 1,…,N,   (3)
under some distance measure d(x, y), x, y ∈ R^K.

Step 3: Codebook Update
Update the code vectors C_n = {c_j^n ; j = 1,…,N} to C_{n+1} = {c_j^{n+1} ; j = 1,…,N} as
  c_j^{n+1} = C(V(c_j^n)),   (4)
where C(V(c_j^n)) is the centroid of the partition V(c_j^n) under the given distance measure d(x, y).

Step 4: Convergence Check
Stop if
  (d_{n-1} - d_n) / d_n ≤ ε,   (5)
where d_n = (1/M) Σ_{m=1}^{M} d(x_m, Q(x_m)).
Otherwise, replace n by n + 1 and go to Step 2.
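The four-step procedure above can be sketched as follows. This is a minimal illustration under the squared Euclidean distance mentioned earlier, not the authors' implementation; the names (train_codebook, eps) are ours.

```python
def dist(x, y):
    # squared Euclidean distance d(x, y)
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(vectors):
    # component-wise mean of a non-empty list of vectors
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))

def train_codebook(training, codebook, eps=1e-4):
    d_prev = None
    while True:
        # Step 2: nearest-neighbour partition of the training vectors
        parts = [[] for _ in codebook]
        total = 0.0
        for x in training:
            j = min(range(len(codebook)), key=lambda i: dist(x, codebook[i]))
            parts[j].append(x)
            total += dist(x, codebook[j])
        d_n = total / len(training)  # average distortion at iteration n
        # Step 3: replace each code vector by the centroid of its partition
        codebook = [centroid(p) if p else codebook[j] for j, p in enumerate(parts)]
        # Step 4: stop when the relative drop in distortion is within eps
        if d_prev is not None and (d_prev - d_n) <= eps * d_n:
            return codebook, d_n
        d_prev = d_n
```

An empty partition simply keeps its previous code vector here; handling empty cells is a design choice the paper does not discuss.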

Though the K-means algorithm is the most common and extensively used clustering algorithm due to its superior scalability and efficiency, its time complexity for convergence poses a serious concern. This makes it necessary to develop robust strategies that enable fast convergence of K-means.

A modified K-means algorithm (MODI) was proposed by Lee et al. [11]; it converges faster than conventional K-means by updating along the direction of the local gradient with a step size larger than that used by the centroid update of the conventional K-means algorithm. While a scaled update can accelerate convergence, using a "fixed" scaling for the entire range of iterations means large step sizes are applied even at iterations close to convergence. Consequently, this increases the number of iterations required for convergence and also produces undesirably high perturbations of the code vectors, leading to a poorer local optimum. Paliwal and Ramasubramanian (NEW) [12] proposed the use of a variable scale factor that is a function of the iteration number. It offers faster convergence than the modified K-means algorithm with a fixed scale factor, without affecting the optimality of the codebook.

Despite the efficiency of the K-means algorithm, it has some critical issues, such as the a priori fixation of the number of clusters and the random selection of initial seeds. An inappropriate choice of initial seeds may yield poor results in addition to increased computation time for convergence. This has stimulated researchers to focus their attention on the initialization step, and many novel initialization methods have been proposed.

The earliest method of initializing the K-means algorithm is due to Forgy in 1965 [13], who chose points randomly. A pioneering work on seed initialization was proposed by Ball and Hall (BH) [14]. A similar approach, named Simple Cluster Seeking (SCS), was proposed by Tou and Gonzales [15]. The SCS method chooses the first input vector as the first seed; each remaining vector is selected as a seed provided it is at least a distance 'd' apart from all previously selected seeds. The SCS and BH methods are sensitive to the parameter d and to the order of the inputs.
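The SCS selection rule described above can be sketched in a few lines. The cap k and the function name are illustrative additions, not part of the original method description.

```python
def scs_seeds(vectors, d, k):
    # first input vector becomes the first seed
    seeds = [vectors[0]]
    for x in vectors[1:]:
        if len(seeds) == k:
            break
        # accept x as a seed only if it is at least distance d
        # (Euclidean) from every seed already chosen
        far_enough = all(
            sum((a - b) ** 2 for a, b in zip(x, s)) ** 0.5 >= d
            for s in seeds
        )
        if far_enough:
            seeds.append(x)
    return seeds
```

The result depends on both d and the order in which the vectors arrive, which is exactly the sensitivity noted above.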

A novel initialization method suitable for large datasets was proposed by Bradley and Fayyad (BF) [16]. The main idea of their algorithm is to select 'm' subsamples from the dataset, apply K-means on each subsample independently, keep the final N centers from each subsample, and thereby produce a set that contains mN points. A new approach for optimizing K-means clustering in terms of accuracy and computation time has been proposed by Ali and Kiyoki [17]. This algorithm places the initial centroids at the positions of greatest accumulated distance between the data points in the vector space, analogous to placing the pillars of a building as far apart as possible within the pressure distribution of the roof structure. Data points with the maximum accumulated distance among the data distribution are chosen as the initial centroids.

Babu and Murty [18] proposed a method of near-optimal seed selection using genetic programming. However, the problem with the genetic algorithm is that the results vary significantly with the choice of population size and of crossover and mutation probabilities. Huang and Harris [19] proposed the Direct Search Binary Splitting (DSBS) method. This method is similar to the binary splitting algorithm, except that the splitting step is enhanced through the use of Principal Component Analysis (PCA).

Since vector quantization is a natural application of K-means [20], the centroid index (or cluster index) and the centroid are also referred to as the "code index" and "code vector", respectively. The result of K-means, the set of centroids, is referred to as the codebook, which is used to quantize vectors with minimum distortion.

84 National Conference on Signal and Image Processing (NCSIP-2012)

III. PROPOSED METHODS

In this paper, two methods, Variant Block and Eigen Block, are proposed to construct the initial codebook so as to achieve fast convergence, eventually reducing the computational time of the K-means clustering technique for compressing images.

A. Variant Block Method

The first method is based on the idea that high-variance blocks are a good choice of initial codebook, as they maximize the intercluster distance. The novelty of this method is that the number of clusters 'K' is derived from the average variance of the image blocks. Nevertheless, the user needs to specify an upper limit for 'K' to control the compression rate. The algorithm for the proposed method is outlined as follows:

Step 1: Divide the input image into 4x4 blocks.
Step 2: Find the variance of each image block.
Step 3: Find the average variance and mark it as the threshold value.
Step 4: Find the total number of image blocks 'n' with variance greater than the threshold value.
Step 5: Set K = n if K > n.
Step 6: Choose only 'K' high-variance image blocks.
Step 7: Convert each high-variance block into a 16-element vector and keep these as the initial seeds.
Step 8: Apply the K-means algorithm to construct the final codebook.

The proposed method is image specific, as every image has its own threshold value, which depends on the variance of its image blocks.
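The seed selection of Steps 1-7 (everything before the final K-means run of Step 8) might be sketched as below. This is our illustration, not the authors' code: the image is assumed to be a greyscale NumPy array whose sides are multiples of 4, and all names are ours.

```python
import numpy as np

def variant_block_seeds(image, k_max):
    # Step 1: split the image into non-overlapping 4x4 blocks
    h, w = image.shape
    blocks = [image[r:r + 4, c:c + 4]
              for r in range(0, h, 4) for c in range(0, w, 4)]
    # Step 2: variance of each block
    variances = np.array([b.var() for b in blocks])
    # Step 3: average variance as the threshold
    threshold = variances.mean()
    # Step 4: blocks whose variance exceeds the threshold
    high = [i for i, v in enumerate(variances) if v > threshold]
    # Step 5: K = n if K > n
    k = min(k_max, len(high))
    # Steps 6-7: the K most variant blocks, flattened to 16-element seeds
    chosen = sorted(high, key=lambda i: variances[i], reverse=True)[:k]
    return np.array([blocks[i].reshape(16) for i in chosen])
```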

B. Eigen Block Method

The second method is based on the spectral decomposition of an image. Clustering algorithms depend upon the knowledge or acquisition of similarity information, i.e., the affinity between points in Euclidean space. Spectral techniques, which make use of information obtained from the eigenvectors and eigenvalues of a matrix, have recently attracted researchers' attention with respect to clustering. Eigenvalues are very important in many applications in science and engineering, such as the solution of linear and non-linear differential equations, boundary value problems, Markov chains, network analysis and population growth models. Spectral clustering methods perform well in some cases where classical methods (K-means, LBG) fail. However, for very non-compact clusters, they also tend to have problems. Hence, spectral decomposition is incorporated merely as a method for determining the block structure of the affinity matrix. Consequently, it is advantageous for clustering techniques if the affinity matrix has a clear block structure.

In the proposed method, the significance of the eigenvalue is used to form the initial codebook. As blocks having high eigenvalues are the principal components of an image, the initial codebook is constructed with such blocks. The following steps describe the algorithm of the proposed Eigen Block method.

Step 1: Divide the input image into 4x4 blocks.
Step 2: Form the affinity matrix of each block.
Step 3: Find the eigenvalues of all blocks.
Step 4: Choose 'K' image blocks having high eigenvalues.
Step 5: Convert each chosen image block into a 16-element vector.
Step 6: Form the initial codebook with those K 16-element vectors having high eigenvalues.
Step 7: Apply the K-means algorithm with these initial centroids to construct the final codebook.
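These steps can be sketched as below, with one important caveat: the paper does not spell out how the per-block affinity matrix is formed, so as a stand-in each 4x4 block is used directly as the matrix whose largest eigenvalue magnitude scores the block. That choice, and all names, are our assumptions.

```python
import numpy as np

def eigen_block_seeds(image, k):
    # Step 1: split the image into non-overlapping 4x4 blocks
    h, w = image.shape
    blocks = [image[r:r + 4, c:c + 4]
              for r in range(0, h, 4) for c in range(0, w, 4)]
    # Steps 2-3: score each block by its largest eigenvalue magnitude
    # (using the block itself in place of an explicit affinity matrix)
    scores = [np.abs(np.linalg.eigvals(b)).max() for b in blocks]
    # Step 4: pick the K highest-scoring blocks
    chosen = sorted(range(len(blocks)),
                    key=lambda i: scores[i], reverse=True)[:k]
    # Steps 5-6: flatten each chosen block into a 16-element seed vector
    return np.array([blocks[i].reshape(16) for i in chosen])
```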

IV. RESULTS AND DISCUSSION

The performance of the proposed methods was evaluated with several test images of size 256x256, with an initial block size of 4x4, on a machine with a Core 2 Duo processor at 2 GHz, using MATLAB. The quality of the reconstructed image is measured in terms of the Peak Signal to Noise Ratio (PSNR), defined by

  PSNR = 20 log10 (255 / √MSE),   (6)

where MSE is the mean-square error measuring the deviation of the reconstructed image from the original image. Q refers to the structural similarity index, a universal quality index that models any distortion as a combination of three different factors: loss of correlation, mean distortion, and variance distortion.
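For reference, equation (6) with the usual mean-square error over 8-bit images can be computed as below (a generic sketch, not the authors' MATLAB code; the function name is ours).

```python
import numpy as np

def psnr(original, reconstructed):
    # mean-square error between the two images
    diff = original.astype(float) - reconstructed.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # equation (6): PSNR = 20 log10(255 / sqrt(MSE)), in dB
    return 20.0 * np.log10(255.0 / np.sqrt(mse))
```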

TABLE 1: PERFORMANCE COMPARISON OF THE PROPOSED METHODS

Test Image | Method               | Execution Time (s) | Bit Rate (bpp) | PSNR  | Structural Similarity Index (Q)
Lena       | Variant Block Method | 39.4               | 0.21           | 29.25 | 0.67
Lena       | Eigen Block Method   | 37.4               | 0.25           | 29.24 | 0.63
Lena       | K-means              | 65.0               | 0.25           | 29.16 | 0.69
Boats      | Variant Block Method | 24.5               | 0.25           | 28.20 | 0.60
Boats      | Eigen Block Method   | 41.5               | 0.25           | 28.06 | 0.59
Boats      | K-means              | 135.0              | 0.25           | 27.90 | 0.66
Barbara    | Variant Block Method | 32.8               | 0.23           | 25.33 | 0.68
Barbara    | Eigen Block Method   | 38.3               | 0.25           | 25.18 | 0.66
Barbara    | K-means              | 77.5               | 0.25           | 25.29 | 0.69


Bit rate is measured in bits per pixel (bpp). Experimental results with the test images Lena, Boats and Barbara are given in Table 1. Table 1 clearly reveals that both methods achieve faster convergence than K-means with random initialization of seeds, with enhanced picture quality and bit rate. The execution time is reduced to almost half that of traditional K-means, demonstrating the efficiency of the proposed methods. Moreover, the Variant Block method can be more effective than similar methods, as it is a variable-bit-rate technique and image specific. It achieves a variable bit rate because the size of the codebook depends on the average variance of the image, which differs from image to image. Fig. 1 shows the reconstructed Lena images produced by K-means with random initialization and by the proposed methods, for visual comparison.

Fig. 1: a) Original Lena image; reconstructed images using b) K-means, c) Variant Block method, d) Eigen Block method.

V. CONCLUSION

Both the Variant Block and Eigen Block methods give better image quality, in terms of PSNR, than the traditional K-means method. Moreover, in terms of execution time, both of them outperform the K-means method. Of the two, the Variant Block method performs better than the Eigen Block method in both the quality of the reconstructed image and execution time. Both methods are therefore a good choice for online digital image applications, where execution time plays a vital role in effective implementation.

REFERENCES

[1] Ankerst, M., M. Breunig, H.P. Kriegel and J. Sander, "OPTICS: Ordering points to identify the clustering structure", Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM Press, Philadelphia, Pennsylvania, United States, pp. 49-60, June 1999.
[2] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "When is 'nearest neighbor' meaningful?", Proceedings of the International Conference on Database Theory, Jerusalem, pp. 217-235, January 1999.
[3] Linde Y., Buzo A., and Gray R.M., "An Algorithm for Vector Quantizer Design", IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980.
[4] N. Venkateswaran and Y. V. Ramana Rao, "K-Means Clustering Based Image Compression in Wavelet Domain", Information Technology Journal, Vol. 6, pp. 148-153, 2007.
[5] Gersho A. and Gray R.M., Vector Quantization and Signal Compression, Kluwer Academic Publishers, New York, pp. 761, 1992.
[6] H.P. Ng, S.H. Ong, K.W.C. Foong, P.S. Goh, and W.L. Nowinski, "Medical image segmentation using k-means clustering and improved watershed algorithm", IEEE Southwest Symposium on Image Analysis and Interpretation, Denver, pp. 61-65, 2006.
[7] Duda, R.O. and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, pp. 482, 1973.
[8] Jiang, D., J. Pei and A. Zhang, "An interactive approach to mining gene expression data", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, pp. 1363-1380, 2005.
[9] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg, "Top 10 algorithms in data mining", Knowledge and Information Systems, Vol. 14, pp. 1-37, 2008.
[10] MacQueen, J.B., "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 281-297, 1967.
[11] D. Lee, S. Baek, and K. Sung, "Modified k-means algorithm for vector quantizer design", IEEE Signal Processing Letters, Vol. 4, pp. 2-4, 1997.
[12] Kuldip K. Paliwal and V. Ramasubramanian, Comments on "Modified K-means Algorithm for Vector Quantizer Design", IEEE Transactions on Image Processing, Vol. 9, No. 11, pp. 1964-1967, 2000.
[13] E. Forgy, "Cluster analysis of multivariate data: efficiency vs interpretability of classification", Biometrics, Vol. 21, pp. 768-769, 1965.
[14] Ball G.H. and Hall D.J., "PROMENADE - an Online Pattern Recognition System", Stanford Research Institute Memo, Stanford University, 1967.
[15] Tou, J. and R. Gonzales, Pattern Recognition Principles, Addison-Wesley, Reading, MA, pp. 377, 1977.
[16] Bradley, P.S. and U.M. Fayyad, "Refining initial points for K-means clustering", Proceedings of the 15th International Conference on Machine Learning (ICML'98), Morgan Kaufmann, San Francisco, pp. 91-99, 1998.
[17] Ali Ridho Barakbah and Yasushi Kiyoki, "A New Approach for Image Segmentation using Pillar-Kmeans Algorithm", World Academy of Science, Engineering and Technology, Vol. 59, pp. 23-28, 2009.
[18] G. P. Babu and M. N. Murty, "A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm", Pattern Recognition Letters, Vol. 14, pp. 763-769, 1993.
[19] C. Huang and R. Harris, "A comparison of several codebook generation approaches", IEEE Transactions on Image Processing, Vol. 2, No. 1, pp. 108-112, 1993.
[20] J.S. Pan, Z.M. Lu and S.H. Sun, "An efficient encoding algorithm for vector quantization based on subvector technique", IEEE Transactions on Image Processing, Vol. 12, No. 3, pp. 265-270, 2003.
