
INFORMATION SCIENCES

AN INTERNATIONAL JOURNAL

A. Bouguettaya *, Q. Le Viet

School of Information Systems, Queensland University of Technology, Brisbane, Qld 4001, Australia

Received 1 June 1997; accepted 30 April 1998

Communicated by Ahmed Elmagarmid

Abstract

Cluster analysis techniques are used to classify objects into groups based on their similarities. There is a wide choice of methods with different requirements in computer resources. We present the result of a fairly exhaustive study to evaluate three commonly used clustering algorithms, namely, single linkage, complete linkage, and centroid. The cluster analysis study is conducted in the two-dimensional (2-D) space. Three types of statistical distribution are used. Two different types of distances to compare lists of objects are also used. The results point to some startling similarities in the behavior and stability of all clustering methods. © 1998 Elsevier Science Inc. All rights reserved.

1. Introduction

Cluster analysis is used to generate groups of objects based on their degree of association [21]. In simple words, cluster analysis classifies items into groups based on their similarities. In databases, cluster analysis has been used to re-allocate stored information based on predefined criteria, with the goal of improving the efficiency of data retrieval operations. The standard way of evaluating the degree of similarity varies from one application to another. By re-allocating data, related information is physically stored as close as possible to reduce the number of disk accesses. In a distributed environment, clustering is even more important because of the impact on the response time if the requested data is physically located at different sites. The need to do clustering is clear. There are many issues that need to be addressed:

1. Calculation of the degree of association between different types of objects.
2. Determination of an acceptable criterion to evaluate the "goodness" of clustering methods.
3. Adaptability of the clustering methods to different distributions of data: randomly scattered, skewed, or concentrated around certain regions, etc.

Several clustering methods have been proposed that differ in the approach taken to perform the clustering, some of which are: Hierarchical versus Partitional, Agglomerative versus Divisive, Extrinsic versus Intrinsic, etc. [7,10,8,14,22,23,25,18,1]. In that respect, each clustering method has a different theoretical basis and is applicable to particular fields. We propose a fairly exhaustive study of well known clustering techniques in the two-dimensional (2-D) space. Previous, less exhaustive studies have focused on the 1-D space [5]. Our experiment includes a variety of environment settings to test the sensitivity and behavior of the clustering techniques. The results can be of paramount importance across a wide range of applications.

1.1. Definitions

We present a high level description of the different ontologies used in the research literature and this paper. The ontologies we cover are the following: cluster analysis, objects, clusters, distance and similarity, coefficient of correlation, and standard deviation.

Cluster analysis: This is about the generation of groups of entities that fit a set of definitions. The group which forms a cluster should have a higher degree of association among its members than with members of different groups. At a high level of abstraction, a cluster can be viewed as a group of "similar" objects. Cluster analysis is sometimes referred to as automatic classification. Cluster analysis has a property that makes it different from other classification methods, namely, information about classes of groupings is not known prior to the processing. Items are grouped into clusters that are defined by the members of those clusters.

Objects (or items): These are used in a broad sense. They can be anything that needs to be classified based on certain criteria. An object may represent a single attribute in a relational database or a complex object in an object-oriented database, provided that it can be represented as a point in a measurement space. Obviously, all objects to be clustered should be defined in the same measurement space. In a 1-D (one-dimensional) environment, an object is represented as a point belonging to a segment defined by the interval [a,b], where a and b are arbitrary numbers.

Clusters: These are groups of objects linked together according to some rules. The goal of clustering is to find the groups that are most homogeneous given the objects to be clustered. There are two ways to represent clusters in a measurement space: as a hypothetical point which is not an object in the cluster, or as an existing object in the cluster called the centroid or cluster representative. Clusters are represented in the measurement space in the same form as the objects they contain. To distinguish between an object and a cluster, additional information is needed: the number of objects in the cluster. From that point of view, a single object is a cluster containing exactly one object. An example of clusters in a 1-D environment is {0.1}, {0.6}, {0.2, 0.3, 0.5}, etc.

Distance and Similarity: To cluster items in a database or in any other environment, some means of quantifying the degree of association between items is needed. It may be a measure of distance or of similarity. There are a number of similarity measures available, and the choice may have an effect on the results obtained. Objects which have more than one dimension may use relative or normalized weights to convert their distances to an arbitrary scale so that they can be compared. Once the objects are defined in the same measurement space as points, it is a straightforward exercise to compute the distance or similarity. The smaller the distance, the more similar two objects are.

The most popular choice in computing distance is the Euclidean distance:

$d(i,j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2},$

where $n$ is the number of dimensions. For the 1-D space, the distance becomes $d(i,j) = |x_i - x_j|$.

There is also the Manhattan distance or city block concept, represented as follows:

$d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{in} - x_{jn}|.$

The distance between two clusters involves some or all items of the two clusters and is calculated differently depending on the clustering method.
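As a concrete illustration, the following is a minimal Python sketch of the two distance measures above; the function names are ours, and the 1-D check mirrors the $|x_i - x_j|$ special case.

```python
import math

def euclidean(p, q):
    # d(i,j) = sqrt(sum over k of (x_ik - x_jk)^2) in n dimensions
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # city-block distance: d(i,j) = sum over k of |x_ik - x_jk|
    return sum(abs(a - b) for a, b in zip(p, q))

# In the 1-D case both measures reduce to |x_i - x_j|.
assert math.isclose(euclidean((0.2,), (0.5,)), abs(0.2 - 0.5))
assert math.isclose(manhattan((0.2,), (0.5,)), abs(0.2 - 0.5))
```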

Standard Deviation: This is the measurement of the fluctuation of values as compared to the mean value. In this study, the standard deviation is used to show the acceptability of the results. The standard deviation of a random variable $\mathcal{X}$ is given by

$\sigma(\mathcal{X}) = \sqrt{E(\mathcal{X}^2) - E^2(\mathcal{X})}.$

Coefficient of Correlation: This is the measurement of the strength of the relationship between two variables $\mathcal{X}$ and $\mathcal{Y}$. It essentially answers the question "how similar are $\mathcal{X}$ and $\mathcal{Y}$?". The values of the coefficient of correlation range from 0 to 1, where the value 0 points to no similarity and the value 1 points to high similarity. The coefficient of correlation is used to find the similarity among objects. The correlation $r$ of two random variables $\mathcal{X}$ and $\mathcal{Y}$, where $\mathcal{X} = (x_1, x_2, x_3, \ldots, x_n)$ and $\mathcal{Y} = (y_1, y_2, y_3, \ldots, y_n)$, is given by the formula

$r = \frac{|E(\mathcal{X}\mathcal{Y}) - E(\mathcal{X}) \cdot E(\mathcal{Y})|}{\sqrt{E(\mathcal{X}^2) - E^2(\mathcal{X})}\,\sqrt{E(\mathcal{Y}^2) - E^2(\mathcal{Y})}},$

where $E(\mathcal{X}) = \sum_{i=1}^{n} x_i/n$, $E(\mathcal{Y}) = \sum_{i=1}^{n} y_i/n$, and $E(\mathcal{X},\mathcal{Y}) = \sum_{i=1}^{n} x_i y_i/n$.
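The formula is direct to compute; a minimal Python sketch follows (the function name is ours). Note that the two denominator factors are exactly the standard deviations $\sigma(\mathcal{X})$ and $\sigma(\mathcal{Y})$ as defined above.

```python
import math

def correlation(xs, ys):
    # r = |E(XY) - E(X)E(Y)| / (sigma(X) * sigma(Y))
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum(x * x for x in xs) / n - ex ** 2)
    sigma_y = math.sqrt(sum(y * y for y in ys) / n - ey ** 2)
    return abs(exy - ex * ey) / (sigma_x * sigma_y)

# Identical lists are maximally similar:
assert math.isclose(correlation([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]), 1.0)
```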

1.2. Related work

Cluster analysis has been used in several fields of science to determine taxonomical relationships among entities, including fields dealing with species, psychiatric profiles, census and survey data, and chemical structures [18,27,14,22].

Clustering applications range from databases (e.g., data clustering and data mining) to machine learning to data compression [11]. Our application domain is in the area of databases. In the case of database clustering, the ability to categorize items into groups allows the re-allocation of related data in a database to improve the performance of DBMSs. Data records which are frequently referenced together are moved into close proximity to reduce access time. To reach this goal, cluster analysis is used to form clusters based on the similarities of data items. Data may be re-allocated based on the values of an attribute, a group of attributes, or on access patterns. These criteria determine the distance measured between data items.

With the advent of OODBs, the need for efficient clustering techniques becomes crucial. Some OODBs use limited forms of clustering to improve their performance. However, they are mostly static in nature [4]. The case of OODBs is unique in that the underlying model provides a testbed for dynamic clustering. This is the reason why clustering takes on a whole new meaning with OODBs. There has been a surge in the number of studies of database clustering [13,17,24,3,2]. In particular, a number of recent studies have investigated adaptive clustering techniques, i.e., clustering techniques which can cope with changing access patterns and perform clustering on-line [5,26].

In database mining and knowledge discovery, the primary goal is the search for, and the discovery of, previously unknown patterns in large datasets stored in databases [19,9]. The patterns are then used to predict the model of data classification. There is a wide range of benefits in "mining" data to find interesting associations. Data warehouses become valuable in terms of understanding, managing, and using previously unknown relationships between sets of data. Our experiments are meant to provide a generic view on how data is clustered. In this regard, the experiments do not target specific applications and are applicable to a wide range of domains.

This research builds upon previous work that we have conducted using a different set of environment settings. We investigated three commonly used clustering algorithms: Slink, Clink, and Average, with different settings [5]. The preliminary findings seem to indicate that the choice of clustering method becomes irrelevant in terms of final outcomes. The study presented here extends our previous work to include several other settings [5]. The new environment settings include additional parameters: a new clustering method, a statistical distribution, larger input sizes, and the space dimension. The aim is to provide a basis for a more categorical argument as to the behavior and sensitivity of the considered clustering methods.

The objects used in this study consist of points lying in the interval [0,1]. There are numerous statistical distributions, and we selected the three that most closely model real world distributions [6]. Our aim is to see whether clustering is dependent on the way objects are generated. The first is the uniform distribution, the second is the piecewise distribution, and the third is the Gaussian distribution. In what follows, we describe the statistical distributions that we used in our experiments.

Uniform distribution: The respective distribution function is the following: $F(x) = x$. The density function of this distribution is $f(x) = F'(x) = 1$ for all $x$ such that $0 \le x \le 1$.

Piecewise (skewed) distribution: The respective distribution function is the following:

$F(x) = \begin{cases} 0.475 & \text{if } 0.37 \le x < 0.62, \\ 0.525 & \text{if } 0.62 \le x < 0.743, \\ 0.95 & \text{if } 0.743 \le x < 0.89, \\ 1 & \text{if } 0.89 \le x \le 1. \end{cases}$

Gaussian (normal) distribution: The respective density function is the following:

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$

This is a two-parameter ($\sigma$ and $\mu$) distribution, where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance. Here $\mu = 0.5$ and $\sigma = 0.1$, which gives the following values of the distribution function:

$F(x) = \begin{cases} 0.02277 & \text{if } 0.2 \le x < 0.3, \\ 0.15867 & \text{if } 0.3 \le x < 0.4, \\ 0.49997 & \text{if } 0.4 \le x < 0.5, \end{cases}$

for $0.0 \le x \le 1$. For values of $x$ that are in the range [0.5,1], the distribution is symmetric.
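To make the setup concrete, here is a minimal Python sketch of how lists of objects could be drawn from the three distributions. The function names are ours, and the piecewise sampler interpolates linearly between the known points of the distribution function above (the behavior of $F$ below 0.37 is not fully legible in our copy, so the interpolation starts from $F(0) = 0$).

```python
import numpy as np

rng = np.random.default_rng()  # the paper seeds its generator with the system time

def uniform_sample(n):
    # uniform on [0, 1]: F(x) = x
    return rng.random(n)

def gaussian_sample(n, mu=0.5, sigma=0.1):
    # Normal(mu, sigma^2) restricted to [0, 1] by resampling outliers
    x = rng.normal(mu, sigma, n)
    bad = (x < 0.0) | (x > 1.0)
    while bad.any():
        x[bad] = rng.normal(mu, sigma, bad.sum())
        bad = (x < 0.0) | (x > 1.0)
    return x

# Known points (F(x), x) of the skewed piecewise distribution function;
# a piecewise-linear interpolation stands in for the exact definition.
_PW_F = np.array([0.0, 0.475, 0.525, 0.95, 1.0])
_PW_X = np.array([0.0, 0.37, 0.62, 0.743, 0.89])

def piecewise_sample(n):
    # inverse-CDF sampling of the skewed distribution
    return np.interp(rng.random(n), _PW_F, _PW_X)
```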

The following is an outline of the paper. In Section 2, the clustering methods used in this study are described. In Section 3, we detail the experiments conducted in this study. In Section 4, we provide the interpretations of the experiment results. In Section 5, we provide some concluding remarks.

2. Clustering methods

There are different ways to classify clustering methods according to the type of cluster structure they produce. The simple non-hierarchical methods divide the data set of N objects into M clusters, where no overlap is allowed. They are also known as partitioning methods. Each item is a member of only the cluster to which it is most similar, and the cluster may be represented by a centroid or cluster representative that captures the characteristics of all contained objects. These methods are heuristically based and mostly applied in the social sciences.

Hierarchical methods produce a nested data set in which pairs of items or clusters are successively linked until every item in the data set is linked to form one cluster. Hierarchical methods can be one of the following:
• Agglomerative, with N - 1 pairwise joins from an unclustered data set. In other words, from N clusters of one object each, this method gradually forms one cluster of N objects. At each step, clusters or objects are joined together into larger and larger clusters, ending with one big cluster containing all objects.
• Divisive, in which all objects belong to a single cluster at the beginning; they are then divided into smaller and smaller clusters until the last cluster containing two objects has been broken apart into its atomic constituents.

In both cases, the result of the procedure is a hierarchical tree, hence the name "hierarchical methods". The hierarchical tree may be presented as a dendrogram, in which the pairwise coupling of the objects in the data set is shown and the length of the branches (vertices) or the value of the similarity is represented numerically. Divisive methods are less commonly used, and only agglomerative methods will be discussed in this paper.

In the following, we describe three commonly used clustering methods and their characteristics. These methods have been used in the experiments presented in this paper.

Single linkage clustering method (Slink): The distance between two clusters is the smallest of all distances between two items $(x, y)$, denoted $\delta_{x,y}$, such that $x$ is a member of one cluster and $y$ is a member of another cluster. This method is also known as the nearest neighbor method. The distance $\delta_{X,Y}$ is computed as follows:

$\delta_{X,Y} = \min\{\delta_{x,y}\} \quad \text{with } x \in X,\; y \in Y,$

where $(X, Y)$ are clusters and $(x, y)$ are objects in the corresponding clusters. This method is the simplest among all clustering methods. It has some attractive theoretical properties [12]. However, it tends to form long or chaining clusters. This method may not be very suitable for objects concentrated around some centers in the measurement space.

Complete linkage clustering method (Clink): The similarity coefficient is the longest distance between any pair of objects, denoted $\delta_{X,Y}$, taken from two clusters. This method is also called the furthest neighbor clustering method [16,22,8]. The distance is computed as follows:

$\delta_{X,Y} = \max\{\delta_{x,y}\} \quad \text{with } x \in X,\; y \in Y.$

Centroid/median method: Clusters in this method are represented by a "centroid", a point in the middle of the cluster. The distance between two clusters is the distance between their centroids. This method also has a special case, the median method, in which the centroid of the smaller group is given the same weight as that of the larger one.
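For 1-D objects like those used in the examples below, the three inter-cluster distances can be sketched in a few lines of Python (the function names are ours):

```python
def slink_distance(X, Y):
    # nearest neighbor: the smallest pairwise distance between clusters
    return min(abs(x - y) for x in X for y in Y)

def clink_distance(X, Y):
    # furthest neighbor: the largest pairwise distance between clusters
    return max(abs(x - y) for x in X for y in Y)

def centroid_distance(X, Y):
    # distance between the cluster centroids (1-D means)
    return abs(sum(X) / len(X) - sum(Y) / len(Y))
```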

2.2. General algorithm

A detailed treatment of the clustering algorithms can be found in [22,8]. As a first step, objects are generated using a random number generator. In our case, these objects are points in the interval [0,1]. After these objects are created, they are compared to each other by measuring their distance. The distance between two clusters is computed using the similarity coefficient. The way objects and clusters of objects are joined together to form larger clusters varies with the approach used. We outline a generic algorithm that is applicable to all clustering methods (a sketch in code follows the steps below). Essentially, it consists of two phases. The first phase records the similarity coefficients. The second phase computes the minimum and performs the clustering. Initially, every cluster consists of exactly one object.
1. Scan all clusters and record all similarity coefficients.
2. Compute the minimum of all similarity coefficients and then join the corresponding clusters.

4. Go to (1).
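As referenced above, the following is a minimal Python sketch of this generic loop, assuming the first of the two tie-breaking alternatives discussed next (coefficients are recomputed after every join); the function name and cluster representation are ours.

```python
def agglomerate(objects, cluster_distance):
    # generic agglomerative clustering: start with singleton clusters and
    # repeatedly join the pair with the minimum similarity coefficient
    clusters = [[o] for o in objects]
    merges = []  # record of (cluster A, cluster B, join distance)
    while len(clusters) > 1:
        # Step 1: scan all clusters and record all similarity coefficients.
        pairs = [(cluster_distance(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        # Step 2: take the minimum and join the corresponding clusters.
        d, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

Passing `slink_distance`, `clink_distance`, or `centroid_distance` from the sketch in the previous subsection as `cluster_distance` yields the three methods studied here.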

There is a case when using the Clink method where ambiguity may arise. Suppose that when performing Step 1, three successive clusters are to be joined (they all have the minimum similarity value). When performing Step 2, the first two clusters are joined. However, when computing the similarity coefficient between this new cluster and the third cluster, the similarity value may now be different from the minimum value. The question then is what the next step should be. There are essentially two possibilities:
• Either proceed by joining clusters, recomputing the similarity coefficient each time in Step 2.
• Or join at once all those clusters that had the minimum similarity coefficient, and do not recompute the similarity in Step 2.
In general, there is no evidence that one is better than the other [22]. For our study, we selected the first alternative.

2.3. Examples

The following examples give an idea of how the different clustering methods work with the same set of data. Example 1 uses the Slink method, Example 2 uses the Clink method, and Example 3 uses the Centroid method (see Figs. 1-3). The sample data has 10 items; each item has an identification and a value on which the distance is calculated. The uniform distribution was used to generate this set of objects (see Table 1). For the sake of simplicity, we consider the 1-D space for generating data objects.

Example 1 (Slink).

1. Join clusters {10} and {1} at distance 0.0117584.

2. Join clusters {6} and {10, 1} at 0.105885.

Table 1
Example of a sample data list

Value        Id
0.7764770    1
0.5529483    2
0.3294196    3
0.1058909    4
0.8823621    5
0.6588334    6
0.4353047    7
0.2117760    8
0.9882473    9
0.7647185    10

[Fig. 1. Clustering tree using Slink.]

[Fig. 2. Clustering tree using Clink.]

[Fig. 3. Clustering tree using Centroid.]

4. Join clusters {3} and {7} as one cluster, and {2}, {6, 10, 1}, {5}, and {9} as another cluster at 0.105885148.
5. Join clusters {4, 8}, {3, 7}, and {2, 6, 10, 1, 5, 9} as one cluster at 0.117643.

Example 2 (Clink).
1. Join clusters {1} and {10} at distance 0.0117584.
2. Join clusters {4} and {8} at 0.105885140.
3. Join clusters {3} and {7} as one cluster, {2} and {6} as another cluster, and {5} and {9} as another cluster at 0.105885148.
4. Join clusters {2, 6} and {1, 10} at 0.2235286.
5. Join clusters {4, 8} and {3, 7} at 0.3294138.
6. Join clusters {2, 6, 10, 1} and {5, 9} at 0.4352989.
7. Join clusters {4, 8, 3, 7} and {2, 6, 10, 1, 5, 9} at 0.8823563.

Example 3 (Centroid).
1. Join clusters {10} and {1} at a distance of 0.0117585 and form the central point at 0.77059775.
2. Join clusters {2} and {6} as one cluster, {3} and {7} as another cluster, and {4} and {8} as another cluster at 0.105885148, and form the central points at 0.60589085, 0.32241215, and 0.15883345.
3. Join clusters {5} and {9} at 0.1058852 and form the central point at 0.93530470.
4. Join clusters {1, 10} and {2, 6} at 0.16478690 and form the central point at 0.68824430.
5. Join clusters {3, 7} and {4, 8} at 0.22357870 and form the central point at 0.27062280.
6. Join clusters {1, 10, 2, 6} and {5, 9} at 0.24706040 and form the central point at 0.81177450.
7. Join clusters {1, 10, 2, 6, 5, 9} and {3, 7, 4, 8} at 0.5315170.
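These hand-worked examples can be checked mechanically. The sketch below, assuming SciPy is available, runs single, complete, and centroid linkage over the Table 1 values and prints the join distances; the first join, at roughly 0.0117585, matches step 1 of all three examples.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Values from Table 1, listed in identifier order 1..10.
values = np.array([0.7764770, 0.5529483, 0.3294196, 0.1058909, 0.8823621,
                   0.6588334, 0.4353047, 0.2117760, 0.9882473, 0.7647185])

# Each row of the linkage matrix records the two clusters joined and the
# distance at which they were joined.
for method in ("single", "complete", "centroid"):
    Z = linkage(values.reshape(-1, 1), method=method)
    print(method, np.round(Z[:, 2], 7))
```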

3. Experiment description

The hierarchical tree is represented using a minimum spanning tree (MST). An MST is a tree in which N objects are linked by N - 1 connections and there is no cycle in the tree. The MST has the following properties that make it suitable for representing the hierarchical tree:
• Any isolated item can be connected to a nearest neighbor.
• Any isolated fragment (a subset of an MST) can in any case be connected to a nearest neighbor by the available link.

The data used in this paper are drawn from a 2-D space. Their values range from 0 to 1 inclusive, and the list sizes range from 100 to 500. The linear congruential algorithm is used to generate data objects [15,20]. The seed is the system time. Each experiment followed these steps:
• Generate lists of objects.
• Carry out the clustering process with the three different clustering methods.
• Calculate the coefficient of correlation for each clustering method.
Each experiment is repeated 100 times and the standard deviation of the coefficients of correlation is calculated. The least square approximation (LSA) is used to evaluate the acceptability of the approximation. If a coefficient of correlation obtained using the LSA falls within the segment defined by the corresponding standard deviation, the approximation is deemed acceptable.

Types of distances in comparing trees: There are two ways of comparing two trees, obtained from lists of objects, to compute their coefficient of correlation. The distance used in the coefficient of correlation could, for instance, be computed using the actual linear (Euclidean) difference between two objects. The other method of computing the distance is to use the minimum number of edges (of a tree) needed to join two objects. The latter has an advantage over the former in that it provides a more "natural" implementation of a correlation. We call the first type of distance the linear distance and the second the edge distance.
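The edge distance is simply a shortest path length measured in hops. A minimal sketch, assuming the tree (e.g., an MST) is given as an adjacency mapping, is the following (names ours):

```python
from collections import deque

def edge_distance(tree, a, b):
    # minimum number of tree edges needed to join objects a and b,
    # found by breadth-first search starting from a
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == b:
            return hops
        for neighbor in tree[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    raise ValueError("objects are not connected")

# A path-shaped tree 1-2-3: objects 1 and 3 are two edges apart.
assert edge_distance({1: [2], 2: [1, 3], 3: [2]}, 1, 3) == 2
```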

Two lists of objects are compared by selecting one pair of identifiers in the second (shorter) list and computing its distance, then looking for the same pair in the first list and computing its distance. We repeat the same process for all remaining pairs in the second list.

There are several types of coefficients of correlation. This stems from the fact that several parameters have been used. For instance, the clustering method is one parameter. There are 3 × 2 × 3 = 18 possible ways to compute the coefficient of correlation for two lists of objects. Indeed, we have the following choices:
• first parameter: Slink, Clink, or Centroid;
• second parameter: linear distance or edge distance;
• third parameter: uniform, piecewise, or Gaussian distribution.
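Enumerating this experimental grid is a one-liner in Python; the names below are ours.

```python
from itertools import product

methods = ("Slink", "Clink", "Centroid")
distances = ("linear", "edge")
distributions = ("uniform", "piecewise", "gaussian")

# 3 x 2 x 3 = 18 ways to compute a coefficient of correlation
settings = list(product(methods, distances, distributions))
assert len(settings) == 18
```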

The other dimension of this comparison study that has a direct influence on the clustering is the data input. This determines what kind of data is to be compared and what its size is. The following cases have been identified to check the sensitivity of each clustering method with regard to the input data. For every type of coefficient of correlation mentioned above, eleven (11) types of situations (hence, eleven coefficients of correlation) have been isolated. It is our belief that these cases closely represent what may influence the choice of a clustering method.

1. The coefficient of correlation is between pairs of objects drawn from a set S and pairs of objects drawn from the first half of the same set S. The first half of S is taken before the set is sorted.
2. The coefficient of correlation is between pairs of objects drawn from S and pairs of objects drawn from the second half of S. The second half of S is taken before the set is sorted.
3. The coefficient of correlation is between pairs of objects drawn from the first half of S, say $S_1$, and pairs of objects drawn from the first half of another set S', say $S'_1$. The two sets are given ascending identifiers after being sorted: the first object of $S_1$ is given the identifier 1, and so is the first object of $S'_1$; the second object of $S_1$ is given the identifier 2, and so is the second object of $S'_1$, and so on.

4. The coefficient of correlation is between pairs of objects drawn from the second half of S, say $S_2$, and pairs of objects drawn from the second half of S', say $S'_2$. The two sets are given ascending identifiers after being sorted in the same way as in the previous case.

5. The coefficient of correlation is between pairs of objects drawn from S and pairs of objects drawn from the union of a set X and S. The set X contains 10% new randomly generated objects.
6. The coefficient of correlation definition is the same as the fifth coefficient of correlation, except that X now contains 20% new randomly generated objects.
7. The coefficient of correlation definition is the same as the fifth coefficient of correlation, except that X now contains 30% new randomly generated objects.
8. The coefficient of correlation definition is the same as the fifth coefficient of correlation, except that X now contains 40% new randomly generated objects.

9. The coefficient of correlation is between pairs of objects drawn from S using the uniform distribution and pairs of objects drawn from S' using the piecewise distribution.
10. The coefficient of correlation is between pairs of objects drawn from S using the uniform distribution and pairs of objects drawn from S' using the Gaussian distribution.
11. The coefficient of correlation is between pairs of objects drawn from S using the Gaussian distribution and pairs of objects drawn from S' using the piecewise distribution.

In a nutshell, the above coefficients of correlation are meant to analyze different situations in the evaluation of results. We partition these situations into three groups, represented by three blocks of coefficients of correlation:
First block: The first, second, third, and fourth coefficients of correlation are used to check the influence of the context on how objects are clustered.
Second block: The fifth, sixth, seventh, and eighth coefficients of correlation are used to check the influence of the data size.
Third block: The ninth, tenth, and eleventh coefficients of correlation are used to check the relation which may exist between two lists obtained using two different distributions.

To ensure the statistical representativity of the results, the average of 100 coefficient of correlation and standard deviation values (of the same type) is computed. The least square approximation is then applied to obtain the following equation:

$f(x) = ax + b.$

The criterion for a good approximation (or acceptability) is given by the inequality

$|y_i - f(x_i)| \le \sigma(y_i) \quad \text{for all } i,$

where $\sigma(y_i)$ is the standard deviation for $y_i$. If this inequality is satisfied, $f$ is then a good approximation. The least square approximation, if acceptable, helps predict the behavior of clustering methods for data points beyond the range considered in our experiments.
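A minimal sketch of this fit-and-test step, assuming NumPy and hypothetical averaged values, is the following:

```python
import numpy as np

def lsa_acceptable(sizes, corr_means, corr_stds):
    # fit f(x) = a*x + b by least squares to the averaged coefficients
    # of correlation, then test |y_i - f(x_i)| <= sigma(y_i) for all i
    sizes = np.asarray(sizes, dtype=float)
    corr_means = np.asarray(corr_means, dtype=float)
    corr_stds = np.asarray(corr_stds, dtype=float)
    a, b = np.polyfit(sizes, corr_means, 1)  # degree-1 least squares fit
    residuals = np.abs(corr_means - (a * sizes + b))
    return (a, b), bool(np.all(residuals <= corr_stds))

# Hypothetical averages over 100 runs at list sizes 100..500:
(a, b), ok = lsa_acceptable([100, 200, 300, 400, 500],
                            [0.52, 0.53, 0.52, 0.51, 0.52],
                            [0.05, 0.05, 0.04, 0.05, 0.05])
```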

4. Results and their interpretations

The goal of the experiments is to determine the stability of the clustering methods and how they compare to each other. For the sake of readability, a shorthand notation is used to indicate all possible cases (see Table 2). A similar notation has been used in our previous findings [5]. For instance, to represent the input with the parameters Slink, uniform distribution, and linear distance, the abbreviation SUL is used.

Results are presented in figures and tables. The figures describe the different types of coefficients of correlation. The tables (see Table 3) describe the least square approximations of the coefficients of correlation.

We first look at the different clustering methods and analyze how stable and sensitive they are to the various parameters. In essence, we are interested in knowing how each clustering method reacts to changes in parameter values. We look at the behavior of the three blocks of coefficient of correlation values as defined in Section 3. We then provide an interpretation of the corresponding results.

First block of coefficients of correlation: As previously mentioned, the first four coefficients of correlation are meant to test the influence of the context.

Table 2
List of abbreviations

Term              Shorthand
Slink             S
Clink             C
Centroid          O
Uniform distr.    U
Gaussian distr.   G
Piecewise distr.  P
Linear distance   L
Edge distance     E

A. Bouguettaya, Q. Le Viet I Information Sciences 112 (1998) 267-295 281

Table 3
Graphical representation of all types of correlation coefficients and standard deviations

Coefficients         Line style
Second and Sixth     Dashed
Third and Seventh    Dotted
Fourth and Eighth    Dash-dotted
Ninth                Solid
Tenth                Dashed
Eleventh             Dotted

Fig. 4 presents the first four coefficients of correlation. The (small) difference between L and E values is consistently the same across all experiments, with the exception of those experiments comparing different distributions (Figs. 6, 9 and 12). We note that the values using E are consistently smaller than the values using L. The reason lies in the difference in how distances are computed. When L is used, the distance between the members of two clusters is the same for all members. However, when E is used, this may not be true (e.g., for a tree that is not height balanced), since the distance is equal to the number of edges connecting two members belonging to different clusters. In the case of Figs. 6, 9 and 12, the difference is attenuated due to the use of different distributions.

When the values in L and E are compared against each other, the trend among the four coefficients of correlation is almost the same. This points to the fact that the distance type does not play a major role in the final clustering.

The values of the first and second types of correlation are larger than those of the third and fourth types of correlation because of the corresponding intrinsic semantics. The first and second types of correlation compare data objects drawn from the same initial set. The third and fourth types of correlation compare data objects drawn from different sets. One should expect the former data objects to be more related than the latter. The standard deviation values exhibit roughly the same kind of behavior as their corresponding coefficient of correlation values. This points to the fact that the different types of correlation behave in a uniform and predictable fashion.

Since the coefficient of correlation values are greater than 0.5 in all the above cases, this points to the important observation that the data context does not seem to play a significant role in the final data clustering. Likewise, the data set size does not seem to have any substantial influence on the final clustering. Note that the slope value is almost equal to zero. This is also confirmed by the uniform behavior of the standard deviation values as described above.

Second block of coefficients of correlation: The next four coefficients of correlation check the influence of the data size on clustering. Fig. 5 depicts this block of coefficients of correlation.

[Fig. 4. First block of coefficients of correlation for Slink, for the uniform, piecewise, and Gaussian distributions with linear and edge distances (correlation vs. list size).]

No substantial difference is observed when the L and E values are compared, which is indicative of the independence of the clustering from the type of distance used. As in the previous case, the standard deviation values exhibit the same behavior as the corresponding coefficient of correlation values. This is reminiscent of a uniform and predictable behavior of the different types of correlations.

[Fig. 5. Second block of coefficients of correlation for Slink (correlation vs. list size).]

The high values indicate that the context sizes have little effect on how data is clustered. As in the previous case, the data size does not seem to influence the final clustering outcome, as the slope is nearly equal to zero. Likewise, the data set size does not seem to have any substantial influence on the final clustering. This is also confirmed by the uniform behavior of the standard deviation values as described above.

Third block of coefficients of correlation: The next three coefficients of correlation check the influence of the distribution for L and E. All other parameters are fixed and the same for the pairs of sets of objects to be compared. Fig. 6 depicts the last three types of coefficients of correlation.

The curve representing the case UP (uniform and piecewise distributions), in either the L or the E case, shows values that are a bit lower than the values in the curves representing UG (uniform and Gaussian distributions) and GP (Gaussian and piecewise distributions). This can be explained by the problem of bootstrapping the random number generator. This is constant across most of the experiments conducted in this study. The first experiments (Slink using L and E) exhibit a behavior that is a little different from the other experiments.

When the values in the case of L and E are compared, no substantial difference is observed. This is indicative of the independence of the clustering from the types of distances used. Like the previous case, the standard deviation values exhibit the same behavior as the corresponding coefficient of correlation values. This is reminiscent of a uniform and predictable behavior of the different types of correlations.

Since the values converge to 0.5, this indicates that the distributions do not influence the clustering one way or the other. As in the previous cases, the data size does not seem to influence the final clustering outcome, as the slope is nearly equal to zero. This is also confirmed by the uniform behavior of the standard deviation values as described above.

[Fig. 6. Third block of coefficients of correlation for Slink: comparisons across distributions (UP, UG, GP) with linear and edge distances.]

[Fig. 7. First block of coefficients of correlation for Clink.]

In essence, the experiments for the Clink clustering method follow the same type of pattern and behavior as the Slink method. Figs. 7-9 depict the first, second, and third blocks of coefficients of correlation.

[Fig. 8. Second block of coefficients of correlation for Clink.]

The interpretations that apply to the Slink method also apply to the Clink clustering method, as both follow a very similar pattern of behavior and the values of both the coefficients of correlation and the standard deviations are quite similar.

[Fig. 9. Third block of coefficients of correlation for Clink.]

As was the case for Slink and Clink, the experiments for the Centroid clustering method follow the same type of pattern and behavior. The only difference lies in the values of the different correlations obtained using different parameters. Figs. 10-12 depict the first, second, and third blocks of coefficients of correlation.

The interpretations that apply to the previous clustering methods also apply to the Centroid clustering method, as it follows a similar pattern of behavior. Similarly, the values of both the coefficients of correlation and the standard deviations are also similar.

Table 4 shows a summary of the results, averaging the different computed coefficients of correlation. Blocks 1-3 correspond respectively to the first block of four coefficients of correlation, the second block of four coefficients of correlation, and the third block of three coefficients of correlation.

Tables 5-7 present the least square approximations for all the curves shown in this study. The acceptability of an approximation depends on whether all the coefficient of correlation values fall within the interval delimited by the approximating function and the standard deviation. If this is the case, then we say that the approximation is good. Otherwise, we look at how many points do not fall within the boundaries and determine the goodness of the function. Using these functions enables us to predict the behavior of the clustering methods for larger data set sizes.

[Fig. 10. First block of coefficients of correlation for Centroid.]

[Fig. 11. Second block of coefficients of correlation for Centroid.]

As Tables 5-7 (see Appendix) show, the values of the slopes are all very small. This points to the stability of all results. All approximations yield lines that are almost parallel to the x-axis.

[Fig. 12. Third block of coefficients of correlation for Centroid.]

The acceptability test was run and all points passed the test satisfactorily. Therefore, all the approximations listed in the tables mentioned above are good approximations.


We now compare the clustering methods against each other using the different parameters of this study. We rely on the results obtained and on the general observations that can be drawn from the experiments shown in the different figures and tables.

1. The results show that across space dimensions, the context does not completely hide the sets. For instance, the first and second types of coefficients of correlation (as shown in all figures) are a little different from the third and fourth types of coefficients of correlation (as shown in all figures). The values clearly show what kinds of coefficients are computed.

Table 4
Summary of results (average coefficients of correlation for Blocks 1-3)

        Block 1   Block 2   Block 3
  P     0.7       0.75      0.7
  G     0.7       0.7       0.65
E U     0.5       0.55      0.55
  P     0.7       0.7       0.65
  G     0.6       0.6       0.6
  P     0.9       0.85      0.85
  G     0.85      0.8       0.75
E U     0.65      0.7       0.65
  P     0.8       0.75      0.7
  G     0.7       0.7       0.7
E       0.5       0.5       0.45

Table 5
Function approximation of the first block of coefficients of correlation

Table 6
Function approximation of the second block of coefficients of correlation

Table 7
Function approximation of the third block of coefficients of correlation

SE:  0.00113X + 0.45    -0.000119X + 0.51    -0.000124X + 0.51
CL: -0.00089X + 0.53    -0.00095X + 0.52      0.0000088X + 0.53
CE: -0.00085X + 0.52     0.00089X + 0.50      0.00088X + 0.51
OL:  0.00079X + 0.50     0.00079X + 0.51     -0.0000088X + 0.50
OE:  0.0000101X + 0.46   0.00111X + 0.48      0.0000101X + 0.49


2. The results show that, given the same distribution and type of distance, all clustering methods exhibit the same behavior and yield approximately the same values.

3. Slink, Clink, and Centroid seem to have very similar behavior. The coefficient of correlation values are also very close. An explanation for the similarity in behavior between Slink, Clink, and Centroid is that these methods rely on a single object per cluster to determine similarity distances.

4. The second block of coefficients of correlation, for all clustering methods, demonstrates that the context does not influence the data clustering, because all coefficients of correlation are close to the value 1.

5. The results also show that all clustering methods are equally stable. This finding comes as a surprise, as intuitively one would expect some clustering method to be more stable than the others.

6. The results show that the data distribution does not significantly affect the clustering techniques, because the values obtained are very similar to each other. That is a relatively major finding, as the results strongly point to the independence of the data clustering from the distribution.

7. The third block of coefficients of correlation, across all clustering methods, shows that the three methods are little or not at all perturbed, even in a noisy environment, since there are no significant differences in the results from the uniform, piecewise, and Gaussian distributions.

8. The type of distance (linear or edge) does not influence the clustering process, as there are no significant differences between the coefficients of correlation obtained using either linear or edge distances.

The results obtained in this study confirm those from [5], which used a one-dimensional (1-D) data sample and fewer parameters. The results point very strongly to the conclusion that, in general, no clustering technique is better than another. What this essentially means is that there is an inherent way for data objects to cluster, independently of the techniques used.

The other important result is that the only discriminator for selecting a clustering method is its computational attractiveness and nothing else. This is a very important result, as in the past there was never evidence that clustering methods had very similar behavior.

The results presented here are compelling evidence that clustering methods do not seem to influence the outcome of the clustering process. Indeed, all clustering methods considered here exhibit a behavior that is almost constant regardless of the parameters being used to compare them.

5. Conclusion

Our experiments involved a wide range of parameters to test the stability and sensitivity of the clustering methods. These experiments were conducted for objects in the 2-D space. The results obtained overwhelmingly point to the stability of each clustering method and to their little sensitivity to noise. The most startling finding of this study, however, is that all clustering methods exhibit an almost identical behavior, regardless of the parameters used. The fact that data objects are drawn from different data spaces does not change the above findings [5].

The above findings have paramount implications. In that regard, one of the most important results is that objects have a natural tendency to cluster themselves. Clustering methods do not seem to play a major role in the final shape of the clustering. This also means that the only criterion that should be used to select a clustering method is the attractiveness of the computational complexity of the clustering algorithm.

Acknowledgements

We would like to thank Mostefa Golea and Alex Delis for their insightful

and detailed comments.

References

[1] M.S. Alderfer, R.K. Blashfield, Cluster Analysis, Sage, Beverly Hills, CA, 1984.
[2] J. Banerjee, W. Kim, S.-J. Kim, J. Garza, Clustering a DAG for CAD databases, IEEE Transactions on Software Engineering 14 (11) (1988) 1684-1699.
[3] V. Benzaken, An evaluation model for clustering strategies in the O2 object-oriented database system, in: ICDT, 1990.
[4] V. Benzaken, C. Delobel, Enhancing performance in a persistent object store: Clustering strategies in O2, in: PODS, 1990.
[5] A. Bouguettaya, On-line clustering, IEEE Transactions on Knowledge and Data Engineering 8 (2) (1996).
[6] A. Delis, V.R. Basili, Data binding tool: A tool for measurement based Ada source reusability and design assessment, International Journal of Software Engineering and Knowledge Engineering 3 (3) (1993) 287-318.
[7] R. Dubes, A.K. Jain, Clustering methodologies in exploratory data analysis, Advances in Computers 19 (1980).
[8] B. Everitt, Cluster Analysis, Heinemann, Yorkshire, England, 1977.
[9] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Menlo Park, CA, 1996.
[10] J.A. Hartigan, Clustering Algorithms, Wiley, London, 1975.
[11] A.K. Jain, J. Mao, K.M. Mohiuddin, Artificial neural networks, Computer 29 (3) (1996) 31-44.
[12] N. Jardine, R. Sibson, Mathematical Taxonomy, Wiley, London, 1971.
[13] J.R. Cheng, A.R. Hurson, Effective clustering of complex objects in object-oriented databases, in: SIGMOD, 1991.
[14] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, London, 1990.
[15] D.E. Knuth, The Art of Computer Programming, Addison-Wesley, Reading, MA, 1971.
[16] G.N. Lance, W.T. Williams, A general theory for classification sorting strategy, The Computer Journal 9 (1967) 373-386.
[17] W.J. McIver, R. King, Self-adaptive, on-line reclustering of complex object data, in: SIGMOD, 1994.
[18] F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, The Computer Journal 26 (4) (1983) 354-358.
[19] G. Piatetsky-Shapiro, W.J. Frawley (Eds.), Knowledge Discovery in Databases, AAAI Press, Menlo Park, CA, 1991.
[20] W.H. Press, Numerical Recipes in C: The Art of Scientific Programming, 2nd ed., Cambridge University Press, 1992.
[21] E. Rasmussen, Clustering algorithms in information retrieval, in: W.B. Frakes, R. Baeza-Yates (Eds.), Information Retrieval: Data Structures and Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1990.
[22] H.C. Romesburg, Cluster Analysis for Researchers, Krieger, Malabar, FL, 1990.
[23] R.C. Tryon, D.E. Bailey, Cluster Analysis, McGraw-Hill, New York, 1970.
[24] M.M. Tsangaris, J.F. Naughton, A stochastic approach for clustering in object bases, in: SIGMOD, 1991.
[25] P. Willett, Recent trends in hierarchic document clustering: a critical review, Information Processing and Management 24 (5) (1988) 577-597.
[26] C.T. Yu, C. Suen, K. Lam, M.K. Siu, Adaptive record clustering, ACM Transactions on Database Systems 10 (2) (1985) 180-204.
[27] J. Zupan, Clustering of Large Data Sets, Research Studies Press, Letchworth, England, 1982.
