arising from the data. Hence, splitting up a homogeneous data set in a “fair” way is a much more straightforward problem than analyzing the hidden structures of a heterogeneous data set. The clustering algorithms in [15, 21] partition the data set into $k$ clusters without knowing the homogeneity of the groups. Hence, the principal goal of these clustering problems is not to uncover novel or interesting facts about the data. Numerical methods can usually provide only guidance about the true number of clusters, and the final decision is often an ad hoc one based on prior assumptions and domain knowledge. Therefore, the choice between different numbers of clusters is often made by comparing several alternatives, and the final decision is a subjective problem that, in practice, can be solved only by humans. Nevertheless, a number of methods for the objective assessment of cluster validity have been developed and proposed. Because recognizing cluster structures is difficult, especially in high-dimensional spaces, various visualization techniques can also be of valuable help to the cluster analyst.

Given a connected, undirected graph
$G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges between pairs of nodes, let $w(u, v)$ denote the weight of each edge $(u, v) \in E$. A spanning tree is an acyclic subgraph of $G$ that contains all vertices of $G$. The minimum spanning tree (MST) of a weighted graph is a spanning tree of minimum total weight. Several well-established MST algorithms exist [24, 19, 20]. The cost of constructing a minimum spanning tree is $O(m \log n)$, where $m$ is the number of edges in the graph and $n$ is the number of vertices. More efficient algorithms for constructing MSTs have also been extensively researched [18, 5, 13]; these algorithms promise close to linear time complexity under different assumptions. A Euclidean minimum spanning tree (EMST) is a spanning tree of a set of $n$ points in a metric space ($E^n$), where the length of an edge is the Euclidean distance between a pair of points in the point set.

Hierarchical clustering approaches are related to graph-theoretic clustering. Clustering algorithms based on the minimum spanning tree take advantage of the
MST. The MST ignores many possible connections between the data patterns, so the cost of clustering can be decreased. MST-based clustering algorithms are known to be capable of detecting clusters of various shapes and sizes [34]. Unlike traditional clustering algorithms, MST clustering does not assume a spherical cluster structure for the underlying data. The EMST clustering algorithm [23, 34] uses the Euclidean minimum spanning tree of a graph to produce the structure of point clusters in $n$-dimensional Euclidean space. Clusters are detected so as to optimize some measure, such as minimum intra-cluster distance or maximum inter-cluster distance [2]. The EMST algorithm has been widely used in practice. Clustering by minimum spanning tree can be viewed as a hierarchical clustering algorithm that follows a divisive approach. In this method, an MST is first constructed for the given input; there are then different ways to produce a group of clusters. If the number of clusters $k$ is given in advance, the simplest way to obtain $k$ clusters is to sort the edges of the minimum spanning tree in descending order of weight and remove the $k-1$ heaviest edges [2, 33]; since deleting $k-1$ edges from a tree leaves exactly $k$ connected components, this yields $k$ clusters.

Existing clustering algorithms require a number of input parameters, and these parameters can significantly affect cluster quality. Our algorithm does not require a predefined number of clusters. In this paper, we avoid experimental methods and advocate the idea of need-specific as opposed to care-specific, because users always know the needs of their applications. We believe it is a good idea to let users define their desired similarity within a cluster and to give them some flexibility to adjust that similarity if needed. Our algorithm produces clusters of $n$-dimensional points with a naturally approximate intra-cluster distance.

Geometric notions of centrality are closely linked to the facility location problem. The distance matrix
$D$ can be computed rather efficiently using Dijkstra's algorithm, with time complexity $O(|V|^2 \ln |V|)$ [29]. The eccentricity of a vertex $x$ in $G$ and the radius $\rho(G)$ are, respectively, defined as

$$e(x) = \max_{y \in V} d(x, y) \quad \text{and} \quad \rho(G) = \min_{x \in V} e(x).$$
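To make these definitions concrete, the following Python sketch builds the distance matrix $D$ by running Dijkstra's algorithm once from each vertex, then evaluates $e(x)$ and $\rho(G)$ directly from the formulas. It is a minimal illustration, not the implementation of [29]: the function names (`undirected`, `dijkstra`, `distance_matrix`, `eccentricity`, `radius`) are hypothetical, and the sketch assumes a connected graph with non-negative edge weights stored as an adjacency dictionary.

```python
import heapq
import math

# Hypothetical helpers; assumes a connected, undirected graph with
# non-negative edge weights, as Dijkstra's algorithm requires.

def undirected(edges):
    """Build an adjacency dict {u: [(v, w), ...]} from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Shortest-path distances d(source, y) for every vertex y."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path to u was already settled
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def distance_matrix(adj):
    """D[x][y] = d(x, y), obtained by running Dijkstra from every vertex."""
    return {x: dijkstra(adj, x) for x in adj}

def eccentricity(D, x):
    """e(x) = max over y in V of d(x, y)."""
    return max(D[x].values())

def radius(D):
    """rho(G) = min over x in V of e(x)."""
    return min(eccentricity(D, x) for x in D)
```

For the path graph with edges a-b (weight 1), b-c (weight 2), and c-d (weight 1), the sketch gives $e(a) = 4$, $e(b) = e(c) = 3$, and therefore $\rho(G) = 3$. On a disconnected graph some distances stay infinite, so every eccentricity would be `inf`; this is why the connectivity assumption matters.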
(IJCSIS) International Journal of Computer Science and Information Security,Vol. 8, No. 4, July 2010127http://sites.google.com/site/ijcsis/ISSN 1947-5500
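The simplest MST-based procedure described earlier for a known $k$ (construct the MST, then remove the $k-1$ heaviest edges [2, 33]) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm proposed in this paper: it builds the EMST with Kruskal's algorithm over all pairwise Euclidean distances, which costs $O(n^2 \log n)$ and suits only small inputs, and the names `DisjointSet`, `euclidean_mst`, and `mst_k_clusters` are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical names; a sketch of "remove the k-1 heaviest MST edges",
# not the clustering algorithm proposed in this paper.

class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # a and b already connected: edge would form a cycle
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def euclidean_mst(points):
    """Kruskal's algorithm on the complete Euclidean graph; returns (w, i, j) edges."""
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    ds = DisjointSet(len(points))
    return [(w, i, j) for w, i, j in edges if ds.union(i, j)]

def mst_k_clusters(points, k):
    """Cut the k-1 heaviest MST edges and return one cluster label per point."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    mst = sorted(euclidean_mst(points), reverse=True)  # heaviest edges first
    ds = DisjointSet(n)
    for _, i, j in mst[k - 1:]:  # keep every edge except the k-1 heaviest
        ds.union(i, j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels, roots = [], {}
    for p in range(n):
        labels.append(roots.setdefault(ds.find(p), len(roots)))
    return labels
```

For two well-separated groups of three points, such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, calling `mst_k_clusters(points, 2)` cuts the single long bridging edge and returns `[0, 0, 0, 1, 1, 1]`. Because removing $k-1$ edges from a spanning tree leaves exactly $k$ components, the union-find pass over the remaining edges always produces exactly $k$ distinct labels.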