A Clustering Model for Memory Resource Sharing in Large Scale Distributed System¹

Rui Chu, Nong Xiao, and Xicheng Lu
National Laboratory for Parallel and Distributed Processing, ChangSha, HuNan, China
{rchu, nongxiao, xclu}@nudt.edu.cn

¹ The work was partially supported by the National Basic Research Program of China (973) under Grant No. 2005CB321801 and the National Natural Science Foundation of China under Grant No. 90412011 and No. 60573135.

Abstract
As an application of large scale distributed network computing systems, RAM Grid tries to solve the problem of memory resource sharing and utilization. Due to the special properties of memory, traditional resource information management approaches cannot be adapted easily. This paper proposes a clustering-based resource aggregation scheme against the background of RAM Grid, which can efficiently reduce the scale of resource information management. By analogy with force field and potential energy theory in physics, the basic model, the force field-potential energy model, and the corresponding distributed algorithms are proposed. The model and algorithms are also evaluated by simulation based on real network topologies.
1. Introduction
With the rapid development of network computing technologies, many schemes and applications in cluster computing have been extended to the wide-area network environment. RAM Grid is such a scheme, proposed in our previous work [1]. The basic idea of RAM Grid is to organize wide-area distributed memory resources together and provide remote memory, which may be much faster than local or remote disk. Just like the traditional network memory schemes in the cluster computing environment, such as GMS [2], Anemone [3, 4] and PNR [5], the primary application of RAM Grid is to provide cooperative caching for large numbers of non-contiguous disk I/O operations; furthermore, it can also benefit memory-intensive applications. The essence of network memory is to improve the performance of the disk, which suffers from low-speed mechanical operations. The improvement mainly depends on the network performance.

Unlike the network memory schemes, however, RAM Grid is a large scale network computing solution like the P2P or grid computing approaches, and it needs to consider the common problems of the loosely coupled network computing environment. These problems mainly come from autonomous, heterogeneous and dynamic resources, whereas in the cluster computing environment they are not emphasized. As a result, although the basic idea and applications are similar, the solutions of RAM Grid and traditional network memory are quite different.

Resource information management is a key part of RAM Grid. It mainly focuses on organizing and discovering the memory resources efficiently with little extra cost. Most traditional network memory schemes simply use centralized management, which works well in a cluster computing system but does not fit the larger environment, for both reliability and scalability reasons. Many works on P2P systems have made progress on resource publishing and discovery, such as the numerous works on DHT networks. However, a DHT solution often needs a resource identity (such as a file name or IP address) as the hash key, whereas in RAM Grid the key factor for resource discovery is the network condition between two peers, which is not appropriate as the hash key in most DHT schemes. In our previous work, as a basic solution, memory resource management is based on simple P2P schemes such as flooding with TTL, and the time cost of resource discovery is mostly hidden through time overlapping [1]. Obviously, such a simple approach consumes much network bandwidth, with poor performance and scalability.
978-1-4244-1890-9/07/$25.00 ©2007 IEEE
 
Indeed, although the available memory resources in the entire RAM Grid may be abundant, due to the restriction of network performance a user can only select a limited subset of the remote memory to serve it. Moreover, memory that is too far from the user in network distance is meaningless, since it cannot perform better than the local disk. This motivates us to partition the resources into groups (we use the term ‘group’ instead of ‘cluster’ in order to avoid confusion with the concept of ‘cluster computing’). A user should only search for resources in its own group, and the probability that resources in another group fit the user is relatively low.

In this paper, we analyze the detailed requirements for memory resource information management. We then propose a management scheme using the clustering technique, whose key factor is network RTT. In other words, each group produced by the clustering scheme covers memory resources with bounded network RTT among each other. After clustering, resource information management and discovery can take place within a single group, and simple centralized management can also work well in the group. Furthermore, our scheme is based on the instance of RAM Grid, but can also be used for other similar distributed applications.

In order to cluster the resources efficiently, we first put all of the nodes into an Internet coordinate system for network RTT estimation, using the previous work on network embedding. Since the nodes in RAM Grid and their relationships are similar to a physical system composed of several particles with spring forces and universal gravitation among them, we can model the clustering process by analogy with the force field and potential energy system in physics. The jitter of network RTT is also a key factor for RAM Grid; we consider the influence of the jitter and reflect it in our model using a probability analysis approach. Our clustering algorithm is proposed based on this model; the algorithm is also evaluated and compared through simulation on several real network topologies.

The rest of this paper is organized as follows. Section 2 presents the related work that leads us to the topic of this paper. Section 3 discusses several system models. Section 4 provides the clustering mechanism and algorithms based on the model. Section 5 presents the simulation setup and analyzes the results. Section 6 concludes this paper.
2. Related work 
Research on network memory emerged with the development of high performance communication networks. Related statistics [6] show that disk latency improves by 10% every year and disk bandwidth by 20% annually, while network latency and bandwidth improve by 20% and 45% every year, respectively. Network memory schemes have been classified into three types according to their applications: using remote memory as a remote paging device, as a cache for disk-based file systems or databases, or as a new high speed temporary file system.

Feeley et al. proposed a typical network memory system used for remote paging named GMS [2], which mainly focuses on remote paging for memory-intensive applications. GMS tries to manage all of the physical memory pages in a single cluster. In order to replace obsolete pages in the global view, each memory page in GMS has a unique ID and age information. The information is maintained by a master node, which is the center of the entire system. The Remote Memory Pager [7] by Markatos et al. and Anemone [3] by Hines et al. are both similar to GMS in resource information management, although they address different research issues with different techniques: there are always one or two central managers that collect and allocate available memory resources. The Parallel Network RAM [5] raised four different memory resource information management strategies: centralized, client only, local manager, and backbone management. The authors conclude that the backbone or client only strategy performs better in different cases, while in most cases the centralized strategy is not the best. All of the above network memory schemes focus on memory management in a single cluster; the network communication cost can be taken as a constant in such an ideal local environment. Our work targets the wider area, where the network cost and its jitter dominate the system performance. Therefore, we should manage the resources based on network distance measurement and estimation.

Network embedding is the topic of transforming the distance graph among Internet nodes into a coordinate system. The work that has received the most attention is GNP [8]. Many subsequent works such as PIC [9], ICS [10], Vivaldi [11], and BBS [12] improve the original in different ways. All of them try to place the nodes into a synthetic Internet coordinate system and estimate the RTT between two arbitrary nodes with minimal error, instead of measuring it. All of the above works are important enabling techniques for our scheme in this paper.
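The idea of estimating RTT from coordinates rather than measuring it can be illustrated with a minimal sketch. It assumes a plain Euclidean embedding, in which the estimated RTT is the distance between two coordinate vectors; the node names, the two-dimensional coordinates and the function `estimated_rtt` are hypothetical and are not taken from any of the cited systems.

```python
import math

# Hypothetical 2-D coordinates in milliseconds. A real embedding system would
# derive them from RTT measurements; here they are fixed by hand.
coords = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (6.0, 8.0),
}


def estimated_rtt(coords, u, v):
    """Estimate RTT(u, v) as the Euclidean distance between the coordinates."""
    return math.dist(coords[u], coords[v])


print(estimated_rtt(coords, "A", "B"))  # 5.0
print(estimated_rtt(coords, "A", "C"))  # 10.0
```

An estimate of this kind is symmetric by construction, so a node only needs the coordinates of a peer, not a direct probe, to judge how far away it is.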
 
3. System Model
3.1. Overview
The essence of RAM Grid is to share remote memory connected by a high speed network, where the main overhead comes from the latency of the disk or network rather than from bandwidth. Therefore, in order to simplify the system model, we consider only a single network parameter: latency, or equivalently the network round-trip time (RTT), which is twice the latency and easier to measure.

For the RTT of the network, we make the following assumptions.

• Any two nodes can connect and transmit to each other directly through the physical network infrastructure. The influence of NAT and firewalls can be ignored.
• The RTT of the network is symmetric, meaning that $RTT(A, B) = RTT(B, A)$.
• The RTT of the network obeys the triangle rule, indicating that $RTT(A, B) < RTT(A, C) + RTT(C, B)$ for any third node $C$.

We also make the following assumptions for the local disk.
 
• The disk latency of node A, denoted as $DL(A)$, can be taken as a constant that can be easily measured.
• The capacity of the local disk is unlimited.
• The built-in cache of the disk can be ignored. The rationale for this assumption has been described in detail in our previous work [1].

In the RAM Grid system, for any nodes A and B satisfying $RTT(A, B)/2 \ge DL(A)$, if B is providing remote memory, A should not use the memory of B. This is because A can get a data block from its local disk faster than from the remote memory of B. In other words, in spite of the abundant idle memory in the RAM Grid system, for any node A that needs remote memory, only a small subset of the available memory resources fits A. Note that we define the concept "B fit for A" iff $RTT(A, B)/2 < DL(A)$, which means A can use the remote memory of B for caching.

The organization and management of memory resources is a problem because both the centralized approach and the P2P approaches usually cannot work well. In fact, we noticed that the remote memory for a user node is localized within a small subset of all resources, as described above, so the resource information management of RAM Grid can be designed in a hybrid way that partitions all the nodes into several groups. For a single group, the number of nodes in it is limited, and the diameter of each group, defined as the largest network latency between two nodes in the group, is also limited. Thus, we can perform centralized management in each group. The manager of the group can be elected by some mechanism, which is important but is not presented in this paper due to the page limit.

Suppose a user node A is in group G. If most nodes that fit for A are also in group G, then the grouped hybrid management is appropriate for RAM Grid. Next, we model this problem.
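The "fit" condition can be made concrete with a small sketch. It assumes RTT and disk latency are given in the same unit (milliseconds here); the node names, the numbers and the helpers `fits` and `candidates` are hypothetical illustrations rather than part of the RAM Grid implementation.

```python
# Hypothetical symmetric RTTs in milliseconds, keyed by unordered node pair.
# The values are chosen to respect the triangle rule.
rtt = {
    frozenset(("A", "B")): 6.0,
    frozenset(("A", "C")): 12.0,
    frozenset(("B", "C")): 16.0,
}

# Hypothetical constant disk latency DL of each node, in milliseconds.
dl = {"A": 5.0, "B": 5.0, "C": 8.0}

nodes = ["A", "B", "C"]


def fits(rtt, dl, a, b):
    """Return True if b fits for a, that is, RTT(a, b) / 2 < DL(a)."""
    return rtt[frozenset((a, b))] / 2 < dl[a]


def candidates(rtt, dl, a, nodes):
    """List the nodes whose remote memory is worth using from node a."""
    return [b for b in nodes if b != a and fits(rtt, dl, a, b)]


print(candidates(rtt, dl, "A", nodes))  # ['B']
print(candidates(rtt, dl, "C", nodes))  # ['A']
```

With these numbers A fits for C but C does not fit for A, because the condition compares the shared RTT against the disk latency of the requesting node only.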
3.2. Basic Model
In order to partition all the nodes in RAM Grid into groups, we need some approach for clustering. We expect to first fix the nodes in a certain space; the clustering algorithm then constructs an overlay among these nodes in the space. The connected nodes of the overlay are in the same group. The basic clustering problem is defined below.

For the set of nodes $\{N_i\}, i \in [1, n]$, find the subsets $G_j, j \in [1, m]$, where $\bigcup_{j=1}^{m} G_j = \{N_i\}$ and $G_j \cap G_k = \Phi$ for all $j, k \in [1, m]$ with $j \ne k$.

Restriction: for all $j \in [1, m]$, the size of group $G_j$, that is, the number of nodes in the group, is bounded by a threshold on the group size.
Object 1: minimizing the occurrence of the case $(A \in G_i) \wedge (B \in G_j) \wedge (i \ne j) \wedge (RTT(A, B)/2 \le D)$.

Object 2: minimizing the occurrence of the case $(A \in G_j) \wedge (B \in G_j) \wedge (RTT(A, B)/2 > D)$.

Parameter $D$ can be selected from the typical disk access latency. In other words, in the basic model, we can simply assume that $DL(N_i) = D$ for all $i \in [1, n]$.

The case $(A \in G_i) \wedge (B \in G_j) \wedge (i \ne j) \wedge (RTT(A, B)/2 \le D)$ means that nodes A and B fit for each other but are not clustered into one group. This is called the False Negative of A and B, also denoted as the predicate $FN(A, B)$. Similarly, the case $(A \in G_j) \wedge (B \in G_j) \wedge (RTT(A, B)/2 > D)$ is called the False Positive, denoted as $FP(A, B)$.

The result of the problem is not unique, since Object 1 and Object 2 usually cannot be satisfied together. For example, in Figure 1, supposing $D = 4.5$, we cannot satisfy Object 1 when Object 2 is satisfied. Therefore, the result can be $G_1 = \{A, B\}, G_2 = \{C\}$ or $G_1 = \{A\}, G_2 = \{B, C\}$. Intuitively, the result $G_1 = \{A, B\}, G_2 = \{C\}$ is better than $G_1 = \{A\}, G_2 = \{B, C\}$, because the sum of the groups' diameters is lower in the former case.
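The trade-off between Object 1 and Object 2 can be checked on a toy instance. The sketch below is a minimal illustration under stated assumptions: the three RTT values are hypothetical (they are not the values of Figure 1), the size threshold is treated as inclusive, the boundary case RTT/2 = D is counted as a false negative, and the helpers `is_valid_partition`, `count_errors` and `diameter_sum` are names introduced here only for illustration.

```python
from itertools import combinations

# Hypothetical symmetric RTTs in milliseconds (not the values of Figure 1).
rtt = {
    frozenset(("A", "B")): 6.0,
    frozenset(("B", "C")): 8.0,
    frozenset(("A", "C")): 12.0,
}
nodes = ["A", "B", "C"]


def is_valid_partition(nodes, groups, max_size):
    """Check that the groups are disjoint, cover all nodes, and respect the
    size threshold (assumed inclusive)."""
    seen = set()
    for group in groups:
        if len(group) > max_size or seen & group:
            return False
        seen |= group
    return seen == set(nodes)


def count_errors(rtt, groups, d):
    """Return (false negatives, false positives) over unordered node pairs."""
    group_of = {node: j for j, group in enumerate(groups) for node in group}
    fn = fp = 0
    for a, b in combinations(sorted(group_of), 2):
        half_rtt = rtt[frozenset((a, b))] / 2
        same_group = group_of[a] == group_of[b]
        if not same_group and half_rtt <= d:
            fn += 1  # the pair fits but is split across groups
        elif same_group and half_rtt > d:
            fp += 1  # the pair is grouped but does not fit
    return fn, fp


def diameter_sum(rtt, groups):
    """Sum of group diameters, taking latency as RTT / 2."""
    total = 0.0
    for group in groups:
        pairs = combinations(sorted(group), 2)
        total += max((rtt[frozenset(p)] / 2 for p in pairs), default=0.0)
    return total


p1 = [{"A", "B"}, {"C"}]
p2 = [{"A"}, {"B", "C"}]
for p in (p1, p2, [{"A", "B", "C"}]):
    print(count_errors(rtt, p, 4.5), diameter_sum(rtt, p))
```

With D = 4.5, both two-group partitions leave one false negative and no false positive, while the single group removes the false negative at the cost of one false positive. The diameter sum then separates the two-group candidates, 3.0 against 4.0, in favour of grouping A with B.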
