
A Study on Dynamic

Load Balancing Algorithms


R. Lüling, B. Monien, F. Ramme


Technical Report PC2/TR-001-92


June 1992

PC2 - Paderborn Center for Parallel Computing, Universität-GH Paderborn, D-33095 Paderborn, Germany
Phone: +49 5251 603342  Fax: +49 5251 603436  email: pc2-team@uni-paderborn.de
A Study on Dynamic Load Balancing Algorithms

R. Lüling, B. Monien, F. Ramme

Paderborn Center for Parallel Computing


University of Paderborn
Warburger Str. 100
33095 Paderborn, Germany
e-mail: [rl|bm|ram]@uni-paderborn.de

Abstract
Dynamic load balancing techniques have proved to be the most critical part of an efficient
implementation of various algorithms on large distributed computing systems.
In this paper a classification of dynamic distributed load balancing algorithms for homogeneous
multiprocessor systems is introduced, and a general test bed for evaluating load balancing
strategies, using a random branch & bound load generator, is described. With its help
a number of well known load balancing strategies are compared with two new algorithms
based on the gradient model method. The behavior of all algorithms on various networks
when running different workload patterns is studied.
Our simulations on a reconfigurable transputer system show that all strategies
perform better on networks with small diameter. The measurements indicate that even on
large networks one of the randomized strategies and our extension of the gradient model
method behave very well when simulating data migration, while under process migration
another extension of the gradient model method is favored. These new algorithms seem to
be very robust with respect to the kind of workload and are therefore well suited for integration
into a distributed operating system running on large networks.
Keywords: Dynamic Distributed Load-Balancing, Gradient Model, Branch & Bound

1 Introduction
In this paper dynamic load balancing techniques (also referred to as resource sharing, resource scheduling,
job scheduling or task migration methods) in large MIMD multiprocessor systems are studied. (An
Extended Abstract of this report has been published in the proceedings of the 3rd IEEE SPDP, 1991,
pp. 686-689.) In our case a multiprocessor system consists of autonomous processing elements,
which are coupled by a point-to-point connection network. Processors communicate solely by message
passing.
Recently, distributed computing systems with several hundred powerful processors have been built. To
achieve a maximum efficiency of these large systems, the workload has to be distributed equally throughout
the network. In general we can distinguish static and dynamic load balancing algorithms. In the case
of a static load balancing policy, a fixed process graph which represents the distributed computation is
mapped onto the interconnection network. Here the aim is to minimize edge dilation, processor
load differences and edge congestion. For an overview of this work see [13].
If the load situation changes in an unpredictable way, as is the case for many applications, it is
necessary to use a dynamic load balancing strategy which adapts to the changing load situation. To
be efficient for large distributed systems, the load balancing algorithm itself should be distributed. In the
past the problem of designing such algorithms was studied by various groups who used two very different
approaches:
The analytical method: In [3, 4, 7, 15, 21] different models were used to analyze the behavior of dynamic
load balancing techniques. Most of these models were based on queuing networks. Because of the
complexity of these models, only simple strategies could be analyzed.
The simulation method: A huge amount of work has been done in this field over the past
decade (see the Proceedings of the Int. Conf. on Distributed Computing Systems for an overview).
The majority of authors connected fewer than 16 processors and used a clique network, realized by a
local area network (LAN), as interconnection structure. Most of the time, the migration of large
packets (process migration) was studied on these networks. Only a few publications consider larger
interconnection networks; see e.g. [2] and [8].
Since many of the published load balancing strategies are only variations of one basic principle, we have
reduced some of the known algorithms to their main features and compared them by simulation. The
network topology is assumed to be homogeneous and in our case consists of up to 324 transputers. Currently,
this is the largest dynamically reconfigurable transputer system in Europe. To study the behavior
in large networks, the transputers were connected to rings of up to 169 processors (large diameter) and
to 8x8, 13x13 and 18x18 processor torus topologies (relatively large diameter, maximum degree of
four).
First measurements made on our latest machine (a partitionable transputer system with 1024 T805
processors) confirmed our expectations.
To compare different applications for load balancing strategies, we have examined "process migration"
(relatively long packets with strongly varying properties) and "data migration" (relatively small and
homogeneous packets) separately. These load patterns are generated by a random branch & bound load
generator. In both cases, each load unit is able to generate new load units. The
generating procedure, the scheduling technique and the properties of each load unit can be controlled by
parameters of the generator. This flexibility enables us to use the same generator to model both workload
characteristics.
The report is structured as follows: In section 2 we present our classification of load balancing
strategies. Section 3 introduces our simulation environment and the different load classes used. Several
algorithms known from the literature and implemented in our simulation environment are briefly described
in section 4.1. In section 4.2 we present two new algorithms based on the gradient model method.
Our simulation results are discussed in section 5. Some conclusions, followed by more detailed simulation
results in the appendix, conclude this report.

2 Classification of Load Balancing Algorithms


Baumgartner and Wah [1] presented a systematic characterization of load balancing strategies. They
listed different points of view (like cooperation, location of control, initiation, ...) and properties (like
static, dynamic, adaptable, ...) in a tabular form. We consider this representation to be too complex for
a general discussion, however.
Any dynamic distributed load balancing strategy can be separated into a decision part and a migration
part. In the decision part of the algorithm a decision to migrate or to keep a load unit is made. This
decision can be based on the local load situation and that of the neighboring processors, or it can depend
on the load situation of an arbitrary subset of the whole network. In the first case we speak of a 'local
decision base', in the second case of a 'global decision base'.
In the migration part a load unit is sent to another processor to decrease the load imbalance of the
system. If load units are migrated to direct neighbors only, the strategy has a 'local migration space'.
Otherwise, it is called a strategy with 'global migration space'.
According to this distinction between global and local bases we introduce a new ordering of load
balancing strategies with respect to the 'decision base' and the 'migration space' of a processor.
Here we distinguish the Local and Global concept for decision and migration activities. A further
distinction is achieved by regarding the initiator of load balancing activities (sender, receiver or combined
(sr)).
The resulting classes are (i in {s, r, sr} denotes the initiator):

                        Decision base
                        L           G
  Migration space  G    LDGM_i      GDGM_i
                   L    LDLM_i      GDLM_i

However, the distinction between sender and receiver is less important, because most of the load
balancing strategies can be formulated in a sender or receiver initiated manner.

3 The Simulation Environment and its Load


All experiments were performed on a reconfigurable network of up to 324 transputers. A transputer is
an autonomous processor (20 MIPS, 1.5 MFLOPS) with four serial links (20 Mbit/s) which permit fast
communication. These processors are connected by a point-to-point communication network of maximum
degree four.
The simulation environment developed is identical for each processor. It consists of a parallel, message-
based system of processes which is implemented using the programming language OCCAM2.
A communication process is responsible for the management of routing, termination detection and on-line
observations. A computation process performs the main computational work. Basically, it consists of a
computing kernel, a process and memory management, and a process responsible for load balancing.
The computing kernel generates and consumes the workload. The load generator is able to create
random solution spaces for tree-structured computations. A subtree of the solution space is unambiguously
described by the root node of that tree. A node is specified by a random value, the accumulated costs
(the sum of edge values from the node to the root), and its level in the solution tree. With the help
of these values it is possible to assign a probability-based cost value to every edge in the subtree. Also,
a random number of sons and probability-based leaf, solution, transfer and computation times are
assigned to every node. The solution tree is examined in a best-first branch & bound order, which
means that during each iteration of the simulator the unit with minimal cost is taken from the local heap
and the associated computation time is simulated. If this load unit is not a leaf of the tree, a number of
new load units are generated. This model of the solution space is an extension of the model proposed by
Smith in [17]. The properties of the solution space can be determined by user-specified parameters. The
deterministic generation is done at run time, which means that the creation of the solution space is
independent of the number of processors and of its local context. Only in this way is it possible to get
comparable measurements.
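A minimal sketch of such a deterministic load generator (illustrative only; the function names, tree depth, fan-out and cost ranges are assumptions, not the parameters of the report): each load unit carries its accumulated cost, its level and a seed, every expansion draws all randomness from that seed, and the simulator consumes units in best-first order from a heap.

```python
import heapq
import random

def expand(node, max_level=4, max_sons=3):
    """Generate the sons of a load unit.  A node is a (cost, level, seed)
    tuple; all randomness is drawn from a generator seeded with the node's
    own seed, so the subtree below it is identical no matter on which
    processor (or in which order) it is expanded."""
    cost, level, seed = node
    rng = random.Random(seed)
    if level >= max_level:                  # leaf: no new load units
        return []
    sons = []
    for k in range(rng.randint(1, max_sons)):
        edge_cost = rng.randint(1, 10)      # probability-based edge value
        sons.append((cost + edge_cost, level + 1, seed * 31 + k + 1))
    return sons

def best_first(root):
    """Best-first branch & bound: in each iteration take the unit with
    minimal accumulated cost from the heap and generate its sons."""
    heap, iterations = [root], 0
    while heap:
        unit = heapq.heappop(heap)          # unit with minimal cost
        iterations += 1
        for son in expand(unit):
            heapq.heappush(heap, son)
    return iterations
```

Because the same root always yields the same tree, measurements on different processor counts explore identical solution spaces.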

There are two characteristics of a distributed computation that are important for the behavior of a load
balancing algorithm. One is the variation of the amount of work associated with a load unit. The other is
the variation of the size of the load units. The first property can be controlled by some parameters of
the load generator. For the second property we distinguish data and process migration.
In data migration all packets are relatively small and homogeneous in size. Typical examples for this
class of applications are search algorithms (e.g. branch & bound and alpha-beta search) in the areas of
artificial intelligence and operations research. We define the load of a processor as the number of
load units held by this processor.
A common application of process migration is dynamic load balancing within a distributed operating
system. In this case a process consists of program code and corresponding data. This implies that the
packages which have to be migrated are relatively large, with strongly varying properties. For our simulation
we assume that all tasks are independent, since for most applications the best strategy is to migrate
dependent tasks as one process cluster and to do the whole computation locally afterwards [5].

4 Load Balancing Algorithms


To guarantee a fair comparison, each of the following algorithms has been varied and optimized separately.
In this section we will not consider the question of how to fix the parameters of each strategy. Due
to the strong dependency on the strategy being used, this is done during the optimization phase of
each single algorithm.

4.1 Known Algorithms


4.1.1 LDLM Strategies

a) Local Random (l-rnd)


In this strategy a processor distributes some of its load units to its local neighbors after each fixed
time interval [20, 21]. This is an LDLM_s strategy, because the decision to migrate a load unit is
made purely locally. The receiver of load is also a direct (local) neighbor. The balancing is initiated
by a processing element which sends a load unit. We implemented this strategy in such a way that after
x iterations of the simulator, y load units are sent to random neighbors.
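The core of l-rnd can be sketched as follows (a minimal illustration; the function name, the default values of x and y, and the queue representation are assumptions):

```python
import random

def l_rnd_step(iteration, local_queue, neighbors, x=5, y=2, rng=None):
    """Local random (l-rnd): after every x iterations of the simulator,
    send up to y load units to randomly chosen direct neighbors.
    Returns a list of (destination, load_unit) migration messages."""
    rng = rng or random.Random(0)
    messages = []
    if iteration > 0 and iteration % x == 0:
        for _ in range(min(y, len(local_queue))):
            # pick a random direct neighbor for each migrated unit
            messages.append((rng.choice(neighbors), local_queue.pop()))
    return messages
```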
b) Direct-Neighborhood (d-N)
If the local load increases by more than up percent or decreases by more than down percent, the
actual load value is broadcast to the direct neighbors. If the load of a processing element exceeds
the load of its least loaded neighbor by more than d percent, it sends one unit to that neighbor
[11]. This strategy is refined in [12] by dynamically adapting the parameters according to
the actual workload of a processor and its communication activities.
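The two d-N rules can be sketched as follows (an illustrative reading of the protocol; threshold defaults and names are assumptions):

```python
def d_n_decide(local_load, neighbor_loads, d=50):
    """d-N migration decision: if the local load exceeds the smallest known
    neighbor load by more than d percent, migrate one load unit to that
    neighbor.  neighbor_loads maps neighbor id -> last broadcast load value.
    Returns the chosen neighbor or None."""
    target, least = min(neighbor_loads.items(), key=lambda kv: kv[1])
    if local_load > least * (1 + d / 100):
        return target
    return None

def should_broadcast(old_load, new_load, up=25, down=25):
    """Re-broadcast the local load to all direct neighbors when it rose by
    more than `up` percent or fell by more than `down` percent."""
    if old_load == 0:
        return new_load > 0
    change = (new_load - old_load) / old_load * 100
    return change > up or -change > down
```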

4.1.2 LDGM Strategies

As an example of this type of strategy we have implemented the global random strategy (g-rnd), which
is similar to the local random algorithm. The only difference to the above strategy is that load units are
now migrated over the whole network to a randomly chosen processor [16].

4.1.3 GDLM Strategies

The gradient model (GM) method was introduced by Lin and Keller [10]. It belongs to the group of
GDLM_r strategies, because decisions are based on gradient information. Gradients are vectors consisting
of load and distance information of (more or less) all processing elements, which means that each
processor tries to maintain a well approximated global state of the network. Load units are
always sent to immediate neighbors (local). A processor can be in one of the three states L (low), N
(normal) or H (high) according to the local load situation. Each processor "knows" the outgoing link
which leads on a shortest path to a processor in state L. If a processor is in state H, it sends
a load unit on this link in the direction of an underloaded processor. If a processor changes its state or
updates its shortest path to a processor in state L, this information is sent to all direct neighbors.
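The distance information of GM can be sketched by a synchronous relaxation (an illustrative stand-in for the distributed protocol; names and the sweep count are assumptions):

```python
def gradient_surface(states, adjacency, diameter):
    """Approximate, for every processor, the length of a shortest path to a
    processor in state 'L' by repeated neighbor relaxation.
    states: node -> 'L' | 'N' | 'H'; adjacency: node -> list of neighbors."""
    w = {i: (0 if states[i] == 'L' else diameter + 1) for i in states}
    for _ in range(diameter + 1):           # enough synchronous sweeps to settle
        w = {i: 0 if states[i] == 'L' else
                min(diameter + 1, 1 + min(w[j] for j in adjacency[i]))
             for i in states}
    return w

def gm_forward(i, states, adjacency, w, diameter):
    """A processor in state H sends a load unit on the outgoing link leading
    on a shortest path towards an underloaded (state L) processor."""
    if states[i] != 'H':
        return None
    target = min(adjacency[i], key=lambda j: w[j])
    return target if w[target] <= diameter else None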

4.1.4 GDGM Strategies

a) Bidding-Algorithm (bid)
The Bidding-Algorithm, based on [6, 18], is also state-controlled. The number of processing elements
which are able to take load units from a processor in state H depends on the distance between
these processors. The maximum distance between the load receiver and sender is varied dynamically.
Bid replies take account of communication costs.
The basic idea of this algorithm is that a processor in state H tries to migrate a load unit to the
processor with maximal bid value among all processors which have a distance of less than d from the
initiating processor. The distance value d is increased (decreased) if the initiator does not receive
enough bids (receives too many bids) for its offered load unit in a fixed time interval which also
depends on d. For a complete description of this algorithm see [6, 18].
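The dynamic adaptation of the distance d can be sketched as follows (the thresholds and bounds are illustrative assumptions, not the rules of [6, 18]):

```python
def adjust_radius(d, bids_received, min_bids=2, max_bids=8, d_max=9):
    """Widen the maximal sender/receiver distance d when too few bids arrive
    within the time interval, shrink it when too many arrive, and leave it
    unchanged otherwise."""
    if bids_received < min_bids:
        return min(d + 1, d_max)    # not enough bids: search farther away
    if bids_received > max_bids:
        return max(d - 1, 1)        # too many bids: restrict the search
    return d
```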
b) Drafting Algorithm (draft)
In this algorithm a processor can be in one of the three states L (low), N (normal) or H (high),
which represent the actual load situation. Each processor maintains a load table which contains
the most recent information on the so-called candidate processors. A candidate processor is a
processor from which a load unit may be received. Ni et al. [14] choose only the direct neighbors
as candidate processors. To achieve a better separation from the strategies with local migration
space, we use every processor of the network as a candidate processor. In contrast to [14], a load
unit is allowed to be migrated several times. Every message is extended by the load value of the
sender (piggybacking), which is used to update the values of the load tables. If the local state of
a processor changes significantly, it is broadcast through the network. A migration activity is
initiated by a processor which is in state L. This processor selects one of the processors of its local
load table which is in state H, to migrate a load unit to the initiator.
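The load-table handling can be sketched as follows (illustrative names; the selection of "one of the processors in state H" is simplified to the first match):

```python
def update_table(load_table, sender, piggybacked_state):
    """Every message carries the sender's load state (piggybacking); the
    receiver refreshes its load-table entry for that candidate processor."""
    load_table[sender] = piggybacked_state

def draft_request(my_state, load_table):
    """A processor in state L initiates a migration by selecting a candidate
    in state H from its load table and asking it for one load unit."""
    if my_state != 'L':
        return None
    for proc, state in load_table.items():
        if state == 'H':
            return proc
    return None
```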

4.2 Two New Load Balancing Algorithms


In this section we describe two new extensions of the basic gradient model method (GM). The following
example shows a snapshot of a distributed system using the original gradient model method.
Example (the processing elements with value 0 are in state L; the values inside the boxes represent
the computed distances to an underutilized processor):

    2 1 2 3
    1 0 1 2
    2 1 2 3
    1 0 1 2
Lin and Keller concluded that if no processor is in state L, then there is no need for further balancing
activities; otherwise only unnecessary communication costs would arise. From our experiments it follows
that this statement is only true in small networks. The reasons are illustrated by the following figures,
which show a profile of the system load.

If the load inside the areas goes down (more or less suddenly, in a large network) [fig. a], then the
GM starts rebalancing along the flanks [fig. b]. Larger areas of underloaded processing elements are the
consequence of this strategy, which will be overcome by the new algorithm presented in the next section.

[fig. a and fig. b: profiles of the system load (levels H, N, L) across the network, before and after
the load drop]
4.2.1 The Extended Gradient Model (X-GM) [GDLM_sr strategy]


Now we present our extension of the original gradient model. Analogous to the pressure surface of
the gradient model method [10], a suction surface is defined. This gradient surface usually has a
direction opposite to the pressure surface. The suction surface facilitates the transfer of load units from
the border of an H-field to the center of the nearest local load minimum.
Let G = (V, E) be a processor network and i in V a processing element. Then we define w, w̄ : V
-> {0, ..., D(G)+1} as follows. Let w(i) be the length of the shortest path from node i to a processing
element which is in state L, and w̄(i) the length of the shortest path to a node in state H (where D(G)
equals the diameter of G). w(i) and w̄(i) equal D(G)+1 if such a node does not exist in the graph. The
X-gradient surface XGS(G) is the collection of all these values.
Example (the processing elements with pressure value 0 are in state L and those with suction value 0
are in state H; the tuples inside the boxes represent the (pressure, suction) values of XGS(G)):

    (2,2) (1,1) (2,0) (3,0)
    (1,2) (0,1) (1,0) (2,0)
    (2,2) (1,1) (2,0) (2,0)
    (1,3) (0,2) (1,1) (2,1)
Since the workload of the system changes dynamically, the X-gradient surface can only be approximated.
This is done by a protocol like that used in the original gradient model. For this we recursively
define a pressure function p : V -> {0, ..., D(G)+1} by

    p_t(i) := 0                                                     if t = 0 or the state of i equals L
    p_t(i) := min{D(G)+1, 1 + min{p_{t-1}(j) | j is a neighbor of i}}   otherwise

and the suction function s : V -> {0, ..., D(G)+1} by

    s_t(i) := D(G)+1                                                if t = 0
    s_t(i) := 0                                                     if the state of i equals H
    s_t(i) := min{D(G)+1, 1 + min{s_{t-1}(j) | j is a neighbor of i}}   otherwise

The X-pressure surface is defined to be the collection of the pressure and suction values for all processing
elements. It was shown that if the X-pressure surface is constant for two consecutive time stamps
t-1 and t, then

    i)  p_t(i) = w(i)   for all i in V,  and
    ii) s_t(i) = w̄(i)   for all i in V.

Proof (sketched):
i) was shown in [10].
ii) If there is no i in V such that i is in state H, then it follows by definition that s_t(i) = D(G)+1 = w̄(i).
If there is any k in V such that k is in state H, then let i in V be a processing element and k the
nearest processing element in state H with respect to i. It can be easily shown by induction on
the length of the shortest path from node i to k that s_t(i) = w̄(i) holds.
This implies that the pressure and suction surfaces are well approximated.
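The fixed-point claim can be checked with a small synchronous sketch of the protocol (a Python stand-in for the distributed OCCAM2 implementation; all names are illustrative):

```python
def x_surfaces(states, adjacency, diameter):
    """Synchronously iterate the recursive definitions of the pressure
    function p_t and the suction function s_t until they are constant for
    two consecutive time stamps; the result then equals the distances to the
    nearest L-node (pressure) and the nearest H-node (suction)."""
    inf = diameter + 1
    p = {i: 0 for i in states}              # p_0(i) = 0
    s = {i: inf for i in states}            # s_0(i) = D(G)+1
    while True:
        np = {i: 0 if states[i] == 'L' else
                 min(inf, 1 + min(p[j] for j in adjacency[i]))
              for i in states}
        ns = {i: 0 if states[i] == 'H' else
                 min(inf, 1 + min(s[j] for j in adjacency[i]))
              for i in states}
        if np == p and ns == s:
            return p, s
        p, s = np, ns
```

On a path 0-1-2-3 with node 0 in state L and node 3 in state H, the converged pressure values are the distances to node 0 and the suction values the distances to node 3, as the theorem states.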
The following X-GM algorithm is activated on arrival of a message from a neighboring processor or
when the local load situation changes.
Let {p_1, ..., p_k} and {s_1, ..., s_k} be the pressure and suction values of the k neighbors of a processing
element. Initially we set p_j := 0 and s_j := D(G)+1 for all neighbors j.

ON event DO
  old.p := p; old.s := s
  CASE local state OF
    L : p := 0; ignore pressure values from neighbors
        s := min{D(G)+1, 1 + min{s_j | 1 <= j <= k}}
    N : p := min{D(G)+1, 1 + min{p_j | 1 <= j <= k}}
        s := 1 + min{s_j | 1 <= j <= k}
        IF (s > D(G)+1) THEN s := D(G)+1
        ELSE IF (max{s_j | 1 <= j <= k} > s) THEN
          send one load unit to the neighbor j with maximal suction value s_j
    H : s := 0; p := 1 + min{p_j | 1 <= j <= k}
        IF (p > D(G)+1) THEN p := D(G)+1
        ELSE send one load unit to the neighbor j with minimal pressure value p_j
        IF (max{s_j | 1 <= j <= k} > s) THEN
          send one load unit to the neighbor j with maximal suction value s_j
  IF (p <> old.p) THEN send p to all neighbors
  IF (s <> old.s) THEN send s to all neighbors
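For illustration, one activation of this event handler can be written as a pure function (an interpretation of the pseudocode, not the original OCCAM2 code; the return convention is an assumption):

```python
def x_gm_event(state, p_nb, s_nb, D):
    """One X-GM activation on a processing element.  p_nb, s_nb are the
    pressure/suction values of the k neighbors (parallel lists).  Returns
    the new (p, s) of this element and a list of send actions, each of the
    form ('pressure'|'suction', neighbor_index)."""
    inf = D + 1
    actions = []
    if state == 'L':
        p = 0                               # ignore pressure values from neighbors
        s = min(inf, 1 + min(s_nb))
    elif state == 'N':
        p = min(inf, 1 + min(p_nb))
        s = 1 + min(s_nb)
        if s > inf:
            s = inf
        elif max(s_nb) > s:                 # a neighbor pulls harder than we do
            actions.append(('suction', s_nb.index(max(s_nb))))
    else:                                   # state H
        s = 0
        p = 1 + min(p_nb)
        if p > inf:
            p = inf
        else:                               # push towards the nearest L-processor
            actions.append(('pressure', p_nb.index(min(p_nb))))
        if max(s_nb) > s:                   # additionally serve a pulling neighbor
            actions.append(('suction', s_nb.index(max(s_nb))))
    return p, s, actions
```

Reporting p and s back to all neighbors whenever they changed (the last two lines of the pseudocode) is left to the caller.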

4.2.2 The Global Extended Gradient Model (GX-GM) [GDGM_sr strategy]


Because the pressure and suction surfaces represent distance information, it is possible to extend the X-GM
method in a global fashion. Load units which have to be transferred are extended by a pressure (suction)
distance. A processor which only has to forward such a unit decrements (increments) the distance value
and sends the unit to a neighbor in the actual pressure (suction) direction. An exception to this scheme
arises if the state of such a processor equals L. In this case the load unit is stored locally.
This model can be used in several ways. We achieved the best results when balancing based on the pressure
surface (next requirement) was done locally (as in 4.2.1) and balancing based on the suction surface
(future requirement) was done in a global manner.
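The multi-hop forwarding step can be sketched as follows (simplified to the decrementing case; names and the return convention are assumptions):

```python
def gx_gm_forward(my_state, unit_distance, gradient_dir):
    """GX-GM transfer step for a load unit travelling over several hops: the
    unit carries a remaining-distance value; a processor that only has to
    forward it adjusts this value and passes the unit on in the current
    gradient direction, but a processor in state L (or a used-up distance)
    keeps the unit locally.  Returns ('keep', None) or ('forward', ...)."""
    if my_state == 'L' or unit_distance <= 0:
        return ('keep', None)
    return ('forward', (gradient_dir, unit_distance - 1))
```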

5 Simulation Results
To study the behavior of load-balancing strategies in large networks we have to choose the network
topology and the kind of load.

If diameter and degree of a network are constant, we expect that the behavior of networks like the torus
is comparable to networks with more processors and logarithmic diameter (like the De Bruijn network).
So we selected the 8x8, 13x13 and 18x18 torus topologies for simulation purposes. We connected the
processors to rings of up to 169 (13²) elements to study networks with a considerably larger diameter. It
has to be recognized that besides the diameter, the average degree of the network can also influence the
behavior of the strategies.
Another decision concerns the kind of load to be simulated. As mentioned in section 3, we considered
data and process migration separately. In each case we studied at least one strategy out of every class
introduced in section 2.
To examine the behavior of the algorithms described in section 4, we let each of them run in its speed-
up optimized version on different workloads. To avoid nondeterministic effects, which are inherent in
distributed computations, we performed each simulation several times. Only average values were used
for comparison.

5.1 Measurements : Data Migration


For the simulation of data migration we used a uniform random distribution of the packet size in the
range of [80, 240] bytes. The computation time for each load unit was chosen with the same distribution
in the range of [160, 173] msec. 15 random solution spaces were computed, each of them three times.
The following figures give a first impression of the behavior of the algorithms. Tables 1-3 of the appendix
present more detailed information.
[fig. 1 (torus) and fig. 2 (ring): % efficiency of the eight strategies on networks of 1 to 324 processors]
The numbering of the strategies of figures 1-2 is as follows:

# processors: 1-324; % efficiency: 0-100 (ratio of speed-up and # processors);
strategies: 0-7 (0=l-rnd, 1=d-N, 2=g-rnd, 3=X-GM, 4=GM, 5=bid, 6=draft, 7=GX-GM)
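The plotted quantity is simply (a one-line illustration of the definition above):

```python
def efficiency(speed_up, processors):
    """Efficiency in percent as plotted in figs. 1-4: the ratio of the
    measured speed-up to the number of processors."""
    return 100.0 * speed_up / processors
```

For instance, a speed-up of 239.5 on the 324-processor torus corresponds to an efficiency of about 74 percent.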

From fig. 1 (torus topology) we derive that only g-rnd (2), X-GM (3) and GX-GM (7) behave well. The
l-rnd (0) strategy slows down very rapidly, as can be seen in fig. 2 (ring topology). g-rnd is able to saturate
the networks by extremely high migration activity (see Table 3). However, this is less important when
performing data migration, because the package size is relatively small. From fig. 2 (ring topology) we
can derive that a global migration space is a very important criterion in networks with large diameter and
low degree. Nevertheless, strategies like bid (5) or draft (6) which have a global migration space need a
great amount of control communication. That is the main reason why these strategies behave so badly
on large networks. The main drawback of d-N (1) is its local migration space, which often results in a
clustering of low saturated processors.

Summary:
When performing data migration in large networks, the load balancing strategy should have at least one
global component (migration space or decision base) and should not need too many control communications.

[diagram (data migration): classification scheme with g-rnd placed in LDGM, GX-GM in GDGM and
X-GM in GDLM]

The best suited strategy was g-rnd, followed by GX-GM. X-GM also behaved well. Using g-rnd one
should recognize that the system must be able to route packages over the whole network, which is not
necessary using X-GM or GX-GM.

5.2 Measurements : Process Migration


To examine process migration we studied heavily loaded UNIX workstations. With the help of our load
generator and the derived properties we computed 10 different process trees (or process forests), each
three times.
The packet length for each load unit was chosen in the range of [6, 202] kbyte with normal distribution
(μ = 44 kbyte, σ = 400). The computation time for each load unit was chosen in the range of [64, 768]
msec with exponential distribution (λ = 0.007).
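Such a workload can be reproduced roughly as follows (the rejection sampling used to enforce the ranges is an assumption; the report does not say how the ranges were enforced):

```python
import random

def process_workload(n, rng=None):
    """Sample n (packet_kbyte, time_msec) pairs matching the stated process
    migration workload: packet lengths normally distributed and truncated to
    [6, 202] kbyte, computation times exponentially distributed (rate 0.007)
    and truncated to [64, 768] msec."""
    rng = rng or random.Random(1)

    def truncated(draw, lo, hi):
        while True:                         # redraw until inside the range
            v = draw()
            if lo <= v <= hi:
                return v

    return [(truncated(lambda: rng.gauss(44, 400), 6, 202),
             truncated(lambda: rng.expovariate(0.007), 64, 768))
            for _ in range(n)]
```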
The following figures give an impression of the behavior of the algorithms on these workload patterns.
Tables 4-6 of the appendix provide further information.
[fig. 3 (torus) and fig. 4 (ring): % efficiency of the eight strategies on networks of 1 to 324 processors]

The numbering of the strategies of figures 3-4 is as follows:

# processors: 1-324; % efficiency: 0-100 (ratio of speed-up and # processors);
strategies: 0-7 (0=l-rnd, 1=d-N, 2=g-rnd, 3=X-GM, 4=GM, 5=bid, 6=draft, 7=GX-GM)

Considering fig. 3 (torus), it is remarkable that only d-N (1), X-GM (3) and GX-GM (7) behave well.
All these strategies have a well-directed migration policy while using a moderate amount of control
messages. Fig. 4 (ring) suggests that d-N will slow down considerably if the diameter increases further. This is
due to the restricted decision base, which results in load clustering if the network is large enough. Because
g-rnd has a high migration activity and the package size is large when performing process migration, this
strategy now behaves badly. Strategies with synchronized protocol activities like bid (5) or draft (6) also
behave poorly, because the necessary control messages flow slowly through the network and are often
inconsistent with the actual system load.
Summary:
When performing process migration in large networks, the load balancing strategy should have a well-
directed migration policy, so that only a few wrong migration decisions are made. This requirement is
fulfilled by a global decision base, but only if the global load information can be well approximated. A
global migration space or protocols working over long distances seem to be critical, because the edges
of the network are often blocked by the transfer of large packages and therefore the protocol information
becomes very inconsistent.
[diagram (process migration): classification scheme with d-N placed in LDLM, X-GM in GDLM and
GX-GM in GDGM]

The best suited strategy was X-GM, followed by GX-GM and d-N. The local decision base of d-N and
the global migration space of GX-GM are responsible for their lower performance.

As a summary, one can state that g-rnd is well suited when performing data migration on large networks.
It is easy to implement but requires a global routing facility of the system.
d-N is well suited when performing process migration on networks of average size. The protocol is
relatively easy to implement and a global routing facility of the system is not needed.
X-GM behaves very well when performing process migration and quite well when performing data
migration on large networks. So it seems to be relatively robust with respect to the load characteristics.
The X-GM algorithm is easy to implement and is independent of the routing facilities of the system.
These properties make the X-GM strategy best suited for integration into an environment where the
workload characteristics are unpredictable. An example of such an environment is a distributed operating
system running on a large network.

6 Conclusions
In this paper we have studied dynamic load balancing algorithms using a general purpose simulation
environment running on networks of up to 324 (1024) processors.
We have introduced a new classification scheme for dynamic load balancing algorithms, implemented eight
very different strategies (section 4), and evaluated their behavior on different ring and torus topologies
for data and process migration.
Our results indicate that the decision about the most suitable load balancing algorithm depends on the
network as well as on the workload characteristics. We were able to make a promising extension of the
gradient model method introduced by Lin and Keller [10].
It has been shown that in large networks and under data migration, a random strategy with global
migration space (4.1.2) and our global variant of the extended gradient model (4.2.2) perform well. In
large networks and under process migration, our extended gradient model (4.2.1) shows the best behavior.
Because this algorithm leads to a high performance relatively independent of the workload characteristics,
it seems to be well suited for integration into an environment with unpredictable workload patterns.

Appendix
Tables 1 and 2 present the speed-up measurements (the ratio of sequential and parallel computation
time) when performing data migration (see section 3). The minimum parallel computation time of the
average problem instance is 58.2 seconds.
The values presented are normalized to a zero search-overhead. It can be shown that the anomalies of the
distributed branch & bound method [9], which usually result in a search-overhead, are correlated
with the complexity of the load balancing algorithm used. By normalizing the results to a zero
search-overhead, it is possible to deduce statements about the behavior of the algorithms for a fixed
computation. Further details are shown in Table 3.
Table 1 (data migration, speed-up values, zero search-overhead, torus topology):

  Alg.    8x8 (64)  13x13 (169)  18x18 (324)
  l-rnd   41.2      111.8        200.6
  d-N     54.0      84.5         138.8
  g-rnd   60.5      131.7        239.5
  X-GM    61.2      139.6        228.9
  GM      57.1      124.2        189.0
  bid     50.5      94.2         130.4
  draft   57.6      100.6        88.1
  GX-GM   61.2      139.5        224.8

Table 2 (data migration, speed-up values, zero search-overhead, ring topology):

  Alg.    64    169
  l-rnd   31.4  37.2
  d-N     42.8  37.9
  g-rnd   56.1  102.0
  X-GM    48.6  62.8
  GM      46.7  45.2
  bid     25.7  11.2
  draft   51.3  46.6
  GX-GM   55.2  73.5

Table 3 (data migration, 18x18 (324) torus topology). Here #load.trans is the total number of hops
load units were migrated, #ctrl is the total number of control messages sent and %search denotes the
search-overhead with respect to the sequential case.

  Alg.    #load.trans  #ctrl     %search
  l-rnd   1689913      -         7.15
  d-N     85227        1128386   17.31
  g-rnd   1740619      -         0.68
  X-GM    556162       649225    5.26
  GM      1144404      488408    17.44
  bid     103560       8155760   20.63
  draft   43027        15883144  9.97
  GX-GM   648006       668870    4.77

Tables 4 and 5 present the speed-up measurements when performing process migration (see section 3).
The minimum parallel computation time of the average problem instance is 154.2 seconds. Additional
information is given by Table 6.

Table 4 (process migration, speed-up values, torus topology):

  Alg.    8x8 (64)  13x13 (169)  18x18 (324)
  l-rnd   31.1      112.9        159.4
  d-N     62.8      145.3        221.5
  g-rnd   58.1      87.3         122.8
  X-GM    60.9      142.6        253.6
  GM      49.4      86.7         180.9
  bid     53.0      127.2        173.5
  draft   59.1      118.6        171.4
  GX-GM   59.8      142.1        239.7

Table 5 (process migration, speed-up values, ring topology):

  Alg.    64     169
  l-rnd   14.3   14.6
  d-N     47.85  44.8
  g-rnd   45.9   61.5
  X-GM    57.9   72.9
  GM      50.0   41.3
  bid     32.4   16.4
  draft   53.4   65.1
  GX-GM   54.1   65.4

Table 6 (process migration, 18x18 (324) torus topology). Here #load.trans is the total number of hops
load units were migrated and #ctrl is the total number of control messages sent.

  Alg.    #load.trans  #ctrl
  l-rnd   419634       -
  d-N     61268        1076800
  g-rnd   397992       -
  X-GM    500445       1364287
  GM      122037       218140
  bid     110150       6973739
  draft   36860        20927417
  GX-GM   430483       1493622

References
[1] K.M. Baumgartner, B.W. Wah
A Global Load Balancing Strategy for a Distributed Computer System, Workshop on the
Future Trends of Distributed Computing Systems in the 1990s IEEE Comp. Soc. Press,
1988, pp. 93-102
[2] W. Bodenschatz
Multi-Transputer-Maschine zur parallelen Reduktion von Funktionalsprachen, PARS Work-
shop 1989, pp. 128-150
[3] T.L. Casavant, J.G. Kuhl
A Formal Model of Distributed Decision-Making and its Application to Distributed Load
Balancing, IEEE 6th Int. Conf. on Distributed Computing Systems, 1986, pp. 232-239
[4] T.L. Casavant, J.G. Kuhl
Analysis of Three Dynamic Distributed Load Balancing Strategies with Varying Global In-
formation Requirements, IEEE 7th Int. Conf. on Distributed Computing Systems, 1987,
pp. 185-192
[5] A.K. Ezzat, R.D. Bergeron, J.L. Pokoski
Task Allocation Heuristics for Distributed Computing Systems, IEEE 6th Int. Conf. on
Distributed Computing Systems, 1986, pp. 337-346
[6] D. Ferguson, Y. Yemini, C. Nikolaou
Microeconomic Algorithms for Load Balancing in Distributed Computer Systems, IEEE 8th
Int. Conf. on Distributed Computing Systems, 1988, pp. 539-546
[7] C.Y.H. Hsu, J.W.S. Liu
Dynamic Load Balancing Algorithms in Homogeneous Distributed Systems, IEEE 6th Int.
Conf. on Distributed Computing Systems, 1986, pp. 216-223
[8] L.V. Kale
Comparing the Performance of two Dynamic Load Distribution Methods, Parallel Process-
ing, vol. 1, 1988, pp. 8-12
[9] T.H. Lai, S. Sahni
Anomalies in Parallel Branch and Bound Algorithms, Proc. of the Int. Conf. on Parallel
Processing, 1983, pp. 183-190
[10] F.C.H. Lin, R.M. Keller
The Gradient Model Load Balancing Method, IEEE Trans. on Software Engineering, vol. 13,
1987, pp. 32-38
[11] R. Lüling, B. Monien
Two Strategies for Solving the Vertex Cover Problem on a Transputer Network, 3rd Int.
Workshop on Distributed Algorithms, 1989, LNCS 392, pp. 160-170
[12] R. Lüling, B. Monien
Load Balancing for Distributed Branch and Bound Algorithms, manuscript, 1991
[13] B. Monien, H. Sudborough
Embedding one Interconnection Network in Another, Computing Suppl. 7, 1990, pp. 257-282

[14] L.M. Ni, C.W. Xu, T.B. Gendreau
Drafting Algorithm - A Dynamic Process Migration Protocol for Distributed Systems, IEEE
5th Int. Conf. on Distributed Computing Systems, 1985, pp. 539-546
[15] S. Pulidas, D. Towsley, J.A. Stankovic
Imbedding Gradient Estimators in Load Balancing Algorithms, IEEE 8th Int. Conf. on
Distributed Computing Systems, 1988, pp. 482-490
[16] F. Ramme
Lastausgleichsverfahren in verteilten Systemen, Master Thesis, University of Paderborn,
1990
[17] D.R. Smith
Random Trees and the Analysis of Branch and Bound Procedures, JACM, vol. 31, 1984,
pp. 163-188
[18] J.A. Stankovic, I.S. Sidhu
An Adaptive Bidding Algorithm for Processes, Clusters and Distributed Groups, IEEE 4th
Int. Conf. on Distributed Computing Systems, 1984, pp. 49-59
[19] J.M. Troya, M. Ortega
A Study of Parallel Branch-and-Bound Algorithms with Best-Bound-First Search, Parallel
Computing, vol. 11, 1989, pp. 121-126
[20] O. Vornberger
Load Balancing in a Network of Transputers, 2nd Int. Workshop on Distributed Algorithms,
1987, pp. 116-126
[21] S. Zhou
A Trace-Driven Simulation Study of Dynamic Load Balancing, IEEE Trans. on Software
Engineering, vol. 14, no. 9, 1988, pp. 1327-1341

