
2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11

A Global Scheduling Algorithm Based on Dynamic Critical Path

Xing Gu, Qun Yang, De-chang Pi, He-yang Ke


College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Nanjing, China
E-mail: guxing94@yahoo.cn

Abstract—Task scheduling, which assigns tasks with precedence constraints to suitable processors, is vital for obtaining high performance on multiprocessor systems. In this paper, we first analyze three typical list-based algorithms: the MCP, ETF and BDCP algorithms. We show that these algorithms cannot guarantee that the critical tasks are scheduled first. To solve this problem, we propose a new global scheduling algorithm based on the dynamic critical path (GDCP). In the GDCP algorithm, tasks on the critical path are given priority at each scheduling step, and a global search strategy is applied to select a suitable processor for each task, which reduces the schedule length. Experimental results show that the proposed algorithm outperforms the other algorithms.

Keywords—scheduling algorithm; dynamic critical path; global selection strategy; schedule length

I. INTRODUCTION

The objective of task scheduling is to assign related tasks to a multiprocessor system so as to obtain the minimum schedule length, which is an important problem in parallel computing. An efficient task scheduling method can yield high system performance. It is well known, however, that multiprocessor scheduling for most precedence-constrained task graphs is an NP-complete problem in its general form [5]. Task scheduling has therefore been studied extensively, and various heuristics have been proposed in the literature [1][6][7][8][9]. In compile-time scheduling, these heuristics are classified into several categories, such as list scheduling, task duplication and clustering.

A scheduling algorithm needs to address a number of issues: it should take into account task granularity and arbitrary computation and communication costs, and, to be of practical use, it should have low complexity. List scheduling [2][3][4] is generally accepted as an attractive approach since it combines low complexity with good results. In this paper, we propose a new static scheduling algorithm called GDCP (global scheduling algorithm based on the dynamic critical path). It schedules arbitrary precedence-constrained task graphs onto a multiprocessor system with an unlimited number of fully connected identical processors. The GDCP algorithm overcomes the drawbacks of previous approaches and obtains better performance.

The remainder of this paper is organized as follows. In the next section, we define the task scheduling problem and describe three typical scheduling algorithms. Our GDCP algorithm is proposed in Section 3. In Section 4, we use an example to illustrate these algorithms and give experimental results. Section 5 provides concluding remarks.

II. BACKGROUND

A. Task Scheduling Problem

A scheduling problem consists of an application, a target computing environment, and performance criteria for scheduling.

A parallel program is represented as a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges. A node in the graph represents a task, which is a set of instructions that can be executed on any of the available processors. The weight of node vi, denoted by wi, is its computation cost. The edges between nodes represent the dependencies between tasks: for two nodes vi, vj ∈ V, an edge eij ∈ E denotes the precedence constraint that task vj must not start its execution until task vi completes its execution. The weight of edge eij, denoted by cij, represents the communication cost. The source node and the destination node of an edge are called the parent node and the child node respectively. In a task graph, a node with no parent is called an entry node, and a node with no child is called an exit node. A node that has received all the messages from its parent nodes and is waiting to be scheduled onto a processor is called a ready node.

A critical path (CP) of a task graph is a path from an entry node to an exit node whose sum of computation and communication costs is maximum.

The target system is a multiprocessor system with an unlimited number of fully connected identical processors. Processors exchange data by message passing, and all processors can execute tasks and communicate with each other at the same time. We assume that all inter-processor communications are performed without contention and that the communication overhead between two tasks scheduled onto the same processor is zero.
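For illustration, the task-graph model above can be expressed directly in code. The following minimal Python sketch (the node and edge values are illustrative and not taken from the paper) builds such a DAG and computes the cost of its critical path:

```python
# A minimal sketch (not from the paper) of the task-graph model described above.
# Nodes carry computation costs w_i; edges carry communication costs c_ij.

computation = {"v1": 2, "v2": 3, "v3": 1, "v4": 2}           # w_i (illustrative values)
communication = {("v1", "v2"): 4, ("v1", "v3"): 1,           # c_ij (illustrative values)
                 ("v2", "v4"): 1, ("v3", "v4"): 2}

successors = {}
for (u, v) in communication:
    successors.setdefault(u, []).append(v)

def critical_path_length(node, memo=None):
    """Heaviest computation + communication cost of any path starting at `node`."""
    if memo is None:
        memo = {}
    if node not in memo:
        tail = max((communication[(node, s)] + critical_path_length(s, memo)
                    for s in successors.get(node, [])), default=0)
        memo[node] = computation[node] + tail
    return memo[node]

# The CP cost of the graph is the maximum over its entry nodes (nodes with no parent).
entry_nodes = set(computation) - {v for (_, v) in communication}
print(max(critical_path_length(n) for n in entry_nodes))     # prints 12 for this graph
```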
B. Related Work

It is well known that most of the reported scheduling algorithms are based on the list scheduling approach. The basic idea of list scheduling is to assign priorities to the tasks of the graph, place the tasks in a list ordered by priority, and then schedule the nodes of the list onto processors one at a time. The following two steps are repeated until all nodes have been scheduled onto the multiprocessor system.

1) Select the node with the highest priority from the scheduling list;

2) According to a certain processor-selection method, select a suitable processor and assign the designated node to it.
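These two steps can be summarized by the skeleton below. It is only a sketch; the helper callables are placeholders, and each concrete algorithm fills in the priority and processor-selection rules differently:

```python
# Generic list-scheduling skeleton (a sketch; the helper callables are placeholders,
# not functions from the paper).

def list_schedule(tasks, parents, processors, priority, pick_processor, place):
    """tasks: iterable of node ids; parents: dict node -> list of parent nodes."""
    unscheduled = set(tasks)
    while unscheduled:
        # Step 1: among the ready nodes, take the one with the highest priority.
        ready = [t for t in unscheduled
                 if all(p not in unscheduled for p in parents.get(t, []))]
        node = max(ready, key=priority)
        # Step 2: choose a processor according to the algorithm's selection rule.
        processor = pick_processor(node, processors)
        place(node, processor)
        unscheduled.remove(node)
```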
In the following, we present some well-known list-based scheduling algorithms and analyze their advantages and shortcomings.

Modified Critical Path (MCP) [3]: this algorithm determines node priorities by assigning an as-late-as-possible (ALAP) attribute to every node. The algorithm first computes the latest possible start time of every node and then builds a schedule list arranged in ascending order of the latest possible start time. Nodes with the same latest possible start time are ordered by the values of their child nodes. At each step, the MCP algorithm removes the first node from the list and schedules it onto the processor that allows the earliest start time. The complexity of the MCP algorithm is O(v²log v). The main problem with the MCP algorithm is that a static priority may not order the nodes for scheduling according to their relative importance. Moreover, the nodes with smaller latest possible start times may not lie on the CP. In other words, the MCP algorithm does not assign node priorities accurately and may therefore degrade the schedule quality.

Earliest Time First (ETF) [2]: in contrast to the MCP algorithm, ETF determines node priorities dynamically. It selects a node for scheduling based on an attribute called the earliest start time. At each step, the algorithm computes the earliest start time of every ready node and schedules the one with the minimum value onto the processor on which it can start earliest. When two nodes have the same earliest start time, the ETF algorithm selects the node with the higher static level. The complexity of the ETF algorithm is O(pv²). The major drawback of the ETF algorithm is that it may not reduce the partial schedule length at each step, because the node with the earliest start time is not necessarily on the critical path. Scheduling such nodes first may delay the start times of the CP nodes, so the schedule length of the task graph cannot be shortened effectively.

Balanced Dynamic Critical Path (BDCP) [4]: this algorithm assigns priorities to the nodes of a task graph according to their earliest possible start times (EPST). At each step, it dynamically chooses the node with the smallest EPST as the scheduling node and schedules it onto the processor that makes the current schedule length of the graph smallest. The complexity of the BDCP algorithm is O(v²). Although it overcomes the disadvantages of earlier static scheduling algorithms, the EPST value does not correctly reflect the importance of a node, and the nodes on the CP still cannot be scheduled early, which affects the final schedule length of the task graph.

III. THE PROPOSED ALGORITHM

This section presents the proposed algorithm, a global scheduling algorithm based on the dynamic critical path, which aims at achieving high performance with low complexity. The proposed algorithm assigns dynamic priorities to the nodes at each step based on the dynamic critical path, and distributes the node with the highest priority to a processor selected by a global search strategy. This method guarantees that relatively important nodes are scheduled preferentially onto an adequate processor, so the schedule length is shortened effectively.

A. Some Scheduling Attributes

Before presenting the objective function, it is necessary to define some attributes that are derived from a given partial schedule.

epst(vi): the earliest possible start time of node vi, which is recursively defined as follows:

epst(v_i) = \begin{cases} 0, & \text{if } v_i \text{ is an entry node} \\ \max_{v_j \in pred(v_i)} \{ epst(v_j) + w_j + c_{ji} \}, & \text{otherwise} \end{cases}    (1)

where pred(vi) is the set of immediate predecessors of node vi, wj is the weight of node vj, and cji is the communication cost of edge eji, which transfers data from node vj to node vi. When vj and vi are scheduled on the same processor, cji becomes zero, since we assume that the intra-processor communication cost is negligible compared with the inter-processor communication cost.

After the earliest possible start times of all tasks in the graph are obtained, the schedule length of the partially scheduled graph is the maximum finish time over all tasks; thus the current schedule length (csl), also called the makespan, is calculated as:

csl = \max_{v_i \in exit(G)} \{ epst(v_i) + w_i \}    (2)

where exit(G) is the set of exit nodes of the graph. The value of csl is useful because the objective of the scheduling algorithm is to reduce the schedule length step by step and ultimately obtain the shortest schedule length.

In addition, the latest possible start time of a node vi, i.e., the latest time at which vi can start its execution without increasing the schedule length of the task graph, denoted by lpst(vi), is recursively defined as follows:

lpst(v_i) = \begin{cases} csl - w_i, & \text{if } v_i \text{ is an exit node} \\ \min_{v_j \in succ(v_i)} \{ lpst(v_j) - c_{ij} - w_i \}, & \text{otherwise} \end{cases}    (3)

where succ(vi) is the set of immediate successors of node vi, and wi and cij are the weights of node vi and edge eij respectively.
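Definitions (1)–(3) translate directly into code. The following Python sketch computes the three attributes; the dictionary names are illustrative and not the paper's notation:

```python
# Sketch of Eqs. (1)-(3). `w` maps nodes to computation costs, `c` maps edges to
# communication costs (already zero for two nodes placed on the same processor),
# and `pred`/`succ` map nodes to their immediate predecessors/successors.

def epst(v, w, c, pred, memo):
    """Earliest possible start time of node v, Eq. (1)."""
    if v not in memo:
        memo[v] = max((epst(u, w, c, pred, memo) + w[u] + c[(u, v)]
                       for u in pred.get(v, [])), default=0)
    return memo[v]

def current_schedule_length(exit_nodes, w, epst_of):
    """Current schedule length (makespan), Eq. (2)."""
    return max(epst_of[v] + w[v] for v in exit_nodes)

def lpst(v, w, c, succ, csl, memo):
    """Latest possible start time of node v, Eq. (3)."""
    if v not in memo:
        children = succ.get(v, [])
        if not children:
            memo[v] = csl - w[v]                       # exit node
        else:
            memo[v] = min(lpst(u, w, c, succ, csl, memo) - c[(v, u)] - w[v]
                          for u in children)
    return memo[v]
```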
B. Task Priority

The CP of a task graph is the path with the maximum cost, and it largely determines the schedule length of the scheduled task graph. Thus, the nodes on the CP should be scheduled with priority over other nodes. However, the CP can change dynamically as scheduling proceeds; the CP at an intermediate scheduling step is called the dynamic critical path. Therefore, how to assign proper priorities to all nodes of the graph, especially the nodes on the dynamic critical path, plays an important role in a scheduling algorithm.
According to the critical path algorithm in graph theory [10], if epst(vi) is equal to lpst(vi), then node vi must be a critical node, i.e., node vi lies on the CP. Thus, our algorithm first assigns higher priorities to the nodes whose epst and lpst are equal than to the other nodes. Then, among nodes with the same priority, the smaller the epst, the higher the priority. Finally, the tasks are ordered by decreasing priority. In this way, the most important nodes, those on the CP, are considered preferentially at each scheduling step, which reduces the final schedule length effectively.
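In code, this priority rule amounts to a two-level sort key, sketched below with precomputed epst/lpst dictionaries (illustrative names):

```python
# Sketch of the priority rule above: critical nodes (epst == lpst) come first,
# and ties are broken by smaller epst.

def order_by_priority(ready_nodes, epst_of, lpst_of):
    return sorted(ready_nodes,
                  key=lambda v: (epst_of[v] != lpst_of[v], epst_of[v]))
```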
C. Processor Selection

After choosing a scheduling node, we need a method to select a suitable processor on which to schedule that node. The classic scheduling algorithms select the processor that allows the minimum start time for the node. This method gives a locally optimized result, but it often gives bad results when communication costs are large. For example, Figure 1(a) shows a task graph; Figure 1(b) gives the schedule produced by the above processor-selection method, which picks the processor providing the earliest start time for each node, and the resulting schedule length is 4. Figure 1(c) shows that when all nodes are scheduled onto the same processor, the schedule length is only 3.

Figure 1. Example of a task graph and two different schedules.
We adopt a global search strategy, that is, we use the csl defined in (2) as the criterion for selecting a processor. This ensures that all nodes in the graph are fully taken into account and that the schedule length is reduced gradually at each step. We first tentatively assign node v to a processor PE, compute the start time at which v would begin execution, and temporarily replace the epst of v with this start time. We then calculate the value of csl according to formula (2). Node v is tried on every available processor, and the processor for which the csl of the scheduled graph is shortest is selected. If there is more than one such processor, we choose the processor that provides the minimum start time for the child of node v with the smallest difference between epst and lpst.
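A sketch of this selection rule is given below. The helpers are assumptions rather than functions from the paper: schedule_length_with(v, p) is taken to return the csl of Eq. (2) after tentatively placing v on processor p, and child_start_time(u, p) the start time the child u would get under that tentative placement:

```python
# Sketch of the global processor-selection rule described above.

def select_processor(v, processors, schedule_length_with, child_start_time,
                     epst_of, lpst_of, succ):
    csl_of = {p: schedule_length_with(v, p) for p in processors}
    best = min(csl_of.values())
    candidates = [p for p in processors if csl_of[p] == best]
    if len(candidates) == 1 or not succ.get(v):
        return candidates[0]
    # Tie-break: favour the child of v with the smallest lpst - epst difference.
    critical_child = min(succ[v], key=lambda u: lpst_of[u] - epst_of[u])
    return min(candidates, key=lambda p: child_start_time(critical_child, p))
```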
D. The Proposed GDCP Algorithm

The pseudo-code of the proposed algorithm is shown in Algorithm 1. The GDCP algorithm has a time complexity of O(n²), where n is the number of tasks.

Algorithm 1. The pseudo-code for the proposed algorithm.
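Since the pseudo-code itself appears only as a figure, the following Python sketch reconstructs the main loop from the descriptions in Sections III.B and III.C; the callables are placeholders, and this is an interpretation rather than the authors' original listing:

```python
# Sketch of the GDCP main loop, reconstructed from Sections III.B and III.C.

def gdcp_schedule(tasks, pred, processors, recompute_attributes,
                  order_by_priority, select_processor, place):
    unscheduled = set(tasks)
    while unscheduled:
        # Recompute epst/lpst (Eqs. (1)-(3)) for the current partial schedule,
        # which implicitly tracks the dynamic critical path.
        epst_of, lpst_of = recompute_attributes()
        ready = [t for t in unscheduled
                 if all(p not in unscheduled for p in pred.get(t, []))]
        node = order_by_priority(ready, epst_of, lpst_of)[0]              # Section III.B
        processor = select_processor(node, processors, epst_of, lpst_of)  # Section III.C
        place(node, processor)
        unscheduled.remove(node)
```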
E. An Example

Figure 2. (a) A parallel Gaussian elimination task graph; (b) the schedule of the task graph generated by the GDCP algorithm.

As an illustration, consider the task graph shown in Figure 2(a), a macro data-flow graph that represents the parallel Gaussian elimination algorithm written in an SPMD style [4]. The numbers inside the nodes and along the edges represent the computation costs of the nodes and the communication costs between nodes respectively, and the edges on the CP of this graph are drawn with thick arrows.

The schedule generated by the GDCP algorithm is shown in Figure 2(b). The GDCP algorithm schedules the task graph in the order v1, v3, v7, v4, v9, v12, v5, v10, v14, v16, v6, v11, v15, v17, v18, v2, v8, v13, and the schedule length is 450 time units. At the first step, v1 is the only ready node whose epst and lpst are equal, so it is selected and scheduled onto processor PE0, which keeps the current schedule length of the task graph short. After several nodes (v1, v3, v7, v4, v9, v12) have been scheduled, v5 is the only ready node on the CP, and it is scheduled onto processor PE1, where the current schedule length can be reduced. Finally, the remaining nodes v2, v8 and v13, which have lower priorities, are sorted by their epst values and scheduled onto processors in sequence. In this way, the schedule of the task graph is generated by the proposed algorithm.

IV. PERFORMANCE EVALUATION

Random graphs are commonly used to compare scheduling algorithms. We implemented a graph generator based on the method described in [5]. A random DAG is generated as follows: the computation cost of each node is drawn from a uniform distribution with mean 40. Beginning with the first node, a random number indicating the number of children is chosen from a uniform distribution with mean v/10. The communication cost of each edge is also drawn from a uniform distribution, with mean equal to 40 times the specified value of the CCR (communication-to-computation ratio). We generate a batch of random task graphs consisting of subsets in which the number of nodes varies from 20 to 160 in increments of 20, and each subset consists of graphs with different CCRs (0.1, 0.2, 1.0, 5.0 and 10.0). Finally, we use the normalized schedule length (NSL) [5] to compare the scheduling algorithms.
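A rough sketch of such a generator is shown below; the exact generator of [5] may differ, and the distribution bounds are assumptions chosen only so that the stated means hold:

```python
import random

# Rough sketch of the random-DAG generator described above (not the paper's code).

def random_dag(v, ccr, mean_cost=40):
    comp = {i: random.uniform(0, 2 * mean_cost) for i in range(v)}        # mean 40
    comm = {}
    for i in range(v):
        n_children = random.randint(0, max(1, v // 5))                    # mean about v/10
        children = random.sample(range(i + 1, v), min(n_children, v - i - 1))
        for j in children:
            comm[(i, j)] = random.uniform(0, 2 * mean_cost * ccr)         # mean 40 * CCR
        # Edges only go from lower to higher indices, so the graph stays acyclic.
    return comp, comm
```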
Figure 3 shows the results of the different scheduling algorithms in the test environment described above. From Figure 3(a) it can be seen that the larger the value of CCR, the better the result obtained by the proposed algorithm. As can be observed from Figure 3(b), the NSLs of all algorithms increase slightly as the number of nodes grows, but the GDCP algorithm performs better than the others in every case.

Figure 3. (a) The average NSLs of the algorithms for different CCRs; (b) the average NSLs of the algorithms for different numbers of nodes.

V. CONCLUSIONS

After analyzing several typical list scheduling algorithms, this paper presents a global scheduling algorithm based on the dynamic critical path. The algorithm schedules the nodes on the CP preferentially and reduces the schedule length of the task graph as far as possible at each scheduling step, so as to obtain as short a final schedule length as possible. Experimental results show that the proposed algorithm works well on various random graphs and obtains better performance than the other algorithms.

ACKNOWLEDGMENT

I sincerely thank my tutor, Professor Yang, whose help and patience enabled this paper to get off the ground and come to a smooth close. Last but not least, thanks go to my roommates, who have shared my worries, frustrations, and, hopefully, my ultimate happiness in finishing this paper.

REFERENCES

[1] T. Hagras and J. Janecek, "Static vs. Dynamic List-Scheduling Performance Comparison," Acta Polytechnica, vol. 43, January 2003.
[2] J. J. Hwang, Y. C. Chow, F. D. Anger, and C. Y. Lee, "Scheduling Precedence Graphs in Systems with Interprocessor Communication Times," SIAM Journal on Computing, vol. 18, pp. 244-257, June 1989.
[3] M. Y. Wu and D. D. Gajski, "Hypertool: A Programming Aid for Message-passing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, pp. 330-343, July 1990.
[4] W. Shi and W.-M. Zheng, "The Balanced Dynamic Critical Path Scheduling Algorithm of Dependent Task Graphs," Chinese Journal of Computers, vol. 24, pp. 991-997, September 2001.
[5] Y. K. Kwok and I. Ahmad, "Benchmarking and Comparison of the Task Graph Scheduling Algorithms," Journal of Parallel and Distributed Computing, vol. 59, pp. 381-422, December 1999.
[6] J. Barbosa and A. P. Monteiro, "A List Scheduling Algorithm for Scheduling Multi-user Jobs on Clusters," Lecture Notes in Computer Science, vol. 5336, pp. 123-136, December 2008.
[7] I. Ahmad and Y. K. Kwok, "On Exploiting Task Duplication in Parallel Program Scheduling," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 9, pp. 872-892, September 1998.
[8] H. Topcuoglu, S. Hariri, and M. Y. Wu, "Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260-274, March 2002.
[9] Y. K. Kwok and I. Ahmad, "Efficient Scheduling of Arbitrary Task Graphs to Multiprocessors Using a Parallel Genetic Algorithm," Journal of Parallel and Distributed Computing, vol. 47, no. 1, pp. 58-77, November 1997.
[10] M. A. Weiss, Data Structures and Algorithm Analysis in C, Posts & Telecom Press, 2005.

