You are on page 1of 14

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

Decomposition-based Real-Time Scheduling of


Parallel Tasks on Multi-cores Platforms
Xu Jiang, Nan Guan, Xiang Long, and Han Wan

Abstract—Multi-core processors have become mainstream of independent sporadic tasks, inserting artificial release times
computation platforms, not only for general and high- and deadlines for workload released by individual vertices in
performance computers, but also for real-time embedded systems. the DAG. The interdependencies among vertices connected by
To fully utilize the computation power of multi-cores, software
must be parallelized. Recently, there has been a rapidly increasing edges are automatically guaranteed as long as each sporadic
interest in real-time scheduling of parallel real-time tasks, but task always executes between its artificial release time and
the field is still much less mature than traditional real-time deadline. A proper decomposition strategy can potentially
scheduling of sequential tasks. In this paper, we study the smoothen the task’s workload and thus improve the system
real-time scheduling and techniques for parallel real-time tasks schedulability. However, on the other hand, the resulting
based on decomposition, where a task graph is transferred to
a set of independent sporadic tasks. In particular, we propose sporadic tasks’ relative deadlines are much shorter than the
new decomposition strategies that better explore the structure original task, which may hurt the schedulability. One may
feature of each task to improve schedulability. We develop enumerate all possibilities for decomposition and select the
schedulability tests for the global EDF (Earliest Deadline First) best one for each specific task system, but in practice this
scheduling algorithm based on decomposition and three types of is computationally intractable. The challenge is to design
its variants, with their own pros and cons in different aspects.
We conduct experiments to evaluate the real-time performance of efficient decomposition strategies to foster the strengths and
our proposed scheduling algorithms against the state-of-the-art circumvent the weaknesses of decomposition-based schedul-
scheduling and analysis methods of different types. ing, and provide good schedulability in general. The main
Index Terms—Multi-core, Parallel Tasks, DAG, Real-Time technical contribution of this paper can be summarized as:
Scheduling, Global EDF, Decomposition. • We develop new decomposition strategies that balance
the strengths and weaknesses of decomposition-based
scheduling with a simple metric structure characteristic
I. I NTRODUCTION value. This metric can well represent a task’s structure
ulti-core processors have become mainstream com- feature, and provide a clear guidance for task decom-
M putation platforms, not only for general and high-
performance computers, but also for real-time embedded sys-
position to achieve good schedulability. We develop de-
composition strategies that are optimal with respect to
tems. However, using multi-cores is not a free lunch. Software this metric.
must be properly parallelized to fully exploit the computation • Based on the above decomposition strategy, we schedule
capacity of multi-cores. There have be tremendous research the parallel task set with several variants of global EDF
work in real-time scheduling of sequential workload on multi- (Earliest Deadline First) scheduling with their own pros
cores [1], while the work with parallel real-time tasks is much and cons in different aspects: (i) preemptive global EDF,
less. (ii) non-preemptive global EDF, (iii) global EDF with
A common model for parallel tasks is a Directed Acyclic density separation and (iv) global EDF with restricted
Graph (DAG), where edges describe interdependency con- migration, and develop schedulability test for each base
straints among the workload represented by vertices. Ex- on the structure characteristic values.
isting work on global real-time scheduling algorithms for We conduct experiments with randomly generated task sets
DAG task models can be classified into two paradigms: to compare the real-time performance, in terms of acceptance
(i) decomposition-based scheduling [2], [3], [4], (ii) global ratio, of our proposed scheduling algorithms as well as the
scheduling (without decomposition) [5], [6], [7]. Some of them state-of-the-art of, not only decomposition-based scheduling,
shows superiority to others with certain analysis techniques but also global scheduling using other analysis techniques.
and evaluation metrics. However, in general, the potential of Experiments show that our decomposition strategy and con-
each of them has not yet been fully exploited, and none of sequential schedulability analysis techniques can effectively
them can claim itself to be a clear winner. explore the feature of tasks.
This paper focuses on the decomposition-based scheduling
paradigm, in which each DAG task is transferred into a set II. P RELIMINARY
Xu Jiang is with the School of Computer Science and Engineering, A. Task Model
University of Electronic Science and Technology of China and The Hong Consider a task set τ of n periodic tasks {τ1 , τ2 , ..., τn },
Kong Polytechnic University. Nan Guan is with The Hong Kong Polytechnic
University. Xiang Long and Han Wan are with Beihang University. Corre- executed on m identical processors. Each task is represented
sponding author: Nan Guan, nan.guan@polyu.edu.hk. by a DAG, released recurrently with a period Ti . We assume

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 2

of execution time of the corresponding vertex assigned to this


task. The density of πj is defined as:

δ(πj ) = c(πj )/d(πj ).

The maximal density among all tasks is denoted by:

δ> = max {δ(πj )} .


πj ∈π
Fig. 1. A DAG task example.
The demand bound function dbfτi (t) denotes the maximal
all tasks to be implicit-deadline, i.e., the workload released by accumulative workload over a time interval of length t, re-
a task at t must be finished by the next release time t + Ti . leased by the resulting sequential sporadic tasks from task
graph τi , with both release times and absolute deadline in this
Each task τi is modeled as a directed acyclic graph (DAG).
time interval. The load of τi ’s resulting sequential sporadic
Slightly abusing notations, we also use τi to denote the set
tasks and the total load of a task graph τi and of the whole
of vertices in the corresponding DAG. A vertex v ∈ τi has a
decomposed tasks set π are:
worst-case execution time c(v). Edges represent dependencies
among vertices. A directed edge from vertex v to u means 
dbfτi (t)
 X
that u can only be executed after v is finished. In this case, `(τi ) = max , `P = `(τi ). (3)
t>0 t
v is a predecessor of u, while u is a successor of v. The τ ∈τ i

total execution time of all vertices of task τi is denoted by


P
Ci = v∈τi c(v). The utilization Ui of a task τi and the total The above defined density and load are natural generaliza-
utilization UP of the task set τ are defined as follows tions of their correspondence for sequential sporadic tasks,
and existing schedulability test conditions for sequential tasks
Ci X are applicable on the resulting task set after decomposition.
Ui = , UP (τ ) = Ui . (1)
Ti τ ∈τ
These test conditions use δ> and `P as the metrics to decide
i
schedulability (which will be introduced in detail in Section
Li denotes the sum of c(v) of each vertex v on the longest IV), and are independent on the specific task decomposition
chain (also called the critical path) of task τi , i.e., the execution strategy in use. However, different decomposition strategies
time of the task τi to be exclusively executed on infinite may lead to different values of δ> and `P . Therefore, the
number of cores, which can be computed in linear time with target of this paper is to design good decomposition strategies,
respect to the size of DAG [8][9]. The laxity of task τi is to minimize both δ> and `P , and thus increase the chance for
Ti − Li . We use Γi to denote the elasticity of τi , which is a task set to be deemed schedulable.
defined as
Li
Γi = .
Ti III. D ECOMPOSITION S TRATEGY
Fig. 1 shows a DAG task example with 8 vertices. The
utilization of this task is Ci /Ti = 30/25, the laxity is Our decomposition strategy consists of two steps: (i) seg-
Ti − Li = 10 and the elasticity is Li /Ti = 15/25. mentation and (ii) laxity distribution. In the segmentation step,
Apparently if a task set τ is schedulable on m processors we divide the time window between two successive release
the following necessary conditions must hold: of τi (of length Ti ) into several segments, and assign the
X workload of each vertex into these segments (a vertex may
∀τi ∈ τ : Li ≤ Ti , and Ui ≤ m. (2) be split into several parts and assigned to different segments,
τi ∈τ as will be introduced later). In the laxity distribution step, the
laxity Ti − Li of this task is distributed into each segment
(given to vertices in each segment accordingly).
B. Task Decomposition The target of our decomposition strategy is to make the load
The target of decomposition is to assign an artificial release of each segment and maximal density of its contained vertices
time and deadline to each vertex (relative to the release time of to be as uniform as possible, and thus to minimize both the
the task), such that the dependencies among different vertices load and the maximal density among all vertices.
are automatically guaranteed as long as each vertex respects In Section III-A, we introduce the basic concepts of the
its own release time and deadline constraints. segmentation step. Before presenting our segmentation strat-
We use π = {π1 , π2 , · · · } to denote the set of resulting egy, we first introduce our laxity distribution strategy and
sequential sporadic tasks from all task graphs in the system. related properties in Section III-B, after which it will become
Each sequential sporadic task πj corresponds to the part of a clear how the segmentation strategy affects the schedulability
vertex in a task graph, whose period p(πj ) equals the period of of a task set. Then we present our segmentation strategy in
the corresponding task graph, relative deadline d(πj ) equals Section III-C. In Section III-D, the properties of the resulting
the distance between its artificial release time and deadline sequential sporadic task set are analyzed which can be further
(after the decomposition), and WCET c(πj ) equals the amount used to make the schedulability decision.

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 3

[rdy(v), fsh(v)] of a vertex v may cover several segments.


If we find bx ≤ rdy(v) and fsh(v) ≤ bx+1 for a vertex v
where bj and bj+1 are two boundaries of a segment in B,
then we know vertex v only covers one segment. Otherwise,
the lifetime window of v must cover more than one segment.
In this case, v can be split and assigned to several of these
segments. Later in Section III-C we will introduce how to
assign the vertex to these segments, and will see how this
Fig. 2. Timing diagram of the task example in Fig. 1. The earliest ready
times and latest finish times are marked as upper and lower arrows. affects the schedulability of the task set. For this moment,
we just assume that each vertex has been assigned to one
or more of these segments covered by its lifetime window
[rdy(v), fsh(v)].
Suppose a task is divided into X segments, denoted by
s1 , · · · , sX , where the two boundaries of a segment are defined
by two successive elements in B. We use e(sx ) to denote the
length of segment sx (computed by bx+1 − bx ), and c(sx ) to
denote the total amount of workload of all the vertcies assigned
to segment sx , and we know
Fig. 3. Segmentation of the task example in Fig. 1. X X
X X
Li = e(sx ), Ci = c(sx ). (4)
x=1 x=1
A. Segmentation
Finally, we classify all the segments into two types:
To divide a task into segments, we first construct the timing
diagram of task τi , which defines the earliest ready time of Definition 1. sx is a light segment if c(sx )/e(sx ) ≤ Ci /Li ,
each vertex v, denoted by rdy(v), and the latest finish time of and is a heavy segment otherwise. We use Li and Hi to denote
v, denoted by fsh(v), assuming that τi executes exclusively on the set of light and heavy segments of task τi , respectively.
sufficiently many processors and all the workload of τi must
Fig. 3 shows one possible segmentation of the task in Fig.
be finished within Li .
1. The boundaries of all segments are {0, 1, 5, 10, 14, 15}, and
Intuitively, rdy(v) represents the earliest time that vertex v
thus the timing diagram is divided into 5 segments. The length
is ready to be executed, and fsh(v) is the time that v must
e(sx ) of each segment is computed by bx+1 − bx and we have
be finished to guarantee that τi is finished when it has no
ex of all segment: {1, 4, 5, 4, 1}. v2 (the one with c(v2 ) = 3)
laxity, i.e., finished with in Li (note that fsh(v) is the time
covers segments s2 , s3 , s4 and v4 covers s2 , s3 . Suppose we
that v must be finished to guarantee task τi is finished before
split the workload of v2 into two parts 1 and 2 and assign them
Li rather than before its deadline). Assuming τi releases an
to s3 and s4 , and split v4 into two equal parts and assign them
instance at time 0, then for each vertex v, rdy(v) is computed
to s2 and s3 In this case, we know the classification of each
by
( segment as follows (Ci /Li = 30/15 = 2):
0, PRE(v) = ∅
rdy(v) = s1 : c(s1 )/e(s1 ) = 1/1 < 2 ⇒ light
max {c(u) + rdy(u)}, otherwise
u∈PRE(v) s2 : c(s2 )/e(s2 ) = 8/4 = 2 ⇒ light
where PRE(v) is the set of predecessors of v, and fsh(v) is s3 : c(s3 )/e(s3 ) = 10/5 = 2 ⇒ light
(
Li , SUC(v) = ∅ s4 : c(s4 )/e(s4 ) = 10/4 > 2 ⇒ heavy
fsh(v) = min {rdy(u)}, otherwise s5 : c(s5 )/e(s5 ) = 1/1 < 2 ⇒ light
u∈SUC(v)
Note that here we only use this particular example to
where SUC(v) is the set of successors of v. In this way, all the
illustrate the related concepts, and the segmentation strategy
dependency constraints of a task are preserved if each vertex
in our method will be presented later in Section III-C.
executes in lifetime window [rdy(v), fsh(v)]. Fig. 2 shows the
timing diagram of the example task in Fig. 1.
By the definition of rdy(v) and fsh(v), we can see fsh(v) of B. Laxity Distribution
vertex v must be equal to rdy(u) of some other vertex u. Let Next we distribute the total laxity Ti − Li of the task to
B = {b1 , b2 , ..., bk } denote a set consisting of rdy(v) of each each segment. In other words, we will “stretch” the segments
vertex v and fsh(u) of the latest finished vertex u in time order such that the total length of all segments changes from Li to
(if there are more than one vertex whose earliest start times Ti . Intuitively, the purpose of our laxity assignment strategy
are the same, the same time value is only included once). The is to distribute the laxity in a balanced manner to avoid any
timing diagram of a task τi is divided into several segments of the segments to cause an outstanding workload burst.
by the earliest ready time rdy(v) of each vertex v and fsh(u) We use d(sx ) to denote the length of segment sx after laxity
of the last finished vertex u. The elements in B denote the distribution, and thus d(sx ) − e(sx ) is the amount of laxity
boundaries of these segments. In general, the lifetime window given to this segment (recall that e(sx ) is the length of the

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 4

segment before the laxity assignment). We enforce that each Lemma 1. For a task τi we have
vertex (or a part of a vertex in case it is split) in sx must finish
δˆi ≤ max(λ, ρ) · Γi (9)
execution by the end of this segment. Therefore, d(sx ) can be
viewed as the common relative deadline of vertices (or parts `ˆi ≤ max(λ, ρ) · Ui . (10)
of some vertices) in sx . Notice that, d(sx ) − e(sx ) could be
negative after the laxity distribution for some segment sx , in Proof. We first prove (9) by bounding δ(sh ) of an arbitrary
which case the task set is unschedulable. heavy segment sh :
e(sh ) c(sh ) e(sh ) e(sh )
Definition 2. For a segment sx we define δ(sh ) = = · = λUi h .
h
d(s ) h h
d(s ) c(s ) c(s )
c(sx )
`(sx ) = (5) c(sh ) Ci e(sh ) Li
d(sx ) A heavy segment sh satisfies e(sh )
> Li , i.e., c(sh )
< Ci , so
we have
and `ˆi is the maximum `(sx ) among all segments of task τi :
e(sh ) Li Ci Li Li
δ(sh ) = λUi < λUi =λ =λ = λΓi .
`ˆi = max x
{`(s )}. c(sh ) Ci Ti Ci Ti
each segment sx
Combining this with (8) proves (9).
Definition 3. For a segment sx we define
Then we prove (10) by bounding the load of an arbitrary
e(sx ) light segment sl .
δ(sx ) = (6)
d(sx )
c(sl ) e(sl ) c(sl ) c(sl )
and δˆi denotes the maximal δ(sx ) among all segments of task `(sl ) = = · = ρΓi .
d(sl ) d(sl ) e(sl ) e(sl )
τi :
c(sl ) Ci
δˆi = max {δ(sx )}. Since a light segment satisfies e(sl )
≤ Li , we have
each segment sx

After the laxity distribution, each segment (with its con- c(sl ) Li c(sl ) Li Ci Ci
`(sl ) = ρΓi l
=ρ ≤ρ =ρ = ρUi .
tained vertices) has its own release time and relative deadline. e(s ) Ti e(sl ) Ti Li Ti
A DAG task is now transferred to a set of independent sequen- Combining this with (7) proves (10).
tial sporadic tasks. Recall that, our target of decomposition is
to minimize δ> and `P of the resulting sequential sporadic From Lemma 1 we can see that both δˆi and `ˆi depend
on max(λ, ρ). Therefore, the minimization of δˆi and `ˆi is
task set. These two parameters directly depends on the `ˆi and
now unified – we want to minimize max(λ, ρ). The following
δˆi for each task (which will be introduced in detail in Section
lemma gives the relation between λ and ρ:
III-D). Therefore, to improve the schedulability, we should
minimize both `ˆi and δˆi for each task. Lemma 2. Given a decomposed task τi , the following relation
However, in general minimizing `ˆi and δˆi may contradict between λ and ρ holds:
each other. Recall that we classify the segments into two types:
CiH LL
light segments and heavy segments. Intuitively, we should give 1= + i (11)
more laxity to heavy segments since, otherwise they will cause λCi ρLi
a large `(sx ). However, on the other hand, if we give too much where X X
laxity to heavy segments, and thus do not give enough laxity CiH = c(sx ), LL
i = e(sx ).
to light segments, δˆi will become very large, which also hurts sx ∈Hi sx ∈Li
the schedulability. Therefore, the challenge is how to find a Proof. By (7) we know for each heavy segment sh :
systematical strategy to balance `ˆi and δˆi . In the following
we present our laxity distribution strategy which unifies the c(sh )Ti
d(sh ) =
minimization of `ˆi and δˆi . λCi
Our laxity distribution strategy uses the following rules: and by (8) we know for each light segment sl :
h
• Rule 1: for each heavy segment s , we give laxity to it
e(sl )Ti
so that c(sh ) d(sl ) = .
`(sh ) = = λUi , (7) ρLi
d(sh )
Hence we have
• Rule 2: for each light segment sl , we give it laxity to it X X C H Ti LL Ti
so that Ti = d(sh ) + d(sl ) = i + i ,
e(sl ) λCi ρLi
δ(sl ) = = ρΓi , (8) h s ∈Hl s ∈L
d(sl )
from which we get (11) by removing Ti from both sides
where λ and ρ are two positive factors that will be decided
later in this subsection. Now we show that although the above By Lemma 2 we know λ and ρ are inversely proportional to
two rules only control `(sh ) for a heavy segment sh and each other. Therefore, max(λ, ρ) is minimized when λ = ρ,
δ(sl ) for a light segment sl , actually δ(sh ) and `(sl ) can also and according to (11) this results in
be correspondingly bounded, and thereby the maximal ` and CiH LL
maximal δ of all segments are bounded. λ=ρ= + i .
Ci Li

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 5

We define this value as the structure characteristic value of a Algorithm 1 The segmentation algorithm.
decomposed task, denoted by Ωi : 1: for each element v in S do
2: if v only covers a single segment sx , i.e., the lifetime
Definition 4. The structure characteristic value of a decom- window [rdy(v), fsh(v)] falls in the time interval be-
posed task τi is defined as tween two boundary points of segment sx then
CiH LL 3: S x ← S x ∪ {v}, S ← S \ {v}
Ωi = + i . (12) 4: end if
Ci Li
5: end for
Then the bounds (9) and (10) in Lemma 1 are rewritten as
6: for each light segment sx , x ∈ [1, · · · , X] do
x
δˆi ≤ Ωi Γi (13) 7:  u ∈PS that covers s , in their order in S do
for each
c(u)+ v∈S x c(v) Ci
8: if < Li then
`ˆi ≤ Ωi Ui . (14) e(sx )
9: S x ← S x ∪ {v}; S ← S \ {v};
To minimize both of the above upper bounds of `ˆi and δˆi , we 10: else
shall minimize Ωi . We observe that Ωi depends on CiH and LL i 11: split u into u0 and u00 , such that
(while Ci and Li are fixed constants for a given task τi ). Both c(u0 ) + v∈S x c(v)
P
Ci
CiH and LL i are decided in the segmentation step. Recall that in x
= ;
e(s ) Li
Section III-A we left it open about how to do the segmentation,
i.e., how to assign the vertices to different segments. Now it 12: fsh(u00 ) ← fsh(u);
becomes clear that the target of our segmentation strategy is 13: S x ← S x ∪ {u0 };
to minimize Ωi . In the following we will introduce in detail 14: S ← (S \ {u}) ∪ {u00 } // u00 be the head of S
our segmentation strategy that leads to the minimal Ωi . 15: break;
16: end if
17: end for
C. Second Look into Segmentation
18: end for
We use an ordered list S to store the vertices that have not 19: if S is not empty then
been assigned to any segment, and use S x to store the vertices 20: Arbitrarily assign the remaining elements in S to the
that have already been assigned to segment sx . Initially, S heavy segments that they cover, as long as the total
contains all the vertices of τi , and each S x is empty. The work of a vertex assigned to a segment sx does not
elements in S are ordered in the increasing order of their latest exceed e(sx );
finish times (ties are broken arbitrarily). The elements in S will 21: end if
be moved to each S x at the end of the segmentation procedure.
The pseudo-code of our segmentation algorithm is shown in
Algorithm 1, which consists of three phases: We use our running example in Fig. 1 and Fig. 2 to illustrate
(1) Assign each vertex that only covers a single segment. Algorithm 1. In phase 1, v1 is assigned to s1 , v3 to s2 , v5 to
(from line 1 to 5 in Algorithm 1). Each vertex that covers only s3 , v6 and v7 to s4 and v8 to s5 . In phase 2, we visit each
one segment sx is simply moved from S to S x . segment in time order.
(2) Assign the the remaining vertices to light segments as
much as possible, without turning any light segment into • s1 : nothing can be assigned.
heavy. (from line 6 to 18). This phase does assignment to each • s2 : two vertices v2 and v4 cover s2 . v4 is before v2 in S
light segments one by one, in the time order (line 6). For each since fsh(v4 ) < fsh(v2 ). v4 is split into two parts v40 and
light segment sx , we assign vertices that cover sx , according v400 so that (c(v3 ) + c(v40 ))/e(s2 ) = Ci /Li = 2, which
to their order in S. A vertex u can be entirely assigned to implies c(v40 ) = c(v400 ) = 4. Then we assign v40 to s2 and
segment sx if sx still remains light after adding the execution put v400 back to S for future use.
time of u (line 8). In this case, we move u from S to S x (9), • s3 : both v2 and v400 cover s2 , and fsh(v400 ) (which inherits
and go to the next iteration, to assign the next vertex in S that from fsh(v4 )) is earlier than fsh(v2 ). After assigning v400
covers sx . Otherwise, we split u into two parts u0 and u00 such to it, s3 is still light, so we continue to assign v2 to s3 .
that after adding u0 to sx , c(sx ) exactly reaches the threshold We split v2 into v20 and v200 , with c(v20 ) = 1 and c(v200 ) = 2,
between light and heavy segments, Ci /Li . u00 is put back to such that (c(v5 ) + c(v400 ) + c(v20 ))/e(s3 ) = Ci /Li = 2,
S for future assignment. and put v200 back to S.
(3) Assign the remaining vertices, if any, to the heavy • s4 : this segment, although light, already meets the thresh-
segments arbitrarily. This phase is from line 19 to 21. After old, i.e., (c(v6 ) + c(v7 ))/e(s4 ) = Ci /Li = 2, therefore
Phase 2, the remaining elements in S can only be assigned no assignment in this phase.
to heavy segments (including those segments that just reach • s5 : nothing can be assigned.
the threshold in Phase 2). Since Ωi only depends on the total After Phase 2, v200 is the only remaining element in S, and
execution time in all heavy segments (i.e., CiH ), we can assign we can arbitrarily assign it to any segment covered by it.
a remaining element in S to its covered segments arbitrarily. We choose to assign it to s4 , and the final result of the
The only constraint is that the part of a vertex assigned to a segmentation is shown in Fig. 3.
segment sx is at most e(sx ). Next we show that Algorithm 1 is optimal in the sense of

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 6

resulting in the minimal Ωi . Before that we first introduce a


general real-time scheduling problem and prove the optimality
of EDF with it.

General Real-Time Scheduling Problem: Assume that a


set of real-time jobs are executed on a single-processor. For
each job, its release time and deadline are known, and it can
only execute during this time interval. In general, each job may Fig. 4. Laxity distribution of the decomposed task.
have some unfinished workload by its deadline. The problem
is, how to schedule these jobs so that the total amount of
unfinished workload of all jobs does not exceed 4? The proof of the theorem goes into two steps. (i) First we
prove that Ωi is minimized iff Cout is minimized. (ii) Second
Note that the standard real-time scheduling problem is a we prove that our segmentation algorithm leads to the minimal
special case of the above defined problem with 4 = 0, for Cout .
which EDF is known to be optimal. In the following we prove By the definition of Ωi we have:
EDF is still optimal for the general problem. (Li − LL Ci
CiH LL i ) Li + Cout LL Cout
Ωi = + i = + i .=1+
Lemma 3. EDF scheduling is optimal with this general real- Ci Li Ci Li Ci
time scheduling problem. Therefore, the first step is proved. The remaining of the proof
Proof. This can be proved using the “time slice swapping” is dedicated to the second step.
technique. We say a schedule is valid if each job executes In our segmentation problem, the time line is divided
between its release time and deadline, and the total amount into segments by the earliest ready time and latest finish
of unfinished workload of all jobs is at most 4. Then the time of each vertex. At the beginning of Algorithm 1, each
following procedure is essentially the same as in the standard segment is empty. For each segment s, it can accommodate at
real-time scheduling problem [10]. The only difference is that most e(s) × C Li workload without being turned into a heavy
i

in the standard problem the swapping of execution of a job segment. Therefore, we can normalize c(v) of each vertex
Li
to an earlier point won’t cause it to miss deadline, while in v by c0 (v) = c(v) × C i
, such that the assignment of each
our general problem such a swapping won’t increase the total vertex can be seen as scheduling a job on a unit-speed single-
amount of unfinished work. processor, whose worst-case execution time is c0 (v), release
We will prove that any valid schedule for this job set can time is the earliest ready time rdy(v) of v and deadline is the
be transformed into a valid EDF schedule. This will be shown latest finish time fsh(v) of v. The execution of this job for t
by induction on t: the transformation is possible for any time time units in the range of a segment corresponds to assigning
interval [0, t). This is trivially true for t = 0. Now assume it t× C Li workload of v to this segment. The job can only
i

is true for time interval [0, t) that a job with deadline dj is execute between its release time and deadline, since a vertex
executed in the time interval [t, t + 1), and that the earliest v can only be assigned to segments between its earliest ready
deadline among all jobs pending at t1 is di < dj . Let t0 be the time rdy(v) and latest finish time fsh(v). In this problem, we
first time at which the job with deadline di is executed after t. want to minimize Cout and thus Ωi , so we should make the
We know that t < t0 and t0 < di < dj . Therefore, by swapping total amount of leftover workload of all the vertices as small
the executions in [t, t + 1) and [t0 , t0 + 1) no deadline violation as possible before starting assignment to heavy segments.
occurs in [t, t + 1), and thus no extra unfinished workload is If we interpret this in the form of scheduling the vertices’
introduced in [t, t + 1). By the inductive hypothesis we know corresponding jobs, our segmentation problem is equivalent to
the total amount of unfinished workload is no more than 4 by the general real-time scheduling problem. In the first two steps
time t, so in summary the total amount of unfinished workload of Algorithm 1, the assignment to each segment is prioritized
is no more than than 4 by time t + 1. among vertices in the way that it always selects the vertex
By now we have proved EDF is optimal for general real- v with the earliest fsh(v) to be assigned (or one of them in
time scheduling problem. case there are several). Therefore, the corresponding jobs are
scheduled by EDF. By Lemma 3, the theorem is proved.
Theorem 1. The segmentation algorithm in Algorithm 1 is
optimal in the sense of resulting in the minimal Ωi .
Fig. 4 shows the results after the segmentation and laxity
Proof. After the segmentation the total amount of workload distribution steps in the above subsections, in which both v2
assigned to a heavy segment sh is c(sh ). We define and v4 are split into two parts. The deadline of each part
cout (sh ) = c(sh ) − Ci /Li × e(sh ). (belongs to a vertex) is marked by down-arrow. The dashed
box denotes the part that originally belongs to v2 ’s lifetime
Recall that a segment is heavy if c(sh )/e(sh ) > Ci /Li , so window before being split.
cout (sh ) is the part of “overflowed” workload in sh that makes
c(sh )/e(sh ) above Ci /Li . Then we define D. Resulting Sequential Sporadic Task Set
Using the segmentation and laxity distribution strategy
X
Cout = cout (sh ).
sh ∈H
introduced above, a task graph is transformed into a set

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 7

of sequential sporadic tasks. The dependencies among these 1) Schedulability Test: After the decomposition, a task
resulting tasks are automatically guaranteed as long as each of graph is transformed into a set of sequential sporadic tasks
them executes between its artificial release time and absolute where the interdependencies are automatically guaranteed as
deadline. long as each subtask executes with respect to their time
Since each resulting sequential task’s WCET is not greater constraints. Thus, the following condition can be applied to
than e(sx ), and its relative deadline equals d(sx ) of the check the schedulability of the task set [12], [4]:
corresponding segment sx , of the corresponding segment sx ,
Theorem 2. A decomposed task set τ is schedulable by GEDF
and by (13) we know:
on m processors if it satisfies:
n o `P ≤ m − (m − 1)δ> . (20)
δ> = max {δ(πj )} ≤ max δˆi = max {Ωi Γi } . (15)
πj ∈π τi ∈τ τi ∈τ
The schedulability test in Theorem 2 is the base of previous
Due to the relative release offsets of resulting sequential work on decomposition-based global EDF scheduling of par-
tasks of a task graph τi , we should not simply sum up of allel tasks [12], [4]. Based on this test, the following theorem
demand bound functions of each individual resulting sequen- shows that the structure characteristic values can be directly
tial task when computing dbfτi (t). Instead, the load of a task used to test the schedulability of the task set. Note that `P
graph after the decomposition is actually bounded by `ˆi (as in the original test for traditional sequential tasks is computed
discussed in [2]), and by by (14) we have: by summing up the density of each sequential task, while here
we use (16) to bound `P instead.
X X X
`P = `(τi ) ≤ `ˆi = Ω i Ui . (16) Theorem 3. Task set π is schedulable under GEDF if 1/Ω> −
τi ∈τ τi ∈τ τi ∈τ Γ> > 0 and:
UP − Γ>
Lemma 4. For the resulting sequential sporadic task set π by m≥ (21)
1/Ω> − Γ>
decomposing the original task graph set τ , we have:
where UP , Γ> and Ω> are defined in (19).
`P ≤ Ω> UP (17)
Proof. Proved by combining (17), (18) and (20).
δ > ≤ Ω > Γ> (18)
2) Capacity Augmentation Bound: By now, all task param-
where eters are defined on the basis of unit-speed processors. If we
X scale up/down the processor speed to s, then the WCET of
UP = Ui , Γ> = max{Γi }, Ω> = max{Ωi }. (19) each task will be multiplied by 1/s.
τi ∈τ τi ∈τ
τi ∈τ
Definition 5. [13] If on unit-speed processors, a task set
Proof. Directly follows (15) and (16). has total utilization of at most m and the critical path length
of each task is smaller than its deadline, then a scheduler S
provides a capacity augmentation bound of s if it can schedule
IV. S CHEDULING AND A NALYSIS
this task set on m processors of speed s.
In this section, we will introduce how to analyze the We first generalize the test condition in Theorem 2 to s-
schedulability of the resulted independent sporadic task set speed processors: a task set is schedulable if:
under different scheduling algorithms. Section IV-A studies
preemptive global EDF scheduling (GEDF) and its analysis. `sP ≤ m − (m − 1)δ>
s
(22)
Section IV-B, IV-C and IV-D discuss three variants of global
EDF. At last, Section IV-E shows whether different parts of where `sP = `P /s and δ>
s
= δ> /s.
a vertex assigned to different segments can be scheduled as Theorem 4. Our proposed decomposition-based GEDF
an entire entity at runtime in different algorithms, without scheduling algorithm has a capacity augmentation bound of
violating the desired workload and density bounds. Note that 1
(2 − m )Ω> .
in this section, the preemption and migration overheads are
not included in the theoretical results. In Section V-B, we Proof. Assume τ is a task set with UP ≤ m and Γ> ≤ 1.
1
will introduce how to include the preemption and migration Consider an s-speed processor, where s = (2 − m )Ω> . By
overheads into the schedulability test conditions and show the reusing (17) and (18) in the proof of Theorem 3 we have
corresponding evaluation results. Ω> Γ> Ω> Ω> UP Ω> m
s
δ> ≤ ≤ and `sP ≤ ≤ .
s s s s
A. Global EDF Scheduling (GEDF) Combining them gives

In GEDF [11], jobs are prioritized by their absolute dead- Ω> m Ω> (2m − 1)Ω>
`sP + (m − 1)δ>
s
≤ + (m − 1) = =m
lines, and a higher-priority job preempts a lower-priority job. s s s
A job may start execution on any available processor, and a which meets the condition (22) and thus the task set is
preempted job may resume on any available processor. schedulable on s-speed processors.

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 8

Now we take a closer look into the capacity augmentation where ρ = E Dmin , Emax and Dmin are the maximal execu-
max

1
bound (2 − m )Ω> . Recall the definition of the structure tion time and minimal relative deadline among all resulting
characteristic value of a task τi : sporadic tasks, respectively.
CH LL Proof. (24) can be obtained by combining (17), (18) and (23).
Ωi = i + i
Ci Li
where CiH and LL i are the results of the segmentation step.
However, as stated in Theorem 1, our segmentation algorithm C. Global EDF with Density Separation (GEDF-DS)
yields the optimal (minimal) Ωi , and thus the Ωi used in the It is well-known that global scheduling (with sequential
capacity augmentation bound of our scheduling algorithm is tasks) suffers the so-called Dhall’s effect [16], namely some
the minimum among all the possibilities. This minimal Ωi is task set with total load arbitrarily close to 1 is infeasible no
indeed an inherent feature of the task itself: the minimal Ωi matter how many processors are added to the system. Dhall’s
of a task τi is only determined by the task parameters and effect occurs when there are tasks with very large utilization
its graph structure (although a polynomial-time algorithm is (close to 1). To conquer Dhall’s effect, an extended scheduling
needed to actually compute it). strategy called density separation (DS) was proposed [17],
CiH [18], which gives the highest priority to tasks with density
By the definition of CiH and LL i , we know Ci ≤ 1 and
LL CH LL larger than a particular threshold.
i
Li≤ 1, and Cii and Lii cannot equal to 1 at the same time. Using GEDF to schedule the resulting sequential sporadic
Therefore, Ω> must be in the range of [1, 2), and the capacity task set after decomposition of a DAG task system also suffers
1 2
augmentation bound is in the range of [2 − m ,4 − m ). the Dhall’s effect. Therefore, we also apply the DS strategy to
B. Non-preemptive Global EDF (GEDF-NP) the resulting sequential sporadic task set after decomposition.
More precisely, we schedule the resulting sequential sporadic
This section considers non-preemptive global EDF schedul- task set by GEDF-DS, which works as follows:
ing (GEDF-NP), in which each job must execute until com-
• The scheduler maintains two queues Qh and Qn at
pletion as soon as it starts execution. Although in general
runtime. Qh stores at most m−1 active jobs of sequential
non-preemptive blocking is harmful to schedulability of real-
tasks with utilization strictly greater than 1/2 (called
time tasks, non-preemptive scheduling may sometimes be
heavy tasks). Other active jobs are stored in Qh in the
preferable to preemptive scheduling for at least two reasons:
order of their absolute deadlines.
(1) simplicity and (2) lower and more predictable context
• When a sequential task πi releases a job :
switch overheads [14].
– If the utilization of πi is greater than 1/2, and Qh
In the following we present a schedulability test for our
is not full yet (it has stored less than m − 1 active
decomposition-based GEDF-NP scheduling algorithm. The
jobs), the job is put into Qh .
test is built upon the known schedulability test condition for
– If the utilization of πi is no greater than 1/2, or
sporadic tasks under GEDF-NP [15]:
Qh is already full (it has stored m − 1 active jobs),
Theorem 5. [15] A constrained deadline periodic task set the job is inserted into Qn according to the absolute
τ with total density `P , maximum density δ> , and a non- deadline order.
preemption overhead ρ is schedulable under GEDF-NP if it • At run time, jobs in Qh have higher priority for execution
satisfies: than those in Qn , i.e., all the jobs in Qh and the first
`P ≤ m(1 − ρ) − (m − 1)δ> (23) m−mh jobs in Qn are executed, where mh is the number
where ρ is ratio between the maximum execution time and the of jobs in Qh at the current time.
minimum deadline among all tasks. • A job is removed from the corresponding queue upon
completion.
There is another test (Theorem 1 in [15]) where `P is Note the GEDF-DS scheduling algorithm defined above is
computed by summing up the density of each individual task slightly different from the original DS scheduling strategy
and this test is tighter than (23). However, we should not use [17], [18], where the “heavy” tasks are statically assigned the
that test directly, because we use (16) to bound `P rather than highest priority. Since the resulting sequential sporadic tasks
summing up the total density of each individual task for the of the same task graph have relative offsets, and thus cannot
resulting sequential task set (similar to the case of GEDF as be active at the same time, we only need to guarantee that
we discussed in Section 3.4). The only difference between (23) the number of “heavy” tasks that are active at the same time
and the test condition (20) for preemptive GEDF is that m in is bounded, instead of bounding the total number of “heavy”
the right-hand side is replaced by m(1 − ρ). Now we present tasks statically.
a schedulability test condition for GEDF-NP for the resulting
sequential tasks after decomposition: Theorem 7. The resulting sporadic task set by decomposing
the parallel task set τ is schedulable under GEDF-DS if it
Theorem 6. The resulting sporadic task set by decomposing satisfies:
the parallel task set τ is schedulable under GEDF-NP if (1 −  UP −Γ>
ρ)/Ω> − Γ> > 0 and it satisfies:  m ≥ Ω1 −Γ> ,
 δ> ≤ 12
>
UP − Γ> (25)
m≥ (24) 

UP ≤ m+1 , 1
< δ ≤ 1.
(1 − ρ)/Ω> − Γ> 2Ω> 2 >

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 9

When δ> > 1/2, from (18) we have Ω> Γ> > 1/2, thus:
Proof. Clearly, the task set is unscheduled if δ> > 1. The m − (m − 1)Ω> Γ> m+1
proof for the case δ> ≤ 12 is exactly the same as Theorem 3, < .
Ω> 2Ω>
and thus omitted here.
When 12 < δ> ≤ 1, let ki be the maximal number of Therefore, we can conclude that the test condition (25) in
sequential tasks with density greaterPthan 1/2 among all Theorem 7 strictly dominates condition (21) in Theorem 3.
segments of task τi , and let k = τi ∈τ ki . Suppose the
following condition holds:
D. Global EDF with Restricted Migration (GEDF-R)
m+1
UP ≤ , A weakness of GEDF is the high and unpredictable over-
2Ω>
head incurred by job migrations among different processors.
then by (14) we have:
X To solve this problem, [19] proposed a variant of GEDF with
`ˆi ≤ (m + 1)/2 (26) restricted migration, called GEDF-R, which only allows tasks
τi ∈τ to migrate among processors at job boundaries. The major
and since each heavy task’s utilization is strictly greater than principles of GEDF-R are as follows 1 :
1/2, we know k < m + 1, i.e., k ≤ m. Therefore, we only • Each processor has a local job queue, which stores active
need to consider the two case: jobs assigned to this processor.
Case 1: k ≤ m−1. The number of all sequential tasks with • Jobs on the same processor are scheduled by preemptive
density greater than 1/2 is at most m − 1. EDF scheduling algorithm.
As the highest priorities are assigned to jobs (of heavy • When a job of task πi is released, it is assigned to a
tasks) in Qh , they are executed on at most k processors processor Pz that satisfies:
without any interference. In the following we will prove that X
the remaining light tasks are also schedulable on the remaining δ(πi ) + δ(πj ) ≤ 1 (27)
m−k processors by GEDF. If this is true, these light tasks are πj ∈Pz
also schedulable if the remaining capacity of the k processors
(excluding the capacity occupied by the k heavy tasks) is where πj ∈ Pz denotes the current active jobs of πj
added in, and thus the entire task set is schedulable on m assigned to the job queue of Pz . We call the condition
processors by GEDF-DS. above the partitioning test. If several processors satisfy
Let `h denote the total density of these heavy tasks and δl the partitioning test, any of them can be chosen to
denote the maximum density among the rest tasks, then we accommodate this newly released job.
know `h > k/2 and δl < 1/2. By (26) and `h > k/2, the • When a job finishes execution on Pz , it is removed from
total density of the light tasks is bounded by: the job queue of Pz .
X It was shown in [19] that the schedulability test for fully-
`ˆ − `h ≤ (m + 1)/2 − k/2.
migrative GEDF in (20) also works for EDF-R. The following
Let `lP denote the total load of the light tasks, since δl < 1/2 Lemma shows that EDF-R also has the same schedulability
test as GEDF for the resulting sequential sporadic task set.
and `lP ≤ `ˆ − `h , we have:
P

Theorem 8. The resulting sporadic task set by decomposing


`lP ≤ (m − k) − (m − k − 1)δl . the parallel task set τ is schedulable under GEDF-R if 1/Ω> −
By Theorem 2, these light tasks are schedulable on the m − k Γ> > 0 and it satisfies:
processors. UP − Γ>
Case 2: k = m. The m − 1 active jobs in Qh are executed m≥ . (28)
1/Ω> − Γ>
on at most m − 1 processors. In this case, the schedulability
decision problem of the remaining tasks is reduced to a single Proof. The proof is by contradiction. Suppose that Algorithm
processor scheduling analysis problem. A sequential sporadic GEDF-R fails to assign a job with density δi to any processor.
task set is schedulable
P on a single processor if its total load is It must be the case that each of the m processors fails
bounded by 1, i.e., `ˆ − `h ≤ 1. Similar to the above case, the partitioning test; i.e., jobs previously assigned to each
by (26) and `h > k/2 we have processor have their densities more than 1 − δi . We use ∆q to
X denote the total density of jobs assigned to processor Pq , so
`ˆ− `h ≤ (m + 1)/2 − k/2 = (m + 1)/2 − (m − 1)/2 = 1.
m m
In summary, we have proved in both cases the task set is X X
∆q > m × (1 − δi ) ⇒ ∆q + δi > m − (m − 1)δi .
schedulable. q=1 q=1
From (21), the resulting sequential sporadic task set is
1 [19] assumes tasks to have implicit deadlines (d(π ) = p(π )), which
schedulable by GEDF if: i i
can be easily extended to the case of constraint deadlines (d(πi ) ≤ p(πi ))
m − (m − 1)Ω> Γ> by replacing utilization by density in the assignment conditions and the
UP ≤ . schedulability test.
Ω>

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 10

E. Vertex Reassembling
The decomposition procedure in Section III may split a
vertex into several parts and each of them is assigned to
a different segment. However, when scheduled by GEDF,
we do not need to schedule them individually. Instead, we
can treat the entire vertex as a scheduling entity. Recall that
initially the segments are divided according to the earliest
Fig. 5. Reassembling of the decomposed task.
ready time rdy(v) of vertices. As the segments are “stretched”
in the laxity distribution, we will adjust these time constraints
Since at any time only jobs in the same segment of a task can accordingly and use them as the vertex release time and
be active, so the total density of all active jobs of all tasks is deadlines.
bounded by `P , i.e., Fig. 5 shows that different parts of a vertex in Fig. 4 are
reassembled with their release times and deadlines (which have
m
X been adjusted when the segment is “stretched” in the laxity
`P ≥ ∆q + δi . (29) distribution step).
q=1 After vertex reassembling, load and maximal density bounds
of segments are still applicable to the entire decomposed task:
In summary we have:
Lemma 5. For a decomposed task τi (after vertex reassem-
`P > m − (m − 1)δ> bling) we have `(τi ) ≤ `ˆi , and ∀v ∈ τi : δ(v) ≤ δˆi .

which contradicts with (28). The proof of Lemma 5 follows the same idea as Theorem
1 in [2], and thus omitted here.
Note that the above proof is essentially the same as the For GEDF, the schedulability test in Theorem 3 only de-
original proof for sequential sporadic tasks in [19]. The only pends on the total work load and the maximum density, i.e.,
difference here is that we need to argue (29) holds as the jobs `P and δ> , and after vertex ressembling, the upper bound of
in different segments of a task cannot be active at the same the two parameters used in their schedulability test conditions
time. are not changed, so vertex reassembling is applicable to GEDF
Similar to GEDF-DS, we can apply the density separation according to Lemma 5.
strategy to GEDF-R, resulting in the scheduling algorithm For GEDF-NP, the schedulability test in Theorem 6 also
GEDF-R-DS, which works as follows. Similar to GEDF-R, depends on `P and δ> , so Lemma 5 also implies vertex
each processor Pq maintains a local queue Qq to store active reassembling is applicable to GEDF-NP. However, after the
jobs assigned to this processor. Jobs in Qq are prioritized by vertex reassembling, Emax and Dmin may be different. There-
EDF. Among the m processors, m − 1 processors can be fore, schedulability test in Theorem 6 should be revised by
assigned at most one job of a heavy task that has higher letting Emax and Dmin denote the maximal execution time
priority than those in the corresponding Qq (we call these and minimal relative deadline among all sporadic tasks after
m − 1 processors the heavy processors). At runtime, when vertex reassembling, respectively.
a heavy task is released, we first try to assign it to a heavy The reassembling behavior will force different sequential
processor that has not been assigned a heavy task. If failed, it tasks that belong to a same vertex to be assigned to one
is assigned to the remaining one processor as a normal task. processor under GEDF-R and affect the on-line workload
Similar to GEDF-R, the assignment of a normal task to Qq of partitioning behavior. As shown in Fig. 6, after decomposition,
aPprocessor Pq is guarded by condition (27), where the term v1 is split into two sequential tasks with density 1/2 and 1
πj ∈Pz δ(πj ) also includes the highest-priority heavy task’s
respectively. We can find that all subtasks could be assigned
density. to a single processor as the total density of the sequential tasks
that are ready to be executed at the same time is at most 1.
Theorem 9. The resulting sporadic task set by decomposing However, if v1 is reassembled and its density becomes 3/4,
the parallel task set τ is schedulable under GEDF-R-DS if it v1 and v2 could not be assigned to the same processor as the
satisfies: total density of all ready sequential tasks is 1/2 + 3/4 > 1.
 UP −Γ> Therefore, vertex reassembling is not applicable to GEDF-R.
 m ≥ 1 −Γ , ,
 δ> ≤ 12 For GEDF-DS, by which the priority of each sequential
Ω> >

task with density greater than 1/2 is ceiled, if verticies are re-
m+1 1

UP ≤ 2Ω> , < δ> ≤ 1. assembled, the priority of some light sequential tasks (density

2
no greater than 1/2 but the density of its corresponding vertex
The proof of Theorem 9 follows the same idea of the proofs is greater than 1/2 after reassembling) will be mis-ceiled, thus
of Theorem 8 and Theorem 7, and thus omitted here. affecting the schedulablity. It could also be explained by the
Although the test condition of GEDF-R-DS is the same as same example shown in figure 6. If v1 is ceiled, v2 will not be
GEDF-DS, the key difference is that a job may migrate from executed before v1 is finished. Therefore, vertex reassembling
one processor to another when scheduled by GEDF-DS. is not applicable to GEDF-DS. For the same reason as GEDF-

0278-0070 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2019.2937820, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 11

tasks that belong to a same vertex to be assigned to one processor under GEDF-R and affect the on-line workload partitioning behavior. As shown in Fig. 6, after decomposition, v1 is split into two sequential tasks with density 1/2 and 1, respectively. We can find that all subtasks could be assigned to a single processor, as the total density of the sequential tasks that are ready to be executed at the same time is at most 1. However, if v1 is reassembled and its density becomes 3/4, v1 and v2 could not be assigned to the same processor, as the total density of all ready sequential tasks is 1/2 + 3/4 > 1. Therefore, vertex reassembling is not applicable to GEDF-R.

Fig. 6. Reassembling example.

For GEDF-DS, by which the priority of each sequential task with density greater than 1/2 is ceiled, if vertices are reassembled, the priority of some light sequential tasks (density no greater than 1/2, but the density of the corresponding vertex is greater than 1/2 after reassembling) will be mis-ceiled, thus affecting the schedulability. This can also be explained by the same example shown in Fig. 6: if v1 is ceiled, v2 will not be executed before v1 is finished. Therefore, vertex reassembling is not applicable to GEDF-DS. For the same reason as GEDF-R and GEDF-DS, vertex reassembling is not applicable to GEDF-R-DS either.

TABLE I
APPLICABILITY OF VERTEX REASSEMBLING TO EACH SCHEDULING ALGORITHM.

  GEDF   GEDF-NP   GEDF-DS   GEDF-R   GEDF-R-DS
  YES    YES       NO        NO       NO

Table I summarizes whether vertex reassembling can be applied to each scheduling algorithm. Although we can reduce the migration overhead by GEDF-R/GEDF-R-DS and ease Dhall's effect by GEDF-DS, they will suffer extra overhead as they split a vertex into different sequential tasks. We will evaluate this trade-off in Section V-B.

V. EVALUATIONS

In this section, we evaluate our proposed decomposition-based scheduling algorithms by simulation experiments. We use D-XU, D-XU-DS, D-XU-R, D-XU-RDS and D-XU-NP to denote the schedulability tests we derive in Section IV after applying our decomposition strategy to GEDF, GEDF-DS, GEDF-R, GEDF-R-DS and GEDF-NP, respectively. We will compare the acceptance ratio of our proposed algorithms and schedulability tests with the state of the art of two paradigms:
• Decomposition-based scheduling: We compare with the GEDF-based scheduling algorithm and analysis techniques developed in [2], denoted by D-SAI.
• Global scheduling (without decomposition): We compare with two types of analysis methods in this category:
  – The schedulability test based on capacity augmentation bounds in [20], denoted by G-LI.
  – The schedulability test based on response time analysis in [7], denoted by G-MEL. Note that G-MEL was developed for a more general conditional DAG task model. However, it can be directly applied to the DAG model of this paper, which is a special case of that in [7].
Other global methods are not included because they are either theoretically dominated or shown to be significantly outperformed (in empirical evaluations) by one of the above listed methods (details can be found in [5][13][21][4][7]).

Furthermore, in Section V-B we investigate the schedulability of global EDF and its variations, i.e., D-XU-DS, D-XU-R, D-XU-RDS and D-XU-NP, while considering the overhead introduced by context switches and migrations. The schedulability test conditions of D-XU-DS and D-XU-RDS are exactly the same, and they dominate the tests of D-XU and D-XU-R, respectively. However, vertex reassembling is not applicable to D-XU-DS and D-XU-R/D-XU-RDS, where extra overhead will be introduced compared with D-XU.

Task sets are generated using the Erdös-Rényi method G(ni, p) [22]. For each task, the number of vertices is randomly chosen in the range [50, 250] and the worst-case execution time of each vertex is randomly picked in the range [50, 100]. For each possible edge we generate a random value in the range [0, 1] and add the edge to the graph only if the generated value is less than a predefined threshold p. In general, the critical path of a DAG generated using the Erdös-Rényi method becomes longer as p increases, which makes the task more sequential. We use a method similar to that of [2] to generate the period Ti (and we set Di = Ti):

    Ti = (Li + Ci/(3U)) × (1 + 0.25 × Gamma(2, 1))    (30)

where U is the normalized utilization of the task set and Gamma(2, 1) is a random value drawn from a gamma distribution. We make this slight modification because in this way we can (i) obtain a valid period and (ii) generate a reasonable number of tasks when the utilization changes.
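As an illustration of the generation procedure just described, the following sketch builds one DAG task with the Erdös-Rényi method and assigns its period according to Eq. (30). The representation (vertex WCET list plus edge list) and all function names are our own assumptions, not part of the original tool chain.

    import random

    def generate_dag_task(p, U):
        """Generate one DAG task following the procedure described above.

        p -- edge threshold of the Erdos-Renyi graph G(n_i, p)
        U -- normalized utilization of the task set, used in Eq. (30)
        """
        n = random.randint(50, 250)                          # number of vertices
        wcet = [random.randint(50, 100) for _ in range(n)]   # WCET of each vertex
        # Add edge (i, j), i < j, only if a random value in [0, 1] is below p.
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if random.random() < p]

        C = sum(wcet)                                        # total workload Ci
        L = critical_path_length(wcet, edges)                # critical path length Li
        # Eq. (30): Ti = (Li + Ci/(3U)) * (1 + 0.25 * Gamma(2, 1)), with Di = Ti.
        T = (L + C / (3.0 * U)) * (1 + 0.25 * random.gammavariate(2, 1))
        return wcet, edges, C, L, T

    def critical_path_length(wcet, edges):
        """Length of the heaviest path; vertices are already in topological order."""
        finish = list(wcet)                                  # heaviest finish time per vertex
        for i, j in sorted(edges):                           # every edge goes from i to j > i
            finish[j] = max(finish[j], finish[i] + wcet[j])
        return max(finish) if finish else 0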


A. Schedulability Comparison without Overhead

We compare the acceptance ratio of each method. The acceptance ratio is the ratio between the number of task sets deemed to be schedulable by a method and the total number of task sets that participate in the experiment (with a specific parameter configuration). Note that, without considering context switch and migration overheads, the schedulability test of D-XU is the same as that of D-XU-R, and that of D-XU-DS is the same as that of D-XU-RDS.

Fig. 7.(a) shows the acceptance ratios of different methods under different normalized utilizations (UP/m) of the task sets, with m = 8 and p = 0.1. The experiment results show that our methods clearly outperform the other approaches. In Fig. 7.(b) and (c), the normalized utilization of each task set is randomly chosen from [0.2, 0.6]. Fig. 7.(b) shows acceptance ratios with different numbers of processors, with fixed task parallelism p = 0.1. It can be observed that the schedulability of our methods decreases slightly as the number of processors increases. This is because the pessimism of global scheduling becomes more serious when the task set consists of more tasks (one "bad" task may hurt the schedulability of the entire task set). Fig. 7.(c) shows acceptance ratios under different p with m = 8. We can observe that for all global and decomposition methods, the schedulability is better for tasks with higher parallelism. This is because, for a task with a fixed total amount of workload, a more parallel structure in general leads to a shorter critical path, and thus more laxity, which is beneficial to schedulability. The trend of the global scheduling methods is similar.

Fig. 7. Comparison of the schedulability of task sets with low elasticity. (a) Comparison with different normalized total utilization. (b) Comparison with different number of processors. (c) Comparison with different p. (Each panel plots the acceptance ratio (%) for D-SAI, G-MEL, G-LI, D-XU/D-XU-R, D-XU-DS/D-XU-RDS and D-XU-NP.)

Note that in Fig. 7, the curves of D-XU/D-XU-R and D-XU-DS/D-XU-RDS overlap with each other. This is because the schedulability tests of D-XU/D-XU-R and of D-XU-DS/D-XU-RDS are the same under the condition δ> ≤ 1/2, and in the experiments of Fig. 7 the elasticity of tasks is small and almost all the generated task sets satisfy this condition.
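The acceptance ratios reported in this subsection are obtained by a straightforward counting experiment. A minimal sketch is shown below; the schedulability test functions and the task-set generator are placeholders standing in for the tests and the generation procedure described above.

    def acceptance_ratios(tests, make_task_set, utilizations, n_sets=1000):
        """For each normalized utilization value, generate n_sets random task sets
        and report, for every schedulability test, the fraction accepted."""
        results = {name: [] for name in tests}
        for u in utilizations:
            accepted = {name: 0 for name in tests}
            for _ in range(n_sets):
                task_set = make_task_set(u)
                for name, test in tests.items():
                    if test(task_set):
                        accepted[name] += 1
            for name in tests:
                results[name].append(accepted[name] / n_sets)
        return results

    # Usage with placeholder names:
    # ratios = acceptance_ratios({"D-XU": d_xu_test, "D-XU-DS": d_xu_ds_test},
    #                            make_task_set=lambda u: generate_task_set(u, m=8, p=0.1),
    #                            utilizations=[x / 10 for x in range(1, 11)])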

Fig. 8. Comparison with high elasticity. (Acceptance ratio (%) against normalized utilization for D-SAI, G-MEL, G-LI, D-XU/D-XU-R, D-XU-DS/D-XU-RDS and D-XU-NP.)

In Fig. 8, we investigate the schedulability of task sets with relatively high elasticity. Global EDF scheduling suffers from Dhall's effect when the elasticity of each task is large. We change the constant in (30) to a very large number, 60, i.e., change the period generation formula to (Li + Ci/(60U)) × (1 + 0.25 × Gamma(2, 1)), in order to generate tasks with high elasticity. Fig. 8 compares the acceptance ratios of different methods under different normalized utilizations when m = 8 and p = 0.1. Compared with Fig. 7.(a), the schedulability of all methods (especially GEDF) decreases significantly. However, D-XU-DS can ease Dhall's effect to some extent. As shown in Fig. 8, when the normalized utilization is less than 0.4, the schedulability of D-XU-DS outperforms all the other methods.

We further compare D-XU/D-XU-R and D-XU-DS/D-XU-RDS under different elasticity. Fig. 9 follows the same setting as Fig. 7, while task periods are generated with different ratios between Ti and Li (corresponding to the x-axis). The elasticity becomes smaller as the value on the x-axis increases. The parallelism p is randomly chosen from [0.1, 0.3]. The normalized utilization of each task set is randomly distributed in [0.2, 0.7]. The number of tasks is randomly chosen from [2, 8]. The number of processors is computed by m = ⌈(Σi Ci/Ti)/U⌉, where U is the normalized utilization. We generate some light tasks with total utilization of m × U − Σi Ci/Ti to meet the target normalized utilization.

Fig. 9. Comparison with different elasticity. (Acceptance ratio against Ti/Li for D-XU/D-XU-R and D-XU-DS/D-XU-RDS.)

It can be observed that tasks are hard to schedule for D-XU/D-XU-R when the elasticity is high. However, as density separation is applied, D-XU-DS/D-XU-RDS can ease Dhall's effect to some extent, and thus they can schedule a large portion of task sets with high elasticity.

In summary, when the elasticity of tasks is relatively small and the parallelism of tasks is high, D-XU/D-XU-R and D-XU-DS/D-XU-RDS perform better than the other methods. Notably, D-XU/D-XU-R and D-XU-DS/D-XU-RDS significantly outperform D-SAI, which also uses a task decomposition method. This is because tasks are decomposed according to their structure characteristics under our decomposition strategy. In particular, the density separation strategy can greatly improve schedulability over global scheduling in the presence of tasks with high density.

B. Schedulability Comparison With Overhead

In this subsection, we compare the schedulability tests of different scheduling approaches considering the context switch and migration overhead. In particular, we distinguish two types of overhead: local context switch overhead, denoted by Oloc, which happens when a task is preempted and later resumed on the same processor, and migration overhead, denoted by Omig, which happens if a task is preempted and later resumed on a different processor.

We follow the widely-used approach of counting the overhead by extending the execution time in the schedulability tests [23], [24]. In D-XU and D-XU-DS, in general a resulting sequential task's jobs may be preempted and then resumed on a different processor. Therefore, the overhead is counted by adding Omig and 2Oloc to c(πj) for each resulting sequential task πj when πj is the beginning part of a vertex; otherwise the overhead is counted by adding 2Omig and 2Oloc to c(πj). This is because one vertex may be split into several sequential tasks, and one more migration overhead needs to be counted for those non-head sequential tasks. In D-XU-R/D-XU-RDS, a job does not migrate among different processors, so the overhead is counted by adding 2Oloc to c(πj) for each resulting sequential task πj when πj is the beginning part of a vertex; otherwise the overhead is counted by adding 2Oloc and Omig to c(πj).
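In other words, the overhead accounting amounts to inflating the execution time of every sequential subtask before the tests are applied. A minimal sketch of this inflation step is given below; the subtask representation and field names are illustrative assumptions.

    def inflate_execution_times(subtasks, o_loc, o_mig, restricted_migration):
        """Add context switch and migration overhead to each sequential subtask.

        subtasks             -- list of dicts with keys 'c' (execution time) and
                                'is_head' (True if the subtask is the beginning
                                part of its vertex)
        o_loc, o_mig         -- local context switch and migration overhead
        restricted_migration -- True for D-XU-R / D-XU-RDS, False for D-XU / D-XU-DS
        """
        for st in subtasks:
            if restricted_migration:
                # D-XU-R / D-XU-RDS: jobs never resume on a different processor.
                extra = 2 * o_loc if st['is_head'] else 2 * o_loc + o_mig
            else:
                # D-XU / D-XU-DS: a preempted job may resume on another processor.
                extra = o_mig + 2 * o_loc if st['is_head'] else 2 * o_mig + 2 * o_loc
            st['c'] += extra
        return subtasks

    # Example (times in microseconds, using the values from [24] reported below):
    # inflate_execution_times(subtasks, o_loc=0.8, o_mig=100.0, restricted_migration=True)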


Fig. 10. Comparison of the schedulability of task sets considering context switch and migration overhead. (Acceptance ratio (%) against the average execution time of vertices, from Omig to 10000×Omig, for D-XU-DS, D-XU-R, D-XU-RDS, D-XU and D-XU-NP.)

The values of Oloc and Omig are determined based on the empirical evaluation results in [24]: Omig = 100us and Oloc = 800ns, respectively (see Figure 4 in [24]). Tasks are generated with the same parameter settings as in Fig. 8. The periods are generated according to the formula (Li + Ci/(60U)) × (1 + 0.25 × Gamma(2, 1)). The normalized utilization is randomly chosen from [0.1, 0.7], and the execution time of each vertex varies in [Omig, 10000×Omig], which corresponds to the x-axis in Fig. 10.

From Fig. 10, we observe that the performance of D-XU-DS is even worse than that of D-XU when the execution time of vertices is low, although the ideal schedulability test of D-XU-DS dominates that of D-XU. This is because one vertex may be divided into several parts and each part is scheduled as an individual sequential subtask under D-XU-DS, for which higher overhead is introduced. This phenomenon is more obvious when the execution times are lower, where the influence of the overhead is more significant. As the vertex execution time increases, the relative influence of the overhead becomes smaller and D-XU-DS's performance dominates D-XU. We also observe that D-XU-RDS always performs better than D-XU-DS, although their schedulability test conditions are exactly the same. This is because migrations of sequential subtasks are not allowed in D-XU-RDS (D-XU-R) and thus less overhead is introduced. D-XU-R performs much better than D-XU when the execution times of vertices are low, due to the migration overhead, and the schedulability of the two is almost the same when the execution times of vertices are high, where the influence of the migration overhead becomes less significant. We can also see that D-XU-NP may perform better than D-XU and D-XU-DS when the execution time is small, where the schedulability loss due to non-preemptive blocking is outweighed by the benefit of the low overhead.

VI. RELATED WORK

Early work on real-time scheduling of parallel tasks assumes constraints on task structures [25], [26], [27], [28]. For example, a Gang EDF scheduling algorithm was proposed in [27] for moldable parallel tasks where the task could only be executed on a fixed number of processors. The synchronous task model was studied in [28], [12].

Recently, the scheduling problem of multiple DAGs has been studied with decomposition-based algorithms. In [2], a capacity augmentation bound of 4 was proved under GEDF. A schedulability test in [3] was provided to achieve a lower capacity bound than 4 in most cases, while in other cases above 4. In [4], a capacity bound of (3+√5)/2 was proved for some special task sets.

For global scheduling (without decomposition), a resource augmentation bound of 2 was proved in [29] for arbitrary-deadline tasks, but only considering a single DAG. In [5], [13], a resource augmentation bound of 2 − 1/m and a capacity augmentation bound of 4 − 2/m were proved. A pseudo-polynomial time sufficient schedulability test was presented in [5], which later was generalized and dominated by [6]. [13] proved the capacity augmentation bound (3+√5)/2 for GEDF and 3.732 for G-RM.

Beyond DAG task models, researchers have studied conditional parallel real-time tasks where both fork-join and branching semantics exist in the same graph [7], [30].

The scheduling and analysis of several variants of global EDF have been studied under the sequential task model. In [14] and [15], the schedulability of global EDF under non-preemptive algorithms is analyzed. Global EDF scheduling with restricted migration was proposed in [19]. Density separation scheduling was studied in [31] and [18] for implicit and arbitrary deadline tasks, respectively.

VII. CONCLUSIONS

In this paper, we study the scheduling of parallel real-time tasks modeled by DAGs. We propose a new decomposition algorithm under the guideline of the structure characteristic value of each task. We also investigate the efficiency of these variants of the global EDF scheduling algorithm while considering the overhead introduced by migration and context switches. The experimental results suggest that the combination of global EDF with restricted migration and density separation has promising performance in terms of schedulability. A possible future work is to extend decomposition-based scheduling to the conditional DAG task model. The challenge is to deal with the dynamic branching in the decomposition phase to bound the worst-case load and density.

VIII. ACKNOWLEDGMENT

This work is supported by the Research Grants Council of Hong Kong (GRF 15204917 and 15213818) and the National Natural Science Foundation of China (Grant No. 61672140).

REFERENCES

[1] R. I. Davis and A. Burns, "A survey of hard real-time scheduling for multiprocessor systems," ACM Computing Surveys (CSUR), vol. 43, no. 4, p. 35, 2011.
[2] A. Saifullah, D. Ferry, J. Li, K. Agrawal, C. Lu, and C. D. Gill, "Parallel real-time scheduling of dags," Parallel and Distributed Systems, IEEE Transactions on, vol. 25, no. 12, pp. 3242–3252, 2014.

[3] M. Qamhieh, F. Fauberteau, L. George, and S. Midonnet, "Global edf scheduling of directed acyclic graphs on multiprocessor systems," in Proceedings of the 21st International Conference on Real-Time Networks and Systems. ACM, 2013, pp. 287–296.
[4] M. Qamhieh, L. George, and S. Midonnet, "A stretching algorithm for parallel real-time dag tasks on multiprocessor systems," in Proceedings of the 22nd International Conference on Real-Time Networks and Systems. ACM, 2014, p. 13.
[5] V. Bonifaci, A. Marchetti-Spaccamela, S. Stiller, and A. Wiese, "Feasibility analysis in the sporadic dag task model," in Real-Time Systems (ECRTS), 2013 25th Euromicro Conference on. IEEE, 2013, pp. 225–233.
[6] S. Baruah, "Improved multiprocessor global schedulability analysis of sporadic dag task systems," in Real-Time Systems (ECRTS), 2014 26th Euromicro Conference on. IEEE, 2014, pp. 97–105.
[7] A. Melani, M. Bertogna, V. Bonifaci, A. Marchetti-Spaccamela, and G. C. Buttazzo, "Response-time analysis of conditional dag tasks in multiprocessor systems," in Real-Time Systems (ECRTS), 2015 27th Euromicro Conference on. IEEE, 2015, pp. 211–221.
[8] P. Voudouris, P. Stenström, and R. Pathan, "Timing-anomaly free dynamic scheduling of task-based parallel applications," in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2017 IEEE. IEEE, 2017, pp. 365–376.
[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, "Introduction to algorithms second edition," 2001.
[10] M. L. Dertouzos, "Control robotics: The procedural control of physical processes," in Information Processing, 1974.
[11] C. L. Liu and J. W. Layland, "Scheduling algorithms for multiprogramming in a hard-real-time environment," Journal of the ACM (JACM), vol. 20, no. 1, pp. 46–61, 1973.
[12] A. Saifullah, J. Li, K. Agrawal, C. Lu, and C. Gill, "Multi-core real-time scheduling for generalized parallel task models," Real-Time Systems, vol. 49, no. 4, pp. 404–435, 2013.
[13] J. Li, K. Agrawal, C. Lu, and C. Gill, "Outstanding paper award: Analysis of global edf for parallel tasks," in Real-Time Systems (ECRTS), 2013 25th Euromicro Conference on. IEEE, 2013, pp. 3–13.
[14] K. Jeffay, D. F. Stanat, and C. U. Martel, "On non-preemptive scheduling of periodic and sporadic tasks," in RTSS, 1991.
[15] S. K. Baruah, "The non-preemptive scheduling of periodic tasks upon multiprocessors," Real-Time Systems, vol. 32, no. 1-2, pp. 9–20, 2006.
[16] S. K. Dhall and C. L. Liu, "On a real-time scheduling problem," Operations Research, vol. 26, no. 1, pp. 127–140, 1978.
[17] B. Andersson, S. Baruah, and J. Jonsson, "Static-priority scheduling on multiprocessors," in RTSS, 1991.
[18] M. Bertogna, "Real-time scheduling analysis for multiprocessor platforms," PhD Defense, 2008.
[19] S. K. Baruah and J. Carpenter, "Multiprocessor fixed-priority scheduling with restricted interprocessor migrations," Journal of Embedded Computing, vol. 1, no. 2, pp. 169–178, 2005.
[20] J. Li, J. J. Chen, K. Agrawal, C. Lu, C. Gill, and A. Saifullah, "Analysis of federated and global scheduling for parallel real-time tasks," in Real-Time Systems (ECRTS), 2014 26th Euromicro Conference on. IEEE, 2014, pp. 85–96.
[21] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, and S. Stiller, "Improved multiprocessor global schedulability analysis," Real-Time Systems, vol. 46, no. 1, pp. 3–24, 2010.
[22] D. Cordeiro, G. Mounié, S. Perarnau, D. Trystram, J.-M. Vincent, and F. Wagner, "Random graph generation for scheduling simulations," in Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2010, p. 60.
[23] B. B. Brandenburg and J. H. Anderson, "On the implementation of global real-time schedulers," in Real-Time Systems Symposium, 2009, RTSS 2009. 30th IEEE. IEEE, 2009, pp. 214–224.
[24] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson, "LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers," in Real-Time Systems Symposium, 2006. RTSS'06. 27th IEEE International. IEEE, 2006, pp. 111–126.
[25] G. Manimaran, C. S. R. Murthy, and K. Ramamritham, "A new approach for scheduling of parallelizable tasks in real-time multiprocessor systems," Real-Time Systems, vol. 15, no. 1, pp. 39–60, 1998.
[26] W. Y. Lee and L. Heejo, "Optimal scheduling for real-time parallel tasks," IEICE Transactions on Information and Systems, vol. 89, no. 6, pp. 1962–1966, 2006.
[27] S. Kato and Y. Ishikawa, "Gang edf scheduling of parallel task systems," in Real-Time Systems Symposium, 2009, RTSS 2009. 30th IEEE. IEEE, 2009, pp. 459–468.
[28] K. Lakshmanan, S. Kato, and R. Rajkumar, "Scheduling parallel real-time tasks on multi-core processors," in Real-Time Systems Symposium (RTSS), 2010 IEEE 31st. IEEE, 2010, pp. 259–268.
[29] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, L. Stougie, and A. Wiese, "A generalized parallel task model for recurrent real-time processes," in Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd. IEEE, 2012, pp. 63–72.
[30] S. Baruah, "The federated scheduling of systems of conditional sporadic dag tasks," in Proceedings of the 12th International Conference on Embedded Software. IEEE Press, 2015, pp. 1–10.
[31] S. K. Baruah, "Optimal utilization bounds for the fixed-priority scheduling of periodic task systems on identical multiprocessors," Computers, IEEE Transactions on, vol. 53, no. 6, pp. 781–784, 2004.

Xu Jiang received his BS degree in computer science from Northwestern Polytechnical University, China in 2009, the MS degree in computer architecture from the Graduate School of the Second Research Institute of China Aerospace Science and Industry Corporation, China in 2012, and the PhD from Beihang University, China in 2018. Currently, he is an Assistant Professor at the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include real-time systems, parallel and distributed systems and embedded systems.

Nan Guan is currently an assistant professor at the Department of Computing, The Hong Kong Polytechnic University. Dr. Guan received his BE and MS from Northeastern University, China in 2003 and 2006 respectively, and a PhD from Uppsala University, Sweden in 2013. Before joining PolyU in 2015, he worked as a faculty member in Northeastern University, China. His research interests include real-time embedded systems and cyber-physical systems. He received the EDAA Outstanding Dissertation Award in 2014, the Best Paper Award of the IEEE Real-Time Systems Symposium (RTSS) in 2009, and the Best Paper Award of the Conference on Design Automation and Test in Europe (DATE) in 2013.

Xiang Long received his BS degree in Mathematics from Peking University, China in 1985, and the MS and PhD degrees in Computer Science from Beihang University, China in 1988 and 1994. He has been a professor at Beihang University since 1999. His research interests include parallel and distributed systems, computer architecture, real-time systems, embedded systems and multi-/many-core oriented operating systems.

Han Wan is a Lecturer of Computer Science and Engineering at Beihang University. Her research includes computer architecture simulation, WCET analysis in real-time systems, parallel computing, and GPU acceleration of general-purpose applications. She has also led research projects on innovations in pedagogy, curriculum, and educational technology. Dr. Wan has over 30 technical publications.
