
Allocation, Assignment and Scheduling of System Components

Kris Kuchcinski
Krzysztof.Kuchcinski@cs.lth.se


Allocation of System Components

❚  Defines an architecture by selecting the hardware resources necessary to implement a given system.
❚  The components can be, for example, microprocessors, micro-controllers, DSPs, ASIPs, ASICs, FPGAs, memories, buses or point-to-point links.
❚  Usually done manually with the support of estimation tools.
❚  In simple cases can be performed automatically using an optimization strategy.

Assignment of System Components

❚  After allocation, the partitioning of system functionality among the selected components can be done.
❚  The partitioning defines the assignment of tasks to particular components.
❚  If a number of tasks are assigned to the same component and it does not support parallel execution, the execution order needs to be decided — task scheduling.


Scheduling

❚  Depending on the computation model, scheduling can be done off-line or at run-time.
❚  Static vs. dynamic scheduling.
❚  RTOS support for dynamic scheduling.
❚  Scheduling can address advanced execution techniques, such as software pipelining.
❚  Can be applied to tasks allocated to hardware or software, as well as to hardware operations and software instructions.

Scheduling

❚  Data-flow scheduling (SDF, CSDF)
❙  static assignment of the instants at which execution takes place,
❙  time-constrained and resource-constrained,
❙  typical for DSP applications (hw and sw).
❚  Real-time scheduling
❙  periodic, aperiodic and sporadic tasks,
❙  independent or data-dependent tasks,
❙  based on priorities (static or dynamic).


Scheduling Approaches

❚  Static scheduling
❙  static cyclic scheduling
❚  Dynamic scheduling
❙  fixed priorities — e.g., rate monotonic
❙  dynamic priorities — e.g., earliest deadline first

High-Level Synthesis Scheduling

Synthesis of the following code (the inner loop of a differential equation integrator):

    while c do
    begin
      x1 := x + dx;
      u1 := u - (3 * x * u * dx) - (3 * y * dx);
      y1 := y + (u * dx);
      c := x < a;
      x := x1; u := u1; y := y1;
    end;

[Figure: the data-flow graph of the loop body, built from its *, +, - and < operations over x, u, y, dx and a.]


High-Level Synthesis Scheduling (cont'd)

[Figure: the scheduled data-flow graph and the resulting register allocation.]

HLS typical process

❚  Behavioral specification
❙  Language selection
❙  Parallelism
❙  Synchronization

    procedure example;
    var a, b, c, d, e, f, g : integer;
    begin
      read(a, b, c, d, e);
      f := e * (a + b);
      g := (a + b) * (c + d);
      ...
    end;


HLS typical process (cont'd)

❚  Design Representation
❙  parsing techniques
❙  data-flow analysis
❙  parallelism extraction
❙  program transformations
❘  elimination of high-level constructs
❘  loop unrolling
❘  subexpression detection

[Figure: the data-flow graph of the example: inputs A-E, two + nodes feeding two x nodes that produce F and G; the shared (a + b) appears as a single + node.]
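To make the subexpression detection transformation above concrete, here is a minimal value-numbering sketch in Python (not part of the original slides; the three-address form and variable names are illustrative):

    def build_dfg(stmts):
        """stmts: (dest, op, src1, src2) three-address statements."""
        nodes = {}   # (op, operand, operand) -> DFG node id
        dfg = []     # (node id, op, operand, operand)
        names = {}   # variable -> node id that computes it
        value = lambda v: names.get(v, v)          # inputs map to themselves
        for dest, op, a, b in stmts:
            key = (op, value(a), value(b))
            if key not in nodes:                   # reuse an existing node
                nodes[key] = len(dfg)
                dfg.append((nodes[key], op, key[1], key[2]))
            names[dest] = nodes[key]
        return dfg

    # f := e * (a + b);  g := (a + b) * (c + d)  in three-address form:
    stmts = [("t1", "+", "a", "b"),
             ("f",  "*", "e", "t1"),
             ("t2", "+", "a", "b"),                # same value as t1
             ("t3", "+", "c", "d"),
             ("g",  "*", "t2", "t3")]
    print(build_dfg(stmts))   # a single '+' node for (a + b) serves both f and g

Commutativity (a + b vs. b + a) is deliberately ignored to keep the sketch short.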

HLS typical process (cont'd)

❚  Operation Scheduling
❙  Parallelism/cost trade-off
❙  Performance measure
❙  Clocking strategy

[Figure: the example data-flow graph (inputs A-E, + and x nodes producing F and G) assigned to control steps.]


HLS typical process (cont'd)

❚  Data Path Allocation
❙  Operation selection
❙  Register/Memory allocation
❙  Interconnection Generation
❙  Hardware Minimization

Data Path Allocation

[Figure: the allocated data path: inputs E<0:7>, A<0:7>, C<0:7>, B<0:7>, D<0:7>; multiplexers M1-M4 feeding registers R1-R4; one multiplier (*) and one adder (+) producing G and F.]


HLS typical process (cont'd)

❚  Control Allocation
❙  Selection of control style (PLA, random logic, etc.)
❙  Clock implementation
❙  Signal/condition design

[Figure: data part and control part; the control part steps through: Load R3, Load R4; Add, Load R2; Load R1, R3, R4; Mul, Load R4; Add, Load R1; Mul, Load R2.]

Control Allocation

[Figure: the data path (M1-M4, R1-R4, * and + producing G and F) with its control description as an FSM:]

    S0: Start next S1;
    S1: M3, Load R3, M4=0, Load R4 next S2;
    S2: Add, ¬M2, Load R2, ¬M1, Load R1, ¬M3, Load R3, M4=1, Load R4 next S3;
    S3: Add, M1, Load R1, Mul, M4=2, Load R4 next S4;
    S4: Mul, M2, Load R2 next ...


Module Binding

[Figure: the same data path bound to an 8-bit multiplier and an 8-bit adder, driven by a control ROM:]

    0000: 11000000 0001
    0001: 00100000 0010
    0010: 00011000 0011
    0011: 01000000 0100
    ...
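As an illustration of ROM-based control, a minimal microcode-sequencer sketch in Python; the addresses, words and next-state fields follow the figure above, but the meaning of the control-word bits is a hypothetical placeholder:

    # address -> (control word driving the datapath, next-state address)
    rom = {
        0b0000: ("11000000", 0b0001),
        0b0001: ("00100000", 0b0010),
        0b0010: ("00011000", 0b0011),
        0b0011: ("01000000", 0b0100),
    }

    state = 0b0000
    while state in rom:        # step the controller until it leaves the table
        ctrl, state = rom[state]
        print(ctrl)            # each word would assert mux/load/ALU control lines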

ASAP Scheduling

[Figure: a data-flow graph with operations 1-10 (+, x, -) and its ASAP schedule: each operation is placed in the earliest control step its data dependencies allow.]


ALAP Scheduling

[Figure: the same data-flow graph and its ALAP schedule: each operation is placed in the latest control step that still meets the given time constraint.]
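A minimal Python sketch of unconstrained ASAP and ALAP over a dependence graph, assuming every operation takes one control step (the small example graph is hypothetical, not the DFG in the figures):

    def asap(preds):
        """preds: {op: [ops it depends on]}, a DAG."""
        step = {}
        while len(step) < len(preds):
            for op, ps in preds.items():
                if op not in step and all(p in step for p in ps):
                    step[op] = 1 + max((step[p] for p in ps), default=0)
        return step

    def alap(preds, n_steps):
        succs = {op: [] for op in preds}
        for op, ps in preds.items():
            for p in ps:
                succs[p].append(op)
        step = {}
        while len(step) < len(succs):
            for op, ss in succs.items():
                if op not in step and all(s in step for s in ss):
                    step[op] = min((step[s] for s in ss), default=n_steps + 1) - 1
        return step

    g = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "s1": ["a1", "m3"]}
    print(asap(g))      # m1=1, m2=1, m3=1, a1=2, s1=3
    print(alap(g, 3))   # m1=1, m2=1, m3=2, a1=2, s1=3

The mobility of an operation is its ALAP step minus its ASAP step; operations with mobility 0 lie on the critical path. This is the priority measure used in the list scheduling slides that follow.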

List Scheduling

❚  Constructive scheduling algorithm which selects the operations to assign to each control step based on a priority function.
❚  The priority function differs between versions of the list scheduling algorithm, e.g.:
❙  higher priority to operations with low mobility, or
❙  higher priority to operations with more immediate successors, or
❙  priority by the length of the path from the operation to the end of the block,
❙  ...


List Scheduling

[Figure: the data-flow graph from the ASAP/ALAP example and its list schedule under resource constraints.]
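A minimal resource-constrained list-scheduling sketch in Python, using mobility (ALAP − ASAP, e.g. from the earlier sketch) as the priority function; the unit kinds and counts are illustrative:

    def list_schedule(preds, kind, avail, asap_t, alap_t):
        done, sched, step = set(), {}, 0
        while len(done) < len(preds):
            step += 1
            ready = [op for op in preds
                     if op not in sched and all(p in done for p in preds[op])]
            ready.sort(key=lambda op: alap_t[op] - asap_t[op])  # low mobility first
            used = {k: 0 for k in avail}
            for op in ready:
                if used[kind[op]] < avail[kind[op]]:            # a free unit exists
                    used[kind[op]] += 1
                    sched[op] = step
            done |= {op for op, s in sched.items() if s == step}
        return sched

    g      = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "s1": ["a1", "m3"]}
    kind   = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "alu", "s1": "alu"}
    asap_t = {"m1": 1, "m2": 1, "m3": 1, "a1": 2, "s1": 3}
    alap_t = {"m1": 1, "m2": 1, "m3": 2, "a1": 2, "s1": 3}
    print(list_schedule(g, kind, {"mul": 1, "alu": 1}, asap_t, alap_t))
    # with one multiplier and one ALU: m1=1, m2=2, m3=3, a1=3, s1=4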

Force-Directed Scheduling

❚  The basic strategy is to place similar operations in different control steps to balance the concurrency of the operations without increasing the total execution time.
❚  By balancing the concurrency of operations, each functional unit gets high utilization, and therefore the total number of units required decreases.
❚  Three main steps:
❙  determine the time frame of each operation,
❙  create a distribution graph, and
❙  calculate the force associated with each assignment.


Time Frame of Each Operation

[Figure: the ASAP and ALAP schedules of the example DFG side by side; each operation's time frame extends from its ASAP step to its ALAP step.]

Distribution Graph

[Figure: the example DFG with each operation's execution probability per control step; an operation whose time frame spans k steps contributes 1/k (here 1/2 or 1/3) to each step of its frame.]


Distribution Graph (cont'd)

[Figure: the resulting Addition/Subtraction DG and Multiplication DG over control steps 1-4.]

    DGmult(1) = 1/2 + 1/3 = 0.833
    DGmult(2) = 1/2 + 1/3 = 0.833
    DGmult(3) = 1/2 + 1/2 + 1/3 = 1.333
    DGmult(4) = 1/2 + 1/2 = 1

Force Calculation

❚  The force associated with the tentative assignment of an operation to c-step j equals the difference between the distribution value in that c-step and the average of the distribution values over the c-steps bounded by the operation's time frame [f, t]:

    Force(j) = DG(j) − ( Σ i=f..t DG(i) ) / (t − f + 1)

❚  Assignment of operation 10 to control step 3 (see the sketch below):

    Force(3) = DGmult(3) − average DGmult over the time frame of operation 10
             = 1.333 − (1.333 + 1)/2 = 0.167
    Force(4) = −0.167


Force-Directed Scheduling Algorithm

❚  Once all the forces are calculated, the operation/control-step pair with the lowest force is scheduled.
❚  The distribution graphs and forces are then updated, and the process repeats until all operations are scheduled.
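A minimal Python check of the force formula above, using the multiplier distribution graph from the previous slide (operation 10 has time frame [3, 4]); exact fractions avoid rounding noise:

    from fractions import Fraction as F

    def force(DG, j, f, t):
        """Force of tentatively assigning an op with time frame [f, t] to step j."""
        avg = sum(DG[i] for i in range(f, t + 1)) / (t - f + 1)
        return DG[j] - avg

    DG_mult = {1: F(1, 2) + F(1, 3), 2: F(1, 2) + F(1, 3),
               3: F(1, 2) + F(1, 2) + F(1, 3), 4: F(1, 2) + F(1, 2)}
    print(float(force(DG_mult, 3, 3, 4)))   #  0.1667, i.e.  0.167 as on the slide
    print(float(force(DG_mult, 4, 3, 4)))   # -0.1667, i.e. -0.167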

Advanced Scheduling Topics

❚  Control constructs
❙  conditional sharing — sharing of resources between mutually exclusive operations,
❙  speculative execution.
❚  Chaining and multi-cycling.
❚  Pipelined components.
❚  Pipelining.
❚  Registers and memory allocation.
❚  Low-power issues.


Conditional Dependency Graph

[Figure: a conditional dependency graph: + operations guarded by guards h1, h3, h7 and h10, combined through & and not nodes.]

Guard Hierarchy and Mutually Exclusive Guards

[Figure: guard hierarchy with h1 at the root, branching through h2, h3 and h5 down to h6-h10, the guards combined with & and not.]

Mutually exclusive guards:

    h3:  h10, h2, h6, h7, h8, h9
    h6:  h10, h3, h7, h9
    h7:  h3, h6
    h8:  h3, h9
    h9:  h10, h3, h5, h6, h8
    h10: h3, h6, h9


Conditional Resource Sharing

[Figure: an adder and a multiplier shared across the branches of condition c; the x and + operations on the "c" and "not c" paths are mutually exclusive and can be bound to the same units.]
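A minimal sketch of the conditional-sharing test in Python: two operations may be bound to the same unit if they occupy different control steps or their guards are mutually exclusive. The guard table is from the slide; the operation names are hypothetical:

    def can_share(a, b, step, guard, mutex):
        if step[a] != step[b]:
            return True                    # no resource conflict at all
        ga, gb = guard[a], guard[b]
        return gb in mutex.get(ga, set()) or ga in mutex.get(gb, set())

    mutex = {"h7": {"h3", "h6"}, "h8": {"h3", "h9"}, "h10": {"h3", "h6", "h9"}}
    step  = {"add1": 2, "add2": 2}
    guard = {"add1": "h7", "add2": "h3"}
    print(can_share("add1", "add2", step, guard, mutex))  # True: h7, h3 exclusive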

Speculative Execution

[Figure: a conditional DFG where an operation is executed speculatively before condition c is known, shortening the schedule.]


Chaining and multicycling

[Figure: with a 100 ns clock, two 50 ns additions are chained into a single control step; a 200 ns multiplication is multicycled over two 100 ns control steps.]

Pipelined Components

[Figure: a pipelined multiplier with 50 ns stages in 100 ns control steps; a new multiplication can start while the previous one completes.]


Fifth Order Elliptic Wave Filter DFG

[Figure: data-flow graph of the fifth-order elliptic wave filter benchmark: inputs i1-i8 feeding a network of + and x operations.]

Pipelining — Example

[Figure: a functionally pipelined schedule of a data-flow graph.]


Scheduling of H.261 Video Coding Algorithm

[Figure: non-pipelined schedule; Texec = 2963, T10exec = 29630.]

Scheduling of H.261 Video Coding Algorithm

[Figure: pipelined schedule; Texec = 3373, Tpipe init = 1154, T10exec = 13759.]


Real-Time Scheduling

❚  System is a set of tasks
❙  tasks have known execution times (usually WCET)
❚  Tasks are enabled by repeating events in the environment
❙  some known timing constraints
❚  Task executions must satisfy timing requirements
❚  Single processor (hard)
❙  multiprocessors — much harder, mostly negative results (e.g. NP-hardness)

Scheduling Policies

❚  Static (pre-run-time, off-line)
❙  round-robin,
❙  static cyclic.
❚  Dynamic (run-time, on-line)
❙  static priority,
❙  dynamic priority,
❙  preemptive or not.


Assumptions and Definitions

❚  Fixed number of periodic and independent tasks.
❚  Parameters:
❙  execution period — T
❙  worst-case execution time — C
❙  execution deadline — D
❙  usually assumed that T = D

[Figure: timeline of task i showing its execution time Ci, deadline Di and period Ti.]

Static Scheduling

❚  Round-robin:
❙  pick an order of tasks, e.g. A B C D,
❙  execute them forever in that order.
❚  Static cyclic:
❙  pick a sequence of tasks, e.g. A B C B D,
❙  execute that sequence forever.
❚  Much like static data-flow.
❚  For arbitrary task periods, the schedule duration equals the least common multiple (LCM) of the task periods (MPEG 44.1 kHz * 30 Hz requires 132 300 repetitions); a sketch of this computation follows the next slide.


Dynamic Scheduling — Static Priorities

❚  Priorities are assigned to tasks off-line.
❚  Among the enabled tasks, the one with the highest priority is always executed.
❚  Single processor execution.
❚  Preemptive schedule.
❚  Rate Monotonic Scheduling (RMS) — a task with a shorter period is assigned a higher priority.
❚  RMS is optimal — no fixed-priority scheme does better.
❚  Problem with sporadic tasks.
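The sketch referenced on the Static Scheduling slide above: the static cyclic schedule length is the LCM of the task periods. A minimal Python version (the periods are illustrative, borrowed from the RMS example two slides below):

    from math import gcd
    from functools import reduce

    def hyperperiod(periods):
        return reduce(lambda a, b: a * b // gcd(a, b), periods)

    periods = [100, 150, 350]              # ms
    H = hyperperiod(periods)               # 2100 ms
    print(H, [H // T for T in periods])    # the tasks repeat 21, 14 and 6 times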

Rate Monotonic Scheduling — An Example

[Figure: Gantt charts of two tasks with periods 20 ms and 40 ms under Priority(Task1) < Priority(Task2).]


Rate Monotonic Scheduling

❚  Utilization

    Utilization = Σi Ci / Ti

❚  All deadlines are met when utilization ≤ n(2^(1/n) − 1)
❙  for n = 2: 2(2^(1/2) − 1) ≈ 0.83,
❙  for n → ∞: n(2^(1/n) − 1) → ln(2) ≈ 0.69
❚  An example

    Task   Period (T)   Rate (1/T)   Exec Time (C)   Utilization (U)
    1      100 ms       10 Hz        20 ms           0.2
    2      150 ms       6.67 Hz      40 ms           0.267
    3      350 ms       2.86 Hz      100 ms          0.286

❚  0.753 < 0.779 (= 3(2^(1/3) − 1)) ⇒ the tasks are schedulable
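A minimal Python check of the Liu-Layland utilization bound for the example above; note the bound is sufficient but not necessary:

    def rms_schedulable(tasks):
        """tasks: (C, T) pairs, worst-case execution time and period."""
        n = len(tasks)
        u = sum(c / t for c, t in tasks)
        bound = n * (2 ** (1 / n) - 1)
        return u, bound, u <= bound

    u, bound, ok = rms_schedulable([(20, 100), (40, 150), (100, 350)])
    print(round(u, 3), round(bound, 3), ok)   # 0.752 0.78 True, as on the slide
    # (the slide's 0.753 comes from rounding each task's utilization first)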

Rate Monotonic Scheduling Implementation

❚  Table of tasks with each task's priority and state (enabled, etc.).
❚  At a context switch, select the task with the highest priority.
❚  Linear complexity, O(n), where n = number of tasks.


Critical sections

❚  Access to a shared resource must be mutually exclusive:
❙  lock the resource → critical section starts
❘  may fail and block the task
❙  process the resource
❙  unlock the resource → critical section ends

[Figure: a task with two critical sections, C1 and C2.]

Rate Monotonic Scheduling — Problems

❚  Static cyclic scheduling is better if possible.
❚  Critical sections in tasks and communication create problems:
❙  Deadlock
❙  Priority inversion
❚  Methods for solving the priority inversion problem:
❙  priority inheritance,
❙  priority ceiling protocol.


Deadlock

[Figure: a high-priority and a low-priority task lock critical sections C1 and C2 in opposite order; each waits for the section held by the other, i.e. deadlock.]

Priority Inversion

[Figure: a high-priority task H blocks on a critical section C held by a low-priority task L; a medium-priority task M preempts L and indirectly delays H, i.e. priority inversion.]


Priority Inheritance

❚  When a job Ji tries to enter a critical section that is already locked by a lower-priority task Jk, Ji waits and Jk inherits the priority of Ji.
❚  The queue of jobs waiting for a resource is ordered by decreasing priority.
❚  Priority inheritance is transitive.
❚  At any time, the priority at which a critical section is executed equals the highest priority of the jobs currently blocked on it.
❚  When a job exits a critical section, it usually resumes the priority it had when it entered the critical section.
❚  When released, a resource is granted to the highest-priority job waiting for it, if any.

Priority Inheritance Protocol

[Figure: the inversion scenario from the previous slide resolved: while inside the critical section C, the low-priority task runs at priority H (inherited, or the ceiling of C, ceiling(C) = H), so the medium-priority task can no longer preempt it.]


Priority Inversion — problems

❚  Chained blocking:
❙  a job can have several critical sections,
❙  it can be blocked whenever it wants to enter a critical section,
❙  this generates overhead in terms of task switching.
❚  The main idea is to reduce the occurrence of priority inversions by preventing multiple ones: a job is blocked at most once, before it enters its first critical section.
❚  The solution prevents deadlock.

Priority ceiling protocol

❚  Assumptions
❙  a task cannot voluntarily suspend itself,
❙  semaphores cannot be held between invocations,
❙  semaphores must be locked in a nested manner.
❚  Protocol
❙  every CS has a ceiling: the priority of the highest-priority task that may enter it,
❙  a task is allowed into a CS only if its priority is higher than the ceilings of all active CSs (a sketch of this test follows the next slide),
❙  if task A is blocking some higher-priority task B, then A gets the priority of B while in the CS.


Dynamic Scheduling — Dynamic Priorities

❚  Dynamic priorities are needed since static priorities sometimes cannot meet all deadlines.
❚  An Example

    C1 = 2, T1 = 5
    C2 = 4, T2 = 7

[Figure: static-priority schedule of Task1 and Task2; Task2 cannot fit its 4 time units before its deadline at 7.]
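The sketch referenced on the Priority ceiling protocol slide: the admission test in Python, assuming a higher number means a higher priority (the critical-section names are hypothetical):

    def may_enter(task_prio, active_ceilings):
        """active_ceilings: ceilings of CSs currently locked by other tasks."""
        return all(task_prio > c for c in active_ceilings)

    ceiling = {"cs_buffer": 3, "cs_log": 1}  # ceiling = highest prio that may enter
    print(may_enter(2, [ceiling["cs_log"]]))     # True:  2 > 1, task may enter
    print(may_enter(2, [ceiling["cs_buffer"]]))  # False: blocked; the holder then
                                                 # runs at inherited priority 2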

Earliest Deadline First

❚  Minimizes the number of missed deadlines.
❚  The task with the earliest deadline has priority.
❚  An Example

    C1 = 2, T1 = 5
    C2 = 4, T2 = 7

[Figure: EDF schedule of Task1 and Task2 over 0-14, with time marks at 2, 5, 7, 10 and 14; all deadlines are met.]

❚  Any sequence that puts the jobs in order of non-decreasing deadlines is optimal.


Earliest Deadline First

❚  EDF can achieve 100% utilization, until overload occurs.
❚  Cannot guarantee meeting deadlines for arbitrary data arrival times.
❚  An example

    Task   Period (T)   Rate (1/T)   Exec Time (C)   Utilization (U)
    A      20 ms        50 Hz        10 ms           0.5
    B      50 ms        20 Hz        25 ms           0.5

[Figure: EDF schedule over 0-100 ms (A1 B1 A2 B1 A3 B2 A4 B2 A1) with deadlines A1, A2, B1, A3, A4, A5/B2 marked.]

Earliest Deadline First

❚  Additional assumptions
❙  arbitrary release times and deadlines, and
❙  arbitrary and unknown (to the scheduler) execution times.
❚  The EDF algorithm is optimal in the sense that if any algorithm can build a valid (feasible) schedule on a single processor, then the EDF algorithm also builds a valid (feasible) schedule.


Earliest Deadline First Implementation

❚  At each preemption, sort tasks by time to deadline.
❚  Needs an efficient sorting algorithm — O(n log n).
❚  Choose the ready task closest to its deadline.
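A minimal discrete-time preemptive EDF simulation in Python for the example above (C1 = 2, T1 = 5; C2 = 4, T2 = 7; deadlines equal periods; deadline-miss detection is omitted for brevity):

    def edf(tasks, horizon):
        """tasks: (C, T) pairs; returns which task runs at each time step."""
        jobs, trace = [], []                     # job = [deadline, task id, work left]
        for t in range(horizon):
            for i, (c, p) in enumerate(tasks):
                if t % p == 0:
                    jobs.append([t + p, i, c])   # release a new job
            jobs.sort()                          # earliest deadline first
            if jobs:
                jobs[0][2] -= 1                  # run the most urgent job one step
                trace.append(jobs[0][1])
                if jobs[0][2] == 0:
                    jobs.pop(0)
            else:
                trace.append(None)               # processor idle
        return trace

    print(edf([(2, 5), (4, 7)], 14))
    # [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]: every deadline is met,
    # matching the EDF schedule in the figure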

What Is Missing

❚  Data dependencies between tasks
❚  Context switching time (jitter)
❚  Multiprocessor scheduling
❚  Memory considerations
❚  ...


Data dependencies

❚  Data dependencies allow us to improve utilization:
❙  they restrict the combinations of processes that can run simultaneously.
❚  P1 and P2 can't run simultaneously.

[Figure: process P1 with a data dependency to P2.]

Context-switching time

❚  Non-zero context switch time can push the limits of a tight schedule.
❚  The effects are hard to calculate — they depend on the order of context switches.
❚  In practice, OS context switch overhead is small.


Literature

❚  P. Eles, K. Kuchcinski and Z. Peng, System Synthesis with VHDL, Kluwer Academic Publishers, 1998.
❚  Any book on real-time scheduling, e.g., Alan Burns and Andy Wellings, Real-Time Systems and Programming Languages, Addison-Wesley, 1996.

