Adaptive Partition Scheduling
A.Danko
Why?
Evolution of schedulers
Timeline
Priority pre-emptive
> SCHED_FIFO
Timeslicing
> SCHED_RR
Time-varying priority
Yes, but:
System locks up
Backhoes and Mother's Day
Untuneable for more than 1 application
US Military Satcom
Hard to manage share interactions
Not invented until now
Why?
Scheduling strategy needs to be based on a unit of work, but what we have is communicating threads.
Must measure real-time behavior.
> 0.1 % accuracy
Design
What is Partitioning?
General Answer
Separation of work
To isolate:
> CPU usage
> memory usage
> system resource usage
> failures
QNX Answer
POSIX-compatible design which can be applied to existing systems with little or no recoding
Adaptive Partition Scheduling
A global hard real-time scheduler with overload protection and CPU guarantees
> Separation of work based on working for a common purpose
Persistent storage (file system) guarantees and limits
Process model for fault isolation
Dynamic configuration
Cool Stuff from QNX
Design
Principles
Scheduler must not trigger an overload
> Overhead may not increase with # of threads
[Figure: throughput versus offered load]
Design
Counting time
What does 14% cpu mean?
> ns of CPU time executed during the last sliding window, expressed as a percentage
> Guaranteed percentage of CPU time, balanced over the sliding window
> CPU usage is calculated over a sliding window.
[Figure: sliding window from T = -100ms to T = now]
Accuracy:
> Counting ticks is not enough.
> Micro-billing is used to track actual CPU utilization even when threads don't use their whole timeslice.
> Micro- and nanosecond resolution
> Threads are billed based on real usage, not statistics
> Trade off maximum READY-state latency against accuracy of CPU budgeting
> 100ms window -> 1% accuracy or better.
> Internal arithmetic accurate to 0.5% or better
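As a rough illustration of micro-billing over a sliding window, the sketch below records the intervals a partition actually ran and reports how much of the last 100ms they cover. The class and method names are hypothetical, and a Python float is used for the final percentage only for readability; this is an illustration of the idea, not the QNX implementation.

```python
from collections import deque

WINDOW_NS = 100_000_000  # 100 ms averaging window, as on the slide


class PartitionUsage:
    """Tracks CPU time billed to one partition over a sliding window."""

    def __init__(self, window_ns=WINDOW_NS):
        self.window_ns = window_ns
        self.runs = deque()  # (start_ns, end_ns) intervals actually executed

    def bill(self, start_ns, end_ns):
        """Record real execution time, even if shorter than a timeslice."""
        self.runs.append((start_ns, end_ns))

    def usage_percent(self, now_ns):
        """CPU time executed in [now - window, now], as a percentage."""
        window_start = now_ns - self.window_ns
        # Drop intervals that ended before the window began.
        while self.runs and self.runs[0][1] <= window_start:
            self.runs.popleft()
        # Clip each remaining interval to the window before summing.
        used = sum(min(end, now_ns) - max(start, window_start)
                   for start, end in self.runs if start < now_ns)
        return 100.0 * used / self.window_ns


usage = PartitionUsage()
usage.bill(0, 4_000_000)             # ran for 4 ms
usage.bill(50_000_000, 60_000_000)   # ran for 10 ms
print(usage.usage_percent(100_000_000))  # 14.0
```

As the window slides forward, the early 4 ms run falls out of it and the reported usage drops, which is why a partition that was over budget becomes eligible again without any explicit reset.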
Design
[Figure: a message passed to receive threads, with thread priorities shown]
Resource manager threads work on behalf of the sender
Priority and adaptive partition are inherited on receive
> Execution time in the server is billed to the client's partition
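The inheritance rule can be sketched as follows: while a server thread handles a message, it takes on the sender's priority and partition, so any time it spends is charged to the sender's partition. The `Thread` type, the `receive` and `bill` helpers, and the partition names are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int
    partition: str


def receive(server, sender):
    """Return the server thread's effective identity while serving `sender`."""
    # Priority and partition are inherited from the sender on receive.
    return Thread(server.name, sender.priority, sender.partition)


def bill(ledger, thread, ns):
    """Charge `ns` of execution time to the thread's current partition."""
    ledger[thread.partition] = ledger.get(thread.partition, 0) + ns


client = Thread("player", 10, "multimedia")
server = Thread("fs_resmgr", 6, "system")

ledger = {}
working = receive(server, client)
bill(ledger, working, 250_000)  # server work done on behalf of the client

print(working.priority)  # 10
print(ledger)            # {'multimedia': 250000}
```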
Design
Hard real-time scheduler under normal load
Running thread selected as highest priority READY thread
No delay on scheduling if adaptive partition has budget
January 24, 2012
Design
Highest priority READY thread in partition with budget runs
No delay on scheduling if adaptive partition has budget
Design
[Figure: Adaptive Partition 1 (Multi-media): CPU budget exceeded. Adaptive Partition 2 (Java application): CPU budget exceeded. Thread priorities shown: 10, 9, 4]
If no partitions with remaining budget have READY threads, the highest priority READY thread is selected to run from other partitions
This allows free time to be given based upon priority
> Free time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
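The selection rule from the last three slides can be sketched in a few lines: prefer the highest priority READY thread whose partition still has budget, and fall back to plain priority order (free time) only when no such thread exists. The function signature, the tuple layout and the "system" partition are assumptions made for this example, and accounting of free time is left out.

```python
def select_thread(ready, has_budget):
    """Pick the next thread to run.

    ready: list of (priority, partition, name) tuples for READY threads.
    has_budget: dict mapping partition name to True/False.
    """
    if not ready:
        return None
    in_budget = [t for t in ready if has_budget[t[1]]]
    # If no partition with budget has a READY thread, hand out free time
    # to the highest priority READY thread of any partition.
    candidates = in_budget or ready
    return max(candidates, key=lambda t: t[0])


ready = [(10, "multimedia", "decoder"),
         (9, "java", "gc"),
         (4, "system", "logger")]
budgets = {"multimedia": False, "java": False, "system": True}

print(select_thread(ready, budgets))      # (4, 'system', 'logger')
print(select_thread(ready[:2], budgets))  # (10, 'multimedia', 'decoder')
```

In the first call the priority 4 thread runs because it is the only one whose partition has budget left; in the second, nothing in budget is READY, so the idle time goes to the priority 10 thread.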
Design
Critical threads still run (based on priority) even if partition has no budget
Critical threads provide deterministic scheduling even in overload
Critical threads are given critical budget and can go into short-term debt
> Critical time is accounted and has to be repaid
> Exceeding critical budget is considered an error and causes notification/action
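A minimal sketch of the critical-budget bookkeeping, assuming one counter for normal usage and one for time run while over budget. The `Partition` class, its field names and the `critical_budget_exceeded` flag are hypothetical, and free-time scheduling of non-critical threads is deliberately ignored here.

```python
class Partition:
    """Budget accounting for one partition within one averaging window."""

    def __init__(self, budget_ns, critical_budget_ns):
        self.budget_ns = budget_ns
        self.critical_budget_ns = critical_budget_ns
        self.used_ns = 0            # all time billed to the partition
        self.critical_used_ns = 0   # time run while already over budget
        self.critical_budget_exceeded = False

    def run(self, ns, critical=False):
        """Bill `ns` of execution; return True if the thread may run."""
        over_budget = self.used_ns >= self.budget_ns
        if over_budget and not critical:
            return False            # out of budget and not critical
        self.used_ns += ns          # critical time is accounted too
        if over_budget:
            self.critical_used_ns += ns
            if self.critical_used_ns > self.critical_budget_ns:
                # Treated as an error: a real system would notify or act.
                self.critical_budget_exceeded = True
        return True

    def debt_ns(self):
        """Time that has to be repaid in later windows."""
        return max(0, self.used_ns - self.budget_ns)


p = Partition(budget_ns=10_000_000, critical_budget_ns=2_000_000)
p.run(10_000_000)                      # uses the whole budget
print(p.run(1_000_000))                # False
print(p.run(1_000_000, critical=True)) # True
print(p.debt_ns())                     # 1000000
```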
Design
Equal time.
How to choose between partitions of equal priority
> Unimportant?
> Many threads run at default priority, therefore equal priority
Possible algorithms:
> round robin
> favor partition with most free time
> favor longest waiter
Requirement:
> Minimize latencies during underload
> WBN: divide free time by % cpu share.
Solution:
Interleave partitions by ratio of partition shares
We found a clever way to do that, so it's in the patent.
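The slides do not describe the patented method, so the sketch below substitutes a simple greedy rule to show what "interleave by ratio of shares" can look like: at each slot, run the partition that has received the fewest slots relative to its share. The `interleave` function and the share values are assumptions for illustration only.

```python
from fractions import Fraction


def interleave(shares, slots):
    """Return a run order in which each partition appears in proportion
    to its share.

    shares: dict mapping partition name to its percentage share.
    slots: number of scheduling slots to hand out.
    """
    counts = {name: 0 for name in shares}
    order = []
    for _ in range(slots):
        # Lowest slots-received-per-share goes next; ties fall to the
        # partition listed first in `shares`.
        name = min(shares, key=lambda n: Fraction(counts[n], shares[n]))
        counts[name] += 1
        order.append(name)
    return order


print("".join(interleave({"A": 50, "B": 30, "C": 20}, 10)))  # ABCABACABA
```

Over ten slots the partitions run 5, 3 and 2 times, matching the 50/30/20 split, and the runs are spread out instead of being served in one block per partition.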
How it does it
uKernel
Process creation
libmod_aps.a
messaging
Per-partition Ready Q
Scheduler
clock interrupt handler
ready()
block()
select_thread()
For all partitions p, define the merit function:
m(p) -> (bud(p) || crit(p), prio(p), run_t / wsize / bud(p))
Then schedule partition p_s, defined by:
p_s -> rdy(p_s) and (m(p_s) < m(p_i)) for all i != s
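One possible reading of the merit function, written out as a sort key. The interpretation is an assumption: the first element says whether the partition has budget left or is running a critical thread, the second is the priority of its best READY thread, and the third is the fraction of its budget already used in the window. The key is encoded so that a smaller tuple is better, and all field names are hypothetical.

```python
from fractions import Fraction

WSIZE_NS = 100_000_000  # averaging window size


def merit(p, wsize_ns):
    """Smaller is better: eligible first, then higher priority,
    then the smaller fraction of budget consumed."""
    has_budget = p["run_ns"] * 100 < p["budget_pct"] * wsize_ns
    eligible = has_budget or p["critical"]
    used_fraction = Fraction(p["run_ns"] * 100, wsize_ns * p["budget_pct"])
    return (0 if eligible else 1, -p["prio"], used_fraction)


def select_partition(partitions, wsize_ns):
    """Return the READY partition with the best merit, or None."""
    ready = [p for p in partitions if p["ready"]]
    return min(ready, key=lambda p: merit(p, wsize_ns), default=None)


partitions = [
    {"name": "mm", "budget_pct": 50, "run_ns": 60_000_000,
     "prio": 10, "critical": False, "ready": True},
    {"name": "java", "budget_pct": 30, "run_ns": 10_000_000,
     "prio": 9, "critical": False, "ready": True},
    {"name": "sys", "budget_pct": 20, "run_ns": 5_000_000,
     "prio": 9, "critical": False, "ready": True},
]

print(select_partition(partitions, WSIZE_NS)["name"])  # sys
```

Here "mm" is over budget, and "java" and "sys" tie on priority, so the partition that has used the smaller fraction of its budget (1/4 against 1/3) is chosen.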
Computational requirements
> 32-bit multiply, 64-bit add
> no floating point
> no divides
> no address space swapping
> short-circuit calculation of merit function
> no inter-CPU messaging on SMP
> history-less algorithm
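To show how a comparison of this kind can avoid divides and floating point, the sketch below compares two partitions using integer multiplies only, and evaluates the more expensive term only when the cheaper ones tie. The `Candidate` type and the `better` function are hypothetical, and Python integers are unbounded, so the 32-bit and 64-bit widths from the slide are not modelled.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    eligible: bool    # has budget left, or is running a critical thread
    prio: int         # priority of the partition's best READY thread
    run_ns: int       # CPU time used in the current window
    budget_pct: int   # guaranteed share of the window, in percent


def better(a, b):
    """True if partition `a` should run before partition `b`."""
    # Short-circuit: the cheap comparisons decide most cases.
    if a.eligible != b.eligible:
        return a.eligible
    if a.prio != b.prio:
        return a.prio > b.prio
    # run_a / bud_a < run_b / bud_b, rewritten by cross-multiplication
    # so that no division is needed.
    return a.run_ns * b.budget_pct < b.run_ns * a.budget_pct


java = Candidate(True, 9, 10_000_000, 30)
system = Candidate(True, 9, 5_000_000, 20)

print(better(system, java))  # True
print(better(java, system))  # False
```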
Any queries?