Adaptive Partition Scheduling

Adaptive Partition Scheduling Part 1: Why we did it
Cool stuff from QNX
A.Danko
January 24, 2012
Why?
Evolution of schedulers
Timeline
> Priority pre-emptive (SCHED_FIFO)
> Timeslicing (SCHED_RR)
> Time-varying priority
> SCHED_SPORADIC
> Really clever time-varying
> Fair-share scheduling
> Adaptive configuration
Yes, but:
> System locks up
> Backhoes and Mother's Day
> Untuneable for more than 1 application
> US Military Satcom
> Hard to manage share interactions
> Not invented until now
Why?
Evolution: Lessons learned

Numerical priorities are chosen by applications, but system scheduling behavior must be designed globally.
Degradation and overload: priorities are not constants. Importance of work depends on circumstances.
> Modes: normal operation, restart, emergency maintenance
Scheduling strategy needs to be based on the unit of work, but what we have is communicating threads.
Must measure real-time behavior.
> 0.1% accuracy
Want to specify shares as global percentages
> Applications don't get to pick their importance or shares. System engineers do.
Need to throttle CPU usage without losing real-time latencies.

Design
What is Partitioning?
General Answer
Separation of work
To isolate:
> CPU usage
> memory usage
> system resource usage
> failures
QNX Answer
POSIX-compatible design which can be applied to existing systems with little or no recoding
Adaptive Partition Scheduling: a global hard real-time scheduler with overload protection and CPU guarantees
> Separation of work based on working for a common purpose
Runtime typed memory and kernel object guarantees and limits
> With full inheritance and accounting for all children
Persistent storage (file system) guarantees and limits
Process model for fault isolation
Dynamic configuration
Design
Principles
Scheduler must not trigger an overload
> Overhead may not increase with # of threads
[Figure: throughput vs. offered load]
Real-time during underload
> Same behavior as today
Real-time during overload
> At least for interrupt handling
Must also be a fair-share scheduler
> Global scheduler algorithm
> Globally configured
Must mesh with current QNX architecture
> Preemptive priority, individual thread scheduling
> Heavy use of message passing
Easy to drop onto existing applications
Can't be a bag on the side
Insert picture of Juggling Watermelons here
Simple enough for customers to use
> Engineerable
> Reconfigure on the fly
Design
Counting time
What does 14% CPU mean?
> CPU usage is calculated over a sliding window.
[Figure: sliding window running from T = -100 ms to T = now]
> Partition usage: ns of CPU time executed during the last sliding window, expressed as a percentage
> Partition budget: guaranteed percentage of CPU time, balanced over the sliding window
Accuracy:
> Counting ticks is not enough.
> Micro-billing is used to track actual CPU utilization even when threads don't use their whole timeslice.
> Micro- and nanosecond resolution
> Threads are billed based on real usage, not statistics
> Trade off maximum READY-state latency against accuracy of CPU budgeting
> 100 ms window -> 1% accuracy or better
> Internal arithmetic accurate to 0.5% or better
windowsize is configurable as an argument to the kernel at boot
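To make "14% CPU" concrete, here is a minimal sketch of usage measured over a sliding window. It is an illustration only: the class name, method names and the stored list of bills are assumptions, and the kernel itself is described later as history-less and free of floating point and divides, so it does not work this way internally.

```python
from collections import deque


class SlidingWindowUsage:
    """Tracks CPU time billed to one partition over a sliding window (times in ns)."""

    def __init__(self, window_ns=100_000_000):  # 100 ms window, as on the slide
        self.window_ns = window_ns
        self.bills = deque()  # (start_ns, duration_ns), in time order

    def bill(self, start_ns, duration_ns):
        """Record the time a thread actually ran, however short (micro-billing)."""
        self.bills.append((start_ns, duration_ns))

    def usage_percent(self, now_ns):
        """CPU time executed during the last window, as a percentage of the window."""
        window_start = now_ns - self.window_ns
        # Drop bills that ended before the window began.
        while self.bills and self.bills[0][0] + self.bills[0][1] <= window_start:
            self.bills.popleft()
        used = 0
        for start, duration in self.bills:
            # Count only the part of each bill that overlaps the window.
            lo = max(start, window_start)
            hi = min(start + duration, now_ns)
            used += max(0, hi - lo)
        return 100.0 * used / self.window_ns
```

A partition that ran for 14 ms of the last 100 ms reports 14%. As the window slides past that run, the figure falls back toward 0%.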
Design
Who's got time: Partition Inheritance
[Figure: threads in Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), both with CPU budget available, send messages to the receive threads of a File System process. Threads are labeled by priority.]
Resource manager threads work on behalf of the sender
Priority and adaptive partition are inherited on receive
> Execution time in the server is billed to the client's partition
This allows proper accounting for shared resources
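The inheritance rule can be sketched as a small model. Everything named here (`Thread`, `receive`, `reply`, `run`, the `"System"` partition, the ledger dict) is hypothetical and only illustrates the bookkeeping. The slide does not say what happens after the reply, so restoring the server's own priority and partition is an assumption.

```python
class Thread:
    def __init__(self, name, priority, partition):
        self.name = name
        self.priority = priority
        self.partition = partition   # partition the thread belongs to
        self.billed_to = partition   # partition currently charged for its run time


def receive(server, client):
    """On message receive, the server thread inherits the sender's priority and partition."""
    server.priority = client.priority
    server.billed_to = client.partition


def reply(server, home_priority):
    """Assumed: after replying, the server returns to its own priority and partition."""
    server.priority = home_priority
    server.billed_to = server.partition


def run(thread, ns, ledger):
    """Charge ns of execution to whichever partition the thread is working for."""
    ledger[thread.billed_to] = ledger.get(thread.billed_to, 0) + ns


# Usage: a file system thread serving a multi-media client.
ledger = {}
fs = Thread("fs_worker", 10, "System")
player = Thread("player", 9, "Multi-media")
receive(fs, player)
run(fs, 250_000, ledger)   # billed to "Multi-media", not "System"
reply(fs, 10)
```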

Design
Real time: Behavior under normal load

[Figure: threads shown as Blocked, Ready or Running, labeled by priority. Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application) both have CPU budget available.]
Hard real-time scheduler under normal load
Running thread selected as highest priority READY thread
No delay on scheduling if adaptive partition has budget
Design
Out of time: Behavior under overload

[Figure: threads shown as Blocked, Ready or Running, labeled by priority. Adaptive Partition 1 (Multi-media) has CPU budget available; Adaptive Partition 2 (Java application) has exceeded its CPU budget.]
Highest priority READY thread in a partition with budget runs
No delay on scheduling if adaptive partition has budget
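The normal-load and overload rules differ only in which partitions still have budget. A minimal sketch follows; the function name, the tuple layout and the priorities used in the example are illustrative assumptions, not values read from the figure.

```python
def select_thread(ready_threads, has_budget):
    """Pick the highest-priority READY thread whose partition still has budget.

    ready_threads: list of (priority, partition_name) tuples
    has_budget:    dict mapping partition_name -> bool
    """
    eligible = [t for t in ready_threads if has_budget[t[1]]]
    if not eligible:
        return None  # what happens when no partition has budget is not modeled here
    return max(eligible, key=lambda t: t[0])


ready = [(8, "Multi-media"), (10, "Java"), (6, "Java")]

# Normal load: every partition has budget, so plain priority scheduling applies.
normal = select_thread(ready, {"Multi-media": True, "Java": True})

# Overload: Java has exceeded its budget, so its priority-10 thread must wait.
overload = select_thread(ready, {"Multi-media": True, "Java": False})
```

Under normal load the priority-10 Java thread runs, exactly as a plain priority scheduler would choose. Under overload the priority-8 Multi-media thread runs instead.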
Design
Free Time: Behavior with unused CPU

[Figure: threads shown as Blocked or Running, labeled by priority. Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application) have both exceeded their CPU budgets; Adaptive Partition 3 has CPU budget available.]
If no partitions with remaining budget have READY threads, the highest priority READY thread is selected to run from the other partitions
This allows free time to be given based upon priority
> Free time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
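A sketch of the free-time rule and its payback, under stated assumptions: budgets are expressed as ns of a 100 ms window, partitions are called P1 to P3, and all function names are hypothetical.

```python
def select_with_free_time(ready_threads, used_ns, budget_ns):
    """Return (thread, is_free_time).

    ready_threads: list of (priority, partition_name) tuples
    used_ns, budget_ns: dicts of per-partition time used / allowed in the window
    Partitions with remaining budget are served first; if none of them has a
    READY thread, the highest-priority READY thread runs on free time.
    """
    if not ready_threads:
        return None, False
    in_budget = [t for t in ready_threads if used_ns[t[1]] < budget_ns[t[1]]]
    if in_budget:
        return max(in_budget, key=lambda t: t[0]), False
    return max(ready_threads, key=lambda t: t[0]), True


def bill(partition, ns, used_ns):
    """Free time is billed like any other time, so it counts against the window."""
    used_ns[partition] += ns


budget_ns = {"P1": 40_000_000, "P2": 30_000_000, "P3": 30_000_000}
used_ns = {"P1": 45_000_000, "P2": 30_000_000, "P3": 0}

# P3 has budget but nothing READY, so P1's priority-10 thread gets free time.
first = select_with_free_time([(10, "P1"), (7, "P2")], used_ns, budget_ns)
bill("P1", 5_000_000, used_ns)

# P3 becomes READY: it has budget, so it runs ahead of the higher priorities.
second = select_with_free_time([(10, "P1"), (7, "P2"), (4, "P3")], used_ns, budget_ns)
```

The second selection shows the payback: P1 is now further over budget, so it waits while P3's priority-4 thread runs.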
Design
Borrowed Time: Critical Threads

[Figure: threads shown as Blocked, Ready or Running, labeled by priority. Adaptive Partition 1 (Multi-media) has CPU budget available; Adaptive Partition 2 (Air Bag Control) has exceeded its CPU budget and contains a critical thread.]
Critical threads still run (based on priority) even if partition has no budget
Critical threads provide deterministic scheduling even in overload
Critical threads are given critical budget and can go into short-term debt
> Critical time is accounted and has to be repaid
> Exceeding critical budget is considered an error and causes notification/action
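The critical-budget bookkeeping might be modeled roughly as below. The class, its fields and the budget figures are assumptions for illustration. As a simplification, a bill is charged as critical or not according to the partition's state when the run starts, and repayment over later windows is not modeled.

```python
class Partition:
    def __init__(self, name, budget_ns, critical_budget_ns):
        self.name = name
        self.budget_ns = budget_ns
        self.critical_budget_ns = critical_budget_ns
        self.used_ns = 0            # all time billed in the current window
        self.critical_used_ns = 0   # time run on critical budget (the debt)

    def may_run(self, is_critical):
        """A thread may run if its partition has budget, or if the thread is critical."""
        return self.used_ns < self.budget_ns or is_critical

    def bill(self, ns, is_critical):
        """Bill run time; return True if the critical budget has been exceeded."""
        over_budget = self.used_ns >= self.budget_ns
        self.used_ns += ns
        if is_critical and over_budget:
            self.critical_used_ns += ns
        return self.critical_used_ns > self.critical_budget_ns


airbag = Partition("Air Bag Control", budget_ns=10_000_000, critical_budget_ns=2_000_000)
airbag.bill(10_000_000, is_critical=False)          # normal budget is now used up
ok_for_normal = airbag.may_run(is_critical=False)   # ordinary threads must wait
ok_for_critical = airbag.may_run(is_critical=True)  # critical threads still run
within = airbag.bill(1_500_000, is_critical=True)   # debt is within the critical budget
exceeded = airbag.bill(1_000_000, is_critical=True) # debt exceeds it: error condition
```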
Design
Equal time.
How to choose between partitions of equal priority
> Unimportant?
> Many threads run at default priority, therefore equal priority
Possible algorithms:
> round robin
> favor partition with most free time
> favor longest waiter
Requirement:
> Minimize latencies during underload
> WBN: divide free time by % CPU share.
Solution:
Interleave partitions by ratio of partition shares
We found a clever way to do that, so it's in the patent.
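The slides do not describe the patented method. As a stand-in, one well-known way to interleave choices in proportion to shares is smooth weighted round-robin, sketched here with hypothetical partition names and shares. It is not claimed to be what QNX does.

```python
def interleave(shares, picks):
    """Yield partition names so that each appears in proportion to its share.

    Smooth weighted round-robin: every round each partition earns credit equal
    to its share; the partition with the most credit runs and pays back the total.
    """
    total = sum(shares.values())
    credit = {name: 0 for name in shares}
    for _ in range(picks):
        for name, share in shares.items():
            credit[name] += share
        chosen = max(credit, key=credit.get)  # ties go to the first partition listed
        credit[chosen] -= total
        yield chosen


order = list(interleave({"A": 50, "B": 25, "C": 25}, 100))
```

With shares of 50/25/25 the picks come out as A, B, C, A, repeating. The partitions are spread through the sequence instead of each running in one long burst.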
How it does it
[Figure: block diagram with the labels uKernel, process creation, messaging, libmod_aps.a, per-partition ready queue, scheduler, clock interrupt handler, ready(), block(), select_thread()]
For all partitions p, define the merit
    m(p) = (bud(p) || crit(p), prio(p), run_t/wsize/bud(p))
Then schedule partition p_s, defined by
    rdy(p_s) and m(p_s) < m(p_i) for all i != s
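One reading of the merit tuple can be sketched as a sort key. The slide does not spell out how the tuple is ordered, so this sketch assumes: partitions that have budget or are critical come first, then higher priority, then the smaller fraction of budget used. The dict fields are hypothetical, budgets are assumed to be greater than zero, and floating-point division is used for clarity only.

```python
def merit(p, window_ns):
    """Sort key for partition p (a dict); the smallest tuple is scheduled."""
    budget_ns = p["budget_pct"] * window_ns // 100
    can_run = p["used_ns"] < budget_ns or p["critical"]   # bud(p) || crit(p)
    fraction_used = p["used_ns"] / budget_ns              # run_t / wsize / bud(p)
    # False sorts before True, so invert can_run; negate prio so higher wins.
    return (not can_run, -p["prio"], fraction_used)


def select_partition(partitions, window_ns=100_000_000):
    """Choose the READY partition with the best merit, or None if none is READY."""
    ready = [p for p in partitions if p["ready"]]
    return min(ready, key=lambda p: merit(p, window_ns)) if ready else None


parts = [
    {"name": "A", "prio": 10, "budget_pct": 40, "used_ns": 45_000_000,
     "critical": False, "ready": True},
    {"name": "B", "prio": 8, "budget_pct": 30, "used_ns": 10_000_000,
     "critical": False, "ready": True},
    {"name": "C", "prio": 8, "budget_pct": 30, "used_ns": 3_000_000,
     "critical": False, "ready": True},
]
chosen = select_partition(parts)["name"]
```

Here A is over budget, and B and C tie on priority, so C wins because it has used the smaller fraction of its budget.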
Overhead: Fancy, but is it fast?
Scheduling overhead increases with:
> number of partitions
> number of messages/sec
> number of clock interrupts/sec, i.e. ClockPeriod()
* does not increase with number of threads *
Free or almost free operations:
> Inheriting partition as part of message receive
> Joining a thread to a partition
> Dynamically changing budgets
Computational requirements
> 32-bit multiply, 64-bit add
> no floating point
> no divides
> no address space swapping
> short-circuit calculation of merit function
> no inter-CPU messaging on SMP
> history-less algorithm
Overhead typically 1% of total CPU
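The slides do not show how ratios are compared without divides. A standard trick that fits the "32-bit multiply, no divides" constraint is cross-multiplication, sketched below; whether the kernel uses it is an assumption, and the function name is hypothetical.

```python
def less_used(run_a, bud_a, run_b, bud_b):
    """True if run_a/bud_a < run_b/bud_b, computed without dividing.

    All arguments are non-negative integers and both budgets are > 0, so
    cross-multiplying preserves the inequality. The product of two 32-bit
    values always fits in 64 bits.
    """
    return run_a * bud_b < run_b * bud_a


# 3 ms used of a 30% budget is a smaller fraction than 10 ms of a 30% budget.
c_before_b = less_used(3_000_000, 30, 10_000_000, 30)
# 20 ms of a 40% budget is a larger fraction than 10 ms of a 30% budget.
a_before_b = less_used(20_000_000, 40, 10_000_000, 30)
```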

Any queries?
