
Hardware/Software Codesign
Summary — Andreas Biri, D-ITET, 27.01.18

1. Introduction

Embedded Systems are designed for specialized processes
- systems consist of dedicated, specialized hardware
- require design optimizations targeted at the intended usage
(performance, cost, power consumption, reliability)

Embedded systems (ES): information processing systems embedded into a larger product
- deliver enhanced functionality of an existing system
- work on parallel & distributed target platforms
- require assured reliability, guarantees & safety
(predictability & bounded execution times are critical)
- usage known at design-time, not programmable by the user
- fixed run-time requirements (at lowest possible cost)

Often, such systems require real-time processing while still offering low power consumption for energy independence.

Multiprocessor system-on-a-chip (MPSoC): dedicated system, highly specialized with dedicated HW
- application characterized by a variety of tasks

General Purpose Computer: broad class of applications
- programmable by the end user, usage might vary over time

Design Challenges

Application complexity: want adaptability, but specialized
- large systems must often provide legacy compatibility
- mixture of event-driven & data-flow tasks

Target system complexity: design space is very large
- different technologies, processor types, designs
- use finished systems-on-chip, distributed implementation

Constraints & design objectives
- cost, power consumption, timing constraints, size, predictability, processing power, temperature

2. Specification & Models of Computation

Very specific systems for one description of a model
- restricted language, restricted rules
- offer more possibilities to optimize (more pre-knowledge)
- ease-of-use & better analysis (can verify correctness)
- efficient usage, high abstraction level

Model of Computation: “What happens inside & how do the parts interact?”
- components & execution model for computations
- communication model for information exchange

Levels of Abstraction

Specification: “Computer requires FPGA, DSP, …”
- model formally describes selected system properties
- consists of data and associated methods

Synthesis: step from abstraction to real system
- connects levels of abstraction (refinement)
- from problem-level description to implementation

Modelling: connects implementation to problem level
- estimation of lower-layer properties to improve the design

Structural view: abstract layout of components
Behavioural view: describes the function of a component
Physical view: effective hardware as seen on the chip

Hardware/Software Mapping: partitioning of the system into programmable components (software) & specialized HW
- can adapt the implementation HW to match the problem set

Models of Computation

Observer: subject changes state, observers are notified
- one-to-many dependency between subject & observers

Synchronized: causes processes to run sequentially
- solves race conditions, but bad performance as not parallel
- can easily cause deadlocks if dependences are circular

Discrete Event model: associate an event with a (trigger) time
- search for the next event to simulate, independent of real time
- allows efficient simulation, as only actions are computed
- VHDL (hardware description language): sensitivity lists

Finite state machines (FSM): abstract process representation
Differential equations: describe a component mathematically

Shared memory: potential race conditions & deadlocks
- Critical section: must receive exclusive access to the resource

Asynchronous message passing: non-blocking
- the sender does not have to wait until the message is delivered
- potential buffer overflow if the receiver doesn’t read enough

Synchronous message passing: blocking
- requires simultaneous actions by both end components
- automatically synchronizes the devices
- Communicating Sequential Processes (CSP): rendez-vous based communication, both indicate when ready for it
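The observer model of computation mentioned above can be sketched in a few lines; the class and observer names below are illustrative, not from the lecture:

```python
class Subject:
    """Sketch of the observer pattern: the subject holds a
    one-to-many list of observers and notifies each of them
    whenever its state changes."""
    def __init__(self):
        self._observers = []
        self.state = None

    def attach(self, observer):
        self._observers.append(observer)

    def set_state(self, state):
        self.state = state
        for notify in self._observers:   # one-to-many notification
            notify(state)

log = []
subject = Subject()
subject.attach(lambda s: log.append(("display", s)))
subject.attach(lambda s: log.append(("logger", s)))
subject.set_state(42)
print(log)  # → [('display', 42), ('logger', 42)]
```

Any number of observers can be attached without the subject knowing them, which is exactly the one-to-many dependency.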
Specification requirements

Hierarchy: dependencies between components
- Behavioural hierarchy: states, processes, procedures
- Structural hierarchy: processors, racks, circuit boards
Timing behaviour: mostly required to be tightly bounded
State-oriented behaviour: required for reactive systems
Dataflow-oriented behaviour: parts send data streams

Classical automata: input changes state & output (FSM)
- complex graphs are difficult to read & analyse by humans
- Moore: output only depends on the state, not the input
- Mealy: output depends on state and input

2.1 StateCharts (2-25)

Introduce hierarchies to increase readability:
- Super-state: can have internal substates
- Basic state: its super-state is called its ancestor state

OR-super-state: is in exactly one of the substates
AND-super-state: is in all of the immediate sub-states

Computation of state sets: traverse the tree representation
- OR-super-states: addition of sub-states
- AND-super-states: multiplication of sub-states

Timers: indicated by special edges, traversed after timeout

Besides states, can use variables to keep information:
- action: variable changes as a result of a state transition
(multiple actions executed simultaneously: (a := b; b := a))
- condition: dependences of state transitions

Simulation phase

All edge labels are evaluated in 3 different phases:
1. Effects of external changes & conditions are evaluated
2. The set of transitions to be made is computed
3. Transitions become effective (simultaneously)

Can also consider internal events instantaneous
- external events are only considered in a stable state, i.e. when no more internal steps are conducted & the status remains
- the state diagram only represents stable states

StateCharts solve some, but not all problems:
- hierarchy allows nesting of OR & AND states
- can auto-generate C code (but often inefficient)
- no object-orientation & structural hierarchy
- not useful for distributed applications
(as these have asynchronous events & executions)

Specification & Description Language (SDL): unambiguous specification of reactive & distributed systems
- allows asynchronous message passing
- can still have non-determinism if race conditions occur
(e.g. if we have multiple inputs for the same FIFO queue)

2.2 Data-Flow Models (2-55)

Try to make the result independent of time, focus on data
- processes communicate through FIFO buffers
- one buffer per connection to avoid time dependence

All processes run simultaneously with imperative code
- processes can only communicate through buffers
- map easily to parallel hardware / block-diagram specs
- fuzzy & difficult to implement with balanced rates

Kahn Process Network
- read: destructive & blocking (empty queue → busy-wait)
- write: non-blocking
- FIFO queues are of infinite size
- determinate: no random variables, always know which queue will be read next as reads are blocking (cannot peek & decide)

Random: the information known about the system & its input is not sufficient to determine its outputs (non-deterministic)
Determinate: the histories of the channels depend only on the inputs
- independent of timing, state, hardware; only the function

Kahn process: monotonic mapping of inputs to outputs
- creates output solely based on previous input

Adding non-determinacy can occur in multiple ways:
- allow processes to test for emptiness of input channels
- allow shared channels (read or write)
- allow shared data between processes (variables)

Scheduling Kahn networks

Responsibility of the system, not the programmer
- bounded memory (buffers) can overflow if not managed

Tom Parks’ algorithm: iteratively increase buffer sizes
1. Start with the network with blocking writes
2. Use a scheduling which doesn’t stall if not all block
3. Run as long as no deadlocks occur
4. On deadlock, increase the size of the smallest buffer

Finite buffer sizes: limit buffers by introducing a reverse channel
- a process that wants to send always needs to first read from the reverse channel
- blocks if it already wrote too much, will wait for a token

Kahn process networks offer various advantages:
- the scheduling algorithm does not affect the functional behaviour
- match stream-based processing & explicit parallelism
- easy mapping to distributed & multi-processor platforms

Synchronous Dataflow (SDF): allows a compile-time schedule
- each process reads/writes a fixed number of tokens each time
- uses relative execution rates found by solving linear equations
- requires rank n − 1 for n processes & sufficient initial tokens
- the period is found when the same number of initial tokens is reached again
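The SDF balance equations can be solved mechanically for the repetition vector. A minimal sketch, assuming a graph encoded as (src, dst, produced, consumed) edges — an encoding chosen here for illustration, not lecture notation:

```python
from fractions import Fraction
from math import lcm

def repetition_vector(n_actors, edges):
    """Solve the SDF balance equations prod * q[src] = cons * q[dst]
    for every edge and return the smallest integer firing counts.
    edges: list of (src, dst, prod, cons) tuples."""
    q = [None] * n_actors
    q[0] = Fraction(1)
    changed = True
    while changed:                      # propagate rates along edges
        changed = False
        for src, dst, prod, cons in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * Fraction(prod, cons)
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * Fraction(cons, prod)
                changed = True
    if any(r is None for r in q):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # consistency: schedule exists
        assert q[src] * prod == q[dst] * cons, "inconsistent rates"
    scale = lcm(*(r.denominator for r in q))
    return [int(r * scale) for r in q]

# actor 0 produces 2 tokens per firing, actor 1 consumes 3:
# balance equation 2 * q0 = 3 * q1 → fire them 3 and 2 times
print(repetition_vector(2, [(0, 1, 2, 3)]))  # → [3, 2]
```

Running each actor its computed number of times returns every buffer to its initial token count, which is exactly the compile-time period.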
3. Mapping Application-Architecture

Allocation: select components
Binding: assign functions to components (run what where)
Scheduling: determine the execution order
Partitioning: allocation & binding
Mapping: binding & scheduling

Synthesis: implementation of the given specifications
- uses an underlying model of computation

Data Flow graph (DFG): shows operations & communication
- nodes: operations; edges: communication / data
- initial, abstract representation

Control Flow graph (CFG): node ≙ line of code
- includes loops and conditional statements

Architecture Specification: reflects structure & properties
- can be done at different abstraction levels

Mapping relates application & architecture specifications:
- binds processes to processors
- binds communication between processes to busses/paths
- specifies resource sharing disciplines & scheduling

DFG Application model (3-11): initial specification
- functional nodes V_P^f: tasks, procedures
- communication nodes V_P^c: data dependencies

Architecture model (3-12): describes the physical hardware
- functional resources V_A^f: processor, RISC, DSP
- bus resources V_A^c: shared bus, PTP bus, fiber

Specification graph: maps the application graph to the architecture
- action of binding abstract functions to actual hardware
- must occur after allocation
- each data flow node must have an outgoing edge
- communication must actually connect the correct HW

4. System Partitioning

Assign tasks to computing resources (& communication to networks)

Optimal partitioning can be achieved in various ways:
- compare design alternatives (design space exploration)
- estimate with analysis, simulation, prototyping
(if system parameters are unknown / not determinable)

Usually, one has conflicting design goals & constraints
- costs: cost of allocated components (should be minimal)
- latency: due to scheduling / resource sharing (parallelize)
- constraints give maximal parameters, try to find a solution

Cost functions: quantitative performance measurement
- system cost, latency, power consumption, weight, size
- try to minimize a function consisting of all variables
- a linear cost function weights & sums the individual costs

Partitioning methods

Exact methods: get an optimal solution with minimal costs
- enumeration: iterate through all solutions & compare
- integer linear program (ILP)

Heuristic methods: get a good solution with high probability
- Constructive: random mapping, hierarchical clustering
- Iterative: Kernighan-Lin algorithm, simulated annealing, evolutionary algorithms

4.1 Integer Linear Program (ILP) (4-10)

An integer programming model requires two ingredients:
- objective function: linear cost expression of integer variables
- constraints: limit the design space & optimization

ILP problem: minimize the objective function under the constraints

For partitioning, we can set up the following ILP:
- x_{i,k} ∈ {0,1}: determines whether object o_i is in block p_k

Partitioning: assign n objects to m blocks such that
- all objects are assigned / mapped (uniquely / once)
- costs are minimized & all constraints are kept

Load balancing: the maximal sum of all durations should be minimized (minimize T, where T ≥ processing time P_i)

Additional constraints: can e.g. bound the maximal number of objects in a single block

Maximizing the cost function: minimize the negative function

ILPs are very popular for synthesis problems
- acceptable run-time with guaranteed quality
(might be sub-optimal, as only searching integer values)
- scheduling can be integrated as well
- can add arbitrary constraints (however, hard to find)
- NP-complete (can take a long time if too complex)
- good starting point for designing heuristic optimizations
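The binary-assignment formulation above can be sketched with the enumeration exact method standing in for a real ILP solver; the cost matrix and function name are hypothetical, and side constraints such as load balancing or block capacity are omitted:

```python
from itertools import product

def partition_bruteforce(cost):
    """Exhaustive solution of the 0/1 assignment x[i][k]:
    each object i is mapped to exactly one block k (uniqueness
    constraint enforced by construction) while minimizing the
    summed mapping costs cost[i][k].  Only feasible for tiny
    instances; a real design would call an ILP solver."""
    n, m = len(cost), len(cost[0])
    best, best_assign = float("inf"), None
    # enumerate every assignment vector: object i -> block a[i]
    for a in product(range(m), repeat=n):
        total = sum(cost[i][a[i]] for i in range(n))
        if total < best:
            best, best_assign = total, a
    return best, best_assign

cost = [[3, 1],     # cost[i][k]: cost of object i on block k
        [2, 5],
        [4, 4]]
print(partition_bruteforce(cost))  # → (7, (1, 0, 0))
```

With n objects and m blocks the enumeration visits m^n assignments, which is why ILP solvers and heuristics take over beyond toy sizes.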

4.2 Heuristic methods (4-17)

Constructive methods

Try to find a good solution in a single computation

Random mapping: each object is randomly assigned to a block

Hierarchical clustering: stepwise group objects (i.e. assign them to the same blocks) by evaluating a closeness function
- always merge the two “closest” objects (maximal value)
- can stop after reaching the desired level of clustering

Iterative methods

Start at one point and try to improve in steps:
1. Start with an initial configuration
2. Search the neighbourhood (similar partitions) and select one as a candidate (slightly modified)
3. Evaluate the fitness function of the candidate
4. Stop when a criterion is fulfilled / after some time

Hill climbing: always take the neighbour with higher fitness
- if no neighbour is better, stop the execution
- local optimum as the best result (depends on initialization)
- can start at various points to get a good (& quick) estimate

KL: use information of previous runs to find the global optimum
Simulated annealing: use a complex acceptance rule (jumps)
Evolutionary algorithms: complex strategy to add entropy

Kernighan-Lin algorithm (4-29)

From all possible pairs of objects, virtually regroup the best
- from the remaining objects, continue until all are regrouped
- after n/2 turns, take the lowest-cost result & actually perform it

External costs E_i: from a node to nodes in the other partition
Internal costs I_i: from a node to nodes in the same partition
Desirability to move: D_i = E_i − I_i
Gain: g = D_x + D_y − 2 · c(x, y)

Simulated annealing: vary (randomly), always take better-cost neighbours but with some probability also take worse-cost ones
- gradual cooling: slowly decrease the probability of accepting worse solutions

5. Multi-Criteria Optimization

Network processor: executes communication workloads
- high performance for network packet processing

For optimization, we require:
- task model: specification of the task structure
- flow model: different usage scenarios

The implementation defines architecture, task mapping & scheduling while considering objectives & constraints
- objectives: maximize performance, minimize costs
- constraints: memory, delay, costs, size (conflicting)
- results in a performance model of the system

Black-box optimization: can only give inputs, observe the output and use the objective function to optimize the input

Constraints: only take feasible solutions, add a penalty otherwise

5.1 Multiobjective Optimization (5-12)

Different objectives are often not comparable
- however, there are clearly inferior solutions

Can use classical single-objective optimization methods
- simulated annealing, Kernighan-Lin, ILP
- decision making is done before the optimization
- map all dimensions to one using some cost function

Decision space: feasible set of alternatives
Objective space: image of the decision space under the objective function (“evaluated performance”)

Pareto-dominated: another solution is better or equal in all objectives
Pareto-optimal: not dominated by any other solution
Pareto-optimal front: set of pareto-optimal points

Population-based optimization:
Use evolutionary algorithms to get a set of solutions
- decision making is done after the optimization
- a function can then weight & map the different image points

5.2 Multiobjective Evolutionary Algos (5-24)

Evaluate a set of solutions simultaneously
- black-box optimization using randomization (no local minima)
- assumption: better solutions are found near good ones

1. Choose a set of initial solutions (parent set)
2. Mating selection: select some out of the parent set
3. Variation: use neighbourhood operators to generate a new children set
4. Determine the union of the children & parent sets
5. Environmental selection: eliminate bad solutions

Environmental selection

Criteria to choose which new solutions to keep:
- Optimality: take the ones close to the (unknown) front
- Diversity: should cover a large part of the objective space

Hypervolume indicator: should be maximized
- reference point: should be far away from the optimal point
(can strongly influence the chosen set by weighting the area)
- corresponds to the region dominated by the pareto-optimal front
- using dominated points will not increase the area
- all additional pareto-optimal solutions increase the area
- a better set (dominating the other) always has a larger area

Choose the solution which increases the hypervolume the least and throw it out of the parent set for the next round

Neighbourhood operators (5-37)

Work on representations of solutions (e.g. integer vectors)
Completeness: each solution has an encoding
Uniformity: all solutions are represented equally often
(else biased towards solutions with many encodings)
Feasibility: each encoding maps to a feasible solution
(e.g. make it a priority, not an absolute requirement)

Crossover: take 2 solutions & exchange properties
Mutation: randomly vary a property of a solution
(reorder, flip, replace with a different part)
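The iterative scheme with simulated annealing's acceptance rule and gradual cooling can be sketched as follows; the toy objective, the neighbourhood operator and the parameter values are assumptions chosen for illustration:

```python
import math, random

def simulated_annealing(cost, neighbour, x0, t0=10.0, cooling=0.95, steps=500):
    """Always accept a better neighbour; accept a worse one with
    probability exp(-delta / T); cool T gradually so that worse
    jumps become increasingly unlikely over time."""
    x, t, best = x0, t0, x0
    for _ in range(steps):
        y = neighbour(x)
        delta = cost(y) - cost(x)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y                 # jump: may move to a worse solution
        if cost(x) < cost(best):
            best = x              # remember the best solution seen
        t *= cooling              # gradual cooling
    return best

# toy objective with local minima; the neighbour perturbs x slightly
random.seed(0)                    # deterministic run for the example
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
result = simulated_annealing(f, lambda x: x + random.uniform(-0.5, 0.5), x0=0.0)
```

Because the best solution seen so far is tracked separately, the returned result is never worse than the starting point, even though the walk itself may temporarily climb uphill.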
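Pareto dominance and the hypervolume indicator discussed above can be sketched for two minimization objectives; the function names and sample points are illustrative:

```python
def dominates(a, b):
    """True if point a pareto-dominates b (minimization): a is no
    worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(front, ref):
    """Area dominated by a 2-D minimization front with respect to a
    reference point ref, which must lie beyond every front point."""
    front = sorted(front)               # ascending in the first objective
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (ref[0] - x) * (prev_y - y)   # sweep slab by slab
        prev_y = y
    return area

pts = [(1, 4), (2, 2), (3, 3), (4, 1)]
front = pareto_front(pts)               # (3, 3) is dominated by (2, 2)
print(front, hypervolume_2d(front, ref=(5, 5)))  # → [(1, 4), (2, 2), (4, 1)] 11.0
```

Dropping the dominated point (3, 3) leaves the hypervolume unchanged, while removing any front point would shrink it — the property environmental selection exploits.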
6. System Simulation

6.1 System classification

System: combination of components to perform a function not possible with the individual parts

Model: formal description of the system (abstraction)

State: contains all information necessary to determine the output together with the input for all future times

Discrete state models: countable number of states
- e.g. processors with registers
Continuous state models: actual analogue signals
Discrete time models: change only at discrete times
Continuous time models: time advances the system

Events: tuple of a value v and a tag t
- t = time: timed event
- t = sequence number: untimed event

Time-driven simulation: time is partitioned into intervals
- a simulation step is performed even if nothing happens
Event-driven simulation: discrete or continuous time base
- evaluation & state changes only at occurrences of events

Discrete Event System (DES): event-driven system
- the state evolution depends entirely on the occurrence of discrete events over time (not on the evolution of time)
- signals / streams: ordered and/or timed events
- processes: functions which act on signals or streams

6.2 Discrete event simulation (6-16)

Modules describe the entire system & allow separation:
- Behaviour: described using logic and algebraic expressions
- State: persistent variables inside these modules
- Communication: done through ports via signals
- Synchronization: done through events and signals

Event list: queue of events, processed in order
- organized as a priority queue (may include time)

Simulation time: represents the current value of time
- during a discrete event, the clock advances to the next event time

System modules: model subsystems of the simulated system
- process events, manipulate the event queue & system state
- a sensitivity list indicates whether a module is concerned

Zero-duration virtual time interval: delta-cycle (δ)
- prevents cause & effect events from coinciding at the same time instance (if they occurred instantly, the outcome would depend on the ordering & race conditions would occur)
- orders “simultaneous” events within a simulation cycle

SystemC (6-23)

System-level modelling language to simulate concurrent executions in embedded systems (HW, communication)
- event-driven simulation kernel for discrete-event models

Processes are the basic units of functionality:
- SC_THREAD: called once, runs forever (blocks if no input), can be suspended using wait() / resumed with notify()
- SC_METHOD: event-triggered, requires a sensitivity list, executes repeatedly when called without being suspended

Channel: container for communication & synchronization
- can have state (FIFOs), implements one or more interfaces
- check that modules have sufficient initial tokens!

6.3 Simulation at high abstraction levels

(Untimed) functional level: model functionality
- C/C++, Matlab: shared variables & messages
Transaction level: early SW development, timing
- SystemC: method calls to channels
Register transfer / pin level: HW design & verification
- Verilog, VHDL: wires and registers

7. Design Space Exploration

Optimal design criteria

Mappings: all possible bindings of tasks to the architecture
Request: operational cost of executing a given task
Binding: subset of mappings so that every task is bound to exactly one allocated resource (actual implementation)

Design constraints

Delay constraints: maximal time a packet can be processed
Throughput maximization: maximize packets per second
Cost minimization: implement only the needed resources
Conflicting usage scenarios: should be good for a mixture

Simple analysis model

Optimize to minimize the maximal processor & bus load
- want to spread the load over all CPUs, but not too much
- get numbers using static parameters, functional simulation & instruction-set simulation (using benchmarks)
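The event-list mechanism of discrete event simulation can be sketched with a priority queue; this minimal simulator (names invented for the sketch, delta-cycles omitted) lets the clock jump directly from event to event instead of stepping through empty intervals:

```python
import heapq

class DiscreteEventSim:
    """Minimal event-driven simulator: the event list is a priority
    queue ordered by timestamp; processing an event advances the
    simulation clock straight to that event's time."""
    def __init__(self):
        self.now = 0.0
        self._events = []   # heap of (time, seq, action)
        self._seq = 0       # tie-breaker: FIFO order at equal times

    def schedule(self, delay, action):
        heapq.heappush(self._events, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        log = []
        while self._events:
            self.now, _, action = heapq.heappop(self._events)
            log.append((self.now, action()))
        return log

sim = DiscreteEventSim()
sim.schedule(2.0, lambda: "timer")
sim.schedule(1.0, lambda: "packet")
trace = sim.run()
print(trace)  # → [(1.0, 'packet'), (2.0, 'timer')]
```

The sequence counter is a simple substitute for delta-cycle ordering: events scheduled for the same instant are processed in a deterministic order rather than racing.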
8. Performance Estimation

High-level estimation: just look at the functional behaviour
- short estimation time, implementation details irrelevant
- limited accuracy, e.g. no information about timing

Low-level estimation: simulate all physical layers
- higher accuracy, deeper analysis possible
- long estimation time, everything needs to be defined exactly

We use performance estimation to check:
- Validation of non-functional aspects: verification
- Design space exploration: allows optimization

Exploration: reconfigure the system and evaluate the performance

Performance metric: function giving a quantitative indication of the system execution, should be representative
- time, power, temperature, area, cost, SNR, processing

Evaluation difficulties

Non-determinism: computation (parallel), communication (interference), memory (shared resources), interactions

Cyclic timing dependencies: internal streams interact on computation & communication, influencing the characteristics

Uncertain environment: different scenarios (cached, preemption enabled), worst-case vs. best-case inputs

Varying resource availability & demands: requests depend on the precise circumstances, where & when they are executed
- the run-time of functions can vary depending on the state

Estimation methods (8-19)

Measurements: use a prototype to measure performance
Simulation: develop a program which runs a system model
Statistics: develop a statistical abstraction & derive statistics
Formal analysis: mathematical abstraction to compute formulas which describe the system performance

Analytic models: abstract the system & derive its characteristics
- performance measures are stochastic values (e.g. averages)
- can be used for worst-case/best-case evaluation (bracketing)

Static analytic models: use algebraic equations & relations between components to describe properties
- fast & simple, but generally inaccurate modelling
(scheduling, overhead, resource sharing neglected)

Dynamic analytic models: extend static models
- implement non-determinism in run-time & processing
- describe e.g. resource sharing (scheduling & arbitration)

Simulation: implement a model of the system (SW, HW)
- include precise allocation, mapping, scheduling
- combines functional simulation & performance analysis
- performance evaluation by running the entire program
- difficult to focus on a part, but enables detailed debugging
- one run only evaluates a single simulation scenario
(specific input trace & initial system state)

Trace-based simulation: separate functional & timing behaviour
- the trace is only determined by the functional application
- disregards timing (only focuses on functional behaviour)
- faster than low-level simulation, as it abstracts to events
- allows evaluation on multiple architectures using the same event graph on “virtual machines”

9. WCET Analysis

Hard Real-Time Systems: embedded controllers are expected to finish their tasks reliably within time bounds

Worst-Case Execution Time (WCET): upper bound on the (worst-case) execution time of a task, should be kept minimal
- the upper bound consists of all pessimistic assumptions
- can be approximated with exhaustive measurements

Usually, try to compute it by analysing the program structure
- modern processors exploit parallelism, therefore the execution time is not simply the sum of the single instructions
- out-of-order execution by leveraging independence
- caches, pipelines, branch prediction, speculation
- the difference between BCET and WCET can be gigantic

Timing accident: cause for an increase of execution time
- the execution time is increased by a timing penalty
- causes: cache misses, pipeline stalls, branch mispredictions, bus collisions, DRAM memory refreshes, TLB misses

Micro-architecture analysis: use abstract interpretation
- exclude as many timing accidents as possible
- determine the WCET for basic blocks by analysing the HW

Worst-case path determination: maps the control flow graph to an ILP to determine the upper bound & the associated path
- complex setup & extensive runtimes, but accurate
9.1 Program Path Analysis (9-18)

Determine the sequence of instructions which is executed in the worst-case scenario (i.e. resulting in the longest runtime)
- we know the WCET of each basic block from static analysis
- the number of loop iterations must be bounded

Basic block: sequence of instructions where the control flow enters at the beginning and exits at the end without stopping in-between or branching (just a linear sequence, SISO)

Determining the first instructions of basic blocks:
- the first instruction of the program
- targets of (un-)conditional jumps
- instructions that follow (un-)conditional jumps

The WCET can then be calculated as a sum over the blocks:

WCET = Σ_{i=1}^{N} c_i · x_i,  where c_i is the WCET of block i and x_i the number of executions of block i

The number of executions x_i of a block is given by:
- structural constraints: given by the flow equations
- additional constraints: extracted from the program code
(e.g. number of loop iterations, logical connections)

The entire ILP then maximizes Σ c_i · x_i under these constraints.

9.2 Value Analysis (9-29)

Abstract Interpretation (AI): don’t work on the actual variables, but consider possible variable intervals (abstract values)
- could give an exact WCET by considering all possible inputs
- supports correctness proofs

Value analysis is used to provide:
- access information for the data-cache/pipeline analysis
- detection of infeasible paths
- derivation of loop bounds

9.3 Caches (9-35)

Provide fast access to stored data without accessing main memory, as the speed gap between CPU & memory is large

Assume local correlation between data accesses:
- the program will use similar data soon (many hits)
- the program will reuse items (instructions, data)
- access patterns are evenly distributed across the cache

4-way set associative cache (9-39): stores 4 tags per cache line
Least Recently Used (LRU): replace the oldest block (ages)

We can distinguish two static cache content analyses:

Must analysis: worst case, “At which position at least?”
- each predicted cache hit reduces the WCET (always hit)
- “union + maximal age”: where is my worst position?

May analysis: best case, “At which position at most?”
- each predicted cache miss increases the BCET (always miss)
- “union + minimal age”: where is my best position?

Loop unrolling: improve the analysis by limiting the influence of the state before the loop by executing the first iteration separately as an if
- more optimistic result for the WCET, pessimistic for the BCET

9.4 Pipelines (9-54)

Ideal case: finish 1 instruction per cycle

Instruction execution is split into several stages
- multiple instructions can be executed in parallel
- may execute instructions out-of-order

Pipeline hazards

Data hazards: operands not yet available
(data dependencies, cache misses; solve with pipeline stalls & forwarding)
Resource hazards: consecutive instructions use the same resource
Control hazards: conditional branches (require a flush)
Instruction-cache hazards: instruction fetch causes a miss

Cache analysis: prediction of cache hits (data / instructions)
Dependence analysis: analysis of data/control hazards
Resource reservation tables: analysis of resource hazards

Simulation

Processor: consider the CPU as a big state machine with initial state s, instruction stream b and trace t
Abstract pipeline: limit the simulation to the pipeline
- may lack information, e.g. about cache contents

Assuming the local worst case at every step leads to the global worst-case result (might be longer than in the real world)
- timing anomalies might however counteract this assumption (execution may be faster if delayed in the beginning)
- can also join sets of states under this assumption
(always keep the most pessimistic view, as it is always safe)
- always assume cache misses where they are not excluded
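The block-based WCET formula can be illustrated on a toy control flow graph; the block names, per-block costs and loop bound below are invented for the example, and plain enumeration stands in for the ILP solver:

```python
# Hypothetical CFG with three basic blocks and one loop bounded to
# 10 iterations; c[b] is the per-block WCET from static analysis.
c = {"entry": 5, "loop_body": 20, "exit": 3}

def wcet_bound(max_iter):
    """WCET bound = max Σ c_i * x_i: entry and exit execute exactly
    once (structural constraints), the loop body at most max_iter
    times (additional constraint from the loop bound)."""
    best = 0
    for n in range(max_iter + 1):       # enumerate the free variable
        x = {"entry": 1, "loop_body": n, "exit": 1}
        best = max(best, sum(c[b] * x[b] for b in c))
    return best

print(wcet_bound(10))  # → 5 + 10 * 20 + 3 = 208
```

Without the loop bound the objective would be unbounded, which is why program path analysis insists that the number of loop iterations is known.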

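The "union + maximal age" / "union + minimal age" join rules of the cache analyses can be sketched as follows; representing an abstract cache state as a block→age dictionary is an assumption of this sketch, not lecture notation:

```python
def must_join(a, b):
    """Must-analysis join ("union + maximal age"): a block is
    guaranteed to be cached only if it is in both abstract states,
    and its age is the worse (larger) of the two."""
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}

def may_join(a, b):
    """May-analysis join ("union + minimal age"): a block may be
    cached if it is in either state, at its better (smaller) age."""
    return {blk: min(d[blk] for d in (a, b) if blk in d)
            for blk in a.keys() | b.keys()}

# abstract LRU states after two converging paths (age 0 = youngest)
path1 = {"A": 0, "B": 2}
path2 = {"A": 3, "C": 1}
print(must_join(path1, path2))  # only A is a guaranteed hit, at age 3
print(may_join(path1, path2))   # A, B and C are all possible hits
```

A reference that is not in the must state cannot be classified "always hit", so the analysis pessimistically charges a miss — safe, as the notes put it, even when the real execution would hit.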
10. Performance Analysis of Distributed Embedded Systems

Embedded system (ES)
- Computation, Communication, Resource interaction
- build a system from subsystems meeting the requirements

10.1 Real-time calculus (10-11)

Abstract the system to calculate an evaluation for all possible executions at once (entire behaviour in one analysis)

Min-plus algebra
- used for interval arithmetic (abstract values)

Computation: task instances R(t), computing resource C(t)
Communication: data packets R(t), bandwidth C(t)

Infimum: greatest element (not necessarily in the set) which is less than or equal to all other elements of the set

Metrics (10-17)

Data streams R(t): number of events in [0, t)
Resource streams C(t): available resources in [0, t)

Arrival curves α = [α^l, α^u]
- min / max arriving events in any interval of length Δ
Service curves β = [β^l, β^u]
- min / max available service in any interval of length Δ

Common event patterns: specified by a parameter triple
- p: period
- j: jitter
- d: minimum inter-arrival distance of events

Ex. Periodic with jitter: (10-21)
Ex. TDMA resource: (10-24)

Greedy Processing Component (GPC)

If tasks are available & resources are ready, always use them
- assumes preemptable tasks (can stop anytime if C(t) = 0)
- processes are only restricted by the limited resources
- tasks are processed one after the other in FIFO order

With u the last time the buffer was completely empty (R′(u) = R(u)), all available resources are used constantly for u ≤ t′ ≤ t:

R′(t) − R′(u) ≤ C(t) − C(u)

Using (min-plus) convolution, we can describe the relation as:

R′(t) = inf_{0 ≤ u ≤ t} { R(u) + C(t) − C(u) }
C′(t) = sup_{0 ≤ u ≤ t} { C(u) − R(u) }

Conservation law: R′(t) ≤ R(t) ∀t

Delay & Backlog (10-37):
- backlog: max. number of events in the queue / vertical difference between the curves
- delay: max. time in the queue / horizontal difference

Time domain: cumulative functions
Time-interval domain: variability curves

Time-interval domain relations (10-32)

10.2 Modular Performance Analysis (10-40)

Different scheduling mechanisms: (10-41)
- Fixed priority / rate monotonic, EDF, Round Robin, TDMA
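The GPC relation R′(t) = inf { R(u) + C(t) − C(u) } can be evaluated directly on sampled cumulative functions; the arrival and resource traces below are invented for illustration:

```python
def gpc_output(R, C):
    """Greedy processing component in the cumulative time domain:
    R'(t) = min over 0 <= u <= t of ( R(u) + C(t) - C(u) ),
    where R[t] / C[t] are cumulative arrivals / resources in [0, t)."""
    return [min(R[u] + C[t] - C[u] for u in range(t + 1))
            for t in range(len(R))]

R = [0, 3, 3, 4, 6]     # bursty arrivals: 3 events right at the start
C = [0, 1, 2, 3, 4]     # one unit of service per time step
Rp = gpc_output(R, C)
backlog = max(r - rp for r, rp in zip(R, Rp))   # vertical difference
print(Rp, backlog)      # → [0, 1, 2, 3, 4] 2
```

The burst is smoothed to one event per step (the output is limited by the service), the conservation law R′(t) ≤ R(t) holds pointwise, and the maximal vertical difference between R and R′ gives the backlog of 2 events.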

11. Various

Application-Specific Instruction Set processor
- specialized, but still programmable (efficiently)
- HW with a custom instruction set & operations

Translation Look-aside Buffer (TLB): stores recent translations of virtual to physical memory addresses

“Traffic shaper”: guarantees a minimum delay between tasks
- spreads bursts to minimize the influence on other tasks