Unit 4 Development and Debugging Techniques and Challenges
Cross compiler
Test bench program
Debugging Embedded Systems
Challenges:
target system may be hard to observe;
target may be hard to control;
may be hard to generate realistic inputs;
setup sequence may be complex.
Host/target design
Host-based tools
Cross compiler:
• compiles code on host for target system.
Cross debugger:
• displays target state, allows target system to be controlled.
Software debuggers
A monitor program residing on the target provides basic debugger
functions.
Debugger should have a minimal footprint in memory.
The user program must be careful not to destroy the debugger program,
but the debugger should be able to recover from some damage caused by user code.
Debugging techniques
• Compiling and executing in the workstation
• Interrupt Routines
• Breakpoint
Hardware
• In Circuit Emulator(ICE)
• Logic analyzer(State and Timing modes)
• Laboratory Tools
Software
• Instruction set simulator
Develop a Test System
[Figure: target system and test system side by side, each containing the same hardware-independent code]
Calling Interrupt Routine
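• Most embedded systems act only in response to interrupts, so the interrupt routines must run for the system to do anything in the test environment
• Thus, the test scaffold must execute the interrupt routines
• The interrupt routines are structured so that the hardware-dependent part calls the hardware-independent part; the scaffold calls the hardware-independent part directly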
Calling Timer Interrupt Routine
• The timer interrupt routine (TIR) initiates some of the system's activity
• One option is to leave the test scaffold out and let the timer of the host system call the TIR
• But this would mean that the test scaffold loses control over when the TIR runs
• Hence, it is better for the scaffold to call the TIR itself
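A minimal sketch of a scaffold-driven timer interrupt, assuming the routine has been split into a hardware-dependent wrapper and a hardware-independent function. All names here (`timer_tick_hw_independent`, `scaffold_run_ticks`, `g_tick_count`, `g_timeout_flag`) are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical state updated by the timer interrupt logic. */
static volatile uint32_t g_tick_count = 0;
static volatile int g_timeout_flag = 0;

/* Hardware-independent part: the logic that is worth testing on the host. */
void timer_tick_hw_independent(uint32_t timeout_ticks)
{
    g_tick_count++;
    if (g_tick_count >= timeout_ticks) {
        g_timeout_flag = 1;
    }
}

/* The hardware-dependent part exists only on the target: it would acknowledge
 * the timer peripheral and then call timer_tick_hw_independent(). */

/* Test scaffold: calls the routine directly, so the test decides exactly
 * when each "interrupt" happens instead of relying on the host's timer. */
int scaffold_run_ticks(uint32_t n_ticks, uint32_t timeout_ticks)
{
    g_tick_count = 0;
    g_timeout_flag = 0;
    for (uint32_t i = 0; i < n_ticks; i++) {
        timer_tick_hw_independent(timeout_ticks);
    }
    return g_timeout_flag;
}
```

Because the scaffold issues every tick itself, a test can stop one tick before the timeout and check that nothing has fired yet.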
BreakPoint
• The simplest form of the breakpoint is for the user to specify an
address at which the program’s execution is to break
• When the PC reaches that address, control is returned to the
monitor program
• From the monitor program, the user can examine or modify CPU
registers, after which execution can be continued
ARM breakpoints
Breakpoint handler actions
• Save registers.
• Allow user to examine machine.
• Before returning, restore system state.
• Safest way to execute the instruction is to replace it and execute in
place.
• Put another breakpoint after the replaced breakpoint to allow
restoring the original breakpoint.
In-circuit emulators
• A microprocessor in-circuit emulator is a specially-instrumented
microprocessor.
• Allows you to stop execution, examine CPU state, modify registers.
In-Circuit Emulator
• An ICE is used to debug a target system by standing in for the target
processor or microcontroller
• But what if you want to alter the contents of a register, memory, or
the state of your I/O to see what happens? An In-circuit emulator
(ICE) is a debugging tool that allows you to access a target MCU for
in-depth debugging.
• ICE is the best tool for finding difficult bugs and can provide
invaluable insight.
Logic analyzer
[Figure: logic analyzer block diagram: n probes feed a comparator, memory, and display; a clock select chooses between the internal and an external clock; trigger select and trigger logic accept an external trigger]
• A logic analyzer provides a solution to a particular class of problems.
These include digital hardware debugging, design verification, and
embedded software debugging.
• A logic analyzer is an indispensable tool to design and troubleshoot digital
circuits.
• Debugging microprocessor-based designs requires more inputs than
oscilloscopes can offer. Logic analyzers — with their multiple inputs —
solve these problems.
• A logic analyzer solely measures digital, not analog signals! It can capture
many digital signals simultaneously and display their often complex timing
relationship to one another.
• A logic analyzer detects logic threshold levels.
Instruction Set Simulator
• Programs that run on the host and mimic the target microprocessor
and memory
• These simulators can help to determine response and throughput
and to test the start-up code
Debugging challenges
• Software interaction with hardware
• Response and throughput
• Shared data problem
• Portability issues
Models of program
• Models are developed for programs.
• They are more general than the source code.
• Source code in any language (assembly language, C code, …) can be described by a single model.
• Control/data flow graph (CDFG)
• Fundamental model for the program
• Data operations (arithmetic and other computations) and conditional operations
[Figure: DFG with inputs a, b, d and output z]
[Figure: C code made of basic blocks basic_block_1() through basic_block_6() and the corresponding CDFG]
CDFG for while loop
while (a < b) {
    a = proc1(a,b);
    b = proc2(a,b);
}
[Figure: CDFG for the while loop above]
Assembly, linking and loading
[Figure: high-level language code → compiler → assembly code → assembler → object code → linker → executable binary → loader → execution]
Assembler
• Converts assembly language into object code
• Translates labels into addresses
• Builds a symbol table
[Figure: assembly code beginning with an ORG directive and containing the labels label1, label2, and label3, next to the symbol table generated from it]
Linking
• Program stitched together out of several pieces
• Operates on object files
• Entry point: where label is defined in the file
• External reference: where the label is used
[Figure: two object files containing labels label1, label2, and label3, with the external references and entry points of each file]
Basic Compilation techniques
Statement translation
• Translate each high-level language statement into instructions with little or no optimization.
[Figure: data flow graph of an arithmetic expression]
[Figure: control flow graph of a loop: loop initiation code (i = 0; f = 0;), loop test with exit, and loop body f = f + c[i] * x[i];]
Drawing a control flow graph based on the while form of the loop helps us
understand how to translate it into instructions
Procedure
• Generate code to handle the procedure call and return
• A subroutine call instruction alone is not sufficient
• A procedure stack and procedure linkage are required
• Procedure linkage: passes parameters and returns a value
• Stack pointer: defines the end of the current frame
• Frame pointer: defines the end of the last frame
• The procedure can refer to an element in the frame by addressing relative
to the stack pointer
• When a new procedure is called, the stack pointer and frame pointer are
modified to push another frame onto the stack
Data structures
• Translate references to data structures into memory references
• Address computation: at run time or at compile time
• The address of an array element must generally be computed at run time, since
the array index can change.
• Array format
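A minimal sketch of the run-time address computation for a two-dimensional array, assuming row-major layout as used by C. The function names `row_major_offset` and `read_2d` are hypothetical.

```c
#include <stddef.h>

/* Offset, in elements, of a[i][j] in a row-major array with n_cols columns. */
size_t row_major_offset(size_t i, size_t j, size_t n_cols)
{
    return i * n_cols + j;
}

/* Read element (i, j) of a two-dimensional array stored as one flat block.
 * The index arithmetic happens at run time because i and j can change. */
int read_2d(const int *base, size_t i, size_t j, size_t n_cols)
{
    return base[row_major_offset(i, j, n_cols)];
}
```

When both indices are compile-time constants, a compiler can fold the offset into a fixed address, which is the compile-time case mentioned in the list.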
Program optimization
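• Expression simplification [a*b + a*c = a*(b+c)]
• Dead code elimination
[Example garbled in the source: code guarded by a #define constant that is never executed and can be removed]
• Procedure inlining
[Example garbled in the source: a call to a small function is replaced by the expression from the function's return statement]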
Contd…
• Loop transformation
Loop unrolling, if N=4:
for (i = 0; i < 4; i++) {
    a[i] = b[i] * c[i];
}
unrolled by two:
for (i = 0; i < 2; i++) {
    a[i*2] = b[i*2] * c[i*2];
    a[i*2+1] = b[i*2+1] * c[i*2+1];
}
• Loop fusion
• Loop distribution
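Loop fusion and loop distribution are only named in the slides. The sketch below illustrates them under the assumption of two independent element-wise computations over the same index range; the function and array names are hypothetical.

```c
#define N 4

/* Distributed form: two separate loops, each with its own loop overhead. */
void compute_distributed(int a[N], int w[N],
                         const int b[N], const int c[N], const int d[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = b[i] * 5;
    }
    for (int j = 0; j < N; j++) {
        w[j] = c[j] * d[j];
    }
}

/* Fused form: one loop does both computations. This is legal here because
 * the two loop bodies cover the same range and do not depend on each other. */
void compute_fused(int a[N], int w[N],
                   const int b[N], const int c[N], const int d[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}
```

Loop distribution is the reverse step: splitting the fused loop back into the two separate loops, for example to improve cache behavior of each loop.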
Contd…
• Scheduling: choosing the order in which the operations will be performed; changing
the order can make register allocation more efficient
• Track CPU resources using a reservation table
Program level energy and power analysis and
optimization
• Choose the proper algorithm
• Proper memory access
• Turn off the subsystems when not in use
• Several factors contribute to the energy consumption of the program.
■ Energy consumption varies from instruction to instruction.
■ The sequence of instructions.
■ The opcode and the locations of the operands
• Identifying and eliminating duplications can lead to significant memory savings usually with little
performance penalty
• Data can sometimes be packed, such as by storing several flags in a single word and extracting them
by using bit-level operations
• Encapsulating functions in subroutines can reduce program size when done carefully.
• Architectures that have variable-size instruction lengths are particularly good candidates
for careful coding to minimize program size, which may require assembly language coding
of key program segments.
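A minimal sketch of packing several flags into a single byte with bit-level operations. The flag names and helper functions are hypothetical and serve only to illustrate the data-packing point above.

```c
#include <stdint.h>

/* Hypothetical flags: each one occupies a single bit of one byte. */
#define FLAG_DOOR_OPEN  (1u << 0)
#define FLAG_MOTOR_ON   (1u << 1)
#define FLAG_LOW_BATT   (1u << 2)

/* Set the bits selected by mask. */
static inline void flag_set(uint8_t *flags, uint8_t mask)
{
    *flags = (uint8_t)(*flags | mask);
}

/* Clear the bits selected by mask. */
static inline void flag_clear(uint8_t *flags, uint8_t mask)
{
    *flags = (uint8_t)(*flags & (uint8_t)~mask);
}

/* Return 1 if any bit selected by mask is set, otherwise 0. */
static inline int flag_test(uint8_t flags, uint8_t mask)
{
    return (flags & mask) != 0;
}
```

Three flags stored this way take one byte instead of three separate variables, at the cost of a mask operation on each access.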
Provide the program with inputs that exercise the test we are interested
in.
Execute the program to perform the test.
Examine the outputs to determine whether the test was successful.
Clear box testing
• The most fundamental concept in clear-box testing is the path of execution
through a program.
[Figure: HLL → compile → assembly → assemble]
Multiple-module programs
Programs may be composed from several files.
Addresses become more specific during processing:
• relative addresses are measured relative to the start of a module;
• absolute addresses are measured relative to the start of the CPU address space.
Assemblers
Major tasks:
• generate binary for symbolic instructions;
• translate labels into addresses.
Symbol table
Assembly code:
        ADD r0,r1,r2
xx      ADD r3,r4,r5
        CMP r0,r3
yy      SUB r5,r6,r7
Symbol table:
xx      0x8
yy      0x10
Linking
Combines several object modules into a single executable module.
Jobs:
• put modules in order;
• resolve labels across modules.
Linking
Program stitched together out of several pieces
Operates on object files
Entry point: where label is defined in the file
External reference: where the label is used
Load map: order of files to be loaded in memory.
File 1:
label1  LDR r0,[r1]
        ADR a
        B label2
var1    % 1
File 2:
label2  ADR var1
        B label3
x       % 1
y       % 1
a       % 10
File 1 external references: a, label2. File 1 entry points: label1, var1.
File 2 external references: var1, label3. File 2 entry points: label2, x, y, a.
Two-pass assembly
Pass 1:
• generate symbol table
Pass 2:
• generate binary instructions using the symbol table
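A minimal sketch of pass 1 of a two-pass assembler, assuming fixed 4-byte instructions and a program already split into an optional label plus instruction text. The types and function names (`struct source_line`, `build_symbol_table`, `lookup_symbol`) are hypothetical.

```c
#include <string.h>

#define MAX_SYMBOLS 16
#define INSTR_BYTES 4   /* assumption: every instruction is 4 bytes long */

struct symbol { const char *label; unsigned addr; };
struct source_line { const char *label; /* NULL if the line has no label */
                     const char *text; };

/* Pass 1: walk the program, advancing the program location counter (PLC)
 * and recording the address of every label. Returns the symbol count. */
int build_symbol_table(const struct source_line *prog, int n_lines,
                       unsigned start_addr, struct symbol *table)
{
    unsigned plc = start_addr;
    int n_syms = 0;
    for (int i = 0; i < n_lines; i++) {
        if (prog[i].label != NULL && n_syms < MAX_SYMBOLS) {
            table[n_syms].label = prog[i].label;
            table[n_syms].addr = plc;
            n_syms++;
        }
        plc += INSTR_BYTES;
    }
    return n_syms;
}

/* Lookup used by pass 2: returns 1 and stores the address if label exists. */
int lookup_symbol(const struct symbol *table, int n_syms,
                  const char *label, unsigned *addr)
{
    for (int i = 0; i < n_syms; i++) {
        if (strcmp(table[i].label, label) == 0) {
            *addr = table[i].addr;
            return 1;
        }
    }
    return 0;
}
```

With a four-line program whose second and fourth lines carry the labels xx and yy, and an assumed start address of 0x4, this yields xx = 0x8 and yy = 0x10.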
Pseudo-operations
Pseudo-ops do not generate instructions:
• ORG sets program location.
• EQU generates symbol table entry without advancing PLC.
• Data statements define data blocks.
Dynamic linking
Some operating systems link modules dynamically at run time:
• shares one copy of library among all executing programs;
Compilation
Compilation strategy (Wirth):
• compilation = translation + optimization
Compiler determines quality of code:
• code size.
Basic compilation phases
[Figure: HLL → machine-independent optimizations → machine-dependent optimizations → assembly]
Contd…
• Statement translation
• Procedures
• Data structures
Control code generation
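if (a+b > 0)
    x = 5;
else
    x = 7;
[Figure: CDFG with decision node a+b>0 (block 1) leading to x=5 (block 2) and x=7 (block 3)]
Control code generation, cont’d.
        ADR r5,a        ; get address for a
        LDR r1,[r5]     ; load a
        ADR r5,b        ; get address for b
        LDR r2,[r5]     ; load b
        ADDS r3,r1,r2   ; compute a+b and set the condition flags
        BLE label3      ; a+b <= 0: go to the false block
        MOV r3,#5       ; true block: x = 5
        ADR r5,x
        STR r3,[r5]
        B stmtent       ; skip the false block
label3  MOV r3,#7       ; false block: x = 7
        ADR r5,x
        STR r3,[r5]
stmtent ...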
Statement translation
Translate high-level language statements with little or no optimization.
Example expression: a*b + 5*(c - d)
[Figure: DFG for a*b + 5*(c - d) with operators numbered 1 to 4 and intermediate values x, y, z]
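Arithmetic expressions, cont’d.
[Figure: DFG with operator 1 (*) on a and b, operator 2 (-) on c and d, operator 3 (*) by 5, and operator 4 (+)]
        ; operator 1 (*)
        ADR r4,a        ; get address for a
        LDR r1,[r4]     ; load a
        ADR r4,b        ; get address for b
        LDR r2,[r4]     ; load b
        MUL r3,r1,r2    ; put a*b into r3
        ; operator 2 (-)
        ADR r4,c        ; get address for c
        LDR r1,[r4]     ; load c
        ADR r4,d        ; get address for d
        LDR r5,[r4]     ; load d
        SUB r6,r1,r5    ; put c-d into r6
        ; operator 3 (*)
        MOV r0,#5       ; ARM MUL takes no immediate operand
        MUL r7,r6,r0    ; put 5*(c-d) into r7
        ; operator 4 (+)
        ADD r8,r7,r3    ; put a*b + 5*(c-d) into r8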
[Figure: layout of a two-dimensional array in memory, starting at a: a[0,0], a[0,1], ..., a[1,0], a[1,1], ...]
Unit 4 - Program design and analysis
• Components for embedded programs
• Model of program
• Interrupt & interrupt latency
Components For Embedded Programs
• Components that are commonly used in embedded software:
• the state machine,
• the circular buffer, and
• the queue.
• State machines are well suited to reactive systems such as user interfaces
• Circular buffers and queues are useful in digital signal processing.
State Machines
• The reaction of most systems can be characterized in terms of the input received and the
current state of the system.
• This leads naturally to a finite-state machine style of describing the reactive system’s behavior.
• The state machine style of programming is an efficient implementation of such computations.
• Uses: – control dominated code; reactive system
• Example: Seat Belt Controller
• Turn on buzzer if a person sits in a seat and does not fasten seat belt within a stipulated time
• Inputs:
• Seat sensor – to know whether a person has sat down or not
• Seat belt sensor – to know if the belt is fastened or not
• Timer – to know when the fixed interval has elapsed
• Output – buzzer
• States – Idle, seated, buzzer, belted
State Machine Example – Seat Belt controller
Seat Belt Controller – C Code
switch (state) { /* check the current state */
    ...
    break;
case SEATED:
    if (belt) state = BELTED; /* won't hear the buzzer */
    else if (timer) state = BUZZER; /* didn't put on belt in time */
    /* default is self-loop */
    break;
case BELTED:
    ...
}
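The listing above survives only in part. The following is a self-contained sketch of the same controller, assuming the four states and three inputs listed earlier. Only the SEATED transitions come from the listing; the IDLE, BELTED, and BUZZER transitions are assumptions, as are the names `seat_belt_step` and `buzzer_on`.

```c
#include <stdbool.h>

enum belt_state { IDLE, SEATED, BELTED, BUZZER };

/* One step of the state machine. Inputs: seat sensor, belt sensor, and
 * whether the timer has expired. Returns the next state. */
enum belt_state seat_belt_step(enum belt_state state,
                               bool seat, bool belt, bool timer)
{
    switch (state) {
    case IDLE:                           /* assumed transition */
        if (seat) state = SEATED;        /* person sat down; timer starts */
        break;
    case SEATED:                         /* from the source listing */
        if (belt) state = BELTED;        /* won't hear the buzzer */
        else if (timer) state = BUZZER;  /* didn't put on belt in time */
        break;
    case BELTED:                         /* assumed transitions */
        if (!seat) state = IDLE;         /* person left the seat */
        else if (!belt) state = SEATED;  /* belt released while seated */
        break;
    case BUZZER:                         /* assumed transitions */
        if (belt) state = BELTED;        /* belt fastened: buzzer off */
        else if (!seat) state = IDLE;    /* seat empty: buzzer off */
        break;
    }
    return state;                        /* no match means self-loop */
}

/* The buzzer output depends only on the current state. */
bool buzzer_on(enum belt_state state)
{
    return state == BUZZER;
}
```

Writing the step as a pure function of state and inputs keeps it easy to exercise from a host-based test scaffold.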
Circular Buffer
• A data structure that handles streaming data efficiently.
• A circular buffer stores a subset of the data stream: a fixed-size window of the most recent values.
[Figure: a data stream x1 to x6 shown at time t and time t + 1, and the circular buffer holding a four-element window of it: x1 to x4 at time t, with x5 replacing the oldest element at time t + 1]
Circular Buffer – C Code Implementation
[Code listing garbled in the source: only fragments survive, including a loop over SIZE elements and an update of the buffer position]
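Since the original listing is not recoverable, here is a minimal sketch of a circular buffer that keeps a window of the most recent samples. It is not the source's code: the window length `CB_SIZE` and the names `circ_init`, `circ_update`, and `circ_get` are assumptions.

```c
#define CB_SIZE 4   /* hypothetical window length */

struct circ_buf {
    int data[CB_SIZE];
    int pos;            /* index of the newest sample */
};

/* Clear the buffer; pos is set so the first update writes slot 0. */
void circ_init(struct circ_buf *cb)
{
    for (int i = 0; i < CB_SIZE; i++) {
        cb->data[i] = 0;
    }
    cb->pos = CB_SIZE - 1;
}

/* Add a new sample, overwriting the oldest one. */
void circ_update(struct circ_buf *cb, int xnew)
{
    cb->pos = (cb->pos + 1) % CB_SIZE;
    cb->data[cb->pos] = xnew;
}

/* Get the sample i steps back in time: i = 0 is the newest.
 * Requires 0 <= i < CB_SIZE. */
int circ_get(const struct circ_buf *cb, int i)
{
    int idx = (cb->pos - i + CB_SIZE) % CB_SIZE;
    return cb->data[idx];
}
```

No data is moved when a sample arrives; only the position index advances, which is what makes the structure efficient for streaming data.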
Queues
• Queues are used whenever data may arrive and depart at somewhat
unpredictable times or when variable amounts of data may arrive.
• A queue is often referred to as an elastic buffer.
• A circular buffer has a fixed number of data elements, while a queue has a varying number of
elements.
• One way to build a queue is with a linked list.
• This approach allows the queue to grow to an arbitrary size.
• The price is the cost of dynamically allocating memory.
• Another way to design the queue is to use an array to hold all the data.
• For this, two error conditions need to be checked:
• removing from an empty queue and
• adding to a full queue
• Adding an element to the tail of the queue is known as enqueueing.
• Removing an element from the head of the queue is known as dequeueing.
An Array Based Queue
#define SIZE 5
int q[SIZE];
int head, tail;

void queue_init() {
    head = 0;
    tail = 0;
}
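A minimal sketch of enqueue and dequeue for an array-based queue, including the two error checks named above. It assumes one array slot is kept empty to distinguish a full queue from an empty one; the names `Q_SIZE`, `enqueue`, and `dequeue` and the return-code convention are hypothetical.

```c
#define Q_SIZE 5    /* array length; usable capacity is Q_SIZE - 1 */

static int q[Q_SIZE];
static int head, tail;  /* head: next element to remove; tail: next free slot */

void queue_init(void)
{
    head = 0;
    tail = 0;
}

/* Add val at the tail. Returns 0 on success, -1 if the queue is full. */
int enqueue(int val)
{
    int next = (tail + 1) % Q_SIZE;
    if (next == head) {
        return -1;          /* error: adding to a full queue */
    }
    q[tail] = val;
    tail = next;
    return 0;
}

/* Remove the head element into *val. Returns 0 on success, -1 if empty. */
int dequeue(int *val)
{
    if (head == tail) {
        return -1;          /* error: removing from an empty queue */
    }
    *val = q[head];
    head = (head + 1) % Q_SIZE;
    return 0;
}
```

Returning an error code leaves the policy (drop, retry, or report) to the caller, which is usually what an embedded system needs.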
Models of program
• Source code: assembly languages, C code, …
• Source code is not a good representation for programs:
• Clumsy
• Language dependent
• Models are developed for programs.
• They are more general than the source code.
• Programs in any language can be described by a single model.
Data Flow Graph
• Model of a program with no conditionals
• Describes a minimal ordering requirement on operations
[Figure: DFG (data flow graph) with inputs a, b, c, d, e and output z]
Single Assignment Form
[Example garbled in the source: a basic block rewritten in single assignment form, followed by a CDFG example built from an if/else and a switch statement over basic blocks]
CDFG – For Loop
for (i=0; i<N; i++)
    loop_body();
equivalent to:
i=0;
while (i<N) {
    loop_body();
    i++;
}
INTERRUPT
• Interrupt is a signal emitted by hardware or software when a process or an event
needs immediate attention.
• It alerts the processor to a high priority process requiring interruption of the
current working process.
• When the microprocessor detects that a signal attached to one of its interrupt
request pins is asserted, it stops executing its current sequence of instructions,
saves on the stack the address of the instruction that would have been
next, and jumps to an interrupt routine.
• Interrupt routines are subroutines that do whatever needs to be done when the
interrupt signal occurs.
Hardware Interrupt
• In a hardware interrupt, all the devices are connected to the interrupt request line.
• A single request line is used for all n devices.
• To request an interrupt, a device closes its associated switch.
[Figure: a CPU whose interrupt request pins are wired to a serial port chip and a second interface chip. Caption: this signal tells the microprocessor that the serial port chip needs service.]
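[Figure: interrupt response sequence. The mainline program is interrupted; the processor pushes the flags, clears IF and TF, pushes CS and IP, and fetches the ISR address. The interrupt procedure pushes the registers, does its work, pops the registers, and executes IRET, which pops IP, CS, and the flags to resume the mainline program.]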
Handling Multiple Devices
• When more than one device raises an interrupt request signal, additional information is required to
decide which device is to be serviced first.
• The methods to decide which device to select:
• Polling:
• The IRQ bit of each device is checked in turn; the first device found with its IRQ bit set is serviced first.
• The appropriate ISR is called to service that device.
• Polling is easy to implement, but a lot of time is wasted interrogating the IRQ bits of all devices.
• Vectored Interrupts:
• A device requesting an interrupt identifies itself directly by sending a special code to the processor over the
bus.
• This enables the processor to identify the device that generated the interrupt.
• The special code, called the interrupt vector, can be the starting address of the ISR or a pointer to where the ISR is located in memory.
• Interrupt Nesting:
• I/O devices are organized in a priority structure.
• An interrupt request from a higher-priority device is recognized, whereas a request from a lower-priority
device is not.
• To implement this, each process/device, including the processor, is assigned a priority.
• The processor accepts interrupts only from devices/processes whose priority is higher than its own.
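A minimal sketch of the polling scheme, assuming the IRQ bits of up to four devices are collected in one status byte and each device has an entry in a table of ISR pointers. The names `poll_and_dispatch`, `isr_t`, and `N_DEVICES` are hypothetical.

```c
#include <stdint.h>

#define N_DEVICES 4

typedef void (*isr_t)(void);

/* Check the IRQ bits in order, starting with device 0. The first device
 * found with its bit set is serviced by calling its ISR from the table.
 * Returns the index of the serviced device, or -1 if no bit is set. */
int poll_and_dispatch(uint8_t irq_status, const isr_t isr_table[N_DEVICES])
{
    for (int dev = 0; dev < N_DEVICES; dev++) {
        if (irq_status & (1u << dev)) {
            if (isr_table[dev] != 0) {
                isr_table[dev]();
            }
            return dev;
        }
    }
    return -1;
}
```

The loop order fixes the service priority, and the time spent scanning grows with the number of devices. With vectored interrupts, the device supplies the table index (or the ISR address) itself, so the scan disappears.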
Interrupt Classification
• Based on what is the cause for the interrupt
• Hardware – caused by external signal
• Software – caused by special instruction
• Based on whether the interrupt can be hidden or not
• Maskable - hidden
• Non-maskable – not hidden
• Based on the address of the subroutine
• Vectored – address is hardwired; vector location
• Non-vectored – address to be provided externally
• Based on the type of interrupt
• Edge triggered
• Level triggered
Saving And Restoring The Context
• Pushing all of the registers at the beginning of an interrupt routine
is known as saving the context
Energy/power.
Program size.
Program validation and testing.
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
Program-level performance analysis
Need to understand performance in detail:
• Real-time behavior, not just typical.
• On complex platforms.
Program performance ≠ CPU performance.
[Figure: total execution time, with labels for cache and performance]
Varies with input data:
• Different-length paths.
Cache effects.
performance
Simulate execution of the CPU.
• Makes CPU state visible.
Measure on real CPU using timer.
simplify analysis.
Easier to separate on simpler CPUs.
Accurate performance analysis requires:
• Assembly/binary code.
• Execution platform.
Data-dependent paths in an if statement
if (a || b) { /* T1 */
    if ( c ) /* T2 */
a b c   path
0 0 0   T1=F, T3=F: no assignments
[Figure: CDFG of a loop: test i=N with Y and N branches, loop body f = f + c[i] * x[i], then i=i+1]
Instruction timing
Not all instructions take the same amount of time.
• Multi-cycle instructions.
• Fetches.
Execution times of instructions are not independent.
• Pipeline interlocks.
• Cache effects.
Execution times may vary with operand value.
• Floating-point operations.
• Some multi-cycle integer operations.
Measurement-driven performance analysis
Not so easy as it sounds:
• Must actually have access to the CPU.
• Must know data inputs that give worst/best case performance.
• Must make state visible.
Still an important method for performance analysis.
Trace-driven measurement
Trace-driven:
• Instrument the program.
• Save information about the path.
Physical measurement
In-circuit emulator allows tracing.
• Affects execution timing.
CPU simulation
Some simulators are less accurate.
Cycle-accurate simulator provides accurate clock-cycle timing.
• Simulator models CPU internals.
• Simulator writer must know how CPU works.
SimpleScalar FIR filter simulation
int x[N] = {8, 17, … };
int c[N] = {1, 2, … };
int main(void) {
    int i, k, f = 0;   /* f must start at 0 before accumulating */
    for (k=0; k<COUNT; k++)
        for (i=0; i<N; i++)
            f += c[i]*x[i];
}

N        total sim cycles   sim cycles per filter execution
100      25854              259
1,000    155759             156
10,000   1451840            145
motivation
Embedded systems must often meet deadlines.
• Faster may not be fast enough.
Need to be able to analyze execution time.
analysis
Best results come from analyzing optimized instructions, not high-level language code:
• non-obvious translations of HLL statements into instructions;
Loop optimizations
Loops are good targets for optimization.
Basic loop optimizations:
• code motion;
• induction-variable elimination;
[Figure: CDFG of a loop before and after code motion: the loop test i<N*M becomes i<X; the loop body is z[i] = a[i] + b[i]; i = i+1;]
z[i,j] = b[i,j];
Rather than recompute i*M+j for each array in each …
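A minimal sketch of the two loop optimizations, assuming flat row-major arrays of n*m integers. The function names are hypothetical, and the parameters n, m and the local x stand in for N, M, and X in the slide.

```c
/* Before code motion: n * m is evaluated again on every loop test. */
void add_naive(int *z, const int *a, const int *b, int n, int m)
{
    for (int i = 0; i < n * m; i++) {
        z[i] = a[i] + b[i];
    }
}

/* After code motion: the loop-invariant product is computed once. */
void add_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    int x = n * m;
    for (int i = 0; i < x; i++) {
        z[i] = a[i] + b[i];
    }
}

/* Before induction-variable elimination: i*m + j is recomputed per array. */
void copy_indexed(int *z, const int *b, int n, int m)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            z[i * m + j] = b[i * m + j];
        }
    }
}

/* After: one running index is shared by both arrays and incremented at the
 * end of the loop body, so no multiplication remains in the loop. */
void copy_induction(int *z, const int *b, int n, int m)
{
    int k = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            z[k] = b[k];
            k++;
        }
    }
}
```

An optimizing compiler will often perform both transformations itself; writing them out by hand mainly helps when reading the generated assembly.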
Cache analysis
Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Energy/power optimization
Energy: ability to do work.
• Most important in battery-powered systems.
Power: energy per unit time.
while (TRUE)
    a();
SRAM write: 9
SRAM read: 4.4
multiply: 3.6
add: 1
[Figure: energy (Joules) and execution time for an MPEG program as functions of dcache size and icache size (2^val)]
overhead instructions.
Efficient loops
General rules:
• Don’t use function calls.
• Keep loop body small to enable local repeat (only forward branches).
• Use unsigned integer for loop counter.
• Use <= to test loop counter.
• Make use of compiler: global optimization, software pipelining.
RPT #(1024-1)
MVDD *AR2+,*AR3+ ; move
Two opportunities:
• data;
• instructions.
Clear-box testing
Examine the source code to determine whether it works:
• Can you actually exercise a path?
Testing procedure:
• Controllability: provide program with inputs.
• Execute.
• Observability: examine outputs.
Basis paths:
[Figure: a graph with nodes a to e, its incidence matrix, and the basis set a = 10000, b = 01000, c = 00100, d = 00010, e = 00001]
All paths are linear combinations of basis paths.
Cyclomatic complexity: M = e - n + 2p.
[Example: e = 8, V(G) = 8 - 6 + 2 = 4]
Branch testing
Heuristic for testing branches.
• Exercise true and false branches of conditional.
• Exercise every simple condition at least once.
2)] = F
inequalities.
[Figure: domain tests for the inequalities j <= i+1 and j >= i-1, plotted in the (i, j) plane with test points i=1,j=2; i=3,j=5; and i=4,j=5 marked as correct or incorrect tests]
Loop testing
Loops need specialized tests to be tested efficiently.
Heuristic testing strategy:
• Skip loop entirely.
Black-box testing
Complements clear-box testing.
• May require a large number of tests.
Tests software in different ways.