
Unit 4

Development and Debugging techniques and challenges


Development environments
• Load program into the target
• Start and stop execution in the target
• Examine memory and CPU registers

Cross compiler
Test bench program
Debugging Embedded Systems
Challenges:
• target system may be hard to observe;
• target may be hard to control;
• may be hard to generate realistic inputs;
• setup sequence may be complex.
Host/target design

Host-based tools
Cross compiler:
• compiles code on host for target system.
Cross debugger:
•  displays target state, allows target system to be controlled.
Software debuggers
A monitor program residing on the target provides basic debugger
functions.
Debugger should have a minimal footprint in memory.
The user program must be careful not to destroy the debugger program, but the debugger should be able to recover from some damage caused by user code.
Debugging techniques
• Compiling and executing in the workstation
• Interrupt Routines
• Breakpoint
Hardware
• In Circuit Emulator(ICE)
• Logic analyzer(State and Timing modes)
• Laboratory Tools
Software
• Instruction set simulator
Develop a Test System
[Figure: target system vs. test system. Both contain the same hardware-independent code. In the target system it runs on hardware-dependent code and the hardware; in the test system the hardware-dependent code is replaced by test scaffold code.]
Calling Interrupt Routine
• Interrupt routines are what make the system do anything, so they must also run in the test environment.
• Thus, the test scaffold must execute the interrupt routines.
• The interrupt routines are structured so that the hardware-dependent part calls the hardware-independent part, which the scaffold can then call directly.
Calling Timer Interrupt Routine
• The timer interrupt routine (TIR) initiates some of the system's activity.
• If the test scaffold were not involved, the timer of the host system could call the TIR.
• But this would mean that the test scaffold loses control of when the TIR runs.
• Hence, it is better for the scaffold itself to call the TIR.
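The split described above can be sketched in C. This is a minimal illustration under stated assumptions, not code from the slides: `timer_tick()` is an assumed name for the hardware-independent part of the TIR, `timer_isr_hw()` and `clear_timer_interrupt()` are hypothetical target-only names, and the `scaffold_*` functions are likewise illustrative.

```c
#include <stdint.h>

/* Hypothetical hardware-independent state: a tick counter and a flag
   raised every TICKS_PER_SECOND ticks. */
#define TICKS_PER_SECOND 10u

static uint32_t tick_count = 0;
static int one_second_elapsed = 0;

/* Hardware-independent part of the timer interrupt routine.
   It touches no timer registers, so it runs unchanged on the host. */
void timer_tick(void)
{
    tick_count++;
    if (tick_count % TICKS_PER_SECOND == 0)
        one_second_elapsed = 1;
}

#ifdef ON_TARGET
/* Hardware-dependent part (target build only, hypothetical): acknowledge
   the timer interrupt in hardware, then call the independent part. */
void clear_timer_interrupt(void);
void timer_isr_hw(void)
{
    clear_timer_interrupt();
    timer_tick();
}
#endif

/* Test scaffold (host build): instead of letting the host's own timer
   call the routine, the scaffold calls timer_tick() itself, so the test
   controls exactly how many ticks occur. */
void scaffold_run_ticks(unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        timer_tick();
}

uint32_t scaffold_tick_count(void) { return tick_count; }
int scaffold_second_flag(void) { return one_second_elapsed; }
```

Because the scaffold decides how many ticks happen, a test can check the behaviour at exactly the ninth and the tenth tick, which is hard to arrange if the host's timer drives the routine.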
BreakPoint
• The simplest form of the breakpoint is for the user to specify an
address at which the program’s execution is to break
• When the PC reaches that address, control is returned to the
monitor program
• From the monitor program, the user can examine or modify CPU registers, after which execution can be continued
ARM breakpoints
Breakpoint handler actions
• Save registers.
• Allow user to examine machine.
• Before returning, restore system state.
• The safest way to execute the instruction that the breakpoint replaced is to restore it and execute it in place.
• Put another breakpoint after the restored instruction so that the original breakpoint can be reinstalled.
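The handler steps above can be sketched on a simulated, word-addressed target memory. Assumptions: memory is an array of 32-bit words, the breakpoint word is taken to be the ARM-state `BKPT #0` encoding (0xE1200070), and the replaced instruction is not a branch, so the next instruction is at the next address. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define MEM_WORDS   64
#define BKPT_OPCODE 0xE1200070u   /* assumed: ARM-state BKPT #0 */

uint32_t target_mem[MEM_WORDS];   /* simulated target program memory */

typedef struct {
    size_t addr;      /* word address of the breakpoint */
    uint32_t saved;   /* original instruction that was overwritten */
    int active;
} breakpoint_t;

/* Install: save the instruction, write the breakpoint opcode over it. */
void bp_set(breakpoint_t *bp, size_t addr)
{
    bp->addr = addr;
    bp->saved = target_mem[addr];
    target_mem[addr] = BKPT_OPCODE;
    bp->active = 1;
}

/* Remove: put the original instruction back. */
void bp_clear(breakpoint_t *bp)
{
    if (bp->active) {
        target_mem[bp->addr] = bp->saved;
        bp->active = 0;
    }
}

/* Resume after a hit: restore the original instruction so it executes
   in place, and plant a temporary breakpoint on the next instruction. */
void bp_prepare_resume(breakpoint_t *bp, breakpoint_t *tmp)
{
    bp_clear(bp);
    bp_set(tmp, bp->addr + 1);
}

/* Temporary breakpoint hit: remove it and reinstall the original one. */
void bp_step_done(breakpoint_t *bp, breakpoint_t *tmp)
{
    bp_clear(tmp);
    bp_set(bp, bp->addr);
}
```

A real monitor would also save and restore the registers around these steps, as listed above.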
In-circuit emulators
• A microprocessor in-circuit emulator is a specially-instrumented
microprocessor.
• Allows you to stop execution, examine CPU state, modify registers.
In-Circuit Emulator
• An ICE is used for debugging a target system in place of the target processor/microcontroller
• But what if you want to alter the contents of a register, memory, or
the state of your I/O to see what happens? An In-circuit emulator
(ICE) is a debugging tool that allows you to access a target MCU for
in-depth debugging.
• ICE is the best tool for finding difficult bugs and can provide
invaluable insight.
Logic analyzer
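[Figure: logic analyzer block diagram. Probes feed a comparator whose samples are stored in memory and shown on a display; a clock select chooses the internal or external clock; trigger logic, with a trigger select and an external trigger input, starts the capture.]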
• A logic analyzer provides a solution to a particular class of problems.
These include digital hardware debugging, design verification, and
embedded software debugging.
• A logic analyzer is an indispensable tool to design and troubleshoot digital
circuits.
• Debugging microprocessor-based designs requires more inputs than
oscilloscopes can offer. Logic analyzers — with their multiple inputs —
solve these problems.
• A logic analyzer solely measures digital, not analog signals! It can capture
many digital signals simultaneously and display their often complex timing
relationship to one another.
• A logic analyzer detects logic threshold levels.
Instruction Set Simulator
• Programs that run on the host and mimic the target microprocessor
and memory
• These simulators can help to determine response and throughput
and to test the start-up code
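To make the idea concrete, here is a minimal instruction set simulator for a hypothetical five-instruction machine (the opcodes and encoding are invented for illustration and do not belong to any real CPU). It keeps the simulated registers and program counter in a host structure and counts executed instructions, the kind of data used to estimate response and throughput.

```c
#include <stdint.h>

/* Hypothetical toy instruction set (not a real CPU). */
typedef enum { OP_LOADI, OP_ADD, OP_SUBI, OP_BNZ, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs1, rs2;   /* register numbers */
    int32_t imm;        /* immediate value or branch target */
} insn_t;

typedef struct {
    int32_t reg[8];       /* simulated register file */
    int pc;               /* simulated program counter (instruction index) */
    unsigned long steps;  /* instructions executed so far */
} cpu_t;

/* Fetch, decode and execute until HALT. Returns 0 on HALT, or -1 if
   max_steps instructions ran without reaching HALT (a runaway loop). */
int iss_run(cpu_t *cpu, const insn_t *prog, unsigned long max_steps)
{
    while (cpu->steps < max_steps) {
        const insn_t *in = &prog[cpu->pc];   /* fetch */
        cpu->steps++;
        switch (in->op) {                    /* decode and execute */
        case OP_LOADI:
            cpu->reg[in->rd] = in->imm;
            cpu->pc++;
            break;
        case OP_ADD:
            cpu->reg[in->rd] = cpu->reg[in->rs1] + cpu->reg[in->rs2];
            cpu->pc++;
            break;
        case OP_SUBI:
            cpu->reg[in->rd] = cpu->reg[in->rs1] - in->imm;
            cpu->pc++;
            break;
        case OP_BNZ:                         /* branch if rs1 != 0 */
            cpu->pc = (cpu->reg[in->rs1] != 0) ? (int)in->imm : cpu->pc + 1;
            break;
        case OP_HALT:
            return 0;
        }
    }
    return -1;
}
```

For example, a program that loads 5 into r1, then repeatedly adds r1 to r0 and decrements r1 until it reaches zero, halts with r0 = 15 after 18 executed instructions.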
Debugging challenges
• Software interaction with hardware
• Response and throughput
• Shared data problem
• Portability issues
Models of program
• Models are developed for programs.
• They are more general than the source code
• Source code assembly languages, C code…
• Can be described by single model
• Control/Data flow graph(CDFG)
• Fundamental model for the program
• Data operations(arith, other computations) and conditional operations
[Figure: data flow graph (DFG) with inputs including a, b and d and output z]
[Figure: C code built from the basic blocks basic_block_1() … basic_block_6() and its CDFG]
CDFG for while loop
    while (a < b) {
        a = proc1(a, b);
        b = proc2(a, b);
    }
[Figure: CDFG of the while loop: a decision node a < b and a data flow node containing the two assignments]
Assembly, linking and loading

[Figure: high-level language code → compiler → assembly code → assembler → object code → linker → executable binary → loader]
Assembler
• Converts assembly language into binary object code (machine instructions)
• Labels into addresses
• Symbol table

    ORG 100
    label1 ADR r4,c
           LDR r0,[r4]
    label2 ADR r4,d
           LDR r1,[r4]
    label3 SUB r0,r0,r1

  Symbol table: label1 = 100, label2 = 108, label3 = 116
Linking
• Program stitched together out of several pieces
• Operates on object files
• Entry point: where a label is defined in the file
• External reference: where a label is used (but defined in another file)
[Figure: two object files, each listing its entry points and external references; the linker matches each external reference to an entry point]
Basic Compilation techniques
Statement translation
• Translating a high-level language with little or no optimization.
[Figure: data flow graph of an arithmetic expression, ending in result z]

    i = 0;                      /* loop initiation code */
    f = 0;
    while (i < N) {             /* loop test: exit when i < N is false */
        f = f + c[i] * x[i];    /* loop body */
        i = i + 1;              /* loop variable update */
    }

Drawing a control flow graph based on the while form of the loop helps us
understand how to translate it into instructions
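The translation can be checked by writing both forms in C. In this sketch (function names are illustrative), the `for` loop and its `while` form, with explicit initiation code, loop test, body and loop variable update, compute the same sum of products.

```c
/* for form, as written in the source program */
int fir_for(const int *c, const int *x, int N)
{
    int i, f;
    for (i = 0, f = 0; i < N; i++)
        f = f + c[i] * x[i];
    return f;
}

/* while form used to draw the control flow graph */
int fir_while(const int *c, const int *x, int N)
{
    int i = 0;               /* loop initiation code */
    int f = 0;
    while (i < N) {          /* loop test */
        f = f + c[i] * x[i]; /* loop body */
        i = i + 1;           /* loop variable update */
    }
    return f;
}
```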
Procedure
• Generate a code to handle the procedure call and return
• Subroutine call is not sufficient
• Procedure stack and procedure linkage conventions are introduced
• Procedure linkage: how parameters are passed and a value is returned
• Stack pointer: defines the end of the current frame
• Frame pointer: marks the end of the last (previous) frame
• The procedure can refer to an element in the frame by addressing relative
to stack pointer
• When a new procedure is called, the stack pointer and frame pointer are
modified to push another frame onto the stack
Data structures
• References data structure memories
• Address computation: Run time or at compile time
• The address of the array element must be computed at run times, cince
the array index can change.
• Array format

, IL
.

.
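A minimal sketch of the run-time address computation for a two-dimensional C array, which C stores in row-major order. `ROWS`, `COLS` and `element_address()` are illustrative names; the function computes the byte offset by hand to show the multiply-add the compiler has to generate.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 4           /* number of columns in each row */

int a[ROWS][COLS];

/* Compute &base[i][j] by hand for a row-major array:
   address = base + (i * COLS + j) * sizeof(int).
   When i and j are not compile-time constants, the compiler must emit
   this multiply-add (or an equivalent) at run time. */
int *element_address(int (*base)[COLS], size_t i, size_t j)
{
    char *start = (char *)base;          /* byte address of base[0][0] */
    return (int *)(start + (i * COLS + j) * sizeof(int));
}
```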
Program optimization
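• Expression simplification [a*b + c*a = a*(b + c)]
• Dead code elimination
      #define DEBUG 0
      if (DEBUG) print_debug_stuff();
  Because DEBUG is 0, the test and the call can never execute, so they can be removed.
• Procedure inlining
      int foo(int a, int b, int c) { return a + b - c; }
      z = foo(w, x, y);
  becomes
      z = w + x - y;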
Contd…
• Loop transformation
  ■ Loop unrolling, e.g. with N = 4:
        for (i = 0; i < N; i++) {
            a[i] = b[i] * c[i];
        }
    can be unrolled twice into
        for (i = 0; i < 2; i++) {
            a[i*2] = b[i*2] * c[i*2];
            a[i*2 + 1] = b[i*2 + 1] * c[i*2 + 1];
        }
  ■ Loop fusion
  ■ Loop distribution
  ■ Loop tiling (nested loops)

• Register allocation

[Figure: lifetime chart of the variables over statements 1, 2 and 3, used to decide which variables can share a register]
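The loop unrolling example above can be checked in C. This sketch (function names are illustrative, N fixed at 4 as in the example) shows that the unrolled loop produces the same results with half as many iterations and loop tests, at the cost of a larger loop body.

```c
#define N 4

/* Original loop */
void mult_loop(int a[N], const int b[N], const int c[N])
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by a factor of 2 (assumes N is even): two copies of the body
   per iteration, so the loop test and increment run N/2 times. */
void mult_unrolled(int a[N], const int b[N], const int c[N])
{
    for (int i = 0; i < N / 2; i++) {
        a[i * 2]     = b[i * 2]     * c[i * 2];
        a[i * 2 + 1] = b[i * 2 + 1] * c[i * 2 + 1];
    }
}
```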
Contd…
• Scheduling order in which the operations will be performed, by
changing the order register allocation can be more efficient
Track CPU resources using reservation table

Program level energy and power analysis and
optimization
• Choose the proper algorithm
• Proper memory access
• Turn off the subsystems when not in use

Calculate the power consumed first


[Figure: measuring program power. The CPU repeatedly executes while (TRUE) test_code(); while an ammeter between the power supply and the CPU measures the current.]
• Several factors contribute to the energy consumption of the program.
■ Energy consumption varies from instruction to instruction.
■ The sequence of instructions.
■ The opcode and the locations of the operands

• Choosing which instructions to use can make some difference in a program’s energy consumption, but concentrating on the instruction opcodes has limited payoffs in most CPUs.
• Properly organizing instructions and data in memory
• Try to use registers efficiently
• Accesses to registers are the most energy efficient
• Analyze cache behavior to find major cache conflicts.
• Cache accesses are more energy efficient than
main memory accesses
• High performance = low power: generally speaking, making the program run faster also reduces energy consumption
ANALYSIS AND OPTIMIZATION OF PROGRAM SIZE
• Both the size of the program's data and the size of its instructions must be considered to minimize program size

• Identifying and eliminating duplications can lead to significant memory savings usually with little
performance penalty

• Inefficient programs often keep several copies of data.

• Buffers should be sized carefully

• Data can sometimes be packed, such as by storing several flags in a single word and extracting them
by using bit-level operations

• A very low-level technique for minimizing data is to reuse values
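Packing several flags into one word, as suggested above, can be sketched with bit masks. The flag names are hypothetical; each flag occupies one bit of a single byte instead of a separate variable, and bit-level operations set, clear and test it.

```c
#include <stdint.h>

/* Hypothetical flags, one bit each */
enum {
    FLAG_SEATED   = 1u << 0,
    FLAG_BELTED   = 1u << 1,
    FLAG_TIMER_ON = 1u << 2,
    FLAG_BUZZER   = 1u << 3
};

static uint8_t flags = 0;   /* four flags packed into one byte */

void flag_set(uint8_t mask)   { flags |= mask; }
void flag_clear(uint8_t mask) { flags &= (uint8_t)~mask; }
int  flag_test(uint8_t mask)  { return (flags & mask) != 0; }
```

The saving grows with the number of flags, while each access costs an extra AND/OR instruction, which is the size-versus-speed trade-off this slide describes.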


ANALYSIS AND OPTIMIZATION OF PROGRAM SIZE
• Minimizing the size of the instruction text of a program requires a mix of high-level
program transformations and careful instruction selection.

• Encapsulating functions in subroutines can reduce program size when done carefully.

• Architectures that have variable-size instruction lengths are particularly good candidates
for careful coding to minimize program size, which may require assembly language coding
of key program segments.

• Some microprocessor architectures support dense instruction sets, specially designed instruction sets that use shorter instruction formats to encode the instructions.
Program validation and testing
• Black box testing generate test without looking into the internal
structure of the program
• Clear box testing generate tests based on program structure. Also
known as white box testing

Provide the program with inputs that exercise the test we are interested
in.
Execute the program to perform the test.
Examine the outputs to determine whether the test was successful.
Clear box testing
• The most fundamental concept in clear-box testing is the path of execution
through a program.

• Determine how to ensure that the path is in fact executed.


• By forcing the program to execute along chosen paths, the program is tested.
• Execution of a path is forced by giving it inputs that cause it to take the
appropriate branches.
• Execution of a path exercises both the control and data aspects of the program.
• The control is exercised as branches are taken up; both the computations
leading up to the branch decision and other computations performed along the
path exercise the data aspects.
• Control /data flow graph tool
• Control the variable in the program and observe the results
• Modify the program by adding new inputs and outputs
• Accomplish the following in test
• Determine the set of tests to be performed
• Different types of which covers various criteria
• Choosing the path to test graph theory
• Branch testing
• Domain testing
• Data flow testing
• Cyclomatic testing
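Cyclomatic testing is based on McCabe's cyclomatic complexity M, the number of linearly independent paths through a control flow graph with E edges, N nodes and P connected components (P = 1 for a single procedure). As a worked illustration not taken from the slides, apply it to the while-form loop from the compilation section, with one node each for the loop initiation, the test i < N, the loop body and the exit:

```latex
M = E - N + 2P,
\qquad
E = 4,\; N = 4,\; P = 1 \;\Rightarrow\; M = 4 - 4 + 2 \cdot 1 = 2
```

The two basis paths are "skip the loop" (loop limit 0) and "execute the body at least once", so cyclomatic testing would use at least two test cases for this loop.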
Black box testing
• Black-box tests are generated without knowledge of the code being tested
• Random tests: random values are generated, a number of test sequences are used, and various types of data values (integer, floating-point, binary) are covered
• Regression tests: past test values (including statistical values) have to be saved, and the new system should be able to pass the tests that the older version passed
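A random black-box test can be sketched by comparing the code under test with a trusted reference on many generated inputs. The function under test here is a hypothetical overflow-free average; the generator is a small linear congruential generator with the widely used Numerical Recipes constants, seeded so a failing run can be replayed, which also makes the same inputs reusable as a regression test.

```c
#include <stdint.h>

/* Code under test (hypothetical): average of two unsigned 32-bit values
   without overflow, using a + b = 2*(a & b) + (a ^ b). */
uint32_t avg_fast(uint32_t a, uint32_t b)
{
    return (a & b) + ((a ^ b) >> 1);
}

/* Trusted reference used as the test oracle: widen to 64 bits. */
uint32_t avg_ref(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) / 2);
}

/* Reproducible pseudo-random source */
static uint32_t lcg_state;
static uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Run n random tests from the given seed; return the number of
   mismatches between the code under test and the reference. */
unsigned long random_test(uint32_t seed, unsigned long n)
{
    unsigned long failures = 0;
    lcg_state = seed;
    for (unsigned long k = 0; k < n; k++) {
        uint32_t a = lcg_next();
        uint32_t b = lcg_next();
        if (avg_fast(a, b) != avg_ref(a, b))
            failures++;
    }
    return failures;
}
```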
Assembly, linking and loading



Basic Compilation Techniques


Assembly and linking

[Figure: several HLL files are each compiled to assembly and assembled; the resulting object files are linked into one executable]

Multiple-module programs
• Programs may be composed from several files.
• Addresses become more specific during processing:
  ■ relative addresses are measured relative to the start of a module;
  ■ absolute addresses are measured relative to the start of the CPU address space.
Assemblers
• Major tasks:
  ■ generate binary for symbolic instructions;
  ■ translate labels into addresses;
  ■ handle pseudo-ops (data, etc.).
• Generally one-to-one translation.
• Assembly labels:
      ORG 100
      label1 ADR r4,c
• Translates assembly language into binary object code
• Labels into addresses
• Symbol table
• Absolute address, relative address

    ORG 100
    label1 ADR r4,c        ; PLC = 100
           LDR r0,[r4]
    label2 ADR r4,d
           LDR r1,[r4]
    label3 SUB r0,r0,r1    ; PLC = 116

Code symbol table:
    label1  100
    label2  108
    label3  116
Symbol table
          ADD r0,r1,r2
      xx  ADD r3,r4,r5
          CMP r0,r3
      yy  SUB r5,r6,r7

  Assembly code (above) and its symbol table:
      xx  0x8
      yy  0x10
Linking
• Combines several object modules into a single executable module.
• Jobs:
  ■ put modules in order;
  ■ resolve labels across modules.
Linking
• Program stitched together out of several pieces
• Operates on object files
• Entry point: where a label is defined in the file
• External reference: where a label is used (but defined in another file)
• Load map: the order in which files are loaded into memory
[Figure: File 1 and File 2, each with its code and data and a table of its external references and entry points (labels such as label1, label2, label3, var1, a, x, y)]
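The two linker jobs can be sketched in C with a hypothetical, much simplified object-module format: each module has a size, a list of entry points (label plus offset) and a list of external references. `link_place()` builds the load map by giving each module a base address in order; `link_resolve()` turns a label into an absolute address by searching the entry points. All names and the format are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

typedef struct { const char *name; unsigned offset; } symbol_t;

/* Hypothetical object module */
typedef struct {
    unsigned size;                   /* bytes of code/data in the module */
    symbol_t entries[MAX_SYMS];      /* entry points: labels defined here */
    size_t n_entries;
    const char *externs[MAX_SYMS];   /* external references: labels used here */
    size_t n_externs;
    unsigned base;                   /* load address, set by link_place() */
} module_t;

/* Job 1: put modules in order (the load map) starting at load_addr. */
void link_place(module_t *mods, size_t n, unsigned load_addr)
{
    unsigned addr = load_addr;
    for (size_t i = 0; i < n; i++) {
        mods[i].base = addr;
        addr += mods[i].size;
    }
}

/* Job 2: resolve a label across modules. Returns the absolute address
   of the matching entry point, or -1 if no module defines it. */
long link_resolve(const module_t *mods, size_t n, const char *label)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < mods[i].n_entries; k++)
            if (strcmp(mods[i].entries[k].name, label) == 0)
                return (long)(mods[i].base + mods[i].entries[k].offset);
    return -1;
}

/* Count external references that no module defines. */
size_t link_check(const module_t *mods, size_t n)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < mods[i].n_externs; k++)
            if (link_resolve(mods, n, mods[i].externs[k]) < 0)
                unresolved++;
    return unresolved;
}
```

For example, a 12-byte module defining label1 followed by an 8-byte module defining label2 (offset 0) and var1 (offset 4), loaded at 0x1000, gives label2 = 0x100C and var1 = 0x1010.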
Symbol table generation
• Use program location counter (PLC) to determine address of each location.
• Scan program, keeping count of PLC.
• Addresses are generated at assembly time, not execution time.
Symbol table example
    PLC=0x4    ADD r0,r1,r2
    PLC=0x8    xx ADD r3,r4,r5
    PLC=0xc    CMP r0,r3
    PLC=0x10   yy SUB r5,r6,r7
Symbol table: xx = 0x8, yy = 0x10
Two-pass assembly
• Pass 1:
  ■ generate symbol table
• Pass 2:
  ■ generate binary instructions
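Pass 1 can be sketched in C for the ORG 100 example used earlier. Simplifying assumptions: every instruction is 4 bytes, a label is any first token on a line that does not start with whitespace, and the only pseudo-op handled is ORG, which sets the PLC without generating code. The function and type names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABELS 16
#define INSN_BYTES 4   /* assumed: every instruction is 4 bytes */

typedef struct { char name[16]; long addr; } sym_t;

/* Pass 1: scan the source, keep count of the PLC, and record the PLC
   value of every label. Returns the number of symbols found. */
int pass1(const char *const *lines, int n_lines, sym_t *table)
{
    long plc = 0;
    int n_syms = 0;
    for (int i = 0; i < n_lines; i++) {
        const char *line = lines[i];
        char tok[2][16];
        int n = sscanf(line, "%15s %15s", tok[0], tok[1]);
        if (n < 1)
            continue;                               /* blank line */
        int has_label = !isspace((unsigned char)line[0]);
        const char *op = has_label ? (n > 1 ? tok[1] : "") : tok[0];
        if (has_label && n_syms < MAX_LABELS) {
            strcpy(table[n_syms].name, tok[0]);     /* label = current PLC */
            table[n_syms].addr = plc;
            n_syms++;
        }
        if (strcmp(op, "ORG") == 0)                 /* pseudo-op: set PLC */
            plc = strtol(strstr(line, "ORG") + 3, NULL, 0);
        else if (op[0] != '\0')
            plc += INSN_BYTES;                      /* one instruction */
    }
    return n_syms;
}
```

Run on the example program (ORG 100, then five instructions), it records label1 = 100, label2 = 108 and label3 = 116; pass 2 would then use this table to fill in the addresses while generating binary.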
Relative address generation
• Some label values may not be known at assembly time.
• Labels within the module may be kept in relative form.
• Must keep track of external labels: the assembler can't generate full binary for instructions that use external labels.
Pseudo-operations
• Pseudo-ops do not generate instructions:
  ■ ORG sets program location.
  ■ EQU generates symbol table entry without advancing PLC.
  ■ Data statements define data blocks.
Dynamic linking
• Some operating systems link modules dynamically at run time:
  ■ shares one copy of library among all executing programs;
  ■ allows programs to be updated with new versions of libraries.

Basic Compilation techniques


Compilation
• Compilation strategy (Wirth):
  ■ compilation = translation + optimization
• Compiler determines quality of code:
  ■ use of CPU resources;
  ■ memory access scheduling;
  ■ code size.
Basic compilation phases
HLL → parsing, symbol table → machine-independent optimizations → machine-dependent optimizations → assembly
Contd…
• Statement translation
• Procedures
• Data structures
Control code generation
    if (a+b > 0)
        x = 5;
    else
        x = 7;
[Figure: CFG with decision node a+b>0 leading to x=5 (true) or x=7 (false)]
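Control code generation, cont’d.
    ADR r5,a
    LDR r1,[r5]        ; load a
    ADR r5,b
    LDR r2,[r5]        ; load b
    ADDS r3,r1,r2      ; node 1: a+b, setting the condition flags
    BLE label3         ; a+b <= 0: go to the false case
    MOV r3,#5          ; node 2 (true case): x = 5
    ADR r5,x
    STR r3,[r5]
    B stmtent
label3 MOV r3,#7       ; node 3 (false case): x = 7
    ADR r5,x
    STR r3,[r5]
stmtent ...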
Statement translation
• Translating a high-level language with little or no optimization.
• Example expression: a*b + 5*(c - d)
[Figure: data flow graph of the expression: operator 1 (*) computes w = a*b, operator 2 (-) computes x = c - d, operator 3 (*) computes y = 5*x, and operator 4 (+) computes the result z = w + y]
Arithmetic expressions, cont’d.
[Figure: DFG with operators numbered 1 (*), 2 (-), 3 (*) and 4 (+)]
    ADR r4,a
    LDR r1,[r4]
    ADR r4,b
    LDR r2,[r4]
    MUL r3,r1,r2
    ADR r4,c
    LDR r1,[r4]
    ADR r4,d
    LDR r5,[r4]
    SUB r6,r1,r5
    MOV r9,#5
    MUL r7,r6,r9
    ADD r8,r7,r3

DFG and code
Contd…
    ; operator 1 (*)
    ADR r4,a        ; get address for a
    LDR r1,[r4]     ; load a
    ADR r4,b        ; get address for b
    LDR r2,[r4]     ; load b
    MUL r3,r1,r2    ; put w into r3
    ; operator 2 (-)
    ADR r4,c        ; get address for c
    LDR r1,[r4]     ; load c
    ADR r4,d        ; get address for d
    LDR r5,[r4]     ; load d
    SUB r6,r1,r5    ; put x into r6
    ; operator 3 (*)
    MOV r9,#5
    MUL r7,r6,r9    ; put y into r7
    ; operator 4 (+)
    ADD r8,r7,r3    ; put z into r8

Procedure
Another major code generation problem is the creation of procedures.
• Generate code to handle the procedure call and return
• A subroutine call instruction alone is not sufficient
• Procedure stack and procedure linkage conventions are introduced
• Procedure linkage: how parameters are passed and a value is returned
• Stack pointer: defines the end of the current frame
• Frame pointer: marks the end of the last (previous) frame
Data structures
 References data structure memories
 Address computation: Run time or at compile time
 Array format
a [O,O]

a
--
~

a[O]
a[O,l ]
...
a[U
a[ l ,O]
•••
a[l ,l ]
-
...
Unit 4 - Program design and analysis
• Components for embedded programs
• Model of program
• Interrupt & interrupt latency
Components For Embedded Programs
• Components that are commonly used in embedded software:
• the state machine,
• the circular buffer, and
• the queue.
• State machines are well suited to reactive systems such as user interfaces
• Circular buffers and queues are useful in digital signal processing.
State Machines
• The reaction of most systems can be characterized in terms of the input received and the
current state of the system.
• This leads naturally to a finite-state machine style of describing the reactive system’s behavior.
• The state machine style of programming is an efficient implementation of such computations.
• Uses: – control dominated code; reactive system
• Example: Seat Belt Controller
• Turn on buzzer if a person sits in a seat and does not fasten seat belt within a stipulated time
• Inputs:
• Seat sensor – to know whether a person has sat down or not
• Seat belt sensor – to know if the belt is fastened or not
• Timer – to know when the fixed interval has elapsed
• Output – buzzer
• States – Idle, seated, buzzer, belted
State Machine Example – Seat Belt controller
Seat Belt Controller – C Code
    switch (state) { /* check the current state */
    case IDLE:
        if (seat) { state = SEATED; timer_on = TRUE; }
        /* default case is self-loop */
        break;
    case SEATED:
        if (belt) state = BELTED;       /* won't hear the buzzer */
        else if (timer) state = BUZZER; /* didn't put on belt in time */
        /* default case is self-loop */
        break;
    case BELTED:
        if (!seat) state = IDLE;        /* person left */
        else if (!belt) state = SEATED; /* person still in seat */
        break;
    case BUZZER:
        if (belt) state = BELTED;       /* belt is on: turn off buzzer */
        else if (!seat) state = IDLE;   /* no one in seat: turn off buzzer */
        break;
    }
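A runnable version of the controller above can be written as a step function that takes the current state and the inputs and returns the next state. The enum and the function signature are assumptions for illustration; the transitions follow the listing. Passing the state explicitly makes each transition easy to exercise from a host test scaffold.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } seat_state_t;

/* One step of the seat belt controller.
   seat, belt: sensor inputs; timer: true once the fixed interval has
   elapsed; *timer_on is set when the timer should be started.
   The buzzer output is on exactly when the state is BUZZER. */
seat_state_t seatbelt_step(seat_state_t state, bool seat, bool belt,
                           bool timer, bool *timer_on)
{
    switch (state) {
    case IDLE:
        if (seat) { state = SEATED; *timer_on = true; }
        break;                           /* default: self-loop */
    case SEATED:
        if (belt) state = BELTED;        /* won't hear the buzzer */
        else if (timer) state = BUZZER;  /* didn't put on belt in time */
        break;
    case BELTED:
        if (!seat) state = IDLE;         /* person left */
        else if (!belt) state = SEATED;  /* person still in seat */
        break;
    case BUZZER:
        if (belt) state = BELTED;        /* belt is on: turn off buzzer */
        else if (!seat) state = IDLE;    /* no one in seat: turn off buzzer */
        break;
    }
    return state;
}
```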
Circular Buffer
• A data structure that enables the handling of streaming data in an efficient
way.
• A circular buffer stores a subset of the data stream: the most recent samples.

[Figure: a data stream x1, x2, x3, … and a four-element circular buffer at time t and time t + 1; at time t + 1 the new sample x5 overwrites the oldest sample x1]
Circular Buffer – C Code Implementation
[Listing: C code for a circular buffer of SIZE elements; it initializes the buffer in a loop over i < SIZE, advances the position with pos = (pos + 1) % SIZE, and accesses the elements by index]
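A circular buffer of this kind can be sketched in C as follows. `SIZE`, `pos` and the function names are assumptions for illustration: `pos` indexes the newest sample, insertion overwrites the oldest sample, and older samples are read relative to `pos` with modulo arithmetic, so no data is ever moved. `cb_fir()` shows the typical DSP use, a filter over the last SIZE samples.

```c
#define SIZE 4   /* assumed buffer length */

static int buffer[SIZE];
static int pos = SIZE - 1;   /* index of the newest sample */

void cb_init(void)
{
    for (int i = 0; i < SIZE; i++)
        buffer[i] = 0;
    pos = SIZE - 1;
}

/* Insert a new sample, overwriting the oldest one. */
void cb_put(int value)
{
    pos = (pos + 1) % SIZE;
    buffer[pos] = value;
}

/* Return the sample inserted i steps ago (0 <= i < SIZE, 0 = newest). */
int cb_get(int i)
{
    return buffer[(pos - i + SIZE) % SIZE];
}

/* FIR filter over the buffer: sum of c[i] * x[n - i]. */
int cb_fir(const int c[SIZE])
{
    int f = 0;
    for (int i = 0; i < SIZE; i++)
        f += c[i] * cb_get(i);
    return f;
}
```

After inserting x1 … x5 into a four-element buffer, x1 has been overwritten, the newest sample is x5 and the oldest is x2, matching the figure.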
Queues
• Queues are used whenever data may arrive and depart at somewhat
unpredictable times or when variable amounts of data may arrive.
• A queue is often referred to as an elastic buffer.
• A circular buffer (CB) has a fixed number of data elements, while a queue has a varying number of elements
• One way to build a queue is with a linked list.
• This approach allows the queue to grow to an arbitrary size.
• But this comes at the price of dynamic memory allocation, which can be costly.
• Another way to design the queue is to use an array to hold all the data.
• For this, two error conditions need to be checked:
• removing from an empty queue and
• adding to a full queue
• adding an element to the tail of the queue is known as enqueueing
• removing an element from the head of the queue is known as dequeueing
An Array Based Queue
    #define SIZE 5
    int q[SIZE];
    int head, tail;

    void queue_init() {
        head = 0;
        tail = 0;
    }

    void enqueue(int value) {
        if ((tail + 1) % SIZE == head) {
            error("The queue is full");
            return;
        }
        q[tail] = value;
        tail = (tail + 1) % SIZE;
    }

    int dequeue() {
        int value;
        if (head == tail) {
            error("The queue is empty");
            return -1;    /* error value */
        }
        value = q[head];
        head = (head + 1) % SIZE;
        return value;
    }
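A self-contained variant of the array-based queue can report errors through return values instead of an `error()` routine, which the listing above does not define. The names are illustrative. Note the design choice the listing also makes: one array slot is always left empty, because `head == tail` must mean "empty", so an array of QSIZE elements holds at most QSIZE - 1 values.

```c
#include <stdbool.h>

#define QSIZE 5   /* array length; capacity is QSIZE - 1 */

static int q[QSIZE];
static int head = 0, tail = 0;

void queue_init(void) { head = 0; tail = 0; }

/* Enqueue at the tail; returns false if the queue is full. */
bool enqueue(int value)
{
    if ((tail + 1) % QSIZE == head)
        return false;                 /* full */
    q[tail] = value;
    tail = (tail + 1) % QSIZE;
    return true;
}

/* Dequeue from the head into *value; returns false if the queue is empty. */
bool dequeue(int *value)
{
    if (head == tail)
        return false;                 /* empty */
    *value = q[head];
    head = (head + 1) % QSIZE;
    return true;
}
```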
Producer/ Consumer Systems
• Queues allow varying input and output rates
• Queues modify the flow of control in the system as well as store data


Models of program
• Source code assembly languages, C code…
• Source code is not a good representation for programs:
• Clumsy
• Language dependent
• Models are developed for programs.
• They are more general than the source code
• All language specific models can be described by single model
Data Flow Graph
• Model of a program with no conditionals
• Describes a minimal ordering requirement on operations

DFG - Data Flow Graph
[Figure: data flow graph with inputs a, b, c, d, e and output z]
• Entry point, exit point, no conditions
• Single assignment form – each variable appears on the left-hand side (LHS) of only one assignment
• Operations – circular nodes
• Variables – square nodes
Single Assignment Form

[Figure: example code converted to single-assignment form]
• The single-assignment form is important because
  o it allows us to identify a unique location in the code where each named location is computed
Data Flow Graph
• The data flow graph for the code makes the order in which the operations
are performed much less obvious.
• It can be used to determine feasible re-orderings of the operations
• This helps to reduce pipeline or cache conflicts.
• It can be used when the exact order of operations simply doesn’t matter.
• The data flow graph defines a partial ordering of the operations in the basic block: a value must be computed before it is used, but generally there are several possible orders of evaluating expressions that satisfy this requirement.
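Finding one evaluation order that respects the DFG's partial order is a topological sort. The sketch below uses a hypothetical four-operation basic block in single-assignment form and repeatedly picks any operation whose inputs have already been computed; the dependency matrix and names are assumptions for illustration. Here op3 depends on nothing, so it could legally be placed anywhere, which is the freedom a compiler uses to reduce pipeline or cache conflicts.

```c
#include <stdbool.h>

#define NOPS 4
/* Hypothetical basic block in single-assignment form:
     op0: x  = a + b
     op1: y  = c - d
     op2: z  = x * y    (uses the results of op0 and op1)
     op3: y1 = b + d
   dep[i][j] is true when op i uses a value computed by op j. */
static const bool dep[NOPS][NOPS] = {
    {0, 0, 0, 0},
    {0, 0, 0, 0},
    {1, 1, 0, 0},
    {0, 0, 0, 0},
};

/* Write one valid evaluation order into order[]; returns the number of
   operations placed (NOPS unless the graph has a cycle). */
int dfg_order(int order[NOPS])
{
    bool done[NOPS] = {false};
    int count = 0;
    while (count < NOPS) {
        bool progress = false;
        for (int i = 0; i < NOPS; i++) {
            if (done[i])
                continue;
            bool ready = true;               /* all inputs computed? */
            for (int j = 0; j < NOPS; j++)
                if (dep[i][j] && !done[j])
                    ready = false;
            if (ready) {
                order[count++] = i;
                done[i] = true;
                progress = true;
            }
        }
        if (!progress)
            break;                           /* cycle: not a valid DFG */
    }
    return count;
}
```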
Control – Data Flow Graph (CDFG)
• Represents control and data
• Uses data flow graphs as components
• A CDFG has two types of nodes:
• Data flow node
• Encapsulates a complete data flow graph
• Write operations in basic block form for simplicity
• Decision node
• Describes all types of control (branch)
CDFG Example
[Figure: CDFG example. C code in which an if/else on cond1 selects bb1() or bb2(), followed by bb3() and a switch on test1 that selects among further basic blocks, together with its CDFG of decision nodes and data flow nodes]
CDFG – For Loop
    for (i = 0; i < N; i++)
        loop_body();
is equivalent to
    i = 0;
    while (i < N) {
        loop_body();
        i++;
    }
[Figure: CDFG with the node i = 0, a decision node i < N, the loop body, and i++ leading back to the decision node]
INTERRUPT
• Interrupt is a signal emitted by hardware or software when a process or an event
needs immediate attention.
• It alerts the processor to a high priority process requiring interruption of the
current working process.
• When the microprocessor detects that a signal attached to one of its interrupt
request pins is asserted, it stops executing the sequence of instructions it was
executing, saves on the stack the address of the instruction that would have been
next and jumps to an interrupt routine.
• Interrupt routines are subroutines that do whatever needs to be done when the
interrupt signal occurs.
Hardware Interrupt
[Figure: CPU with interrupt request pins; one request signal tells the microprocessor that the serial port chip needs service, another that the network chip needs service]
• In a hardware interrupt, all the devices are connected to the Interrupt Request Line (INTR).
• A single request line is used for all the n devices.
• To request an interrupt, a device closes its associated switch.
• When a device requests an interrupt, the value of INTR is the logical OR of the requests from the individual devices.
Interrupt Flow Sequence
[Figure: interrupt flow sequence. Normal program flow continues until an interrupt occurs. The CPU then pushes FLAGS, clears IF and TF, pushes CS and IP, and fetches the ISR address. The interrupt service procedure pushes the registers, does its work, pops the registers and executes IRET, which pops IP, CS and FLAGS and returns to the mainline program.]
Handling Multiple Devices
• When more than one device raises an interrupt request signal, additional information is required to
decide which device is to be serviced first.
• The methods to decide which device to select:
• Polling:
• The first device encountered with its IRQ bit set is the device that is to be serviced first.
• Appropriate ISR is called to service the same.
• It is easy to implement but a lot of time is wasted by interrogating the IRQ bit of all devices.
• Vectored Interrupts:
• A device requesting an interrupt identifies itself directly by sending a special code to the processor over the
bus.
• This enables the processor to identify the device that generated the interrupt.
• The special code can be the starting address of the ISR or where the ISR is located in memory, and is called
the interrupt vector.
• Interrupt Nesting:
• I/O devices are organized in a priority structure.
• Therefore, an interrupt request from a higher-priority device is recognized, whereas a request from a lower-priority device is not.
• To implement this, each process/device, including the processor, is assigned a priority.
• The processor accepts interrupts only from devices/processes with a higher priority than its own.
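The difference between polling and vectored interrupts can be sketched with a table of handler addresses. The per-device IRQ bits, handlers and function names are hypothetical. Polling scans the IRQ bits in a fixed order, so the scan order doubles as a priority order and the cost grows with the number of devices; the vectored version uses the code sent by the device as a direct index into the vector table.

```c
#define NDEV 4

typedef void (*isr_t)(void);

/* Hypothetical interrupt request bits and service counters */
volatile int irq_bit[NDEV];
int serviced[NDEV];

static void isr0(void) { serviced[0]++; irq_bit[0] = 0; }
static void isr1(void) { serviced[1]++; irq_bit[1] = 0; }
static void isr2(void) { serviced[2]++; irq_bit[2] = 0; }
static void isr3(void) { serviced[3]++; irq_bit[3] = 0; }

/* Interrupt vector table: device code -> ISR address */
static const isr_t vector_table[NDEV] = { isr0, isr1, isr2, isr3 };

/* Polling: interrogate the IRQ bits in order; the first device found
   with its bit set is serviced. Returns its number, or -1 if none. */
int poll_and_service(void)
{
    for (int d = 0; d < NDEV; d++) {
        if (irq_bit[d]) {
            vector_table[d]();
            return d;
        }
    }
    return -1;
}

/* Vectored: the device identifies itself with a code, so the ISR is
   found with one table lookup instead of a scan. */
void vectored_service(int code)
{
    if (code >= 0 && code < NDEV)
        vector_table[code]();
}
```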
Interrupt Classification
• Based on what is the cause for the interrupt
• Hardware – caused by external signal
• Software – caused by special instruction
• Based on whether the interrupt can be hidden or not
• Maskable - hidden
• Non-maskable – not hidden
• Based on the address of the subroutine
• Vectored – address is hardwired; vector location
• Non-vectored – address to be provided externally
• Based on the type of interrupt
• Edge triggered
• Level triggered
Saving And Restoring The Context
• Pushing all of the registers at the beginning of an interrupt routine
is known as saving the context

• Popping them at the end is restoring the context.


Disabling Interrupts
• Interrupt signal can be stopped
• At the source level, by the I/O chips
• At the processor level, where the processor allows the program to ignore
incoming signals
• Assigning priority
• Non-maskable interrupts
• an input pin whose interrupt cannot be disabled
• e.g. to allow the system to react to a power failure or a similar catastrophic event
Interrupt Latency
• The amount of time it takes a system to respond to an interrupt
• Factors controlling interrupt latency and hence response
1. The longest period of time during which that interrupt is disabled
2. The period of time it takes to execute any interrupt routines for interrupts that
are of higher priority than the one in question
3. How long it takes the processor to stop what it is doing and start executing
instructions within the interrupt routine.
4. How long it takes the interrupt routine to save the context and then do enough work to constitute a ‘response’
• Factor 4 – writing efficient code
• Factor 3 – not under software control
• Factor 2 – write short interrupt routines

• Disabling interrupts solves the shared data problem;
• but the shorter the period during which interrupts are disabled, the better the response.
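A minimal sketch of the shared data problem and its fix. `disable_interrupts()` and `enable_interrupts()` are hypothetical stand-ins (on a real target they would map to the CPU's interrupt-disable and -enable instructions; here they only record the state so the sketch runs on a host), and the two-element temperature array is an illustrative shared datum. The task reads both values with interrupts disabled, so an interrupt cannot update the array between the two reads, and the disabled region is kept to just those reads to limit the extra interrupt latency.

```c
#include <stdbool.h>

/* Hypothetical interrupt control (host stand-ins) */
static bool interrupts_enabled = true;
void disable_interrupts(void) { interrupts_enabled = false; }
void enable_interrupts(void)  { interrupts_enabled = true; }

/* Data shared between an interrupt routine and task code */
static volatile int temperatures[2];

/* Interrupt routine: writes both readings. */
void temperature_isr(int t0, int t1)
{
    temperatures[0] = t0;
    temperatures[1] = t1;
}

/* Task code: without the disable/enable pair, an interrupt between the
   two reads could pair an old t0 with a new t1 and give a false result. */
bool temperatures_equal(void)
{
    int t0, t1;
    disable_interrupts();   /* start of critical section: keep it short */
    t0 = temperatures[0];
    t1 = temperatures[1];
    enable_interrupts();    /* end of critical section */
    return t0 == t1;
}
```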
Program design and analysis
• Program-level performance analysis.
• Optimizing for:
  ■ Execution time.
  ■ Energy/power.
  ■ Program size.
• Program validation and testing.
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
Program-level performance analysis
• Need to understand performance in detail:
  ■ Real-time behavior, not just typical.
  ■ On complex platforms.
• Program performance (total execution time) ≠ CPU performance.
• Pipeline, cache are windows into program.
  ■ We must analyze the entire program.


Complexities of program performance
• Varies with input data:
  ■ Different-length paths.
• Cache effects.
• Instruction-level performance variations:
  ■ Pipeline interlocks.
  ■ Fetch times.
How to measure program performance
• Simulate execution of the CPU.
  ■ Makes CPU state visible.
• Measure on real CPU using timer.
  ■ Requires modifying the program to control the timer.
• Measure on real CPU using logic analyzer.
  ■ Requires events visible on the pins.
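Timer-based measurement can be sketched with the standard C `clock()` function as a host-side stand-in for a target timer (on a target, reads of a hardware timer register would replace it; that mapping is an assumption). The program is modified to read the timer before and after the code under test, and the code is run several times to amortize the timer's resolution and overhead. `code_under_test()` is a hypothetical placeholder.

```c
#include <time.h>

/* Hypothetical code under test */
static volatile unsigned long sink;
void code_under_test(void)
{
    unsigned long s = 0;
    for (unsigned long i = 0; i < 100000ul; i++)
        s += i;
    sink = s;
}

/* Average CPU time per call in seconds, measured with a timer read
   before and after reps calls. */
double seconds_per_call(void (*fn)(void), int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Such a measurement reports the time for the particular input data used; as noted above, other inputs may take different-length paths and give different cache behaviour.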
Program performance metrics
• Average-case execution time.
  - Typically used in application programming.
• Worst-case execution time.
  - A component in deadline satisfaction.
• Best-case execution time.
  - Task-level interactions can cause best-case program behavior to result in worst-case system behavior.

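Given repeated timer measurements, the three metrics can be estimated as below. This is a sketch with hypothetical names (`exec_stats`, `summarize`); keep in mind that the worst time you happen to observe is only a lower bound on the true worst-case execution time, because untested inputs may take longer paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of measured execution times (units are whatever the timer uses). */
struct exec_stats {
    uint64_t best;    /* shortest observed time */
    uint64_t worst;   /* longest observed time  */
    double   average; /* mean of all samples    */
};

/* Reduce a set of measurements to best/average/worst observed times.
   The observed worst case is only a lower bound on the true WCET. */
static struct exec_stats summarize(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    struct exec_stats s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.best)  s.best = samples[i];
        if (samples[i] > s.worst) s.worst = samples[i];
        sum += (double)samples[i];
    }
    s.average = sum / (double)n;
    return s;
}
```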
Elements of program performance
• Basic program execution time formula:
  - execution time = program path + instruction timing
• Solving these problems independently helps simplify analysis.
  - Easier to separate on simpler CPUs.
• Accurate performance analysis requires:
  - Assembly/binary code.
  - Execution platform.

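The formula can be made concrete by splitting a path into basic blocks: program-path analysis says how many times each block runs, and instruction timing says how many cycles one execution of the block costs. The sketch below assumes block costs are independent of each other, which is only reasonable on a simple CPU without caches or pipeline interlocks; `struct block` and `path_cycles` are hypothetical names, and the cycle counts in the example are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One basic block: its cost (instruction timing) and how often the
   chosen path executes it (program path). */
struct block {
    unsigned cycles_per_exec;  /* from instruction timing      */
    unsigned exec_count;       /* from program-path analysis   */
};

/* Total cycles = sum over blocks of (executions x cycles per execution).
   Assumes block timings are independent (simple CPU, no cache/interlocks). */
static unsigned long path_cycles(const struct block *b, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned long)b[i].exec_count * b[i].cycles_per_exec;
    return total;
}
```

For example, a loop whose setup block costs 2 cycles, whose test block (2 cycles) runs N+1 times and whose body (5 cycles) runs N times costs 2 + 2(N+1) + 5N cycles; for N = 8 that is 60 cycles.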
Data-dependent paths in an if statement

    if (a || b) {          /* T1 */
        if (c)             /* T2 */
            x = r*s+t;     /* A1 */
        else y = r+s;      /* A2 */
        z = r+s+u;         /* A3 */
    }
    else {
        if (c)             /* T3 */
            y = r-t;       /* A4 */
    }

    a b c   path
    0 0 0   T1=F, T3=F: no assignments
    0 0 1   T1=F, T3=T: A4
    0 1 0   T1=T, T2=F: A2, A3
    0 1 1   T1=T, T2=T: A1, A3
    1 0 0   T1=T, T2=F: A2, A3
    1 0 1   T1=T, T2=T: A1, A3
    1 1 0   T1=T, T2=F: A2, A3
    1 1 1   T1=T, T2=T: A1, A3
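The path table can be checked mechanically by instrumenting each assignment so that it sets a flag. The operand values and the flag encoding below are arbitrary choices for this sketch.

```c
#include <assert.h>

/* Bit flags recording which assignments (A1..A4) ran. */
enum { A1 = 1, A2 = 2, A3 = 4, A4 = 8 };

/* The if statement from the slide, instrumented to report its path. */
static unsigned run_if(int a, int b, int c)
{
    int r = 2, s = 3, t = 4, u = 5;   /* arbitrary operand values */
    int x = 0, y = 0, z = 0;
    unsigned taken = 0;

    if (a || b) {                              /* T1 */
        if (c) { x = r*s+t; taken |= A1; }     /* T2 true: A1 */
        else   { y = r+s;   taken |= A2; }     /* T2 false: A2 */
        z = r+s+u; taken |= A3;                /* A3 */
    } else {
        if (c) { y = r-t; taken |= A4; }       /* T3 true: A4 */
    }
    (void)x; (void)y; (void)z;                 /* results unused here */
    return taken;
}
```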


Paths in a loop

    for (i=0, f=0; i<N; i++)
        f = f + c[i] * x[i];

[Figure: CDFG of the loop: i=0, f=0; test i=N (N branch: f = f + c[i] * x[i], then i=i+1 and back to the test; Y branch: exit).]


Instruction timing
• Not all instructions take the same amount of time.
  - Multi-cycle instructions.
  - Fetches.
• Execution times of instructions are not independent.
  - Pipeline interlocks.
  - Cache effects.
• Execution times may vary with operand value.
  - Floating-point operations.
  - Some multi-cycle integer operations.

Measurement-driven performance analysis
• Not so easy as it sounds:
  - Must actually have access to the CPU.
  - Must know data inputs that give worst/best case performance.
  - Must make state visible.
• Still an important method for performance analysis.

Feeding the program
• Need to know the desired input values.
• May need to write software scaffolding to generate the input values.
• Software scaffolding may also need to examine outputs to generate feedback-driven inputs.

Trace-driven measurement
• Trace-driven:
  - Instrument the program.
  - Save information about the path.
• Requires modifying the program.
• Trace files are large.
• Widely used for cache analysis.

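A minimal sketch of trace-driven instrumentation, assuming a fixed-size in-memory trace buffer: each basic block of the program under test calls a hypothetical `trace_point` hook with an arbitrary block ID, and the recorded sequence is the path taken. A real tool would usually stream the trace to a file, which is why trace files grow large.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 64   /* hypothetical trace-buffer size */

static unsigned char trace_buf[TRACE_MAX];
static size_t trace_len;

/* Instrumentation hook: record which basic block executed.
   This sketch silently stops recording when the buffer is full. */
static void trace_point(unsigned char block_id)
{
    if (trace_len < TRACE_MAX)
        trace_buf[trace_len++] = block_id;
}

/* Program under test, modified with trace points (IDs are arbitrary). */
static int sum_positive(const int *v, int n)
{
    int sum = 0;
    trace_point(0);                                   /* entry */
    for (int i = 0; i < n; i++) {
        if (v[i] > 0) { trace_point(1); sum += v[i]; }
        else          { trace_point(2); }
    }
    trace_point(3);                                   /* exit */
    return sum;
}
```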
Physical measurement
• In-circuit emulator allows tracing.
  - Affects execution timing.
• Logic analyzer can measure behavior at pins.
  - Address bus can be analyzed to look for events.
  - Code can be modified to make events visible.
• Particularly important for real-world input streams.

CPU simulation
• Some simulators are less accurate.
• Cycle-accurate simulator provides accurate clock-cycle timing.
  - Simulator models CPU internals.
  - Simulator writer must know how CPU works.

SimpleScalar FIR filter simulation

    int x[N] = {8, 17, … };
    int c[N] = {1, 2, … };

    int main(void) {
        int i, k, f = 0;
        for (k=0; k<COUNT; k++)
            for (i=0; i<N; i++)
                f += c[i]*x[i];
        return 0;
    }

    N         total sim cycles   sim cycles per filter execution
    100           25854              259
    1,000        155759              156
    10,000      1451840              145


Performance optimization motivation
• Embedded systems must often meet deadlines.
  - Faster may not be fast enough.
• Need to be able to analyze execution time.
  - Worst-case, not typical.
• Need techniques for reliably improving execution time.

Programs and performance analysis
• Best results come from analyzing optimized instructions, not high-level language code:
  - non-obvious translations of HLL statements into instructions;
  - code may move;
  - cache effects are hard to predict.

Loop optimizations
• Loops are good targets for optimization.
• Basic loop optimizations:
  - code motion;
  - induction-variable elimination;
  - strength reduction (x*2 -> x<<1).

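Strength reduction replaces an expensive operation with a cheaper one. In this sketch (function names are hypothetical) a per-iteration multiply by a constant is replaced by adding the constant to a running value, and a multiply by 2 becomes a shift on an unsigned value; optimizing compilers usually perform both automatically.

```c
#include <assert.h>

/* Before: a multiply on every iteration. */
static void fill_scaled_mul(int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = i * 4;
}

/* After strength reduction: the multiply becomes an add to a running value. */
static void fill_scaled_add(int *out, int n)
{
    int v = 0;
    for (int i = 0; i < n; i++) {
        out[i] = v;
        v += 4;
    }
}

/* x*2 -> x<<1 (unsigned, so the shift is always well defined). */
static unsigned times2(unsigned x)
{
    return x << 1;
}
```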
Code motion

    for (i=0; i<N*M; i++)
        z[i] = a[i] + b[i];

[Figure: CDFG before and after code motion. Before: i=0; loop test i<N*M. After: i=0; X=N*M; loop test i<X. On the Y branch the body z[i] = a[i] + b[i]; executes, then i = i+1; the N branch exits.]
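The code-motion transformation in the CDFG corresponds to the following before/after C sketch; `add_before` and `add_after` are hypothetical names. Whether the "before" version really recomputes N*M on each iteration depends on the compiler.

```c
#include <assert.h>

/* Before: N*M appears in the loop test and may be re-evaluated each iteration. */
static void add_before(int *z, const int *a, const int *b, int N, int M)
{
    for (int i = 0; i < N*M; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once (X = N*M). */
static void add_after(int *z, const int *a, const int *b, int N, int M)
{
    const int X = N*M;
    for (int i = 0; i < X; i++)
        z[i] = a[i] + b[i];
}
```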


Induction variable elimination
• Induction variable: loop index.
• Consider loop:

    for (i=0; i<N; i++)
        for (j=0; j<M; j++)
            z[i][j] = b[i][j];

• Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end of loop body.

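Writing the 2-D arrays as flat row-major arrays makes the index arithmetic explicit. The second function in this sketch shares one induction variable `k` between both arrays, which is valid because i*M+j increases by exactly 1 on every inner iteration; the function names are hypothetical.

```c
#include <assert.h>

/* Before: each access computes the flattened index i*M + j. */
static void copy_indexed(int *z, const int *b, int N, int M)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            z[i*M + j] = b[i*M + j];
}

/* After: one shared induction variable k replaces i*M + j for both
   arrays and is incremented at the end of the loop body. */
static void copy_shared(int *z, const int *b, int N, int M)
{
    int k = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            z[k] = b[k];
            k++;
        }
}
```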
Cache analysis
• Loop nest: set of loops, one inside other.
• Perfect loop nest: no conditionals in nest.
• Because loops use large quantities of data, cache conflicts are common.

Array conflicts in cache
[Figure: in main memory, a[0,0] is at address 1024 and b[0,0] at address 4099; both map to the same cache line.]

Array conflicts, cont’d.
• Array elements conflict because they are in the same line, even if not mapped to same location.
• Solutions:
  - move one array;
  - pad array.

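To see why a[0,0] at 1024 and b[0,0] at 4099 can collide, compute the cache line each address maps to. This sketch assumes a direct-mapped, word-addressed cache with 4 words per line and 256 lines (1024 words); these parameters are assumptions, not given on the slides. Under them, both addresses fall in line 0 even though their word offsets differ, and padding b by one line (4 words) moves it to line 1.

```c
#include <assert.h>

/* Assumed direct-mapped cache geometry (not from the slides). */
#define WORDS_PER_LINE 4u
#define NUM_LINES      256u

/* Cache line that a word address maps to. */
static unsigned cache_line(unsigned word_addr)
{
    return (word_addr / WORDS_PER_LINE) % NUM_LINES;
}

/* Two addresses conflict if they land in the same cache line. */
static int conflicts(unsigned addr_a, unsigned addr_b)
{
    return cache_line(addr_a) == cache_line(addr_b);
}
```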
Performance optimization hints
• Use registers efficiently.
• Use page mode memory accesses.
• Analyze cache behavior:
  - instruction conflicts can be handled by rewriting code, rescheduling;
  - conflicting scalar data can easily be moved;
  - conflicting array data can be moved, padded.

Energy/power optimization
• Energy: ability to do work.
  - Most important in battery-powered systems.
• Power: energy per unit time.
  - Important even in wall-plug systems: power becomes heat.

Measuring energy consumption
• Execute a small loop, measure current:

    while (1)      /* run a() forever while measuring supply current */
        a();


Sources of energy consumption
• Relative energy per operation (Catthoor et al.):
  - memory transfer: 33
  - external I/O: 10
  - SRAM write: 9
  - SRAM read: 4.4
  - multiply: 3.6
  - add: 1

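The relative numbers make it easy to compare alternatives. Using only the values in the list above, this sketch estimates the relative energy of one multiply-accumulate step, f += c[i] * x[i], with both operands read from SRAM versus both transferred from memory; it ignores loop overhead and the store of f, a simplifying assumption. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Relative energy per operation from the list above (add = 1). */
#define E_MEM_TRANSFER 33.0
#define E_SRAM_READ     4.4
#define E_MULTIPLY      3.6
#define E_ADD           1.0

/* Relative energy of one multiply-accumulate step with two operand
   fetches of the given cost (loop overhead and stores ignored). */
static double mac_energy(double operand_fetch)
{
    return 2.0 * operand_fetch + E_MULTIPLY + E_ADD;
}
```

With SRAM operands the step costs about 13.4 units; with memory transfers it costs about 70.6, roughly five times more, which is why keeping data on chip matters.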
Cache behavior is important
• Energy consumption has a sweet spot as cache size changes:
  - cache too small: program thrashes, burning energy on external memory accesses;
  - cache too large: cache itself burns too much power.

Cache sweet spot
[Figure: two plots for the "MPEG" benchmark: energy [Joules] and execution time [cycles], each versus d-cache size [2^value] and i-cache size [2^value]. Source: [Li98] © 1998 IEEE.]


Optimizing for energy
• First-order optimization:
  - high performance = low energy.
• Not many instructions trade speed for energy.

Optimizing for energy, cont’d.
• Use registers efficiently.
• Identify and eliminate cache conflicts.
• Moderate loop unrolling eliminates some loop overhead instructions.
• Eliminate pipeline stalls.
• Inlining procedures may help: reduces linkage, but may increase cache thrashing.

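Loop unrolling by a factor of 4, sketched in C: the unrolled version performs one loop test and branch per four elements, plus a cleanup loop for the leftover elements. The factor of 4 is an arbitrary "moderate" choice and the function names are hypothetical; larger factors grow the code and may increase cache misses.

```c
#include <assert.h>

/* Straightforward loop: one test-and-branch per element. */
static int dot(const int *c, const int *x, int n)
{
    int f = 0;
    for (int i = 0; i < n; i++)
        f += c[i] * x[i];
    return f;
}

/* Unrolled by 4: one loop test per four elements, plus a cleanup loop
   for the 0-3 elements left when n is not a multiple of 4. */
static int dot_unrolled4(const int *c, const int *x, int n)
{
    int f = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        f += c[i]   * x[i];
        f += c[i+1] * x[i+1];
        f += c[i+2] * x[i+2];
        f += c[i+3] * x[i+3];
    }
    for (; i < n; i++)   /* remainder */
        f += c[i] * x[i];
    return f;
}
```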
Efficient loops
• General rules:
  - Don’t use function calls.
  - Keep loop body small to enable local repeat (only forward branches).
  - Use unsigned integer for loop counter.
  - Use <= to test loop counter.
  - Make use of compiler: global optimization, software pipelining.

Single-instruction repeat loop example

        STM   #4000h,AR2     ; load pointer to source
        STM   #100h,AR3      ; load pointer to destination
        RPT   #(1024-1)
        MVDD  *AR2+,*AR3+    ; move

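For comparison, here is a portable C equivalent of the repeat loop. It is a sketch: the absolute addresses 4000h and 100h are replaced by ordinary arrays, 16-bit words are assumed (as on the C54x), and `block_move` is a hypothetical name. Whether a C54x compiler turns such a loop into RPT/MVDD depends on the compiler and its options; RPT #(1024-1) repeats the following instruction 1024 times.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_WORDS 1024u   /* RPT #(1024-1) repeats the move 1024 times */

/* Copy BLOCK_WORDS 16-bit words, post-incrementing both pointers
   (like *AR2+ and *AR3+ in the assembly). */
static void block_move(uint16_t *dst, const uint16_t *src)
{
    for (unsigned n = 0; n < BLOCK_WORDS; n++)
        *dst++ = *src++;
}
```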
Optimizing for program size
• Goal:
  - reduce hardware cost of memory;
  - reduce power consumption of memory units.
• Two opportunities:
  - data;
  - instructions.

Data size minimization
• Reuse constants, variables, data buffers in different parts of code.
  - Requires careful verification of correctness.
• Generate data using instructions.

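One way to generate data using instructions is to compute a value on demand instead of storing a lookup table. This sketch computes the parity of a byte with three XOR-and-shift steps rather than keeping a hypothetical 256-entry parity table in memory; it trades a few instructions per call for the table's storage.

```c
#include <assert.h>
#include <stdint.h>

/* Parity of a byte computed with instructions instead of a stored table.
   Each step folds the upper half of the remaining bits onto the lower half. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;   /* 1 if an odd number of bits are set */
}
```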
Reducing code size
• Avoid function inlining.
• Choose CPU with compact instructions.
• Use specialized instructions where possible.

Program validation and testing
• But does it work?
• Concentrate here on functional verification.
• Major testing strategies:
  - Black box doesn’t look at the source code.
  - Clear box (white box) does look at the source code.

Clear-box testing
• Examine the source code to determine whether it works:
  - Can you actually exercise a path?
  - Do you get the value you expect along a path?
• Testing procedure:
  - Controllability: provide program with inputs.
  - Execute.
  - Observability: examine outputs.

Controlling and observing programs

    firout = 0.0;
    for (j=curr, k=0; j<N; j++, k++)
        firout += buff[j] * c[k];
    for (j=0; j<curr; j++, k++)
        firout += buff[j] * c[k];
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;

• Controllability:
  - Must fill circular buffer with desired N values.
  - Other code governs how we access the buffer.
• Observability:
  - Want to examine firout before limit testing.

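Wrapping the fragment in a function shows both sides of the problem. In this sketch the test controls the circular-buffer contents, the coefficients and `curr` directly, and a hypothetical global `firout_raw` is added as an observability hook that records firout before limit testing; `N = 4` and the name `fir_step` are assumptions.

```c
#include <assert.h>

#define N 4   /* hypothetical filter length */

/* Observability hook (added for testing): firout before limit testing. */
static double firout_raw;

/* The slide's circular-buffer FIR step; curr marks where the buffer
   logically starts, so buff[curr] is multiplied by c[0]. */
static double fir_step(const double buff[N], const double c[N], int curr)
{
    double firout = 0.0;
    int j, k;
    for (j = curr, k = 0; j < N; j++, k++)
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)
        firout += buff[j] * c[k];
    firout_raw = firout;                 /* observe before limiting */
    if (firout > 100.0)  firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

Without the hook, a test that drives the sum to 200 would only ever see the clamped value 100.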
Execution paths and testing
• Paths are important in functional testing as well as performance analysis.
• In general, there is an exponential number of paths through the program.
  - Show that some paths dominate others.
  - Heuristically limit paths.

Choosing the paths to test
• Possible criteria:
  - Execute every statement at least once.
  - Execute every branch direction at least once.
• Equivalent for structured programs.
  - Not true for gotos.
[Figure: example control-flow graph with one branch marked "not covered".]


Basis paths
• Approximate CDFG with undirected graph.
• Undirected graphs have basis paths:
  - All paths are linear combinations of basis paths.

    Incidence matrix:
        a b c d e
    a   0 0 1 0 0
    b   0 0 1 0 1
    c   1 1 0 1 0
    d   0 0 1 0 1
    e   0 1 0 1 0

    Basis set:
        a b c d e
    a   1 0 0 0 0
    b   0 1 0 0 0
    c   0 0 1 0 0
    d   0 0 0 1 0
    e   0 0 0 0 1

[Figure: the undirected graph on nodes a to e.]


Cyclomatic complexity
• Cyclomatic complexity is a bound on the size of basis sets:
  - e = # edges
  - n = # nodes
  - p = number of graph components
  - M = e - n + 2p.
• Example (graph in figure): n = 6, e = 8, so V(G) = 8 - 6 + 2 = 4.
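Both quantities are easy to compute. This sketch counts edges in the incidence matrix of the basis-paths graph (each undirected edge appears twice in the symmetric matrix, so only the upper triangle is counted) and evaluates M = e - n + 2p; the function names are hypothetical. For that graph e = 5, n = 5 and p = 1, giving M = 2, and the figure's graph gives 8 - 6 + 2 = 4.

```c
#include <assert.h>

#define NODES 5

/* Incidence matrix of the basis-paths graph (nodes a..e). */
static const int incidence[NODES][NODES] = {
    /* a  b  c  d  e */
    {  0, 0, 1, 0, 0 },   /* a */
    {  0, 0, 1, 0, 1 },   /* b */
    {  1, 1, 0, 1, 0 },   /* c */
    {  0, 0, 1, 0, 1 },   /* d */
    {  0, 1, 0, 1, 0 },   /* e */
};

/* Count undirected edges using only the upper triangle. */
static int count_edges(const int m[NODES][NODES])
{
    int e = 0;
    for (int r = 0; r < NODES; r++)
        for (int col = r + 1; col < NODES; col++)
            e += m[r][col];
    return e;
}

/* Cyclomatic complexity M = e - n + 2p. */
static int cyclomatic(int e, int n, int p)
{
    return e - n + 2 * p;
}
```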


Branch testing
• Heuristic for testing branches.
  - Exercise true and false branches of conditional.
  - Exercise every simple condition at least once.

Branch testing example
• Correct:

    if (a || (b >= c)) { printf("OK\n"); }

• Incorrect:

    if (a && (b >= c)) { printf("OK\n"); }

• Test:
  - a = F
  - (b >= c) = T
• Example:
  - Correct: [0 || (3 >= 2)] = T
  - Incorrect: [0 && (3 >= 2)] = F

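The example can be checked directly: a test detects the bug when the correct and incorrect conditions disagree. This is a sketch with hypothetical function names; note that a = T would not catch the error, because both conditions are then true whenever b >= c.

```c
#include <assert.h>

/* Intended condition. */
static int correct_cond(int a, int b, int c)   { return a || (b >= c); }

/* Buggy version: && typed instead of ||. */
static int incorrect_cond(int a, int b, int c) { return a && (b >= c); }

/* A test vector catches the bug if the two versions disagree on it. */
static int detects_bug(int a, int b, int c)
{
    return correct_cond(a, b, c) != incorrect_cond(a, b, c);
}
```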
Another branch testing example
• Correct:

    if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

• Incorrect:

    if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

• Incorrect code changes pointer.
  - Assignment returns new value of LHS in C.
• Test that catches error:
  - x != good_pointer with good_pointer->field1 == 3: the correct code prints nothing, but the incorrect code overwrites x with good_pointer and prints.

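A sketch of the same bug as functions (the struct `rec` and the function names are hypothetical). Because x is passed by value, the incorrect version only overwrites its local copy, but that is enough to change the result: with x != good_pointer and good_pointer->field1 == 3, the correct test is false and the incorrect one is true.

```c
#include <assert.h>

struct rec { int field1; };

/* Correct: compare x with good_pointer. */
static int correct_test(struct rec *x, struct rec *good_pointer)
{
    return (x == good_pointer) && (x->field1 == 3);
}

/* Incorrect: '=' assigns good_pointer to x, and the assignment's value
   (the new x, non-null) is what gets tested, so the comparison is lost. */
static int incorrect_test(struct rec *x, struct rec *good_pointer)
{
    return (x = good_pointer) && (x->field1 == 3);
}
```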
Domain testing
• Heuristic test for linear inequalities.
• Test on each side + boundary of inequality.
[Figure: test points i=3, j=5; i=4, j=5; i=1, j=2 plotted against the boundary j <= i + 1 ("Correct test") and against j >= i - 1 ("Incorrect tests").]
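The domain-testing points can be evaluated against the boundary j <= i + 1. In this sketch, `in_domain_buggy` is a hypothetical off-by-one mistake (< instead of <=); the points on the boundary (i=4, j=5 and i=1, j=2) expose it, while the point off the boundary (i=3, j=5) checks the other side.

```c
#include <assert.h>

/* Intended domain boundary: j <= i + 1. */
static int in_domain(int i, int j)       { return j <= i + 1; }

/* Hypothetical off-by-one mistake in the boundary. */
static int in_domain_buggy(int i, int j) { return j < i + 1; }
```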


Def-use pairs
• Variable def-use:
  - Def when value is assigned (defined).
  - Use when used on right-hand side.
• Exercise each def-use pair.
  - Requires testing correct path.

    a = mypointer;
    if (c > 5) {
        while (a->field1 != val1)
            a = a->next;
    }
    if (a->field2 != val2)
        someproc(a, b);


Loop testing
• Loops need specialized tests to be tested efficiently.
• Heuristic testing strategy:
  - Skip loop entirely.
  - One loop iteration.
  - Two loop iterations.
  - # iterations much below max.
  - n-1, n, n+1 iterations where n is max.

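The heuristic translates into a short table of iteration counts. In this sketch the loop under test sums the first n samples, `MAX_SAMPLES` is a hypothetical maximum of 8, and counts outside 0..MAX_SAMPLES are rejected with -1 (also an assumption about the specification); all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 8   /* hypothetical maximum loop count */

/* Loop under test: sums the first n samples; n = 0 skips the loop. */
static int sum_samples(const int *v, int n)
{
    if (n < 0 || n > MAX_SAMPLES)
        return -1;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Iteration counts from the heuristic: skip, one, two, well below max,
   and n-1, n, n+1 around the maximum. */
static const int loop_test_counts[] = {
    0, 1, 2, 3, MAX_SAMPLES - 1, MAX_SAMPLES, MAX_SAMPLES + 1
};

/* Run every count with all-ones input; returns the number of failures. */
static int run_loop_tests(void)
{
    int v[MAX_SAMPLES + 1];
    for (int i = 0; i <= MAX_SAMPLES; i++)
        v[i] = 1;
    int failures = 0;
    for (size_t t = 0; t < sizeof loop_test_counts / sizeof loop_test_counts[0]; t++) {
        int n = loop_test_counts[t];
        int expected = (n <= MAX_SAMPLES) ? n : -1;
        if (sum_samples(v, n) != expected)
            failures++;
    }
    return failures;
}
```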
Black-box testing
• Complements clear-box testing.
  - May require a large number of tests.
• Tests software in different ways.

Black-box test vectors
• Random tests.
  - May weight distribution based on software specification.
• Regression tests.
  - Tests of previous versions, bugs, etc.
  - May be clear-box tests of previous versions.

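A sketch of both kinds of black-box vectors for a hypothetical unit, a 16-bit saturating add. Random tests compare the unit against a simple reference model using a small deterministic linear congruential generator (so failures can be reproduced), and regression vectors replay fixed cases such as previously found overflow bugs. The unit, the reference model and the vectors are all illustrative assumptions; the random inputs here are uniform, not weighted by a specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unit under test (hypothetical): 16-bit saturating add. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Independent, simpler reference model used to judge random results. */
static int16_t ref_sat_add16(int16_t a, int16_t b)
{
    long s = (long)a + (long)b;
    return (int16_t)(s > 32767 ? 32767 : (s < -32768 ? -32768 : s));
}

/* Deterministic LCG so any failing vector can be reproduced. */
static uint32_t lcg_state = 12345u;
static int16_t rand16(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int16_t)((int32_t)(lcg_state >> 16) - 32768);  /* -32768..32767 */
}

/* Random tests: returns the number of mismatches against the reference. */
static int random_tests(int count)
{
    int failures = 0;
    for (int i = 0; i < count; i++) {
        int16_t a = rand16(), b = rand16();
        if (sat_add16(a, b) != ref_sat_add16(a, b))
            failures++;
    }
    return failures;
}

/* Regression vectors: fixed cases, e.g. previously found bugs. */
struct vec { int16_t a, b, expected; };
static const struct vec regression[] = {
    {  32767,   1,  32767 },   /* positive overflow */
    { -32768,  -1, -32768 },   /* negative overflow */
    {    100, -50,     50 },   /* ordinary case     */
};
```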
How much testing is enough?
• Exhaustive testing is impractical.
• One important measure of test quality: bugs escaping into field.
• Good organizations can test software to give very low field bug report rates.
• Error injection measures test quality:
  - Add known bugs.
  - Run your tests.
  - Determine % injected bugs that are caught.
