You are on page 1of 236

REAL-T IM E E M BE D D ED

CO M PO NENTS AN D SYSTE M S

...

f/

'
.I

.
.

'

ii

.,.,

"

.
..

.,

...

,,

,y

'

;1'

SAM SIEWERT

'

C E N GAG E
Lea r n i n g

i..

..

r!

'

'
1,.

'

Scanned by CamScanner

V I

\.... .. ..

, .
:

'G I

..

E E R I

E R I

Contents

Preface
Acknowledgments
Part I

Real-Time Embedded Theory

/introduction to Real-Time Embedded Systems

xv
XVII

1
3

Introduction

A Brief History of Real-Time Systems

A Brief History of Embedded Systems

Real-Time Services
Real-Time Standards

6
14

Summary

15

Exercises

16

Chapter References

16

/ System Resources

17

Introduction

17

Resource Analysis

19

Real-Time Service Utility

25

Scheduling Classes

32

Multiprocessor Systems
The Cyclic Executive
Scheduler Concepts

33

35

Preemptive Fixed Priority Scheduling Policy

40

ReaJ-Time Operating Systems

43

Thread Safe Reentrant Functions

50

vii

Scanned by CamScanner

Contents

Summary

53

Exercises

53

Chapter References

55

/Processing

57

Introduction

57

Preemptive Fixed-Priority Policy

58

Feasibility

61

Rate Monotonic Least Upper Bound (RUM LUB)

62

Necessary and Sufficient (N&S) Feasibility

73

Scheduling Point Test

73

Completion Time Test

75

Deadline-Monotonic Policy

75

Dynamic Priority Policies

77

Summary

81

Exercises

81

Chapter References

83

/ 1/0 Resources
Introduction

85

Worst-Case Executi
on Tim e
Intermediate I/O

85
86

Execution Effici

89

110 Architect

Summary

ency

ure

91
94

Exercises

96

Chapter Refe
renc es

96
97

/Memory
In tro ductio

Ph ysical Hie
rarchy

99
99
JOO

Scanned by CamScanner

Contents

Ix

Capacity and Alloc ation

I 03

Shared Memory

I 04

ECC Memory

I 05

Flash Filesystems

I 09

Summary

111

Exercises

1 11

Chapter References

111

/ Multiresource Services

113

Introduction

113

Blocking

I 14

Deadlock and Livelock

I 14

Critical Sections to Protect Shared Resources


Priority Inversion
Unbounded Priority Inversion Solutions

I 16
I 16

118

Power Management and Processor Clock Modulation

120

Summary

121

Exercises

121

Chapter References

121

y Soft Real-Time Services


Introduc tion
M issed Deadlines
Quality of Service

123

123
124
125

Alternatives to Rate Monotonic Policy

125

Mixed Hard and Soft Real-Time Services

128

Summ ary

129

Exercises

129

Chapter References

Scanned by CamScanner

130

Contents

Part 11
8

ded C omponents
d
e
b
Em
. .ng Rea I-rime
Oes1gm
Components
Embedded System
In trod uction

Hardware Co mpon

137

Actuators

138

1/0 Interfaces

142

Processor Complex
on
Processor and I/0 Int erconnecti

l43

144

Bus Interconnection

147

High-Speed Serial Interconnection

149

Low-Speed Serial Interconnection

150

Interconnection Systems

151

Memory Subsystems

152

Firmware Components

153

Boot Code

153

Device Drivers

154

Operating System Services


RTOS System Software Mechanism s

154

Message Queues

155

Binary Semaphores

156

Mutex Semaphores

156

Software Virtu al Timers

157

So ftware Signals

m ponents

A PPJ'JCation Ser
vices
Re-entrant A
PP 1 cat ion Lib ra r
1es
C:Om mun1cat
ing and Sync
h ronize d A ppl
tcations
S umm ary

Exercises
Chapter R efe
rences

Scanned by CamScanner

133

135

Sen sors

133

135

ents

Software Applicatio
n Co

1 31

157

158
158

160

161

163

163

164

Contents
9

Debugging Components
Introduction
Exceptions
Assert
Checking Return Codes
Single-Step Debugging
Kernel Scheduler Traces
Test Access Ports
Trace Ports
Power-On Self Test and Diagnostics
External Test Equipment
Application-Level Debuggi ng
Summary
Exercises
Chapter References

,,,>fl'

Performance Tuning

Introduction
Basic Concepts of Drill-Down Tuning
Hardware-Supported Profiling and Tracing
Building Performance Monitoring into Software
Path Length, Efficiency, and Calling Frequency
Fundamental Optimizations
Summary
Exercises
Chapter References
11

High Availability and Reliability Design

Introduction
Reliability and Availability: Simil aritie s and Differences
Reliability

Scanned by CamScanner

165
165

166

173

174

175

182
186

188

189

194

198

199
199

200

201

201

202

206

209

210

221

222

223

223

225

225

226

227

Contents

xii

229

Reliable Software

229

Available Software

230

Design Tradeoffs
e
ign
Hierarchical Approaches for Fail-Saf Des
Summary

233

Chapter References

Part Ill Putting It All Together

237
238

Lifecycle Overview

241

Requirements

242

Risk Analysis

242

High- Level Design


Component Detailed Design
Component Unit Testing
System Integration and Test
Configuration Management and Version Control

Continuous Media Applica


tions
Introduction

257

263

263

Video

Scanned by CamScanner

257

262

Exercises

Video Streaming

254

262

Summary

Video Codecs

245

262

Regression Testing

Unco mpressed
Vid

235
237

System Lifecycle
Introduction

13

232

233

Exercises

12

232

eo Frame Formats

264

268

269
271

Contents

Video Stream Analysis and

Debug

Audio Codecs and Streamin


g
Audio Stream Analysis and De
bug
Voice Over Internet Protocol
(VoIP)
Summary
Exercises
Chapter References
14

Robotic Applications
Introduction
Robotic Arm
Actuation
End Effector Path
Sensing
Tasking
Automation and Autonomy
Summary
Exercises
Chapter References
Chapter Web References

15

Computer Vision Applications


Introduction
Object Tracking
Image Processing for Object Recognition
Characterizing Cameras
Pixel and Servo Coordinates
Stereo-Vision
Summary
Exercises

Scanned by CamScanner

xiii
271

275

275

276

278

278

278
281

281

283
286

293

293

298

300
301

301
301

302

303

303
304

306
309

311

312

313

314

Introduction to Real-Time
Embedded Systems

la

This Chapter

Introduction

A Brief History of Real-Time Systems

A Brief History of Embedded Systems

INTRODUCTION
The concept of real-time digital computing systems is an emergent concept com
pared to most engineering theory and practice. When requested to complete a task
or provide a service in real-time, the common understanding is that this task must
be done upon request and completed while the requester waits for the completion
as an output response. If the response to the req uest is too slow, the requestor may
consider lack of response a failure. The concept of real-time computing is really no
different. Requests for real-time service on a digital computing platform are most
often indicated by asynchronous interrupts. More specifically, inputs that consti
tute a real-time service request indicate a real-world event sensed by the system
for example, a new video frame has been digitized and placed in memory for
processing. The computing platform mu t now process input related to the service
request and produce an output response prior to a deadline mca ured relative to
an event sensed earlier. The real-time digital computing system must produce a re-

Scanned by CamScanner

s
nents and System
Real-Time Embedded Compo

Afte the

deadline estab
wait.
user and/or systern
e
th
ile
wh
st
ue
req
on
.
sponse up
es up or the system
request tame, the user giv

ative to the
,
lished for the response rel
ed.
sponse has bee pro uc
re
.
no
if
ts
en
rem
. g which
ui
fails to meet req
a process
nn
e
du
tim
the
1s
time as a noun
l
rea
e
fin
de
to
y
A co mm on wa
im e relates to computer applica
as a n adjective, rea l-t
tak es plac e o r occ urs Used
ded latency to user requests.
res po nd wi th low boun
can
t
tha
ses
ces
pro
or
s
ion
t
e for computing systems
a te ways to define real tim
ur
cc
a
st
mo
nd
a
t
bes
the
of
On e
rect reaJ-time system
t rea l -t i m e behavior. A cor
rec
cor
by
ant
me
is
at
wh
rify
is to cla
tically) correct output
go r it h m ic ally and mathema
must produc e a fun ct ion a lly ( a l
t for a service.
deadl i ne rela t i ve to t h e reques
response pri or to a well-defined
has a si mil ar and related history to reaJ-time
The con cept of
scope to p r ecl ude generaJ-purpose deskto p
system s a n d is mos tly a narr owi n g of
uded i n a real-time system. For example,
com pute r platf orms that m ight be i n c l
ber of commercial
ission cont rol cen ter inclu des a l a r ge num
.

embedded systems

the NASA shut tle m


im e t el eme ry data. Often desk
deskt op work statio ns for proce sing o f n ea r real-t
o r near real-time services
top real-ti me systems provid e o n l y oft real -t i m e ervice
ally provide hard real
rather than hard real-tim e ervice . Embed ded y tern typic

time services or a mixture of hard and oft real- t ime erv i c e s .


A ain, a common -se e defi n i t i n f embe d d i n g i h el pful for understanding
tern. Embedding means to enclose or
1s meant by a real -time embedded
hat
. Fr m the iewpo i n t of computing systems
implant as essential or chara teri t i

embdded system i a pecial -purp e mputer c omp le t l contained withi


. controls and not d ire tly ob ervab le by the u e r of the system. An em
the device 1t
.
beddd system peforms spec 1 fi predefined er i e rather than user-specified
om puter does .
functions and erv1ces as a general -purpo
an

The real -time em bedded Y tern

i ndu try i

f u l l of specialized terminolo

that h s developed .as a subset . of general comp uting y t e ms terminology. To he


o u. with that termm ? logy, this book i n c l ud es a glossary of common industry def-

1. d

m1t1ons. Although this book attempts to defi me spe i. a ize terminology in context,
on occasion the gl ossa ry c n he I P I'f yo L want to read the text out of order or when

the contextual definition i no t immed tately clear.


.

A BR IEF HI ST OR Y O F RE AL -T I M E S Y ST
EMS
rh origin of real time comes fro
.
m th e recent history
.
of process control usmg
d1g1tal computing platforms 1 n "
iact an earl Y d efi m1t1ve

text on the concept was
pubhshe

d in 1965 [Martin6S] The ,


concept of reaI f1me is
also rooted in computer
1at1on, where
s1mu

.
a simulation that run
s
at least as c
.
1ast as the real-world physICal
.
process 1t models is said to run m
.
.
rea1 time Ma
ny simulat1ons must make a tradeoff b etween running at or faster th
.
an real-time w"th
I ess or more model fidelity.
1
.

Scanned by CamScanner

Introduction to Real-Time Embedded Systems

The same is true for real-time graphical user interfaces (GUI), such as those pro
vided by computer game engines. Not too much later than Martin's 1965 text on
real-time systems, a definitive paper was published that set forth the foundation
for a mathematical definition of hard real-time-"Schedu ling Algorithms for
Multiprogramming in a Hard-Real-Time Environment" [Liu73 J. Liu and Layland
also defined the concept of soft real-time in 1973, however there is still no univer
sally accepted formal definition of soft real-time.
The concept of hard real-time systems became better understood based upon
experience and problems noticed with fielded systems-one of the most famous

examples early on was the Apollo 11 lunar module descent guidance overload. The
Apollo 11 system suffered CPU resource overload that threatened to cause descent
guidance services to miss deadlines and almost resulted in aborting the first land

ing on the moon. During descent of the lunar module and use of the radar system,

astronaut Buzz Aldrin notes a computer guidance system alarm. As recounted in

the book Failure Is Not an Option [KranzOO], Buzz radios, "Program alarm. It's a

1202." Eugene Kranz, the Mission Operations Director for Apollo 11, goes on to
explain, "The alarm tells us that the computer is behind in its work. If the alarms
continue, the guidance, navigation, and crew display updates will become unreli

able. If the alarms are sustained, the computer could grind to a halt, possibly

aborting the mission." Ultimately, based upon experience with this overload con

dition gained in simulation, the decision was to press on and ignore the alarm-as

we all know, the Eagle did land and Neil Armstrong did later safely set foot on the

Moon. How, in general, do you know that a system is overloaded with respect to
CPU, memory, or IO resources? Clearly, it is beneficial to maintain some resource
margin when the cost of failure is too high to be acceptable (as was the case with

the lunar lander), but how much margin is enough? When is it safe to continue op
eration despite resource shortages? In some cases, the resource shortage might just
be a temporary overload from which the system can recover and continue to pro

vide service meeting design requirements.


Since Apollo 11, more interesting real-time problems observed in the field
have shown real-time systems design to be even more complicated than simply en
suring margins. For example, the Mars Pathfinder spacecraft was nearly lost due to
a real-time processing issue. The problem was not due to an overload, but
rather a
prior ty inversion causing dad ine t be missed despite having reasonable
CPU
marg . The. Pathfinder pnnty nvers10 scenario is escribed in
detail in Chap

.!
ter 6, Multi-Resource Services. As o 11 see, ensuring safe,
mutually exclusive
.
access to hared memory can cau pnonty mversion. Whi
le safe access is required
for funct1onal correctness, meetmg response deadline
s is also a requirement for
.
.
real-time systems. A real-time
system must produce fun
ct'1ona II y correct answers
.
.
on time, before deadlmes f0r 1:1ver dll svstem
correctness G.1ven some sys t,ems e'
.
velopment expenence, most engineers are
familiar with how to design and test a

Scanned by CamScanner

Systems
Com ponents and
edded
Emb
Real-Time

'

more, most hardwar e engin eers are fa


rn1iar
c t ion Furt her
un
f
t
.
ec
rr
co
r
fo
. .
system
venfy
correc
and
t1mmg
tnes
s.
logic
Whe n ha rd .
es 1 gn digi tal
.
wi"th m et hods to d
.
I
rea
a
-time
m
ed
embe
dde
combin
d systern ' r
d software are
e.
.
ware ' firmware, an
h
t
t
h
ensure
at
d
to
e
teste
integ
and
rate
. .
ned
d
sys tern
t be desig
.
.
sponse t1m m g m us
I
I
d
eve
es1gn
system
and
res
requi
.
te
st
t hat gf\A..
quirements. This
"...,
m ee ts deadime re
.
.
II
d
y
use
1Ca
typ
ds
.
tho
me
re
wa
soft
.
beyond h ardware or
.
t ha t wer e well teste d still faile d to rovide reems
syst
n,
show
has
As histo ry
dead lines . How do unex pect ed over loads. or mver io n(v
sponses by reqliire d
.
ques tions , some fundame ntal hard real-time th eo. ry must
h appen?. To a nswer these
,,
c
t h e "Syste R esou rces chapter . By
1or
tus
impe
the
is
this
drstoo
first be unde
shou ld be able to expl am wha t happened in sce
the end of this first section, you
the Mars Pat hfinder deadl ine over- ru n
narios such as the Apol lo I l desce nt and
and how to avoid such pitfa lls.

1/A BR IEF H I STORY OF EMB E D D E D S Y S T EM S


Embedding is a m uch older concept than real t ime . Embedded digital computing

systems are often an essential part of any real-t ime embedded system and process
sensed input to produce responses as output t o a c t uators. The sensors and actua -

tors are com ponents providing IO and define the interface between an em bedded

system and the rest of the system or appl icat ion. Left with t h i s a s t he definitio n of
an em bedded digital computer, you could argu e t ha t a general - purpose work
station is an e m bedded system; after all, a mouse, keyboard, and video display pro

vide sensor/actuator-driven IO between the digital c om p u ter and a user. Ho wever,


to satisfy the definition of an embedde d system better, we distingu ish the types of
services provided.
.

A eneral- purpos e workst ation provid es a platfor m for unspec ified, to-be
determmed sets o f service s, where as an embed
ded system provid es a well-definfd
.
servICe or set o services such as antilock braki ng cont rol. In general, providing gen
.
.
eral servICes is impra ctical for
appli catio ns such as com putat ion of 1t t o the rrth digit,
payroll , or office auto mati o
n on an embedded system. Fina lly, the point of an e rn
bedded system is to cost
-effective ly prov ide a mor e l im ited set of serv ices in a l arg e r
.

Y tern, such as an auto


m 0b'I
. t1ons sw1tchmg ct'n ter.
1 e, a1rcra ft , or telecom mun1Ca

le1I- Tie Services


The concept of a real
-.ti me ervic
. e 1. fu nd ament a l in real me embedded >'-ten.l
-ti
Co n c ep t ua II y, a re
.
al-time er
"'lJt - 1n
.
vice p ro v ide a transformation of inputs to out I
Jn e m bt'dd ed
e
Y tern to pro vi <l c a 1v1d
1ro
ht
1
unction For exam pl e, a serv1 r m g F
ther mal ontrol c
1o r a sub y te
.
.
.
t'r rn by sc n '> 111
. the rmLto
r t' te ni p .
g
tempera
ture
with

aturl' L.ri
... itivc n:s1 st ors )
.
to 0 0I t h c

'
I trh.
h et''s ubs y km with a fa or to heat 1t wit

"

Scanned by CamScanner

Introduction to Real-Time Em bedded Systems

coils. The service provi ded in this example is thermal managemen t such that the
subsystem tempera ture is mainta ined within a set range. Many rea l -time embed

ded services are digital control functions and are periodic in nature: An example of

a rea l - time service that has a function other than digital control is digital media

processing. A well-known real-time digital media processing appl ication is the en


coding of audio for transport over a network with playback ( decoding) on a dis
tant network node. When the record and playback services are run on two nodes

and d uplex transport is provided, the system provides voice communicat ion ser
vices. In some sense, all computer appl icat ions are services, but real- time services
must provide the function within time constraints that are defined by the appl ica
tion. In the case of digital control, the services must provide the function within
time constraints required to maintain stability. In the case of voice services, the

function must O(cur within time constaints so that the human ea r is reasonably
pleased by clear audio . The service itsel f is not a hardware, firmware, or software
entity, but rather a conceptual state machine that transforms an input stream into
an output stream with regular and reliable timing.
Listing I. I is a pseudocode outline of a basic serv ice that poils an input inter
face for a specific i np u t vecto r.

./LISTING 1 .1

Pseudocode for Basic Real-Time Service

void provide_service(void)

if( initialize_service()

==

ERROR)

exit(FAILURE_TO_INITIALIZE);
else
in_service

TRUE;

while( in_service)

if(checkForEvent(EVENT_MASK) -- TRUE)

read_input(input_buffer);
output_buffer=do_service(input_buffer);

write_output(output_buffer);

shutdown_service();

Thi

implementation i

imple and lends it elf well to a hard


ware state m<l
ch ne hown in Figure I I ). I m p le men ting a se rv i c e a
a ingle looping tatc ma
:
chm 1s often. not a pr a ctic a l oftware implementatio
n 011 3 micropro r sor when
mult iple erv1Ce must hare a i ng le PU.
As more er i es arc .idJc:d. the loop

Scanned by CamScanner

Kea1-11me tmoeooeo \.ompum:ml>

"'' J"'I;"'

must be maintained and all services are limited to a maximum rate established b
the main loop period. This architecture for real-time services has historically bee

called the Main+ISR or Executive design when applied to software services run
ning on a microcontroller or microprocessor. Chapter 2 will describe the Execu

tive and Mairi+ISR system architectures in more detail. As the Executive approach
is scaled, multiple services must check for events in a round-robin fashion. While
one service is in the Execute state shown in Figure 1.1, the other(s) are not able to
continue polling, which causes significant latency to be added to the sensing of the
real-world events. So, Main+ISR works okay for systems with a few simple services
that all operate at some basic period or multiple thereof in the main loop. For sys-

((
SRVC-MOOE INIT IALIZE

'
SRVINITIAllZi

lniliahHllOll I
ST 4TUS STAAT

SRVC-MOO

READY

SRVC-MOOEERAC>ft

I
'

SRVC-MOqEREAOY

Eenl S.nalrog I
S UTUSPOLL

SRVC-MOOEENCOOE

SRVC-MOOflENCOOE.
SRVC-MOOE REAOY

Re..i I

ST4 TUSSAMPLE

SRVC-MOOETRANSFORM

'
SRVCfRANSfORM

&ecuie I

ST4TUS8USY

SRVC-MOOEaOECOOE
;

WYile I
ST4TUSCOMMAHO SRVc-MOOEOECOOE

FIGUIE 1.1

Scanned by CamScanner

A simple polling state machine for

Introduction to Real-Time Em bedded Systems

terns concerned only with throughput. , polling services are acceptable and perhaps
even preferable, but for real-time services where event detection latency matters,
multiple polling services do not scale well. For large numbers of Services, this ap

proach is not a good option without significant use of concurrent hardware state
machines to assist main loop functions.
When a software impleme ntation is used for multiple services on a single
CPU, software polling is often replaced with hardware offload of the event detec

tion and input encoding. The offload is most often done with an ADC (Analog to

Digital Converter) and DMA (Direct Memory Access) engine that implements the
Event Sensing state in Figure 1 . 1 . This hardware state machine then asserts an in
terrupt input into the CPU, which in turn sets a flag used by a scheduling state ma
chine to indicate that a software data processing service should be dispatched for

execution. The following is a pseudocode outline of a basic event-driven software


service:

void provide_service(void)

if( initialize_service() ==ERROR)


exit(FAILURE_TO_INITIALIZE);
else
in_service

TRUE;

while(in_service)

if(waitFor(service_request_event,

timeout)

! = TIMEOUT)

read_input(input_buffer);
output_buffer=do_service(input_buffer);

}
}

write_output(output_buffer);

else post_timeout_error();

post_service_aliveness(serviceIDSelf());

shutdown_service();

The preceding

waitFor

function is a state that the service sits in until a real

world event is sensed. When the event the service is ti ed to occurs, an interrupt ser
vice routine releases the software state machine so that it reads input, processes that
input to transform it, and then writes the output to an actuation interface. Before
entering the

waitFor

state, the software state machine first initializes itself by creat

ing resources that it need s while in service. For example, the service may need to

Scanned by CamScanner

inpu t buffer trans forma tion. Assumi ng that inireserve worki. ng memory for the
. '

ag m d1cat1 g t h at 1t 1 s ent erm g t h c serv ce


a
sets
ice
serv
the
,
well
.
tiali zatio n goes
1d agen t terminate the semce
outs
ome
l
unti
tely
fini
inde

loop , whic h is executed


the service loop to be ex1ed. In the
smg
ca
,
:
FALS

by setti ng the service flag to


until activated by a sens ed event or by a tim eout.
wa1 tFor state , the serv ice is idle
.
.
l
rs t h,at aI so assert inva
time
d
mter
ware
har
g
usin
ented
Timeouts are implem
)f a n interval time expir es, timeout cond it ions
terru pts to the mult iservic e CPU.
.
ee 1s normaJly e pect ed to be re
servt
the
use
beca
tion
func
error
are hand led by an
ts tha a l1 1k the stat e mac hm are expected
leased prior to the timeout. The even
.
some maximum peno d. If the ser
w1thm
least
at
or
al
interv
ar
regul
a
on
to occur
, then the service reads input
vice is activated from the wa it For state by an event
and produces output
from the sensor interface (if necessary), processes that i n put,
t
to an actuator interface. Wheth er the service is activat ed by a timeou or a real
world event, the service always checks in with the system health a n d status moni

toring service by posting an aliveness indicat ion. A separate service designed to


watch all other services to ensure all services cont i nue to operate on a maximum
period can then handle cases where any of the services fail to continue operation.
The wait For state is typically implemented by associating hardware interface

interrupt assertion with software interrupt handling. If the service is implemented


as a digital hardware Mealy/Moore SM (State Mach i ne ) rather tha n hardware

SM + ISR + service loop, the SM would tran it ion from the wai t For state to an

input state based upon clocked comb in ational logic and the presen t i nput vector

driven from a sensor ADC interface. The hardware state machine would then exe

cute combinational logic and clock and latch flip-flop register i nputs a n d outputs
until an output vector is produced and ultimately causes actuation with a DAC.
Ultimately, the state of the real world is polled by a hardware state machine, by a

software state machine, or by a combin ation of both. Some comb in ation of both is
often the best trade-off and is one of the most popular approaches . Implement ing

service data in software provides flexibility so that modificat ions and upgrades can
be made much more easily as compared to hardwar e modification.

Because a real-time service is triggere d by a real-wo rld event and produces a


corre pondin g system response, how long this tran sform
ation of i n put to outpu t
takes is a key design issue. Given the broad defi
nition of service a real -time service
may be implemented with hardw are,
firmware, a nd/or softw re compo nents. I n

general ' mo st serv1'ces reqmre

an m
t egrat1on
o f compone nts, mcludmg at least
hardware and software. The
real -wo rld even ts are dete cted usin g sens ors, often
tra nsducers and analog -to - d '
.
.
ig1 t a l converters, a n d are tied
t o a m1croprocessor
.
.
int errupt with dat ' con trol, an
.
.
.
d status b u f-c
1ers. This ensor interface provides the
.
inp ut needed for service pro
.
cess mg. T h e processi. n g transforms the input
data
.
.
associated with the even t int o a
respo n se out pu t . The respon se out put 1s most

Scanned by CamScanner

Introduction to Real-Time Embedded Systems

11

often implemented as a digital -to-analog converter interface to an electromechan


ical actuator. The compu ted response is then output to the digital-to-analog inter
face to control some device situated in the real world. As noted, much of the
real-time embedded theory today comes from digital control and process control

where computing systems were embedded early on into vehicles and factories to
provide automated control. Furthermore, digital control has a distinct advanta ge
over analog because it can be programmed without hardware modification. An

analog control system, or analog computer, requires rewiring and modification of


inductors, capacitors, and resistors used in control circuitry.
Real-time digital control and process control services are periodic by nature.
The system either polls sensors on a periodic basis, or the sensor components pro
vide digitized data on a known sampling interval with an interrupt generated to
the controller. The periodic services in digital control systems implement the con
trol law of a digital control system. When a microprocessor is dedicated to only
one service, the design and implementation of services is fairly simple. In this
book, we will deal with systems that include many services, many sensor interfaces,
many actuator interfaces, and one or more processors. Before delving into this

complexity, let's review the elements of a service as shown in Figure 1.2.

TimeActualion - Ti me sensed
Response Time
(From Event to Response)
=

Event
Sensed

Interrupt

FIGURE 1.2

Dispatch

Preemption

Dispatch

Completion
(10 Queued)

Actuation

(10 Completion)

Real-time service timeline.

Figure 1.2 shows a typical service implemented with hardware IO compo


nents, including analog-to-digital converter interfaces to sensors (transducers)
and digital-to-analog converter interfaces to actuators. The service proceing is
often implemented with a software component running as a thread of execution
on

microprocessor. The service thread of execution may be preempted while ex

ecuting by the arrival of interrupts from events and other services. You can also

Scanned by CamScanner

12

Real-Time Embedded Components and Systems

without software. The service may be imple


implement the service processing
mented as a hardware state machine with dedicated hardware processing operat
ing in parallel with other service processing. Implementing service processing in a
software component has the advantage that the service may be updated and modi
fied more easily. Often, after the processing or protocol related to a service is well
known and stable, the processing can be accelerated with hardware state machines
that either replace the software component completely in the extreme case, or
most often, accelerate specific portions of the processing.

For example, a computer vision system that tracks an object in real-time may
filter an image, segment it, find the centroid of a target image, and command an
actuator to tilt and pan the camera to keep the target object in its field of view. The
entire image processing may be completed 30 times per second from an input
camera. The ltering step of processing can be as simple as applying a threshold to
every pixel in a 640 x 480 image. However, applying the threshold operation with

fj

software can be time consuming, so it can be accelerated by offloading this step of


the processing to a state machine that applies the threshold before the input inter

rupt is asserted. The segmentation and centroiding may be harder to offload be

cause these algorithms are still being refined for the system . The service timeline
for the hardware accelerated service processing is shown in Figure 1.3. In Figure

1.3, interference from other services is not shown, but would still be possible.
Response Time

TimeActuation - Ti mesensed
to Response)

(From Even t
=

(---

Event
Sensed

Input

Complete

A.ciun 1 .J

lntrrupt

Completi on

Dispatch

(10

Queued)

Actuation
(10 Completion)

Real-time service timel ine with hardware acceleratio


n.
.

Ultimately all real-t'ime servic


es may be hardware only or a mLX of hardware
and software processing in ord

er to 1m
k' events to acruat1ons to momtor and con.
trol some aspect of an overaU
system. Finally, in both Figure 1.2 and Figure 1.3,
.

Scanned by CamScanner

Introduction to Real-Time Embedded Systems

1J

response time is shon as being limited by the sum of the IO latency, context
swich latency, e xecution time, and potential interference time. Input latency
coes from the time it takes sensor inputs to be converted into digital form and
transferred over an interface into working memory. Context switch latency comes
from the time it takes code to acknowledge an interrupt indicating data is avail
able, to save register vafues and stack for whatever program may already be execut

ing (preemption), and to restore state if needed for the service that will process the
newly available data. Execution ideally proceeds without interrupti<Yn, but if the
system provides ultiple services, then the CPU resources may, be shared and in

terference from other services will increase the response time. Finally, after a ser
vice produces digital output, there will be some latency in the transfer from

working memory to device memory and potential DAC conversion for actuation
output. In some systems, it is possible to overlap IO latency with execution time,
especially execution of other services during IO latency for the current service that
would otherwise leave the CPU under-used. Initially we will assume no overlap,
but in Chapter 4,.we'll discuss design and tuning methods to exploit IO-execution
overlap.
In some cases, a real-time service might simply provide an IO transformation in
real-time, such as a video encoder display system for a multimedia application.
Nothing is being controlled per se as in a digital control application. However, such

systems, referred to as continuous media.real-time applications, definitely have all the

characteristics of a real-time service. Continuous media services, like digital control,


require periodic services-in the case of video, most often for frame rates of 30

c,:

60 frames per second. Similarly, digital audio continuous media systems require en

coding, processing, and decoding of audio sound at kilohertz frequencies. In gen


eral, a real-time service may be depicted as a processing pipeline between a periodic
source and sink as shown in Figure 1.4. Furthermore, the pipeline may involve sev
eral processors and more than one 10 interface. The Figure 1.4 application simply
provides video encoding, compression, transport of data over a network, decom
pression, and decoding for display. If the services on each node in the network do
not provide real-time services, the overall quality of the video display at the pipeline
sink may have undesirable qualities such as frame jitter and drop-outs.
Real-time continuous media services often include significant hardware accel
eration. For example, the pipeline depicted in Figure 1.4 might include a compres
sion and decompression state machine rather than performing compression and
decompression in the software service on each node. Also, most continuous media
processing systems include a data-plane and a control-plane for hardware and
software components. The data-plane includes all elements in the real-time service
pipeline, whereas the control-plane includes nonreal-time management of the
pipeline through an APl (Application Program Interface). A similar approach can
be taken for the architecture of a digital control system that requires occasional

Scanned by CamScanner

1C

Real-Time Embedded Components and Systems


.
line shown in Figure 1.4, the control API
management. In the case of the video pipe
the frame rate. The source m1g ht inhermight allow a user to increase or decrease
es per second), but the frames may be
ently be able to encode frames at 30 fps (fram
decimated and retimed to 24 fps.

Source Flow
& Control 4

SourcFlow
: Control if
.

Application

Appllcatlon

API

HW/SW

FIGURE 1.4

Distributed continuous media real-time services.

So far, all the applications and service types considered have been periodic.
One of the most common examples of a real-time service that is not periodic by
nature is error-handling service. Normally a system does not have periodic or reg
ular faults-if it does, the system most likely needs to be replaced! So, fault han
dling is often asynchronous. The timeline and characteristics of an asynchronous
real-time service are however no different from those already studied. In Chapter
2, "System Resources," the characteristics of services will be investigated further so
that different types of services can be better understood and designed for more
optimal performance.

lulTl Stara
The POSIX (Portable Operating Systems Interface) group has established a num
ber of standards related to real-time systems including the following:

l EE Std

2{)_03.1 b-2000:

t1me extensions
l EE Std

Testing specification for POSIX part l, including real

00.13-1998: Real-time profile standard to address embedded real


tlme applications and smaller footprint dev
ices
IEEE Std 1003.lb-1993: Real-time exten
sion; now integrated into POSIX l003. I
Scanned by CamScanner

Introduction to Real-Time Embedded Systems

15

IEEE Std 1003.lc-1995: Threads; now integrated into POSIX 1003.1


IEEE Std 1003.ld-1999: Additional real-time extensions; now integrated into
POSIX 1003.1-2001, which was later replaced by POSIX 1003.1-2003

IEEE Std 1003.lj-2000: Advanced real-time extensions; now integrated into

POSIX 1003.1-2001, which was later replaced by POSIX 1003.1-2003


IEEE Std 1003.lq-2000: Tracing

The most significant standard for real-time systems from POSIX is I 003. lb,
which specifies the API that most RTOS (Real-Time Operating Systems) and real
time Linux operating systems implement. The POSIX 1003.1 b extensions include
definitions of the following real-time operating system mechanisms:

Priority Scheduling
Real-Time Signals
Clocks and Timers
Semaphores
Message Passing
Shared Memory
Asynchronous and Synchronous I/O
Memory Locking
Some additional interesting real-time standards include the following:

00-1788, Software Considerations in Airborne Systems and Equipment


Certification
JSR-1-The Real-Time Specification for Java
The Object Management Group Real-Time CORBA 1.0 (for example, the
TAO Common Object Request Broker Architecture)
The IETF (Internet Engineering Task Force) RTP (Real-Time Transport Pro
tocol) and RTCP (Real-Time Control Protocol) RFC 3550
The IETF RTSP (Real-Time Streaming Protocol) RFC 2326
ARINC-653, Aeronautical Radio, Incorporated Standards for partitioning
computer resources in time and space
TM

SUMMARY

Now that you are armed with a basic definition of the concept of a real-time em
bedded system, we can proceed to delve into real-time theory and embedded re
source management theory and practice. (A large number of specific terms and

Scanned by CamScanner

16

Real-Time Embedded Components and Systems

terminology are used in real-time embedded systems, so be sure to consult the


complete glossary of commonly used terms at the end of this book.) This theory is

the best place to start because it is fundamental. A good understanding of the the
ory is required before you can proceed with the more practical aspects of engineer

ing components for design and implementation of a real-time embedded system.


The Exercises, Labs, and the Example projects included with this text on the

CD-ROM are intended to be entertaining as well as an informative and valuable


experience-the best way to develop expertise with real-time embedded systems is
to build and experiment with real systems. Although the Example systems can be

""''"rco

built for a low cost, they provide a meaningful experience that can be transferred
to more elaborate projects typical in industry. All the Example projects presented
here include a list of components, list of services, and a basic outline for design and

implementation. They have all been implemented successfully numerous times by

of Colorado. For more challenge, you may want to comat the University
students
.
.

' brne eieents and mix up services from one example with another; for example, it

...

'is posjhle to place an NTSC camera on the grappler of the robotic arm to recog,11ize. tctrgets for pick-up with computer vision. A modification such as this to in
clu a Video stream is a nice fusion of continuous media and robotic applications
that reqt!ires the application of digital control theory as well.

l . Provide examples of real-time embedded systems you are familiar with


and describe how these systems meet the common definition of real time

2.

and embedded.
Find the Liu and Layland paper and read through section

3. Why do they

assume that all requests for services are periodic? Why might this be a
problem with a real application?

3. Define hard real-time services and soft real-time services and describe why
and how they are different.

CHAPTER REFERE N C E S

f KranzOO l Kranz, Gene, Failure Is Not an Option.


.
. C. ., and
[Lm7
3] Lm,

Berkley Books, 2000.


Layland, James W., ((Scheduling Algorithms for Multipro
.
gramming m a Hard-Real-Time
Environment." journal of the Association for

omputing fi!achinery, Vol. 20, No. I , (January 1 973 ): pp. 46-61 .


f Mart i6 5 ] Martm, James, Programming Real-Time Computer Systems. Prentice

Hall,

1965.

Scanned by CamScanner

System Resources
'

In This Chapter

Introduction

Resource Analysis
Real-Time Service Utility
Scheduling Classes
The Cyclic Executive

Scheduler Concepts
Real-Time Operating Systems

Thread Safe Functions

INTRODUCTION
Real-time embedded systems must provide deterministic behavior and often have
more rigorous time- and safety-critical system requirements compared to general

purpose desktop computing systems. For example, a satellite real-time embedded

system must survive launch and the space environment, must be very efficient in
terms of power and mass, and must meet high reliability standards. Applications

that provide a real-time service could in some cases be much simpler if they were

not resource constrained by system requirements typical of an embedded environ

ment. For example, a desktop multimedia application can provide MPEG play

back services on.a high-end PC with a high degree of quality without ignificant
specialized hardware or software de ign; this type of scenario is "killing the prob
lem with resources." The re ources of

desktop sy tern often in ludt> a high

17

Scanned by CamScanner

11

Real-Time Embedded Components and Systems

throug hp ut CPU (GHz clock rate and billion s of i nstruc tions per seco nd) , a high
bandw idth, low-lat ency bus ( gigabit ) , a large-c apacity memo ry system ( gigabytes)
and virtual ly unlim ited public ut l i 1,Y power. By compa rison, a n anti.-lock braki n
system m ust run off the automobile s 1 2 -volt DC pow r system , s u 1ve the u nder
hood therma l and vibrati on envi rnme nt, and provid e the brakm g control at a
reason able consum er cost. So for most real-ti me embedded system s, simply kill ing
the prob lem with resources is not a valid opti n .
.
.
The enginee r m ust instead carefull y co s1der resourc e h m tat1ons, incl udi ng
power, mass, size, memory capacity , processmg, and 1/0 bandwi dth. Furthermore,
compl ications of reliable operatio n in hazardo us environ ments may require special
ized resources such as error detectin g and correctin g memory systems. To success
fully implemen t real-time services i n a system providing embedde d functi ons,
resource analysis must be completed to ensure that these services are not only func
tionaJly correct, but that they produce output on time and with h igh reliabili ty and
availab ility. Part I of this book, "Real-Time Embedded Theory," provides a resou rce
view of real-time embedded systems and methods to make opti mal use of these re
sources. I n Chapter 3, " Processing," we will focus on how to analyze CPU resources.
I n Chapter 4, " 1/0 Resources," resources will be characterized and methods of analy

sis presented. Chapter 5, "Memory," provides memory resource analysis methods.


Chapter 6, "Multiresource Services," provides an overview of how these three basic
resources related to workload throughput relate to more fu nda mental resources
such _as power, mass, and size. Chapters 2 through 6 provide the classic hard real
time view of resource sizi ng, margins, and determi n istic system behavior. Chapter 7,
"Soft Real-Time Services," completes the resource view by presenting the latest con
cepts for soft real-time resource management, where occasional service deadline
misses and failure to maintain resource margins are allowed.
The three fundamental resources, CPU, memory, and 1/0, a re excellen t places to
start understanding the architect u re of rea l - t i me e mbedded systems and how to
meet design requirements and objectives. Furthermore, resource analysis is critical

to the hardware, firmware, and software design in a real-time embedded system.


Upon completion of the entire book, you will also u nderstand all system resource is
s s, including cost, performance, power usage, thermal operating ranges, and relia
bility. Part I I , "Designing Real-Tim e Embedded Co mponents," provides a deta iled
look t the design of components, and Part I I I , " Putting I t All Togethe r," provides an
eri l
?verv 1ew of ow to integrate these components into a system. However, the mat a
CPU core.
n Part ? ne is most critical, because a system designed around the wrong
insufficient memory, or 1/0 bandwidth, is sure to fail. From a real-ti me serv ices per
s etive, in su ffi cient memory, C P U , or 1/ 0 can make the entire project infeasibl e .
fa iling to met req uirem ents. It is i m portant to not only u nderstand how to look at
u s
the three mam resources i ndivid u ally, b ut also to c o nsider m ultiresou rce i ss e .
w r.
trade-offs, and how the main three interplay with other resou rces suc h as po e

Scanned by CamScanner

11

System Resources

M ultiresource constrai nts u ch a power usage ma y res u l t i n less memo ry or a lower

CPU clock rate, for exam ple . I n t h i s se n se, po we r const raints could in the end ca u se
a p robl e m with a design's ca pability to meet real - t i m e deadline .

RESOU RCE ANALY S I S


Look i n g m o r e c l osel y at some of t h e real - t i m e

e rv ice exa m p l e

Chapter l , t here are c o m mon r e ou rces that m u t be


real - t i m e e mbedded ystem includ ing the followi ng:

i n t ro d u ed i n

i z ed and manJgcd i n any

Processing: A n y numb er of microproce or or m ic ro - o n t ro l l e r networked


together

Memory: All sto rage element

in the

y tem i n c l u d i n g vola t i l e and non

volatile to rage

1/0: I n put and output that encodes en ed data and i

tt

ed for decod i n g tor

actuatio n
I n the up o m i ng t h ree chapter , w e w i l l characterize a n d der ive for mal model
for each

f the ke

of h o w ea h ke

re ou rce : pro es i ng , 1 / 0, and memory. Here a re b r ief o u t l i n e

re o u rce w i l l be exa m i ned.


the main f cus of real - t i m e reso u rce analy is and t heory ha

Tra d i t ionall

en tered a round proce sing and how to chedule mult iplexed exec u tion of

been

mult iple e r i e
tern

n a i n gle processor. Sched uling resource usage req u i res t he y -

ftware to make a de i ion to allocate

re o u rce uch as the

P U to a pec i fic

th read of exe ution. The mechan ics of m u l t i plex..ing the CPU by preempting a r u n
ning t h read, savi n g it

witch.

state, a n d di patch ing a new t h read is called a th read cot1 text

h e d u l i n g i n vol es i m pleme n t i n g a pol icy, whereas preemption a n d di -

pat h a re

RT

n t e t - s w i t c h i n g me h a n i m . When

hedu\er a n d ontext-

C P U i m u l t i plexed w i t h a n

itch i n g mech ani m , t he sy tern archite t m u t de

ter m i n e \ hether the CPU re o u rce are sufficient given the et of e rv i e th read t o
be e e uted a n d whether the
to

ervi e

will be able to rel iably complete excc u t i n

t e rn requ i red deadlines. The global demand u pon t he PU m u t be


det e r m i n e d . F u rthermo re, th rel iabilit of t he o erall y tern h i n ge u pon the

pri

repea tJbilit'.'
wilJ

erv i c e

>

eha e

hJt meet ing de dline

a n tc d , t hen hl' .,
t

rm
n

o .ih1ltty

1<le 1 e
t

t >

t e rn

pr

)VI

fo. B-

e -,er

sy11 tc m ,

m rcp

Scanned by CamScanner

11'>

deadl ine ; idea lly,

nb
u e it

e ry e r i e reque

uar nteed. I f deadl ine an bt- guar

be h

i rd

not

han e

vrr

t i me i n

by wel l - de m t:: d d ctd l i n c: , the tem is J}


I
i n m o r e lo cl a t how to i m p lc: m n t

bt t e r under\t n d t n l

v d ry ove r

1 e

J rcr m t n t '> t l l . He fo r

deter m i n 1 t i

make y

exe ution pri r t

t i me

re u1red.

'

t e m re

ur e

and what

a
n

JO

Real-Time Embedded Components ana ystems

or i nstructio n executio n ( cloc k rate },


The main con i de rat i o n s include speed
ions ( averag e Clocks Per I n st ructio n [CP I J ),
the efficie ncy o f execut ing i nstruct
y of service request s. Chapte r 3 prov ides a
a lgori t h m c o m p lexit y , a n d fre u e n c
deta i led exa m i n a t i o n of pro cess ing:

Speed:

Clock Rate for I nstruction Execution.

Efficiency:

C P I o r I P C ( I nstruction s Per Cloc k ) ; processi n g stalls due to


hazards; for example, read data dependen cy, cache m i sses, a nd write buffer
over flow stall s.

Algorithm complexity: C;

i nstructio n cou n t on servic e longest path for ser


vice ; and, ideally, is determ i n i s t ic ; if C; is not known, the worst case shoul d be
used-WCET ( Worst-Ca se E xecu t i o n T i m e ) i s t h e longest, most i nefficie ntly
=

executed p a t h for service; W C ET is o n e compon ent of response time ( as


shown i n F i gu re s 1 .2 and 1 . 3 i n Chapter 1 ) ; other contribut ions to response
t i me come from i n p u t latency; d i spatch latency; exec ution; i n terference by
higher priority services and i n terrupts; and o u t p u t l a tency.

Service Frequency: T,

ervice Relea e Period.

Cha pter 1 0, " Performance T u n i n g," provide t i p for resol v i n g execution effi
ciency issues. Execut ion efficiency i
code can lead to large WCET , mi

not

a rea l - t i me req u i re m e n t , but i nefficient

cd dead l i n e , a n d d i ffi c u l t sched uling. So, un

der tanding met hods for t u n i n g oft wa re perfor m a nce can become i m portant.
I nRut and o u t p u t chan nel between pr

or co res and devices are one of the

most i m porta n t resources i n rea l - t i m e embedded ystems a n d pe r h a ps o n e of the


most often overl ooked as fa r a theo ry a n d analy is. I n a re a l - t i m e embedded sys

tem, low latency for 1 1 0 is fu n d a m e n t a l . T h e re p o n s e t i m e of a se rv i c e can be

h ighly i n fl u enced by 1/0 latency as is evident i n F i g u res l .2 a n d

1 .3

of Ch ap t e r l .

Many processor cores have t h e capa b i l i t y t o con t i n ue p roce ssi n g i nstructions while
110 reads are pending or w h i l e 1/0 wri tes a re d ra i n i n g out of buffers to devices.

This decou pling helps effi cien c y t r e m e n d o u sly, b u t when the service processing re

qu i res re ad data t o con t i n ue or when w ri t e b u ffe rs bec o m e ful l , processi ng can

s all; furthermore, no response is c o m p l e t e u n t i l writes actually drain to output de


.
V1Ce interfaces. So, key 1 / 0 parameters a re l a t e n c y, bandwidth, rea d / wr i t e queue
depths, and co upl i ng bet we e n 110 c h a n n e l s and the CPU. The coverage of 1/0 re

source management in Chapter 4 i n c l u des the followi ng:

Latency

A rbi tra tio n lat ency fo r sha


red 1/0 i n ter faces
R ea d la te n cy

Ti m e for da ta t ran sit from


dev i c e t o CP U c o re

Scanned by CamScanner

System Resources

Reg ist rs , Tigh tly Cou pled


Mem ory ( TCM ) , and L 1 cach e for zero wait
sta te sm gle cy cle access

Bu s i n ter fac e rea d requ


.

es t s a n d com p 1et1ons:
. transac t ions and delay
sp1 1t
.
Wn te late ncy

Tim e for data tran sit from


CPU core to dev ice
Pos ted wri tes pre ven t CP U stal
ls

21

Pos ted wri tes req u i re bus i nter face que


ue

Bandwidth ( BW)

Aver age bytes or word s trans ferred per u n it t i me


BW says nothing about latency, so i t is not a panacea for real-time system

Queue depth

Write b uffer stalls will decrease efficienc y when queues fil l up


Read b u ffers-mos t often stalled by need for data to process

CPU

coupling

OMA chan nels help decouple the CPU from 1/0

Progra m m ed l/O strongly couples the CPU to 1/0


Cycle stealing requires occasional interaction between the CPU and

OMA

engi nes

Memory resou rces are designed based upon cost, capacity, and access latency.
I deally a l l memory would be zero wait-state so that the processing elements in the
system could access data in a s ingle processi ng cycle. Due to cost, the memory is
most often designed as a h ierarchy with the fastest memory being the smallest due
to h igh cost, and l a rge capacity memory the largest and lowest cost per unit stor
age. N o nvolatile memory is most often the slowest access. The management, siz

i ng, and a llocation of memory for real -time embeded systems will be covered i n
detail i n Chap ter 5 and incl udes the follow i ng:

Memory h ierarchy from least to most latency

Level - I cache

Locked for use as fast memory, unlocked for set -associative or d irect

map ped cach es


Lev e l - 2 cac he or TC M

Sing le cycl e acce ss


Typically Harvard arch i tectu re-separate data and i n st ruction caches

Few or no wait-state ( e.g. , 2 cycle acces . )


Typica l l y u n i fied ( contain both data and code )

Locke d r use as TC M , un locked to back L l cache


M M R ( M emory Ma pped Regist er )

M a i n memory- R A M , SDRA M , D D R

Proce or b u interfa

Scanned by CamScanner

ee A p pcnc1 i x A , l o sa r

and ont roller

22

Real-Time Embedded Components and System s

M u lticy cle acce ss late ncy on- chip


Man y-cy cle acce ss late ncy off- c h i p

M M IO ( Mem ory M app ed 1/0) Dev ices

Nonvol atile memory l i ke flash, EEPRO M, a n d battery -backed SRAM


Slowe st read/ write acces s, most often off-c hip

Req u i res algorithm for block e rase: i n terrupt upon c o m pletion and poll

for completion for flash a n d E E PROM


Total capa c i ty for code, data, stack, and heap requ i res careful pla n n i ng

A llocat ion o f data, code, stack, heap to physical h i e ra rchy w i l l sign i fica ntly
affect perfo rman ce
Traditionally, real -time theory a n d systems des i gn have foc used al most e n

t i rely on sharing C P U resources a n d , to a lesser e x t e n t , issues rel ated to shared


memory, 1/0 latency, 1 /0 sched u l i n g, a nd sync h ro n ization o f services. To really
understand the performa nce of a real - t i me em bedded system a n d to properly size
resources for the services to be supported, a l l th ree resou rces m ust be considered
as well as i n teractions between them. A given system m ay experience problems
meeting service dead l i nes because it is:

CPU bound: I nsufficient execution cycles d u ri n g release period and due to


inefficiency in execution

1/0 bound: Too much total 1/0 latency d uring the release period a nd/or poor

scheduling of 1/0 d u ri ng execution

Memory bound: I n suffi cient memory capacity o r too m uc h m e mory access

latency d u r i n g the release period

In fact, most modern m icroproc essors have M M IO ( me mory m a pped 1 / 0 )


arch itect ures so t h a t memory access l a t e n cy a n d device 1 /0 latency con t ri b u te
toget her to respons e latency . M ost often , many exec u t io n cycles c a n be overlap pe d

with 1 10 and memory access t i m e fo r better e fficienc y, b u


t th is req u i res ca reful
ch dul i n g of 1 10 during a servic e releas e. The conce
pt of overla p a n d 1/0 sched u l
.
ing is disc
ussed i n deta il i n Cha pter 4 , " 11 0 Reso u rces ."
T is book provides a more balan ced
chara cterization of aU t h ree major resou rces.
At a high level , a real - t i me emb
edde d syste m can be c h a ract erize d i n term s of CPU ,
1 10, and e ory reso urce
marg i n m a i n ta i n ed as dep icted in Figu re 2 . 1 . The box
.
at the orig
i n m t h e figu re dep icts the
regi on whe re a syst em would hav e high CPU,
l/O , a n d me mo ry ma rgi n
- th i s is 1 d ea 1 , b u t perhaps n o t real .1 s t 1C d ue to c o s t ,
.
as ' power, and ize con tra i n t
. The box in the top- righ t cor ner dep ict s t he re
gio n wh ere a sy tern ha s very
lit tle re ou r e m a rgi n .

n:1

Scanned by CamScanner

System Resources

2J

_ _ .., _ _ _ .. _ _ _ _ _ _ _ _ _ _ _ _ .. .. .. .. .. .. ... .. .. .. ,.
,

C P U -U ti lity

,'

'
'

,'
- - - - - - - - - - - - - - - - _

:
'

,'

'

,__

'

_._

_
_

'
'
'
'
'

I Q - U tility

, '

'

,'

'

- .. - - - - - - - - - - - - - - - - - - - - ... - - - - - - J'

'

FIGURE 2.1

Real-time embedded system resource


characterization.

Often t h e resource margin that a real - t i me embedded system is designed t o


maintain depends u pon a nu mber of h igher- level design factors, i n c l u d i ng:

System cost
Reliabil ity required ( how often is t he system allowed to fail if it is

soft-real - t i me

syst em ? )
Ava i l a b i l i t y req u i red ( how often the system is expected to be out of service or
in service? )
Risk o f oversubscribing resou rces ( how determ i n is t i c are resource demands?)
I m pact o f overs ubsc ription ( i f resource margin i s i ns u ffi c i e n t , what a re t h e
conseq uences ? )
U n fo r t u n a tely, p rescribing general margi ns for a n y system w i t h specific val ues

is d i ffic u l t . H owever, here are some basic guideli nes for resource s i z i n g a n d margin
main tenance:

CPU: The et o f propo ed ervices must be allocated to proces ors so t hat each
pro e sor in the y tern meet t h e Lehoczky, hah, D i ng t heorem for feasib i l i t y .

N o r m ally, t he C P U margi n req u i red i le s t ha n t h e R M L U B ( Rate M o notonic


Lea t U pper Bound ) f appro x i ma t e! 30%. You'll ee why thi is in hapter 3,

f m a rgin req ui red depend. u po n t he servi e parJ


meter -mo t l y t h e i r rel a t i ve relea c p e r i od s a n d how harmo n ic t he plrio<l
a re. F u rt herm re, for a ym m c t r i M P ( M u l t i - P ro c or ) system . scal i ng m,1y
" Proce i ng." The , m u nt

Scanned by CamScanner

24

Real-Time Embedded Components and Systems

each p rocess o r are, in fact, independent. If ser


be fairly l inear if the loads on
have depen de cie and share reso urces such as
vices on di fferent processors
ased synchronizat ion, however, the scal ability is
mory or require message-b
hl's law provi des esti mation for the s peed-u p
ject to Amda hl 's law. Amda
when depe de cies exist betwen the pro
provided by additio nal proces sor
tial speed -up ts discus sed further m the " Processin g on each processor. Poten
r as well.
ces sin g" sec tio n of thi s cha pte

1/0: Total 1/0 latency for a given service should never exceed the respon se
often the deadlin e and period are the
deadline o r the service releae period (
same). You'll see that execution and 1 /0 can be overlappe d so t hat the response

time is not the simple sum of 1/0 latency and execution t i me. I n the worst case,
the response time can be as is suggested by Figures 1 . 2 a n d 1 .3 in Chap ter l .
Overlapping 1/0 time with execution time is therefore a key concept for bette r
performan ce. Overlap theory is presented i n Chapter 4 , " 1/0 Resources."

Scheduling 1/0 so that it is overlaps is often called 110

latency hiding.

Memory: The total_ memory capacity should be suffic ient for the worst-case sta

tic and dynamic memory requirements for all services. Furthermore, the mem

ory access latency sum med with the 1/0 latency should n ot exceed the service
release period. Memory latency can be hidden by overlapping memory latency
with carefuJ instruction scheduling and use of cache to improve performance.
The largest challenge in real -time e mbedded systems is deali'n g with the trade
off between determ inism and efficiency gai ned fro m less deter m i nistic architec
tural features such as set-associative caches and overlapped 1/ 0 a n d execution. To
be completely safe, the system m ust be shown to have determ i n istic t i m i ng and use
of resources so that feasibility can be prove n . For hard real - t i m e systems where the
consequences of failure are too severe to ever allow, the worst case m u st always be
assumed. For soft real -time systems, a better t rade-off c a n b e made t o get higher
performance for lower cost, but with h igher probability o f occasional service fail
ures. In the worst case, the response t i m e equation i s
'IS; , Tresponse-i Deadline;

Tresponse-i

T10-La1ency-i +

W C ET;

TMemory- latency -i

+ L T;nreref rence-j

WCET = Worst - Case - Execution - Time


i-1

L Tinterference- ;
;=I

Scanned by CamScanner

= Total

Service;

i-1

Preemption - Time

J=I

(2. l )

Syste m Resources

JS

All

service.s S; in a hard real-time system must have response times less than
their required deadlin e, and the response time must be assumed to be the sum of
the total worst-case latency . Not all systems have true hard real-time requirements
so that the absolute worst-c ase response needs to be assumed, but man.r do. Com
mercial aircraft flight control systems do need to make worst-case assumptio ns to
be safe. However, worst-ca se assumptions need not be made for all services, j ust

for those required to maintain controlled fl ight.

REAL-T I M E S E RV I C E UTILITY
To more formally describe various types of real-time services, the real-time research
com m u nity devised the concept of a service util ity function. The service utility
function for a sim ple real-t ime service is depicted in Figure

2.2. The service is said to

be 'released when the service is rady to start execution fol lowing a service
request, most often i n i tiated by an interrupt.
Relea

Dead line

1 00%

Uti lity

0%

"------.---:,..;n me
(

'
FIGURE 2.2

Scanned by CamScanner

Hard real-time service utility.

After Dt'adli11e.
llfl'fir_1 is Negari1e

21

Real-Time Embedded Components and Systems


produc ing a respon se any time prior to th t
Notice that the utility of the servic e
and at the i nstant follow in g the dead lin e, t ht
deadlin e relativ e to the request is full,
y negativ e . The i m plicati on i s that con tin
utility not only becomes zero, but actuall
after t h deadl i e is n ot nly futile, but may
uing proces sing of this service reques t
.
than simply d1scon tm u m g the service pro
actually cause more harm to the system
wors e than no respo n se.
cessing. A late response might actua lly be
More specifically, if an early respon s e i s also undesi rable, a s i t would be in a n
isochronal service, then the util ity is negativ e up to the deadlin e, full at the dea d

line, and negative again after the deadlin e as depicte d i n Figure 2.3. For ar.
isochronal service, early completio n of response processin g req u i res the response
to be held or bu ffered up to the deadline if it i s com puted early. The servic es de

picted in Figures 2.2 and 2.3 are said to be hard real-time because the utility of a
late response (or early i n the case of isochronal) is not only zero, but also negat ive .
A hard real-time system suffers sign i ficant harm from i mproperly timed re
sponses. For example, an aircraft may lose control if the d igital autopilot produces
a l ate control surface actuation, or a satellite may be lost if a t h r ust is applied too
Jong. Hard real -time services, isochronal or sim ple, req u i re correct tim ing to avoid
Joss of life and/or assets. For digital control systems early responses can be j ust as

destab ilizing as late responses.


Release

Deadline

1 00%

Utility

0%
Before Deadline
Uri/1ry is Neguii e

AGUIE 2.J

Isochronal service u
ti lity

Scanned by CamScanner

Time

Aft er Deadline.
Uri/try is Ngative

Syst em Resources

J7

How does a hard real-ti me service compare to a typical nonreal -t ime applic a
tion ? F i g ure 2.4 shows a s rv ke that is co nsidered to prodce a respon se with best
effort. Basically, the nonreal- time-service h a s no real deadline bCcause full utility ts
realized whenever a best effort application finally produces

result. Most desktop

systems and even many embedded computing systems are designed to maxjmize
overall throughput for a workload with no guarantee on response time, but with
maxim u m effi ciency i n processing the workload. For example, on a desktop sys
tem, there is no limit on how m uc h the CPU can be oversubscribed and how m uc h

1/0 backlog may be generated. Memory is typically l i mited, but a s CPU and 1/0

backlog increases , response t i mes become longer. No guarantee on any particular


response time can be made for best effort systems with backlogs. Some embedded
systems, for example a storage system interface, also may have no real-time guar
antees. For embedded systems that need no deadline guarantees, it makes sense
that the design attempts to maximize throughput. A h igh throughput system may,
in fact, have low latency in processing requests, but this st ill does not imply any
sort of guarantee of response by a deadline relative to the request. As shown i n Fig
ure 2.4, full utility is assumed no matter when the res.ponse is generated.
Release

1 00%

Utility
Deadline Does Not Exist

0%
FIGURE JA

Best effort service utility.

The real-time research community agrees upon the defin itions of hard real
time, isochronal, and best effort services. The exact definition of soft real-time ser
vices is, by contrast, somewhat unclear. One idea for the concept of soft real -time

Scanned by CamScanner

11

Real-Time Embedded Components and Systems

is similar to the idea of receiving partial credit for late homework because a serv i ce
that produces a late response still provides some utility to t he system . An alte rna
tive idea for the concept of soft real-time is also similar to a well- known homework
policy i n which some service drop-outs are acceptable. I n this case, by ana logy, no
credit is given for late homework, but the student is allowed to drop their lowest
score or scores. Either definition of soft real-time clearly falls between the extremes
of the hard real-time and the best effort utility curves. Figu re 2 . 5 depicts the soft
real-time concept where some function greater than o r equaJ to zero exists for soft

real-time service responses after the response deadl ine-if the function is identi
cally zero, a well -designed system will simply term inate the service after the dead
line, and a service drop-out will occur. If some partial utility can be realized for a
late response, a well-designed system may want to allow for some fixed amount of
service overrun as shown in Figure 2.5.
Release

Deadline

1 00%

Utility

Ti me
After Deadline, Utility Diminishes

FIGURE 2.5

According to Some Function F(t)

Soft real- time u tility curv e.

.
For cont inu ous media a lica .
.
ti o n s (vid eo and au d io), most often t here is no
PP
reason for prod ucin g late res
po ses bec a u th
se ey cau se fram e jitte r. Ofte n. it is best
ou
s
th
r
lic
app
ese
at
fo
ion if th e prev1
fra m e o r so un\,...r b 1te
. output .1 prod uced agai. n
and the late outp ut dropped w h
.
e n t he next f;rame .1 d e 1 1vered
.
on
u n e Of course . 1 r
t
.
.
mult 1pJ servi ce drop -out s oc u r
i n a row t h ' I ea d s to una
eptable qual ity of ser-

Scanned by CamScanner

----------...---= ,._
....
_
_
,,_ ,......

OJ ._ ,...,..
_
.._ __
_

System Resources

21

vie . The adva tage of a drop- out polic


y is that processing can be term ina ted as soon
.
as 1t 1s deter mine d that the curre nt servic
e in progress will miss its dead line, thereby
.
freein g up CPU resou rces. By analo
gy, i f an i nstru ctor gives zero credit for late
home work (and no partia l credi t for
partia lly completed work ) , a wise stude nt will
ork on that homework as soon as they
cea
realize they can't finish and reallocate
the r time to ho ework for other classes. Other
classes may have a differe nt credit
pohcy for accept mg late work, but given that
work on the futile homework has been
dropped, the stude nt may have plenty of time
to fin ish other work on time.
A policy known as the anytime a lgorithm is analogo us to receivin partial
g
credit for part ially comple ted homew ork and partial utility for a partiall y comple te

i:n

service. The concept of an anytime algorithm can only be implemen ted for services
where i terative refinemen t is possible, that is, the algorithm produces an i n i t ial

solutio n long before the dead l i ne, but can produce a better sol ution ( response) i f
allowed to con t i n ue processing u p t o the deadline for response. I f the deadline is
reached before the algorithm finds the optimal solution, then it simply responds

with the best solution found so far. Anytime algorithms have been used most for
robotic and AI ( A rt i ficial I ntelligence) real-time applications where iterative re
finement can be beneficial. For example, a robotic navigation system m i ght i n

cl ude a path - pl a n n ing search algorithm for the map i t is building i n memory.
When the robot encounters an obstacle, it must decide whether to turn left or
right. When not m uch of the environment is mapped, a simple random selection
of left or right m ight be the best response possible. Later on, after more of the en
viro n m e n t is mapped, the algorithm might need to run longer to find a path to its
goal or at least one that gets the robot closer to its goal. Anytime algorithms are de
signed to be t e r m inated at their deadli nes and produce the best solution anyt i me,
so by defi n ition a nyt ime services do not overrun deadlines, but rather provide
some partial util i ty sol ut ion anytime following their release. The i nformation de

rived during p revious partial solutions may in fact be used in subsequent service
releases; for example, in the robot path-planning scenario, paths a n d partial paths

could be saved i n memory for reuse when the robot ret u rns to a location occupied
previously. This is the concept of iterative refinement. The storage of i n termediate

results for i terati ve refinement requires additional memory resources (a dyn a m ic .


program mi n g m et hod ), but can lead to opti mal sol utions prior to the dec ision

deadline, such as a robot that finds an optimal path more quickly a nd still avoids
collisions a n d/ or lengthy pauses to think! Why wouldn ' t the example robotic ap
plication simply halt when encoun teri ng an obstacle, run the search algorithm as
long as i t takes to find a solution, or determ ine that it has insufficient mapping or
no possible path? This is possible, but may be undesirable if it is better that the
robot not stand still too long because someti mes it may be better to do someth ing
rathe r than not h i ng.

Scanned by CamScanner

JO

Real-Time Embedded Components and Systems

map out more of


.
, the robot m ight get l ucky and
By mak ing a random c h 01ce
are not a1 ways t h e b est ap. t more
.
klY Anyti me algorith ms
the envuonme
q ic
are ano ther opti o n fo r less determin i stic serp oach for se 1es, but c earlY h ey

I
he robot simp ly ran until it deter m ined the
vices and av01 dmg o erru n . f t
this would constit ute a best effort service rat her
opti mal answer each time , t en
1
.
.
any tim e real -tim e ut1 ity cur ve.
than anyt ime. F1gure 2 6 depicts an

Release

Deadline

100%

t:tility
F( t )

0%

Before Deadline.

FIGURE

2.1

Utili1 is <

/000

Tim

After Deadline.
Ut1firy is .Vegatfre or Zero

Anytime service utility.

Finally, services can be some combination of the previously explained service


types: hard, isochronal, best-effort, soft, or anyt i me. For exam ple, what about a soft
isochronal service? An isochronal service can achieve partial util ity for responses
prior to the deadline, full utility at the deadline, and again partial utility after the
deadline. For example, in a conti nuous media system, it may not be possible to. hold
early responses until the deadli ne, and it may also be beneficial to produce a late re

sponse-allow an overrun. I n some sense, the idea of a h ard isochronal service is

hard to imagine, yet feedback digital control systems are a very i m portant example

of a hard isochronal application. That is, always producing the response exactly at

the desi red deadlihe relative to release. I soch ronal systems a re normally imple

mented w ith hold buffers and traditional h a rd real -time services, early service co
will
pletions must be buffered, and CPU scheduling m ust ensure that late respo nses
never happen. S o, a soft isochronal service wo uld be fa r easier t o i m plement be
ter cause there is no need for early completio n bufferi'ng and no need to detect a n d

Scanned by CamScanner

System Resour ces

J1

minate services that overrun deadlines. Allowing indefinite overrun of any soft real
time service can ultimately be a problem. If overruns continue to occur and service
releases start to overlap for the same service, the loading simply climbs higher and
higher. How can such a system ever recover? Allowing indefinite overruns would be
similar to the scenario where the overly conscientious student continues to work on
more and more late homewo rk, ulti mately lowering grades in all courses to the
point that total failure is the ultimate outcome. The example of a soft isochronal
service is depicted in Figure 2.7.
Release

1 00%

Deadline

. . . . . . . . . . . . . . . . . . . . . . . .

Utility

0%

F(r)

Before Deuel/me.

Uriliv is < 100%

FIGURE 2.7

F(r}

After Deadline.
Uriliry 1s

<

100"'0

Soft isochronal service utility.

Clearly, an i n telligent real-time agent would use a resouree scheduling policy


that leads to maximum util ity in all responses for all services. For m u l tiple services,
the concept of maximizing total utility requires a normalization of utility scales.
One possible way to normalize utility scales is to assign an i mportance factor to
each service. Furthermore, in an ideal system, different policies would be appl ied
for overruns based upon the known utility functions for each service and relative
im portance of each service. Using the student exam ple again, the student being the
ideal and ultimate scheduler in resource overload is smart enough to
I . Ensure sufficient margin for hard deadlines-attending classes where at
tendance is req uired.

2. Use a besJ -effort approach for extra-curric ular activities.

Scanned by CamScanner

J2

Real-Time Embedded Components and Systems

3. Turn i n late submissi on s for reduced credi t in classes that allow thi s.
4. Apply the anytime policy for classes t hat do not accept late su b mission s,

yet award partial credit.


s. Hold onto assignments com pleted early in classes where absent - m i n d ed
professors m ight m isplace a n early subm ission.

Students are m uc h smarter than most real-time em bedded systems . I t is l i kely

that implem entation of a resource-scheduling algorith m as i ntell igent and opti m al


'
as a student s is not prac t ical for an embedded system. H owever, most real -ti me
embedded systems do mix pol icies l ike the i n t e l l ige n t student did in the exa mple ,
but most often only two at a t ime. One of the most frequently used mixes is a set of
hard real-time services with some best -effo rt services. G uaran tees for the hard

real-time services are proven using methods d iscussed later in this chapter ( Rate

Monotonic Analysis), and best-effort services simply use all the left - over resource

by executing in slack t ime.


The primary focus of this text is on u ndersta n d i ng h ard real-time, however,
soft real-time and isochronal services are also covered . An exa m i nation of anytime

services is not provided in this text. A n yt i me services a re i m portant to artificial

i ntelligence and robot ic sy s t e m s w h e re p l a n n i n g algo r i t h m s t hat are NP hard


( Nondeterm inistic Polynomial bou nd on compute t i m e ) m ust be run i n real-time.

This is a very specialized form of real - t ime embedded system. Because soft real
time services are more generalized a n d l e ss determi nistic c o mpared to hard real
time services, soft real-time is t reate d as an open issue i n this text-a subject that is

an open research area-whereas ha rd rea l - t i m e is well u nderstoo_d and specific


hard real - t i m e pol i ci e s can be p roven o p t i m a l . T h e o p t i m al ity will be demon
strated later in t h is chapter.

S C H E D U L I N G C LA S S E S
Sch ed u l i n g servic es a n d t he i r usage of resou rces c a b e accom
n
plished b y a l a rge
.
of methods. I n this section , we consid er sched u l i ng
system proce ssors with
o rk. In gener al, a system m i g h t h ave more t
h a n one proce ssor ( C P U ) , and any
g ve proce ssor migh t host one or
more servic es. Alloc ating a CPU to eac h se rvice
by t he system m ig h t be s i m plest from a
sched u l i n g v iew po i n t , but clearly .
his
a l so be a cost ly sol u t i o n . F u r t he

r m o re, ru n n i n g s ervic es to c o m p l et io n
igno ring a l l o t h er requ ests
on a fi r t - com e, fi rst- serv ed basi s i s also s i m p l e , bu t
such as ervic e s t a rva t i o
n a n d m i s i n g d ea d l i n e s can arise with t hi
bet te r u nde rst and rea l - t i
me p roce ssor s h ed u l i n g, you fir t need t
revi ew
t a o n o m y of a l l maj or
sched u l i ng p o l i ies t hat c a n b e i m p lem e nt c?d a

shown in
2.8.

variety

i
pr?vi ded
'>-ou ld

p robl e m
Jp roac h. To

Figure

Scanned by CamScanner

SJ

System Resources
Euc:utlon Sc:hedulln1

----------

/"

Loc:al-Unlproceseor

Glob 1U-M P

P<ttm

SM T
SM P
Asymmttrlc: Distrib11ttd
(Symmttric MP) (Simullaneous (Off- load)
Multi-Thrtading)

mpdve

I\

Find-Priority

( Prttmptin. :"lon-Prttmptive S ubtrtt


Under Each Global-MP Leaf)

Batch

Rate

Deadline

Monotonic

" Monotonic:

FCFS

( Flnt Come.
Flnt Serve)

rattn

Dataflow Heur istic

EDF /LLF
(Deadline

R ound Robin
Timttlice

Multi- Frequtncy Co- Rout i ne


[ucutlvts

SJ N

( Shortest Job Nu t)

Continuation
Fu nction

Driven)

FIGURE 2.8

Resource scheduling taxonomy.

As is the case with most policies, no one policy is optimal for all system requirements, not even when all services must be hard real-time. The policies that
can be made optimal and have traditionally been used for hard real-time systems
are Rate Monotonic, Deadline Monotonic, and to a lesser extent, EDF/LLF. The first
branch in the taxonomy is based upon hardware design; does the system contain a
single CPU or multiple CPUs?
Multiprocessor Systems

For multiprocessor systems, the first resource usage policy decision is whether each
CPU will be used for a specific predetermined function (asymmetric, distributed) or
whether workload will be assigned dynamically (symmetric). Most general-purpose
M P (Multi-Processing) platforms provide SMP (Symmetric Multi-Processing)
where the OS determines how to assign work to the set of available processors and
most often attempts to balance the workload on all processors. An SMP OS i s not
simple to implement, and overhead for workload balancing can be high, so many
embedded multiprocessor systems are asymmetric or distributed. Asymmetric mul
tiprocessing is used frequently to take a service that was initially provided by software
running on a general-purpose CPU and offload it to a hardware state-machine or
tailored CPU to implement the service, therefore off-loading the more general pur
pose multiservice CPU. Distributed systems are typically asymmetric and communi
cate via message passing on a network rather than through shared memory, bus or
cross-bar. Other than the issue of load balancing, multiprocessor systems are most

Scanned by CamScanner

JC

Real-Time Embedded Components and Systems

cture-shar ed mem ory, distr ibuted message


disti ngui shed by their hardware arch ite
class ic taxonom y for sue systems i ncl udes
passi ng, or some hybrid of the two. The
.
.
gle Inst r ct1on, ult 1-Data) ,
(Sin
D
SIM
,
)
Data
le
Sing
SISD ( Sing le I n struc tion,
Data ) , and M I M D (Mu lt1 - I nst ruct 1on, Multi
M ISD ( M ulti- I nstru ction , Sing le
e or m ulti le inst ruct ion pr?cessing com
Data). So, all the com bina tion s of sing
s a e P?ss1bl e or M P arch 1 ectu re. Most
bined with single or m ultip le data path
m ultipl e instruction and m ultiple data path
embedded m ultiprocessor systems are
le CPU s for speed-up .
hardware arch itect ures that emp loy m ultip

,,,. nt E CYCL IC EXEC UTIV E


Many real -time systems, including complex, hard real-time, safety critical syste ms,
provide real -time services using a cyclic executive arc hitecture. Cyclic executives do
not requ ire an RTOS or generalized schedu l i ng mecha n i s m . A cyclic executive
provide a loop control structure to explicitly i nterleave execution of more than
one periodic p rocess on a single CPU. The Cyclic Executive is often implemented
as a main loop with an invariant loop body known as the cyclic schedule. A cyclic
schedule includes function calls for each periodic service provided within the maj or

period of the overall loop. The loop may include event polling to determine when
to dispatch functions, and functions that need to be called at a h igher frequency

than the main loop will often be called m ultiple ti mes within the loop. Likewise,
functions implemen ting periodic services that need to be run a t m uch lower fre
quency than the main loop m ay be cal led only o n specific l oo p .counts or only
when polled events indicate a service request. For example, the NASA Space Shut
tle flight software uses a cyclic exec utive design for the PASS ( Primary Avionics
Subsystem ) and has provided decades of defect -free operation p roviding real-time

control of a complex system [ Carlow84 ] .


The Space Shuttle PASS flight software includes t he hard real-time safety critical
GN&C ( Guidance, Navigati on, and Control ) services that are dispatched by a cyclic
executive from a dispatch table configured and selected based upon the current flight
stge of the s uttle ( ascent, on -orbit, re-entry ) . The highest frequency services main
am shttle flight control and operate on a 40- m il lisecond period. As Carlow notes,
.The
high-frequen cy executi ve is schedu led at a relative ly h igh priority to cycle at a
2 Hz rate and in itiate all princi pal functi
on processes direct ly related to vehicle
fliht ontro l. Mid-frequen cy and low-f
reque ncy execu tives are sched u led at lower
.
!:'no nt1e s. They init iate pnnc1p

a I fu nction
processes, which
operate at rates of 6 2
H z down to 0.25 Hz." The sched uling
of t he G N &C exec utives is p rovided by the
Process Manageme nt ' which uses a m

-A .
uI tttas

k mg
"
pnonty
queue struct u re and SC h t"\J
.
ules the CPU m resp onse to requests

ma d e t h rough a service interface


t h at defines
. .
event frequency and pno nty for service
requests. This h ighe r- level schedul er han dles

Scanned by CamScanner

.;lJ:>\'1; 1 1 1

"o;; vv ...... ..

rtquests from the GN&C cyclic executive: SM (Systems Management), which moni
tors systems for faults; and VCO (Vehicle Checkout), which is used for preflight and
on-orbit coast avionic systems testing. The GN&C cyclic executive is scheduled to
run periodically at 25 Hz ( 40-millise,ond period) and includes its own dispatch table
to sequence GN&C services at the high, medium, and low frequencies previously de
scribed. The cyclic executive architecture has been an important and successful
approach for hard real-time systems. Carlow explains the overall system: " [ the flight
software] architectures reflect a synchronous design approach within which the dis
patching of each application process is timed to always occur at a specific point rela
tive to the start of the overall system cycle or loop." Although the cyclic executive has
been successful due to its simplicity and deterministic character, one of its drawbacks
is the difficulty required to modify the cyclic schedule. For PASS, Carlow points out
that, "A major benefit of this approach is repeatability; however, there is only limited
flexibility to accommodate change."
The cyclic executive is often extended to handle asynchronous events with in
terrupts rather than relying only upon loop-based polling of inputs. This extension
of the executive is called the Main +ISR design. As the name implies, this approach
involves a main loop cyclic executive with the addition of ISRs ( Interrupt Service
Routines ). The ISRs handle asynchronous events that interrupt the normal execu
tion sequence of an embedded microprocessor. In the Main+ISR approach, the
ISRs are best kept short and simple so they relay event data to the Main loop for
handling. The Main+ISR approach hJs some advantage over the pure cyclic exec
utive and polling for event input because it may reduce latency between event oc
currence and handling. However, the Main+ISR approach has pitfalls as well. For
example, if an input device malfunctions and raises interrupts at a much higher
frequency than expected, significant interference to loop processing may be intro
duced. Although Main+ISR is more responsive to events as they occur, it may be
less stable unless a concerted effort is made to protect the system for potential in
terrupt malfunctions related to interrupt source devices.

DUL ER C O N C E PTS
The design of a generalized RTOS scheduler for processor resources is covered well
in most operating system texts. Here, we will briefly review some of the major con
cepts as they relate to real-time scheduling of the CPU resource. Real-time services
may be implemented as threads of execution that have an execution context and are
set into execution by a scheduler that determines which thread to dispatch. Dispatch
is a basic mechanism to preempt the currently running thread, save its context, and
restore the context of the thread to be run along with modification of the instruction
pointer or program counter to start or resume execution of the new thread. The

Scanned by CamScanner

onents an d System s
Real-Time Embedded Comp

scheduler.must implement the CPU sharing policy and the dispatche r m ust provid
the context switch for each t hread of execution. T he dispatcher is requ ired to save
and restore all the state that each thread of exeution uses including the followin g: e

Registers

Thread state

Stack
Program counter

xt and is typic al of real -ti me sched


This would be a minim um execu tion conte
tes t h reads i n the conte xt of a proc ss
ulers. By comp arison , the U n ix system execu
e
t
rocess
The
p
contex
read.
h
t
incl
each
for
ud
tor
and mainta ins a process descrip
es
ry,
red
and dyn amic
memo
much more additio nal state, such as V O con text, sha
memory allocati ons. The thread state is one of the best ways to u nderstand how a

scheduler works. As illustrat ed in Table 2 . 1 , thread states are based upon resources
needed in the thread execu tion conte xt.
TABLE 2.1

State Transition Table for a Thread of Execution

ftrMtl State

Desalptlem

Trasltl

Descrtptle

Ready

Thread is queued and ready


to run, but has not been
dispatched (given CPU)

Running

Thread selected for dispatch


based upon scheduling policy

Running

Thread is executing on CPU

Pending

Thread needs another resource in addition to the CPU

Delayed

Wait requested by thread

Suspended

Thread raised unhandled


exception during execution

Ready

Thread yields CPU

Non-Existent

Thread exits

Ready

Additional resource has

Pending

Thread is waiting on a resource in additjon to CPU

become available
Suspended

Suspended

by another thread
Thread is waiting for delay
period to end

Thread has raised unhandled


exception or has been

Ready

Delay has expired

Suspended

Delayed th read is
by another thread

Ready

suspended by command

Non-Existent

Pending thread is suspended

from another thread

Thread has not been aeated

Of a llouted

Scanned by CamScanner

resources

suspended

Suspension removed by
anot her thre - thread
activated
-

Ready

Thread creation and

activation

System Resources

J7

how the scheduler decides which thread from the set of all
those that are ready for dispatch, was the main differentiation in the taxonomy in
Figure 2.8. As threads become ready to run, pointers to their context are normally
placed on a ready queue by the scheduler for dispatch in the order determined by
the scheduling policy. The scheduler must update the ready queue based upon new
service request arrivals. The dispatcher will simply loop if the ready queue is
empty. Note, however, that in VxWorks, neither the scheduler nor the dispatcher
shows up as a task. Also, unlike some operating systems, VxWorks does not in
clude an idle task in the default configuration. In VxWorks, the scheduler and dis
patcher are kernel context services rather than task context services. Preemptive
schedulers are driven by interrupts and task calls into the kernel APL An interrupt
or API call can cause the scheduler to switch context and to potentially dispatch a
new thread or allow the same thread to continue execution. In VxWorks, an inter
rupt or an API call made by the currently runn ing task are the only ways that the
currently running task can be preempted. A fixed-priority preemptive scheduler
simply dispatches threads from the ready queue based upon a priority they have
been assigned at creation unless the application adjusts the priority at runtime.
Most often, if two th reads have the same priority, they are dispatched on a first
come, first-served basis. Almost all RTOSs include priority preemptive schedulers
with support for the basic thread states outlined in Table 2 . 1 .
One o f the major drawbacks of a priority preemptive scheduling policy is the
cost or overhead of the context switch that occurs on every interrupt. Systems such
as Unix, which use a time-slice preemption scheme where an OS timer tick is gen
erated every so many milliseconds by a programmable interval timer, have high
overhead. By comparison, most RTOSs do not use a time-slice tick, and instead
only reschedule when 1 /0 generates an interrupt or when a thread makes a system
call that results in yielding the CPU-a delay, exit, yield, call for an unavailable re
source in addition to the CPU, or suspension due to exception. Otherwise, RTOS
threads normally run to completion unless an event releases a thread ( makes it
ready) at higher priority than the presently executing thread.
Given an M P or uniprocessor hardware architecture, if more than one service
can be requested on a given processor, the next branch in the taxonomy concerns
whether requests will preempt services al re ady in progress or queue and wait for
services in progress to run to completion.
Dispa tch policy,

Preemptive versus Nonpreemptive Schedulers

Nonpreemptive cheduling policy ha exi ted for general-purpose computing from


the begi nning of computing y tern . In general) thi category can be subdivided
into batch and cooperative proce ing. I n hatch systems, jobs that are submitted to
a work queue are deq ueued by the sy tern OS and run to ompletion. The two most

Scanned by CamScanner

JI

ms
Real-Time Embe dded Com ponents and Syste

rved ) and SJN ( Sh o rt est


( Fi rst Com e irst
well know n dequ eue polic ies are FCFS
adva ntage of com p etmg the l argest n u m ber of
Job Next ) . T he SJN polic y has the
ma :r nev r be s rvtce d, a n d fu rthe rm ore, SJ N
jobs per u n i t t i me, but long jobs
.
each JOb will r q u i re exec utio n. Because rea l-t i m e
requ ires an estim ate of how long
.
ive
onse
relat
a
to a req uest
resp
g
ucin
prod
by
ized
acter
services are inhe rentl y char
poli cy is o f inte rest i n this text.
and deadlin e neit her non pree mpt ive
a respo nse relat ive t o requ ests for the ser
All real- ime services must prov ide
often cons trai ned by a dead li ne. At th
vice (a release ) , and the respo nse is most
st orie n t ed based u pon rea l - world
very least, becau se real- t i me system s are reque
ption o r t h ey m ust poll the real
events , clearly these system s must suppo rt preem
ssing. Two nonp ree mp
world o n a regula r basis and provid e period ic servic e proce
flow and mult ifre
tive approa ches are most often used in rea l - t i me system s: data

quency executiv es. In a data flow, input i n terfaces are periodi cally c hecke d, and
when data is available , this data is processe d and o u t p u t is produce d for cons ump
tion by the next service in the flow or terminated by produ c i n g a response. In data
flow processing, the inputs t hat start a flow are c h ecked i n a dete rmin istic order

most often, and flows are executed from start to fin ish o r sou rce t o sink.
The best property of this t ype of schedu l i ng policy is t ha t it is fully determinis

tic-the order o f exec u t ion i n the service and between services ( flows) is fully pre

dictable. The disadva ntage is that some flows may be known to requ ire execution
much more frequently than others-a modification to t h is scheme leads to a mul
t i frequency executive ( MF E ) . In the M F E , specific functions ( data flows ) are exe

cuted at a h igher frequency than others. For example, a con t ro l system data flow

may req u i re exec u t ion

1 00

t i m es per second for stable o pe rat i o n , whereas the

guidance fu n c t ion may o n ly requ ire exec u t ion 1 t i m e per seco n d to d irect the sys
tem t o a target ( hopefully without losing control ! ) . Either a pp roach may use asyn
ch ronous interrupts o r may only poll i n p u t s t a t u s registers; however, conte xt

switches beyond s i m ple i n terrupt servici n g a re n o t done. That is, data flows are not
switch ed prior to complet ion, a n d exec u t ive fu n c t i o n s are n o t switched prior to
completi on; after a flow o r execu t i ve fu nct ion i s dispatch ed ( given the C PU ), it
run to compl etion witho ut signi fic a n t pree m p t io n .

Preemp tive service releases have a n advant age i n that t h e service can be desi gned
m uch more indep enden tly t han data flows o
r execu tives. Each servic e can ass ume
that it will execu te follow ing releas e from
an i nterr upt as if it is t he only service o n the
PU, e ept each servic e m ust be
pree m ptabl e betw een release and c o m pletio n
( gene rati n of re ponse ). The
pree mpta bility req u i re an oper ating y tern th at will
ave and re-t ore ontex.t for ea
h execut abl e erv i e. F u rthe rm ore , eac h rvi e rnu l
exe ute reen tran t
de if it i
ha red a nd m u t yn h ro n i z e a e
t any globalh
,l
hared r sou r es. The
req u i rc m cn are t rue
f a ny pree m pt ive m ul t 1 p rogra m rn ru
.
Y t ern . T ht:"
e re .i n depen dent i n t he
n e t h t n o pcc i fi coo pe rat i n bt
.
tween
rvic es 15 r q u i red u n le.
o d e r dat a i h a red har d o de r d a t j1 n rh

Scanned by CamScanner

_
_

System Resources

,,

requires reentrant code and shared data access synchronization . The services can be
considered to be running asynchronousl y other than specific synchronizati o n poi nts
for shared resource access. Thus, each service can be designed as a separte state ma
chine rather than one state machine composed of many smaller state machi n es ( as is
typical of executives) . Also, one flow of execution may, i n fact, preempt another in
cases where one service is more important than another ( e.g., maybe one service has
a shorter deadl ine or more negative impact if not completed by its deadl ine).
Because preemption opens up the possibility that more than one service m ight
be ready to run, and there are fewer CPU resources than services ready to run, a dis
patch decision must be made. One of the sim plest dispatch policies is to give the
CPU to the service assigned highest priority. These assigned priorit ies are never

changed; they are fixed. To decide how to assign priorities, two common real -time
policies are to assign highest priority to the service with the shortest release period
(h ighest request frequency), which is known as RM ( Rate M on otonic ), and to as
sign the highest priority to the services with the shortest deadline relative to release,
which is known as DM ( Deadl ine Monoton ic ) . Note that RM is identical to D M
when the release period equals the deadline. As you will see i n this chapter, the R M
(and related D M ) policy can be shown t o provably meet system deadl ine require
ments given determ i n istic release periods and service execution t i mes. However,

one major drawback is that fixed priorit ies do not guarantee maximum use of the
CPU.

To attempt full usage of CPU resources and prove that services can meet dead

lines, you are forced to consider preempt ive dynamic priority policies. Two dynam ic
policies have been shown to have the capability to fully use CPU and guarantee dead
lines assuming service execution ti mes and request i ntervals are determ i n istic or

bounded. Liu and Layland proposed Earliest Deadline First ( ED F ) , along with RM ,

and showed them to b e optimal-that is, policies that can schedule a n y set o f services
that can be scheduled ( a n exhaustive proo fl ) . H owever, although RM is sim pler, Liu

and Layland also showed that RM fundamentally requires margin and less than ful l
CPU use u nless service requests are harmonic.

By comparison, in EDF, any time a new service request arrives ( indicated by a n


interrupt asynchronous) , the E D F scheduling policy adjusts all priorities so that the
service with the earliest deadline is given highest priority. Ignoring the overhead of
determining which service has the earliest impending deadl i n e and ignori ng over
head of priority reassign ment, EDF can be shown to provide full use of the CPU
where possible-even when requests i ntervals are not harmon ic. The downside of
EDF is that if request rates vary or execution times vary, the effect u po n services is

very hard to predict-which service will miss its deadl ine in a n overload? Variations
on EDF have been proposed that intend to improve the determinism of the system
gi e n variations, i cluding Least Laxity First ( LL F ) . The L L F policy assigns h ighest
. to the servJCe that has the least d i fference between remaining execution time
pnonty

Scanned by CamScanner

Real-Time Embedded Components and Systems

and its deadline--a measure of which service deadl i n e is most pressin g. The idea is
that laxity may vary based upon exec ution rates, wherea s the EDF assign me nt is not

at all influenc ed by executio n rate.


u
G iven t h i s overvi ew of reso u rce sched u l i n g pol i cies, y o m a y won der h ow
p
these policie s actua11 y i nsta ntiate the variou s u t i l ity fu n ct io n s .resent in the first
l - t i me u t 1 ht y curve. I f
section of this chapte r. They do n ' t , other t h a n the hard rea
.
you can prove that with a policy and determ i n istic param eters that all servic es w ill
comple te prior to deadl i n es, t h e o n l y u t i l ity c u rve t h i s maps t o is the hard real
t i m e curve. Recall t hat the hard rea l - t i me u t i l i t y cu rve provid es fu l l credit for any
respons e delivere d before the deadl i n e relative to service request a n d no credi t or
n egative credit for late response . The i och ro n a l u t i l ity fu n c t io n can be impl e
mented as a hard real-tim e service w i t h early completio ns h e l d u nt i l t h e respo nse
deadl i ne. This approach can be used to i m plem e n t both hard- a n d soft isochr onal

services. Policies and sched u l i ng mec ha n is m s for i m pleme n t a t i o n of soft real - t i me


utility and a n yt i me services are open re earch a reas.

Areemptlve Fixed Priority Scheduling Policy


F ixed priority preemptive policy i s most widely used by rea l - t i m e em bedded sy -

terns employi ng a n RTOS. Early on, most hard real - t i me systems used cyclic exec
u t ives or m u l t i freq uency exec u t ive

often fo u n d i n sy te rn s such as the Space

Shuttle guidance, naviga t i o n , and co n t rol prim ary avio n ics s ubsyst e m . The RTOS

sched u l i ng fra mework offers t h e same determ i n istic schedul i n g as the cyclic exec

u tive, but with far more flexibi l i ty i n h ow services are defi n e d a n d mai n ta i ned. The
princi pal reason for the pervasi ve ness of the RTOS i n l a rger scale configurable

real - t i me em bedded sy terns is t h e fact t h a t t h e policy has been proven opt imal

and a feasibil ity test erist , along w i t h the ease of defi n i n g services as tasks rather
than loop or i n terrupt contexts. The R M feasibi l i ty tests p rovide a m ethod to prove
that a given set of services ( i m plemented as RTOS ta ks) can be guaran teed to all

meet their deadl i nes i f the RM policy is u ed t o assign prioriti es. For h a rd reaJ -tjme
ystems where proof t h a t services i l l n o t m iss deadl i nes is desi red, RM is a n

obviou c hoice. For example , RM is often used for co m m ercial aircra ft , for atellite
sy tern , r any other sy tern where fa i l u re to meet a l l dead I i n es
can result i n ig
n ifi ant lo of l i fe and/o r assets. F i n al ly, as you w i l l see
in thi ect ion, with an RM
.
pol icy, you ca n pred ict and cont rol e xactly which
service will m i s dead l i n es in an
overl oad cen a rio .
Fir r, le t ' look a t why the R M prio
rity polic y i
p t i m a l . Recal l that the RM
.
- p? l icy req u i. res a fram ew rk where e
rvi e a re re l ea .ed a ynch ronou ly ( t yp i c ally
.
via an i n terru pt and p l aced on a
re a d y q u ue i n d i at i n g t hey need to ru n. A
sched uler then d1spa tche ea h t:rvic e
bae d u pon t h e h ighe t prior it y ta k i n the

Scanned by CamScanner

System Resourc es

41

read y queue ( all services are i m p l e me n t ed as tasks and assu m ed to have u nique
priority) . T h e service/t ask d ispatched continues to run t o completion unless an i n
terru pt preempts the task p rese n t l y exec u t i ng. Fol lowing a n i n terrupt, the sched
u l e r ree va l u a t es the ready q ueue and possibly di patches a new ta k (a context
swit c h ) i f a task o f h igher p r i orit y is added to the ready queue compa red to that
ru n n ing prior t o the i nterrupt. When a context switch occ u rs a n d a task/ e rvice is
preempted prior t o running t o completion, t h i s i called inteferencc. I n a fi x ed

priority pree m p t i ve system with only o n e service/task, i n terference i s not possible.


I n a system with m o re than one service/task, any given task may be interfered with
by all t a sks t hat h ave been assigned h igher priority .
With t h is fi xed-priority preemptive fra mework, the scheduling problem mu t
be fu rther constrained to derive a formal mathematical model t h a t proves deter
min istic behavior.

tern

that has

C l early i t is i m possible to prove determ inistic behavior for a sy

nondeter m i n ist i c i n puts. L i u a n d Layland recogn ized t h is and

proposed what t h ey bel i eved to be a reaso nable set of assumptions anJ con t raints

on real systems t o form ul ate a determin istic model. The ass u m p t i ons and c o n

straints are
AI:

All se r v ices requested on periodic basi

A2: C o m p l etion - t i m e < period

, the period is constant

A3: Service requests are i n dependent ( no known phasing)


A4: R u n t i m e is known and determin istic

C I : Deadl i n e = period

( WCET m a y be u ed )

by definit ion

C2: Fixed - priority, preem ptive, run -to-completion sched u l i n g

AS: Critical i n tant-longest response time for a ervice occurs when a l l

ys

tem services are requ ested s i m u l taneously ( m a x i m u m i n terference case for


lowest priority service )

A 1 to A,, are assumptions and C 1 to C11 are con traints as p re se n ted by L i u and
Layl a n d in their paper.
Gi en the fixed p riori ty preemptive sched uling framework and assumpt ions
-

ribed in the preced ing l ist, we can now exa m i n e a l ternatives for assign ing pri

de

a policy that is optimal. howing that t he RM pol icy is opti mal


ii a ccompl i s h e d by i n specting a y tern with a m a l l n u mber of servi es.

oritie and iden t i fy


i mo t ea

An exa m p l e with two ser ice fol lows. Given services


and T 2 , e e ut ion t i m e

T1

2 , Ti = 5,

Scanned by CamScanner

= I,

1
=

a nd C2, and

2 , and then if prio( 1 )

1 and

w i t h period

T1

ds Ti > T 1 , take, for example.


> prio( 2 ) , n te F i g u re 2.9.

relea e peri

Real-Time Embedded Components and Systems

.
S I Makes Dead I me

r pn' o( S 1 ) >

nr
'' io( S2 )

C1
.

..
.

..

..

r:2..
..

..
..
.
..

o,

FIGURE 2.9

02

o,

Example of RM priority assignment policy.

from
I n this two-servIC e exa mp I e, the only other pol icy ( swap p i ng priorities
.
.
.
the preced ing example) does not work. Given servICes 5 1 a n d 52 t h pe ods T,
and T2 and C, and C2 with T2 > T 1 , for example, T, = 2 , T2 = 5, C, - I , C2 - 2, and

then if prio( S2 )

>

prio( S 1 ), note Figu re 2. 1 0.

S 1 M isses Deadline i f prio( S2 )

s,

prio( S 1 )

..

>

'

..
...

FIGURE 2.1 O

..

Example of nonoptimal priority assignment policy.

The conc l usion thJt can be d ra


w n is t ha t for a t wo-serv ice syste m , the RM pol
icy is opt i m a l , whereas the only
a lterna t ive is not o p t i m a l beau e t he alte rna tive

poli cy fa i l s when a wor kab le


sche dule does e x i s t ! T h e ame arg um ent can be
sed
po
for a t hree -se rvice sy te rn , a fou
r - serv i e y. te m a n d fi n a l l y a n N- e i e Y l(lfl .
,
rv c
I n all cases, i t can be sho wn
t h a 1 t h e R M po l i c y i. opt j m a l I n hap er 3, I he R Jte
.
t
Mo not oni c Lea st Up per Bo und
( RM L U B ) is der ived a n d pro ven . ha p te r 3 Jl 0
p rovide s sys te m ched u l i n g fea
i b i l i t y tes t der ive d f ro m R M t hcorv t h a t vo u a n

Scanned by CamScanner

System Resources

use to etermine whether system CPU margin will be sufficient for a real-ti me safe
operation.

R EA L-TI M E O P E R AT I N G SYSTEM S
M ny real -time embedded systems include an RTOS , which provides CPU sched
uling, memory management, and driver interfaces for 1/0 in addition to boot or
BSP ( Board Support Package) firmware. I n this text, example code incl uded is
based upon either the VxWorks RTOS from Wind River Systems or Linux. The

VxWorks RTOS is available for academic licensing from Wind River through the
c.

University program free of charge (see Appendix C, "Wind River University Pro
gram I nformation " ) . Likewise, Linux is freely available in a number of distribu
tions that can be tailored for embedded platforms (see Appendix C, " Real-Time
Linux Distributions and Resources").
Key features that an RTOS or an embedded real-time Linux distribution
should have i ncl ude the following:

'

A fully preemptable kernel so that an interrupt or real-time task can preempt


the kernel scheduler and kernel services with priority
Low well bounded i n terrupt latency
Low well bounded process, task, or th read context switch latency
Capability to fully control all hardware resources and to override any built - i n
operating system resource management
Execution t racing tools
Cross-compiling, cross-debugging, and host-to-target interface tools to sup
port code development on an em bedded m icroprocessor
Full suppo rt fo r PO S I X 1 003. l b synchronous and asynchronous i n tertask
com m u n ication, control, and scheduling
Priority i nversion safe options for mutual exclusion semaphores ( the mutual
excl usion semaphore referred to in this text includes features that extend the
early concepts for semaphores introduced by Dijkstra)
Capability to lock memory address ranges into cache
Capability to lock memory address ranges i nto working memory if virtual
memory with paging is implemented
H igh -precision timestamping, interval timers, and real-time clocks and virtual
timers
The VxWorks, ThreadX, Nucleus, M icro-C-OS, RTEMS and many other avail

able real -time operating systems provide the feat u res in the preceding l ist. This
has been the main selling point of the RTOS becau e it provides time-to-market

Scanned by CamScanner

Real-Time Embedded Com pone nts and Systems


i m e exec u t i ve from sc ra t
accelera tion compared to designi ng and codi.n g a rea l - t
ch,
re
between
g
n
i
c
softwa
terfa
n
i
appl
nt
icati
o
efficie
ns
and
yet provides very d i rect and
ry
a
t
d
n
a
r
i
proprie
requ
are
e
s
n
ice
l
io
opt
n
n s i g, an d
hardwa re platfor ms. Some RTOS
some, such as RTEM s or M i cro-C-O S , req u i re l i m i ted o r n o licensing, especia lly fo r
academ ic or pers o nal use only ( a more com p l et e l ist can be fou n d i n Append ix D,
" RTOS Resources" ) .

Linux was origi nally designed t o b e a m u l t i user o pera t i ng system with mem
ory protec t i o n , abstracted process d o m a i n s , s i g n i fi ca n t l y a u to m a ted reso urce
managemen t, and scalabil ity for desktop a n d l a rge c l u sters o f gener a l - p urpose

computing platforms. Embeddi n g a n d adapt i n g L i n u x for rea l - t i me req u i r es dis


tribution packagi ng, kernel patches, and the add i t ion of s u p port tool s to pro vide
the J J key features l isted previously. Several compan ies s u p port rea l - t i m e Linux

d istributions, including TimeSys L i n ux, F S M Labs L i n u x , W i nd R iver Lin ux, Blue


Cat Linux, Redhawk L i n ux, and many others ( a more c o m p l et e l i s t can be found i n
Appendix C, " Real-Ti me L i n u x Distributions a n d Resou rces ' ) . R ea l - ti m e d istribu

tions of L i n u x requ ire t hese patches:

The Robust M u tex patch ( ht tp://developer.o sdl org/dev/rob ustmu texes/)


.
Linux preem ptable kernel patch ( avai lable from M o n ta Vista, T
imeSy s, FSM Labs)
POSI X clock s a n d t i m ers patch es ( h ttp:// home . conce
pts- ict. n //- rhdv!posix.
html , http://sourceforge. net/p rojectlhig h - res-t imer
s)
igh - reso l ut ion t i mer patc h ( h ttp://www. cs. wisc
.edu!paradyn/libh rtim e/)
L nux Trace Too lkit ( h ttp://"_tww. opersys.
com /L TT!)
L i n u x ( l ) sche dule r i ntro duc ed i n 2
. 6 . x w h ic h p rov ides a n u p per bou nd on
.
sch edu lmg d ispa
tch i nde pen den t of t h e n u mb er o
f p roc ess es act ive

! :

F nal

t
' '

o her op tio n i s t wr ite you r ow


n res ou rce - m a n age m e n t ker nel . In
Ch a p er
er orm ance Tu n mg , ,' yo u ' l l
see t h a t a l t ho u g h an RT OS pro
.
vid e s a
ge ne ric fram ew ork for res
ou rce
.
a n age m e n t a n d m ul t 1. ser v1c
e a p p l i c a t i o n s, the
cost of th ese ge ne ra l ize
' d iea
c
tu res is code fo otp r i. n t a
n d ove r h ead . So, i n Cha pter 8,
" Em b edd e d sys tem Co
m po ne nt s " b asIC
con ce p t s fio r rea l - t .i me ker nel ser
vic es are
su m m ar ized. U nder
.
.
.
sta n d i n g th e a sic
m ec h a n i s m s w i l l b e h el p fu
l to a nyo n e conside ring devel op men
t 0 f a c us to m k e r n
e l B ui' I d 'mg a c u st o m reso
.
u rce- m a n agement kern el j n ot
as d a u n tmg as 1t
fi rst m a Y see m , b u t u s i
n g a p reexi t i ng RTO
may al o be an attr
acti ve
f
base d u po n c o m p l e x i t y
o f ser vic es, po rta b i l ity re
q u i re men ts, t i m e to
mark
n u m ero u s yst em
req u i. re me n t s. Bc cau e i t i s no t
clear whet her u e of
a n RTO or
de e lop m e n t o f a C LL t o m
be tte r op tio n , bo th
re o u r e ker nel i th e
ap proac h c,. ar
e d i s u s:s e
. n t h I. te xt
d
i
,
.
I n gene ral , a n RTO
. e
pro vid
a t h rea d m
g me h a n 1. s m , i n so m
to as a task context
e c a e refe rred
' wh ich i
th c . rn p lem en
t
a
t
'
o
1
n o f a erv1 ce. A _crvicc i t he t h r<>re tac
a 1 co nc ep t o f a n ex
ec ut io
n ontcxt
. Th e R T
m o, t ofte n i m pl ements t h i s a a

Scanned by CamScanner

45

System Resources

t h read of exec u t i o n , w i t h a wel l -known e n t ry point i n to a code ( text ) segmen t ,


through a fu nction, a n d a memory context for this thread o f execution, which i s

called the th read con text. I n VxWorks , t h is is referred t o a s a task a n d the TCB
( Task Control Block ) . I n L i n u x, t h i s is referred to as a process and a process de

scriptor. I n the case of V xWorks, the context is fairly small, and in the case of a
L i n u x p rocess, t h e context i s rela t i vely complex, i n c l u d i n g I /O status, memory
usage context, process state, exec u t ion stack, register state, and ident i fying data. I n

t h i s book, th read will be used t o describe the general im plementation o f a ervice

( at the very least a th read of exec u t io n ) , task to describe an RTOS i m plemen tation,

and process to describe the typical L i n u x i m plementation. Note that L i n u x F I F O


t h reads a re s i m i l a r i n many ways to a VxWorks task.

Typical RTOS CPU sched u l i n g is fixed - priority preempt ive, with the capabil
ity t o modify priorit ies a t r u n t i m e by applications, therefo re a l so su pport i n g
dyna m i c -p riority p reem pt ive. Rea l - t i me respon e with bounded latency for any
n umber of services requ i res pree m pt ion based u pon i n terru pts. Systems where la
tency b o u n d s a re m o re rela xed might i n stead use pol l i n g for even t s and run
t h reads to complet i o n , increasing efficiency by avoiding disru ptive asynch ronous
i n terrupt context switches. Remember, as presented in Chapter 1 , we assume t h a t

response latency m ust b e dete r m i n istic a n d that t h i s i s more i m portant t h a n

t h rough p u t a n d overal l efficiency. I f you are designing an em bedded system t h a t


doesn ' t really have rea l - t i me req u i rements, avoid async h ronous i n terrupts a l t o
get her. Dea l i n g with asyn chronous i n terrupts requ i res debugging i n interrupt con
text a nd can add com plexi t y that m i ght be fu lly avoidable i n a n o n re a l - t i m e
syst e m . Fir t , w h a t real ly constitu tes a real - time dead l i n e req u i rement m u s t b e u n

derstood. By completion of Chapter 3, you s h o u l d b e a b l e t o clearly recog n i ze


whether a ystem req u i res priority preempt ive sched u ling or not. An RTOS pro

vides priority pree m p t ive ched uling as a mecha n ism that allows an application to
i m p lement a variety o f sched u l i n g pol icie :

R M ( Rate Monoto n i c ) o r D M ( Dead l i n e M onoton ic ) , fi xed priority


E D F ( Earl iest Dea d l i n e F i rst ) or LLF ( Least Lax i ty F i rst ) , dynamic priority
i mple r u n to completion cooperat i ve ta k i ng
The e policies a re a mall subse t o f the la rger taxonomy of ched u l i n g pol i ie_

p re en ted p revio u ly ( re fe r to F igure 2.8 ) . I n the ca e of i mple r u n to completion

coope rat ive ta ki ng, the value of the RTO

m o tly the B P , boot, d r iver, and

memory ma nagement becau e t h e appl ication really han dle the hed u l ing. i en
that bou n ded latency i mo t o ft en a hard req uire m e n t i n any real -t i me ytcm, the
foc u s is fu rther l i m i ted to R M , f.DF , and L L F . U l t i mately, a real - ti me

need t o

u p port d i pa tch, exec u t ion

Scanned by CamScanner

hed uler

ntcxt m a n a gem ent, and pree m pti n . I n

RealTime Embedded Components and Systems

ion, but may be preempted by


ces run to com plet
.
th e simp lest scen ano , wh ere servi
. .
.
states are de pic ted 10 Fig ure 2. 1 l .
.
a higher p no n ty service, t h e thread

Dispatch

Interference
(preemption)

Basic dispatch and


preemption states.

FIGURE 2.1 1

In the state transitio n diagram shown in Figure 2. 1 1 , we assume that a thread


i
in execution never has to wait for a n y resources in addit o n to the CPU, that it
never encounters an unrecoverable error, and there is never a need to delay execu
tion . I f this could be guaranteed, the scheduling for this type o f system is fairly sim
ple. All services can be implemented as t h reads, with a stack, a register state, and a

thread state of executing or ready on a priority ordered q ueue. M ost often, threads

that implement services operate on memory or on an 1/0 i nterface. I n this case,


the memory or 1/0 is a secondary resource, which if shared o r i f significant latency
is associated with use, maiy require t h e th read to wait and enter a pending state

until this secondary resource becomes available. I n Figure 2 . 1 2 , w e add a pe ndin g


state, which a thread enters when a secondary resource is not immediately avail
able during execution. When this secondary resource becomes available, for exam
pie when a device has data available , that resource can set a flag ( semaphore),
in di ca ti ng availab ility to the schedu ler to transiti on the pendin g thread back to the
ready state.
In additi n, i f a thread may be arbitr arily delayed
by a p rogram m able amout
.
.
f time then it will need to ente r a dela yed
state as show n i n Figu re 2 . 1 3. A delay 15
'.
si mp ly i mplem ented by a hardwa re
interval timer that provides an interrupt after
a pr gra mabl e nu mber of CPU
clock cycle s o r exter nal oscil lator _cycles. When
t e timer 15 set, n inter rup t hand ler
for the expir ation is insta lJed so that the delay
time ut res lts m restorat ion of the
th read fro m dela yed stat e bac k to ready sta te.
Fma lly, if a th read of exec utio n
enc oun ters a non reco verable error, for ex am
pl e div isio n by zero in the code, then
con tinuatio n cou ld lead to sign ifica t system

Scanned by CamScanner

System Resources

47

Wit on
Resource
( s em T a k e ( l l

Interference
(preempt ion)

FIGURE 2. 1 2

Basic service states showing pending

on secondary resource.

Yield

Wait on
Resource

( semTake (

( t a s kDela y ( ) ,
Paus e ( ) )

))

Interference
(preemption)

Resource
Ready

( s emG 1 v e ( ) )
FIGURE J.1 J

Basic service states, including programmed delays.

lt,
enda ngerme nt. I n the ca e of d ivision by zero, this will cause an overflow re.; u
w h i h i n t u r n m i g h t generate fa u l ty comman d o u t p u t to an actuator , such a a
i han
atellite t h ru ter, which could cau e lo5s of t he a e t . I f t he di vi ion by zero

over
a n ex eption handler that recalc ula tes the re u l t and t herefor e re
1recover y i not p
w i t h i n the ervi e, con t i n uation m i ght be po ible, but often
i l u re, a n nre
ble. Be a u e t h e very next i n t ru t ion m i ght au e total y tern fa
Fi u re 2 . 1 4 how the:
erable ex eption hould re u l t i n u pen ion of that th read.
t h read o r ta k t hat i a l n:ad in the dela 1ed tate
add i t ion of a u pende d tate. I

d l ed b

Scanned by CamScanner

Real-Time Embedded Componen ts and Systems

Suspend

(unhandled
exception)

Yield

( t a s kDe la y ( ) ,
Paus e ( ) )

Interference

(preempt ion)

FIGURE 2. 1 4

Service states, including programmed suspension or

susp ensio n due to exc ep t i on

can also be suspended by an o t h e r task, a

tional sta tes are possible . i n t he s u s pended

t h e ca s e w i t h VxWorks, then addi

t a te , i n c l u d i n g delayed + suspended,

pending+suspended, and s i m pl e suspended.

Al tho ugh sched uling the C P U for m u l t i p le s e rv ice s with the implementation

of th reads, tasks, or processes is t h e m a i n fu n c t i o n of an RTOS, the RTOS also p ro

vides management of memory and 1/0 a lo n g with met hods to synchronize and co

ordi nate usage of these secondary reso u rces a l o n g w i t h t h e C P U . Secondary


resources lead to the add ition of t h e pend i n g state i f their availability at runtime
can't be gua rante e d in advance. F u rt he r m ore, i f two t h reads m ust synchronize,

exe
cute up to the rendezvous poi n t i n its code. I n Chapter 6, you'll see that this see m
i ngl y si mple requirem ent i mposed b y secon d a ry resou rces and t h e need fo r thread
one thread may have to enter the pending state to wait for another thread to

sync h ron izat io n lead to .signi fi ca n t complexity .

Th RTOS provide s 1/0 reso u rce m anage m e n t t h rough a d rive r interface,


. ng
wh 1. h m ludes c mmon e n t ry points for readi ng/wri t i ng data, open i ng/cl si _

?
uo
session with a d v1ce by a thread, a n d c o n figuring the I/O device . The coord i na
cu
of access to dev JC es by multiple t h reads and t h e synchroniz ation of thread exe

Scanned by CamScanner

System Resources

49

t ion with device d a t a va i labil ity a re im pleme nted t h rough


the pend ing state. I n
.
t h e simple s t case, a n I n terrup t Service Rou t i n e ( I S R ) can indicat device data ava i l

e
a b i l i t y by set t i n g a sem apho re ( flag ) , which a llows t h e RTOS to tra n s
i t i o n the
t h read waiting for data to process from pendin g back to the ready state. L i kewise,
when a t h read wants to write data to an I/O ou t put b u ffe r, if t h e bu ffer is curren tly
ful l , the device can synch ro n i z e b u ffer avai l abil ity w i t h t h e th read aga i n t h rough a n
J S R a n d a b i n a ry semapho re. I n Chapter 8, y ou ' l l see that a l l 1/0 and thread yn
c h ro n i zat ion can be handled by ISRs and bi na ry semaphores , but that alternat ive
mec h a n isms such as m essage queues can also be used.

Memory i n the s i m plest scenarios can be mapped and a l located once d u r i n g


boot of t h e system and n ever modi fied a t r u n t i m e. This is t h e ideal cenario be

cause the usage of memory is determ i n i stic i n space and t i me. M e m ory usage may

vary, b u t t h e m a x i m u m used i predeter m i ned, as i s t h e t i m e to c l a i m , u e, and re

lease m e m o ry. I n ge nera l , t h e use of t he C l i bra ry mal loc is frowned u pon in rea l
t i m e e mbedded systems because t h i s dyn a m ic memory- manage m e n t fu nction

provides a ll ocat ion and dea l location of arbit rary segments of memory. Over t i me,
i f t h e segm e n t s t r u l y a re of arbit rary s ize, t h e al locat ion seg men t s must be coa

lesced to avoid external fragmentation of memory as shown i n F igu re 2. 1 5 .

L i kewise, i f arbitrarily si zed segme nts are mapped o n to m i n i m u m size blocks

of m e m o ry ( e.g., 4 KB block ), the n allocation of a I -byte b u ffer w i l l req u i re 4,Q j6

bytes to be a l l oca ted w i t h 4,095 bytes w i t h i n t h is block wasted. T h i s i n ternal frag


m e n t a t i o n is shown i n F igure 2 . 1 6.

r.J ,
.
;,; :1

FIGU RE

J. 1 5

Scanned by CamScanner

ary siz e.
Mem ory fragm entat ion for dat a s gmen ts of arbitr

50

Reat-Time Embedded Components and Systems

Unused

Block
Space

Free
Blocks

. .

FIGURE

2. 1 6

Internal block fragmenfation for fixed-size dynamic

allocation.

I n many real-time embedded systems, a compromise for memory manage


ment can be reached whereby most working data segments are predetermined, and
specific usage heaps (dynamically allocated blocks of memory) can be defined that
have little to no external or internal fragmentation issues.

Overall, because an RIOS provides general mechanisms for CPU, memory,

and 1/0 resource management, adopting an off-the- shelf RIOS such as VxWorks

or embedded Linux is an attractive choice. However, if the resource usage and


m anagement requirements are well understood and simple, developing a custom
resource kernel may be a good alternative.

THRE AD SAFE REEN TRAN T FUNC TION S


When an RIOS scheduling mechanism i s used, care must be taken to consider
whether functions may be called by m ultiple thread contexts. Many real -t ime
services may use common utility functions, and a single implementation of these
common functions will save significant code space memory. However, if one func
tion may be called by more than one thread and those threads are conc urrently
active, the shared function must be written to provide re-entrancy so that it is
d
thread safe. Th reads are considered concurrently active if more than one threa

Scanned by CamScanner

System Resources

SI

context is either exe<:: u ting or awaiting execution in the ready state in the dispatch

queue. In this scenario, thread A might have called function F and could have been
preempted before completing execution of F by thread B, which also calls function

F. If F is a pure function that uses no global data and only operates on input para
meters through stack, then this concurrent use is safe. However, i f F uses any
- global data, this data may be corrupted and/or the function may produce i ncorrect
results. For example, if F is a function that retrieves the current position of a satel
lite stored as a global state and returns a copy of that .position to the caller, the state
inforIJlation could be corrupted, and inconsistent states could be returned to both
callers. For this example, assume function F is defined as follows:
I'

ty pedef s t r u c t p o s i t i o n { double x , y , z ; } POS I T ION ;


'

POS ?TION sate.l lite_pos


.

{o:o, 0 . 0 , 0 . 0} ;

POSI T I ON get_po s i t ion ( v oi d )

double alt ; lat ; long ;


.

..

read_a l t i t ude ( &alt ) ;


_
read_lat itude ( &lat ) ; _

\i

' .

read_longitude ( & long ) ;


1 Multiple f u n c t i o n c a l l s are required to conert the g e o d e t i c

navigational s e n s o r s t a t e to a s t a t e i n inert ial c o o r d i n a t e s

*/

sat e l l it e_p o s . x

s a t e l l ite_po s . y

updat e_x_posi ion ( a lt , lat , lon g ) ;


updat e_y_posit ion { a lt , lat , lon g ) ;

s a t e l l i t e_pos . z

update_z_po s i t i o n ( a lt , lat , long ) ;

return satel lite_pos ;

}
Now, if thread A has executed up to the point where it has completed its call to

update_x_p o s i t in, but not yet to the rest of get_p o s i t io n , and thread B preempts

A, updating the position fully, thread A will be returned an inconsistent state. The
state A returned will include a more recent value for

and older values for y and

because the a l t , l a t , and long variables are on the stack that i s copied for each
thread context. Thread B will be returned a consistent copy of the state updated

from a set o f sensor readings made in one call, but A will have the x value from B's

call and will compute y and z from the alt, lat, long values on its stack sampled at

an earlier time. The function as written here is not thread safe and is not re- entrant

due to the global data, wh ich is updated so that i nterrupts and preemption can
cause partial update of the global data. Vx\.Vorks provides several mechani sms that
can be used to mak e functions re-entrant. One strategy is to protect the global data

Scanned by CamScanner

Real-Time Embedded Components and Systems

SJ

pr:

.
ven t i ng p reem p tion for t h e upd a t e criti cal sec tio n
from parti al upda. tes b},
k ( ) The so l u t i o n to make t h is fu n c t io n thre a d safe
u s i ng t a s k L o c k ( ) and t a s k Un o c
.
usin g

t a s kLock ( )

t a s kU n l o c k (

and

IS

. t 1. 0 n { d o u b l
t y pedef s t ruct posi
.
- {O . O '
POS I T I ON s a t e l l 1 t e_p os
vo i d )
POS I T I O N g e t_p os i t i o n (
{
double alt ,

lat ,

e x,

z;}
.

POS I T I ON ;

0 . 0 , 0 . 0} ,

long ;

l l 1 t e_p o
POS I T I ON c u r r e n t_s at e

..

y,

s;

re ad_a l t i t u d e ( &a l t ) ;
1

read_l a t i t u de ( & l a t ) ;
read l o n g i t u d e ( & long ) ;

- .

'l

...

'

.
*I

Mu l t i p l e f u n c t i o n c a l l s

are

req u i r e d t o c o n v e r t

navigat ional s e n s o r sta t e t o a


The

state

code between L o c k a n d U n l o c k

is

i n

the

t h e g e o d e t ic

i n e r t i a l c oo r d i n a t e s

c r i t i c a l s e c t ion .

taskLock ( ) ;
c u rrent

s a t e l l i t e_p o s . x

u p d a t e_x_p o s i t i o n ( a l t ,

lat ,

long ) ;

cu rrent

s a t e l l i t e_p o s . y

u p d a t e_y_p o s i t i o n ( a l t ,

lat ,

lon g ) ;

cu r r e n t _ s a t e l l i t e_pos . z

u p d a t e_z_p o s i t i o n ( a l t ,

lat ,

long ) ;

s a t e l l i t e_pos
a s s ignment

*/

c u r r e n t _ s a t e l l i t e _p o s ;

/*

a s sumes

s t ruct u re

ta s k U n l o c k ( ) ;

ret u r n c u r r e n t _s a t e l l i t e _p o s ;

The use of L o c k and Un l o c k preve n t s t h e ret u r n


of a n i n co n sisten t state to eit her
fu nct ion becau se it p reven ts preem ption
d u ri n g t h e upda te of t h e l ocal and global
ate l l i te po i t i o n . The fu nctio n is now
t h read safe, b u t pote n t ially will cause a
prio ri t y t h read t o wait upo n a lowe
r prio rity t h read t o com plet e t h i s criti c al sec ti? n
of code . T he Vx\iVorks RTOS prov ides a l te r n a t i ves , i n c l u d i n ta a iab l s
g sk v r
e ( cop.ies
of glob al m J i n ta in e d with t a
k con tex t ) , i n terr upt leve l L o c k a n d U n l o c k , an d a n i n.
ver ion - afe m ute x . The s i m ple
t way t o e n u re t h read safe ty is to a void th e u se
glob a l data a n d t o i m ple men
t o n l y p u re fu n c t i o n t h a t use onl y local stack datJ,,
how eve r, t h i ma y be i m p r
ot
act i c a l . C h a pt e r 8 p r o v i de a mo
re det a i led d isc u io n
m e t h od !) to ma ke c o m m o n l i bra
ry fu n c t i o n s t h rea d a fe.

higer

Scanned by CamScanner

syst em Resources

SJ

S UM MARY

A real - time embedded system should be analyzed to understand req u i rements a n d

how th ey relate to system resou rces. T h e best place to start is w i t h a o l i d under

standing of C P U , memory, and 1 10 requirements so that hardware can be p roperly


sized in terms of C P U clock rate ( i ns t ructions per second ) , memo ry capacity,
memory access latency, and 1/0 bandwidth and l a tency. After you size the e basic
th ree resou rces, determ ine resource margin req u i remen ts, and establi h determin
ism and rel iability requ i remen ts, then you can further refine the ha rdware de ign
for the processi n g platform to determine power, mass, and size. I n many ca es, an

RTOS

provides a quick way to provide management of CPU, memory, and 1/0

along with basic fir m wa re to boot the software services for an application. Even if
a n off- the-shelf

RTOS is used, it

is i m portant to understand the common im ple

mentations of t h ese resource ma nagement features along with theory on resource

usage policy. I n Chapter 3, the focus is CPU resource management; in C hapter 4 ,


it's 1/0 in terface manage ment; i n Chapter 5, memory management; and i n Chap

ter 6 m u l t i resource ma nagement. I n general, fo r real-tim e embedded systems, ser

vice response latency is a prim ary consideration, and overall , the u t i li ty of the
response over time should be well understood for the application being designed.

Traditional h a rd rea l - t i me systems require response by deadl ine or the system is


said to have failed. I n Chapter 7, we consider soft real -time systems where occa

sional m issed deadli nes are allowed, and missed deadline handling and service re
covery is designed into the system.

EXERC I S E S
I . Provide an example of a hard real-time service found in a commonly used
e mbedded system and describe why this service utility fits the hard rea l
t i m e utility c urve.

l. Provide an example of a hard real-t ime isochronal service found in a co m

monly used e mbedded system and describe why this service util ity fi ts the

isoc h ronal rea l - t i m e u t i l i ty c u rve.

3 . Provide a n example of a soft real-time service found in a commonly u ed


e mbedded system and describe why this service utility fits t h e soft rea l

t i m e utility c u rve.
4, I m p lement a VxWorks task that is spawned at t he lowe t priority level

po ible and that call s e m T a k e to enter the pending tate, waiting for an
event in dicated by s e mG 1 v e . F rom the VxWork
ver ify that it i in t h e pending t.ate. Now, call

Scanned by CamScanner

hell. start this task and

fun tion that give the

54

Systems
Real-Time Embedded Components and
on a n d veri fy t h a t it compl etes e xec u t i
semap hore the task it is wa i t i n g
on

e c uted a t the defau l t p r orit y for a


s. I mp le me n t a L i n ux process t.hat is x
o n a b i n ary se apho re t o. be given by a n .
user- level appli cation a n d waits
.
fy i t s state usi n g the p com .
i
d
ver
n
a
s
roces
s
hi
t
Run
.
other appli catio n

and exits.

n:i

Now, run a separa te. process to g1 ve th e


mand to list i t s proces s descrip tor.
con t i n u e exec u t i o n a n d e xit. Ve rify
semap hore causin g the first proce ss to

com plet ion.


,
t get h u n g up on ma t h ) . Please
n
do
(
per
pa
A
M
R
d
n
a
Layl
and
Liu
the
6. Read
sho rt
sum mariz e the paper's m a i n poi n ts ( a t least t h ree or more) i n

.
.
7. Write a paragrap h compari n g the R M a n d t h e Dead l i n e Driven or E DF
policies described i n Liu and L ay la nd i n your own words a n d be com plete)
but keep i t to one reasonably conc ise pa ragra p h . Note that t he D ea d l i ne
D r i ver sc hed u li n g i n L i u and Layland i s now ty p i cal l y called a dy n a mic

parag raph.

priority EDF poli cy . I n

genera l , a

n u m be r o f poli cies that are deadlin e dri


ven have evolved from t h e Dead l i n e Driven sc h ed u l i n g L i u and Layl and

describe, including E D F and L L F . G ive at least t wo spec i fic d i ffere n c e s .

8. Cross-compile code for VxWo rks. Download t h e file two_tasks.c from the

CD-ROM exa m pl e code. C reate a Tornado proj e c t t h a t i ncl udes this file
and download the fi l e to a lab t a rget . U s i n g t h e windshell, use the fu nction

to veri f that t h e o bj ec t code has been dow nlo a ded . Submit ev


idence that you have r u n m o d u l e S h ow i n you r la report ( copy and paste
modu leShow

i n t o your report

or do a C t rl - P r n t

S c r n ) . N e x t , s t i l l i n t h e wi n dshell, do an

to verify t h a t t h e fu n c t i o n e n t ry poi n t for the two_tasks


example has been dy n a m i ca l l y l i n ked i n t o t h e kernel sym bol table. Place
lkup

" t e s t _t a s k "

evidenc e i n you r report . Note t h a t typi ng h e l p o n a w i n ds hell command


l i n e w i l l provide a s u m m a ry o f t h ese basic c om m a n d s . I n add ition , all

comma nds can be looked u p i n t h e V xWork s A P I m a p u a l . Capture the


outpu t from the mod u l e S h ow com m a n d i n t h e w i n d sh
e l l a n d paste it into
.
your lab wr te - u p to prov e t h at you 've d o n e
t h is.
9. Not ice t h e two fun c t i o n s t e s t _t a s k s 1 ( ) a n
d t e s t t a s k s 2 ( ) i n t h e t wo_
ta ks.c file . D esc ri be w h a t each o f t h e e
fu n c t i o n s d es . R u n each fun ctio n
i n t h wind s ell by t yp i n g t h e fu n c
t i o n n a m e at t h e w i nd hell pro mrt.
Exp lain the d i ffere nce i n how y
n c h ro n i za t i o n o f ta ks is a c h ieved i n t wo
d i ffer ent fu n c t i o n n a m e l y t e s
t_tas k 1 and t e s t t ask2.
I O.
un t h W i n d View tool a n d
ca p t u re o u t p u t
t h e t e s t t a s k s pr ogra n
15 r u r n mg . Exp lain
wha t you ee- ide n t i fy you r ta k
t wo ta k arr
,
r u nn in g on the C P U and wh
e re t h ey n h ro n i z e w t h s e m T a k e and se mG i ve.

Scanned by CamScanner

while

whec

55

System Resources

Pro vide a deta iled


exp Ianatio
n an d annot. ate the trace directly by using Ctrl
.
P rnt Sc rn a n d Ctr! y
.
- t o paste an unage mto your write-u
.
p and mark u p the
.
gaph lC d i rectl y. Des
cribe the differ ence between test tasks 1 ( ) and
t e s t_ta s k s 2 ( ) a nd how
this is show n by the Wind View outp ut.
.
1 l . N o w modi fy t h e two_ task
s.c code so that you create a third task name d
y
t
e
s
t
t a s k and write an entry

poi n t func tion that calls paus e ( ) and noth


mg else. Use t h e i comm and
i n the winds hell and captu re outpu t to prove
that you c reated this task. Does
t h i s task show execut ion on Wi ndVie w?
W h y or why not?
1 2 . F o r the next port i o n o f the lab, you will experi
m e n t with crea t i ng a
bootab le project for the s i m ulator. Create a bootab project based upon
le
t h e S i m u l at o r B S P a n d configure a new kernel i m age so t h a t it incl ude
-

OIHl#l CD

POS I X message queues. Build t h i s kernel and l a u n c h t he simu lator with


t h is i mage i n stead of the default i mage. Download an application project
w i t h t h e pos i x_mq.c CD-ROM code, a n d if you d id the fi rst part right , you

will get no errors on download. Do

lkup

" s end " and you should see t he

fu n c t i o n e n t ry po i n t s i n the posix_mq.c module. C u t and past this output


i n t o yo u r s u b m ission to show you did this.

1 3 . S e t a breakpo i n t i n fu n c t i o n
b

s e n d e r wit h the windshell

com mand

s e n d e r . Now run mq_demo from t he windshel l . When the break point is

h i t , start a debug session using the Bug ico n . Use t he Deb ug menu, select
A t t ach, a n d a ttach to the Sender task. You should see a debu g window with
t h e c u rsor s i t t i n g at the s e n d e r fu nction e n t ry poi n t . Switch the View to
M i x ed Sou rce a n d Disassembly and now cou nt the n u mber of i nst ruc tion s

from t h e s e n d e r e n t ry up to the call to mq_o p e n . How m a n y steps are there

from a b rea kpo in t set at t h i s entry poi n t u ntil you enter mq_open u s ing the
s in gl e step i nto? Provide a capture showi ng the b rea k po i n t output from
your wind shel l.

CHA PTE R R E F E R E N C E S
Meeting Deadlines i n Hard R eal Ti me
l Briand9 9 ] B r i a n d , Lo'ic, a n d Roy, Daniel,
I EEE Com pute So c i ety P re s , 1 999.
ystems- Th e Rate Monoton ic
1 , M a rco. Unders tanding the L in ux Kernel.
[ Bo etOO l Bo vet, D a n iel P . , a n d Cesat

App:oach.

O Reilly, 2000.

" oftware
ched u l i n g H a rd Real -Time y terns: A Review .
[ B u rn 9 1 J B u rn , A . ,
1 99 1
Eng ine erin g fou ma /, (
, h u t t ie P r i m a ry Av1o n i s
arlow8 4 I Carlow , G e n e D . , A r h 1 te t u re of the . pace
oft wa re

ber 1 98 4 ) .

Scanned by CamScanner

"

y t e rn , "

M.i ,

o m m u 1 1 i at ions o f the A C A i . V I . 2 7 , N u mber 9 , ( . eptem -

51

Real-Time Embedded Components and Systems

I L i u 73 I

L u , C., and Lyland, J . , "Sched u l i ng Algori t h m for M u l t iprog r a m m i n in

g
."
foumal
of
Association
Enviro
n
m
e
n
t
f
or
-Time
Comp
Real
Hard
a
uting Ma
chinery, Vol. 20, No. I , ( Ja n uary 1 973 ): pp. 46-6 1 .

Robbi ns, K . A . , Prac t ica l U n ix P rog ramming-A Gu ide to Co n cur


rency, Co mmim ica tion, mid Multithreading. Pre n t ice H a l l , 1 996.

I Ro bb ins96 j

j W RS99 l W i nd River Systems I nc . , Vx Works Progra m mer 's Gu ide. W i n d R i ver


Systems, l 999.

Scanned by CamScanner

Processing

In This Chapter

I ntroduction

Feasibi l i ty

Preemptive Fixed - P riority Pol icy


Rate Monotonic Least U pper Bound ( RM LU B )

Necessary and Sufficient ( N&S) Fea sibility

Dyna m ic Priority Pol i c ies

Dead l i ne-Monotonic ( D M ) Pol icy

INTRODUCTION
Processi n g i n p u t data a n d producing o u t p u t d a t a for a system response i n real
time does not necessa rily require large CPU resources, but rather careful use of
CPU resou rces. Before considering how to make optimal use o f CPU resource in
a rea l - t i me embedded system, you m ust fi rst better un derstand what is meant by
proces i n g i n real t ime. The m a n t ra of rea l - t i me system correctness is that the ys
tem m ust not only produce the requ i red output response for a given i n p u t ( fu n c
t i o n a l correctness ) , b u t that i t m ust do so i n a t i mely m a n ner ( before a deadl i n e ) .
A

deadline i n a rea l - t i me system i s a relat ive t i m e after a service reque t b y which

t i me the sy tern m u t prod uce a re ponse. The relative dead l i ne seem to be a i m

p l e concep t, but a more formal pec ification o f rea l - t ime ervice is helpful due to
t he many type o f application . For exam ple, the p roce

ing i n

voi e or v ideo

57

Scanned by CamScanner

51

RHl-Time Embedded Components and Systems


h igh q u a l i t y if t h e service c o n t i n uo usly pro v ide
real -ti me system is conside red
too late, w i t hout too m uc h latency and with ou t t
outpu t neither too early nor
S i m i l a rly, i n d igital o n t rol appl i at ion, the ide al
much j i tter betwee n frames.
delay between sensor sampl i ng a n d ac t u a tor o ut pu t .
tern has a consta n t t i m e
l - t i m e ervice t ha t monitor a atcll i t e' he alt h a nd
However, by com pa ri o n , a rea
ery seque nce when t he atcl l ite i i n danger m u t
statu s and i n i t iates a safe recov
ondit ion i det e ted.
o ible after t he danger ou
enter the recovery as q u ickly as .p
media ( au d i o a n d video) a re i o h ronal real - ti rnt
Digital cont rol and con t i n uo us
genera ted too long after or too early after a r
services . Re ponses should not be
i to r i n g, h oweve r, i a i m ple rea l - t i m e er\11 ,
vice reques t. System hea l t h mon
pon e ( i n i tiate the recove ry eq uen c ) no lat er
where the system m ust produc e a re
e t.
than som e dead li ne follo win g the requ

PREEMPTIVE FIXE D-PR I O R ITY POLICY


Given t hat the R M priority a ign m e n t pol icy i opt i ma l , a hown i n t he prcviou
chapter, we now wan t to determ i ne whether a propo ed et of ervi c i fca ible.
By feasi b le, we mean that the proposed set of e rvices c a n be scheduled given
fixed and known amount of CPU reso u r e. O n e such te t i

L i u and Layland proposed thi

the R M LUB:

si mple feas i b i l i t y test t h ey call the RM Lea

Upper Bound ( RM L U B ) . The RM L U B is defi ned as

U
U:

L/C, I T, ) m ( 2 01
m

1=1

t il i ty of the CPU resource ach ievable

m:

T tal n u m ber of services i n t h e y t e rn h a r i n g com m o n C P U re

T,: Relea e perio d of ervic e


ma 1

I)

C;: Exec u t io n t im e of Service i

ith u t m u h
l and m u t

R M L U B wa derived , i t i pure:'

lo er i n pe t io n of h o w t h i
a

ur

epted o n fa i t h .

Rat her tha n jus t a -ept t h e o ,


a
'"'v ,..' LU B , we can i n t ad e
m i ne the p r pertt
t
servtee and th i r fea ibi li and
ener l i ze t h i i n r m t i n : a n we i n t> t' al
de rmine a
sibili
tl
ra t
pr p
d rvi e h r m , a P ' a ord in
R.'-1
11 '
o an

er thi

qu

empt1 n , d i p a t h , n d

Scanned by CamScanner

lion,

\.\

m pl u n

r, rn

th

t1min

r rek

., ,r

t r m 1 hc r i i al 1 n t Jni

t
n-;

Processing

51

later. The critical ins_t ant assumes that in the worst case, all services m ight be re
quested at the same time. If we do attempt to understand a system by diagram
ming, for what period of time m ust the system be a n alyzed? If we a n alyze the

scheduling with the RM policy for some arbitrary period, we may not observe the
release and com pletion of all services in the proposed set. So, to observe all ser
vices, we at )east need to diagram the service execution over a period equal to or
greater than the largest period in the set of services. You'll see that if we really want
to understand the use of the system, we actually must diagram the executi on over
the least common mu ltiple of all periods for the proposed set of services, or LSM

time. These concepts are best u nderstood by taking a real exam ple. Let's also see
how this real example compares to the RM L U B .

For a system, c a n all Cs fit in t h e largest T over LCM ( Least Common M ulti

ple) time? Given Services S p S2 with periods T, and T1 and C , and C2, assum e
T 2 > T 1 .> for example, T 1

2, T2

5, C ,

I , C2

1 ; a n d then if prio( S , )

>

prio ( S 2 ) ,

you can see that they can by inspect ing a timing diagram as shown in Figure 3. 1 .
C. I .
U = l / 2 + 1 / S = U. 7

-.

U = 0.7 S 2( 2 2 - l ) S 0.83

T1

FIGURE J.1

Example of two-service feasibility testing by examination.

The actual utilization of 70% is lower than the R M L U B of 8 3 . 3%, a n d t h e

system is feasible b y inspection. So, t h e RM LUB appears t o correctly predict feasi


bility for this case.

Why did L i u and Layland call the RM LUB a least upper bound? Let's i nspect
this bound a little more closely by looking at a case that i n creases u t i l ity, but
remains feasible. Perhaps we can even exceed their R M L U B.

I n this example, RM LUB is safely exceeded, given Services 5 1 , S1 with periods

T, and T2 and C , and C2; and assuming T2

C2 = 2; and then if prio( S1 )

>

T 1 , for example, T 1

2, Tl = 5, C 1

1,

prio( S2 ), note Figure 3.2.

By i n spection, this two-service case with 90% utilit}' i s feasible, yet the u t i lity

exceeds the RM LUB. So, what good is the RM L U B? The RM LUB is a pessimi tic
feasibility test that will fail some proposed ervice set that a tually work, but it will

never pass a set that doesn't work. A more formal way of de


that it is a sufficient feasibi l i t y test, but not necessary.

Scanned by CamScanner

ribi ng the RM L U B is

60

Real-Time Embedded Components and Systems

= I 2 + 2 1 5 = 0.9
= 0.9

2( 2 :

= 0.9 > 0.83

FIGURE J.2

- 1)

S:

g RM LUB .
Example of two-service case exce edin

a necessary and sufficien coditi7


What exactly is meant by a sufficient conditio . or
efined meanmg in .1
Necessary, sufficient, and necessary and suff1C1ent ave well-d
the standard defin itions
are
here
,
As defined in Wikipedia, the free online encycl opedia

of all three:

Necesury condition: To say that A is necessary for B is to say that B cannot occur
without A occurring, or that whenever (wherever, etc.) B occurs, so does A. Drink
ing water regularly is necessaryfor a human to stay alive. If A is a necessary cond
tion for B, then the logical relation between them is expressed as "If B then A or a
onfy if A or e A."

Sufftdent condition: To say that A is sufficient for B is to say precisely the converse:
that A cannot occur without B, or whenever A occurs, B occurs. That there is a fire is
sufficient for there being smoke. If A is a sufficient condition for 8, then the logical r
lation between them is expressed as "If A then B" or "A only if B" or "A B."

Necesury nd suffldent condition: To say that A is necessary and sufficient for


B is to say two things: 1 ) A is necessary for B and 2) A is sufficient for B. The logical
relationship is therefore "A if and only if B". I n general, to prove p if Q", it is equiv
alent to proving both the statements "if P, then Q and jf Q, then P:
For real-time scheduling feasibil ity tests, sufficient t herefore means that passing
the test guarantees that the proposed service set w i l l not m iss deadlines; howevet.

fai ling a sufficient feasibility test does not i m pl y that the proposed service set will
m iss deadl ines. An N&S ( Necessar y and S u fficien t ) feasibilit y test is exact-if a ser

vice set passes the N&S feasibility test i t w i l l not miss deadline s and if it fails to pass
)
.
the N&S feasibil ity test, i L is guaran teed to m iss deadli nes. T herefore, an N &S fea ,.
bility test is more exact compa red to a su ffi cient feasibil ity test. The sufficient test is
conser vative and, i n some scneari os, do.w nri gh t pess
i m istic. The RM L U B is useful.
owever , i n that it provide s a simple way to p rove
t h a t a p roposed serv ice set is
.
to
sible. The suffic i ent R M L U B does not prove
that a service set is i n fe asib le- di
t h is, you m st apply a n N&S feasib ility test.
Presen tly, t h e R M L U B is t he l east
. .
1n '
plex feas1b 1l ity test to apply . The R M L U B
i s O( n ) order n comp lexity- requ
e execu tion
that

summ a t 1011 o f serv1C


t i mes and com pa r ison to a simp le ex pressio n

fea

0.:

Scanned by CamScanner


l 1 1t l .. t 0 1 11

1f t ft,... ttq ft1h-:, t . )t "" "' '#\ '< " in ho ptoi '"w'! '-( 1 , 1 : p tr1K1fl, t hf 11nh
a Uu\o f t !'-.,'\ , rJ '\ ththtY f{'i' ' tyt r t( t( ' t ; -. >J y .ir ! ,,.: ...... f tt; U l f Hli: 1 tr.th ! U! -ott
l " lJWt f,.,.t f h\: t ht m1mt.-r. "t ''T'vt: ' tn I t -w-:.,: to : fi r . -., ." flf... 'r! if.of' r.l 1Ht >'t
t"t J

lf'O Ht

4' 1 ;"\ qtl"t f I U

ft,. f t J \: Jr, !1 t l d :.Prt t f tht dH; ":t J t t J f !t',t c 1 ' !! ' ! ' f t-
\'l tfh lrl1''1l i tf"',_ f' dt:fJ n t flfJ._:J:. f t , l) 1: rul f-4 \,(I l o t,_.-. (h( Jf fl';'O\('\'
\.(' f'I. ). ( Cf
\n t h .Ht ". _4t f\"'\t > ' * '"" '"' r . ' 1 1 r ' i; : ;._ , . 1 !. 1 :..1 t w m . hs . . : l fl(' t :w dtpt ' -' ,aL1
J 1 ri 1, t lh t\ \t l l ! \ 1 ' ti!} 1p f hli I n ' "'1 - ; ) . -.. r: l t o r:! f U '(-f. 1 1 : 1 ;"t: :, ) .. n "'
!( a u hd i r ' t c , t l ;l r .. d f t ill'' l o t h . ' 1 : . t h . , , t r m t n-. 1 ! . r: .: rr .. l t .-n e '-!''t l c, (
i
l tl

, "'. hI ) r ,,.. , I

l. .\ .

: "".. '-.r 1\ '- t.


le

'

' , ,

"

' ' '\ . , .. ... .. , 1 ., .. .... .

'

.. .

1 ,

: L \"

' ' ""


.
. .. I'

.. .
.
.,,

,.1

.f

..

..

,
> 1

>

,- .;

... t :...
>

...

, 1 1 1 \ :. '1

i.

; .,..

r ,. .., : 1 ' !

..,

Jt

f 'f J ' ' I


'

.. . .. 1
"

t
-;:r , t .. ! \.... l t h
.- :1t )..1t' n

)'' - -:,:! ! f,
.,

. . .. .. .., . J t 4 --. , " i: ,.I.._


I , ,.

..
' 41:.". ..,, , .. " . \1 '

c... ...

. ..

' . ..

"'

" :

'f-. ,1

' t '

f.' ',f l f " i'

' 4

... t

...

i ;i i .e:....t '" l : i ' ?'- .- t

" '"\ '

r r f J'-r r

t : :-' ! r tn r "C ;.r 1 n .:

.
1o I ;
,

, , . : i J :, , ,.,

\r

l '"
.

, . I1 .'
I

:. ,

1 1 \ :

.f

1 J1 t '

i.

i . ! ' . . . ..

, ' ..

,k l ,

' ..

I.

. ... :

. ,

1 ,, f

lf ' t 1 ' f

' "l' .. . r. ; 1 .: , r t ., .. . . . -;
' f t "- ,. ... .

'\..

, , . ;, ; 1

r t \ , \ .- t t, i, ; t

l t ': 4 ; ,

....

'"

'

'

,, !

i "'

I , , :-- '(,;_

: ; ;

:.

: ::

\'

'<'r\ l' (

.,
.... 4.
, ,,
\, .4"

...
.
. \ !". ' \ \
..
', . . .? I. ,
'..... ,.. ..,.... ""

l "l . l"' t ' r" . '" :

'

ct!U !?:-?.

t,..:'"; :.._ t:

:... , 'Y T' :

f \ f

. . .. . .' ' . ..
ti ;

'... ! ; '

'\

,
-

('
'

\ :;

: ;

t : ..

>

'

,.. '4

'

'

"' t

.. I : .

, , , : J

r f

' <Cf' t

!f :

_. :

.: rif'

: (" : :

(' '!

...

... ,. .... . ..

'

: ;\ l i

,. ... '
\i

... .. .. .....

fti H '{\ .;

j . ! ' ....! ('

' '-"" ' \


'' "
'

... ..
... .
j,

! f

.{

.a

,. . , f

.. ... '

\, J , t

.! i .\i ''f H .t : I )

CJ ( t t" .i ! : : : 4 ! \.

"""" J .. . .. .,,

.,

r#

FE A S I B I L I T Y

..

..

: '

,. ! J f

1 1: .c'

._

: ,

' ' I J ...

t "' ,,

,,-J '-..,

:" J

._
.-.

' j

"

.\t !

.. ! ' :

u ! n t , , ; J '

..

do I.

2'

o I

f'

.,"

t)

f t,"

,"' ... 4

h t t fl ' , H.;d. 1 h; : \

t \t

tr

" '\ '1"'f

it t"t('

t\ U t\ f'""

! (" ! : :

'-;i l l id, fi l

:--; n nuf\ Jn.! "lt l l l 1 . .i1 1 1t i !\' .'\ 'l

\l)f fi\ ll'l\l k J \ t l i i l 1 t \ f , \ h h t! I ,tl \\ .1\ l l .J tl J ,,' f'l d 1. '4<' t ( Ul I \ 1111( f t"Jl t l l t l\' '-l fl.'.
\ 11. r \\.'t
( I c. , t h .tt \. .I l l n u n dr.1dl 1 11n l l f , 1\\ \' \ \ I . .1 'lll r ! 1 .. h 1 1 l f l"\\ h ll l .tl tJ1l J \t'f
t h.H a re: ,I I r 1 I l l <' "' ' k P , l J , 1 1 1 1 1 .t l l \ " " d i \ u l t i" 1n t h." ,t \lb1l 1 ! ,. I ,., , .. ..H l' nut pH.'t 1 \.\.',
Th.: "llll tlu\'.tt t h : -. r ' .a r c:. " ' 1 1 h4.' C \ t l l \ c t i\\. J 11 ' < th(\' h 1ll 1 1 n a p.t ' ' .111 u r h .& k w t ot \tr
i, il:t.'). N -; h."h .u p r l'l l,c:.' . \n ' k.t,1b i l 1t \ lnt '"ll 1wt J'J,, .l \t'f\'iu. \C.'t t llJt l \
.

.u 1 d l tl'\\ \'4.' ' ' 1 l l 1t t 1( l.1 11 .un tt,l t h .u I\ u k l ht: Kt l l ' I\ " .i uhiu\.'nt t 1.' \t
Jnd r h rd\ H t" "''k ht1t 1t \\ Ill t.111 '<'f\ h . t' '<.'h t h .\l J ... t tt.llh 'an be \jfdy ht:dulcd.
u m.1 k

Scanned by CamScanner

12

' <
.

polin:
charac. .
become .
sufficient
feasibi} f,.
pass ,
sufficient "
:.

.....

.
.
Point and Completion tests for the RM
ling
u
d
h
e
.
.
By com par iso n, the Sc .
.
h
mg
t
e
1mprec1se,
'
b ut sa1e
show
es
mpl
Exa
are N&S and therefo re pre use.
amined in the foll owin g sect ion : I t will also
ex
re
a
LUB
M
R
f
the
o
1c
t
. .. ten s
c 1 th e
ds can east1 'f 1a1
.
. h re1atively harmonic per10
clear that servtee sets wit
the tn ore precise N&S
e safe when analyzedb
to
wn
sho
.
be
B
nd
LU
RM
sets as safie th at may not

red ' c t such harmo nic service


ity tests will correct1
the safe service set. The
tes will preci sely identify
the RM LUB. The N
f the N &S safe subs et as depi cted in Figure 3.3.
tests are yet another subset o
.
.

1 ,

..

systems
Rea lTime Embedded Components and

. 71 --r
.

. ..
,

Safe Service Set


Passing
Sufficient Test

FIGURE 3.3

Relationship between sufficient and


N&S feasibility tests.'

RATE MONOTO NIC LEAST UPPER BOUND (RM LUB}

l
:

, .

Taking the same two-serv ice exampfe shown earlier i n Figure 3.2, we hav the
following set of proposed services. Given Services S1, S2 with periods T i and T2
execution times C1 and C2, assum e that the servic es are release d wit h T 1
2, T2
execute deterministically with C1
l , C2
2, and are scheduled by the RM .policf
so that prio(S 1 ) > prio(S2) . I f this propose d system can be shown to be feas ibl
that it can be scheduled with the RM polic y over the LCM (least com on
m
m
pie) p riod derived fr m all prop osed serv
,
c
Leho
ice peri ods, .then the

.
and Dmg theorem [ Bnan d99) guar antee s
it real - time- safe. The . theorem is Will
l
upon the fact that given the periodi c relea
ses of each s ervi ce, the LCM schedu e . .
si mply repeat over and over as sho wn in Fig
-:.i:
ure 3.4.

Scanned by CamScanner

Processing

IJ

C.I.
s,
S2

FIGURE J.4

C:J

C3

Two-service examples used to derive RM LUB.

Note t h at there can be up to

I l

releases of 51 during T2 as indicated by the

# I , #2, and #3 execut ion traces for 51 in Figure 3.4. Furthermore, note that in t h i s
particular scenario, the util ization U is 90%.

The CI ( Crit ical I nsta n t ) is a worst-case assu mption that t h e demands upon
the system m ight include simul taneous requests for service by a l l services in t h e
system! This elimi nates the complexity o f assuming some sort o f known relation

ship or phasing between service requests and makes the RM LU B a more general
result.
G iven t h is motivating two-service example, we can now devise a strategy to

derive the R M LUB for any given set of services S for which each service S ; has an
arbitrary C,, T;. Taking this example, we exam i n e two cases:

Case l : C 1 short enough to fit all three releases in T2 ( fits 52 crit ical t im e zone)

Case 2: C 1 too large to fit last release in T2 ( doesn't fit 52 crit ical time zone )

Exa m i n e U i n both cases to find common U upper bound. The critical t i me

zone i s depicted i n Figure 3.5.

The 52 c r i tical t i me zone is best u nderstood by conside r i ng the c o n d i t i o n


where 51 releases occ u r t h e maxi m u m n u mber o f t i mes a n d for a duration t h a t
uses a l l possible t i m e d u r i n g T2 w i t h o u t act ually causing 5 2 to m i ss i t s deadline. So,

Case I where 5 1 total resource required j ust fits t h e 52 crit ical t i m e zone (T 2 - C2 ) is
shown in Figure 3.5.

I n Case I , all th ree 5 1 releases req u i r i ng C, exec utio n t i me fit i n T 2 as shown i n

Figure 3.5. This is ex pressed by

(3. l )

Scanned by CamScanner

Real-Time Embedded Components and Systems

T.LT /T.J
'-1 fT / T. l
Eq- 2 : Cc = T1 - r

Eq- 1 : c,

Eq-3 : U

T:

c,
T,

c,

= -+-

FIGURE J.S

( I . e. (,

T:

1s

,
small enough 10 fit into fractional 3rd T, shown as ..._.

( I.e. C:

C.I

T: . lnrerf<'renc<' from c I rclcaKs)

r.m -

Example of critical time zone.

L T2

I 11 is the m i n i m u m n u m be r o f times that T:


Note that the expression
I 11 i s the fraction al am ou nt of
- 11
occurs fully during T2 The expressi on
time that the third occurre nce o f T 1 overlap s with Ti. Equatio n 3.2,

L T2

Tz

(3.2)
simply expresses the length of C2 to be long enough to use a l l time not used by S1note that when the third release of 51 ju st fits i n t he crit ical t i m e zone shown in red,

then 5 1 occurs exactly

IT2 I 11 l

t imes d u r i n g T 2 From Figure 3.5 i t should be clear

that if C 1 was increased and C2 held constant, t h e sched ule would not be feasible;
likewise if C2 was increased and C 1 held constant, they j ust fit ! I t i s also interesting
to note that looking over LCM ti me, we do not have ful l u t i lity, but rather 90%.

Equation 3.3,

(3.3)

?etines t e uti lity for the two services. At t h i s poi n t , let's plug the expression for C:
Ill

Equ atio n 3 2 i nto Equ atio n 3 . 3 , whi


ch crea tes Equ atio n 3.4:
.

9_ + [ T2 - C , f T2 / l) l]

11

( J.f

T2

No w, sim pli fy by th e fol


low ing alg eb ra ic ste ps
:
I
T2
1 11
u
( pu l l o u t T2 ter m )
+

+ [ - c, r T1 1J
T:>
S + 1 + [ - c , I T2 I T, l] ( no te tha
t Ti ter m is
r,
TJ

-t

Scanned by CamScanner

T7

I)

Processing

l + C1 [( 1 11i

f T, I T,

Ti

l]

15

( combine C 1 terms)

This gives you Equation 3.5:

(3.5 )
What i s interesting about Equation 3.5 i s that U monoto11ically decreases with
increasing C, when ( T2 > T 1 ) . Recall that T2 must be greater than T1 given the RM

[( 1 1T, fT:;1Ti l]

priority po l icy assumed here. The term


,_
is always less than ze r o
_
because Ti IS greater than T , . This may not oe immediately obvious, so let's analyze
the characteristics of this expression a little closer. Say that we fix T 1 = 1 and be
cause T2 m ust be greater than T 1 , we let it be any value from 1 + to
when T2

oo.

Note that

l , this is a degenerate case where the periods are equal-you'll find out

later why this is someth ing we never allow. If we plot the expression f T1 i r, l , you see
=

that i t is a periodic fu nction that oscillates between 1 and 2 as we Tncrease T 2 ,

equaling 1 anytime T2 is a multiple of T , . By comparison, the term


stant and equal to 1 in this specific example ( we could set

( I / lj )

is con

T 1 to any constant

value-try this and verify that the condition still holds true ) . Figure 3.6 shows the
periodic relationship between T2 and T1 that guarantees that U monotonically de
creases with increasing C 1 when ( T 2

,,

1 8
1 7

T 1 ) for Case 1 .

Rll tlo of Mu Im um Occurrence a of Smaller Period over Larger Period

FIGUIE J.1 Case

Scanned by CamScanner

>

1 relationship of T2 and T i.

61

onents an d Systems
Real-Time E mb edd ed Comp

the scena rio wher e C 1 j ust fits into the critical


So, now that you u nder stand
2.
time zon e let 's loo k at Case
that is, C pills over T2
does not fit i n to
In Ca e 2, the last release of 5 1
. s s p11Jo ver cond 1tt0 n
of th e l ast
n in Figu re 3.6. T h
boun dary on last release as show
:
is exp ress ed sim ply as Equ atio n 3.6
release of 51 dur ing

T.2,

T2

(3 .6}

, b t w i t t h e i n equ ality ippd


Equ atio n 3 . 6 is the sam e as Equ atio n 3. 1
.
cnt1 cal t i m e zone show n m F igu re
whe n 5 1 ' s C 1 is j ust a little too large to fit i n the
3.7 .

C. I

C,

C2

T2 - T1lT2 / T1 j
riLT2 I T, J - ciLT2 n;J

c,

c,

Di I I %Ml 121 ''fi

= - + -

T1

T2

FIGURE J.7

Case 2 overrun of critical time zone by S , .

Even though

S1 overruns the critical time zone, some t i me remains for 52 and

we could find a value of C2 for 52 that still allows it to meet its deadl ine of T2 To
compute this smaller C2, we fi rst note that 51 release

# 1 plus #2 i n Figure 3.6 along

with some fraction of #3 leave some amount of time left over for 52 H owever, if wt

T d u ri n g T leaving out the third re


lease of 51 d uring T2, then we see that this time is the s u m of all full occurrences of
T 1 d u ring T2, which can be expressed as 11 L T1 I 11 J . F urthermore, t he amount of
time that S 1 takes during this 11 L T1 I 11 J d uration is exactly C 1 L T2 I 11 J . From
simply look at the first two occurrences of

2,

these observations we derive Equation 3 . 7:


c2

-:-

11 L T1 I T1 J

cI

L T1 I 11 J .

(3. 7)

Sbsti t ut i n g Equa tion 3.7 i n to the utility Equa


tion 3 . 3 again as befo re, we get

Equa t10n 3.8:

u=

Scanned by CamScanner

9- + [11 LT2 / 11 j - C 1 LT2 / T1 J]

T,

Ti

( 3.8) .

17

Processing

Now sim plify i n g by t h e foll ow in


g a l ge b ra i c step :

c,
[ -Ci L T I T, J ]
I T2 )LT2 I T1 J + +
( e pa ra t i n g term )
1i
T:!
( T, I Ti ) LT: I 1i J + C 1 [ ( I I T, ) - ( I I T1 ) L T I T, J J ( p u I I i n g
out

u- (Ti1
U

C , term ) .

o m mon

Equat ion 3 .9:

T h i give u

( T, I T2 ) LT I T, J + c I [ ( I I 1j ) - ( I I T2 ) L T2 I 11 J] .

( } . 9}

W h a t is i n terest i n g about Equation . i that


mn11oto11ically in ren e . ith
increasing C1 when (T2 >
) . Recall aga i n that T2 m u t b ' grea ter t han T 1 given the

T1

RM p r i o r i t y pol icy a u rned here. The term ( 1

( l I 11 )

because

I T2 ) L T1 I Ti J

T2 is g rea re r t h a n T 1 As be fo re , thi

m a i l e r t h .in

alwa

ma y not be i m mediate! obvi

ous, so let's a nalyze the characteri t ic of t h i ex pres ion a l i t t l e loser-on e c1ga i n ,


we fix

T1

to

Now if we pl ot t h i

oo.

I a n d becau e T mu t be greater t h an T 1 , w e l e t i t b e a n

aga i n ,

cases and t he refore a lso l e

t han

y o u . ee

( I I T, )

that

( I I T1 )l T I T, J

v a l ue fr m I

i !cs t h a n I i n all

as can be een i n F igu re . 8 .

Ratio ol Minimum Occurrences of a Smaller Period over Larger Period

--

0 95

09

( L

0 85

T l

I T Ti !

oe

0 75

I
I
07

0 6 '
06 t
0 55 '
i
05

I
._ ------- ---------

r:

- -- --- -

T
FIGURE J.I

0,

n w yo u

Scanned by CamScanner

u n d r<it nd t h

ey c n ep i t h a t in

t i m e zon . Th
u rren e o

Case 2 relatio nship of T 2 and T,

u ri n ' T ..

n d, i n

, '>t'

c:n ri

w h e re

Je

I , we ha

2,

j u t ov r r un the

e he m

haH t h

i m u rn n u mhc-r

minimum

riti 'l

II

Real-Time Embedded Compone nts and Systems

ty func tions for both Case 1 and Case 2:


I f we now exa min e t he utili

oo,

I + C1

[(t

;,

]<

T 11
/ li ) - I 2 1 3 .5 )

( 11 / Ti ) L Ti 1 11 J + c , r ( 1 / 11 ) - ( , / Ti ) L Ti / T, Jl < 3 .9 >

func tions on the same grap h set ting T 1


Let's pI ot the two util ity
and C 1 = 0 to

T,.

l , T2

::::

+ to

ease-1 Utility

0. 9 - 1

0.8-0 . 9

0 0 7 -0 .8

0 .9

0.6-0. 7

0 .8
0 .7

0. 5-0 .6

0 .6
0.5
0 .4
0. 3
0 .2
0.1

0. 4-0 . 5

0 0 . 3 -0. 4
0 0 . 2-0 . 3

0 . 1 -0 .2

0-0.1

FIGURE 3.9

C1

Case 1

utility with

varying T 2 and C 1

0, this is not particularly i nteres t i n g because this means that is the only service that requires CPU resource-likewise, when C 1 = T 2, then this i
also not so interesting because it m eans that 5 1 uses all t h e CPU resou rce an d never

When

allows 52 to run. Looking a t the utility plot fo r Equation 3 . 5 i n Figu re 3 .9 and


Equation 3 . 9 in Figure 3. 1 0, yo u can clearly see the periodicit y of utility wherr
maxim um utility is achieved when T 1 and T2 are harmo n ic.

What we reaJly want to know is where the utility is equal fo r both cases so t at
we can determine utility indepen den t of whether C 1 exceeds or is less tha n the cot-

_
icaJ time zone. This is most easily d eterm i ned by subtra cti ng Fig u re 3.9 a nd Figurt
ho\\
3 . 1 0 data to fi nd where the two fu nctio n differ enc e a
s re zero. Figu re 3. 1 1 s

that the two functio n differen c es are zero o n a diagonal when TJ is va ried fro m
times to 2 t i mes T 1 and C 1 is varied from ze r o t o
T 1

Scanned by CamScanner

Processing

69

C.se-2 Util ity

0.9-1
1

0 .8-0. 9

0.9

c 0. 7-0 .8

0.8

0 .6-0. 7

0 .7
0.6
05
0.4
0.3
0.2
0.1

0. 5-0 .6

0 4-0 5
o 0 . 3 -0. 4
0 0.2-0.3
0. 1 -0.2
0-0.1

C- 1

FIGURE 3. 1 0

Case 2 utility with va ryi n g T2 and C 1

Intersection of Case- 1 and Case-2 Utility - Cue-2 Utility)

0.9
0.8
0.7
c 0.4-0.6

0.6
0.5

0.2-0.4

Service-IC

0-0. 2
-0. 2-0

0.4
0.3

o -0. 4-0. 2
o -O. S.-0. 4
-0. 0.6

0. 2

1 --0. 8

0.1

1.1

1 .2

1 3

1 .4

1. 5

1 6

1. 7

1 8

1.9

Period-2

FIGURE 3.1 1

I ntersection of Case 1 and Case 2 utility curves.

Finally, if we then plot the diagon al of either utility 'curve (from Equati on 3.5
or Equati on 3.9) as shown in Figure 3. 1 2, we see that iden tical curves that
clearly
have a minimum near 83% utility.

Scanned by CamScanner

-,u

... . .
Real- Time tmoe aaea \..o mpu 11111:;rn

--

Equel utility curve

09
Utility 0.8 .

.,

07

J!=::::: ::::::: ,>=*


_ :.;;:; ;-:;f '.
1.1

1. 2

1 .3

1 .4

Period-2

FIGURE J. 1 2

1 .5

-'
1 .6
1 .7

1 8

1 .9

' 19'

'

:.' , 2

c:Ase

Two-service utility minimum for both cases.

Recall that Liu and Layland claim the least upper bound for safe utility given
an y arbitrary set of services ( any rel a t i o n between periods a n d any relat ion

between. critical time zones) is defined as: U


I

services, m(2 m

I ) = 0 .83 !

= I, (C; I T, ) m(2
m

t=I

1)

For

two

this

We have now empirically determ ined t h a t there i s a m i n i m u m safe bound on

util ity for any given set of services, b u t i n doing so, we c a n a l so c learly see that

bound can be exceeded safely fo r spec ific T 1 , T2, a n d C 1 relation s .

For completeness, let's now fin i s h t h e two service RM L U B proof mathemati


cal ly. We'll argue that the two cases a re valid o n l y w h e n t h ey i ntersect, and given

the two sets of eq uations this can only occ u r when C 1 i s equal for both cases:

c
c
U

= T1

I
2

T2

J
L
c I T2 I T, l

11 T2 I T1
I

S_ + C 2
T1

T2

Now , pl ug C , and C2 s i m u l ta
n eous l y i n to the u t i l i t y equ a t i o n to get E q uati
3 . 1 0:

Scanned by CamScanner

Processing

u=

T2 - T1 L T2 I 11 J T2 - c i I T2 I 11

u=

T2 - T1 LT2 I T1 J + Ti - T1 I T1 I T1 l + Ti L T2 I Ti J I T2 I Ti l

u=

(Ti 1 T, ) - L T2 1 Ti J + 1 - I T1 1 T1 l + ( Ti 1 T1 ) L T1 1 Ti J I Ti 1 Ti l

Ti

T.,

T1

u =1

u= I

u= ,

71

T1

- r T2 1 Ti l + ( Ti 1 T2 ) L T2 / T, Jr Ti 1 Ti 1 + ( Ti 1 Ti ) - L Ti 1 Ti J

- ( Ti I T2 ) ( ( T2 I Ti ) I T1 I Ti 1 - L T2 I Ti J I Ti I Ti 1 - ( T2 I Ti )
- ( T1 1 T2 ) [ r T2 1 11 1 - ( T2 / T, ) J [ ( T2 / T, ) - L T2 / T, J J

(Ti I T1 ) LT I T1 J)

f
(f((T2l -Ti!)) )

( 3. l o i

Now, let whole i n teger nu mber of i n terferences of S 1 to 52 over T 2 be


From this,
I = T2 I Ti J and the fractional interference be
= ( T I T1 ) - L T2 I T1
2

w e can derive a sim ple expression for util ity:

U-1-

J.

( 3. 1 I )

T h e derivation for Equation 3 . 1 l is based upo n substi t ution o f I and f i n to

Equati on 3 . 1 0 as follow s:

u = , - ( T1 / T2

U = t - ( T1 I T2
( N. d )

)[

) [ r T2 / T, 1 - ( Ti / r, ) J [ ( T2 / Ti ) - L T2 / r, J J

L T2 I Ti J - (T2 I T1 ) ][ ( T2 I T1 ) - L T2 I Ti J]

tloor ( N.d)

I - ( 11 I Ti ) l - ( T1 I Ti

U = l - ( T1 I T2 ) ( I

J)
u -- 1 - f ( t ( T2 / T1 )

Scanned by CamScanner

- f )(J)

based on c e i l i n g

) - L T2 I T1 J) J [ ( T2 I 11 ) - L T1 I Ti J J

72

em s
on ents an d Syst
Re al- Tim e Em bedded Co mp

By

inator term to Eq uati on 3. J t ,


e sa me denom
h
t
g

n
cu
tra
b
su
d
add ing an

we

can get:

)
J ( l - /)
(
= I - L T1 I 11 J + ( Ti I 11 ) L Ti I 11 J
u

U = l - ( / (I 1+- /) )
( /)

LUB fo r U occu rs w h e n I i s m i n i mized,

.
.
IS l , an d the
The sm aJl est I po ssi ble
t:
we sub stit ute fo I to ge

I r

i
(
!
U = l - (l ! )
( l + /)

'

.
.
Now takm g t h e denva t1ve o f

so

U w r t f, a n d solv i n g for t h e extrem e, we get:


( 1 + J)( I - 2 / ) - (J - /2 ) ( 1 ) = O
()U / df =
( I + /) 2

Solving fo r f, we get:

And, plugging/back i n to

The RM

U, we get:
u

2 2 ' 12

LUB of m ( 2 - J ) i s 2 ( 2 ' 1 2 - I )
I

rn

- 1)
for m = 2 , w h i c h is true fo r the two

service case-Q.E.D .
Having derived the RM LUB b y i n spec t i o n a n d by m a t hemati a l man ipula
c
tion, we learned that the pess i m i s m of the
RM LUB t h a t leads to low util ity fo r
real-t ime safety i s based upon a bo u n d
that work s for a l l pos s i b l e com b i na tio ns of
T1 and T2. Specifica lly, the RM LUB
is pess i mist ic
c a ses w here T 1 a n d T are
harm onic -in these case s,
you can safe ly ach i eve 1 00010 u t i l i ty! I n m a ny cases, a
shown by dem ons trat ion i n
Figu re 3 5 a n d t h e Leh ocz ky, S h a h , D i n g theore m ,
can also safe ly use a
at leve l s b lo w 1 00010 but a bov t
e h e RM L U B. The
still has valu e bec aus e
tli
i t ' s a s i m ple and q u ic k fea i b i l
i ty c hec k t h at is s u
.
cie nt. Go mg t h rou g h the
_
der i vat io n , it h o u l d n
o w be evi den t tha t i n ma n y Cd
safeJy usi ng I OOo/o o f the
resou rc e i s not po ible w
it h
tive sch ed ulin g and thf R M
pol i c y. I n the nex t
Sh a h , D i n g
e t io n , the Leh o

fo r

L_UB

CP U

CP U

Scanned by CamScanner

>'0
R

fi xed- p rio rity pr eenr


zky)

Processing

7J

theorem is pres ente d a n d prov1'


d es a necessary and sufficien t feas1b1ht y test for RM
.
po hey.

N E C ESS AR Y A N D S U F F I C I EN T (N& S) FEA SIB ILITY

? algor i t h m s fo r deter m i nation of N & S feasib ility testin g with

Tw

easily employed:

RM polic y are

Sched u l i ng Point Test


Completion Time Test
To always achieve I 00% util ity for any given service set, you m ust use a more

compl icated policy with dynamic priorities. You can achieve l 00% u t i lity for

fixed - priority preem ptive services, b u t only if their relative periods a re harmon ic.
Note also t h a t RM theory does not account for I/O latency. The i m p l icit assump

tion is that 1 /0 latency is ei ther i nsign ificant or known deterministic values that

can be considered separately from execution and interference time. In the upcom
ing "Dead l i n e - M onotonic Pol icy" section, yo u ' l l see that it's straightforward to
slightly modify R M pol icy and feasibil ity tests to account for a dead l i n e shorter
than the release period, therefore allowing for addi t ional output latency. I np u t
latency c a n b e s i m i larl y dealt w i t h , i f i t ' s a constant l atency, b y consideri ng the
effective release of the service to occur when the associated interrupt is asserted
rather than t h e real -world even t. Because the actual t i m e from effective release to
effective release is no d i fferent than from event to event, the effective period of the
service is unchanged due to input latency. As long as the additional latency shown
in Figure 3 . 1 3 is acceptable and does not destabilize the service, it can be ignored.
H owever, if the in put latency varies, this causes period j itter. I n cases where period
jitter exists, you i mply assume the worst-case frequency or shortest period possible.

Schedulin1 Point Test

Recall that by the Lehoczky, Shah, Ding theore m , i f a set of services can be hown
to meet all deadli nes form the crit ical instant up to the longe t deadl ine of all task

i n the et , then the et i fea ible. Recall the critical i nstant ass u m ption from Liu
and L a y! J nd paper. which states that in the wor t case, all ervice m i ght be re

q u e ted a t t h e ame point in ti me. Ba ed upon thi common ct of a u m ption ,


Lehoc z ky, h a h , a n d ! J i n g i n t ro d uced a n it erative te t for t h i t h l'orl'.m cillt:<l tht
'

ched u l i ng Po i n t Test :

Scanned by CamScanner

14

a
Real -Tim e E m bed ded Com ponents n d

\f ; , I ;

( k ,/) e R
Ri

{(

k.

systems

. l; r (l ) T1c l

n . m tn

1= 1

Cj

I)\I S k S i.I

.
TJ

I,.

( I ) T1c

. -l j}

ere 5 1 has higher prio


rity
of task s i n the set S, to S,,. wh
Wh ere n is the num ber
prio rity tha n S n > I .
tha n S 2, and S " has higher
t between 5 1 a n d 511
j identifi e s Sj, a serv ice in the se
l per iods m u st be a nalyz ed.
k ide ntifi e s S\, a serv ice whose
ods of S k to be a n a lyze d.
I represents the num ber of peri
utes w i t h i n I peri ods of s,.
represents the num ber of time s SJ exec

[(1 1

f T2 I Ti l is the time requir ed by S1 to exec u t e w i t h i n l perio ds o f Sk-if the sum

l period s o f Sk, then the se rvice


of these times for the set of tasks is smalle r t h a n
set is feasible.

Service Time = Time com pletion - Ti me Release


(From Release to Completion)

_
_
_
/
r

Release

Deadline

Response Time = TimeActuation - Timesensed


(From Event to Response)

Event
Sensed Interrupt

FIG UR E l. 1 1

code

_
_

Dispatch

Preemption

Dispatch

Completion

(10 Queued)

Actuation

(10 Completion)

serv 1Ce

effective rle ase and dea dline.

is. m
. c l uded w i t h tes t c ode o n t h e CD- ROM for thl
Sched u l ing Poin t Test N o t
t h at th e a l go n th m a u m e arrays are orted acco rd
,1
ing to the RM pol icy w h er e p e r
ou
1o d [ O ] 1. t I h 1' ghe
t pnonty a nd s h ort e t p e n
Th e C

a l gor ith m

1c

Scanned by CamScanner

Processing

75

Cepletlo Tle Test


The Completion Time Test is presented as a n alternative to the Scheduling Poi n t
Test [ Briand99 I :

i s t h e n umber of executions of s, a t time t.

r J l
r J lei
a.

is the dem and of sj in time at t .

(t ) is the total cumulative demand from the n tasks up to time

Passing this test requires proving that

a .

t.

(t ) is less than or equal to the dead

line for Sn, which proves that Sn is feasible. P rov ing this same property for all
S from 51 to S,. proves that the service set is feasible.

The C code algorithm for the completion time test can be found on the CD

ROM and assumes arrays are sorted according to the RM policy where per iod [ o I is
the highest priority and shortest period .

DEADLIN E-MONOTON IC POLICY


Deadline- monotonic ( D M ) policy is very similar to RM except t hat highest prior

ity is assigned to the service with the shortest deadl ine. The DM policy is a fixed

priority policy (be sure not to confuse this with EDF where priority is assigned

dynamically with highest priority assigned to the active service with the nearest or
earliest deadline at any given point in time ) . The D M policy el imi nates the original
RM asu mption that service period m u st equal service deadline and allows RM

theory to be applied for scenarios even when deadline is less than period. This is

useful for dealing with significan t output latency. The D M policy can be shown to
be an optimal fixed-priority assignment policy like RM policy because D , and T,
differ only by a constant value, and D; S T;. The D M policy feasibility tests are most
easily i mplemented as iterative tests like Scheduling Point and the Completio n

Time Test for RM policy. This policy and associateJ feasibility tests were fir t i n
troduced b y Alan Burns and Neil A udsley 1 A udsley93 ] . T h e s u ffi c ient feasibil ity
test they first introduced is simple and intuitive:

Vi : I S i S 11 : C,
D,

Scanned by CamScanner

S I .0
D,

3. 1 2 )

e c1or se rvi ce ,' and I '. is the i nterference time servi ce ,


t'm
1

C; ts t h e execution
.
.
.
1 e D . tim e peri' od s i nce the t i me o f request for service .
experiences over its d eadl'n
c'
a I I s ervic es from l to n , i f the deadli n e i n t erv a l i s
Equatton

3 1 2 states that aor


.
t ime interval plus all m t e rfe n n g exe
l o n g enough to con t at n the serv1Ce ex ecution
.
.
re
a
services
l
l
a
If
feasibl
feasible.
e,
t
is
e
h
e
servic
n the
c u t io n t i me intervals , then t he
e).
system .1s c
1eas1' ble ( real -tim e saf
.
.
.
a
l
l
h
1gh er pn o n t y servi c es s
.
by
t
i
o
n
p
pr
e
e
m
to
e
u
d
S
1s
e
ic
I n t erf ere n ce t0 Se r v
1
.
Ince t i m e ts t h e nu mber of releases of sj over the <leadfere
nter
i
l
tota
e
h
t
s
d
an
.
1
,.
to
.
.
m
u
l
t
1
p1 ied b y exe c ut io
then
is
terferences
f
S
.
, m
o
b
er
num
The
n
t e- r v aI o i
Im
. e m

i
y
t
h
pnont
an
igher
h
S
has
ys
a
w
l
a

that
te
o
N
S
I
I
.
for a

time C, and sum med

(3. 1 3 )

,o j
l

o 1s
t h e worst-case nu m b er of re l eases of S1 ove r the dead l i n e i n terval
.
.

_!..

T,

for S,.

Because t h e m tenerence is th e worst -case n u mber of releases, i n ter ferenc e 1s ov erc

cor - th e last i n terference may b e only partial. So, t here will be r, full
accou n t ed 11
.
c
te ncerence.
1 11 ter1erences
an d some parti' a l i n te r ference from t h e last a d d 1 t 1 o n a I m
.
So, we can b e tter acco u n t for t h e p a r t i a l i nterference with

lo.-o }'
1.

is the fu l l i n terfere nce t i m e ,

en c e. and J if t h er e i s, and Min 1 C,

o.

'T,

1? 1-rl oro' J + i J!I


I

r, '

'

0 i f n o p a rt i a l i n ter fer -

is th e parti al i n terfe renc e t i m e i f it ex i s t s .

Hm.-.1ever, e\ en Eq uatio n 3. 1 4 d o es n o t exac tly


acco u n t fo r p a r t i a l i n t erference.
So, both E q u a t io n s 3 . 1 3 a n d 3 . 1 4 c o m b i n e d w i t
h E q u a t i o n 3 . 1 2 are o n l y s u fficient
fe asi b i l i t y t ests.
.

I J

A sl igh t l y d i ffe rent app roac h to d ea


l i n g w i t h T; ':t D; is to s i m ply a s s u m e that S,
ha a s h o r te r peri od than i t rea l ly doe
s u n t i l T, = D; T h i s a p p roac h is a va r i
a t i o n of
.
pe r iod t ran sfo rm t ha t wo uld affe
ct ove ral l u t i l i za t i o n , b u t a l l ow
s t h e RM pol i cy
and fea i b i l i t y app roa che s to be
app l i ed w i t h o u t m od ific at i o n ,
i n c l u d i n g t h e N &S
Co m p let i o n T i m e Tes t a nd/ or
S c h ed u l i ng Poi n t Tes t . I n ge n
e r a l , per iod tra n sfo r m
i u sed to i n c rea . e t h e freq u
e n c y of a p e r i od i c ser vic e to
r a i se i t s RM p r i o r i t y, often
by d i v i d i n g t he i m p le m e n t
at io n o f the se rvi ce i n to m u
l t i p l e p a rt s . Bec ause o ut p u t
I .H en ie t y p i a l l y a re
no t a h uge p o r t i o n o f re
p o n e t i me , pe rio d t r a n form i a
pra t i al wa y t o for ce
rea l - wo rld pro ble ms i n
t o t h e R M fra me wo rk - de r i vi ng a
0 .\ 1 N & fea i b i l i t y tes
t wou l d be a n o t h e r
o pt i o n . H ow eve r , t he N& S D M fea i h il
i t y tes t i co m pl e x , so
u n l e s t h e r e i a h ug
e o u t p u t l a t e n cy , pe riod t ra
n sf orm is t he
b('. t a pp ro ac h d ue t o
t h e la rge bo d y of we
ll - u n de r t oo d RM t h e o ry
.

Scanned by CamScanner

Processing

77

A n a l ter na t ive to RM
or DM are the dynam ic priority
.
.
polices derived from the
.
deadi me - d nv en d yn atn ic pn
onty
approa
ch first presented by Liu and Layland . I n
,
.
t he ne xt sec tio n, we II exp l ore
t
h
e
a
d
vantag
es and disadv antages of dynam ic prior. .
1t1es com pared to stat ic T od
.
ay, ,
ior h ar d real-tim e systems that must provide determm 1st1C res po nse s to service
request s, fixed-p nonty RM policy remain s the
mo st w1'd e 1 y use d and u niv ers ally
accepted the ory.
.

DY NA M IC PRI O R ITY PO L ICI E S

?1

Priority p ee ptive dynamic priority systems can be thought of as a more complex


class of pn n preemptive where priorities are adjusted by the scheduler every time
a new service is released and ready to run. The concept was first formally introduced

oy Liu and Layland [ Liu73 ] with their description of deadline-driven scheduling pol
icy. The policy Liu and Layland specified in their paper later became known as an
EDF ( Earliest Deadl ine First ) dynamic priority policy. The policy is called EDF be
cause the scheduler gives highest priority to the service that has the soonest deadline
whenever a dispatch decision is made. This also means that anytime an additional
thread is placed on the ready queue, the EDF scheduler must reevaluate all dispatch
priorities because the newly added th read may not have a deadline later than all the
existing th reads on the ready queue. Basically, the EDF scheduler must be able to in
sert the new thread i nto the queue based upon time to its deadl ine relative to the
time to deadline for all other threads-the insertion has a complexity that is of the
order

O(n),

where

is the number of threads on the queue. By comparison a

fixed priority policy scheduler can be implemented with complexity that is 0( l ) , or


constant time using prio rity queues. Liu and Layland proved that the potential util
ity for E D F is full and that deadlines can be guaranteed with EDF. This is a n incredi
ble result when compared to fixed-priority R M policy. Essentially n o margin is
required fo r real -time safety. Is this really true? If so, then why wouldn't EDF be the
only policy ever used for real-time systems? Figure

3. 1 4 shows a scenario where the

fixed-priority RM policy fails, and EDF succeeds. Furthermore, it shows that a re

lated d yn a m ic priority policy, LLF ( Least Laxity First ) , also succeeds where RM fails.

Like EDF, L L F is a dynamic- priority pol icy where services on the ready queue

are assig n ed h igher priority i f their laxity is the least . Laxity is the time difference
between their deadline and rem aining computation time. This req u i res the sched
uler to know all outsta n d i n g service request tim es, t h e i r deadl i n es, the c u r rent
time, and remain i n g comp utation time for all services, a n d to reassign priorities t o
all services o n every p reem ption. E timating rem aining computation t i me for each
service can be d i fficult and typically req u i res a wor t-case a p p roximation . Like

EDF, L L F can also schedu le l 00% of the C P U for ched ules that can't be sched u led

by the static R M pol icy.

Scanned by CamScanner

,,,,..,....

71

Real-Time Embedded Components and Systems

E xampl@ 1

T1

(1

T2

(2

T3

(3

Ul

0.5

U2

0. 2

UJ

0. 285 7 1 4

LC M =
Utot

70
0. 985 7 1 4

R M S c hedult'
51
52

..

53

E OF 5 c hedule

s 1

'

$2
S3
no
s1
52

53

52
53
Lax ity
52
53

FIGURE J. 1 4

x
5

l
-

LLF S c hedult'
s1

51

---,

..

x
3

x
x

I
I

R M policy overload scenario.

I n tuitively, it's hard to believe that the use of any resou rce can be safe with no
margin at all. Even the slightest miscalculation on reso u rces requ i red can cause an
overload (overuse of resources) . This is a likely scenario given that determi ning the
actual execution time that all services will req u i re is n o t easy u n less very pes
simistic worst-case ti mes are assu med. So, it becom es i n terest i n g to consider what
happens to threads or services i n a n overload scen ario for a given policy. For EDFr
overload leads to nondetermi n istic fa i l u re-that is, it's very hard to predict exactly
which and how many services will m iss their dea d l ines i n a n overload. It depends

upon the state of the relative prioritie s during the overloa d, which i n turn depends
upon the order of time to deadli ne times for all servic es ready to r u n .

B y comparis on, fo r fixed-prio rity polices such a s RM, i n a n overload all services
,
of lower priori ty than the service that is overr u n n i n g may
m iss their deadl i ne, yet all
services of highe r prior ity are guaranteed not
to be affec ted as show n in Figu re 3. 1 5.
For EDF an over run by any servi ce may
cause all othe r serv ices on the ready
q ueue to miss their dead lines ; a new
servi ce adde d to the q ue ue, therefo re adju st
. .
mg prior i ties for all, will not pree mpt
the over ru n n i n g servi ce. The overru nni ng
service has a time t d adline that
is negati ve beca use it has passed , so it co nt rn ues

.
to be the h ighe st pno nty serv ice and
cont i n ue to ca u e othe rs to wait a n d po t en
tiall y miss dead Hnes . In an over ru
n scen ario , c o m m o n poli cy is to t er mi n at e th e
rde ase of a serv ice t h a t has ove rr
'
u n . T h i c a u s es a ser v ice d ro p out
. H o we\ e
sim ply detec t i n g ove rrun and term
.
matm
g t h e ove rru n n i n g serv i. e ta ke sorn

Scanned by CamScanner

,...
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
,,_.) _
. ...,..,.,
._._.

Processing

S,

Slrvu:es

71

onllnuc 10

kel Deadline.

f'no1 ( $, )

l'nt(S

,, I .,

S,

a-running

o'

h,,c, l >caJhnc
S

si-r' ice

Scf\ 1c"'' Jll l\1 1 "

Tho.:1r l >la<ll111 '

Pno( S ,.) ' Pnn1 ( S, )

FIGURE J. 1 5

RM policy overload scenario.

CPU resource, which without any margin means that some other service will m iss

i t s deadli ne-with overrun contro l EDF becomes much more well behaved in an

overload scenario-the services with the soonest dead l i nes will then clearly be the
ones to lose out. However, determ i n i ng which services this will be i n advance
based upon t h e dynamics of releases relative to each other-is still d i fficult. Figure

3 . 1 6 graphically depicts the potentially cascadi n g EDF overload failure scenario

all services queued while the overru n n i ng service executes potentially m i ss their
deadlines, and the next service is l ikely to overrun as well causi ng

cascad i n g fa i l

u re. Probably the best option for an E D F overload is to d u m p a l l services i n the

queue-this at least would be more determin istic.

Prio(

,.1

.)

<

Pno (

over-running

G0

Pno( S, 1 . ) <' Prio (

1)

O\ er- ru n ning

0
FIGURE J. 1 1

Scanned by CamScanner

EDF policy cascading failure overload scenario.

10

Real-Time Embedded Components and Systems

EDF

i nt o the dea
. .
exist whe re a d i ffer e n t pol icy i s encoded
V anat1ons
of
dli....,, ,.
.
.

II
L
b

a
d
o
n
g
fi
n
e
d
m
y
y
1
u
a
n
(de
r
fr
rk
La yl a nd ) 0
.
. n y am ewo
d nven,
dynam1 C-pno
ne
.
h i g hes t p n o n t y 1 s assi gn ed
n
t
o
o f t I1e more mteres t t ng variati ons 1s LLF. I L L F. ,
th
.
.
e
.
ce b e tween i t s rema .i n m g e x ec ut i o n t i m e a d
fferen
.
i
d
ast
I
e
e
h
t
as
t
h
e
a
h
servic t
n
.
.
re nce b e t ween t h. e i r deadl i n
d
d
me
i
t
ffe
i
e
h
t
.
L

i
is
ty
i
x
a
e
n
i
g
e
d
a
. u pco m i n d ea
its
.
.
n
. .
a serv1Ce. Dete r m m m g t h e l east l a x ity s erv
rem a i. n i. n g com pu t a t i on t i m e for
ice
.
.
uest
req
serv1Ce
g
n
es
i
d
m
i
t
outstan
,
l
l
a
h
t
w
eir dead .
requ i res t 1i e sc h e d u ler to kno
.
s
e
m
i
t
l
l
n
a
o
e
i
t
a
for
t
u
p
rvi
m
o
c
.
s.
e
c
A fter 'I
L
cu rreri t t i rn e , and rem a i n i ng
tne
I mes,
.
.
. .
.
p n o n t i es t o a l l serv ices
on eve4J...
t h .1s 1s k nown , t he scheduler must t h e n reassign
.
.
. .
f
at
event that adds anothe r service t o t h e ready queue. Est i m m rema m mg co m pu.
tation t i m e for each service can be d i ffi c u l t a n d t ypica l l y req u i re s a wor st- ca se ap.
proximation. The LLF policy encodes t h e concept of i m m i n e n ce, w h ich i n t u iti vely
.

makes sense-every student k nows that t hey s h o u l d work o n t h e h o m ework where


they have the most to do and which is d u e soonest fi rst-u n l ess of c o u rse tha t par-

ticular ho mework i s not wort h m uch cred i t .

I n some sense, a l l priority enco d i n g pol i c i es , d y n a m i c or s t a t i c , m iss the

poi nt-what we rea l l y want to do is encode w h i c h serv ice i s most i m port ant and
make s u re that it gets the resources it needs fi rst. We wa n t a n i n tell igent sched uler,

l i ke a student who t akes i nt o considerat ion l a x i ty, i m pact of m issi n g a deadlin e


for
a given assign ment, cost of d roppi n g one or m o re, a n d t h e n
i n t e l l igen tly dete r
m i n es how to spend resou rces for m a x i m u m benefi t . Th i s c o ncept i s a
n open re
sea rch a rea i n soft real - t i me system s a n d h a s been i nvestig ated
b y a n u m ber of
resea rchers [ Brand t99 J . I n fact, s i n ce t h e l a n d m ark L i u a n d
Layl a n d forma l ization
o f R M and dead l i ne-dr iven sched u l i ng, m o s t of t h
e p roces sor resou rce research
has been o riented to one of fou r t h i ngs:
'

Gen era l izat ion and red u c i n g con stra i n


ts for RM a p p l i c a t io n
Sol v i g p rob lem s rel ated t o RM app

l icat ion fo r rea l sys tem s


_
a l ter nat ive pol c ies for dea dl i n ed r iv e n sch edu l i n g
mg new soft rea l - t i m e po l ic i es
t o red uce m a rg i n req u i red i n RM po l icy

Dev s ng
Dev i

.
I n Ch a p t r 7 1 ' So ft Re al
- T 1 m e Se rv1c
es, " we w1 1 1 ex p l o re som. e of t he me t hod
t h a t ha ve be en pr op o ed
to a d a pt RM a n d d ea d
i i. ne - d n. v e n poli cies
.
.
to s1tu
. at1. om
w h er e an oc ca
s1o na l ser v ice
d rop ou t o r ove rru
n 1s a ccep ta b l e .
rea l - t i me m et h
o d 1ror h a nd l i. n g m i se
d de ad ! m
. es w i- 1 1 b e m o re clos
ely exa m m ed alon g with mort
i.. n - d e, pt h
of
an d L L F sc

co ve ra ge
_

Scanned by CamScanner

ED F

he d u l i n g .

Soft

Processi ng

81

S UM MARY
Fixed-priority preemptive scheduling with a R M priority assignment policy (short

est period has h ighest priority) is most often used and advised for hard real-time
systems. Hard real -time systems by definition must have deadline guarantees be
cause the consequences of a missed deadline are total system failure, significant loss
of assets, and possible loss of human l i fe. All service sets run as threads of execution
should be tested with a sufficient or better yet N&S feas ibility test .before being
fielded for hard real-time operation. Passing an N&S feasibility test guarantees a

service set will meet its deadlines as long as Ci, Ti, and Di were properly specified

and are determ i n istic or worst case. For quick analysis, a sufficient test such as the

RM LUB may be useful. The potential disharmony in period is the reason that the
RM L U B is less than ful l utility for two or more services. Services that have har
monic periods can be safely scheduled with utility exceed ing the RM LUB despite
failing to pass this test and will pass an N&S test. In general, dynamic priority pol i

cies such as EDF and L L F are not considered safe for hard real-time systems due to
their difficult-to- predict deadline overrun characteristics. Dynamic priority policies
do work well for soft rea l - time applications, such as game engines, video, audio, and
multimedia applications, where an occasional service dropout is acceptable.

EXERCISES
the s u fficient R M scheduling feasibil ity test ( Liu and
p.9, T heorem 5) for the following ANSI C function protot

l . Code

i n t RM_s u f f i c ient (
int Ntasks ,
int

* t id ,

u n s ig n e d long

int

*T,

u n s i g n ed l o n g

int

*C,

long

int

*D) ;

u n s igned
Ntasks

task I ds,

is the number of tasks in the task

set, t id

is an array of unique

is an array of t he release p e r iods, c is an arra y of the computa

tion times, o is an array of the deadl i nes (T must equal D for each

RM).

in

Fin ally, t h e fu n c t i o n should s i m p ly ret u rn l if t h e s y stem can be

cheduled, and 0 if it can't.

Scanned by CamScanner

t id

12

nts and System s


Real- Tim e Embedded Compone

2.

.
T e Test ( fou nd earl ier i n t h i s cha pter ) fo r th e fol.
od e t h e Co mp let ion i m

fu n ction pro to type


I owi ng AN 1
.

ion
i n t Sc hed _co mp l e t

int Ntask s ,
i nt

t id ,

u n s ign ed 1 ong l n t
u n s i g n ed 1 o n g l nt
u n s igne d lon g i n t

r,
c,

o) i

e t h at T m u t equ al o i n all
Parameters are defined again as i n # I . As u m
cases.
' .
between a sufficient a nd
3. Oescnbe m your own words what the differenc e is
.
.
.
a necessary and s1ffici ent sched uling feas1 il 1t test .
.
.
4. Why is the sufficie nt R M LUB so pess 1 m 1st1c .
.
7L
0 CPU re s . If EDF can be shown to meet deadli nes a n d potent 1 a 11 y h a s l QOO
. y
source utilization, then why is it not typically t h e h a rd rea l - t ime polic
of
choice? What are the drawbacks to using EDF c o mpa red to R M / O M ? I n
a n overload situation, how will E D F fa il?
6. Code a function to compute the Fibonacci seq uence to any n u mber of
terms. The sequence is 0, I , l , 2, 3, 5, 8, 1 3, 2 1 , 34, 55, 89, . . . T h e Fibonacci
sequence or Fibonacc i numbers begin with 0 and l . The next term is then
the sum of the two previou s terms. ( Do not be concer ned if your Fibonac i
number overflows an unsign ed 32-bit i nteger , just l et i t overflow).
7. Now, deter mine how many term s i n t h e seq uenc
e corre spo nd to I 0 mil
l iseconds of com puta ti o n on a lab t a rget PC ( th e
a nswe r may vary de
pend ing upon the spec ific targ et used . T he easie
)
st way to do this is to ust
Win
dVi
ew
to
mea
sure
the CPU t i me take n b y a task call i n g you r funct ion
J
'
with a large valu e. See how lon g that take
s a n d the n sca le up or down the
n um ber as needed to ach ieve 1 0 m il
lise con ds of com put atio n. Repeat thi
to determ ine how ma ny .ter ms a re
req ui red for 20 m i l l isec ond s of compu
tati on usi ng Wi ndV iew .
8. Giv en a tas k set wit h two
tas ks cal l i n g yo u r Fib on acc i seq
uence, o ne with
N ter ms for 1 0 mi ll ise
co nd s of exec u t i o n a n d t h e
oth er fo r 20 mi l li
o d , is the sys tem fea
sib le if the I 0 m i l li sec on d

t ask is rel eased eve ry ZO


m i1J1 eco nd s an d the
2 0 m i l lise co nd tas k ev ery
50 m il l i sec o nd s ? ( T 1 == .0
msec, T2 = 50 m se c,
C 1 = I 0 m se c, C 2 = 20 m
ec a n d all D ' = T 1 ' ) . SJ
yo ur an swer up on
1
th e Leh oc zk y, ha h.
a n d D in g1 T heore m .
9
u n t e precedin

g rstern a nd h o w evid
en e that it w rks o r e 'pl ain wh'
1t won t
.

? ;

Scanned by CamScanner

PJOcessing

83

CH APTE R RE FE RE N CE S
[ Aud sley93 I Audsley, N., A. Burns, and A. Wellings, "Deadline Monotonic Sched
uling Theory and Application." Control Engineering Practice, Vol. l : ( 1 99 3 ) :

pp. 7 1 -78, 1 993.

[ Brandt99) Brandt, Scott, "Soft Real-Time Processing with Dynamic Quality of

Service Level Resource Management." Ph.D. Disserta tion, Department of

Computer Science, U niversity of Cotorado, 1 999.

[ Briand99J Briand, Loi'c and Daniel Roy, "Meeting Deadlines in Hard Real-Time
Systems." IEEE Computer Society, 1 999, pp. 28-3 1 .

[ Liu7 3 ] Liu, C., and

J.

Layland, "Schedul ing Algorithms for M ul t i p rogram ming i n

a Hard Real-Time Environ ment." journal of the Association for Comp uting
Machinery, Vol. 20, No. l , ( January 1 9 7 3 ) : pp. 46-6 1 .

Scanned by CamScanner

1/0 Resources

In This Chapter

I ntroduction

Worst-cas e Execution Time

Execut ion Efficiency

I n termediate 1/0
1/0 Architecture

I NTRO DUCTI ON
The i nput a n d output t o a service shown previously i n Figure 3. 1 3 o f Chapter
requires 1/0 to/from a device such as a sensor or actuator ( encoder or decoder) .

This l/O is part of the response time, and as shown i n Chapter 3 , it i mply add to
respon e laten cy, but does not affect the service execution or interferen e t i me
during the response. Most services, unless they are trivial , i n olve ome i ntermed i
ate l/O after the i n itial ensor i n p u t and bfore the fi nal po t i ng of out put data to ,

write buffer . Thi in termediate 1/0 is most often M M R ( M em ry Mapped Reg i t er )

or memory device II

. I f t h i i n termediate 1/0 ha

i n gle ore

le I( ten y,

Ler

wai t - late, t h e n i t ha no add i t ional i m pa t on the er i e r e r n.e t i me. H owevr.


if the i ntermediate 1/0 stall the .. PU o re , t hen thi in rea c the re. pon e time ' h i le
the PU pro es i n g pipel i n e i tailed. R. t her t h a n on. idcr i ng t h i i n tcrmc<liate

15

Scanned by CamScanner

Real-Time Embedded Components and ystems

II

.
l/O 1' t 1s more easily model ed as a n exec ution efficiency. Device 11
/
,
1 0 as dev1ce
. .
0
.
lhon s of core cycle s. By co mpari so .
1
m
even
and
ds
usan
,
o
n
th
s,
ed
r
d
un
. i n.
latency is h
.
l/O 1 ten cy 1s typica lly tens o r hundreds of core cycles-if more l a te
a
n
termed 1ate
.
cy
ld
.
gn
be
shou
reworke
desi
ware
d.
hard
core
the
then
e,
'bl
posst
ts
1s
than t h

IME
WO RST -CAS E EXE CU TI O N T

---

ervice releas e wou l d be deter m i n istic . For s i m ple


Ideall y the execu tion time for a
be true. The I n te l 8088 and the Moto rola
micro processor archit ectures, this may
.
, n o cache, a n d 1ven m mory t h a t h as
68000, for examp le, have no CPU p i peline
count t h e mstruc tt ons fro m sta rt
no wait-st ates, you can take a b lock of code and

o f C P U cycl es required
to finish. Furthermore, for these architec t u res, t h e n u mber
to execute each instruct ion is known- so me instruct ions m ay take more cycle
than others, but a l l are known n u mbers. So, the total n u mber of CPU clock cycle

required to execute a block of code can be calculated. To compute determ in istic


execution time for a service, the following system c h a racteri s t ic s are necessary:
Exact n u mber of i n structions execu ted fro m servi c e rel ease i n put up to re

sponse output.

The exact number of CPU c lock cycles for each i n s t ruction is known.

The n u mber of CPU clock cycles for a given i n s t ru c t i o n i s constant.

Lefs assume that t he second and t h i rd c h a racteristics a re t rue, providing deter


ministic hardware. The same set of instructions always req u i res the same n umber of

CPU clocks to execute. This alone does not guarantee dete r m i n istic execution time
because an algorithm may be data d riv e n . The n u m ber o f l oo p i terations or the

depth of recursio n of the algorith m may be a fu nction o f t h e i n p uts to the algo


rithm. For data-d riven algorit hms, the path l e ngt h , o r total i n structi n co unt, i
o
a funct ion of the input. Most algori t h m s are data drive
n . A n y block of code thdt
conta ins decis ion cons truct s such as " if' state ment
s o r " case" state ment s will exe
cute a diffe rent path based upon t h e outc ome
o f t h e " i f" expression or the "case"
state men t. For si mple data -driven algo
ri t h m s a n d code b l ocks, you ca n s irnrh
cou nt i nstr uctio ns i n a l l path s and com
pare to dete rm i n e the lo ngest pat h . This ap
.
pears s1 pl eno ugh for a sma l l bloc

k o f cod e a nd sim ple algo rith m , but wha t a 01


an app licat ion t h a t perform s a
com pl e x se rvic e ? F o r exam p l e , assu m e a s rvie
e
nee s to fi nd the root of a fu ncti
on whe re the fu ncti on is not i m ple . I n th e ca se
01
n d i ng a root fo r
fu nct ion that i s dete rm i ned by i n t eg ra t i ng sen so r ra t es
.
t i me, he fu.n ct 1on i s data d rive n and
not kn own a - prior i. ay you wa nt to p r
.
.
whe n a rol l i ng atel 1 1 te will
be bro ugh t to re t via dec eler on usin a th r u te
ati
.
t hat i whe n t h e t h ru t fu ncti on ( a n d
t here f o re acc elera t io n ) cau ; s t h e v e l"'f'

b<:

0':e
:

Scanned by CamScanner

1/0 Resources

17

fu n ction to rea ch zero Th e t


h ru st fu n ction is o ft en
k nown for a partic ular type 0 f
t h rust er. 0 n e met hod to fi m d t h
.
e roo t o f any functio
n is. to iterate
, bisect i ng a n interva I to detm e x a n d feed ing th e b
' ctton
ise
.
val ue mto
.
F ( x) = 0 t o test how close t he
current gues s fo r x is to zero I f
th e g ess is
.
h igher, then a lesser or greate r subin ter
.
val will be sele cted for the n ex t 1ter
a continuous function ; then with
at1
on. I f F ( x) is
.
.
.
.
succ essive
1tera t1on
s, the i nterva 1 w1.11 b
.
ecom e d tmmts
. . h mg
. I Y small and the bisection
.
of that i n terva l, or x' will co me c 1 oser and
. cl oser to t I1 e t rue value o f x where F(x) is
.
zero . How man y itera tion s w'll
1 t h ts
. ta ke? The a n swer d epen
d s upon the following
.
reqmrements:

Accu ra c y o f x n eeded

Com plexity of the func tion F(x)


The i n it i a l i nterval

F i n a l ly , s ? me fun c t ions may actually have more than one solution as wel l .
M n y n u mencal methods are s i milar to finding roots by bisection i n that they re

. qmre a total p a t h length t h a t varies with the input For such algorithms, you need
.
t ? pla e a n upper bound o n the path length to define WCET ( Worst-Case Exec u
t 1 0 n Time ) .
Ass u m i ng a n upper bound o n t h e algori t h m ic path length and deter m i n ist ic

hardwa re, t he n t he W C ET is safe as an input to an RM ( Rate M onoto n i c ) feasibil


lengt h s i m p l y
i ty test. A service release that requi res less than the maxim um path
.
ution o f CPU hardw are
enJoys m o re than n ecessa ry resour ce m a rgin. With the evol
mize t h rough put and provi de
desig n , most m ic ropro cesso rs have evolv ed to maxi
lerat ion to the most com mon ly exe
bette r over all e fficie n c y by emp loyi n g acce
refer ence s. This is typifi ed by the R I SC ( Re
c uted i ns t r u c t i o n sequ ence s and data
with i nstr ucti on pipe l i n ing and use of mem ory
duc ed I ns t ru c t i o n Set Com pute r)
ably
s increase d, mem ory acce ss late ncy for com par
cac hes . As C P U cor e cloc k rate
itec t ure
not kep t pac e. So, mo st R I SC pip el i ned arch
sca led cap acit y has gen era lly
ess late ncy ) me m
zer o wa it-s tate (sin gle CP U cyc le acc
m a ke use o f cac he, a s m a l l
ciat ive
sm all to hol d ma ny app l icat ion s. So, set asso
too
is
he
cac
,
ely
nat
u
t
r
o ry U n fo
ry dat a-w hen the ca he hol d
.
por a rily hol d m a i n me mo
m e m ori es are use d to tem
h i t , and the sin gle c cle acc e .
d by a pro gra m, t h is is a
nce
ere
ref
ss
dre
ad
an
dat a for
tio n or loa d dat a
C P U p i pe l i ne t o fet ch a n i nst ruc
the
s
ow
ll
a
e
cod
to da ta a n d/o r
ver , tail s the p i pel ine . f u rth er
cyc le. A cac he m iss; ho we
le
ng
si
a
in
r
iste
reg
o
a
t
in
u i re mo re tha n a i n
o ry Ma pp ed Re gi ter ) ma y req
em
M
(
s
R
MM
m
mo re, I/O fro
i nee ded for
e s t a l l th e P U pip el i ne i f th e dat a
i
kew
i
l
ll
wi
d
n
a
gJe C P U co re cyc le
id ing the m i an art t hat an
t i n g po ten tia l tal l and avo
tec
De
.
n
o
i
t
c
ru
st
n
i
t h e ne xt
, t ha t
f o r e x a m p l e . i n tru t ion
era ll fo r a C P
o
cy
v
en
i
ic
eff
n
i n c rea se ex ec u t i o
req uir ing
o u t o f or de r o tha t t ht' i n t ru t i0n
ted
u
c
exe
be
n
a
c
ca use a n M M R rea d
po t en tia l p i ( d i ne tal l.
as lat e a po. s i h l c , de lay ing t he
d
te
ecu
ex
js
t he rea d da ta
-

Scanned by CamScanner

II

ents an d System s
Re al-T ime Em be dd ed Co mp on

t
ti o n, is to i ncre ase o v
ed in det a l i n the nex sec
ercrib
des
,
ing
elin
pip
of
nt
poi
The
.
pl
m
es
the
f
exa
o
m
ca
fro
.
che
ent
1s evid
rn isse s
.
.
1 1'enc y. However , as
a u exec u t ton effic
s
lm
ll
1pe
sta
es
y
l
,
wil tall , a nd
cau se a dat a dep end enc
.
and M M R late ncie s tha t ma y
.
ion and dat a s t rea m. . The effi. ciency .1s therefo re data
t h 1s 1s a func t'ion o f the 1 nstr uct
ncy wil l vary, eve n fo
erm ini stic . So, execut 10n efficie
r
and code dri ven and not det
y
l
a
n
be
o
t
fun
o
n
y
ctio
a
m
n
ts
n
te
o
con
f
th e
cac he
the sam e blo ck of code, bec aus e
t
e
ds
tha
u
rea
h
xec
t
te
us
d
vio
i
n the
also o f the pre
cur ren t thread of execut ion , but
th
d
n
a
leng
the effi cie ncy
ctio n of he lon ges t pat h
pas t. In sum mary, WC ET is a fun
des cnb es W C ET:
in execut ing tha t pat h . Equ atio n 4. 1

WCET

[ CPlll'orsi -case x Longest - Path - Instruction - Count ]

Clock

- Period

( 4. 1 )

li ned exec utio n as the n u m ber


The CPI is a figure that describes effici ency in pipe
to execu t e eac h instructio n in
of Clocks Per I nstruc tion on average that are req uired
,
ncy, a nd "Overl appi ng
a block of code. In the next two sectio ns, " I ncrea sing Efficie
Executi on with 1/0," we discuss how best- and worst-c ase CPI can be determ i ned or

at least approximated well. The longest path i nstructio n c o u n t m ust be determ ined
by i nspection, formal software engineering proof, or by actual instruction count
traces in a simulator or with the target CPU archi tect u re. Warning-most CPU core
documentation states a CPI that is best case rather t h a n worst case.

For ful l determin ism in WCET for hard real - t i m e system s , you m tst guarantee

the following:

All memory access is to known latency m e m o ry, i n c l u d i n g locked cache


main memory with zero or bounded wa i t - states.
U n locked cache h its are not expected
not determi n istic.

in

or

unloc ked cache becau se the hit rate is

and device 1/0 is not expected n o r req uired t o meet deadli nes.

Overla p o C P

AlJ other p1pelm e hazard s i n additio n to cache m isses a n d device


1/0 read/write
stans are lumpe d i n to CPI and taken as worst case ( e.g. branc
h-den sity x branch
penalty) .
Longest path is know n, a n d instr uctio n c o u
n t fo r i t is kno wn .

.
.
For soft rea l-ti me sys tem s , yo u c a
. n a I I ow occas1o nal servtce d rop-outs or hm
ite
. d verr ns and ther efor e use
ACE T ( Ave rage -Cas e Exec u tion T i m e ) . The AC ET
can e esti mated fro m the follo
wing i n fo r mation :

Expec ted L 1 and L2 eac h e h i


t / m i s ra t i. o a n d cac h e m i s pena lty.
Expecte d overl a p of CP U an d d
evic
. e 1 10 req u i. red to m eet dcad J i nc .
A l l oth er p i pe1 i. ne h aza rd
are typ1.c
a II Y eco n d a ry a nd a n
.
igno red It' k' t'
b ran c h m t. pre d 1eti
. on .
A ve (age len gth pat h is kno
.
.
w n , a n d t h e 10
truct1o n count for i t is k n o w n .

Scanned by CamScanner

be

1/0 Resources

II

I n s u m mary, yo u h a ve t h e fo l lowing two equations:


WCET = Memory - Latency + Device - 10 - Latency +
[ Longest - Path - Inst - Count x CPI Effntfre ]

ACET = [ Expected - Cache - Miss - Rare x Miss - Penalty ] + [ NOA

[ b:pecred - Path - Inst - Count

CPI Efftrr11r

10 - Latency ] +

( 4. 2 )

I n these equations, t h e effect ive C P I acco unts for sec<?ndary p i peli n e hazards
such as branch m ispredi c t i o n s . The term NOA ( N on -Overlap Allowed ) is 1 .0 if IO
t i m e is n o t overlapped with p rocessing at a l l .

I NTE R M EDI ATE 1/0


I n a n o npreem p t iv e r u n - to - completion system with a pipelined C P U , s i x key
related equations describe CPU-1/0 overlap. Note that the I / O described here is de
vice 1 /0 that occurs during the service execution, rather than the i nitial I/O, which
releases the service i n the fi r t place. In some sense, the device 1/0 occurring d uring
service execution can be considered m icro- 1/0 and usuall y consists of M M R access
rather t ha n bl1xk-oriented D M A ( Direct Memory Access ) 1 /0. A l t hough this i n ter
,nediate I / O is m uch lower latency than the i ni t ial block 1/0 latency, i t reduces the
execution effic iency sign ificant ly. I deal ly, with careful generati o n o f m ac h i ne code
( co m piler o p t i m izat i o n s ) , careful h a rdware d e sign for p i peline in struction reorder

i ng, and ca reful e r ice design, you can m i n i m ize the loss o f efficiency d ue to m icro-

1/0. First, you must u n derstand what it mea n s to overlap l/O with CPU.
Consider t h e following overlap

definitio11s:

J C T = I nstruction C o u n t T i m e ( Ti m e to execute a block o f i nstruction


n o stalls = C P U Cycles x C P U Clock Period )

with

J OT = Bus I n terface 1/0 Time ( Bus 1/0 Cycles x Bus Clock Period )
O R = Overlap Req u i red-percentage of CPU cycles that m ust be co n c u rrent
w i t h 1 / 0 cycles
NOA = Non - Overl ap A l l owable for S1 to meet 01-percen tage of C P U cycle
that can be in add ition to 1/0 cyc l e time without m issing service dead l i n e

D; = Dead l i n e for Service S1 rel ative to release ( i n te r r u pt ini t i at i n g exec u t io n )

C P I = C l ocks Per I n t ruction for a block o f i nstruction


The c haracteristic s of overlappin g J /O cycles with C P U cycles for

are s u m m a rized as fol lows by the five po

J/0 t i m e relative to S, dead l i n e D1:

Scanned by CamScanner

i b l e overlap conditions fo r

service

PU time and

Real-Time Embedded Componen ts and Systems

.
. d o therwi se , i f D' < I OT,
1 . D; IOT is reqmre
'
.
.

S; is 110-Bound.
. CPU- Boun d.
.
.
if D ' < I CT, S i is
2 . D > ICT is requ ire d, o therw1se
. h I CT .
, erl ap 0 f I OT Wit
req uires n o ov
3
1 - ( JOT + JC T)
> I OT a n d D ; ICT), overlap of I OT
with
4. I f D; < ( J OT + ICT) where ( D I -

o'. ;

I CT is req ui red .
" e D ; can 't be
I OT or D; < I CT ) , d ea d i m
met
S. I f Di < ( IOT + ICT ) where (D; <
regardless of ove rlap.

OT > 0 must be true


For aI I fi1ve over lap Conditions l isted here, I CT > 0 a n d I
. If
a
I
one
IOT
is
or

ICT
f
I
zero,
required
is
t
Service
h en no
.
I CT and I OT a re zero, n 0
.
.
.
.
o
n
with
m term ed 1a te IO.
overlap 1s poss1'bl e. Wh en IOT = O ' t h is i s an ideal service
.
From these observatio ns about overlap i n a n o n p reem p tive system , we can de

duce the following axioms:

C P l worst-case = ( I CT + J O T ) I J CT

C P ibest-case

( 4.3 )

( max ( I CT, J OT ) ) I ICT

C P J required =

( 4. 4 )

D ; I ICT

( 4.5)

OR = 1 - [ ( Di - IOT) I ICT ]

( 4.6)

CPI require
. d = [ ICT ( 1 - O R ) + IOTJ I l CT

( 4. 7)

OR + NOA = I ( by d e fi n i t i o n )

( 4.9)

NOA = ( D; - ! OT ) I I Cy

( 4.8 )

Equations 4 . 7 and 4.8 provide a cross-check. Equation 4.7 should always match
equation 4.5 as long as cond ition 4 or 3 is true . Equation 4 . 9 should always be 1 .0 by

definition-whatever isn't overlapped m ust be allowable, or it would be requi red.


When no overlapping of core device 1/0 cycles is possible with core C P U cycles,

then the following condition must hold for a service to guara ntee a deadline:

[ Bus - 10 - Cycles x Core - to - Bus

- Factor ] + Core - Cycles <

WCEli

< D;

( 4. 10)

The WC ET m ust be less t h a


n the servic e's dea d l i n e bec aus
e we hav e not con
sid ered i n terferenc e in t h
is CP U- 1/0 ov erl ap an alysis
. Re cal l t h a t interfe rence t i me
m us t be ad de d to rel ea
se 1/0 lat en cy a n d W CE
T:
VS, , Tresponle-1 $ Deadline,

Tre pon:;e-i

==

TJO - Lareru- v-1 + WCET, +


'
TMtmor/- Lntency- 1

i-1

T.I int ufuence- J


1= 1

( 4. 1 1 )

Th e W CE T de du ced fro m Eq
u ation 4 . 1 0 m ust the e o
r f re be an i n p u t i n to the
no rm al RM feas i bi l i t y an aly sis
ha
t
m
od el s i nte rfe r e nc e. Th e Co
t
re- to- Bu s- Fa t r
o

Scanned by CamScanner

1/0 Resources

ti

term is ideally 1 . This i s a zero wait-state case where the processor dock rate and
bus transaction rate are perfectly matched. Most often, a read or write will require
multiple core cycles.
The overlap required (OR) is indicative of how critical execution efficiency is
to a service ' s capability to meet deadlines. I f OR is high, then the capability to meet
deadlines requires high efficiency, and deadline overruns are likely when the
pipeline stalls. In a soft real-time system, it may be reasonable to count on an OR

of 1 0 to 30%, which can be achieved through compiler optimizations (code sched


uling), hand optimizations ( use of prefetch ing), and hardware pipeline hazard
handling. Note that the ICT and JOT in Figure 4. 1 are shown in nanoseconds as an

example for a typical l 00 M Hz to 1 GHz CPU core executing a typical block of


code of 1 00 to 6,000 instructions with a CPicffcctivc = 1 .0. It is not possible to have

OR

>

1 .0, so the cases where OR is greater than 1 .0 are not feasible.


Or Fundion (ICT, IOT) for IOT nd/or ICT > CT
1a
17
16
1S
1.
13
1.2
1 1

D 1 1-1 1
D 1 1-1 1
D 1 5-1.1

1 3-1 '
1 l-l .J
I 1-1l
1- 1 1
0 9-1
D O l-0 9
0 1-0 .1
O f-0 7
0 .5-0 a

O l -0.3

0 0 1-0l
0 0-0 ,

-0 :!--0 ,

FIGURE .1

CPU- 1 O overlap required for given ICT and IOT.

EXECUTION EFFICIE NCY


When WCET is too worst case, a well tuned and pipelined CPU architecture in
creases instruction throughput per unit time and significantly reduces the proba
bility of WCET occurances. In other words, a pipelined CPU reduces the overall
CPI required to execute a block of code. In some cases, I PC ( I nstructions Per

Scanned by CamScanner

12

Real-Time Embedded Components and Systems


.
Clock), wh ich is the inverse 0 f CPI is used as a figure of merit to describe the ov
. I ' 'n d CPU. A CPU with better through put er.
all possible throughput of a pip
ha

' we will use only CPI noting tha


t
ex
ts
s
t
n
I
t
lower CPI and a higher IPC.

C Pl =

I PC

.
. 1 me
. d hardware architecture is to ensure that a n in str uc ti o n
The point of pipe
is
.
ior a 11 I nstructions in the ISA ( I nstruction S et Are h Ite
' ct ure J
completed every cI ock c
.
.
.
NormaIIy CPI is 1 0 or less overall in modern pi pelined systems. Figure 4.2 show

n 1s
com
a simpl e CPU pi pe 1 me
an d 1 ts stage overlap such that one instructio
pleted
(retired) every CPU clock.

IF

ID
IF

Execute

Write-

Back

ID

Execute

IF

ID

Write-

Back

IF

Back

10

Execute

IF

ID

FIGURE 4.2

Write-

Execute

..

Write-

B.lck

Execut

..

..
-

Writ

Back

...
.

.
.


..

i i :D 1;.:.-r=l
.

IF

Sim ple pipe line stage ove rlap (De pth


= 4).

In F igur e 4.2, the stages are


I nst ruc tion Fet ch ( I F ) , I nst ruc tion Decod ( I D}.
e
Exe cut e, and regisier Wr
ite- Bac

k. Th is exa mp le pip elin e is fo ur stage, so for th


pip elin e to reach ste ady -sta
te op era tio n a n d a CP I of 1 .
0, it req uire s fo u r CP
cycles unt il a Wr ite- Ba
ck occ urs on every I F . A t thi
s po int , as lon g as the stagt
overl ap pin g can co nti nu
e, on e ins tru cti on is co mp
let ed every CP U clock .
ipe
line design req uires mi
P
nim
iz
ati on of hazard s, so the pip eline m u st

stall ht
on e-eyeJ e Wnt e- Backs to
pr od uc e co rrect re su lts. Th
esign
e str ate gi es fo r p i pe 1me drrun
a re well descn bed by om
a
pu ter arc hi l ec tur e tex

ts [ He nn essy03 ] , b ut are su
.z
PI
n ed here for conven ience.
Ha zard s tha t ma y sta
. n r ase C
ll
t he pipeli n e and i c e
inc lud e the foll ow ing :

Scanned by CamScanner

9J

1/0 Resou rces

I nstruction a n d data cache mi 'Ses, r eq u i r ing


main memory

hi gh laten cy cache load from

H igh latency device read or wri tes, req u i ring t he p i pd i n e to wait for com plet ion

C ode b ra n c h es-ch a n ge i n l o cal i ty of execut ion and data reference

The i ns t ru c ti o n and data cache misses c a n be reduced by i n c recl i n g ca h e ize,

keeping a separate data and i n s t ruction cache ( H arvard a rchitec t u re ) , a nd allowing


t h e p i peli ne to execu t e i n s t ructions out of order o that

omet h i n g

ont i n ue to

ex ecute while a cache m iss is b e i ng hand led. T h e hazard can't be el i m i nated u n le


all code and data can be locked i n to a L e vel - I cache ( L e vel - I ca c h e i

i nglc cycle

access to the core by defi n i tion ) .

The high latency dev ice read/write hazard i s ver

typical i n em bedded s y terns

where device inte rfaces to sen ors and act uators are co n t ro l l ed and monitored via

M M Rs. When t hese devices and their registers are written or read, thi can stall the
pipel ine while the read or write complete . A p l i t - t ran action bus i nterface to de

vice M M Rs can grea t l y red uce t h e pip e l i ne hazard by allowing reads to be po ted

and t he pipel i n e to con t i n ue u n t i l the read completion is really needed-l i kewise a


spl i t t ransaction bus

a l l o ws

write to be posted to a bu

i n t e r fa c e q ueue in a i ngle

cycle. When a write is po ted, t he p i pel i n e goes on ass u m i ng the M M R write w i l l

ulti m a tely comple t e , but t h a t o t h e r i n s t ructions in t h e pip el i n e d o not nece a r i l y

need t h i s to complete before t hey exec u t e.


F i n a l ly, code b r a nc h i n g h a z a rd

can be red uced by branch pred i c t ion a n d

spec u l a t i ve exec u t ion o f both b ranches-even with branch p rediction a n d spec u

lat ive exec u t io n , a m i s p red i c t i o n typi c a l l y req u i res s om e st a l l cycles t o recover.

By fa r, t he p i p e l i n e h az a rds t h at cont ribute most to l ow ering C P I a re cache

m isses a n d core bu

i n t er fa c e 1/0 l a te n cy ( e.g. , M M R access ) . Branch m ispredic

tion and other p i peline haza rd often res u l t in st alls of m uch s h o rter d u ration ( b y
orders of m agn i t u d e ) co m p a red to cache m i se and device 1/0. The i n dete r m i n
ism of cache m isses can be greatly reduced by loc k i n g code in i n struction cache

and l oc k i ng data i n data cache. A Level- I i n truction and data cache is often t oo
m a l l to lock down a l l code a n d a l l data req u i red , b u t ome a rc h i tec t u re i n c lude a

Level-2 ca he,

h ich is m uc h l arger and can also be locked. Level-2 cache u u a l l y

ha 2 -cycle or m o re, b u t le

t ha n I 0-cycle acces ti me-loc k i n g code and d a t a i n t o

L 2 ca h e i m u h l i ke havi n g a I o r m o re wa i t-state memory. L o w wait -state mem

ory i often referred t o a a T

M ( T i gh t ly Coupled Memory). Thi

r eal - t i m e em bedded y. tern becau e it e l i m i n a te m uch

cache h i t/m i

ideal for

f t h e nondeterm i n i m of

ra t i o that a re da ta - t rea m and i n t ru t i n - t rea m d r i v e n . A n L2


ca he i often u n i fied ( hold in tr u tion and d t a ) and 2 6 K R , 1 2 K B, or m re
i n ize. o , you h o u l d l o k all rea l - t i me e r i
mo t oft e n L 2 , leaving L I t o i n rea

Scanned by CamScanner

ode and dat( int

effi ien y w i t h d n a m i

LI

r L2

he .

loadi ng. A l l be t

94

Real-Time Embedded Components and Systems

i n m a i n mem ory beca u s


e dead.
effort servic e code and data segmen t s can be kept
i
ked
or
I
unloc
L
2
L
any
d
ca
n
a
c rea
che
issue,
n
a
n
lines and deter mi n ism are not
&es
r
ased best e fort se vices.
o
the exec ution effic ienc y of thes e main - mem ry-b
.
cy a nd ptpelme hazards is ve
laten
/
e
1
devic
0
of
ism
n
i
eterm
nond
E l i mina ti ng

ry
1
to exec u te w h 1 e a .w n te 1s d rai ni n g t
diffic ult. When instru ctio n s are allowed
o
is oka _r i n m a n y c i rcu m s tan ces, b u t
no:
devic e, this is called weakly consisten t. This
not
yet
s
c
n
executed
o
t
i
nstruc
i
for
other
or rect .
when the write must occur before
posted
write
h
t
y
b
ited
m
us
i
b
i nt erfac

ness. Posti n g writes is also u l t i m ately l


e
ust
r
tes
stall
w
t
t
he
seque
su
full,
CP
is
U
queue
e
t

unt il

queue depth- when


_
for
kew1se
1
L
.
splitwnte
g
n
tr
i
pend
a
one
n
sactio n
the queue is drained by at least
:
rlier
ea
e
h
t
execut
from
data
ed
uses
re ad i n.
reads, when a n i nstructio n actually
pletes.
Othe
rwise t he de.
struction, then the pipel i n e m ust stall u n t i l the read com
pendent instruction would execute with stale data, and t he executio n woul d be

errant. A stall where the pipeline must wait fo r a read completion i s called a data
dep ende11cy stall. When split-transaction reads a re sched u le d with a regi ster as a
dest ination, this can create another hazard called reg ister p ressu re-th e regist er

awaiting read completion is tied up and can ' t b e used a t a l l b y other i nstr ucti ons

until the read completes even though they a re not dependent u pon the read. You
can reduce register pressure by adding a lot o f genera l - p urpose registers ( m ost
pipelined RISC arch itectures have dozens a n d dozen s o f t he m ) a s well as by pro

viding an option to read data from devices i n to cache. Read i n g from a memory
mapped device i n to cache is normally done with a cache prefetch instruction. In

the worst case, we m ust assume that all device 1/0 d u r i n g executio n of a service
stalls the p i peline so that

WCET = ( CPib,,st=case

Longest

Path

Instruction.

Count ) + Stall

Cyclei

Clock

Period

I f you can keep the stall cycles to a determ i n istic m i n i m u m by locking code
.
m to L2 cache ( o r L I i f poss i b l e ) a n d b y reducing device 1/0 stall cycles, then

WCET can be reduced. Cache locking helps immen sely and i s ful l y deterministic.

1/0 AR CH ITE CTU RE


I n t h i s hap ter, 110 h as bee n exa m
ined a s a reso urce i n term s of late n cy ( t im e) a nd
ban dwi dth ( byte s/secon d ) .
The view has bee n from the perspecti ve of a s ine
processor cor e and I/O be t ween
. heral . The erner
t h at proce ssor core a n d perip
s
genc e o f adva nce d A S I
C arc h itec t u res, suc h as S oC ( Sys tem -on a Ch i )
- p . ha
b
rou gh t ab ou t em be dd ed
J I

I
smg
h
e-c
'
1
p
syste
m
m
desig ns t h a t integ r ate
.
u up e
p roc ess ors wit h m a n y p
enp h era I s m
more comp le .
B
.
x m terco n nec tions t h a n I L'
( Bus I n te r fa ce U n i t ) des igns
.
.

. F 1gur
t er
e 4 . 3 p rovid es an over view of t he m a ny 1n
.

Scanned by CamScanner

1/0 Resources

95

c on n ectio n networ s that can be used on chip an d between multich i p or


.
mul tt subsys tem designs, i ncluding traditional bus architectu res.

even

lntfrtonnctton N ftwork

Dynamic

Bus

iA
king

Omqa

FIGURE 4.3

Non-blocking

ArbltntfCl/Routed

Fully

(Frames. Sequffttts, P1c:bts)

Com11ted

Ring

Hab

Trtt

Bfnes Clos Cross-b1r

A taxonomy of interconnection networks.

The cross-bar interconnection fully connects all processing and 1/0 compo

nents without any blocking. The cross-bar is said to be dynamic because a matrix

of switches m ust be set to create a pathway between two end-points as shown in


Figure 4.4. The number of switches required is a quadratic function of the number

of end-points such that N points can be connected by N2 switches-this is a costly


interconnection. Blocking occurs when the connection between two end-p9ints
prevents the simultaneous connection between two others due to common path

ways that can't be used sim ultaneously. The bus interconnection, like a cross-bar,

is dy namic, but is fully blocking because it must be time multiplexed and allows no
more than two end-points with i n the entire system to be connected at once.

FIGURE U

Scanned by CamScanner

Crbar interconn ectioo network.

Real-Time Embedded Components and Systems


ov ide s low - l a te
as t h e rss - b a r pr.
ncy
A no nbl ocking i n ter con nec t ion suc h
0c
b
ed
no
cat
ded
n
a
i
l
g
n
ki n
-po in ts by pro vid i
c o m m u n ication bet we en two end
ust
m
bus
a
r
e
req
urlit
com m u n icati ng ov
circ uit. By com pari son, an end - poin t
p
r
the
end
ano
ess
oin t 011
by a co ntroll er, add
ces s for the bu s, be g ra n ted the bu s
5 u s , t h e bu s con t ro lle
r
the bu s. If the bu s 1
h
uis
inq
rel
and
a,
er
.
dat
nsf
,
s
the bu tra
.
a
i
e
n
g
tim
ti
n
tra
re
rbi
a
s
tl
bu

y
qu eue . Th is
t
ues
req
a
in
it
wa
.
tor
ues
m akes the req
nen ts w i l l be discussed in
n har dw are com po
,,
cre ase s 1/0 late ncy . I nte rco nne ctio
tem Co mp on ent s.
Sys
ded
bed
Em
"
8,
er
gre ate r d eta il in Ch apt

ac

S U M MAR Y
incl udes i n p u t , i nter med iate 1/0, and out put
The over all resp onse time of a serv ice
mos t ofte n be dete rm i n ed an d added
laten cy. The i nput and outp ut laten c ies can
is more com plex becau se i n te rmedi
to the overa ll respo nse ti me. I n terme diate 1/0
al, i nterc on nection net work
ate 1/0 includ es regist er, memo ry bus, and, i n gener

t i o n . The stall time reduces


l at en c i es that will stall an ind i vidual proces sor's exec u
for i
executi on efficien cy for each process or, a n d u n l ess t hese stall cycles can be used
other work, i ncreases WCET for t h e servi c e.

EXERCISES
l.

I f a processor has a cache h i t rate of 99.5% a n d a cache fl1 iss penalty of 1 60

core processor cycles, what w i l l t h e average C P I b e for 1 , 000 i nstructions?

2. I f a system must com plete fra m e processing so that 1 00 ,000 frames

are

compl eted pe second a n d the i nstruction count per fra m e processed is


.
2, 20 mstruct1ons o n a I -G H z processor core, what is t h e CPI required for

this system? What is the overlap between i nstructi o n s and 1/0 time if the
i n termediat e I/O time is 4.5 m i c rosecon ds?

3. Read the Sha, Rajk umar, et al. paper, (( Priority I nheritan ce Protoco ls: An
A pproac h to Real-T ime Synchr onizat ion." Write a b rief s u m m ary noting

at l ast th ree key conc epts from t h is pape r.


4 Re vie w the CD-R OM code for heap_
mq.c a n d posix-mq.c. Write a brief

para grap h d escn bmg h ow t h ese two


message queue applicat io ns are su n ilar
and ho t ey are differen t . Make sure
you not o n l y read t h e code , bu t th3
you bui ld 1t ' load it , an d e
xecute I t t o m ake s u re you u ndersta nd how bot t
.
.
app l 1Cat 1on wor k.
1 1,
5. 'N r i t e VxWo rks cod e tha t
pawns two tasks: A a n d B. A sho uld ini t 1a 1
.
.
t'
a n d ente r a wh i l e ( 1 ) l o op w
h 1" c h 1t does a semGi ve o n a gloha! "ina i
m
ema pho re s 1 a n d then doe s a semT
a k e o n a d i fferent glob a l b i n a ry
stfll' .
.

Scanned by CamScanner

1/0 Resources

17

phore s2. B should i nitialize and enter a wh i l e ( 1 ) loop in which it does

semTake of s 1 , delays l second, and then does a semGive of S2. Test you r

code o n a t arget o r V x S i m a n d t u r n i n a l l source with evidence t h a t i t


works correctly ( e.g., show coun ters that i n c rement i n windshell d u m p ) .
6. Now r u n the code from the previous exercise and analyze a WindView
trace for i t . Capture the W i ndView trace and add a n n otations by hand that
clearly show

s emG i v e

and

semTa ke

events, the execution t i me, and delay

time. Note any !J nexpected aspects of the trace.

7. U se

t a s k S w i t c h HookAdd

and

t a s kSwi chHookOe l e t e

to add some of you r

own code to the wind kernel context witch sequence and prove i t works
by increasing a global counter for each preempt/d i patch context swi tch
and t i mesta mping a global with the last preempt/d ispatch time with the
x86 P I T ( progra mmable in terval t i m e r )-see the CD- ROM sa mple PIT
call can also be used t o sa m pl e relative

code. The VxWorks

t ickGet { )

s y s C l k R a t e S e t ( 1 ooo )

) . M ake an On/Off wrapper fu nction for yo ur add

ti mes, but is only accurate to the tick resol ution ( I millisecond ass u m i ng
and delete so that you can turn on your switch hook easily from a

windsh

command l i ne and look at the values of your two globals. Turn i n yo ur


code and wi ndshell output howing that it works.
8 . Modify you r hook so that it w i l l compute separate preempt a n d dispatch

counts for a specific task ID or set of task I Ds ( up to J O) and the last pre
empt and di pat h t imes for all tasks you are moni tori ng. Run your code
from Exercise 4 and monitor it by cal l i ng your On/Off wrapper before
r u n n i ng yo ur test tasks. What are your counts and last ti mes and how do
they compare with W i ndView analysis?

9. Use your program to analyze the nu mber of t N e t T a s k dispatches/preemp


t ions and modify the code to track the average t i me between dispatch and
preemption. W rite a program to gather stats on

tNetTask

for 30 seconds.

What is the n umber of dispatches/preemptions? What is the average dis


patch t ime?

CHAPTER REFERENCES
! Almasi89 ] Almasi, George, a n d Allan Gottlieb, Highly Parallel Compu ting. The
Benjam i n/Cummin g Publishing Company, 1 989.
I Henne y0 3 ] Patterso n, David, and John H e n nessy, Compu ter A rch i tecture: A
Quantita tive Approach, 3rd ed. Morgan Kaufmann Publisher , 2003.

I Patter

on94 j Hen nessy, John and David Patterso n , Compu ter Orga n iultion and

Design: Tire Ha rdwa re/ oftware Interfa ce. M organ Kaufm a n n Publi her ,
1 994.

Scanned by CamScanner

Memory

11 Tiiis Chapter

Introduction
Physical Hierarchy
Capacity and Allocation
Shared Memory
ECC Memory
Flash Filesystems

INTRO DUCTION

In the previous chapter, memory was analyzed from the perspective of latency,
and, in this sense, was treated like most any other 1/0 device. For a real- time em
bedded system, this is a useful way to view memory although ifs very atypical
compared to general-purpose computing. In general, memory is typically viewed
as a logical address space for software to use as a temporary store for intermediate
results while processing input data to produce output data. The physical address
space is a hardware view where memory devices of various type and latem:y are
either mapped into address space th rough chip selects and buses or are hidden as
caches for mapped devices. Most often an MMU ( Memory Management Unit)
provides the logical-to-physical address mapping (often one-to-one for embedded
systems) and provides address range and memory access attribute checking. For

II
'

Scanned by CamScanner

1 00

Real-Time Embedded Components and Systems


001
examp le, some memo ry segme nts may have attrib utes set so they are read
ory segm ents. M em
Typi cally, all code is place d in read- only attrib ute mem
o
e
a
n-cach
o
as
n
h
ble
often
att
most
rib
space
M apped 1/0 ( MMIO ) address
ute set
lly written to a devi
to preve nt outpu t data from being cache d and never actua
ce.
.
.
viewe
s
i
d
as
mory
e
m
a
where
gl
oint,
obal st ore
From a highe r level software viewp
memory
segm en ts use.
and an interface to MMIO , it's often useful to set u p shared
able by more than one service. When memory is shared b y m o re than on e service,
care m ust be taken to preven t inconsi stent memo ry u pdates a n d reads. F rom a

re

source perspective, total memory capacity, memor access latency, and m emory
interface bandwid th must be sufficien t to meet requ irements .

PHYSICAL HIERARCHY
The physical memory hierarchy for an embedded p rocessor can vary signi ficantly
based upon hardware architecture . However, most often , a H a rvard architecture is
used, which has evolved from G PCs ( general -pu rpose com puters) and is often em
ployed by em edded ystems as well. The typical Harvar d architec t u re with separ

ate
L I ( Level- I ) mstru ct1on and data caches , but with u n ified L2 ( Level- 2)
cache and
either on-ch ip SRAM or extern al DOR ( Dyna mic Data RAM ) is show
n i n Figure S. l .

Unlockd L 2 D.ita C ach


l 1 Ins trucllon C .c

JICllDS

Scanned by CamScanner

u ... .. . _ _ .,,

Memory

1 0 1-

From the software viewpoint, memory is a global resource i n a single address


space with all other MMIO devices as shown in Figure 5.2.
L l Instruction Cache

L2 Data Cache

Locked St.ck
Cached [)eta

Locked Dita

FIGURE 5.2

OMA Buffer Pools

St.ck

Logical partitioning and segmenting of physical memory hierarchy

by firmware.

Memory system design has been most influenced by GPC architecture and
goals to maximize throughput, but not necessarily to minimize the latency for any
sint,le memory access or operation. The multilevel cac;hed memory hierarchy for
GPC platforms now often includes Harvard L l and unified L2 caches on-chip with
off-chip L3 unified cache. The caches for GPCs are most often set associative with
aging bits for each cache set (line) so that the LRU (Least Recently Used) sets are
replaced when a cache line must be loaded. An N-way set-associative cache can
load an address reference into any N ways in the cache allowing for the LRU line to
be replaced. The LRU replacement policy, or approximation thereof, leads to a
high cache hit to miss ratio so that a processor most often finds data in cache and
does not have to suffer the penalty of a cache miss. The set-associative cache is a
compromise between a direct-mapped and a fully associative cache. In a direct
mapped cache, each address can be loaded into one and only one cache line mak
ing the replacement policy simple, yet often causing cache thrashing. Th rash ing
occurs when two addresses are referenced and keep knocking each other out of
cache, greatly decreasing cache efficiency. Ideally, a cache memory would be so
flexible that the LRU set ( line) for the entire cache would be replaced each time,
minimizing the likelihood of thrashing. Cost of fully associative array memory
prevents this as does the cost of so many aging bits, and most caches a r e four-way

Scanned by CamScanner

1 12

ms
ponents and Syste
Real-Tim e Em bedded Com
.
e 5.3 shows a directb 1' t s for LRU. Figur
.
ng
agi
3
or
2
ciative WIth
e ca che mern .
or ei t-way set asso
t im es th e size o f th
.
ur
fo
is
at
th
a mam emo
byte ad dr ess ab le words),
mapped cache where
r o r mo re J 2 -bi t
u
o
o
ions
lect
(col
has memory sets
se t.
ed in to on e ca ch e
ad
lo
be
ly
on
n
ca
h
ic

gh

Ma in Memory

Dir ect -M apped Cache


S3
S2
SI
so

FIGURE 5.J

D i rect mappin g of memory to cache lines (sets).

By compariso n, Figure 5.4 shows a two-way set-associ ative cache for a main
memory four times the size of the cache. Each line can be loaded into one of four
locations i n the cache. The addition of 2 bits to record how recently each line was
accessed relative to the other three i n the set allows the cache controller to replace
lines that are LRU. The LRU policy assumes that lines that have n o t been accessed
recently are less l ikely to be accessed again anytime soon. This has been shown to
be true for most code where execution has locality of reference where data is used

within small address ranges ( often i n loops) distributed throughou t memory for
general-p urpose programs.

For real-tim e embedded systems, the unpred ictabili


ty of c ache hits/misses is a
problem. It makes it very difficu lt to
estim ate WCET ( Wors t-Cas e Execu tion Time).
n the extreme case, it's really
only safe to assum e that every cach e access could
mc ur t he m iss penalty F or t
c
h is reason , 1or
h ar d real-tim e system s, it's perh aps ad
Vlsable not to use each

e. H owev er, t h 1s
.
. would greatly reduce t h roughp u t to obta lll
.
.
determ m1 st1 C exec uti on s
o , rat h er t h an mclud mg
m u ltileve l caches ' man y real

t .1me em L
uc
-dd ed s ys t
em
s
inste ad mak e invest men
t in TCM ( Tigh tly Cou pled Mern
. .
.
ory) , wh 1c
h 1s smgle -cycle a
ccess 1or
,
t he proc esso r, yet has no cach e fu nctiona 1 tr
.

Scanned by CamScanner

Memory

10J

FIGURE 5.4 Two-way set-associative mapping of memory to


cache lines (sets).

Furthermore, TCM is often d ual ported so that service data ( o r context ) can be

Joaded via D M A and read/written at the same t i me. Some G PC proce ors i nclude

LI

or

L2

caches that have the capability to lock ways so that the cache can be

t urned i n to this type of TCM for embedded applications .


The rea l izat ion that G PC cache arch itect ure is not always benefic ial to real
time embedded systems, especially hard real-t ime systems, has lead to the emer
gence of the software-managed cache. Furthermore, compared to GPCs, where it
is very d i fficult to p redict the care with which code will be written, embedded code

is often carefully analyzed and optim ized so that data access is carefully plan ned. A
software-man aged cache uses application- specific logic to schedule load i ng of a
TCM with service execution context by D M A from a larger, h igher latency external

memory. H ardware cost of a GPC cache arch itecture is avoided and the manage
ment of execution context tailored for t he rea l - t i me services-the scheduling of
DMAs and t h e worst- case delay fo r complet i n g a TCM load-must of cour e till
be carefully considered.

CAP ACITY A N D A LLOCATIO N


be total
The most basic resou rce con ern a sociat ed with memo ry hould alway s
and ervi es
capacity n eeded . M a n y a lgori t h m i ncl ude pa e and t i me trade -off

Scanned by CamScanner

IN

Real-Time Embedded Components and Systems


.
. Keep i n m i n d t h a t cach e does
ssmg
poce
for
text
con
data
t
n
ifica
often need sign
.
n 1 Y copie s of data ra ther tha n
not contnbute to tota 1 capact ty bec aus e 1 t sto re o
.
ior e m bedde d system s where ca anot h er dow nsid e to cac h e. ,
u nique data. Th 1. s ts
. .
. d F u rt her more 1 aten cy ,
ior access to m e mo ry devic es shou ld
pac1ty 1s often 1 1 m 1te
.'
.
n i g n i fi c an t l y i n c re ase
be considered care fu I I Y b eca u se h igh 1 aten cy a cer ss c a
s m ee t .i n g rea 1 - t i. me d eauJ l i n e . So, d a t a sets accessed
bl
s
nd
cy mem ory .
:Eg freu :c:o I o f course be sto red i n t h e lowe st laten

S HARED ME MORY
Often two services find i t useful share a memory data segment o o d e egme n t . I n
t o t h i s h a red meo':Y
t h e case of a shared data segment, t h e read/write acce
_
mu st be guaran teed to be con s i tent so that one serv i c e i n _ t h e m i d d le o f a w n t e is

not p reem pted by an o ther w h i c h c o u l d t h e n read p a rt ia l l y u p d t ed d a t a .

example, a satell i te system m i ght sample sensor a n d tore the


i n a shared memory segment. The satel l i te state vector wo u l d

For
a t e l l i t e st: te vector

t y p i c a l l y in c l ude 3
_

double precision nu mber with the X, Y, Z l o c a t i n rel a t ive to t h e cen t o f E a r t h ,


3 double p rec is ion n u m bers for the veloci ty, a n d 3 m o re double p rec 1 s 1 o n n u m
bers with the ac c e era t ion relative to E a rt h . Overa l l , t h i s tate wou l d l i ke l y have
9
double precisi on n u m bers that ca n ' t be u pdated i n a i n l e P U cycle.
F u rt her
g
more, the state ect o r m i g h t a lso conta i n the a t t i t u d e , a t
t i t ude rate, a n d a t t i t ude
acceler ation -up to 1 8 n u m ber ! C l ea rl y it wo u l d be
po i bl e
a servi ce u pd a t
i n g t h is state t o be pree mpte d v i a a n i n terru pt c a
u s i ng a cont ext swit ch t o a no t h er
serv ice that uses t he state . Usin g a pa rtial ly
u pd ated state wou l d l i ke l y res u l t i n
con trol bein g issu ed based upo n corr upt stat
e i n form atio n a n d m i g h t eve n cau
se
loss of the sa t el l i te.
To safe ly allo w for sha red me mo ry
dat a, the m u tua l exc l u s i o n sem
a p h o re was
i n t roduced [ Ta n nen ba u m 8 7 J . Th e
m u tex sem aph ore pro tec ts
a c r i t ic a l ect ion o f
od e tha t i s exe cut ed t o acc ess t h
e sha red m e m o ry r e ou
rce . I n Vx Wo rk , t wo
u n c t i n up po rt t h i : sem Tak e (
Sem id ) an d semG i v e ( Sem
id ) . !h e s e mT a k e ( S em i d )
bl c
the a i l i n g er vi e i f Sem id=
O an d al low i t to
ex ec u t i o n i f se m i d= 1 .
Th e se'1G 1 e ( Se 1d ) i n rem
en r Sem 1 d by I , lea vin g
i t a t I i f i t i a l re a d y I W
.
he n
the s
1 cs
i d ) a u e Sem 1 d
0
I , a n y ervi e b l ck<"d ea r l i er bv
a
c JI t s

c s ld ) whe n Se i d
0, n w u n b l k o n e
e
h
t
f
w i t i n ' e rv i
T yp i
the
1 'n
r i c
r e u n b l k cd i n 1 r
t - i n , . , , 1 u t i r d c r. lli .
u r
roun J 1 11 the O<Jc th t w rnc a
nd/ r re a d ; the 1ha r
d
mc
m
ry
h
t
i
w
s r k ,
.
J n d 111G 1 ( Se i o ) u m
1dJ
g
m m > n S e i d , the p
u d re, a nd rcJ d
re ua rJ n t ced
to tc m u t u J l l e c l u 1v . 1 he
n t 1 , I e t io n
a r c: ,h >w n in , ,
5. t .

for

co nt inu e

to go from to

lly

blc

Scanned by CamScanner

Memory
TABU S.1

1 05

The Mutex Semaphore Protects Critical Code

1-.C..
semTa k e ( Semid ) i

..

get x ( ) i
get Y ( ) i

semTake ( Semid ) ;

control ( X , Y , Z ) ;
seRIGive ( Se111 id ) ;

get Z ( ) ;

semGive ( Semid ) ;

Clearly if the Update-Code was interrupted and preempted by the Read-Code

at line 4 for example, then the cont rol ( X , v , Z ) function would be using the

new x and possibly an incorrect and definitely old v and z. However, the
semTake ( Semid ) and semGiv e ( Semid ) guarantee that the Read-Code can't preempt

the Update-Code no matter what the RM policies are. How does it do this? The
semTak e ( Semid ) is a TSL instruction (Test and Set-Lock) . In a single cycle, sup

ported by hardware, the semid memory location is first tested to see if it is 0 or l . If


1 , set to 0, and execution continues; if Semid is 0 on the test, the value of Semid is

unchanged, and the next instruction is a branch to a wait-queue and CPU yield.
Whenever a service does a semGi ve ( Semid ) , the wait-queue is checked and the first
waiting service is unblocked. The unblocking is achieved by dequeuing the waiting
service from the wait-queue and then placing it on the ready-queue for execution
inside the critical section at its normal priority.

ECC MEM O RY
For safety-critical real-time embedded systems, it is imperative that data corrup
tion is detected and ideally corrected in real time if possible. This can be accom

plished by using ECC ( Error Correcting Circuitry) memory interfaces. An ECC

memory interface can detect and correct SBEs ( Single Bit Errors) and also can
detect MBEs ( M ulti-Bit Errors), but can not correct MBEs. The encoding for the
extended parity bits for ECC is based upon the Hamming code. When data is
written to memory, parity bits are calculated according to a Hamming encoding

and added to a memory word extension ( an additional 8 bits for a 32-bit word).

When data is read out, check bits are computed by the ECC logic. These check bits,
called the syndrome, encode read data errors as follows :
1 . I f Check-Bits = 0 AND parity-encoded word = 0 => NO ERRORS

2 . If Check-Bits ! = 0 AND parity-encoded word = 1 => SBE, CAN CORRECT

Scanned by CamScanner

1 OI

Real-Time Embedded Components and Systems

3. If

Chec k-Bit s ! =

HAL T
If Check-Bi ts =

4.

word =
O AND parity- enc oded

rd =
O AN D pari ty- enc oded wo

0 = > M B E DETECT ED,

I => par i ty ERRO R,

CAN

CORRECT
i nst ruc tio n exec uted
aJly hal ts bec aus e the nex t
On MB Es, the processor no rm
, the CPU will go
ld cau se a fat al err or. By h a l t ing
wit h un known corrupted data c0u
ad.
er reset and safe recovery i n ste
through a hardware wat ch- dog tim
a non mas kab le inte rru pt tha t so ftwa re
For a correctable SBE , the CPU rais es
read i n g the add ress o f the SBE l ocation,
should hand le by acknowledg ing and then
mem ory loca tion fro m the read
and finall y writi ng the corrected data back to the
from m e mory into regis
register. The ECC autom atical ly corre cts data as it is read
data i n the regis
ters, but most often it's up to the softwa re to write the correc ted

ter back to the corrupted memor y location . I n some i m plemen tations hardware
may also automate the write-back of corrected data.

To understand how Hamming encoding works to compute extended parity


and a syndrome capable of SBE detection/correction and M BE detection, it is best
to lea rn b example. Figure 5.5 shows a H a m m i ng encoding for a n 8-bit word with
.
4 panty bits and an overall word parity-bit shown as pW.
0

pW

p() l
p()2
p()3
o04
p()S
p06
ED

m
ro t

.!

l:Ol

2
p02
x

1
p()l
0

PW

ED

. \It

1
pOl

.
. tl

3
dO l
,,
. ..
1
1

4
p03
x

3
4
o0 2 dO l pQJ
0
. 1 1 1
1
0
1
1

5
d02
"' L
1
1

1
5

d02
i
1

6
7
d03 d04
' -

9
dos

10
d06

11
d07
lFt

12
d08
0,: :..1

-. 1J

7
d04
tJ .r f.J' 'O

9
dos

6
d03

r
"" - -i--- bill .. n

8
p04

P04

111'.'D
0

10
12
11
d06 d07 d08
:'I '.G",O" . lf'.U''.1
0

'

,_.,.,.

-w

co

FIGURE 5.5

Ha m m ing encoding f
o r 8-bit wnrtt ,..ir..

Scanned by CamScanner

- - --- - .

1 07

Memory

In Figure 5.5, the ED (Encoded Data) is. the same as the CD (Corrected Data).
Furthermore, the check bits are zero and pW, parity for the ED, remains zero as
well. Figure 5.6 shows a more interesting scenario where bit-5, which is data bit
d02, is flipped. This could occur due to an environme ntal hazard such as EMI
( Electromagnetic Interferenc e) or charged particle radiation. When this occurs,
because it's an SBE, the Hamming syndrome (check bits) detects and corects this
error as shown in Figure 5.6.

pW

pOl

p02

dOl

p03

d02

d03

p0 1

p02

P03
p04

'T 'V
1
1

d04

), \. v ,J". r:Ji'''.,
1

10

11

12

p04

dos

d06

d07

doe

ED

0
0

Im

ED

0
0
0 . '

_o

.. x

'

tos.

SSE

JC

10

11

12

pW

pOl

p02

dO l

p03

d02

d03

d04

p04

dos

d06

d07

d08

0
0

l1}1"."._!!
1

'

1 :
co

'

tel '

U. .,.

pOS
p06

ifV:.. t;\,'W.\.t\, . U'.' -

JI- .ilU:l[J"'rU.."'
1

-t

1;.V<c.. "lJ.
0
1

. '"'
0

Bii

0
0

'

Conection

FIGURE

5.1

Hamming syndrome catches and corrects an SBE.

In the Figure 5.6 example, the syndrome computed is nonzero and the pW= l ,
which is the case for a correctable SBE. Notice that the syndrome value of binary
I 0 I 0 encodes the errant bit position for bit-5, which is data bit d02. Figure 5. 7
illustrates an MBE scenario.
Note that the syndrome is nonzero, but the pW=O, meaning that an uncor
rectable MBE has occurred. Figure 5.8 shows a scenario where the ED parity is
corrupted.
In this case, the syndrome is zero and pW= I indicating a parity error on the
overall encoded data word. This is a correctable SBE. Similarly, it's possible that
one of the Hamming extended parity bits could be flipped as a detectable and cor
rectable SBE as shown in Figure 5.9.

Scanned by CamScanner

1 08

tem s
ponents and Sys
Real-Time Embedded Com

0
p()l
p02
o03

0
pW
x

p01
x
0

p04
pOS
p06

ED

5'111 ED
f "0

C02 0
C03

0
pW

0 '

0
x
'6 X .
..., B

MBE

CD

p01
p02

ED

"" ED
CO i o "'
t02

t03
:t>4

0
pW
x

0
0

6
d03

7
d04

5
4
3
2
p02 dOl p03 d02
"O
0 ' '.,. 1
0
1
1
0
0
1

p()l
x
0

4
2
3
p02 de p03
.x n....
x
..
0

d02
,
1

0
0

10
d06

0
0
0

'

12d08

d07

. W:..... .-

:a

0,

0
0

8
p04

9
dOS

10
d06

11
d07

12
d08

".I

11.

.1

0
1

U:""'""I

.I

11

12
d08

o-

ll

COIRECTION

Q ...

0
0

8
p04
x

d07

,a"

L- -

6
d03

7
d04

9
dos

10

11
d07

12

0
0

0
0
0
0

p04

co
-
0

d06

..

.. :...

. .
0
0

d08

.,...__..,

. 1

0
0 I

H amming syndrom e
catches a nd corrects
PW ses::

Scanned by CamScanner

10
d06

9
dos
0
0
0

2
3
4
s
p02 dOl Po3 d02
0 ..1 .... 1
"r.
1
1

. Q ..

7
d04

d03

0
pW
0

p()J

FIGUltE 5.1

:n.:.

CD

9
8
p04 dOS
0 "J x ro '.,
0
0

7
d04

j
- x
..., G
S8E

dO l
l
1

6
d03

H amming syndrome catches MBE, but can't correct the data.

p03

p04
p05
p06

5
d02

RGURE 5.7

p01
0

2
p02
x

4
p03
x

'

1 09

Memory

p()l
1)()2
1)()3

pW
x

1
p() l
x

2
002
x

dO l
1

'!'1N

1
1

d04

p04
x

dos

0
0

10
d06

11
d07

12
doe

2
D02
0

3
d0 1

4
p03
1

s
d02

6
d03

d04

ED

0
pW
1

1
1

0
x
x

0
CD

SEE
CORRECTION

FIGURE 5.t

1
. 1

0
0

0
0

12

d08

11

d07

D0 1

10

d06

c03

.....,

0
1

t06

1
1

6
d03
Q

c05

s
d02

ED

d> l
c:02

C04

D0 3

D04

DOS
D06

p04
1

dos

Hamming syndrome catches and corrects corrupt pa rity bit SBE.

This covers a l l t h e basic correctable SBE possibilit ies as well as showing how an
uncorrectable M B E is detected. Overal l , the H a m m ing encoding provides perfect
detection of SB Es and M BEs; u n l i ke a single parity b i t for a word, which can detect
all SBEs, but can only detect M BEs where an odd n u mber of bits are fli pped. I n t h i s
sense, a si ngle parity bit is a n imperfect detector and also offers n o correction be
cause i t does not encode the location of SB Es.

FLASH FIL E SY STEM S


Flash technology has mostly replaced t h e use o f EEPROM ( Electron ically Erasable
Progra m mable Read - O n l y M emory) as t h e p r i mary updateable and nonvolatile

memory used for embedded systems. Usage o f EEPROM con t i n ues for low- ost ,
low-capacity N V R A M ( No n - Volatile RA M ) system req u i rements, b u t most often
Flash memory tec h n o logy is u ed for boot code storage and for t o rage of non

vola t ile files. F lash fi lesystems emerged not long after Fla h devi e Fla h offer i n
circuit read, write , erase, l oc k , and u n lo k operat ions on e tor . Th era c a n d I k
ope ratio ns operate on t he e n t i re ector, m o t oft e n 1 28 K B i n size. J{ead a nd write
operation s can be a byte, word, or b l o k. The m o t p revalent Fla h te h nulog)' j
.

Scanned by CamScanner

/1

1 10

and Systms
Real-Time Embedd ed Com ponents

are erased to all l s,


where the mem ory device locations
NANO (N ot AN D) Flash,
cont ents. Data
the write data and the cur rent memory
and writes are the NANO of
, and sectors
tion s with all l s unle ss i t is unc han ged
can only be written to erased loca
s o r mor e befo re they are exp ecte d t o fail as
can typically be erased 1 00,000 time

read-only memory.
clear that min im izin g sect or erases
Based upon the characteristics of Flas h, it's
rs are erased at a n even pace
will maximize Flash part lifeti me. Furth ermo re, if secto
ble for the overall ex
over the entire capac ity, then the full capac ity will be availa
boot code and b oot
pected lifetim e. This can be done fairly easily for usages such as
u
code updates. Each update simply moves the new write data above the previo sly
written data and sectors are erased as needed. When the top of Flash is encoun tered,
the new data wraps back to the beginn ing sector. This causes even wear because all
sectors are erased in order over time. Filesyste1ns are more difficult to implement
with wear leveling because arbitrarily sized files and a m ultiplic ity of them leads to
fragmentation of the Flash memory. The key to wear leveling for a Flash filesystem is
to map filesystem blocks to sectors so that the same sector rotation and even n umber
of erases can be maintained as was done for the single file boot code update scheme.
A scenario for mapping 2 files each with two 5 1 2 -byte LBAs ( Logical Block Ad
resses) and a Flash device with 2 sectors, each sector 2 ,048 bytes in size ( fo r simplic
ity of th example) shows that 1 6 LBA updates can be accommo dated for S sector

erases. This scenario is shown in Figure 5. 1 0 for 1 6 updates a n d total o f s sector


Sectcr El9Md (SO,
11

S1)

0,0

1,,

,.,

1,,

,.,

2, 1

2. 1

0.2
0,2

0,2

3.2

3,2

so

FS L8a IJpdUd
FS L8e Ched
-

Sec:tar l.Bs

Sect<n Er-..d

11

(SO. S 1)

FIGUIE 5. 1 0

Scanned by CamScanner

2.1

2.1

2,2

2.2

2,2

2.2

Si m pl e Flash wea
r lever mg ex am pl
e.

UPDATES
10

Memory

111

erases ( 3 for Sector 0 and 2 for Section I ). Continuing this pattern 32 updates can be
co m pleted for 10 sector erases (5 for section O and 5 for sector I ). I n genera l approx
.
i mately 6 times more updates than evenly distributed erases.

With wear leveling, Flash is expected to wear evenly so that the overall lifetime
is the sum of the erase lim its for all the sectors rather than the limit on one, yielding
a lifeti me with m illions of erases and orders of magnitude more file updates than

this. Wear leveling is fundamental to malcing filesystem usage with Flash practical.

SUM MARY
Memory resource capacity, access latency, and overall access bandwidth should be
considered when analyzing or designing a real-time embedded memory system.
Although most G PC architectures are concerned principally .w ith memory
throughput and capaci ty, real-time embedded systems are comparatively more

cost sensitive and often are designed for less throughput, but with more determi n
istic and lower worst-case access latency.

EXERCISES
I . Describe why a multi level cache architecture m ight not be ideal for a hard
real-time embedded system.
2. Write VxWorks code that starts t wo tasks, one that wri tes one of two
phrases alternatively to a shared memory buffr, i ncluding 1 ) "The quick
brown fox j u m ps over the lazy dog" and 2 ) "All good dogs go to heaven."
When i t's done, have it call T a s k D e la y ( 1 4 3 ) . Now add a second task that
reads the p h rases and p r ints them out. Make the reader tak h igher prior
ity than the writer and have it call T a s k D e lay ( 37 ) . How many t imes does
the reader task p ri n t out the correct ph rases before they become jumbled?

3 . Fix the problem i n #2 using semT a k e ( ) a n d semG i v e i n VxWorks t o protect


the readers/writers c ri t ical section.

4. Develop an example o f a 3 2 -bit H a m m ing encoded word ( 39 bits total )


and show a correctable

5.

SBE scenario.

For the p reced i n g proble m , now show an u ncorrectable

Scanned by CamScanner

MBE scenario.

1 1J

Real-Time Embedded Components and Systems

C HAPTER REFERENCES
[ Mano9 l J Mano, Morris M., Digital Design, 2nd ed. Prentice- Hall, 1 99 1 .

[ Ta n nenbaum87J Tannenbaum, Andrew S., Operating Systems: Design and bnp/e.


mentation. Pren tice- Hall , 1 987.

Scanned by CamScanner

Multiresource Services

This Chapter

I n t roduc t ion

Block i n g

Deadlock a n d Livelock
Critical Sections to Protect Shared Resources
Priority I n version
Power Management and Processor Clock Modulat ion

INTRODUCTION
Ideally, the service response as ill ust rated in Figure 3. 1 3 of Chapter 3 i s based u pon
i n put latency acq u iring t h e CPU re o u rce, inte rference, and o u t p u t l a tency on ly.
H owever, many services need more resources than j ust the C P U to exec u te. For
exam ple, m a n y services may need m ut ua l l y exclu ive access to shared mem ry
re ou rces or a

hared i n termediate 1/0 resource. I f t h i s re our e is not in u e by

another ervice, t h is p re e n ts n o problem. However, i f a ervice i relea ed and pre


empt a nother r u n n i ng service based upon

RM

resource held by a nother ervice, then it i

blo ked. \Vhen a sc rvi

mu t yield t h e
t hat hold
resour

e,

poli y, onl / to fi nd that i t l a k


e

i blo kc<l. it

PU de pite the RM p Ii y. 'vVe hope that the preempted e rvi e

the ad dit i o n a l re our

will

o m plete i t

at whi h t i me, the hi gh - priority er i

m u t uall

c wi l l

ex Ju i

e u e

f t h at

u nb l o k, p tentiall pn.Aemi .

1 1J

Scanned by CamScanner

1 1C

.
co mponents a""
Real-Time Embedded

_, , ,,, - -

i n thi s chapter
.
io n H o wev er, a s you 'll set:
cut
exe
e
u
10
t
n
co
d
an
0 ther services'
d ead1 ock or priority
"
t 1or
ry if cond itio ns are suffi cien
pora
em
t
e
b
.
not
may
.
blockmg

inversion.

BLOCKING

C PU, but s 't because


Blocking occurs anytim e a service can be dispa tched by the
ry cnt1caJ section
it is lacking some other resource such as access to a shared memo
or access to a bus. When blockin g has a known latency , it could simply be added
into response time, accounte d for, and therefore would not adversel y affect RM
analysis, although it would complicate it. The bigger concern is u n bo unded bloc k
jng, where the amount of time a service will be blocked awaiting a resource is in
definite or at least hard to calculate. Three phenomena related to resource shari ng
can cause this: deadlock, livelock, and unbou nded p riority inversion. Deadlock
and livelock are always unbounded by definition. Priority inversion can be tempo
rary, but under certain conditions, priority i nversion can be i ndefinite.
Blocking can be extremely dangerous because it can cause a very u nderutilized
system to miss deadlines. This is counter intuitive-how can a system with only
5/o CPU loading miss deadlines? If a service is blocked for an i ndefinite time then
the PU is yielded fo r an indefinite time, leaving plenty of C P U m a rgin, b t the
.
service fails to produce a response by its deadl i ne.

DEA DLO CK AND LIV ELO CK


In Figure 6. 1 , service S1 needs resources
A and C, S2 nee ds A and B , a n d S 3 needs B
.
and C. IfS 1 acquires A, the n Si acqu ires
.
B , t h en S3 acq mr
es C fol low ed by requests
.
.
by each for the ir other requ ire
d resource a cir cu lar wa it evo
I ve as sho wn in Figu re
6. 1 . Circular wait, also kn ow

n as the de dl Y em bra ce,


.
cau
ses
md
efi n1te deadlock.
No progress can be made by
5 ervice
. s 1 , 2, or 3 m
Fgu re 6. 1 unl ess reso
by each are released. Dead
urc es held
loc
k can b e prevented by
.
.
. . .
maki
ng sure the circu lar watt
scenano 1s im possible It ca
n be det ect ed 1'f each
ser v1c e is d es 1gn ed t o post a
ahve as di sc ussed in Ch
keepapter 1 1 ' " H g
.
.
' h A va tla bih. ty an d
Relia bi li ty D esi gn ." When
t he keep -alive is not
posted d ue to t he dead
lock, th en a sup erv isory
restart the deadlocked serv
service can
.
i' ce However , it's
.
pos
'bl
si
e
t hat wh en de ad loc k is de te ted
an d sernces arc restar
c
ted , that they cou 1 d .
.
.
.
sim
ply
ree
nte
over. Th is variant is ca
r
t h e de ad loc k over an d

lled /rve
Iock and also
.
pr
ev
en
ts
detect1 on and break ing
progress co m ple tely despi te
o f t he deadlock.

Scanned by CamScanner

Multiresource SeMces

1 15

Service- l
(Holds Resource A)

Service-2
(Holds Resource B)

FIGURE 1.1

Service-3 Requests
R source B

Service-3
(Holds Resource C)

Shared resource deadlock (circular wait).

One solution to prevent livelock following deadlock detection and restarting is


to include a random back-off t ime on restart for each service that was involved
this ensures that one beats the other two to the resource subset needed and com
pletes acquisition allowing each service the same opport unity in turn.
Wl1at if the supervisor service is involved in the deadlock? I n thi ca se, a hard
ware timer known as a watch-dog timer is used to require the supervisor service to
reset a cou ntdown timer periodically. If the supervisor service becomes dead
locked and can't reset the watch-dog timer countdown, then the watch-dog timer
expires and resets the entire system with a hardware reset. This causes the firmware
to reboot the system in hopes that the supervisor deadlock will not evolve again.
These strategies will be discussed in more detail in Chapter 1 1 , which covers high
availability and high reliability design methods.
Even with random back-off, the amount of time that a service will fail to make
progress is hard to predict and will likely cause a deadline to be m issed, even when
the CPU is not highly loaded. The best solution is to eliminate the conditions nec
essary for circular wait. One method of avoidance is to require a total order on the
locking of all resources that can be simultaneously acquired. I n general, deadlock
conditions should be avoided, but detection and recovery schemes are advisable as
well. Further discussion on this topic of avoidance versus detection and recovery
can be found i n current research [ M inoura82 ] , I ReveliotisOO ] .

Scanned by CamScanner

1 16

Real-Time Embedded Components and Systems

ED RE SO UR CE S
C R ITI CA L SE CT IO NS TO PR OT ECT SH AR

bed ded sys tem s to share dat a bet ween two ser
Shared me mory is often used in em
s betw een serv ices , but often even messages
vice s. The alternative is to pass message
shared b uffer. Di fferent cho ices for service
are passed by synchro nizi ng access to a
e clos ely i n Cha pter 8, " Em bedded
to service com mun icati on will be exam ined. mor
is used , beca use rea l - t i me systems
System Com pon ents ." When 'shared mem ory
er prior ity servi ce releases at
allow for even t-driven preem ption of services by high
ust be prote cted to ensure
any time, shared resources such as shared mem ory m
d memo ry location
mutually exclus ive access. So, if one servic e is updat ing a share
e is allowed to
( writing) , i t m ust fully comp lete the updat e before another servic
preem pt the writer and read the same locatio n as descri bed alread y in Chapt er 4. If
this m utex ( mutua lly exclusi ve access ) to the u pdate/r ead data is not enforced,
then the reader m ight read a partially updated message . I f the code for each service
that either updates or reads the shared data is surround ed with a s e mTake ( ) and
semGiv e ( ) ( in VxWorks for example), then the update and read will be uninter
rupted despite the preemptive nature of the RTOS scheduling. The first caller to

will enter the critical u pdate section, but the second caller will be
blocked an not alLowed to enter the partially updated data, causing the original
service in the critical section to alway s fully update or read the data. When the cur
rent user of the critical section calls s e mGive ( ) upon leaving the critical section. the
service blocked on the s e mT a k e ( ) is then allowed to continue safely into the critical
section. The need and use of semaphores to protect such shared resources is a well
underO'd concept in m ulti threaded operating systems.
,.. . . ..
semT a k e ( )

.. .... -

PRIO RIT Y . I NV E R S I O N
\

Priority inversio n is simply defined as any time that a high priori' ty


service has to
.
-.
.
wai wh 11 e a 1ower priority service runs-this can occur in any blockin g scenario.

We re most concrned about unbou nded priorit y invers ion. I f the invers on is
i
b unded, then this c n e lumped into the response latenc y and
accou nted for so

.
t at the R M analysis
is still possi ble. The use of any m u tex ( m utual exclu sion) sem aphore can cause a temporary mvers
.

'
1on
w h'1 l e a h igher
pnori
ty service is blocked
.
to a 11 ow a lower pnority servIC
e to comp 1ete a shared memo ry
read or updat e in its
.
enti.rety. As long as the 1ower pnon

ty servICe execu tes for a critic al sectio n WCET,


.
the .mvers1on is
. known to l as t no 1onger t h an the lowe
r prior ity servic e's W CET fo r
. .
.
the en t 1ca1 section.
What cause s unbounded prior ity mver

smn.
Three cond ition s are necessary
.
.
c
1or
uo bounded mvers1on:

Scanned by CamScanner

Multire59urce Services

117

Three o r more services with u nique priority i n the system-High

At least two services of different priority share a resource with mutex protec

(M), Low (L)

priority sets of services.

(H), Medium

t ion-one or more h i gh and one or more low involved.


One or more services not i nvolved i n the m utex has priority between the two
i nvolved in the m utex.
EssentialJy, a member of the H priority service set catches an L priority service

in the critical sect ion and is blocked on the semTa k e ( semid ) . While the
service executes in the critical section, one or more

priority services i nterfere

with the L priority service's progress for an indefinite amount of time; the
ity service m ust continue to wait not only for the

priority

H prior

priority service to fi nish the

critical section , but for the duration of all i nterference to the L priority service.

How long will this in terference go on? This would be hard to put an upper bound

on-dearly it could be longer than the deadl ine for the H priority service.

Figure 6.2 depicts a shared memory usage scenario for a spacecraft system that

has two services using or calculating navigational data providing the vehicle's position
and attitude in i nertial space; one service is a low priority thread of execution that
periodically points an instrument based upon position and attitude at a target planet
to look for a landing site. A second service, ru nning a high priority, is using basic
navigational sensor readings and computed trajectory information to update the best
estimate of the current navigational state. In Figure 6.2, a set of M prior
s
{M} that are unrelated to the shared memory data critical section can c
l1t:fJ

ie
a .,.._
for as long as M services continue to preempt the L service stuck in t
,. r uAtV )
---11":..:::r--t""" - _) i
Service L

Services

{Ml

ervice H

Inert ial_Posu lOl'I( & , &y .

Sc11n_Lanc1ng_S1 t t

t . y , z, 1tutuo1 1 ;

&z ,

CJown_r1ng1,

crot -r no ,

ars_ i.

age,

(Shared
y,

By M
Services not

sing
ntlOIJ
Sec11on

FIGUH 1.2

Scanned by CamScanner

l,

&1t t 1 tuo1 ) ;

lnterferenct"
pnonty

J"
.....
..._.._,....

Section

orb1t_11 .. 1n ts ) ;

ttltuOt_llt&.Utt ( a ,

Continuing

Cm1 al
Code

1
Unbounded priority inversion scenario.

Mrmory
pacecft
State)

' ""

s and Systems
Real-Time Em bedded Component

1 11

Solutlos
UHetl Priority lverslo

to u se tas k or inte rr u pt
ded pri ori ty i nve rsio n is
un
bo
un
to
s
on
uti
sol
t
firs
On e of the
( ) a n d tas k Un lock { ) ) to
.
and i ntU nlo ck ( ) o r tas k Loc k
1ock mg (VxWorks int Loc k ( )
. t h e same wa y
.
ope rat es m
l sec tion s com ple tely , w 1c h
.
ica
crit
n
i
ns
ptio
m
ree
p
t
ven
pre
re
a
i
mo
as
opt
uce
mal
rod
1
Prio rity inh erit anc e was
as a prio rity ceil ing pro toco l.
to the lev el
of prio rity i n a cnt teal sect ion only
method that l i m its the amp lificat ion
son , inte rrupt lock i n g and p iorit y ceili ng
required to bou nd inve rsio ns. By com pari

u ratio n of the c nt ICal sect ion, but very
essen tially disa ble all pree mpt ion for the d
r wha t is mea nt by prio rity inhe ri
effectiv ely boun d the inve rsion . To desc ribe bette
d inde fin ite i nterferen ce while
tenc e, you mus t unde rstan d the basic strat egy to avoi
H prior ity servi ce gives its pri
a low prior ity task is in a critical sectio n. Basic aUy, the
not be preem pted by t he M
ority temp oraril y to the L priori ty service so that it \Vi ii
m a l l y t h e L priority
priorit y service s while fi n ishing u p the critica l sect ion- n o r
n.
service restores the priority loaned to it as oon a it leaves the c rit ical sectio The
priority of the L service is tempor arily ampli fied to the H priority to p revent the un

bounded i nversion . One downsid e to priority inherita nce is that i t can chain. When
H blocks, it is possible that shortly a ft er H loans its priority to L, another H + n prior
ity service will block on the same semaphore, therefore req uiring a new inheritance
of H + n by L so that H+n is not blocked by service of priority H + l to H + n- 1 inter
fering w ith L,

wh ic h has p riori ty H . The chaining is com plex,

so i t would be easier to

si m ply give L a priority that is so high that c h a i n i n g is not necessary. This idea

., ..,

..
_

beC'a{lle the priority ceiling emulation protocol ( also known as h ighest locker). The
prior.it}' ceiling protocol is more exacting and was first described in terms of A da
.._

angua$

server and client tasks [ Goodenough88 ] . With p riority ceiling emulati on


w h e the in v r i on occu rs, L i s temporarily loaned the h ighest prio ri ty
e nsunng that it completes the critical section with one p r iority ampl ifica

prqt?cal
0ss tb,
.

t.!.O .

wever '. the ceiling protocol is not ideal because i t a m p l ifies the priority of

er than it ealJy n eeds to be. This overamp lification could cause other prob.
.
u ch a signific ant mterfer e nce to high -freque ncy, high-p riority services.
_
A furthe r re finem e t of p r1ority
ceiling is t h e least - locker p ro tocol. I n lea t

. .
.
locke r, t he nonty of L is ampl ified to the h ighes t prior
ity of all t hose serv ices th a t

can po ent1a ll y requ st acces s to the critic al sectio


n. The o n l y dow nside to least

l c e r 1 hat i. t re ui es the prog ram mer to


i ndic ate to t h e oper at i n g ystem w hat

t
east- oc ker pno nty houl d be. Any m istak e
i n spec ifica tion of the leas t - l oc ker
_
pno nty
may cau e nbo u nde d i nve rsio n.
I n t heory, the p rogr a m me r sho uld k no\''

. .
.
very we l l w hat ser vic e c a n en
t er a gi ven crit ical ectio n and t he refo a lso k no \''
re
.
the orre t le a t loc ke r pri. or ity
.
Th e pro ble m of pri ori ty
mver ion b eca m e famo u with
t h e M a r Pa thfinder
pa ecra ft The Pat h fim d er s p c
a e c ra ft was o n
fi
I ap p r oa c h to Mars a nd wou1 i
need to om plet e a crit ical eng .
me b urn to cap t ure mto
a Ma rtia n orb it wit h i n a f

- '

Scanned by CamScanner

Multiresource Services

1 11

days after a cruis trajectory last i ng m a ny months. The m ission engineers readied

he craft by enabl i ng ew services. Services such as meteorological processing from

instru ments were designed to help determi n e the i nsertion orbit around Mars be

cause o e of the ?bjectives of the mission was to land the Sojourner rover on the

surface m a location free of dangerous dust storms. When the new services were
activated during this critical final approach , the Pathfinder began to reboot when
one of the highest priority services failed to service the hardware watch-dog timer.
Discussed in more detail i n Chapter I I , watch-dog timers are used to ensure that

software conti n ues to sanely execute. If the countdown watch-dog timer is not
reset periodically by the h ighest priority periodic service, the system is wired to
reset and reboot to recover from software failures such as deadlock or l ivelock.
Furthermore, most systems will reboot in an attempt to recover several times, and
if this repeats more than three t imes, t he system will safe itself, expecting operator
intervention to fix the rec u rring software problem.
It was not i m mediately evident to the mission engineers why Pathfinder was
rebooting other t h a n i t was caused by failure to reset the watch-dog timer. Based
upon phenomena studied in this chapter alone, reasons for a watch-dog timeout
could incl ude the following:

Deadlock or l ivelock preventing the watch-dog reset service from executing


Loss of software san i ty due to progra m m ing errors such as a bad poin ter, an
i mproperly han dled processor except ion, such as divide by zero, or a bus error
Overload due to miscalculation of WCETs for the final approach service set
A hardware malfunction of the watch-dog timer or associated c i rcuitry
A m ultibit error i n memory due to space radiation causing a bit upset
Most often, the reason for reset is stored in a nonvolatile memory that is per

sistent t h rough a watch-dog reset so that exceptions due to programming errors,


memory bit upsets, and hardware malfu nctions would be apparent as the reason
for reset and/or through anomalies in system health and status telemetry.
After analysis on the ground using a test-bed with identical hardware and data
playback along with analysis of code, it was determined that Pathfinder might be suf
fering from priority inversion. Ideally, a theory like this would be validated first on
the ground in the test-bed by recreating conditions to verify that the suspected bug
could cause the observed behavior. Priority inversion was suspected because a mes
sage-passing method in the VxWorks RTOS used by firmware developers was found
to ultimately use shared memory with an option bit for priority, FCFS ( Fi rst Come
First Served ) , or inversion safe policy for the critical section protec ting the sha red
memory section . I n the end, the theory proved correct, and the mission was saved
and became a huge success. This story has helped underscore the importance of un
.
derstanding multiresource in teractions in embedded systems as well as design for

Scanned by CamScanner

1 JO

Real- Time Emded Components and Systems

field debuggi ng. Prior to the Pathfinder incident, t he problem of priority in versi o n
was mostly viewed as an esoteric possibility rat her t h a n a l i kely fai l u re scena rio. The

unbou nded inversion on Pa th finder resul ted from shared data used hy H , M , an d L
priority services that were made active by m ission controll e rs d u r i ng final app roac h.

POWER MANAG E M E N T A N D P RO C E S S O R C L O C K M O D U LATI ON


Power and layout considera tions for em bedded h a rdware often drives r ea l t i me
-

embedded systems to designs with less memory a n d lower speed p rocessor clocks.

The power consu med by an embedded processor is dete r m i n ed by s wi tc h i ng


power, short - c i rcuit cu rren t , and cu rrent lea kage w i t h i n t h e l ogic c i rcu i t d esi gn
.

The power equations sum marizing a nd used to model t h e power used by a n ASIC
( Application Specific I n tegrated C i rcui t ) de i gn a re

P.werag.

P' h i 1g

p wi1chi ng + psh or 1-tiru 1i1

swi tch mg

( Sprobabi lit

pleak age

)(Cr ) ( V 5upr1y ) 2 ( fc1k)-d ue

to c apacito r c h a rge/d i s c h a rge fo r

pshort- i r ui1 = t ( S probabili ty ) ( Vurrl y ) ( l\ hon -d u e to c u r r e n t flow


when gates switc h

The term s in the prec ed ing equ at i o n


are def i ned as foll ow

P1eJkage i s he pow er los a sed


u po n t h re hol d vol tag e .
S pr oba b i lity IS the pro ba btl 1ty tha t gat e wi ll
w i t c h , or a fra c t i o n o f
ga t e SWi tc hes
o n average.
CL is loa d cap ac itan ce.
l short is sho rt circ uit cur ren t .
fc1k is the C PU clo ck freq
ue nc y .

The easiest pa ra m ete


rs t o co nt rol a re t h
e Vs u ppl an d th e
qu en cy to re du ce po
pr oc es so r clo ck frewe r co ns um pt io n.
Furt hermore, the mor
e power p u t i n
t he m o e
t i e e mbedded syst

a t gen erat e d . Often real


ems m ust operate
wit
h
i
g
h
rel t a b 1 l tty a n d a t
acttve cool ing is
low cost so that
not pra ctical . C ons
.
I
'd
e
r
an
e
m b ed ded sa t e1 1 I t e
a ta n is n ot eve
n fea sibl e for co
co n t ro l systemo lm
' g b eca u se t h e
vac uu m. Oft en pa
elec t ro nics will
.
ssiv e ther mal con
ope rate i n a
. .
d
u
c
t
ton
'
a
n
d
ra
era
t ures ior em bed
d Ia t ion a re u se
P
d t o co n t rol tem
ded sys tem s M
.
o re rece n t l y , s
been d e Ign
ed wit h
ome
embedd ed systems ha e
p roce ss or clo
v
ck m od u I t i.
Jong with C
s
o
tha t vs u ly can be
PU clock ra t e u
red u ced
nd er the CO nt
pp
ro o i r mw
usy m odes of operat
are
wh en i t 's ente ring less
ion or w hen t h
.
e syste m Is
o verh e at i n g .

Scanned by CamScanner

Multiresource Services

1J1

SUMMARY
Rea l - t i m e e mb edd ed sys
tem s are mo st often des ign
ed acc ord ing to har dw a re
pow er, ma ss, lay out , and cos
t con stra in ts, wh ich in tur n defi ne
the processor, J/O ,
and me mo ry reso urc es ava ilab
le for use by firm ware and software. Ana
lysis of a
single reso u rce suc h as CPU ,
mem ory , or 1/0 alon e is well understood for rea
l
time embedded systems--the subj
ect o f Cha pters 3, 4 , and 5 . The anal ysis o f m u l t i re
sour ce usag e, such as CPU and mem
ory, has more rece ntly been exam ined . The
case of m ultip le serv ices shar ing mem ory
and CPU lead to the unde rstan ding that
dead lock, l ivelo ck, a n d prior ity inver sion
could i n terfe re with a se rv i ce ' s capa bil ity
to execu te a n d comp lete prior to a deadl ineeven when t here is more than s u ffi
cient CPU resour ce. T h e c o m p lexity of muitip le resour
ce usage by a set of service s
c a n lead to scenar ios w here a system o t h erwise determ
i ned to be h a rd
real-tim e safe, can st i l l fail.

EXERCI S E S
1.

I m p l e m e n t t h e RM L U B feasibil ity test presented i n an Excel spreadsheet.

V e r i f CPU

sched u l i n g feasibility by hand drawing timing diagrams and

1 5 , C 1 = I , C2=2, C3 = 3;
:
_
What i s the LCM of the periods and total u t i l i ty? Does the example wo rk.
y

u s i n g t h e s p readsh eet . Exam ple: T 1 = 3 , T2=

T3=

Does it pass the RM LUB feasibility test? Why or why not?

2 . G iven that E D F can provide 1 00% CPU u t i l ization and prov1de s a


.

d1;
=; =3

.
line g u a ra ntee, why isn't E D F always used instead of fixed priority R

3. U s i n g t h e same exam ple from # I ( T 1 = 3 , T2=5, T3= 5 , C

does t h e system work with t e E D F policy? o a o u t


LLF resu lt i n the same or a differen t sched u l mg.

;
.

an

4. Exa m i n e t h e exa m ple code d e a d l o c k . c for Linux from the C D - ROM,


which demo nstrat es deadl ock and fix it.
.
1ch
h
ROM,
D
C
.
the
fro
t

r
e
v
n
i
o_
i
r
p
code
ple
m
exa

e
h
t

e
Exa m i n
S. de m o n s t ra t es priority i n version and se W mdV1ew to show that w1th ut
.
that w i t h
t h e priority i n heritan ce protoco l, the 111vers10 n .occurs; also show
ed.
t h e .priorit y i nherita nce protoc I, t e problem is correct

CHAP TER REFERENCES

n
eil ing Pro toc ol:
Good enou gh, John, and L u . S h a, "T h p r l)' . '
.
A da Task . "
H igh - norit )
A M ethod for M i n i m iz i n g the Bloc k i n g of
CM U/SE I -88 - SR -4 Speci al Repo rt.

I G ooden o ugh88 j

Scanned by CamScanner

1 22

Real-Time Embedded Components and Systems

( Minoura82) Minoura, Toshim i, "Deadlock Avoidance Revisited. " Journal of tht


ACM, 29{4), (October 1 982): pp. 1 032 - 1 048.
( ReveliotisOO) Reveliotis , S. A., "An Analytica l Investigat ion of the Dea dlock
Avoidance vs. Detection and Recovery Problem in Buffer-Spa ce Allocation of
Flexibly Automated Production Systems." Technical Report, Georgia Tech
'

2000.
[Shah90] Shah, Lui, Rangunthan Rajkumar, and John P. Lehoczky, "Pr iority
,
Inheritance Protocols: An Approach to Real-Time Synchronizatio n. , IEEE
Transaction on Computers, Vol. 39, No. 9, (September 1 990) .
[ Vahalia96) Vahalia, Uresh, Unix Internals; The New Frontiers. Pren.tice- Hail
, Inc.,

1 996.

Scanned by CamScanner

Soft Real-Time Services

I This Chapter

Introduction
Missed Deadlines
Quality of Service
Alternatives to Rate Monotonic Policy
Mixed Hard and Soft Real-Time Services

INTRODUCTI O N
Soft real time is a simple concept, defined by the utility curve presented in Chapter
,,
2, " System Resources. The complexity of soft real-time systems arises from how
to handle resource overload scenarios. By definition, soft real-time systems are not
designed to guarantee service in worst-case usage scenarios. So, for example, back
to-back cache misses causing a service execution efficiency to be much lower than
expected, might cause that service's deadline or another lower priority service's
deadline to be overrun. How long should any service be allowed to run past its
deadline if at all? How will the quality of the services be impacted by an overrun or
by a recovery method that might terminate the release of an overrunning service?
This chapter provides some guidance on how to handle these soft real-time sce
nario and, in addition, explains why soft real-time methods can work wdl for
some services sets.

Scanned by CamScanner

1 24

Real-Time Embedded Components and Systems

M ISSED D EADLINES

um ber of way s:
M issed dea dlin es can be han dled i n a n

e as soon as th e deadl i n e is pased


J ermin ation of the ove.rru n n i n servic
.
.
lowing an overru n n m g servtee to con t m ue r u nn m g past a deadl i ne for a
limited duration
Allowing an overrun n i ng service to r u n past a deadline i ndefi nitely

Term inating the overrunning scenario as soon as the dead l i n e is passed is


known as a service drop-o ut. The outp uts from the service are not p roduced, and
the computations com pleted up to that po i n t are abandoned. For example, an
M PEG decoder service would discon tinue the decoding and n o t p roduce an out
put frame for display. The ob servable result of this handling is a decrease in quality
of service. A frame drop-out results in a poten tially d ispleasing video quality for a
user. I f drop-outs rarely occur back to back and rarely in general, this might be
acceptable quality. If soft real-time service overruns are handled with term ination
and drop-outs, the expected frequency of d rop-o uts and red uction i n quality of
service should be computed.
The advantage of service d rop-outs is that the im pact of the overrunning
service is isolated to that service alone-other higher p riori ty a n d lower priority
services will not be adversely im pacted as long as the overrun can be quickly
detected and handled . For an RM policy, the failure mode is l i m ited to the single
overrunning service ( refer to Figure 3. 1 5 of Chapter 3 ) . Quick overrun detection
and handling always results i n some residual i n te rference to other services and
could cause additional services to also miss their deadlines-a cascding failure. If
some resource margin is maintained for drop-out handling, this i mpact can still be
isolated to the single overrunning service.
Allowing a service to continue an overrun beyond the specified deadline is
risky because the overrun causes unaccounted- fo r i n terference to other services.
Allowing such a service to overrun indefi nitely could cause all other services to fail
of lesser priority in an RM policy system. For this reason, it's most often advisable.
to ha dle overrns with termination and li mited service drop-outs. Determin ist ic
.
beh v1or m a failure scenario is the next best thing compared to determ i n ist ic be
havior that uaran tees success. Dynamic priority services are more suscepti ble to
.
cascadm fa ilures ( refer to Figure 3. 1 6 in Cha pter 3 ) and therefore also more risky
as far as impact of an overrun and time for the system to recover. Cascading faiJ
urs make the computation of d rop-out im pact on quality of service harder to
estimate .

Scanned by CamScanner

Soft Real-Time Services

guALITY OF SERVI CE

1 25

Quality of service (QoS) fior a rea1 -ti me system


can be quanti fied based upon the
.
frequency that servICes p rod uce an
mcorrect result or a late result compared to
.
how 0 ften t h ey func tion co
tly. A real- time system is said to be c rect only if it
ces
produ
corr ect resu lts on time In Chap ter 1
,
1 , " H 1g
' h Ava1 1ab'I
1 1ty
'
'
d Rel 1a
' b 1' I 1ty
,,
D sign, the class ic desig n meth ods and defin itions of availa
bility and reliab ility
.
will b ex med . The QoS concept is certai nly related The
.
traditional defini tion
of avai lab1hty of a service is defin ed as

rrc

Availability == MTBF I ( MTBF + MTTR)


MTBF == Mean Time Between Failures

MTTR == Mean Time to Recovery


If a service has higher availability, does it also have higher quality? From the

viewpoint of service drop-outs, measured in terms of frames delivered, for exam


ple, for a video decoder, then higher availability does mean fewer service drop-outs
over a given period of time. This formulation for QoS can be expressed as:

QoS == 1

( Drop-outs I Deliveries)

Where QoS == I is full quality and 0 is no quality of service


So, in this example, availability and QoS are directly related to the degree that

number of drop-outs will be directJy proportional to availability. However, deliv


ering decoded frames for display is an isochronal process (defined in Chapter 2 ) .
Presenting frames for display too early causes frame j itter and lower QoS with no

service d rop-outs and 1 00/o availabil ity. Systems providing isochronal services

and output most often use a DM ( Deadline Monotonic) policy and buffer and
hold outputs that are completed p rior to the isochronal deadline to avoid jitter.

The measure of QoS is application specific. For example, isochronal networks


often define QoS as the degree to which packets transported approximate a con
stant bit-rate dedicated c i rcuit. To understand QoS well, the specific applicatjon
domain for a service m ust be well understood. In the remaining sections of this
chapter, soft real-time methods that can be used to establish QoS for an applica
tion are reviewed.

LTE RNATIV ES TO RATE MON OTON IC POLICY

e ou
The RM poli cy can lead to pess i mis tic mai nten ance of h gh r
3,
sets of servic es that are not harm onic ( descri bed in C h apter

Scanned by CamScanner

re margi ?r
Pro ,e ssmg ) .

1 21

Real-Time Embedded Components and Systftms

. .
Furthermore, RM policy makes restncttve assum ptions such as T=D. Because QoS
.
.
.
should
is a bit harder to na1 1 down categorically des1gners of soft real-time systems
.
.
.
mtgh t better fit thei r applica tion -specific

consider alternatives
to RM pol'cy
that
.
would cause a dead measures of QoS. For example, m Figure ?. , th e RM policy
that EDF or
line overrun and . a service dop out decreasi ng Qo S, but it's evident
.
. resu 1 t .in h ig
' her QoS becau se both avoid th e
LLF dynamic priority pohc1es will

'

overrun and subsequent service drop out.

E u n1Jle 2

Tl

Tl

n
T4

7
13

Ul

05

U2

0.2

U3

0. 1 42857

(4

U4

0 1 53&46

Cl

LCM -

JO

Utot -

09967!!1

---,

R MSc

s 1

S2
S3

fAl.Utll

52

S3

54

13

12

11

10

S4
E DF Sc
s 1
Sl
S3
S4

--

TTD

s1

52

53

11

10

54

L l f 5 c
51
52
S3
54
l uily
51

FIGURE 7.1

x
1

Highly loaded system-RM deadline ove r ru n .

The EDF and LLF policies are not always better from a QoS viewp
oint. Figure
7.2 shows how EDF, LLF, and RM all perform equally
well fo r a given service sce
nario. From a practical viewpoin t, the deci sion to be
mad e o n sche dulin g policy
should be a balance between the imp act on QoS
by the mor e adap tive EDF and
LLF policies com pared to the mor
e pred icta ble failu re mod es and deterministic
behavior of RM in an overload situ
atio n. Thi s may be diffi cult to com pute and
might be best evaluated by trying
all three pol icie s wit h ext ens ive tes ting .
In cases where ED F, LLF , and
RM perfor m equ ally well in a non ove rload sce
nario, RM might be a better
choice because the impct of a fai lur
e is sim pler to
co nta in; that is, there is less
likelihood of cascadi ng servic dr
e op -ou ts given upper
bounds on overrun detectio
n and ha nd lin g.
As s own in Figure 7.3 ,
it's well worth no tin g
.
th at systems de sig ned to
harmonic service
request periods do
equally well wi th ED F, LL F,
and RM. Design
in g systems to be ha rm on ic ca
n greatly sim pl ify
real -t im e sc he du li ng .

Scanned by CamScanner

ave

Soft Real-Time Services

..

Tl
.

T2

T3

15

Cl

CJ

Ul
U2

U3

R M S chedUe
SI

0.3J
T 0.4

02

LCM

--

-I

15

U -i 0 93

127

S2
SJ

E Of S ched!Je

Sl

S2
SJ

no
SI

S2

SJ

IS

14

J
12

L L F S chedUe
SI
S2

luily
SJ

SI

S2
SJ

FIGURE 7.2

f unl*

x
x

2
11

Tl

(I

UI

o.s

U2

0.2 5

025

16

2
y
7

x
2
6

x
x
s

x
5

Three policies and three common schedules.

T2
Tl

LC M

16

M S t
SI
S2

rm
SI

S2
Sl
llf S
SI
S2
Sl
SI

luiy

S2

l .

10

SJ

12

FIGURE 7.J

II

Full utility from a harmon ic schedule.

Figure 7.4 shows yet anothe r example of a harmo nic schedule where policy is

inconsequential.

ng T=D . This
For isoch ronal services, OM polic y can have advan tage by relaxi
r and hold
allows for analy sis of syste ms where services can complete early to buffe
shows a
outp uts to reduce prese ntati on jitter and thereby increase QoS. Figure 7.5
to require ..
scenario where the OM polic y succe eds when the RM would fail due
men ts where 0 can be greater or less than the release period T.

Scanned by CamScanner

1 11

a
.
c om po nents
ed
dd
be
Em
e
Real-Tim

T2

10

T3

0. 4

U2

U tot

QI

U3

10

LC M

0.5

UI

Cl

Tl

E aa f11>1e 5

nd Syste ms

R M S c e
Sl
S2
S3
E Of S c e
S I

S2
S3

TTD
s1

S2
S3

10

Llf S chedUe
s1

S3

S2

S3

Sl

T1

12

3
3

A ha rm on ic service set.

FIGURE 7.4

( H"1"0 6

- ---

52

Laxity

13

Cl

UI

0.5

UJ

O ll8S 7
O I H6A6

U1

,.

UOI

0.)

LCM -

JO

ror OM

01

02

OJ

UICI-

RMOO I

0996703

J
7

IS

R M D2 J

E AIHJ( R
l1'TE R

""''

R M St

s1

52
S!

FIGURE 7.5

OM can work where P.A .dils.

MIXED HARD AND

SOFT REAL-TIME S E RV IC E S

--

.
. . e, and best ef.
Many systems include services that are hard real time, so ft real tim
d
fort. For example, a computer vision system on an assembly line may have h
real-time services where missing a deadline would cause shutdown of the proCtht
being controlled. Likewise, operators may want to occasion ally mon itor wha
computer vision systems "sees.>' The video for monito ring should have good .
so that a human monitor can assess how well the system is working, whether hghte
ing is sufficie nt, and whether frame rates appear reasonable. Finally, in the
system, operators may occasionally want to dump mainte na nce data and have

Scanned by CamScanner

Soft Real-Time Services

12 9

rea l requ i rements for how fast this is done-it can be done i n the background
whenever spare cycles a re available.
The mixing of hard, soft, and best effort can be done by admitting the services
into m ul tiple periodic servers for each. The hard real -time services can be sched
uled within a time period (epoch ) during which the CPU is dedicated only to hard
real-time services ( al l o thers are preempted ) . Another approach is to period trans
form all the hard real-time services so that they have priorities that encode their
importance. E i t her way, we ensure that the hard real-time services will preempt all
soft services and best effort services on a determi n istic and periodic basis.
Best-effort services can always be handled b y simply sched uling all these ser
vices at the lowest priority and at an eq ual priority among them . At lowest priority,

best-effort services become slack time stealers that execute only when no real-time
( hard or soft) services are request ing processor resources.

SUMMARY
Soft real -time services assume that some service releases will fa il to meet deadline
requ i rements. Careful considera t ion should be given to how service deadline over
runs will be handled and how t h is will impact QoS.

EXERCISES
Review pos i x_c lock . c and pos i x_rt_t imers . c contained on the CD- RO M .
Write a brief paragraph describing how these two modules wor k a n d what
1ea t u res o f the POSIX 1 00 3 . 1 b real-time extension s they demonst rate.
c

M a ke sure you not only read the code, but that you b uil d it, 1 oad it, an d
.
.
execut e it to m a ke sure you unders tand how both applica tions work.
mo ul fr VxWor
2. Build, load, and analyze the CD-R OM i t t imer_test
e i t 1mer_ est
and descr ibe how i t works . Speci fically what stat es oes t
. a task conte xt using sp
task conte xt trans ition between i f it's spawned m
I.

i t ime r - t e s t ( )
.
x sw wd . c so ftwar e wa t c h - dog
3 . B u 1 I d , I oa d , a n d run t he CD- ROM posi - c your d e
.
ev1' d ence ior
.
tes. Provide
mon itor co d e an d d escnbe how it execu
es.
crip tion with supp orti ng Win dVie w trac
_

\
Scanned by CamScanner

1 JO

Real-Time Embedded Components and Systems

CHAPTE R R E F E R E N C E S
[WRS99] VxWorks Reference Manual 5. 4, available online a s "windma n" or ha rd
copy, as Edition 1 . WRS, 1 999.
[WRS99] VxWorks Programmer's Gu ide 5.4, Edition 1 . WRS, 1 999.

Scanned by CamScanner

Pa rt

II

Designing Real-Time
Embedded Components

Scanned by CamScanner

Embedded System
Components

I This Chapter

I nt roduction
H ardware Components
Firmware Components

RTOS System Software M echanisms


Software Application Components

INTRODUCT I O N
System design can b e approached in a bottom-up or a top-down fashion. The
bottom-up approach consists of determining the fundamental components that go
into a system. The top-down approach is a hierarchical breakdown of the system
into subsystems and then into components. The top-down approach can be
viewed as a concrete breakdown of the system into smaller parts as suggested here,
but often an abstract top-down approach is useful where the system is broken
down by service and function. This functional top-down approach is described in
Chapter 1 2 , "System Lifecycle. " In this chapter, we first examine common compo
nents of a real-time embedded system in the concrete sense going from the overall
system 9own to components. Familiarity of the components can assist the designer
in making more optimal system design decisions. The design of a real-time em
bedded system can be viewed as a hierarchy of subsystems as depicted in Figure 8. 1
for a real-time stereo -vision tracking system.
I JJ

Scanned by CamScanner

1 J4

d Systems
Co m po ne nt s an
ed
dd
be
Em
e
im
-T
Real

Tilt/Pin

Actultion

Procs1n1

'

'J

""..

'
It

. ,..

t
.
I

Sensor/Actutor 10

FIGURE 1.1

Subsystems in a stereo-vision tracking system.

The stereo-vision tracki ng system has a s i m ple goal-keep a bright

object in

the fiel d of view of both cameras even if the object moves a n d e s t imate the distance
from the camera assembly to the

o bje c

t.

A m ore fu n c t i o n a l service view of t he

same stereo-vision t racki ng system would look m uc h d i fferen t . This is depicted by


the h ierarchy for the same system shown i n F ig u re 8.2.

Stereo-Vision
Tracking

Stereo Ranging

FIGURE 1.1

Tilt/Pan Target
Tracking

Range Reporting

Field-of-View
Centroid

Services i n a stereo-vision tracking system.

The stereo-v isio n trac k g sy tem


s
is an ex a m p l e design that is
m
.
. .
more d et a1 1 in
.
Ch apter 1 5 ' " Co mp uter Y1s
1on A ppl icati ons."

Scanned by CamScanner

Reportint

exam i n ed 1

Embedded System Components

I JS

HAR DWARE COM PON ENTS


The hard ware com pon ents o a rea I -time em e
b dd ed system incl ude a wide range
.
.
mecha n i cal a n d elec trical
compo nents. For examp le, in the stereo -vision t rack
mg system we h ave:

?f

Struc t u ral and mech anica l-ca mera assem bly


Electr o- mech an ical actua tors- tilt and pan servos

Electro-mechan ical sensors ( t ransducers)-none, but servo posit ion sensors


couldbe added
Optical sensors-NTSC cameras
Cabli ng-power, NTSC signal, RS23 2 , CAT-5 twisted pairs
Digital state-machi nes, microcontrollers, and microprocessors-x86 m ic ro
processor, PIC microcontroller
Analog frontend (sensor) and backend ( actuator) circui ts-NTSC, TTL pu lse
width modulation
Networks or bus i n terfaces-RS232 serial, Ethernet, PCI
Thermal m a n agement-CPU fans
Typically, additional test equ i pment hardware may also be requi red, i nclud i n

m o n i tors, development com puting environment, oscil loscope, d igital multimeter,


and a logic a n alyzer. This is not part of the system, however, although the test
equ ipment is req u i red to fully i m plement and verify its proper i m plementation
and operation.
In the following sect ions, basic hardware components such as those used in
the stereo-vision tracking sy.stem are described .

Se1son
Sensors are devices that respond to physical s t i m ulus ( ligh t , heat, pressure,
s tress/stra i n , acceleratio n, magnetis m ) by transformi n g the associated energy into
electrical energy or by modifyi ng the elect rical propertie s in a circuit . For example,
a camera is a sensor that converts photon energy into elect rical cha rge that repre
sents the photon flux for each picture element in an array. A therm istor is a resis
tor c i rc uit where the resistance of the therm istor changes with temperat u re and

therefore so does the c i rc u i t c urrent at a given voltage and voltage drop aero s the
load. A sensor assembly may also i n terface t h is analog fron tend to a digital encod
i n g i nterface. A n alog to digital converter s ( A DC ) are u cd t o sa m ple and hold
charge, thereby c onvert ing the analog circuit c u rrent/volta ge i nto a d igital value .
For example, a n 8-bit A DC will encode a sensor c i rcuit operat ional voltage range

i n to 256 levels. Without encoding, sensors are useful i n analog contro l sy tern .

Scanned by CamScanner

! ,d t
t .

.t

.... .

I f

..
I

. . .,

... ' J

; ;,.. .
. ..

.)

"
.

. .
J

'I

.
,.

'

,,

'

l
.,.. ..

, (.
..

' lJ
i

,,

'

" ' ' ' . J . ('

-:

7r

.. .....
.,

. .

'"

..

i.

..

...,. '

.._. .

, >

' .

r .

'

....

"

...

. .

..

i '

\ ... 1 "' ,,

.
;

.... ,. ,-t

..

. ..

, ,f .: ' " !

1 \l

fl

'> .

'1

'\

' ' \.

I . .. "

. ..

.
.

l.
'
,,.

1
"
.

;' ; . I

..

'

'I ' '

..

'J

.I

.1 .

'

"..

'

.i

., \

l
'
. .

'f'.

l
..

. \

J
' '

.
.

' '

I \

"_, )

' :l

..

'\ \.

..

;
.,. '

'

. )

.
. ...
. .
"

..

"

.), "' .

'

..h f}...

)
}

ill

" ..d 1 I... I

\l
'

'. 1 \ r!'-'

(1 I

ut

1.. U t "
, .

Scanned by CamScanner

..

( K

t'

!H

h..t

i"

p41h

'!'

, h

Embedded System Components

1 J7

The ADC takes a analog input ( voltage and current for a given sensor re
.
1stence) and. conves tt i" to a digital word, most often from 8 bit to 1 6 bit. Change
m sensor res1stence 1 ulti mately measured by change in voltage drop over the cir
.
and normally encodes all inputs
mt. The ADC reqmres a reference ltage, Vr,f.
mto a range of values from zero to (r) , sothat for a SV reference, a typical 1 6-bit
ADC can ecode t e voltage in an AFE into increments representing a change of
.
0.0763 m 1lhvolts with an input range of O to -SV for the AFE signal. The resolution
can clearly be increased by reducing the reference voltage, v r,6 however, this is at

the cost of constraining the input range. This trade-off drives the selection of how

many bits the ADC provides in the encoding-to accommodate large input ranges
and h igh resolution, the ADC must have more bits for the encoding. The final ques
tion is how fast can the AFE be sampled? This depends upon the type of ADC:
Flash: Using comparators, I per voltage step, and resistors

Successive approximation: Comparators and counting logic


The flash ADC conversion speed is the sum of the comparator delays and logic
delay-typically flash ADCs are the fastest variety. The successive approximation
ADC uses c o m parators to determine first whether the input is greater than half the
reference, then to determine whether it's greater than one quarter, and so on until
the LSB ( Least Significant Bit) comparison is made and the signal level has been
approximated successively to the bit accuracy of the ADC-this takes as many
clock cycles as the ADC has bits. One more issue with any ADC is how the input
signal is sa mpled-if it changes significantly during the ADC process, then the re
sults will not be accurate, so ADCs m ust sample and hold the input. The sample
and hold time will add latency and thus reduce the maximum inter-sample fre
quency. Furthermore, the ADC may automatically sample and provide an inter
rupt or F I FO input to a state- machine or m icroprocessor, or the ADC may require
commands to sample the input and then convert it. For high-rate encoding such as
video, a dedicated hardware state-machine typically provides the ADC control and
stores encoded data in a FIFO for transfer to a microprocessor via DMA . Methods
for transferring en coded data are discussed in more detail in the " Firmware Com
ponents" ection .

Aduten

a transducer that converts electrical energy into


ome other fo r m uch as sound, motion, heat, or el ec t ro magn et i s m . The imple t
fo r m of ac t uation i witching . The relay provides a mec h a n ism t hat can be a tu
ated to ope n o r close a sw itch on o m mand from a d igital 1 /0 i n t c rfa e . T h i

F u ndamentally, an

actuator is

o n/off co n t ro l does not provide con t i n u o us output o r i mple variation of output

Scanned by CamScanner

1 JI

Real-Time Embedded Components and Systems

amplitude over time. A servomechanism, or servo, is an actuato r th t co n verts elec


trical energy into mechanical rotation using a motor a n d a control i n terface. Heat
ing elements that are nothing more than resistors can be modulated to provide
heat for a system that requires minimum operating temperatures and to provide
cooling using fans, louvers, or some other form of conductive, convect ive or
radiatie cooling. Digital values are decoded into analog signaJs t.h ro ugh an analog
backend (ABE) for actuation so that a digjtaJly encoded value dnves the vol tage in
the ABE circuit. This is most often done using either PWM (Pulse Width Mod ula
tion) or a DAC (Digital-to-AnaJog converter) so that the amplitude in the AB E can
be drive n by a stream of digital encoded outputs. With PWM, periodic digi tal
ulse ( TTL logic level, for example ) is driven out with a duty cyde that is propor
t1 nal to the desired amplitude of the signal at a given point in time. The DAC pro
v1.ds the propo rtiona l outpu t autom atical ly based upon the last comm an de
d1g1 al output rather than decoding using a digit al duty cycle. Much like ADC send
sor mterfaces, DAC actuator interfaces shou ld be char acte rized by

Type of actuation-on/off or DAC/P WM mo dul ated

Speed of actuation

Accuracy of modulation
t ; for acc urte high- rate actuators, a DAC is req uired rat
he r than
rela
r
a
outpu
t
(
ulse
P
M
Code
Modula tion ) is used fo r
JO an ou tp, utC P
inp :t sampl in g a dnv mg
DA
C
for
d
ura
t
'
ion
an d at va ria ble output
levels.
Actuators can be very unstable an d su ft
o t careful design and potential feedback fr:; ver-shoot or fai. lu re to settle with
f se ns or s Fo r ex am
m1rror can be used to move an opt.
ple, a scanning
.
pick -off mirror on an optical path1 l fi I d o view very. ac cu ra te ly by deflecting
mech anism known as a voice-co 1 flex rough a sm aJI an gle . A n electromecha nicai
mag netJC coils are used to defl ect a m . ure can be driven bY a DAC so that electrofurt hermore, the mirror can be irro r on a rub ber fl ex ure to the left or right
res ed to a previo u pos it
fl exure to spring
back, dam pe ned bytor
ion
by
allo
the
wing

rate feed back co ntrol for such an . the e1 ec ro m agn et1 c c oils . Some of the highana 1og cont rol cir. cuit rather tha actu ato r might be i. mpleme nted as a traditional
. upo n the digital real
tern to provide suc h co ntrol. n . re1 ymg
-tim e embedded sys_

ie

1/0 lat(!rfaces

The 110, in gen eral , to an d fro


as e1t. h analog or digj tal . Jn th e acareal- ti m e em bedded system can be
ed fi rst
an ADC 1s_ req uired to encod an al se of an a l og J/O as seen m t h e pre classifisec
vious tio n..
og i np uts a n d a DA
C' PW M, o r relay inter face
er

'

is

Scanned by CamScanner

I
I

Embedded System Components

1 J9

required to decode d igital outputs when analog 1/0 is interfaced to a real-time em

bedded syste m . M a ny e mbedded systems may actually be subsystems in a much


larger system and therefore may not actually have direct analog 1/0-instead,

many real- t i m e embedded systems have digital 1/0 only or in addition to analog
1/0. Either way, at some point, all l/O becomes digital once encoded or prior to

decode. So, prior t o the ABE or after the AFE, the embedded system simply sees
digital 1/0. The fo r m of t h e digital 1/0 can vary significantly and can be character
ized. as:

Word-at-a-time 1/0
Block 1 /0

Furthermore, the method of interfacing word or block J/O can be

MMIO

Port 1/0
In the case of word J/O, a simple set of registers defines the interface to the

AFE/ ABE and provides status, data, and control for encode/decode. A single word is

written to an ABE for output to a data register, the output is started by setting con

trol bits, and the output status is mon itored using the status register. Likewise, for
an AFE interface, status can be polled to determ ine when new encoded data is avail

able, and samples can be commanded through control and monitored via 5tatus.

The word J/O interfaces req uire significant interaction with t h e real -time embed

ded system and are not very efficient, but do provide simple low rate 1/0 interfaces.

These i nterfaces require programmed J/O where a CPU is involved in each inpu t
and output fo r all phases of the read/write, status mon itoring, a n d control . This is

often not desirable for h igher- rate interfaces where even powerful C P Us wo uld
spend way too many cycles on programmed 1/0 and not enough on processing to

provide services. So, most high- rate interfaces have a block 1/0 interface where a
state-machine or OMA engine provides command and control of the word encod

ing/decodi ng and less frequently interrupts the CPU when significantly large blocks

of data have been encoded/decoded-typically 1 ,024-, 2,048-, or 4,096-byte blocks.

Processor cores traditionally have provided 1/0 through dedicated pins from

the C P U to other devices called //0 ports. An alternative is to save on off-chip i n

terface p i n s by mapping M M I O onto existing address and data l ines i n/out of the

CPU so that 1/0 causes devices to be read. or written in the same address space as
memory devices. Many processors, such as the I n tel x86, provide both port 1/0
and MM IO. When devices are memory mapped, care must be taken to ensure that

the M M U ( M emory Ma nagement U n i t ) is aware that particular address ranges are


being used for device 1/0 so that output data is not cached, wri tes are fu lly drained

Scanned by CamScanner

1 40

Systems
Real-Time Embedded Compone nts and

e d
. rat h er tha n bu ffered and the addre ss range is a Jlowed to be acce-'"I;

to t he d evice
o ften i t 1sn t nec
and
cached,
be
can
ons
.
locati
essa
wit
. hout exception. Memo ry
. actu a l mem o de.v1ces

m
a
ated
upd
fte
r Wri terys

for the CPU to wait for writes to be


dram ed and not each d
r r device 1/0 , all writes should be fully
e
h ave been queue d; 10
a DAC was cache
o
t
t
outpu
an
ple,
exam
for
.
f
,
I
d 1,o r
.
so t hat actuat ion 5 reli"able
ut,
this
outp
ker
wo
spea
uld
ing
ca
driv
use play.
later write -back and this outp ut was
back drop -out.
.
n senso r subsystem fo r
st reo-v 1s10
the
Figure 8.3 shows the comp onent s f the
two
of
ca
NTSC
sts
consi
mer
as
device
and two
stereo -vision tracker. This senso r
ers,
grabb
which
use the Bt g78
PCI ( Peripheral Component I nterco nnect ) frame
encoder chip to acquire and encode the NTSC camera o utput. The data acquisi
tion is composed of an NTSC signal-en coding in terface, a PCI bus data 1/0 DMA
channel, and a programma ble DMA engine. The encoding i s perfo rmed at 30
frames per second for a selection of video encoding formats, i ncludi ng the maxi
mum resolution of 640 x 480, 32 -bit pixels where each pixel is composed of an
8-bit intensity, alpha, and three 8-bit fields encoding RGB. The AFE for an NTSC
encoder uses a PLL ( Phase Lock Loop) to synch ron ize with the NTSC signal to
luminance, CrCb = red and
sample and digitize the signal to form a YCrCb ( Y
blur> chrominance) . The color NTSC signal format was an enhancement to the
origual grayscale television signal format. The basic NTSC signal format includes
525 horizontal traces to illuminate a phosphor screen with 262.5 even scan lines
and 262 . 5 odd with blanking time for signal retracing. The i ntensity of the tracing
beam is modula ted during the even/od d i nterlace d line tracing so that each pix l is
e
illumin ated for 1 25 nanoseconds for 427 pixels/ line and 1 0 microseconds of blank
ing between lines yieldin g a scan line time of 63.6 m icrose conds . The NTSC cam
era produces a signal confo rming to this NTSC stand ard for direct input in to a
,

Stereo-Vision
Sensor

NTSC

Camera Data
Acquisition

Camera
Assembly

Data 10 Bus

FIGURE LJ

Encoder 10

NTSC vision subsystem in stereo- vision

tracking system .

Scanned by CamScanner

OMA Engine

... .

Embedded System Components

141

standard televisio n monitor . The interlaci ng of horizon tal scan lines, that is, trac
ing odd lines followed by even line tracing each frame, reduces flicker at the NTSC
frame rate of approxim ately 30 fps ( 29.97 actual ). The AFE PLL synchron izes with
the scan line by detecting the NTSC sync and blanking levels and then programs
the ADCs to sample the signal for each pixel to encode YCrCb. The YCrCb data is
latched into an internal FIFO memory, and a synchronized DMA engine drains the
FIFO with PCI bus transfers from the encoder to host system memory. The YCrCb
format for NTSC was chosen so that grayscale televisions can display color NTSC
signals by simply using the luminance portion of the signal alone. Most encoders
support automatic conversion from YCrCb into alpha-RGB using linear scaling
formulas based upon characteristics of human vision:
R = Y + Cr
G = Y - 0.5 1 x Cr - 0 . 1 86 x Cb
B = Y + Cb

The relationship between the YCrCb and RGB signals are


Y = 0.3R + 0 .59G + O . l I B
Cr = R - Y = R - (0.3R + 0 .59G + 0 . 1 1 8 )
Cb = B - Y = B - (0.3R + 0 .59G + O . t 1 B)

The OMA engine is a simple procssor with an RISC instruction set that pro
vides control over the encoding as well as the PCI OMA transfer and generation of
host interrupts. So, for example, using the Bt 878, microcode can be written to en
code the 525 NTSC input even and odd lines with 427 pixels each into a range of
formats (e.g., 320 x 240 alpha-RGB or 80 x 60 grayscale) based upon the A DC
sampling rates with instructions to transfer the encoded data and to generate an
interrupt at the completion of each frame encoded and transferred over PCI. The
320 x 240 alpha-RGB frames ( 307,200 bytes/frame) are transferred by the OMA
engine using multiple PCI bus bursts, typically 5 1 2 bytes to 4 KB each. This is typ
ical of a high-rate block transfer 1/0 interface for a high data rate sensor. In this ex
ample, two encoders are bursting video data on the PCI bus simultaneously to two
different DMA buffers in two different host memory address ranges.
The stereo-vision tracker also incorporates a low-rate actuation 1/0 interface
to enable the system to tilt and pan the stereo-vision sensor (cameras and baseline
mount) to follow a bright target that may be moving to keep the target in both
camera fields of view. Figure 8.4 describes the components making up this low
rate actuation subsystem.

Scanned by CamScanner

1 41

Real-Time Embedded Components and Systems

Tilt/Pan
Actuation

Servo
Controller

Servos

Serial
Command

Servo PWM
IF

10

FIGURE 1.4

Ti lt/pan servo su bsystem in


stereo-vision tracking system.

The tilt/pan actuation subsystem uses two servos to p rovide t he tilt and pan
rotational degrees of freedom. The servos are commanded with a TTL PWM signal
( Servo PWM 1/0) generated by a microcontroller ( Servo Controller). The specific
signal generated, and thus position of the servo, can be comm anded by a micro
processor through a m ultidrop serial in terface ( Serial Com ma n d I F ) . The serial
command i n terface is simple and only allows the m icrop rocessor to write ou t
command data l byte at a t ime. This is a typical low-rate i nterface.
Processor Coplex

Almost all modern real-time embedded systems i nclude a general -purpose CPU to
process firmware/software to provide updatable and flexible services by processing
and linking sensor i nputs to actuato r outputs . I f the services that a real-ti me em
bedded system m ust provide are so well known that they can be fully comm itted to
a hardware state machi ne, then perhap s a p rocess or comp lex
( o r set of inte rcon
nected CPUs ) is not needed. Most often , servic es are expec
ted t o change over time,
are not well enou gh speci fied initia lly, or are too
comp lex to consi de r hardware
only im plem entat ions. The proc essor com plex
may be c o mpo sed of the following:

A sing le CP U wit h por t 1/0 and bus


i nte rfac e M M I O
M ult iple CP Us on an i nte rna l bu
s with po rt/ M M I O
M ulti ple C PUs wit h an i n terco nne
ctio n net wo rk and po rt/ M M I O

In the case of ou wor king ster eo- visi


o n syst em exa mp le, a mai n x86 CPU

.
v1de s a plat form for ima ge pro cess i ng
to com put e the cen t roid of the target o bi('\
as seen by the left and righ t cam eras and
enc ode d usin g the Bt 878 PC I -b us NTS

P.r

Scanned by CamScanner

Embedded System Components

1 4J

encoder subsystem. The servo control is achieved using the Servo Controller, a M i
crochip P I C , which comma nds m ultiple servos to tilt/pan the camera assembly
using TTL logic level PWM based upon a serial byte stream command to the con
troller. Figure 8.5 shows the subsystems ( Servo Control and I mage P_rocessing)
that compose the overall stereo-vision system processing to provide the tracking
and ranging services. The Servo Control subsystem uses a digital control law based
upon calculated centroid inputs to tilt/pan the stereo-vision sensors in real time to
keep the target in the field of view and produces a series of servo commands as out
put. The I mage Processing subsystem uses alpha-RGB video fra mes at

maxim u m

rate o f 30 fp s t o compute the centroid of the target a s seen b y each camera a n d the
range to the target based upon a triangulation calculation.

Processing
{x86, PCl-10)

Servo Control

Image
Processing

Servo
Command

FIGURE 1.5

Stereo Range
Estimation

Image
Segmentation

Target
Centroid

Processing subsystem in stereo-vision tracking system.

Prtcesser 11.. 1/0 laterco1aedlo1


For multi-CPU real-time embedded systems, an i n terconnection network is re
quired to enable 1/0 and processing to be distributed. The interconnection network
can be

Simple bus or backplane (e.g., PCI or VME)


On-chip local bus with E I U to backplane l/O bus
A cross-bar on-chip interconnection between CPUs
An off-board network-e.g., firewire, USB, Ethernet

Figure 8.6 shows a taxonomy of interconnection trategies that can be used to


integ rate C PUs and 1/0 in terfaces in an embedded syste m.

Scanned by CamScanner

1u

and System s
Real-Time Embedded Compon ents
ork
Interco n nection Netw

Dvn amic

c rd

B us

Bi

king

Omega

FIGURE LI

.
Poin t-to-Po int

Non -blocking

Arb itra ted/R outed


t!)
F ram e, Sequences, Packe

Fully
Connett
td

Ring

Benes Clos Cross-bar

ion st rategi es
Taxo nom y of processo r-1/0 intercon nect

.
Most embedded system s are m tegra t ed w i t h e i t h e r a s cal able bus archi tecture
l i ke PCI , point-to -point serial l i n ks, or network s.

Bus

lterconaedlon
Many different bus architectures have been used and re b e i n g used for embdded
.
systems. To better understand i n t egrat ion of p rocess i n g a n d 1 / 0 wit
a bus inter
connection, this chapter exami nes t h e V M E ( Versa M od ul e Extension bu a d

n
the PCI ( Peripheral Component I n tercon nect ) bus. The V M E bus has h1stoncally
been a popular and simple bus arh itec t u re used w i t h embedd ed systems; by on
tras t , PCI is an emerge nt b us architec t u re t ha t offers traditio nal paralle l bus inte
gration as well as high -speed serial i n terco n nectio n w i t h PCI
Exp ress.
The PCI bus, i n t rod uced as a replac e m e n t fo r t h
e ISA ( I ndus try Stand ard
Arch itect ure) bus prev alent i n the desk top PC
do main , has evol ved and become
a pop ular l/0- to-p rocessor com plex i nteg
ratio n met hod i n emb edd ed syste ms. A
goal of the first PCI stan dar d, 2. 1 , was to
pro vide a bus wh ere 1/0 ada pte cou d be
rs
l
in terfaced to processor complexes
with p l ug- and - p l ay i nte gratio n. P
l ug and play
provides a stan dar d for PC I con tro
ller s ( ma ste rs) to pro be the
bu s and find devices
afte r the y hav e been added wit h
ut any mod ifi c a t i o n to
the ma ste r int erface. The
PCJ bu s was des ign ed to i nte gr.a te
wit h t h e l e gac y ISA bu s thr
o ugh a n interface
cal led the So uth Bu s. The ma in
PCI co ntr oll er int erface
was cal led the North Bus.
A t t h e tim e tha t PC I was fi rst i n t
rod uc ed , ma ny em be
dd ed system s were itegrated
ing t h e V M E bu s. B ild ing a real -ti me em bedd
ed sys tem ba sed u po n bus in tegra
t 1 o n a l low yst em de sig ne rs to de co
mpose the sys t e m
int o su bs ys tem s wit h int er
face an d pr oc e or boards t ha
t ca n be de sig ne d,
bu ilt, an d tested as u nit s a nd la
ter
i n tegrat e d on a sta ndard i n ter co n n
ection . Bo t h V M E a
nd PC I p rov ide th is mod u
lar ity co mpared to c u tom bac kp lan e
or o n - boa rd int e g
ra t io n . Th e bu s int egra ti o n

Scanned by CamScanner

Embedded System Components

1 45

also provi es a fast signal interface compared to packet or frame transmission net
works (this has started to change recently as yo u ll see later ). Table 8. 1 briefly sum
marizes and compares features of VME-32 and PCI 2.x.
TABLE 1.1

'

Comparison of PCI and VME Buses

Feltlln

Bus Transfers

Target Addressing
Data Transfer
Data Transfer Types

VME ...

PCI 1.s ...

Asynchronous 20 MHz

Synch Clock 33/66 MH1

32-, 24-, or 1 6-bit

Multiplexed 32/64-bit

add ress A32, A24, A 1 6)

Address/Data Bus

32-, 24-, or 1 6-bit

Multiplexed with 32-bit

seearate Data Bus

Address Bus

Word or Block Transfer

Burst Transfer Always with

lim ited to a specific block

Min and Max Length

size (e.g. 5 1 2 bytes)


Device Interrupt

Da isy-Chained Prio

4 Shared Interrupt Lines:

Mechanism

Interru pts

A-D Routed to Programmable Interrupt Controller

Interrupt Vectoring

Interrupt Data Cycle

Map A-D onto Processor

following Interrupt Level

vector, e.g. onto IRQ 0 . . . 1 S


on x86 with directo IRQ to
vector mapeing

Bus Access Arbitration

No Arbitration, Firmware

for multiple initiators

or custom controller must

(masters)

ensure mutex access

Device addressing

Custom-designed MMIO

Built-in Hidden Arbitration

Plug and Play Configuration


Space all0ws firmware to

set an MMIO or 10 base

address at run time


Expansion and form
factors

Custom Bus Integration

No custom expansion, but

on 6U boards with 3U/6U

many standard form

D-shell form factor

factors: Compact PCI,


PC/ 1 04+, PMC. and
standard PC 2.x

Faster options

Scanned by CamScanner

VME-64+

PCl-X 1 .C>a-, 2.0, and PCIExpress

1 41

pone
Real-Time Embedded Com

nts and Systems

n b Us
. suc cessful and a popular integratio
.
it
e
mad
e
av
h
t
a
th
ffic1e
. nt wit h bloc
k
The features of PCI
' h rate and is most e
ig
h
e
om
bec
has
110
t
mos
e
two
Th
(
tion.
sfer
m
ura
ost
are burst tran
d- lay confi g
and pl
ion,
trat
arbi
mp
Co
act
are
s
t-in
m
P
te
CI
transfer), buil
e bedded sys
for realrs
facto
rm
form
fo
ar
fa
used
bo
tor) .
comm o nly
ckabl e sm all
PC/ 1 04 + ( a sta
d
an
r)
to
ec
nn
co
on
e
on
s
cti
an
mglc
ne
on
(D -shell backpl
a c hipset int erc
p ar
po
ry
ve
en
be
o
als
s
I as a chi p int ercon
The PCI bus ha
p pularity o f PC
t
r
o
on
as
re

ms. One
board embedded syste
ich allow for bu s-to - bus PCI
l
n d g s ' wh
b
C

P
of
ion
d finit
nection on board ts the e
Hz bu s eye1 es and up to 64ovt. des 32 - bit 33 _M
pr
ard
nd
sta
2.x
I
PC
integrat ion . The

28 mi llio n bytes per second


ing b an dw1 dth of 1
2
x ' yield
t
d
e
a
d
up
fi
b . ah. o n,
bit at 66 MH z or
ba nd wi dt h giv en ar 1tr
e
t'v
e
ffi
e
Th
ond.
to 5 1 2 million bytes per sec
can tly reduced by bus
ver 1 d will be sig nifi
y
enc
la
nse
po
res
ice
dressing, and dev
500 million bytes per
on t order of 1 00 to
.
st
ut
ead,
ove
col

transaction proto
ams o f 32-bit alphawh t' ch . tran sfers 2 stre
e,
mp
exa
n
s1o
-v1
reo
ste
.
second. For the
bytes per sec.
ond, it requires 1 8 '4 3 2 ,000.
sec
per
es
tim
30
s
me
fra
RGB 320 x 240
u
h
ass
(
ming
idt
dw
ban
ive

available PCI 2. l effect


ond, or approximately 20% of the
gI e, fu11 - resolution ensecon d) A sm
per
es

byt
lion
mil
ct1v
100
is
idth
dw
e ban
e11e
2 6 ,90 1 000 bytes per second' or
.
ld requue
coding ( 525 x 427 ) video stream wou
.
about 27% of PCI 2 . 1 effective bandwidth.
o-vi sion examp e is the Bt878,
stere
the
in
ding
enco
o
vide
for
used
set
The chip
works by fetch MA RISC
now also updated as the Cirrus Stream Machine, and
transfers are 1mt1ated by the
engine code from host memory, so some additional PCI
ry. The stereo
chip to fetch code as well as transfer of frame data to the host memo
have had
vision systems implemented at the University of Colorado using PCI 2 . 1
no problem making use of PCI 2 . 1 for this applicatio n with a 320 x 240 30 fps
band
alpha-RGB encoding. Many embedded applications have more than enough
width available from PCI 2.x. One final important feature of PCI .i s that initiators can

=; i:

r
\
l

ad-

configure targets for a maximum and minimum burst length. The minimum serves
,
as a method to reduce overhead so that targets can t transfer small blocks that
incur high overhead for each bus arbitration and address cycle compared to fewer
larger block transfers. The maximum prevents a target from overusing the bus and

would

provides some fairness in bus arbitration for multitarget systems.


The commercial computin g market, specifical ly high-spe ed networ king,
graphics, a?d databases, have pushed PCI to evolve into a very high bandwidth in
terconnection. The AGP ( Accelerated Graphi cs Port) standa rd was developed as a
specific single-taget expansion for PCI to accom moda te high bandwidt h
.
DAC ( RAM D1g1ta l-to-A nalog Conve rters) used to drive monit ors with high
fidelity graphics. Following PCI 2.x, the PCI-X l . Oa and P CI-X 2.0 stan dards w ere
developed for netwo king and database host bus adapte rs and provide 64-b it 1 33
MHz and up to 64-bit 266/ 533 M H z bandw idth . The theore tical limit of the pC l
bus signaling is 5 3 3 MH z. At this speed, the problem o f skew bet wee n the

Scanned by CamScanner

Embedded System Components

1 47

address/da a lin s is sign ific ant and


req uires careful layout of the bus t race and
s
advanc ed s1gn almg tec h niques tha t ma
ke PC I - X 2 .0 expensive and di ffic ult to im
.
plement, esp ecia lly for bus es that acco
mm oda te more than one target and init ia
tor. Th e dev elop m nt of Gig abit Eth ern et
at 1 G an d 1 O G rates (wh ere G = gigabit /
sec) has help ed dnv e the dem and for hi gh
- ra t e PC I . The PCI - X l . Oa stan dard , at
64-b i t I 33 M H z (just over 1 G B/ sec ) , has
made it pop ular for giga bit Ethe rnet n et
.
work in terfa ces. To supp ort I O G Ethe rnet ,
PCI - Ex p ress was dev el o ped , whic h has
a dras ticall y d i ffere nt s ig n a l i n g and phys ical layer than
PC I or PC l - X . PC f - E xpre s
provides high speed 2 . 5 G serial byte l a nes that can
be ganged up. With the i n t ro
ductio n of P C I - X , t h e stand ard i n t roduc ed an i m portan
t new co ncept-

split

transactions.

In P C I 2 . x, when a device has h igh respons e latency, the bus delays u n t i l t


he
target device respond s. W i t h t h e delay pol i cy, having even one low ta rget on the

bus decreases perfo r ma n c e for a l l target and i n i t iators on the bus. Split t ra n sac
tions el i m inate this delay by provi ding bu ffer queue for writes to t h e bus so that
they can be pos ted by a n i n i t iator and drai ned to a slow ta rget over t he bu when
it's ready . The i n i t iator is not delayed in t h i case as long as the write bu ffer queue
i s not exhausted before t h e data is drai ned to the target. Likewi e, on read , split

t ra nsac t i o n a l lows the i n i t iator to post a read request to the bu , which i n i t i ates the
target read, a n d if the t a rget is slow, a l l ows t he ta rget to negot iate for complet ion in
a later t ra n sact ion-the bu

is freed in the meantime for other t ransact ions. Both

PC I - X a nd PC I - Ex p res are spl i t - t ransaction bus standa rds-t his grea tly i m prove

t h e effective bandwidt h because it does not a l low the bus to be held for arbi trarily

long delay per iods. H owever, t here i s , of c o urse, still arbit ration and addressing
overhead.

Hlp-Speetl Serll lterconnedlo


As t raditi o n a l backplane b u ses have become p roblematic as far as laying o u t ignal
t race and dea l i n g w i t h h igh- peed sign a l i n g and skew ( rates above l 00 M H z ) , ev
.
eral new h i g h - peed erial i n tercon nection standards were i n t roduced :

U n iver a l erial Bu

Fi rewire
P

I - E x p re
igabit E t h ernet

an be u ed f o r rea l - t i m e em bedded
u r eria l/ netwo rk i n terc n n e t i on
n. A fu l l di u 1 n
tern a nd pro ' ide a n a t t ract i e ltern at1 e t hu int 1ratio
e 1 t h i t t.
f al l t h e h igh - rat e . criJ I pro to ol'I i be ond t h
All

Scanned by CamScanner

1 41

ms
Real- Time Embedded Components and Syste

c real -time emb ed ded system s in the past a nd ky


The wide adopt 10n of PC I ior
. g existi
ng syst e m s
n nect io n for s e a 1 m
featur es of PCl- Expre ss ma ke tt a n i n terco
.
.
route d on a board ' on a b a c k p l a n e,
that 1s becom ing popu I ar. PC l - E x p r es s can be
"or short dista nces. Furthe r more, tt s c om posed of
and even out o f a box on a cable ''
.
.

a t 2 5 G for each l a n e w i t h the capa b 1 1 1 t y t o gang up


serial byte lanes operating
stated d es1 g n goa I of PC l - E
. x 1 , x2 , x4 x8 ' an d x l 6 config uratio ns. The
lanes m
.

n. Each
to max1m1ze
the bandwi dth per pm on t t te . m te rcon n e. ct1o
( PC I - Express ) 1s
PCl - E b yte 1 ane 1s
fu1 1 d up lex , allowin g concur rent tra n s m i t and receive at 2 . 5 G.
Given these character istics, we can compare t he PC I 2 . x , PCl - X , a n d PC l - E stan

dards as far as bandwidth per pin:


1 00 M B/ s/ pin
PCI-E: [ ( 2 . 5 G/s/dir x 8b/di r ) x ( 1 B/8b ) J /40 p i n
1 . 58 M B/ / p i n
PCI 2.x: [ ( 3 2b x 33 M Hz) x ( 1 B/8b ) ] /84 pin
PCl-X 2.0 266: [ (64b x 266 M H z ) x ( 1 B/8b) ] / I SO p i n
7.09 M B/s/pin
=

PCI - E clearl r has the advantage from the per pect i ve of i n t e rco n n ec t ion layout
and cab l i ng over PCI 2.x and PCl-X. The compl ica t i o n i t h a t e r i a l byte la ne trans

m ission req u i res sign ificant d igital signal proce i ng on ea h byte lane. L i ke fiber
channel and gigabit Ethernet , PCl - E uses an 8b/ l 0b encod i n g s heme with a link
layer and network layered archi tect u re to ach ieve 2.5
t ra n fer rates. G i ven the de
mands of gigabit transp ort, the cost of t h is d igital ignal p rocess
i n g an d network
stack i mplem entati on has actual ly becom e more fea ible
t h a n t h e cost of t radit ional
bus high- speed layou t. F u rther more , t he a b i l i ty.
to u e h igh - peed erial i ntercon
ne t ion on-c hip, on-board , and off-b oard make
t hese t a n d a rds m o re a t t ractive
than trad i t iona l bus arch i tec t u res. Fina l ly,
PCl - E has been desi gned t o be com p a ti
ble with PCI 2.x and PC I - X from the firm
war e view po i n t , de pite a rad ical l y differ
ent data tran spo rt met hod . The PCI
- E stan dard sup por ts t he a me
pl u g-and - pl ay
con figu ratio n, bur st tran sfer s, in t e r
ru p t s , and all bas ic feat u res of
PCI 2 . x.
The PC I - E i nte rco nne ctio n pro
vid e byt e lan e i n terc on nec t
i o n w i t h a n etwor k
IJyered arc h i tect u re as sho wn
i n F ig u re 8.7 .
The P 1-E tan dar d no t on
ly p rov ide s ign i fica n t l y m
o re b a n d wi dth co mp ared
to P I 2' x an d P I-X , i t al
o
.
pro vid e som e fea t u re to
s up po rt rea l - t i me co n tinu
.
ou me dia wit
h iso chr on al chan ne ls. Th
e i soc hro na l c h a n ne ls
p ro vid e ba nd wid t h
a d l ten y pe r for ma nc e
gu ara ntee for t ra ns po
rt. Th e a d v e n t of p 1 - E , U B,
Fi rew ire, an d 1. ?a b i t Eth ern
et ha s pro vid ed an
a l ter na t ive i n ter co n ne
ct io n ar hitec tur e fi r rea l - t i me em be dd
ed
tern Th e e ne w h 'gh
.
1
peed ena I i n t e rco n n\."C , ! wi l
t i n l ike
l be de 1gn ed rnt o ma
ny fu t u re rea l- t i m e
em bed de d y tern .

Scanned by CamScanner

Embedded System Components

1 41

A<. I PnP l>ri,cr


Transaction

me: :::: : : = :"': ::: :: as;

Link

ec:: : ::::ll!!I

Phy ical

FIGURE 8.7

PCl-Express byte lane network architecture.

low-Speed Serial Interconnection


Many rea l - t i m e embedded y tern

not only i nclude high - rate 1 /0 for services such

as video or network t ra n port , but a l so i nc l ude low-rate command/response o r


monitoring i nterfaces. For example, i n our stereo-vision y t e rn , t he servos are com
manded t h rough a low-rate m u l t i d rop R 2 3 2 i nterface. The M icroc h i p PIC ( P ro
i rcu i t ) ha a TTL logic level digital serial i n terface that can

gram mable I n tegrated

be interfaced to the higher volt age RS232 serial in terface. The servos i n the stereo
vision application can t i lt/pan t h rough wide a ngles qu ickly, but o n l y have an accu
racy of several degrees,

o the c o m m a nd rate req u i red to track a q u ickly movi ng

object a t a d i stance of I 0 feet or more is on the order of I to I OOs of m i l l iseconds. The


command protocol u ed on the PIC is a simple byte stream command format with
opcode bytes and operand bytes. To com mand a given servo to a new position s i m
ply req u i re

ending a PIC addre

( because the serial i n tercon nection i s m u lt idrop ) ,

a servo address a nd opcode byte, followed b y a servo position operand. A command


therefore req u i re 4 bytes, and a t a I m i l l i econd rate, this is only 4,000 bytes/second.
!early PC I , lJ SB, F i rewire, or any of the previously presented h igh-rate i n tercon
nection a rc h itect u res a re not warran ted for t h i type of i n terface.
The R 2 3 _ , common serial, poi n t-to-point data t ran m i
t e rn

real - t i me embedded

ion has been used i n

i nce t h e advent o f t h e industry a n d remains a com mon

low- rate and debug i n terface. The R 232 l i n k normally top out around I I .:WO
bit I ec nd ( abou t 1 2 K B/ ec ) and i not capable of lo ng-di tance tran m i ion due
to l i ne n i-.e at the 1 2 - volt
pro ide i m i l . r I

RS422 : A d 1 ff rcn t i a l

Scanned by CamScanner

it u

med i u m - rate t ran mi

and h igher bit ni t e . The e

l km

igna l i n g level

ther

pt ion

have

ion \..,1 i t h longer d i ta

lved t h Jt

e, m u lt idrop.

pt i n are \ idely u'>ed i n rea l - t i me em bedded .

'

/-

t rn ;

r i J I l i n k a rahl ot I .\1 H/ e a n d d i t n -e' U J t

1 SO

and Systems
Real-Time Embedded Com ponents

din g
Mu ltidrop RS 23 2, RS 42 2: Ad
l ink wit h c a pab il i t y to forward

ress targe ts on a c o rn rnon


a pro tocol to ad d

typ ica lly used on -bo ard t o in.


i n te rc on n ec ti o n
12C: A me diu m-speed dig ita l
RO M to a processor
terc o n n ect ch ips su ch as EE P
ra tes
ap a ble of m ed i u m
SP I: A d i git a l ser ial pr ot oc ol c

ial pro toc ols i s beyo n d the


to mediu m- rat e ser
low
e
h
t
all
of
on
i
uss
c
d
s
i
l
ful
A

scop e of this text.

ltercoanedlon Systems

ect devic es with


that can be use d to inte rco n n
Hav in g disc ussed the com pon ent s
ese compo
syst em, let's brie fly d iscu ss how t h
proc esso rs i n a rea l - time emb edd ed
ion a rc h i tectu re. Rea l - t i me emb edded
nent s migh t be arra nged i n an i nter con nect
ssed m o re fu l l y i n Ch apte r 1 2, but an
syste m arch itect ures and desig n will be discu
archi tectu res are most com
overview will help summ arize t h e poss i b i l i ties. Two
n g a m a i n p rocessor com
mon. The first is the hierar chical netwo rk i nterco n nect i
most popular
ple x with a numbe r of microc ontroll ers. This a rc h itec t u re h a s been
in robotics and also for aerospace applicati ons where m a n y sensors and actuators
are distributed i n a large system, yet processi ng a n d services p ro v i de s ystem - level
functions. For example, a robot ic arm that has 5 d eg re es o f freedom (base, shoul
der, elbow, wrist, and claw) with torque control so t h a t it can handle massive ob

ject , might include a microcon troller to provide cont r o l for each joint motor. The
5

t Crocontrollers,

one at each joint, can interface locally to the actuation motor


with a DAC a1 d provide closed-loop feedback con t ro l based upon local posi tion
a n_d stre s/stram sensors to provide smooth rotat i o n even w h e n the arm is han
dlmg 0 b1ects with significan t mass. Proviq ing t h e local control reduces the a m oun t
of c a bl i_ n g ba c k o the main processo r. The DAC a n d the sensor i n terface cab ling is
routed to t e microcontroll er, which is physic ally i ntegra ted
local to each joint (or
control pom t ) . The microcontroller has more t h a
n suffic ient capab ility to provi de
.
t h e torq ue cont rol and closed - I oop
.
momtonn
g. Now, the 5 m icrocontro l lers ca n
.
be inter f-aced via
a low- to med iu m - rate seria l i n terco e
n n cti o n back to t he main
.
proce so r comp l ex for com ma
nd s an d to provid e stat us-th e main p ro cesso r
.
om p lex
runs com P 1 e x
servJ Ces such as path pla n n mg,
.
poss1'bl y ca mer a - based ob.
Ject recogn itio n a user t r face,
d
st e m hea l t h a n d s t a t u s m o n i t o ri ng . In fac t,
the rob otic s co;n m u n i - k
ens t i
e r a r h i c a l app roac h to t h e h uma n bod'
c
s t
w h i ch i n l ude loc al re fl ex 1c o n t ro i as
well as c e n t ra 1 1ze
. d p rocessing i n the b ra1. 11.

3;.

.
e.g., Rodney Brook!>' ub u m p t 1on
arch i t ect u re
.
The a l tern at ive to the h ie r .a
.
re h I' a 1 111t
r"nr
. erc n n e l ion
is a cent ralized proc r
.
?
c o m ple x i n tegrated on a hi11h
o - rat e lllt er
"t'
o n n.e t 1 o 11 su h a PCI
, wi th m a n )' 1 /0 d t''1"
i n ter face al o i n tegrat ed on p( 1 T h i s
, ar h 1tec t u
ro
re h a a n advan tage in t ha t aU P

Scanned by CamScanner

Embedded System Components

1 51

cessi ng can be do ne i n a sin gle


.
p rocessor comp I ex-t h
.
.
.
e d1stnbuted processing of the
hiera rch ical arc hit ect ure req
uires different developm
e nt, debug, and test methods
com p red to the processor
com plex-often the proce
sor complex is a mic roproces
sor Wit an RTOS, and the
mic roc ont rollers are sim ple Ma
in + ISR applications. The
downside to the cen tral ized
process ing is that most often all 1/0
cabling must come
from the com mo n cen tral process
ing enclosure and be routed to sensors and
actu a
tors dist ribu ted thro ugh out the syst
em. Deciding whi ch type of architecture makes
mos t sense is often driven by the syste
m requirements for actuators and sensors-- the
num ber, how distr ibute d they are, how
much latency in sensor/actuator activation is
allowable, and , of cou rse, cost and com plex
ity.

Meory S1bsystems
Real-ti me e mbedd ed system s require nonvol atile data storage to
boot the system
and to start services . After a power-o n reset, the processors in the processo com
r

plex each vector to a hardware -defined starting address to execute code. This start
ing address, typically a high address such as OxFFEO_OOOO, is designed to map a

nonvolatile storage device, such as EEPROM or Flash memory, so that boot code
can be stored permanently at this address and executed following a reset to initial
ize the system. The boot code init ializes all basic interfaces and normally loads a
basic RTOS so that application services can be loaded and run. The code (often
called text segment ) , data ( i nitialized, uninitialized, and read only ) , heap, and
stack segments must be created in a working memory by firmware. Figure 8.8
shows a typical memory map for an embedded system.

O xFFFF_FF FF
ROM (Flsh)
OxFFFO 0000 .__. -----r:;
OxFf"E( FFFF

1 MByte Boot ROM device


(reset vector address @ high address)

40 1 5 M Bytes unused

Ox0 5 0 0 0000
Ox04 FF=F FFF
Ox 04 00 0000
O x O FF= FFF F

--- -----....,

MMIO

1 6 MBytes Memory Mapped 10


(Device Function Registers)

32 MBytes unused
(space left for memory upgradM)

o c 1.-------
'x0 LO
" x O . F() "Ff f
Main Working Memory for OS/Apps
(e.. g. 32 M Bytes SRAM SDRAM, DOR)

O xO

FIGUIE U

Scanned by CamScanner

' '

.. _...,
. :..._
. .,....._
........,.
system.
Common mem ory map for an embedded

.z)!Wr" tt t'M

...

1 52

pon ents
bed ded Co m
Real-Time E m

and Syste ms

th e viewpoint f
emo ry from
m
of
w
mo
ie
a lo gical v
ess devices . F ro
lly
c
ac
rea
a
n
is
c
ap
re
softw a
The memo ry m
m wa re and
fir
y of storage devic
ch
whi
as a h ierarch
through
ed
ce
rib
spa
ess
esc
r
add
is better d
i nt, m em ory
wpo
vie
e
war
hard
e c on t rol)
d fo r dev ic
pe
ap
m
ry
o
and mem
Registe rs (CPU

Cache
ory
W orking Mem
ory
Extended Mem

hie ra rch
ypical me mo ry
a
t
ws
sho
8.9
Figure

bed ded system.


y fo r an em
Work ing Memory
Instruction

l1

Instruction Cach e

L1

Unifie d Cach e
..

'

C11,1h\''1
Oali!

Data

FIGURE 1.9

m.
Commo n physical mem ory h"ierarc hy for an embe dded syste

FIRMWARE COMPON E NTS

mple
Some co mponents ca n only b
e a I ' ed in hardware, b u t m a n y can be i
_
iz
tly to
fi
re ( o ng that soft w a re inte rfac ing d irec
m ented with s o ftware o r rm
ded
hardware is typically called fi r m wa e . urthermore, i f the real - t i m e e m bed
r
syst em ha s any software-based' se vices or feven J U S t m a n age m en t , fi r m wa e is
need ed to inte rface h ar d wa re res our ces to s0 t wa r e application s.

::
/

Scanned by CamScanner

Embedded System Components

1 SJ

...t Code
The universal definition of firmware is code or software that runs out of a non
volatile device to make hardware resources available for the rest of the appl ication
software. Fi rmware providing this function i s normally referred to i n general as

board support package

( BS P ) fi rmware because traditionally, this firmware has

initialized and m ade available all on-board resources for a processor complex to
softwa re applications. Before the resources have been fully init ialized, th fi rmware
boots the boar<l b y executing code out of a nonvolatile device so that one or more
basic interfaces is made operable and the system can now download additional ap
plication software. For example, the BSP boot firmware might i n itialize an Ether
net interface and provide TFTP download of application code for execution.

Device Drivers
Device interface d r ivers are most often considered fi rmware because they directly
interface to hardware resources and m ake those resources available to h igher
level software applications. The architecture of a device driver in terface is
depicted i n Figure 8. 1 0 and incl udes both a HW bottom half i nterface and a SW
top half interface.
Appl ication( s)

ope clo

e.

read/wnte.

EAGAJN. Block. Data,

creat. t0cll

Status

I f Output

If I nput

Ring-Buffer Full then

Ring-Buffer E mpty then

j SemTake
or EAGAI

l Proce

{ SemTake

or

EAGAIN }
else

else

{ Process and Return l

and Return }

:f

.,

I nput

SemG1ve

Ring-Buffer

..

r R

I lard

FIGUllE 1. 1 1

Scanned by CamScanner

ore

()e, 1

Device driver firmware interface.

1M

Real-Time Embedded Components and Systems

Oper1tl11 Syste Senlces

requ ire a n oper ating s stem as d i sc usse


d in
Not all real-t ime embedded syste ms
ide
a
prov
r,
lay
v
howe
er
o,
o
ns
atio
f
ment
softwar,
Chap ter 3. Most RTOS i mple
.
s
to
acces
sys
n
gai
to
e
ns
m
resou rcts.
that acts a single interfa ce for all appli catio
,
orate a n RTOS which p rovides
Furthe rmore , most real-ti me system s incorp
a
rocessor
p
g
edulin
sch
reso
urces with
framework for resource manag ement and for
nee ed services a nd li bra i
r ts
an RM policy. The RTOS also provide s commo nly
s
and
servtee
mech
anism s pro
used by applica tion services. The most fundam ental
vided incl ude the follow ing:

Priority preemptive scheduler for threads


Thread control block management

I nter-thread synchronization and commun ication (e.g., semaphores and mes-

sage queues)

Basic 1/0 for system debug and bring-up (e.g., serial, Ethernet, LED)

ISR ( I nterrupt Service Routine) installation on in terrupt vectors


Transition from boot to operational state
Timers for delays and blocked thread timeouts
Drivers for basic hardware devices (serial, Ethernet, timers, nonvolatile memory)
Extended services beyond these may be - provided to assist development and

debug of a system:

Cross debug agent


I nteractive shell to view control blocks and system context
Capability to dynamically load and execute code object files
Interface to resource analysis tools (e.g., WindView)

RTOS SYSTEM SOFTWA RE M ECHANI S M S

An RTOS does not need to provide the same wealth of system servi ces as a full
m ul t i user operat ing system such as Linux. The understandin g is that the
bi Jity
user pre fers simple r mechanisms and is willing to take on more respon si
tn
n t . that
a
rt
po
m
i
the application to explicitly manage resources. For this reason, it is
d
progra mmers working with an RTOS code with extra precauti on to avo i urnt
rtan t
wasting debug sessions. Some good coding pra t ices that are even more impo

RTS

with an RTOS compared to Windows* or Linux include the following:

Scanned by CamScanner

Embedded System Components

I 55

wa ys che ck ret urn codes for


an RTOS AP I fun ctio n cal l .
awa re o f stac k size s for each
task created and stac k usage by loca l C variables
and par ame ters passed to ensure
.
t h at stac k overru ns are not causin
g proble ms.
Use too ls suc h as the Tor nad o b
rowser an d WmdV1ew to ensure that you
have not ove rloa ded the CPU .

VxW orks c o m p i led with the MMU Basic


only has mem ory prote ction for the
kerne l code, so check array bound s extra carefu
lly to ensur e that wild writes
are not destabilizi ng code.
o ot use p r i n t f i n t ime-cri t ical code because it introduces blocking that will

s1gmfi cantly change t i m i ng ( instead use logMsg ( ) calls).


Be carefu o n l y to use A P I calls that are documente d as okay to use i n i n terrupt
handlers m you r kernel and l S R code ( for example, you can't receive a message
i n a n I S R , but you can send one) .

Set priorities of tasks according to RM theory and demote standard VxWorks


services i f required.
Write sh utdown functions for tasks that release all resources that tasks create
and use.
Following t he preceding precaut ions will make writing RTOS code easier. The

standard RTOS mech an isms are designed to make mutithreaded real - t ime appli
cations s i m pler. M u l t iple t_h reads need methods for sharing resources, sharing

data, syn c h ro n i z i ng, keeping track of t i me, and receiving notification of asynchronous events. The C D - ROM includes numerous examples of usin g basic mecha
n isms i n the VxWorks RTOS.

Messa1e Queues
The VxWor ks RTOS support s both POSI X and native message queue mecha
pass data between
nisms. Messag e q ueues a re used to synchr onize tasks and to
safe (a partial write or read
them. The e n q ueue and deque ue operat ions are t h read
is not possible ).

and writte n i n block ing or nonBoth types of messa ge queue s can be read
does a read on an em ty
blocking mod es. For block ing mode s, when a task
eued by anot her task allow i n g it
sage q ueue , it block s until a message i s enq
.
t nes to wnte a full mes age queu e 1.n
read and con t i n ue. Li kewi se whe n a ta k
e has oom for the ne\\' m e ae. I t LS
bloc k ing mod e, it is bloc ked u n t i l the queu
than u m g WA I T _FOR E VE R so t hat mddi
wise to et a t i meo ut upper bou nd rath er
in yn h ro n izti on a n d re our e mJn age
nite blo k i ng w i l l not o c ur and erro rs
er t han allu re to mJk e pr gre
ment will be det e red t h rou gh t i meo ut rath

':1e
o

Scanned by CamScanner

1 51

ems
Real -Tim e Embedded Com pon ents and Syst
e a t s k doe s a rea d on an em pty que ue,
For n o n b l oc k i n g m essa ge q ue ues , wh
.
d icat m g t a t n ? mes sages we re avai l .
m
,
N
I
A
EAG
e
cod
r
erro
e
h
t
rned
i t wi l l be ret u
late r. L1ke w1s e, for a non bl oc ki ng
a
. bl e t o rea d a n d that the task sho uld t ry aga i n

N error co d e 1 f 1 t tn es to wr ite to a
J
A
AG
E
e
sam
e
h
t
get
l
l
"
wi
k
tas
m essage queue, a
.
n later, perh aps allow i n g a
i
aga
ry
t
ld
shou
er
writ
e
h
fu ll q ueue . T h i s i n d ica tes that t
read of that queu e to create space .

to enqu eue mess ages w ith a priThe POS I X mess age queu es i nc l u d e a fea t u re
_
are deq ueue d. H ighe r p riority
ority level and to read t h e prior ity w he n messages

fi rst This esse n t ially al lows a sende r to put an im


port a n t messag e onto the head of t h e q ueue. E xa m p le code for usage of message
q ueues i n VxWor ks can be fou n d o n t h e C D - ROM i n t h e VxWorks-E xa m p l es di
rectory . The p o s i x_mq . c shows basic fea t u res o f POS J X message queues, i ncl u d i ng
p r i o r i t i es. The h e a p_mq . c fi l e sh ows how p o i n t e rs t o h ea p al located b u ffers ca n be
u ed ,.vi t h message q ueues for zero copy b u ffers. For t h e heap message queu e, the
messages a re a l ways deque ued

ON THE CO

sender m ust a l ways al locate a n d set the p o i n t e r before s en d i n g i t , a n d the receiver


h o u l d det erm i n e when it is sa fe to dea l l oc a t e t h e b u ffer . M essage queue sends can

be cal led

in I S R

con t ext, but recei e can

in

task

mo s t often used mec h a n is m in

the

never be cal l ed i n J S R context, only

(On t e x t .

Binary Semaphores
The b i n a ry se m a p ho re i s t he s i m p l es t a n d

( ) fu n c t i o n is o ft e n u sed i n J S R con text to u nblock a service


task when data becom es ava i l a b l e a s i n d icated by a h a rdware in terr upt.

RTOS. The

h a nd l i n g

semGive

) c a l l is most often u sed by t asks to w a i t for a server reques t ( new data


a a i l a b le ) o r t o sy n ch r o n i z e w i t h a n o t h e r task. Th e two t a s k s
. c code on th e CO

Th

---

<

semTake (

R O ,\ I p ro v i d es a n

e x a mpl e of tasks t h a t syn c h ron ize ;ac h o t h er using a bin ary


emJphore . C a r e shoul d be taken to set t h e b i n a ry sema p h o re i n it ia l state ( FU LL o r
E M P T Y ) , d n d t h e pro to c o l for un block ing m
ust be selec ted. P rotoc ols for un bl oc kin g
i n c l u de S EM_O_ F I FO a n d SEM_O_P R I O R I TY . For S E
M_O_ F I FO , i f m u l t iple tasks bloc k o n
t h e J m l' e m a p hore , th en t h ey a re u n b locke d i n t h e
orde r t ha t t h ey ori gi n aJlv
a r n. ve a n d bloc ked . I f i n t e a d S E M_O_ P R I OR I T Y
is u sed , then t h e h ig h es t pri o rity

ON THC C O

ta k w i l l be u n block ed fi r t . T h e F I F O prot ocol


en u res fai rnes and the P R I OR I T
proto o l help m i n i m i ze pot e n t ia l p r i o r i t )' i n ver

i o n . F i n a l ly i n a e w h e re m ul ti
p ie ta k m a y be b l o ked a n d i f a l l b l o k i n g t a k s h o l d he e ea sed t t h
u
a
r l
e 101t>
t i m e, t h e s e m F l u s h ( ) fu nct i o n p rov ide t h i fea t u
re.

Wute x Sem apho res

- -

. -

- - .. ..

Scanned by CamScanner

- ..

--

.L

- J

Embedded System Components

1 57

s emG i v e ( ) . A m ut ua l exc lus ion sem aph


.
ore uses the sam e take an d give
.
.
ca II s, b ut 1s
crea ted wit h s emMC r e a t e ( ) m VxWor ks. F
or

.
protoco 1 s can be specified :

m utex sem apho res, three unb lock ing

PR IOR ITY :
F I FO:

U nbl ock s h igh est prio rity task


firs t.
Unb lock s in the sam e ord er task
s arri ved in for fair ness .

I NVERS ION_ SAFE :

I mple men ts prio rity inhe ritan ce desc ribed in


Chap ter 6.

M utex semaphores should be used to i mplement re-entrant functions when


these functions need to access global resources. I n itial condi t ions are i m portant
and most often m utex semaphores are initially set full to allow initial access to crit

ai

ical sections. The p r i o_ i n v e r t . c code shows an example of using an inversion-safe


mutex sem aphore i n VxWorks.

Software Virtual Timers


Most e mbedded ha rdware systems i nclude several hardware interval t imers. A PIT
( Program mable I n terval T i m e r ) can be set so that it will generate an interrupt on a

periodic basis, often I , I 0, or I 00 milliseconds. The I SR handling the hardware PIT

interrupt ( I RQO on the x86 archi tecture ) updates virtual ti mers, which keep track

of the i nterrupt count ( cal led ticks in VxWorks ) . All t a s kDelay ( ) calls, t imeouts
specified, and calls to t i c kGet { ) are driven by the PIT i nterrupt rate. If the. basic

rate is set to 1 m il l i second with sysCl kRat eSet ( 1 oo o ) , then t i m i ng accuracy for de

lays and t imeouts will be within 2 mill iseconds. It is possible to have just missed a
tick expirat ion when a t i mer is first set and also possible to overrun at least one tick

before a t imeout handler is i nvoked. The PIT can be set to raise ti mer interrupts
more frequently than 1 OOOx per second, but this starts to require significant CPU
time to maintain virt ual time at h igh rates. The CD-ROM includes three examples
to demonstrate the use of virt ua l ti me, i ncluding i t imer_t e s t . c, posix_c l o c k c,

and p o s ix_rt_t ime rs . c .

S.ltware Sl1nals
t. They
Software signa l s can be thou ght of as the software equiv alent of an int rrup
ta k con
a re the m ain mec han ism p rovi ding asyn chro nou s hand ling of events m s
by ISRs . The
text. A task can set itsel f up to catc h sign als thrown by othe r tasks or _
.
POSI X rea l C D - ROM mcl udes r t - s i g n a l -t e s t . c , w h 1' ch d emon st ra tes the use of
.
ts loss of s1g ven
pre
'
ch
1
I
h
w
s,
s1gna

tt. me sign
most
.
. als. These signals queue, unl ike
.
the process o f han d i .mg a
.
m
I
tas k is
'
n a s 1 f a signal is th rown while t he catc h mg
previousl y throw n signal .

Scanned by CamScanner

1 51

Real-Time Embedded Components and Systems

SOFTWARE APPLICATI O N C O M PO N E NTS


Software components represent the most easiJy up dated a nd flexible i mJe
menta .
.
t1on o f services m a real - t 1m
' e embedded system. The service state mach in e can be
.
coded in an RTOS framework readily using tasks and task synchro n 1zat
Jon m ech
anisms, i ncl u <;i i ng binary semaphores , m utex semaphore s, m essage q ueu
es, ring
buffers, and ISRs .

Application Services
Application services

are softw are i m ages load ed after boot a n d a fter so


me fo rm of
RTO S is func tion al to prov ide speci fic
services. The se serv ices a re sim ply
software
implem enta tion s of service stat e mac
hi nes and exe cut e cod e wit hin the
context of
a task.
Services m ust oft en be syn chr on
ize d wit h eac h oth er o r ISR s.
Th is is normally
acc om pl ished with a bin ary
sem ap ho re . Th e b i n a ry sem
ap ho re blo ck s the calling
task un til the semaphore
is given by a n ISR o r an ot
he r tas k. So, i f a task cal
ls sem
T a k e ( S ) where S= o,
the tas k is blo ck ed a n d
en ter s a pen d i ng sta te
u n ti l an other task
or I SR uses s emGi ve ( S )
to set s 0 1 . W he n s i s se
t, t h e n a l l ta sk s bl oc ke
are un bl oc ke d i n q
d ( i n a qu eue)
ue ue or de r on e at
a tim e ba se d up on
SEM_Q_FIFO option . Som
c re at io n w ith the
e RTOS fram ewo rks
, such as VxWorks
F l u sh ( S ) , whIC
, pro vide a s e m .
h unb locks all task
s presen tly bloc
ked on s no matt
have que ued on s .
er how many
The follo win g exa m
ple,
"two tasks," pr
tive exa mpl e of the
ovi des a si mple instruc
use of a b i n ary sem
aph
ore
by
two service task
RTO S fram ework . The
s in the VxWorks
two tasks C code
is
# i n c l ud e
# i n c l ud e
# i n c l ud e

" v x wor k s . h "


" s em L i b . h "
" sysLib . h "

SEM_ I D s y
n c h_s em

n t ab o r
t_t e s

int ta k e
_c n t

in t give
- cn t

vo i d t a s k_
a(v

t = FA LS
E;
=

O;

0 J.

o id )

int cnt = o
J

wh i l e ( J a b o r t
_t e s

t)

t a s k O e l a y ( 1 00
0) ;
f o r ( c n t =O ; c n t

Scanned by CamScanner

<

1 00 00 000

,. c n t + + ) ;

Em bedded System Com

ponents

1 51

s inG iv e ( s y n c h_sem ) ;

g ive-c n t + + '

void t a s k_b ( vo i d )

int c n t = o J
while ( ! a b o r t _t e s t )

for ( cnt=O ; cnt

<

t a k e -c n t + + '

1 0000 000 , c n t + + ) .
'

semT a k e ( s y n c h - sem ' WAIT _FO REVE R ) ;

t a s kD e l a y ( 1 000 ) ;

void t e s t_t a s k s ( vo id )

s y sC l k R a t e S e t ( 1 000 ) ;

s y n c h_s m = s em BC r e a t e ( SEM_O_F I FO , SEM-FUL L ) ''

I * re c e i v e r r u n s at a h i. g h e r p r i o r i t y than t he sender * /

i f ( t a s kS pawn ( " t a s k_a " ,

10,

o , 4000 , t a s k-a I o ' o I o , o , o , o , o , 0 ,

o , 0 ) == E R R O R )

p r i n t f ( " T a s k A t a s k s p awn f a iled \ n " ) ;

e l se
p r in t f ( " Ta s k A t a s k s p awne d \ n " ) ;
.
,
if ( t a s kS p awn ( " t a s k b " , 1 1 , O , 4000 , t a s k_b

0, o, o, o, o, o, o, o,

O , 0 ) = = ERROR )

\n" ) ;
p r i n t f ( " Ta s k B t a s k s p awn f a i l ed

e l se

p r i n t f ( " Ta s k B t a s k s pawn ed \ n " ) ;

t a s k _ a ( ) and
k B wi th en try po int
The tes t_ t a s k s funct ion s paw ns tas k A an d tas
riority 1 1 . Th e
her tha n Ta k B p
hig
is
t a s1< b ( l
ich
wh
O,
I
y
Task A is ass ig ned p rio rit
wil l prtt pt
sk A is spawned. i t
Ta
en
)
Wh
=
1
(
l
h r
i n itially set ful
op and give
p o e synch _ seM is
en exec ute a lo
th
nd
)
.
a

B , b u t will delay for I second ( t , 000 t ick


pl<' being
wil l uecut e dr>
B
k
Tas
lly.
i nitia
Yn c h _seo.
Because Ta k A yields t he CPU
_

;;

Scanned by CamScanner

,.

1 60

s
on ents an System
Re al-T ime Em bed de d Co mp

h_se m. On the first exec ut


. .
ion o
loop and take . sync
its
t
e
cu
exe
.11
w1
d
an
f ,
c
lower pnonty
l
1or
ay
sec
e
1
d
o
then
n
d
will
B
t
sk
h
Ta
e
tt.
and
" \4 te
sfuJ ,
.
Task B ' the take w1.11 be succes
e
y
k
b
ta
Task
the
g
n
i
B,
w
T
Follo
a
t
s
k A w;
) at this poin .
"1
ll

,
o f syn ch sem WI 11 be emp ty ( =O
h
e

n
wit
its
o
d
d
s
e
t
1
I
a
er
y
eth
wh
a
n
B
d
pt
m
ree

b
p
11
.
.
have come out o f. de1 ay an d WI
. usy .
t a I tern at1o n will
.
stnc
this
and
.
sem,

h
sync
con
give
l
wil
.
A
.
.
or not-at t h is pom t T ask
. cnt1cal
ts
y=O)
empt
r
as
o
w
1
=
.
l
l
u
ell
f

(
sem
a
h
s
the
1 con d 1't '10n of sync .
tmu e. The m 1tia
s
e
task
two
h
t
to
for
ible
de
poss
a
is
It
dl
0c k 1r
.
o f Task A and Task B
n t 1es
pno
rel at1ve
.
. ,
nate
s
lter
as
a
way
l
s
a
it
p rovided
The exa m ple will
.
the initi al con ditio ns are diffe rent .
yield the CPU with a task dela y call nd was as.
here, but if Task A, the giver, did not
B wou ld n ever r u beca use 1t ca n 't pre
signed higher prior ity, it's poss ible that Task .
_
_
t deadJ k either because circ
ular
empt Task A. This system as presented here w1 I
s.
ition
cond
wait is not possible given the prior ities and t he 1mt 1a l
.

Reentrant Application libraries


Code shared by mult iple threads of execution, as is often the case with application
code, must be reentrant. Reentrant code is able t o be i n terrup ted a n d preempted in
the exec ution context of one t h read and t hen executed i n t h e context of a

new

thread without side effects that would ca use either t h read t o s u ffer functional bugs.

So, reentrant code m ust carefully handle global resou rces a n d p rotect them

so that

they are mutually exclusively used b y m u l t i p l e t h reads. T h e fol lowing are the four
main methods to ensure that global data i s either protected o r converted into task

specific context data:

Prote tion of data with use of intLoc k ( ) a n d i n t U n l


o c k ( ) t o ensure that pre
empt io aroun d globa l data acces ses is i m possi
ble a t t h e I S R and task level.
Prote tion of data with use o f t a s kLoc k
( ) a n d t a s kU n l o c k ( ) t o ensure that pre
en:ip ion aro und glob al data acce sses
is i m pos sibl e a t t?-le task level.
Elim i nati. on of glob al data with
task var iabl es so t ha t d a t a is n o l o nger shared
but ow ed by a task con text
a n d stor ed i n t h e TC B ( Ta sk C o ntrol Block) .
Protec tio n of glo ba l d
d
h use o f. s emMC r e a t
a t a Wit
e ( ) t o estab lish a m ut ex sern
P hore and s emTak e
nl b
( ) a n d s em . e ( ) to wrap
Gi v
t he c r i t ical section s wh ere
. .
.
dat 1 5 ma nip ula ted

wit h mu ltip le i n str uct


i o n s t h a t c o u l d otherw ise be int
r u p ed or pree mpt ed.
U e of stac k dat a
only ( C p a ra met e rs
li ni
and fu n c t i o n loca l s ) so tha t ac h 'al '
e
tas k h as its
.
own c opy of the
data .

&>0

A n y of these global dat


'
a eli m m
tit'\J'l.
. atio
. n o r pro tec t i on m
et hod s will mak e fu. 11'1 . ,... .
th read afe o that they a
,.
re re -e n t ra n t
a n d can be used by m
re n t ' l
r
u
nc
co
e
u
l
t
i
pl
t ive t hreads . On e of t he bes
, !l
t a n d sim ple
sa 1 e 1. .
s t ways to
d
ea
thr
m ake a fu nctio n
reca ll that gJobal <lata can h
I' .
1

e C

Scanned by CamScanner

l fTI J0 3 f pfi h"

.-

I.:_

. .

_. _

_ __

t.,

fn

C, l

'

Embedded System Components

I &I

variables and parameters are maintai ned in stack memory. Every VxWorks task
must speci

fy stack space when t a s kSpawn ( ) is called. I nsufficient stack space and de

claration of la rge arrays as C locals can i n t roduce bugs. However, stack only vari
ables i n functions i m plements a pure function that is thread safe.

comun lcatin1 and Synchronized Applications


Applica t ion code normally requires m u l t iple services to synchro nize and to share
data or global resources. As discussed in the previous chapter, a region of memory
can be shared by two tasks and updated or read in a critical section using a mutex
semaphore-the m u tex semaphore guaran tees m utually exclu sive access to the
shared memory by one task at a t i me in spite of the possibility of preemption of one
task by another. A h igher level abstraction of this is the message queue ( see Figure

8. l l ). The message queue provides a buffer shared by two tasks a n d allows each task
to atomically enqueue or deq ueu e

message buffer. The atomic enqueue and de

queue operations a re i m plement ed using a critical section.

FIGURE 8. 1 1

Message queue communication between tasks.

A crit ical section simply requires that other tasks ( services) are not able to pre
empt S1 or S2 while they are in the middle of copying a message from their local
memory i n to the global message queue buffer. The underlying i mplementation can
use mutex semaphores to prevent more than one task from entering the critical sec
tion. O t her methods that can be used include t a s k L o c k , t a s k U n l o c k , whereby t he
scheduler is actually d isabled around the critical section ( no preemption is possible
at all ! ) . Finally, i t ' s also possible to use i n t L o c k , i n t U n l o c k to ma k out i nterrupts
enti rely in a critical section-re call t h a t preemption occur t h ro u gh an i nterrupt or
an RTOS system call (a y ie ld i n side a c r i t ical ect io n i an erro r ) . The t a s k L o c k
app roach works, b u t prevent a l l ched u l i ng d u ring t h e critical ect ion and there

fore ha nega t i ve i m pact on t he R M pol icy. Likewi e, the i n t L o c k approach work a


well because it d i sable ched u l in g ha nges i n addit ion to ma king i n terrupt a n d
pot enti ally m issi ng events a oc iated w i t h t h o e i n t erru p t . The ll er of the me age

Scanned by CamScanner

1 &2

Real-Time Embedded Components and Systems

pJe me n tatio n ,
be con cer ned wit h t h e i m
b ut
que ue, however, doe s not need to
b
.
pta
)
e
]
em
pre
Th e i m ple.
are ato mic ( n n
.
rath er tha t the enq ueu e and deq ueu e
ide ally not d isab le sch edu l mg and shou ld
me nta tion of the message que ue sho uld
prevent unb ounded prio rity inve rsio n.
.
ant ics for the enq ueue a nd
sem
ple
sim
has
,
ted
crea
e
onc
The message que ue,
NO N BLO C K I N G . The mes sage qu e ue
deq ueue that can be eith er BLOC K I NG or
queu e lengt h , a n d BLOC KI N G or
is created with a fixed message.s ize, a maxi m u m
is c reate d B L OC K I NG , writers to a
NON BLO CKIN G sema ntics . Whe n the queu e
a mess age. Likew ise, readers
ful l queu e will be block ed when they try to enqu eue
an empty queue. With
will be blocked when they t ry to deque ue a messa ge from
ueue will sim ply
the alterna tive NONB LOCK ING seman tics, the writer to a full q
be returne d an error code, EAGA IN, indicat ing that the enque ue was not success
ful. Likewise, the reader of an empty queue will be returned EAGAIN indicating
that there was nothing to dequeue . A service using the message q ueue i n a NON
BLOCKING mode may want to do other work and attempt to enqueue or dequeue
again at a later time. The BLOCKI NG semant ics are used when the service has
nothing else worthwhile to do if it can't enqueue or dequeue s u ccessfully.

One downside of message queues is that t hey require a copy of the S I local
buffer in the global message queue bu ffer-thi s is not so efficient . A variant use of
message queues that improves efficiency is the heap message q ueue. I n this case,
pointers are sent as messages rather than data . The poi n ters are set to point to a
buffer allocated by the sender and the pointer received i s u sed to access and
process the buffer-the receiver normally frees the b u ffer . It is key that the sender
allocate the buffer and the receiver deallocate to avoid exh a ustion of the associated
buffer heap ( a pool of reusab e buffers ) . Figure 8. 1 2 shows a heap message queue.
The heap message queue avoids the copy otherwise required , which takes consid
erable CPU time and wastes memory due to double-bufferi n g of the same data.

S 1 Allocates Message
Buffer i n Message Heap

S2 Processes the
Message Buffer
Data and Frees the Buffer
for Re-use

RGURE 1.1 J

Scanned by CamScanner

Heap- based mes


sage que ue comm uni
cation between tasks.

Embedded System Components

1 IJ

Because t he buffe r heap associated with the message queue is typically much larger
than the quee depth ( si ze ) , normally the queue will become filled with pointers
and block wnters before the heap is exhausted. As long as the sender always allo
cates heap and the receiver always deallocates, the heap message queue works
safely and m u c h more efficiently.

SUM MARY
A real-time embedded system is composed of hardware, fi rmware, and software
components. Services can be i m plemented in hardware, firmware, or software, or
some com bination of the th ree. Component design should be completed so that
components can be tested as i ndividual units and then i ntegrated into a larger sys
tem design.

EXERCI S E S
I . Read the following chapters i n PC! System A rchitecture by Anderson and

Shan ley: 1 , 2, 1 7, 1 8 , and 1 9. This should give you a good overview of the
PCI design a n d how to write code to probe for PCI devices and configure
them. You'll also fi n d Sections 3. 1 to 3 . 3 and Section 3.9 of Chapter 3 from
the VxWorks Programmer's Guide useful.

2 . W rite a VxWorks ISR that calls t ic kAnnounce every tick and does a semGive
on a global b i n a ry semaphore every N ticks of the system clock, where N is
a global variable so that the semG i v e frequency is adjustable via the shell.
Now, write a VxWorks task that does a s emTake and i m mediately updates a
global counter to record virtual t icks. Write a driver program to test both
the ISR and task and provide output, which provides evidence that you got
i t worki ng. So, fo r exam ple, if you adjust N to 1 000, on our system, your
count should increase by one every second. Note that you are replacing the
VxWorks virt ua l timer I S R, so you must call t i ckAnnounce so the kernel
can st i l l keep track of time!

3. Now write an abstracted top-half for yo u r driver which incl udes all the
standard d river entry points ( open, read, write, and close) and blocks a caJ l
i n g task on a read until N t icks have elapsed, a t which t ime, i t stuffs t h e read
t i me structure, i n c l ud i n g second and m i l l i econds. V\' rite a
a
test d river th at opens the abstracte d device and read fro m it i n
'' e t h
p r i n t i n g the t i me in seco nds and m i l l i second s-pr ovide evide
.

v "tua l tim e
work ing. A write to your driver hou ld al low for a reset of th
it to zero
value main taine d by your driver ( i .e., writ i n g anyth ing reset
bu ffer with

'?.P

Scanned by CamScanner

1M

Real-Time Embedd ed Components and Systems

that can be used to fin d


4. Write a PCI devic e probi ng function

devi c es

on

your co d e t o fi nd t h e c
PCI bus o and dete rmin es vend or ID. Use
i rrus
.
' dge ( N B ) c h ip
h
n
B
Nort
ntel
I
the
and
-s
e
adapters
t i n th
logic PCI video
works
ese
h
code
e
d
r
you
vi
ces will
lab and provide evidence that
add1t1ona
l devi ces as W
found on every target-some targets may have
ell
.
5. Write a PCI probing function to determi n e the c o n figura tio n of the
including latency t i ing and the arbitration control. F i na lly, determ
in
whether the N B will allow memory access by m asters other tha n the m .
a1 n
.
CPU. Demonstrate that your probe works and descnbe your probe
put. Finally, describe how the N B/SB i nterface i n PC! allows for s hare
In
terrupts ( P<;:i and I RQ).

b:
N

;ut.

CHAPTER REFE RENCES


[ Sha n ley9 9) Sh ani ey, Tom , and Do n
A nde rso n , PC! Sys tem Arch itectur
.
e 4th ed
Add1son- Wes ley, 1 999 .
'

Scanned by CamScanner

De bu gging Co m pone nts

In This Chapter

I nt roduction

Excep t i o n s

Checking Ret urn Codes

Assert
S ingl e - Step Deb u gg i n g
K e r n e l Sched u l er Traces

Test Access Ports

Powe r - O n Self Test and Diagnostics

Trace Ports
External Test Equipment
Applicat i o n - Level Debugging

INTROD U C T I O N
Debug ideal ly i ncludes hardware a n d software support i n t h e fo r m of m o n i t o r i n g
a n d con t rol so that e r r a n t c o n d i t i o n s c a n be reproduced and analyzed s o correc t i ve
ac t ion can be taken to elim i nate bugs. Correction

can be made with

oftware or

hardware mod i ficat io n to el i m i nate the errant behavior detected by a debug


mon itor. A debug m o n i t o r is any code or hardware state m ac h i n e or i n terface t hat

a l low t he u e r to ob erve erra n t , u n e x pected deviat ion - from de igned behavior.


T

dete t and defi ne a bu g, the y tern de i gn er mu t first 'lave a clear

orre t be h a v i o r from a de ign pec i fi a t i o n . I n t h i

ware a n d h a rdware me h, n i m

viewed . I t would be i m po

ihl t

both bu i l t - i n t
i n l udc a

th

h ap t er ,

n ept

m m n d e bu g oft -

tern and e ternal, a re re

rn p rehen sive re iew i 1 1

ne

ha p t e r .

the g al of thi
haptcr i4\ to J rm you \ ith k nowlc:dge o t ha t at the very least ,
you 'l l know wh;:t q ut: t ion to a k an t h w to fu rther reear h dcbu J m t hod .

Scanned by CamScanner

1 15

1 11

Systems
Real-Time Embedded Components and

S -N
IO
-- --------------------EXC
--..
EPT

------------------ --

ctio ns to be cal led by oth er code m od ul


Code is oft en dev elo ped as a set of fun
en code is des ign ed for use by ot h er s, it
or appl icat ion s i n a larger sys tem . Wh
t a fun ctio n oe s not ass ume tat it will be
imp orta n t to program defens ivel y so tha
.
.
app hca t10 s call mg library func
for
se,
ewi
Lik
s.
ent
um
arg
d
pas sed o n ly exp ecte
code and not ven fied at the source level
t ions that are perhaps l i n ked i n as object
a n i lleg al inst ruc tion , atte mpt to decod
it's pos sible that this code mig ht perform
er of othe r erra nt beh avio rs. I n gen
a bad addr ess, overflow its stac k, or any n umb
shor tcom ings, but during devel
eral, matu re code bases will not suffe r from such
use exce ption hand l ing and
opm ent, i n tegra tion, and test, it may be usefu l to

s.
program asserts to isclat e erran t funct ios a n d code block
i
One of the simple st and most comm on m stakes that can cause almost every
i
microprocessor to genera te an except ion is to divide -by-ze ro. D vi sion by zero
causes an overflow and an u ndefi ned arithmet ic result. I n VxWork s, the ker nel in
cludes default exception handling, which suspends the task that caused the excep

tion and pri n ts out debug i n for m a t i o n to assist w i t h l oc a t i ng the cause of the
exception.
For example, the followi ng code will cause t he d ivide-by- zero exception on the
VxWorks simulator and if compiled and run on L i n ux . Nobody would ever write

code t i ?bviously wrong; however, if the denomin a tor is com puted and da ta dri
ven, d1v1s10 n by zero 1 ght not be so obvious. For the purpose o f understandi ng
.
how except10ns
work, 1 t s also one of the easiest to force on all processors.

# i n c l ude " s t d io . h "


int d i v e r ro r ( vo i d )

{
}

return 1 / 0 ;

Run on the VxWorks sim u l a t 0.r, t h


e output to the windsh ell indicates that the
tas k jus t spa wn ed cau sed a n exce
ption and the susp ende d task I D noted along wi.th
the program cou n ter an d t h e tar
get sta tus reg iste r.
- >

s p d i v e r ro r

t a s k spawned :
value

id

1 776 4 1 92

1 0f Of60 ,

O x 1 0f 0 f 60

E x c e p t i o n num be r o :

Scanned by CamScanner

Task :

name

S1U1

Ox 1 0 f O f SO ( s 1 u 1 )

Debuggi ng Componts

1 17

0 1 v id e E r r o r
Prog r am Coun t e r :

O x O O f 4 53 68

Status

O x 0 00 1 0 2 4 6

Re g i s t e r :

408a0b

vx TaskEnt ry

+47

d 1' v e r ro r

(0,

O,

o.

o,

o,

o,

o.

o,

o,

0)

P ro b i ng a l i t t le d e e p e r wit h t h e Tor nad t ol clu mp


i n g t he tJsk tate_ in t h
wind hell reve al that t h e reate d ta k from t h e s p omm and
ha been put i n t t he
u pended tale by t h e V Work
c h e d u ler:
->

i
NAME

ENTRY

PAI

STATUS

PC

SP

tExcTask

-e x c T a s k

1 1 08 d e 0

0 P E NO

406358

1 1 oac 0

t L og T a s k

_l o gT a s k

1 1 0 32b0

0 P E NO

408358

1 1 03 1 b0

t Wd b T a s k

- wd b T a s k

1 0 f e668

3 R E ADY

408358

1 0 f e5 t 8

s1u1

-d i v e r ro r

1 0f 0 f 6 0

1 00 SUSPEND

f 4 5368

t 0 f0ed8

va lue

OxO

p t i o n debug i n fi rmation t o the w i nd h e l l or o n ol al n g w i t h

Dumping e
. u

TIO

f th e

pen i o n

t io n fi r

fe nding t a k i the default handl i ng of a m i r pro e o r c

x W rk . T h i i

a t ion m ight w a n t t
t herefore i n c l u d e

fte n

uppl

uffi c ient d u r i ng de elop m c n t ; however, a n a p p l i


pe i fi c except ion h a n d l i ng. The V x Work

, c a l l ba k t h a t

de i n w m d ified t

in c l ud e

" s t d 10 . h "

i n c l ud e

e c L 1b . h "

Old

CHOO

( int

tas

xc

c e p t 1on

rror c

rap

tor t a s

01

I C 00

e A C oo it

rn

0,

Scanned by CamScanner

exc ec ,

int

c) ;

ln

o that the

i n I ude an ex ept i n hook:

IO,

{
l o \t Q (

a n be regi tered w i t h th e kernel

kcr nrl

i J e cu t o m h a nd l i ng or recove ry from ex ept ion . The d i v idc-

.1ppli dti n L1 n p r
by- 2er

er

) ;

Old

)( I

e x c S t a c k F rame )

e x c e c = O ..<

n ,

tas

IO,

1 11

::>
, -1err 1:.

mponents ana
Real-Time Embedded Co

.
rs ca l i ed in additio n to the defa u l t han.
dler
han
'fi
re
r
ee
sp
.
The added ap pl ica tio.nhandl e r l o g M s g is ou tp u t t o t
the
that
.
.
ure
ens
he
to
.
The s e t o u t ut ilit y IS u sed

dl mg.
win dshell.
->

setout

.
O r i g i n a l s et u p .

-3 '
s o ut -

s in = 3 '

s e r r= 3

e rm i n a 1 . . .
y o u r v i rt u a l t
All b e ing rem app ed t o
' ' '
.
s s a g e n ow . . .
You s h o u l d s e e t h i s me
e t h i s logMsg
You should a 1 5o se
o x 1 0f 9 2 b8 ( t 1 )
_
,
m j
x1c
a o r_o s _v e r s i o n + o
v a l u e = 32 = O x 2 0

- > testExc

.
choos e a n e x c e p t i on

-b u s e r r o r ,
[ b-

E x c e pt ion number 0 ..

Tas k :

by

d=di v ide

ion
Genera t ing d i v i d e by z e ro e x c ep t
Ox 1 0f92b8

Divide E r r o r

408a0b

vxTaskEntry

423e51

w d b F u n c Ca l l L i b i n it +a d

f452eb

_t e s t E x c

O x 1 0f92b 8

value = O

( t2) :

OxO

( t2 )

Ox000 1 0 206

status Registe r :

0)

Ox00f 4 5 1 0 e

Prog ram Coun t e r :

0)

zero J

+47

+f3

423df8
:

( 1 1 0e450 ,

O,

O,

0,

0,

O,

0,

( o '

o,

o,

o,

o,

o,

o, o, o,

_t e s t E x c

_d i v e r r o r

0, 0,

(0)

T rappe d a n E x c e p t i o n f o r t a s k O x 1 0
f92b8

-> i
NAME

ENTR Y

t E xcTa sk

excTask

tLo gTa sk

_l ogT a s k

-wd bT as k

tW dbT ask

t2

va lu e =

Ox42 3df8

oxo

TID
1 1 08 d e0

1 1 0 32 b0

1 0f e 66 8

1 0f 9 2 b 8

PA I

STA TUS

0 PEND

0 PEND

3 REA DY

4 SU S P E N D

PC
408358

408358

408358
f45 1 0e

SP
1 1 08 c e O
1 1 0 3 1 bO
1 0fe 5 1 8

1 0f 9 1 c 0

Simila rly in Lin ux, if


the sa m e cod e w i t h
d ivi d e b y zer o is
is termi n ated and a c
exec uted, t he p roce ss
ore is d u m ped
to the file syst em
for d e b u g. Before lo oking
i n to L i n u x , fi rst run the
same ex a m p l e
co de a n d n ow
gen e ra t e a b u s error . A fte r
this second run , the tas
k list i n g u s i n g the
i co m m
a n d s h o ws t h a t t h e tas k
d i v ide-by- zero run and th
for the
e t ask for the b u
e
s rro r are bo
t h no w i n a s u s pe n d ed
state .

Scanned by CamScanner

Debugging Components

111

testExc

->

choose an e x c e ption [ b = b u s e r ro r , d = d i v i d e b y zer o ) :


G en e r a t i n g b u s e r r o r s e g f a u l t e x c e p t ion
Exception number O :

Task : O x 1 0f0f60 ( t 3 )

Gene r a l P r o t e c t ion F a u l t
Pr ogram Count e r :

Ox00f45 1 2d

Stat us R e g i s t e r :

Ox00 0 1 0206

408a0b

_v x T a s k E n t ry

+47 :

423df8

( 1 1 0d948 , o , 0 , o , o , o , o , o , o ,

0)
423 e 5 1

wd bFunc Ca l l L i b i n it +ad

: _t e s t E x c ( 0 , 0 , o , 0 , o , 0 ,

0, 0, o,

0)
f 452cf

t estExc

+d7 : _buserror ( f fffff f f )

Ox 1 0f0f60 ( t 3 ) : T rapped an E x c e p t ion for task Ox 1 0f0f60


value
i

- >

0 = OxO

NAME

ENTRY

TIO

PAI

STATUS

PC

SP

t E )( C T a s k

excTask

1 1 08de0

0 PEND

408358

1 1 08ce0

t Log T a s k

_logTask

1 1 032b0

0 PEND

408358

1 1 03 1 bO

tWd b T a s k

wd bTask

1 0f e668

3 READY

408358

1 0f e 5 1 8

Ox423df8

1 0f92b8

4 SUSPEND

f45 1 0e

1 0f 9 1 c 0

Ox423df8

1 0 f0f60

4 SUSPEND

f 45 1 2d

1 0f0e88

t2
t3

value = 0

OxO

The default handling i n Linux or VxWorks is essentially the same if an excep


tion is raised by errant code executing i n a task or Linux process context. The
default exception handling is much more drastic for code executing i n VxWorks
kernel or J S R context. In VxWorks, the default handling reboots the target. Look
ing now at output on the simulator console, the default VxWorks exception han
dler also provides i ndication of the exception here as well.
VxWo r k s
Copy r i ght

1 984 - 2002

W i n d R i v e r Syst ems ,

CPU : VxS1m for Wi ndows

Runt ime Name : v x wo r k s

R u n t ime v e r s ion : 5 . 5
BSP v e r s ion :

1 . 2/ 1

Created : J u l 20 2 002 ,

Scanned by CamScanner

1 9 : 23 : 59

I nc .

1 11

ms
mponents and Syste
Real-Time Embedded Co
B_COMM_ P I P E
WDB Co1111 Ty pe : WD
WOB : Re ad y .

Except ion
Vec tor O

Sta tus Reg i s t e r


Exc ept io n
Ve cto r 1 3

O x00f45 1 0e

r
Pr og ra m Co u n t e

: Div ide Err or


Ox0 00 1 020 6

P rogram Counter

on Fault
: G e n e r a l Pro t e c t i

Ox00f451 2d
Status Regis t e r

Ox000 1 0206

On Linux, the same code causes a core dump when the shell is confi gured
to
allow this.

To ensure that core dumps are allowed, use the b u i lt-in shell comm a nd

u n l imit

after compiling the example exception-generating code provided

CD-ROM.

on

the

[ s iewe r tsloc a l h o s t ex ] $ t c s h
[ s i ewe r t sloca l h o s t - / e x ) $ gee

- g g e n_e x c e p t i o n . c

- o genexc

[ s iewert slo calhost - / ex ] $ u n l i m i t

Now, run the g e n e x c executable to generate a segmentation fault by derefer

encing a bad pointer:

[ s iewert sloca l h o s t - / e x ] $
choose a n exce pt ion

. / genexc

[ b= b u s e r ro r ,

d = d i v ide b y z e r o ) : b

Gen e r a t i n g b u s e r r o r s e g f a u l t e x c e p t i o n
Segme n t a t ion f a u l t

( c ore dumpe d )

On a L i n u x s y s t em ' t h e c o r e f 1' l e d umped c a n be


gdb . T h i. s a 1 1 ows f o r exami n a t i o n of t h e s t a c k
of fend ing l i n e of C c ode :
cor e . 1 347 2
GNU g d b Red H a t L i n u x

l o a d e d a n d d e b u g g ed wit h

t ra c e and i d e n t i f i e s th e

[ s iewe r t s l oc a l h o s t - / e x ) $ g d b g e n e x c

(5.2. 1 -4)

Copy i gh t 2002 Fr ee Sof tware


Foundat ion
' Inc .
GOB i s f r ee sof twa r e ,
c o v ered b y t h e GNU G e n e r a l
P u b l i c L i c e n s e , and
you are
we l c o me to c h a ng e it and
/or dist ribute copies
of
con d i t io n s .
Type

it

u n d e r c e rtain

" sh ow copy in g

to se e t h e c o n
d i t io ns .
T h e r e i s abso l u t e l y n o wa r
r a n t y f o r GOB .
T y pe
det a i l s .

. i 3 86 _ r e d h a t - .
l 1nux
/ Q en e x c

Th i s GOB w a s c o n f i g u red as
Co re was g e n e r a t e d by

Scanned by CamScanner

" s h ow w a r r a n t y f o r

Debuggins Compon
ents

171

proga m t e rm i n a t e d wit h s ig n a
l 1 1 ' s egm
. n
ent a t 1o
.
faul t .
. / i686
R ead i ng s ym bo l s f ro m / l 1b
/ li bc . so . s . . .
do
ne
.
Lo a d ed symb o l s f o r / l i b / i686 / l i
bc . so . s
R ea ding sym bo l s f rom / l i b / l d - lin ux
. so . 2 . . . don e .
L oad ed sym bo l s f o r / l ib / ld - l inu x . s
o.2
#0 Ox 0 80 4 83 7 f i n b u s e r ro r ( ba dP
t r= Ox ff ff ff ff ) at
gen_ex ce pt io n . c : Je
so me Oa ta = * ba dP
38
tr;
( gdb) bt
Ox0 804 837 f in b u s e r ro r ( bad
Ptr =Ox fff fff f f ) at gen
_ex cep tio n . c : 38
Ox0 8 048 3d8 in mai n ( ) at gen
#1
_e x c ept ion . c : 56
#2 Ox4 201 58d 4 in l ibc _st a rt_ mai n ( )
f rom / l i b / i68 6/ lib c . so . 6
( gdb )
#0

Run the code again and this time generate the divide-by-zero exception. Now
load the core file dumped for the divide-by-zero exception generator, and, once
again, the stack trace and offending line of code can be examined with gdb.
[ s iewe r t s l o c a l h o s t - / e x ] $ . / genexc
choose and e x c e pt ion [ b =bus err o r , d=divide by zero ] : d
Generat ing d i v id e by z e r o exc eption
Float ing e x c ept i o n ( c o r e dumped )
[ s iewe r t s loc alhost - / ex ] $
siewert sloc a l h o s t - / e x ] $ gdb genexc core . 1 3473
GNU gdb R e d Hat L i n u x ( 5 . 2 . 1 - 4 )
Copyright 2002 F r e e Software Foundat ion , I n c .
GOB is f r e e s o f t wa re , c o v e red by t h e GNU General Public License , and
you a r e
under cert ain
welcome t o c h a n g e l t a nd / o r d i s t r ibute copie s of it
condit ion s .
Type " show c op y i. n g t o s e e t h e con d i t ions .
Type " show warr ant y for
Ther e i s abso l u t e ly n o warran ty for GOB

det ail s .

x" . . .
.
Thi s GOB was c o n f i g u r e d as 1396 - red hat - l inu
/ genexc '
Core was g e n e r a t e d b y
. h me t ic exc ept ion .
Pr ogram t e rm i n a t e d wit h s ig n a l 8 1 Arit
. . . do ne .
Re adin g symbo ls f rom / l 1. b / 1. 68 6 / li bc . so . 6
L o aded symb o l s f o r / li b / i686 / l b c . so . 6
don e .
2
R ead ing sym bo l s fo m / l ib / ld - l inu x . so .
L oa ded sym bo l s f o r / l ib / ld - lin ux . so . 2
. c : 31
gen _ex c ept ion
#0 Ox08 0483 68 i n d i v e r r o r ( a rg=O ) at

31

som eNu m

( gdb )

Scanned by CamScanner

1 / arg ;

1 72

components
Real-Time Embedded

pu; ...

emu

ac e i n fo rm ati o n a n d th e locah.on
ov id e st ac k tr
pr
ks
or
'
xW
V
d
So, both Li nux an
ex ce pt t" o n oc cu rs, it s easiest to
n
a
en
Wh
d
.

ra ise
pt io n is
.
.
cod e wh ere t h e ex ce
m
. e w h y t h e c od e is ca usmg th e exce pt io
in
e code to de te rm
th e Cr oss W i n d d In
sin gle-step debug th
ea sil y do ne us in g
t
e ug
os
m
is
is
th
en t,
"
the Tornado en vi ro nm
i n g se ct ion of th is cha pt e r
gg
bu
De
ep
St
ur e 9. 1 . The "S in gle
tool as shown in Fig
de bu gg er i n Torn ado.
use the sin gle -st ep
d
an
rt
sta
to
w
ho
wi ll cover exactly

nb

Prajlct Md oa..g

-,--,ltl
, ,0111111 - 1rti11 .-

. -

lvxnn@unwiise

- tf

olHI JmlEal\,jI

2.1.:!J 1!J
aih:.JccJc&I i!i)
{

i n t d i verror ( i n t arg)
O x f 9 5 1 0c
Ox f 9 5 10d
Oxf 9 5 1 0
int soaeNua ;
soaeNua l/arg
Ox f 9 5 1 1 2
Oxf 9 5 1 1 7
Ox f 9 5 1 1 9
O x f 9 5 1 la
Oxf 9 5 l ld
Oxf 9 5 1 2 0
Oxf 9 5 1 2 3
return OK ;
Oxf 9 5 1 2 6
Oxf 95128

>

'}

FIGUIE 1.1

O x f 9 5 1 2a
O x f 9 5 1 2c
Oxf 9 5 1 2 d
O x i 9 5 1 2e

PUSH

_ d 1 verror
_d 1 verror
+Ox 0 0 3

HOV

SUB

+Ox0 0 6
+OxOOb
+OxOOd
+O xO O e
+OxO l l
+0K0 1 4
+Ox0 1 7

MO V
MOY
CltiD
I D !'.'
MOV
MOY
MOV

EBP
EBP .
ESP .

ESP
48

ECX .

ECX

Ei\X .

Oxl

+OxO la

XOR

EAX .

+OxOlc

J MP

_ d 1 verror

+OxOle
+Ox0 2 0
+Ox0 2 1
>+ 0 x 0 2 2 .

MOV

ES I .

ESI

RET
MOV

ES I .

ESI

LEAVE

.
---1.-..n w
llbgT8* (Or 11..
ww:O

EAX . [ EEP+ S ]
[ EEP- 3 6 )
EAX
EDX . [ EBP- 3 6 ]
[ EBP-4 ] . EDX

*2: :

.
Using Tornado Cross Win d to An aIyze
Exception.

EAX
+

Ox20

Likewise, in Lin ux , a grap .


.
hical single-s tep deb ugger will
m ake it easier to deter
.
. .
mine why code is
rais ing an exc eptio n
N u merou s grap h tea
I d e b ugger s are ava1 Jable
for L inux and most r
u n on top of the
.
.
gdb co m m a n d-I" me d
ebugger. Figu re 9 2
sho ws usage of the K D
E envi ro n ment Kd
.
bg grap h teal
deb ugger.
A gain, the " S i n g!e
Step Debuggi ng" se
'
t
c
io
n of th 15
c h apter covers exactly how
to start and use t he i. ngle
t ep deb ugger
.
in t h e L m u x K DE

environ
ment.

Scanned by CamScanner

-s

Debugging Components
fie Epc llllafl ........ ..... .Hiii,

. " &S S

w: ?> tt' <'> (h a" e rA

int diverror (int ara)

.{

int oNua ;

llOlleNua

l/1r1 ;

SOXl , ktur

Ox8048382 .,v
Ox8048367 c l td

Oxl(,..llp )

0d048HI icliYl
Oxl04838b

. }

1 7J

""x . OX f f f f f f f c ( ,,,. b p )

llOV

return Ol ;

int buserror ( int *badPtr)

.{

int aoineD1 t 1 ;
someDa ta

aov

Ox804837f

llOV

Ox8048381

llOV

FIGURE

badPtr ;

Ox804 8 3 7 c

Ox8 ("4!bp ) . x

( '4! ax ) , %eax

'.lle ax , Ox f f f ffffc ( "'-bp)

9.2

ASSERT
Preve n t i n g except i o n s rather t h a n h a n d l i n g t h e m is m o re proactive. Certa i n l y
code can check for conditions t h a t wo uld cause a n exception a n d verify arguments
to avoi d the possibil ity of task suspension or ta rget reboot. The standard method
for t h i s type of defens ive progra m m i n g is to include assert checks i n code to isolate
errant conditions for debug. In C code, poi n ters are often passed to fu nctions for
efficiency. A caller of a function m ight not pass a vali d pointer, and t h i s can cause
an exception or erra n t behavior in the called function that can be difficult to trace
back to t h e bad p o i n ter. The following code demonstrates this with a very simple
example t h at fi rst passes

p r i n t Ad d r

# include

" s t d io . h "

# include

" stdlib . h "

# include

" as s e r t . h "

c h a r v a l id P t r [ J
c h a r in v a l i d P t r

a val id pointer and then passes a

NULL poi n ter:

" s ome s t r i ng ;

=
=

( c har * ) 0 ;

v o i d p r i n tAdd r ( v o i d * p t r )

/ * w i l l a s s ert and e x i t program here


a s s e r t ( ( in t ) p t r ) ;

Scanned by CamScanner

i f po i n t e r is NULL /

ts and System
Real-Time Embedded Componen

1 74

p r in t f ( " p t r

Ox%08x \ n " ,

pt r ) ;

int main ( vo id )

p r intAd d r ( v a l id Pt r ) ;
p r intAdd r ( i n v a l i d Pt r ) ;

r e t u r n OK ;

Running the preceding code o n a VxWorks t a r get o r VxSim produces the


following output:
ptr = Ox00f453ac
As s e r t i on f a iled :

_... _

f i le C : / Home / Sam / Bo o k / CDROM / E x a mp l e -

line 1 1

Co de / a s s e r t . c ,

:: , -

( i nt ) p t r ,

y se of assert checking in code makes the error i n the calling argumen\s obvi
-out and avoids confusion. Without the assert check o n t h e pointer parameter, it
,"

-;- -""':fuight

seem that there is an error in the function called when it fi n ally attempts to

dereference or otherwise use the pointer, which would mos t l ikely cause an excep
tion. The assert check is also supported in Linux.

CH ECKI NG RETURN CODES

Any RTOS such as VxWorks provi des a sign ifican t A P


I with m echa nism s fo r task
con rol , inter - ask com unica tion, sync hro
n izati o n , mem o ry man agem ent , and

_
device mterfacmg. Call s mto the API can
fail for n u mer ous reas ons :

Failu re to meet prec ond ition s by


the app licat i o n ( e. g . , s emTa k e whe n sem a
phore has not been cre ate d) .
Kernel resources have be en
ex ha ust ed .
A ad po int er or arg um en
t is pa sse d by the a pp lic ati
on .
A tim eo ut occurs on a
bl oc ki ng ca ll.

Fo r th is reason ap
pl1 ica ti n code sho uld
alw ays check ret u r n codes a nd handie
fa il ures w ith w ar n ng
og e o r se n t to
c onso le o r assert . N ot c h e cki n g re tur
codes often leads to d ffi
i cu t - to -fig ure -o ut fa il
ure s be yon d the i n it ial obv iou s AP
call failure .

Scanned by CamScanner

Debugging Components

175

.
I n Vx Works t h e tas k va ria ble e
.
rr
a1 ways md1cate s t h e last error encoun te red
dur i n g a n A P I c a l l A ca 1 1 t0 perro r ( )
m VxW orks will a l so pri n t more useful debug
. ,
.
m1o rma t ion w h e n a n A P I cal l ret urn s a
ba d code.

SIN G LE-ST E P D E B U Ci Ci l N Ci
Single-ste p d e b u ggi ng i s often t h e most
power fu I way to analyze and understand
both soft ware a lgor i t h m i c errors, ha rd ware so
/ ftware interfaces errors, and some.
even h rdwar e design flaws. Single -step debug
times
gi ng can be done at three d i f
feren t levels m m ost embe dded syste ms:

Task- or process- l evel debugging

Processo r-level debugging

System- o r kernel - level debuggi n g

Most application developers are accustomed to task- or process-level debug


ging. In t h i s case, a process or task is started in VxWorks or Linux and most often
a breakpoin t i s set for the entry poi n t of the task or process. This method allow
the user to debug only one thread of execution at a t ime. Often, this is sufficient
control beca use either the appl ication being debugged is signa lly threaded , or if
m u l t i threaded, then other threads in the overal l application will most often block
awaiting synchronizing events (e.g., a semaphore or message ) before proceed ing.
Debuggi ng asynchronous multithreaded applications is much more d i fficult and

req u i res e i t her syst e m - or kernel- level debugging or processor level using TAP
( Test Access Port ) hardware tools.

Task-level debuggi ng in VxWorks is sim ple. Com mand -line debugging can be
performed d i rectly w i t h i n the windshel l . G raphical debugging can be performed
using t he Tornado tool known as C ross Wind, and then accessed a n d controlled
t h rough a source viewer that displays C code, assembly , or a m ixed mode. For
embedd ed systems , debuggi ng is describe d as cross-de bugging because the ho-t
w h i c h the debug i nterface runs does not have to be the same architec
system

on

ture as the target system being debugged. On the other hand, it certainl y can be
the same, which m ight be the case for an embedded L i n u x system or an embed ded
.
VxW orks system r u n n i n g o n I n tel arch i tecture F u rtherm ore, the embed ded
.

e
syste m may n ot have suffici ent 1/0 i n terface s for effectiv debug A cros debug
viewin g.
ging y tern r u n s a debug agent on the embed ded target and debug o u rcc
.
do, the host tool i
and comm and a n d contro l is done o n a host system For Torna
agent ac c:pt
).
Cross Wi n d a n d the debug agent i W D B ( Wind Debug The debug
tion when reque ted .
com m a n d s a n d replie s with curre nt ta rget tate i n forma

Scanned by CamScanner

1 76

Com p onents
Real-Ti me Embedded

and Systems

t h e ca pa bi l ity to set break


an y de bu gger is
of
es

tur
c
fea
l
s
One o f t he mos t ba
the m. Th e re a re t wo ways t hat b reak
1
e s t e p bet ween
.
g
sin
or
n
ru
to
po in ts and
e m e n te d :
e most of ten i m pl

po in ts ar

ts
Ha rd wa re br ea k poi n
So ftw are b rea kp o i n t s

?reak

int add

ress
l'?
tha t the processor
Hardware b rea kpo i n t s req uire
tion
ruc
Po
r
nst
I
(
mte
)
P
I
the
or PC
i nes wh eth er
te r , a con 1pa rato r tha t det erm
reg1s

c
h
a
me
d
a
msm
an
to
rai se a
requested break add ess,
( Progra m Cou nter ) mat che s the
i ncl ude a

execu.
caus es a halt m
.
debu g exceptio n. The debu g exce ptio n
ler so that the user can exam in e the state of
tio n , and the debu g agent insta lls a hand
impo rtant and notab le characteristic of
the target at the point of this excep tion. O ne
the norm al th read of

to the n u mber of comparator reg


hardware breakpoints is that the n u m ber is limite d
limit is only two or at
isters provided by the specific proces sor archite ctu re. Often the
most a half dozen, but it's definitel y l i m ited by hard wa re support ing resources. A sig

nificant advantage of hardware b reakpo i n t s is that they do not modify the code being
debugged at all and are reliable even if memory is modi fied or errantly corru pted.

A processor reset or power cycle is most often the o n ly way they can be cleared.
Software breakpo i n ts are u n l i m i ted i n n u m ber. They a re i m pl e men ted by

i n sert in g an instruction into the code segme n t o f t h e t h read of execution being

debugged. They modify the code, but o n l y b y i n e rt i n g a si ngle i n s truction to raise


a ebug except ion at each re ques te d breakpo i n t . After t he debug exception is

raised, the debug agent handling of t h e e xce p t i o n is iden t ical. When the d ebug
agent ste s beyond the c u rren t breakp o i n t it must restore t h e original code it re
placed with the debug except ion i nstruc t i o n . Softwa re
b reakpo ints have the disad
vantag e t at they can and mostl y l ike l will
aded, U

111 ntt c.

be lost every t i m e code is relo

mem ory is e rra n tly corr upte d, a n d whe


n a proc esso r is rese t.
ost soft are debu g agen ts, such as

W D B and t he G N U debu g agent, use soft


a e r kpom t s due to t h e i r fl
e x i b i l i t y a n d u n l i m i ted n umb er For task -l evel
e g, t e ost Cross Wi n d
deb ug tool req uest s the WD B deb u age nt to set
_
a
pom
b rea
t m a code seg men t fo r a
spec i. fic task. This a llows t h e debug age nt to
han dle the d eb ug exce ptio
. n and
task
to compa re t h e c urre n t
task contex t to t he
bein g deb ugged and t susp
hat
n d t a t task. R u n the
exam ple code sequen cer t
crea tes two task s that r
an can e calib rated
s
to r u n for 1 0 to 20 millise cond ond
any target
'
e

:
system and t sequencer releases

n
eco nds a .
m1lhs

'I e
uses 900Yo of the
Th is
proce ssor cyc les wh 1
run s. Aft e r the two tas ks are

ru n ning ' th eir st


a t e c a n be observed with t he i. co01
man d to d u m p all task con t rol
s e r v ic eF2 0 eve ry

so

con d s
mi 1i 1se

blo c ks .

Scanned by CamScanner

s e r v 1 c eF 1 0 ever

20

1t

Debugging Components
- >

1 77

i
NAME

ENTRY

tExcTask
tWdbTask
service F 1 0
value

PC

SP

1 1 58de O

0 PEND

1 1 532b0

408358

1 1 58ce0

wdbTa sk

0 PEND

408358

1 1 4e668

1 1 53 1 bO

Ox423df 8

3 READY

408358

1 1 4e5 1 8

1 1 492b8

4 DELAY

408358

1 1 49 1 d8

s e r v i c e F20

STATUS

_logTa sk
-

t1

PRI

excTask

t LogTask

TIO

f ib 1 0

1 1 40f60

f i b20

2 1 PEND

408358

1 1 40e74

1 1 3bc08

22 PEND

408358

1 1 3bb 1 c

OxO

The t 1 task i s the sequencer and most often will be observed i n delay between
releases of s e r v i c e F 1 0 and serv iceF20. The two services will most often be observed
as pending while they wait for release. The i command is implemented by a request to
the tWdbTask ( the target agent), so it will be observed as ready ( running in VxWorks)
because it has to be run n in g to make the observation. Now, start the Cross Wind
debugger ( usually by clicking on a bug ico n ) , select Attach, and then select s e r
vic e F 1 0 and use the i command to dump the TCBs (Task Control Blocks) again.
Figure 9.3 shows the Cross Wind debugger now attached to servic eF 1 0 .

l't'-1 f'I atl t I9fll


Ivicwn@tamwise
oII 1 i;JII ',I 1
Zl lI.! 3 111 ><J .iJ 1rl
al:.lccllr1Cil _L_
.

- tl JC

--

..

- --

- -

- - -

I -

M ('}l s

Ox 4 08 J S 8
Ox 4 08 35b
O x 4 0 8 3 5d
O x4 0 8 36 0
O x4 0 8 3 b 3
Ox 4 0 8 3 6 5

Ox 4 0 8 36 7

O x4 08 369
Ox 4 08 3 a
Ox 408 36 f
O x 4 08 3 7 4

FIGUltE t.J

+ O x0 5 6
T O ii:O S
+ O xO 1
+O x 0 6 4

+Ox067
+O x0 6 9
x071

+O xO 3
+OxO H
+OxO 9
-r0 x0 8 4

__

(;]

J.LD

MOV
MOV
CMP
JllE
TEST
JE
PUSH
PUSH
CALL
MOV

ES P
4
EAX
EBX
EA X . [ EBP-4 J
EAX . l
v 1 ndEx1
+ OxSc
EBX EBX
+ OxS 4
v 1 nd Ex 1

EBX

Ox 4 0 8 3 0 8
tr
EU EBX

Cross Wind debugger attached asynchronously to running task.

Scanned by CamScanner

1 71

onents an d Systems
Real-Time Embedded Comp

->

i
ENT RY

NAME

PAI

TIO

PC

ST AT US

SP

e xcTask

1 1 sadeO

0 PE ND

408358

1 1 58ceo

t Exc Tas k
t Log Tas k

_logTask

1 1 53 2b 0

408358

1 1 53 1 b0

tWdbTask

wdbTask

1 1 4e668

0 PE ND
3 RE AD Y

408358

1 1 4e518

Ox423df 8

1 1 49 2 b8

4 DE LAY

408358

1 1 491 d8

t1

serv ice F 1 0 _f i b 1 0
s e r v ic e F20 _f ib20
value = O

1 1 40f60

2 1 SUSPEND

408358

1 1 40e74

1 1 3bc08

22 PEND

408358

1 1 3b b 1 c

OxO

The serviceF 1 0 task has been suspended by the WDB debug agent. Now the
debug agent can be queried for any i nformation on the state o f t h is task an

code

executing in the context of this task that the user wants t o see. I t can be smgle

stepped from this point as well. When this is done, most often the running task is
caught by the attach somewhere in kernel code, and if the user does not have full
kernel source, a disassembly is displayed along with a single-step p rompt.
Asynchronously attaching and si ngle stepping a run n in g task is often not help
ful to the user debugging the code being scheduled by the kernel a nd which calls
into the ket,nel AP i . Instead of attaching to a running task, n o w start the same se
quencer from the debugger ( first stop the debugger and resta rt the t arget or simu
lator to shut down the running tasks ) . Now, restart the debugger after the target or
simulator is running again and from the Debug menu, select Run. Start Sequencer
from the Run di log box and select Break at Entry. Note that a rguments to the
.
fu nct1on entry pomt can also be passed in from the debugger tool if n eeded. At the
.
wmdshell, output will indicate that the task has been started and that i t has hit a
breakpoint immediately on entry.
->

Bre ak a t OxOOf940e 7 : _Se que


n c e r + Ox3
- >

T as k : O x 1 1 492b 8 ( tDbgT a skl

i
NAME

ENT RY

t ExcTask

_exc Tas k

t LogTask

_l og Ta sk

tWdbTask

_wdb Tas k

tDbgTask

_S eq ue nc er

value

Oxo

TIO

PRI

1 1 58 d eO

ST AT US

0 PE ND

1 1 532b O

0 PE NO

1 1 4 e668

3 RE AD Y

1 1 4 92b 8 1 00 SUS
PEND

PC

SP

408358

1 1 58ce0

408358

1 1 53 1 bO

40 83 58

1 1 4e 5 1 8

f940 e 7

1 1 49244

Note the deb ug age nt wrappe


.
r task tDb Tas
k wit the e n t ry po
int sta rt ed from
the debugge r is in the suspen d stat
A
e sam e tim e,
vi wer win do w wit h a
sour ce-level debug prom pt appears
T nado as
sho w n m Figure 9.4 .
-

Scanned by CamScanner

Debugging Components

1 71

oid sbutdown (vo id)


abortTes t l ;

void Sequencer ( v o i d
)

pri n t f ( " S t ar t i ng Sequencer'n" ) .


/ Just to be sure ve have
s7sClkRt e5et ( l O OO ) .

':l

asec t ick and TOs /

/ Set up service release seaaph


ores /
seaF l O seaBCreate ( SEll..CL FIFO
. SEK EMPTY ) .
seaF20 seaBCreat e( SElt_Q_ FI FO .
SEK EKPTY )

FIGURE 9.4

ii
. -

Cross Wind source-level debugging.

I n general, a C compiler generates numerous machine code instructions for


each l i n e of C code, a n d the sou rce-level debugger can view the current I P ( I n
struction Poi n ter) l ocation in a mixed C source a n d disassembled view a s shown i n
Figure 9.5. This view can be very useful i f t h e C compiler i s generating bad code
(not l ikely, b u t possib l e ) or i f i t's generating inefficien t code ( more likely).

DltlJ

f.......@s-

f1o"'lil tl!L_

:::J ol..l -'ISJl!atl',J

.,, J(l .d .
el4lc.cllfil --

-._ - .( -

j,

M l'P t;! Cl r;J

void Sequencer ( vo 1 d )
" !"! t o .: ue1
C' x f .. ces

PUH
HOV

ien.:er

:Sequei..... e t

Fll.:irl

c LL
41 [
0 be sure ve h l a.ec

ay.C lk Ra teSe t ( l OOO )


;.t ' 4 C f 4

/ J st

i i< t " H I
. l< l ( . , ...

/ Se t u

t.I

'

..,.f l O

FIGURE

t.5

debugging.

Scanned by CamScanner

FU -H
r lt

. >e l '
yJI

ll

apho res
wrv 1c r e l osa M ...

-8( reateC S11_CL f l f0

EI
EI
0>
. '
f<

t ic and TO, /

G x0 1 0

(f

S1C_ Ell PTV l

t>
'
.

, .....,.

Cross Wind mixed C and Assembly level

1 80

Real-Time Embedded Components and Systems

Process-level debug in Lin ux is often the eq u ivale n t of tas k - level d


eb
.
VxWorks. A Linux process can be m ul tithreaded with P OSIX thre ads, fo r e a g lfl
x
Pie,
in which case the pt hread is the equ ivalen t o f the VxWorks task. One ad va
n t ge
of
embedding Linux is that many of the applica tion algorithm s c an be devel op
e an.d
debugged on the host Linux system without cross debugging. Cross deb u
ggi g i
s
only required when the bug is related to how the code speci fi caJJy exec ute s 0 n
he
t
target system. F igure 9. 6 shows the use of Kdbg to debug the m ul tith readed pn
flOr
.
.
.
Ity mvers1on
d emo code found on the C D-ROM.
IBit

li!lm Epc.... .,_.,_. .,._ a a s ur 7'> n- W'


voi d 9-rJ nter(w id thread l d J

.{

iJ!t i ;

atrvct t!Jleapec dHpTiM. dT!Jle;

atruct tl Mapec lowStert ,

it( ( l n t ) thrucUd

0d041..2

CIOpi

O.I04lalMI jne

lownd , ht1hStar t , h isl>End;

10WJUOJ11VICE)
SOllJ ,Ox.l( wbp)
=

Oxto11alf <s..,r l n t e r S l >

ltt iaofday(llowStart ) ;
p.rlntf( "Low pno thread requf!a tin1
aharf!d naource at lid aec'

. i

1tt1"ofday(lhi1hStar t ) ;
print! ( "Hl1h pr lo thzHd nqu
t

i n 1 hued resource at
!Id c '

.i-- --- ui

Low _ p_rlo Ht. CS\n " ) ;


.

in

d bg mixed C and Asse b


m ly source-

The exa mple pth read

code can be bu1' l t


.
.
us mg
9 cc r ki n g m
the
opti on to incl ud e
deb ug sym b ols .

f s iewe rtslocalh

::i

L011_rw1o_sERVICE)

:::.;

-g

!Id

CScount+-;
: r ( ( i.nt )threedid

and us.mg t h e

!Id na

pthrHd.JIUtH._lock( ...1sm ) ;

.._

::J

ost - / e x ] $
gee

- g P t h rea d
. c - lp t h re a d

p t h read

- o p t h r e ad

library

Th e Kdbg app
I'ICa tio
. n can
loa d th e p h
ba d upo n the LF
a . execu table a n d will fi nd the C
E
( Exec uta ble
an
es a sym bol
m mg For ma t
s as we ll as
) i n for ma tio n, which
th
e
so
u
rce co de p ath
I f an em bedded L
' ux a pp l '
m
ica t1o
n n eeds
d eb ugged with
to b e deb ugg
db bY con
ed, it c a n be rem otely
n ec ting to
g
db
over
r
e
Eth
m
e
m
ote
t
l y .o ver a seri
em bedd e drne . " o . u se Qdb re o e y,
al o r T C P connecti on
t l
J
yo
u
m
us
p JC a t on an d
t l mk deb ugg

the n u se
ing s t u bs wit h yo u r
Qdb t a ge
to con nect
t r e mo t e
er sen aJ to th e
l d e v / t t yS O , for ex am l
emb ed d ed
p e,
ap p l icat1. o
n fo r deb ug.

Scanned by CamScanner

Debuggin g Components

181

Syst m - o r kernel - l evel debugging rather than attac h i ng to a si ngle task or


.
pro ess is tncky becau e b t h the Linux and t he VxWorks systems rely upon basic
serv ices ( o r daemons m L m u x ) to m a i n ta i n com m u n i c ation with the user . F o r
exa pl e, i n VxWo ks, the W D B and shell tasks m ust run along with the net task to
.
.
mamtam com m u n ication between the Tornado host tools and the ta rget. Simila rly,

Linux com m u nicates t h rough shells or a wi ndowing system. In VxWorks, system


level debug requires a special kernel build and the ue of a pol l i ng driver i n terface

for com m u n ication. Li kewise, Linux requi res use of kdb the kernel debugger. The
advantage of system - level debug is that it al lows for single stepping of the entire ker
nel and all applications scheduled by it. System- or kernel -level debugging is most
often used by L i n u x d river, kernel module, and kernel featuredevelopers.
I nstead of system - level debug, a hardware debug method can be used to si ngle
step the entire processor rather than using t he system-debug software method. This
has an advantage in that n o services are needed from the software because c o m
munication i s p rovided b y h a rdware. T h e JTAG ( J oint Test Appl ication Grou p )
I EEE standard h a s evolved t o include debug funct ionality th rough the T A P ( Test
Access Port ) . A

]TAG i s a device

that in terfaces to a host via parallel port, USB, or

Ethernet to c o m m a n d and control

t he TAP hardware that allows for processor s i n

gle stepping, register loads and dumps, memory loads and dumps, and even down
load of program s for services such as flashing i m ages i n to no nvola t i le memory.
Figure 9.7 shows the W i n d River V isionCLI C K JTAG debugger run n i ng on a lap
top con n ected to a PowerPC eval uation board with a parallel port host i n terface
and a JTAG B D M ( Ba c kground Debug Mode) con nection to the PowerPC TAP.

..

, ,,
t

FIGURE 9.7

Typical JTAG debugger setup.

(!nost

l l o c k i ng o f t h e dcv ct u nde r t e t
The J TAG s tand ard i n c l ude s exte rna
a h i l i t y to clock kt d a t .1 1 n or o u t of t he fA P,
e c a >
r
often t h e proce sor cior d e '" u g ) th
1 c de od mg of Jdd n.:' , dJt.1, ,md o n t r I s
an d r eset con t ro I . T }.1 e TA u
r pro v id e ha

Scanned by CamScanner

1 11

Real-Time Embedded Comp

onents and Systems

or can be read o r wri tt en. A lTAG


mory wit hin the pro cess.
tha t an y add ressa bl e me
t
nde d vers ions of gdb, so h a t cod e seg m en ts
1 nclu ding exte
debugger can be used
ry c a n be s i n gle step ped. A l
mo ry of no nvo l a t i l e me mo
.
loa ded via JT AG int o me
It h a s powerful ca a.
ls,
too
are
dw
har
nal
itio
req ui res add
p
tho ugh JTAG deb ugg ing
that
has
em
no
syst
a
onto
cur
.
ed
r
ent
t-str app
For e xam ple code can be boo
.
.
b 1 1 1ty.
d
n

h
a
fl
as
ng,
testi
,
ding
mg
nloa
dow
ini tial
are and
boot code. Brin ging up new ha rdw
m
elop
dev
ent
re
wa
tas
firm
a
is
k. By
t vect or
boot code that runs fr.om a system rese
ory,
t
mem
mos
e
l
i
t
vola
ofte
n to
out of non
defin ition , firm ware is software that runs
loaded
often
re,
a
softw
by the
er- level
boot a devic e a n d make it usea ble by high
des
an
d
n
provi
a
int
erface
supp ort
firmw are. The JTAG requi res no target softw are
u
d
n
a
event
,
ally
debu gged
p ro
so that initia l boot- strap code can be loade d,
rm
has
been
platfo
After the
boot
gramm ed into a nonvo latile boot memo ry device .
B
can
be used
ch as W D
strapped with JTAG, a more traditio nal debug agent u
instead for system-level or task- level debug.

K E R N E L S C H E D U L E R TRAC ES
Kernel scheduler trac ing is a critical debug method for real - t i m e systems. Real
time services run either as ISRs or as priority pree m p t ive schedu led tasks. In

VxWorks, this is the default sched u l i ng mecha n i m fo r all tasks. In Linux, the
POSIX t h read F I FO scheduling policy m ust be u ed fo r ystem scope t h reads.
Process scope PO SIX threads are only run when t he proce s that owns them is run.

y tern cope POSIX th reads, such as VxWorks t a ks, a re ru n according to p re


emptive priorities by the kernel. The kernel i tself m ust a lso be preemptab le for
d ete r m i n i stic real-time scheduli ng. The L i n u x kernel can be patched to make it
preemptabl e l ike the VxWorks Wind k e rnel . Given a pree m ptable priority -driven

m u l t i ervice system as just described, RM theory and pol icy can be used to assi gn
priorities to services and to determ i n e whether the system i s feasible. A syst e m is

feasible i f none of the services will m iss their req u i red deadli nes. Theory is an ex
cellent tart i n poi t, but the theo ret ical feasibi lity of a
yste m sho uld be veri fied

and any deadl ine misses observed m u t be debug ged.


The Torna do/Vx Works dev e lo p m e n t fra mewo rk
i n c l udes a tool c all ed
Wind View that provides an event trace of all
tasks ( ervice s) a n d I S Rs (servic e ) in
.
the Y t en: a1 ong wit
. h sync h ronou s and async hrono us event .
A sema ph ore gi ve
and take 1s h ow n a a syn ch
ronou s event o n t he contex t trace along w i t h a )' 11 .
ch ono event uch a i nter rup ts,
exce ptio ns, and t i m eou t . Figu re 9 . 8 ho
\,. J
W i nd V i ew trac e for a n NT c

vi d eo capt u re driver , wh1 h tran po rt fra me t o a n


.
.
exte rnal view ing app l i atio n over TCP
.
In F i g u re 9. 8, t h e I m i l l i ec nd i n
TO
terv al t 1mer
.
i n terru p t a n be een a ir
Ex.a t l y 3 3 f the e o ur between t he t wo
fh
t a r t o f t h c t B t 11 1 d ta'
k re I ea e

Scanned by CamScanner

Debuggi ng Components
l.1

FIGURE 9.8

1 IJ

3.Z

WindView trace for video capture and transport.

video encoder fra m e rate for t h i s t race wa 30 fp . The I N T 1 0 i n terrupt i a soc iated
wit h t h e t N e t T a s k a n d i s t h e l e t \ o r k I n terface ard ( N I C ) D I A com pletion i n
terrupt. T h e b u l k o f t h e e r ice ac t i v i ry between frame acq u i s i t io n a re relea e s o f
t h e s t r ea m i n g t a k a n d t h e T P/ I P network task. I d l e t i m e i n d i ca t e t h a t t h e
processor i s n o t fu l l y u t i l i zed, a n d i n a W i n d V iew trace, t h i i t i m e t ha t t he Wind
kernel d i pa t c h e r wa p i n n i n g and wa i t i n g for om e t h i n g to di patch from the
ready queue by p r i o r i ty. F i g u re 9.9 hows two ynthetic load e rv i ce ( each serv ice
computes the F i bo n acci seq uence ) : ' 1 r u n for I 0 m t'C every 20 m secs, a n d 2 ru n s
for 20 m sec every 50 m ec .
0.01

1.07

0.09

0. 1 0

0.1 1

0.12

0.13

0. 1 4

0. 1 5

0.1&

IMemtJlS

IMcmoplO

tExcTeak

tl...THk

ISllell

f'Nalnt

.eetr eat
acll

eeZI

., .

IWt

.....
...

FIGURE 9.9

Scanned by CamScanner

.. .

WindView trace for synthet i c serv i ce loading.

0. 1 7

0.11

0.1t

i
i

1 84

s
s and System
d Com ponent
edde
Emb
Time
Real-

re le as e t h e tw o se rv i ces
se q u en c e a n d
to
. ,
ed
us
s
.
re
t h e se map ho
, s e m 1 o an d s em 20 , w h i ch
rs
ce
en
On this tra ce,
qu
se
o
tw
.
.
n alo ng wi th
20, can b e see
0 . Th e ta sk d el ays are
fib1 o and f l b
f i b 1 o a n d f l b2
se
lea
re
.
o
t
s
re
ho
ap
m
se
g1ve
T he s ? h t ra c e s . for
.
ce
ra
t
ed
. p1 y de1 ay an d
s1m
sh
ha
w s a n d th e
righ t - po in tin g ar ro
d t r ce m d.1ca tes ti me
in dicated by the
tim e, a n d th e do tte _
n
io
ut
ec
ex
r
so
es
are pr oc
e. f h 1 s pa rti c ul ar tw o
f ib 1 0 an d f ib 20
n i ng , o r id le t i m
n
u
r
is
pt
rru
te
n
i
k or
se c pe ri od.
where some ot he r tas
r cy cle s ov er a l 00 m
e av ail ab le pr oc es so
th
of
%
90
es
us
rio
service sce na
ul t i p l e ) of t h e tw o rel ea se pe
( Le ast Co mm on M
M
LC
the
is
d
rio
pe
The 1 00 msec
ms ecs an d
es a re
pe cte d ex ec u t i o n tim
ex
e
Th
.
ecs
ms
0
=5
T2
riods T1 = 20 an d
rel ea ses of the ser v ices
e, i t 's c le a r t h a t t h e
rac
t
s
thi
m
Fro
52
and
C2=20 msecs for 51
u rat e a n d a s exp ect ed . T i me
exe cut ion ti me s are acc
are well syn chr on ize d, and the
c h a r d w a re t i m ers , such as
s i n g a rc h i t e c t u re - spe c i fi
sta mp ing on the trac e is d on e u
m p Co u n te r ) , wh ich is
u m , t h e TSC ( T i m e S t a
the interval ti.m er, or o n the Pen t i
a c c ura te to a m i
. I n gen era l, t i m esta m p i n g is
as acc urat e as the pro cess or cyc le rate
h a rdw are t i mes t a m p d r i ve r .
cros econ d o r less whe n supp orte d b y a
y
n t s s u c h as W i n d V i e w are c l e a rl y ver
Sc hedu ler trace s of tasks and kern el eve
S i t u a t i o n s l i k e servi ces over run
i l l u m i n a t i ng when debu gging rea l - t i m e syste ms.
ge r to zero i n on p roble matic
n i ng dead l i nes are made obvio us and allow t h e debug
n d V i e w c a n a lso be very
sequen ces and pe haps better synchr on i z e servic es. W i
_
o i mp rove effi
h l p ful when servtCes s i m p ly ru n too l o n g a n d m u s t b e t u n e d t
call
c t e nc y . The runt ime of each release can be clearly observed . U s i n g a fu n c t i o n

C1= I O

vEent,

i n application code, t he trace c a n b e fu rther a n n ot a te d w i t h t h e chevro

t 1 o n event mmdteators and counts shown prev iously i n F 1' g u re 9 . 8 . Th e se a p p 1 ica


.
.
d1ca_tors can be sed tor events uch as fra m e c o u n t s or a n yt h i n g that is useful for
.
t racmg appltcat1on- level events. A l t h o ugh W i n d V iew i s c l ea rl y usefu I , a frequen t
.
.
.
.

t tus 1. ve a l l th1 t rac111g is to t h e exec u t 1' o n of a p p 1 1cat10


n servic es.
h ow m
concern is
.
.
d v . is
w.111
t racmg mechanism t h at involves software i n - c i r c u i t rather than

.
.
h ard wa re m -c 1 rcu1t to collect the t race d a t a . S W I C ( So ft w a re l - c 1 r c u 1 t ) met h od s
.
.
are advantageou s because they don't re u 1. re elect r i cal probing
l ike Logic Analyzers

.
.
.
( LAs) ' no b u I. I t - m
I ogi c on processo r chips, a n d can be add e d t o syste m s m the field

and even re m o t ely accessed Th e a u t h or worked 0


n a p roject w h ere o perators of a
.
.
pace tele cope have the opti o n o f t urn ing
.
o n w tn
dv tew

co I I ec t 1o n a n d d u m pin g
a trace back to E a rt h over t he
NAS A 0ee Space Network- h op e fu l ly this wi ll

.
.
.
neve r be needed. To unde rstan d h ow m
t r us 1v e S W I C m e t h o d s s uch as Wmd
V 1ew
.
a re, you n eed to und er land t h e
eve nt m t r u m e n t ation

.
a n d t h e t race outp ut. T he
.
I ea t m t ru 1ve way to coll ect t race d at .
a I t o b u ffier t h e t race rec o rd i n me mo ry
.
( ryp1cally bit
. -e n oded 3 2 - b 1" t wo
. r d o r cac h e- I " m e - sized

.
record ) . Most p r oc e ors
h ave po ted writ e - buffe r i n t e rfa e t o
m e mo ry t h a t h av e 1. g n .1 fi c a n t t h r o u g h p ut
.
ba n d w 1'd t h . Lo k i ng more lo e 1 y a t t h e
.
.
t ra e i n p s g u r t:: 9 .8 and 9.9 earlie
. '
r m t he
c h a p t e r , 1 t c I e a r rhat eve n t s 0 u r a t L
h e ra t e f m i 1 1 1 e ond
.
n t he 200 M Hz
p ro e sor the r races were t ' ken o n . 1 n gen
eral ' '1 f t h e t ra e i of eve n t t h a t oc c u r

IC\

Scanned by CamScanner

1 85

Debugging Components

o o r mo re pro ces sor


every I m i l l i n
eye I es as t h ey do here,
.
and the trace write s to
o
n
e
take
wn
ry
teba
mo
ck
cy cle ' th e t race
me
.
out
put
load
.
i ng on t h e pro cess or 1s
very l ? Th e ms tru i:i en tat ion to det ect eve nts i n the
Wi
nd ker nel req u i res bu ilt.
in l ogJC m the ern el itse l f. Th e i nst rum ent atio n req u i res
log ical test s. Overa l l , eac h
.
eve nt trace P? n t m ight ake I 0 to I 00 proc ess or cycl es ev ry I
e
m illio n cycl es-a
ve ry l ow add i tion al load mg to t h e pro cess or fo r t raci ng. F or
low load trac ing, i t ' s
c ri tic a l that t h e trace be b u ffere d to mem ory . Afte r collec t i ng trace
data , the fu ll
trac e sho u l d b e d u m ped o n ly a fter all colle ct ion is done and durin g nonrea l - t i m e
o pe ra tion o r sl o w l y d u r i ng slack t ime. D u m p i ng a trace buffe
r can be very i n tru
sive and sign i fi c a n t c a re s h o u l d be taken on how and when this i done. I n all traces
s hown in t h is c h apter, d u m pi n g was done a fter collectio n by my request using t h e

WindV iew host t o o ls .

Buildi ng trac e i n st r u m e n t a t i o n i n to t h e Wind kernel is simple. The Tornado


config uration tool c a n be used to speci fy how the WindView i nstrumentation is
buil t into the VxWorks kernel, w here and how much trace data is buffe red, and
how timestamping is done. F ig u re 9. 1 0 shows usage o f the Tornado kernel co fi g
u ration tool a n d spe c i fi c a ll y W i n dView i nstrumentation, download, a n d t 1 me

stamping options fo r the Vx \i\' o rks b u i ld.

Workspace: Workspace1

I-ii I

di ng WindView ke rn el
FIGURE t. 1 0 Ad
instr um entation .

ve ry
ool k i t , w h i c h i
.
l led t h e L i n u x Tr a e T
ca
l
too
a
.
the
L i n u x can be t raced u m g
. on I a b '' t d i fft:r e n t ' b u t t he en d re ult 1
. .
.
.
1
t
a
.
si m i l a r to W i n d V 1 e w. Th e o per
erv i. es
i n l u d i n g a p p l.1 at 1on
ent
ev
rn
e
t
.
r ne l a n d Y
sa me . a S W J C f ra c e of k e
t m L in u x ke rn el wi th
u
a
1
id

u
b
d
n
a
h
u t pat
Lik e V x W ork s, t o u e LTf, yo u m

Scanned by CamScanner

1 86

Real-Time Embedded Components and Systems

a re well c oe rd by the m ai n tain .


LTI i nstru men t ation . The deta i l s of LTT usag e
S J a n d Bw ld mg E bedded Linux
ers o f LTT a n d by Linux Device Drivers I Cor betO
.
.
n be eas ily b u i l t 10 to m ost a n
a
c
ng
raci
t
C
I
SW
t
tha
e
]
y
Systems [Ya ghm our0 3 . Not
.
reco rds in to th e k e
race
t
ed
od
enc
t
cien
effi
r.
syst em by plac ing mem ory writ es with
.
bu
race
t
e
h
t
ffer
a
g
n
i
p
n
m
u
d
d
to
p
ted
ost
nel sche d uler . M uc h of t he work is rela
the
in
ker
on
tati
nel
men
ru
is
inst
very
the
prq cess ing it for ana lysis . Often plac i ng
straightforward.

TEST ACC E S S PORTS

for bou nda ry sc a n ro


stand ard fo r JT AG was origi nally desig ned
asse m bly were a l l properly con
ensur e that devic es on a digita l electr onic board
i nterfa ce incl udes test data
nected . Designed pri marily for factory verific ation, the
i n to and out of a
i n and test data out signals so that b i t sequen ces can be s h i fted
The I EEE

1 1 49. I

chain of devices under test. A n expected test data o u t p u t sequen ce can be verified
for a test data input sequence . The in terface also incl udes a clock input and reset.
Even tually, m icroprocessor a n d firmware developer s deter m i ned that JTAG co uld
be extended with the TAP interface . The TAP allows a JTAG to send bjt patterns
through the scan chain and also allows the JT AG user to c o m mand a processor ,

single step it, load registers and m em o ry, download code, a n d d u m p registers and
memory out so that co m mands and data can be sent to t h e device under test . A
processor with a TAP can be fully clocked and con trolled by a JT AG device, which

allows the user to boot-strap i t into opera tion right o u t of the h a rdware reset logic.
From the basic JTAG fu nctionali ty, external cont rol of m icroprocessors for
fi rmware development has contin ued to evolve. M ul t i p rocessor syste ms on a
single chip can also be controlled by a si ngle J T A G w h e n t h e y a re placed on a
com m ? n boundary scan chain. The TAP a llows for selection o f one of the proces
sors with bypass logic in the chain so that p rocessor s n o t u n d e r J T AG cont rol can
forward comma nds and data. More o n - c h i p feat u res t o s u pport JTAG ha ve

evolved and are often referred t o as o n - c h i p debug ,


yet the basic com m an d and
contr ol of these addit ional fea t u res is still thro
ugh the basic J T AG signal interface.
Befo JTAG, firmware was deve loped for
new hard ware syste ms with a "b u rn and

learn appr oach where EE PRO M devi


ces were exte rn a l l y prog ram med and sock
.

eted mto
t he new syst em t 0 prov1 d e
nonvol at ile boot code to execute from the
1
p roce sor reset vector Typ ica
J I y, t h e E
1t1a ize
E PROM was program med to first m
.
a sena
. 1 port to wnte out some co
n fi1 rm 1 11g "hello worl d " data, b l i n k an LE D, and
'
. com po ne
1 m. t 1. a 1 1. ze ba 1c
dn t
n t s s ue h as m e m o ry. I f
di
the E E P RO M p rogra m
wo r k , th e fi1 rmw are pro gr a m me r
wou I d try aga i. n , repea t i n g the proc ess u n t il
progres was made.

Scanned by CamScanner

Debugging Components

1 87

After a PROM pr gram was developed fo r a new processor board, most often

the program was prov1d d as a PROM monitor to make t h ings easier for software
developers. PROM monitors would provide basic code download features, mem
ory dumps, d isassembly of code segments, and diagnosti c tests for hardware com
ponents on the system board. The PROM monitor was a common starting poi n t
for developi n g m o re soph i sticated boot code and firmware t o provide platform
services for software.
Today, most systems are boot -strapped with JTAG rather than "burn a nd
learn" or PROM monitors. The capability of JTAG and on-chip debug has pro
gressed so m uch t h at , for example, the entire Tornado/VxWorks tool system can
be run over JTAG, including W indView. This allows firmware and software devel
opers to bring up new hardware systems rapidly, most often the same day that
h ardware arrives back from fabrication. The h i story of this evolution from b u rn
and learn t o PROM moni tors t o use of JTAG and more advanced on-chip debug is
summarized well by Craig A. H aller of Macraigor Systems I Zen ] :
First there was the "crash and burn " method of debugging. . . . After
some time, the idea of a hardware single step was implemented. . . . A t some
point in history, someone (and after reading this I will get lots of email claim
ing credit) had the idea of a debugger monitor (aka ROM monitor) . . . . Sim
ila r in concept to the ROM emula tor, the next major breakthrough in
debugging was the user friendly in-circuit emula tor (ICE) . . . . The latest
addition to the debugger arsenal is on-ch ip debugging (OCD ) . . . . Some
processors enhance their OCD with other resources truly creating complete
on -ch ip deb uggers. IBM's 4xx PowerPC family of embedded processors have a

seven wire interface ( "RISCTrace") in addition to the OCD ("RJSCWatch ")


tltat allow for a complete trace of the processor's execution.

The OCD t h at Craig Haller describes is a combination of JT AG with a trace


port, which approximates the capability of an ICE. The ICE traditionally incl uded
a chip bond-out i nterface so that the ICE could monitor all pin I/O signals to the
OUT ( Device U nder Test ) . This was more valuable when most processors i n ter
faced to memory to fetch code and update data without cache, on-chip memory,
on-chip busses, and in the case of SoC ( System-on-a-Chip), even multiple proce sor cores on chip. Mon itorin g external signals when all of the action is on ch i p
makes no sense . o, the ICE has become les widely used and method of 0 ,D
(On-Chip Debug) are expandi ng to incl ude ot only JTAG and t race, but even o n
chip I LA.s ( I ntern al Logic Analyzer ) an signal probe mUJc es such as the X i l i n x
C h i p Scope monitor. G iven the trend with larger and larger level o f i ntegrat ion
o n a si ngle chip with SoC design 0 D will continue to ex pand.

Scanned by CamScanner

1 11

Real-Time Embedded Components and System s

RT
TR
S!,__
O!
P
El
AC!

__
__
__
_
__
__
__
__
__
__
__
__
__
__
__
__

.
.
cessor s often were i ntegrat ed i nto
Simpl er micro contro II ers an d oIder micro pro
.
c
.

ace t hat is
t Emulato r ) . An I C E me 1 u des an .mten

a system usmg an I C E ( I n- c ircui


board a n d c a n momtor and co n t rol
i nterposed between a processor a nd the system
.
device. The ICE can t h erefore track
all mput an d output s1gna l s to and from the

oth er t han
u nder test as long as no i n ternal memory dev1ces
th e state o f t he d evice
.
to an extern aJly visible m e m o ry. When on- ch ip
'
registers are I oaded a nd sto--d

cach e emerged as a Processor fieature, this made a true I C E d 1 ffii c u l t to imple ment .
.
Then the ICE had no idea what was being loaded from cache and written bac k to
cache on-chip and could easily lose track of the i n ternal state of the processo r being
emulated. The emulation provided not only all the features of JT A TAP, but also
full-speed state tracing so that bugs that were only observable r u n n i ng at full speed,
.
and not seen when single stepping, could be understood. This type of hardware or

software bug, often due to tim ing issues or race conditions i n code that is not well
syn chronized, is known as a Heisenb ug . The name ind icates that l ike the Heisenburg
Uncertainty Principle, where a particle's position and momen t u m can't be simulta
neously observed, a Heisenbug can't be single stepped and the fa ulty logic observed
simultaneously. This type of bug is observable only u nder full-speed conditions and
i s difficult to isolate and reproduce in a single-ste p debug mode. A n ICE can be in
valuable for understa nding and correcti ng a Heisenb ug.
Trace ports have emerged as a solutio n for the cost and com plexity of ICE
imple menta tion and the ability to debug full-sp eed execu tion
of software with no
intrus ion at the software level . Wind View is also an appro
ach, but does involve in
trusio n at the softw are level. Trace ports use i ntern
al h ardw are mon itors to out put
i ntern al state in form ation on a l i m ited n u
mbe r o f ded icated 1/0 pins from a
proc esso r to an exte rnal trace acqu isitio n
devi ce. For exam ple, the I P can be out
put from the processor core onto 32 outp
ut . p i n s ever y cycl e so t hat the thread of
exec utio n can be trac ed whi le a syst em
run s full spe ed. The t race ana lysis is do ne
offl ine afte r the fact , but coll ecti on
m ust occ ur a t the sam e spe ed as the exec utio ,

and therefore ext ern al trac e acq uis


itio n req uire s test equ i p me
Losic
a
s
nt suc h a
An alyz er or specia lized dig ital sig
nal cap ture dev ice. For a fu pic
ll
ture of wh at code
i doi ng in a full -speed tra
ce, th.e I P m ust be cap tur ed
alo ng wit h dat a add re ss and
.
dat a val ue v cto rs n the int
erfa ce un it bet we en the p roc

ess or cor e and the m e m


ory sys tem , inc lud ing cac he
. Th is is a sig n ific an t n u m be r o f ou tp ut pin s. t o t
oft en , tra ce ports ab revi
te inform ati on to red uc e t rac e po rt pin cou nt. f or e
am ple , a tra ce po rt m igh t
inc lude the I P on ly ( n
o da ta or ad d re s infor m at io n ) and
fu rth erm or e co mp ess the
J
P ou tpu t to 8 bits

,
.
. Th e Jl -bit I p ca n be o rnp rt'do wn to 8 . its by inc lud ing
bu ilt -i n tra ce log i
.
tha t ou tpu t on ly rel d t i ve 0
from an i n 1 t 1al fou r - cy cle ou tpu
t of the I p at the a
t rt of the tra e. M o t o ft e n. c
br an c hes loc all y in loo p or
in dec i ion logi ra
th er th a n ma k i ng ab l u t e .
- b

Scanned by CamScanner

-set

Debugging Components

111

address b ranches. I f the code does take a 32-bit address branch then the internal

logic can indicate this so that the external trace capture system an obtain the new
I P over five o r m ore core cycles.

:race ports can be i n val uable fo r d i fficult bugs observable only at full-speed op

eration. However, a trace port is expensive, compl icated, and not easy to decode
and use because it requires external logic analysis equipment. As a result, internal

trace buffers are being designed i nt o most processor cores today. The internal trace

buffer stores a l i m i ted n umber of internal state vectors, including I P, address, and
data vectors, by a h a rdware collection state machi.ne into a buffer that can be

dumped. Trace buffers or trace registers ( often the buffer is seen as a single register
that can be read repeatedly to dump the ful l contents) are typically set up to contin
uously trace and stop t race on exception or on a software assert. Th is allows the de

bugger to captu re data up to a crash point for a bug that is only observable d u ring
extended ru ntimes at full speed . This post-mortem debug tool allows a system that

crashed to be an alyzed to deter m i ne what led to the crash. For extensive debug,
some trace buffers are essentially built-in LAs that allow software to program them
to collect a range of internally traceable data, including bus cycle data, the I P, ad

dress, data vector, and register data, to emulate the capabi l i ty that hardware/soft
ware debuggers had when all these signals could be externally probed with an LA.

POWER-ON S E LF TEST A N D D I A G N OSTICS


An importa n t part of the boot fi rmware is the ability to test all hardware interfaces
to the processor. Boot code can im-plement a series of diagnostic tests after a power
on reset based on a nonvolatile configuration and indicate how far it has advanced
in the test i n g since reset through LEDs, tones, or a record i n nonvolatile memory.
This process is called POST ( Power-On Self Tests) and provides a method for de
bugging boot fa i l u res. I f the POST codes are well known, then a hardware failure
can be diagnosed and fixed easily. For example, i f the POST code indicates that all
in terface are okay, but that memo.-y tests failed, then replacing memory is l i kely
to fix t he problem. POST cod es are also often output on a bus to memory or an

external device, o probing an external bus allows you to capture POST codes and
diagnose p robl em even when the external devices on that bus are not operating
correctly. The x86 PC BIO ( Basic I nput Outpu t y t e r n ) has a rich h istory of
PO T and PO T c de o u t p u t t o well -known device and in terfaces o that config
u ra t i o n and external device fa i l u r e c an be readily d iagn o ed and fixed [ POST ) . A

fi rmware and, for t hat mat


career c a n be m a de writ ing and u nder tanding Bl
ter, firmw are o n any sy: tern. A ign i fi ant part of a firmw are en gi neer' job i sim
l i n g m o n i t o r nd diagno t i
ode safely so that
ply gett i n g a pr oc e 50r up and

the syste m can be more ea i l y u

Scanned by CamScanner

d by appli ati n progr a mme r .

1 90

mponents an d Systems
Real-Time Embedded Co

gen era l i s d i fficult b ec au se t he


ics c a n be wr itte n i n
.
De scr ibi ng ho w dia gn ost
1s e n d less . However a ll
to
ce
rfa
nte
i
t
igh
m
'
a pro ces sor
ces
i
"
dev
l
.
h
era

np
pe
f
o
.
range
.
l
a
ff
o
i

ern
-ch
ext
p
or
p
i
h
m
-c
n
o
em
l
.
.
mo ry , eith er i nte rna
.
processors mt en'ace to me
n
ti
ry
tes
g.
mo
e
m
h
t
i
Me
w
r
mo
ry
r sho uld be fa m 1 l t a
ory. So, every firm wa re eng i nee

:
tests inc lud e the fol low ing

to m e mo ry dat a bus i n terfa ce


Wa lkin g l s test to ver ify pro ces sor
Mem ory address bus veri fica tion
liza tion a n d test
ECC ( Error Cor rect ion Circ uitr y) i nitia
Device patter n test i n g

The walking l s test ensures that memory devices have been properly wire d to
the processor bus in terface. Byte la nes for wide mem ory b uses could be swapped,
and shorts, open circ u its, noise, or signal skew c o u l d a l l c a u se data to be corrupted

on the memory i n terface. A walking I s test s h o u l d si mply write data to me m o ry

with words or arrays of words that match the width of the memory data bus. A
m e mory bus is oft ery 1 28 b i ts or wider. M ost archi tectu res support load/store mul

tiple or load/store string instructions that allow m ul t i p l e r e g i s t ers to be w r i t t en to


the bus or read from the bus for a l igned data s t r u c t ures. O n t h e PowerPC G4,

load/store string instructions can be used to write 1 28 b i t s to m e mory on a l 28-bit


memory bus with walking l s to e n sure that the i n terface is fu l l y function al. Mem

ory data buses wider than a si ngle 32 bit will have a b u s i n terface that m a v coalesce

multiple writes to adjacent word addresses, b u t u s i n g the m ul tiword i n tr uct io ns

h el ps ensure that the test uses t h e ful l bus w i d t h .

T h e followin co e i n c luded on t h C D - RO M uses structure assignment in C

to coax the compiler mto genera t i n g load/store strin g i nstructions with the proper

gee C co m p il er d irect i ves .


#inc lude

" st d io . h "

# i n c lude

" assert . h "

/*

t yped e f

3 2 - b i t u n s ig n e d wo r d s
s t ruct

u n s ig n e d

m u l t iw o r d

-s

1 28 _ bi t

*/

i n t w o rd [ 4 ] ;

m u l t iwo r d t '
-

/*

t he s e d e c l a r a t i o n s s h
o u l d be
.
c a n s t mu l t 1 wo r d t t e s t z e
ro :

v o l a t i l e m u l t 1wo r a_t

v o l a t i l e mu l t 1wo r a_t

Scanned by CamScanner

a l i g n e d o n

.
1 28 - b i t

boundary * /

{ {O x 0 , 0 X O OxO ' OxO } } '


t e s t -p a t t e r n : { { O
0 , 0x O , Ox O , Ox O } } ;
t e s t _ l oc a t 1 on
{ { Ox O , Ox O , OxO , OxO} } ;
-

Debuggi ng Components

1 11

void a s s ig n_mu l t i ( void )

t e s t _lo c a t ion = t e s t _p a t t
ern ,.

void t e s t_mu l t i ( voi d )

r e g i s t e r i n t i ' 1- .'
f o r ( i=O ; i <4 ; i + + )

t e s t _pat t e r n

t e s t -z e ro I

f o r { j = O ; j <32 ; j + + )

I * walk t h e 1

up t h e bits in 1 2 8 - bit a l igned s t ru c t u r e * /

t e s t _pat t e r n . word ( i ]

( O x 1 << j ) ;

/ * st r u c t u r e a s s ignment * /

a s s ig n_mul t i ( ) ;
/ * a s s e r t if st o r ed data does not have a bit s e t ;

a s s e r t ( t e s t_loca t ion . wo rd [ i J ) ;

int m a i n ( v oid )

t e st_m u l t i ( ) ;

Compiling this code on a Darwin OS PowerPC G4 Macintosh with no partic


ula r comp iler d i rectives, the a s s ign_mu l t i function code does not use load/store
string instructions.
Sam - S iewe r t s - Compu t e r : - sas iewe r t S gee lllW . C - o

inw

Sa - S i ewe r t s - Comp u t e r : - s a m s i ewe rtS otool - v t

inw >

MW . out

The o t o o l is a binary utility for Darwin OS that will disassemble object code.
T he disas sembled code for c reveal s that t he com piler does not normall y gener
ate loa d/store string inst ruct ions. The load/store m ultiple instructions ( sutw and

Scanned by CamScanner

1 92

ents and System s


Real-Time Emb edd ed Compon

pop the stack. The 1 28-bit structure assignment loadJ


are used to push an d
.
. . cated m
.
bold type .
indi
are
s
ruction
inst
store
lmw )

-a s s ign_mult i :

r30 , 0x fff8 ( r1 )

00002170
00002a74

stwu

00002a78

or

00002a7c

mf s p r

00002aao

bcl

00002a84

mf s p r

r8 , lr

00002a88

mtspr

lr , rO

00002aac

add is

r9 , r8 , 0x 0

00002a90

a dd i

00002a94

add is.

00002a98

addi

0000219c
00002110
00002114
00002118
000021ac
00002ab0
00002ab4
00002ab8
00002abc

00002ac0

00002ac4

stn

lwz

lwz
lwz

lwz

stw

stw

stw
stw

lwz

lw

blr

r 1 , 0x f f dO ( r 1 )

r 30 , r 1 , r 1
rO , l r
2 0 , 3 1 , 0x 2 a 8 4

r9 , r9 , 0 x 5 a c
r2 , r8 , 0 x 0
r2 , r 2 , 0x59c

rO , OxO ( r2 )
r1 1 , 0x4 ( r2 )
r 1 0 , 0 x8 ( r 2 )
r2 , 0xc ( r2 )
rO , OxO ( r9 )
r 1 1 , 0x4 ( r9 )
r1 0 , 0x8 ( r9 )
r 2 , 0xc ( r9 )
r 1 , 0xO ( r 1 )

r30 , 0xfff8 ( r 1 )

to

Mod i fy ing t he comp ile line a bit to request


level - 2 optim ization and allow
load/ store string cod e gene ratio n and disas
semb ling again the sa me structure
assig n men t in C n ow is done with one
l s w i and one s t s w i i n plac e of the fo ur lZ
( load word and zer o ) and four stw
( store wo rd) inst

ruc tion s.

Sam - Siewe r t s - Compu t e r : - s a m s i ew e r t $ g e e - 02

- m s t r i n g mw . c

Sam - S i ewe r t s - Compu t e r : - sams iewe r t S o t o o l - v - t mw


_ass 1gn _mu l t i :
00002b24

mfspr

00002b28

bc l

000 02b 2c

m f sp r

00002b30

mtspr

lr , ro

add is

r9 , r 1 0 , 0 x 0

000 02b 34
00 00 2b3 8

Scanned by CamScanner

add1s

rO , l r
20 , 3 1 , 0 x 2b 2 c
r10 , lr

r2 , r 1 o , Ox o

>

O rnw

mw . out

Debugging Com ponents

00002b3c

add i

r9 , r9 , 0x50 4

00002b40

add i

00002b48

1t 1wi

00002b4c

blr

00002b44

1 IJ

r2 , r2 , 0x4f 4

11wi

rs , r 2 , 1 8

r5 , r9 , 1 8

I nstead o f coaxi n g the compiler i n t o t h e load/store string i nstructions, the


assembly could be written and called from the C. The most important point, how
ever, is that the memory test should test the full bus width and not j ust one word at
a time. The mw . c code has debug output that can be turned on by passing -DDEBUG
on the c o mpile l i n e so that the walking 1 s test patterns can be observed, and the
1 28-bit alignment of the multiword structures can be verified. Note that the ad
dresses of the structures are all multiples of Ox I O, 1 6 bytes, or 1 28 bits. By default,
most compilers align structu res u nless specifically directed not to, but it's still a
good idea to verify t h is. Some of the walking I s output has been abbreviated here:
add r ( t e s t _z e ro ) = O x 00002ffO
ad d r ( t e st_pat t e rn ) = Ox00003020
add r ( te st_loc at ion ) = Ox00003030

i=O , j = O

mwo rd = Ox00000000000000000000000000000001

i=O ,

j=1

mwo rd = O x 00000000000000000000000000000002

i=O , j = 2

mwo rd = O x 00000000000000000000000000000004

i=O , j = 3

mwo rd = Ox000000000 00000000000000000000008

i=O , j =3 1 mwo rd = oxooooooooooooooooooooooooeooooooo


ox00000000000000000000000 1 00000000

i= 1 ,

j =O

mwo rd

i=1 ,

j =1

mwo rd = ox00000000000000000000000200000000

i= 1 ,

j =2

mwo rd = Ox00000000000000000000000400000000

i=1 ,

j =3

mwo rd

1=1 ,

oxooooooooooooooooaooooooooooo oooo
j = 3 1 mwo rd =

i =2 ,

j =O

i=2 , j = 1
i =2 ,

j =2

i=2 , j = 3

oxoooooooooooooooooooooooaoooooooo

0000 000 0000 0


mwo rd = ox ooooooooooooooo 1 0000
0000 0000 00000 000
mwo rd = ox oooo oooo oooo ooo2
0000 00000000
mwo rd = 0 x 0000 0000 0000 0004 0000
oooo
mwo rd = 0 x 0000 00000000oooeooo

i =2 , j = 3 1 mword

Scanned by CamScanner

ooooooooo

0 x ooooc uooaoooooooooooooooo

ooooooo

s;a
194

_
_
_
_

_
_
_
" ,...
ialol
""'
sfrftr-..w

, . /
r

II

and Syste m s
ed Com pon ents
edd
Emb
me
Rea l-Ti

00000 00000000
1 00 000000000
ooo
ooo
oxo
=
000000000000
i=3 , j =O mwor d
00000000000
o20
o
ooo
oxoo
0000 0000
i=3 , j :: 1 mwor d =
00000000000
000
400
ooo
oxoooo
oo
i=3 , j = 2 mwor d =
ooooooooooooooo
ooo
ooo
oso
oo
oxoooo
i=3 , j =3 mwo r d =
oooooooo
ooooooooooooo
ooo
ooo
ooo
oxso
i=3 , j =31 mwor d =

an d re tr ievi ng patt ern s to


sted by st or in g
te
be
n
ca
s
bu
s
The m em or y addres
w ith po w er s of two, the
] . By ad dr es si n g
99
rr
Ba
[
o
tw
of
wers
re , ad dres ses m ight
addres ses th at are po
l i ne s. F ur th er m o
s
es
dr
ad
e
th
on
1s
ad dr es s tested wa lks
t fully de co de d. Fo r exam ple,
or on pu rpos e if no
ke
sta
mi
a
to
e
du
r
he
be ali ased eit
a n ge from o xo oo o oooo to
ad dr ess ed in the r
be
n
ca
ry
mo
me
B
-M
the same 32
ox oo 20 00 00 0 to Ox 003 FFF F FF,
an d the n ag ain from
es)
byt
00
000
o20
oxo
(
oxo o1 FFF FFF
nd ed tha t the mem
de co din g err or if it's i nte
s
res
add
an
be
ld
cou
is
and so on . Th
an d not aliased at
32 M B of the ad dre ss spa ce
t
firs
the
in
e
onc
d
ppe
ma
ory only be
res s dec odi ng onl y
Th is wo uld hap pen if the add
other mu ltip les of oxoo2000000.
ting l s and Os is
. Mo st ofte n a pat tern of alte rna
included 26 bits rather than 32 bits
1 at all lo
ensu re that bits can be set to 0 and
used ( oxAA or Ox55 ) for each byte to
to
ope rate d o n bitw ise and com bined
catio ns. The pattern and antip atter ns are
patte rn and a ntipa ttern test can be
qu ckly verify data written and read back . This
full devic e test.
writte n and read back over the entire memo ry range for a

EXTERN AL TEST EQU IPMEN T


Use of external test equipm en t or de b ug can be expensive ' b u t also can save
.
.
count 1 ess h ours locahzin a h a dware / so ftware interface
b u g that might be ve ry

. 11y, t h e most common external test equ ipdifficult to iso 1ate oth erw1se . H1sto nca
.
ment used for verifyin g h ar dware/software sys tems me l u d ed a n oscillosc ope an d a
.
.
.
.
.
e
Logic Analyzer. The osc1u oscope is used m ostlY t 0 .iso l ate s1gna
lmg issues with th
hardwa re, to verify sig.na1 s, an d to tune out pu t signa

l s. T h e oscilloscope is mos t

useful to the hardware engin eer, b ut from a syste ms

viewpoint, it's valuab le in em.


bedded ystems for veri fyin g th e o utermost
extents of the system from sen so r in puts ( anal og) to actuator out puts (
anal og 0 r PWM signals ) Figure 9 1 1 sho ws the
.
inte r face be twee n a YxW orks emb
edd ed processo r system, which captu res fra mes
.
. SC cam
firom an NT
. .
era 1 run s an 1 mag(e-p ro
ea
.
cess1 g
service
that determi nes wh er
.
v1su
.
a1 ta rget is i n the cam era '
F ield of V iew) , and then com m ands ti lt/ pan
servos to cen ter the object i n
th c
llo
e
T he M 0 ( M i xed S ig n a l Osci
s
2
a
e
o
op
n
cn
b
pr
sc e ) how
an l g
d 1g1ta l sources. Here the M SO i s u ed to
_
exa m i n e an NTSC o u t p u t and
ven'fy the 30 fps outpu
t rate of the cam era.

FOV

: :: ;

Scanned by CamScanner

<? V .

Debugging Compone

nts

1 15

'ITSC

Examining the NTSC Signal for Thumbnail Camera


( A nalog/Digital interface between Processor and Sensor/Actua;or Subsystem )

FIGURE 9. 1 1

System interfaces for an embedded system.

Likewis e, a relative ly inexpen sive MSO can be used for low-spe ed, l imited
.
w1dth digital logic a n alysis as well . Logic analyzers have an advantage over
oscillo

scopes in that they can observe 1 6, 32, or more signals at the same time, but only at
well -defi n ed l ogic levels .such as TTL (Vthreshold = l .4 V ) and CMOS (Vthreshold =

2.SV ) . The osci lloscope can be used to look at the same channels and to see their
analog natu re, i n c l u d i ng noise, rise time, fall time, and overshoot. The MSO has an
advantage in that it can display -LA output and one or more oscilloscope changes
on the same view, allowing for logic analysis with probing of the same signals to
exam ine a n alog signal issues that might be affecting digital logic.
Using t h e M S O to analyze an NTSC signal at m il liseconds of resolution, the
odd and even line raster periods can be seen at half the frame period ( frame rate is

30 fps ) . Figure 9. 1 2 shows the odd and even raster output signals measured with
the M SO from a composite outpu t CCTV (Closed Circui t Televi sion ) camer a.
NT SC is i nterlaced 2: 1 , so the field rate is 60 fps. The comp osite signa l outpu t from
the N TSC came ra c o m b i n s l u m i nance and chrom inanc e. The signal changes
based upon the sce n e that the cam era view s, but the basic perio d does not.

Scanned by CamScanner

1 96

Real-Time Embedded Components and Systems

NTSC composite signal odd and even scan lines.

FIGURE 9. 1 2

Using the MSO to go to a m ic rosecond reso l u t i o n , now the individual scan

l ines within an odd/even raster c a n be detected. The CCTV ca meras have 5 1 0 hori

zontal pixels and 492 vertical pixels, so 246 odd l i nes a re d igitized and then 246 even

lines. Each l ine therefore has a period of 67.75 m icroseconds i n theory based u pon

the frame rate. Figure 9. 1 3 shows eac h scan l i n e as h a v i n g a period of 6 1 .7 mi

croseconds. However, the simple calculation dot:s n o t take i n to account vertical

...
.

FIGURE 1.1 J

Scanned by CamScanner

NTSC com posite signal individual scan lines.

..

.....
'"....
.
.. . ........ ..
.
")

Debugging Components

1 17

( wh ere clos ed cap tion is inserted


blank lines
into the NTSC sign aJ) or the latency
and
od
even
the
scan
een
s.
The
w
bet
mea sured period for a single scan line is less
al
oret
upp
ic
the
er
bou
e
nd on the scan line period and
than th
therefore makes sense.
can
also be used to analyze the
The MSO
tilt/p an servo control chan nels. A
not
is
rigor
o
ously
serv
y
stand
ardized, but in general the servo circu it and
hobb
designed
t
be
is
g
contr
olled by a PWM outpu t with a separation between
gearin
.
peaks of 20 to 80 m1lhse conds and a pulse width that varies between 2 milliseconds
and I millisecond with servo center near 1 .5 milliseconds. Servo characteristics can
vary, so each new hobby servo should be individu alJy characterized and the PWM
output signal tuned to it for the best results. The spacing between the pulses allows
hobby radios to m ultiplex m ultiple servo signals into one PWM output, so the re
ceiver can demultiplex to allow for four or more channels on a single RC ( Radio
Control) frequency. For the applications of hobby servos presented in this book,
only one signal is carried on each PWM signal, so the spacing betwee p ulse can
.

vary significantly and has no effect. The pulse width sets the servo pos1tmn. Figure
9. 1 4 shows an MSO measurement of the PWM signal generated by the NCD 209 2
channel servo control chip .

.,

I
.
. ,
.

- - - - -

.
'

-.

FIGURE 9.14

- _, _ - I

'
-

- -

- -

oller defa ult PWM output.


NC O 20 9 servo contr
.

put ) is 1 . 5
cen ter ing (th e defa ult out

Note that the p u l se w1d t h for the servo


.
ses is 63 mil lise con ds.
m ill i seconds as exp ecte d and that the spac mg be tween pul
e gen erd to ve n'fy sign als and to tun
.
For the most part, osc111 oscopes are use
.issu es, e ross talk ' or oth er sig nal i n teg rity
at ed o ut put s i gn a l s , to loo k for noi se

Scanned by CamScanner

1 98

nts and Systems


Real-Time Embedded Compone

.
i nterfa ces to sens ors a n d actu ators. Logi c anal\rJ
. .
.
, ..er&
.
prob l ems m embedded system
.
.
domam d epteted earher i
d1g1tal
the
m
used
often
most
n F'1g.
.
on t he oth er ha n d a re
logic
fter
debug
to
n
used
a
be
can
alo
zer)
g
Analy

s ignals
ure 9. l l . An LA ( Logic
to
t
dnve
outpu
are
DA
signals
Cs
digital
or
before
PW "
l
h ave been digitize d or

. .
g
a
u
e
b
1
d
mtern
to
d
d
use
be
1g1t
can
LA
n
a
al
se,
Likewi
logic on
actuators like servos.
softw
re/hardw
digital
n
o
re
events
trace
to

inte rfaces
a processor board or even
the
signal
where
level
ts a co m m
1/0)
urpose
P
such as buses or GPIO ( Genera l
on
logic level. Probing h igh-speed logic signal such as those fo u n o a D O R m ern.
ory bus can be difficult due to speed, skew issue , and complex1 m decodi n g the
logic. Probing low-speed logic such as G P I O o r simpler memory mterfaces such as
SRAM is much simpler. H igh-speed specialized buses such as PCI are often most
easily traced using a bus a nalyzer that is specifically designed to captu re and ana
lyze logic signals only on one bus i nterface. Because the trend in e mbed ded
systems is to move external memory on-chip, the most useful probing is G PIO or
using external bus analyzers such as a PCI a nalyzer.
With a bus analyzer or LA used to capture G PI O output, software can be in

strumented to emit trace information to the external bus or G P I O pins for analy
sis. However, given enough memory for a software trace b u ffer, it's not clear how
this is more advantageous than the SWIC WindView style o f tracing. I n fact, Wind
River often describes WindView as a software LA. A write to a n external bus or
GPIO memory-map ped address can be j ust as i n trusive or more intrusive than
posting a write to on-chip or external memory. However, oscilliscop es and LAs

continue to be useful i n general for diagnosin g hardware problems and for tracing
software that is not instrum ented and software/hard ware i nteract ions.

APP LICATION -LEV EL DEB UG G I N G


A plication- level debu gging can b e done
b y si ngle stepp ing a thread or using
.
Wmd V1e
w. However, for m ultith read appl icati
ons, s in gle stepp in g is so m eti m es
no t as u sefu l when the inter actio n betw
een t hrea ds i s t h e prob lem bein g exam ined
.
.
.
Wm
dVie w is helpful, but only show s
kern el even ts u nless it's furth er inst rum ented
by the add ition of
calls i n the app lica tion bein g deb u gge d. H owever,
( ) has only limi ted i n
form atio n asso ciat ed w i t h it, so often program rners
.
resort to
call s so they can out put
stat e info rma tion to a conso le in
for at. Th ts can som etim es wo
s
rk oka y, but oft en cau ses new b ugs a nd can be rn 1
lead ing bec aus e
occu
call s sign ific ant ly cha nge
exe cu t i o n t i m ing. Th is
mo stly bec aus e
eerequi res a large a m
o u n t o f cod e from the C lib ra ry to
.
.
...c (0
cut e a n d a I so req uires pot enti al
blo ckin g whi le dat a i s out put to co n sol
e de11 C"" .
VxW ork s,
sho uld also not be calle
d from I S R or ker nel con tex t. The
way to deb ug with out put to a con
sole i s u s i n g log M s g . Log gi ng simp l causC
y

wvEv ent

p int t

wE ent )

pri ntf
pri ntt

pri nt

Scanned by CamScanner

stn

bettt:

Debugging Components

1 91

message t o be queu ed that poi nts to a


messge bu ffer. The
actual str ing ou tp ut is
co m pleted by a log gin g service rat h er t
.
han m the ca Jrmg con
text . This allows the
console I/O to be dec oup led from t he call'
ng thread of execut ion , and the caller is
only slowed down for the dur ati'on of an
.
m-m em ory wnt
e an d messa ge enq ueue.
.
.
Furtherm ore, the prio rity of the 1oggmg servic
e
(
tas
k
)
can be cont ro II ed an d there.
"ore interference to real t'1m e servICe
11
s by 1ogM sg d eb ug outp ut is also cont
rolled
better than inline p r i n t f calls.

SU M M A RY

ebgging t.he software/hardware i nterface, applications, and complex multiser

v1ce 1nteract1ons can be difficult. Poorly synchronized services can suffer from race
conditions. Likewise, poorly designed hardware/software interfaces that do not
include synchronizing mechanisms can also have race conditions. ln general,
separate services should synchronize through i;maphores or message queues if
they need to i nteract, and likewise software should synchronize with hardware
through interrupts, pol l i ng status registers, or through control interfaces. Simple
functional software errors can be easily caught through use of parameter checking
and assert calls. H a rdware d iagnostics and POST run by boot firmware can make
hardware failures easier to isolate. Good practices can prevent many bugs before
they become difficult to diagnose. However, many bugs related to software/hard
ware integration will still arise; when they do, knowledge of debug monitors, trace,
and external test equipment is invaluable.

!_X ER CI SE S
a com mo n sensor o r a ctu a tor 1' n terface.
l . Use an osc illoscope to cha racterize
controller such as the. N C 0 209 an dc h arac S peC I'fi1ca 11 y, use a hob by servo
. .
.
.
servo and its hm it s of ope rati on.
by
hob
n
give
a
or
"
11
WM
tenze t h e P
tic load mg exa mp 1e nc l uded o nt he
the
syn
the
ze
aly
an
to
w
2. Us e W indVie
.
vJCes t hat use 90 o;10 o f th e proc es ser
two
es
lud
inc
it
t
l
'
f;
u
a
.
CD -RO M . By d e
an overload by mc reasmg Si run ate
cre
to
le
mp
exa
s
I
th'
sor cycles . M d J' fy
a t this ove rloa ds
U se a Wi ndV iew trac e to prove t h
s.
d
on
sec
ilh
m
0
tim e to 3

?.

!:
s

. Take a trace an d ex pla i


LTT on a Lin ux system
up
set
d
3.
wh at you ob serve .

Scanned by CamScanner

200

Real-Time Embedded Components and Systems

CHAPTER REFERENCES
[ Barr99] Barr, M., Programming Embedded Systems in C and C++. O 'Re illy,
1999
[CorbetOS ] Corbet, J., A. Rubini, and G. Kroah-Hartman, Linux Device Dr
ivers
O'Reilly, 2005.

[GOB] http://www.gnu. org/software/gdb/documentation/


[ LTI] http://www. opersys. com/L TT/
[ POST] http://bioscentral.com/
[PowerPCJ PowerPC Microprocessor Fa mily: The Program ming Enviro nme
nts for
32-bit Microprocessors. Motorola MPCFPE32 B/AD Rev. 1 , 1 997.

[Yaghmour03 J Yaghmour , K., Building Embedded Lin ux Systems. O'Reilly,


2003.
[Zen J http://www. macraigor.com/zenofbdm.pdf

II

f
I

i
I
I

'

Scanned by CamScanner

,, .J..:-.. 1..,,.,_....

\.
I

11

10

Pe rfo rma n ce

Tu n ing

In This Chapter

I ntr od uct ion

Ba sic Co nc ep ts of Dr illDown Tu nin g


H a rd are -Su pp ort ed
Pro fili ng and Tr aci ng
.
Bm ldm g Per for ma nce
Mo nito rin g into Software
Pat h Len gth , Effi ciency,
and Cal ling Frequency
Fun dam enta l Opt imiz atio
ns

INTRO DU CTI O N
The art of performance tuning a system or application is a huge 'topic that can't be
covered completely i n this chapter. The chapter covers basic methods and provides
enough knowledge for you to pursue a more in -depth understanding of perfor
mance tuning on yo ur own with further study. Performance tuning is critical to
real - time e mbedded systems when services are missing deadlines . Otherwise, effi
cient, hig h performan ce executio n of services is not necessary, although it can help

t he designer to save cost, reduce power, and simplify thermal cooling for systems.
Fu nda m ental ly, a real - time system is operatin g correctl y when it produce s the
requ ired
re ativ e to
output ( functio ns) and produces them by a specific deadlin e
req uest. Ea rly in a projec t, ystem design ers m ust size processing based upon un

m p le x ity of se rvic e a l gori t h m

. After algorit hms are im ple


ment ed and t h e sy tern is r u n n i ng, servic es may take longe r to u n t ha n ex pected
an d o ve rru n dead l ines. Whe n t h i happ ens, there are sever al optio n :
de rst a ndi ng of the

co

JOI

Scanned by CamScanner

202

Real-Time Embedded Components and Systems

algori thm
Red uce the com ple xity of the
vic es
.
.
Reduce the freq uen cy of ser
.
ms with t u n m g
rith
algo
the
of
y
ienc
effic
n
utio
I ncrease the exec
.
g h a rdwa re to run w i th a faster
adin
upgr
by
t
ghpu
throu
ssor
proce
I ncrease the
and less laten cy
cloc k, more band widt h, mor e mem ory,

of these option s is acc ept


From the standp oint of real -time correc tness, any
s may be an optio n, but
able. Reduci ng the comple xity of the algo rithms for service
re c ? decs ( co mpres
might degrade the overall system quality. For exampl e, softw

sion and decompr ession protocol for a data stream ) can pr ov i d e high rates of co m
pression at the cost of h igher algo r i t h m i c com plexity. I f the compre ssio n or
decompres sion can't keep up with the desi red fra me rate, t h e options inclu de a

lower performance codec that is simpler and can be executed at the desired frame
rate, or a lower frame rate. The simpler codec that provides less compressio n will
req u i re transport with higher bandwidth.

Often the options of reducing performance ( e.g., lower frame rate ) or of ob

taining more resources ( e.g., more bandwidth or i n c reasi ng processor cycle rate)

more
resources o r performance red uction is performance t u n i n g to optimize the effi
ciency of service execution. This is easily s a id and h a rd to do. I deally, a system de
are not feasible or will not meet the system req u i rements. An alternative to

sign should not count heavily upon performance tuning to make it feasi b l e, but

tuning can save a project that would otherwise not work. From a n R M p erspecti ve,

tuning is j ust red ucing WCET for a software service. Another option is to offload a
service or specific functions t h at the service perfor ms to h a rdware state

machines.

Offloading functions can i n c rease overall syst e m concurrency and pro vide sig

nificant speed- up. ( Portions of the material on performance tuning prese nted

i n this chapter was first published on the I BM deve lo perWork s Web site, IBM

developerWorks. Reprinted with pe r missi o n . )

BAS IC CON CEPT S OF DRI LL-D OWN TU N I N G


The perfor mance o f firmwa re and softwa re
must b e tuned t o a workl oad . A work
i a sequen ce of se rv i c e requests, comm ands,
s
I /Os, or other quan tifiable tran
actio ns that exerc ise the softw are.
Work loads most often are prod u ced by workload
gener tors rathe r than real-w orld
service prov ision . Good wo r kl oads captu r e tht
esse tial featu res of real - orld
work loads and allow for repla y or ge n e r at io n of tht
service requ ests at a max i m u m rate
so that bott lene cks can be iden ti fied in syste ms.
Most workl oad -gen era t io n tools are
x
a p p l i ca t i o n - or se rv ic e do m a in - speci fic . fo r e
a mpl e , com p i l e r b nchm arks or
1 /0 s u bs y t e m workl oa d s ge ne ra t ed b
r
y IOmete . a
com mon ly u ed d 1 k 1/0 wo r k l oa d
gen erat or.

loa

Scanned by CamScanner

Performance Tunin g

20J

C haract eri stic s o f cod e seg me


n ts tha t mo st affect
per form anc e i nd ude the
following:

Segm ent pa th leng th ( in str uc tio


n co un t)
.
Segm en t ex e ut 1o n effi cie nc y (cy
cle co un t fo r a given ath )
P
Segme t ca llm g fre qu en cy ( i n he
rtz )
Execu tio n co n text ( cri tical pa th,
or err or pa th)

.
The crit ical pat h i n clud es cod e execu
ted at h igh frequency for perfor manc e
.
eval ution wor kloa ds and ben chm arks . This
is ofte n a sma ll port ion of the overall
code impl eme nted and may also often fit into L l
instr uctio n or L2 unifi ed cache. .
I nterr pt-ba sed profi l ing is the best place to start
analy zing software services
and deadl ine overr uns . I n terrup t profil ing requires that the system be run with a
well-de fined test worklo ad, ideally a workload that stresses the services and caus
is
ing deadl ine overrun s. The interrup t profiler samples the IP ( I nstructi on Pointer )
or LR ( Li n k Register) to determin e location in code on a periodic basis. The LR is
used in many archi tectures because it contains a return from interrupt address,
.
and the samples of where the code was in execution are taken by interrupt. Ob
serving the I P i n the sample i nterrupt routine is of l ittle use for profiling. Often a
I -millisecond sample rate is used. The workload should be repetitive at some fre
quency; for example, a video frame rate of a specific resolution, a data request rate
for a specific transfer size, a digital control loop sensor sample, and control law
output rate, or any workload that generates uniform periodic service requests.
Multiple workloads can be analyzed, but while profiling, one workload should be
run at a time. M ixed workloads are much harder to profile and understand. While
a uniform workload is being run, the profiler samples the execution locations, and
with asynchr onous sam pling relative to the workload period ( this is importan t ) ,
the profile begins t o stabiliz e a n d show where i n the code most o f the time is spent

for the given workload.


.
.
or LR ts observed m
Where t i m e i s spent , which service and nct1o n the
_
:
most often, i s a fun ctio n of three char acte nst1 cs of the services

tion )
F req uenc y of ser vic e execut ion (or fun ctio n eecu
.
tion )
se
func
(or
ice
relea
serv
each
for
th
leng
I ns t ruct10n count or path
.
ice release (or fun ctio n)
serv
h
eac
in
ion
cut
exe
e
cod
ci
Effi enc y of

.
map ued to services (tasks), to func tions , to lines
t mg c an be
N ote that t h e profil
.
r
hows how

.
ruct 1ons . Figu re 1 0. l s.
d 1vidua l mac hin e code mst
of c cod e, or eve n to in
a 32 -bit cod e add ress can be
a profi le collecte d at t h e Ieve J 0 f hits observed at
.
h 1 eveI down to a
g spent from a very h1g
mined to understa nd wh ere t 1 me is bein
.

machine code instruction .

Scanned by CamScanner

204

Real-Time Embedded Components and Systems

1,
CPU
....,,..ce
c--

....-..-i

I
'i

....... ID
f mdlm

...

Comts lll

....... ID U.. al Code

e e I

c_, .. ,...._

-. fooAlf -tllndlon lllGI07_1115e


llneofC

cs

ASlllt /Wclfoo.c: ll2

Performance profile data mining (hotspot


drill-down) concept.
FIGURE 1 0. 1

As m e n t io ned i n the i ntroduct ion, frequency and i nstruction count red uc ion
can solve t he performance problem, b u t at the cost of overall system performan(e.
Ideally, the goal is to ide n t ify i nefficient code, or

hotspots as t hey are often called,

and to make m od ifications so those code blocks execute more efficiently. A cycle

based profile, where period ic sam p l i n g of the I P or LR is . used, does identify


hotspots in general and sets up more detailed analysis for why those code blocks

are hot-frequency of execution, n u mber of i nstructions i n the block, or poor ex

ecu t ion efficie ncy. Poor exec ution efficiency can be the result of stalling a proce sor pipeli ne, m i sing cache, bloc k i n g reads from slow devices, mi a l i g ned d ata
t ructure , nonoptimal code generation from a compiler, bad coding pract i e in

efficient bu u age, and unnece sary computation.

k
One app ro a h to t u n i ng i to go t h rough code from tart to fini h an d
r
r
w
effici e nc y i ue , orrect them, and try r u n n i ng aga i n . Thi i tedio u . It a n
go
bu t the time req ui red , the low p a y -off, a n d the la rge n u m ber f od e han
en t 1
requ i red all ontribute to ri k with t h i ap pro h . A bet ter a ppro a h i t o id
a i 10"
t he hot pot , n a r rowing the o pe of ode hange ign ifi antly. and the n e.x

thi

ubset of the

ode for potential

hot pot iden t i fi ation , o p t i m i za t ion now has

Scanned by CamScanner

o m b i n e d "i t h pr filing "0 t


ut
feedba k loop so t ha t t h e , al

pt i m i 1,,a tio n .

Performance Tuning

1:

205

.
.
o pt i m izat i o n s c a n be m easu red as cod e
mod ified . The feed back appr oach reduces risk ( mo d i fyi ng a n y worki ng co d a w ays runs
_
the nsk
of i n t rodu cing a bug

the
tabilizing
code
base ) a n d pro v1 d es ,
or des
.
ioc us for the effiort F or any ted ious
e f. '
fort 1t s a l so affi r mi n g to measu re progress, and rerun ning
the
profi le and obser v.
ing hat less t i me i s spe n t i n a r ou t me aft er an opt i m ization

is
we 1 come feedback. It
sh ould be n oted t h a t h i g h - l evel per fiormanc e measur es often show no apparen t
.
.
, l ocal tzed opti m izatio ns . A n
.
impr ovem e n t 1or
.
opt1. m 1zat1on
may lead to a hurry up
.
.
and wait sce n a ri o for that P art icular function or service where a bottlenec k else .
wh ere m t h e system m a sks the i mprov ement .
The ove r a l l process bein g p ropose d 1 s ca JI ed
F i rs t , hotspots are
.
.
.1dent1fied w i t h o u t regard to why they are hot
, a n d th en c I oser ana I ys1 d etermi nes
.
.
of service code or the system a re ra t e 1 1 m 1 t mg.
why specific sect ions
Th"1 m
1 t 1 a I

. . .
lv l of a nalysis is i m porta n t because i t c a n help drive focus on t h e t ru ly rate.
hm1tmg software locks If he software is not the rate- limiting bottleneck in the
:
system, t en profilm g will still help, but will result in arriving at blocking locat ions

drill-down.

more q uickly. If the bottlenecks are related to 1 /0 latencies, this will st i l l show up

as a

otspot, b u t t h e opt i m i zation requ ired may req u i re code redesign or system

redesign. For example, status registers t hat are read and respond slowly to the read
request w i l l s t a l l execution if the read data is used right away. Read ing hardware
status i nto a cached state long before t h e data must be processed reduces stal l t i me.

Ideally, the h a rdware/s oftware i n terface would be modi fied to red uce the read
latency i n addi ti o n . M a ny system- level design improve ments can be made to avoid
of cache
1/0 bottlen ecks. For exam ple, fast-ac cess TCM is used so that the impact

single- cycle access


misses ( the m iss penalt y) is m i n i mized or, in some cases, for
memory, eli m i n ated.

impo ssible , espec ially durin g the


Adj us t i n g h a rdwa re resou rces may often be
I f you need mor e si ngle -cyc le
hard ware /softw a re i n tegra tion phas e of a proj ect.
nt is nece ssary to adju st this re
on-c h ip m e mor y, a h uge cost and t i m e inve stme
software to m i n i mize the loss of
sou rce. I nste ad , i f you can mod ify fi rmware and
reach i mpr ove men t with out hard
perfo r m a n ce t h ro ugh a bot tlen eck, you m ight
idea ll y dete r m i ne wha t thei r bot tlen eck
war e cha nges . Sys te m a rc h i tect s sho uld
war e and software eng i nee rs wit h t u n i ng
will be and t h e n p l a n for scal abi lity. Firm
to the ir sys tem s can ens ure tha t ma xi
t ool s a n d som e h a rdw are s upp ort bui lt i n
in
giv en set of hardware res our ce con st ra ts.
mum per fo r m anc e is ach iev ed for any
e
asu re
m mo n b u i l t - i n CP U perfo rm anc me
The res t of t h i s cha pte r d i sc uss es c o
a nd software
can use the m to tun e firm wa re
m ent h ard ware fea t u res a n d ho w yo u
l he lp, but the db i ity
nte rru pt -ha ed rrofi l i n g wil
for op ti m a l pe rfo rm a nce . S i m pl e i
_
_ tru ct 1on
.ou n , an d r ec u t1 o n
u n' ,
to se pa r ate ho tsp ots ou t ba ed up on freq e !
1 t hat alm o _ t all
a re a 1 t. fhe good new .
.
effi ci' e ncy req u i res som e b u i l t in har dw
. . ) or
mt
e M on ito rin g
an
rm
rfo
Pe
(
PM
e
ud
l

c
n
i
w
no
m odern proce sor core

Scanned by CamScanner

20I

Real-Time Embedded Components and Systems

performance counters that can provide even better vision into performance issues
than interrupt-based profiles.
. .
Ideally, after a workload set is identified and perormance optt m1zations are
being considered for.inclusion in . a code base, . ongom.g perfo rmance egrss ion
testing should be in place. Performance regression tesing shoul d provide simple
high-level metrics to evaluate current perfor mance in trms of MIPs: FLOPs,
transactions or I/Os per second. Also, some level of profihng should be included,
perhaps supported by performance counters. The generation of this data sho uld be
automatic and be able to be correlated to specific code changes over time. In the
worst case, this can allow for backing out code optimizat ions that did not work
and for quick identification of code changes or added features that adversely affect
performance.
.

HAR DWARE-SU PPORTED PROFI LING A N D TRAC I N G

Some basic methods for building performance monitoring capability into the
hardware include:

Performance event counters ( P M U )


Execution trace port (branch encoded)
Trace register buffer (branch encoded)
Cycle and interval counters for timestamping

We now focus on built-in event counters and how they can be used to profile
firmware and software to isolate execution efficiency hotspots. Tracing is advised
after hotspots are located. In general, tracing provides shorter-duration visibility
into the function of the CPU and firmware; but can be cycle-accurate and provides
a view of the exact order of events within a CPU pipeline. Trace is often invaluable
for dealing with esoteric timing issues and hardware or software interaction bugs,
and you can also use it to better uderstand performance bottlenecks. A profile
provides information that is mch like an initial medical diagnosis. The profile an
swers the question, where dos it h urt? By comparison, although a profile can't tell
the tuner anything about latency, a trace provides a precise measure of laten
Profiling supported by performanc e counters can not only indicate where time 15
spent, but also hotspots where cache is most often missed or code locations here
the processor pipeline is most often stalled. A trace is needed to determine stall du
ration and to analyze the timing at a stall hotspo t.
Understanding latency for 1/0 and memory access can enable better overlap of
processing with 1/0. One of the most common optimizations is to queue work and
Scanned by CamScanner

Perfo rmance Tuning

207

start 1/0 early so that the CPU is kept as b sy as possi. b


le. So, altho ugh profi ling is

p t nm
g, trac ing feat ur
the most popular a p p roa ch to
es shou ld stiJI be designed into
.
the h ardware for debug a n d er 1orman ce tun mg
as well
M any ( although not a ll ) com mon CPU arc h
.
1tecture
s include trace ports or the
.
I ud e a trace port . The I B M and
option t o me
'i\MC C owerPC 4xx CPUs, X1lmx

1 ex II Pro ARM E m b e dde d Trace M acroceU,


V'rt
and Tens ilica are amon g th e
.
1
.
m
a
o
t
t
h
.
th
t
f;
1
'
a
1
s
cate
h
1ps
gory
c
Th e 4 xx senes of processo rs has i ncl uded. a
branch-encod ed trac e port for man y years Th e b ranc h -enco ded trace port ma kes
a mce com p rom ise betw een v1s1b 11tty into the core a nd pm-cou

. - ch 1p
n t commg o ff
'
to enable tracm g. Beca use the trace is encoded , 1 t 1 sn't eyeI e-accurat e, b u t 1s
accufor
ugh
h
ardwa
en
re
te
or software debug ging and for perfor mance optim izar

m
en
odern EDA ( Electro nic Design Autom atio n ) tools, ASIC design
uo . G 1:
venficatlon has advance d to the point where all significa nt hardwar e issues can be
discovered ring si1:1 ul ation and synthesis. SoC ( System-on -a-Chip) designs
make post-silicon verification difficult if not impossible, because buses and CPU
cores may have few or no signals that come off-chip. For hardware issues that

might arise with i n ternal on-chip i n terfaces, the only option is to use built-in logic
analyzer functions, such as those provided by the Virtex Chip-Scope. This type of
cycle-accurate trace is most often far more accurate than is necessary for tuning
the perform ance of firmware or software.

Encoded-bra n c h t race, as provided by the IBM and AMCC PowerPC 4xx, re


quires only eight 1/0 pins per traceable core. The encoding is based upori trace

output that provides i n formation on a core-cycle basis on interrupt vectors, rela


tive branches take n , and the occasional absolute branch taken. Most branches and

vect ors that the p rogram counter follows are easily encoded into 8 bits. Typical
processors might have at m<?st 1 6 i n terrupt vectors, requiring only 4 bits to encode
the interrupt source. F u rthermo re, relative branches for loops, C switch blocks,
and if blocks typical ly span short address ranges that might require 1 or 2 bytes to

encode. Finally , the occasi onal long branch to an absolute 32-bit address requires
4 bytes. So, overa ll, encod ed outpu t is typica lly 1 to 5 bytes for each branc h point
in cod e. Given that most code has a branc h densit y of l i n 10 instru ctions or less,
the encoded outpu t, which can take up to 5 cycles to outpu t an absol ute branc h, is
still very accu rate from a software exec utio n orde r and timi ng perspective.
progress between bran ch ints.
The prog ram cou nter is assu med to linea rly
ce code after 1t s ac
So, the bran ch t race is easil y correlated to C or asse mbly sour
,
loped
quired thro ugh a l ogic anal yzer or acqu isitio n t ? I such s ISCTrace deve
. 1 n , which 1s near ly the cycle rate
b y I BM for deb ug. Due to the h i gh rate of acq m1t ?
. hm1 ted. For exam ple, even a 64- MB
of t he CPU the trace window for a trace port 1s
ts or abo ut 200
trac e buffe wou l d capt ure app roxi mately 1 6 mill i n bran ch poin
s only one fifth of a seco nd of
mill.io n instructi ons At a doc k rate of I G Hz, t hat
.
dwa re and software debuggin&
Code execu tion . Thi s info rma tio n is i n val uab le for har

Scanned by CamScanner

JOI

Real-Time Embedded Components and Systems

an<l timing issues, as well as d i rect measu rement of latencies. Hoev er, m
ost ap.
.
plications and services provide a l re n u mber ? f soft wa re opera t i ons per second
.
.
.
over long periods of t ime. For v1s1b1hty m to h 1g er-level soft wa re p erfor m
an
pro fi l i ng provides i n formation t at i s m uc h easier t o use t h a n t he i n for ma
ti
.
traci ng provides. After a p rofi le 1s u nderstood, race c n prov ide
an i n va l u
a ble
level of drill-down to u nder tand poorly perfonmng sections of code.

Performance cou n ters fi rst appeared in the I B M Power PC arch ite ctur t'
in a
patent approved in 1 994. Since then, the m a n u fact u rers of a lmost all other p r0c.es
.
sor architectures have l icensed or inve n ted s i m i l a r fea t u res. The I nte l P M U
is a
wel l -known example in wide use, perhaps most often used by PC game dev elo per
.
The basic idea i s simple. I nstead of directly t racing code exec ut ion on a CPU, a
built-in state mac h i ne is programmed to assert a n i nterrupt periodically so th at an

JSR can sa m ple the state of the C P U , i n c l u d i ng t h e c u rrent add ress of the progr am
cou nter. Sam pling the program c o u n ter address is the s i m p lest form of perfo r

mance mon i toring and produces a histogram o f the add resses where the progr am
counter was found when sampled. This h istogram ca n be mapped to C fu nct io n

and therefore p rovides a profile i n d icat ing t h e fu n c t i o n s i n which the progra m


counter is fou n d most often.

This indicates calling frequency, s i ze of a fu nction ( l a rger functions have larger

add ress bins ) , and n u mber of cycles that are spent i n each fu nct ion. With 32-bit

\\'Ord-sized address bins, the p rofile can provide t h i s i n fo rm a t ion down to the in
struction level and t herefore by l i n e of C code. M ost performance count er state

machi nes also i n c l ude event detec t i on fo r C P U core eve n t s , i ncluding cache

misses, data dependency p i pel i n e stalls, branch m ispredictions , write-back queue


stalls, and i nstruction and cycle counts.

These events, like periodic cycle-based sampling, can a l so be programmed to as


rt an in terrupt for ISR sa m pl ing of c u rre n t program counter and related eve nt
counts. This e ent profile can i nd icate address ranges ( modules, functio ns, lin es of

sign ifi

code) that have hotspots. As sta ted earlier, a hotspot is a code location where
cant time is spent or where code is exec u t i ng poorly. For example, if a pa rtic ulr
function cause a cache m iss coun ter to fire the sam p l i n g i n terrupt frequently. th
indi ate that the fu nction should be exam i ned for data and instruct ion ca che ineffi

c ienc ie

uch

as

F inally, from these same event counts it i s a l so possible to co mpu te metrk


P l for ea h function with i m ple i n st ru mentation added to fu n tion

en !'

u n
and e, i t point T h i s u e of coun ters with
t hose co u nt er
sam
ple
i
n
l
i
n
e
to
code
.
.
(ier red (
t ru ment a tio n ) 1 a h ybnd approach between trac
ing and profi l i ng, oft en re .
10
b 1
a evellt tracl. flg. Th e perfor ma n e o u n t e r req u
ire a hardw a re ell u t t in ..
P U core, but a bo req u i re some firm ware Jnd
e a prvni<
o ft ware: upport to p ro d u
d \, lft'
hM
or c ent t ra e. f f the cot, omplc xi t y, or
of
on
hed ule preve nt in l u)i
in st od.
uppor t for perfor mance moni tori ng, p u re
ofrw,1re method an be u :,ed

Scanned by CamScanner

Performance Tuning

JOI

U LDI N G PE RFO RM A N C E M O N I
TO R I NG I NTO SO FTWA R E
B I

Most of the bas ic me tho ds fior bui'ld mg


'
per c
m
to a
n1ty
1ormance momtormg capa b.:.J

.
system req mre fi rm ware and software sup por
t:

P rfo rm ance cou nte r AP I ( hardware


sup por ted )
.
D1 rect code instru
ment ation to samp le cycle and event count ers for trace
( h a rdwa re supp orted )

Steady -state asynch ron ous sampli ng of counte


supported)

rs throug h interrupts ( hardware

Software event logging to memory buffer ( software only, but hardwa re time
stamp i mproves quality )
Fu nction or block entry/exit tracing to a memory buffer ( software only, with
post-bu ild modific ation of binary executa bles)
Software event logging is a pure software in-circuit approach for performance

monitoring that requ i res no special hardware support. Most embedded operating
system kernels provide an event logging method and analysis tools, such as the
WindView and Linux Trace Toolkit event analyzers. At first, this approach might
seem really i n t rus ive com pared to the ha rdware-supported m ethods. However,
modern architectures such as the PowerPC provide features that make event log
gi n g t races very efficiently. Architectures such as the PowerPC i n c l ude posted
write b u ffers for memory writes, so that occasional trace i n st ructions writi n g
bit-encoded event codes and ti mestamps t o a memory buffer take no more than a
singl e i n struction. G iven that most functions are on the order of h undreds t o
thousands of i n st ructions i n length ( typically a line o f C code generates m ultiple
assembly i nstructions), a couple of additional inst ructions added at fun c t io n entry
and exit contribute little overhead to normal operation.
Eve n t trace logs are i nvaluable for understanding operati n g system kernel and
application code event t i m i ng. Although almost no hardware support is needed,

an acc u rate t i m estamp clock makes the trace t i m i ng more useful. W ithout a hard
ware clock, you can still provide an event trace showing only the order of events,

but the i n cl usion of microsecon d-or-better -accuracy timestamp s vastly improves


t he trace. System architects should carefully consider hardware support for t hese

well-known and well-tested tracing and profiling methods, as well as schedulin g


time fo r softw are developm ent.
When you i m pleme n t performanc e- monitoring hardware and software o n an

embedde d system, the capabilit y to collect h uge amounts of data is almost over
whelming. Effort put into data collection is of questionab le value in the absence of
a plan to anal yze that data . Trac e i n form ation i typically t h e easiest to analyze and
is taken as needed for sequences of interest and mapped to code or to

Scanned by CamScanner

timeline of

210

Real-Time Embedded Components and Systems

gh

OS events. When traces are mapped to code, enginee rs can replay and step th r
code as it ran at full speed, noting branch es taken and overall exec utio n flow or
event traces mapped to OS events, enginee rs can . analyze multithread cotex:t
. g events sue h as semaph
switches made by the schedu ler, t h rea d sync h ron1zm
ore
takes and gives, and appI .tcat10 n-spe c1' fiIC event s.
Profiles that collect event data and PC location down to the level of a 32_bIt
address take more analysis than simple mapping to be useful. Perfo rma nce tuni
effort can be focused well through the process of drilling down from the c g
module level, to the function level, and finally to the level of a specific line of cod e
Drilling down provides the engineer analyst with hotspots w h ere more detailed in formation, such as traces, should be acquired. The drill-down process ensu res th at
you won't spend time optimizing code that will have little impact on bott om l ine
performance improvement.
.

e.

PATH LE NGTH, E F F I C I E NCY, A N D CAL L I N G F R EQU E N CY

Armed with nothing but hardware support to timestamp events, you can still de
termine code segment path length and execution efficiency. Ideally, performance
counters would be used to automate the acquisition of these metrics. However,
when performance counters aren't available, you can measure path length by hand
in two different ways. First, by having the C compiler generate assembly code, you
can then count the instructions by hand or by a word count utility. Second, if a sin
gle-step debugger is available ( for example, a cross-debug agent or JTAG ) , then
you can count instructions by stepping through assembly by hand. Althoug h labo
rious, it is possible, as you ' ll see by looking at some example code.
The code in Listing 1 0. l generates numbers in the Fibonacci sequence.
LISTING 1 0. 1

Simple C Code to Com pute the Fibonacci Sequence

t y pede f u n s i g n e d i n t U I NT32 ;
#def i n e F I B_L I M I T_FOR 32 B I T 4 7

U I NT32 i d x
U I NT32

seqCnt

O,

j dx =

1 ;

= F I B_L I M I T_FOR_32_B I T ,

U I NT32 f i b = 0 ,

f ibO = 0 ,

fib1

1 ;

v o id f ib_wr a p p e r ( v o i d )

f o r ( id x =O ;

idx

f i b = f i bO

Scanned by CamScanner

<

i t e rC n t ;
f i b 1 '

idx + + )

i t e rC n t

1 ;

Performance Tuning
whi l e ( j d x % l t ;

21 1

s eqCn t )

{
f ibO

f ib 1 ;

fib1

j dx++ ;

fib ;

f i b = f ibO

fib1 ;

}
}
}
The easi est way t o get an i nstr ucti on cou nt for a bloc
k of code such as the
Fibo nacci sequ en e-gen erati ng funct ion in Listin g I 0. 1 is to comp ile the C code
into assembly. With t h e GCC C comp iler, this is easily done with the foJlow ing
com ma nd line :

gee f ib . c

-S

- o f ib . s

The resulting assembly is ill ustrated in Listing 1 0.2. Even with the automatic
generation of the assembly from

C, it's still no easy task to count instructions by

hand. For a simple code block such as this example, hand counting can work, but
the approach becomes ti me-consuming for real-world code blocks.
LISTING 1 0.2

GCC-Generated ASM for Fibonacci C Code

. g l o b ! _f ib_w r a p p e r
. s e c t i o n TEXT , t ex t , r e g u l a r , pure_ i n s t r u c t ions

. a l i gn 2
_f ib_wr appe r :
s t mw r30 , - 8 ( r 1 )
stwu r 1 , - 48 ( r 1 )
mr r30 , r 1
mf l r rO
bcl 20 , 3 1 , " L000000000 0 1 $pb "
" L0000000000 1 $ p b " :
mf l r r 1 0

mt l r rO

00 1 S pb " )
i d x - " L0 00 00000
pb " ) ( r 2 )
la r 2 , l o 1 6 ( _i d x - " L00 000 0 000 0 1 $

a dd i s r 2 ' r 1 0 , h a 1 6 (
l i ro , o

stw rO , O ( r2 )
L2 :

00 00 00 1 Sp b " )
id x - " L0 0 00
1 Sp b " ) ( r9 )
la r9 , lo 1 6 ( _ idx - " LOOO O 000 000
000000 1 S p b " )
" L 00 00
a d d i s r 2 , r 1 0 , h a 1 6 ( _ I t e r at 1. o n s
" ) ( r2 )
OOOO000 0 1 Spb
la r 2 , l o 1 6 ( _ I t e r a t i on s - " LO O

a dd i s r9 ' r 1 0 ' h a 1 6 (

lwz r9 , 0 ( r9 )

Scanned by CamScanner

111

and Systems
Real-Time Em bedded Components

lwz r0 , 0 ( r2 )
cmp lw c r7 , r9 , r0
blt cr7 , L5
b L1
L5 :

1 $pb " )
a d d i s r 1 1 , r 1 0 , h a 1 6 ( _f i b - " L 0000 0000 00
1)
l a r 1 1 , lo 1 6 ( _f ib - " L0000 00000 0 1 $pb " ) ( r 1
a d d i s r9 , r 1 0 , h a 1 6 ( _f ib0 - " L0000 00000 0 1 $p b " )
la r9 , lo 1 6 ( _f i b0 - " L00000 00000 1 $pb " ) ( r9 )
addis r2 , r 1 0 , h a 1 6 ( _ f i b 1 - " L000000 0000 1 $p b " )
l a r2 , l o 1 6 _f i b 1 - " L000000 000 0 1 $pb " ) ( r2 )
lwz r9 , 0 ( r9 )

lwz r0 , 0 ( r2 )
add rO , r9 , rO
stw r0 , 0 ( r 1 1 )
LS :
addis r9 , r 1 0 , h a 1 6 ( _ j d x - " L0000000000 1 $ p b " )
la r9 , lo 1 6 ( _ j d x - " L 0000000000 1 $pb " ) ( r9 )
a d d i s r2 , r 1 0 , h a 1 6 (_se q i t e r at i o n s - " L000000000 0 1 $ p b " )
la r2 , lo 1 6 ( _se q i t e rat ion s - " L0000000000 1 $ pb " ) ( r2 )
lwz r9 , 0 ( r9 )
lwz r0 , 0 ( r2 )
cmplw c r7 , r9 , r0
blt cr7 , LS
b L4
LS :
addis r9 , r1 0 , h a 1 6 ( _ f i b0 - " L0000000 000 1 $ p b " )
la r9 , lo 1 6 ( _f ib0 - " L000000 00001 $pb " ) ( r9 )
addis r2 , r 1 0 , ha 1 6 ( _f i b 1 - " L0000 00000 0 1 $p b " )
la r2 , lo 1 6 ( _f ib 1 - " L0000 00000 01 $pb " ) ( r2 )
lwz r0 , 0 ( r2 )
stw r0 , 0 ( r9 )
a d d i s r9 , r 1 0 , h a 1 6 ( _f i b 1 - " L000
0000 000 1 $p b " )
la r9 , lo 1 6 ( _f i b 1 - " L00 000 000
00 1 $p b " ) ( r9 )
a d d i s r2 , r t O , h a 1 6 ( _f i b - "
L00 000 000 00 1 $pb " )
l a r2 , lo 1 6 ( _f i b - " L00 000 000
00 1 $ pb " ) ( r2 )
lwz rO , O ( r2 )
stw r0 , 0 ( r9 )
a d d i s r 1 1 , r 1 0 , ha 1 6 (
_f ib - " L0
la

0 1 $p b " )
1 1 , lo 1 6 ( _ f ib - " L0000000000 1 $p00b 00
" ) ( r1

a d d 1 s r9 , r O , h a 1 6
( _f i b0 -

00 00

1)
" L0 00 00 00 00 0 1 $p b

")
la r9 , lo1 6 ( _f ib0 - " L0
0000 0000 0 1 $p b " ) (
r9 )
addis r2 , r 1 0 , ha 1 6 ( _
f ib 1 - " L00 0000
0000 1 Sp b " )
la r2 , lo 1 6 ( _f i b 1 - Lo
ooooooooo 1 s pb
" ) ( r2 )

Scanned by CamScanner

Performance Tuning

J1J

lwz r 9 , 0 ( r9 )
lwz r0 , 0 ( r2 )
add rO , r9 , rO
stw r0 , 0 ( r 1 1 )
add i s r9 , r 1 0 , ha 1 6 ( _ j d x - " L00
000

la

000 00 1 $

9 , l o 1 6 ( _ j d x - " L0000000000 1 $p b " ) ( r9 b " )

add 1 s r2 , r 1 0 , h a 1 6 ( _ j d x - " L00


000 000 00 1 $pb " )
l a r2 , lo 1 6 ( _j d x - " L00 000 000 00
1 $pb " } ( r 2 )
lwz r2 , 0 ( r2 )
addi r0 , r2 , 1
stw r0 , 0 ( r9 )
b L6
L4 :
addi s r9 , r 1 0 , h a 1 6 ( _ idx - " L000 0000 000
1 $pb " )
la r9 , l o 1 6 ( _ i d x - " L0000 00000 0 1 $pb " ) ( r9 )

a d d i s r2 , r 1 0 , a 1 6 ( _idx - " L00000 00000 1 $pb " )


la r 2 , l o 1 6 ( _ i d x - " L00000 00000 1 $pb " ) ( r2 )
lwz r2 , 0 ( r2 )
add i rO , r2 , 1
s t w r0 , 0 ( r9 )
b L2
L1 :
lwz r 1 , 0 ( r 1 )
lmw r30 , - 8 ( r 1 )
blr

I f you ' re looking for a less tedious approach than cou n t i ng i nstructions b y
hand from assembly source, you can use a single-step debugger and walk through
a code segment in disassembled format. An interesting side effect of this approach
for c o u n t i ng i nstructions is that the counter often learns quite a bit about the flow

of the code i n t h e p rocess. Many single-step debugging tools, includi n g JT AG


( Jo i n t Test A p pl ications Group ) hardware debuggers, can be extended or auto
mated so that they can a u tomatically step from one address to another, keeping

count of i nstruction s in between. Watching the debugger auto- step and count i n
structions between a n e n t ry poin t a n d exit poi n t for code c a n even b e an i nstruc
tive expe rience that may spark ideas for path optim izatio n .
be compi l ed for G4- -or G S
List i n g 1 0. 3 p rovide s a mai n progra m that can

based M ac i n tosh compu t ers runnin g M ac OS X. You can downl oad t h e C H U O


r
toolse t for t h e Mac and use i t with t h e M ONste instru mentation includ ed i n
h e hardware per
Listing 1 0. 3 to measu re t h e cycle and instru ction counts using t
g 1 0. 3 takes t he Fibon acci code
forma nc e c o u n ters. The exam ple code in Listin
Power PC G4 perfor mance count ers
from List i ng J o. J a n d adds sam pling of the

Scanned by CamScanner

214

Real-Time Embedded Components and Systems

Nste r ana lysis tool . The same type of tycl


through the CHU O inte rfac e to the MO

:
te
n
ance
rs
cou
orm
perf
m
erPC
e mbeddtd
Pow
and even t coun ters are foun d i n the
I BM PowerPC cores .
usnNG 1 o.J

ute CPI for Fibonacci Cod e Block


Simp le c Code for PowerPC to Comp

--

# inc lude

" s t d io . h "

#include

" unistd . h "

M a c i n t o s h G4 Powe rPC w i t h t h e Mo nst er


I I T h i s code w i l l wo r k o n t h e
t i o n a n d a n a l y s i s t oo l .
1 1 PowerP C P e r f o r m a n c e Cou n t e r a c q u i s i

I I Simply p a s s in

- DMONST ER_ANA L YS I S wh e n c o mp i l i n g t h e e x amp l e .

PMAP I c o u n t e r s and
I I MONSTER p r o v i d e s a f u l l y f e a t u r e d s e t o f
f o r t h e Mac int osh
I I a n a l y s i s a long w i t h t h e f u l l s u i t e of CHUO t o o l s

I I G s e r i e s of PowerPC p r o c e s s o r s .
II

I I Al t e rnat i v e l y on x 86 Pent ium ma c h i n e s w h i c h i m p l e m e n t t h e Time Stam p

I I Co u n t e r in t h e x86 v e r s i on of P e r f o rm a n c e C o u n t e r s c a l l e d the P'IJ ,


I I p a s s in

- DPMU_ANAL Y S I S .

F o r t h e P e n t ium ,

o n l y C P U c y c l e s will be

I I m e a s u r e d and C P I e s t im a t ed b a s e d u p o n k n own

i n s t r u c t ion count .

II

I I F o r t h e M a c i n t o s h G4 ,
II

II

II
II
II

s i mply l a u n c h t h e m a i n

s h e l l w i t h t h e MONSTER a n a l y z e r s e t u p f o r

p r o g r a m f r om a Mac OS

r e m o t e m o n i t o ri ng to

f o l low al o n g w i t h t h e e x am p l e s i n t h e a r t i c l e .

L e a v e t h e #def i n e LONGLONG_OK i f
su pport 64 - b it

unsigned integers ,

I I ANS I C .

y o u r c o mp i l e r a n d a r c h it ectur e

d e c l a r e d a s u n s i g n e d long long in

II
II
II
II

I f not ,

p l e a s e remove t h e # d e f i n e be low f o r 32 - b i t u n s i g n
ed

long d e c l a r at io n s .

#d ef i n e LO NG _ LO NG_OK
# d e f i n e F I B_L I M I T FOR 32 B I T
47
-

ty ped ef u n s ig n e d i n t
U I NT3 2 I
# i fde f MO NS TE R ANAL
YS I S
# i n c lud e

cH UO l c h ud . h "

#inc lude

ma c h l bo o l e a n . h "

# e l se

Scanned by CamScanner

Performance Tuning

21 5

# i fdef LONG_LONG_OK
t yped ef u n s ign ed lon g lon g int
U I Nrs4 ;
U I NT64 s t a r t TSC
U I NT64 st opTSC

O;

O;

U I NT64 c y c l eC n t

O;

UI NT64 readTSC ( vo id )
{
U I NT64 t s ;

asm

v o l a t i l e ( " . by t e OxO f , Ox
31 "
return t s ;

__

__

" =A "

(ts) ) ;

}
U I NT64 c y c l e s E l a p s e d ( U I NT64 st opTS ,

U I NT64 startTS )

{
ret u rn

( s t o pTS

s t a r t TS ) ;

}
#else
typedef s t r u c t
{
U I NT32 low ;
U I N T32 h i g h ;

TS64 ;

TS64 s t a rt T SC
TS64 st opTSC =
TS64 cyc leCnt

{O ,
{O ,

= o;

O} ;
0} ;

TS64 rea d T SC ( v o i d )
{
TS64 t s ;
__

as m

__

v o l a t i l e ( " . by t e

Ox 0f , Ox 3 1 "

" =a "

return t s ;

}
pTS ,
TS6 4 c y c l e s E l a p s e d ( T S6 4 s t o
{
U I NT32 o v e r F l owCn t ;
U I NT32 c y c l e C n t ;

TSR4 l a n sedT :

Scanned by CamScanner

TS64 s t a r t TS )

( t s . l ow) ,

" =d "

( t s . h ig h ) ) ;

111

one nts and Syste m s


Real-Time Embedd ed Comp

ove r F l owC n t

) ;
i
s t a r t T SC . h g h

( s t opT SC . h i g h

opr sc . 1ow
i f ( o v e r F lowCnt && ( s t

<

s t a r t TS C . l o w ) )

{
overF lowCnt- ;
c y c l eCnt

s t a r t T SC . low )

s t o p TSC . lOW -

s t a r t TS C . l OW j

( Oxf f f f f f f f

}
else

{
c y c leCn t

}
e la p s e d T . low = c y c l eCnt ;
elapsed T . high

ov e r F lowC n t ;

r e t u rn e l a p s e d T ;

}
#endif
#endif
U I NT32 idx

o,

jdx = 1 ;

U I NT32 se q i t e r at ions = F I B_L I M I T_FOR 32 B I T ,


U I N T32 req l t e r at ions = 1 , I t e r at i o n s =

1;

UI NT32 f i b

OI

f i bO

OI

# d e f i n e F I B_TEST ( s eqC n t ,
.
for ( idx=O
;

idx $ l t ;

f 1' b I

i t e rCn t )

i t e rCn t I

1 j

idx++ )

{ \
fib

f ibO + f i b 1 ;

wh ile ( j d x

<

s eqCnt )

\
\

{ \
f i bO

f ib 1 ;

f ib 1

fib;

\
\

f i b = f i bO + f i b 1 ;
j d x++ ;

} \
} \
void f i b_wrappe r ( v oid )

{
F I B_ TES T ( s e q l t e ra

Scanned by CamScanner

t ions ,

I t e r a t i on s ) ;

s t o p TS C . l ow ;

Performance Tuning
# ifd ef MONSTER_ANALYSI S
ch a r lab e l [ ] = " F ib o n a c c i Se r ie s " '
in t m a i n (

i n t a r gc ,

c h a r a rg v ( ] )

{
doub le t b e g i n ,
if ( a rgc ==

t e lapse ;

2)

{
sscanf ( argv [ 1 J '
s eq i t e r a t io n s
I t e ra t io n s

}
else if ( argc ==

" \ ld " ' &req i t erations ) ;

req i t e r a t ions \ F I B L I M I T FOR 32 B I T


- req i t e r a t io n s / seq i t e at ion ;
'

1)

p r i nt f ( " U s i n g d e f a u l t s \ n " ) ;
else
p r i n t f ( " Usage :

f i btest

[ N um iteration s } \n " ) ;

c h ud i n it i a l i z e ( ) ;
TRUE ) ;

c hudUma r k P I D ( get pid ( ) ,

ch udAc q u i r e R emoteAcc es s ( ) ;
t b e g i n = c h ud R e a d T imeBa s e ( chudMic roSecon ds ) ;

chudSt a r t R em o t e P e rfMor.it o r ( labe l ) ;


F I B_T EST ( s eq l t e r at ions ,

I t e r ations ) ;

c h u d S t o p R e m o t e P e rfMoni t o r ( ) ;
t e l a p s e = c h u d R e a d T imeBa s e ( c hudMic roSecond s ) ;
p r i n t f ( " \ n F i bo n a c c i ( \l u ) =\lu
s eq i t e ra t ion s ,

fib,

c h u d R e l e a s e R emo t eAc c es s ( ) ;
ret u rn O ;

}
#else

ER 1 5
#de f in e I NST CNT F I B_ I N N
R 6
#d e f i n e I NS T CN T F I B_ OU TTE

Scanned by CamScanner

( O x\08 l x ) for \f use c \ n " ,

fib,

( t elapse - t begin ) ) ;

J1 7

211

Real-Time Embedded Components and Systems


i n t main ( i n t a rgc , c h a r a rg v I J

double c l k R a t e = o . o , f ibC P I

U I NT32 instCnt

o;

)
=

o.o;

if ( a rgc = = 2 )

sscanf ( argv [ 1 J ,

" \ld " , & r e q l t e r a t io n s ) ;

s eq l t e rat ions = req l t e ra t ions \ F I B_L I M I T_FOR_32_B I T ;

I te r a t ions = req l t e r a t i o n s I seq i t e ra t i on s ;

e l s e i f ( argc = = 1 )
print f ( " Us i n g d e f a u l t s \ n " ) i

else
prin t f ( " Usage : f ib t e s t [ Num i t e ra t ions ) \ n " ) ;
instCnt = ( I NST_CNT_F IB_ I NNER

s e q i t e ra t i on s )

( I NST_CNT_F IB_OUTTER . I t e r a t ions ) + 1 ;

I I E s t imate CPU c lo c k r a t e

s t a r t TSC = readTSC ( ) ;
u s l e ep ( 1 000000 ) ;
st opTSC = readTSC ( ) ;

c yc leCnt = cyclesE lapsed ( st opTSC , s t a r t T SC ) ;


#ifdef LONG LONG OK
-

p r i n t f ( " C ycle Count =\l l u \ n " , c y c l eC n t ) ;


c l kRate = ( ( double ) cycleCnt ) l 1 000000 . 0 ;
print f ( " Based on u s leep a c c u ra c y , CPU e l k r a t e = \lu e l k s / sec , ,
c y c l eCnt ) ;
prin t f ( " \7 . 1 f Mh z \ n " , c l k R a t e ) ;
#else
p r i n t f ( " C y c l e Coun t =\lu \ n " , c y c
l eC n t . low ) ;
p rint f ( " Ov e r F low Cou nt =\l u \ n " , c y c
l e C n t . h i gh ) ;
c lk R a t e = ( ( do ub l e ) c y c l eCn t . low
) / 1 000 000 . 0 ;
prin t f ( " Base d on u s le e p a c c u r
a c y , CPU e l k r a t e = \lu elk s / se c , .
c y c l eCnt . low ) ;
p r i n t f ( " \7 . 1 f Mh z \ n " , c l kRa
te ) ;

#end if

p r i n t f ( " \ n R u n n ing F ibonacc i ( ) Test for \ld i t e ra t ions \ n " ,


s e q i t e rat ion s , I t e r a t ions ) ;

Scanned by CamScanner

_,

Pmormance Tuning

I I START Time d F ibon ac c 1. Tes t

star tTSC

211

readTSC ( ) ;

F IB_TEST ( seq l t e ra t i on 5 , I ter at ions ) ;

stopT SC = readTSC ( ) ;
I I ENO T imed F ibonacci Test

#ifdef LONG_LONG_OK
printf ( " s t artTSC =Ox\0 1 S x \

' startTSC ) ;
t
f
(
"
s
t
o
pTSC
=Ox\0 1 6x \ n , st opTS C ) ;
prin

cycleC n t = c y c lesE lapsed ( st opTSC ' s t artT C ) ;


p r i nt f ( " \ n F ibonacc i ( \lu ) =\l u ( Ox\08lx ) \ n ' seq i terations , f i b , f ib ) ;
p r i n t f ( " \ nC y c l e Count=\11 u \ n " , cycleCnt ) '

prin t f (

\ n i n st Cou n t =\lu \ n " ' ins


. tC nt ) ;
f ib C P I - ( ( d o u b l e ) cycleCn t ) I ( ( do b l e ) instCnt
) '
.

print f (

\ nC P I =\4 . 2f \ n " , f ibCPI ) ;

#else
p r i n t f ( " startTSC h igh=Ox\0 8x , startTSC low=Ox\08x \ n " , startTSC . high ,
startTSC . l ow ) ;

,
p r i n t f ( " stopTSC h i gh=Ox\ 08x , stopTSC low=Ox\08x \ n " , stopTSC . h igh

stopTSC . l o w ) ;
);
cycl eCnt = c y c lesE laps ed ( stopTSC , sta rtTSC
) \ n " , seq l te rat ion s , f i" b ' f ib ) ;
p r i n t f ( " \ n F ibon ac c i ( \l u ) =\lu ( Ox\0 8lx
cyc leCn t . low ) ;
p r i n t f ( " \ nC y c l e Cou nt=\ lu \ n " ,
n " , cyc leC nt . h igh ) ;
p r i n t f ( " Ove rFl ow Cou nt=\lu \
low ) I ( ( dou ble ) ins tCn t ) ;
fib CP I = ( ( dou bl e } cyc leC nt .
f ibC PI ) ;
p r in t f ( " \ nCP l =\4 . 2f \ n " ,
#endif
}
#end if

, the M ON ste r
bu ilt for the M ac int os h G4
3
0.
1
g
tin
Lis
m
R un ni ng th e code fro
th e Fibo na cc i
d exec ut io n efficiency fo r
an
h
gt
len
th
pa
an alysis to ol de te rm i n es the
in Li st in g 10 .4.
fig k ' as su m m arized
seq ue nc e co de bl oc
.
ted us in g the M O N ste r on
lec
ol
c
s
wa
4
0.
1
in Li st in g
m t he
T he ex amp le analysis
C an d downl oa da bl e fro
GC
ith
w
d
le
pi
e ea sil y co m
u r at i on an d so u rc e cod
C D- RO M.

Scanned by CamScanner

220

nts and Systems


Rea I -T.1me Embedded C ompone
Nste r An a is for Example Code
Sample Maci ntosh G4 MO
, 1 66 MHZ CPU Bus , B r a n c h Fo ldi n g :
Processor 1 : 1 250 MHz PPC 7447A

LISTING 1 0.4

enabled , Threshold :

o , M u l t i p l ier :

2x

Timebase res u l t s
1 - CPU Cyc l e s
( p l c l ) PMC 1 :
d
( p 1 c 2 ) PMC 2 :
2 - I n s t r comp l e t e
( tbl ) Pl

Con f i g : 5 - CPI - s imple


1

CPI ( c omp l e t e d )
P l Timebase ( c pu c y c l e s )
_

P 1 pmc 1

SC res 1
Tbl :

( P1 )

( cpu cycles )

2 - I nst r Complet ed :

1 - CPU Cyc l e s :

CPI ( c omplet ed )
84 1 3 1 60

8445302 . 308861 008

P 1 pmc 2

( p1 )
1 6 1 8307

5 . 1 9874 1 64790735
Tb1 :

( Pl )

( c pu cycles )

2 - I n s t r Comp l e t ed :

1 - CPU Cyc l e s :

CPI ( c ompleted )
7346540

7374481 . 7 1 0 1 0 7035

5 . 398402349080333
Tb1 :

(Pl )

( cpu cycles )

2 - I n s t r Complet ed :

1 - CPU Cy c l e s :

CPI ( comple t ed )

7207497 . 309806291

7 1 80243
4 . 302282649205663

Tb1 :

(Pl )

( cpu cycles )

2 - I n st r Compl e t e d :

1 - CPU C y c l e s :

CPI ( complet ed )
44808388

44985850 . 44768739

1 . 1 60973752343601
Tb1 :

( P1 )

( cpu cycl es )

2 - I n s t r Compl eted :

CPI

1 - CPU C y c l e s :

( c omplet ed )
470597098

472475463 . 2072901

1 . 026246290797887
Tb1 :

( P1 )

( cpu cycles )

2 - I n s t r Compl e t ed :
2 1 49357027 . 379779

(P1 )

CPI

1 - CPU C y c l e s :

( c omplet e d )
2 1 40806560

1 360873

( p1 )
1 668938
(p1 )
38595522
(p1 )
4 5 8 5 6 1 558

( p1 )
2 1 08 61 9999

1 . 0 1 526427759 1 63 1
S a m - Sie we r t s - Com put e r : - / TSC

s ams iew e r t S . / p e rfmo n 1 00


Fib ona cci ( 6 ) = 1 3 ( OxOOOO
OOO d ) f o r 7 5 4 5 . 2 1 3 589 u s e
c
Sam - Sie we r t s - Com put e r : - /
TSC s a m s i ewe r t S . / p e rf
m on 1 0000
F i b o n a c c i ( 36 ) =24 1 578 1 7
( Ox 0 1 709 e79 ) f o r 588 3 . 6 1
02 50 u s ec
Sam - Sie we rt s - Comp u t e r :
- / TSC sam sie we r t S . /
p e rf m o n 1 000 000
F i b o n a c c i ( 28 ) =5 1 422 9 (
Ox0 007 d 8 b 5 ) f o r 600 . 1
1 363 8 u s e c
8
Sam - S i ewe rt s - Com pu t e r : - / TSC
s a m s iew e r t S . / pe r fmo n 1 000 0 00 00

Scanned by CamScanner

Performance Tuning

F ibonac c i ( 2 7 ) = 3 1 78 1 1

( Ox 0004d973 ) f o r 36554 . 38 1 943 u s e c

Sam - S iewe r t s - Comput e r : - / T SC sams iewe rt$


F ibonac c i ( 3 1 ) = 2 1 78309

221

. / pe rfmon 1 0000000000

( O x002 1 3d05 ) f o r 378554 . 53 1 6 1 8 u s e c

Sam - S iewe r t s - Comput e r : - / TSC sams iewe r t $ . / pe rfmon 1 000000000000


F i bonacc i ( 1 7 ) = 2584

( OxOOOOOa 1 8 )

f o r 1 720 1 78 . 701 376 u s e c

Sam - Siewe rt s - Comput e r : - / TSC sams iewe rt$

Note that i n the example runs, for larger numbers of iterations, the Fibonacci
sequence code becomes cached and the CPI drops dramatically from 5 to 1 . If the
Fibonacci code were considered to be critical path code for performance, you

might want to ((. 'Sider locking this code.block into LI instruction cache.

If you don't have access to a PowerPC platform, an alternative Pentium TSC

(Ti meStamp Counter) build, which counts CPU core cycles and uses a hand
counted in struction count to derive CPI, is included on the CD-ROM. You can
download and analyze this code on any Macintosh Mac OS X PC or Wi ndows PC

running Cygwin TM tools. On the Macintosh, if you use MONster with the Remote
feature, the appl ication will automatically upload counter data to the M ONster
analysis tool, including the cycle and i nstruction counts for the Fibonacci code
block.

FUNDAMENTAL OPT I M I ZATI O N S


Given analysis o f code such a s the example Fibonacci sequence code block, how do
you take the i nformation and start optimizing the code? The best approach is to
make gains by following the low-hanging fruit model of optimization-apply op
timizations that require the least effort and change to the code, but provide the
biggest im provements first. M ost often, this means simply ensuring that compiler
optim izations that

can

be used

are

being used. After compiler optimizations are

exhausted, then, simple code restructuring might be considered to eliminate cache


miss hotspots. The optimization process is often architecture-specific. Most archi
tectures include application programming notes with ideas for code opti mization.
Although the methods for optimizing go beyond the scope of this book, some of
the most common are summarized here . .
Following are some basic methods for optimizing code segments:

Use compiler basic block optimizations ( inline, loop unrolling, and so o n ) .


Simplify algorithm complexity and unnecessary instructions in code.
Precom pute commo nly used express ions up front .
Lock critical-p ath code into LI instructi on cache.
Lock critical -pa th, high-frequency reference data into l l data cache.

Scanned by CamScanner

222

Real-Time Embedded Compone nts and Systems

ory prefet ch-prefetch data to be read an d modfi


1 1ed
Take ad vantage o f mem
int o L 1 or L2 cache.
.
.
1/0 for data used m algorith m s early to avo id
start
chprefet
IO
M
M
Use
data
lls.
sta
ne
eli
pip
dependency

these opt i m i zation methods usin g profil


Aft er you 've exhausted some of
ing to
.
.
. atto
.
o
d
nee
f
opt1m1z
m
most
blocks
n,
the
of
y
n
o
io
t
ca
u
fi
'
i
nt
n
e
eed to
sup po rt i d
. .
. .r
1
t
r
f
ad
d1
s
0
1
l
a
. 1m
method
ov
ation
optimiz
e
x
ple
m
en t. GO()d

consider more com


1
the
m
opt1
m
uded
incl
be
zat1on
should
p roce ss t o e
functional regression testing
n
e
co
change
g
n
ptimiz
s.
o
by
broken
This
not
is
functionality
is
that
especi ally
sure
.
.
1ons.
t
1za
opt1m
Note
that
architecture-specific
a rc hitectu re
true for more complex
specific optimizat ions make code m uch less portable.

Some more advanced methods for optim izing code segment s incl ude the

following:

Use of compiler feed bac k optimization-profile provided as in put to compiler.

H an d - tune assembly to ex ploi t features such as cond i tion a l exec utio n .

SUMMARY
After the current marriage of software, firmware, and hardware is op tim i zed , how
does the architect improve the perfo rmance of the next generation of the system?

At some poi nt, for every product, the cost of further code optimization will out
weigh the gain of improved performance and m a rketability of the product. How
ever, you can still use performa nce a nalysis features such as performa nce co unters

to characterize potential modifications that should go i nto next-generation prod

uc t s. For example, if a fu nction is h ighly opti m i zed, b ut remains as a high con

sumer of CPU resources, and the system is C P U - b o u n d so that the CPU is the

bottleneck, then hardware rearch itecting to address t h is is beneficial. That part icu
lar fu nction might be considered for i mplementation i n a h ardwa re - accel erat ing
ne
state achi ne. Or perhaps a second processor c o re could be dedic ated to that
lln

fu nction. T he point is t h a t pe rfo rm a nce counters a re not only useful fo r coa


seful
maxi n:i u m performance out of a n existing h ardwa re design, b u t ca n al o be u
_
for guiding future hardware design.

Scanned by CamScanner

Performance Tuning

22J

EX E RCI S E S
l.

2.

3.

Use the Pentium TSC code to count the cycles for a given code block.
Then, in the single-step debugger, count the number of instructions for
the same code block to compute the CPI for this code on your system.
Provide evidence that your CPI calculation is accurate and explain why ifs
as high or low as you find it to be.
Compare the accuracy for timing the execution of a common code block
first using the CD-ROM time_stamp.c code and then using the Pentium
_
TSC . Is there a significant difference? Why?
Download the Brink and Abyss Pentium-4 PMU profiling tools and pro
file the Fibonacci code running on a Linux system . Explain your results .

CHAPTER R E F E R E N C E S

[ Siewert]

h ttp://www- 1 28.ibm.com/developerworks/power!library/pa -bigironJ/

Scanned by CamScanner

,,. .

. ..

11

H igh Ava il a b il ity a n d


Re li a b il ity D e s ign

'

In

This Chapter

I ntro duc tion

Reliab ility and Availa bility: Simila rities and Differ ences
Reliab ility

Reliable Software

Available Software

Design Tradeoffs

H ierarchical Approaches for Fail-Safe Design

INTRODUCTION
High availability a n d high reliability ( HA/HR) are measured differently, but both
provide some indication of system robustness and quality expectations. Further
more, because system failures can be dangerous, both also relate to system safety.
I ncreasing availability and reliability of a system requires time and monetary in
vest ment, increasing cost, and increasing time-to- market for products. The worst
.thing that can come of engineering efforts to increase system availability and relia
bility is an unintentional decrease in overall availability and reliability. When de
sign for HNHR is not well tested or adds complexity, this un intentional reduction
.
in H A/ H R can often be the result. In this chapter, design methods for increasing
HA/H R are introduced along with a review of exactly what HA and what H R really
m eans. ( Portions of the material on performance tuning presented in this chapter
was first published o n the IBM developerWorks Web site, IBM developerWorks.
Reprinted with permission.)

Scanned by CamScanner

226

Real-Tim e Embedd ed Compo nents and Systems

REL IAB I LITY AND AVA I LAB I LITY :


SIM I LAR ITI ES AND DIF FER ENC ES
the percentage o f time over a well -defined period that a system or
service is available for users. So, for example, if a system is said to have 99 . 999% , or
five nines, availability, t h is system must not be unavailable m ore than five m inut
over the course of a year. Quick recovery and restorat ion o f service after a fa ult
greatly increases availability. The quicker the recovery, the more often the system

Availability is

or service. can go down and still meet the five n ines criteria. Five nines is a

high

availability, or HA metric.
In contrast, high reliability ( H R) is perhaps best described by the old adage that
a chain is only as strong as its weakest link. A system built from components that
have very low probabil ity of fail ure leads to high system reliability: The overalJ
'expected system reliability is the product of all subsystem rel iabili ties, and the sub

system rel iability is a product of all component reliabilities. Based upon this math
ematical fact, components are required to have very low probability of fail ure if the
subsystems and system are to also have reasonably low probab ility of failure. For
example, a system of
or
. _, ..

..

1 0 components, each with 99.999/o reliability, is (0.99999) 10,

99.?,9/o , reliable. Any decrease in the reliability of a single component in this

txpe of s i n gle-string design can greatly reduce overall rel iability. For example,

95/o reliable component in the previous example that was 99.99%


reliable would drop the overall reliability to 94.99/o. H R components are often

ding j ust one

constructed of higher quality raw materials, subjected to more rigorous testing,


and often have more complex fabrication p rocesses, all i ncreasing com ponent
cost . There may be notable exceptions and advan cements where lower cost com
ponents also have H R , making component choice obvious, but this is not
The simple mathem atical relation ship between H A and H R is:
Availa bility = MTBF

I ( MT B F

typ ical.

MTTR )

Mean Time Between Failures and MTTR = Mean Time to Recovery.


st ill
o, while an H R system has large MTBF
, it can tolera te longer M TT R a nd
y i. e l d e n t a ailabi lity. Likew ise a
en

system with low reliab ility can provide dec

. it h a
a va i lab i l i ty if
s very fast recovery time s. From this relat ion , you can see t ha .
o ne syste m w t. t h five n i nes HA mig
ht hav e low rel iabi lity and a very fast recorf'
( ay 1 econd ) and therefo re go down and recov
a yea!
er 3 ' 000 o r mo re t i m es in
( 1 0 ti me a day' ) An H R syste m m i gh t go down
ait for
once every two yea rs a n d ..

where MTBF
.

ope rato r i n terv e n t i o n to


.
- i. ing Cl11.
aie
c 1 Y a n d care fu l J y recover operat iona
l state, t
.
.
ae ra ge 1 O m i n u te s eve ry
2 yea r fo r the ex act am H A met ri o f fi v e 0 1 0
e
.
Cl ea rly , mo st ope rato r wo u l d no
t co n 1d er t hese yste ms to h a ve t he sam e
.

Scanned by CamScanner

qual 1t'

High Availability a nd Reliability Design

227

RE L IA BI LITY
The<;>retically, it's possible to build a system with low-quality, not-so-reliable compo
nents and subsystems, and still achieve HA. This type of system would have to in
clude massive redundancy and complex switching logic to isolate frequently failing
components and to bring spares online very quickly in place of those components
that failed to prevent interruption to service. Most often, it's better to strike a balance
and invest in more reliable components to mini mize the interconnection and
switching requirements. If you take a simple example of a system designed with re
dundant components that can be isolated or activated, it's dear that the interconnec
tion and switching logic does not scale well to high levels of redundancy and sparing.
Consider the simple, single-spare dual-string system shown in Figure 1 1 . 1 .

Ct

Cl

C2

C2

C3

C3

FIGURE 1 1 . 1

Dual-string, cross-strapped

subsystem interconnection.

The exa mp le syst em s h 0wn in Figure 1 1 . l has eight possible configurat ions
that can be formed b Y activatin g and isolating components C l , c 2 , o r C3 from s 1de
and
A or si'd e B The system m ust be able to positively detect failed components
.
an operable switch state and new mtercontrack these failu res to reco nfiigure with
ble 1 1 1 describes the eight possible co n figu
nect ion of activate d compone n ts. Ta
exam p1 e .
. sma 11 scale HA sys tem
rations for t h is
. ple exam p 1e sh. own I. n F igure l l . l a trade-off can be made between
F ro m t h e s1m
. h
nec ting co mponents and redundancy management wit
0 f inte rcon
.
the comp l ex1ty
.
' g H R com ponen ts . The cost of hardware components with H R
the cost of me 1 ud m
b estimated based upon c o mpo ne nt tsting, expected
is fairly well k now n an d ca
acterist ics , pac k a gi ng, and the overa ll p hy sic a l
, MT BF, ope ratt on l cha r

'

failu re rate s

featu res of the compo nent.

Scanned by CamScanner

111

Real-Time Embedded Components and Systems

TAllU

Enumeration of Configurations for a Three Subsystem Dual-String


Cr
Strapped System

1 1 .1

(..........

C.. c2

c..,.1111 C1

Cllf es

System architects should also consider three simple parameters before invest

ing heavily in HA or H R for a system component or subsystem:

Likelihood of unit failure

I m pact of failure on the system

Cost of recovery versus cost of fail-safe isolation


Conceptually, architects should consider how levels of recovery are handled

with varying degrees of automation, as depicted i n Figure 1 1 .2.


E x c PpllonB as ed
Ma nt n a n c
And S erv1c 1n9

S y s t e rn A u t o n o m y
A n d A u tomatic
R e c o verv

D i a g no s i s

R e - c o n f i g u r a t ion

S u b- S ys tern E r ror

De

ion

'

FIGURE 1 1 .1

Scanned by CamScanner

Detection a n d
C o r rec t1on

C o mputing S y Hm

C o rrection

Suppo rting multiple levels of recovery autonomy.

Hig h Ava ilab ility and Reliabil


ity Design

221

RE L IAB LE SOFTWA R E
Pe rhaps mu ch h a rde r to est ima te is the cos t of
H R so ft ware. Cl ear1y, re1 1a bl e h a rd .
.
wa re r u n n m g u n reli able sof twa re will resu lt
m c
ia11 ure mo d es t h at are I 1 k eI y to
.
.
.
ca se serv Ke tte u p t io . Com lex softw are is ofte
n less relia ble, and the best way
.
.
to increase reha b1hty 1s with testm g Test ing takes t .
1me an d uIt 1mateI y add s t o cost
to mar k et.
and time
.

System a rc h itects have tradi tiona lly focused


on designing HA and H R hard
ware with firmw are t o manage redun dancy and to autom
ate recovery. So, for ex
ample, fi rm ware would reconfi gure the compo nents in the exa mple in Table 1 1 . l
t recover from a com ponent fai l u re. Traditio nally, rigorous testing and verifica
tion have ensured that fi rmware has no flaws, but history has shown that defects

can still wind u p in the field and emerge due to subtle differences in t iming o r exe
cution of data-driven algori th ms.
Design i n g firmware and software for HR can be costly. The FAA requires rig
orous d oc u mentation and testing to ensure that fliht software on commercial air
craft is HR. The D0- 1 788 class A standard requi res software developers to
maintain cop ious design, test, and process documentation. Furthermore, testing
must include formal proof that code has been well tested with criteria such as mul
tiple condition decision coverage ( M C DC ). This criteria ensures that all paths and
all statements in the code have been exercised and shown to work. This laborious
task greatly i nc reases the cost of software components.

AVAILA B L E SOFTWA R E
Real-time systems are composed o f software services that provide service upon re
quest. Norma lly these service s block i n - betwee n request s. Due to coding or design
error , it is possib le that software san ity may be lost so that service s will block i n
s
be
definitely. T o detect and recove r from indefin ite blocki ng, all service should
upper
coded so that they will event ually t i m e out i f they do not unblo ck by some
i meut
bo und o n an expec ted reque st perio d. When the servic e unblo cks due to t
f ( ) ) , whic h
or due to data avail able, the servi ce shou ld call keep_ alive ( t a s k i dSel
ty mon itor. The san
will post the servi ce's uniq ue task ID to a software syste m sani
to ake s re
i ty mon it o r is also a serv ice. This serv ice simp ly perio dica lly chec k
serv ICe. I f a serv ice
all other serv ices have chec ked in and are cont i nuin g to prov ide
use tas kSu spen d ( ) and
has not chec ked in by a t i m e l i m it , the sani ty mon itor c n
ific shut down ( ) to recover
tas kOe lete ( ) in VxWo rks alon g with a n app licat ion- spec
d. Thi s begs the que s
the wedged serv ice a n d its reso urces so it can then be rest arte
this case, a har dwa re
tion , wha t h app ens if the san ity mo n itor itse lf lose s san i ty? In

Scanned by CamScanner

230

RealTime Embedded Components and Systems

dog is a co untdown ti
tn er
watch dog t imer will recov er the system . The watch
of
v
rec
re
full
boot
o ery the sy ,
which if not reset by the sanit y mon itor, will start a
S
,
t work and systems go
tern. I n rare circu msta nces, the recov ery meth ods w.on
i
nto
.
d ov .

of
sa
1ty
loss
in
ov
results
n
er
ry
recove
g
a
do
watch
the
?
where
t
rolling reboo
er
ak
s
time
and
n
,
te
stress
st
i
onal,
functi
ing
g, this is

With rigoro us testing , includ


e
u
h
requu
ikely
l
man
most
interventio
very unlike ly to happe n. I f it does, this will
n
ila
y
unava
length
or
bilit
failure
y.
system
to
lead
will
or
to fix the proble m

D E S I G N TRA D EO F F S
Designing for H R alone can be cost prohibitive, so most often a balance o f design
for H A and H R is better . H A at the hardware level is most often achieved through
redu ndancy ( sparing) and switchi ng. A trade-off can be made between the cost of
dupl ication and simply engineering H R i n to components to reduce the MTBF.
Over t i me, designers have found a balance between H R a n d HA features to opti
mize cost and avai labi l i ty. Fundamental to dupl ication schemes is the recovery
latency. When planning component or subsystem duplication for HA, arc h itects

must carefully consider the com plexity a n d latency of t h e recovery scheme and
how this will affect firmware and software layers. T rade-offs between working to
simply increase reliabi lit)r instead of increas ing availabil ity through sparing should
be analyzed. A simple, well-proven methodology that is often employed by systems
engi neers is to consider the t rade-off of p robability of failure, impact o f failure,
and cost to mitigate i m pact or reduce likelihood of failu re. This method is most
often referred to as FMEA ( Failure Modes and Effects Analysis).
Another, less fo rmal process that system engineers often use is referred to as
the lo w-hanging fruit process. This process involves ranking system design feat es
under consideration by cost to i m plemen t, rel i a b i l i ty i m p roveme nt, availab1l 1'
.
improvement, and complexity . The point of low-hanging fruit analysis is to pKk

features that improve HA/ H R the most for least cost and with least risk. Wit ho t
.
existmg prod ucts and field testing, the hrdest part of F M EA or low-hangi ng fruit
au
analysis is estimating the p robability of fa i l u re for c o m pone nts and esti m ng
.
true
improvem ent to H.A/ H R for specific feat ures. For hardware, the tn. ed and .
.
.
.
testu l
met hod 1or
c
1s
est1 matmg re l 1a
. b 1' I tty
based u pon compone nt testi ng, sys tem
be
.
envi ron mental testi ng, accelerated t esting , and field testing. The tradeo ffs but
tween engineering rel iabil .ity and availability into hardware are fairly obv ious,
how does this work with fi rmwa re and softw are?
\.

cJ

Designin g and i m plementi ng H R fi rmwa re and software can be ve1 cosifi


s ver
The main approach for ensuring that fi rm wa re/software is high ly reliab l e i

Scanned by CamScanner

High Availa bility


and Reliability Design

2J 1

h formal cover age criteria


. h
.
catio n wit
alo ng W ll.
. tes
u it
ts , integra t ion tests, system
n
essio
regr
test
ing
and
. Test coverag
t ests ,
e cn ten a i n cl ude fiea t
ure po mts;
.
b ut for
c
h
mu
more
s,
tem
rig
or
sys
ou
s
cr ite ria are necess
.
HR
.
. a ry, mcl
u dmg
state men t , path ,
M u l t i p le Condit ion D ec1 .
s1 on Coverage ) .
a nd MCDC. (
You mig ht won d er why des ign i ng t est cases fior pa th and stat
eme nt cove rage is
not su ffitc i e fior HR. softwar. The i m pl e s n ippet of C code i n List ing 1 1 . I shows
why MC DC is req u ired . The if test in ma in ( ) has two expressio
ns 1 og1ca
11 y ord ered
ost
com
g
C
.1
p
ers
M
er.
gene
i
rate
et
code so that the second expre ssion is shor t to
. f the fi rst evaluates
to true . As a result , both paths i n m a i n ( ) for the if
circm ted t
.
blocks ca n be dnve by a test withou t ever executin g ( testing) the ou t s ideL i m 1 t s
code. So, a test dnver m u st drive the sa me pa t h with t h e conditio n where
d is false and where recoveryReq u ired is true and outs idel i m i t s is
recov e r y R e q u i r e
eith er true o r false in combi nation with t h i s . So, for simple p ath c overa ge in
main there are o nl y two paths noted by coverage of code on the line n umbers:
Path-A) 28, 30, 32 and Path- B ) 28, 30, 1 3, 20, 36. However, by M C DC, you m ust

of the fu nction
ensure that main Path-A is covered i n combina tion with the paths
define s Path -C) 28, 30, l 3, l 5, 1 6, 32.
ou t s id e l i m i t s , which
LISTING 1 1 .1

n Re uirem ents
Sim ple C Code with Two Paths and MCDC Testi g q

01 :

# d e f i n e U P P E R_ L I M I T 1 00

02 :

# d e f i n e TRUE

03 :

#define

FALSE 0

04 :
05 :

e x t e r n v o id

06 :

extern

07 :

08 :

09 :

12:
13:
14:
15:
16:
17:
18:
19:
20 :
21 :

lo gM e s s a g e ( c h a

*m sg ) ;

ne d i n t ad dr ) ;
MM I OR ea d ( u n s ig
u n s ig n ed i n t
o id ) ;
rtRecove ry ( v
extern void Sta
i o n ( v oi d ) ;
t inueOperat
e x t e r n v o id C on
re d ;
cove ryRequi
e x t e r n int Re

10:
1 1 :

int

n s ig n e d
L im i t Te s t ( u
if ( va l

>

IT)
UP PE _L I M

l o gM e s s a g

e ( " L imi t

E;
r e t u r n TRU

e l se

Scanned by CamScanner

in t v a l )

re t u r n

FAL SE ;

E x c eed ed \

n" ) ;

2J1

Real-Time Embedded Components and Systems

22 : }
23 :
24 : main ( )

25 : {
26 :

uns igned int I OValue

27 :
28 :

I OValue

29 :

O;

MM I ORead ( Ox F0000 1 00 ) ;

if ( Recov eryRe qui red 1 1 Out s i d e l imi t s ( I OVa l u e ) )

30 :

31 :
32 :
33 :

S t a rt Recovery ( ) ;

}
else

34 :

35 :
36 :
37 :

Cont i n u eOperat i o n ( ) ;

38 : }

H I E RA R C H I C A L APPROAC H E S F O R FAIL-SAFE D E S I G N
I deal ly, a l l system, s ubsystem, a n d component e r ro rs c a n b e detected and cor

rected in a h j erarchy so that component errors are detected and corrected without
any action req u i red by the conta i n i n g

ubsyste m . T h i s h i e ra rc h ical approach for

fa u l t detection and fa u l t protect ion/correc ti o n can grea t l y simplify verification of a


RAS ( Reliabil ity, Availabi l i ty, and Servicea b i l i ty ) design . An ECC memory compo

nent provides for si ngle- b i t error dete ct ion a n d a u to m a t i c correction. The in c or


poration o f ECC memory provides a c o m ponent level o f R.A.. S , which can incre ase

RAS performance and reduce t h e complexity of su pport i ng RA S at h ighe r lev el


H R systems often incl ude design elements that en u re t h a t nonrecoverab le failu res

re u l t i n the system going o u t of service, along with safing to reduce risk of lo i ng


t h e a et, damagi ng property, o r c a using loss of l i fe .

SUMMARY
d
}'!> t ern designed for H A may not neces a ri l y b e d e i g n e d fo r
sy t ern te.n
t o be H A a l t hough t hey often fai l afe a n d wa i t f o r h u m a n i nt e rv e nt io n for re O\'
ery rather t h a n r i k a uto m a t ic recov ery . 0, t h e r e i, n o c l ea r o r rela t io n bet"'
t'f0

H R. H R

the H A a n d

MTTR .

HR

di

Scanned by CamScanner

o t her t h a n t h e equa t i n fo r a v a i l abi l i r y i n t e r m o f MTBF J n


u ed , g i ve n t h a t h igh M T R F a n d I
n g M TT R comp a r t"d to I

2J J

High Availability and Reliability Design

MTBF

and rapid M TT R coul d yield the same availability, it appears that a va ilabil
ity alone does not characteriz e the safeness or quality of a system very well.

EX ERC ISES
1 . H ow many con figurations must a fault recovery system attempt for a d ual
string ful l y cross-strapped system with four subsystem s i

2 . Describe what you think

good fail-safe mode would be for an Earth

orb i t i n g satel lite that encounters a nonrecoverable multibit error when it


p asses th rough the high radiation zone known as the South Atlantic
Anomaly. What should the satel lite do? How should it eventually be re
covered for conti nued operation?

3. I f it takes 30 seconds for Win dows to boot, how many ti mes can this OS

fail d u e to deadlock or loss of software san ity in a year for five nines HA

ignoring detection time for each failure?

4. I f the l a unch vehicle for satellites has an overall system reliability of 99.5%

and has more than 2,000 components, what is the average reliab ility of
each com ponent?

5. Research the CodeTest software coverage analysis tool and describe how it
provides test coverage criteria. Can it provide proof of MCDC coverage?

CHAPT E R R E F E R E N C E S
[ ReliableLi n u x ]
2002.

Scanned by CamScanner

Campbell, I ain, Reliable Lin ux-Assuring High A va ilability. Wiley,

You might also like