Professional Documents
Culture Documents
CO M PO NENTS AN D SYSTE M S
...
f/
'
.I
.
.
'
ii
.,.,
"
.
..
.,
...
,,
,y
'
;1'
SAM SIEWERT
'
C E N GAG E
Lea r n i n g
i..
..
r!
'
'
1,.
'
Scanned by CamScanner
V I
\.... .. ..
, .
:
'G I
..
E E R I
E R I
Contents
Preface
Acknowledgments
Part I
xv
XVII
1
3
Introduction
Real-Time Services
Real-Time Standards
6
14
Summary
15
Exercises
16
Chapter References
16
/ System Resources
17
Introduction
17
Resource Analysis
19
25
Scheduling Classes
32
Multiprocessor Systems
The Cyclic Executive
Scheduler Concepts
33
35
40
43
50
vii
Scanned by CamScanner
Contents
Summary
53
Exercises
53
Chapter References
55
/Processing
57
Introduction
57
58
Feasibility
61
62
73
73
75
Deadline-Monotonic Policy
75
77
Summary
81
Exercises
81
Chapter References
83
/ 1/0 Resources
Introduction
85
Worst-Case Executi
on Tim e
Intermediate I/O
85
86
Execution Effici
89
110 Architect
Summary
ency
ure
91
94
Exercises
96
Chapter Refe
renc es
96
97
/Memory
In tro ductio
Ph ysical Hie
rarchy
99
99
JOO
Scanned by CamScanner
Contents
Ix
I 03
Shared Memory
I 04
ECC Memory
I 05
Flash Filesystems
I 09
Summary
111
Exercises
1 11
Chapter References
111
/ Multiresource Services
113
Introduction
113
Blocking
I 14
I 14
I 16
I 16
118
120
Summary
121
Exercises
121
Chapter References
121
123
123
124
125
125
128
Summ ary
129
Exercises
129
Chapter References
Scanned by CamScanner
130
Contents
Part 11
8
ded C omponents
d
e
b
Em
. .ng Rea I-rime
Oes1gm
Components
Embedded System
In trod uction
Hardware Co mpon
137
Actuators
138
1/0 Interfaces
142
Processor Complex
on
Processor and I/0 Int erconnecti
l43
144
Bus Interconnection
147
149
150
Interconnection Systems
151
Memory Subsystems
152
Firmware Components
153
Boot Code
153
Device Drivers
154
154
Message Queues
155
Binary Semaphores
156
Mutex Semaphores
156
157
So ftware Signals
m ponents
A PPJ'JCation Ser
vices
Re-entrant A
PP 1 cat ion Lib ra r
1es
C:Om mun1cat
ing and Sync
h ronize d A ppl
tcations
S umm ary
Exercises
Chapter R efe
rences
Scanned by CamScanner
133
135
Sen sors
133
135
ents
Software Applicatio
n Co
1 31
157
158
158
160
161
163
163
164
Contents
9
Debugging Components
Introduction
Exceptions
Assert
Checking Return Codes
Single-Step Debugging
Kernel Scheduler Traces
Test Access Ports
Trace Ports
Power-On Self Test and Diagnostics
External Test Equipment
Application-Level Debuggi ng
Summary
Exercises
Chapter References
,,,>fl'
Performance Tuning
Introduction
Basic Concepts of Drill-Down Tuning
Hardware-Supported Profiling and Tracing
Building Performance Monitoring into Software
Path Length, Efficiency, and Calling Frequency
Fundamental Optimizations
Summary
Exercises
Chapter References
11
Introduction
Reliability and Availability: Simil aritie s and Differences
Reliability
Scanned by CamScanner
165
165
166
173
174
175
182
186
188
189
194
198
199
199
200
201
201
202
206
209
210
221
222
223
223
225
225
226
227
Contents
xii
229
Reliable Software
229
Available Software
230
Design Tradeoffs
e
ign
Hierarchical Approaches for Fail-Saf Des
Summary
233
Chapter References
237
238
Lifecycle Overview
241
Requirements
242
Risk Analysis
242
257
263
263
Video
Scanned by CamScanner
257
262
Exercises
Video Streaming
254
262
Summary
Video Codecs
245
262
Regression Testing
Unco mpressed
Vid
235
237
System Lifecycle
Introduction
13
232
233
Exercises
12
232
eo Frame Formats
264
268
269
271
Contents
Debug
Robotic Applications
Introduction
Robotic Arm
Actuation
End Effector Path
Sensing
Tasking
Automation and Autonomy
Summary
Exercises
Chapter References
Chapter Web References
15
Scanned by CamScanner
xiii
271
275
275
276
278
278
278
281
281
283
286
293
293
298
300
301
301
301
302
303
303
304
306
309
311
312
313
314
Introduction to Real-Time
Embedded Systems
la
This Chapter
Introduction
INTRODUCTION
The concept of real-time digital computing systems is an emergent concept com
pared to most engineering theory and practice. When requested to complete a task
or provide a service in real-time, the common understanding is that this task must
be done upon request and completed while the requester waits for the completion
as an output response. If the response to the req uest is too slow, the requestor may
consider lack of response a failure. The concept of real-time computing is really no
different. Requests for real-time service on a digital computing platform are most
often indicated by asynchronous interrupts. More specifically, inputs that consti
tute a real-time service request indicate a real-world event sensed by the system
for example, a new video frame has been digitized and placed in memory for
processing. The computing platform mu t now process input related to the service
request and produce an output response prior to a deadline mca ured relative to
an event sensed earlier. The real-time digital computing system must produce a re-
Scanned by CamScanner
s
nents and System
Real-Time Embedded Compo
Afte the
deadline estab
wait.
user and/or systern
e
th
ile
wh
st
ue
req
on
.
sponse up
es up or the system
request tame, the user giv
ative to the
,
lished for the response rel
ed.
sponse has bee pro uc
re
.
no
if
ts
en
rem
. g which
ui
fails to meet req
a process
nn
e
du
tim
the
1s
time as a noun
l
rea
e
fin
de
to
y
A co mm on wa
im e relates to computer applica
as a n adjective, rea l-t
tak es plac e o r occ urs Used
ded latency to user requests.
res po nd wi th low boun
can
t
tha
ses
ces
pro
or
s
ion
t
e for computing systems
a te ways to define real tim
ur
cc
a
st
mo
nd
a
t
bes
the
of
On e
rect reaJ-time system
t rea l -t i m e behavior. A cor
rec
cor
by
ant
me
is
at
wh
rify
is to cla
tically) correct output
go r it h m ic ally and mathema
must produc e a fun ct ion a lly ( a l
t for a service.
deadl i ne rela t i ve to t h e reques
response pri or to a well-defined
has a si mil ar and related history to reaJ-time
The con cept of
scope to p r ecl ude generaJ-purpose deskto p
system s a n d is mos tly a narr owi n g of
uded i n a real-time system. For example,
com pute r platf orms that m ight be i n c l
ber of commercial
ission cont rol cen ter inclu des a l a r ge num
.
embedded systems
i ndu try i
f u l l of specialized terminolo
1. d
m1t1ons. Although this book attempts to defi me spe i. a ize terminology in context,
on occasion the gl ossa ry c n he I P I'f yo L want to read the text out of order or when
A BR IEF HI ST OR Y O F RE AL -T I M E S Y ST
EMS
rh origin of real time comes fro
.
m th e recent history
.
of process control usmg
d1g1tal computing platforms 1 n "
iact an earl Y d efi m1t1ve
text on the concept was
pubhshe
.
a simulation that run
s
at least as c
.
1ast as the real-world physICal
.
process 1t models is said to run m
.
.
rea1 time Ma
ny simulat1ons must make a tradeoff b etween running at or faster th
.
an real-time w"th
I ess or more model fidelity.
1
.
Scanned by CamScanner
The same is true for real-time graphical user interfaces (GUI), such as those pro
vided by computer game engines. Not too much later than Martin's 1965 text on
real-time systems, a definitive paper was published that set forth the foundation
for a mathematical definition of hard real-time-"Schedu ling Algorithms for
Multiprogramming in a Hard-Real-Time Environment" [Liu73 J. Liu and Layland
also defined the concept of soft real-time in 1973, however there is still no univer
sally accepted formal definition of soft real-time.
The concept of hard real-time systems became better understood based upon
experience and problems noticed with fielded systems-one of the most famous
examples early on was the Apollo 11 lunar module descent guidance overload. The
Apollo 11 system suffered CPU resource overload that threatened to cause descent
guidance services to miss deadlines and almost resulted in aborting the first land
ing on the moon. During descent of the lunar module and use of the radar system,
the book Failure Is Not an Option [KranzOO], Buzz radios, "Program alarm. It's a
1202." Eugene Kranz, the Mission Operations Director for Apollo 11, goes on to
explain, "The alarm tells us that the computer is behind in its work. If the alarms
continue, the guidance, navigation, and crew display updates will become unreli
able. If the alarms are sustained, the computer could grind to a halt, possibly
aborting the mission." Ultimately, based upon experience with this overload con
dition gained in simulation, the decision was to press on and ignore the alarm-as
we all know, the Eagle did land and Neil Armstrong did later safely set foot on the
Moon. How, in general, do you know that a system is overloaded with respect to
CPU, memory, or IO resources? Clearly, it is beneficial to maintain some resource
margin when the cost of failure is too high to be acceptable (as was the case with
the lunar lander), but how much margin is enough? When is it safe to continue op
eration despite resource shortages? In some cases, the resource shortage might just
be a temporary overload from which the system can recover and continue to pro
.!
ter 6, Multi-Resource Services. As o 11 see, ensuring safe,
mutually exclusive
.
access to hared memory can cau pnonty mversion. Whi
le safe access is required
for funct1onal correctness, meetmg response deadline
s is also a requirement for
.
.
real-time systems. A real-time
system must produce fun
ct'1ona II y correct answers
.
.
on time, before deadlmes f0r 1:1ver dll svstem
correctness G.1ven some sys t,ems e'
.
velopment expenence, most engineers are
familiar with how to design and test a
Scanned by CamScanner
Systems
Com ponents and
edded
Emb
Real-Time
'
systems are often an essential part of any real-t ime embedded system and process
sensed input to produce responses as output t o a c t uators. The sensors and actua -
tors are com ponents providing IO and define the interface between an em bedded
system and the rest of the system or appl icat ion. Left with t h i s a s t he definitio n of
an em bedded digital computer, you could argu e t ha t a general - purpose work
station is an e m bedded system; after all, a mouse, keyboard, and video display pro
A eneral- purpos e workst ation provid es a platfor m for unspec ified, to-be
determmed sets o f service s, where as an embed
ded system provid es a well-definfd
.
servICe or set o services such as antilock braki ng cont rol. In general, providing gen
.
.
eral servICes is impra ctical for
appli catio ns such as com putat ion of 1t t o the rrth digit,
payroll , or office auto mati o
n on an embedded system. Fina lly, the point of an e rn
bedded system is to cost
-effective ly prov ide a mor e l im ited set of serv ices in a l arg e r
.
aturl' L.ri
... itivc n:s1 st ors )
.
to 0 0I t h c
'
I trh.
h et''s ubs y km with a fa or to heat 1t wit
"
Scanned by CamScanner
coils. The service provi ded in this example is thermal managemen t such that the
subsystem tempera ture is mainta ined within a set range. Many rea l -time embed
ded services are digital control functions and are periodic in nature: An example of
a rea l - time service that has a function other than digital control is digital media
and d uplex transport is provided, the system provides voice communicat ion ser
vices. In some sense, all computer appl icat ions are services, but real- time services
must provide the function within time constraints that are defined by the appl ica
tion. In the case of digital control, the services must provide the function within
time constraints required to maintain stability. In the case of voice services, the
function must O(cur within time constaints so that the human ea r is reasonably
pleased by clear audio . The service itsel f is not a hardware, firmware, or software
entity, but rather a conceptual state machine that transforms an input stream into
an output stream with regular and reliable timing.
Listing I. I is a pseudocode outline of a basic serv ice that poils an input inter
face for a specific i np u t vecto r.
./LISTING 1 .1
void provide_service(void)
if( initialize_service()
==
ERROR)
exit(FAILURE_TO_INITIALIZE);
else
in_service
TRUE;
while( in_service)
if(checkForEvent(EVENT_MASK) -- TRUE)
read_input(input_buffer);
output_buffer=do_service(input_buffer);
write_output(output_buffer);
shutdown_service();
Thi
implementation i
Scanned by CamScanner
"'' J"'I;"'
must be maintained and all services are limited to a maximum rate established b
the main loop period. This architecture for real-time services has historically bee
called the Main+ISR or Executive design when applied to software services run
ning on a microcontroller or microprocessor. Chapter 2 will describe the Execu
tive and Mairi+ISR system architectures in more detail. As the Executive approach
is scaled, multiple services must check for events in a round-robin fashion. While
one service is in the Execute state shown in Figure 1.1, the other(s) are not able to
continue polling, which causes significant latency to be added to the sensing of the
real-world events. So, Main+ISR works okay for systems with a few simple services
that all operate at some basic period or multiple thereof in the main loop. For sys-
((
SRVC-MOOE INIT IALIZE
'
SRVINITIAllZi
lniliahHllOll I
ST 4TUS STAAT
SRVC-MOO
READY
SRVC-MOOEERAC>ft
I
'
SRVC-MOqEREAOY
Eenl S.nalrog I
S UTUSPOLL
SRVC-MOOEENCOOE
SRVC-MOOflENCOOE.
SRVC-MOOE REAOY
Re..i I
ST4 TUSSAMPLE
SRVC-MOOETRANSFORM
'
SRVCfRANSfORM
&ecuie I
ST4TUS8USY
SRVC-MOOEaOECOOE
;
WYile I
ST4TUSCOMMAHO SRVc-MOOEOECOOE
FIGUIE 1.1
Scanned by CamScanner
terns concerned only with throughput. , polling services are acceptable and perhaps
even preferable, but for real-time services where event detection latency matters,
multiple polling services do not scale well. For large numbers of Services, this ap
proach is not a good option without significant use of concurrent hardware state
machines to assist main loop functions.
When a software impleme ntation is used for multiple services on a single
CPU, software polling is often replaced with hardware offload of the event detec
tion and input encoding. The offload is most often done with an ADC (Analog to
Digital Converter) and DMA (Direct Memory Access) engine that implements the
Event Sensing state in Figure 1 . 1 . This hardware state machine then asserts an in
terrupt input into the CPU, which in turn sets a flag used by a scheduling state ma
chine to indicate that a software data processing service should be dispatched for
void provide_service(void)
TRUE;
while(in_service)
if(waitFor(service_request_event,
timeout)
! = TIMEOUT)
read_input(input_buffer);
output_buffer=do_service(input_buffer);
}
}
write_output(output_buffer);
else post_timeout_error();
post_service_aliveness(serviceIDSelf());
shutdown_service();
The preceding
waitFor
world event is sensed. When the event the service is ti ed to occurs, an interrupt ser
vice routine releases the software state machine so that it reads input, processes that
input to transform it, and then writes the output to an actuation interface. Before
entering the
waitFor
ing resources that it need s while in service. For example, the service may need to
Scanned by CamScanner
inpu t buffer trans forma tion. Assumi ng that inireserve worki. ng memory for the
. '
SM + ISR + service loop, the SM would tran it ion from the wai t For state to an
input state based upon clocked comb in ational logic and the presen t i nput vector
driven from a sensor ADC interface. The hardware state machine would then exe
cute combinational logic and clock and latch flip-flop register i nputs a n d outputs
until an output vector is produced and ultimately causes actuation with a DAC.
Ultimately, the state of the real world is polled by a hardware state machine, by a
software state machine, or by a combin ation of both. Some comb in ation of both is
often the best trade-off and is one of the most popular approaches . Implement ing
service data in software provides flexibility so that modificat ions and upgrades can
be made much more easily as compared to hardwar e modification.
an m
t egrat1on
o f compone nts, mcludmg at least
hardware and software. The
real -wo rld even ts are dete cted usin g sens ors, often
tra nsducers and analog -to - d '
.
.
ig1 t a l converters, a n d are tied
t o a m1croprocessor
.
.
int errupt with dat ' con trol, an
.
.
.
d status b u f-c
1ers. This ensor interface provides the
.
inp ut needed for service pro
.
cess mg. T h e processi. n g transforms the input
data
.
.
associated with the even t int o a
respo n se out pu t . The respon se out put 1s most
Scanned by CamScanner
11
where computing systems were embedded early on into vehicles and factories to
provide automated control. Furthermore, digital control has a distinct advanta ge
over analog because it can be programmed without hardware modification. An
TimeActualion - Ti me sensed
Response Time
(From Event to Response)
=
Event
Sensed
Interrupt
FIGURE 1.2
Dispatch
Preemption
Dispatch
Completion
(10 Queued)
Actuation
(10 Completion)
ecuting by the arrival of interrupts from events and other services. You can also
Scanned by CamScanner
12
For example, a computer vision system that tracks an object in real-time may
filter an image, segment it, find the centroid of a target image, and command an
actuator to tilt and pan the camera to keep the target object in its field of view. The
entire image processing may be completed 30 times per second from an input
camera. The ltering step of processing can be as simple as applying a threshold to
every pixel in a 640 x 480 image. However, applying the threshold operation with
fj
cause these algorithms are still being refined for the system . The service timeline
for the hardware accelerated service processing is shown in Figure 1.3. In Figure
1.3, interference from other services is not shown, but would still be possible.
Response Time
TimeActuation - Ti mesensed
to Response)
(From Even t
=
(---
Event
Sensed
Input
Complete
A.ciun 1 .J
lntrrupt
Completi on
Dispatch
(10
Queued)
Actuation
(10 Completion)
er to 1m
k' events to acruat1ons to momtor and con.
trol some aspect of an overaU
system. Finally, in both Figure 1.2 and Figure 1.3,
.
Scanned by CamScanner
1J
response time is shon as being limited by the sum of the IO latency, context
swich latency, e xecution time, and potential interference time. Input latency
coes from the time it takes sensor inputs to be converted into digital form and
transferred over an interface into working memory. Context switch latency comes
from the time it takes code to acknowledge an interrupt indicating data is avail
able, to save register vafues and stack for whatever program may already be execut
ing (preemption), and to restore state if needed for the service that will process the
newly available data. Execution ideally proceeds without interrupti<Yn, but if the
system provides ultiple services, then the CPU resources may, be shared and in
terference from other services will increase the response time. Finally, after a ser
vice produces digital output, there will be some latency in the transfer from
working memory to device memory and potential DAC conversion for actuation
output. In some systems, it is possible to overlap IO latency with execution time,
especially execution of other services during IO latency for the current service that
would otherwise leave the CPU under-used. Initially we will assume no overlap,
but in Chapter 4,.we'll discuss design and tuning methods to exploit IO-execution
overlap.
In some cases, a real-time service might simply provide an IO transformation in
real-time, such as a video encoder display system for a multimedia application.
Nothing is being controlled per se as in a digital control application. However, such
c,:
60 frames per second. Similarly, digital audio continuous media systems require en
Scanned by CamScanner
1C
Source Flow
& Control 4
SourcFlow
: Control if
.
Application
Appllcatlon
API
HW/SW
FIGURE 1.4
So far, all the applications and service types considered have been periodic.
One of the most common examples of a real-time service that is not periodic by
nature is error-handling service. Normally a system does not have periodic or reg
ular faults-if it does, the system most likely needs to be replaced! So, fault han
dling is often asynchronous. The timeline and characteristics of an asynchronous
real-time service are however no different from those already studied. In Chapter
2, "System Resources," the characteristics of services will be investigated further so
that different types of services can be better understood and designed for more
optimal performance.
lulTl Stara
The POSIX (Portable Operating Systems Interface) group has established a num
ber of standards related to real-time systems including the following:
l EE Std
2{)_03.1 b-2000:
t1me extensions
l EE Std
15
The most significant standard for real-time systems from POSIX is I 003. lb,
which specifies the API that most RTOS (Real-Time Operating Systems) and real
time Linux operating systems implement. The POSIX 1003.1 b extensions include
definitions of the following real-time operating system mechanisms:
Priority Scheduling
Real-Time Signals
Clocks and Timers
Semaphores
Message Passing
Shared Memory
Asynchronous and Synchronous I/O
Memory Locking
Some additional interesting real-time standards include the following:
SUMMARY
Now that you are armed with a basic definition of the concept of a real-time em
bedded system, we can proceed to delve into real-time theory and embedded re
source management theory and practice. (A large number of specific terms and
Scanned by CamScanner
16
the best place to start because it is fundamental. A good understanding of the the
ory is required before you can proceed with the more practical aspects of engineer
""''"rco
built for a low cost, they provide a meaningful experience that can be transferred
to more elaborate projects typical in industry. All the Example projects presented
here include a list of components, list of services, and a basic outline for design and
of Colorado. For more challenge, you may want to comat the University
students
.
.
' brne eieents and mix up services from one example with another; for example, it
...
'is posjhle to place an NTSC camera on the grappler of the robotic arm to recog,11ize. tctrgets for pick-up with computer vision. A modification such as this to in
clu a Video stream is a nice fusion of continuous media and robotic applications
that reqt!ires the application of digital control theory as well.
2.
and embedded.
Find the Liu and Layland paper and read through section
3. Why do they
assume that all requests for services are periodic? Why might this be a
problem with a real application?
3. Define hard real-time services and soft real-time services and describe why
and how they are different.
CHAPTER REFERE N C E S
Hall,
1965.
Scanned by CamScanner
System Resources
'
In This Chapter
Introduction
Resource Analysis
Real-Time Service Utility
Scheduling Classes
The Cyclic Executive
Scheduler Concepts
Real-Time Operating Systems
INTRODUCTION
Real-time embedded systems must provide deterministic behavior and often have
more rigorous time- and safety-critical system requirements compared to general
system must survive launch and the space environment, must be very efficient in
terms of power and mass, and must meet high reliability standards. Applications
that provide a real-time service could in some cases be much simpler if they were
ment. For example, a desktop multimedia application can provide MPEG play
back services on.a high-end PC with a high degree of quality without ignificant
specialized hardware or software de ign; this type of scenario is "killing the prob
lem with resources." The re ources of
17
Scanned by CamScanner
11
throug hp ut CPU (GHz clock rate and billion s of i nstruc tions per seco nd) , a high
bandw idth, low-lat ency bus ( gigabit ) , a large-c apacity memo ry system ( gigabytes)
and virtual ly unlim ited public ut l i 1,Y power. By compa rison, a n anti.-lock braki n
system m ust run off the automobile s 1 2 -volt DC pow r system , s u 1ve the u nder
hood therma l and vibrati on envi rnme nt, and provid e the brakm g control at a
reason able consum er cost. So for most real-ti me embedded system s, simply kill ing
the prob lem with resources is not a valid opti n .
.
.
The enginee r m ust instead carefull y co s1der resourc e h m tat1ons, incl udi ng
power, mass, size, memory capacity , processmg, and 1/0 bandwi dth. Furthermore,
compl ications of reliable operatio n in hazardo us environ ments may require special
ized resources such as error detectin g and correctin g memory systems. To success
fully implemen t real-time services i n a system providing embedde d functi ons,
resource analysis must be completed to ensure that these services are not only func
tionaJly correct, but that they produce output on time and with h igh reliabili ty and
availab ility. Part I of this book, "Real-Time Embedded Theory," provides a resou rce
view of real-time embedded systems and methods to make opti mal use of these re
sources. I n Chapter 3, " Processing," we will focus on how to analyze CPU resources.
I n Chapter 4, " 1/0 Resources," resources will be characterized and methods of analy
Scanned by CamScanner
11
System Resources
CPU clock rate, for exam ple . I n t h i s se n se, po we r const raints could in the end ca u se
a p robl e m with a design's ca pability to meet real - t i m e deadline .
e rv ice exa m p l e
i n t ro d u ed i n
in the
volatile to rage
tt
actuatio n
I n the up o m i ng t h ree chapter , w e w i l l characterize a n d der ive for mal model
for each
f the ke
of h o w ea h ke
Tra d i t ionall
en tered a round proce sing and how to chedule mult iplexed exec u tion of
been
mult iple e r i e
tern
P U to a pec i fic
th read of exe ution. The mechan ics of m u l t i plex..ing the CPU by preempting a r u n
ning t h read, savi n g it
witch.
pat h a re
RT
n t e t - s w i t c h i n g me h a n i m . When
hedu\er a n d ontext-
C P U i m u l t i plexed w i t h a n
ter m i n e \ hether the CPU re o u rce are sufficient given the et of e rv i e th read t o
be e e uted a n d whether the
to
ervi e
pri
repea tJbilit'.'
wilJ
erv i c e
>
eha e
a n tc d , t hen hl' .,
t
rm
n
o .ih1ltty
1<le 1 e
t
t >
t e rn
pr
)VI
fo. B-
e -,er
sy11 tc m ,
m rcp
Scanned by CamScanner
11'>
nb
u e it
e ry e r i e reque
be h
i rd
not
han e
vrr
t i me i n
bt t e r under\t n d t n l
v d ry ove r
1 e
J rcr m t n t '> t l l . He fo r
deter m i n 1 t i
make y
t i me
re u1red.
'
t e m re
ur e
and what
a
n
JO
Speed:
Efficiency:
Algorithm complexity: C;
Service Frequency: T,
Cha pter 1 0, " Performance T u n i n g," provide t i p for resol v i n g execution effi
ciency issues. Execut ion efficiency i
code can lead to large WCET , mi
not
der tanding met hods for t u n i n g oft wa re perfor m a nce can become i m portant.
I nRut and o u t p u t chan nel between pr
1 .3
of Ch ap t e r l .
Many processor cores have t h e capa b i l i t y t o con t i n ue p roce ssi n g i nstructions while
110 reads are pending or w h i l e 1/0 wri tes a re d ra i n i n g out of buffers to devices.
This decou pling helps effi cien c y t r e m e n d o u sly, b u t when the service processing re
Latency
Scanned by CamScanner
System Resources
es t s a n d com p 1et1ons:
. transac t ions and delay
sp1 1t
.
Wn te late ncy
21
Bandwidth ( BW)
Queue depth
CPU
coupling
OMA
engi nes
Memory resou rces are designed based upon cost, capacity, and access latency.
I deally a l l memory would be zero wait-state so that the processing elements in the
system could access data in a s ingle processi ng cycle. Due to cost, the memory is
most often designed as a h ierarchy with the fastest memory being the smallest due
to h igh cost, and l a rge capacity memory the largest and lowest cost per unit stor
age. N o nvolatile memory is most often the slowest access. The management, siz
i ng, and a llocation of memory for real -time embeded systems will be covered i n
detail i n Chap ter 5 and incl udes the follow i ng:
Level - I cache
Locked for use as fast memory, unlocked for set -associative or d irect
M a i n memory- R A M , SDRA M , D D R
Proce or b u interfa
Scanned by CamScanner
ee A p pcnc1 i x A , l o sa r
22
Req u i res algorithm for block e rase: i n terrupt upon c o m pletion and poll
A llocat ion o f data, code, stack, heap to physical h i e ra rchy w i l l sign i fica ntly
affect perfo rman ce
Traditionally, real -time theory a n d systems des i gn have foc used al most e n
1/0 bound: Too much total 1/0 latency d uring the release period a nd/or poor
n:1
Scanned by CamScanner
System Resources
2J
_ _ .., _ _ _ .. _ _ _ _ _ _ _ _ _ _ _ _ .. .. .. .. .. .. ... .. .. .. ,.
,
C P U -U ti lity
,'
'
'
,'
- - - - - - - - - - - - - - - - _
:
'
,'
'
,__
'
_._
_
_
'
'
'
'
'
I Q - U tility
, '
'
,'
'
- .. - - - - - - - - - - - - - - - - - - - - ... - - - - - - J'
'
FIGURE 2.1
System cost
Reliabil ity required ( how often is t he system allowed to fail if it is
soft-real - t i me
syst em ? )
Ava i l a b i l i t y req u i red ( how often the system is expected to be out of service or
in service? )
Risk o f oversubscribing resou rces ( how determ i n is t i c are resource demands?)
I m pact o f overs ubsc ription ( i f resource margin i s i ns u ffi c i e n t , what a re t h e
conseq uences ? )
U n fo r t u n a tely, p rescribing general margi ns for a n y system w i t h specific val ues
is d i ffic u l t . H owever, here are some basic guideli nes for resource s i z i n g a n d margin
main tenance:
CPU: The et o f propo ed ervices must be allocated to proces ors so t hat each
pro e sor in the y tern meet t h e Lehoczky, hah, D i ng t heorem for feasib i l i t y .
Scanned by CamScanner
24
1/0: Total 1/0 latency for a given service should never exceed the respon se
often the deadlin e and period are the
deadline o r the service releae period (
same). You'll see that execution and 1 /0 can be overlappe d so t hat the response
time is not the simple sum of 1/0 latency and execution t i me. I n the worst case,
the response time can be as is suggested by Figures 1 . 2 a n d 1 .3 in Chap ter l .
Overlapping 1/0 time with execution time is therefore a key concept for bette r
performan ce. Overlap theory is presented i n Chapter 4 , " 1/0 Resources."
latency hiding.
Memory: The total_ memory capacity should be suffic ient for the worst-case sta
tic and dynamic memory requirements for all services. Furthermore, the mem
ory access latency sum med with the 1/0 latency should n ot exceed the service
release period. Memory latency can be hidden by overlapping memory latency
with carefuJ instruction scheduling and use of cache to improve performance.
The largest challenge in real -time e mbedded systems is deali'n g with the trade
off between determ inism and efficiency gai ned fro m less deter m i nistic architec
tural features such as set-associative caches and overlapped 1/ 0 a n d execution. To
be completely safe, the system m ust be shown to have determ i n istic t i m i ng and use
of resources so that feasibility can be prove n . For hard real - t i m e systems where the
consequences of failure are too severe to ever allow, the worst case m u st always be
assumed. For soft real -time systems, a better t rade-off c a n b e made t o get higher
performance for lower cost, but with h igher probability o f occasional service fail
ures. In the worst case, the response t i m e equation i s
'IS; , Tresponse-i Deadline;
Tresponse-i
T10-La1ency-i +
W C ET;
TMemory- latency -i
+ L T;nreref rence-j
L Tinterference- ;
;=I
Scanned by CamScanner
= Total
Service;
i-1
Preemption - Time
J=I
(2. l )
Syste m Resources
JS
All
service.s S; in a hard real-time system must have response times less than
their required deadlin e, and the response time must be assumed to be the sum of
the total worst-case latency . Not all systems have true hard real-time requirements
so that the absolute worst-c ase response needs to be assumed, but man.r do. Com
mercial aircraft flight control systems do need to make worst-case assumptio ns to
be safe. However, worst-ca se assumptions need not be made for all services, j ust
REAL-T I M E S E RV I C E UTILITY
To more formally describe various types of real-time services, the real-time research
com m u nity devised the concept of a service util ity function. The service utility
function for a sim ple real-t ime service is depicted in Figure
be 'released when the service is rady to start execution fol lowing a service
request, most often i n i tiated by an interrupt.
Relea
Dead line
1 00%
Uti lity
0%
"------.---:,..;n me
(
'
FIGURE 2.2
Scanned by CamScanner
After Dt'adli11e.
llfl'fir_1 is Negari1e
21
line, and negative again after the deadlin e as depicte d i n Figure 2.3. For ar.
isochronal service, early completio n of response processin g req u i res the response
to be held or bu ffered up to the deadline if it i s com puted early. The servic es de
picted in Figures 2.2 and 2.3 are said to be hard real-time because the utility of a
late response (or early i n the case of isochronal) is not only zero, but also negat ive .
A hard real-time system suffers sign i ficant harm from i mproperly timed re
sponses. For example, an aircraft may lose control if the d igital autopilot produces
a l ate control surface actuation, or a satellite may be lost if a t h r ust is applied too
Jong. Hard real -time services, isochronal or sim ple, req u i re correct tim ing to avoid
Joss of life and/or assets. For digital control systems early responses can be j ust as
Deadline
1 00%
Utility
0%
Before Deadline
Uri/1ry is Neguii e
AGUIE 2.J
Isochronal service u
ti lity
Scanned by CamScanner
Time
Aft er Deadline.
Uri/try is Ngative
Syst em Resources
J7
How does a hard real-ti me service compare to a typical nonreal -t ime applic a
tion ? F i g ure 2.4 shows a s rv ke that is co nsidered to prodce a respon se with best
effort. Basically, the nonreal- time-service h a s no real deadline bCcause full utility ts
realized whenever a best effort application finally produces
systems and even many embedded computing systems are designed to maxjmize
overall throughput for a workload with no guarantee on response time, but with
maxim u m effi ciency i n processing the workload. For example, on a desktop sys
tem, there is no limit on how m uc h the CPU can be oversubscribed and how m uc h
1/0 backlog may be generated. Memory is typically l i mited, but a s CPU and 1/0
1 00%
Utility
Deadline Does Not Exist
0%
FIGURE JA
The real-time research community agrees upon the defin itions of hard real
time, isochronal, and best effort services. The exact definition of soft real-time ser
vices is, by contrast, somewhat unclear. One idea for the concept of soft real -time
Scanned by CamScanner
11
is similar to the idea of receiving partial credit for late homework because a serv i ce
that produces a late response still provides some utility to t he system . An alte rna
tive idea for the concept of soft real-time is also similar to a well- known homework
policy i n which some service drop-outs are acceptable. I n this case, by ana logy, no
credit is given for late homework, but the student is allowed to drop their lowest
score or scores. Either definition of soft real-time clearly falls between the extremes
of the hard real-time and the best effort utility curves. Figu re 2 . 5 depicts the soft
real-time concept where some function greater than o r equaJ to zero exists for soft
real-time service responses after the response deadl ine-if the function is identi
cally zero, a well -designed system will simply term inate the service after the dead
line, and a service drop-out will occur. If some partial utility can be realized for a
late response, a well-designed system may want to allow for some fixed amount of
service overrun as shown in Figure 2.5.
Release
Deadline
1 00%
Utility
Ti me
After Deadline, Utility Diminishes
FIGURE 2.5
.
For cont inu ous media a lica .
.
ti o n s (vid eo and au d io), most often t here is no
PP
reason for prod ucin g late res
po ses bec a u th
se ey cau se fram e jitte r. Ofte n. it is best
ou
s
th
r
lic
app
ese
at
fo
ion if th e prev1
fra m e o r so un\,...r b 1te
. output .1 prod uced agai. n
and the late outp ut dropped w h
.
e n t he next f;rame .1 d e 1 1vered
.
on
u n e Of course . 1 r
t
.
.
mult 1pJ servi ce drop -out s oc u r
i n a row t h ' I ea d s to una
eptable qual ity of ser-
Scanned by CamScanner
----------...---= ,._
....
_
_
,,_ ,......
OJ ._ ,...,..
_
.._ __
_
System Resources
21
i:n
service. The concept of an anytime algorithm can only be implemen ted for services
where i terative refinemen t is possible, that is, the algorithm produces an i n i t ial
solutio n long before the dead l i ne, but can produce a better sol ution ( response) i f
allowed to con t i n ue processing u p t o the deadline for response. I f the deadline is
reached before the algorithm finds the optimal solution, then it simply responds
with the best solution found so far. Anytime algorithms have been used most for
robotic and AI ( A rt i ficial I ntelligence) real-time applications where iterative re
finement can be beneficial. For example, a robotic navigation system m i ght i n
cl ude a path - pl a n n ing search algorithm for the map i t is building i n memory.
When the robot encounters an obstacle, it must decide whether to turn left or
right. When not m uch of the environment is mapped, a simple random selection
of left or right m ight be the best response possible. Later on, after more of the en
viro n m e n t is mapped, the algorithm might need to run longer to find a path to its
goal or at least one that gets the robot closer to its goal. Anytime algorithms are de
signed to be t e r m inated at their deadli nes and produce the best solution anyt i me,
so by defi n ition a nyt ime services do not overrun deadlines, but rather provide
some partial util i ty sol ut ion anytime following their release. The i nformation de
rived during p revious partial solutions may in fact be used in subsequent service
releases; for example, in the robot path-planning scenario, paths a n d partial paths
could be saved i n memory for reuse when the robot ret u rns to a location occupied
previously. This is the concept of iterative refinement. The storage of i n termediate
deadline, such as a robot that finds an optimal path more quickly a nd still avoids
collisions a n d/ or lengthy pauses to think! Why wouldn ' t the example robotic ap
plication simply halt when encoun teri ng an obstacle, run the search algorithm as
long as i t takes to find a solution, or determ ine that it has insufficient mapping or
no possible path? This is possible, but may be undesirable if it is better that the
robot not stand still too long because someti mes it may be better to do someth ing
rathe r than not h i ng.
Scanned by CamScanner
JO
I
he robot simp ly ran until it deter m ined the
vices and av01 dmg o erru n . f t
this would constit ute a best effort service rat her
opti mal answer each time , t en
1
.
.
any tim e real -tim e ut1 ity cur ve.
than anyt ime. F1gure 2 6 depicts an
Release
Deadline
100%
t:tility
F( t )
0%
Before Deadline.
FIGURE
2.1
Utili1 is <
/000
Tim
After Deadline.
Ut1firy is .Vegatfre or Zero
hard to imagine, yet feedback digital control systems are a very i m portant example
of a hard isochronal application. That is, always producing the response exactly at
the desi red deadlihe relative to release. I soch ronal systems a re normally imple
mented w ith hold buffers and traditional h a rd real -time services, early service co
will
pletions must be buffered, and CPU scheduling m ust ensure that late respo nses
never happen. S o, a soft isochronal service wo uld be fa r easier t o i m plement be
ter cause there is no need for early completio n bufferi'ng and no need to detect a n d
Scanned by CamScanner
J1
minate services that overrun deadlines. Allowing indefinite overrun of any soft real
time service can ultimately be a problem. If overruns continue to occur and service
releases start to overlap for the same service, the loading simply climbs higher and
higher. How can such a system ever recover? Allowing indefinite overruns would be
similar to the scenario where the overly conscientious student continues to work on
more and more late homewo rk, ulti mately lowering grades in all courses to the
point that total failure is the ultimate outcome. The example of a soft isochronal
service is depicted in Figure 2.7.
Release
1 00%
Deadline
. . . . . . . . . . . . . . . . . . . . . . . .
Utility
0%
F(r)
Before Deuel/me.
FIGURE 2.7
F(r}
After Deadline.
Uriliry 1s
<
100"'0
Scanned by CamScanner
J2
3. Turn i n late submissi on s for reduced credi t in classes that allow thi s.
4. Apply the anytime policy for classes t hat do not accept late su b mission s,
real-time services are proven using methods d iscussed later in this chapter ( Rate
Monotonic Analysis), and best-effort services simply use all the left - over resource
This is a very specialized form of real - t ime embedded system. Because soft real
time services are more generalized a n d l e ss determi nistic c o mpared to hard real
time services, soft real-time is t reate d as an open issue i n this text-a subject that is
S C H E D U L I N G C LA S S E S
Sch ed u l i n g servic es a n d t he i r usage of resou rces c a b e accom
n
plished b y a l a rge
.
of methods. I n this section , we consid er sched u l i ng
system proce ssors with
o rk. In gener al, a system m i g h t h ave more t
h a n one proce ssor ( C P U ) , and any
g ve proce ssor migh t host one or
more servic es. Alloc ating a CPU to eac h se rvice
by t he system m ig h t be s i m plest from a
sched u l i n g v iew po i n t , but clearly .
his
a l so be a cost ly sol u t i o n . F u r t he
r m o re, ru n n i n g s ervic es to c o m p l et io n
igno ring a l l o t h er requ ests
on a fi r t - com e, fi rst- serv ed basi s i s also s i m p l e , bu t
such as ervic e s t a rva t i o
n a n d m i s i n g d ea d l i n e s can arise with t hi
bet te r u nde rst and rea l - t i
me p roce ssor s h ed u l i n g, you fir t need t
revi ew
t a o n o m y of a l l maj or
sched u l i ng p o l i ies t hat c a n b e i m p lem e nt c?d a
shown in
2.8.
variety
i
pr?vi ded
'>-ou ld
p robl e m
Jp roac h. To
Figure
Scanned by CamScanner
SJ
System Resources
Euc:utlon Sc:hedulln1
----------
/"
Loc:al-Unlproceseor
Glob 1U-M P
P<ttm
SM T
SM P
Asymmttrlc: Distrib11ttd
(Symmttric MP) (Simullaneous (Off- load)
Multi-Thrtading)
mpdve
I\
Find-Priority
Batch
Rate
Deadline
Monotonic
" Monotonic:
FCFS
( Flnt Come.
Flnt Serve)
rattn
EDF /LLF
(Deadline
R ound Robin
Timttlice
SJ N
( Shortest Job Nu t)
Continuation
Fu nction
Driven)
FIGURE 2.8
As is the case with most policies, no one policy is optimal for all system requirements, not even when all services must be hard real-time. The policies that
can be made optimal and have traditionally been used for hard real-time systems
are Rate Monotonic, Deadline Monotonic, and to a lesser extent, EDF/LLF. The first
branch in the taxonomy is based upon hardware design; does the system contain a
single CPU or multiple CPUs?
Multiprocessor Systems
For multiprocessor systems, the first resource usage policy decision is whether each
CPU will be used for a specific predetermined function (asymmetric, distributed) or
whether workload will be assigned dynamically (symmetric). Most general-purpose
M P (Multi-Processing) platforms provide SMP (Symmetric Multi-Processing)
where the OS determines how to assign work to the set of available processors and
most often attempts to balance the workload on all processors. An SMP OS i s not
simple to implement, and overhead for workload balancing can be high, so many
embedded multiprocessor systems are asymmetric or distributed. Asymmetric mul
tiprocessing is used frequently to take a service that was initially provided by software
running on a general-purpose CPU and offload it to a hardware state-machine or
tailored CPU to implement the service, therefore off-loading the more general pur
pose multiservice CPU. Distributed systems are typically asymmetric and communi
cate via message passing on a network rather than through shared memory, bus or
cross-bar. Other than the issue of load balancing, multiprocessor systems are most
Scanned by CamScanner
JC
period of the overall loop. The loop may include event polling to determine when
to dispatch functions, and functions that need to be called at a h igher frequency
than the main loop will often be called m ultiple ti mes within the loop. Likewise,
functions implemen ting periodic services that need to be run a t m uch lower fre
quency than the main loop m ay be cal led only o n specific l oo p .counts or only
when polled events indicate a service request. For example, the NASA Space Shut
tle flight software uses a cyclic exec utive design for the PASS ( Primary Avionics
Subsystem ) and has provided decades of defect -free operation p roviding real-time
a I fu nction
processes, which
operate at rates of 6 2
H z down to 0.25 Hz." The sched uling
of t he G N &C exec utives is p rovided by the
Process Manageme nt ' which uses a m
-A .
uI tttas
k mg
"
pnonty
queue struct u re and SC h t"\J
.
ules the CPU m resp onse to requests
Scanned by CamScanner
.;lJ:>\'1; 1 1 1
"o;; vv ...... ..
rtquests from the GN&C cyclic executive: SM (Systems Management), which moni
tors systems for faults; and VCO (Vehicle Checkout), which is used for preflight and
on-orbit coast avionic systems testing. The GN&C cyclic executive is scheduled to
run periodically at 25 Hz ( 40-millise,ond period) and includes its own dispatch table
to sequence GN&C services at the high, medium, and low frequencies previously de
scribed. The cyclic executive architecture has been an important and successful
approach for hard real-time systems. Carlow explains the overall system: " [ the flight
software] architectures reflect a synchronous design approach within which the dis
patching of each application process is timed to always occur at a specific point rela
tive to the start of the overall system cycle or loop." Although the cyclic executive has
been successful due to its simplicity and deterministic character, one of its drawbacks
is the difficulty required to modify the cyclic schedule. For PASS, Carlow points out
that, "A major benefit of this approach is repeatability; however, there is only limited
flexibility to accommodate change."
The cyclic executive is often extended to handle asynchronous events with in
terrupts rather than relying only upon loop-based polling of inputs. This extension
of the executive is called the Main +ISR design. As the name implies, this approach
involves a main loop cyclic executive with the addition of ISRs ( Interrupt Service
Routines ). The ISRs handle asynchronous events that interrupt the normal execu
tion sequence of an embedded microprocessor. In the Main+ISR approach, the
ISRs are best kept short and simple so they relay event data to the Main loop for
handling. The Main+ISR approach hJs some advantage over the pure cyclic exec
utive and polling for event input because it may reduce latency between event oc
currence and handling. However, the Main+ISR approach has pitfalls as well. For
example, if an input device malfunctions and raises interrupts at a much higher
frequency than expected, significant interference to loop processing may be intro
duced. Although Main+ISR is more responsive to events as they occur, it may be
less stable unless a concerted effort is made to protect the system for potential in
terrupt malfunctions related to interrupt source devices.
DUL ER C O N C E PTS
The design of a generalized RTOS scheduler for processor resources is covered well
in most operating system texts. Here, we will briefly review some of the major con
cepts as they relate to real-time scheduling of the CPU resource. Real-time services
may be implemented as threads of execution that have an execution context and are
set into execution by a scheduler that determines which thread to dispatch. Dispatch
is a basic mechanism to preempt the currently running thread, save its context, and
restore the context of the thread to be run along with modification of the instruction
pointer or program counter to start or resume execution of the new thread. The
Scanned by CamScanner
onents an d System s
Real-Time Embedded Comp
scheduler.must implement the CPU sharing policy and the dispatche r m ust provid
the context switch for each t hread of execution. T he dispatcher is requ ired to save
and restore all the state that each thread of exeution uses including the followin g: e
Registers
Thread state
Stack
Program counter
scheduler works. As illustrat ed in Table 2 . 1 , thread states are based upon resources
needed in the thread execu tion conte xt.
TABLE 2.1
ftrMtl State
Desalptlem
Trasltl
Descrtptle
Ready
Running
Running
Pending
Delayed
Suspended
Ready
Non-Existent
Thread exits
Ready
Pending
become available
Suspended
Suspended
by another thread
Thread is waiting for delay
period to end
Ready
Suspended
Delayed th read is
by another thread
Ready
suspended by command
Non-Existent
Of a llouted
Scanned by CamScanner
resources
suspended
Suspension removed by
anot her thre - thread
activated
-
Ready
activation
System Resources
J7
how the scheduler decides which thread from the set of all
those that are ready for dispatch, was the main differentiation in the taxonomy in
Figure 2.8. As threads become ready to run, pointers to their context are normally
placed on a ready queue by the scheduler for dispatch in the order determined by
the scheduling policy. The scheduler must update the ready queue based upon new
service request arrivals. The dispatcher will simply loop if the ready queue is
empty. Note, however, that in VxWorks, neither the scheduler nor the dispatcher
shows up as a task. Also, unlike some operating systems, VxWorks does not in
clude an idle task in the default configuration. In VxWorks, the scheduler and dis
patcher are kernel context services rather than task context services. Preemptive
schedulers are driven by interrupts and task calls into the kernel APL An interrupt
or API call can cause the scheduler to switch context and to potentially dispatch a
new thread or allow the same thread to continue execution. In VxWorks, an inter
rupt or an API call made by the currently runn ing task are the only ways that the
currently running task can be preempted. A fixed-priority preemptive scheduler
simply dispatches threads from the ready queue based upon a priority they have
been assigned at creation unless the application adjusts the priority at runtime.
Most often, if two th reads have the same priority, they are dispatched on a first
come, first-served basis. Almost all RTOSs include priority preemptive schedulers
with support for the basic thread states outlined in Table 2 . 1 .
One o f the major drawbacks of a priority preemptive scheduling policy is the
cost or overhead of the context switch that occurs on every interrupt. Systems such
as Unix, which use a time-slice preemption scheme where an OS timer tick is gen
erated every so many milliseconds by a programmable interval timer, have high
overhead. By comparison, most RTOSs do not use a time-slice tick, and instead
only reschedule when 1 /0 generates an interrupt or when a thread makes a system
call that results in yielding the CPU-a delay, exit, yield, call for an unavailable re
source in addition to the CPU, or suspension due to exception. Otherwise, RTOS
threads normally run to completion unless an event releases a thread ( makes it
ready) at higher priority than the presently executing thread.
Given an M P or uniprocessor hardware architecture, if more than one service
can be requested on a given processor, the next branch in the taxonomy concerns
whether requests will preempt services al re ady in progress or queue and wait for
services in progress to run to completion.
Dispa tch policy,
Scanned by CamScanner
JI
ms
Real-Time Embe dded Com ponents and Syste
quency executiv es. In a data flow, input i n terfaces are periodi cally c hecke d, and
when data is available , this data is processe d and o u t p u t is produce d for cons ump
tion by the next service in the flow or terminated by produ c i n g a response. In data
flow processing, the inputs t hat start a flow are c h ecked i n a dete rmin istic order
most often, and flows are executed from start to fin ish o r sou rce t o sink.
The best property of this t ype of schedu l i ng policy is t ha t it is fully determinis
tic-the order o f exec u t ion i n the service and between services ( flows) is fully pre
dictable. The disadva ntage is that some flows may be known to requ ire execution
much more frequently than others-a modification to t h is scheme leads to a mul
t i frequency executive ( MF E ) . In the M F E , specific functions ( data flows ) are exe
cuted at a h igher frequency than others. For example, a con t ro l system data flow
1 00
guidance fu n c t ion may o n ly requ ire exec u t ion 1 t i m e per seco n d to d irect the sys
tem t o a target ( hopefully without losing control ! ) . Either a pp roach may use asyn
ch ronous interrupts o r may only poll i n p u t s t a t u s registers; however, conte xt
switches beyond s i m ple i n terrupt servici n g a re n o t done. That is, data flows are not
switch ed prior to complet ion, a n d exec u t ive fu n c t i o n s are n o t switched prior to
completi on; after a flow o r execu t i ve fu nct ion i s dispatch ed ( given the C PU ), it
run to compl etion witho ut signi fic a n t pree m p t io n .
Preemp tive service releases have a n advant age i n that t h e service can be desi gned
m uch more indep enden tly t han data flows o
r execu tives. Each servic e can ass ume
that it will execu te follow ing releas e from
an i nterr upt as if it is t he only service o n the
PU, e ept each servic e m ust be
pree m ptabl e betw een release and c o m pletio n
( gene rati n of re ponse ). The
pree mpta bility req u i re an oper ating y tern th at will
ave and re-t ore ontex.t for ea
h execut abl e erv i e. F u rthe rm ore , eac h rvi e rnu l
exe ute reen tran t
de if it i
ha red a nd m u t yn h ro n i z e a e
t any globalh
,l
hared r sou r es. The
req u i rc m cn are t rue
f a ny pree m pt ive m ul t 1 p rogra m rn ru
.
Y t ern . T ht:"
e re .i n depen dent i n t he
n e t h t n o pcc i fi coo pe rat i n bt
.
tween
rvic es 15 r q u i red u n le.
o d e r dat a i h a red har d o de r d a t j1 n rh
Scanned by CamScanner
_
_
System Resources
,,
requires reentrant code and shared data access synchronization . The services can be
considered to be running asynchronousl y other than specific synchronizati o n poi nts
for shared resource access. Thus, each service can be designed as a separte state ma
chine rather than one state machine composed of many smaller state machi n es ( as is
typical of executives) . Also, one flow of execution may, i n fact, preempt another in
cases where one service is more important than another ( e.g., maybe one service has
a shorter deadl ine or more negative impact if not completed by its deadl ine).
Because preemption opens up the possibility that more than one service m ight
be ready to run, and there are fewer CPU resources than services ready to run, a dis
patch decision must be made. One of the sim plest dispatch policies is to give the
CPU to the service assigned highest priority. These assigned priorit ies are never
changed; they are fixed. To decide how to assign priorities, two common real -time
policies are to assign highest priority to the service with the shortest release period
(h ighest request frequency), which is known as RM ( Rate M on otonic ), and to as
sign the highest priority to the services with the shortest deadline relative to release,
which is known as DM ( Deadl ine Monoton ic ) . Note that RM is identical to D M
when the release period equals the deadline. As you will see i n this chapter, the R M
(and related D M ) policy can be shown t o provably meet system deadl ine require
ments given determ i n istic release periods and service execution t i mes. However,
one major drawback is that fixed priorit ies do not guarantee maximum use of the
CPU.
To attempt full usage of CPU resources and prove that services can meet dead
lines, you are forced to consider preempt ive dynamic priority policies. Two dynam ic
policies have been shown to have the capability to fully use CPU and guarantee dead
lines assuming service execution ti mes and request i ntervals are determ i n istic or
bounded. Liu and Layland proposed Earliest Deadline First ( ED F ) , along with RM ,
and showed them to b e optimal-that is, policies that can schedule a n y set o f services
that can be scheduled ( a n exhaustive proo fl ) . H owever, although RM is sim pler, Liu
and Layland also showed that RM fundamentally requires margin and less than ful l
CPU use u nless service requests are harmonic.
very hard to predict-which service will miss its deadl ine in a n overload? Variations
on EDF have been proposed that intend to improve the determinism of the system
gi e n variations, i cluding Least Laxity First ( LL F ) . The L L F policy assigns h ighest
. to the servJCe that has the least d i fference between remaining execution time
pnonty
Scanned by CamScanner
and its deadline--a measure of which service deadl i n e is most pressin g. The idea is
that laxity may vary based upon exec ution rates, wherea s the EDF assign me nt is not
terns employi ng a n RTOS. Early on, most hard real - t i me systems used cyclic exec
u t ives or m u l t i freq uency exec u t ive
Shuttle guidance, naviga t i o n , and co n t rol prim ary avio n ics s ubsyst e m . The RTOS
sched u l i ng fra mework offers t h e same determ i n istic schedul i n g as the cyclic exec
u tive, but with far more flexibi l i ty i n h ow services are defi n e d a n d mai n ta i ned. The
princi pal reason for the pervasi ve ness of the RTOS i n l a rger scale configurable
real - t i me em bedded sy terns is t h e fact t h a t t h e policy has been proven opt imal
and a feasibil ity test erist , along w i t h the ease of defi n i n g services as tasks rather
than loop or i n terrupt contexts. The R M feasibi l i ty tests p rovide a m ethod to prove
that a given set of services ( i m plemented as RTOS ta ks) can be guaran teed to all
meet their deadl i nes i f the RM policy is u ed t o assign prioriti es. For h a rd reaJ -tjme
ystems where proof t h a t services i l l n o t m iss deadl i nes is desi red, RM is a n
obviou c hoice. For example , RM is often used for co m m ercial aircra ft , for atellite
sy tern , r any other sy tern where fa i l u re to meet a l l dead I i n es
can result i n ig
n ifi ant lo of l i fe and/o r assets. F i n al ly, as you w i l l see
in thi ect ion, with an RM
.
pol icy, you ca n pred ict and cont rol e xactly which
service will m i s dead l i n es in an
overl oad cen a rio .
Fir r, le t ' look a t why the R M prio
rity polic y i
p t i m a l . Recal l that the RM
.
- p? l icy req u i. res a fram ew rk where e
rvi e a re re l ea .ed a ynch ronou ly ( t yp i c ally
.
via an i n terru pt and p l aced on a
re a d y q u ue i n d i at i n g t hey need to ru n. A
sched uler then d1spa tche ea h t:rvic e
bae d u pon t h e h ighe t prior it y ta k i n the
Scanned by CamScanner
System Resourc es
41
read y queue ( all services are i m p l e me n t ed as tasks and assu m ed to have u nique
priority) . T h e service/t ask d ispatched continues to run t o completion unless an i n
terru pt preempts the task p rese n t l y exec u t i ng. Fol lowing a n i n terrupt, the sched
u l e r ree va l u a t es the ready q ueue and possibly di patches a new ta k (a context
swit c h ) i f a task o f h igher p r i orit y is added to the ready queue compa red to that
ru n n ing prior t o the i nterrupt. When a context switch occ u rs a n d a task/ e rvice is
preempted prior t o running t o completion, t h i s i called inteferencc. I n a fi x ed
tern
that has
proposed what t h ey bel i eved to be a reaso nable set of assumptions anJ con t raints
on real systems t o form ul ate a determin istic model. The ass u m p t i ons and c o n
straints are
AI:
C I : Deadl i n e = period
( WCET m a y be u ed )
by definit ion
ys
A 1 to A,, are assumptions and C 1 to C11 are con traints as p re se n ted by L i u and
Layl a n d in their paper.
Gi en the fixed p riori ty preemptive sched uling framework and assumpt ions
-
ribed in the preced ing l ist, we can now exa m i n e a l ternatives for assign ing pri
de
T1
2 , Ti = 5,
Scanned by CamScanner
= I,
1
=
a nd C2, and
1 and
w i t h period
T1
relea e peri
.
S I Makes Dead I me
r pn' o( S 1 ) >
nr
'' io( S2 )
C1
.
..
.
..
..
r:2..
..
..
..
.
..
o,
FIGURE 2.9
02
o,
from
I n this two-servIC e exa mp I e, the only other pol icy ( swap p i ng priorities
.
.
.
the preced ing example) does not work. Given servICes 5 1 a n d 52 t h pe ods T,
and T2 and C, and C2 with T2 > T 1 , for example, T, = 2 , T2 = 5, C, - I , C2 - 2, and
then if prio( S2 )
>
s,
prio( S 1 )
..
>
'
..
...
FIGURE 2.1 O
..
Scanned by CamScanner
System Resources
use to etermine whether system CPU margin will be sufficient for a real-ti me safe
operation.
R EA L-TI M E O P E R AT I N G SYSTEM S
M ny real -time embedded systems include an RTOS , which provides CPU sched
uling, memory management, and driver interfaces for 1/0 in addition to boot or
BSP ( Board Support Package) firmware. I n this text, example code incl uded is
based upon either the VxWorks RTOS from Wind River Systems or Linux. The
VxWorks RTOS is available for academic licensing from Wind River through the
c.
University program free of charge (see Appendix C, "Wind River University Pro
gram I nformation " ) . Likewise, Linux is freely available in a number of distribu
tions that can be tailored for embedded platforms (see Appendix C, " Real-Time
Linux Distributions and Resources").
Key features that an RTOS or an embedded real-time Linux distribution
should have i ncl ude the following:
'
able real -time operating systems provide the feat u res in the preceding l ist. This
has been the main selling point of the RTOS becau e it provides time-to-market
Scanned by CamScanner
Linux was origi nally designed t o b e a m u l t i user o pera t i ng system with mem
ory protec t i o n , abstracted process d o m a i n s , s i g n i fi ca n t l y a u to m a ted reso urce
managemen t, and scalabil ity for desktop a n d l a rge c l u sters o f gener a l - p urpose
! :
F nal
t
' '
Scanned by CamScanner
45
System Resources
called the th read con text. I n VxWorks , t h is is referred t o a s a task a n d the TCB
( Task Control Block ) . I n L i n u x, t h i s is referred to as a process and a process de
scriptor. I n the case of V xWorks, the context is fairly small, and in the case of a
L i n u x p rocess, t h e context i s rela t i vely complex, i n c l u d i n g I /O status, memory
usage context, process state, exec u t ion stack, register state, and ident i fying data. I n
( at the very least a th read of exec u t io n ) , task to describe an RTOS i m plemen tation,
Typical RTOS CPU sched u l i n g is fixed - priority preempt ive, with the capabil
ity t o modify priorit ies a t r u n t i m e by applications, therefo re a l so su pport i n g
dyna m i c -p riority p reem pt ive. Rea l - t i me respon e with bounded latency for any
n umber of services requ i res pree m pt ion based u pon i n terru pts. Systems where la
tency b o u n d s a re m o re rela xed might i n stead use pol l i n g for even t s and run
t h reads to complet i o n , increasing efficiency by avoiding disru ptive asynch ronous
i n terrupt context switches. Remember, as presented in Chapter 1 , we assume t h a t
vides priority pree m p t ive ched uling as a mecha n ism that allows an application to
i m p lement a variety o f sched u l i n g pol icie :
memory ma nagement becau e t h e appl ication really han dle the hed u l ing. i en
that bou n ded latency i mo t o ft en a hard req uire m e n t i n any real -t i me ytcm, the
foc u s is fu rther l i m i ted to R M , f.DF , and L L F . U l t i mately, a real - ti me
need t o
Scanned by CamScanner
hed uler
Dispatch
Interference
(preemption)
FIGURE 2.1 1
thread state of executing or ready on a priority ordered q ueue. M ost often, threads
Scanned by CamScanner
System Resources
47
Wit on
Resource
( s em T a k e ( l l
Interference
(preempt ion)
FIGURE 2. 1 2
on secondary resource.
Yield
Wait on
Resource
( semTake (
( t a s kDela y ( ) ,
Paus e ( ) )
))
Interference
(preemption)
Resource
Ready
( s emG 1 v e ( ) )
FIGURE J.1 J
lt,
enda ngerme nt. I n the ca e of d ivision by zero, this will cause an overflow re.; u
w h i h i n t u r n m i g h t generate fa u l ty comman d o u t p u t to an actuator , such a a
i han
atellite t h ru ter, which could cau e lo5s of t he a e t . I f t he di vi ion by zero
over
a n ex eption handler that recalc ula tes the re u l t and t herefor e re
1recover y i not p
w i t h i n the ervi e, con t i n uation m i ght be po ible, but often
i l u re, a n nre
ble. Be a u e t h e very next i n t ru t ion m i ght au e total y tern fa
Fi u re 2 . 1 4 how the:
erable ex eption hould re u l t i n u pen ion of that th read.
t h read o r ta k t hat i a l n:ad in the dela 1ed tate
add i t ion of a u pende d tate. I
d l ed b
Scanned by CamScanner
Suspend
(unhandled
exception)
Yield
( t a s kDe la y ( ) ,
Paus e ( ) )
Interference
(preempt ion)
FIGURE 2. 1 4
t a te , i n c l u d i n g delayed + suspended,
Al tho ugh sched uling the C P U for m u l t i p le s e rv ice s with the implementation
vides management of memory and 1/0 a lo n g with met hods to synchronize and co
exe
cute up to the rendezvous poi n t i n its code. I n Chapter 6, you'll see that this see m
i ngl y si mple requirem ent i mposed b y secon d a ry resou rces and t h e need fo r thread
one thread may have to enter the pending state to wait for another thread to
?
uo
session with a d v1ce by a thread, a n d c o n figuring the I/O device . The coord i na
cu
of access to dev JC es by multiple t h reads and t h e synchroniz ation of thread exe
Scanned by CamScanner
System Resources
49
e
a b i l i t y by set t i n g a sem apho re ( flag ) , which a llows t h e RTOS to tra n s
i t i o n the
t h read waiting for data to process from pendin g back to the ready state. L i kewise,
when a t h read wants to write data to an I/O ou t put b u ffe r, if t h e bu ffer is curren tly
ful l , the device can synch ro n i z e b u ffer avai l abil ity w i t h t h e th read aga i n t h rough a n
J S R a n d a b i n a ry semapho re. I n Chapter 8, y ou ' l l see that a l l 1/0 and thread yn
c h ro n i zat ion can be handled by ISRs and bi na ry semaphores , but that alternat ive
mec h a n isms such as m essage queues can also be used.
cause the usage of memory is determ i n i stic i n space and t i me. M e m ory usage may
lease m e m o ry. I n ge nera l , t h e use of t he C l i bra ry mal loc is frowned u pon in rea l
t i m e e mbedded systems because t h i s dyn a m ic memory- manage m e n t fu nction
provides a ll ocat ion and dea l location of arbit rary segments of memory. Over t i me,
i f t h e segm e n t s t r u l y a re of arbit rary s ize, t h e al locat ion seg men t s must be coa
r.J ,
.
;,; :1
FIGU RE
J. 1 5
Scanned by CamScanner
ary siz e.
Mem ory fragm entat ion for dat a s gmen ts of arbitr
50
Unused
Block
Space
Free
Blocks
. .
FIGURE
2. 1 6
allocation.
and 1/0 resource management, adopting an off-the- shelf RIOS such as VxWorks
Scanned by CamScanner
System Resources
SI
context is either exe<:: u ting or awaiting execution in the ready state in the dispatch
queue. In this scenario, thread A might have called function F and could have been
preempted before completing execution of F by thread B, which also calls function
F. If F is a pure function that uses no global data and only operates on input para
meters through stack, then this concurrent use is safe. However, i f F uses any
- global data, this data may be corrupted and/or the function may produce i ncorrect
results. For example, if F is a function that retrieves the current position of a satel
lite stored as a global state and returns a copy of that .position to the caller, the state
inforIJlation could be corrupted, and inconsistent states could be returned to both
callers. For this example, assume function F is defined as follows:
I'
{o:o, 0 . 0 , 0 . 0} ;
..
\i
' .
*/
sat e l l it e_p o s . x
s a t e l l ite_po s . y
s a t e l l i t e_pos . z
}
Now, if thread A has executed up to the point where it has completed its call to
update_x_p o s i t in, but not yet to the rest of get_p o s i t io n , and thread B preempts
A, updating the position fully, thread A will be returned an inconsistent state. The
state A returned will include a more recent value for
because the a l t , l a t , and long variables are on the stack that i s copied for each
thread context. Thread B will be returned a consistent copy of the state updated
from a set o f sensor readings made in one call, but A will have the x value from B's
call and will compute y and z from the alt, lat, long values on its stack sampled at
an earlier time. The function as written here is not thread safe and is not re- entrant
due to the global data, wh ich is updated so that i nterrupts and preemption can
cause partial update of the global data. Vx\.Vorks provides several mechani sms that
can be used to mak e functions re-entrant. One strategy is to protect the global data
Scanned by CamScanner
SJ
pr:
.
ven t i ng p reem p tion for t h e upd a t e criti cal sec tio n
from parti al upda. tes b},
k ( ) The so l u t i o n to make t h is fu n c t io n thre a d safe
u s i ng t a s k L o c k ( ) and t a s k Un o c
.
usin g
t a s kLock ( )
t a s kU n l o c k (
and
IS
. t 1. 0 n { d o u b l
t y pedef s t ruct posi
.
- {O . O '
POS I T I ON s a t e l l 1 t e_p os
vo i d )
POS I T I O N g e t_p os i t i o n (
{
double alt ,
lat ,
e x,
z;}
.
POS I T I ON ;
0 . 0 , 0 . 0} ,
long ;
l l 1 t e_p o
POS I T I ON c u r r e n t_s at e
..
y,
s;
re ad_a l t i t u d e ( &a l t ) ;
1
read_l a t i t u de ( & l a t ) ;
read l o n g i t u d e ( & long ) ;
- .
'l
...
'
.
*I
Mu l t i p l e f u n c t i o n c a l l s
are
req u i r e d t o c o n v e r t
state
code between L o c k a n d U n l o c k
is
i n
the
t h e g e o d e t ic
i n e r t i a l c oo r d i n a t e s
c r i t i c a l s e c t ion .
taskLock ( ) ;
c u rrent
s a t e l l i t e_p o s . x
u p d a t e_x_p o s i t i o n ( a l t ,
lat ,
long ) ;
cu rrent
s a t e l l i t e_p o s . y
u p d a t e_y_p o s i t i o n ( a l t ,
lat ,
lon g ) ;
cu r r e n t _ s a t e l l i t e_pos . z
u p d a t e_z_p o s i t i o n ( a l t ,
lat ,
long ) ;
s a t e l l i t e_pos
a s s ignment
*/
c u r r e n t _ s a t e l l i t e _p o s ;
/*
a s sumes
s t ruct u re
ta s k U n l o c k ( ) ;
ret u r n c u r r e n t _s a t e l l i t e _p o s ;
higer
Scanned by CamScanner
syst em Resources
SJ
S UM MARY
RTOS
along with basic fir m wa re to boot the software services for an application. Even if
a n off- the-shelf
RTOS is used, it
vice response latency is a prim ary consideration, and overall , the u t i li ty of the
response over time should be well understood for the application being designed.
sional m issed deadli nes are allowed, and missed deadline handling and service re
covery is designed into the system.
EXERC I S E S
I . Provide an example of a hard real-time service found in a commonly used
e mbedded system and describe why this service utility fits the hard rea l
t i m e utility c urve.
monly used e mbedded system and describe why this service util ity fi ts the
t i m e utility c u rve.
4, I m p lement a VxWorks task that is spawned at t he lowe t priority level
po ible and that call s e m T a k e to enter the pending tate, waiting for an
event in dicated by s e mG 1 v e . F rom the VxWork
ver ify that it i in t h e pending t.ate. Now, call
Scanned by CamScanner
54
Systems
Real-Time Embedded Components and
on a n d veri fy t h a t it compl etes e xec u t i
semap hore the task it is wa i t i n g
on
and exits.
n:i
.
.
7. Write a paragrap h compari n g the R M a n d t h e Dead l i n e Driven or E DF
policies described i n Liu and L ay la nd i n your own words a n d be com plete)
but keep i t to one reasonably conc ise pa ragra p h . Note that t he D ea d l i ne
D r i ver sc hed u li n g i n L i u and Layland i s now ty p i cal l y called a dy n a mic
parag raph.
genera l , a
8. Cross-compile code for VxWo rks. Download t h e file two_tasks.c from the
CD-ROM exa m pl e code. C reate a Tornado proj e c t t h a t i ncl udes this file
and download the fi l e to a lab t a rget . U s i n g t h e windshell, use the fu nction
i n t o your report
or do a C t rl - P r n t
S c r n ) . N e x t , s t i l l i n t h e wi n dshell, do an
" t e s t _t a s k "
Scanned by CamScanner
while
whec
55
System Resources
OIHl#l CD
lkup
1 3 . S e t a breakpo i n t i n fu n c t i o n
b
com mand
h i t , start a debug session using the Bug ico n . Use t he Deb ug menu, select
A t t ach, a n d a ttach to the Sender task. You should see a debu g window with
t h e c u rsor s i t t i n g at the s e n d e r fu nction e n t ry poi n t . Switch the View to
M i x ed Sou rce a n d Disassembly and now cou nt the n u mber of i nst ruc tion s
from a b rea kpo in t set at t h i s entry poi n t u ntil you enter mq_open u s ing the
s in gl e step i nto? Provide a capture showi ng the b rea k po i n t output from
your wind shel l.
CHA PTE R R E F E R E N C E S
Meeting Deadlines i n Hard R eal Ti me
l Briand9 9 ] B r i a n d , Lo'ic, a n d Roy, Daniel,
I EEE Com pute So c i ety P re s , 1 999.
ystems- Th e Rate Monoton ic
1 , M a rco. Unders tanding the L in ux Kernel.
[ Bo etOO l Bo vet, D a n iel P . , a n d Cesat
App:oach.
O Reilly, 2000.
" oftware
ched u l i n g H a rd Real -Time y terns: A Review .
[ B u rn 9 1 J B u rn , A . ,
1 99 1
Eng ine erin g fou ma /, (
, h u t t ie P r i m a ry Av1o n i s
arlow8 4 I Carlow , G e n e D . , A r h 1 te t u re of the . pace
oft wa re
ber 1 98 4 ) .
Scanned by CamScanner
"
y t e rn , "
M.i ,
51
I L i u 73 I
g
."
foumal
of
Association
Enviro
n
m
e
n
t
f
or
-Time
Comp
Real
Hard
a
uting Ma
chinery, Vol. 20, No. I , ( Ja n uary 1 973 ): pp. 46-6 1 .
I Ro bb ins96 j
Scanned by CamScanner
Processing
In This Chapter
I ntroduction
Feasibi l i ty
INTRODUCTION
Processi n g i n p u t data a n d producing o u t p u t d a t a for a system response i n real
time does not necessa rily require large CPU resources, but rather careful use of
CPU resou rces. Before considering how to make optimal use o f CPU resource in
a rea l - t i me embedded system, you m ust fi rst better un derstand what is meant by
proces i n g i n real t ime. The m a n t ra of rea l - t i me system correctness is that the ys
tem m ust not only produce the requ i red output response for a given i n p u t ( fu n c
t i o n a l correctness ) , b u t that i t m ust do so i n a t i mely m a n ner ( before a deadl i n e ) .
A
p l e concep t, but a more formal pec ification o f rea l - t ime ervice is helpful due to
t he many type o f application . For exam ple, the p roce
ing i n
voi e or v ideo
57
Scanned by CamScanner
51
the R M LUB:
U
U:
L/C, I T, ) m ( 2 01
m
1=1
m:
I)
ith u t m u h
l and m u t
R M L U B wa derived , i t i pure:'
lo er i n pe t io n of h o w t h i
a
ur
epted o n fa i t h .
er thi
qu
empt1 n , d i p a t h , n d
Scanned by CamScanner
lion,
\.\
m pl u n
r, rn
th
t1min
r rek
., ,r
t r m 1 hc r i i al 1 n t Jni
t
n-;
Processing
51
later. The critical ins_t ant assumes that in the worst case, all services m ight be re
quested at the same time. If we do attempt to understand a system by diagram
ming, for what period of time m ust the system be a n alyzed? If we a n alyze the
scheduling with the RM policy for some arbitrary period, we may not observe the
release and com pletion of all services in the proposed set. So, to observe all ser
vices, we at )east need to diagram the service execution over a period equal to or
greater than the largest period in the set of services. You'll see that if we really want
to understand the use of the system, we actually must diagram the executi on over
the least common mu ltiple of all periods for the proposed set of services, or LSM
time. These concepts are best u nderstood by taking a real exam ple. Let's also see
how this real example compares to the RM L U B .
For a system, c a n all Cs fit in t h e largest T over LCM ( Least Common M ulti
ple) time? Given Services S p S2 with periods T, and T1 and C , and C2, assum e
T 2 > T 1 .> for example, T 1
2, T2
5, C ,
I , C2
1 ; a n d then if prio( S , )
>
prio ( S 2 ) ,
you can see that they can by inspect ing a timing diagram as shown in Figure 3. 1 .
C. I .
U = l / 2 + 1 / S = U. 7
-.
U = 0.7 S 2( 2 2 - l ) S 0.83
T1
FIGURE J.1
Why did L i u and Layland call the RM LUB a least upper bound? Let's i nspect
this bound a little more closely by looking at a case that i n creases u t i l ity, but
remains feasible. Perhaps we can even exceed their R M L U B.
>
T 1 , for example, T 1
2, Tl = 5, C 1
1,
By i n spection, this two-service case with 90% utilit}' i s feasible, yet the u t i lity
exceeds the RM LUB. So, what good is the RM L U B? The RM LUB is a pessimi tic
feasibility test that will fail some proposed ervice set that a tually work, but it will
Scanned by CamScanner
ribi ng the RM L U B is
60
= I 2 + 2 1 5 = 0.9
= 0.9
2( 2 :
FIGURE J.2
- 1)
S:
g RM LUB .
Example of two-service case exce edin
of all three:
Necesury condition: To say that A is necessary for B is to say that B cannot occur
without A occurring, or that whenever (wherever, etc.) B occurs, so does A. Drink
ing water regularly is necessaryfor a human to stay alive. If A is a necessary cond
tion for B, then the logical relation between them is expressed as "If B then A or a
onfy if A or e A."
Sufftdent condition: To say that A is sufficient for B is to say precisely the converse:
that A cannot occur without B, or whenever A occurs, B occurs. That there is a fire is
sufficient for there being smoke. If A is a sufficient condition for 8, then the logical r
lation between them is expressed as "If A then B" or "A only if B" or "A B."
fai ling a sufficient feasibility test does not i m pl y that the proposed service set will
m iss deadl ines. An N&S ( Necessar y and S u fficien t ) feasibilit y test is exact-if a ser
vice set passes the N&S feasibility test i t w i l l not miss deadline s and if it fails to pass
)
.
the N&S feasibil ity test, i L is guaran teed to m iss deadli nes. T herefore, an N &S fea ,.
bility test is more exact compa red to a su ffi cient feasibil ity test. The sufficient test is
conser vative and, i n some scneari os, do.w nri gh t pess
i m istic. The RM L U B is useful.
owever , i n that it provide s a simple way to p rove
t h a t a p roposed serv ice set is
.
to
sible. The suffic i ent R M L U B does not prove
that a service set is i n fe asib le- di
t h is, you m st apply a n N&S feasib ility test.
Presen tly, t h e R M L U B is t he l east
. .
1n '
plex feas1b 1l ity test to apply . The R M L U B
i s O( n ) order n comp lexity- requ
e execu tion
that
fea
0.:
Scanned by CamScanner
l 1 1t l .. t 0 1 11
1f t ft,... ttq ft1h-:, t . )t "" "' '#\ '< " in ho ptoi '"w'! '-( 1 , 1 : p tr1K1fl, t hf 11nh
a Uu\o f t !'-.,'\ , rJ '\ ththtY f{'i' ' tyt r t( t( ' t ; -. >J y .ir ! ,,.: ...... f tt; U l f Hli: 1 tr.th ! U! -ott
l " lJWt f,.,.t f h\: t ht m1mt.-r. "t ''T'vt: ' tn I t -w-:.,: to : fi r . -., ." flf... 'r! if.of' r.l 1Ht >'t
t"t J
lf'O Ht
ft,. f t J \: Jr, !1 t l d :.Prt t f tht dH; ":t J t t J f !t',t c 1 ' !! ' ! ' f t-
\'l tfh lrl1''1l i tf"',_ f' dt:fJ n t flfJ._:J:. f t , l) 1: rul f-4 \,(I l o t,_.-. (h( Jf fl';'O\('\'
\.(' f'I. ). ( Cf
\n t h .Ht ". _4t f\"'\t > ' * '"" '"' r . ' 1 1 r ' i; : ;._ , . 1 !. 1 :..1 t w m . hs . . : l fl(' t :w dtpt ' -' ,aL1
J 1 ri 1, t lh t\ \t l l ! \ 1 ' ti!} 1p f hli I n ' "'1 - ; ) . -.. r: l t o r:! f U '(-f. 1 1 : 1 ;"t: :, ) .. n "'
!( a u hd i r ' t c , t l ;l r .. d f t ill'' l o t h . ' 1 : . t h . , , t r m t n-. 1 ! . r: .: rr .. l t .-n e '-!''t l c, (
i
l tl
, "'. hI ) r ,,.. , I
l. .\ .
'
' , ,
"
'
.. .
1 ,
: L \"
.. .
.
.,,
,.1
.f
..
..
,
> 1
>
,- .;
... t :...
>
...
, 1 1 1 \ :. '1
i.
; .,..
r ,. .., : 1 ' !
..,
Jt
.. . .. 1
"
t
-;:r , t .. ! \.... l t h
.- :1t )..1t' n
)'' - -:,:! ! f,
.,
..
' 41:.". ..,, , .. " . \1 '
c... ...
. ..
' . ..
"'
" :
'f-. ,1
' t '
' 4
... t
...
r r f J'-r r
.
1o I ;
,
, , . : i J :, , ,.,
\r
l '"
.
, . I1 .'
I
:. ,
1 1 \ :
.f
1 J1 t '
i.
i . ! ' . . . ..
, ' ..
,k l ,
' ..
I.
. ... :
. ,
1 ,, f
lf ' t 1 ' f
' "l' .. . r. ; 1 .: , r t ., .. . . . -;
' f t "- ,. ... .
'\..
, , . ;, ; 1
r t \ , \ .- t t, i, ; t
l t ': 4 ; ,
....
'"
'
'
,, !
i "'
I , , :-- '(,;_
: ; ;
:.
: ::
\'
'<'r\ l' (
.,
.... 4.
, ,,
\, .4"
...
.
. \ !". ' \ \
..
', . . .? I. ,
'..... ,.. ..,.... ""
'
ct!U !?:-?.
t,..:'"; :.._ t:
f \ f
. . .. . .' ' . ..
ti ;
'... ! ; '
'\
,
-
('
'
\ :;
: ;
t : ..
>
'
,.. '4
'
'
"' t
.. I : .
, , , : J
r f
' <Cf' t
!f :
_. :
.: rif'
: (" : :
(' '!
...
... ,. .... . ..
'
: ;\ l i
,. ... '
\i
... .. .. .....
fti H '{\ .;
... ..
... .
j,
! f
.{
.a
,. . , f
.. ... '
\, J , t
.! i .\i ''f H .t : I )
CJ ( t t" .i ! : : : 4 ! \.
"""" J .. . .. .,,
.,
r#
FE A S I B I L I T Y
..
..
: '
,. ! J f
1 1: .c'
._
: ,
t "' ,,
,,-J '-..,
:" J
._
.-.
' j
"
.\t !
.. ! ' :
u ! n t , , ; J '
..
do I.
2'
o I
f'
.,"
t)
f t,"
,"' ... 4
h t t fl ' , H.;d. 1 h; : \
t \t
tr
it t"t('
t\ U t\ f'""
! (" ! : :
'-;i l l id, fi l
\l)f fi\ ll'l\l k J \ t l i i l 1 t \ f , \ h h t! I ,tl \\ .1\ l l .J tl J ,,' f'l d 1. '4<' t ( Ul I \ 1111( f t"Jl t l l t l\' '-l fl.'.
\ 11. r \\.'t
( I c. , t h .tt \. .I l l n u n dr.1dl 1 11n l l f , 1\\ \' \ \ I . .1 'lll r ! 1 .. h 1 1 l f l"\\ h ll l .tl tJ1l J \t'f
t h.H a re: ,I I r 1 I l l <' "' ' k P , l J , 1 1 1 1 1 .t l l \ " " d i \ u l t i" 1n t h." ,t \lb1l 1 ! ,. I ,., , .. ..H l' nut pH.'t 1 \.\.',
Th.: "llll tlu\'.tt t h : -. r ' .a r c:. " ' 1 1 h4.' C \ t l l \ c t i\\. J 11 ' < th(\' h 1ll 1 1 n a p.t ' ' .111 u r h .& k w t ot \tr
i, il:t.'). N -; h."h .u p r l'l l,c:.' . \n ' k.t,1b i l 1t \ lnt '"ll 1wt J'J,, .l \t'f\'iu. \C.'t t llJt l \
.
.u 1 d l tl'\\ \'4.' ' ' 1 l l 1t t 1( l.1 11 .un tt,l t h .u I\ u k l ht: Kt l l ' I\ " .i uhiu\.'nt t 1.' \t
Jnd r h rd\ H t" "''k ht1t 1t \\ Ill t.111 '<'f\ h . t' '<.'h t h .\l J ... t tt.llh 'an be \jfdy ht:dulcd.
u m.1 k
Scanned by CamScanner
12
' <
.
polin:
charac. .
become .
sufficient
feasibi} f,.
pass ,
sufficient "
:.
.....
.
.
Point and Completion tests for the RM
ling
u
d
h
e
.
.
By com par iso n, the Sc .
.
h
mg
t
e
1mprec1se,
'
b ut sa1e
show
es
mpl
Exa
are N&S and therefo re pre use.
amined in the foll owin g sect ion : I t will also
ex
re
a
LUB
M
R
f
the
o
1c
t
. .. ten s
c 1 th e
ds can east1 'f 1a1
.
. h re1atively harmonic per10
clear that servtee sets wit
the tn ore precise N&S
e safe when analyzedb
to
wn
sho
.
be
B
nd
LU
RM
sets as safie th at may not
1 ,
..
systems
Rea lTime Embedded Components and
. 71 --r
.
. ..
,
FIGURE 3.3
l
:
, .
Taking the same two-serv ice exampfe shown earlier i n Figure 3.2, we hav the
following set of proposed services. Given Services S1, S2 with periods T i and T2
execution times C1 and C2, assum e that the servic es are release d wit h T 1
2, T2
execute deterministically with C1
l , C2
2, and are scheduled by the RM .policf
so that prio(S 1 ) > prio(S2) . I f this propose d system can be shown to be feas ibl
that it can be scheduled with the RM polic y over the LCM (least com on
m
m
pie) p riod derived fr m all prop osed serv
,
c
Leho
ice peri ods, .then the
.
and Dmg theorem [ Bnan d99) guar antee s
it real - time- safe. The . theorem is Will
l
upon the fact that given the periodi c relea
ses of each s ervi ce, the LCM schedu e . .
si mply repeat over and over as sho wn in Fig
-:.i:
ure 3.4.
Scanned by CamScanner
Processing
IJ
C.I.
s,
S2
FIGURE J.4
C:J
C3
I l
# I , #2, and #3 execut ion traces for 51 in Figure 3.4. Furthermore, note that in t h i s
particular scenario, the util ization U is 90%.
The CI ( Crit ical I nsta n t ) is a worst-case assu mption that t h e demands upon
the system m ight include simul taneous requests for service by a l l services in t h e
system! This elimi nates the complexity o f assuming some sort o f known relation
ship or phasing between service requests and makes the RM LU B a more general
result.
G iven t h is motivating two-service example, we can now devise a strategy to
derive the R M LUB for any given set of services S for which each service S ; has an
arbitrary C,, T;. Taking this example, we exam i n e two cases:
Case l : C 1 short enough to fit all three releases in T2 ( fits 52 crit ical t im e zone)
Case 2: C 1 too large to fit last release in T2 ( doesn't fit 52 crit ical time zone )
Case I where 5 1 total resource required j ust fits t h e 52 crit ical t i m e zone (T 2 - C2 ) is
shown in Figure 3.5.
(3. l )
Scanned by CamScanner
T.LT /T.J
'-1 fT / T. l
Eq- 2 : Cc = T1 - r
Eq- 1 : c,
Eq-3 : U
T:
c,
T,
c,
= -+-
FIGURE J.S
( I . e. (,
T:
1s
,
small enough 10 fit into fractional 3rd T, shown as ..._.
( I.e. C:
C.I
r.m -
L T2
L T2
Tz
(3.2)
simply expresses the length of C2 to be long enough to use a l l time not used by S1note that when the third release of 51 ju st fits i n t he crit ical t i m e zone shown in red,
IT2 I 11 l
that if C 1 was increased and C2 held constant, t h e sched ule would not be feasible;
likewise if C2 was increased and C 1 held constant, they j ust fit ! I t i s also interesting
to note that looking over LCM ti me, we do not have ful l u t i lity, but rather 90%.
Equation 3.3,
(3.3)
?etines t e uti lity for the two services. At t h i s poi n t , let's plug the expression for C:
Ill
9_ + [ T2 - C , f T2 / l) l]
11
( J.f
T2
+ [ - c, r T1 1J
T:>
S + 1 + [ - c , I T2 I T, l] ( no te tha
t Ti ter m is
r,
TJ
-t
Scanned by CamScanner
T7
I)
Processing
l + C1 [( 1 11i
f T, I T,
Ti
l]
15
( combine C 1 terms)
(3.5 )
What i s interesting about Equation 3.5 i s that U monoto11ically decreases with
increasing C, when ( T2 > T 1 ) . Recall that T2 must be greater than T1 given the RM
[( 1 1T, fT:;1Ti l]
oo.
Note that
l , this is a degenerate case where the periods are equal-you'll find out
later why this is someth ing we never allow. If we plot the expression f T1 i r, l , you see
=
( I / lj )
is con
T 1 to any constant
value-try this and verify that the condition still holds true ) . Figure 3.6 shows the
periodic relationship between T2 and T1 that guarantees that U monotonically de
creases with increasing C 1 when ( T 2
,,
1 8
1 7
T 1 ) for Case 1 .
Scanned by CamScanner
>
1 relationship of T2 and T i.
61
onents an d Systems
Real-Time E mb edd ed Comp
T.2,
T2
(3 .6}
C. I
C,
C2
T2 - T1lT2 / T1 j
riLT2 I T, J - ciLT2 n;J
c,
c,
= - + -
T1
T2
FIGURE J.7
Even though
we could find a value of C2 for 52 that still allows it to meet its deadl ine of T2 To
compute this smaller C2, we fi rst note that 51 release
with some fraction of #3 leave some amount of time left over for 52 H owever, if wt
2,
-:-
11 L T1 I T1 J
cI
L T1 I 11 J .
(3. 7)
u=
Scanned by CamScanner
T,
Ti
( 3.8) .
17
Processing
c,
[ -Ci L T I T, J ]
I T2 )LT2 I T1 J + +
( e pa ra t i n g term )
1i
T:!
( T, I Ti ) LT: I 1i J + C 1 [ ( I I T, ) - ( I I T1 ) L T I T, J J ( p u I I i n g
out
u- (Ti1
U
C , term ) .
o m mon
T h i give u
( T, I T2 ) LT I T, J + c I [ ( I I 1j ) - ( I I T2 ) L T2 I 11 J] .
( } . 9}
T1
( l I 11 )
because
I T2 ) L T1 I Ti J
T2 is g rea re r t h a n T 1 As be fo re , thi
m a i l e r t h .in
alwa
T1
to
Now if we pl ot t h i
oo.
I a n d becau e T mu t be greater t h an T 1 , w e l e t i t b e a n
aga i n ,
t han
y o u . ee
( I I T, )
that
( I I T1 )l T I T, J
v a l ue fr m I
i !cs t h a n I i n all
--
0 95
09
( L
0 85
T l
I T Ti !
oe
0 75
I
I
07
0 6 '
06 t
0 55 '
i
05
I
._ ------- ---------
r:
- -- --- -
T
FIGURE J.I
0,
n w yo u
Scanned by CamScanner
u n d r<it nd t h
ey c n ep i t h a t in
t i m e zon . Th
u rren e o
u ri n ' T ..
n d, i n
, '>t'
c:n ri
w h e re
Je
I , we ha
2,
j u t ov r r un the
e he m
haH t h
i m u rn n u mhc-r
minimum
riti 'l
II
oo,
I + C1
[(t
;,
]<
T 11
/ li ) - I 2 1 3 .5 )
( 11 / Ti ) L Ti 1 11 J + c , r ( 1 / 11 ) - ( , / Ti ) L Ti / T, Jl < 3 .9 >
T,.
l , T2
::::
+ to
ease-1 Utility
0. 9 - 1
0.8-0 . 9
0 0 7 -0 .8
0 .9
0.6-0. 7
0 .8
0 .7
0. 5-0 .6
0 .6
0.5
0 .4
0. 3
0 .2
0.1
0. 4-0 . 5
0 0 . 3 -0. 4
0 0 . 2-0 . 3
0 . 1 -0 .2
0-0.1
FIGURE 3.9
C1
Case 1
utility with
varying T 2 and C 1
0, this is not particularly i nteres t i n g because this means that is the only service that requires CPU resource-likewise, when C 1 = T 2, then this i
also not so interesting because it m eans that 5 1 uses all t h e CPU resou rce an d never
When
What we reaJly want to know is where the utility is equal fo r both cases so t at
we can determine utility indepen den t of whether C 1 exceeds or is less tha n the cot-
_
icaJ time zone. This is most easily d eterm i ned by subtra cti ng Fig u re 3.9 a nd Figurt
ho\\
3 . 1 0 data to fi nd where the two fu nctio n differ enc e a
s re zero. Figu re 3. 1 1 s
that the two functio n differen c es are zero o n a diagonal when TJ is va ried fro m
times to 2 t i mes T 1 and C 1 is varied from ze r o t o
T 1
Scanned by CamScanner
Processing
69
0.9-1
1
0 .8-0. 9
0.9
c 0. 7-0 .8
0.8
0 .6-0. 7
0 .7
0.6
05
0.4
0.3
0.2
0.1
0. 5-0 .6
0 4-0 5
o 0 . 3 -0. 4
0 0.2-0.3
0. 1 -0.2
0-0.1
C- 1
FIGURE 3. 1 0
0.9
0.8
0.7
c 0.4-0.6
0.6
0.5
0.2-0.4
Service-IC
0-0. 2
-0. 2-0
0.4
0.3
o -0. 4-0. 2
o -O. S.-0. 4
-0. 0.6
0. 2
1 --0. 8
0.1
1.1
1 .2
1 3
1 .4
1. 5
1 6
1. 7
1 8
1.9
Period-2
FIGURE 3.1 1
Finally, if we then plot the diagon al of either utility 'curve (from Equati on 3.5
or Equati on 3.9) as shown in Figure 3. 1 2, we see that iden tical curves that
clearly
have a minimum near 83% utility.
Scanned by CamScanner
-,u
... . .
Real- Time tmoe aaea \..o mpu 11111:;rn
--
09
Utility 0.8 .
.,
07
1. 2
1 .3
1 .4
Period-2
FIGURE J. 1 2
1 .5
-'
1 .6
1 .7
1 8
1 .9
' 19'
'
:.' , 2
c:Ase
Recall that Liu and Layland claim the least upper bound for safe utility given
an y arbitrary set of services ( any rel a t i o n between periods a n d any relat ion
services, m(2 m
I ) = 0 .83 !
= I, (C; I T, ) m(2
m
t=I
1)
For
two
this
util ity for any given set of services, b u t i n doing so, we c a n a l so c learly see that
the two sets of eq uations this can only occ u r when C 1 i s equal for both cases:
c
c
U
= T1
I
2
T2
J
L
c I T2 I T, l
11 T2 I T1
I
S_ + C 2
T1
T2
Now , pl ug C , and C2 s i m u l ta
n eous l y i n to the u t i l i t y equ a t i o n to get E q uati
3 . 1 0:
Scanned by CamScanner
Processing
u=
T2 - T1 L T2 I 11 J T2 - c i I T2 I 11
u=
T2 - T1 LT2 I T1 J + Ti - T1 I T1 I T1 l + Ti L T2 I Ti J I T2 I Ti l
u=
(Ti 1 T, ) - L T2 1 Ti J + 1 - I T1 1 T1 l + ( Ti 1 T1 ) L T1 1 Ti J I Ti 1 Ti l
Ti
T.,
T1
u =1
u= I
u= ,
71
T1
- r T2 1 Ti l + ( Ti 1 T2 ) L T2 / T, Jr Ti 1 Ti 1 + ( Ti 1 Ti ) - L Ti 1 Ti J
- ( Ti I T2 ) ( ( T2 I Ti ) I T1 I Ti 1 - L T2 I Ti J I Ti I Ti 1 - ( T2 I Ti )
- ( T1 1 T2 ) [ r T2 1 11 1 - ( T2 / T, ) J [ ( T2 / T, ) - L T2 / T, J J
(Ti I T1 ) LT I T1 J)
f
(f((T2l -Ti!)) )
( 3. l o i
U-1-
J.
( 3. 1 I )
Equati on 3 . 1 0 as follow s:
u = , - ( T1 / T2
U = t - ( T1 I T2
( N. d )
)[
) [ r T2 / T, 1 - ( Ti / r, ) J [ ( T2 / Ti ) - L T2 / r, J J
L T2 I Ti J - (T2 I T1 ) ][ ( T2 I T1 ) - L T2 I Ti J]
tloor ( N.d)
I - ( 11 I Ti ) l - ( T1 I Ti
U = l - ( T1 I T2 ) ( I
J)
u -- 1 - f ( t ( T2 / T1 )
Scanned by CamScanner
- f )(J)
based on c e i l i n g
) - L T2 I T1 J) J [ ( T2 I 11 ) - L T1 I Ti J J
72
em s
on ents an d Syst
Re al- Tim e Em bedded Co mp
By
n
cu
tra
b
su
d
add ing an
we
can get:
)
J ( l - /)
(
= I - L T1 I 11 J + ( Ti I 11 ) L Ti I 11 J
u
U = l - ( / (I 1+- /) )
( /)
.
.
IS l , an d the
The sm aJl est I po ssi ble
t:
we sub stit ute fo I to ge
I r
i
(
!
U = l - (l ! )
( l + /)
'
.
.
Now takm g t h e denva t1ve o f
so
Solving fo r f, we get:
And, plugging/back i n to
The RM
U, we get:
u
2 2 ' 12
LUB of m ( 2 - J ) i s 2 ( 2 ' 1 2 - I )
I
rn
- 1)
for m = 2 , w h i c h is true fo r the two
service case-Q.E.D .
Having derived the RM LUB b y i n spec t i o n a n d by m a t hemati a l man ipula
c
tion, we learned that the pess i m i s m of the
RM LUB t h a t leads to low util ity fo r
real-t ime safety i s based upon a bo u n d
that work s for a l l pos s i b l e com b i na tio ns of
T1 and T2. Specifica lly, the RM LUB
is pess i mist ic
c a ses w here T 1 a n d T are
harm onic -in these case s,
you can safe ly ach i eve 1 00010 u t i l i ty! I n m a ny cases, a
shown by dem ons trat ion i n
Figu re 3 5 a n d t h e Leh ocz ky, S h a h , D i n g theore m ,
can also safe ly use a
at leve l s b lo w 1 00010 but a bov t
e h e RM L U B. The
still has valu e bec aus e
tli
i t ' s a s i m ple and q u ic k fea i b i l
i ty c hec k t h at is s u
.
cie nt. Go mg t h rou g h the
_
der i vat io n , it h o u l d n
o w be evi den t tha t i n ma n y Cd
safeJy usi ng I OOo/o o f the
resou rc e i s not po ible w
it h
tive sch ed ulin g and thf R M
pol i c y. I n the nex t
Sh a h , D i n g
e t io n , the Leh o
fo r
L_UB
CP U
CP U
Scanned by CamScanner
>'0
R
Processing
7J
Tw
easily employed:
RM polic y are
compl icated policy with dynamic priorities. You can achieve l 00% u t i lity for
fixed - priority preem ptive services, b u t only if their relative periods a re harmon ic.
Note also t h a t RM theory does not account for I/O latency. The i m p l icit assump
tion is that 1 /0 latency is ei ther i nsign ificant or known deterministic values that
can be considered separately from execution and interference time. In the upcom
ing "Dead l i n e - M onotonic Pol icy" section, yo u ' l l see that it's straightforward to
slightly modify R M pol icy and feasibil ity tests to account for a dead l i n e shorter
than the release period, therefore allowing for addi t ional output latency. I np u t
latency c a n b e s i m i larl y dealt w i t h , i f i t ' s a constant l atency, b y consideri ng the
effective release of the service to occur when the associated interrupt is asserted
rather than t h e real -world even t. Because the actual t i m e from effective release to
effective release is no d i fferent than from event to event, the effective period of the
service is unchanged due to input latency. As long as the additional latency shown
in Figure 3 . 1 3 is acceptable and does not destabilize the service, it can be ignored.
H owever, if the in put latency varies, this causes period j itter. I n cases where period
jitter exists, you i mply assume the worst-case frequency or shortest period possible.
Recall that by the Lehoczky, Shah, Ding theore m , i f a set of services can be hown
to meet all deadli nes form the crit ical instant up to the longe t deadl ine of all task
i n the et , then the et i fea ible. Recall the critical i nstant ass u m ption from Liu
and L a y! J nd paper. which states that in the wor t case, all ervice m i ght be re
ched u l i ng Po i n t Test :
Scanned by CamScanner
14
a
Real -Tim e E m bed ded Com ponents n d
\f ; , I ;
( k ,/) e R
Ri
{(
k.
systems
. l; r (l ) T1c l
n . m tn
1= 1
Cj
I)\I S k S i.I
.
TJ
I,.
( I ) T1c
. -l j}
[(1 1
_
_
_
/
r
Release
Deadline
Event
Sensed Interrupt
FIG UR E l. 1 1
code
_
_
Dispatch
Preemption
Dispatch
Completion
(10 Queued)
Actuation
(10 Completion)
serv 1Ce
is. m
. c l uded w i t h tes t c ode o n t h e CD- ROM for thl
Sched u l ing Poin t Test N o t
t h at th e a l go n th m a u m e arrays are orted acco rd
,1
ing to the RM pol icy w h er e p e r
ou
1o d [ O ] 1. t I h 1' ghe
t pnonty a nd s h ort e t p e n
Th e C
a l gor ith m
1c
Scanned by CamScanner
Processing
75
r J l
r J lei
a.
a .
t.
line for Sn, which proves that Sn is feasible. P rov ing this same property for all
S from 51 to S,. proves that the service set is feasible.
The C code algorithm for the completion time test can be found on the CD
ROM and assumes arrays are sorted according to the RM policy where per iod [ o I is
the highest priority and shortest period .
ity is assigned to the service with the shortest deadl ine. The DM policy is a fixed
priority policy (be sure not to confuse this with EDF where priority is assigned
dynamically with highest priority assigned to the active service with the nearest or
earliest deadline at any given point in time ) . The D M policy el imi nates the original
RM asu mption that service period m u st equal service deadline and allows RM
theory to be applied for scenarios even when deadline is less than period. This is
useful for dealing with significan t output latency. The D M policy can be shown to
be an optimal fixed-priority assignment policy like RM policy because D , and T,
differ only by a constant value, and D; S T;. The D M policy feasibility tests are most
easily i mplemented as iterative tests like Scheduling Point and the Completio n
Time Test for RM policy. This policy and associateJ feasibility tests were fir t i n
troduced b y Alan Burns and Neil A udsley 1 A udsley93 ] . T h e s u ffi c ient feasibil ity
test they first introduced is simple and intuitive:
Vi : I S i S 11 : C,
D,
Scanned by CamScanner
S I .0
D,
3. 1 2 )
C; ts t h e execution
.
.
.
1 e D . tim e peri' od s i nce the t i me o f request for service .
experiences over its d eadl'n
c'
a I I s ervic es from l to n , i f the deadli n e i n t erv a l i s
Equatton
i
y
t
h
pnont
an
igher
h
S
has
ys
a
w
l
a
that
te
o
N
S
I
I
.
for a
(3. 1 3 )
,o j
l
o 1s
t h e worst-case nu m b er of re l eases of S1 ove r the dead l i n e i n terval
.
.
_!..
T,
for S,.
cor - th e last i n terference may b e only partial. So, t here will be r, full
accou n t ed 11
.
c
te ncerence.
1 11 ter1erences
an d some parti' a l i n te r ference from t h e last a d d 1 t 1 o n a I m
.
So, we can b e tter acco u n t for t h e p a r t i a l i nterference with
lo.-o }'
1.
o.
'T,
r, '
'
0 i f n o p a rt i a l i n ter fer -
I J
Scanned by CamScanner
Processing
77
A n a l ter na t ive to RM
or DM are the dynam ic priority
.
.
polices derived from the
.
deadi me - d nv en d yn atn ic pn
onty
approa
ch first presented by Liu and Layland . I n
,
.
t he ne xt sec tio n, we II exp l ore
t
h
e
a
d
vantag
es and disadv antages of dynam ic prior. .
1t1es com pared to stat ic T od
.
ay, ,
ior h ar d real-tim e systems that must provide determm 1st1C res po nse s to service
request s, fixed-p nonty RM policy remain s the
mo st w1'd e 1 y use d and u niv ers ally
accepted the ory.
.
?1
oy Liu and Layland [ Liu73 ] with their description of deadline-driven scheduling pol
icy. The policy Liu and Layland specified in their paper later became known as an
EDF ( Earliest Deadl ine First ) dynamic priority policy. The policy is called EDF be
cause the scheduler gives highest priority to the service that has the soonest deadline
whenever a dispatch decision is made. This also means that anytime an additional
thread is placed on the ready queue, the EDF scheduler must reevaluate all dispatch
priorities because the newly added th read may not have a deadline later than all the
existing th reads on the ready queue. Basically, the EDF scheduler must be able to in
sert the new thread i nto the queue based upon time to its deadl ine relative to the
time to deadline for all other threads-the insertion has a complexity that is of the
order
O(n),
where
lated d yn a m ic priority policy, LLF ( Least Laxity First ) , also succeeds where RM fails.
Like EDF, L L F is a dynamic- priority pol icy where services on the ready queue
are assig n ed h igher priority i f their laxity is the least . Laxity is the time difference
between their deadline and rem aining computation time. This req u i res the sched
uler to know all outsta n d i n g service request tim es, t h e i r deadl i n es, the c u r rent
time, and remain i n g comp utation time for all services, a n d to reassign priorities t o
all services o n every p reem ption. E timating rem aining computation t i me for each
service can be d i fficult and typically req u i res a wor t-case a p p roximation . Like
EDF, L L F can also schedu le l 00% of the C P U for ched ules that can't be sched u led
Scanned by CamScanner
,,,,..,....
71
E xampl@ 1
T1
(1
T2
(2
T3
(3
Ul
0.5
U2
0. 2
UJ
0. 285 7 1 4
LC M =
Utot
70
0. 985 7 1 4
R M S c hedult'
51
52
..
53
E OF 5 c hedule
s 1
'
$2
S3
no
s1
52
53
52
53
Lax ity
52
53
FIGURE J. 1 4
x
5
l
-
LLF S c hedult'
s1
51
---,
..
x
3
x
x
I
I
I n tuitively, it's hard to believe that the use of any resou rce can be safe with no
margin at all. Even the slightest miscalculation on reso u rces requ i red can cause an
overload (overuse of resources) . This is a likely scenario given that determi ning the
actual execution time that all services will req u i re is n o t easy u n less very pes
simistic worst-case ti mes are assu med. So, it becom es i n terest i n g to consider what
happens to threads or services i n a n overload scen ario for a given policy. For EDFr
overload leads to nondetermi n istic fa i l u re-that is, it's very hard to predict exactly
which and how many services will m iss their dea d l ines i n a n overload. It depends
upon the state of the relative prioritie s during the overloa d, which i n turn depends
upon the order of time to deadli ne times for all servic es ready to r u n .
B y comparis on, fo r fixed-prio rity polices such a s RM, i n a n overload all services
,
of lower priori ty than the service that is overr u n n i n g may
m iss their deadl i ne, yet all
services of highe r prior ity are guaranteed not
to be affec ted as show n in Figu re 3. 1 5.
For EDF an over run by any servi ce may
cause all othe r serv ices on the ready
q ueue to miss their dead lines ; a new
servi ce adde d to the q ue ue, therefo re adju st
. .
mg prior i ties for all, will not pree mpt
the over ru n n i n g servi ce. The overru nni ng
service has a time t d adline that
is negati ve beca use it has passed , so it co nt rn ues
.
to be the h ighe st pno nty serv ice and
cont i n ue to ca u e othe rs to wait a n d po t en
tiall y miss dead Hnes . In an over ru
n scen ario , c o m m o n poli cy is to t er mi n at e th e
rde ase of a serv ice t h a t has ove rr
'
u n . T h i c a u s es a ser v ice d ro p out
. H o we\ e
sim ply detec t i n g ove rrun and term
.
matm
g t h e ove rru n n i n g serv i. e ta ke sorn
Scanned by CamScanner
,...
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
,,_.) _
. ...,..,.,
._._.
Processing
S,
Slrvu:es
71
onllnuc 10
kel Deadline.
f'no1 ( $, )
l'nt(S
,, I .,
S,
a-running
o'
h,,c, l >caJhnc
S
si-r' ice
FIGURE J. 1 5
CPU resource, which without any margin means that some other service will m iss
i t s deadli ne-with overrun contro l EDF becomes much more well behaved in an
overload scenario-the services with the soonest dead l i nes will then clearly be the
ones to lose out. However, determ i n i ng which services this will be i n advance
based upon t h e dynamics of releases relative to each other-is still d i fficult. Figure
all services queued while the overru n n i ng service executes potentially m i ss their
deadlines, and the next service is l ikely to overrun as well causi ng
cascad i n g fa i l
Prio(
,.1
.)
<
Pno (
over-running
G0
1)
O\ er- ru n ning
0
FIGURE J. 1 1
Scanned by CamScanner
10
EDF
i nt o the dea
. .
exist whe re a d i ffer e n t pol icy i s encoded
V anat1ons
of
dli....,, ,.
.
.
II
L
b
a
d
o
n
g
fi
n
e
d
m
y
y
1
u
a
n
(de
r
fr
rk
La yl a nd ) 0
.
. n y am ewo
d nven,
dynam1 C-pno
ne
.
h i g hes t p n o n t y 1 s assi gn ed
n
t
o
o f t I1e more mteres t t ng variati ons 1s LLF. I L L F. ,
th
.
.
e
.
ce b e tween i t s rema .i n m g e x ec ut i o n t i m e a d
fferen
.
i
d
ast
I
e
e
h
t
as
t
h
e
a
h
servic t
n
.
.
re nce b e t ween t h. e i r deadl i n
d
d
me
i
t
ffe
i
e
h
t
.
L
i
is
ty
i
x
a
e
n
i
g
e
d
a
. u pco m i n d ea
its
.
.
n
. .
a serv1Ce. Dete r m m m g t h e l east l a x ity s erv
rem a i. n i. n g com pu t a t i on t i m e for
ice
.
.
uest
req
serv1Ce
g
n
es
i
d
m
i
t
outstan
,
l
l
a
h
t
w
eir dead .
requ i res t 1i e sc h e d u ler to kno
.
s
e
m
i
t
l
l
n
a
o
e
i
t
a
for
t
u
p
rvi
m
o
c
.
s.
e
c
A fter 'I
L
cu rreri t t i rn e , and rem a i n i ng
tne
I mes,
.
.
. .
.
p n o n t i es t o a l l serv ices
on eve4J...
t h .1s 1s k nown , t he scheduler must t h e n reassign
.
.
. .
f
at
event that adds anothe r service t o t h e ready queue. Est i m m rema m mg co m pu.
tation t i m e for each service can be d i ffi c u l t a n d t ypica l l y req u i re s a wor st- ca se ap.
proximation. The LLF policy encodes t h e concept of i m m i n e n ce, w h ich i n t u iti vely
.
poi nt-what we rea l l y want to do is encode w h i c h serv ice i s most i m port ant and
make s u re that it gets the resources it needs fi rst. We wa n t a n i n tell igent sched uler,
Dev s ng
Dev i
.
I n Ch a p t r 7 1 ' So ft Re al
- T 1 m e Se rv1c
es, " we w1 1 1 ex p l o re som. e of t he me t hod
t h a t ha ve be en pr op o ed
to a d a pt RM a n d d ea d
i i. ne - d n. v e n poli cies
.
.
to s1tu
. at1. om
w h er e an oc ca
s1o na l ser v ice
d rop ou t o r ove rru
n 1s a ccep ta b l e .
rea l - t i me m et h
o d 1ror h a nd l i. n g m i se
d de ad ! m
. es w i- 1 1 b e m o re clos
ely exa m m ed alon g with mort
i.. n - d e, pt h
of
an d L L F sc
co ve ra ge
_
Scanned by CamScanner
ED F
he d u l i n g .
Soft
Processi ng
81
S UM MARY
Fixed-priority preemptive scheduling with a R M priority assignment policy (short
est period has h ighest priority) is most often used and advised for hard real-time
systems. Hard real -time systems by definition must have deadline guarantees be
cause the consequences of a missed deadline are total system failure, significant loss
of assets, and possible loss of human l i fe. All service sets run as threads of execution
should be tested with a sufficient or better yet N&S feas ibility test .before being
fielded for hard real-time operation. Passing an N&S feasibility test guarantees a
service set will meet its deadlines as long as Ci, Ti, and Di were properly specified
and are determ i n istic or worst case. For quick analysis, a sufficient test such as the
RM LUB may be useful. The potential disharmony in period is the reason that the
RM L U B is less than ful l utility for two or more services. Services that have har
monic periods can be safely scheduled with utility exceed ing the RM LUB despite
failing to pass this test and will pass an N&S test. In general, dynamic priority pol i
cies such as EDF and L L F are not considered safe for hard real-time systems due to
their difficult-to- predict deadline overrun characteristics. Dynamic priority policies
do work well for soft rea l - time applications, such as game engines, video, audio, and
multimedia applications, where an occasional service dropout is acceptable.
EXERCISES
the s u fficient R M scheduling feasibil ity test ( Liu and
p.9, T heorem 5) for the following ANSI C function protot
l . Code
i n t RM_s u f f i c ient (
int Ntasks ,
int
* t id ,
u n s ig n e d long
int
*T,
u n s i g n ed l o n g
int
*C,
long
int
*D) ;
u n s igned
Ntasks
task I ds,
set, t id
is an array of unique
tion times, o is an array of the deadl i nes (T must equal D for each
RM).
in
Scanned by CamScanner
t id
12
2.
.
T e Test ( fou nd earl ier i n t h i s cha pter ) fo r th e fol.
od e t h e Co mp let ion i m
ion
i n t Sc hed _co mp l e t
int Ntask s ,
i nt
t id ,
u n s ign ed 1 ong l n t
u n s i g n ed 1 o n g l nt
u n s igne d lon g i n t
r,
c,
o) i
e t h at T m u t equ al o i n all
Parameters are defined again as i n # I . As u m
cases.
' .
between a sufficient a nd
3. Oescnbe m your own words what the differenc e is
.
.
.
a necessary and s1ffici ent sched uling feas1 il 1t test .
.
.
4. Why is the sufficie nt R M LUB so pess 1 m 1st1c .
.
7L
0 CPU re s . If EDF can be shown to meet deadli nes a n d potent 1 a 11 y h a s l QOO
. y
source utilization, then why is it not typically t h e h a rd rea l - t ime polic
of
choice? What are the drawbacks to using EDF c o mpa red to R M / O M ? I n
a n overload situation, how will E D F fa il?
6. Code a function to compute the Fibonacci seq uence to any n u mber of
terms. The sequence is 0, I , l , 2, 3, 5, 8, 1 3, 2 1 , 34, 55, 89, . . . T h e Fibonacci
sequence or Fibonacc i numbers begin with 0 and l . The next term is then
the sum of the two previou s terms. ( Do not be concer ned if your Fibonac i
number overflows an unsign ed 32-bit i nteger , just l et i t overflow).
7. Now, deter mine how many term s i n t h e seq uenc
e corre spo nd to I 0 mil
l iseconds of com puta ti o n on a lab t a rget PC ( th e
a nswe r may vary de
pend ing upon the spec ific targ et used . T he easie
)
st way to do this is to ust
Win
dVi
ew
to
mea
sure
the CPU t i me take n b y a task call i n g you r funct ion
J
'
with a large valu e. See how lon g that take
s a n d the n sca le up or down the
n um ber as needed to ach ieve 1 0 m il
lise con ds of com put atio n. Repeat thi
to determ ine how ma ny .ter ms a re
req ui red for 20 m i l l isec ond s of compu
tati on usi ng Wi ndV iew .
8. Giv en a tas k set wit h two
tas ks cal l i n g yo u r Fib on acc i seq
uence, o ne with
N ter ms for 1 0 mi ll ise
co nd s of exec u t i o n a n d t h e
oth er fo r 20 mi l li
o d , is the sys tem fea
sib le if the I 0 m i l li sec on d
g rstern a nd h o w evid
en e that it w rks o r e 'pl ain wh'
1t won t
.
? ;
Scanned by CamScanner
PJOcessing
83
CH APTE R RE FE RE N CE S
[ Aud sley93 I Audsley, N., A. Burns, and A. Wellings, "Deadline Monotonic Sched
uling Theory and Application." Control Engineering Practice, Vol. l : ( 1 99 3 ) :
[ Briand99J Briand, Loi'c and Daniel Roy, "Meeting Deadlines in Hard Real-Time
Systems." IEEE Computer Society, 1 999, pp. 28-3 1 .
J.
a Hard Real-Time Environ ment." journal of the Association for Comp uting
Machinery, Vol. 20, No. l , ( January 1 9 7 3 ) : pp. 46-6 1 .
Scanned by CamScanner
1/0 Resources
In This Chapter
I ntroduction
I n termediate 1/0
1/0 Architecture
I NTRO DUCTI ON
The i nput a n d output t o a service shown previously i n Figure 3. 1 3 o f Chapter
requires 1/0 to/from a device such as a sensor or actuator ( encoder or decoder) .
This l/O is part of the response time, and as shown i n Chapter 3 , it i mply add to
respon e laten cy, but does not affect the service execution or interferen e t i me
during the response. Most services, unless they are trivial , i n olve ome i ntermed i
ate l/O after the i n itial ensor i n p u t and bfore the fi nal po t i ng of out put data to ,
or memory device II
. I f t h i i n termediate 1/0 ha
i n gle ore
le I( ten y,
Ler
15
Scanned by CamScanner
II
.
l/O 1' t 1s more easily model ed as a n exec ution efficiency. Device 11
/
,
1 0 as dev1ce
. .
0
.
lhon s of core cycle s. By co mpari so .
1
m
even
and
ds
usan
,
o
n
th
s,
ed
r
d
un
. i n.
latency is h
.
l/O 1 ten cy 1s typica lly tens o r hundreds of core cycles-if more l a te
a
n
termed 1ate
.
cy
ld
.
gn
be
shou
reworke
desi
ware
d.
hard
core
the
then
e,
'bl
posst
ts
1s
than t h
IME
WO RST -CAS E EXE CU TI O N T
---
o f C P U cycl es required
to finish. Furthermore, for these architec t u res, t h e n u mber
to execute each instruct ion is known- so me instruct ions m ay take more cycle
than others, but a l l are known n u mbers. So, the total n u mber of CPU clock cycle
sponse output.
The exact number of CPU c lock cycles for each i n s t ruction is known.
CPU clocks to execute. This alone does not guarantee dete r m i n istic execution time
because an algorithm may be data d riv e n . The n u m ber o f l oo p i terations or the
b<:
0':e
:
Scanned by CamScanner
1/0 Resources
17
Accu ra c y o f x n eeded
F i n a l ly , s ? me fun c t ions may actually have more than one solution as wel l .
M n y n u mencal methods are s i milar to finding roots by bisection i n that they re
. qmre a total p a t h length t h a t varies with the input For such algorithms, you need
.
t ? pla e a n upper bound o n the path length to define WCET ( Worst-Case Exec u
t 1 0 n Time ) .
Ass u m i ng a n upper bound o n t h e algori t h m ic path length and deter m i n ist ic
Scanned by CamScanner
II
ents an d System s
Re al-T ime Em be dd ed Co mp on
t
ti o n, is to i ncre ase o v
ed in det a l i n the nex sec
ercrib
des
,
ing
elin
pip
of
nt
poi
The
.
pl
m
es
the
f
exa
o
m
ca
fro
.
che
ent
1s evid
rn isse s
.
.
1 1'enc y. However , as
a u exec u t ton effic
s
lm
ll
1pe
sta
es
y
l
,
wil tall , a nd
cau se a dat a dep end enc
.
and M M R late ncie s tha t ma y
.
ion and dat a s t rea m. . The effi. ciency .1s therefo re data
t h 1s 1s a func t'ion o f the 1 nstr uct
ncy wil l vary, eve n fo
erm ini stic . So, execut 10n efficie
r
and code dri ven and not det
y
l
a
n
be
o
t
fun
o
n
y
ctio
a
m
n
ts
n
te
o
con
f
th e
cac he
the sam e blo ck of code, bec aus e
t
e
ds
tha
u
rea
h
xec
t
te
us
d
vio
i
n the
also o f the pre
cur ren t thread of execut ion , but
th
d
n
a
leng
the effi cie ncy
ctio n of he lon ges t pat h
pas t. In sum mary, WC ET is a fun
des cnb es W C ET:
in execut ing tha t pat h . Equ atio n 4. 1
WCET
Clock
- Period
( 4. 1 )
at least approximated well. The longest path i nstructio n c o u n t m ust be determ ined
by i nspection, formal software engineering proof, or by actual instruction count
traces in a simulator or with the target CPU archi tect u re. Warning-most CPU core
documentation states a CPI that is best case rather t h a n worst case.
For ful l determin ism in WCET for hard real - t i m e system s , you m tst guarantee
the following:
in
or
and device 1/0 is not expected n o r req uired t o meet deadli nes.
Overla p o C P
.
.
For soft rea l-ti me sys tem s , yo u c a
. n a I I ow occas1o nal servtce d rop-outs or hm
ite
. d verr ns and ther efor e use
ACE T ( Ave rage -Cas e Exec u tion T i m e ) . The AC ET
can e esti mated fro m the follo
wing i n fo r mation :
Scanned by CamScanner
be
1/0 Resources
II
CPI Efftrr11r
10 - Latency ] +
( 4. 2 )
I n these equations, t h e effect ive C P I acco unts for sec<?ndary p i peli n e hazards
such as branch m ispredi c t i o n s . The term NOA ( N on -Overlap Allowed ) is 1 .0 if IO
t i m e is n o t overlapped with p rocessing at a l l .
i ng, and ca reful e r ice design, you can m i n i m ize the loss o f efficiency d ue to m icro-
1/0. First, you must u n derstand what it mea n s to overlap l/O with CPU.
Consider t h e following overlap
definitio11s:
with
J OT = Bus I n terface 1/0 Time ( Bus 1/0 Cycles x Bus Clock Period )
O R = Overlap Req u i red-percentage of CPU cycles that m ust be co n c u rrent
w i t h 1 / 0 cycles
NOA = Non - Overl ap A l l owable for S1 to meet 01-percen tage of C P U cycle
that can be in add ition to 1/0 cyc l e time without m issing service dead l i n e
Scanned by CamScanner
i b l e overlap conditions fo r
service
PU time and
.
. d o therwi se , i f D' < I OT,
1 . D; IOT is reqmre
'
.
.
S; is 110-Bound.
. CPU- Boun d.
.
.
if D ' < I CT, S i is
2 . D > ICT is requ ire d, o therw1se
. h I CT .
, erl ap 0 f I OT Wit
req uires n o ov
3
1 - ( JOT + JC T)
> I OT a n d D ; ICT), overlap of I OT
with
4. I f D; < ( J OT + ICT) where ( D I -
o'. ;
I CT is req ui red .
" e D ; can 't be
I OT or D; < I CT ) , d ea d i m
met
S. I f Di < ( IOT + ICT ) where (D; <
regardless of ove rlap.
ICT
f
I
zero,
required
is
t
Service
h en no
.
I CT and I OT a re zero, n 0
.
.
.
.
o
n
with
m term ed 1a te IO.
overlap 1s poss1'bl e. Wh en IOT = O ' t h is i s an ideal service
.
From these observatio ns about overlap i n a n o n p reem p tive system , we can de
C P l worst-case = ( I CT + J O T ) I J CT
C P ibest-case
( 4.3 )
C P J required =
( 4. 4 )
D ; I ICT
( 4.5)
OR = 1 - [ ( Di - IOT) I ICT ]
( 4.6)
CPI require
. d = [ ICT ( 1 - O R ) + IOTJ I l CT
( 4. 7)
OR + NOA = I ( by d e fi n i t i o n )
( 4.9)
NOA = ( D; - ! OT ) I I Cy
( 4.8 )
Equations 4 . 7 and 4.8 provide a cross-check. Equation 4.7 should always match
equation 4.5 as long as cond ition 4 or 3 is true . Equation 4 . 9 should always be 1 .0 by
then the following condition must hold for a service to guara ntee a deadline:
WCEli
< D;
( 4. 10)
Tre pon:;e-i
==
i-1
( 4. 1 1 )
Th e W CE T de du ced fro m Eq
u ation 4 . 1 0 m ust the e o
r f re be an i n p u t i n to the
no rm al RM feas i bi l i t y an aly sis
ha
t
m
od el s i nte rfe r e nc e. Th e Co
t
re- to- Bu s- Fa t r
o
Scanned by CamScanner
1/0 Resources
ti
term is ideally 1 . This i s a zero wait-state case where the processor dock rate and
bus transaction rate are perfectly matched. Most often, a read or write will require
multiple core cycles.
The overlap required (OR) is indicative of how critical execution efficiency is
to a service ' s capability to meet deadlines. I f OR is high, then the capability to meet
deadlines requires high efficiency, and deadline overruns are likely when the
pipeline stalls. In a soft real-time system, it may be reasonable to count on an OR
OR
>
D 1 1-1 1
D 1 1-1 1
D 1 5-1.1
1 3-1 '
1 l-l .J
I 1-1l
1- 1 1
0 9-1
D O l-0 9
0 1-0 .1
O f-0 7
0 .5-0 a
O l -0.3
0 0 1-0l
0 0-0 ,
-0 :!--0 ,
FIGURE .1
Scanned by CamScanner
12
C Pl =
I PC
.
. 1 me
. d hardware architecture is to ensure that a n in str uc ti o n
The point of pipe
is
.
ior a 11 I nstructions in the ISA ( I nstruction S et Are h Ite
' ct ure J
completed every cI ock c
.
.
.
NormaIIy CPI is 1 0 or less overall in modern pi pelined systems. Figure 4.2 show
n 1s
com
a simpl e CPU pi pe 1 me
an d 1 ts stage overlap such that one instructio
pleted
(retired) every CPU clock.
IF
ID
IF
Execute
Write-
Back
ID
Execute
IF
ID
Write-
Back
IF
Back
10
Execute
IF
ID
FIGURE 4.2
Write-
Execute
..
Write-
B.lck
Execut
..
..
-
Writ
Back
...
.
.
.
..
i i :D 1;.:.-r=l
.
IF
stall ht
on e-eyeJ e Wnt e- Backs to
pr od uc e co rrect re su lts. Th
esign
e str ate gi es fo r p i pe 1me drrun
a re well descn bed by om
a
pu ter arc hi l ec tur e tex
ts [ He nn essy03 ] , b ut are su
.z
PI
n ed here for conven ience.
Ha zard s tha t ma y sta
. n r ase C
ll
t he pipeli n e and i c e
inc lud e the foll ow ing :
Scanned by CamScanner
9J
H igh latency device read or wri tes, req u i ring t he p i pd i n e to wait for com plet ion
omet h i n g
ont i n ue to
i nglc cycle
where device inte rfaces to sen ors and act uators are co n t ro l l ed and monitored via
M M Rs. When t hese devices and their registers are written or read, thi can stall the
pipel ine while the read or write complete . A p l i t - t ran action bus i nterface to de
vice M M Rs can grea t l y red uce t h e pip e l i ne hazard by allowing reads to be po ted
a l l o ws
write to be posted to a bu
i n t e r fa c e q ueue in a i ngle
m isses a n d core bu
tion and other p i peline haza rd often res u l t in st alls of m uch s h o rter d u ration ( b y
orders of m agn i t u d e ) co m p a red to cache m i se and device 1/0. The i n dete r m i n
ism of cache m isses can be greatly reduced by loc k i n g code in i n struction cache
and l oc k i ng data i n data cache. A Level- I i n truction and data cache is often t oo
m a l l to lock down a l l code a n d a l l data req u i red , b u t ome a rc h i tec t u re i n c lude a
Level-2 ca he,
ha 2 -cycle or m o re, b u t le
cache h i t/m i
ideal for
f t h e nondeterm i n i m of
Scanned by CamScanner
effi ien y w i t h d n a m i
LI
r L2
he .
loadi ng. A l l be t
94
ry
1
to exec u te w h 1 e a .w n te 1s d rai ni n g t
diffic ult. When instru ctio n s are allowed
o
is oka _r i n m a n y c i rcu m s tan ces, b u t
no:
devic e, this is called weakly consisten t. This
not
yet
s
c
n
executed
o
t
i
nstruc
i
for
other
or rect .
when the write must occur before
posted
write
h
t
y
b
ited
m
us
i
b
i nt erfac
unt il
errant. A stall where the pipeline must wait fo r a read completion i s called a data
dep ende11cy stall. When split-transaction reads a re sched u le d with a regi ster as a
dest ination, this can create another hazard called reg ister p ressu re-th e regist er
awaiting read completion is tied up and can ' t b e used a t a l l b y other i nstr ucti ons
until the read completes even though they a re not dependent u pon the read. You
can reduce register pressure by adding a lot o f genera l - p urpose registers ( m ost
pipelined RISC arch itectures have dozens a n d dozen s o f t he m ) a s well as by pro
viding an option to read data from devices i n to cache. Read i n g from a memory
mapped device i n to cache is normally done with a cache prefetch instruction. In
the worst case, we m ust assume that all device 1/0 d u r i n g executio n of a service
stalls the p i peline so that
WCET = ( CPib,,st=case
Longest
Path
Instruction.
Count ) + Stall
Cyclei
Clock
Period
I f you can keep the stall cycles to a determ i n istic m i n i m u m by locking code
.
m to L2 cache ( o r L I i f poss i b l e ) a n d b y reducing device 1/0 stall cycles, then
WCET can be reduced. Cache locking helps immen sely and i s ful l y deterministic.
I
smg
h
e-c
'
1
p
syste
m
m
desig ns t h a t integ r ate
.
u up e
p roc ess ors wit h m a n y p
enp h era I s m
more comp le .
B
.
x m terco n nec tions t h a n I L'
( Bus I n te r fa ce U n i t ) des igns
.
.
. F 1gur
t er
e 4 . 3 p rovid es an over view of t he m a ny 1n
.
Scanned by CamScanner
1/0 Resources
95
even
lntfrtonnctton N ftwork
Dynamic
Bus
iA
king
Omqa
FIGURE 4.3
Non-blocking
ArbltntfCl/Routed
Fully
Com11ted
Ring
Hab
Trtt
The cross-bar interconnection fully connects all processing and 1/0 compo
nents without any blocking. The cross-bar is said to be dynamic because a matrix
ways that can't be used sim ultaneously. The bus interconnection, like a cross-bar,
is dy namic, but is fully blocking because it must be time multiplexed and allows no
more than two end-points with i n the entire system to be connected at once.
FIGURE U
Scanned by CamScanner
y
qu eue . Th is
t
ues
req
a
in
it
wa
.
tor
ues
m akes the req
nen ts w i l l be discussed in
n har dw are com po
,,
cre ase s 1/0 late ncy . I nte rco nne ctio
tem Co mp on ent s.
Sys
ded
bed
Em
"
8,
er
gre ate r d eta il in Ch apt
ac
S U M MAR Y
incl udes i n p u t , i nter med iate 1/0, and out put
The over all resp onse time of a serv ice
mos t ofte n be dete rm i n ed an d added
laten cy. The i nput and outp ut laten c ies can
is more com plex becau se i n te rmedi
to the overa ll respo nse ti me. I n terme diate 1/0
al, i nterc on nection net work
ate 1/0 includ es regist er, memo ry bus, and, i n gener
EXERCISES
l.
are
this system? What is the overlap between i nstructi o n s and 1/0 time if the
i n termediat e I/O time is 4.5 m i c rosecon ds?
3. Read the Sha, Rajk umar, et al. paper, (( Priority I nheritan ce Protoco ls: An
A pproac h to Real-T ime Synchr onizat ion." Write a b rief s u m m ary noting
Scanned by CamScanner
1/0 Resources
17
semTake of s 1 , delays l second, and then does a semGive of S2. Test you r
s emG i v e
and
semTa ke
7. U se
t a s k S w i t c h HookAdd
and
t a s kSwi chHookOe l e t e
own code to the wind kernel context witch sequence and prove i t works
by increasing a global counter for each preempt/d i patch context swi tch
and t i mesta mping a global with the last preempt/d ispatch time with the
x86 P I T ( progra mmable in terval t i m e r )-see the CD- ROM sa mple PIT
call can also be used t o sa m pl e relative
t ickGet { )
s y s C l k R a t e S e t ( 1 ooo )
ti mes, but is only accurate to the tick resol ution ( I millisecond ass u m i ng
and delete so that you can turn on your switch hook easily from a
windsh
counts for a specific task ID or set of task I Ds ( up to J O) and the last pre
empt and di pat h t imes for all tasks you are moni tori ng. Run your code
from Exercise 4 and monitor it by cal l i ng your On/Off wrapper before
r u n n i ng yo ur test tasks. What are your counts and last ti mes and how do
they compare with W i ndView analysis?
tNetTask
for 30 seconds.
CHAPTER REFERENCES
! Almasi89 ] Almasi, George, a n d Allan Gottlieb, Highly Parallel Compu ting. The
Benjam i n/Cummin g Publishing Company, 1 989.
I Henne y0 3 ] Patterso n, David, and John H e n nessy, Compu ter A rch i tecture: A
Quantita tive Approach, 3rd ed. Morgan Kaufmann Publisher , 2003.
I Patter
on94 j Hen nessy, John and David Patterso n , Compu ter Orga n iultion and
Design: Tire Ha rdwa re/ oftware Interfa ce. M organ Kaufm a n n Publi her ,
1 994.
Scanned by CamScanner
Memory
11 Tiiis Chapter
Introduction
Physical Hierarchy
Capacity and Allocation
Shared Memory
ECC Memory
Flash Filesystems
INTRO DUCTION
In the previous chapter, memory was analyzed from the perspective of latency,
and, in this sense, was treated like most any other 1/0 device. For a real- time em
bedded system, this is a useful way to view memory although ifs very atypical
compared to general-purpose computing. In general, memory is typically viewed
as a logical address space for software to use as a temporary store for intermediate
results while processing input data to produce output data. The physical address
space is a hardware view where memory devices of various type and latem:y are
either mapped into address space th rough chip selects and buses or are hidden as
caches for mapped devices. Most often an MMU ( Memory Management Unit)
provides the logical-to-physical address mapping (often one-to-one for embedded
systems) and provides address range and memory access attribute checking. For
II
'
Scanned by CamScanner
1 00
re
source perspective, total memory capacity, memor access latency, and m emory
interface bandwid th must be sufficien t to meet requ irements .
PHYSICAL HIERARCHY
The physical memory hierarchy for an embedded p rocessor can vary signi ficantly
based upon hardware architecture . However, most often , a H a rvard architecture is
used, which has evolved from G PCs ( general -pu rpose com puters) and is often em
ployed by em edded ystems as well. The typical Harvar d architec t u re with separ
ate
L I ( Level- I ) mstru ct1on and data caches , but with u n ified L2 ( Level- 2)
cache and
either on-ch ip SRAM or extern al DOR ( Dyna mic Data RAM ) is show
n i n Figure S. l .
JICllDS
Scanned by CamScanner
u ... .. . _ _ .,,
Memory
1 0 1-
L2 Data Cache
Locked St.ck
Cached [)eta
Locked Dita
FIGURE 5.2
St.ck
by firmware.
Memory system design has been most influenced by GPC architecture and
goals to maximize throughput, but not necessarily to minimize the latency for any
sint,le memory access or operation. The multilevel cac;hed memory hierarchy for
GPC platforms now often includes Harvard L l and unified L2 caches on-chip with
off-chip L3 unified cache. The caches for GPCs are most often set associative with
aging bits for each cache set (line) so that the LRU (Least Recently Used) sets are
replaced when a cache line must be loaded. An N-way set-associative cache can
load an address reference into any N ways in the cache allowing for the LRU line to
be replaced. The LRU replacement policy, or approximation thereof, leads to a
high cache hit to miss ratio so that a processor most often finds data in cache and
does not have to suffer the penalty of a cache miss. The set-associative cache is a
compromise between a direct-mapped and a fully associative cache. In a direct
mapped cache, each address can be loaded into one and only one cache line mak
ing the replacement policy simple, yet often causing cache thrashing. Th rash ing
occurs when two addresses are referenced and keep knocking each other out of
cache, greatly decreasing cache efficiency. Ideally, a cache memory would be so
flexible that the LRU set ( line) for the entire cache would be replaced each time,
minimizing the likelihood of thrashing. Cost of fully associative array memory
prevents this as does the cost of so many aging bits, and most caches a r e four-way
Scanned by CamScanner
1 12
ms
ponents and Syste
Real-Tim e Em bedded Com
.
e 5.3 shows a directb 1' t s for LRU. Figur
.
ng
agi
3
or
2
ciative WIth
e ca che mern .
or ei t-way set asso
t im es th e size o f th
.
ur
fo
is
at
th
a mam emo
byte ad dr ess ab le words),
mapped cache where
r o r mo re J 2 -bi t
u
o
o
ions
lect
(col
has memory sets
se t.
ed in to on e ca ch e
ad
lo
be
ly
on
n
ca
h
ic
gh
Ma in Memory
FIGURE 5.J
By compariso n, Figure 5.4 shows a two-way set-associ ative cache for a main
memory four times the size of the cache. Each line can be loaded into one of four
locations i n the cache. The addition of 2 bits to record how recently each line was
accessed relative to the other three i n the set allows the cache controller to replace
lines that are LRU. The LRU policy assumes that lines that have n o t been accessed
recently are less l ikely to be accessed again anytime soon. This has been shown to
be true for most code where execution has locality of reference where data is used
within small address ranges ( often i n loops) distributed throughou t memory for
general-p urpose programs.
e. H owev er, t h 1s
.
. would greatly reduce t h roughp u t to obta lll
.
.
determ m1 st1 C exec uti on s
o , rat h er t h an mclud mg
m u ltileve l caches ' man y real
t .1me em L
uc
-dd ed s ys t
em
s
inste ad mak e invest men
t in TCM ( Tigh tly Cou pled Mern
. .
.
ory) , wh 1c
h 1s smgle -cycle a
ccess 1or
,
t he proc esso r, yet has no cach e fu nctiona 1 tr
.
Scanned by CamScanner
Memory
10J
Furthermore, TCM is often d ual ported so that service data ( o r context ) can be
Joaded via D M A and read/written at the same t i me. Some G PC proce ors i nclude
LI
or
L2
caches that have the capability to lock ways so that the cache can be
is often carefully analyzed and optim ized so that data access is carefully plan ned. A
software-man aged cache uses application- specific logic to schedule load i ng of a
TCM with service execution context by D M A from a larger, h igher latency external
memory. H ardware cost of a GPC cache arch itecture is avoided and the manage
ment of execution context tailored for t he rea l - t i me services-the scheduling of
DMAs and t h e worst- case delay fo r complet i n g a TCM load-must of cour e till
be carefully considered.
Scanned by CamScanner
IN
S HARED ME MORY
Often two services find i t useful share a memory data segment o o d e egme n t . I n
t o t h i s h a red meo':Y
t h e case of a shared data segment, t h e read/write acce
_
mu st be guaran teed to be con s i tent so that one serv i c e i n _ t h e m i d d le o f a w n t e is
For
a t e l l i t e st: te vector
t y p i c a l l y in c l ude 3
_
c s ld ) whe n Se i d
0, n w u n b l k o n e
e
h
t
f
w i t i n ' e rv i
T yp i
the
1 'n
r i c
r e u n b l k cd i n 1 r
t - i n , . , , 1 u t i r d c r. lli .
u r
roun J 1 11 the O<Jc th t w rnc a
nd/ r re a d ; the 1ha r
d
mc
m
ry
h
t
i
w
s r k ,
.
J n d 111G 1 ( Se i o ) u m
1dJ
g
m m > n S e i d , the p
u d re, a nd rcJ d
re ua rJ n t ced
to tc m u t u J l l e c l u 1v . 1 he
n t 1 , I e t io n
a r c: ,h >w n in , ,
5. t .
for
co nt inu e
to go from to
lly
blc
Scanned by CamScanner
Memory
TABU S.1
1 05
1-.C..
semTa k e ( Semid ) i
..
get x ( ) i
get Y ( ) i
semTake ( Semid ) ;
control ( X , Y , Z ) ;
seRIGive ( Se111 id ) ;
get Z ( ) ;
semGive ( Semid ) ;
at line 4 for example, then the cont rol ( X , v , Z ) function would be using the
new x and possibly an incorrect and definitely old v and z. However, the
semTake ( Semid ) and semGiv e ( Semid ) guarantee that the Read-Code can't preempt
the Update-Code no matter what the RM policies are. How does it do this? The
semTak e ( Semid ) is a TSL instruction (Test and Set-Lock) . In a single cycle, sup
unchanged, and the next instruction is a branch to a wait-queue and CPU yield.
Whenever a service does a semGi ve ( Semid ) , the wait-queue is checked and the first
waiting service is unblocked. The unblocking is achieved by dequeuing the waiting
service from the wait-queue and then placing it on the ready-queue for execution
inside the critical section at its normal priority.
ECC MEM O RY
For safety-critical real-time embedded systems, it is imperative that data corrup
tion is detected and ideally corrected in real time if possible. This can be accom
memory interface can detect and correct SBEs ( Single Bit Errors) and also can
detect MBEs ( M ulti-Bit Errors), but can not correct MBEs. The encoding for the
extended parity bits for ECC is based upon the Hamming code. When data is
written to memory, parity bits are calculated according to a Hamming encoding
and added to a memory word extension ( an additional 8 bits for a 32-bit word).
When data is read out, check bits are computed by the ECC logic. These check bits,
called the syndrome, encode read data errors as follows :
1 . I f Check-Bits = 0 AND parity-encoded word = 0 => NO ERRORS
Scanned by CamScanner
1 OI
3. If
Chec k-Bit s ! =
HAL T
If Check-Bi ts =
4.
word =
O AND parity- enc oded
rd =
O AN D pari ty- enc oded wo
CAN
CORRECT
i nst ruc tio n exec uted
aJly hal ts bec aus e the nex t
On MB Es, the processor no rm
, the CPU will go
ld cau se a fat al err or. By h a l t ing
wit h un known corrupted data c0u
ad.
er reset and safe recovery i n ste
through a hardware wat ch- dog tim
a non mas kab le inte rru pt tha t so ftwa re
For a correctable SBE , the CPU rais es
read i n g the add ress o f the SBE l ocation,
should hand le by acknowledg ing and then
mem ory loca tion fro m the read
and finall y writi ng the corrected data back to the
from m e mory into regis
register. The ECC autom atical ly corre cts data as it is read
data i n the regis
ters, but most often it's up to the softwa re to write the correc ted
ter back to the corrupted memor y location . I n some i m plemen tations hardware
may also automate the write-back of corrected data.
pW
p() l
p()2
p()3
o04
p()S
p06
ED
m
ro t
.!
l:Ol
2
p02
x
1
p()l
0
PW
ED
. \It
1
pOl
.
. tl
3
dO l
,,
. ..
1
1
4
p03
x
3
4
o0 2 dO l pQJ
0
. 1 1 1
1
0
1
1
5
d02
"' L
1
1
1
5
d02
i
1
6
7
d03 d04
' -
9
dos
10
d06
11
d07
lFt
12
d08
0,: :..1
-. 1J
7
d04
tJ .r f.J' 'O
9
dos
6
d03
r
"" - -i--- bill .. n
8
p04
P04
111'.'D
0
10
12
11
d06 d07 d08
:'I '.G",O" . lf'.U''.1
0
'
,_.,.,.
-w
co
FIGURE 5.5
Ha m m ing encoding f
o r 8-bit wnrtt ,..ir..
Scanned by CamScanner
- - --- - .
1 07
Memory
In Figure 5.5, the ED (Encoded Data) is. the same as the CD (Corrected Data).
Furthermore, the check bits are zero and pW, parity for the ED, remains zero as
well. Figure 5.6 shows a more interesting scenario where bit-5, which is data bit
d02, is flipped. This could occur due to an environme ntal hazard such as EMI
( Electromagnetic Interferenc e) or charged particle radiation. When this occurs,
because it's an SBE, the Hamming syndrome (check bits) detects and corects this
error as shown in Figure 5.6.
pW
pOl
p02
dOl
p03
d02
d03
p0 1
p02
P03
p04
'T 'V
1
1
d04
), \. v ,J". r:Ji'''.,
1
10
11
12
p04
dos
d06
d07
doe
ED
0
0
Im
ED
0
0
0 . '
_o
.. x
'
tos.
SSE
JC
10
11
12
pW
pOl
p02
dO l
p03
d02
d03
d04
p04
dos
d06
d07
d08
0
0
l1}1"."._!!
1
'
1 :
co
'
tel '
U. .,.
pOS
p06
JI- .ilU:l[J"'rU.."'
1
-t
1;.V<c.. "lJ.
0
1
. '"'
0
Bii
0
0
'
Conection
FIGURE
5.1
In the Figure 5.6 example, the syndrome computed is nonzero and the pW= l ,
which is the case for a correctable SBE. Notice that the syndrome value of binary
I 0 I 0 encodes the errant bit position for bit-5, which is data bit d02. Figure 5. 7
illustrates an MBE scenario.
Note that the syndrome is nonzero, but the pW=O, meaning that an uncor
rectable MBE has occurred. Figure 5.8 shows a scenario where the ED parity is
corrupted.
In this case, the syndrome is zero and pW= I indicating a parity error on the
overall encoded data word. This is a correctable SBE. Similarly, it's possible that
one of the Hamming extended parity bits could be flipped as a detectable and cor
rectable SBE as shown in Figure 5.9.
Scanned by CamScanner
1 08
tem s
ponents and Sys
Real-Time Embedded Com
0
p()l
p02
o03
0
pW
x
p01
x
0
p04
pOS
p06
ED
5'111 ED
f "0
C02 0
C03
0
pW
0 '
0
x
'6 X .
..., B
MBE
CD
p01
p02
ED
"" ED
CO i o "'
t02
t03
:t>4
0
pW
x
0
0
6
d03
7
d04
5
4
3
2
p02 dOl p03 d02
"O
0 ' '.,. 1
0
1
1
0
0
1
p()l
x
0
4
2
3
p02 de p03
.x n....
x
..
0
d02
,
1
0
0
10
d06
0
0
0
'
12d08
d07
. W:..... .-
:a
0,
0
0
8
p04
9
dOS
10
d06
11
d07
12
d08
".I
11.
.1
0
1
U:""'""I
.I
11
12
d08
o-
ll
COIRECTION
Q ...
0
0
8
p04
x
d07
,a"
L- -
6
d03
7
d04
9
dos
10
11
d07
12
0
0
0
0
0
0
p04
co
-
0
d06
..
.. :...
. .
0
0
d08
.,...__..,
. 1
0
0 I
H amming syndrom e
catches a nd corrects
PW ses::
Scanned by CamScanner
10
d06
9
dos
0
0
0
2
3
4
s
p02 dOl Po3 d02
0 ..1 .... 1
"r.
1
1
. Q ..
7
d04
d03
0
pW
0
p()J
FIGUltE 5.1
:n.:.
CD
9
8
p04 dOS
0 "J x ro '.,
0
0
7
d04
j
- x
..., G
S8E
dO l
l
1
6
d03
p03
p04
p05
p06
5
d02
RGURE 5.7
p01
0
2
p02
x
4
p03
x
'
1 09
Memory
p()l
1)()2
1)()3
pW
x
1
p() l
x
2
002
x
dO l
1
'!'1N
1
1
d04
p04
x
dos
0
0
10
d06
11
d07
12
doe
2
D02
0
3
d0 1
4
p03
1
s
d02
6
d03
d04
ED
0
pW
1
1
1
0
x
x
0
CD
SEE
CORRECTION
FIGURE 5.t
1
. 1
0
0
0
0
12
d08
11
d07
D0 1
10
d06
c03
.....,
0
1
t06
1
1
6
d03
Q
c05
s
d02
ED
d> l
c:02
C04
D0 3
D04
DOS
D06
p04
1
dos
This covers a l l t h e basic correctable SBE possibilit ies as well as showing how an
uncorrectable M B E is detected. Overal l , the H a m m ing encoding provides perfect
detection of SB Es and M BEs; u n l i ke a single parity b i t for a word, which can detect
all SBEs, but can only detect M BEs where an odd n u mber of bits are fli pped. I n t h i s
sense, a si ngle parity bit is a n imperfect detector and also offers n o correction be
cause i t does not encode the location of SB Es.
memory used for embedded systems. Usage o f EEPROM con t i n ues for low- ost ,
low-capacity N V R A M ( No n - Volatile RA M ) system req u i rements, b u t most often
Flash memory tec h n o logy is u ed for boot code storage and for t o rage of non
vola t ile files. F lash fi lesystems emerged not long after Fla h devi e Fla h offer i n
circuit read, write , erase, l oc k , and u n lo k operat ions on e tor . Th era c a n d I k
ope ratio ns operate on t he e n t i re ector, m o t oft e n 1 28 K B i n size. J{ead a nd write
operation s can be a byte, word, or b l o k. The m o t p revalent Fla h te h nulog)' j
.
Scanned by CamScanner
/1
1 10
and Systms
Real-Time Embedd ed Com ponents
read-only memory.
clear that min im izin g sect or erases
Based upon the characteristics of Flas h, it's
rs are erased at a n even pace
will maximize Flash part lifeti me. Furth ermo re, if secto
ble for the overall ex
over the entire capac ity, then the full capac ity will be availa
boot code and b oot
pected lifetim e. This can be done fairly easily for usages such as
u
code updates. Each update simply moves the new write data above the previo sly
written data and sectors are erased as needed. When the top of Flash is encoun tered,
the new data wraps back to the beginn ing sector. This causes even wear because all
sectors are erased in order over time. Filesyste1ns are more difficult to implement
with wear leveling because arbitrarily sized files and a m ultiplic ity of them leads to
fragmentation of the Flash memory. The key to wear leveling for a Flash filesystem is
to map filesystem blocks to sectors so that the same sector rotation and even n umber
of erases can be maintained as was done for the single file boot code update scheme.
A scenario for mapping 2 files each with two 5 1 2 -byte LBAs ( Logical Block Ad
resses) and a Flash device with 2 sectors, each sector 2 ,048 bytes in size ( fo r simplic
ity of th example) shows that 1 6 LBA updates can be accommo dated for S sector
S1)
0,0
1,,
,.,
1,,
,.,
2, 1
2. 1
0.2
0,2
0,2
3.2
3,2
so
FS L8a IJpdUd
FS L8e Ched
-
Sec:tar l.Bs
Sect<n Er-..d
11
(SO. S 1)
FIGUIE 5. 1 0
Scanned by CamScanner
2.1
2.1
2,2
2.2
2,2
2.2
Si m pl e Flash wea
r lever mg ex am pl
e.
UPDATES
10
Memory
111
erases ( 3 for Sector 0 and 2 for Section I ). Continuing this pattern 32 updates can be
co m pleted for 10 sector erases (5 for section O and 5 for sector I ). I n genera l approx
.
i mately 6 times more updates than evenly distributed erases.
With wear leveling, Flash is expected to wear evenly so that the overall lifetime
is the sum of the erase lim its for all the sectors rather than the limit on one, yielding
a lifeti me with m illions of erases and orders of magnitude more file updates than
this. Wear leveling is fundamental to malcing filesystem usage with Flash practical.
SUM MARY
Memory resource capacity, access latency, and overall access bandwidth should be
considered when analyzing or designing a real-time embedded memory system.
Although most G PC architectures are concerned principally .w ith memory
throughput and capaci ty, real-time embedded systems are comparatively more
cost sensitive and often are designed for less throughput, but with more determi n
istic and lower worst-case access latency.
EXERCISES
I . Describe why a multi level cache architecture m ight not be ideal for a hard
real-time embedded system.
2. Write VxWorks code that starts t wo tasks, one that wri tes one of two
phrases alternatively to a shared memory buffr, i ncluding 1 ) "The quick
brown fox j u m ps over the lazy dog" and 2 ) "All good dogs go to heaven."
When i t's done, have it call T a s k D e la y ( 1 4 3 ) . Now add a second task that
reads the p h rases and p r ints them out. Make the reader tak h igher prior
ity than the writer and have it call T a s k D e lay ( 37 ) . How many t imes does
the reader task p ri n t out the correct ph rases before they become jumbled?
5.
SBE scenario.
Scanned by CamScanner
MBE scenario.
1 1J
C HAPTER REFERENCES
[ Mano9 l J Mano, Morris M., Digital Design, 2nd ed. Prentice- Hall, 1 99 1 .
Scanned by CamScanner
Multiresource Services
This Chapter
I n t roduc t ion
Block i n g
Deadlock a n d Livelock
Critical Sections to Protect Shared Resources
Priority I n version
Power Management and Processor Clock Modulat ion
INTRODUCTION
Ideally, the service response as ill ust rated in Figure 3. 1 3 of Chapter 3 i s based u pon
i n put latency acq u iring t h e CPU re o u rce, inte rference, and o u t p u t l a tency on ly.
H owever, many services need more resources than j ust the C P U to exec u te. For
exam ple, m a n y services may need m ut ua l l y exclu ive access to shared mem ry
re ou rces or a
RM
mu t yield t h e
t hat hold
resour
e,
i blo kc<l. it
will
o m plete i t
m u t uall
c wi l l
ex Ju i
e u e
f t h at
u nb l o k, p tentiall pn.Aemi .
1 1J
Scanned by CamScanner
1 1C
.
co mponents a""
Real-Time Embedded
_, , ,,, - -
i n thi s chapter
.
io n H o wev er, a s you 'll set:
cut
exe
e
u
10
t
n
co
d
an
0 ther services'
d ead1 ock or priority
"
t 1or
ry if cond itio ns are suffi cien
pora
em
t
e
b
.
not
may
.
blockmg
inversion.
BLOCKING
lled /rve
Iock and also
.
pr
ev
en
ts
detect1 on and break ing
progress co m ple tely despi te
o f t he deadlock.
Scanned by CamScanner
Multiresource SeMces
1 15
Service- l
(Holds Resource A)
Service-2
(Holds Resource B)
FIGURE 1.1
Service-3 Requests
R source B
Service-3
(Holds Resource C)
Scanned by CamScanner
1 16
ED RE SO UR CE S
C R ITI CA L SE CT IO NS TO PR OT ECT SH AR
bed ded sys tem s to share dat a bet ween two ser
Shared me mory is often used in em
s betw een serv ices , but often even messages
vice s. The alternative is to pass message
shared b uffer. Di fferent cho ices for service
are passed by synchro nizi ng access to a
e clos ely i n Cha pter 8, " Em bedded
to service com mun icati on will be exam ined. mor
is used , beca use rea l - t i me systems
System Com pon ents ." When 'shared mem ory
er prior ity servi ce releases at
allow for even t-driven preem ption of services by high
ust be prote cted to ensure
any time, shared resources such as shared mem ory m
d memo ry location
mutually exclus ive access. So, if one servic e is updat ing a share
e is allowed to
( writing) , i t m ust fully comp lete the updat e before another servic
preem pt the writer and read the same locatio n as descri bed alread y in Chapt er 4. If
this m utex ( mutua lly exclusi ve access ) to the u pdate/r ead data is not enforced,
then the reader m ight read a partially updated message . I f the code for each service
that either updates or reads the shared data is surround ed with a s e mTake ( ) and
semGiv e ( ) ( in VxWorks for example), then the update and read will be uninter
rupted despite the preemptive nature of the RTOS scheduling. The first caller to
will enter the critical u pdate section, but the second caller will be
blocked an not alLowed to enter the partially updated data, causing the original
service in the critical section to alway s fully update or read the data. When the cur
rent user of the critical section calls s e mGive ( ) upon leaving the critical section. the
service blocked on the s e mT a k e ( ) is then allowed to continue safely into the critical
section. The need and use of semaphores to protect such shared resources is a well
underO'd concept in m ulti threaded operating systems.
,.. . . ..
semT a k e ( )
.. .... -
PRIO RIT Y . I NV E R S I O N
\
We re most concrned about unbou nded priorit y invers ion. I f the invers on is
i
b unded, then this c n e lumped into the response latenc y and
accou nted for so
.
t at the R M analysis
is still possi ble. The use of any m u tex ( m utual exclu sion) sem aphore can cause a temporary mvers
.
'
1on
w h'1 l e a h igher
pnori
ty service is blocked
.
to a 11 ow a lower pnority servIC
e to comp 1ete a shared memo ry
read or updat e in its
.
enti.rety. As long as the 1ower pnon
smn.
Three cond ition s are necessary
.
.
c
1or
uo bounded mvers1on:
Scanned by CamScanner
Multire59urce Services
117
At least two services of different priority share a resource with mutex protec
(H), Medium
in the critical sect ion and is blocked on the semTa k e ( semid ) . While the
service executes in the critical section, one or more
with the L priority service's progress for an indefinite amount of time; the
ity service m ust continue to wait not only for the
priority
H prior
critical section , but for the duration of all i nterference to the L priority service.
How long will this in terference go on? This would be hard to put an upper bound
on-dearly it could be longer than the deadl ine for the H priority service.
Figure 6.2 depicts a shared memory usage scenario for a spacecraft system that
has two services using or calculating navigational data providing the vehicle's position
and attitude in i nertial space; one service is a low priority thread of execution that
periodically points an instrument based upon position and attitude at a target planet
to look for a landing site. A second service, ru nning a high priority, is using basic
navigational sensor readings and computed trajectory information to update the best
estimate of the current navigational state. In Figure 6.2, a set of M prior
s
{M} that are unrelated to the shared memory data critical section can c
l1t:fJ
ie
a .,.._
for as long as M services continue to preempt the L service stuck in t
,. r uAtV )
---11":..:::r--t""" - _) i
Service L
Services
{Ml
ervice H
Sc11n_Lanc1ng_S1 t t
t . y , z, 1tutuo1 1 ;
&z ,
CJown_r1ng1,
crot -r no ,
ars_ i.
age,
(Shared
y,
By M
Services not
sing
ntlOIJ
Sec11on
FIGUH 1.2
Scanned by CamScanner
l,
&1t t 1 tuo1 ) ;
lnterferenct"
pnonty
J"
.....
..._.._,....
Section
orb1t_11 .. 1n ts ) ;
ttltuOt_llt&.Utt ( a ,
Continuing
Cm1 al
Code
1
Unbounded priority inversion scenario.
Mrmory
pacecft
State)
' ""
s and Systems
Real-Time Em bedded Component
1 11
Solutlos
UHetl Priority lverslo
to u se tas k or inte rr u pt
ded pri ori ty i nve rsio n is
un
bo
un
to
s
on
uti
sol
t
firs
On e of the
( ) a n d tas k Un lock { ) ) to
.
and i ntU nlo ck ( ) o r tas k Loc k
1ock mg (VxWorks int Loc k ( )
. t h e same wa y
.
ope rat es m
l sec tion s com ple tely , w 1c h
.
ica
crit
n
i
ns
ptio
m
ree
p
t
ven
pre
re
a
i
mo
as
opt
uce
mal
rod
1
Prio rity inh erit anc e was
as a prio rity ceil ing pro toco l.
to the lev el
of prio rity i n a cnt teal sect ion only
method that l i m its the amp lificat ion
son , inte rrupt lock i n g and p iorit y ceili ng
required to bou nd inve rsio ns. By com pari
u ratio n of the c nt ICal sect ion, but very
essen tially disa ble all pree mpt ion for the d
r wha t is mea nt by prio rity inhe ri
effectiv ely boun d the inve rsion . To desc ribe bette
d inde fin ite i nterferen ce while
tenc e, you mus t unde rstan d the basic strat egy to avoi
H prior ity servi ce gives its pri
a low prior ity task is in a critical sectio n. Basic aUy, the
not be preem pted by t he M
ority temp oraril y to the L priori ty service so that it \Vi ii
m a l l y t h e L priority
priorit y service s while fi n ishing u p the critica l sect ion- n o r
n.
service restores the priority loaned to it as oon a it leaves the c rit ical sectio The
priority of the L service is tempor arily ampli fied to the H priority to p revent the un
bounded i nversion . One downsid e to priority inherita nce is that i t can chain. When
H blocks, it is possible that shortly a ft er H loans its priority to L, another H + n prior
ity service will block on the same semaphore, therefore req uiring a new inheritance
of H + n by L so that H+n is not blocked by service of priority H + l to H + n- 1 inter
fering w ith L,
so i t would be easier to
si m ply give L a priority that is so high that c h a i n i n g is not necessary. This idea
., ..,
..
_
beC'a{lle the priority ceiling emulation protocol ( also known as h ighest locker). The
prior.it}' ceiling protocol is more exacting and was first described in terms of A da
.._
angua$
prqt?cal
0ss tb,
.
t.!.O .
wever '. the ceiling protocol is not ideal because i t a m p l ifies the priority of
er than it ealJy n eeds to be. This overamp lification could cause other prob.
.
u ch a signific ant mterfer e nce to high -freque ncy, high-p riority services.
_
A furthe r re finem e t of p r1ority
ceiling is t h e least - locker p ro tocol. I n lea t
. .
.
locke r, t he nonty of L is ampl ified to the h ighes t prior
ity of all t hose serv ices th a t
. .
.
very we l l w hat ser vic e c a n en
t er a gi ven crit ical ectio n and t he refo a lso k no \''
re
.
the orre t le a t loc ke r pri. or ity
.
Th e pro ble m of pri ori ty
mver ion b eca m e famo u with
t h e M a r Pa thfinder
pa ecra ft The Pat h fim d er s p c
a e c ra ft was o n
fi
I ap p r oa c h to Mars a nd wou1 i
need to om plet e a crit ical eng .
me b urn to cap t ure mto
a Ma rtia n orb it wit h i n a f
- '
Scanned by CamScanner
Multiresource Services
1 11
days after a cruis trajectory last i ng m a ny months. The m ission engineers readied
instru ments were designed to help determi n e the i nsertion orbit around Mars be
cause o e of the ?bjectives of the mission was to land the Sojourner rover on the
surface m a location free of dangerous dust storms. When the new services were
activated during this critical final approach , the Pathfinder began to reboot when
one of the highest priority services failed to service the hardware watch-dog timer.
Discussed in more detail i n Chapter I I , watch-dog timers are used to ensure that
software conti n ues to sanely execute. If the countdown watch-dog timer is not
reset periodically by the h ighest priority periodic service, the system is wired to
reset and reboot to recover from software failures such as deadlock or l ivelock.
Furthermore, most systems will reboot in an attempt to recover several times, and
if this repeats more than three t imes, t he system will safe itself, expecting operator
intervention to fix the rec u rring software problem.
It was not i m mediately evident to the mission engineers why Pathfinder was
rebooting other t h a n i t was caused by failure to reset the watch-dog timer. Based
upon phenomena studied in this chapter alone, reasons for a watch-dog timeout
could incl ude the following:
Scanned by CamScanner
1 JO
field debuggi ng. Prior to the Pathfinder incident, t he problem of priority in versi o n
was mostly viewed as an esoteric possibility rat her t h a n a l i kely fai l u re scena rio. The
unbou nded inversion on Pa th finder resul ted from shared data used hy H , M , an d L
priority services that were made active by m ission controll e rs d u r i ng final app roac h.
embedded systems to designs with less memory a n d lower speed p rocessor clocks.
The power equations sum marizing a nd used to model t h e power used by a n ASIC
( Application Specific I n tegrated C i rcui t ) de i gn a re
P.werag.
P' h i 1g
swi tch mg
( Sprobabi lit
pleak age
Scanned by CamScanner
Multiresource Services
1J1
SUMMARY
Rea l - t i m e e mb edd ed sys
tem s are mo st often des ign
ed acc ord ing to har dw a re
pow er, ma ss, lay out , and cos
t con stra in ts, wh ich in tur n defi ne
the processor, J/O ,
and me mo ry reso urc es ava ilab
le for use by firm ware and software. Ana
lysis of a
single reso u rce suc h as CPU ,
mem ory , or 1/0 alon e is well understood for rea
l
time embedded systems--the subj
ect o f Cha pters 3, 4 , and 5 . The anal ysis o f m u l t i re
sour ce usag e, such as CPU and mem
ory, has more rece ntly been exam ined . The
case of m ultip le serv ices shar ing mem ory
and CPU lead to the unde rstan ding that
dead lock, l ivelo ck, a n d prior ity inver sion
could i n terfe re with a se rv i ce ' s capa bil ity
to execu te a n d comp lete prior to a deadl ineeven when t here is more than s u ffi
cient CPU resour ce. T h e c o m p lexity of muitip le resour
ce usage by a set of service s
c a n lead to scenar ios w here a system o t h erwise determ
i ned to be h a rd
real-tim e safe, can st i l l fail.
EXERCI S E S
1.
V e r i f CPU
1 5 , C 1 = I , C2=2, C3 = 3;
:
_
What i s the LCM of the periods and total u t i l i ty? Does the example wo rk.
y
T3=
d1;
=; =3
.
line g u a ra ntee, why isn't E D F always used instead of fixed priority R
;
.
an
r
e
v
n
i
o_
i
r
p
code
ple
m
exa
e
h
t
e
Exa m i n
S. de m o n s t ra t es priority i n version and se W mdV1ew to show that w1th ut
.
that w i t h
t h e priority i n heritan ce protoco l, the 111vers10 n .occurs; also show
ed.
t h e .priorit y i nherita nce protoc I, t e problem is correct
n
eil ing Pro toc ol:
Good enou gh, John, and L u . S h a, "T h p r l)' . '
.
A da Task . "
H igh - norit )
A M ethod for M i n i m iz i n g the Bloc k i n g of
CM U/SE I -88 - SR -4 Speci al Repo rt.
I G ooden o ugh88 j
Scanned by CamScanner
1 22
2000.
[Shah90] Shah, Lui, Rangunthan Rajkumar, and John P. Lehoczky, "Pr iority
,
Inheritance Protocols: An Approach to Real-Time Synchronizatio n. , IEEE
Transaction on Computers, Vol. 39, No. 9, (September 1 990) .
[ Vahalia96) Vahalia, Uresh, Unix Internals; The New Frontiers. Pren.tice- Hail
, Inc.,
1 996.
Scanned by CamScanner
I This Chapter
Introduction
Missed Deadlines
Quality of Service
Alternatives to Rate Monotonic Policy
Mixed Hard and Soft Real-Time Services
INTRODUCTI O N
Soft real time is a simple concept, defined by the utility curve presented in Chapter
,,
2, " System Resources. The complexity of soft real-time systems arises from how
to handle resource overload scenarios. By definition, soft real-time systems are not
designed to guarantee service in worst-case usage scenarios. So, for example, back
to-back cache misses causing a service execution efficiency to be much lower than
expected, might cause that service's deadline or another lower priority service's
deadline to be overrun. How long should any service be allowed to run past its
deadline if at all? How will the quality of the services be impacted by an overrun or
by a recovery method that might terminate the release of an overrunning service?
This chapter provides some guidance on how to handle these soft real-time sce
nario and, in addition, explains why soft real-time methods can work wdl for
some services sets.
Scanned by CamScanner
1 24
M ISSED D EADLINES
um ber of way s:
M issed dea dlin es can be han dled i n a n
Scanned by CamScanner
guALITY OF SERVI CE
1 25
rrc
QoS == 1
( Drop-outs I Deliveries)
service d rop-outs and 1 00/o availabil ity. Systems providing isochronal services
and output most often use a DM ( Deadline Monotonic) policy and buffer and
hold outputs that are completed p rior to the isochronal deadline to avoid jitter.
e ou
The RM poli cy can lead to pess i mis tic mai nten ance of h gh r
3,
sets of servic es that are not harm onic ( descri bed in C h apter
Scanned by CamScanner
re margi ?r
Pro ,e ssmg ) .
1 21
. .
Furthermore, RM policy makes restncttve assum ptions such as T=D. Because QoS
.
.
.
should
is a bit harder to na1 1 down categorically des1gners of soft real-time systems
.
.
.
mtgh t better fit thei r applica tion -specific
consider alternatives
to RM pol'cy
that
.
would cause a dead measures of QoS. For example, m Figure ?. , th e RM policy
that EDF or
line overrun and . a service dop out decreasi ng Qo S, but it's evident
.
. resu 1 t .in h ig
' her QoS becau se both avoid th e
LLF dynamic priority pohc1es will
'
E u n1Jle 2
Tl
Tl
n
T4
7
13
Ul
05
U2
0.2
U3
0. 1 42857
(4
U4
0 1 53&46
Cl
LCM -
JO
Utot -
09967!!1
---,
R MSc
s 1
S2
S3
fAl.Utll
52
S3
54
13
12
11
10
S4
E DF Sc
s 1
Sl
S3
S4
--
TTD
s1
52
53
11
10
54
L l f 5 c
51
52
S3
54
l uily
51
FIGURE 7.1
x
1
The EDF and LLF policies are not always better from a QoS viewp
oint. Figure
7.2 shows how EDF, LLF, and RM all perform equally
well fo r a given service sce
nario. From a practical viewpoin t, the deci sion to be
mad e o n sche dulin g policy
should be a balance between the imp act on QoS
by the mor e adap tive EDF and
LLF policies com pared to the mor
e pred icta ble failu re mod es and deterministic
behavior of RM in an overload situ
atio n. Thi s may be diffi cult to com pute and
might be best evaluated by trying
all three pol icie s wit h ext ens ive tes ting .
In cases where ED F, LLF , and
RM perfor m equ ally well in a non ove rload sce
nario, RM might be a better
choice because the impct of a fai lur
e is sim pler to
co nta in; that is, there is less
likelihood of cascadi ng servic dr
e op -ou ts given upper
bounds on overrun detectio
n and ha nd lin g.
As s own in Figure 7.3 ,
it's well worth no tin g
.
th at systems de sig ned to
harmonic service
request periods do
equally well wi th ED F, LL F,
and RM. Design
in g systems to be ha rm on ic ca
n greatly sim pl ify
real -t im e sc he du li ng .
Scanned by CamScanner
ave
..
Tl
.
T2
T3
15
Cl
CJ
Ul
U2
U3
R M S chedUe
SI
0.3J
T 0.4
02
LCM
--
-I
15
U -i 0 93
127
S2
SJ
E Of S ched!Je
Sl
S2
SJ
no
SI
S2
SJ
IS
14
J
12
L L F S chedUe
SI
S2
luily
SJ
SI
S2
SJ
FIGURE 7.2
f unl*
x
x
2
11
Tl
(I
UI
o.s
U2
0.2 5
025
16
2
y
7
x
2
6
x
x
s
x
5
T2
Tl
LC M
16
M S t
SI
S2
rm
SI
S2
Sl
llf S
SI
S2
Sl
SI
luiy
S2
l .
10
SJ
12
FIGURE 7.J
II
Figure 7.4 shows yet anothe r example of a harmo nic schedule where policy is
inconsequential.
ng T=D . This
For isoch ronal services, OM polic y can have advan tage by relaxi
r and hold
allows for analy sis of syste ms where services can complete early to buffe
shows a
outp uts to reduce prese ntati on jitter and thereby increase QoS. Figure 7.5
to require ..
scenario where the OM polic y succe eds when the RM would fail due
men ts where 0 can be greater or less than the release period T.
Scanned by CamScanner
1 11
a
.
c om po nents
ed
dd
be
Em
e
Real-Tim
T2
10
T3
0. 4
U2
U tot
QI
U3
10
LC M
0.5
UI
Cl
Tl
E aa f11>1e 5
nd Syste ms
R M S c e
Sl
S2
S3
E Of S c e
S I
S2
S3
TTD
s1
S2
S3
10
Llf S chedUe
s1
S3
S2
S3
Sl
T1
12
3
3
A ha rm on ic service set.
FIGURE 7.4
( H"1"0 6
- ---
52
Laxity
13
Cl
UI
0.5
UJ
O ll8S 7
O I H6A6
U1
,.
UOI
0.)
LCM -
JO
ror OM
01
02
OJ
UICI-
RMOO I
0996703
J
7
IS
R M D2 J
E AIHJ( R
l1'TE R
""''
R M St
s1
52
S!
FIGURE 7.5
SOFT REAL-TIME S E RV IC E S
--
.
. . e, and best ef.
Many systems include services that are hard real time, so ft real tim
d
fort. For example, a computer vision system on an assembly line may have h
real-time services where missing a deadline would cause shutdown of the proCtht
being controlled. Likewise, operators may want to occasion ally mon itor wha
computer vision systems "sees.>' The video for monito ring should have good .
so that a human monitor can assess how well the system is working, whether hghte
ing is sufficie nt, and whether frame rates appear reasonable. Finally, in the
system, operators may occasionally want to dump mainte na nce data and have
Scanned by CamScanner
12 9
rea l requ i rements for how fast this is done-it can be done i n the background
whenever spare cycles a re available.
The mixing of hard, soft, and best effort can be done by admitting the services
into m ul tiple periodic servers for each. The hard real -time services can be sched
uled within a time period (epoch ) during which the CPU is dedicated only to hard
real-time services ( al l o thers are preempted ) . Another approach is to period trans
form all the hard real-time services so that they have priorities that encode their
importance. E i t her way, we ensure that the hard real-time services will preempt all
soft services and best effort services on a determi n istic and periodic basis.
Best-effort services can always be handled b y simply sched uling all these ser
vices at the lowest priority and at an eq ual priority among them . At lowest priority,
best-effort services become slack time stealers that execute only when no real-time
( hard or soft) services are request ing processor resources.
SUMMARY
Soft real -time services assume that some service releases will fa il to meet deadline
requ i rements. Careful considera t ion should be given to how service deadline over
runs will be handled and how t h is will impact QoS.
EXERCISES
Review pos i x_c lock . c and pos i x_rt_t imers . c contained on the CD- RO M .
Write a brief paragraph describing how these two modules wor k a n d what
1ea t u res o f the POSIX 1 00 3 . 1 b real-time extension s they demonst rate.
c
M a ke sure you not only read the code, but that you b uil d it, 1 oad it, an d
.
.
execut e it to m a ke sure you unders tand how both applica tions work.
mo ul fr VxWor
2. Build, load, and analyze the CD-R OM i t t imer_test
e i t 1mer_ est
and descr ibe how i t works . Speci fically what stat es oes t
. a task conte xt using sp
task conte xt trans ition between i f it's spawned m
I.
i t ime r - t e s t ( )
.
x sw wd . c so ftwar e wa t c h - dog
3 . B u 1 I d , I oa d , a n d run t he CD- ROM posi - c your d e
.
ev1' d ence ior
.
tes. Provide
mon itor co d e an d d escnbe how it execu
es.
crip tion with supp orti ng Win dVie w trac
_
\
Scanned by CamScanner
1 JO
CHAPTE R R E F E R E N C E S
[WRS99] VxWorks Reference Manual 5. 4, available online a s "windma n" or ha rd
copy, as Edition 1 . WRS, 1 999.
[WRS99] VxWorks Programmer's Gu ide 5.4, Edition 1 . WRS, 1 999.
Scanned by CamScanner
Pa rt
II
Designing Real-Time
Embedded Components
Scanned by CamScanner
Embedded System
Components
I This Chapter
I nt roduction
H ardware Components
Firmware Components
INTRODUCT I O N
System design can b e approached in a bottom-up or a top-down fashion. The
bottom-up approach consists of determining the fundamental components that go
into a system. The top-down approach is a hierarchical breakdown of the system
into subsystems and then into components. The top-down approach can be
viewed as a concrete breakdown of the system into smaller parts as suggested here,
but often an abstract top-down approach is useful where the system is broken
down by service and function. This functional top-down approach is described in
Chapter 1 2 , "System Lifecycle. " In this chapter, we first examine common compo
nents of a real-time embedded system in the concrete sense going from the overall
system 9own to components. Familiarity of the components can assist the designer
in making more optimal system design decisions. The design of a real-time em
bedded system can be viewed as a hierarchy of subsystems as depicted in Figure 8. 1
for a real-time stereo -vision tracking system.
I JJ
Scanned by CamScanner
1 J4
d Systems
Co m po ne nt s an
ed
dd
be
Em
e
im
-T
Real
Tilt/Pin
Actultion
Procs1n1
'
'J
""..
'
It
. ,..
t
.
I
Sensor/Actutor 10
FIGURE 1.1
object in
the fiel d of view of both cameras even if the object moves a n d e s t imate the distance
from the camera assembly to the
o bje c
t.
Stereo-Vision
Tracking
Stereo Ranging
FIGURE 1.1
Tilt/Pan Target
Tracking
Range Reporting
Field-of-View
Centroid
Scanned by CamScanner
Reportint
exam i n ed 1
I JS
?f
Se1son
Sensors are devices that respond to physical s t i m ulus ( ligh t , heat, pressure,
s tress/stra i n , acceleratio n, magnetis m ) by transformi n g the associated energy into
electrical energy or by modifyi ng the elect rical propertie s in a circuit . For example,
a camera is a sensor that converts photon energy into elect rical cha rge that repre
sents the photon flux for each picture element in an array. A therm istor is a resis
tor c i rc uit where the resistance of the therm istor changes with temperat u re and
therefore so does the c i rc u i t c urrent at a given voltage and voltage drop aero s the
load. A sensor assembly may also i n terface t h is analog fron tend to a digital encod
i n g i nterface. A n alog to digital converter s ( A DC ) are u cd t o sa m ple and hold
charge, thereby c onvert ing the analog circuit c u rrent/volta ge i nto a d igital value .
For example, a n 8-bit A DC will encode a sensor c i rcuit operat ional voltage range
i n to 256 levels. Without encoding, sensors are useful i n analog contro l sy tern .
Scanned by CamScanner
! ,d t
t .
.t
.... .
I f
..
I
. . .,
... ' J
; ;,.. .
. ..
.)
"
.
. .
J
'I
.
,.
'
,,
'
l
.,.. ..
, (.
..
' lJ
i
,,
'
-:
7r
.. .....
.,
. .
'"
..
i.
..
...,. '
.._. .
, >
' .
r .
'
....
"
...
. .
..
i '
\ ... 1 "' ,,
.
;
.... ,. ,-t
..
. ..
, ,f .: ' " !
1 \l
fl
'> .
'1
'\
' ' \.
I . .. "
. ..
.
.
l.
'
,,.
1
"
.
;' ; . I
..
'
..
'J
.I
.1 .
'
"..
'
.i
., \
l
'
. .
'f'.
l
..
. \
J
' '
.
.
' '
I \
"_, )
' :l
..
'\ \.
..
;
.,. '
'
. )
.
. ...
. .
"
..
"
.), "' .
'
..h f}...
)
}
ill
\l
'
'. 1 \ r!'-'
(1 I
ut
1.. U t "
, .
Scanned by CamScanner
..
( K
t'
!H
h..t
i"
p41h
'!'
, h
1 J7
The ADC takes a analog input ( voltage and current for a given sensor re
.
1stence) and. conves tt i" to a digital word, most often from 8 bit to 1 6 bit. Change
m sensor res1stence 1 ulti mately measured by change in voltage drop over the cir
.
and normally encodes all inputs
mt. The ADC reqmres a reference ltage, Vr,f.
mto a range of values from zero to (r) , sothat for a SV reference, a typical 1 6-bit
ADC can ecode t e voltage in an AFE into increments representing a change of
.
0.0763 m 1lhvolts with an input range of O to -SV for the AFE signal. The resolution
can clearly be increased by reducing the reference voltage, v r,6 however, this is at
the cost of constraining the input range. This trade-off drives the selection of how
many bits the ADC provides in the encoding-to accommodate large input ranges
and h igh resolution, the ADC must have more bits for the encoding. The final ques
tion is how fast can the AFE be sampled? This depends upon the type of ADC:
Flash: Using comparators, I per voltage step, and resistors
Aduten
F u ndamentally, an
actuator is
Scanned by CamScanner
1 JI
Speed of actuation
Accuracy of modulation
t ; for acc urte high- rate actuators, a DAC is req uired rat
he r than
rela
r
a
outpu
t
(
ulse
P
M
Code
Modula tion ) is used fo r
JO an ou tp, utC P
inp :t sampl in g a dnv mg
DA
C
for
d
ura
t
'
ion
an d at va ria ble output
levels.
Actuators can be very unstable an d su ft
o t careful design and potential feedback fr:; ver-shoot or fai. lu re to settle with
f se ns or s Fo r ex am
m1rror can be used to move an opt.
ple, a scanning
.
pick -off mirror on an optical path1 l fi I d o view very. ac cu ra te ly by deflecting
mech anism known as a voice-co 1 flex rough a sm aJI an gle . A n electromecha nicai
mag netJC coils are used to defl ect a m . ure can be driven bY a DAC so that electrofurt hermore, the mirror can be irro r on a rub ber fl ex ure to the left or right
res ed to a previo u pos it
fl exure to spring
back, dam pe ned bytor
ion
by
allo
the
wing
rate feed back co ntrol for such an . the e1 ec ro m agn et1 c c oils . Some of the highana 1og cont rol cir. cuit rather tha actu ato r might be i. mpleme nted as a traditional
. upo n the digital real
tern to provide suc h co ntrol. n . re1 ymg
-tim e embedded sys_
ie
1/0 lat(!rfaces
'
is
Scanned by CamScanner
I
I
1 J9
many real- t i m e embedded systems have digital 1/0 only or in addition to analog
1/0. Either way, at some point, all l/O becomes digital once encoded or prior to
decode. So, prior t o the ABE or after the AFE, the embedded system simply sees
digital 1/0. The fo r m of t h e digital 1/0 can vary significantly and can be character
ized. as:
Word-at-a-time 1/0
Block 1 /0
MMIO
Port 1/0
In the case of word J/O, a simple set of registers defines the interface to the
AFE/ ABE and provides status, data, and control for encode/decode. A single word is
written to an ABE for output to a data register, the output is started by setting con
trol bits, and the output status is mon itored using the status register. Likewise, for
an AFE interface, status can be polled to determ ine when new encoded data is avail
able, and samples can be commanded through control and monitored via 5tatus.
The word J/O interfaces req uire significant interaction with t h e real -time embed
ded system and are not very efficient, but do provide simple low rate 1/0 interfaces.
These i nterfaces require programmed J/O where a CPU is involved in each inpu t
and output fo r all phases of the read/write, status mon itoring, a n d control . This is
often not desirable for h igher- rate interfaces where even powerful C P Us wo uld
spend way too many cycles on programmed 1/0 and not enough on processing to
provide services. So, most high- rate interfaces have a block 1/0 interface where a
state-machine or OMA engine provides command and control of the word encod
ing/decodi ng and less frequently interrupts the CPU when significantly large blocks
Processor cores traditionally have provided 1/0 through dedicated pins from
terface p i n s by mapping M M I O onto existing address and data l ines i n/out of the
CPU so that 1/0 causes devices to be read. or written in the same address space as
memory devices. Many processors, such as the I n tel x86, provide both port 1/0
and MM IO. When devices are memory mapped, care must be taken to ensure that
Scanned by CamScanner
1 40
Systems
Real-Time Embedded Compone nts and
e d
. rat h er tha n bu ffered and the addre ss range is a Jlowed to be acce-'"I;
to t he d evice
o ften i t 1sn t nec
and
cached,
be
can
ons
.
locati
essa
wit
. hout exception. Memo ry
. actu a l mem o de.v1ces
m
a
ated
upd
fte
r Wri terys
Stereo-Vision
Sensor
NTSC
Camera Data
Acquisition
Camera
Assembly
Data 10 Bus
FIGURE LJ
Encoder 10
tracking system .
Scanned by CamScanner
OMA Engine
... .
141
standard televisio n monitor . The interlaci ng of horizon tal scan lines, that is, trac
ing odd lines followed by even line tracing each frame, reduces flicker at the NTSC
frame rate of approxim ately 30 fps ( 29.97 actual ). The AFE PLL synchron izes with
the scan line by detecting the NTSC sync and blanking levels and then programs
the ADCs to sample the signal for each pixel to encode YCrCb. The YCrCb data is
latched into an internal FIFO memory, and a synchronized DMA engine drains the
FIFO with PCI bus transfers from the encoder to host system memory. The YCrCb
format for NTSC was chosen so that grayscale televisions can display color NTSC
signals by simply using the luminance portion of the signal alone. Most encoders
support automatic conversion from YCrCb into alpha-RGB using linear scaling
formulas based upon characteristics of human vision:
R = Y + Cr
G = Y - 0.5 1 x Cr - 0 . 1 86 x Cb
B = Y + Cb
The OMA engine is a simple procssor with an RISC instruction set that pro
vides control over the encoding as well as the PCI OMA transfer and generation of
host interrupts. So, for example, using the Bt 878, microcode can be written to en
code the 525 NTSC input even and odd lines with 427 pixels each into a range of
formats (e.g., 320 x 240 alpha-RGB or 80 x 60 grayscale) based upon the A DC
sampling rates with instructions to transfer the encoded data and to generate an
interrupt at the completion of each frame encoded and transferred over PCI. The
320 x 240 alpha-RGB frames ( 307,200 bytes/frame) are transferred by the OMA
engine using multiple PCI bus bursts, typically 5 1 2 bytes to 4 KB each. This is typ
ical of a high-rate block transfer 1/0 interface for a high data rate sensor. In this ex
ample, two encoders are bursting video data on the PCI bus simultaneously to two
different DMA buffers in two different host memory address ranges.
The stereo-vision tracker also incorporates a low-rate actuation 1/0 interface
to enable the system to tilt and pan the stereo-vision sensor (cameras and baseline
mount) to follow a bright target that may be moving to keep the target in both
camera fields of view. Figure 8.4 describes the components making up this low
rate actuation subsystem.
Scanned by CamScanner
1 41
Tilt/Pan
Actuation
Servo
Controller
Servos
Serial
Command
Servo PWM
IF
10
FIGURE 1.4
The tilt/pan actuation subsystem uses two servos to p rovide t he tilt and pan
rotational degrees of freedom. The servos are commanded with a TTL PWM signal
( Servo PWM 1/0) generated by a microcontroller ( Servo Controller). The specific
signal generated, and thus position of the servo, can be comm anded by a micro
processor through a m ultidrop serial in terface ( Serial Com ma n d I F ) . The serial
command i n terface is simple and only allows the m icrop rocessor to write ou t
command data l byte at a t ime. This is a typical low-rate i nterface.
Processor Coplex
Almost all modern real-time embedded systems i nclude a general -purpose CPU to
process firmware/software to provide updatable and flexible services by processing
and linking sensor i nputs to actuato r outputs . I f the services that a real-ti me em
bedded system m ust provide are so well known that they can be fully comm itted to
a hardware state machi ne, then perhap s a p rocess or comp lex
( o r set of inte rcon
nected CPUs ) is not needed. Most often , servic es are expec
ted t o change over time,
are not well enou gh speci fied initia lly, or are too
comp lex to consi de r hardware
only im plem entat ions. The proc essor com plex
may be c o mpo sed of the following:
.
v1de s a plat form for ima ge pro cess i ng
to com put e the cen t roid of the target o bi('\
as seen by the left and righ t cam eras and
enc ode d usin g the Bt 878 PC I -b us NTS
P.r
Scanned by CamScanner
1 4J
encoder subsystem. The servo control is achieved using the Servo Controller, a M i
crochip P I C , which comma nds m ultiple servos to tilt/pan the camera assembly
using TTL logic level PWM based upon a serial byte stream command to the con
troller. Figure 8.5 shows the subsystems ( Servo Control and I mage P_rocessing)
that compose the overall stereo-vision system processing to provide the tracking
and ranging services. The Servo Control subsystem uses a digital control law based
upon calculated centroid inputs to tilt/pan the stereo-vision sensors in real time to
keep the target in the field of view and produces a series of servo commands as out
put. The I mage Processing subsystem uses alpha-RGB video fra mes at
maxim u m
rate o f 30 fp s t o compute the centroid of the target a s seen b y each camera a n d the
range to the target based upon a triangulation calculation.
Processing
{x86, PCl-10)
Servo Control
Image
Processing
Servo
Command
FIGURE 1.5
Stereo Range
Estimation
Image
Segmentation
Target
Centroid
Scanned by CamScanner
1u
and System s
Real-Time Embedded Compon ents
ork
Interco n nection Netw
Dvn amic
c rd
B us
Bi
king
Omega
FIGURE LI
.
Poin t-to-Po int
Non -blocking
Fully
Connett
td
Ring
ion st rategi es
Taxo nom y of processo r-1/0 intercon nect
.
Most embedded system s are m tegra t ed w i t h e i t h e r a s cal able bus archi tecture
l i ke PCI , point-to -point serial l i n ks, or network s.
Bus
lterconaedlon
Many different bus architectures have been used and re b e i n g used for embdded
.
systems. To better understand i n t egrat ion of p rocess i n g a n d 1 / 0 wit
a bus inter
connection, this chapter exami nes t h e V M E ( Versa M od ul e Extension bu a d
n
the PCI ( Peripheral Component I n tercon nect ) bus. The V M E bus has h1stoncally
been a popular and simple bus arh itec t u re used w i t h embedd ed systems; by on
tras t , PCI is an emerge nt b us architec t u re t ha t offers traditio nal paralle l bus inte
gration as well as high -speed serial i n terco n nectio n w i t h PCI
Exp ress.
The PCI bus, i n t rod uced as a replac e m e n t fo r t h
e ISA ( I ndus try Stand ard
Arch itect ure) bus prev alent i n the desk top PC
do main , has evol ved and become
a pop ular l/0- to-p rocessor com plex i nteg
ratio n met hod i n emb edd ed syste ms. A
goal of the first PCI stan dar d, 2. 1 , was to
pro vide a bus wh ere 1/0 ada pte cou d be
rs
l
in terfaced to processor complexes
with p l ug- and - p l ay i nte gratio n. P
l ug and play
provides a stan dar d for PC I con tro
ller s ( ma ste rs) to pro be the
bu s and find devices
afte r the y hav e been added wit h
ut any mod ifi c a t i o n to
the ma ste r int erface. The
PCJ bu s was des ign ed to i nte gr.a te
wit h t h e l e gac y ISA bu s thr
o ugh a n interface
cal led the So uth Bu s. The ma in
PCI co ntr oll er int erface
was cal led the North Bus.
A t t h e tim e tha t PC I was fi rst i n t
rod uc ed , ma ny em be
dd ed system s were itegrated
ing t h e V M E bu s. B ild ing a real -ti me em bedd
ed sys tem ba sed u po n bus in tegra
t 1 o n a l low yst em de sig ne rs to de co
mpose the sys t e m
int o su bs ys tem s wit h int er
face an d pr oc e or boards t ha
t ca n be de sig ne d,
bu ilt, an d tested as u nit s a nd la
ter
i n tegrat e d on a sta ndard i n ter co n n
ection . Bo t h V M E a
nd PC I p rov ide th is mod u
lar ity co mpared to c u tom bac kp lan e
or o n - boa rd int e g
ra t io n . Th e bu s int egra ti o n
Scanned by CamScanner
1 45
also provi es a fast signal interface compared to packet or frame transmission net
works (this has started to change recently as yo u ll see later ). Table 8. 1 briefly sum
marizes and compares features of VME-32 and PCI 2.x.
TABLE 1.1
'
Feltlln
Bus Transfers
Target Addressing
Data Transfer
Data Transfer Types
VME ...
Asynchronous 20 MHz
Multiplexed 32/64-bit
Address/Data Bus
Address Bus
Da isy-Chained Prio
Mechanism
Interru pts
Interrupt Vectoring
No Arbitration, Firmware
(masters)
Device addressing
Custom-designed MMIO
Faster options
Scanned by CamScanner
VME-64+
1 41
pone
Real-Time Embedded Com
n b Us
. suc cessful and a popular integratio
.
it
e
mad
e
av
h
t
a
th
ffic1e
. nt wit h bloc
k
The features of PCI
' h rate and is most e
ig
h
e
om
bec
has
110
t
mos
e
two
Th
(
tion.
sfer
m
ura
ost
are burst tran
d- lay confi g
and pl
ion,
trat
arbi
mp
Co
act
are
s
t-in
m
P
te
CI
transfer), buil
e bedded sys
for realrs
facto
rm
form
fo
ar
fa
used
bo
tor) .
comm o nly
ckabl e sm all
PC/ 1 04 + ( a sta
d
an
r)
to
ec
nn
co
on
e
on
s
cti
an
mglc
ne
on
(D -shell backpl
a c hipset int erc
p ar
po
ry
ve
en
be
o
als
s
I as a chi p int ercon
The PCI bus ha
p pularity o f PC
t
r
o
on
as
re
ms. One
board embedded syste
ich allow for bu s-to - bus PCI
l
n d g s ' wh
b
C
P
of
ion
d finit
nection on board ts the e
Hz bu s eye1 es and up to 64ovt. des 32 - bit 33 _M
pr
ard
nd
sta
2.x
I
PC
integrat ion . The
transaction proto
ams o f 32-bit alphawh t' ch . tran sfers 2 stre
e,
mp
exa
n
s1o
-v1
reo
ste
.
second. For the
bytes per sec.
ond, it requires 1 8 '4 3 2 ,000.
sec
per
es
tim
30
s
me
fra
RGB 320 x 240
u
h
ass
(
ming
idt
dw
ban
ive
byt
lion
mil
ct1v
100
is
idth
dw
e ban
e11e
2 6 ,90 1 000 bytes per second' or
.
ld requue
coding ( 525 x 427 ) video stream wou
.
about 27% of PCI 2 . 1 effective bandwidth.
o-vi sion examp e is the Bt878,
stere
the
in
ding
enco
o
vide
for
used
set
The chip
works by fetch MA RISC
now also updated as the Cirrus Stream Machine, and
transfers are 1mt1ated by the
engine code from host memory, so some additional PCI
ry. The stereo
chip to fetch code as well as transfer of frame data to the host memo
have had
vision systems implemented at the University of Colorado using PCI 2 . 1
no problem making use of PCI 2 . 1 for this applicatio n with a 320 x 240 30 fps
band
alpha-RGB encoding. Many embedded applications have more than enough
width available from PCI 2.x. One final important feature of PCI .i s that initiators can
=; i:
r
\
l
ad-
configure targets for a maximum and minimum burst length. The minimum serves
,
as a method to reduce overhead so that targets can t transfer small blocks that
incur high overhead for each bus arbitration and address cycle compared to fewer
larger block transfers. The maximum prevents a target from overusing the bus and
would
Scanned by CamScanner
1 47
split
transactions.
bus decreases perfo r ma n c e for a l l target and i n i t iators on the bus. Split t ra n sac
tions el i m inate this delay by provi ding bu ffer queue for writes to t h e bus so that
they can be pos ted by a n i n i t iator and drai ned to a slow ta rget over t he bu when
it's ready . The i n i t iator is not delayed in t h i case as long as the write bu ffer queue
i s not exhausted before t h e data is drai ned to the target. Likewi e, on read , split
t ra nsac t i o n a l lows the i n i t iator to post a read request to the bu , which i n i t i ates the
target read, a n d if the t a rget is slow, a l l ows t he ta rget to negot iate for complet ion in
a later t ra n sact ion-the bu
PC I - X a nd PC I - Ex p res are spl i t - t ransaction bus standa rds-t his grea tly i m prove
t h e effective bandwidt h because it does not a l low the bus to be held for arbi trarily
long delay per iods. H owever, t here i s , of c o urse, still arbit ration and addressing
overhead.
U n iver a l erial Bu
Fi rewire
P
I - E x p re
igabit E t h ernet
an be u ed f o r rea l - t i m e em bedded
u r eria l/ netwo rk i n terc n n e t i on
n. A fu l l di u 1 n
tern a nd pro ' ide a n a t t ract i e ltern at1 e t hu int 1ratio
e 1 t h i t t.
f al l t h e h igh - rat e . criJ I pro to ol'I i be ond t h
All
Scanned by CamScanner
1 41
ms
Real- Time Embedded Components and Syste
n. Each
to max1m1ze
the bandwi dth per pm on t t te . m te rcon n e. ct1o
( PC I - Express ) 1s
PCl - E b yte 1 ane 1s
fu1 1 d up lex , allowin g concur rent tra n s m i t and receive at 2 . 5 G.
Given these character istics, we can compare t he PC I 2 . x , PCl - X , a n d PC l - E stan
PCI - E clearl r has the advantage from the per pect i ve of i n t e rco n n ec t ion layout
and cab l i ng over PCI 2.x and PCl-X. The compl ica t i o n i t h a t e r i a l byte la ne trans
m ission req u i res sign ificant d igital signal proce i ng on ea h byte lane. L i ke fiber
channel and gigabit Ethernet , PCl - E uses an 8b/ l 0b encod i n g s heme with a link
layer and network layered archi tect u re to ach ieve 2.5
t ra n fer rates. G i ven the de
mands of gigabit transp ort, the cost of t h is d igital ignal p rocess
i n g an d network
stack i mplem entati on has actual ly becom e more fea ible
t h a n t h e cost of t radit ional
bus high- speed layou t. F u rther more , t he a b i l i ty.
to u e h igh - peed erial i ntercon
ne t ion on-c hip, on-board , and off-b oard make
t hese t a n d a rds m o re a t t ractive
than trad i t iona l bus arch i tec t u res. Fina l ly,
PCl - E has been desi gned t o be com p a ti
ble with PCI 2.x and PC I - X from the firm
war e view po i n t , de pite a rad ical l y differ
ent data tran spo rt met hod . The PCI
- E stan dard sup por ts t he a me
pl u g-and - pl ay
con figu ratio n, bur st tran sfer s, in t e r
ru p t s , and all bas ic feat u res of
PCI 2 . x.
The PC I - E i nte rco nne ctio n pro
vid e byt e lan e i n terc on nec t
i o n w i t h a n etwor k
IJyered arc h i tect u re as sho wn
i n F ig u re 8.7 .
The P 1-E tan dar d no t on
ly p rov ide s ign i fica n t l y m
o re b a n d wi dth co mp ared
to P I 2' x an d P I-X , i t al
o
.
pro vid e som e fea t u re to
s up po rt rea l - t i me co n tinu
.
ou me dia wit
h iso chr on al chan ne ls. Th
e i soc hro na l c h a n ne ls
p ro vid e ba nd wid t h
a d l ten y pe r for ma nc e
gu ara ntee for t ra ns po
rt. Th e a d v e n t of p 1 - E , U B,
Fi rew ire, an d 1. ?a b i t Eth ern
et ha s pro vid ed an
a l ter na t ive i n ter co n ne
ct io n ar hitec tur e fi r rea l - t i me em be dd
ed
tern Th e e ne w h 'gh
.
1
peed ena I i n t e rco n n\."C , ! wi l
t i n l ike
l be de 1gn ed rnt o ma
ny fu t u re rea l- t i m e
em bed de d y tern .
Scanned by CamScanner
1 41
Link
ec:: : ::::ll!!I
Phy ical
FIGURE 8.7
be interfaced to the higher volt age RS232 serial in terface. The servos i n the stereo
vision application can t i lt/pan t h rough wide a ngles qu ickly, but o n l y have an accu
racy of several degrees,
real - t i me embedded
low- rate and debug i n terface. The R 232 l i n k normally top out around I I .:WO
bit I ec nd ( abou t 1 2 K B/ ec ) and i not capable of lo ng-di tance tran m i ion due
to l i ne n i-.e at the 1 2 - volt
pro ide i m i l . r I
RS422 : A d 1 ff rcn t i a l
Scanned by CamScanner
it u
l km
igna l i n g level
ther
pt ion
have
lved t h Jt
e, m u lt idrop.
'
/-
t rn ;
1 SO
and Systems
Real-Time Embedded Com ponents
din g
Mu ltidrop RS 23 2, RS 42 2: Ad
l ink wit h c a pab il i t y to forward
ltercoanedlon Systems
ject , might include a microcon troller to provide cont r o l for each joint motor. The
5
t Crocontrollers,
3;.
.
e.g., Rodney Brook!>' ub u m p t 1on
arch i t ect u re
.
The a l tern at ive to the h ie r .a
.
re h I' a 1 111t
r"nr
. erc n n e l ion
is a cent ralized proc r
.
?
c o m ple x i n tegrated on a hi11h
o - rat e lllt er
"t'
o n n.e t 1 o 11 su h a PCI
, wi th m a n )' 1 /0 d t''1"
i n ter face al o i n tegrat ed on p( 1 T h i s
, ar h 1tec t u
ro
re h a a n advan tage in t ha t aU P
Scanned by CamScanner
1 51
Meory S1bsystems
Real-ti me e mbedd ed system s require nonvol atile data storage to
boot the system
and to start services . After a power-o n reset, the processors in the processo com
r
plex each vector to a hardware -defined starting address to execute code. This start
ing address, typically a high address such as OxFFEO_OOOO, is designed to map a
nonvolatile storage device, such as EEPROM or Flash memory, so that boot code
can be stored permanently at this address and executed following a reset to initial
ize the system. The boot code init ializes all basic interfaces and normally loads a
basic RTOS so that application services can be loaded and run. The code (often
called text segment ) , data ( i nitialized, uninitialized, and read only ) , heap, and
stack segments must be created in a working memory by firmware. Figure 8.8
shows a typical memory map for an embedded system.
O xFFFF_FF FF
ROM (Flsh)
OxFFFO 0000 .__. -----r:;
OxFf"E( FFFF
40 1 5 M Bytes unused
Ox0 5 0 0 0000
Ox04 FF=F FFF
Ox 04 00 0000
O x O FF= FFF F
--- -----....,
MMIO
32 MBytes unused
(space left for memory upgradM)
o c 1.-------
'x0 LO
" x O . F() "Ff f
Main Working Memory for OS/Apps
(e.. g. 32 M Bytes SRAM SDRAM, DOR)
O xO
FIGUIE U
Scanned by CamScanner
' '
.. _...,
. :..._
. .,....._
........,.
system.
Common mem ory map for an embedded
.z)!Wr" tt t'M
...
1 52
pon ents
bed ded Co m
Real-Time E m
and Syste ms
th e viewpoint f
emo ry from
m
of
w
mo
ie
a lo gical v
ess devices . F ro
lly
c
ac
rea
a
n
is
c
ap
re
softw a
The memo ry m
m wa re and
fir
y of storage devic
ch
whi
as a h ierarch
through
ed
ce
rib
spa
ess
esc
r
add
is better d
i nt, m em ory
wpo
vie
e
war
hard
e c on t rol)
d fo r dev ic
pe
ap
m
ry
o
and mem
Registe rs (CPU
Cache
ory
W orking Mem
ory
Extended Mem
hie ra rch
ypical me mo ry
a
t
ws
sho
8.9
Figure
l1
Instruction Cach e
L1
Unifie d Cach e
..
'
C11,1h\''1
Oali!
Data
FIGURE 1.9
m.
Commo n physical mem ory h"ierarc hy for an embe dded syste
mple
Some co mponents ca n only b
e a I ' ed in hardware, b u t m a n y can be i
_
iz
tly to
fi
re ( o ng that soft w a re inte rfac ing d irec
m ented with s o ftware o r rm
ded
hardware is typically called fi r m wa e . urthermore, i f the real - t i m e e m bed
r
syst em ha s any software-based' se vices or feven J U S t m a n age m en t , fi r m wa e is
need ed to inte rface h ar d wa re res our ces to s0 t wa r e application s.
::
/
Scanned by CamScanner
1 SJ
...t Code
The universal definition of firmware is code or software that runs out of a non
volatile device to make hardware resources available for the rest of the appl ication
software. Fi rmware providing this function i s normally referred to i n general as
initialized and m ade available all on-board resources for a processor complex to
softwa re applications. Before the resources have been fully init ialized, th fi rmware
boots the boar<l b y executing code out of a nonvolatile device so that one or more
basic interfaces is made operable and the system can now download additional ap
plication software. For example, the BSP boot firmware might i n itialize an Ether
net interface and provide TFTP download of application code for execution.
Device Drivers
Device interface d r ivers are most often considered fi rmware because they directly
interface to hardware resources and m ake those resources available to h igher
level software applications. The architecture of a device driver in terface is
depicted i n Figure 8. 1 0 and incl udes both a HW bottom half i nterface and a SW
top half interface.
Appl ication( s)
ope clo
e.
read/wnte.
creat. t0cll
Status
I f Output
If I nput
j SemTake
or EAGAI
l Proce
{ SemTake
or
EAGAIN }
else
else
and Return }
:f
.,
I nput
SemG1ve
Ring-Buffer
..
r R
I lard
FIGUllE 1. 1 1
Scanned by CamScanner
ore
()e, 1
1M
sage queues)
Basic 1/0 for system debug and bring-up (e.g., serial, Ethernet, LED)
debug of a system:
An RTOS does not need to provide the same wealth of system servi ces as a full
m ul t i user operat ing system such as Linux. The understandin g is that the
bi Jity
user pre fers simple r mechanisms and is willing to take on more respon si
tn
n t . that
a
rt
po
m
i
the application to explicitly manage resources. For this reason, it is
d
progra mmers working with an RTOS code with extra precauti on to avo i urnt
rtan t
wasting debug sessions. Some good coding pra t ices that are even more impo
RTS
Scanned by CamScanner
I 55
standard RTOS mech an isms are designed to make mutithreaded real - t ime appli
cations s i m pler. M u l t iple t_h reads need methods for sharing resources, sharing
data, syn c h ro n i z i ng, keeping track of t i me, and receiving notification of asynchronous events. The C D - ROM includes numerous examples of usin g basic mecha
n isms i n the VxWorks RTOS.
Messa1e Queues
The VxWor ks RTOS support s both POSI X and native message queue mecha
pass data between
nisms. Messag e q ueues a re used to synchr onize tasks and to
safe (a partial write or read
them. The e n q ueue and deque ue operat ions are t h read
is not possible ).
and writte n i n block ing or nonBoth types of messa ge queue s can be read
does a read on an em ty
blocking mod es. For block ing mode s, when a task
eued by anot her task allow i n g it
sage q ueue , it block s until a message i s enq
.
t nes to wnte a full mes age queu e 1.n
read and con t i n ue. Li kewi se whe n a ta k
e has oom for the ne\\' m e ae. I t LS
bloc k ing mod e, it is bloc ked u n t i l the queu
than u m g WA I T _FOR E VE R so t hat mddi
wise to et a t i meo ut upper bou nd rath er
in yn h ro n izti on a n d re our e mJn age
nite blo k i ng w i l l not o c ur and erro rs
er t han allu re to mJk e pr gre
ment will be det e red t h rou gh t i meo ut rath
':1e
o
Scanned by CamScanner
1 51
ems
Real -Tim e Embedded Com pon ents and Syst
e a t s k doe s a rea d on an em pty que ue,
For n o n b l oc k i n g m essa ge q ue ues , wh
.
d icat m g t a t n ? mes sages we re avai l .
m
,
N
I
A
EAG
e
cod
r
erro
e
h
t
rned
i t wi l l be ret u
late r. L1ke w1s e, for a non bl oc ki ng
a
. bl e t o rea d a n d that the task sho uld t ry aga i n
N error co d e 1 f 1 t tn es to wr ite to a
J
A
AG
E
e
sam
e
h
t
get
l
l
"
wi
k
tas
m essage queue, a
.
n later, perh aps allow i n g a
i
aga
ry
t
ld
shou
er
writ
e
h
fu ll q ueue . T h i s i n d ica tes that t
read of that queu e to create space .
to enqu eue mess ages w ith a priThe POS I X mess age queu es i nc l u d e a fea t u re
_
are deq ueue d. H ighe r p riority
ority level and to read t h e prior ity w he n messages
ON THE CO
be cal led
in I S R
in
task
the
(On t e x t .
Binary Semaphores
The b i n a ry se m a p ho re i s t he s i m p l es t a n d
RTOS. The
h a nd l i n g
semGive
Th
---
<
semTake (
R O ,\ I p ro v i d es a n
ON THC C O
i o n . F i n a l ly i n a e w h e re m ul ti
p ie ta k m a y be b l o ked a n d i f a l l b l o k i n g t a k s h o l d he e ea sed t t h
u
a
r l
e 101t>
t i m e, t h e s e m F l u s h ( ) fu nct i o n p rov ide t h i fea t u
re.
- -
. -
- - .. ..
Scanned by CamScanner
- ..
--
.L
- J
1 57
.
protoco 1 s can be specified :
PR IOR ITY :
F I FO:
ai
interrupt ( I RQO on the x86 archi tecture ) updates virtual ti mers, which keep track
of the i nterrupt count ( cal led ticks in VxWorks ) . All t a s kDelay ( ) calls, t imeouts
specified, and calls to t i c kGet { ) are driven by the PIT i nterrupt rate. If the. basic
rate is set to 1 m il l i second with sysCl kRat eSet ( 1 oo o ) , then t i m i ng accuracy for de
lays and t imeouts will be within 2 mill iseconds. It is possible to have just missed a
tick expirat ion when a t i mer is first set and also possible to overrun at least one tick
before a t imeout handler is i nvoked. The PIT can be set to raise ti mer interrupts
more frequently than 1 OOOx per second, but this starts to require significant CPU
time to maintain virt ual time at h igh rates. The CD-ROM includes three examples
to demonstrate the use of virt ua l ti me, i ncluding i t imer_t e s t . c, posix_c l o c k c,
S.ltware Sl1nals
t. They
Software signa l s can be thou ght of as the software equiv alent of an int rrup
ta k con
a re the m ain mec han ism p rovi ding asyn chro nou s hand ling of events m s
by ISRs . The
text. A task can set itsel f up to catc h sign als thrown by othe r tasks or _
.
POSI X rea l C D - ROM mcl udes r t - s i g n a l -t e s t . c , w h 1' ch d emon st ra tes the use of
.
ts loss of s1g ven
pre
'
ch
1
I
h
w
s,
s1gna
tt. me sign
most
.
. als. These signals queue, unl ike
.
the process o f han d i .mg a
.
m
I
tas k is
'
n a s 1 f a signal is th rown while t he catc h mg
previousl y throw n signal .
Scanned by CamScanner
1 51
Application Services
Application services
SEM_ I D s y
n c h_s em
n t ab o r
t_t e s
int ta k e
_c n t
in t give
- cn t
vo i d t a s k_
a(v
t = FA LS
E;
=
O;
0 J.
o id )
int cnt = o
J
wh i l e ( J a b o r t
_t e s
t)
t a s k O e l a y ( 1 00
0) ;
f o r ( c n t =O ; c n t
Scanned by CamScanner
<
1 00 00 000
,. c n t + + ) ;
ponents
1 51
s inG iv e ( s y n c h_sem ) ;
g ive-c n t + + '
void t a s k_b ( vo i d )
int c n t = o J
while ( ! a b o r t _t e s t )
<
t a k e -c n t + + '
1 0000 000 , c n t + + ) .
'
t a s kD e l a y ( 1 000 ) ;
void t e s t_t a s k s ( vo id )
s y sC l k R a t e S e t ( 1 000 ) ;
I * re c e i v e r r u n s at a h i. g h e r p r i o r i t y than t he sender * /
10,
o , 0 ) == E R R O R )
e l se
p r in t f ( " Ta s k A t a s k s p awne d \ n " ) ;
.
,
if ( t a s kS p awn ( " t a s k b " , 1 1 , O , 4000 , t a s k_b
0, o, o, o, o, o, o, o,
O , 0 ) = = ERROR )
\n" ) ;
p r i n t f ( " Ta s k B t a s k s p awn f a i l ed
e l se
t a s k _ a ( ) and
k B wi th en try po int
The tes t_ t a s k s funct ion s paw ns tas k A an d tas
riority 1 1 . Th e
her tha n Ta k B p
hig
is
t a s1< b ( l
ich
wh
O,
I
y
Task A is ass ig ned p rio rit
wil l prtt pt
sk A is spawned. i t
Ta
en
)
Wh
=
1
(
l
h r
i n itially set ful
op and give
p o e synch _ seM is
en exec ute a lo
th
nd
)
.
a
;;
Scanned by CamScanner
,.
1 60
s
on ents an System
Re al-T ime Em bed de d Co mp
,
o f syn ch sem WI 11 be emp ty ( =O
h
e
n
wit
its
o
d
d
s
e
t
1
I
a
er
y
eth
wh
a
n
B
d
pt
m
ree
b
p
11
.
.
have come out o f. de1 ay an d WI
. usy .
t a I tern at1o n will
.
stnc
this
and
.
sem,
h
sync
con
give
l
wil
.
A
.
.
or not-at t h is pom t T ask
. cnt1cal
ts
y=O)
empt
r
as
o
w
1
=
.
l
l
u
ell
f
(
sem
a
h
s
the
1 con d 1't '10n of sync .
tmu e. The m 1tia
s
e
task
two
h
t
to
for
ible
de
poss
a
is
It
dl
0c k 1r
.
o f Task A and Task B
n t 1es
pno
rel at1ve
.
. ,
nate
s
lter
as
a
way
l
s
a
it
p rovided
The exa m ple will
.
the initi al con ditio ns are diffe rent .
yield the CPU with a task dela y call nd was as.
here, but if Task A, the giver, did not
B wou ld n ever r u beca use 1t ca n 't pre
signed higher prior ity, it's poss ible that Task .
_
_
t deadJ k either because circ
ular
empt Task A. This system as presented here w1 I
s.
ition
cond
wait is not possible given the prior ities and t he 1mt 1a l
.
new
thread without side effects that would ca use either t h read t o s u ffer functional bugs.
So, reentrant code m ust carefully handle global resou rces a n d p rotect them
so that
they are mutually exclusively used b y m u l t i p l e t h reads. T h e fol lowing are the four
main methods to ensure that global data i s either protected o r converted into task
&>0
e C
Scanned by CamScanner
.-
I.:_
. .
_. _
_ __
t.,
fn
C, l
'
I &I
variables and parameters are maintai ned in stack memory. Every VxWorks task
must speci
claration of la rge arrays as C locals can i n t roduce bugs. However, stack only vari
ables i n functions i m plements a pure function that is thread safe.
8. l l ). The message queue provides a buffer shared by two tasks a n d allows each task
to atomically enqueue or deq ueu e
FIGURE 8. 1 1
A crit ical section simply requires that other tasks ( services) are not able to pre
empt S1 or S2 while they are in the middle of copying a message from their local
memory i n to the global message queue buffer. The underlying i mplementation can
use mutex semaphores to prevent more than one task from entering the critical sec
tion. O t her methods that can be used include t a s k L o c k , t a s k U n l o c k , whereby t he
scheduler is actually d isabled around the critical section ( no preemption is possible
at all ! ) . Finally, i t ' s also possible to use i n t L o c k , i n t U n l o c k to ma k out i nterrupts
enti rely in a critical section-re call t h a t preemption occur t h ro u gh an i nterrupt or
an RTOS system call (a y ie ld i n side a c r i t ical ect io n i an erro r ) . The t a s k L o c k
app roach works, b u t prevent a l l ched u l i ng d u ring t h e critical ect ion and there
Scanned by CamScanner
1 &2
pJe me n tatio n ,
be con cer ned wit h t h e i m
b ut
que ue, however, doe s not need to
b
.
pta
)
e
]
em
pre
Th e i m ple.
are ato mic ( n n
.
rath er tha t the enq ueu e and deq ueu e
ide ally not d isab le sch edu l mg and shou ld
me nta tion of the message que ue sho uld
prevent unb ounded prio rity inve rsio n.
.
ant ics for the enq ueue a nd
sem
ple
sim
has
,
ted
crea
e
onc
The message que ue,
NO N BLO C K I N G . The mes sage qu e ue
deq ueue that can be eith er BLOC K I NG or
queu e lengt h , a n d BLOC KI N G or
is created with a fixed message.s ize, a maxi m u m
is c reate d B L OC K I NG , writers to a
NON BLO CKIN G sema ntics . Whe n the queu e
a mess age. Likew ise, readers
ful l queu e will be block ed when they try to enqu eue
an empty queue. With
will be blocked when they t ry to deque ue a messa ge from
ueue will sim ply
the alterna tive NONB LOCK ING seman tics, the writer to a full q
be returne d an error code, EAGA IN, indicat ing that the enque ue was not success
ful. Likewise, the reader of an empty queue will be returned EAGAIN indicating
that there was nothing to dequeue . A service using the message q ueue i n a NON
BLOCKING mode may want to do other work and attempt to enqueue or dequeue
again at a later time. The BLOCKI NG semant ics are used when the service has
nothing else worthwhile to do if it can't enqueue or dequeue s u ccessfully.
One downside of message queues is that t hey require a copy of the S I local
buffer in the global message queue bu ffer-thi s is not so efficient . A variant use of
message queues that improves efficiency is the heap message q ueue. I n this case,
pointers are sent as messages rather than data . The poi n ters are set to point to a
buffer allocated by the sender and the pointer received i s u sed to access and
process the buffer-the receiver normally frees the b u ffer . It is key that the sender
allocate the buffer and the receiver deallocate to avoid exh a ustion of the associated
buffer heap ( a pool of reusab e buffers ) . Figure 8. 1 2 shows a heap message queue.
The heap message queue avoids the copy otherwise required , which takes consid
erable CPU time and wastes memory due to double-bufferi n g of the same data.
S 1 Allocates Message
Buffer i n Message Heap
S2 Processes the
Message Buffer
Data and Frees the Buffer
for Re-use
RGURE 1.1 J
Scanned by CamScanner
1 IJ
Because t he buffe r heap associated with the message queue is typically much larger
than the quee depth ( si ze ) , normally the queue will become filled with pointers
and block wnters before the heap is exhausted. As long as the sender always allo
cates heap and the receiver always deallocates, the heap message queue works
safely and m u c h more efficiently.
SUM MARY
A real-time embedded system is composed of hardware, fi rmware, and software
components. Services can be i m plemented in hardware, firmware, or software, or
some com bination of the th ree. Component design should be completed so that
components can be tested as i ndividual units and then i ntegrated into a larger sys
tem design.
EXERCI S E S
I . Read the following chapters i n PC! System A rchitecture by Anderson and
Shan ley: 1 , 2, 1 7, 1 8 , and 1 9. This should give you a good overview of the
PCI design a n d how to write code to probe for PCI devices and configure
them. You'll also fi n d Sections 3. 1 to 3 . 3 and Section 3.9 of Chapter 3 from
the VxWorks Programmer's Guide useful.
2 . W rite a VxWorks ISR that calls t ic kAnnounce every tick and does a semGive
on a global b i n a ry semaphore every N ticks of the system clock, where N is
a global variable so that the semG i v e frequency is adjustable via the shell.
Now, write a VxWorks task that does a s emTake and i m mediately updates a
global counter to record virtual t icks. Write a driver program to test both
the ISR and task and provide output, which provides evidence that you got
i t worki ng. So, fo r exam ple, if you adjust N to 1 000, on our system, your
count should increase by one every second. Note that you are replacing the
VxWorks virt ua l timer I S R, so you must call t i ckAnnounce so the kernel
can st i l l keep track of time!
3. Now write an abstracted top-half for yo u r driver which incl udes all the
standard d river entry points ( open, read, write, and close) and blocks a caJ l
i n g task on a read until N t icks have elapsed, a t which t ime, i t stuffs t h e read
t i me structure, i n c l ud i n g second and m i l l i econds. V\' rite a
a
test d river th at opens the abstracte d device and read fro m it i n
'' e t h
p r i n t i n g the t i me in seco nds and m i l l i second s-pr ovide evide
.
v "tua l tim e
work ing. A write to your driver hou ld al low for a reset of th
it to zero
value main taine d by your driver ( i .e., writ i n g anyth ing reset
bu ffer with
'?.P
Scanned by CamScanner
1M
devi c es
on
your co d e t o fi nd t h e c
PCI bus o and dete rmin es vend or ID. Use
i rrus
.
' dge ( N B ) c h ip
h
n
B
Nort
ntel
I
the
and
-s
e
adapters
t i n th
logic PCI video
works
ese
h
code
e
d
r
you
vi
ces will
lab and provide evidence that
add1t1ona
l devi ces as W
found on every target-some targets may have
ell
.
5. Write a PCI probing function to determi n e the c o n figura tio n of the
including latency t i ing and the arbitration control. F i na lly, determ
in
whether the N B will allow memory access by m asters other tha n the m .
a1 n
.
CPU. Demonstrate that your probe works and descnbe your probe
put. Finally, describe how the N B/SB i nterface i n PC! allows for s hare
In
terrupts ( P<;:i and I RQ).
b:
N
;ut.
Scanned by CamScanner
In This Chapter
I nt roduction
Excep t i o n s
Assert
S ingl e - Step Deb u gg i n g
K e r n e l Sched u l er Traces
Trace Ports
External Test Equipment
Applicat i o n - Level Debugging
INTROD U C T I O N
Debug ideal ly i ncludes hardware a n d software support i n t h e fo r m of m o n i t o r i n g
a n d con t rol so that e r r a n t c o n d i t i o n s c a n be reproduced and analyzed s o correc t i ve
ac t ion can be taken to elim i nate bugs. Correction
oftware or
ware a n d h a rdware me h, n i m
viewed . I t would be i m po
ihl t
both bu i l t - i n t
i n l udc a
th
h ap t er ,
n ept
m m n d e bu g oft -
ne
ha p t e r .
the g al of thi
haptcr i4\ to J rm you \ ith k nowlc:dge o t ha t at the very least ,
you 'l l know wh;:t q ut: t ion to a k an t h w to fu rther reear h dcbu J m t hod .
Scanned by CamScanner
1 15
1 11
Systems
Real-Time Embedded Components and
S -N
IO
-- --------------------EXC
--..
EPT
------------------ --
s.
program asserts to isclat e erran t funct ios a n d code block
i
One of the simple st and most comm on m stakes that can cause almost every
i
microprocessor to genera te an except ion is to divide -by-ze ro. D vi sion by zero
causes an overflow and an u ndefi ned arithmet ic result. I n VxWork s, the ker nel in
cludes default exception handling, which suspends the task that caused the excep
tion and pri n ts out debug i n for m a t i o n to assist w i t h l oc a t i ng the cause of the
exception.
For example, the followi ng code will cause t he d ivide-by- zero exception on the
VxWorks simulator and if compiled and run on L i n ux . Nobody would ever write
code t i ?bviously wrong; however, if the denomin a tor is com puted and da ta dri
ven, d1v1s10 n by zero 1 ght not be so obvious. For the purpose o f understandi ng
.
how except10ns
work, 1 t s also one of the easiest to force on all processors.
{
}
return 1 / 0 ;
s p d i v e r ro r
t a s k spawned :
value
id
1 776 4 1 92
1 0f Of60 ,
O x 1 0f 0 f 60
E x c e p t i o n num be r o :
Scanned by CamScanner
Task :
name
S1U1
Ox 1 0 f O f SO ( s 1 u 1 )
Debuggi ng Componts
1 17
0 1 v id e E r r o r
Prog r am Coun t e r :
O x O O f 4 53 68
Status
O x 0 00 1 0 2 4 6
Re g i s t e r :
408a0b
vx TaskEnt ry
+47
d 1' v e r ro r
(0,
O,
o.
o,
o,
o,
o.
o,
o,
0)
i
NAME
ENTRY
PAI
STATUS
PC
SP
tExcTask
-e x c T a s k
1 1 08 d e 0
0 P E NO
406358
1 1 oac 0
t L og T a s k
_l o gT a s k
1 1 0 32b0
0 P E NO
408358
1 1 03 1 b0
t Wd b T a s k
- wd b T a s k
1 0 f e668
3 R E ADY
408358
1 0 f e5 t 8
s1u1
-d i v e r ro r
1 0f 0 f 6 0
1 00 SUSPEND
f 4 5368
t 0 f0ed8
va lue
OxO
Dumping e
. u
TIO
f th e
pen i o n
t io n fi r
x W rk . T h i i
a t ion m ight w a n t t
t herefore i n c l u d e
fte n
uppl
, c a l l ba k t h a t
de i n w m d ified t
in c l ud e
" s t d 10 . h "
i n c l ud e
e c L 1b . h "
Old
CHOO
( int
tas
xc
c e p t 1on
rror c
rap
tor t a s
01
I C 00
e A C oo it
rn
0,
Scanned by CamScanner
exc ec ,
int
c) ;
ln
o that the
IO,
{
l o \t Q (
kcr nrl
.1ppli dti n L1 n p r
by- 2er
er
) ;
Old
)( I
e x c S t a c k F rame )
e x c e c = O ..<
n ,
tas
IO,
1 11
::>
, -1err 1:.
mponents ana
Real-Time Embedded Co
.
rs ca l i ed in additio n to the defa u l t han.
dler
han
'fi
re
r
ee
sp
.
The added ap pl ica tio.nhandl e r l o g M s g is ou tp u t t o t
the
that
.
.
ure
ens
he
to
.
The s e t o u t ut ilit y IS u sed
dl mg.
win dshell.
->
setout
.
O r i g i n a l s et u p .
-3 '
s o ut -
s in = 3 '
s e r r= 3
e rm i n a 1 . . .
y o u r v i rt u a l t
All b e ing rem app ed t o
' ' '
.
s s a g e n ow . . .
You s h o u l d s e e t h i s me
e t h i s logMsg
You should a 1 5o se
o x 1 0f 9 2 b8 ( t 1 )
_
,
m j
x1c
a o r_o s _v e r s i o n + o
v a l u e = 32 = O x 2 0
- > testExc
.
choos e a n e x c e p t i on
-b u s e r r o r ,
[ b-
E x c e pt ion number 0 ..
Tas k :
by
d=di v ide
ion
Genera t ing d i v i d e by z e ro e x c ep t
Ox 1 0f92b8
Divide E r r o r
408a0b
vxTaskEntry
423e51
w d b F u n c Ca l l L i b i n it +a d
f452eb
_t e s t E x c
O x 1 0f92b 8
value = O
( t2) :
OxO
( t2 )
Ox000 1 0 206
status Registe r :
0)
Ox00f 4 5 1 0 e
0)
zero J
+47
+f3
423df8
:
( 1 1 0e450 ,
O,
O,
0,
0,
O,
0,
( o '
o,
o,
o,
o,
o,
o, o, o,
_t e s t E x c
_d i v e r r o r
0, 0,
(0)
T rappe d a n E x c e p t i o n f o r t a s k O x 1 0
f92b8
-> i
NAME
ENTR Y
t E xcTa sk
excTask
tLo gTa sk
_l ogT a s k
-wd bT as k
tW dbT ask
t2
va lu e =
Ox42 3df8
oxo
TID
1 1 08 d e0
1 1 0 32 b0
1 0f e 66 8
1 0f 9 2 b 8
PA I
STA TUS
0 PEND
0 PEND
3 REA DY
4 SU S P E N D
PC
408358
408358
408358
f45 1 0e
SP
1 1 08 c e O
1 1 0 3 1 bO
1 0fe 5 1 8
1 0f 9 1 c 0
Scanned by CamScanner
Debugging Components
111
testExc
->
Task : O x 1 0f0f60 ( t 3 )
Gene r a l P r o t e c t ion F a u l t
Pr ogram Count e r :
Ox00f45 1 2d
Stat us R e g i s t e r :
Ox00 0 1 0206
408a0b
_v x T a s k E n t ry
+47 :
423df8
( 1 1 0d948 , o , 0 , o , o , o , o , o , o ,
0)
423 e 5 1
wd bFunc Ca l l L i b i n it +ad
: _t e s t E x c ( 0 , 0 , o , 0 , o , 0 ,
0, 0, o,
0)
f 452cf
t estExc
- >
0 = OxO
NAME
ENTRY
TIO
PAI
STATUS
PC
SP
t E )( C T a s k
excTask
1 1 08de0
0 PEND
408358
1 1 08ce0
t Log T a s k
_logTask
1 1 032b0
0 PEND
408358
1 1 03 1 bO
tWd b T a s k
wd bTask
1 0f e668
3 READY
408358
1 0f e 5 1 8
Ox423df8
1 0f92b8
4 SUSPEND
f45 1 0e
1 0f 9 1 c 0
Ox423df8
1 0 f0f60
4 SUSPEND
f 45 1 2d
1 0f0e88
t2
t3
value = 0
OxO
1 984 - 2002
W i n d R i v e r Syst ems ,
R u n t ime v e r s ion : 5 . 5
BSP v e r s ion :
1 . 2/ 1
Created : J u l 20 2 002 ,
Scanned by CamScanner
1 9 : 23 : 59
I nc .
1 11
ms
mponents and Syste
Real-Time Embedded Co
B_COMM_ P I P E
WDB Co1111 Ty pe : WD
WOB : Re ad y .
Except ion
Vec tor O
O x00f45 1 0e
r
Pr og ra m Co u n t e
P rogram Counter
on Fault
: G e n e r a l Pro t e c t i
Ox00f451 2d
Status Regis t e r
Ox000 1 0206
On Linux, the same code causes a core dump when the shell is confi gured
to
allow this.
To ensure that core dumps are allowed, use the b u i lt-in shell comm a nd
u n l imit
CD-ROM.
on
the
[ s iewe r tsloc a l h o s t ex ] $ t c s h
[ s i ewe r t sloca l h o s t - / e x ) $ gee
- g g e n_e x c e p t i o n . c
- o genexc
[ s iewert sloca l h o s t - / e x ] $
choose a n exce pt ion
. / genexc
[ b= b u s e r ro r ,
d = d i v ide b y z e r o ) : b
Gen e r a t i n g b u s e r r o r s e g f a u l t e x c e p t i o n
Segme n t a t ion f a u l t
( c ore dumpe d )
l o a d e d a n d d e b u g g ed wit h
t ra c e and i d e n t i f i e s th e
[ s iewe r t s l oc a l h o s t - / e x ) $ g d b g e n e x c
(5.2. 1 -4)
it
u n d e r c e rtain
" sh ow copy in g
to se e t h e c o n
d i t io ns .
T h e r e i s abso l u t e l y n o wa r
r a n t y f o r GOB .
T y pe
det a i l s .
. i 3 86 _ r e d h a t - .
l 1nux
/ Q en e x c
Th i s GOB w a s c o n f i g u red as
Co re was g e n e r a t e d by
Scanned by CamScanner
" s h ow w a r r a n t y f o r
Debuggins Compon
ents
171
proga m t e rm i n a t e d wit h s ig n a
l 1 1 ' s egm
. n
ent a t 1o
.
faul t .
. / i686
R ead i ng s ym bo l s f ro m / l 1b
/ li bc . so . s . . .
do
ne
.
Lo a d ed symb o l s f o r / l i b / i686 / l i
bc . so . s
R ea ding sym bo l s f rom / l i b / l d - lin ux
. so . 2 . . . don e .
L oad ed sym bo l s f o r / l ib / ld - l inu x . s
o.2
#0 Ox 0 80 4 83 7 f i n b u s e r ro r ( ba dP
t r= Ox ff ff ff ff ) at
gen_ex ce pt io n . c : Je
so me Oa ta = * ba dP
38
tr;
( gdb) bt
Ox0 804 837 f in b u s e r ro r ( bad
Ptr =Ox fff fff f f ) at gen
_ex cep tio n . c : 38
Ox0 8 048 3d8 in mai n ( ) at gen
#1
_e x c ept ion . c : 56
#2 Ox4 201 58d 4 in l ibc _st a rt_ mai n ( )
f rom / l i b / i68 6/ lib c . so . 6
( gdb )
#0
Run the code again and this time generate the divide-by-zero exception. Now
load the core file dumped for the divide-by-zero exception generator, and, once
again, the stack trace and offending line of code can be examined with gdb.
[ s iewe r t s l o c a l h o s t - / e x ] $ . / genexc
choose and e x c e pt ion [ b =bus err o r , d=divide by zero ] : d
Generat ing d i v id e by z e r o exc eption
Float ing e x c ept i o n ( c o r e dumped )
[ s iewe r t s loc alhost - / ex ] $
siewert sloc a l h o s t - / e x ] $ gdb genexc core . 1 3473
GNU gdb R e d Hat L i n u x ( 5 . 2 . 1 - 4 )
Copyright 2002 F r e e Software Foundat ion , I n c .
GOB is f r e e s o f t wa re , c o v e red by t h e GNU General Public License , and
you a r e
under cert ain
welcome t o c h a n g e l t a nd / o r d i s t r ibute copie s of it
condit ion s .
Type " show c op y i. n g t o s e e t h e con d i t ions .
Type " show warr ant y for
Ther e i s abso l u t e ly n o warran ty for GOB
det ail s .
x" . . .
.
Thi s GOB was c o n f i g u r e d as 1396 - red hat - l inu
/ genexc '
Core was g e n e r a t e d b y
. h me t ic exc ept ion .
Pr ogram t e rm i n a t e d wit h s ig n a l 8 1 Arit
. . . do ne .
Re adin g symbo ls f rom / l 1. b / 1. 68 6 / li bc . so . 6
L o aded symb o l s f o r / li b / i686 / l b c . so . 6
don e .
2
R ead ing sym bo l s fo m / l ib / ld - l inu x . so .
L oa ded sym bo l s f o r / l ib / ld - lin ux . so . 2
. c : 31
gen _ex c ept ion
#0 Ox08 0483 68 i n d i v e r r o r ( a rg=O ) at
31
som eNu m
( gdb )
Scanned by CamScanner
1 / arg ;
1 72
components
Real-Time Embedded
pu; ...
emu
ac e i n fo rm ati o n a n d th e locah.on
ov id e st ac k tr
pr
ks
or
'
xW
V
d
So, both Li nux an
ex ce pt t" o n oc cu rs, it s easiest to
n
a
en
Wh
d
.
ra ise
pt io n is
.
.
cod e wh ere t h e ex ce
m
. e w h y t h e c od e is ca usmg th e exce pt io
in
e code to de te rm
th e Cr oss W i n d d In
sin gle-step debug th
ea sil y do ne us in g
t
e ug
os
m
is
is
th
en t,
"
the Tornado en vi ro nm
i n g se ct ion of th is cha pt e r
gg
bu
De
ep
St
ur e 9. 1 . The "S in gle
tool as shown in Fig
de bu gg er i n Torn ado.
use the sin gle -st ep
d
an
rt
sta
to
w
ho
wi ll cover exactly
nb
Prajlct Md oa..g
-,--,ltl
, ,0111111 - 1rti11 .-
. -
lvxnn@unwiise
- tf
olHI JmlEal\,jI
2.1.:!J 1!J
aih:.JccJc&I i!i)
{
i n t d i verror ( i n t arg)
O x f 9 5 1 0c
Ox f 9 5 10d
Oxf 9 5 1 0
int soaeNua ;
soaeNua l/arg
Ox f 9 5 1 1 2
Oxf 9 5 1 1 7
Ox f 9 5 1 1 9
O x f 9 5 1 la
Oxf 9 5 l ld
Oxf 9 5 1 2 0
Oxf 9 5 1 2 3
return OK ;
Oxf 9 5 1 2 6
Oxf 95128
>
'}
FIGUIE 1.1
O x f 9 5 1 2a
O x f 9 5 1 2c
Oxf 9 5 1 2 d
O x i 9 5 1 2e
PUSH
_ d 1 verror
_d 1 verror
+Ox 0 0 3
HOV
SUB
+Ox0 0 6
+OxOOb
+OxOOd
+O xO O e
+OxO l l
+0K0 1 4
+Ox0 1 7
MO V
MOY
CltiD
I D !'.'
MOV
MOY
MOV
EBP
EBP .
ESP .
ESP
48
ECX .
ECX
Ei\X .
Oxl
+OxO la
XOR
EAX .
+OxOlc
J MP
_ d 1 verror
+OxOle
+Ox0 2 0
+Ox0 2 1
>+ 0 x 0 2 2 .
MOV
ES I .
ESI
RET
MOV
ES I .
ESI
LEAVE
.
---1.-..n w
llbgT8* (Or 11..
ww:O
EAX . [ EEP+ S ]
[ EEP- 3 6 )
EAX
EDX . [ EBP- 3 6 ]
[ EBP-4 ] . EDX
*2: :
.
Using Tornado Cross Win d to An aIyze
Exception.
EAX
+
Ox20
environ
ment.
Scanned by CamScanner
-s
Debugging Components
fie Epc llllafl ........ ..... .Hiii,
. " &S S
.{
int oNua ;
llOlleNua
l/1r1 ;
SOXl , ktur
Ox8048382 .,v
Ox8048367 c l td
Oxl(,..llp )
0d048HI icliYl
Oxl04838b
. }
1 7J
""x . OX f f f f f f f c ( ,,,. b p )
llOV
return Ol ;
.{
int aoineD1 t 1 ;
someDa ta
aov
Ox804837f
llOV
Ox8048381
llOV
FIGURE
badPtr ;
Ox804 8 3 7 c
Ox8 ("4!bp ) . x
( '4! ax ) , %eax
9.2
ASSERT
Preve n t i n g except i o n s rather t h a n h a n d l i n g t h e m is m o re proactive. Certa i n l y
code can check for conditions t h a t wo uld cause a n exception a n d verify arguments
to avoi d the possibil ity of task suspension or ta rget reboot. The standard method
for t h i s type of defens ive progra m m i n g is to include assert checks i n code to isolate
errant conditions for debug. In C code, poi n ters are often passed to fu nctions for
efficiency. A caller of a function m ight not pass a vali d pointer, and t h i s can cause
an exception or erra n t behavior in the called function that can be difficult to trace
back to t h e bad p o i n ter. The following code demonstrates this with a very simple
example t h at fi rst passes
p r i n t Ad d r
# include
" s t d io . h "
# include
# include
" as s e r t . h "
c h a r v a l id P t r [ J
c h a r in v a l i d P t r
" s ome s t r i ng ;
=
=
( c har * ) 0 ;
v o i d p r i n tAdd r ( v o i d * p t r )
Scanned by CamScanner
i f po i n t e r is NULL /
ts and System
Real-Time Embedded Componen
1 74
p r in t f ( " p t r
Ox%08x \ n " ,
pt r ) ;
int main ( vo id )
p r intAd d r ( v a l id Pt r ) ;
p r intAdd r ( i n v a l i d Pt r ) ;
r e t u r n OK ;
_... _
line 1 1
Co de / a s s e r t . c ,
:: , -
( i nt ) p t r ,
y se of assert checking in code makes the error i n the calling argumen\s obvi
-out and avoids confusion. Without the assert check o n t h e pointer parameter, it
,"
-;- -""':fuight
seem that there is an error in the function called when it fi n ally attempts to
dereference or otherwise use the pointer, which would mos t l ikely cause an excep
tion. The assert check is also supported in Linux.
_
device mterfacmg. Call s mto the API can
fail for n u mer ous reas ons :
Fo r th is reason ap
pl1 ica ti n code sho uld
alw ays check ret u r n codes a nd handie
fa il ures w ith w ar n ng
og e o r se n t to
c onso le o r assert . N ot c h e cki n g re tur
codes often leads to d ffi
i cu t - to -fig ure -o ut fa il
ure s be yon d the i n it ial obv iou s AP
call failure .
Scanned by CamScanner
Debugging Components
175
.
I n Vx Works t h e tas k va ria ble e
.
rr
a1 ways md1cate s t h e last error encoun te red
dur i n g a n A P I c a l l A ca 1 1 t0 perro r ( )
m VxW orks will a l so pri n t more useful debug
. ,
.
m1o rma t ion w h e n a n A P I cal l ret urn s a
ba d code.
SIN G LE-ST E P D E B U Ci Ci l N Ci
Single-ste p d e b u ggi ng i s often t h e most
power fu I way to analyze and understand
both soft ware a lgor i t h m i c errors, ha rd ware so
/ ftware interfaces errors, and some.
even h rdwar e design flaws. Single -step debug
times
gi ng can be done at three d i f
feren t levels m m ost embe dded syste ms:
req u i res e i t her syst e m - or kernel- level debugging or processor level using TAP
( Test Access Port ) hardware tools.
Task-level debuggi ng in VxWorks is sim ple. Com mand -line debugging can be
performed d i rectly w i t h i n the windshel l . G raphical debugging can be performed
using t he Tornado tool known as C ross Wind, and then accessed a n d controlled
t h rough a source viewer that displays C code, assembly , or a m ixed mode. For
embedd ed systems , debuggi ng is describe d as cross-de bugging because the ho-t
w h i c h the debug i nterface runs does not have to be the same architec
system
on
ture as the target system being debugged. On the other hand, it certainl y can be
the same, which m ight be the case for an embedded L i n u x system or an embed ded
.
VxW orks system r u n n i n g o n I n tel arch i tecture F u rtherm ore, the embed ded
.
e
syste m may n ot have suffici ent 1/0 i n terface s for effectiv debug A cros debug
viewin g.
ging y tern r u n s a debug agent on the embed ded target and debug o u rcc
.
do, the host tool i
and comm and a n d contro l is done o n a host system For Torna
agent ac c:pt
).
Cross Wi n d a n d the debug agent i W D B ( Wind Debug The debug
tion when reque ted .
com m a n d s a n d replie s with curre nt ta rget tate i n forma
Scanned by CamScanner
1 76
Com p onents
Real-Ti me Embedded
and Systems
tur
c
fea
l
s
One o f t he mos t ba
the m. Th e re a re t wo ways t hat b reak
1
e s t e p bet ween
.
g
sin
or
n
ru
to
po in ts and
e m e n te d :
e most of ten i m pl
po in ts ar
ts
Ha rd wa re br ea k poi n
So ftw are b rea kp o i n t s
?reak
int add
ress
l'?
tha t the processor
Hardware b rea kpo i n t s req uire
tion
ruc
Po
r
nst
I
(
mte
)
P
I
the
or PC
i nes wh eth er
te r , a con 1pa rato r tha t det erm
reg1s
c
h
a
me
d
a
msm
an
to
rai se a
requested break add ess,
( Progra m Cou nter ) mat che s the
i ncl ude a
execu.
caus es a halt m
.
debu g exceptio n. The debu g exce ptio n
ler so that the user can exam in e the state of
tio n , and the debu g agent insta lls a hand
impo rtant and notab le characteristic of
the target at the point of this excep tion. O ne
the norm al th read of
nificant advantage of hardware b reakpo i n t s is that they do not modify the code being
debugged at all and are reliable even if memory is modi fied or errantly corru pted.
A processor reset or power cycle is most often the o n ly way they can be cleared.
Software breakpo i n ts are u n l i m i ted i n n u m ber. They a re i m pl e men ted by
raised, the debug agent handling of t h e e xce p t i o n is iden t ical. When the d ebug
agent ste s beyond the c u rren t breakp o i n t it must restore t h e original code it re
placed with the debug except ion i nstruc t i o n . Softwa re
b reakpo ints have the disad
vantag e t at they can and mostl y l ike l will
aded, U
111 ntt c.
:
system and t sequencer releases
n
eco nds a .
m1lhs
'I e
uses 900Yo of the
Th is
proce ssor cyc les wh 1
run s. Aft e r the two tas ks are
so
con d s
mi 1i 1se
blo c ks .
Scanned by CamScanner
s e r v 1 c eF 1 0 ever
20
1t
Debugging Components
- >
1 77
i
NAME
ENTRY
tExcTask
tWdbTask
service F 1 0
value
PC
SP
1 1 58de O
0 PEND
1 1 532b0
408358
1 1 58ce0
wdbTa sk
0 PEND
408358
1 1 4e668
1 1 53 1 bO
Ox423df 8
3 READY
408358
1 1 4e5 1 8
1 1 492b8
4 DELAY
408358
1 1 49 1 d8
s e r v i c e F20
STATUS
_logTa sk
-
t1
PRI
excTask
t LogTask
TIO
f ib 1 0
1 1 40f60
f i b20
2 1 PEND
408358
1 1 40e74
1 1 3bc08
22 PEND
408358
1 1 3bb 1 c
OxO
The t 1 task i s the sequencer and most often will be observed i n delay between
releases of s e r v i c e F 1 0 and serv iceF20. The two services will most often be observed
as pending while they wait for release. The i command is implemented by a request to
the tWdbTask ( the target agent), so it will be observed as ready ( running in VxWorks)
because it has to be run n in g to make the observation. Now, start the Cross Wind
debugger ( usually by clicking on a bug ico n ) , select Attach, and then select s e r
vic e F 1 0 and use the i command to dump the TCBs (Task Control Blocks) again.
Figure 9.3 shows the Cross Wind debugger now attached to servic eF 1 0 .
- tl JC
--
..
- --
- -
- - -
I -
M ('}l s
Ox 4 08 J S 8
Ox 4 08 35b
O x 4 0 8 3 5d
O x4 0 8 36 0
O x4 0 8 3 b 3
Ox 4 0 8 3 6 5
Ox 4 0 8 36 7
O x4 08 369
Ox 4 08 3 a
Ox 408 36 f
O x 4 08 3 7 4
FIGUltE t.J
+ O x0 5 6
T O ii:O S
+ O xO 1
+O x 0 6 4
+Ox067
+O x0 6 9
x071
+O xO 3
+OxO H
+OxO 9
-r0 x0 8 4
__
(;]
J.LD
MOV
MOV
CMP
JllE
TEST
JE
PUSH
PUSH
CALL
MOV
ES P
4
EAX
EBX
EA X . [ EBP-4 J
EAX . l
v 1 ndEx1
+ OxSc
EBX EBX
+ OxS 4
v 1 nd Ex 1
EBX
Ox 4 0 8 3 0 8
tr
EU EBX
Scanned by CamScanner
1 71
onents an d Systems
Real-Time Embedded Comp
->
i
ENT RY
NAME
PAI
TIO
PC
ST AT US
SP
e xcTask
1 1 sadeO
0 PE ND
408358
1 1 58ceo
t Exc Tas k
t Log Tas k
_logTask
1 1 53 2b 0
408358
1 1 53 1 b0
tWdbTask
wdbTask
1 1 4e668
0 PE ND
3 RE AD Y
408358
1 1 4e518
Ox423df 8
1 1 49 2 b8
4 DE LAY
408358
1 1 491 d8
t1
serv ice F 1 0 _f i b 1 0
s e r v ic e F20 _f ib20
value = O
1 1 40f60
2 1 SUSPEND
408358
1 1 40e74
1 1 3bc08
22 PEND
408358
1 1 3b b 1 c
OxO
The serviceF 1 0 task has been suspended by the WDB debug agent. Now the
debug agent can be queried for any i nformation on the state o f t h is task an
code
executing in the context of this task that the user wants t o see. I t can be smgle
stepped from this point as well. When this is done, most often the running task is
caught by the attach somewhere in kernel code, and if the user does not have full
kernel source, a disassembly is displayed along with a single-step p rompt.
Asynchronously attaching and si ngle stepping a run n in g task is often not help
ful to the user debugging the code being scheduled by the kernel a nd which calls
into the ket,nel AP i . Instead of attaching to a running task, n o w start the same se
quencer from the debugger ( first stop the debugger and resta rt the t arget or simu
lator to shut down the running tasks ) . Now, restart the debugger after the target or
simulator is running again and from the Debug menu, select Run. Start Sequencer
from the Run di log box and select Break at Entry. Note that a rguments to the
.
fu nct1on entry pomt can also be passed in from the debugger tool if n eeded. At the
.
wmdshell, output will indicate that the task has been started and that i t has hit a
breakpoint immediately on entry.
->
i
NAME
ENT RY
t ExcTask
_exc Tas k
t LogTask
_l og Ta sk
tWdbTask
_wdb Tas k
tDbgTask
_S eq ue nc er
value
Oxo
TIO
PRI
1 1 58 d eO
ST AT US
0 PE ND
1 1 532b O
0 PE NO
1 1 4 e668
3 RE AD Y
1 1 4 92b 8 1 00 SUS
PEND
PC
SP
408358
1 1 58ce0
408358
1 1 53 1 bO
40 83 58
1 1 4e 5 1 8
f940 e 7
1 1 49244
Scanned by CamScanner
Debugging Components
1 71
void Sequencer ( v o i d
)
':l
FIGURE 9.4
ii
. -
DltlJ
f.......@s-
f1o"'lil tl!L_
.,, J(l .d .
el4lc.cllfil --
-._ - .( -
j,
void Sequencer ( vo 1 d )
" !"! t o .: ue1
C' x f .. ces
PUH
HOV
ien.:er
:Sequei..... e t
Fll.:irl
c LL
41 [
0 be sure ve h l a.ec
/ J st
i i< t " H I
. l< l ( . , ...
/ Se t u
t.I
'
..,.f l O
FIGURE
t.5
debugging.
Scanned by CamScanner
FU -H
r lt
. >e l '
yJI
ll
apho res
wrv 1c r e l osa M ...
EI
EI
0>
. '
f<
t ic and TO, /
G x0 1 0
(f
t>
'
.
, .....,.
1 80
.{
iJ!t i ;
it( ( l n t ) thrucUd
0d041..2
CIOpi
O.I04lalMI jne
10WJUOJ11VICE)
SOllJ ,Ox.l( wbp)
=
ltt iaofday(llowStart ) ;
p.rlntf( "Low pno thread requf!a tin1
aharf!d naource at lid aec'
. i
1tt1"ofday(lhi1hStar t ) ;
print! ( "Hl1h pr lo thzHd nqu
t
i n 1 hued resource at
!Id c '
.i-- --- ui
in
f s iewe rtslocalh
::i
L011_rw1o_sERVICE)
:::.;
-g
!Id
CScount+-;
: r ( ( i.nt )threedid
and us.mg t h e
!Id na
pthrHd.JIUtH._lock( ...1sm ) ;
.._
::J
ost - / e x ] $
gee
- g P t h rea d
. c - lp t h re a d
p t h read
- o p t h r e ad
library
Th e Kdbg app
I'ICa tio
. n can
loa d th e p h
ba d upo n the LF
a . execu table a n d will fi nd the C
E
( Exec uta ble
an
es a sym bol
m mg For ma t
s as we ll as
) i n for ma tio n, which
th
e
so
u
rce co de p ath
I f an em bedded L
' ux a pp l '
m
ica t1o
n n eeds
d eb ugged with
to b e deb ugg
db bY con
ed, it c a n be rem otely
n ec ting to
g
db
over
r
e
Eth
m
e
m
ote
t
l y .o ver a seri
em bedd e drne . " o . u se Qdb re o e y,
al o r T C P connecti on
t l
J
yo
u
m
us
p JC a t on an d
t l mk deb ugg
the n u se
ing s t u bs wit h yo u r
Qdb t a ge
to con nect
t r e mo t e
er sen aJ to th e
l d e v / t t yS O , for ex am l
emb ed d ed
p e,
ap p l icat1. o
n fo r deb ug.
Scanned by CamScanner
Debuggin g Components
181
for com m u n ication. Li kewise, Linux requi res use of kdb the kernel debugger. The
advantage of system - level debug is that it al lows for single stepping of the entire ker
nel and all applications scheduled by it. System- or kernel -level debugging is most
often used by L i n u x d river, kernel module, and kernel featuredevelopers.
I nstead of system - level debug, a hardware debug method can be used to si ngle
step the entire processor rather than using t he system-debug software method. This
has an advantage in that n o services are needed from the software because c o m
munication i s p rovided b y h a rdware. T h e JTAG ( J oint Test Appl ication Grou p )
I EEE standard h a s evolved t o include debug funct ionality th rough the T A P ( Test
Access Port ) . A
]TAG i s a device
gle stepping, register loads and dumps, memory loads and dumps, and even down
load of program s for services such as flashing i m ages i n to no nvola t i le memory.
Figure 9.7 shows the W i n d River V isionCLI C K JTAG debugger run n i ng on a lap
top con n ected to a PowerPC eval uation board with a parallel port host i n terface
and a JTAG B D M ( Ba c kground Debug Mode) con nection to the PowerPC TAP.
..
, ,,
t
FIGURE 9.7
(!nost
l l o c k i ng o f t h e dcv ct u nde r t e t
The J TAG s tand ard i n c l ude s exte rna
a h i l i t y to clock kt d a t .1 1 n or o u t of t he fA P,
e c a >
r
often t h e proce sor cior d e '" u g ) th
1 c de od mg of Jdd n.:' , dJt.1, ,md o n t r I s
an d r eset con t ro I . T }.1 e TA u
r pro v id e ha
Scanned by CamScanner
1 11
h
a
fl
as
ng,
testi
,
ding
mg
nloa
dow
ini tial
are and
boot code. Brin ging up new ha rdw
m
elop
dev
ent
re
wa
tas
firm
a
is
k. By
t vect or
boot code that runs fr.om a system rese
ory,
t
mem
mos
e
l
i
t
vola
ofte
n to
out of non
defin ition , firm ware is software that runs
loaded
often
re,
a
softw
by the
er- level
boot a devic e a n d make it usea ble by high
des
an
d
n
provi
a
int
erface
supp ort
firmw are. The JTAG requi res no target softw are
u
d
n
a
event
,
ally
debu gged
p ro
so that initia l boot- strap code can be loade d,
rm
has
been
platfo
After the
boot
gramm ed into a nonvo latile boot memo ry device .
B
can
be used
ch as W D
strapped with JTAG, a more traditio nal debug agent u
instead for system-level or task- level debug.
K E R N E L S C H E D U L E R TRAC ES
Kernel scheduler trac ing is a critical debug method for real - t i m e systems. Real
time services run either as ISRs or as priority pree m p t ive schedu led tasks. In
VxWorks, this is the default sched u l i ng mecha n i m fo r all tasks. In Linux, the
POSIX t h read F I FO scheduling policy m ust be u ed fo r ystem scope t h reads.
Process scope PO SIX threads are only run when t he proce s that owns them is run.
m u l t i ervice system as just described, RM theory and pol icy can be used to assi gn
priorities to services and to determ i n e whether the system i s feasible. A syst e m is
feasible i f none of the services will m iss their req u i red deadli nes. Theory is an ex
cellent tart i n poi t, but the theo ret ical feasibi lity of a
yste m sho uld be veri fied
Scanned by CamScanner
Debuggi ng Components
l.1
FIGURE 9.8
1 IJ
3.Z
video encoder fra m e rate for t h i s t race wa 30 fp . The I N T 1 0 i n terrupt i a soc iated
wit h t h e t N e t T a s k a n d i s t h e l e t \ o r k I n terface ard ( N I C ) D I A com pletion i n
terrupt. T h e b u l k o f t h e e r ice ac t i v i ry between frame acq u i s i t io n a re relea e s o f
t h e s t r ea m i n g t a k a n d t h e T P/ I P network task. I d l e t i m e i n d i ca t e t h a t t h e
processor i s n o t fu l l y u t i l i zed, a n d i n a W i n d V iew trace, t h i i t i m e t ha t t he Wind
kernel d i pa t c h e r wa p i n n i n g and wa i t i n g for om e t h i n g to di patch from the
ready queue by p r i o r i ty. F i g u re 9.9 hows two ynthetic load e rv i ce ( each serv ice
computes the F i bo n acci seq uence ) : ' 1 r u n for I 0 m t'C every 20 m secs, a n d 2 ru n s
for 20 m sec every 50 m ec .
0.01
1.07
0.09
0. 1 0
0.1 1
0.12
0.13
0. 1 4
0. 1 5
0.1&
IMemtJlS
IMcmoplO
tExcTeak
tl...THk
ISllell
f'Nalnt
.eetr eat
acll
eeZI
., .
IWt
.....
...
FIGURE 9.9
Scanned by CamScanner
.. .
0. 1 7
0.11
0.1t
i
i
1 84
s
s and System
d Com ponent
edde
Emb
Time
Real-
re le as e t h e tw o se rv i ces
se q u en c e a n d
to
. ,
ed
us
s
.
re
t h e se map ho
, s e m 1 o an d s em 20 , w h i ch
rs
ce
en
On this tra ce,
qu
se
o
tw
.
.
n alo ng wi th
20, can b e see
0 . Th e ta sk d el ays are
fib1 o and f l b
f i b 1 o a n d f l b2
se
lea
re
.
o
t
s
re
ho
ap
m
se
g1ve
T he s ? h t ra c e s . for
.
ce
ra
t
ed
. p1 y de1 ay an d
s1m
sh
ha
w s a n d th e
righ t - po in tin g ar ro
d t r ce m d.1ca tes ti me
in dicated by the
tim e, a n d th e do tte _
n
io
ut
ec
ex
r
so
es
are pr oc
e. f h 1 s pa rti c ul ar tw o
f ib 1 0 an d f ib 20
n i ng , o r id le t i m
n
u
r
is
pt
rru
te
n
i
k or
se c pe ri od.
where some ot he r tas
r cy cle s ov er a l 00 m
e av ail ab le pr oc es so
th
of
%
90
es
us
rio
service sce na
ul t i p l e ) of t h e tw o rel ea se pe
( Le ast Co mm on M
M
LC
the
is
d
rio
pe
The 1 00 msec
ms ecs an d
es a re
pe cte d ex ec u t i o n tim
ex
e
Th
.
ecs
ms
0
=5
T2
riods T1 = 20 an d
rel ea ses of the ser v ices
e, i t 's c le a r t h a t t h e
rac
t
s
thi
m
Fro
52
and
C2=20 msecs for 51
u rat e a n d a s exp ect ed . T i me
exe cut ion ti me s are acc
are well syn chr on ize d, and the
c h a r d w a re t i m ers , such as
s i n g a rc h i t e c t u re - spe c i fi
sta mp ing on the trac e is d on e u
m p Co u n te r ) , wh ich is
u m , t h e TSC ( T i m e S t a
the interval ti.m er, or o n the Pen t i
a c c ura te to a m i
. I n gen era l, t i m esta m p i n g is
as acc urat e as the pro cess or cyc le rate
h a rdw are t i mes t a m p d r i ve r .
cros econ d o r less whe n supp orte d b y a
y
n t s s u c h as W i n d V i e w are c l e a rl y ver
Sc hedu ler trace s of tasks and kern el eve
S i t u a t i o n s l i k e servi ces over run
i l l u m i n a t i ng when debu gging rea l - t i m e syste ms.
ge r to zero i n on p roble matic
n i ng dead l i nes are made obvio us and allow t h e debug
n d V i e w c a n a lso be very
sequen ces and pe haps better synchr on i z e servic es. W i
_
o i mp rove effi
h l p ful when servtCes s i m p ly ru n too l o n g a n d m u s t b e t u n e d t
call
c t e nc y . The runt ime of each release can be clearly observed . U s i n g a fu n c t i o n
C1= I O
vEent,
.
.
h ard wa re m -c 1 rcu1t to collect the t race d a t a . S W I C ( So ft w a re l - c 1 r c u 1 t ) met h od s
.
.
are advantageou s because they don't re u 1. re elect r i cal probing
l ike Logic Analyzers
.
.
.
( LAs) ' no b u I. I t - m
I ogi c on processo r chips, a n d can be add e d t o syste m s m the field
co I I ec t 1o n a n d d u m pin g
a trace back to E a rt h over t he
NAS A 0ee Space Network- h op e fu l ly this wi ll
.
.
.
neve r be needed. To unde rstan d h ow m
t r us 1v e S W I C m e t h o d s s uch as Wmd
V 1ew
.
a re, you n eed to und er land t h e
eve nt m t r u m e n t ation
.
a n d t h e t race outp ut. T he
.
I ea t m t ru 1ve way to coll ect t race d at .
a I t o b u ffier t h e t race rec o rd i n me mo ry
.
( ryp1cally bit
. -e n oded 3 2 - b 1" t wo
. r d o r cac h e- I " m e - sized
.
record ) . Most p r oc e ors
h ave po ted writ e - buffe r i n t e rfa e t o
m e mo ry t h a t h av e 1. g n .1 fi c a n t t h r o u g h p ut
.
ba n d w 1'd t h . Lo k i ng more lo e 1 y a t t h e
.
.
t ra e i n p s g u r t:: 9 .8 and 9.9 earlie
. '
r m t he
c h a p t e r , 1 t c I e a r rhat eve n t s 0 u r a t L
h e ra t e f m i 1 1 1 e ond
.
n t he 200 M Hz
p ro e sor the r races were t ' ken o n . 1 n gen
eral ' '1 f t h e t ra e i of eve n t t h a t oc c u r
IC\
Scanned by CamScanner
1 85
Debugging Components
Workspace: Workspace1
I-ii I
di ng WindView ke rn el
FIGURE t. 1 0 Ad
instr um entation .
ve ry
ool k i t , w h i c h i
.
l led t h e L i n u x Tr a e T
ca
l
too
a
.
the
L i n u x can be t raced u m g
. on I a b '' t d i fft:r e n t ' b u t t he en d re ult 1
. .
.
.
1
t
a
.
si m i l a r to W i n d V 1 e w. Th e o per
erv i. es
i n l u d i n g a p p l.1 at 1on
ent
ev
rn
e
t
.
r ne l a n d Y
sa me . a S W J C f ra c e of k e
t m L in u x ke rn el wi th
u
a
1
id
u
b
d
n
a
h
u t pat
Lik e V x W ork s, t o u e LTf, yo u m
Scanned by CamScanner
1 86
1 1 49. I
chain of devices under test. A n expected test data o u t p u t sequen ce can be verified
for a test data input sequence . The in terface also incl udes a clock input and reset.
Even tually, m icroprocessor a n d firmware developer s deter m i ned that JTAG co uld
be extended with the TAP interface . The TAP allows a JTAG to send bjt patterns
through the scan chain and also allows the JT AG user to c o m mand a processor ,
single step it, load registers and m em o ry, download code, a n d d u m p registers and
memory out so that co m mands and data can be sent to t h e device under test . A
processor with a TAP can be fully clocked and con trolled by a JT AG device, which
allows the user to boot-strap i t into opera tion right o u t of the h a rdware reset logic.
From the basic JTAG fu nctionali ty, external cont rol of m icroprocessors for
fi rmware development has contin ued to evolve. M ul t i p rocessor syste ms on a
single chip can also be controlled by a si ngle J T A G w h e n t h e y a re placed on a
com m ? n boundary scan chain. The TAP a llows for selection o f one of the proces
sors with bypass logic in the chain so that p rocessor s n o t u n d e r J T AG cont rol can
forward comma nds and data. More o n - c h i p feat u res t o s u pport JTAG ha ve
eted mto
t he new syst em t 0 prov1 d e
nonvol at ile boot code to execute from the
1
p roce sor reset vector Typ ica
J I y, t h e E
1t1a ize
E PROM was program med to first m
.
a sena
. 1 port to wnte out some co
n fi1 rm 1 11g "hello worl d " data, b l i n k an LE D, and
'
. com po ne
1 m. t 1. a 1 1. ze ba 1c
dn t
n t s s ue h as m e m o ry. I f
di
the E E P RO M p rogra m
wo r k , th e fi1 rmw are pro gr a m me r
wou I d try aga i. n , repea t i n g the proc ess u n t il
progres was made.
Scanned by CamScanner
Debugging Components
1 87
After a PROM pr gram was developed fo r a new processor board, most often
the program was prov1d d as a PROM monitor to make t h ings easier for software
developers. PROM monitors would provide basic code download features, mem
ory dumps, d isassembly of code segments, and diagnosti c tests for hardware com
ponents on the system board. The PROM monitor was a common starting poi n t
for developi n g m o re soph i sticated boot code and firmware t o provide platform
services for software.
Today, most systems are boot -strapped with JTAG rather than "burn a nd
learn" or PROM monitors. The capability of JTAG and on-chip debug has pro
gressed so m uch t h at , for example, the entire Tornado/VxWorks tool system can
be run over JTAG, including W indView. This allows firmware and software devel
opers to bring up new hardware systems rapidly, most often the same day that
h ardware arrives back from fabrication. The h i story of this evolution from b u rn
and learn t o PROM moni tors t o use of JTAG and more advanced on-chip debug is
summarized well by Craig A. H aller of Macraigor Systems I Zen ] :
First there was the "crash and burn " method of debugging. . . . After
some time, the idea of a hardware single step was implemented. . . . A t some
point in history, someone (and after reading this I will get lots of email claim
ing credit) had the idea of a debugger monitor (aka ROM monitor) . . . . Sim
ila r in concept to the ROM emula tor, the next major breakthrough in
debugging was the user friendly in-circuit emula tor (ICE) . . . . The latest
addition to the debugger arsenal is on-ch ip debugging (OCD ) . . . . Some
processors enhance their OCD with other resources truly creating complete
on -ch ip deb uggers. IBM's 4xx PowerPC family of embedded processors have a
Scanned by CamScanner
1 11
RT
TR
S!,__
O!
P
El
AC!
__
__
__
_
__
__
__
__
__
__
__
__
__
__
__
__
.
.
cessor s often were i ntegrat ed i nto
Simpl er micro contro II ers an d oIder micro pro
.
c
.
ace t hat is
t Emulato r ) . An I C E me 1 u des an .mten
oth er t han
u nder test as long as no i n ternal memory dev1ces
th e state o f t he d evice
.
to an extern aJly visible m e m o ry. When on- ch ip
'
registers are I oaded a nd sto--d
cach e emerged as a Processor fieature, this made a true I C E d 1 ffii c u l t to imple ment .
.
Then the ICE had no idea what was being loaded from cache and written bac k to
cache on-chip and could easily lose track of the i n ternal state of the processo r being
emulated. The emulation provided not only all the features of JT A TAP, but also
full-speed state tracing so that bugs that were only observable r u n n i ng at full speed,
.
and not seen when single stepping, could be understood. This type of hardware or
software bug, often due to tim ing issues or race conditions i n code that is not well
syn chronized, is known as a Heisenb ug . The name ind icates that l ike the Heisenburg
Uncertainty Principle, where a particle's position and momen t u m can't be simulta
neously observed, a Heisenbug can't be single stepped and the fa ulty logic observed
simultaneously. This type of bug is observable only u nder full-speed conditions and
i s difficult to isolate and reproduce in a single-ste p debug mode. A n ICE can be in
valuable for understa nding and correcti ng a Heisenb ug.
Trace ports have emerged as a solutio n for the cost and com plexity of ICE
imple menta tion and the ability to debug full-sp eed execu tion
of software with no
intrus ion at the software level . Wind View is also an appro
ach, but does involve in
trusio n at the softw are level. Trace ports use i ntern
al h ardw are mon itors to out put
i ntern al state in form ation on a l i m ited n u
mbe r o f ded icated 1/0 pins from a
proc esso r to an exte rnal trace acqu isitio n
devi ce. For exam ple, the I P can be out
put from the processor core onto 32 outp
ut . p i n s ever y cycl e so t hat the thread of
exec utio n can be trac ed whi le a syst em
run s full spe ed. The t race ana lysis is do ne
offl ine afte r the fact , but coll ecti on
m ust occ ur a t the sam e spe ed as the exec utio ,
,
.
. Th e Jl -bit I p ca n be o rnp rt'do wn to 8 . its by inc lud ing
bu ilt -i n tra ce log i
.
tha t ou tpu t on ly rel d t i ve 0
from an i n 1 t 1al fou r - cy cle ou tpu
t of the I p at the a
t rt of the tra e. M o t o ft e n. c
br an c hes loc all y in loo p or
in dec i ion logi ra
th er th a n ma k i ng ab l u t e .
- b
Scanned by CamScanner
-set
Debugging Components
111
address b ranches. I f the code does take a 32-bit address branch then the internal
logic can indicate this so that the external trace capture system an obtain the new
I P over five o r m ore core cycles.
:race ports can be i n val uable fo r d i fficult bugs observable only at full-speed op
eration. However, a trace port is expensive, compl icated, and not easy to decode
and use because it requires external logic analysis equipment. As a result, internal
trace buffers are being designed i nt o most processor cores today. The internal trace
buffer stores a l i m i ted n umber of internal state vectors, including I P, address, and
data vectors, by a h a rdware collection state machi.ne into a buffer that can be
dumped. Trace buffers or trace registers ( often the buffer is seen as a single register
that can be read repeatedly to dump the ful l contents) are typically set up to contin
uously trace and stop t race on exception or on a software assert. Th is allows the de
bugger to captu re data up to a crash point for a bug that is only observable d u ring
extended ru ntimes at full speed . This post-mortem debug tool allows a system that
crashed to be an alyzed to deter m i ne what led to the crash. For extensive debug,
some trace buffers are essentially built-in LAs that allow software to program them
to collect a range of internally traceable data, including bus cycle data, the I P, ad
dress, data vector, and register data, to emulate the capabi l i ty that hardware/soft
ware debuggers had when all these signals could be externally probed with an LA.
external device, o probing an external bus allows you to capture POST codes and
diagnose p robl em even when the external devices on that bus are not operating
correctly. The x86 PC BIO ( Basic I nput Outpu t y t e r n ) has a rich h istory of
PO T and PO T c de o u t p u t t o well -known device and in terfaces o that config
u ra t i o n and external device fa i l u r e c an be readily d iagn o ed and fixed [ POST ) . A
Scanned by CamScanner
1 90
mponents an d Systems
Real-Time Embedded Co
np
pe
f
o
.
range
.
l
a
ff
o
i
ern
-ch
ext
p
or
p
i
h
m
-c
n
o
em
l
.
.
mo ry , eith er i nte rna
.
processors mt en'ace to me
n
ti
ry
tes
g.
mo
e
m
h
t
i
Me
w
r
mo
ry
r sho uld be fa m 1 l t a
ory. So, every firm wa re eng i nee
:
tests inc lud e the fol low ing
The walking l s test ensures that memory devices have been properly wire d to
the processor bus in terface. Byte la nes for wide mem ory b uses could be swapped,
and shorts, open circ u its, noise, or signal skew c o u l d a l l c a u se data to be corrupted
with words or arrays of words that match the width of the memory data bus. A
m e mory bus is oft ery 1 28 b i ts or wider. M ost archi tectu res support load/store mul
ory data buses wider than a si ngle 32 bit will have a b u s i n terface that m a v coalesce
to coax the compiler mto genera t i n g load/store strin g i nstructions with the proper
" st d io . h "
# i n c lude
/*
t yped e f
3 2 - b i t u n s ig n e d wo r d s
s t ruct
u n s ig n e d
m u l t iw o r d
-s
1 28 _ bi t
*/
i n t w o rd [ 4 ] ;
m u l t iwo r d t '
-
/*
t he s e d e c l a r a t i o n s s h
o u l d be
.
c a n s t mu l t 1 wo r d t t e s t z e
ro :
v o l a t i l e m u l t 1wo r a_t
v o l a t i l e mu l t 1wo r a_t
Scanned by CamScanner
a l i g n e d o n
.
1 28 - b i t
boundary * /
Debuggi ng Components
1 11
t e s t _lo c a t ion = t e s t _p a t t
ern ,.
r e g i s t e r i n t i ' 1- .'
f o r ( i=O ; i <4 ; i + + )
t e s t _pat t e r n
t e s t -z e ro I
f o r { j = O ; j <32 ; j + + )
I * walk t h e 1
t e s t _pat t e r n . word ( i ]
( O x 1 << j ) ;
/ * st r u c t u r e a s s ignment * /
a s s ig n_mul t i ( ) ;
/ * a s s e r t if st o r ed data does not have a bit s e t ;
a s s e r t ( t e s t_loca t ion . wo rd [ i J ) ;
int m a i n ( v oid )
t e st_m u l t i ( ) ;
inw
inw >
MW . out
The o t o o l is a binary utility for Darwin OS that will disassemble object code.
T he disas sembled code for c reveal s that t he com piler does not normall y gener
ate loa d/store string inst ruct ions. The load/store m ultiple instructions ( sutw and
Scanned by CamScanner
1 92
-a s s ign_mult i :
r30 , 0x fff8 ( r1 )
00002170
00002a74
stwu
00002a78
or
00002a7c
mf s p r
00002aao
bcl
00002a84
mf s p r
r8 , lr
00002a88
mtspr
lr , rO
00002aac
add is
r9 , r8 , 0x 0
00002a90
a dd i
00002a94
add is.
00002a98
addi
0000219c
00002110
00002114
00002118
000021ac
00002ab0
00002ab4
00002ab8
00002abc
00002ac0
00002ac4
stn
lwz
lwz
lwz
lwz
stw
stw
stw
stw
lwz
lw
blr
r 1 , 0x f f dO ( r 1 )
r 30 , r 1 , r 1
rO , l r
2 0 , 3 1 , 0x 2 a 8 4
r9 , r9 , 0 x 5 a c
r2 , r8 , 0 x 0
r2 , r 2 , 0x59c
rO , OxO ( r2 )
r1 1 , 0x4 ( r2 )
r 1 0 , 0 x8 ( r 2 )
r2 , 0xc ( r2 )
rO , OxO ( r9 )
r 1 1 , 0x4 ( r9 )
r1 0 , 0x8 ( r9 )
r 2 , 0xc ( r9 )
r 1 , 0xO ( r 1 )
r30 , 0xfff8 ( r 1 )
to
ruc tion s.
- m s t r i n g mw . c
mfspr
00002b28
bc l
000 02b 2c
m f sp r
00002b30
mtspr
lr , ro
add is
r9 , r 1 0 , 0 x 0
000 02b 34
00 00 2b3 8
Scanned by CamScanner
add1s
rO , l r
20 , 3 1 , 0 x 2b 2 c
r10 , lr
r2 , r 1 o , Ox o
>
O rnw
mw . out
00002b3c
add i
r9 , r9 , 0x50 4
00002b40
add i
00002b48
1t 1wi
00002b4c
blr
00002b44
1 IJ
r2 , r2 , 0x4f 4
11wi
rs , r 2 , 1 8
r5 , r9 , 1 8
i=O , j = O
mwo rd = Ox00000000000000000000000000000001
i=O ,
j=1
mwo rd = O x 00000000000000000000000000000002
i=O , j = 2
mwo rd = O x 00000000000000000000000000000004
i=O , j = 3
i= 1 ,
j =O
mwo rd
i=1 ,
j =1
mwo rd = ox00000000000000000000000200000000
i= 1 ,
j =2
mwo rd = Ox00000000000000000000000400000000
i=1 ,
j =3
mwo rd
1=1 ,
oxooooooooooooooooaooooooooooo oooo
j = 3 1 mwo rd =
i =2 ,
j =O
i=2 , j = 1
i =2 ,
j =2
i=2 , j = 3
oxoooooooooooooooooooooooaoooooooo
i =2 , j = 3 1 mword
Scanned by CamScanner
ooooooooo
0 x ooooc uooaoooooooooooooooo
ooooooo
s;a
194
_
_
_
_
_
_
_
" ,...
ialol
""'
sfrftr-..w
, . /
r
II
and Syste m s
ed Com pon ents
edd
Emb
me
Rea l-Ti
00000 00000000
1 00 000000000
ooo
ooo
oxo
=
000000000000
i=3 , j =O mwor d
00000000000
o20
o
ooo
oxoo
0000 0000
i=3 , j :: 1 mwor d =
00000000000
000
400
ooo
oxoooo
oo
i=3 , j = 2 mwor d =
ooooooooooooooo
ooo
ooo
oso
oo
oxoooo
i=3 , j =3 mwo r d =
oooooooo
ooooooooooooo
ooo
ooo
ooo
oxso
i=3 , j =31 mwor d =
. 11y, t h e most common external test equ ipdifficult to iso 1ate oth erw1se . H1sto nca
.
ment used for verifyin g h ar dware/software sys tems me l u d ed a n oscillosc ope an d a
.
.
.
.
.
e
Logic Analyzer. The osc1u oscope is used m ostlY t 0 .iso l ate s1gna
lmg issues with th
hardwa re, to verify sig.na1 s, an d to tune out pu t signa
l s. T h e oscilloscope is mos t
FOV
: :: ;
Scanned by CamScanner
<? V .
Debugging Compone
nts
1 15
'ITSC
FIGURE 9. 1 1
Likewis e, a relative ly inexpen sive MSO can be used for low-spe ed, l imited
.
w1dth digital logic a n alysis as well . Logic analyzers have an advantage over
oscillo
scopes in that they can observe 1 6, 32, or more signals at the same time, but only at
well -defi n ed l ogic levels .such as TTL (Vthreshold = l .4 V ) and CMOS (Vthreshold =
2.SV ) . The osci lloscope can be used to look at the same channels and to see their
analog natu re, i n c l u d i ng noise, rise time, fall time, and overshoot. The MSO has an
advantage in that it can display -LA output and one or more oscilloscope changes
on the same view, allowing for logic analysis with probing of the same signals to
exam ine a n alog signal issues that might be affecting digital logic.
Using t h e M S O to analyze an NTSC signal at m il liseconds of resolution, the
odd and even line raster periods can be seen at half the frame period ( frame rate is
30 fps ) . Figure 9. 1 2 shows the odd and even raster output signals measured with
the M SO from a composite outpu t CCTV (Closed Circui t Televi sion ) camer a.
NT SC is i nterlaced 2: 1 , so the field rate is 60 fps. The comp osite signa l outpu t from
the N TSC came ra c o m b i n s l u m i nance and chrom inanc e. The signal changes
based upon the sce n e that the cam era view s, but the basic perio d does not.
Scanned by CamScanner
1 96
FIGURE 9. 1 2
l ines within an odd/even raster c a n be detected. The CCTV ca meras have 5 1 0 hori
zontal pixels and 492 vertical pixels, so 246 odd l i nes a re d igitized and then 246 even
lines. Each l ine therefore has a period of 67.75 m icroseconds i n theory based u pon
...
.
FIGURE 1.1 J
Scanned by CamScanner
..
.....
'"....
.
.. . ........ ..
.
")
Debugging Components
1 17
vary significantly and has no effect. The pulse width sets the servo pos1tmn. Figure
9. 1 4 shows an MSO measurement of the PWM signal generated by the NCD 209 2
channel servo control chip .
.,
I
.
. ,
.
- - - - -
.
'
-.
FIGURE 9.14
- _, _ - I
'
-
- -
- -
put ) is 1 . 5
cen ter ing (th e defa ult out
Scanned by CamScanner
1 98
.
i nterfa ces to sens ors a n d actu ators. Logi c anal\rJ
. .
.
, ..er&
.
prob l ems m embedded system
.
.
domam d epteted earher i
d1g1tal
the
m
used
often
most
n F'1g.
.
on t he oth er ha n d a re
logic
fter
debug
to
n
used
a
be
can
alo
zer)
g
Analy
s ignals
ure 9. l l . An LA ( Logic
to
t
dnve
outpu
are
DA
signals
Cs
digital
or
before
PW "
l
h ave been digitize d or
. .
g
a
u
e
b
1
d
mtern
to
d
d
use
be
1g1t
can
LA
n
a
al
se,
Likewi
logic on
actuators like servos.
softw
re/hardw
digital
n
o
re
events
trace
to
inte rfaces
a processor board or even
the
signal
where
level
ts a co m m
1/0)
urpose
P
such as buses or GPIO ( Genera l
on
logic level. Probing h igh-speed logic signal such as those fo u n o a D O R m ern.
ory bus can be difficult due to speed, skew issue , and complex1 m decodi n g the
logic. Probing low-speed logic such as G P I O o r simpler memory mterfaces such as
SRAM is much simpler. H igh-speed specialized buses such as PCI are often most
easily traced using a bus a nalyzer that is specifically designed to captu re and ana
lyze logic signals only on one bus i nterface. Because the trend in e mbed ded
systems is to move external memory on-chip, the most useful probing is G PIO or
using external bus analyzers such as a PCI a nalyzer.
With a bus analyzer or LA used to capture G PI O output, software can be in
strumented to emit trace information to the external bus or G P I O pins for analy
sis. However, given enough memory for a software trace b u ffer, it's not clear how
this is more advantageous than the SWIC WindView style o f tracing. I n fact, Wind
River often describes WindView as a software LA. A write to a n external bus or
GPIO memory-map ped address can be j ust as i n trusive or more intrusive than
posting a write to on-chip or external memory. However, oscilliscop es and LAs
continue to be useful i n general for diagnosin g hardware problems and for tracing
software that is not instrum ented and software/hard ware i nteract ions.
wvEv ent
p int t
wE ent )
pri ntf
pri ntt
pri nt
Scanned by CamScanner
stn
bettt:
Debugging Components
1 91
SU M M A RY
v1ce 1nteract1ons can be difficult. Poorly synchronized services can suffer from race
conditions. Likewise, poorly designed hardware/software interfaces that do not
include synchronizing mechanisms can also have race conditions. ln general,
separate services should synchronize through i;maphores or message queues if
they need to i nteract, and likewise software should synchronize with hardware
through interrupts, pol l i ng status registers, or through control interfaces. Simple
functional software errors can be easily caught through use of parameter checking
and assert calls. H a rdware d iagnostics and POST run by boot firmware can make
hardware failures easier to isolate. Good practices can prevent many bugs before
they become difficult to diagnose. However, many bugs related to software/hard
ware integration will still arise; when they do, knowledge of debug monitors, trace,
and external test equipment is invaluable.
!_X ER CI SE S
a com mo n sensor o r a ctu a tor 1' n terface.
l . Use an osc illoscope to cha racterize
controller such as the. N C 0 209 an dc h arac S peC I'fi1ca 11 y, use a hob by servo
. .
.
.
servo and its hm it s of ope rati on.
by
hob
n
give
a
or
"
11
WM
tenze t h e P
tic load mg exa mp 1e nc l uded o nt he
the
syn
the
ze
aly
an
to
w
2. Us e W indVie
.
vJCes t hat use 90 o;10 o f th e proc es ser
two
es
lud
inc
it
t
l
'
f;
u
a
.
CD -RO M . By d e
an overload by mc reasmg Si run ate
cre
to
le
mp
exa
s
I
th'
sor cycles . M d J' fy
a t this ove rloa ds
U se a Wi ndV iew trac e to prove t h
s.
d
on
sec
ilh
m
0
tim e to 3
?.
!:
s
Scanned by CamScanner
200
CHAPTER REFERENCES
[ Barr99] Barr, M., Programming Embedded Systems in C and C++. O 'Re illy,
1999
[CorbetOS ] Corbet, J., A. Rubini, and G. Kroah-Hartman, Linux Device Dr
ivers
O'Reilly, 2005.
II
f
I
i
I
I
'
Scanned by CamScanner
,, .J..:-.. 1..,,.,_....
\.
I
11
10
Pe rfo rma n ce
Tu n ing
In This Chapter
INTRO DU CTI O N
The art of performance tuning a system or application is a huge 'topic that can't be
covered completely i n this chapter. The chapter covers basic methods and provides
enough knowledge for you to pursue a more in -depth understanding of perfor
mance tuning on yo ur own with further study. Performance tuning is critical to
real - time e mbedded systems when services are missing deadlines . Otherwise, effi
cient, hig h performan ce executio n of services is not necessary, although it can help
t he designer to save cost, reduce power, and simplify thermal cooling for systems.
Fu nda m ental ly, a real - time system is operatin g correctl y when it produce s the
requ ired
re ativ e to
output ( functio ns) and produces them by a specific deadlin e
req uest. Ea rly in a projec t, ystem design ers m ust size processing based upon un
co
JOI
Scanned by CamScanner
202
algori thm
Red uce the com ple xity of the
vic es
.
.
Reduce the freq uen cy of ser
.
ms with t u n m g
rith
algo
the
of
y
ienc
effic
n
utio
I ncrease the exec
.
g h a rdwa re to run w i th a faster
adin
upgr
by
t
ghpu
throu
ssor
proce
I ncrease the
and less laten cy
cloc k, more band widt h, mor e mem ory,
sion and decompr ession protocol for a data stream ) can pr ov i d e high rates of co m
pression at the cost of h igher algo r i t h m i c com plexity. I f the compre ssio n or
decompres sion can't keep up with the desi red fra me rate, t h e options inclu de a
lower performance codec that is simpler and can be executed at the desired frame
rate, or a lower frame rate. The simpler codec that provides less compressio n will
req u i re transport with higher bandwidth.
taining more resources ( e.g., more bandwidth or i n c reasi ng processor cycle rate)
more
resources o r performance red uction is performance t u n i n g to optimize the effi
ciency of service execution. This is easily s a id and h a rd to do. I deally, a system de
are not feasible or will not meet the system req u i rements. An alternative to
sign should not count heavily upon performance tuning to make it feasi b l e, but
tuning can save a project that would otherwise not work. From a n R M p erspecti ve,
tuning is j ust red ucing WCET for a software service. Another option is to offload a
service or specific functions t h at the service perfor ms to h a rdware state
machines.
Offloading functions can i n c rease overall syst e m concurrency and pro vide sig
nificant speed- up. ( Portions of the material on performance tuning prese nted
i n this chapter was first published on the I BM deve lo perWork s Web site, IBM
loa
Scanned by CamScanner
Performance Tunin g
20J
.
The crit ical pat h i n clud es cod e execu
ted at h igh frequency for perfor manc e
.
eval ution wor kloa ds and ben chm arks . This
is ofte n a sma ll port ion of the overall
code impl eme nted and may also often fit into L l
instr uctio n or L2 unifi ed cache. .
I nterr pt-ba sed profi l ing is the best place to start
analy zing software services
and deadl ine overr uns . I n terrup t profil ing requires that the system be run with a
well-de fined test worklo ad, ideally a workload that stresses the services and caus
is
ing deadl ine overrun s. The interrup t profiler samples the IP ( I nstructi on Pointer )
or LR ( Li n k Register) to determin e location in code on a periodic basis. The LR is
used in many archi tectures because it contains a return from interrupt address,
.
and the samples of where the code was in execution are taken by interrupt. Ob
serving the I P i n the sample i nterrupt routine is of l ittle use for profiling. Often a
I -millisecond sample rate is used. The workload should be repetitive at some fre
quency; for example, a video frame rate of a specific resolution, a data request rate
for a specific transfer size, a digital control loop sensor sample, and control law
output rate, or any workload that generates uniform periodic service requests.
Multiple workloads can be analyzed, but while profiling, one workload should be
run at a time. M ixed workloads are much harder to profile and understand. While
a uniform workload is being run, the profiler samples the execution locations, and
with asynchr onous sam pling relative to the workload period ( this is importan t ) ,
the profile begins t o stabiliz e a n d show where i n the code most o f the time is spent
tion )
F req uenc y of ser vic e execut ion (or fun ctio n eecu
.
tion )
se
func
(or
ice
relea
serv
each
for
th
leng
I ns t ruct10n count or path
.
ice release (or fun ctio n)
serv
h
eac
in
ion
cut
exe
e
cod
ci
Effi enc y of
.
map ued to services (tasks), to func tions , to lines
t mg c an be
N ote that t h e profil
.
r
hows how
.
ruct 1ons . Figu re 1 0. l s.
d 1vidua l mac hin e code mst
of c cod e, or eve n to in
a 32 -bit cod e add ress can be
a profi le collecte d at t h e Ieve J 0 f hits observed at
.
h 1 eveI down to a
g spent from a very h1g
mined to understa nd wh ere t 1 me is bein
.
Scanned by CamScanner
204
1,
CPU
....,,..ce
c--
....-..-i
I
'i
....... ID
f mdlm
...
Comts lll
e e I
c_, .. ,...._
cs
As m e n t io ned i n the i ntroduct ion, frequency and i nstruction count red uc ion
can solve t he performance problem, b u t at the cost of overall system performan(e.
Ideally, the goal is to ide n t ify i nefficient code, or
and to make m od ifications so those code blocks execute more efficiently. A cycle
ecu t ion efficie ncy. Poor exec ution efficiency can be the result of stalling a proce sor pipeli ne, m i sing cache, bloc k i n g reads from slow devices, mi a l i g ned d ata
t ructure , nonoptimal code generation from a compiler, bad coding pract i e in
k
One app ro a h to t u n i ng i to go t h rough code from tart to fini h an d
r
r
w
effici e nc y i ue , orrect them, and try r u n n i ng aga i n . Thi i tedio u . It a n
go
bu t the time req ui red , the low p a y -off, a n d the la rge n u m ber f od e han
en t 1
requ i red all ontribute to ri k with t h i ap pro h . A bet ter a ppro a h i t o id
a i 10"
t he hot pot , n a r rowing the o pe of ode hange ign ifi antly. and the n e.x
thi
ubset of the
Scanned by CamScanner
pt i m i 1,,a tio n .
Performance Tuning
1:
205
.
.
o pt i m izat i o n s c a n be m easu red as cod e
mod ified . The feed back appr oach reduces risk ( mo d i fyi ng a n y worki ng co d a w ays runs
_
the nsk
of i n t rodu cing a bug
the
tabilizing
code
base ) a n d pro v1 d es ,
or des
.
ioc us for the effiort F or any ted ious
e f. '
fort 1t s a l so affi r mi n g to measu re progress, and rerun ning
the
profi le and obser v.
ing hat less t i me i s spe n t i n a r ou t me aft er an opt i m ization
is
we 1 come feedback. It
sh ould be n oted t h a t h i g h - l evel per fiormanc e measur es often show no apparen t
.
.
, l ocal tzed opti m izatio ns . A n
.
impr ovem e n t 1or
.
opt1. m 1zat1on
may lead to a hurry up
.
.
and wait sce n a ri o for that P art icular function or service where a bottlenec k else .
wh ere m t h e system m a sks the i mprov ement .
The ove r a l l process bein g p ropose d 1 s ca JI ed
F i rs t , hotspots are
.
.
.1dent1fied w i t h o u t regard to why they are hot
, a n d th en c I oser ana I ys1 d etermi nes
.
.
of service code or the system a re ra t e 1 1 m 1 t mg.
why specific sect ions
Th"1 m
1 t 1 a I
. . .
lv l of a nalysis is i m porta n t because i t c a n help drive focus on t h e t ru ly rate.
hm1tmg software locks If he software is not the rate- limiting bottleneck in the
:
system, t en profilm g will still help, but will result in arriving at blocking locat ions
drill-down.
more q uickly. If the bottlenecks are related to 1 /0 latencies, this will st i l l show up
as a
otspot, b u t t h e opt i m i zation requ ired may req u i re code redesign or system
redesign. For example, status registers t hat are read and respond slowly to the read
request w i l l s t a l l execution if the read data is used right away. Read ing hardware
status i nto a cached state long before t h e data must be processed reduces stal l t i me.
Ideally, the h a rdware/s oftware i n terface would be modi fied to red uce the read
latency i n addi ti o n . M a ny system- level design improve ments can be made to avoid
of cache
1/0 bottlen ecks. For exam ple, fast-ac cess TCM is used so that the impact
c
n
i
w
no
m odern proce sor core
Scanned by CamScanner
20I
performance counters that can provide even better vision into performance issues
than interrupt-based profiles.
. .
Ideally, after a workload set is identified and perormance optt m1zations are
being considered for.inclusion in . a code base, . ongom.g perfo rmance egrss ion
testing should be in place. Performance regression tesing shoul d provide simple
high-level metrics to evaluate current perfor mance in trms of MIPs: FLOPs,
transactions or I/Os per second. Also, some level of profihng should be included,
perhaps supported by performance counters. The generation of this data sho uld be
automatic and be able to be correlated to specific code changes over time. In the
worst case, this can allow for backing out code optimizat ions that did not work
and for quick identification of code changes or added features that adversely affect
performance.
.
Some basic methods for building performance monitoring capability into the
hardware include:
We now focus on built-in event counters and how they can be used to profile
firmware and software to isolate execution efficiency hotspots. Tracing is advised
after hotspots are located. In general, tracing provides shorter-duration visibility
into the function of the CPU and firmware; but can be cycle-accurate and provides
a view of the exact order of events within a CPU pipeline. Trace is often invaluable
for dealing with esoteric timing issues and hardware or software interaction bugs,
and you can also use it to better uderstand performance bottlenecks. A profile
provides information that is mch like an initial medical diagnosis. The profile an
swers the question, where dos it h urt? By comparison, although a profile can't tell
the tuner anything about latency, a trace provides a precise measure of laten
Profiling supported by performanc e counters can not only indicate where time 15
spent, but also hotspots where cache is most often missed or code locations here
the processor pipeline is most often stalled. A trace is needed to determine stall du
ration and to analyze the timing at a stall hotspo t.
Understanding latency for 1/0 and memory access can enable better overlap of
processing with 1/0. One of the most common optimizations is to queue work and
Scanned by CamScanner
207
p t nm
g, trac ing feat ur
the most popular a p p roa ch to
es shou ld stiJI be designed into
.
the h ardware for debug a n d er 1orman ce tun mg
as well
M any ( although not a ll ) com mon CPU arc h
.
1tecture
s include trace ports or the
.
I ud e a trace port . The I B M and
option t o me
'i\MC C owerPC 4xx CPUs, X1lmx
. - ch 1p
n t commg o ff
'
to enable tracm g. Beca use the trace is encoded , 1 t 1 sn't eyeI e-accurat e, b u t 1s
accufor
ugh
h
ardwa
en
re
te
or software debug ging and for perfor mance optim izar
m
en
odern EDA ( Electro nic Design Autom atio n ) tools, ASIC design
uo . G 1:
venficatlon has advance d to the point where all significa nt hardwar e issues can be
discovered ring si1:1 ul ation and synthesis. SoC ( System-on -a-Chip) designs
make post-silicon verification difficult if not impossible, because buses and CPU
cores may have few or no signals that come off-chip. For hardware issues that
might arise with i n ternal on-chip i n terfaces, the only option is to use built-in logic
analyzer functions, such as those provided by the Virtex Chip-Scope. This type of
cycle-accurate trace is most often far more accurate than is necessary for tuning
the perform ance of firmware or software.
vect ors that the p rogram counter follows are easily encoded into 8 bits. Typical
processors might have at m<?st 1 6 i n terrupt vectors, requiring only 4 bits to encode
the interrupt source. F u rthermo re, relative branches for loops, C switch blocks,
and if blocks typical ly span short address ranges that might require 1 or 2 bytes to
encode. Finally , the occasi onal long branch to an absolute 32-bit address requires
4 bytes. So, overa ll, encod ed outpu t is typica lly 1 to 5 bytes for each branc h point
in cod e. Given that most code has a branc h densit y of l i n 10 instru ctions or less,
the encoded outpu t, which can take up to 5 cycles to outpu t an absol ute branc h, is
still very accu rate from a software exec utio n orde r and timi ng perspective.
progress between bran ch ints.
The prog ram cou nter is assu med to linea rly
ce code after 1t s ac
So, the bran ch t race is easil y correlated to C or asse mbly sour
,
loped
quired thro ugh a l ogic anal yzer or acqu isitio n t ? I such s ISCTrace deve
. 1 n , which 1s near ly the cycle rate
b y I BM for deb ug. Due to the h i gh rate of acq m1t ?
. hm1 ted. For exam ple, even a 64- MB
of t he CPU the trace window for a trace port 1s
ts or abo ut 200
trac e buffe wou l d capt ure app roxi mately 1 6 mill i n bran ch poin
s only one fifth of a seco nd of
mill.io n instructi ons At a doc k rate of I G Hz, t hat
.
dwa re and software debuggin&
Code execu tion . Thi s info rma tio n is i n val uab le for har
Scanned by CamScanner
JOI
an<l timing issues, as well as d i rect measu rement of latencies. Hoev er, m
ost ap.
.
plications and services provide a l re n u mber ? f soft wa re opera t i ons per second
.
.
.
over long periods of t ime. For v1s1b1hty m to h 1g er-level soft wa re p erfor m
an
pro fi l i ng provides i n formation t at i s m uc h easier t o use t h a n t he i n for ma
ti
.
traci ng provides. After a p rofi le 1s u nderstood, race c n prov ide
an i n va l u
a ble
level of drill-down to u nder tand poorly perfonmng sections of code.
Performance cou n ters fi rst appeared in the I B M Power PC arch ite ctur t'
in a
patent approved in 1 994. Since then, the m a n u fact u rers of a lmost all other p r0c.es
.
sor architectures have l icensed or inve n ted s i m i l a r fea t u res. The I nte l P M U
is a
wel l -known example in wide use, perhaps most often used by PC game dev elo per
.
The basic idea i s simple. I nstead of directly t racing code exec ut ion on a CPU, a
built-in state mac h i ne is programmed to assert a n i nterrupt periodically so th at an
JSR can sa m ple the state of the C P U , i n c l u d i ng t h e c u rrent add ress of the progr am
cou nter. Sam pling the program c o u n ter address is the s i m p lest form of perfo r
mance mon i toring and produces a histogram o f the add resses where the progr am
counter was found when sampled. This h istogram ca n be mapped to C fu nct io n
add ress bins ) , and n u mber of cycles that are spent i n each fu nct ion. With 32-bit
\\'Ord-sized address bins, the p rofile can provide t h i s i n fo rm a t ion down to the in
struction level and t herefore by l i n e of C code. M ost performance count er state
machi nes also i n c l ude event detec t i on fo r C P U core eve n t s , i ncluding cache
sign ifi
code) that have hotspots. As sta ted earlier, a hotspot is a code location where
cant time is spent or where code is exec u t i ng poorly. For example, if a pa rtic ulr
function cause a cache m iss coun ter to fire the sam p l i n g i n terrupt frequently. th
indi ate that the fu nction should be exam i ned for data and instruct ion ca che ineffi
c ienc ie
uch
as
en !'
u n
and e, i t point T h i s u e of coun ters with
t hose co u nt er
sam
ple
i
n
l
i
n
e
to
code
.
.
(ier red (
t ru ment a tio n ) 1 a h ybnd approach between trac
ing and profi l i ng, oft en re .
10
b 1
a evellt tracl. flg. Th e perfor ma n e o u n t e r req u
ire a hardw a re ell u t t in ..
P U core, but a bo req u i re some firm ware Jnd
e a prvni<
o ft ware: upport to p ro d u
d \, lft'
hM
or c ent t ra e. f f the cot, omplc xi t y, or
of
on
hed ule preve nt in l u)i
in st od.
uppor t for perfor mance moni tori ng, p u re
ofrw,1re method an be u :,ed
Scanned by CamScanner
Performance Tuning
JOI
U LDI N G PE RFO RM A N C E M O N I
TO R I NG I NTO SO FTWA R E
B I
.
system req mre fi rm ware and software sup por
t:
Software event logging to memory buffer ( software only, but hardwa re time
stamp i mproves quality )
Fu nction or block entry/exit tracing to a memory buffer ( software only, with
post-bu ild modific ation of binary executa bles)
Software event logging is a pure software in-circuit approach for performance
monitoring that requ i res no special hardware support. Most embedded operating
system kernels provide an event logging method and analysis tools, such as the
WindView and Linux Trace Toolkit event analyzers. At first, this approach might
seem really i n t rus ive com pared to the ha rdware-supported m ethods. However,
modern architectures such as the PowerPC provide features that make event log
gi n g t races very efficiently. Architectures such as the PowerPC i n c l ude posted
write b u ffers for memory writes, so that occasional trace i n st ructions writi n g
bit-encoded event codes and ti mestamps t o a memory buffer take no more than a
singl e i n struction. G iven that most functions are on the order of h undreds t o
thousands of i n st ructions i n length ( typically a line o f C code generates m ultiple
assembly i nstructions), a couple of additional inst ructions added at fun c t io n entry
and exit contribute little overhead to normal operation.
Eve n t trace logs are i nvaluable for understanding operati n g system kernel and
application code event t i m i ng. Although almost no hardware support is needed,
an acc u rate t i m estamp clock makes the trace t i m i ng more useful. W ithout a hard
ware clock, you can still provide an event trace showing only the order of events,
embedde d system, the capabilit y to collect h uge amounts of data is almost over
whelming. Effort put into data collection is of questionab le value in the absence of
a plan to anal yze that data . Trac e i n form ation i typically t h e easiest to analyze and
is taken as needed for sequences of interest and mapped to code or to
Scanned by CamScanner
timeline of
210
gh
OS events. When traces are mapped to code, enginee rs can replay and step th r
code as it ran at full speed, noting branch es taken and overall exec utio n flow or
event traces mapped to OS events, enginee rs can . analyze multithread cotex:t
. g events sue h as semaph
switches made by the schedu ler, t h rea d sync h ron1zm
ore
takes and gives, and appI .tcat10 n-spe c1' fiIC event s.
Profiles that collect event data and PC location down to the level of a 32_bIt
address take more analysis than simple mapping to be useful. Perfo rma nce tuni
effort can be focused well through the process of drilling down from the c g
module level, to the function level, and finally to the level of a specific line of cod e
Drilling down provides the engineer analyst with hotspots w h ere more detailed in formation, such as traces, should be acquired. The drill-down process ensu res th at
you won't spend time optimizing code that will have little impact on bott om l ine
performance improvement.
.
e.
Armed with nothing but hardware support to timestamp events, you can still de
termine code segment path length and execution efficiency. Ideally, performance
counters would be used to automate the acquisition of these metrics. However,
when performance counters aren't available, you can measure path length by hand
in two different ways. First, by having the C compiler generate assembly code, you
can then count the instructions by hand or by a word count utility. Second, if a sin
gle-step debugger is available ( for example, a cross-debug agent or JTAG ) , then
you can count instructions by stepping through assembly by hand. Althoug h labo
rious, it is possible, as you ' ll see by looking at some example code.
The code in Listing 1 0. l generates numbers in the Fibonacci sequence.
LISTING 1 0. 1
t y pede f u n s i g n e d i n t U I NT32 ;
#def i n e F I B_L I M I T_FOR 32 B I T 4 7
U I NT32 i d x
U I NT32
seqCnt
O,
j dx =
1 ;
= F I B_L I M I T_FOR_32_B I T ,
U I NT32 f i b = 0 ,
f ibO = 0 ,
fib1
1 ;
v o id f ib_wr a p p e r ( v o i d )
f o r ( id x =O ;
idx
f i b = f i bO
Scanned by CamScanner
<
i t e rC n t ;
f i b 1 '
idx + + )
i t e rC n t
1 ;
Performance Tuning
whi l e ( j d x % l t ;
21 1
s eqCn t )
{
f ibO
f ib 1 ;
fib1
j dx++ ;
fib ;
f i b = f ibO
fib1 ;
}
}
}
The easi est way t o get an i nstr ucti on cou nt for a bloc
k of code such as the
Fibo nacci sequ en e-gen erati ng funct ion in Listin g I 0. 1 is to comp ile the C code
into assembly. With t h e GCC C comp iler, this is easily done with the foJlow ing
com ma nd line :
gee f ib . c
-S
- o f ib . s
The resulting assembly is ill ustrated in Listing 1 0.2. Even with the automatic
generation of the assembly from
hand. For a simple code block such as this example, hand counting can work, but
the approach becomes ti me-consuming for real-world code blocks.
LISTING 1 0.2
. g l o b ! _f ib_w r a p p e r
. s e c t i o n TEXT , t ex t , r e g u l a r , pure_ i n s t r u c t ions
. a l i gn 2
_f ib_wr appe r :
s t mw r30 , - 8 ( r 1 )
stwu r 1 , - 48 ( r 1 )
mr r30 , r 1
mf l r rO
bcl 20 , 3 1 , " L000000000 0 1 $pb "
" L0000000000 1 $ p b " :
mf l r r 1 0
mt l r rO
00 1 S pb " )
i d x - " L0 00 00000
pb " ) ( r 2 )
la r 2 , l o 1 6 ( _i d x - " L00 000 0 000 0 1 $
a dd i s r 2 ' r 1 0 , h a 1 6 (
l i ro , o
stw rO , O ( r2 )
L2 :
00 00 00 1 Sp b " )
id x - " L0 0 00
1 Sp b " ) ( r9 )
la r9 , lo 1 6 ( _ idx - " LOOO O 000 000
000000 1 S p b " )
" L 00 00
a d d i s r 2 , r 1 0 , h a 1 6 ( _ I t e r at 1. o n s
" ) ( r2 )
OOOO000 0 1 Spb
la r 2 , l o 1 6 ( _ I t e r a t i on s - " LO O
a dd i s r9 ' r 1 0 ' h a 1 6 (
lwz r9 , 0 ( r9 )
Scanned by CamScanner
111
and Systems
Real-Time Em bedded Components
lwz r0 , 0 ( r2 )
cmp lw c r7 , r9 , r0
blt cr7 , L5
b L1
L5 :
1 $pb " )
a d d i s r 1 1 , r 1 0 , h a 1 6 ( _f i b - " L 0000 0000 00
1)
l a r 1 1 , lo 1 6 ( _f ib - " L0000 00000 0 1 $pb " ) ( r 1
a d d i s r9 , r 1 0 , h a 1 6 ( _f ib0 - " L0000 00000 0 1 $p b " )
la r9 , lo 1 6 ( _f i b0 - " L00000 00000 1 $pb " ) ( r9 )
addis r2 , r 1 0 , h a 1 6 ( _ f i b 1 - " L000000 0000 1 $p b " )
l a r2 , l o 1 6 _f i b 1 - " L000000 000 0 1 $pb " ) ( r2 )
lwz r9 , 0 ( r9 )
lwz r0 , 0 ( r2 )
add rO , r9 , rO
stw r0 , 0 ( r 1 1 )
LS :
addis r9 , r 1 0 , h a 1 6 ( _ j d x - " L0000000000 1 $ p b " )
la r9 , lo 1 6 ( _ j d x - " L 0000000000 1 $pb " ) ( r9 )
a d d i s r2 , r 1 0 , h a 1 6 (_se q i t e r at i o n s - " L000000000 0 1 $ p b " )
la r2 , lo 1 6 ( _se q i t e rat ion s - " L0000000000 1 $ pb " ) ( r2 )
lwz r9 , 0 ( r9 )
lwz r0 , 0 ( r2 )
cmplw c r7 , r9 , r0
blt cr7 , LS
b L4
LS :
addis r9 , r1 0 , h a 1 6 ( _ f i b0 - " L0000000 000 1 $ p b " )
la r9 , lo 1 6 ( _f ib0 - " L000000 00001 $pb " ) ( r9 )
addis r2 , r 1 0 , ha 1 6 ( _f i b 1 - " L0000 00000 0 1 $p b " )
la r2 , lo 1 6 ( _f ib 1 - " L0000 00000 01 $pb " ) ( r2 )
lwz r0 , 0 ( r2 )
stw r0 , 0 ( r9 )
a d d i s r9 , r 1 0 , h a 1 6 ( _f i b 1 - " L000
0000 000 1 $p b " )
la r9 , lo 1 6 ( _f i b 1 - " L00 000 000
00 1 $p b " ) ( r9 )
a d d i s r2 , r t O , h a 1 6 ( _f i b - "
L00 000 000 00 1 $pb " )
l a r2 , lo 1 6 ( _f i b - " L00 000 000
00 1 $ pb " ) ( r2 )
lwz rO , O ( r2 )
stw r0 , 0 ( r9 )
a d d i s r 1 1 , r 1 0 , ha 1 6 (
_f ib - " L0
la
0 1 $p b " )
1 1 , lo 1 6 ( _ f ib - " L0000000000 1 $p00b 00
" ) ( r1
a d d 1 s r9 , r O , h a 1 6
( _f i b0 -
00 00
1)
" L0 00 00 00 00 0 1 $p b
")
la r9 , lo1 6 ( _f ib0 - " L0
0000 0000 0 1 $p b " ) (
r9 )
addis r2 , r 1 0 , ha 1 6 ( _
f ib 1 - " L00 0000
0000 1 Sp b " )
la r2 , lo 1 6 ( _f i b 1 - Lo
ooooooooo 1 s pb
" ) ( r2 )
Scanned by CamScanner
Performance Tuning
J1J
lwz r 9 , 0 ( r9 )
lwz r0 , 0 ( r2 )
add rO , r9 , rO
stw r0 , 0 ( r 1 1 )
add i s r9 , r 1 0 , ha 1 6 ( _ j d x - " L00
000
la
000 00 1 $
I f you ' re looking for a less tedious approach than cou n t i ng i nstructions b y
hand from assembly source, you can use a single-step debugger and walk through
a code segment in disassembled format. An interesting side effect of this approach
for c o u n t i ng i nstructions is that the counter often learns quite a bit about the flow
count of i nstruction s in between. Watching the debugger auto- step and count i n
structions between a n e n t ry poin t a n d exit poi n t for code c a n even b e an i nstruc
tive expe rience that may spark ideas for path optim izatio n .
be compi l ed for G4- -or G S
List i n g 1 0. 3 p rovide s a mai n progra m that can
Scanned by CamScanner
214
:
te
n
ance
rs
cou
orm
perf
m
erPC
e mbeddtd
Pow
and even t coun ters are foun d i n the
I BM PowerPC cores .
usnNG 1 o.J
--
# inc lude
" s t d io . h "
#include
I I Simply p a s s in
PMAP I c o u n t e r s and
I I MONSTER p r o v i d e s a f u l l y f e a t u r e d s e t o f
f o r t h e Mac int osh
I I a n a l y s i s a long w i t h t h e f u l l s u i t e of CHUO t o o l s
I I G s e r i e s of PowerPC p r o c e s s o r s .
II
- DPMU_ANAL Y S I S .
F o r t h e P e n t ium ,
o n l y C P U c y c l e s will be
I I m e a s u r e d and C P I e s t im a t ed b a s e d u p o n k n own
i n s t r u c t ion count .
II
I I F o r t h e M a c i n t o s h G4 ,
II
II
II
II
II
s i mply l a u n c h t h e m a i n
s h e l l w i t h t h e MONSTER a n a l y z e r s e t u p f o r
p r o g r a m f r om a Mac OS
r e m o t e m o n i t o ri ng to
f o l low al o n g w i t h t h e e x am p l e s i n t h e a r t i c l e .
L e a v e t h e #def i n e LONGLONG_OK i f
su pport 64 - b it
unsigned integers ,
I I ANS I C .
y o u r c o mp i l e r a n d a r c h it ectur e
d e c l a r e d a s u n s i g n e d long long in
II
II
II
II
I f not ,
p l e a s e remove t h e # d e f i n e be low f o r 32 - b i t u n s i g n
ed
long d e c l a r at io n s .
#d ef i n e LO NG _ LO NG_OK
# d e f i n e F I B_L I M I T FOR 32 B I T
47
-
ty ped ef u n s ig n e d i n t
U I NT3 2 I
# i fde f MO NS TE R ANAL
YS I S
# i n c lud e
cH UO l c h ud . h "
#inc lude
ma c h l bo o l e a n . h "
# e l se
Scanned by CamScanner
Performance Tuning
21 5
# i fdef LONG_LONG_OK
t yped ef u n s ign ed lon g lon g int
U I Nrs4 ;
U I NT64 s t a r t TSC
U I NT64 st opTSC
O;
O;
U I NT64 c y c l eC n t
O;
UI NT64 readTSC ( vo id )
{
U I NT64 t s ;
asm
v o l a t i l e ( " . by t e OxO f , Ox
31 "
return t s ;
__
__
" =A "
(ts) ) ;
}
U I NT64 c y c l e s E l a p s e d ( U I NT64 st opTS ,
U I NT64 startTS )
{
ret u rn
( s t o pTS
s t a r t TS ) ;
}
#else
typedef s t r u c t
{
U I NT32 low ;
U I N T32 h i g h ;
TS64 ;
TS64 s t a rt T SC
TS64 st opTSC =
TS64 cyc leCnt
{O ,
{O ,
= o;
O} ;
0} ;
TS64 rea d T SC ( v o i d )
{
TS64 t s ;
__
as m
__
v o l a t i l e ( " . by t e
Ox 0f , Ox 3 1 "
" =a "
return t s ;
}
pTS ,
TS6 4 c y c l e s E l a p s e d ( T S6 4 s t o
{
U I NT32 o v e r F l owCn t ;
U I NT32 c y c l e C n t ;
TSR4 l a n sedT :
Scanned by CamScanner
TS64 s t a r t TS )
( t s . l ow) ,
" =d "
( t s . h ig h ) ) ;
111
ove r F l owC n t
) ;
i
s t a r t T SC . h g h
( s t opT SC . h i g h
opr sc . 1ow
i f ( o v e r F lowCnt && ( s t
<
s t a r t TS C . l o w ) )
{
overF lowCnt- ;
c y c l eCnt
s t a r t T SC . low )
s t o p TSC . lOW -
s t a r t TS C . l OW j
( Oxf f f f f f f f
}
else
{
c y c leCn t
}
e la p s e d T . low = c y c l eCnt ;
elapsed T . high
ov e r F lowC n t ;
r e t u rn e l a p s e d T ;
}
#endif
#endif
U I NT32 idx
o,
jdx = 1 ;
1;
UI NT32 f i b
OI
f i bO
OI
# d e f i n e F I B_TEST ( s eqC n t ,
.
for ( idx=O
;
idx $ l t ;
f 1' b I
i t e rCn t )
i t e rCn t I
1 j
idx++ )
{ \
fib
f ibO + f i b 1 ;
wh ile ( j d x
<
s eqCnt )
\
\
{ \
f i bO
f ib 1 ;
f ib 1
fib;
\
\
f i b = f i bO + f i b 1 ;
j d x++ ;
} \
} \
void f i b_wrappe r ( v oid )
{
F I B_ TES T ( s e q l t e ra
Scanned by CamScanner
t ions ,
I t e r a t i on s ) ;
s t o p TS C . l ow ;
Performance Tuning
# ifd ef MONSTER_ANALYSI S
ch a r lab e l [ ] = " F ib o n a c c i Se r ie s " '
in t m a i n (
i n t a r gc ,
c h a r a rg v ( ] )
{
doub le t b e g i n ,
if ( a rgc ==
t e lapse ;
2)
{
sscanf ( argv [ 1 J '
s eq i t e r a t io n s
I t e ra t io n s
}
else if ( argc ==
1)
p r i nt f ( " U s i n g d e f a u l t s \ n " ) ;
else
p r i n t f ( " Usage :
f i btest
[ N um iteration s } \n " ) ;
c h ud i n it i a l i z e ( ) ;
TRUE ) ;
ch udAc q u i r e R emoteAcc es s ( ) ;
t b e g i n = c h ud R e a d T imeBa s e ( chudMic roSecon ds ) ;
I t e r ations ) ;
c h u d S t o p R e m o t e P e rfMoni t o r ( ) ;
t e l a p s e = c h u d R e a d T imeBa s e ( c hudMic roSecond s ) ;
p r i n t f ( " \ n F i bo n a c c i ( \l u ) =\lu
s eq i t e ra t ion s ,
fib,
c h u d R e l e a s e R emo t eAc c es s ( ) ;
ret u rn O ;
}
#else
ER 1 5
#de f in e I NST CNT F I B_ I N N
R 6
#d e f i n e I NS T CN T F I B_ OU TTE
Scanned by CamScanner
fib,
( t elapse - t begin ) ) ;
J1 7
211
double c l k R a t e = o . o , f ibC P I
U I NT32 instCnt
o;
)
=
o.o;
if ( a rgc = = 2 )
sscanf ( argv [ 1 J ,
e l s e i f ( argc = = 1 )
print f ( " Us i n g d e f a u l t s \ n " ) i
else
prin t f ( " Usage : f ib t e s t [ Num i t e ra t ions ) \ n " ) ;
instCnt = ( I NST_CNT_F IB_ I NNER
s e q i t e ra t i on s )
I I E s t imate CPU c lo c k r a t e
s t a r t TSC = readTSC ( ) ;
u s l e ep ( 1 000000 ) ;
st opTSC = readTSC ( ) ;
#end if
Scanned by CamScanner
_,
Pmormance Tuning
star tTSC
211
readTSC ( ) ;
stopT SC = readTSC ( ) ;
I I ENO T imed F ibonacci Test
#ifdef LONG_LONG_OK
printf ( " s t artTSC =Ox\0 1 S x \
' startTSC ) ;
t
f
(
"
s
t
o
pTSC
=Ox\0 1 6x \ n , st opTS C ) ;
prin
prin t f (
print f (
#else
p r i n t f ( " startTSC h igh=Ox\0 8x , startTSC low=Ox\08x \ n " , startTSC . high ,
startTSC . l ow ) ;
,
p r i n t f ( " stopTSC h i gh=Ox\ 08x , stopTSC low=Ox\08x \ n " , stopTSC . h igh
stopTSC . l o w ) ;
);
cycl eCnt = c y c lesE laps ed ( stopTSC , sta rtTSC
) \ n " , seq l te rat ion s , f i" b ' f ib ) ;
p r i n t f ( " \ n F ibon ac c i ( \l u ) =\lu ( Ox\0 8lx
cyc leCn t . low ) ;
p r i n t f ( " \ nC y c l e Cou nt=\ lu \ n " ,
n " , cyc leC nt . h igh ) ;
p r i n t f ( " Ove rFl ow Cou nt=\lu \
low ) I ( ( dou ble ) ins tCn t ) ;
fib CP I = ( ( dou bl e } cyc leC nt .
f ibC PI ) ;
p r in t f ( " \ nCP l =\4 . 2f \ n " ,
#endif
}
#end if
, the M ON ste r
bu ilt for the M ac int os h G4
3
0.
1
g
tin
Lis
m
R un ni ng th e code fro
th e Fibo na cc i
d exec ut io n efficiency fo r
an
h
gt
len
th
pa
an alysis to ol de te rm i n es the
in Li st in g 10 .4.
fig k ' as su m m arized
seq ue nc e co de bl oc
.
ted us in g the M O N ste r on
lec
ol
c
s
wa
4
0.
1
in Li st in g
m t he
T he ex amp le analysis
C an d downl oa da bl e fro
GC
ith
w
d
le
pi
e ea sil y co m
u r at i on an d so u rc e cod
C D- RO M.
Scanned by CamScanner
220
LISTING 1 0.4
enabled , Threshold :
o , M u l t i p l ier :
2x
Timebase res u l t s
1 - CPU Cyc l e s
( p l c l ) PMC 1 :
d
( p 1 c 2 ) PMC 2 :
2 - I n s t r comp l e t e
( tbl ) Pl
CPI ( c omp l e t e d )
P l Timebase ( c pu c y c l e s )
_
P 1 pmc 1
SC res 1
Tbl :
( P1 )
( cpu cycles )
2 - I nst r Complet ed :
1 - CPU Cyc l e s :
CPI ( c omplet ed )
84 1 3 1 60
P 1 pmc 2
( p1 )
1 6 1 8307
5 . 1 9874 1 64790735
Tb1 :
( Pl )
( c pu cycles )
2 - I n s t r Comp l e t ed :
1 - CPU Cyc l e s :
CPI ( c ompleted )
7346540
7374481 . 7 1 0 1 0 7035
5 . 398402349080333
Tb1 :
(Pl )
( cpu cycles )
2 - I n s t r Complet ed :
1 - CPU Cy c l e s :
CPI ( comple t ed )
7207497 . 309806291
7 1 80243
4 . 302282649205663
Tb1 :
(Pl )
( cpu cycles )
2 - I n st r Compl e t e d :
1 - CPU C y c l e s :
CPI ( complet ed )
44808388
44985850 . 44768739
1 . 1 60973752343601
Tb1 :
( P1 )
( cpu cycl es )
2 - I n s t r Compl eted :
CPI
1 - CPU C y c l e s :
( c omplet ed )
470597098
472475463 . 2072901
1 . 026246290797887
Tb1 :
( P1 )
( cpu cycles )
2 - I n s t r Compl e t ed :
2 1 49357027 . 379779
(P1 )
CPI
1 - CPU C y c l e s :
( c omplet e d )
2 1 40806560
1 360873
( p1 )
1 668938
(p1 )
38595522
(p1 )
4 5 8 5 6 1 558
( p1 )
2 1 08 61 9999
1 . 0 1 526427759 1 63 1
S a m - Sie we r t s - Com put e r : - / TSC
Scanned by CamScanner
Performance Tuning
F ibonac c i ( 2 7 ) = 3 1 78 1 1
221
. / pe rfmon 1 0000000000
( OxOOOOOa 1 8 )
Note that i n the example runs, for larger numbers of iterations, the Fibonacci
sequence code becomes cached and the CPI drops dramatically from 5 to 1 . If the
Fibonacci code were considered to be critical path code for performance, you
might want to ((. 'Sider locking this code.block into LI instruction cache.
(Ti meStamp Counter) build, which counts CPU core cycles and uses a hand
counted in struction count to derive CPI, is included on the CD-ROM. You can
download and analyze this code on any Macintosh Mac OS X PC or Wi ndows PC
running Cygwin TM tools. On the Macintosh, if you use MONster with the Remote
feature, the appl ication will automatically upload counter data to the M ONster
analysis tool, including the cycle and i nstruction counts for the Fibonacci code
block.
can
be used
are
Scanned by CamScanner
222
Some more advanced methods for optim izing code segment s incl ude the
following:
SUMMARY
After the current marriage of software, firmware, and hardware is op tim i zed , how
does the architect improve the perfo rmance of the next generation of the system?
At some poi nt, for every product, the cost of further code optimization will out
weigh the gain of improved performance and m a rketability of the product. How
ever, you can still use performa nce a nalysis features such as performa nce co unters
sumer of CPU resources, and the system is C P U - b o u n d so that the CPU is the
bottleneck, then hardware rearch itecting to address t h is is beneficial. That part icu
lar fu nction might be considered for i mplementation i n a h ardwa re - accel erat ing
ne
state achi ne. Or perhaps a second processor c o re could be dedic ated to that
lln
Scanned by CamScanner
Performance Tuning
22J
EX E RCI S E S
l.
2.
3.
Use the Pentium TSC code to count the cycles for a given code block.
Then, in the single-step debugger, count the number of instructions for
the same code block to compute the CPI for this code on your system.
Provide evidence that your CPI calculation is accurate and explain why ifs
as high or low as you find it to be.
Compare the accuracy for timing the execution of a common code block
first using the CD-ROM time_stamp.c code and then using the Pentium
_
TSC . Is there a significant difference? Why?
Download the Brink and Abyss Pentium-4 PMU profiling tools and pro
file the Fibonacci code running on a Linux system . Explain your results .
CHAPTER R E F E R E N C E S
[ Siewert]
Scanned by CamScanner
,,. .
. ..
11
'
In
This Chapter
Reliab ility and Availa bility: Simila rities and Differ ences
Reliab ility
Reliable Software
Available Software
Design Tradeoffs
INTRODUCTION
High availability a n d high reliability ( HA/HR) are measured differently, but both
provide some indication of system robustness and quality expectations. Further
more, because system failures can be dangerous, both also relate to system safety.
I ncreasing availability and reliability of a system requires time and monetary in
vest ment, increasing cost, and increasing time-to- market for products. The worst
.thing that can come of engineering efforts to increase system availability and relia
bility is an unintentional decrease in overall availability and reliability. When de
sign for HNHR is not well tested or adds complexity, this un intentional reduction
.
in H A/ H R can often be the result. In this chapter, design methods for increasing
HA/H R are introduced along with a review of exactly what HA and what H R really
m eans. ( Portions of the material on performance tuning presented in this chapter
was first published o n the IBM developerWorks Web site, IBM developerWorks.
Reprinted with permission.)
Scanned by CamScanner
226
Availability is
or service. can go down and still meet the five n ines criteria. Five nines is a
high
availability, or HA metric.
In contrast, high reliability ( H R) is perhaps best described by the old adage that
a chain is only as strong as its weakest link. A system built from components that
have very low probabil ity of fail ure leads to high system reliability: The overalJ
'expected system reliability is the product of all subsystem rel iabili ties, and the sub
system rel iability is a product of all component reliabilities. Based upon this math
ematical fact, components are required to have very low probability of fail ure if the
subsystems and system are to also have reasonably low probab ility of failure. For
example, a system of
or
. _, ..
..
txpe of s i n gle-string design can greatly reduce overall rel iability. For example,
I ( MT B F
typ ical.
MTTR )
. it h a
a va i lab i l i ty if
s very fast recovery time s. From this relat ion , you can see t ha .
o ne syste m w t. t h five n i nes HA mig
ht hav e low rel iabi lity and a very fast recorf'
( ay 1 econd ) and therefo re go down and recov
a yea!
er 3 ' 000 o r mo re t i m es in
( 1 0 ti me a day' ) An H R syste m m i gh t go down
ait for
once every two yea rs a n d ..
where MTBF
.
Scanned by CamScanner
qual 1t'
227
RE L IA BI LITY
The<;>retically, it's possible to build a system with low-quality, not-so-reliable compo
nents and subsystems, and still achieve HA. This type of system would have to in
clude massive redundancy and complex switching logic to isolate frequently failing
components and to bring spares online very quickly in place of those components
that failed to prevent interruption to service. Most often, it's better to strike a balance
and invest in more reliable components to mini mize the interconnection and
switching requirements. If you take a simple example of a system designed with re
dundant components that can be isolated or activated, it's dear that the interconnec
tion and switching logic does not scale well to high levels of redundancy and sparing.
Consider the simple, single-spare dual-string system shown in Figure 1 1 . 1 .
Ct
Cl
C2
C2
C3
C3
FIGURE 1 1 . 1
Dual-string, cross-strapped
subsystem interconnection.
The exa mp le syst em s h 0wn in Figure 1 1 . l has eight possible configurat ions
that can be formed b Y activatin g and isolating components C l , c 2 , o r C3 from s 1de
and
A or si'd e B The system m ust be able to positively detect failed components
.
an operable switch state and new mtercontrack these failu res to reco nfiigure with
ble 1 1 1 describes the eight possible co n figu
nect ion of activate d compone n ts. Ta
exam p1 e .
. sma 11 scale HA sys tem
rations for t h is
. ple exam p 1e sh. own I. n F igure l l . l a trade-off can be made between
F ro m t h e s1m
. h
nec ting co mponents and redundancy management wit
0 f inte rcon
.
the comp l ex1ty
.
' g H R com ponen ts . The cost of hardware components with H R
the cost of me 1 ud m
b estimated based upon c o mpo ne nt tsting, expected
is fairly well k now n an d ca
acterist ics , pac k a gi ng, and the overa ll p hy sic a l
, MT BF, ope ratt on l cha r
'
failu re rate s
Scanned by CamScanner
111
TAllU
1 1 .1
(..........
C.. c2
c..,.1111 C1
Cllf es
System architects should also consider three simple parameters before invest
S y s t e rn A u t o n o m y
A n d A u tomatic
R e c o verv
D i a g no s i s
R e - c o n f i g u r a t ion
S u b- S ys tern E r ror
De
ion
'
FIGURE 1 1 .1
Scanned by CamScanner
Detection a n d
C o r rec t1on
C o mputing S y Hm
C o rrection
221
RE L IAB LE SOFTWA R E
Pe rhaps mu ch h a rde r to est ima te is the cos t of
H R so ft ware. Cl ear1y, re1 1a bl e h a rd .
.
wa re r u n n m g u n reli able sof twa re will resu lt
m c
ia11 ure mo d es t h at are I 1 k eI y to
.
.
.
ca se serv Ke tte u p t io . Com lex softw are is ofte
n less relia ble, and the best way
.
.
to increase reha b1hty 1s with testm g Test ing takes t .
1me an d uIt 1mateI y add s t o cost
to mar k et.
and time
.
can still wind u p in the field and emerge due to subtle differences in t iming o r exe
cution of data-driven algori th ms.
Design i n g firmware and software for HR can be costly. The FAA requires rig
orous d oc u mentation and testing to ensure that fliht software on commercial air
craft is HR. The D0- 1 788 class A standard requi res software developers to
maintain cop ious design, test, and process documentation. Furthermore, testing
must include formal proof that code has been well tested with criteria such as mul
tiple condition decision coverage ( M C DC ). This criteria ensures that all paths and
all statements in the code have been exercised and shown to work. This laborious
task greatly i nc reases the cost of software components.
AVAILA B L E SOFTWA R E
Real-time systems are composed o f software services that provide service upon re
quest. Norma lly these service s block i n - betwee n request s. Due to coding or design
error , it is possib le that software san ity may be lost so that service s will block i n
s
be
definitely. T o detect and recove r from indefin ite blocki ng, all service should
upper
coded so that they will event ually t i m e out i f they do not unblo ck by some
i meut
bo und o n an expec ted reque st perio d. When the servic e unblo cks due to t
f ( ) ) , whic h
or due to data avail able, the servi ce shou ld call keep_ alive ( t a s k i dSel
ty mon itor. The san
will post the servi ce's uniq ue task ID to a software syste m sani
to ake s re
i ty mon it o r is also a serv ice. This serv ice simp ly perio dica lly chec k
serv ICe. I f a serv ice
all other serv ices have chec ked in and are cont i nuin g to prov ide
use tas kSu spen d ( ) and
has not chec ked in by a t i m e l i m it , the sani ty mon itor c n
ific shut down ( ) to recover
tas kOe lete ( ) in VxWo rks alon g with a n app licat ion- spec
d. Thi s begs the que s
the wedged serv ice a n d its reso urces so it can then be rest arte
this case, a har dwa re
tion , wha t h app ens if the san ity mo n itor itse lf lose s san i ty? In
Scanned by CamScanner
230
dog is a co untdown ti
tn er
watch dog t imer will recov er the system . The watch
of
v
rec
re
full
boot
o ery the sy ,
which if not reset by the sanit y mon itor, will start a
S
,
t work and systems go
tern. I n rare circu msta nces, the recov ery meth ods w.on
i
nto
.
d ov .
of
sa
1ty
loss
in
ov
results
n
er
ry
recove
g
a
do
watch
the
?
where
t
rolling reboo
er
ak
s
time
and
n
,
te
stress
st
i
onal,
functi
ing
g, this is
D E S I G N TRA D EO F F S
Designing for H R alone can be cost prohibitive, so most often a balance o f design
for H A and H R is better . H A at the hardware level is most often achieved through
redu ndancy ( sparing) and switchi ng. A trade-off can be made between the cost of
dupl ication and simply engineering H R i n to components to reduce the MTBF.
Over t i me, designers have found a balance between H R a n d HA features to opti
mize cost and avai labi l i ty. Fundamental to dupl ication schemes is the recovery
latency. When planning component or subsystem duplication for HA, arc h itects
must carefully consider the com plexity a n d latency of t h e recovery scheme and
how this will affect firmware and software layers. T rade-offs between working to
simply increase reliabi lit)r instead of increas ing availabil ity through sparing should
be analyzed. A simple, well-proven methodology that is often employed by systems
engi neers is to consider the t rade-off of p robability of failure, impact o f failure,
and cost to mitigate i m pact or reduce likelihood of failu re. This method is most
often referred to as FMEA ( Failure Modes and Effects Analysis).
Another, less fo rmal process that system engineers often use is referred to as
the lo w-hanging fruit process. This process involves ranking system design feat es
under consideration by cost to i m plemen t, rel i a b i l i ty i m p roveme nt, availab1l 1'
.
improvement, and complexity . The point of low-hanging fruit analysis is to pKk
features that improve HA/ H R the most for least cost and with least risk. Wit ho t
.
existmg prod ucts and field testing, the hrdest part of F M EA or low-hangi ng fruit
au
analysis is estimating the p robability of fa i l u re for c o m pone nts and esti m ng
.
true
improvem ent to H.A/ H R for specific feat ures. For hardware, the tn. ed and .
.
.
.
testu l
met hod 1or
c
1s
est1 matmg re l 1a
. b 1' I tty
based u pon compone nt testi ng, sys tem
be
.
envi ron mental testi ng, accelerated t esting , and field testing. The tradeo ffs but
tween engineering rel iabil .ity and availability into hardware are fairly obv ious,
how does this work with fi rmwa re and softw are?
\.
cJ
Scanned by CamScanner
2J 1
of the fu nction
ensure that main Path-A is covered i n combina tion with the paths
define s Path -C) 28, 30, l 3, l 5, 1 6, 32.
ou t s id e l i m i t s , which
LISTING 1 1 .1
n Re uirem ents
Sim ple C Code with Two Paths and MCDC Testi g q
01 :
# d e f i n e U P P E R_ L I M I T 1 00
02 :
# d e f i n e TRUE
03 :
#define
FALSE 0
04 :
05 :
e x t e r n v o id
06 :
extern
07 :
08 :
09 :
12:
13:
14:
15:
16:
17:
18:
19:
20 :
21 :
lo gM e s s a g e ( c h a
*m sg ) ;
ne d i n t ad dr ) ;
MM I OR ea d ( u n s ig
u n s ig n ed i n t
o id ) ;
rtRecove ry ( v
extern void Sta
i o n ( v oi d ) ;
t inueOperat
e x t e r n v o id C on
re d ;
cove ryRequi
e x t e r n int Re
10:
1 1 :
int
n s ig n e d
L im i t Te s t ( u
if ( va l
>
IT)
UP PE _L I M
l o gM e s s a g
e ( " L imi t
E;
r e t u r n TRU
e l se
Scanned by CamScanner
in t v a l )
re t u r n
FAL SE ;
E x c eed ed \
n" ) ;
2J1
22 : }
23 :
24 : main ( )
25 : {
26 :
27 :
28 :
I OValue
29 :
O;
MM I ORead ( Ox F0000 1 00 ) ;
30 :
31 :
32 :
33 :
S t a rt Recovery ( ) ;
}
else
34 :
35 :
36 :
37 :
Cont i n u eOperat i o n ( ) ;
38 : }
H I E RA R C H I C A L APPROAC H E S F O R FAIL-SAFE D E S I G N
I deal ly, a l l system, s ubsystem, a n d component e r ro rs c a n b e detected and cor
rected in a h j erarchy so that component errors are detected and corrected without
any action req u i red by the conta i n i n g
SUMMARY
d
}'!> t ern designed for H A may not neces a ri l y b e d e i g n e d fo r
sy t ern te.n
t o be H A a l t hough t hey often fai l afe a n d wa i t f o r h u m a n i nt e rv e nt io n for re O\'
ery rather t h a n r i k a uto m a t ic recov ery . 0, t h e r e i, n o c l ea r o r rela t io n bet"'
t'f0
H R. H R
the H A a n d
MTTR .
HR
di
Scanned by CamScanner
2J J
MTBF
and rapid M TT R coul d yield the same availability, it appears that a va ilabil
ity alone does not characteriz e the safeness or quality of a system very well.
EX ERC ISES
1 . H ow many con figurations must a fault recovery system attempt for a d ual
string ful l y cross-strapped system with four subsystem s i
3. I f it takes 30 seconds for Win dows to boot, how many ti mes can this OS
fail d u e to deadlock or loss of software san ity in a year for five nines HA
4. I f the l a unch vehicle for satellites has an overall system reliability of 99.5%
and has more than 2,000 components, what is the average reliab ility of
each com ponent?
5. Research the CodeTest software coverage analysis tool and describe how it
provides test coverage criteria. Can it provide proof of MCDC coverage?
CHAPT E R R E F E R E N C E S
[ ReliableLi n u x ]
2002.
Scanned by CamScanner