Frank Vahid
Department of Computer Science and Engineering
University of California, Riverside

To my world: Amy, Eric, Kelsi and Maya, and to the memory of our sixth member, Vahid Aminian. -- FV
Preface
Purpose
Embedded computing systems have grown tremendously in recent years, not only in their popularity, but also in their complexity. This complexity demands a new type of designer, one who can easily cross the traditional border between hardware design and software design. After investigating the availability of courses and textbooks, we felt a new course and accompanying textbook were necessary to introduce embedded computing system design using a unified view of software and hardware. This textbook portrays hardware and software not as different domains, but rather as two implementation options along a continuum of options varying in their design metrics, like cost, performance, power, size, and flexibility.

Three important trends have made such a unified view possible. First, integrated circuit (IC) capacities have increased to the point that both software processors and custom hardware processors now commonly coexist on a single IC. Second, quality compilers and program size increases have led to the common use of processor-independent C, C++, and Java compilers and integrated design environments (IDEs) in embedded system design, significantly decreasing the importance of the focus on microprocessor internals and assembly-language programming that dominate most existing embedded system courses and textbooks. Third, synthesis technology has advanced to the point that synthesis tools have become commonplace in the design of digital hardware. Synthesis tools achieve nearly the same for hardware design as compilers achieve for software design: they allow the designer to describe desired functionality in a high-level programming language, and they then automatically generate an efficient custom-hardware processor implementation. The first trend makes the past separation of software and hardware design nearly impossible. Fortunately, the second and third trends enable their unified design, by turning embedded system design, at its highest level, into the problem of selecting and programming (for software), designing (for hardware), and integrating processors.
Coverage
environment, again usually available for free or at low cost, may be useful.

Practice
At UCR, our labs are based on the 8051 microcontroller and Xilinx FPGAs. We use the Keil C compiler for the microcontroller, Xilinx Foundation Express synthesis software for the FPGA, and a development board from Xess Corp. for prototyping; the board contains both an 8051 and an FPGA. We also use an 8051 emulator and stand-alone 8051 chips from Philips.

We have provided extensive information on our lab setup and assignments on the book's Web page. Thus, while the book's microprocessor independence enables instructors to choose any lab environment, we have still provided instructors the option of obtaining extensive online assistance in developing an accompanying laboratory.

Additional Materials

A Web page has been established to be used in conjunction with the book: http://www.cs.ucr.edu/esd. This Web page contains supplementary material and links for each chapter. It also contains a set of lecture slides in Microsoft PowerPoint format; because the book itself was done entirely in Microsoft Word, the figures in the PowerPoint slides are PowerPoint drawings (rather than imported graphics), and thus can be modified as desired by instructors.

Furthermore, the Web page contains an extensive lab manual to accompany this textbook. Over 30 lab exercises, including detailed descriptions, schematics, and complete or partial solutions, can be found there. The exercises are organized by chapter, starting with very simple exercises and leading to progressively more complex exercises. For example, Chapter 2's exercises start with a simple blinking light, and end with a soda-machine controller and a calculator. Appendix A provides further information on our Web page.

Acknowledgements

We are grateful to numerous individuals for their assistance in developing this book, including Sharon Hu of Notre Dame, Nikil Dutt of UC Irvine, and Smita Bakshi of UC Davis and Synplicity. The generous donations of 8051 equipment from Philips Semiconductors and of FPGA equipment from Xilinx were a big assistance. Likewise, a National Science Foundation CAREER award supported some of this book's development. We thank Caroline Sieg at Wiley for overseeing the book's production and Madelyn Lesure for overseeing the cover design. Finally, we are deeply grateful to Bill Zobrist.

About the Authors

Frank Vahid is an Associate Professor in the Department of Computer Science and Engineering at the University of California, Riverside, which he joined in 1994. He is also a faculty member of the Center for Embedded Computer Systems at the University of California, Irvine. He received his B.S. in Computer Engineering from the University of Illinois, Urbana/Champaign, and his M.S. and Ph.D. degrees in Computer Science from the University of California, Irvine, where he was recipient of the Semiconductor Research Corporation Graduate Fellowship. He was an engineer at Hewlett Packard and has consulted for numerous companies, including NEC and Motorola. He is co-author of the graduate-level textbook Specification and Design of Embedded Systems (Prentice-Hall, 1994). He has been program chair and general chair for both the International Symposium on System Synthesis and for the International Symposium on Hardware/Software Codesign. He has been an active researcher in embedded system design since 1988, with more than 50 publications and several best paper awards, including an IEEE Transactions on VLSI best paper award in 2000. His research interests are in embedded system architectures, low-power design, and design methods for system-on-a-chip.

Tony Givargis is an Assistant Professor in the Department of Information and Computer Science and a member of the Center for Embedded Computer Systems at the University of California, Irvine. He received his B.S. and Ph.D. degrees from the University of California, Riverside, where he received the Department of Computer Science Best Thesis award and the UCR College of Engineering Outstanding Student award, and where he was recipient of the GAANN Graduate Fellowship, a MICRO fellowship, and a Design Automation Conference scholarship. As a consultant, he has developed numerous embedded systems for several companies, ranging from an irrigation management system to a GPS-guided, self-navigating automobile. He has published more than 20 research papers in the embedded systems field. His research interests include embedded and real-time system design, low-power design, and processor/system-on-a-chip architectures.
"THIS BOOK IS FOR SALE ONLY IN THE COUNTRY TO WHICH IT IS FIRST CONSIGNED BY JOHN WILEY & SONS (ASIA) PTE LTD AND MAY NOT BE RE-EXPORTED"
Contents

Preface

CHAPTER 1: Introduction
    1.1 Embedded Systems Overview
    1.2 Design Challenge - Optimizing Design Metrics
    1.3 Processor Technology
    1.4 IC Technology
    1.5 Design Technology
        More Productivity Improvers
        Trends
    1.6 Trade-offs
        Design Productivity Gap
    1.7 Summary and Book Outline
    1.8 References and Further Reading

CHAPTER 2: Single-Purpose Processors: Hardware
    2.1 Introduction
    2.2 Combinational Logic
        Transistors and Logic Gates
        Basic Combinational Logic Design
        RT-Level Combinational Components
    2.3 Sequential Logic
        Flip-flops
        RT-Level Sequential Components
        Sequential Logic Design
    2.4 Custom Single-Purpose Processor Design
    2.5 RT-Level Custom Single-Purpose Processor Design
    2.6 Optimizing Custom Single-Purpose Processors
        Optimizing the Original Program
        Optimizing the FSMD
        Optimizing the Datapath
        Optimizing the FSM
    2.7 Summary
    2.8 References and Further Reading

CHAPTER 3: General-Purpose Processors: Software
    3.1 Introduction
    3.2 Basic Architecture
        Datapath
        Control Unit
        Memory
    3.3 Operation
        Instruction Execution
        Pipelining
        Superscalar and VLIW Architectures
    3.4 Programmer's View
        Instruction Set
        Program and Data Memory Space
        Registers
        I/O
        Interrupts
        Operating System
        Example: Assembly-Language Programming of Device Drivers
    3.5 Development Environment
        Design Flow and Tools
        Example: Instruction-Set Simulator for a Simple Processor
        Testing and Debugging
    3.6 Application-Specific Instruction-Set Processors (ASIPs)
        Microcontrollers
        Digital Signal Processors (DSP)
        Less-General ASIP Environments
    3.7 Selecting a Microprocessor
    3.8 General-Purpose Processor Design
    3.9 Summary
    3.10 References and Further Reading
    3.11 Exercises

CHAPTER 4: Standard Single-Purpose Processors: Peripherals
    4.1 Introduction
    4.2 Timers, Counters, and Watchdog Timers
    4.3 UART
    4.4 Pulse Width Modulators
        Example: Controlling a DC Motor Using a PWM
    4.5 LCD Controllers
        Overview
        Example: LCD Initialization
    4.6 Keypad Controllers

CHAPTER 5: Memory
    Example: HM6264 and 27C256 RAM/ROM Devices
    Example: TC55V2325FF-100 Memory Device
    5.4 Composing Memory
    5.5 Memory Hierarchy and Cache
        Cache Mapping Techniques
        Cache Replacement Policy
        Cache Write Techniques
        Cache Impact on System Performance
    5.6 Advanced RAM

CHAPTER 6: Interfacing
    Network-Oriented Arbitration Methods
    6.8 Advanced Communication Principles
        Parallel Communication
        Serial Communication
        Wireless Communication
        Layering
        Error Detection and Correction

CHAPTER 7: Digital Camera Example
    Nonfunctional Requirements
    Informal Functional Specification
    Refined Functional Specification
    7.4 Design
        Implementation 1: Microcontroller Alone
        Implementation 2: Microcontroller and CCDPP
        Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
        Implementation 4: Microcontroller and CCDPP/DCT
    7.5 Summary
    7.6 References and Further Reading

CHAPTER 8: State Machine and Concurrent Process Models
    8.1 Introduction
    8.2 Models vs. Languages, Text vs. Graphics
        Models vs. Languages
        Textual Languages vs. Graphical Languages
    8.14 Implementation
        Creating and Terminating Processes
        Suspending and Resuming Processes
        Joining a Process
        Scheduling Processes
    8.15 Dataflow Model
    8.16 Real-Time Systems
        Windows CE
        QNX
    8.17 Summary
    8.18 References and Further Reading
    8.19 Exercises

CHAPTER 9: Control Systems
    9.1 Introduction
    9.2 Open-Loop and Closed-Loop Control Systems
Figure 1.1: A short list of embedded systems: Automatic transmission, Avionic systems, Battery chargers, Camcorders, Cell phones, Cell phone base stations, Cordless phones, Cruise control, Curbside check-in systems, Digital cameras, Disk drives, Electronic card readers, Electronic instruments, Electronic toys/games, Factory control, Fax machines, Fingerprint identifiers, Home security systems, Life-support systems, Medical testing systems, On-board navigation, Pagers, Photocopiers, Point-of-sale systems, Portable video games, Printers, Satellite phones, Scanners, Smart ovens/dishwashers, Speech recognizers, Stereo systems, Teleconferencing systems, Televisions, Temperature controllers, Theft tracking systems, TV set-top boxes, VCRs, DVD players, Video game consoles, Video phones, Washers and dryers.

Figure 1.2: An embedded system example - a digital camera.
Chapter 1: Introduction

For example, consider the digital camera chip shown in Figure 1.2. The charge-coupled device (CCD) contains an array of light-sensitive photocells that capture an image. The A2D and D2A circuits convert analog images to digital and digital to analog, respectively. The CCD preprocessor provides commands to the CCD to read the image. The JPEG codec compresses and decompresses an image using the JPEG¹ compression standard, enabling compact storage of images in the limited memory of the camera. The Pixel coprocessor aids in rapidly displaying images. The Memory controller controls access to a memory chip also found in the camera, while the DMA controller enables direct memory access by other devices while the microcontroller is performing other functions. The UART enables communication with a PC's serial port for uploading video frames, while the ISA bus interface enables a faster connection with a PC's ISA bus. The LCD control and Display control circuits control the display of images on the camera's liquid-crystal display device. The Multiplier/Accumulator circuit performs a particular frequently executed multiply/accumulate computation faster than the microcontroller could. At the heart of the system is the Microcontroller, which is a programmable processor that controls the activities of all the other circuits. We can think of each device as a processor designed for a particular task, while the microcontroller is a more general processor designed for general tasks.

¹JPEG is short for Joint Photographic Experts Group. "Joint" refers to the group's status as a committee working on both ISO and ITU-T standards. Their best-known standard is for still-image compression.

This example illustrates some of the embedded system characteristics described earlier. First, it performs a single function repeatedly. The system always acts as a digital camera, wherein it captures, compresses, and stores frames, decompresses and displays frames, and uploads frames. Second, it is tightly constrained. The system must be low cost since consumers must be able to afford such a camera. It must be small so that it fits within a standard-sized camera. It must be fast so that it can process numerous images in milliseconds. It must consume little power so that the camera's battery will last a long time. However, this particular system does not possess a high degree of the characteristic of being reactive and real time, as it responds only to the pressing of buttons by a user, which, even in the case of an avid photographer, is still quite slow with respect to processor speeds.

1.2 Design Challenge - Optimizing Design Metrics

The embedded-system designer must of course construct an implementation that fulfills desired functionality, but a difficult challenge is to construct an implementation that simultaneously optimizes numerous design metrics.

Common Design Metrics

For our purposes, an implementation consists either of a microprocessor with an accompanying program, a connection of digital gates, or some combination thereof. A design metric is a measurable feature of a system's implementation. Commonly used metrics include:

• NRE cost (nonrecurring engineering cost): The one-time monetary cost of designing the system. Once the system is designed, any number of units can be manufactured without incurring any additional design cost; hence the term nonrecurring.
• Unit cost: The monetary cost of manufacturing each copy of the system, excluding NRE cost.
• Size: The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.
• Performance: The execution time of the system.
• Power: The amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements of the IC, since more power means more heat.
• Flexibility: The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
• Time-to-prototype: The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but it can be used to verify the system's usefulness and correctness and to refine the system's functionality.
• Time-to-market: The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time.
• Maintainability: The ability to modify the system after its initial release, especially by designers who did not originally design the system.
• Correctness: Our confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
• Safety: The probability that the system will not cause harm.

Figure 1.3: Design metric competition - improving one may worsen others.

Metrics typically compete with one another: improving one often leads to worsening of another. For example, if we reduce an implementation's size, the implementation's performance may suffer. Some observers have compared this phenomenon to a wheel with numerous pins, as illustrated in Figure 1.3. If you push one pin in, such as size, then the other pins pop out. To best meet this optimization challenge, the designer must be comfortable with a variety of hardware and software implementation technologies, and must be able to migrate from one technology to another, in order to find the best implementation for a given application and constraints. Thus, a designer cannot simply be a hardware expert or a software expert, as is commonly the case today; the designer must have expertise in both areas.
The Time-to-Market Design Metric

Most of these metrics are heavily constrained in an embedded system. The time-to-market constraint has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability, since market windows for products are becoming quite short, with such windows often measured in months. For example, Figure 1.4(a) shows a sample market window during which time a product would have highest sales. Missing this window, which means that the product begins being sold further to the right on the time scale, can mean significant loss in sales. In some cases, each day that a product is delayed from introduction to the market can translate to a one-million-dollar loss. The average time-to-market constraint has been reported as having shrunk to only 8 months!

Adding to the difficulty of meeting the time-to-market constraint is the fact that embedded system complexities are growing due to increasing IC capacities, as we will see later in this chapter. Such rapid growth in IC capacity translates into pressure on designers to add more functionality to a system. Thus, designers today are being asked to do more in less time.

Let's investigate the loss of revenue that can occur due to delayed entry of a product in the market. We'll use the simplified model of revenue that is shown in Figure 1.4(b). This model assumes the peak of the market occurs at the halfway point, denoted as W, of the product life, and that the peak is the same even for a delayed entry. The revenue for an on-time market entry is the area of the triangle labeled On-time, and the revenue for a delayed-entry product is the area of the triangle labeled Delayed. The revenue loss for a delayed entry is just the difference of these two triangle areas. Let's derive an equation for percentage revenue loss, which equals ((On-time - Delayed) / On-time) * 100%. For simplicity, we'll assume the market rise angle is 45 degrees, meaning the height of each triangle is W, and we leave to the reader the generalization of the derivation for any angle. The area of the On-time triangle, having a base of 2W and a height of W, is thus 1/2 * 2W * W, or W². The area of the Delayed triangle is 1/2 * (W - D + W) * (W - D). After algebraic simplification, we obtain the following equation for percentage revenue loss:

    percentage revenue loss = (D(3W - D) / 2W²) * 100%

Consider a product whose lifetime is 52 weeks, so W = 26. According to the preceding equation, a delay of just D = 4 weeks results in a revenue loss of 22%, and a delay of D = 10 weeks results in a loss of 50%. Some studies claim that reaching market late has a larger negative effect on revenue than development cost overruns or even a product price that is too high.
The NRE and Unit Cost Design Metrics

As another exercise, let's consider NRE cost and unit cost in more detail. Suppose three technologies are available for use in a particular product. Assume that implementing the product using technology A would result in an NRE cost of $2,000 and unit cost of $100, that technology B would have an NRE cost of $30,000 and unit cost of $30, and that technology C would have an NRE cost of $100,000 and unit cost of $2. Ignoring all other design metrics, like time-to-market, the best technology choice will depend on the number of units we plan to produce. We illustrate this concept with the plot of Figure 1.5(a). For each of the three technologies, we plot total cost versus the number of units produced, where:
    total cost = NRE cost + unit cost * # of units

We see from the plot that, of the three technologies, technology A yields the lowest total cost for low volumes, namely for volumes between 1 and 400. Technology B yields the lowest total cost for volumes between 400 and 2500. Technology C yields the lowest cost for volumes above 2500.

Figure 1.5(b) illustrates how larger volumes allow us to amortize NRE costs such that lower per-product costs result. The figure plots per-product cost versus volume, where:

    per-product cost = total cost / # of units = NRE cost / # of units + unit cost

For example, for technology C and a volume of 200,000, the contribution to the per-product cost due to NRE cost is $100,000 / 200,000, or $0.50. So the per-product cost would be $0.50 + $2 = $2.50. The larger the volume, the lower the per-product cost, since the NRE cost can be distributed over more products. The per-product cost for each technology approaches that technology's unit cost for very large volumes. So for very large volumes, numbering in the hundreds of thousands, we can approach a per-product cost of just $2 - quite a bit less than the per-product cost of over $100 for small volumes.

Clearly, one must consider the revenue impact of both time-to-market and per-product cost, as well as all the other relevant design metrics, when evaluating different technologies.
. Processor technology relates to the architecture of the computation engine used to
The Performance Design Metric .:._.,/ ui:iplement a system's desired functionality. Altl1ough the term processor is usually associated
Performance of a system is a measure of how long the system takes to execute our desired with programmable software processors, we can think of many other, nonprogrammable
tasks. Performance is perhaps the most widely used design metric in marketing an embedded digital systems as being processors also. Each such processor differs in its speciali1.atio~
system, and also one of the most abused. Many metrics are commonly used in reporting towards a particular function (e.g., image compression), thus manifesting design metrics
system performance, such as clock frequency or instructions per second. However, what we ifferent tl1an other processors. We illustrate this concept graphically in Figure 1.6. The
really care about is how long the system takes to execute our application. For example, in pplication requires a specific embedded functionality, symbolized as a cross, such as the
terms of performance, we care about how long a digital camera takes to process an image. r summing of the items in an i>.rray, as shown in Figure I .6(a). Several types of processors can
The camera's clock frequency or instructj_ons per second are not the key issues - one camera implement this functionality, each of which we now describe. We often use a collection of
may actually process images faster but have a lower clock frequency than ariother camera. such processors to optimize a system's design metrics, as in our digital camera example.
With that said, there are several measures of performance. -For simplicity, suppose we
a
have single task that will be repeated over and over, such as processing ai1 image iii digital a , 'General-Purpose Processors - Software
camera. Jhe-twomain measures of performance are: __ ........ ·
The designer of a general-purpose processor, or microprocessor, builds a programmable
• ·· Latency, or response time: The time between the start of the task's execution and the
device that is suitable for a variety of applications to maximize the number of devices sold.
end. For example, processing an image may take 0.25 second.
One feature of such a processor is a program memory - the designer of such a processor
• · Throughput: The number of tasks that can be processed per unit time. For example, a
does not know what program will run on the processor, so the program cannot be built into
camera may be able to process 4 images per second.
the digital circuit. Another feature is a general datapatl1 - the datapath must be general
However, note that throughput is not always just the number of tasks times latency. A
enough to handle a variety of computations, so such a datapath typically has a large register
system may be able to do better than this by using parallelism, either by starting one task
before finishing the next one or by processing each task concurrently. A digital ca..,iera, for t file and one or more general-purpose aritlune ·
designer, however, need not be conce
· units (ALUs). An embedded system
about the design of a general-purpose processor.
example, might be able to capture and compress the next image, while still storing the :;
previous image to memory. Thus, our camera may have a ·latency of 0.25 second but a t n embedded system designer si
processor' s memory to ca
y uses a general-purpose processor, by programming the
ut the required functionali ty. Many people refer to this part of
throughput <:>f 8 images per second. l•,
t: an implementation as software" portion. ·
f.~
------------'-------------------------
.8 Embedded System Oesign
· . -·-- - ·~·-- · · · -- ~
www.compsciz.blogspot.in
-~ ~ -; -~.:...:::.....cvo"c ·· , - _, ,-~ __,;-1 /½·-· ~ - --
l_ Embedded System Design
9
Chapter 1: Introduction    1.3: Processor Technology
Figure 1.6: Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.

Figure 1.7: Implementing desired functionality on different processor types: (a) general-purpose, (b) application-specific, (c) single-purpose.

Using a general-purpose processor in an embedded system may result in several design metric benefits. Time-to-market and NRE costs are low, because the designer must only write a program, but need not do any digital design. Flexibility is high, because changing functionality requires changing only the program. Unit cost may be low in small quantities, compared with designing our own processor, since the general-purpose processor manufacturer sells large quantities to other customers and hence distributes the NRE cost over many units.
Performance may be fast for computation-intensive applications, if using a fast processor, due to advanced architecture features and leading-edge IC technology.

However, there are also some design-metric drawbacks. Unit cost may be relatively high for large quantities, since in large quantities we could design our own processor and amortize our NRE costs such that our unit cost is lower. Performance may be slow for certain applications. Size and power may be large due to unnecessary processor hardware.

For example, we can use a general-purpose processor to carry out our array-summing functionality from the earlier example. Figure 1.6(b) illustrates that a general-purpose processor covers the desired functionality, but not necessarily efficiently. Figure 1.7(a) shows a simple architecture of a general-purpose processor implementing the array-summing functionality. The functionality is stored in a program memory. The controller fetches the current instruction, as indicated by the program counter (PC), into the instruction register (IR). It then configures the datapath for this instruction and executes the instruction. It then determines the next instruction address, sets the PC to this address, and fetches again.

Single-Purpose Processors - Hardware

A single-purpose processor is a digital circuit designed to execute exactly one program. For example, consider the digital camera example of Figure 1.2. All of the components other than the microcontroller are single-purpose processors. The JPEG codec, for example, executes a single program that compresses and decompresses video frames. An embedded system designer may create a single-purpose processor by designing a custom digital circuit, as discussed in later chapters. Alternatively, the designer may purchase a predesigned single-purpose processor. Many people refer to this part of the implementation simply as the "hardware" portion, although even software requires a hardware processor on which to run. Other common terms include coprocessor, accelerator, and peripheral.

Using a single-purpose processor in an embedded system results in several design-metric benefits and drawbacks, which are essentially the inverse of those for general-purpose processors. Performance may be fast, size and power may be small, and unit cost may be low for large quantities, while design time and NRE costs may be high, flexibility low, unit cost high for small quantities, and performance may not match general-purpose processors for some applications.

For example, Figure 1.6(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality, nothing more, nothing less. Figure 1.7(c) illustrates the architecture of such a single-purpose processor for the example. The datapath contains only the essential components for this program: two registers and one adder. Since the processor only executes this one program, we hardwire the program's instructions directly into the control logic and use a state register to step through those instructions, so no program memory is necessary.
Chapter 1: Introduction    1.4: IC Technology
Figure 1.8: ICs consist of several layers. Shown is a simplified CMOS transistor; an IC may possess millions of these, connected above by many layers of metal (not shown).

AND-OR-INVERT combination, the mask portions are predesigned, usually by hand. Thus, the remaining task is to arrange these portions into complete masks for the gate level, and then to connect the cells. ASICs are by far the most popular IC technology, as they provide for good performance and size, with much less NRE cost than full-custom ICs. However, ASICs still require weeks or even months to manufacture.

PLD

In a programmable logic device (PLD) technology, all layers already exist, so we can purchase the IC before finishing our design. The layers implement a programmable circuit, where programming has a lower-level meaning than a software program. The programming that takes place may consist of creating or destroying connections between wires that connect gates, either by blowing a fuse, or by setting a bit in a programmable switch. Small devices called programmers, connected to a desktop computer, typically perform such programming. We can divide PLDs into two types, simple and complex. One type of simple PLD is a programmable logic array (PLA), which consists of a programmable array of AND gates and a programmable array of OR gates. Another type is programmable array logic (PAL), which uses just one programmable array to reduce the number of expensive programmable components. One type of complex PLD, growing very rapidly in popularity over the past decade, is the field-programmable gate array (FPGA). FPGAs offer more general connectivity among blocks of logic, rather than just arrays of logic as with PLAs and PALs, and are thus able to implement far more complex designs. PLDs offer very low NRE cost and almost instant IC availability. However, they are typically bigger than ASICs, may have higher unit cost, may consume more power, and may be slower (especially FPGAs). They still provide reasonable performance, though, so they are especially well suited to rapid prototyping.

Trends

We should be aware of what is by far the most important trend in embedded systems, a trend related to ICs: IC transistor capacity has doubled roughly every 18 months for the past several decades.

Figure 1.9: IC capacity exponential increase, following "Moore's Law." Source: The International Technology Roadmap for Semiconductors.

This trend, illustrated in Figure 1.9, was actually predicted way back in 1965 by Intel co-founder Gordon Moore. He predicted that semiconductor transistor density would double every 18 to 24 months. The trend is therefore known as Moore's Law. Moore recently predicted about another decade before such growth slows down. The trend is mainly caused by improvements in IC manufacturing that result in smaller parts, such as transistor parts and wires, on the surface of the IC. The minimum part size, commonly known as feature size, for a CMOS IC in 2002 is about 130 nanometers.

Figure 1.9 shows leading-edge chip approximate capacity per year from 1981 to 2010, using predicted data for years 2000-2010. Note that chip capacity, shown in millions of transistors per chip, is plotted on a logarithmic scale. People often underestimate, and are somewhat amazed by, the actual growth of something that doubles over short time periods, in this case 18 months. For example, this underestimation in part explains the popularity of so-called pyramid schemes. It is the key to the popular trick question of asking someone to choose between a salary of $1,000/day for a year, or a penny on day one, 2 pennies on day two, with continued doubling each day for a year. While many people would choose the first option, the second option results in more money than exists in the world. Many people are also surprised to discover that just 20 generations ago, meaning a few hundred years, we each had one million ancestors.

Figure 1.10 shows that in 1981, a leading-edge chip could hold about 10,000 transistors, which is roughly the complexity of an 8-bit microprocessor. In 2002, a leading-edge chip can hold about 150,000,000 transistors, the equivalent of 15,000 8-bit microprocessors. For comparison, if automobile fuel efficiency had improved at this rate since 1981, cars in 2002
Chapter 1: Introduction    1.5: Design Technology
Libraries/IP

Libraries involve reuse of preexisting implementations. Using libraries of existing implementations can improve productivity if the time it takes to find, acquire, integrate, and test a library item is less than that of designing the item oneself.

A logic-level library may consist of layouts for gates and cells. An RT-level library may consist of layouts for RT components, like registers, multiplexors, decoders, and functional units. A behavioral-level library may consist of commonly used components, such as compression components, bus interfaces, display controllers, and even general-purpose processors. The advent of system-level integration has caused a recent change in this level of library. Rather than these components being ICs, they now must also be available in a form that we can implement on just one portion of an IC. Such components are called cores. This change from behavioral-level libraries of ICs to libraries of cores has prompted the use of the term intellectual property (IP), to emphasize the fact that cores exist in an intellectual form that must be protected from copying. Finally, a system-level library might consist of complete systems solving particular problems, such as an interconnection of processors with accompanying operating systems and programs to implement an interface to the Internet over an Ethernet network.

Test/Verification

Test/verification involves ensuring that functionality is correct. Such assurance can prevent time-consuming debugging at low abstraction levels and iterating back to high abstraction levels.

Simulation is the most common method of testing for correct functionality, although more formal verification techniques are growing in popularity. At the logic level, gate-level simulators provide output signal timing waveforms given input signal waveforms. Likewise, general-purpose processor simulators execute machine code. At the RT level, hardware description language (HDL) simulators execute RT-level descriptions and provide output waveforms given input waveforms. At the behavioral level, HDL simulators simulate sequential programs, and cosimulators connect HDL and general-purpose processor simulators to enable hardware/software coverification. At the system level, a model simulator simulates the initial system specification using an abstract computation model, independent of any processor technology, to verify correctness and completeness of the specification. Model checkers can also verify certain properties of the specification, such as ensuring that certain simultaneous conditions never occur or that the system does not deadlock.

More Productivity Improvers

There are numerous additional approaches to improving designer productivity. Standards focus on developing well-defined methods for specification, synthesis, and libraries. Such standards can reduce the problems that arise when a designer uses multiple tools, or retrieves or provides design information from or to other designers. Common standards include language standards, synthesis standards, and library standards.

Languages focus on capturing desired functionality with minimum designer effort. For example, the sequential programming language of C is giving way to the object-oriented language of C++, which in turn has given some ground to Java. As another example, state-machine languages permit direct capture of functionality as a set of states and transitions, which can then be translated to other languages like C.

Frameworks provide a software environment for the application of numerous tools throughout the design process and management of versions of implementations. For example, a framework might generate the UNIX directories needed for various simulators and synthesis tools, supporting application of those tools through menu selections in a single graphical user interface.

Trends

The combination of compilation/synthesis, libraries/IP, test/verification, standards, languages, and frameworks has improved designer productivity over the past several decades, as shown in Figure 1.12. Productivity is measured as the number of transistors that one designer can produce in one month. As the figure shows, the growth has been impressive. A designer in 1981 could produce only about 100 transistors per month, whereas in 2002 a designer should be able to produce about 5,000 transistors per month.

Figure 1.12: Design productivity exponential increase. Source: The International Technology Roadmap for Semiconductors.

1.6 Trade-offs

Perhaps the key embedded system design challenge is the simultaneous optimization of competing design metrics. To address this challenge, the designer trades off among the
18
~
. . .. - ···- ~ ~ - ... ·· - ·· ... . Em,,_ s,..,m O~ga s,..,m o~..,
www.compsciz.blogspot.in
-~ .' Y d&'.~ ., .,
:'I
-·-~ :.:.. ::::: === ---- ---- -- §
~pter 1: Introduction
;
· 1.6: Trade-0ffs
advantages and disadvantages of the various available processor technologies and IC technologies. To optimize a system, the designer must therefore be familiar and comfortable with the various technologies; the designer must be a "renaissance engineer," in the words of some. In the past, and to a large extent in the present, however, most designers had expertise with either general-purpose processors or with single-purpose processors, but not both: they were either software designers or hardware designers. Because of this separation of design expertise, systems had to be separated into the software and hardware subsystems very early in the design process, separately designed, and then integrated near the end of the process. However, such early and permanent separation clearly doesn't allow for the best optimization of design metrics. Instead, being able to move functions between hardware and software, at any stage of the design process, provides for better optimization.

The relatively recent maturation of RT and behavioral synthesis tools has enabled a unified view of the design process for hardware and software. In the past, the design processes were radically different: software designers wrote sequential programs, while hardware designers connected components. But today, synthesis tools have changed the hardware designer's task essentially into one of writing sequential programs, albeit with some knowledge of how the hardware will be synthesized from such programs. We can think of abstraction levels as being the rungs of a ladder, and compilation and synthesis as enabling us to step up the ladder, hence enabling designers to focus their design efforts at higher levels of abstraction, as illustrated in Figure 1.13. Thus, the starting point for either hardware or software is sequential programs, enhancing the view that system functionality can be implemented in hardware, software, or some combination thereof, leading to the following important point:

Figure 1.13: The co-design ladder: recent maturation of synthesis enables a unified view of hardware and software.

The choice of hardware versus software for a particular function is simply a trade-off among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Hardware/software codesign is the field that emphasizes a unified view of hardware and software, and develops synthesis tools and simulators that enable the co-development of systems using both hardware and software.

Figure 1.14: The independence of processor and IC technologies: any processor technology can be mapped to any IC technology. (Processor axis: general-purpose processor, ASIP, single-purpose processor; IC axis: PLD, semicustom, full-custom. The general end provides improved flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, and cost at low volume; the customized end provides improved power efficiency, performance, size, and cost at high volume.)

In general, we can view the basic design trade-off as general versus customized implementation, with respect to either processor technology or IC technology, as illustrated in Figure 1.14. The more general, programmable technologies on the left of the figure provide greater flexibility (a design can be reprogrammed relatively easily), reduced NRE cost (designing using those technologies is generally cheaper), faster time-to-prototype and time-to-market (since designing takes less time), and lower cost in low volumes (since the IC manufacturer distributes its IC NRE cost over large quantities of ICs). On the other hand, more customized technologies provide for better power efficiency, faster performance, reduced size, and lower cost in high volumes.

Recall that each of the three processor technologies can be implemented in any of the three IC technologies. For example, a general-purpose processor can be implemented on a PLD, semicustom, or full-custom IC. In fact, a company marketing a product, such as a set-top box or even a general-purpose processor, might first market a semicustom implementation to reach the market early, and then later introduce a full-custom implementation. They might also first map the processor to an older but more reliable technology, like 0.2 micron, and then later map it to a newer technology, like 0.08 micron. These two evolutions of mappings to a large extent explain why a general-purpose processor's
Figure 1.15: The growing "design productivity gap."

Figure 1.16: The "mythical man-month": Adding designers can decrease individual productivity and at some point can actually delay the project completion time.
clock speed improves on the market over time. Likewise, a designer of an embedded system may use PLDs for prototyping a product, and even for the first few hundred instances of the product to speed its time-to-market, switching to ASICs for larger-scale production.

Furthermore, we often implement multiple processors of different types on the same IC. Figure 1.2 was an example of just such a situation: the digital camera included a microcontroller plus numerous single-purpose processors on the same IC. A single chip with multiple processors is often referred to as a system-on-a-chip. In fact, we can even implement more than one IC technology on a single IC: a portion of the IC may be custom, another portion semicustom, and yet another portion programmable logic. The need for designers comfortable with the variety of processor and IC technologies thus becomes evident.

Design Productivity Gap

While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity growth. Figure 1.15 shows the productivity growth plot superimposed on the chip capacity growth plot, illustrating the growing design productivity gap. For example, in 1981, a leading-edge chip required about 100 designer-months to design, since 100 designer-months * 100 transistors/designer-month = 10,000 transistors. However, in 2002, a leading-edge chip would require about 30,000 designer-months, since 30,000 designer-months * 5,000 transistors/designer-month = 150,000,000 transistors. So the design productivity gap has resulted in an increase from 100 to 30,000 designer-months to build a leading-edge chip. Assuming a designer costs $10,000 per month, the cost of building a leading-edge chip has risen from $1,000,000 in 1981 to an incredible $300,000,000 in 2002. Few products can justify such a large investment in a chip. Thus, most designs do not even come close to using potential chip capacity.

The situation is even worse than stated before, because the discussion assumes that designer productivity is independent of project team size, whereas in reality adding more designers to a project team can actually decrease productivity. Suppose 10 designers work together on a project, and each produces 5,000 transistors/month, so that their combined output is 10 * 5,000 = 50,000 transistors/month. Would 100 designers on a project then produce 100 * 5,000 = 500,000 transistors/month? Probably not. The complexity of having 100 designers work together is far greater than that of having 10 designers work together. Even calling a meeting of 100 designers is a fairly complex task, whereas a 10-designer meeting is quite straightforward. Furthermore, a 100-designer team would likely be decomposed into groups, each group having a group leader who meets with other group leaders and reports back to his or her group, thus introducing extra layers of communication and hence more likelihood of misunderstandings and time-consuming mistakes.

This decrease in productivity as designers are added to a project was reported by Frederick Brooks in his classic 1975 book entitled The Mythical Man-Month. His book focused on writing software, but the same principle applies to designing hardware. The decrease in productivity due to team-size complexity can at some point actually lengthen the time to complete a project. For example, consider a hypothetical 1,000,000-transistor project, in which a designer working alone can produce 5,000 transistors per month, and each additional designer added to the project results in a productivity decrease of 100 transistors per designer, due to the added complexities of team communication and management. So a lone designer can complete the project in 1,000,000 / 5,000 = 200 months; 10 designers can produce 4,100 transistors per month each, meaning 10 * 4,100 = 41,000 transistors per month total, requiring 1,000,000 / 41,000 = 24.3 months to complete the project. Figure 1.16 plots individual designer productivity as designers are added to the project. The figure also plots team productivity, computed simply as the number of designers multiplied by their individual productivity. Project completion times for different team sizes, computed as 1,000,000 transistors divided by team transistors/month, are also shown. A 25-designer team can produce 25 * 2,600 = 65,000 transistors per month, requiring 1,000,000 / 65,000 = 15.3 months to complete the project. However, a 26-designer team also produces 26 * 2,500 = 65,000 transistors per month, so adding a 26th designer doesn't help. Furthermore, a 27-designer team produces only 27 * 2,400 = 64,800 transistors per month, thus actually delaying the project completion time to 15.4 months. Adding more designers beyond 26 only worsens the project completion time. Hence, man-months are in a sense mythical: We cannot always add designers to a project to decrease the project completion time.
Therefore, the growing gap between IC capacity and designer productivity in Figure 1.15 is even worse than the figure shows. Designer productivity decreases as we add designers to a project, making the gap even larger. Furthermore, at some point we simply cannot decrease project completion time no matter how much money we spend on designers, since adding designers will decrease the project team's overall productivity. Therefore, leading-edge chips cannot always be designed in a given time period, no matter how much money we have to spend on designers.

Thus, a pressing need exists for new design technologies that will shrink the design gap. One partial solution proposed by many people is to educate designers not just in one subarea of embedded systems, like hardware design or software design, but instead to educate them to be comfortable with both hardware and software design. This book is intended to contribute to this solution.

The chapter is mostly a review of the features of such processors; we assume the reader already has familiarity with programming such processors using structured languages. Chapter 4 covers standard single-purpose processors, describing a number of common peripherals used in embedded systems. Chapter 5 describes memories, which are components necessary to store data for processors. Chapter 6 describes buses, components necessary to communicate data among processors and memories, beginning with basic interfacing concepts, and introducing more advanced concepts and describing common buses. Chapter 7 provides an example of using processor technology to build an embedded system, a digital camera, illustrating the trade-offs of several different implementations.

Chapter 8 introduces some advanced techniques for programming embedded systems, including state machine models and concurrent process models. It also introduces real-time systems. Chapter 9 discusses the very common class of embedded systems known as control systems, and introduces some design techniques used for such systems.

Chapter 10 describes the three main IC technologies with which we can implement the processor-based designs we learn to create in the earlier chapters. Finally, Chapter 11 summarizes key tools and advances in design technology, and emphasizes the need for a new breed of embedded systems engineers proficient with both software and hardware design.

1.8 References and Further Reading

• Brooks Jr., F.P., The Mythical Man-Month, anniversary edition. Reading, MA: Addison-Wesley.
1.6 Using the revenue model of Figure 1.4(b), derive the percentage revenue loss equation ij 1.21 Compute the annual growth rate of (a) IC capacity and (b c1es·
1.22 If Moore' s law continues to hold, predict the a ' ro~ igner producttvi~.
. ..
for any rise angle, rather than just for 45 degrees (Hint: you should get the same I{
equation). _ _ ; leading edge re in (a) 2030, (b) 20so. PP te number of transistors per-
1.7 Using the revenue model of Figure 1.4(b), compute the percentage revenue loss if D = 5 : 1.23 Explain why single-pmpose processors (hardware) and eneral-
and W = 10. If the company whose product entered the market on time earned a total :• essenuaU~:he sain.e, and then describe how they differ ingterms o~~ pr~rs are
revenue of $25 ·million, how much revenue did the company that entered the market 5 ['. 1.24 Whal 1s a remnssance engineer," and why is it so important in t h e ~ metrics.
months late lose? i 1.25 What is the design gap? nt market?
1.8 What is NRE cost? ·, l; 1.26 Compute the rate at which the design productivity ga · · ·
implication of this growing gap? P is growmg per year. What IS the
edesign of a particular 4isk drive has an NRE cost of $100,000 and a unit cost of f
1.27 Define what is meant by the "mythical man-montlt"
O. ~o~ much will we hav~ fo add t<?. the, cost o~ each product to cover our NRE cost, tj
1.28 Assume a designer's prQductivity when working alone on a project is 5 000 transist
. _ ~ull).ing we sell: (a) 100 uruts, and (b) 10,000 wuts? :
• 1.10 cite a graph-with the x-axis the number of units and the y-axis the product cost. Plot ; per month_: Ass~e that each additional designer reduces productivitr by 5'¾ ;:~
the per-product cost function for an NRE of $50,000 and a unit cost of$5. f, k~ep m mmd this 1s an_ e_xtremely simplified model of designer productivity!) ;;/Plot
. 1- For a particular product, you detennine the NRE cost and unit cof.t to be the following [' team ,monthly producltv1ty versus team size for team sizes ranging from I to 40
for the three listed IC technologies: FPGA ($10,000, $50); ASIC ($50,000, $10); VLSI ($200,000, $5). Determine precise volumes for which each technology yields the lowest total cost.
designers. (b) Plot on the same graph the project completion time versus team size for projects of sizes 100,000 and 1,000,000 transistors. (c) Provide the "optimal" number of designers for each of the two projects, indicating the number of months required in each case.
1.12 Give an example of a recent consumer product whose prime market window was only ...
1.13 Create an equation for total revenue that combines time-to-market and NRE/unit cost considerations. Use the revenue model of Figure 1.4(b). Assume a 100-month product lifetime, with peak revenue of $100,000/month. Compare use of a general-purpose processor having an NRE cost of $5,000, a unit cost of $30, and a time-to-market of 12 months (so only 88 months of the product's lifetime remain), with use of a single-purpose processor having an NRE cost of $20,000, a unit cost of $10, and a time-to-market of 24 months. Assume the amount added to each unit for profit is $5.
1.14 Using a spreadsheet, develop a tool that allows us to plug in any numbers for problem 1.13 and generates a revenue comparison of the two technologies.
1.15 List and define the three main processor technologies. What are the benefits of using each of the three different processor technologies?
1.16 List and define the three main IC technologies. What are the benefits of using each of the three different IC technologies?
1.17 List and define the three main design technologies. How are each of the three different design technologies helpful to designers?
1.18 Create a 3x3 grid with the three processor technologies along the x-axis, and the three IC technologies along the y-axis. For each axis, put the most programmable form closest to the origin, and the most customized form at the end of the axis. Explain features and possible occasions for using each of the combinations of the two technologies.
1.19 Redraw Figure 1.9 to show the transistors per IC from 1990 to 2000 on a linear, not logarithmic, scale. Draw a square representing a 1990 IC and another representing a 2000 IC, with correct relative proportions.
1.20 Provide a definition of Moore's law.
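One way to explore the total-cost model behind exercise 1.11 is a short sketch: total cost = NRE cost + unit cost x volume, with the (NRE, unit-cost) pairs given in the exercise. The helper names and sample volumes below are ours, not the book's.

```python
# Sketch of the total-cost model: total cost = NRE cost + unit cost * volume.
# The (NRE, unit-cost) pairs are those given for exercise 1.11.
techs = {
    "FPGA": (10_000, 50),
    "ASIC": (50_000, 10),
    "VLSI": (200_000, 5),
}

def total_cost(tech, volume):
    nre, unit = techs[tech]
    return nre + unit * volume

def cheapest(volume):
    # Technology with the lowest total cost at the given production volume.
    return min(techs, key=lambda t: total_cost(t, volume))

# Break-even volumes: FPGA vs. ASIC at 10000 + 50v = 50000 + 10v, so v = 1000;
# ASIC vs. VLSI at 50000 + 10v = 200000 + 5v, so v = 30000.
assert total_cost("FPGA", 1000) == total_cost("ASIC", 1000) == 60_000
assert cheapest(500) == "FPGA"
assert cheapest(5_000) == "ASIC"
assert cheapest(100_000) == "VLSI"
```

Plotting total cost versus volume for the three lines makes the two crossover points visible directly, which is the spreadsheet exercise 1.14 asks for in the revenue setting.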
Embedded System Design
Chapter 2: Custom Single-Purpose Processors: Hardware

2.1 Introduction
A processor is a digital circuit designed to perform computation tasks. A processor consists of a datapath capable of storing and manipulating data and a controller capable of moving data through the datapath. A general-purpose processor is designed such that it can carry out a wide variety of computation tasks, which are described by a set of programmer-provided instructions. In contrast, a single-purpose processor is designed specifically to carry out a particular computation task. While some tasks are so common that we can purchase standard single-purpose processors to implement those tasks, others are unique to a particular embedded system. Such custom tasks may be best implemented using custom single-purpose processors that we design ourselves.

An embedded system designer may obtain several benefits by choosing to use a custom single-purpose processor rather than a general-purpose processor to implement a computation task. First, performance may be faster, due to fewer clock cycles resulting from a customized datapath, and due to shorter clock cycles resulting from simpler functional units, fewer multiplexors, or simpler controller logic. Second, size may be smaller, due to a simpler
datapath and no program memory. Third, power consumption may be less, due to more
efficient computation.
"-"q s source
l°'"docts
1f gate=O
drain
0 --
0 --
x-l
--
F = (x+y)'
~ y
However, cost could be higher because of high NRE costs. Since we may not be able to afford to invest as much NRE cost as can designers of a mass-produced general-purpose processor, performance and size could actually be worse. Time-to-market may be longer, and flexibility reduced, compared to general-purpose processors.

Figure 2.2: CMOS transistor implementations of some basic logic gates: (a) nMOS transistor, (b) pMOS transistor, (c) inverter, (d) NAND gate, (e) NOR gate.
In this chapter, we describe basic techniques for designing custom processors. We start with a review of combinational and sequential design, and we describe methods for converting programs to custom single-purpose processors.

2.2 Combinational Logic

Transistors and Logic Gates

A transistor is the basic electrical component in digital systems. Combinations of transistors form more abstract components called logic gates, which designers use when building digital systems. Thus, we begin with a short description of transistors before discussing logic design.

A transistor acts as a simple on/off switch. One type of transistor, complementary metal oxide semiconductor (CMOS), is shown in Figure 2.2. Figure 2.2(a) shows the schematic of a transistor. The gate, not to be confused with logic gate, controls whether or not current flows from the source to the drain. We can apply either low or high voltage levels to the gate. The high level may be, for example, +3 or +5 volts, which we'll refer to as logic 1. The low voltage is typically ground, drawn as several horizontal lines of decreasing width, which we'll refer to as logic 0. When logic 1 is applied to the gate, the transistor conducts and so current flows. When logic 0 is applied to the gate, the transistor does not conduct. We can also build a transistor with the opposite functionality, illustrated in Figure 2.2(b). When logic 0 is applied to the gate, the transistor conducts. When logic 1 is applied, the transistor does not conduct.

Given these two basic transistors, we can easily build a circuit whose output inverts its gate input, as shown in Figure 2.2(c). When the input x is logic 0, the top transistor conducts and the bottom transistor does not conduct, so logic 1 appears at the output F. We can also easily build a circuit whose output is logic 1 when at least one of its inputs is logic 0, as shown in Figure 2.2(d). When at least one of the inputs x and y is logic 0, then at least one of the top transistors conducts and the bottom transistors do not both conduct, so logic 1 appears at F. If both inputs are logic 1, then neither of the top transistors conducts, but both of the bottom transistors do, so logic 0 appears at F. Likewise, we can easily build a circuit whose output is logic 1 when both of its inputs are logic 0, as illustrated in Figure 2.2(e). The three circuits shown implement three basic logic gates: an inverter, a NAND gate, and a NOR gate.

Digital system designers usually work at the abstraction level of logic gates rather than transistors. Figure 2.3 describes eight basic logic gates. Each gate is represented symbolically, with a Boolean equation, and with a truth table. The truth table has inputs on the left and the output on the right. The AND gate outputs 1 if and only if both inputs are 1. The OR gate outputs 1 if and only if at least one of the inputs is 1. The XOR (exclusive-OR) gate outputs 1 if and only if exactly one of its two inputs is 1. The NAND, NOR, and XNOR gates output the complement of AND, OR, and XOR, respectively.

Figure 2.3: Basic logic gates.
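The gate behaviors of Figure 2.3 can be checked with a short sketch; the dictionary layout below is ours, not the book's, and the driver and inverter (single-input gates) are omitted.

```python
# Sketch: six of the basic gates of Figure 2.3 as Boolean functions on bits 0/1.
gates = {
    "AND":  lambda x, y: x & y,
    "OR":   lambda x, y: x | y,
    "XOR":  lambda x, y: x ^ y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR":  lambda x, y: 1 - (x | y),
    "XNOR": lambda x, y: 1 - (x ^ y),
}

# AND outputs 1 iff both inputs are 1; XOR iff exactly one input is 1;
# NAND, NOR, and XNOR are the complements of AND, OR, and XOR.
for x in (0, 1):
    for y in (0, 1):
        assert gates["AND"](x, y) == (1 if x == 1 and y == 1 else 0)
        assert gates["XOR"](x, y) == (1 if x + y == 1 else 0)
        assert gates["NAND"](x, y) == 1 - gates["AND"](x, y)
        assert gates["NOR"](x, y) == 1 - gates["OR"](x, y)
        assert gates["XNOR"](x, y) == 1 - gates["XOR"](x, y)
```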
Even though AND and OR gates are easier to comprehend logically, NAND and NOR gates are more commonly used, and those are the gates we built using transistors in Figure 2.2. The NAND could easily be changed to AND by changing the 1 on the top to 0 and the 0 on the bottom to 1; the NOR could be changed to OR similarly. But it turns out that pMOS transistors don't conduct 0s very well, though they do fine conducting 1s, for reasons beyond this book's scope. Likewise, nMOS transistors don't conduct 1s very well, though they do fine conducting 0s. Hence, NANDs and NORs prevail.

Basic Combinational Logic Design

A combinational circuit is a digital circuit whose output is purely a function of its present inputs. Such a circuit has no memory of past inputs. We can use a simple technique to design a combinational circuit from our basic logic gates, as illustrated in Figure 2.4. We start with a problem description, which describes the outputs in terms of the inputs, as in Figure 2.4(a): y is 1 if a is 1, or b and c are 1; z is 1 if b or c is 1, but not both, or if a, b, and c are all 1. We translate that description to a truth table, with all possible combinations of input values on the left and desired output values for each combination on the right, as in Figure 2.4(b). For each output column, we can derive an output equation, with one equation term per row, as in Figure 2.4(c): y = a'bc + ab'c' + ab'c + abc' + abc, and z = a'b'c + a'bc' + ab'c + abc' + abc. We can then translate these equations to a circuit diagram. However, we usually want to minimize the logic gates in the circuit. We can minimize the output equations by algebraically manipulating the equations. Alternatively, we can use Karnaugh maps, as shown in Figure 2.4(d), yielding y = a + bc and z = ab + b'c + bc'. Once we've obtained the desired output equations, we can draw the circuit diagram, as shown in Figure 2.4(e).

Figure 2.4: Combinational logic design: (a) problem description, (b) truth table, (c) output equations, (d) minimized output equations, (e) final circuit.

RT-Level Combinational Components

Although we can design all combinational circuits in this manner, large circuits would be very complex to design. For example, a circuit with 16 inputs would have 2^16, or 64K, rows in its truth table. One way to reduce the complexity is to use combinational components that are more powerful than logic gates. Figure 2.5 shows several such combinational components, often called register-transfer, or RT, level components. We now describe each briefly.

A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O. Thus, a multiplexor acts much like a railroad switch, allowing only one of multiple input tracks to connect to a single output track. If there are m data inputs, then there are log2(m) select lines S. We call this an m-by-1 multiplexor, meaning m data inputs and 1 data output. The binary value of S determines which data input passes through; 00...00 means I0 passes through, 00...01 means I1 passes through, 00...10 means I2 passes through, and so on. For example, an 8x1 multiplexor has eight data inputs and thus three select lines. If those three select lines have values of 110, then I6 will pass through to the output. So if I6 were 1, then the output would be 1; if I6 were 0, then the output would be 0. We commonly use a more complex device called an n-bit multiplexor, in which each data input, as well as the output, consists of n lines. Suppose the previous example used a 4-bit 8x1 multiplexor. Then, if I6 were 1110, the output would be 1110. Note that n is independent of the number of select lines.

Another combinational component is a decoder. A decoder converts its binary input I into a one-hot output O. "One-hot" means that exactly one of the output lines can be 1 at a given time. Thus, if there are n outputs, then there must be log2(n) inputs. We call this a log2(n)xn decoder. For example, a 3x8 decoder has three inputs and eight outputs. If the input were 000, then the output O0 would be 1 and all other outputs would be 0. If the input were 001, then the output O1 would be 1, and so on. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.

An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry. For example, a 4-bit adder would have a 4-bit A input, a 4-bit B input, a 4-bit sum output, and a 1-bit carry output. If A were 1010 and B were 1001, then sum would be 0011 and carry would be 1. An adder often comes with a carry input also, so that such adders can be cascaded to create larger adders.
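The behaviors just described can be sketched as plain functions; this models behavior only, not gate structure, and the function names are ours.

```python
# Sketch: behavior of three RT-level combinational components from Figure 2.5.

def mux(inputs, s):
    """m-by-1 multiplexor: the data input selected by the binary value of s
    passes through to the output."""
    return inputs[s]

def decoder(i, n_outputs):
    """log2(n)-by-n decoder: one-hot output; output line i is 1, all others 0."""
    return [1 if k == i else 0 for k in range(n_outputs)]

def adder(a, b, n):
    """n-bit adder: returns (sum, carry); sum keeps only the low n bits."""
    total = a + b
    return total % (1 << n), total >> n

# 8x1 mux with select lines 110 (binary 6) passes input I6 through.
assert mux([0, 0, 0, 0, 0, 0, 1, 0], 0b110) == 1
# 3x8 decoder: input 001 drives output O1 high, all other outputs low.
assert decoder(0b001, 8) == [0, 1, 0, 0, 0, 0, 0, 0]
# 4-bit adder: 1010 + 1001 gives sum 0011 with carry 1.
assert adder(0b1010, 0b1001, 4) == (0b0011, 1)
```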
Figure 2.5: Combinational components (multiplexor: O = Ii if S = i; decoder: Oi = 1 if I = i; adder: sum = A + B (first n bits), carry = (n+1)th bit of A + B; comparator: less, equal, greater outputs for A versus B; ALU: O = A op B, op determined by S).

2.3 Sequential Logic

Figure 2.6: Sequential components (register: Q = 0 if clear = 1, I if load = 1 and clock rises, Q otherwise; shift register: contents shifted toward the lsb, Q = lsb, I stored in the msb; counter: Q = 0 if clear = 1, Q(prev) + 1 if count = 1 and clock rises).
A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge. A shift register has a 1-bit data input I, and at least two control inputs, clock and shift. When clock is rising and shift is 1, the value of I is stored in the nth bit, while simultaneously the nth bit is stored in the (n-1)th bit, the (n-1)th bit is stored in the (n-2)th bit, and so on, down to the second bit being stored in the first bit. The first bit is typically shifted out, appearing over an output Q.

A counter is a register that can also increment, meaning add binary 1, to its stored binary value. In its simplest form, a counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on each clock edge. A counter often also has a parallel load data input and associated load control signal. A common counter feature is both up and down counting, or incrementing and decrementing, requiring an additional control input to indicate the count direction.

These control inputs can be either synchronous or asynchronous. A synchronous input's value only has an effect during a clock edge. An asynchronous input's value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.

Sequential Logic Design

Sequential logic design can be achieved using a straightforward technique, whose steps are illustrated in Figure 2.7. We again start with a problem description, shown in Figure 2.7(a): construct a pulse divider that slows down a pre-existing pulse, outputting a 1 for every four pulses detected. We translate this description to a state diagram, also called a finite state machine (FSM), as in Figure 2.7(b). We describe FSMs further in a later chapter. Briefly, each state represents the current "mode" of the circuit, serving as the circuit's memory of past input values. The desired output values are listed next to each state. The input conditions that cause a transition from one state to another are shown next to each arc. Each arc condition is implicitly "ANDed" with a rising (or falling) clock edge. In other words, all inputs are synchronous. All inputs and outputs must be Boolean, and all operations must be Boolean operations. FSMs can also describe asynchronous systems, but we do not cover such systems in this book, since they are not very common.

We will implement this FSM using a register to store the current state, and combinational logic to generate the output values and the next state, as shown in Figure 2.7(c). We assign to each state a unique binary value, and we then create a truth table for the combinational logic, as in Figure 2.7(d). The inputs for the combinational logic are the state bits coming from the state register, and the external inputs, so we list all combinations of these inputs on the left side of the table. The outputs for the combinational logic are the state bits to be loaded into the register on the next clock edge (the next state), and the external output values, so we list desired values of these outputs for each input combination on the right side of the table. Because we used a state diagram for which outputs were a function of the current state only, and not of the inputs, we list an external output value only for each possible state, ignoring the external input values. Now that we have a truth table, we proceed with combinational logic design as described earlier, generating minimized output equations as shown in Figure 2.7(e), and then drawing the combinational logic circuit as in Figure 2.7(f). As you can see, sequential logic design is very much like combinational logic design, as long as we draw the state table in such a way that it can be used as a combinational logic truth table also.

Figure 2.7: Sequential logic design: (a) problem description, (b) state diagram, (c) implementation model, (d) state table (Moore-type), (e) minimized output equations (I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a, x = Q1Q0), (f) combinational logic.
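The minimized equations of Figure 2.7(e) can be simulated directly to confirm the pulse-divider behavior; this is a behavioral sketch, and the function names are ours.

```python
# Sketch: the pulse-divider FSM of Figure 2.7, stepped using the minimized
# next-state and output equations of Figure 2.7(e):
#   I1 = Q1'Q0a + Q1a' + Q1Q0',  I0 = Q0a' + Q0'a,  x = Q1Q0.

def next_state(q1, q0, a):
    i1 = ((1 - q1) & q0 & a) | (q1 & (1 - a)) | (q1 & (1 - q0))
    i0 = (q0 & (1 - a)) | ((1 - q0) & a)
    return i1, i0

def output(q1, q0):
    return q1 & q0  # Moore output: x depends on the current state only

# Hold a = 1 for four clock edges: the state register steps 00 -> 01 -> 10 -> 11
# -> 00, and x is 1 only while in state 11, i.e., once per four pulses.
q1 = q0 = 0
xs = []
for a in (1, 1, 1, 1):
    q1, q0 = next_state(q1, q0, a)
    xs.append(output(q1, q0))
assert xs == [0, 0, 1, 0]
```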
One of the biggest difficulties for people new to implementing FSMs is understanding FSM and controller timing. Consider the situation of being in state 0 in Figure 2.7(b). This means that the output x is being assigned 0, and the FSM is waiting for the next clock pulse to come along. When the new pulse comes, we'll transition to either state 0 or 1, depending on input a. From the implementation model perspective of Figure 2.7(c), in state 0, the state register has 0s in it, and these 0s are trickling through the combinational logic, eventually producing a stable 0 on output x, and the next-state signals I0 and I1 are being produced as a function of the state register outputs and input a. Input a needs to be stable before the next clock pulse comes along, so that the next-state signals are stable. When the next pulse does come along, the state register will be loaded with either 00 or 01. Assume 01, meaning we will now be in state 1. Then this 01 will trickle through the combinational logic, causing output x to be 0. And so on. Notice that the actions of a state occur slightly after a clock pulse causes us to enter that state.

Notice that there is a fundamental assumption being made here regarding the clock frequency, namely, that the clock frequency is fast enough to detect events on input a. In other words, input a must be held at its value long enough so that the next clock pulse will detect it. If input a switches from 0 to 1 and back to 0, all in between two clock pulses, then the switch to 1 would never be detected. Yet the clock frequency must be slow enough to allow outputs to stabilize after being generated by the combinational logic. We recommend that one study the relationship between the FSM and the implementation model for a while, until one is comfortable with this relationship.

2.4 Custom Single-Purpose Processor Design

We now have the knowledge needed to build a basic processor. A basic processor consists of a controller and a datapath, as illustrated in Figure 2.8. The datapath stores and manipulates a system's data. Examples of data in an embedded system include binary numbers representing external conditions like temperature or speed, characters to be displayed on a screen, or a digitized photographic image to be stored and compressed. The datapath contains register units, functional units, and connection units like wires and multiplexors. The datapath can be configured to read data from particular registers, feed that data through functional units configured to carry out particular operations like add or shift, and store the operation results back into particular registers. The controller carries out such configuration of the datapath. It sets the datapath control inputs, like the register load and multiplexor select signals, of the register units, functional units, and connection units to obtain the desired configuration at a particular time. It monitors external control inputs as well as datapath control outputs, known as status signals, coming from functional units, and it sets external control outputs as well as datapath control inputs accordingly.

Figure 2.8: A basic processor: (a) controller and datapath, (b) a view inside the controller and datapath.

We can apply the combinational and sequential design techniques described earlier to build a controller and a datapath. We now describe a technique to convert a computation task into a custom single-purpose processor consisting of a controller and a datapath.

We begin with a sequential program describing the computation task that we wish to implement. Figure 2.9 provides an example task of computing a greatest common divisor (GCD). Figure 2.9(a) shows a black-box diagram of the desired system, having x_i and y_i data inputs and a data output d_o. The system's functionality is straightforward: the output should represent the GCD of the inputs. Thus, if the inputs are 12 and 8, the output should be 4. If the inputs are 13 and 5, the output should be 1. Figure 2.9(b) provides a simple program with this functionality. The reader might trace this program's execution on these examples to verify that the program does indeed compute the GCD.

Figure 2.9: Example program - Greatest Common Divisor (GCD): (a) black-box view, (b) desired functionality, (c) state diagram. Part (b):

0:  int x, y;
1:  while (1) {
2:     while (!go_i);
3:     x = x_i;
4:     y = y_i;
5:     while (x != y) {
6:        if (x < y)
7:           y = y - x;
           else
8:           x = x - y;
        }
9:     d_o = x;
    }

To begin building our single-purpose processor implementing the GCD program, we first convert our program into a complex state diagram, in which states and arcs may include arithmetic expressions, and those expressions may use external inputs and outputs as well as variables. In contrast, our earlier state diagrams included only Boolean expressions, and those expressions could use only external inputs and outputs, not variables. This more complex state diagram is essentially a sequential program in which statements have been scheduled into states. We'll refer to a complex state diagram as a finite-state machine with data (FSMD).

We can use templates to convert a program to an FSMD, as illustrated in Figure 2.10. First, we classify each statement as an assignment statement, loop statement, or branch (if-then-else or case) statement. For an assignment statement, we create a single state with that statement as its action, and we add an arc from this state to the first state of the next statement, as shown in Figure 2.10(a). For a loop statement, we create a condition state C and a join state J, both with no actions, as shown in Figure 2.10(b). We add an arc with the loop's condition from the condition state to the first statement in the loop body. We add a second arc with the complement of the loop's condition from the condition state to the next statement after the loop body. We also add an arc from the join state back to the condition state. For a branch statement, we create a condition state C and a join state J, both with no actions, as shown in Figure 2.10(c). We add an arc with the first branch's condition from the condition state to the first branch's first statement. We add another arc with the complement of the first branch's condition ANDed with the second branch's condition from the condition state to the second branch's first statement. We repeat this for each branch. Finally, we connect the arc leaving the last statement of each branch to the join state, and we add an arc from this state to the next statement's state.

Figure 2.10: Templates for creating a state diagram from program statements: (a) assignment, (b) loop, (c) branch.

Using this template approach, we convert our GCD program to the FSMD of Figure 2.9(c). Notice that variables are being assigned in some of the states, such as the action x = x - y, which also includes an arithmetic operation. Again, variables and arithmetic operations/conditions are what make FSMDs more powerful than FSMs.

We are now well on our way to designing a custom single-purpose processor that executes the GCD program. Our next step is to divide the functionality into a datapath part and a controller part, as shown in Figure 2.11. The datapath part should consist of an interconnection of combinational and sequential components. The controller part should consist of a pure FSM (i.e., one containing only Boolean actions and conditions).

We construct the datapath through a four-step process:
1. First, we create a register for any declared variable. In the example, the variables are x and y. We treat an output port as an implicit variable, so we create a register d and connect it to the output port. We also draw the input and output ports. Figure 2.11(b) shows these three registers as light-gray rectangles.
2. Second, we create a functional unit for each arithmetic operation in the state diagram. In the example, there are two subtractions, one comparison for less than, and one comparison for inequality, yielding two subtractors and two comparators, shown as white rectangles in Figure 2.11(b).
3. Third, we connect the ports, registers, and functional units. For each write to a variable in the state diagram, we draw a connection from the write's source to the variable's register. A source may be an input port, a functional unit, or another register. For each arithmetic and logical operation, we connect sources to an input of the operation's corresponding functional unit. When more than one source is connected to a register, we add an appropriately sized multiplexor in front of the register.
4. Fourth, we create a unique identifier for each control input and output of the datapath components.
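The inner computation of the GCD program (states 5 through 9) can be checked with a direct sketch; this mirrors the loop of Figure 2.9(b), omitting the outer wait on go_i.

```python
# Sketch: the GCD computation of Figure 2.9(b), states 5-9 of the FSMD.

def gcd(x, y):
    while x != y:          # state 5: loop while x != y
        if x < y:          # state 6: branch on x < y
            y = y - x      # state 7
        else:
            x = x - y      # state 8
    return x               # state 9: d_o = x

# The two examples from the text.
assert gcd(12, 8) == 4
assert gcd(13, 5) == 1
```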
Figure 2.11(a) shows the FSM representing our controller. The FSM has the same states and transitions as the FSMD. However, we replace complex actions and conditions by Boolean ones, making use of our datapath. We replace every variable write by actions that set the select signals of the multiplexor in front of the variable's register such that the write's source passes through, and we assert the load signal of that register. We replace every logical operation in a condition by the corresponding functional unit control output. In this FSM, any signal not explicitly assigned in a state is implicitly assigned a 0. For example, x_ld is implicitly assigned 0 in every state except for 3 and 8, where x_ld is explicitly assigned 1.

We can then complete the controller design by implementing the FSM using our sequential design technique described earlier and illustrated in Figure 2.7. Figure 2.11(c) shows the controller implementation model. Figure 2.12 shows a state table for the controller. Note that there are seven inputs to the controller, resulting in 128 rows for the state table. We reduced rows in the state table of the figure by using * for some input combinations, but we can still see that optimizing the design using hand techniques could be quite tedious. For this reason, computer-aided design (CAD) tools that automate both the combinational and the sequential logic design can be very helpful; we'll introduce some CAD tools in the last chapter. CAD tools that automatically generate digital gates from sequential programs, FSMDs, FSMs, or logic equations are known as synthesis tools.

Figure 2.12: State table for the GCD example.
Figure 2.11: Example program - Greatest Common Divisor (GCD): (a) controller, (b) datapath, (c) controller model.
Also, note that we could perform significant amowtts of optimization to both the datapath (a)
and the controller. For example, we could merge functional units in the datapath, resulting in
fewer units at the expense of more multiplexors. We could also merge a number of states into Bridge
a single state, reducing the size of the controller. Interested readers might· examine the rdy_in A single-purpose processor that rdy_out
converts two 4-bit inputs, arriving one
textbook by Gajski referred to at the end of this chapter for an introduction to these clock at a time over data_in along with a
optimizations. rdy_in pulse, into one 8-bit output on
Note that we could alternatively implement the GCD program by programming a data_out along with a rdy_out pulse.
general-purpose processor, thus eliminating the need for this design process, but possibly data.:_in(4) data_out(8)
yielding a slower and bigger design.
Finally, we once again discuss timing, this time for FSMDs rather than FSMs. When in a
. particular state, all actions internal to that state are considered to be concurrent to one another. rdy_in=O Bridge rdy_in=f
· Those actions are very different from a sequential program, in which statements are executed
in sequence. So, if x = 0 before entering a state A in an FSMD, and state A's actions are "x = .
WaitFirst4 RecFirst4Start RecFirst4End
x + l" and "y = x," then y will equal 0, not I, after exiting state A. This concurrency of
data_lo=data_in
actions also implies that the order in which we write the actions in the state does not matter.
Furthermore, note that actions consisting of writes to variables do not actually update rdy_in=O rdy_in=O
those variables until fhe next clock pulse, because those variables are implemented as rd in=I
registers. However, arcs leaving a state may use those variables in their conditions. Thus, an
arc leaving state A, but using variable x, is using the old value of x, 0 in our example in the WaitSecond4 RecSecond4Start RecSecond4End \
data_hi=data_in !
previous paragraph. Assuming an outgoing arc is using the new value assigned in the arc's !
source state is by far the most common mistake that people make when creating FSMDs. If I
rdy_in=O I
we wish to assign a value to variable x and then branch to different states depending on that Inputs
value, then we must insert an additional state before branching. Send8Start rdy_in: bit; data_in: bit[4);
data_out=data_hi Send8End Outputs
&data_lo rdy_out;=Q rdy_out: bit; data_out:bit(8)
rdy_out=l Variables
data_lo, data_hi: brt[4];
2.5 RT-Level Custom Single-Purpose Processor Design
Section 2.4 described a basic technique for converting a sequential program into a custom single-purpose processor, by first converting the program to an FSMD using the provided templates for each language construct, splitting the FSMD into a simple FSM controlling a datapath, and performing sequential logic design on the FSM. However, in many cases, we prefer not to start with a program, but instead directly with an FSMD. The reason is that often the cycle-by-cycle timing of a system is central to the design, but programming languages don't typically support cycle-by-cycle description. FSMDs, in contrast, make cycle-by-cycle timing explicit.

For example, consider the design problem in Figure 2.13(a). We want one device (the sender) to send an 8-bit number to another device (the receiver). The problem is that while the receiver can receive all 8 bits at once, the sender sends 4 bits at a time; first it sends the low-order 4 bits, then the high-order 4 bits. So we need to design a bridge that will enable the two devices to communicate.

Figure 2.13: RT-level custom single-purpose processor design example: (a) problem specification (inputs rdy_in: bit, data_in: bit[4]; outputs rdy_out: bit, data_out: bit[8]; variables data_lo, data_hi: bit[4]), (b) FSMD.

Different designers might attack this problem at different levels of abstraction. One designer might start thinking in terms of registers, multiplexors, and flip-flops. Another might try to describe the bridge as a sequential program. But perhaps the most natural level is to describe the bridge as an FSMD, as shown in Figure 2.13(b). We begin by creating a state WaitFirst4 that waits for the first 4 bits, whose presence on data_in will be indicated by a pulse on rdy_in. Once the pulse is detected, we transition to a state RecFirst4Start that saves the contents of data_in in a variable called data_lo. We then wait for the pulse on rdy_in to end, and then wait for the other 4 bits, indicated by a second pulse on rdy_in. We save the contents of data_in in a variable called data_hi. After waiting for the second pulse on rdy_in to end, we write the full 8 bits of data to the output data_out, and we pulse rdy_out.
Chapter 2: Custom Single-Purpose Processors: Hardware 2.6: Optimizing Custom Single-Purpose Processors
Converting the bridge FSMD into an FSM plus datapath yields the design shown in Figure 2.14(a). This conversion requires only three simple changes, as shown in bold in the figure. Having obtained the FSM, we can convert the FSM into a state register and combinational logic using the same technique as in Figure 2.7; we omit this conversion here.

Figure 2.14: The bridge FSM and datapath; the FSMD's variable writes are replaced by register load signals data_lo_ld, data_hi_ld, and data_out_ld, with external signals rdy_in, data_in(4), clk, rdy_out, and data_out.

This example demonstrates how a problem that consists mostly of waiting for or making changes on signals, rather than consisting mostly of performing computations on data, might most easily be described as an FSMD. The FSMD would be even more appropriate if specific numbers of clock cycles were specified (e.g., the input pulse would be held high exactly two cycles and the output pulse would have to be held high for three cycles). On the other hand, if a problem consists mostly of an algorithm with lots of computations, the detailed timing of which is not especially important, such as the GCD computation in the earlier example, then a program might be the best starting point.

The FSMD level is often referred to as the register-transfer (RT) level, since an FSMD describes in each state which registers should have their data transferred to which other registers, with that data possibly being transformed along the way. The RT-level is probably the most common starting point for custom single-purpose processor design today.

Some custom single-purpose processors do not manipulate much data. These processors consist primarily of a controller, with perhaps no datapath or a trivial one with just a couple registers or counters, as in our bridge example of Figure 2.14. Likewise, other custom single-purpose processors do not exhibit much control. These processors consist primarily of a datapath configured to do one or a few things repeatedly, with no controller or a trivial one with just a couple flip-flops and gates. Nevertheless, we can still think of these circuits as processors.
2.6 Optimizing Custom Single-Purpose Processors

Optimizing the Original Program

In the GCD example, if we assume we can make use of a modulo operation %, we could write an algorithm that would use fewer steps. In particular, we could use the following algorithm:

    int x, y, r;
    while (1) {
      while (!go_i);
      if (x_i >= y_i) { x = x_i; y = y_i; }
      else { x = y_i; y = x_i; }  // x must be the larger number
      while (y != 0) {
        r = x % y;
        x = y;
        y = r;
      }
      d_o = x;
    }

Figure 2.15: (a) GCD program using the modulo operation, (b) reduced FSMD.

Let us compare this second algorithm with the earlier one when computing the GCD of 42 and 8. The earlier algorithm would step through its inner loop with x and y values as follows: (42,8), (34,8), (26,8), (18,8), (10,8), (2,8), (2,6), (2,4), (2,2), thus outputting 2. The second algorithm would step through its inner loop with x and y values as follows: (42,8), (8,2), (2,0), thus outputting 2. The second algorithm is far more efficient in terms of time. Analysis of algorithms and their efficient design is a widely researched area. The choice of algorithm can have perhaps the biggest impact on the efficiency of the designed processor.
Optimizing the FSMD

The reduced FSMD is shown in Figure 2.15(b). We reduced the FSMD from thirteen states to only six states. Be careful, though, to avoid the common mistake of assuming that a variable assigned in a state can have the newly assigned value read on an outgoing arc of that state!

The original FSMD could also have had too few states to be efficient in terms of hardware size. Suppose a particular program statement had the operation a = b * c * d * e. Generating a single state for this operation will require us to use three multipliers in our datapath. However, multipliers are expensive, and thus we might instead want to break this operation down into smaller operations, like t1 = b * c, t2 = d * e, and a = t1 * t2, with each smaller operation having its own state. Thus, only one multiplier would be needed in the datapath, since the three multiplications could share the multiplier; sharing will be discussed in the next section.

In this scenario, we assumed that the timing of output operations could be changed. For example, the reduced FSMD will generate the GCD output in fewer clock cycles than the original FSMD. In many cases, changing the timing is not acceptable. For example, in our earlier clock divider example, changing the timing clearly would not be acceptable, since we intended for the cycle-by-cycle behavior of the original FSM to be preserved during design. Thus, when optimizing the FSMD, a designer must be aware of whether output timing may or may not be modified.

Optimizing the Datapath

In our four-step datapath approach, we created a unique functional unit for every arithmetic operation in the FSMD. However, such a one-to-one mapping is often not necessary. Many arithmetic operations in the FSMD can share a single functional unit if that functional unit supports those operations, and those operations occur in different states. In the GCD example, states 7 and 8 both performed subtractions. In the datapath of Figure 2.11, each subtraction got its own subtractor. Instead, we could use a single subtractor and use multiplexors to choose whether the subtractor inputs are x and y, or instead y and x.

Furthermore, we often have a number of different RT components from which we can build our datapath. For example, we may have fast and slow adders available. We may have multifunction components, like ALUs, also. Allocation is the task of choosing which RT components to use in the datapath. Binding is the task of mapping operations from the FSMD to allocated components.

Scheduling, allocation, and binding are highly interdependent. A given schedule will affect the range of possible allocations, for example. An allocation will affect the range of possible schedules. And so on. Thus, we sometimes want to consider these tasks simultaneously.

Optimizing the FSM

Designing a sequential circuit to implement an FSM also provides some opportunities for optimization, namely, state encoding and state minimization.

State encoding is the task of assigning a unique bit pattern to each state in an FSM. Any assignment in which the encodings are unique will work properly, but the size of the state register as well as the size of the combinational logic may differ for different encodings. For example, four states A, B, C, and D can be encoded as 00, 01, 10, and 11, respectively. Alternatively, those states can be encoded as 11, 10, 00, and 01, respectively. In fact, for an FSM with n states where n is a power of 2, there are n! possible encodings. We can see this easily if we treat encoding as an ordering problem: we order the states and assign a straightforward binary encoding, starting with 00...00 for the first state, 00...01 for the second state, and so on. There are n! possible orderings of n items, and thus n! possible encodings. n! is a very large number for large n, and thus checking each encoding to determine which yields the most efficient controller is a hard problem. Even more encodings are possible, since we can use more than log2(n) bits to encode n states, up to n bits to achieve a one-hot encoding. CAD tools are therefore a great aid in searching for the best encoding.

State minimization is the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, those two states generate the same outputs and transition to the same next state. Such states are clearly equivalent, since merging them will yield exactly the same output behavior.

The state merging that we did when optimizing our FSMD was not the same as state minimization as defined here. The reason is that our state merging in the FSMD actually changed the output behavior, in particular the output timing, of the FSMD. Typically, by the time we arrive at an FSM, we assume output timing cannot be changed. State minimization does not change the output behavior in any way.

2.7 Summary

Designing a custom single-purpose processor for a given program requires an understanding of various aspects of digital design. Design of a circuit to implement Boolean functions requires combinational design, which consists of building a truth table with all possible inputs and desired outputs, optimizing the output functions, and drawing a circuit. Design of a circuit to implement a state diagram requires sequential design, which consists of drawing an implementation model with a state register and a combinational logic block, assigning a bit encoding to each state, drawing a state table with inputs and outputs, and repeating our combinational design process for this table. Finally, design of a single-purpose processor circuit to implement a program requires us to first schedule the program's statements into a complex state diagram, construct a datapath from the diagram, create a new state diagram that replaces complex actions and conditions by datapath control operations, and then design a controller circuit for the new state diagram using sequential design. The register-transfer level is the most common starting point of design today. Much optimization can be performed at each level of design, but such optimization is hard, so CAD tools can be a great designer's aid.

2.8 References and Further Reading

• De Micheli, Giovanni, Synthesis and Optimization of Digital Circuits. New York: McGraw-Hill, 1994. Covers synthesis techniques from sequential programs down to gates.
• Gajski, Daniel D., Principles of Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1997. Describes combinational and sequential logic design, with a focus on optimization techniques, CAD, and higher levels of design.
• Gajski, Daniel D., Nikil Dutt, Allen Wu, and Steve Lin, High-Level Synthesis: Introduction to Chip and System Design. Norwell, MA: Kluwer Academic Publishers, 1992. Emphasizes optimizations when converting sequential programs to a custom single-purpose processor.
• Katz, Randy, Contemporary Logic Design. Redwood City, CA: Benjamin/Cummings, 1994. Describes combinational and sequential logic design, with a focus on logic and sequential optimization and CAD.
2.9 Exercises

2.1 What is a single-purpose processor? What are the benefits of choosing a single-purpose processor over a general-purpose processor?
2.2 How do nMOS and pMOS transistors differ?
2.3 Build a 3-input NAND gate using a minimum number of CMOS transistors.
2.4 Build a 3-input NOR gate using a minimum number of CMOS transistors.
2.5 Build a 2-input AND gate using a minimum number of CMOS transistors.
2.6 Build a 2-input OR gate using a minimum number of CMOS transistors.
2.7 Explain why NAND and NOR gates are more common than AND and OR gates.
2.8 Distinguish between a combinational circuit and a sequential circuit.
2.9 Design a 2-bit comparator (compares two 2-bit words) with a single output "less-than," using the combinational design technique described in the chapter. Start from a truth table, use K-maps to minimize logic, and draw the final circuit.
2.10 Design a 3x8 decoder. Start from a truth table, use K-maps to minimize logic, and draw the final circuit.
2.11 Describe what is meant by edge-triggered and explain why it is used.
2.12 Design a 3-bit counter that counts the following sequence: 1, 2, 4, 5, 7, 1, 2, etc. This counter has an output "odd" whose value is 1 when the current count value is odd. Use the sequential design technique of the chapter. Start from a state diagram, draw the state table, minimize the logic, and draw the final circuit.
2.13 Four lights are connected to a decoder. Build a circuit that will blink the lights in the following order: 0, 2, 1, 3, 0, 2, .... Start from a state diagram, draw the state table, minimize the logic, and draw the final circuit.
2.14 Design a soda machine controller, given that a soda costs 75 cents and your machine accepts quarters only. Draw a black-box view, come up with a state diagram and state table, minimize the logic, and then draw the final circuit.
2.15 What is the difference between a synchronous and an asynchronous circuit?
2.16 Determine whether the following are synchronous or asynchronous: (a) multiplexor, (b) register, (c) decoder.
2.17 What is the purpose of the datapath? Of the controller?
2.18 Compare the GCD custom-processor implementation to a software implementation. (a) Compare the performance. Assume a 100-ns clock for the microcontroller, and a 20-ns clock for the custom processor. Assume the microcontroller uses two-operand instructions, and each instruction requires four clock cycles. Estimates for the microcontroller are fine. (b) Estimate the number of gates for the custom design, and compare this to 10,000 gates for a simple 8-bit microcontroller. (c) Compare the custom GCD with the GCD running on a 300-MHz processor with 2-operand instructions and one clock cycle per instruction (advanced processors use parallelism to meet or exceed one clock cycle per instruction).
2.19 Design a single-purpose processor that outputs Fibonacci numbers up to n places. Start with a function computing the desired result, translate it into a state diagram, and sketch a probable datapath.
2.20 Design a circuit that does the matrix multiplication of matrices A and B. Matrix A is 3x2 and matrix B is 2x3. The multiplication works as follows:

        A = [a b]      B = [g h i]
            [c d]          [j k l]
            [e f]

    A x B = [a*g + b*j   a*h + b*k   a*i + b*l]
            [c*g + d*j   c*h + d*k   c*i + d*l]
            [e*g + f*j   e*h + f*k   e*i + f*l]

2.21 An algorithm for matrix multiplication, assuming that we have one adder and one multiplier, follows.

    main() {
      int A[3][2] = {{1, 2}, {3, 4}, {5, 6}};
      int B[2][3] = {{7, 8, 9}, {10, 11, 12}};
      int C[3][3];
      int i, j, k;
      for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
          C[i][j] = 0;
          for (k = 0; k < 2; k++) {
            C[i][j] += A[i][k] * B[k][j];
          }
        }
      }
    }

(a) Convert the matrix multiplication algorithm into a state diagram using the template provided in Figure 2.10. (b) Rewrite the matrix multiplication algorithm given the assumption that we have three adders and six multipliers. (c) If each multiplication takes two cycles to compute and each addition takes one cycle to compute, how many cycles does it take to complete the matrix multiplication given one adder and one multiplier? Three adders and six multipliers? Nine adders and 18 multipliers? (d) If each adder requires 10 transistors to implement and each multiplier requires 100 transistors to implement, what is the total number of transistors needed to implement the matrix multiplication circuit using one adder and one multiplier? Three adders and six multipliers? Nine adders and 18 multipliers? (e) Plot your results from parts (c) and (d) into a graph with latency along the x-axis and size along the y-axis.
2.22 A subway has an embedded system controlling the turnstile, which opens when two tokens are deposited. (a) Draw the FSMD state diagram for this system. (b) Separate the FSMD into an FSM+D. (c) Derive the FSM logic using truth tables and K-maps to minimize logic. (d) Draw your FSM and datapath connections.
3.1 Introduction
A general-purpose processor is a programmable digital system intended to solve computation problems in a large variety of applications. Copies of the same processor may solve computation problems in applications as diverse as communication, automotive, and industrial embedded systems. An embedded-system designer choosing to use a general-purpose processor to implement part of a system's functionality may achieve several benefits.

First, the unit cost of the processor may be very low, often a few dollars or less. One reason for this low cost is that the processor manufacturer can spread its NRE cost for the processor's design over large numbers of units, often numbering in the millions or billions. For example, Motorola sold nearly half a billion 68HC05 microcontrollers in 1996 alone (source: Motorola 1996 Annual Report).

Second, because the processor manufacturer can spread NRE cost over large numbers of units, the manufacturer can afford to invest large NRE cost into the processor's design without significantly increasing the unit cost. The processor manufacturer may thus use
Chapter 3: General-Purpose Processors: Software
experienced computer architects who incorporate advanced architectural features, and may use leading-edge optimization techniques, state-of-the-art IC technology, and handcrafted VLSI layouts for critical components. These factors can improve design metrics like performance, size, and power.

Third, the embedded system designer may incur low NRE cost, since the designer need only write software, and then apply a compiler and/or an assembler, both of which are mature and low-cost design technologies. Likewise, time-to-prototype and time-to-market will be short, since processor ICs can be purchased and then programmed in the designer's own lab. Flexibility will be great, since the designer can perform software rewrites in a straightforward manner.

3.2 Basic Architecture

A general-purpose processor, sometimes called a CPU (central processing unit) or a microprocessor, consists of a datapath and a control unit, tightly linked with a memory. We now discuss these components briefly. Figure 3.1 illustrates the basic architecture.

Figure 3.1: General-purpose processor basic architecture (control unit with controller; datapath with ALU and registers; connections to memory and I/O).

Datapath

The datapath consists of the circuitry for transforming data and for storing temporary data. The datapath contains an arithmetic-logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inverting, and shifting. The ALU also generates status signals, often stored in a status register (not shown), indicating particular data conditions. Such conditions include indicating whether data is zero or whether an addition of two data items generates a carry. The datapath also contains registers capable of storing temporary data. Temporary data may include data brought in from memory but not yet sent through the ALU, data coming from the ALU that will be needed for later ALU operations or will be sent back to memory, and data that must be moved from one memory location to another. The internal data bus carries data within the datapath, while the external data bus carries data to and from the data memory.

We typically distinguish processors by their size, and we usually measure size as the bit-width of the datapath components. A bit, which stands for binary digit, is the processor's basic data unit, representing either a 0 (low or false) or a 1 (high or true), while we refer to 8 bits as a byte. An N-bit processor may have N-bit-wide registers, an N-bit-wide ALU, an N-bit-wide internal bus over which data moves among datapath components, and an N-bit-wide external bus over which data is brought in and out of the datapath. Common processor sizes include 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit. However, in some cases, a particular processor may have different sizes among its registers, ALU, internal bus, or external bus, so the processor-size definition is not an exact one. For example, a processor may have a 16-bit internal bus, ALU, and registers, but only an 8-bit external bus to reduce pins on the processor's IC.

Control Unit

The control unit consists of circuitry for retrieving program instructions and for moving data to, from, and through the datapath according to those instructions. The control unit has a program counter (PC) that holds the address in memory of the next program instruction to fetch, and an instruction register (IR) to hold the fetched instruction. The control unit also has a controller, consisting of a state register plus next-state and control logic, as we saw in Chapter 2. This controller sequences through the states and generates the control signals necessary to read instructions into the IR, and control the flow of data in the datapath. Such flows may include inputting two particular registers into the ALU, storing ALU results into a particular register, or moving data between memory and a register. The controller also determines the next value of the PC. For a nonbranch instruction, the controller increments the PC. For a branch instruction, the controller looks at the datapath status signals and the IR to determine the appropriate next address.

The PC's bit-width represents the processor's address size. The address size is independent of the data word size; the address size is often larger. The address size determines the number of directly accessible memory locations, referred to as the address space or memory space. If the address size is M, then the address space is 2^M. Thus, a processor with a 16-bit PC can directly address 2^16 = 65,536 memory locations. We would typically refer to this address space as 64K, although if 1K = 1,000, this number would represent 64,000, not the actual 65,536. Thus, in computer-speak, 1K = 1,024.

For each instruction, the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching operands, executing the instruction in the datapath, and storing results. Each stage may consist of one or more clock cycles. A clock cycle is usually the longest time required for data to travel from one register to another. The path through the datapath or controller that results in this longest time (e.g., from a datapath register through the ALU and back to a datapath register) is called the critical
path. The inverse of the clock cycle is the clock frequency, measured in cycles per second, or Hertz (Hz). For example, a clock cycle of 10 nanoseconds corresponds to a frequency of 1/(10 x 10^-9) Hz, or 100 MHz. The shorter the critical path, the higher the clock frequency. We often use clock frequency as a means of comparing processors, especially different versions of the same processor, with higher clock frequency implying faster program execution. However, using clock frequency is not always an accurate method for comparing processor speeds.

Memory

While registers serve a processor's short-term storage requirements, memory serves the processor's medium- and long-term information-storage requirements. We can classify stored information as either program or data. Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output, and transformed by the program.

We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. In a Harvard architecture, the program memory space is distinct from the data memory space. Figure 3.2 illustrates these two methods. A Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.

Figure 3.2: Two memory architectures: (a) Harvard, (b) Princeton.

Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM.

Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip memory.

To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may be kept in a small but especially fast memory called cache, as illustrated in Figure 3.3. Cache memory often resides on-chip and often uses fast but expensive static RAM technology rather than slower but cheaper dynamic RAM (see Chapter 5). Cache memory is based on the principle that if at a particular time a processor accesses a particular memory location, then the processor will likely access that location and immediate neighbors of the location in the near future. Thus, when we first access a location in memory, we copy that location and some number of its neighbors (called a block) into cache, and then access the copy of the location in cache. When we access another location, we first check a cache table to see if a copy of the location resides in cache. If the copy does reside in cache, we have a cache hit, and we can read or write that location very quickly. If the copy does not reside in cache, we have a cache miss, so we must copy the location's block into cache, which takes a lot of time. Thus, for a cache to be effective in improving performance, the ratio of hits to misses must be very high, requiring intelligent caching schemes. Caches are used for both program memory (often called instruction cache, or I-cache) as well as data memory (often called data cache, or D-cache).

Figure 3.3: Cache memory (a small, fast memory near the processor backed by slower, cheaper memory, usually on a different chip).
3.3 Operation

Instruction Execution

We can think of a microprocessor's execution of instructions as consisting of several basic stages:

1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.

If each stage takes one clock cycle, then we can see that a single instruction may take several cycles to complete.

Pipelining

Pipelining is a common way to increase the instruction throughput of a microprocessor. We first make a simple analogy of two people approaching the chore of washing and drying eight dishes. In one approach, the first person washes all eight dishes, and then the second person dries all eight dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient since at any time only one person is working and the other is idle. Obviously, a better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry. We refer to this latter approach as "pipelined."

Each dish is like an instruction, and the two tasks of washing and drying are like the five stages described earlier. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction. The idea of pipelining is illustrated in Figure 3.4. Note that for pipelining to work well, instruction execution must be decomposable into roughly equal-length stages, and each instruction should require the same number of stages.

Figure 3.4: Pipelining: (a) nonpipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.

Branches pose a problem for pipelining, since we don't know the next instruction until the current instruction has reached the execute stage. One solution is to stall the pipeline when a branch is in the pipeline, waiting for the execute stage before fetching the next instruction. An alternative is to guess which way the branch will go and fetch the corresponding instruction next; if right, we proceed with no penalty, but if we find out in the execute stage that we were wrong, we must ignore the instructions fetched since the branch was fetched, thus incurring a penalty. Pipelined microprocessors often have very sophisticated branch predictors built in.

Superscalar and VLIW Architectures

We can use multiple ALUs to further speed up a processor. A superscalar microprocessor can execute two or more scalar operations in parallel, requiring two or more ALUs. A scalar operation transforms one or two numbers, as opposed to vector or matrix operations that transform entire sets of numbers. Some superscalar microprocessors require that the instructions be ordered statically (at compile time), while others may reorder the instructions dynamically (during runtime) to make use of the additional ALUs. A VLIW (very long instruction word) architecture is a type of static superscalar architecture that encodes several (perhaps four or more) operations in a single machine instruction.

3.4 Programmer's View

A programmer writes the program instructions that carry out the desired functionality on the general-purpose processor. The programmer may not actually need to know detailed information about the processor's architecture or operation, but instead may deal with an architectural abstraction, which hides much of that detail. The level of abstraction depends on the level of programming. We can distinguish between two levels of programming. The first is assembly-language programming, in which one programs in a language representing processor-specific instructions as mnemonics. The second is structured-language programming, in which one programs in a language using processor-independent instructions. A compiler automatically translates those instructions to processor-specific instructions. Ideally, the structured-language programmer would need no information about the processor architecture, but in embedded systems, the programmer must usually have at least some awareness, as we shall discuss.

Actually, we can define an even lower programming level, machine-language programming, in which the programmer writes machine instructions in binary. This level has become extremely rare due to the advent of assemblers. Machine-language-programmed computers often had rows of lights representing to the programmer the current binary instructions being executed. Today's computers look more like boxes or refrigerators, but
Chapter 3: General-Purpose Processors: Software / 3.4: Programmer's View
Instruction Set

The assembly-language programmer must know the processor's instruction set. The instruction set describes the bit configurations allowed in the IR, indicating the atomic processor operations that the programmer may invoke. Each such configuration forms an assembly instruction, and a sequence of such instructions forms an assembly program, stored in a processor's memory, as illustrated in Figure 3.5.

Figure 3.6: Addressing modes.
An instruction typically has two parts, an opcode field and operand fields. The opcode specifies the operation to take place during the instruction. We can classify instructions into three categories. Data-transfer instructions move data between memory and registers, between input/output channels and registers, and between registers themselves. Arithmetic/logical instructions configure the ALU to carry out a particular function, move data from the registers through the ALU, and move data from the ALU back to a particular register. Branch instructions determine the address of the next program instruction, based possibly on datapath status signals.

Branches can be further categorized as being unconditional jumps, conditional jumps, or procedure call and return instructions. Unconditional jumps always determine the address of the next instruction, while conditional jumps do so only if some condition evaluates to true, such as a particular register containing zero. A call instruction, in addition to indicating the address of the next instruction, saves the address of the current instruction so that a subsequent return instruction can jump back to the instruction immediately following the most recently invoked call instruction. This pair of instructions facilitates the implementation of procedure/function call semantics of high-level programming languages.

An operand field specifies the location of the actual data that takes part in an operation. Source operands serve as input to the operation, while a destination operand stores the output. The number of operands per instruction varies among processors. Even for a given processor, the number of operands per instruction may vary depending on the instruction type.

An operand field may indicate the data's location through one of several addressing modes, illustrated in Figure 3.6. In immediate addressing, the operand field contains the data itself. In register addressing, the operand field contains the address of a datapath register in which the data resides. In register-indirect addressing, the operand field contains the address of a register, which in turn contains the address of a memory location in which the data resides. In direct addressing, the operand field contains the address of a memory location in which the data resides. In indirect addressing, the operand field contains the address of a memory location, which in turn contains the address of a memory location in which the data resides. Those familiar with structured languages may note that direct addressing implements regular variables, and indirect addressing implements pointers. In inherent or implicit addressing, the particular register or memory location of the data is implicit in the opcode; for example, the data may reside in a register called the "accumulator." In indexed addressing, the direct or indirect operand must be added to a particular implicit register to obtain the actual operand address. Jump instructions may use relative addressing to reduce the number of bits needed to indicate the jump address. A relative address indicates how far to jump from the current address, rather than indicating the complete address. Such addressing is very common, since most jumps are to nearby instructions.

Ideally, the structured-language programmer would not need to know the instruction set of the processor. However, nearly every embedded system requires the programmer to write at least some portion of the program in assembly language. Those portions may deal with low-level input/output operations with devices outside the processor, like a display device.
Some processors also provide a data-transfer instruction where the processor adds an operand field to the base register to obtain an actual memory address.

    Assembly instruction   First byte    Second byte   Operation
    MOV Rn, direct         0000 Rn       direct        Rn = M(direct)
    MOV direct, Rn         0001 Rn       direct        M(direct) = Rn
    MOV @Rn, Rm            0010 Rn Rm                  M(Rn) = Rm
    MOV Rn, #immed.        0011 Rn       immediate     Rn = immediate
    ADD Rn, Rm             0100 Rn Rm                  Rn = Rn + Rm
    SUB Rn, Rm             0101 Rn Rm                  Rn = Rn - Rm
    JZ Rn, relative        0110 Rn       relative      PC = PC + relative (only if Rn is 0)

Figure 3.7: A simple (trivial) instruction set.

    (a)     int total = 0;
            for (int i = 10; i != 0; i--)
                total += i;
            // next instructions...

    (b)     0: MOV R0, #0;      // total = 0
            1: MOV R1, #10;     // i = 10
            2: MOV R2, #1;      // constant 1
            3: MOV R3, #0;      // constant 0
      Loop: 4: JZ R1, Next;     // Done if i == 0
            5: ADD R0, R1;      // total += i
            6: SUB R1, R2;      // i--
            7: JZ R3, Loop;     // Jump always (R3 is always 0)
      Next: 8: ...              // next instructions...

Figure 3.8: Sample programs: (a) a C program that adds the numbers 1 through 10, (b) the same program in assembly language using the instruction set of Figure 3.7.
Such a device may require specific timing sequences of signals in order to receive data, and the programmer may find that writing assembly code achieves such timing most conveniently. A driver routine is a portion of a program written specifically to communicate with, or drive, another device. Since drivers are often written in assembly language, the structured-language programmer may still require some familiarity with at least a subset of the instruction set.

Figure 3.7 shows a (trivial) instruction set with four data-transfer instructions, two arithmetic instructions, and one branch instruction, for a hypothetical processor. Figure 3.8(a) shows a program written in C that adds the numbers 1 through 10. Figure 3.8(b) shows that same program written in assembly language using the given instruction set.

Other special-function registers must be known by both the assembly-language and the structured-language programmer. Such registers may be used for configuring built-in timers, counters, and serial communication devices, or for writing and reading external pins.

I/O

The programmer should be aware of the processor's input and output (I/O) facilities, with which the processor communicates with other devices. One common I/O facility is parallel I/O, in which the programmer can read or write a port (a collection of external pins) by reading or writing a special-function register. Another common I/O facility is a system bus, consisting of address and data ports that are automatically activated by certain addresses or types of instructions. I/O methods will be discussed further in Chapter 6.
    CheckPort proc
        push ax              ; save the content
        push dx              ; save the content
        mov  dx, 3BCh + 1    ; base + 1 for register #1
        in   al, dx          ; read register #1
        and  al, 10h         ; mask out all but bit #4
        cmp  al, 0           ; is it 0?
        jne  SwitchOn        ; if not, we need to turn the LED on
    SwitchOff:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        and  al, 0feh        ; clear first bit (masking)
        out  dx, al          ; write it out to the port
        jmp  Done            ; we are done
    SwitchOn:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        or   al, 01h         ; set first bit (masking)
        out  dx, al          ; write it out to the port
    Done:
        pop  dx              ; restore the content
        pop  ax              ; restore the content
    CheckPort endp

    extern "C" void CheckPort(void);   // defined in assembly above
    void main(void) {
        while( 1 )
            CheckPort();
    }

Figure 3.9: PC parallel port example.

    LPT connector pin    I/O direction    Register address
    1                    Output           0th bit of register #2
    2-9                  Output           0th-7th bits of register #0
    10,11,12,13,15       Input            6,7,5,4,3rd bits of register #1
    14,16,17             Output           1,2,3rd bits of register #2

Figure 3.10: PC parallel port signals and associated registers.

Writing and reading three special registers accomplishes parallel communication on the PC. Those three registers are actually in an 8255A Peripheral Interface Controller chip. In unidirectional mode (the default power-on-reset mode), this device is capable of driving 12 output and 5 input lines. In Figure 3.10, we give the parallel port (known as LPT) connector pin numbers and the corresponding register locations.

A switch is connected to input pin number 13 of the parallel port. A light-emitting diode (LED) is connected to output pin number 2. Our program, running on the PC, should monitor the input switch and turn the LED on or off accordingly.

Figure 3.9 gives the code for such a program, in x86 assembly language. Note that the in and out assembly instructions read and write the internal registers of the 8255A. Both instructions take two operands, an address and data. The address specifies the register we are trying to read or write; it is calculated by adding the address of the device, called the base address, to the address of the particular register as given in Figure 3.10. In most PCs, the base address of LPT1 is 3BC hex (though not always). The second operand is the data. For the out instruction, the content of this 8-bit operand will be written to the addressed register. For the in instruction, the content of the addressed 8-bit register will be read into this operand.
…start at a particular memory location, while others recognize predefined names for particular ISRs.

For example, we may need to record the occurrence of an event from a peripheral device, such as the pressing of a button. We record the event by setting a variable in memory when that event occurs, although the user's main program may not process that event until later. Rather than requiring the user to insert checks for the event throughout the main program, the programmer merely writes an interrupt service routine and associates it with an input pin connected to the button. The processor will then call the routine automatically when the button is pressed.

Example: Assembly-Language Programming of Device Drivers

This example provides an application of assembly-language programming of a low-level driver, showing how the parallel port of a PC can be used to perform digital I/O. The code is given in Figure 3.9.

The program makes use of masking, something quite common during low-level I/O. A mask is a bit pattern designed such that ANDing it with a data item D yields a specific part of D. For example, a mask of 00001111 can be used to yield bits 3 through 0 (e.g., 00001111 AND 10101010 yields 00001010). A mask of 00010000, or 10h in hexadecimal format, would yield bit 4.

In Figure 3.9, we have broken our program into two source files, assembly and C. The assembly program implements the low-level I/O to the parallel port, and the C program implements the high-level application.

Operating System

An operating system is a layer of software that provides low-level services to the application layer, a set of one or more programs executing on the CPU, consuming and producing input and output data. The task of managing the application layer involves the loading and executing of programs, sharing and allocating system resources to these programs, and

    DB  file_name "out.txt"    - store file name
    MOV R0, 1324               - system call "open" id
    MOV R1, file_name          - address of file name
    INT 34                     - cause a system call
    JZ  R0, L1                 - if zero -> error
    ...
    L1:

Figure 3.11: System call invocation.
protecting these allocated resources from corruption by non-owner programs. One of the most important resources of a system is the central processing unit (CPU), which is typically shared among a number of executing programs. The operating system is responsible for deciding which program is to run next on the CPU and for how long. This is called process (or task) scheduling, and it is determined by the operating system's preemption policy. Another very important resource is memory, including disk storage, which is also shared among the applications running on the CPU.

In addition to implementing an environment for management of high-level application programs, the operating system provides the software required for servicing various hardware interrupts, and provides device drivers for driving the peripheral devices present in the system. Typically, on startup, an operating system initializes all peripheral devices, such as disk controllers, timers, and input/output devices, and installs hardware interrupt service routines (ISRs) to handle the various signals generated by these devices. Then it installs software interrupts (interrupts generated by the software) to process system calls (calls made by high-level applications to request operating system services), as described next.

A system call is a mechanism for an application to invoke the operating system. It is analogous to a procedure or function call, as in high-level programming languages. When a program requires some service from the operating system, it generates a predefined software interrupt that is serviced by the operating system. Parameters specific to the requested service are typically passed from (to) the application program to (from) the operating system through CPU registers. Figure 3.11 illustrates how the file "open" system call may be invoked, in assembly, by a program. Languages like C and Pascal provide wrapper functions around the system calls, giving the programmer a high-level mechanism for performing them.

In summary, the operating system abstracts away the details of the underlying hardware and provides the application layer an interface to the hardware through the system call mechanism.

3.5 Development Environment

In this section, we take a look at the general software design tools used by embedded system designers in the design, test, and debugging of embedded software.

Design Flow and Tools

Several software and hardware tools commonly support the programming of general-purpose processors. First, we must distinguish between the two processors we deal with when developing an embedded system. One processor is the development processor, on which we write and debug our program. This processor is part of our desktop computer. The other processor is the target processor, to which we will send our program and which will form part of our embedded system's implementation. For example, we may develop our system on a Pentium processor but use a Motorola 68HC11 as our target processor. Of course, sometimes the two processors happen to be the same, but this is mostly a coincidence.

Figure 3.12: Software development process.

Programming an embedded system's processor is similar to writing a program that runs on your desktop computer, with some subtle but important differences. Figure 3.12 depicts the standard software development process.
The general design flow for programming, that is, editing, compiling, assembling, and linking our program, is the same as that used for desktop systems, as illustrated in Figure 3.13. However, the verification phase (i.e., the process of testing the final executable) is greatly different in embedded systems. In the following paragraphs, we briefly describe each of the development tools involved.

Assemblers translate assembly instructions to binary machine instructions. In addition to just replacing opcode and operand mnemonics by binary equivalents, an assembler may also translate symbolic labels into actual addresses. For example, a programmer may add a symbolic label END to an instruction A and may reference END in a branch instruction. The assembler determines the actual binary address of A and replaces references to END by this address. The mapping of assembly instructions to machine instructions is one-to-one.

Compilers translate structured programs into machine (or assembly) programs. Structured programming languages possess high-level constructs that greatly simplify programming, such as loop constructs, so each high-level construct may translate to several or tens of machine instructions. Compiler technology has advanced tremendously over the past decades, applying numerous program optimizations, often yielding very size- and performance-efficient code. A cross compiler executes on one processor (our development processor) but generates code for a different processor (our target processor). Cross compilers are extremely common in embedded system development.

A linker allows a programmer to create a program in separately assembled or compiled files; it combines the machine instructions of each into a single program, perhaps incorporating instructions from standard library routines. A linker designed for embedded processors will also try to eliminate binary code associated with uncalled procedures and functions, as well as memory allocated to unused variables, in order to reduce the overall program size.
Figure 3.13: Software design process: (a) desktop, (b) embedded.

…environment where the embedded system is to function. Hence, debugging a program running in an embedded system requires having control over time, control over the environment, and the ability to trace or follow the execution of the program in order to detect errors. In the remaining paragraphs, we take a look at some tools and methods that help us do just that. These tools, for the most part, enable us to execute and observe the behavior of our programs.

Debuggers help programmers evaluate and correct their programs. They run on the development processor and support stepwise program execution, executing one instruction and then stopping, proceeding to the next instruction when instructed by the user. They permit execution up to user-specified breakpoints, which are instructions that, when encountered, cause the program to stop executing. Whenever the program stops, the user can examine the values of various memory and register locations. A source-level debugger enables step-by-step execution in the source program language, whether assembly language or a structured language. A good debugging capability is crucial, as today's programs can be quite complex and hard to write correctly.

Some debuggers are programs that run on our development processor but execute code designed for our target processor, mimicking in software the function of the target processor. These debuggers are also known as instruction-set simulators (ISSs) or virtual machines (VMs).

Emulators support debugging of the program while it executes on the target processor. An emulator typically consists of a debugger coupled with a board connected to the desktop processor via a cable. The board consists of the target processor plus some support circuitry…

Example: Instruction-Set Simulator for a Sample Processor

An instruction-set simulator is a program that runs on one processor and executes the instructions of another processor:

    #include <stdio.h>

    typedef struct {
        unsigned char first_byte, second_byte;
    } instruction;

    instruction program[1024];    // this is our instruction memory
    unsigned char memory[256];    // this is our data memory

    int run_program(int num_bytes) {
        int pc = -1;
        unsigned char reg[16], fb, sb;

        while( ++pc < (num_bytes / 2) ) {
            fb = program[pc].first_byte;
            sb = program[pc].second_byte;
            switch( fb >> 4 ) {
                case 0: reg[fb & 0x0f] = memory[sb]; break;
                case 1: memory[sb] = reg[fb & 0x0f]; break;
                case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
                case 3: reg[fb & 0x0f] = sb; break;
                case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
                case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
                case 6: if (reg[fb & 0x0f] == 0) pc += sb; break;
                default: return -1;
            }
        }
        return 0;
    }
Embedded System Design
…enable the system to interact with its environment more freely; this provides the highest execution accuracy but little debug control.

The availability of low-cost or high-quality development environments for a processor often heavily influences the choice of a processor.

3.6 Application-Specific Instruction-Set Processors (ASIPs)

Today's embedded applications, such as high-definition TV, require high computing power and very specific functionality. The performance, power, cost, or size demands of these applications cannot always be dealt with efficiently by using general-purpose processors. Nonetheless, the inflexibility of custom single-purpose processors is often too prohibitive. A solution is to use an instruction-set processor that is specific to that application or application domain. Because these ASIPs are instruction-set processors, they can be programmed by writing software, resulting in short time-to-market and good flexibility, while the performance and other constraints may be efficiently satisfied.

As with most other aspects of embedded systems design, there is a trade-off here. Instruction-set processors and the associated software tools (compilers, linkers, etc.) are very expensive to develop; therefore, they are expensive to integrate into low-cost embedded systems. In contrast, the large applicability and resulting cost amortization of general-purpose processors make them very cost-effective solutions in most embedded systems. ASIPs tend to come in three major varieties: microcontrollers, which are specific to applications that perform a large amount of control-oriented tasks; digital signal processors (DSPs), which are specific to applications that process large amounts of data; and everything else, the less-general ASIPs.

Microcontrollers

Numerous processor IC manufacturers market devices specifically for the control-dominated embedded systems domain. These devices may include several features. First, they may include several peripheral devices, such as timers, analog-to-digital converters, and serial communication devices, on the same IC as the processor. Second, they may include some program and data memory on the same IC. Third, they may provide the programmer with direct access to a number of pins of the IC. Fourth, they may provide specialized instructions for common embedded system control operations, such as bit-manipulation operations. A microcontroller is a device possessing some or all of these features.

Incorporating peripherals and memory onto the same IC reduces the number of required ICs, resulting in compact and low-power implementations. Providing pin access allows programs to easily monitor sensors, set actuators, and transfer data with other devices. Providing specialized instructions improves performance for embedded systems applications. Thus, microcontrollers can be considered ASIPs to some degree.

Many manufacturers market devices referred to as "embedded processors." The difference between embedded processors and microcontrollers is not clear, although we note that the former term seems to be used more for large (32-bit) processors.

Digital Signal Processors (DSPs)

Digital signal processors (DSPs) are processors that are highly optimized for processing large amounts of data. The source of this large amount of data is some form of digitized signal, like a photo image captured by a digital camera, a voice packet going through a network router, or an audio clip played by a digital keyboard. A DSP may contain numerous register files, memory blocks, multipliers, and other arithmetic units. In addition, DSPs often provide instructions that are central to digital signal processing, such as filtering and transforming vectors or matrices of data. In a DSP, frequently used arithmetic functions, such as multiply-and-accumulate, are implemented in hardware and thus execute orders of magnitude faster than a software implementation running on a general-purpose processor. In addition, DSPs may allow for execution of some functions in parallel, resulting in a boost in performance.

As with microcontrollers, DSPs also tend to incorporate, on a single IC, many peripherals that are useful in signal processing. As an example, a DSP device may contain a number of analog-to-digital and digital-to-analog converters, pulse-width modulators, direct-memory-access controllers, timers, and counters.

Many companies offer a variety of commonly used DSPs that are well supported in terms of compiler and other development tools, making them easy and cheap to integrate into most embedded systems.

Less-General ASIP Environments

In contrast to microcontrollers and DSPs, which can be used in a variety of embedded systems, IC manufacturers have designed ASIPs that are less general in nature. These ASIPs are designed to perform some very domain-specific processing while allowing some degree of programmability. For example, an ASIP designed for networking hardware may be designed to be programmable with different network routing, checksum, and packet-processing protocols.

3.7 Selecting a Microprocessor

The embedded system designer must select a microprocessor for use in an embedded system. The choice of a processor depends on technical and nontechnical aspects. From a technical perspective, one must choose a processor that can achieve the desired speed within certain power, size, and cost constraints. Nontechnical aspects may include prior expertise with a processor and its development environment, special licensing arrangements, and so on.

Speed is a particularly difficult processor aspect to measure and compare. We could compare processor clock speeds, but the number of instructions per clock cycle may differ greatly among processors. We could instead compare instructions per second, but the complexity of each instruction may also differ greatly among processors. For example, one processor may require 100 instructions while another processor may require 300 instructions to perform the same computation.
One attempt to provide a means for a fairer comparison is the Dhrystone benchmark. A benchmark is a program intended to be run on different processors to compare their performance. The Dhrystone benchmark was originally developed in 1984 by Reinhold Weicker specifically as a performance benchmark; it performs no useful work. It focuses on exercising a processor's integer arithmetic and string-handling capabilities. Its current version is written in C and is in the public domain. Because most processors can execute it in milliseconds, it is typically executed thousands of times, and thus a processor is said to be able to execute so many Dhrystones per second.

Another commonly used speed comparison unit, which happens to be based on the Dhrystone, is MIPS. One might think that MIPS simply means millions of instructions per second, but actually the common use of the term is based on a somewhat more complex notion. Specifically, its origin is based on the speed of Digital's VAX 11/780, thought to be the first computer able to execute one million instructions per second. A VAX 11/780 could execute 1,757 Dhrystones/second; thus, for a VAX 11/780, 1 MIPS = 1,757 Dhrystones/second. This unit for MIPS is the one commonly used today, and it is sometimes referred to as Dhrystone MIPS. So if a machine today is said to run at 750 MIPS, that actually means it can execute 750 * 1,757 = 1,317,750 Dhrystones/second.

The use and validity of benchmark data is a subject of great controversy. There is also a clear need for benchmarks that measure the performance of embedded processors. An effort underway in this area is EEMBC (pronounced "embassy"), the EDN Embedded Benchmark Consortium. The EEMBC has five benchmarking suites of programs corresponding to different embedded applications: automotive/industrial, consumer electronics, networking, office automation, and telecommunications. Each suite consists of several common algorithms found in the suite's application area. For example, two of the programs in the consumer electronics suite are JPEG compression and decompression (JPEG is a standard for still digital image compression). Another program in that suite involves infrared signal transmission and reception.

Numerous general-purpose processors have evolved in recent years and are in common use today. In Figure 3.15, we summarize some of the features of several popular processors.

    Processor           Clock     Peripherals            Bus    MIPS   Power   Tran-    Price
                        speed                            width                 sistors
    General-purpose processors
    Intel PIII          1 GHz     2x16K L1, 256K L2,     32     ~900   97 W    ~7M      $900
                                  MMX
    IBM PowerPC 750X    550 MHz   2x32K L1, 256K L2      32/64  ~1300  5 W     ~7M      $900
    MIPS R5000          250 MHz   2x32K, 2-way           32/64  NA     NA      3.6M     NA
                                  set assoc.
    StrongARM SA-110    233 MHz   None                   32     268    1 W     2.1M     NA
    Microcontrollers
    Intel 8051          12 MHz    4K ROM, 128 RAM,       8      ~1     ~0.2 W  ~10K     $7
                                  32 I/O, Timer, UART
    Motorola 68HC811    3 MHz     4K ROM, 192 RAM,       8      ~.5    ~0.1 W  ~10K     $5
                                  32 I/O, Timer, WDT,
                                  SPI
    Digital signal processors
    TI C5416            160 MHz   128K SRAM, 3 T1        16/32  ~600   NA      NA       $34
                                  ports, DMA, 13 ADC,
                                  9 DAC
    Lucent DSP32C       80 MHz    16K Inst., 2K Data,    32     40     NA      NA       $75
                                  serial ports, DMA

    Sources: Intel, Motorola, MIPS, ARM, TI, and IBM websites/datasheets; Embedded Systems Programming, Nov. 1998.

Figure 3.15: General-purpose processors.

3.8 General-Purpose Processor Design

A general-purpose processor is really just a single-purpose processor whose purpose is to process instructions stored in a program memory. Therefore, we can design a general-purpose processor using the single-purpose processor design technique described in Chapter 2. While real microprocessors intended for mass production are more commonly designed using custom methods rather than the general technique of this section, using the general technique here may prove a useful exercise that illustrates the basic unity between single-purpose and general-purpose processors.

Suppose we want to design a general-purpose processor having the basic architecture of Figure 3.1 and supporting the instruction set of Figure 3.7. We can begin by creating the FSMD shown in Figure 3.16, which describes the desired processor's behavior. The FSMD declares several variables for storage: a 16-bit program counter PC, a 16-bit instruction register IR, a 64K x 16 bit memory M, and a 16 x 16 bit register file RF. The FSMD's initial state, Reset, clears PC to 0. The Fetch state reads M[PC] into IR. The Decode state does nothing but adds the extra cycle necessary for IR to get updated so that we can then read it on an arc. Each arc leaving the Decode state detects a particular instruction opcode, causing a transition to the corresponding execute state for that opcode. Each execute state, like Mov1,
ri
,.
I
www.compsciz.blogspot.in
.J ii~~- ,,
·-· -:.~- . ...:·,..
- -- -~--=--:--
.. - =:ta~= - - - - -
Chapter 3: General-Purpose PrucesSOfS: Software _ _ _ _;....._ _ _ _ _ _ _ _ _ _ _ _ _ _ _....:,:3.8.:,::~Ge=ne:r:al:::-P_:u~rpo::se:_::Pr:::
· o:·ce~ss:::o~r~De~si~g~n
Declarations: Aliases:
op IR(l5.. 12] dir IR[7 ..0] Datapalh
bit PC[l6], IR(l6]; IR[7..0) Control wtit 0
m IR(ll..8) inun
bitM[64k)[l6), RF[l6)[16); IR(7 ..0]
rm IR[7..4] rel 2xl.mux
8
0
- ----
---M_o_v4____::.i::- I RF"~=rn; RF~l; RFs=lO;
a
Figure 3.17: Architecture of simple microprocessor.
Memory D i
0
Add RF[m) =RF[m}+RF[nn]l., .RFrla=rn; RFrle=I;
'-----'-4~to Fetch RFr2a=rm; RFI2e=l; ALUs=OO registers PC and IR, memory M, and register file RF. The second step is to instantiate
l
RF[m] = Rf[m)-RF[rm)! . RFwa=rn; RFwe=l; RFs=OO; .
functional units to carry out the FSMD operations. We'll use a single ALU capable of
Sub carrying out all the operations. The third step is to add the connections among the
· RFrla=rn; RFrle=l;
components' ports as required by· the FSMD operations, adding multiplexors when there is
'-----"-'+toFetch
. I!
RFI2a=nn; RFt.2e=i;ALUs=OI
more than one connection being input to a port. Finally, we create unique identifiers for every
Jz PC=(RF[m]=O) ?rel. :PC l PCld=ALUz; control signal.
' - - - - - " - - t o Fetch . ] Given this datapath, we can now rewrite the FSMD as an FSM representing the
data~th's controller. Each FSMD operation must be replaced by biruuy operations on control
Figure_3.16: A simple microprocessor: (a) FSMD, (b) FSM operations that replace the FSMD operations after we
a-eatethe dalapath of Figure 3.17. ·
signals, as shown in Figure 3.16(b). The states and arcs~ identical for the·FSMD and FSM,
. and only the operations change, so we do not redraw the states and arcs in the figure. As an
Add, and Jz, carries. out the .actual in$'uction operations by. moving datl between storage · example of operation replacement, we., replace the assignment PC = O in state-Reset by the
devices, modifying data, or updating PC. · • · . ·' control signal setting PCclr = l.
. we can now build a datapath that can carry out the operation of this FSMD, asEfesc"bed· We can use the FSM design technique of Chapter 2 to design a controller,·consisting of a
in Chapter 2. The datapath we create using the following.steps is sh~wn in Figure_3.17 . e state register and next-state/control logic. We omit this step here; ·
first step is to instantiate a storage device for ~b declared vanable, so we . te '
Having just designed a simple general-purpose processor using the same technique we used to design a single-purpose processor, we can see the similarity between the two processor types. The key difference is that a single-purpose processor puts the "program" inside of its control logic, whereas a general-purpose processor keeps it in an external memory. So the program of a single-purpose processor cannot be changed once the processor has been implemented. But nevertheless, both processor types process programs. A second difference is that we design the datapath in a general-purpose processor without knowledge of what program will be put in the memory, whereas we know this program in a single-purpose processor. So the datapath of a single-purpose processor can be optimized to the program. We see that single-purpose and general-purpose processors both implement programs. Though they may differ in terms of design metrics like flexibility, power, performance, and cost, they fundamentally do the same thing.

3.9 Summary

General-purpose processors are popular in embedded systems due to several features, including low unit cost, good performance, and low NRE cost. A general-purpose processor consists of a controller and datapath, with a memory to store program and data. To use a general-purpose processor, the embedded system designer must write a program. The designer may write some parts of this program, such as driver routines, using assembly language, while writing other parts in a structured language. Thus, the designer should be aware of several aspects of the processor being used, such as the instruction set, available memory, registers, I/O facilities, and interrupt facilities. Many tools exist to support the designer, including assemblers, compilers, debuggers, device programmers, and emulators. The designer often makes use of microcontrollers, which are processors specifically targeted to embedded systems. These processors may include on-chip peripheral devices and memory, additional I/O ports, and instructions supporting common embedded system operations. The designer has a variety of processors from which to choose.

3.10 References and Further Reading

• Philips Semiconductors, 80C51-Based 8-Bit Microcontrollers Databook, Philips Electronics North America, 1994. Provides an overview of the 8051 architecture and on-chip peripherals, describes a large number of derivatives each with various features, describes the I2C and CAN bus protocols, and highlights development support tools.
• Rafiquzzaman, Mohamed. Microprocessors and Microcomputer-Based System Design. Boca Raton: CRC Press, 1995. Provides an overview of general-purpose processor architecture, along with detailed descriptions of various Intel 80xx and Motorola 68000 series processors.
• Embedded Systems Programming, Miller Freeman Inc., San Francisco, 1999. A monthly publication covering trends in various aspects of general-purpose processors for embedded systems, including programming, compilers, operating systems, emulators, device programmers, microcontrollers, PLDs, and memories. An annual buyer's guide provides tables of vendors for these items, including 8/16/32/64-bit microcontrollers/microprocessors and their features.
• Microprocessor Report, MicroDesign Resources, California, 1999. A monthly report providing in-depth coverage of trends, announcements, and technical details, for desktop, mobile, and embedded microprocessors.
• www.eembc.org. The Web site for the EEMBC benchmark consortium.
• SIGPLAN Notices 23, 8 (Aug. 1988), 49-62. Provides source for the Dhrystone benchmark version 2. Online source can be found at ftp.nosc.mil:pub/abuno.

3.11 Exercises

3.1 Describe why a general-purpose processor could cost less than a single-purpose processor you design yourself.
3.2 Detail the stages of executing the MOV instructions of Figure 3.7, assuming an 8-bit processor and a 16-bit IR and program memory following the model of Figure 3.1. For example, the stages for the ADD instruction are (1) fetch M[PC] into IR, (2) read Rn and Rm from the register file through the ALU configured for ADD, storing results back in Rn.
3.3 Add one instruction to the instruction set of Figure 3.7 that would reduce the size of our summing assembly program by 1 instruction. Hint: add a new branch instruction. Show the reduced program.
3.4 Create a table listing the address spaces for the following address sizes: (a) 8-bit, (b) 16-bit, (c) 24-bit, (d) 32-bit, (e) 64-bit.
3.5 Illustrate how program and data memory fetches can be overlapped in a Harvard architecture.
3.6 Read the entire problem before beginning. (a) Write a C program that clears an array "short int M[256]." In other words, the program sets every location to 0. Hint: the program should only be a couple lines long. (b) Assuming M starts at location 256 (and thus ends at location 511), write the same program in assembly language using the earlier instruction set. (c) Measure the time it takes you to perform parts a and b, and report those times.
3.7 Acquire a databook for a microcontroller. List the features of the basic version of that microcontroller, including key characteristics of the instruction set (number of instructions of each type, length per instruction, etc.), memory architecture and available memory, general-purpose registers, special-function registers, I/O facilities, interrupt facilities, and other salient features.
3.8 For the microcontroller in the previous exercise, create a table listing five existing variations of that microcontroller, stressing the features that differ from the basic version.
Chapter 4: Standard Single-Purpose Processors: Peripherals

4.1 Introduction
4.2 Timers, Counters, and Watchdog Timers
4.3 UART
4.4 Pulse Width Modulators
4.5 LCD Controllers
4.6 Keypad Controllers
4.7 Stepper Motor Controllers
4.8 Analog-to-Digital Converters
4.9 Real-Time Clocks
4.10 Summary
4.11 References and Further Reading
4.12 Exercises

4.1 Introduction
A single-purpose processor is a digital system intended to solve a specific computation task, as opposed to a general-purpose processor, which is intended to solve a wide variety of computation tasks. The single-purpose processor may be a custom one that we design ourselves, as discussed in Chapter 2. However, some computation tasks are so common that standard single-purpose processors have evolved. These processors can be purchased "off the shelf." The manufacturer of such an off-the-shelf processor sells the device in large quantities.

An embedded system designer choosing to use a standard single-purpose processor to implement a specific computation task, as opposed to choosing to design a custom single-purpose processor, may achieve several benefits. First, NRE cost will be low, since the processor is predesigned. Second, unit cost may be low, since the standard processor is mass-produced and hence the manufacturer can amortize NRE costs.

with the CPU, often placing them on-chip, and even assigning peripheral registers to the CPU's own register space. The result is the common term on-chip peripherals, which some may consider somewhat of an oxymoron.
4.2 Timers, Counters, and Watchdog Timers

Timers and Counters

A timer is an extremely common peripheral device that can measure time intervals. Such a device can be used either to generate events at specific times, or to determine the duration between two external events. Example applications that require generating events include keeping a traffic light green for a specified duration, or communicating bits serially between devices at a specific rate. An example of an application that determines inter-event duration is that of computing a car's speed by measuring the time the car takes to pass over two separated sensors on a road.

A timer measures time by counting pulses that occur on an input clock signal having a known period. For example, if a particular clock's period is 1 microsecond, and we've counted 2,000 pulses on the clock signal, then we know that 2,000 microseconds have passed.

A counter is a more general version of a timer. Instead of counting clock pulses, a counter counts pulses on some other input signal. For example, a counter may be used to count the number of cars that pass over a road sensor, or the number of people that pass through a turnstile. We often combine counters and timers to measure rates, such as counting the number of times a car wheel rotates in one second, in order to determine a car's speed.

To use a timer, we must configure its inputs and monitor its outputs. Such use often requires, or can be greatly aided by, an understanding of the internal structure of the timer. The internal structure can vary greatly among manufacturers. We provide a few common features of such internal structures in Figure 4.1.

[Figure 4.1: Timer structures: (a) a basic timer, (b) a timer/counter, (c) a timer with a terminal count register, (d) a 16/32-bit timer, (e) a timer with a prescaler.]

Figure 4.1(a) provides the structure of a very simple timer. This timer has an internal 16-bit up counter, which increments its value on each clock pulse. Thus, the output value cnt represents the number of pulses since the counter was last reset to zero. To interpret this number as a time interval, we must know the frequency or period of the clock signal clk. For example, suppose we wish to measure the time that passes between two button presses. In this case, we could reset the timer on the occurrence of the first press, and then read the timer output on the second press. Suppose the frequency of clk were 100 MHz, meaning the period would be 1 / (100 MHz) = 10 nanoseconds, and that cnt = 20,000 at the time of the second button press. We would then compute the time that passed between the first and second button presses as 20,000 * 10 nanoseconds = 200 microseconds. We note that since this timer's counter can count from 0 to 65,535, this particular timer has a measurement range of 0 to 65,535 * 10 nanoseconds = 655.35 microseconds, with a resolution of 10 nanoseconds. We define a timer's range as the maximum time interval the timer can measure. A timer's resolution is the minimum interval it can measure.

The timer in Figure 4.1(a) has an additional output top that indicates when the top value of its range has been reached, also known as an overflow occurring, in which case the timer rolls over to 0. When we use a timer in conjunction with a general-purpose processor, and we expect time intervals to exceed the timer range, we typically connect the top signal to an interrupt pin on the processor. We create a corresponding interrupt service routine that counts the number of times the routine is called, thus effectively extending the range we can measure. Many microcontrollers that include built-in timers will have special interrupts just for their timers, with these interrupts distinct from external interrupts.
Figure 4.1(b) provides the structure of a more advanced timer that can also be configured as a counter. A mode register holds a bit, which the user sets, that uses a 2x1 multiplexor to select the clock input to the internal 16-bit up counter. The clock input can be the external clk signal, in which case the device acts like a timer. Alternatively, the clock input can be the external cnt_in signal, in which case the device acts like a counter, counting the occurrences of pulses on cnt_in. cnt_in would typically be connected to an external sensor, so pulses would occur at indeterminate intervals. In other words, we could not measure time by counting such pulses.

Figure 4.1(c) provides the structure of a timer that can inform us whenever a particular interval of time has passed. A terminal count register holds a value, which the user sets, indicating the number of clock cycles in the desired interval. This number can be computed using the simple formula:

number of clock cycles = desired time interval / clock period

For example, to obtain a duration of 3 microseconds from a clock cycle of 10 nanoseconds (100 MHz), we must count: 3 x 10^-6 s / 10 x 10^-9 s/cycle = 300 cycles. The timer structure includes a comparator that asserts its top output when the terminal count has been reached. This top output is not only used to reset the counter to 0, but also serves to inform the timer user that the desired time interval has passed. As mentioned earlier, we often connect this signal to an interrupt. The corresponding interrupt service routine would include the actions that must be taken at the specified time interval.

To improve efficiency, instead of counting up from 0 to terminal count, a timer could instead count down from terminal count to 0, meaning we would load terminal count rather than 0 into the 16-bit counter upon reset, and the counter would be a down counter rather than an up counter. The efficiency comes from the simplicity by which we can check if our count has reached 0: we simply input the count into a 16-bit NOR gate. A single 16-bit NOR gate is far more area- and power-efficient than a 16-bit comparator.

Figure 4.1(d) provides the structure of a timer that can be configured as a 16-bit or 32-bit timer. The timer simply uses the top output of its first 16-bit up counter as the clock input of its second 16-bit counter. These are known as cascaded counters.

Finally, Figure 4.1(e) shows a timer with a prescaler. A prescaler is essentially a configurable clock-divider circuit. Depending on the mode bits being input to the prescaler, the prescaler output signal might be the same as the input signal, or it may have half the frequency (double the period), one-fourth the frequency, one-eighth the frequency, etc. Thus, a prescaler can be used to extend a timer's range, by reducing the timer's resolution. For example, consider a timer with a resolution of 10 ns and a range of 65,535 * 10 nanoseconds = 655.35 microseconds. If the prescaler of such a timer is configured to divide the clock frequency by eight, then the timer will have a resolution of 80 ns and a range of 65,535 * 80 nanoseconds = 5.24 milliseconds.

Many timers will combine the above features, plus other configurable features. One such feature is a mode bit or additional input that enables or disables counting. Another feature is a mode bit that enables or disables interrupt generation when top count is reached.

Note that we could use a general-purpose processor to implement a timer. Knowing the number of cycles that each instruction requires, we could write a loop that executes the desired number of instructions; when this loop completes, we know that the desired time passed. This implementation of a timer on a dedicated general-purpose processor is obviously quite inefficient in terms of size. One could alternatively incorporate the timer functionality into a main program, but the timer functionality then occupies much of the program's run time, leaving little time for other computations. Thus, the benefit of assigning timer functionality to a special-purpose processor becomes evident.

Example: Reaction Timer

A reaction timer is an application that measures the time a person takes to respond to a visual or audio stimulus. In this example, the application turns on an LED, then measures the time a person takes to push a button in response, and displays this time on an LCD, as illustrated in Figure 4.2. We expect reaction times to be on the order of seconds, and we want to display reaction times to millisecond precision.

[Figure 4.2: Reaction timer: (a) LED, LCD, and button; (b) pseudo-code. The pseudo-code (main.c, with MS_INIT defined as 63535) configures the timer mode, sets cnt to MS_INIT, waits a random amount of time, turns on the indicator light, and starts the timer; then, while the user has not pressed the reaction button, each time top is set it stops the timer, sets cnt to MS_INIT, restarts the timer, resets top, and increments count_milliseconds; finally it turns off the indicator light and prints "time: %i ms".]

In this example, we'll use a microcontroller with a built-in 16-bit timer. The timer is incremented once every instruction cycle, where one instruction cycle for this microcontroller equals six clock cycles. The clock frequency is 12 MHz, meaning the period is 83.333 nanoseconds. Thus, this timer has a resolution of 1 instruction cycle = 6 clock cycles = 6 * 83.333 nanoseconds = 0.5 microsecond. Furthermore, since the timer has 16 bits, its range is 65,535 * 0.5 microsecond = 32.77 milliseconds. This timer does not have a prescaler or a
terminal count register, but it does however have a top signal to indicate overflow, and it also allows us to load in an initial value for its internal up counter.

We note that this timer's range is smaller than our desired range of several seconds, while its resolution is finer than our required one millisecond. Thus, we must somehow extend the range, but without the convenience of a prescaler or terminal count register. Instead, we'll set the initial timer value such that overflow will occur after 1 millisecond, and then monitor the top output signal of the timer to activate code that keeps a count of the number of overflows, meaning the number of milliseconds. The number of instruction cycles corresponding to 1 millisecond is 1 millisecond / (0.5 microsecond/instruction-cycle) = 2,000 instruction cycles. Thus, the appropriate initial timer value is 65,535 - 2,000 = 63,535. Pseudocode describing the reaction timer implementation is shown in Figure 4.2(b).

Note that we did not use an interrupt service routine here, since the system does not have any other functions. Also note that waiting a random amount of time could also make use of a timer.

Notice that the method described above has some inaccuracy. Our method requires that we stop the timer, reset the timer, and then start the timer again. When we stop the timer to reset it, a certain amount of time that we are not measuring passes. However, this time is small, so we treat it as negligible.
Watchdog Timers

A special type of timer is a watchdog timer. We configure a watchdog timer with a real-time value, just as with a regular timer. However, instead of the timer generating a signal for us every X time units, we must generate a signal for the timer every X time units. If we fail to generate this signal in time, then the timer "times out" and generates a signal indicating that we failed.

One common use of a watchdog timer is to enable an embedded system to restart itself in case of a failure. In such use, we modify the system's program to include statements that reset the watchdog timer. We place these statements such that the watchdog timer will be reset at least once during every timeout interval if the program is executing normally. We connect the fail signal from the watchdog timer to the microprocessor's reset pin. Now suppose the program has an unexpected failure, such as entering an undesired infinite loop, or waiting for an input event that never arrives. The watchdog timer will time out, and thus the microprocessor will reset itself, starting its program from the beginning. In systems where such a full reset during system operation is not practical, we might instead connect the fail signal to an interrupt pin, and create an interrupt service routine that jumps to some safe part of the program. We might even combine these two responses, first jumping to an interrupt service routine to test parts of the system and record what went wrong, and then resetting the system. The interrupt service routine may record information as to the number of failures and the causes of each, so that a service technician may later evaluate this information to determine if a particular part requires replacement. Note that an embedded system often must self-recover from failures whenever possible, as the user may not have the means to reboot the system in the same manner that he or she might reboot a desktop system.

Another common use is to support timeouts in a program while keeping the program structure simple. For example, we may desire that a user respond to questions on a display within some time period. Rather than sprinkling response-time checks throughout our program, we can use a watchdog timer to check for us, thus keeping our program neater. An example in this chapter illustrates such use of a watchdog timer.

Example: ATM Timeout Using a Watchdog Timer

In this example, a watchdog timer is used to implement a timeout for an automatic teller machine, or ATM. A normal ATM session involves a user inserting a bank card, typing in a personal identification number, and then answering questions about whether to deposit or withdraw money, which account will be involved, how much money will be involved, whether another transaction is desired, and so on. We want to design the ATM such that it will terminate the session if at any time the user does not press any button for 2 minutes. In this case, the ATM will eject the bank card and terminate the session.

[Figure 4.3: ATM timeout using a watchdog timer: (a) timer structure, (b) main pseudo-code, (c) watchdog reset routine. The main routine waits until a card is inserted, calls watchdog_reset_routine, and then, while the transaction is in progress, performs the corresponding action and calls watchdog_reset_routine whenever a button is pressed. The watchdog reset routine sets checkreg to 1 so that a value can be loaded, then loads 0 into scalereg and 11070 into timereg. If watchdog_reset_routine is not called in time, interrupt_service_routine is called, which ejects the card and resets the screen.]

We will use a watchdog timer with the internal structure shown in Figure 4.3(a). An oscillator signal osc is connected to a prescaler that divides the frequency by 12 to generate a signal clk. The signal clk is connected to an 11-bit up-counter scalereg. When scalereg overflows, it rolls over to 0, and its overflow output causes the 16-bit up-counter timereg to increment. If timereg overflows, it triggers the system reset or an interrupt. To reset the watchdog timer, checkreg must be enabled. Then a value can be loaded into timereg. When a value is loaded into timereg, the checkreg register is automatically reset. If the checkreg register is not enabled, a value cannot be loaded into timereg. This is to prevent erroneous software from unintentionally resetting the watchdog timer.

Now let's determine what value to load into timereg to achieve a timeout of 2 minutes. The osc signal frequency is 12 MHz. timereg is incremented every t seconds, where:

t = 12 * 2^11 * 1/(osc frequency) = 12 * 2^11 * 1/(12 * 10^6) = 12 * 2,048 * (8.33 * 10^-8) = 0.002 second
4.3 UART

[Figure 4.4: UART transmission: (a) a data word framed by a start bit and an end bit, (b) an example transmission between a sending UART and a receiving UART.]

later chapter. For our purpose in this section, we will look at the basics of serial communication using UARTs.

are sent and received. This is called the baud rate. The protocol also specifies the number of bits of data and the type of parity sent during each transmission. Finally, the protocol specifies the minimum number of bits used to separate two consecutive data transmissions. Stop bits are important in serial communication as they are used to give the receiving UART a chance to prepare itself prior to the reception of the next data transmission.

The baud rate determines the speed at which data is exchanged between two serially connected UARTs. Common baud rates include 2,400, 4,800, 9,600, and 19.2K. There is a great deal of misuse of the term baud rate, which is often assumed to be just the same as the term bit rate. In fact, bit rate is a true measure of the number of bits that are sent over a connection in one second, while baud rate is the measure of the number of signal changes that are transmitted over a connection in one second. Some clever techniques can be used to achieve a bit rate higher than the baud rate.

Internally, a simple UART may possess some configuration registers, and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate. Likewise, the receiver receives data into a shift register, and then this data can be read in parallel. This is illustrated in Figure 4.4(a). Note that in order to shift at the appropriate rate based on the configuration register, a UART requires a timer.

The receiver constantly monitors the receive pin (rx) for a start bit. The start bit is typically signaled by a high-to-low transition on the rx pin. After the start bit has been detected, the receiver starts sampling the rx pin at predetermined intervals, shifting each sampled bit into the receive shift register. If configured to do so, the receiver also reads an additional bit called parity, which it uses to determine if the received data is correct.

To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the received
register. Unfortunately, configuring the baud rate is usually not as simple as writing the desired rate (e.g., 4,800) to a register. For example, to configure the UART of an 8051 microcontroller, we must use the following equation:

baud rate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))

where smod is a configuration bit, oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.

Note that we could use a general-purpose processor to implement a UART completely in software. If we used a dedicated general-purpose processor, the implementation would be inefficient in terms of size. Alternatively, we could integrate the transmit and receive functionality with our main program. This would require creating a routine to send data serially over an I/O port, making use of a timer to control the rate. It would also require using an interrupt service routine to capture serial data coming from another I/O port whenever such data begins arriving. However, as with the timer functionality, adding send and receive functionality detracts from the time available for other computations.
4.4 Pulse Width Modulators

Overview

A pulse width modulator (PWM) generates an output signal that repeatedly switches between high and low values. We control the duration of the high value and of the low value by indicating the desired period, and the desired duty cycle, which is the percentage of time the signal is high compared to the signal's period. A square wave has a duty cycle of 50%. The pulse's width corresponds to the pulse's time high, as shown in Figure 4.5.

Figure 4.5: Operation of a PWM: (a) 25% duty cycle - average pwm_o is 1.25 V; (b) 50% duty cycle - average pwm_o is 2.5 V; (c) 75% duty cycle - average pwm_o is 3.75 V. In the diagrams, logic high is 5 V, low is 0 V.

Again, PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with another program's functionality, but the single-purpose processor approach has the benefits of efficiency and simplicity.

A common use of a PWM is to generate a clock-like signal to another device. For example, a PWM can be used to blink a light at a specific rate.

Another common use of a PWM is to control the average current or voltage input to a device. For example, a DC (direct current) electric motor rotates when its input voltage is set high, with the rotation speed proportional to the input voltage level. Suppose the revolutions per minute (rpm) equals 100 times the input voltage. To achieve a desired rpm of 125, we would need to set the input voltage to 1.25 V, whereas achieving 250 rpm would require an input voltage of 2.50 V.

One approach to control the average input voltage to a DC motor uses a DC-to-DC converter circuit, which converts some reference voltage to a desired voltage. However, these circuits can be expensive. Another approach uses a digital-to-analog converter. A third approach, perhaps the simplest, uses a PWM. The PWM approach makes use of the fact that a DC motor does not come to an immediate stop when its input voltage is reduced to 0, but rather it coasts, much like a bicycle coasts when we stop pedaling. Thus, we can apply the average input voltage needed to obtain the desired speed. Using a PWM, we set the duty cycle to achieve the appropriate average voltage, and we set the period small enough for smooth operation of the motor (i.e., so the motor does not noticeably speed up and slow down). Assuming the PWM's output is 5 V when high and 0 V when low, then we can obtain an average output of 1.25 V by setting the duty cycle to 25%, since 5 V * 25% = 1.25 V. This duty cycle is shown in Figure 4.5(a). Likewise, we can obtain an average output of 2.50 V by setting the duty cycle to 50%, as shown in Figure 4.5(b). A duty cycle of 75% would result in an average output of 3.75 V, as shown in Figure 4.5(c). This duty cycle adjustment principle applies to the control of a wide variety of electric devices, such as dimmer lights.

Another use of a PWM is to encode control commands in a single signal for use by another device. For example, we may control a radio-controlled car by sending pulses of different widths. Perhaps a width of 1 ms corresponds to a turn left command, a 4-ms width to turn right, and an 8-ms width to forward. The receiver can use a timer to measure the pulse width, by starting a timer when the pulse starts and stopping the timer when the pulse ends, thus determining how much time elapsed.
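The duty-cycle-to-average-voltage relationship described above can be captured in a few lines. This is an illustrative sketch of our own (the 5 V / 0 V levels match the text's assumption; the function name is not from the book):

```c
#include <assert.h>

/* Average output voltage of a PWM that swings between v_high and 0 V:
 * simply v_high scaled by the duty cycle (a fraction from 0.0 to 1.0). */
double pwm_average_voltage(double v_high, double duty)
{
    return v_high * duty;
}
```

With a 5 V swing, duty cycles of 25%, 50%, and 75% yield 1.25 V, 2.50 V, and 3.75 V, matching Figure 4.5.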
Chapter 4: Standard Single-Purpose Processors: Peripherals

Example: Controlling a DC Motor Using a PWM

In this example, we wish to control the speed of a direct-current (DC) electric motor using a PWM. The speed of the DC motor is proportional to the voltage applied to the motor. Suppose that for a fixed load the motor yields the revolutions per minute (rpm) shown in Figure 4.6(a) for the given input voltage. We must set the duty cycle of a PWM such that the average output voltage equals the desired input voltage.

Suppose that we use a PWM as part of a system that includes two 8-bit registers called clk_div and cycle_high, an 8-bit counter, and an 8-bit comparator, as shown in Figure 4.6(b). The PWM works as follows. Initially, the value of clk_div is loaded into the register. The clk_div register works as a clock divider. After a specified amount of time has elapsed, a pulse is sent to the counter register. This causes the counter to increment itself. The comparator then looks at the values in the counter register and the cycle_high register. When the counter value is less than cycle_high, a 1 (5 V) is outputted. When the value in counter is higher than the value in the cycle_high register, a 0 (0 V) is outputted. When the counter value reaches 254, counter is reset to 0 and the process repeats. Thus, we see that clk_div determines the PWM's period, specifying the number of cycles in the period. The register cycle_high determines the duty cycle, i.e., how many of a period's cycles should output a 1. If cycle_high is set to 254 (FEh), the output signal is always high, resulting in a duty cycle of 100%. Conversely, if cycle_high is set to 0 (00h), the output signal is always low, resulting in a duty cycle of 0%.

To determine the value of clk_div, we can try various values to see which is neither too fast nor too slow for our particular motor: if the value of clk_div is too low, the value outputted by the PWM changes too quickly, whereas if it is too high, the motor noticeably speeds up and slows down within a period. Setting the value of clk_div to FFh in this case works. Once this value is set, the only register that needs to be considered is cycle_high.

For the motor to run at 4,600 RPM, we need a duty cycle of 50%. To compute the value needed in cycle_high for a 50% duty cycle, we multiply 254 by 0.50, yielding 127. Thus, putting 7Fh (127 in hexadecimal) into the cycle_high register should cause the motor to run at about 4,600 RPM. For the motor to run at 6,900 RPM, we need a 75% duty cycle. We compute 254 * 0.75, yielding 191. Thus, putting BFh (191 in hexadecimal) into cycle_high should cause the DC motor to run at about 6,900 RPM.

We cannot just connect the DC motor to the PWM, because the PWM does not provide enough current to drive the DC motor. To remedy this problem, we use an NPN transistor to drive the motor. The code and schematic used for this example are found in Figure 4.6(c) and (d). In the figure, the name of the clk_div register is PWMP and cycle_high is PWM1.

Figure 4.6: Controlling a DC motor with a PWM: (a) relationship between input voltage, % of maximum voltage applied, and RPM of the DC motor (table below); (b) internal structure, with an 8-bit counter (0 to 254) feeding an 8-bit comparator: counter < cycle_high gives pwm_o = 1, counter >= cycle_high gives pwm_o = 0; (c) code; (d) schematic.

input voltage (V)   % of maximum voltage applied   RPM of DC motor
0                   0                              0
2.5                 50                             4,600
3.75                75                             6,900
5.0                 100                            9,200

4.5 LCD Controllers

Overview

A liquid crystal display (LCD) is a low-cost, low-power device capable of displaying text and images. LCDs are extremely common in embedded systems, since such systems often do not have video monitors like those that come standard with desktop systems. LCDs can be found in numerous common devices like watches, fax and copy machines, and calculators.

The basic principle of one type of LCD, a reflective LCD, works as follows. First, incoming light passes through a polarizing plate. Next, that polarized light encounters liquid crystal material. If we excite a region of this material, we cause the material's molecules to align, which in turn causes the polarized light to pass through the material. Otherwise, the light does not pass through. Finally, light that has passed through hits a mirror and reflects back, so the excited region appears to light up. Another type of LCD, an absorption LCD, works similarly, but uses a black surface instead of a mirror. The surface below the excited region absorbs light, thus appearing darker than the other regions.
A dot-matrix LCD consists of a matrix of dots that can display alphanumeric characters (letters and digits) as well as other symbols. A common dot-matrix LCD has five columns and eight rows of dots for one character. An LCD driver converts input data into the appropriate electrical signals necessary to excite the appropriate LCD dots.

Each type of LCD may be able to display multiple characters. In addition, each character may be displayed in normal or inverted fashion. The LCD may permit a character to be blinking (cycling through normal and inverted display) or may permit display of a cursor (such as a blinking underscore) indicating the "current" character. Such functionality would be difficult for us to implement using software. Thus, we use an LCD controller to provide us with a simple interface to an LCD, perhaps eight data inputs and one enable input. To send a byte to the LCD, we provide a value to the eight inputs and pulse the enable. This byte may be a control word, which instructs the LCD controller to initialize the LCD, clear the display, select the position of the cursor, brighten the display, and so on. Alternatively, this byte may be a data word, such as an ASCII character, instructing the LCD to display the character at the currently-selected display position.

Example: LCD Initialization

In this example, a microprocessor is connected to an LCD controller, which in turn is connected to an LCD, as illustrated in Figure 4.7. The LCD controller receives control words from the microcontroller; it decodes the control words and performs the corresponding actions on the LCD.

Once the initialization sequence is done, we can send control words or send actual data to be displayed. RS is set to low to indicate that the data sent is a control word. When RS is high, this indicates that the data sent over the communication bus corresponds to a character that is to be displayed. Any time data is sent, whether it is a control word or data, the enable bit E must be toggled. Figure 4.7(c) lists some of the corresponding control words that can be sent.

Using the initialization codes of Figure 4.7(d), the LCD has been set with an 8-bit interface. In addition, the display has been cleared, the cursor is set to the home position, and the cursor moves to the right as data is displayed (as opposed to the actual data shifting when we write to the LCD). The LCD is now ready to be written to. Using the table of Figure 4.7(c), we see that in order to write data, we set RS = 1. The actual data we want to write is present on DB7-DB0. The WriteChar function, shown in Figure 4.7(d), accepts a character which will be sent to the LCD controller to display on the LCD. The EnableLCD function toggles the enable bit and acts as a delay so that the command can be processed and executed.

Figure 4.7: Example of LCD initialization: (a) components, with the microcontroller driving the LCD controller over a communication bus carrying E, R/W, RS, and DB7-DB0; (b) initialization sequence; (c) control codes, including: clear all display and return cursor home; return cursor home; set cursor move direction (I/D) and whether or not to shift the display (S); display ON/OFF (D), cursor ON/OFF (C), and blink cursor position (B); move cursor and shift display (S/C, R/L); set interface data length (DL), number of display lines (N), and character font (F); and write DATA (RS = 1); (d) microcontroller pseudocode.

Codes:
I/D = 1 cursor moves left      DL = 1 8-bit
I/D = 0 cursor moves right     DL = 0 4-bit
S   = 1 with display shift     N  = 1 2 rows
S/C = 1 display shift          N  = 0 1 row
S/C = 0 cursor movement        F  = 1 5x10 dots
R/L = 1 shift to right         F  = 0 5x7 dots
R/L = 0 shift to left
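The WriteChar and EnableLCD routines just described can be sketched as follows. This is a host-side model, not firmware: the RS, E, and DATA_BUS variables are stand-ins for the controller's actual memory-mapped pins, and the delay count is arbitrary in this sketch.

```c
#include <assert.h>

/* Host-side model of the interface described above: eight data lines
 * (DB7-DB0), a register-select line RS (0 = control word, 1 = data),
 * and an enable line E that must be pulsed for each byte sent.  The
 * three variables stand in for memory-mapped I/O ports. */
static unsigned char DATA_BUS, RS, E;

static void delay(unsigned n) { while (n--) { } } /* crude busy-wait */

/* Pulse the enable line so the controller latches the current byte. */
static void EnableLCD(unsigned d) { E = 1; delay(d); E = 0; }

/* Send one displayable character; RS = 1 selects data, not control. */
void WriteChar(char c)
{
    RS = 1;
    DATA_BUS = (unsigned char)c;
    EnableLCD(45); /* delay count chosen arbitrarily for this sketch */
}
```

On real hardware, the three variables would instead be writes to the controller's pins, and the delay would be tuned to the controller's command-processing time.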
One of the simplest LCDs is a seven-segment LCD. Each of the seven segments can be activated, enabling the display of any digit character or one of several letters and symbols. Such an LCD may have seven inputs, each corresponding to a segment, or it may have only four inputs to represent the numbers 0 through 9. An LCD driver converts these inputs to the electrical signals necessary to excite the appropriate LCD segments.

4.6 Keypad Controllers

A keypad consists of a set of buttons that may be pressed to provide input to an embedded system. Again, keypads are extremely common in embedded systems, since such systems may lack the keyboard that comes standard with desktop systems.

A simple keypad has buttons arranged in an N-column by M-row grid, as illustrated in Figure 4.8. The device has N outputs, each output corresponding to a column, and another M
outputs, each output corresponding to a row. When we press a button, one column output and one row output go high, uniquely identifying the pressed button. To read such a keypad from software, we must scan the column and row outputs. This scanning may be performed by a keypad controller. Actually, such a device decodes rather than controls, but we'll call it a "controller" for consistency with the other peripherals discussed. A simple form of such a controller, as shown in Figure 4.8, scans the column and row outputs of the keypad. When the controller detects a button press, it stores a code corresponding to that button into a register, key_code, and sets an output high, k_pressed, indicating that a button has been pressed. Our software may poll this output every 100 milliseconds or so, and read the register when the output is high. Alternatively, this output can generate an interrupt on our general-purpose processor, eliminating the need for polling.

4.7 Stepper Motor Controllers

Overview

A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step" signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to a stop when power is removed. We specify a stepper motor either by the number of degrees in a single step of the motor, such as 1.8 degrees, or by the number of steps required to move 360 degrees, such as 200 steps. Stepper motors are common in embedded systems with moving parts, such as disk drives, printers, photocopy and fax machines, robots, camcorders, and VCRs.

Internally, a stepper motor typically has four coils. To rotate the motor one step, we pass current through one or two of the coils; which particular coil or coils depends on the present orientation of the motor. Thus, rotating the motor 360 degrees requires applying current to the coils in a specified sequence. Applying the sequence in reverse causes reversed rotation.

In some cases, the stepper motor comes with four inputs corresponding to the four coils, and with documentation that includes a table indicating the proper input sequence. To control the motor from software, we must maintain this table in software, and write a step routine that applies high values to the inputs based on the table values that follow the previously applied values.

In other cases, the stepper motor comes with a built-in controller, which is an instance of a special-purpose processor, implementing this sequence. Thus, we merely create a pulse on an input signal of the motor, causing the controller to generate the appropriate high signals to the coils that will cause the motor to rotate one step.

Example: Stepper Motor Driver

Controlling a stepper motor requires applying a series of voltages to the four (typically) coils of the stepper motor. The coils are energized one or two at a time, causing the motor to rotate one step. In this example, we are using a 9-volt, 2-phase bipolar stepper motor. Figure 4.9 shows a table indicating the input sequence required to rotate the motor. The entire sequence
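A table-driven step routine of the kind described above might look like the following sketch. The four coil patterns here are placeholders, not the actual Figure 4.9 sequence (which is not reproduced in this excerpt); forward steps walk the table in order, and reverse steps walk it backward.

```c
#include <assert.h>

/* Illustrative table-driven step routine: each entry encodes which
 * coils to energize for one motor position.  The byte values below
 * are example patterns, not the book's Figure 4.9 sequence. */
static const unsigned char step_table[4] = { 0x0A, 0x06, 0x05, 0x09 };

static int step_index = 0;          /* current position in the table */
static unsigned char coil_port;     /* stand-in for the output port  */

/* Advance one step: forward walks the table in order, reverse walks
 * it backward (adding 3 modulo 4 is the same as subtracting 1). */
void step(int forward)
{
    step_index = (step_index + (forward ? 1 : 3)) % 4;
    coil_port = step_table[step_index];
}
```

Stepping forward then backward returns the coils to the original pattern, which is what lets the same table drive rotation in both directions.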
We use the successive approximation approach to find the correct encoding. We already know that the encoding should be:

5/15 = d/(2^8 - 1)
d = 85

Applying the successive approximation method, we start by finding the halfway point between the maximum and minimum voltages, where Vmax = 15 V and Vmin = 0 V:

1/2 (Vmax + Vmin) = 7.5 V

Since the above voltage is higher than the input voltage, we insert a zero into the highest bit, as shown in Figure 4.14(a). We also know that the highest possible value is now 7.5 V, so Vmax is set to 7.5 V. Next, we plug into the formula again and compute the next approximation:

1/2 (7.5 + 0) = 3.75 V

Since the above voltage is lower than the input voltage, we insert a one into the next most significant bit, as shown in Figure 4.14(b). We know the lowest possible value is 3.75 V, so Vmin is set to 3.75 V. Next, we plug into the formula and compute the next approximation:

1/2 (7.5 + 3.75) = 5.63 V

Since the above value is higher than the input voltage, we insert a zero into the next bit, as shown in Figure 4.14(c). Note that Vmax is set to 5.63 V. Now we plug into the formula and compute the next approximation:

1/2 (5.63 + 3.75) = 4.69 V

Since the above value is lower than the input voltage, we insert a one into the next most significant bit, as shown in Figure 4.14(d). Note that Vmin is set to 4.69 V. Now we plug into the formula and compute the next approximation:

1/2 (5.63 + 4.69) = 5.16 V

Since the above value is higher than the input voltage, we insert a zero into the next most significant bit, as shown in Figure 4.14(e). Note that Vmax is set to 5.16 V. Now we plug into the formula and compute the next approximation:

1/2 (5.16 + 4.69) = 4.93 V

Since the above value is lower than the input voltage, we insert a one into the next bit, as shown in Figure 4.14(f). Note that Vmin is set to 4.93 V. Now we plug into the formula and compute the next approximation:

1/2 (5.16 + 4.93) = 5.05 V

Since the above voltage is higher than the input voltage, we insert a zero into the next bit, as shown in Figure 4.14(g). Note that Vmax is set to 5.05 V. Now we plug into the formula and compute the next approximation:

1/2 (5.05 + 4.93) = 4.99 V

Since this value is lower than the input voltage, we insert a one into the last bit, as shown in Figure 4.14(h). The encoding is now done. Note that the division by 2 can be done efficiently in binary arithmetic by simply shifting the number to the right. The resulting value, shown in Figure 4.14(h), is 01010101 = 85, as expected.

Figure 4.14: Successive approximation: given an analog input signal whose voltage should range from 0 to 15 V, and 8 bits for digital encoding, we calculate the correct encoding of 5 V; rows (a) through (h) show the encoding after each step, ending with 01010101.
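The halving procedure traced above is easy to express in code. The following sketch is our own helper (names are hypothetical); it performs the same most-significant-bit-first binary search over the voltage range:

```c
#include <assert.h>

/* Successive approximation, as traced in the text: for an n-bit
 * encoding over [0, vmax], test each bit from the most significant
 * down.  If the midpoint of the current range exceeds the input,
 * the bit stays 0 and the upper bound shrinks; otherwise the bit
 * becomes 1 and the lower bound grows. */
unsigned char successive_approx(double v_in, double vmax, int bits)
{
    double lo = 0.0, hi = vmax;
    unsigned char code = 0;
    for (int i = bits - 1; i >= 0; --i) {
        double mid = (lo + hi) / 2.0;
        if (v_in < mid) {
            hi = mid;                          /* guess too high: bit 0 */
        } else {
            code |= (unsigned char)(1u << i);  /* guess low enough: bit 1 */
            lo = mid;
        }
    }
    return code;
}
```

successive_approx(5.0, 15.0, 8) reproduces the example's result of 85 (01010101).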
4.9 Real-Time Clocks
Much like a digital wristwatch, a real-time clock (RTC) keeps the time and date in an embedded system. Real-time clocks are typically composed of a crystal-controlled oscillator, numerous cascaded counters, and a battery backup. The crystal-controlled oscillator generates a very consistent high-frequency digital pulse that feeds the cascaded counters. The first counter, typically, counts these pulses up to the oscillator frequency, which corresponds to exactly one second. At this point, it generates a pulse that feeds the next counter. This counter counts up to 59, at which point it generates a pulse feeding the minute counter. The hour, date, month, and year counters work in a similar fashion. In addition, real-time clocks adjust for leap years. The rechargeable back-up battery is used to keep the real-time clock running while the system is powered off.

From the microcontroller's point of view, the content of these counters can be set to a desired value, which corresponds to setting the clock, and retrieved. Communication between the microcontroller and a real-time clock is typically accomplished through a serial bus, such as I2C. It should be noted that, given a timer peripheral, it is possible to implement a real-time clock in software running on a processor. In fact, many systems use this approach to maintain the time. However, the drawback of such systems is that when the processor is shut down or reset, the time is lost.

4.10 Summary

Numerous single-purpose processors are manufactured to fulfill a specific function in a variety of embedded systems. These standard single-purpose processors may be fast and small, and they have low unit and NRE costs. A timer informs us when a particular interval of time has passed, while a watchdog timer requires us to signal it within a particular interval to indicate that a program is running without error. A counter informs us when a particular number of pulses have occurred on a signal. A UART converts parallel data to serial data, and vice versa. A PWM generates pulses on an output signal, with specific high and low times. An LCD controller simplifies the writing of characters to an LCD. A keypad controller simplifies capture and decoding of a button press. A stepper-motor controller assists us to rotate a stepper motor a fixed amount forward or backward. ADCs and DACs convert analog signals to digital, and vice versa. A real-time clock keeps track of date and time. Most of these single-purpose processors could be implemented as software on a general-purpose processor, but such implementation can be burdensome. These standard single-purpose processors thus simplify embedded system design tremendously. Many microcontrollers integrate these processors on-chip.

4.11 References and Further Reading

• Embedded Systems Programming. Includes information on a variety of single-purpose processors, such as programs for implementing or using timers and UARTs on microcontrollers.
• Spasov, Peter. Microcontroller Technology: The 68HC11, 2nd edition. Englewood Cliffs, NJ: Prentice Hall, 1996. Contains descriptions of principles and details for common 68HC11 peripherals.

4.12 Exercises

4.1 Given a timer structured as in Figure 4.1(c) and a clock frequency of 10 MHz: (a) Determine its range and resolution. (b) Calculate the terminal count value needed to measure 3 ms intervals. (c) If a prescaler is added, what is the minimum division needed to measure an interval of 100 ms? (Divisions should be in powers of 2.) Determine this design's range and resolution. (d) If instead of a prescaler a second 16-bit up-counter is cascaded as in Figure 4.1(d), what is the range and resolution of this design?
4.2 A watchdog timer that uses two cascaded 16-bit up-counters as in Figure 4.1(d) is connected to an 11.981 MHz oscillator. A timeout should occur if the function watchdog_reset is not called within 5 minutes. What value should be loaded into the up-counter pair when the function is called?
4.3 Given a controller with two built-in timers designed as in Figure 4.1(b), write C code for a function "double RPM" that returns the revolutions per minute of some device, or -1 if a timer overflows. Assume all inputs to the timers have been initialized and the timers have been started before entering RPM. Timer1's cnt_in is connected to the device and is pulsed once for each revolution. Timer2's clk input is connected to a 10 MHz oscillator. The timers have the outputs cnt1, cnt2, top1, and top2, which were initialized to 0 when their respective timer began. What is the minimum (other than 0) and maximum revolutions per minute that can be measured if top is not used?
4.4 Given a 100 MHz crystal-controlled oscillator and a 32-bit and any number of 16-bit terminal-count timers, design a real-time clock that outputs the date and time down to milliseconds. You can ignore leap years. Draw a diagram and indicate terminal-count values for all timers.
4.5 Determine the values for smod and TH1 to generate a baud rate of 9,600 for the 8051 baud rate equation in the chapter, assuming an 11.981 MHz oscillator. Remember that smod is 2 bits and TH1 is 8 bits. There is more than one correct answer.
4.6 A particular motor operates at 10 revolutions per second when its controlling input voltage is 1.7 V. Assume that you are using a microcontroller with a PWM whose output port can be set high (5 V) or low (0 V). (a) Compute the duty cycle necessary to obtain 10 revolutions per second. (b) Provide values for a pulse width and period that achieve this duty cycle. You do not need to consider whether the frequency is too high or too low, although the values should be reasonable. There is no one correct answer.
4.7 Using the PWM described in Figure 4.6, compute the value assigned to PWM1 to achieve an RPM of 8,050, assuming the input voltage needed is 4.375 V.
4.8 Write a function in pseudocode that initializes the LCD described in Figure 4.7. After initialization, the display should be clear with a blinking cursor. The initialization should set the data to shift to the left, have a data length of 8 bits and a font of 5x10 dots, and be displayed on one line.
4.9 Given a 120-step stepper motor with its own controller, write a C function Rotate(int degrees), which, given the desired rotation amount in degrees (between 0 and 360), pulses a microcontroller's output port the correct number of times to achieve the desired rotation.
4.10 Modify only the main function in Figure 4.12 to cause a 240-step stepper motor to rotate forward 60 degrees followed by a backward rotation of 33 degrees. This stepper motor uses the same input sequence as the example for each step. In other words, do not change the lookup table.
4.11 Extend the ratio and resolution equations of analog-to-digital conversion to any voltage range between Vmin and Vmax rather than 0 to Vmax.
4.12 Given an analog output signal whose voltage should range from 0 to 10 V, and an 8-bit digital encoding, provide the encodings for the following desired voltages: (a) 0 V, (b) 1 V, (c) 5.33 V, (d) 10 V. (e) What is the resolution of our conversion?
4.13 Given an analog input signal whose voltage ranges from 0 to 5 V, and an 8-bit digital encoding, calculate the correct encoding for 3.5 V, and then trace the successive-approximation approach (i.e., list all the guessed encodings in the correct order) to find the correct encoding.
4.14 Given an analog input signal whose voltage ranges from -5 to 5 V, and an 8-bit digital encoding, calculate the correct encoding for 1.2 V, and then trace the successive-approximation approach to find the correct encoding.
4.15 Compute the memory needed in bytes to store a 4-bit digital encoding of a 3-second analog audio signal sampled every 10 milliseconds.

CHAPTER 5: Memory

5.1 Introduction
5.2 Memory Write Ability and Storage Permanence
5.3 Common Memory Types
5.4 Composing Memory
5.5 Memory Hierarchy and Cache
5.6 Advanced RAM
5.7 Summary
5.8 References and Further Reading
5.9 Exercises
5.1 Introduction
Any embedded system's functionality consists of three aspects: processing, storage, and communication. Processing is the transformation of data, storage is the retention of data for later use, and communication is the transfer of data. Each of these aspects must be

eight input/output data signals. To read a memory means to retrieve the word of a particular address, while to write a memory means to store a word in a particular address. A memory access refers to either a read or write. A memory that can be both read and written has an
Figure 5.1: Memory: (a) words and bits per word — an m x n memory has m words of n bits per word, with data lines Qn-1 through Q0; (b) block diagram of a 2^k x n read-and-write memory, with r/w and enable control inputs and address inputs Ak-1 through A0.
additional control input, labeled r/w in Figure 5.1(b), to indicate which access to perform. Most memory types have an enable control input, which, when deasserted, causes the memory to ignore the address, such that data is neither written to nor read from the memory. Some types of memory, known as multiport memory, support multiple accesses to different locations simultaneously. Such a memory has multiple sets of control lines, address lines, and data lines, where one set of address and corresponding data and control lines is known as a port.

Memory has evolved very rapidly over the past few decades. The main advancement has been the trend of memory-chip bit capacity doubling every 18 months, following Moore's Law. The importance of this trend in enabling today's sophisticated embedded systems should not be underestimated. No matter how fast and complex processors become, those processors still need memories to store programs and to store data to operate on. For example, a digital camera is possible not only because of fast A2D and compression processors but also because of memories capable of storing sufficient quantities of bits to represent quality pictures.

Figure 5.2: Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Along the write ability axis, mask-programmed ROM is written during fabrication only; OTP ROM by an external programmer, one time only; EPROM, EEPROM, and FLASH by an external programmer or in-system, for thousands of cycles (with FLASH offering block-oriented writes); and NVRAM and SRAM/DRAM in-system, with fast writes and unlimited cycles. Storage permanence ranges from near zero up to the life of the product, with ideal memory having both high write ability and high permanence.
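The r/w and enable behavior described above can be modeled in a few lines of C. This is a toy functional model of our own (names and sizes are ours), not a hardware description:

```c
#include <assert.h>

/* Toy model of the read/write memory interface of Figure 5.1(b):
 * k address lines select one of 2^k words, r/w chooses the access,
 * and a deasserted enable makes the memory ignore the access. */
#define K 4                  /* address bits (example value) */
#define WORDS (1u << K)      /* 2^k words */

static unsigned char mem[WORDS];

/* Returns the word read on a read; returns 0 for a write or when
 * enable is low (in which case the address is ignored entirely). */
unsigned char access(int enable, int rw /* 1 = read, 0 = write */,
                     unsigned addr, unsigned char data)
{
    if (!enable)
        return 0;            /* no read, no write */
    if (rw)
        return mem[addr % WORDS];
    mem[addr % WORDS] = data;
    return 0;
}
```

A multiport memory would simply replicate this interface: one set of enable, r/w, address, and data arguments per port, all operating on the same array.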
Further advancements to memory have blurred the distinction between the two traditional memory categories of ROM and RAM, providing designers with the benefit of more choices. Traditionally, the term ROM has referred to a memory that a processor can only read, and which holds its stored bits even without a power source. The term RAM has referred to a memory that a processor can both read and write, but which loses its stored bits if power is removed. However, processors can not only read, but also write to, advanced ROMs, like EEPROM and Flash, although such writing may be slow compared to writing RAMs. Furthermore, advanced RAMs, like NVRAMs, can hold their bits even when power is removed.

Thus, in this chapter, we depart from the traditional ROM/RAM distinction, and instead distinguish among memories using two characteristics, namely write ability and storage permanence. We then introduce forms of memories commonly found in embedded systems. We describe techniques for the common task of composing memories to build bigger memories. We describe the use of memory hierarchy to improve memory access speed.

5.2 Memory Write Ability and Storage Permanence

Write Ability

We use the term write ability to refer to the manner and speed that a particular memory can be written. All types of memory can be read from by a processor, since otherwise their stored bits would serve little purpose in an embedded system. Likewise, all types of memory can be written, since otherwise we would have no way to store bits in such a memory. However, the manner and speed of such writing varies greatly among memory types.

At the high end of the range of write ability, we have types of memory that a processor can write to simply and quickly by setting such a memory's address lines, data input bits, and control lines appropriately. Toward the middle of the range, we have types of memory that are slower to write by a processor. At the lower end of the range, we have types of memory that can only be written by a special piece of equipment called a "programmer." This device must apply special voltage levels to write to the memory, also known as
!.
l .I 5.3: Common Memory Types
Chapter 5: Memory
"programming" or "burning" the memory. Do not confuse this use of the term programmer
with the use referring to someone who writes software. At the low end of the r_ange of write· 8 x 4 ROM
ability, we have types of memory that can only have their bits stored when the' memory chip r~~"l-..e..::r-<"'::::r...._~-=- word 0
itself is being fabricated. · enable zk x nROM 3 x 8 r---.::r--==t---.::t-e..r- word I
decoder word 2
Storage Permanence

Storage permanence refers to the ability of memory to hold its stored bits after those bits have been written. At the low end of the range of storage permanence is memory that begins to lose its bits almost immediately after those bits are written, and therefore must be continually refreshed. Next is memory that will hold its bits as long as power is applied to the memory. Then comes memory that can hold its bits for days, months, or even years after the memory's power source has been turned off. At the high end of the range is memory that essentially never loses its bits, as long as the memory chip is not damaged, of course.

The terms nonvolatile and volatile are commonly used to divide memory types into two categories along the storage permanence axis, as shown in Figure 5.2. Nonvolatile memory can hold its bits even when power is no longer supplied. Conversely, volatile memory requires continual power to retain its data.

Likewise, the term in-system programmable is used to divide memories into two categories along the write ability axis. In-system programmable memory can be written to by a processor operating in the embedded system that uses the memory. Conversely, a memory that is not in-system programmable must be written by some external means, rather than by normal operation within the embedded system.

As described in Chapter 1, design metrics often compete with one another. Memory write ability and storage permanence are two such metrics. Ideally, we want a memory with the highest write ability and the highest storage permanence, as illustrated by the ideal memory point in Figure 5.2. Unfortunately, write ability and storage permanence tend to be inversely proportional to one another. Furthermore, highly writable memory typically requires more area and/or power than less-writable memory.

5.3 Common Memory Types

Introduction to "Read-Only" Memory - ROM

ROM, or read-only memory, is a nonvolatile memory that can be read from, but not written to, by a processor in an embedded system. Of course, there must be a mechanism for setting the bits in the memory, but we call this programming, not writing. For traditional types of ROM, such programming is done off-line, when the memory is not actively serving as a memory in an embedded system. We program such a ROM before inserting it into the embedded system. Figure 5.3(a) provides an external block diagram of a ROM.

Figure 5.3: ROM: (a) external block diagram, (b) internal view of an 8 x 4 ROM, in which a 3 x 8 decoder drives word lines that connect to data lines through programmable wired-OR connections.

We can use ROM for various purposes. One use is to store a software program for a general-purpose processor. We may write each program instruction to one ROM word. For some processors, we write each instruction to several ROM words. For other processors, we may pack several instructions into a single ROM word. A second common use is to store constant data needed by a system, like large lookup tables of strings or numbers. A third, less common, use is to implement a combinational circuit. We can implement any combinational function of k variables by using a 2^k x 1 ROM, and we can implement n functions of the same k variables using a 2^k x n ROM. We simply program the ROM to implement the truth table for the functions, as shown in Figure 5.4.

Figure 5.3(b) provides a symbolic view of the internal design of an 8 x 4 ROM. To the right of the 3 x 8 decoder in the figure is a grid of lines, with word lines running horizontally and data lines vertically; lines that cross without a circle in the figure are not connected. Thus, word lines only connect to data lines via the programmable connection lines shown. The figure shows all connection lines in place except for two connections in word 2. To see how this device acts as a read-only memory, consider an input address of 010. The decoder will thus set word 2's line to 1. Because the lines connecting this word line with data lines 2 and 0 do not exist, the ROM output will read 1010. Note that if the ROM enable input were 0, then no word would be read, since all decoder outputs would be 0. Also note that each data line is shown as a wired-OR, meaning that the wire itself acts to logically OR all the connections to it.

How do we program the programmable connections? The answer depends on the type of ROM being used. Common types include mask-programmed ROM, one-time programmable ROM, erasable programmable ROM, electrically erasable programmable ROM, and Flash, in order of increasing write ability. In terms of write ability, the latter two have such a high
degree of write ability that calling them read-only memory is not really accurate. In terms of storage permanence, all ROMs have high storage permanence, and in fact, all are nonvolatile. We now describe each ROM type briefly.

Figure 5.4: Implementing combinational functions with a ROM: (a) truth table, (b) ROM contents.

Mask-Programmed ROM

In a mask-programmed ROM, the connection is programmed when the chip is being fabricated, by creating an appropriate set of masks. Mask-programmed ROM obviously has extremely low write ability, as illustrated in Figure 5.2, but has the highest storage permanence of any memory type, since the stored bits will never change unless the chip is damaged. Such ROM types are typically only used after a final design has been determined, and only in high-volume systems, for which the NRE costs can be amortized to result in a lower unit cost than other ROM types.

OTP ROM - One-Time Programmable ROM

Many systems use some form of user-programmable ROM device, meaning the ROM can be programmed by the designer in the lab, long after the chip has been manufactured. User-programmable ROMs are generally referred to as programmable ROMs, or PROMs. These devices are better suited to prototyping and to low-volume applications than are mask-programmed ROMs. The most basic PROM uses a fuse for each programmable connection. To program a PROM device, the user provides a file that indicates the desired ROM contents. A piece of equipment called a ROM programmer then configures each programmable connection according to the file. Note that here the programmer is a piece of equipment, not a person who writes software. The ROM programmer blows fuses by passing a large current wherever a connection should not exist. However, once a fuse is blown, the connection can never be reestablished. For this reason, basic PROM is often referred to as one-time-programmable ROM, or OTP ROM.

OTP ROMs have the lowest write ability of all PROMs, as illustrated in Figure 5.2, since they can only be written once, and they require a programmer device. However, they have very high storage permanence, since their stored bits won't change unless someone reconnects the device to a programmer and blows more fuses. Because of their high storage permanence, OTP ROMs are commonly used in final products, versus other PROMs, which are more susceptible to having their contents inadvertently modified from radiation, maliciousness, or just the mere passage of many years.

OTP ROMs are also cheaper per chip than other PROMs, often costing under a dollar each. This also makes them more attractive in final products versus other types of PROM, and also versus mask-programmed ROM when time-to-market constraints or unit costs make them a better choice. Because the chips are so cheap, some designers even use OTP ROMs during design development. Those designers simply throw away the used chips as they program new ones.

EPROM - Erasable Programmable ROM

Another type of PROM is an erasable PROM, or EPROM. This device uses a MOS transistor as its programmable component. The transistor has a "floating gate," shown in Figure 5.5(a), meaning the transistor's gate is not connected and is instead surrounded by an insulator. An EPROM programmer injects electrons into the floating gate, using higher than normal voltage (usually 12 V to 25 V) that causes electrons to tunnel through the insulator into the gate, as in Figure 5.5(b). When that high voltage is removed, the electrons cannot escape, and hence the gate has been charged and programming has occurred. Reading an EPROM is much faster than writing, since reading doesn't require programming. To erase the program, the electrons must be excited enough to escape from the gate. Ultraviolet (UV) light is used to fulfill this role of erasing, as shown in Figure 5.5(c). The device must be placed under a UV eraser for a period of time, typically ranging from 5 to 30 minutes, after which the device can be programmed again. For the UV light to reach the chip, EPROMs come with a small quartz window in the package through which the chip can be seen, as shown in Figure 5.5(d). For this reason, EPROM is often referred to as a windowed ROM device. EPROMs can typically be erased and reprogrammed thousands of times, and standard EPROMs are guaranteed to hold their programs for at least 10 years.

Compared with OTP ROMs, EPROMs have improved write ability, as illustrated in Figure 5.2, since they can be erased and reprogrammed thousands of times. However, they have reduced storage permanence, since they are guaranteed to hold a program only for about 10 years, and the stored bits are susceptible to undesired changes if the chip is used in environments with much electrical noise or radiation. Thus, use of EPROMs in production parts is limited. If used in production, EPROMs should have their windows covered by a sticker to reduce the likelihood of undesired changes of the memory.
EPROMs can only be erased in their entirety. In contrast, EEPROMs are erased electronically, typically by applying higher than normal voltages, and such erasing typically requires only seconds, rather than the many minutes of UV exposure an EPROM requires. Furthermore, EEPROMs can have individual words erased and reprogrammed. EEPROMs are typically more expensive than EPROMs, but far more convenient to use. EEPROMs are often called E2s, pronounced "E-squareds."

Figure 5.5: Floating-gate transistor operation: (a) initially, the negative charges form a channel between the source and drain of the transistor, storing a logic 1 at that cell's location; (b) by applying a large positive voltage (e.g., +15 V) at the gate of the transistor, the negative charges move out of the channel area and get trapped in the floating gate, storing a logic 0 at that cell's location.

Because EEPROMs can be erased and programmed electronically, we can build the circuit providing the higher-than-normal voltage levels for such electronic erasing and programming right into the embedded system in which the EEPROM is being used. Thus, we can treat this as a memory that can be both read and written: a write to a particular word would consist of erasing that word followed by programming that word. Thus, an EEPROM is in-system programmable. We can use it to store data that an embedded system should save after power is shut off. For example, EEPROM is typically used in telephones that can store commonly dialed phone numbers in memory for speed-dialing. If you unplug the phone, thus shutting off power, and then plug it back in, the numbers will still be in memory. EEPROMs can typically hold data for 10 years and can be erased and programmed tens of thousands of times before losing their ability to store data.

In-system programming of EEPROMs has become so common that many EEPROMs come with a built-in memory controller. A memory controller hides internal memory-access details from the memory user, and provides a simple memory interface to the user. In this case, the memory controller would contain the circuitry and single-purpose processor necessary for erasing the word at the user-specified address, and then programming the user-specified data into that word.
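The erase-then-program sequence that such a memory controller performs can be sketched in software. The model below is illustrative only: the class name, the 8-bit word width, and the all-1s erased state are assumptions (common to floating-gate memories), not taken from any particular device.

```python
# Illustrative model of in-system EEPROM writing. Assumption: the erased
# state of a word is all 1s (0xFF), and programming can only clear bits,
# as in floating-gate memories. Names are hypothetical.

class EEPROM:
    def __init__(self, num_words):
        self.words = [0xFF] * num_words      # all words start erased

    def erase_word(self, addr):
        self.words[addr] = 0xFF              # EEPROM can erase a single word

    def program_word(self, addr, data):
        self.words[addr] &= data             # programming only clears bits

    def write(self, addr, data):
        # The memory controller hides this two-step sequence from the user.
        self.erase_word(addr)
        self.program_word(addr, data)

mem = EEPROM(64)
mem.write(5, 0x3C)
mem.write(5, 0xC3)          # overwriting works because erase precedes program
print(hex(mem.words[5]))    # 0xc3
```

Without the erase step, the second write would have produced 0x3C & 0xC3 = 0x00, since programming alone can only change 1s to 0s.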
Flash Memory

Flash memory is an extension of EEPROM that was developed in the late 1980s. While also using the floating-gate principle of EEPROM, flash memory is designed such that large blocks of memory can be erased all at once, rather than just one word at a time as in
traditional EEPROM. A block is typically several thousand bytes large. This fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones, and medical monitoring equipment. It can also speed manufacturing throughput, since programming the complete contents of flash may be faster than programming a similar-sized EEPROM.

Like EEPROM, each block in a flash memory can typically be erased and reprogrammed tens of thousands of times before the block loses its ability to store data, and can store its data for 10 years or more.

A drawback of flash memory is that writing to a single word in flash may be slower than writing to a single word in EEPROM, since an entire block will need to be read, the word within it updated, and then the block written back.

Figure 5.7: Memory cell internals: (a) SRAM, (b) DRAM.
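This block-oriented read-modify-write can be sketched as follows; the block size and erased value are illustrative assumptions, not the behavior of any specific flash device.

```python
# Illustrative model of writing one word to flash: read the whole block,
# update the word, erase the block, and write the block back.

BLOCK_WORDS = 4096                # assume a block of several thousand words

class Flash:
    def __init__(self, num_blocks):
        self.blocks = [[0xFF] * BLOCK_WORDS for _ in range(num_blocks)]

    def erase_block(self, b):
        self.blocks[b] = [0xFF] * BLOCK_WORDS   # erase works per block only

    def write_word(self, addr, data):
        b, offset = divmod(addr, BLOCK_WORDS)
        buf = list(self.blocks[b])   # 1. read the entire block
        buf[offset] = data           # 2. update the one word
        self.erase_block(b)          # 3. erase the block
        self.blocks[b] = buf         # 4. write the block back

flash = Flash(num_blocks=8)
flash.write_word(5000, 0xAB)         # touches all 4096 words of block 1
print(hex(flash.blocks[1][5000 % BLOCK_WORDS]))   # 0xab
```

The four steps are why a single-word write can cost far more than in EEPROM, where the word could be erased and reprogrammed individually.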
Each memory cell stores the input data bit when rd/wr indicates a write and the cell's row is enabled. Static RAM is faster but larger than dynamic RAM. Furthermore, static RAM is easily implemented on the same IC as processors, whereas dynamic RAM is usually implemented on a separate IC.

SRAM - Static RAM

Static RAM, or SRAM, uses a memory cell, shown in Figure 5.7(a), consisting of a flip-flop to store a bit. Each bit thus requires about six transistors. This RAM type is called static because it will hold its data as long as power is supplied, in contrast to dynamic RAM. Static RAM is typically used for high-performance parts of a system (e.g., cache).

Figure 5.6: RAM internals.
Chapter 5: Memory
PSRAM - Pseudo-Static RAM

Many RAM variations exist. Pseudo-static RAMs, or PSRAMs, are DRAMs with a memory refresh controller built in. Thus, since the RAM user need not worry about refreshing, the device appears to behave much like an SRAM. However, in contrast to SRAM, a PSRAM may be busy refreshing itself when accessed, which could slow access time and add some system complexity. Nevertheless, PSRAMs are a popular low-cost, high-density memory alternative to SRAM in many embedded systems.

Figure 5.8: HM6264 and 27C256 memory devices: (a) block diagrams, (b) characteristics, (c) timing diagrams of read and write operations.

Device    Access Time (ns)    Standby Pwr. (mW)    Active Pwr. (mW)    Vcc Voltage (V)
HM6264    85-100              .01                  15                  5
27C256    90                  .5                   100                 5

NVRAM - Nonvolatile RAM

Nonvolatile RAM, or NVRAM, is a special RAM variation that is able to hold its data even after external power is removed. There are two common types of NVRAM. One type, often called battery-backed RAM, contains a static RAM along with its own permanently connected battery. When external power is removed or drops below a certain threshold, the internal battery maintains power to the SRAM, and the memory thus continues to store its bits.
Example: TC55V2325FF-100 Memory Device

In this example, we introduce a 2-megabit synchronous pipelined burst SRAM memory device, shown in Figure 5.9(a), designed to be interfaced with 32-bit processors. This device, made by Toshiba Inc., is organized as 64K x 32 bits. Figure 5.9(b) summarizes some of the characteristics of this device: an access time of 10 ns, active power of 1200 mW, and a Vcc voltage of 3.3 V.

In Figure 5.9(c), we present the timing diagram for a single read operation. The write operation is similar. This device is capable of fast sequential reads and writes as well as single-byte I/O. The interested reader should refer to the manufacturer's datasheets for complete timing information. The read operation can be initiated with either the address status processor (ADSP) input or the address status controller (ADSC) input. Here, we have asserted both. Subsequent burst addresses can be generated internally and are controlled by the address advance (ADV) input. In other words, as long as ADV is asserted, the device will keep incrementing its address register and output the corresponding data on the next clock cycle.

Figure 5.9: TC55V2325FF-100 RAM device: (a) block diagram, (b) characteristics, (c) timing diagrams.

5.4 Composing Memory

An embedded system designer is often faced with the situation of needing a particular-sized memory (ROM or RAM), but having readily available memories of a different size. For example, the designer may need a 2k x 8 ROM, but may have 4k x 16 ROMs readily available. Alternatively, the designer may need a 4k x 16 ROM, but may have 2k x 8 ROMs available for use.

The case where the available memory is larger than needed is easy to deal with. We simply use the needed lower words in the memory, ignoring unneeded higher words and their high-order address bits, and we use the lower data input/output lines, ignoring unneeded higher data lines. (Of course, we could use the higher lines and ignore the lower lines instead.)

The case where the available memory is smaller than needed requires more design effort. In this case, we must compose several smaller memories to behave as the larger memory we need. Suppose the available memories have the correct number of words, but each word is not wide enough. In this case, we can simply connect the available memories side by side. For example, Figure 5.10(a) illustrates the situation of needing a ROM three times wider than that available. We connect three ROMs side by side, sharing the same address and enable lines, and concatenating the data lines to form the desired word width.

Suppose instead that the available memories have the needed word width, but not enough words. In this case, we can connect the available memories top to bottom. For example, Figure 5.10(b) illustrates the situation of needing a ROM with twice as many words, and hence needing one extra address line, than that available. We connect the ROMs top to bottom, ORing the corresponding data lines of each. We use the extra high-order address line to select the higher or lower ROM using a 1 x 2 decoder, and the remaining address lines to offset into the selected ROM. Since only one ROM will ever be enabled at a time, the ORing of the data lines never actually involves more than one nonzero data line.

If we instead needed four times as many words, and hence two extra address lines, we would instead use four ROMs and a 2 x 4 decoder having the two high-order address lines as input; the decoder output would select one of the four ROMs to access.

Suppose the available memories have a smaller word width as well as fewer words than necessary. We then combine the above two techniques, first creating the number of columns of memories necessary to achieve the needed word width, and then creating the number of rows of memories, along with a decoder, necessary to achieve the needed number of words. The approach is depicted in Figure 5.10(c).
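The decoder-based composition can be checked with a small address-decoding sketch. The sizes below (four 1k-word memories forming a 4k-word memory) are illustrative; the decoding itself is exactly the high-order-bits-select, low-order-bits-offset scheme described above.

```python
# Illustrative decoding for composing four 1k-word memories into one
# 4k-word memory: the two high-order address bits act as the 2 x 4
# decoder input, and the remaining bits offset into the selected chip.

WORDS_PER_CHIP = 1024
chips = [[0] * WORDS_PER_CHIP for _ in range(4)]

def decode(addr):
    return addr // WORDS_PER_CHIP, addr % WORDS_PER_CHIP

def write(addr, data):
    chip, offset = decode(addr)
    chips[chip][offset] = data

def read(addr):
    # Only the selected chip is enabled, so the wired-OR of the data
    # lines sees at most one nonzero driver.
    chip, offset = decode(addr)
    return chips[chip][offset]

write(1024, 7)        # first word of the second chip
print(decode(1024))   # (1, 0)
print(read(1024))     # 7
```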
Figure 5.10: Composing smaller memories into a larger memory: (a) connecting memories side by side to increase word width, (b) connecting memories top to bottom with a 1 x 2 decoder to increase the number of words, (c) combining both techniques.

Figure 5.11: An example memory hierarchy.
only a fraction of the size of main memory. Cache access time may be as low as just one clock cycle, whereas main memory access time is typically several cycles.

A cache operates as follows. When we want the processor to access (read or write) a main memory address, we first check for a copy of that location in the cache. If the copy is in the cache, called a cache hit, then we can access it quickly. If the copy is not there, called a cache miss, then we must first read the address, and perhaps some of its neighbors, into the cache.

This description of cache operation leads to several cache design choices: cache mapping, cache replacement policy, and cache write techniques. These design choices can have a significant impact on system cost, performance, and power, and thus should be evaluated carefully for a given application.

Cache Mapping Techniques

Cache mapping is the method for assigning main memory addresses to the far fewer number of available cache addresses, and for determining whether a particular main memory address's contents are in the cache. Cache mapping can be accomplished using one of three basic techniques (see Figure 5.12):

1. In direct mapping, illustrated in Figure 5.12(a), the main memory address is divided into two fields, the index and the tag. The index represents the cache address, and thus the number of index bits is determined by the cache size (i.e., index size = log2(cache size)). Note that many different main memory addresses will map to the same cache address. When we store the contents of a main memory address in the cache, we also store the tag. To determine if a desired main memory address is in the cache, we go to the cache address indicated by the index, and compare the tag there with the desired tag. If the tags match, we then check the valid bit; the valid bit indicates whether the data stored in that cache slot has previously been loaded into the cache from main memory. We use the offset portion of the main memory address to grab a particular word within the cache line. A cache line, also known as a cache block, is the number of (inseparable) adjacent memory addresses loaded from or stored into main memory at a time. A typical block size is four or eight addresses.

2. In fully associative mapping, illustrated in Figure 5.12(b), each cache address contains not only the contents of a main memory address, but also the complete main memory address. To determine if a desired main memory address is in the cache, we simultaneously (associatively) compare all the addresses stored in the cache with the desired address.

3. In set-associative mapping, illustrated in Figure 5.12(c), a compromise is reached between direct and fully associative mapping. As in direct mapping, an index maps a main memory address to a cache address, but now each cache address contains the contents and tags of two or more memory locations, called a set. To determine if a desired main memory address is in the cache, we go to the cache address indicated by the index, and we then simultaneously (associatively) compare all the tags at that location (i.e., of that set) with the desired tag. A cache with a set of size N is called an N-way set-associative cache; 2-way, 4-way, and 8-way set-associative caches are common.

Figure 5.12: Cache mapping techniques: (a) direct-mapped, (b) fully associative, (c) two-way set-associative.

Direct-mapped caches are easy to implement, but may result in numerous misses if two or more words with the same index are accessed frequently, since each will bump the other out of the cache. Fully associative caches, on the other hand, are fast, but the comparison logic is expensive to implement. Set-associative caches can reduce misses compared to direct-mapped caches, without requiring nearly as much comparison logic as fully associative caches.

Caches are usually designed to treat collections of a small number of adjacent main memory addresses as one indivisible block, also known as a line, typically consisting of about eight addresses.
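The index/tag/offset division used in direct mapping can be made concrete with a short sketch. The cache geometry below (64 lines of 8 words each) is an illustrative assumption.

```python
# Splitting a main memory address into tag, index, and offset for a
# direct-mapped cache with 64 lines (6 index bits) of 8 words (3 offset
# bits). The tag is whatever high-order bits remain.

NUM_LINES = 64
WORDS_PER_LINE = 8

OFFSET_BITS = WORDS_PER_LINE.bit_length() - 1   # log2(8)  = 3
INDEX_BITS = NUM_LINES.bit_length() - 1         # log2(64) = 6

def split_address(addr):
    offset = addr & (WORDS_PER_LINE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x1234))   # (9, 6, 4)
```

A lookup would go to line 6, compare the stored tag with 9, and, on a match with the valid bit set, return word 4 of that line.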
Figure 5.13: Average miss rate versus cache size (1 Kb to 64 Kb) and associativity.
Note that the problem with making a cache larger is the additional access time penalty, which quickly offsets the benefits of improved hit rates. Designers often use other methods to improve cache hit rate without increasing the cache size. For example, they make a cache set-associative, or they increase the line size. These methods too incur additional logic and add to the access time latency. Increasing the line size can, additionally, improve main memory access time, at the expense of more complex multiplexing of data and thus increased access latency. Figure 5.13 summarizes the effects of cache size and associativity in terms of average miss rate for a number of commonly used programs under the Unix environment, such as gcc.

The behavior of caches is very dependent on the type of applications that run on the processor. Fortunately, for an embedded system, the set of applications is well defined and known at design time, so the designer has the ability to measure the performance of some candidate cache designs and choose the one that best meets the performance, cost, and power constraints. One way to perform such analysis is as follows. We instrument the executable with additional code such that, when executed, it outputs a trace of memory references. Then, we feed these traces through a cache simulator, which outputs cache statistics at the end of its execution. We can perform all this analysis on our development computer.
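The trace-driven analysis just described can be sketched with a minimal direct-mapped cache simulator; the tiny geometry below is an illustrative assumption, and a real exploration would sweep cache sizes and associativities.

```python
# Minimal direct-mapped cache simulator: feed it a trace of word
# addresses and collect hit/miss counts.

NUM_LINES = 4
WORDS_PER_LINE = 4

def simulate(trace):
    lines = [None] * NUM_LINES        # each entry holds a tag (or None)
    hits = misses = 0
    for addr in trace:
        block = addr // WORDS_PER_LINE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag        # load the block on a miss
    return hits, misses

# Sweeping a 16-word array twice: the first pass misses once per block,
# and the second pass hits entirely in the cache.
trace = list(range(16)) * 2
print(simulate(trace))   # (28, 4)
```

From such counts, one can compare candidate cache designs against the application's actual reference behavior.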
5.6 Advanced RAM

Figure 5.14: Basic DRAM architecture.
Earlier, we described DRAM as a type of storage device that uses a single transistor/capacitor pair to store a bit. Because of such an architecture and the resulting high capacity and low cost, DRAMs are commonly used as the main memory in processor-based embedded systems. In order for DRAMs to keep pace with processor speeds, many variations on the basic DRAM interface have been proposed. In this section, we describe the structure of a basic DRAM as well as some of the more recent and advanced DRAM designs.

The basic DRAM architecture is depicted in Figure 5.14. The addressing mechanism for a memory read works as follows. The address bus is multiplexed between row and column components. Using the row address select (ras) signal, the row component of the address is latched into the row address buffer. Likewise, using the column address select (cas) signal, the column component of the address is latched into the column address buffer. (Note that in earlier days, the number of I/O pins was limited, hence manufacturers of DRAMs adopted this multiplexed scheme to reduce the overall I/O requirements. In fact, some DRAM devices used the same I/O pins for multiplexed data as well as multiplexed address signals.) As soon as the row address component is latched into the row address buffer, the row decoder selects the corresponding row of bits. The length of this bit-row depends on the word size and the organization of the device. Once the column address buffer is latched, the column decoder enables the particular word (referred to by the address) in order for it to propagate to the sense amplifiers. (The sense amplifier's task is to detect the voltage level of the bits, that is, the transistor/capacitor pairs, corresponding to the referenced word and amplify them to a high enough level for latching into the output buffers.) Once the data is in the output buffers, it can be read by asserting the output enable signal.

FPM DRAM - Fast Page Mode DRAM

The fast page mode DRAM design is an improvement on the basic DRAM architecture. In this design, each row of the memory bit-array is viewed as a page. A page contains multiple words. Each word is addressed by a different column address. The sense amplifier in FPM DRAM amplifies the entire page once its address is strobed into the row address latch. Thereafter, each word of that page is read (or written) by strobing the corresponding column address. The timing diagram for FPM DRAM is depicted in Figure 5.15. Here, after selecting a particular page (row), three data words within that page are read consecutively.
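The payoff of reusing a latched row can be sketched by counting strobes for an access sequence; the 10-bit row and column address widths are illustrative assumptions.

```python
# Illustrative model of FPM DRAM addressing: an access to a new row costs
# a ras strobe, while every access costs a cas strobe. Accesses that stay
# within the currently latched row (page) avoid new ras strobes.

ROW_BITS, COL_BITS = 10, 10

def split(addr):
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)

def count_strobes(addresses):
    ras = cas = 0
    open_row = None
    for addr in addresses:
        row, _col = split(addr)
        if row != open_row:
            ras += 1                 # latch the new row address
            open_row = row
        cas += 1                     # latch a column address every time
    return ras, cas

# Three words in the same page, as in the Figure 5.15 scenario:
print(count_strobes([0x2000, 0x2001, 0x2002]))   # (1, 3)
```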
Figure 5.15: FPM DRAM timing diagram: after the row address is strobed with ras, three words are read by strobing three successive column addresses with cas.
5.11 A given design with cache implemented has a main memory access cost of 20 cycles on a miss and two cycles on a hit. The same design without the cache has a main memory access cost of 16 cycles. Calculate the minimum hit rate of the cache to make the cache implementation worthwhile.

5.12 Design your own 8K x 32 PSRAM using an 8K x 32 DRAM, by designing a refresh controller. The refresh controller should guarantee refresh of each word every 15.625 microseconds. Because the PSRAM may be busy refreshing itself when a read or write access request occurs (i.e., the enable input is set), it should have an output signal ack indicating that an access request has been completed. Make use of a timer. Design the system down to complete structure. Indicate at what frequency your clock must operate.

CHAPTER 6: Interfacing
6.1 Introduction
6.2 Communication Basics
6.3 Microprocessor Interfacing: I/O Addressing
6.4 Microprocessor Interfacing: Interrupts
6.5 Microprocessor Interfacing: Direct Memory Access
6.6 Arbitration
6.7 Multilevel Bus Architectures
6.8 Advanced Communication Principles
6.9 Serial Protocols
6.10 Parallel Protocols
6.11 Wireless Protocols
6.12 Summary
6.13 References and Further Reading
6.14 Exercises
6.1 Introduction
As stated in the Chapter 5, we use processors to implement processing, memory to implement
storage, and buses to implement communication. The earlier chapters described processors
and memory. This chapter describes implementing communication with buses, known as·
interfacing. Communication is the transfer of data among processors and memories. For
example, a general-purpose processor reading or writing a memory is a-common form of
communication. A general-purpose processor reading or writing a peripheral's register is
another common form.
We begin by defining some basic communication concepts. We then introduce several issues relating to the common task of interfacing to a general-purpose processor: addressing, interrupts, and direct memory access. We also describe several schemes for arbitrating among multiple processors attempting to access a single bus or memory simultaneously. We show that many systems may include several hierarchically organized buses. We then discuss some more advanced communication principles and survey several common serial, parallel, and wireless communication protocols.
6.2 Communication Basics

Basic Terminology
We begin by introducing a very basic communication example between a processor and a memory, shown in Figure 6.1. Figure 6.1(a) shows the bus structure, or the wires connecting the processor and the memory. A line rd'/wr indicates whether the processor is reading or writing. An enable line is used by the processor to carry out the read or write. Twelve address lines addr indicate the memory address that the processor wishes to read or write. Eight data lines data are set by the processor when writing or set by the memory when the processor is reading. Figure 6.1(b) describes the read protocol over these wires: the processor sets rd'/wr to 0, places a valid address on addr, and strobes enable, after which the memory will place valid data on the data lines. Figure 6.1(c) shows a write protocol: the processor sets rd'/wr to 1, places a valid address on addr, places data on data, and strobes enable, causing the memory to store the data. This very simple example brings up several points that we now describe.

Figure 6.1: A simple bus example: (a) bus structure, (b) read protocol, (c) write protocol.

Wires may be unidirectional, meaning they transmit in only one direction, as did rd'/wr, enable, and addr, or they may be bidirectional, meaning they transmit in two directions, though in only one direction at a time, as did data. A set of wires with the same function is typically drawn as a thick line and/or as a line with a small angled line drawn through it, as was the case with addr and data.

The term bus can refer to a set of wires with a single function within a communication. For example, we can refer to the "address bus" and the "data bus" in the above example. The term bus can also refer to the entire collection of wires used for the communication (e.g., rd'/wr, enable, addr, and data) along with the communication protocol over those wires. Both uses of the term are common and are often used in conjunction with one another. For example, we may say that the processor's bus consists of an address bus and a data bus. A protocol describes the rules for communicating over those wires. We deal primarily with low-level hardware protocols in this chapter, while higher-level protocols, like IP (Internet Protocol), can be built on top of these protocols using a layered approach.

The bus connects to ports of a processor (or memory). A port is the actual conducting device, like metal, on the periphery of a processor, through which a signal is input to or output from the processor. A port may refer to a single wire, or to a set of wires with a single function, such as an address port consisting of twelve wires. A related term is pin. When a processor is packaged as its own IC, there are actual pins extending from the package, and those pins are often designed to be plugged into a socket on a printed-circuit board. Today, however, a processor commonly coexists on a single IC with other processors and memories. Such a processor does not have any actual pins on its periphery, but rather "pads" of metal in the IC. In fact, even for a processor packaged in its own IC, alternative packaging techniques may use something other than pins for connections, such as small metallic balls. However, we can still use the term pin to refer to a port on a processor.

The distinction between a bus and a port is similar to the distinction between a street and a driveway: the bus is like the street, which connects various driveways, while a processor's port is like a house's driveway, which provides access between the house and the street.

The most common method for describing a hardware protocol is a timing diagram, as was used in Figure 6.1(b) and (c). In the diagram, time proceeds to the right along the x-axis. The diagram shows that the processor must set the rd'/wr line low for a read to occur. The diagram also shows, using two vertical lines, that the processor must place the address on addr for at least t_setup time before setting the enable line high. The diagram shows that the high enable line triggers the memory to put data on the data wires after a time t_read. Note that a timing diagram represents control lines, like rd'/wr and enable, as either being high or low, while it represents data lines, like addr and data, as being either invalid or valid, using a single horizontal line or two horizontal lines, respectively. The actual value of data lines is not normally relevant when describing a protocol, so that value is typically not shown.

In the above protocol, the control line enable is active high, meaning that a 1 on the enable line triggers the data transfer. In many protocols, control lines are instead active low, meaning that a 0 on the line triggers the transfer. Such a control line's name is typically written with a bar above it, a single quote after it (e.g., enable'), a forward slash before it (e.g., /enable), or the letter L after it (e.g., enable_L). To be general, we will use the term assert to mean setting a control line to its active value, such as to 1 for an active high line and to 0 for an active low line. We will use the term deassert to mean setting the control line to its inactive value.
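As an illustration, the read and write protocols of Figure 6.1, along with the assert/deassert vocabulary, can be mimicked in software. This is only a hypothetical C sketch, not part of the book's material: the Bus struct, the function names, and the single-call memory model (which ignores the real setup and response delays t_setup and t_read) are all our assumptions.

```c
/* Hypothetical sketch: simulating the simple bus protocol of Figure 6.1.
 * Signal names rd_wr, enable, addr, and data follow the text. */
#include <stdint.h>

#define MEM_WORDS 4096              /* twelve address lines -> 4096 words */

typedef struct {
    int rd_wr;                      /* 0 = read, 1 = write, as in Fig. 6.1 */
    int enable;                     /* active high control line */
    uint16_t addr;                  /* 12-bit address lines */
    uint8_t data;                   /* 8-bit bidirectional data lines */
} Bus;

static uint8_t mem[MEM_WORDS];

/* assert/deassert: set a control line to its active/inactive value.
 * For an active-low line, the active value passed in would be 0. */
static void assert_line(int *line, int active)   { *line = active; }
static void deassert_line(int *line, int active) { *line = !active; }

/* Memory's side of the protocol: reacts while enable is asserted. */
static void memory_cycle(Bus *b) {
    if (b->enable) {
        if (b->rd_wr == 0) b->data = mem[b->addr];  /* read: memory drives data */
        else               mem[b->addr] = b->data;  /* write: memory stores data */
    }
}

/* Read protocol: set rd_wr to 0, place the address, strobe enable. */
uint8_t bus_read(Bus *b, uint16_t addr) {
    b->rd_wr = 0;
    b->addr = addr;
    assert_line(&b->enable, 1);
    memory_cycle(b);                /* stands in for t_read elapsing */
    deassert_line(&b->enable, 1);
    return b->data;
}

/* Write protocol: set rd_wr to 1, place address and data, strobe enable. */
void bus_write(Bus *b, uint16_t addr, uint8_t data) {
    b->rd_wr = 1;
    b->addr = addr;
    b->data = data;
    assert_line(&b->enable, 1);
    memory_cycle(b);
    deassert_line(&b->enable, 1);
}
```

A write followed by a read of the same address returns the stored byte, mirroring the two timing diagrams of Figure 6.1(b) and (c).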
Embedded System Design
Figure 6.5(a): ISA bus timing for a memory read bus cycle (signals A[19-0], ALE, /MEMR, D, and CHRDY).
included in the timing diagram. The operation works as follows. In clock cycle C1, the microprocessor puts a 20-bit memory address on the address lines A and asserts the address latch enable signal ALE. During clock cycles C2 and C3, the processor asserts the memory read signal /MEMR to request a read operation from the memory device. After C3, the memory device holds the data on data lines D. In cycle C4, all signals are deasserted.

The ISA read bus cycle uses a compromise strobe/handshake control method. The memory device deasserted the channel ready signal CHRDY before the rising clock edge in C2, causing the microprocessor to insert wait cycles until CHRDY was reasserted. Up to six wait cycles can be inserted by a slow device.

Figure 6.5(b) illustrates the bus timing for performing a memory write operation, referred to as a memory write cycle. During a memory write bus cycle, the microprocessor drives the bus signals to write a byte of data to memory. The operation works as follows. In clock cycle C1, the processor puts the 20-bit memory address to be written on the address lines and asserts the ALE signal. During cycles C2 and C3, the processor puts the write data on the data lines and asserts the memory write signal /MEMW to indicate a write operation to the memory device. In cycle C4, all signals are deasserted. The write cycle also uses a compromise strobe/handshake control method.

Figure 6.6: Parallel I/O: (a) adding parallel I/O to a bus-based I/O processor, (b) extended parallel I/O.

In bus-based I/O, the microprocessor has a set of address, data, and control ports corresponding to bus lines, and uses the bus to access memory as well as peripherals. The microprocessor has the bus protocol built in to its hardware. Specifically, the software does not implement the protocol but merely executes a single instruction that in turn causes the
6.3 Microprocessor Interfacing: I/O Addressing
Figure 6.7: ISA bus protocol for standard I/O.

Figure 6.8: A basic memory protocol: (a) timing diagram for a read operation, (b) interface schematic.
In standard I/O (also known as I/O-mapped I/O), the bus includes an additional pin, which we label M/IO, to indicate whether the access is to memory or to a peripheral (i.e., an I/O device). For example, when M/IO is 0, the address on the address bus corresponds to a memory address. When M/IO is 1, the address corresponds to a peripheral.

An advantage of memory-mapped I/O is that the microprocessor need not include special instructions for communicating with peripherals. The microprocessor's assembly instructions involving memory, such as MOV or ADD, will also work for peripherals. For example, a microprocessor may have an ADD A, B instruction that adds the data at address B to the data at address A and stores the result in A. A and B may correspond to memory locations or to registers in peripherals. In contrast, if the microprocessor uses standard I/O, the microprocessor requires special instructions for reading and writing peripherals. These instructions are often called IN and OUT. Thus, to perform the same addition of locations A and B corresponding to peripherals, the following instructions would be necessary:

IN R0, A
IN R1, B
ADD R0, R1
OUT A, R0

Advantages of standard I/O include no loss of memory addresses to use as I/O addresses, and potentially simpler address decoding logic in peripherals. Address decoding logic can be simplified with standard I/O if we know that there will only be a small number of peripherals, because the peripherals can then ignore high-order address bits. For example, a bus may have a 16-bit address, but we may know there will never be more than 256 I/O addresses required. The peripherals can thus safely ignore the high-order 8 address bits, resulting in smaller and/or faster address comparators in each peripheral. Note that we can build a system using both standard and memory-mapped I/O, since peripherals in the memory space act just like memory themselves.

Example: The ISA Bus Protocol - Standard I/O

The ISA bus protocol introduced earlier supports standard I/O. The I/O read bus cycle is depicted in Figure 6.7. During this bus cycle, the microprocessor drives the bus signals to read a byte of data from a peripheral, according to the timing diagram shown. Note that the cycle uses a control line distinct from /MEMR, namely /IOR, which is consistent with the standard I/O approach. The I/O device address space is limited to 16 bits, as opposed to 20 bits for memory devices. The I/O write bus cycle is similar to the memory write bus cycle but uses a control signal /IOW and again limits the address to 16 bits. The I/O read and write bus cycles use the compromise strobe/handshake control method, as did the memory bus cycles.

Example: A Basic Memory Protocol

In this example, we illustrate how to interface 8K of data memory and 32K of program code memory to a microcontroller, specifically the Intel 8051. The 8051 uses separate memory address spaces for data and program code. Data or code address space is limited to 64K, hence addressable with 16 bits through ports P0 (least significant bits) and P2 (most significant bits). A separate signal, called PSEN (program store enable), is used to distinguish between data and code. For the most part, the 8051 generates all of the necessary signals to perform memory I/O; however, since port P0 is used both for the least significant address bits and for data, an 8-bit latch is required to perform the necessary multiplexing. The timing diagram depicted in Figure 6.8(a) illustrates a memory read operation. A memory write operation is performed in a similar fashion with data flow reversed and RD (read) replaced with WR (write). The memory read operation proceeds as follows. The microcontroller places the source address (i.e., the memory location to be read) on ports P2 and P0. P2, holding the eight most significant address bits, retains its value throughout the read operation. P0, holding the eight least significant address bits, is stored inside an 8-bit latch. The ALE signal (address latch enable) …
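To make the memory-mapped versus standard I/O distinction concrete, the following hypothetical C sketch simulates both decoding styles. The array sizes, the m_io argument standing in for the M/IO pin, and the function names are our assumptions for illustration, not part of any real bus.

```c
/* Hypothetical sketch: memory-mapped vs. standard I/O address decoding,
 * simulated with two arrays standing in for the two address spaces. */
#include <stdint.h>

static uint8_t memory[65536];       /* 16-bit memory address space */
static uint8_t io_regs[256];        /* separate I/O space (standard I/O) */

/* Standard I/O: an extra M/IO line selects which space is accessed, so
 * the same 16-bit address can name either a memory word or a peripheral
 * register. With at most 256 I/O addresses, the peripheral side can
 * ignore the high-order 8 address bits, as the text describes. */
uint8_t bus_access(uint16_t addr, int m_io) {
    if (m_io == 0)
        return memory[addr];         /* M/IO = 0: memory access */
    else
        return io_regs[addr & 0xFF]; /* M/IO = 1: peripheral; high bits ignored */
}

/* Memory-mapped I/O: peripheral registers would simply occupy ordinary
 * memory addresses, so one access path serves both memory and peripherals,
 * and no IN/OUT-style special instructions are needed. */
uint8_t mm_read(uint16_t addr)            { return memory[addr]; }
void    mm_write(uint16_t addr, uint8_t v) { memory[addr] = v; }
```

The `addr & 0xFF` in the standard I/O path mirrors the text's point about simpler decoding: a peripheral comparator only has to match the low-order 8 bits.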
Figure 6.10: Interrupt-driven I/O using fixed ISR location: summary of flow of actions.

Figure 6.11: Interrupt-driven I/O using fixed ISR location: flow of actions. 1(a): µP is executing its main program. 1(b): P1 receives input data in a register with address 0x8000. 2: P1 asserts Int to request servicing by the microprocessor. 3: After completing the instruction at 100, µP sees Int asserted, saves the PC's value of 100, and sets PC to the ISR fixed location of 16. 4(a): The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. 4(b): After being read, P1 deasserts Int. 5: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing. (ISR in program memory: 16: MOV R0, 0x8000; 17: # modifies R0; 18: MOV 0x8001, R0; 19: RETI # ISR return.)

numerous peripherals that can request service. In this method, the microprocessor has one interrupt pin, say, Int, which any peripheral can assert. After detecting the interrupt, the microprocessor asserts another pin, say, Inta, to acknowledge that it has detected the interrupt and to request that the interrupting peripheral provide the address where the relevant ISR resides. The peripheral provides this address on the data bus, and the microprocessor reads the address and jumps to the corresponding ISR. We discuss the situation where multiple peripherals simultaneously request servicing in a later section on arbitration. For now, consider an example of one peripheral using vectored interrupt. The flow of actions is shown in Figure 6.12, which represents an example very similar to the previous one. Figure 6.13 illustrates the example graphically. In contrast to the earlier example, the ISR location is not fixed at 16. Thus, Peripheral1 contains an extra register holding the ISR location. After detecting the interrupt and saving its state, the microprocessor asserts Inta in order to get Peripheral1 to place 16 on the data bus. The microprocessor reads this 16 into the PC and then jumps to the ISR, which executes and completes in the same manner as the earlier example.

As a compromise between the fixed and vectored interrupt methods, we can use an interrupt address table. In this method, we still have only one interrupt pin on the processor, but we also create in the processor's memory a table that holds ISR addresses. A typical table might have 256 entries. A peripheral, rather than providing the ISR address, instead provides a number corresponding to an entry in the table. The processor reads this entry number from the bus, and then reads the corresponding table entry to obtain the ISR address. Compared to the entire memory, the table is typically very small, so an entry number's bit encoding is small. This small bit encoding is especially important when the data bus is not wide enough to hold a complete ISR address. Furthermore, this approach allows us to assign each peripheral a unique number independent of ISR locations, meaning that we could move the ISR location without having to change anything in the peripheral.

External interrupts may be maskable or nonmaskable. In maskable interrupt, the programmer may force the microprocessor to ignore the interrupt pin, either by executing a specific instruction to disable the interrupt or by setting bits in an interrupt configuration register. A situation where a programmer might want to mask interrupts is when there exist time-critical regions of code, such as a routine that generates a pulse of a certain duration. The
programmer may include an instruction that disables interrupts at the beginning of the routine, and another instruction reenabling interrupts at the end of the routine. Nonmaskable interrupts cannot be masked by the programmer. They require a pin distinct from maskable interrupts, and are typically used for very drastic situations, such as power failure. In this case, if power is failing, a nonmaskable interrupt can cause a jump to a subroutine that stores critical data in nonvolatile memory, before power is completely gone.

In some microprocessors, the jump to an ISR is handled just like the jump to any other subroutine, meaning that the state of the microprocessor is stored on a stack, including contents of the program counter, datapath status register, and all other registers. The state is then restored upon completion of the ISR. In other microprocessors, only a few registers are stored, like just the program counter and status registers. The assembly programmer must be aware of what registers have been stored, so as not to overwrite nonstored register data with the ISR. These microprocessors need two types of assembly instructions for subroutine return. A regular return instruction returns from a regular subroutine, which was called using a subroutine call instruction. A return from interrupt instruction returns from an ISR, which was jumped to not by a call instruction but by the hardware itself, and which restores only those registers that were stored at the beginning of the interrupt. The C programmer is freed from having to worry about such considerations, as the C compiler handles them.

The reason we used the term external interrupt is to distinguish this type of interrupt from internal interrupts, also called traps. An internal interrupt results from an exceptional condition, such as divide-by-0 or execution of an invalid opcode. Internal interrupts, like external ones, result in a jump to an ISR. A third type of interrupt, called software interrupts, can be initiated by executing a special assembly instruction.

Figure 6.13: Interrupt-driven I/O using vectored interrupt: flow of actions. 2: P1 asserts Int to request servicing. 3: After completing the instruction at 100, µP sees Int asserted, saves the PC's value of 100, and asserts Inta. 4: P1 detects Inta and puts interrupt address vector 16 on the data bus. 5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. 5(b): After being read, P1 deasserts Int. 6: The ISR returns, thus restoring the PC to 100+1=101, where the µP resumes executing.

6.5 Microprocessor Interfacing: Direct Memory Access

Commonly, the data being accumulated in a peripheral should be first stored in memory
before being processed by a program running on the microprocessor. Such temporary storage of data that is awaiting processing is called buffering. For example, packet data from an
would then have to wait for the DMA to complete). The peripheral does not recognize any difference between being connected to a DMA controller device or a microprocessor device: all the peripheral knows is that it asserts a request signal on the device, and then that device services the peripheral's request. We connect the DMA controller to two special pins of the microprocessor. One pin, which we'll call Dreq, is used by the DMA controller to request control of the bus. The other pin, which we'll call Dack, is used by the microprocessor to acknowledge to the DMA controller that bus control has been granted. Thus, unlike the peripheral, the microprocessor must be specially designed with these two pins in order to support DMA. The DMA controller also connects to all the system bus signals, including address, data, and control lines.

To achieve this we must have configured the DMA controller to know what addresses to access in the peripheral and the memory. Such setting of addresses may be done by a routine running on the microprocessor during system initialization. In particular, during initialization, the microprocessor writes to configuration registers in the DMA controller just as it would write to any other peripheral's registers. Alternatively, in an embedded system that is guaranteed not to change, we can hardcode the addresses directly into the DMA controller. In the example of Figure 6.17, we see two registers in the DMA controller holding the peripheral register address and the memory address.

Figure 6.16: Peripheral to memory transfer with DMA: summary of flow of actions.

Figure 6.17: Peripheral to memory transfer with DMA: flow of actions. 1(a): µP is executing its main program; it has already configured the DMA ctrl registers. 1(b): P1 receives input data in a register with address 0x8000. 2: P1 asserts req to request servicing by the DMA ctrl. 3: DMA ctrl asserts Dreq to request control of the system bus. 4: After executing instruction 100, µP sees Dreq asserted, releases the system bus, asserts Dack, and resumes execution; µP stalls only if it needs the system bus to continue executing. 5: (a) DMA ctrl asserts ack, (b) reads data from 0x8000, and (c) writes that data to 0x0001. 6: DMA ctrl deasserts Dreq and ack, completing the handshake with P1. 7(a): µP deasserts Dack and resumes control of the bus. No ISR needed!

During its control of the system bus, the DMA controller might transfer just one piece of data, but more commonly will transfer numerous pieces of data (called a block), one right after the other, before relinquishing the bus. This is because many peripherals, such as any peripheral that deals with storage devices (e.g., CD-ROM players or disk controllers) or that deals with network communication, send and receive data in large blocks. For example, a particular disk controller peripheral might read data in blocks of 128 words and store this data in a 128-word internal memory, after which the peripheral requests servicing (i.e., requests that this data be buffered in memory).

For the example just given, the DMA controller works as follows. The DMA controller gains control of the bus, makes 128 peripheral reads and memory writes, and only then relinquishes the bus. We must therefore configure the DMA controller to operate in either
single transfer mode or block transfer mode. For block transfer mode, we must configure a base address as well as the number of words in a block.

DMA controllers typically come with numerous channels. Each channel supports one peripheral. Each channel has its own set of configuration registers. Some modern peripherals come with DMA capabilities built into the peripheral itself.

Example: DMA I/O and the ISA Bus Protocol

In an earlier example, we introduced the basic ISA memory and peripheral I/O read and write bus cycles. In this example, we will introduce the DMA-related bus cycles. Our sample architecture is extended now to include a DMA controller as shown in Figure 6.18(a). In this figure, R denotes the DMA request signal and A denotes the DMA acknowledge signal.

Figure 6.18: DMA using the ISA bus protocol: (a) system architecture, (b) DMA write cycle, (c) DMA read cycle.

DMA is used to perform memory writes/reads to/from I/O devices directly without the intervention of the processor. Let us first look at the DMA memory write bus cycle. A DMA write bus cycle proceeds as follows. First, the processor programs the DMA controller to monitor a particular I/O device for available data. The processor also programs the DMA with the starting memory address where the data item is to be written to. Once the I/O device has available data, it generates a DMA request by asserting its DMA request line (DRQ). In response to this, the DMA controller will assert its DRQ to signal the processor. The processor then relinquishes the bus control signals and signals to the DMA controller with an acknowledgment (DACK). In response, the DMA will acknowledge the I/O device's DRQ by asserting its DACK. At this point, the actual transfer of data from the device to memory is initiated. Note that the actual DMA signals (DACKs and DRQs) are not part of the ISA protocol. The ISA protocol merely provides a scheme for performing an I/O read and a memory write in the same bus cycle. The DMA memory write bus cycle is shown in Figure 6.18(b).

Let us now look at the DMA memory read bus cycle. The DMA memory read bus cycle is almost identical to a DMA memory write bus cycle. The only difference is that /IOR is replaced with /IOW and /MEMW is replaced with /MEMR. In addition, the order in which the I/O write and memory read signals are asserted is reversed. The DMA memory read bus cycle is shown in Figure 6.18(c).

6.6 Arbitration

In our earlier discussions, several situations existed in which multiple peripherals might request service from a single resource. For example, multiple peripherals might share a single microprocessor that services their interrupt requests. As another example, multiple peripherals might share a single DMA controller that services their DMA requests. In such situations, two or more peripherals may request service simultaneously. We therefore must have some method to arbitrate among these contending requests. Specifically, we must decide which one of the contending peripherals gets service, and thus which peripherals need to wait. Several methods exist, which we now discuss.

Figure (priority arbiter flow of actions, partial): 4. Microprocessor stops executing its program and stores its state. 5. Microprocessor asserts Inta. 6. Priority arbiter asserts Iack1 to acknowledge Peripheral1. 7. Peripheral1 puts its interrupt address vector on the system bus. 8. Microprocessor jumps to the address of the ISR read from the data bus; the ISR executes and returns (and completes the handshake with the arbiter). 9. Microprocessor resumes executing program.
Priority arbiters typically use one of two common schemes to detennine priority among Peripheral I : Periphera12 Peripheral)
Inta I
peripherals: fixed priority or rotating priority. In fixed priority arbitration, each peripheral has ck_in Ack_ou ck_in Ack_out
hit eq_out Req_in 0
a unique rank among all the peripherals. The rank can be represented as a number, so if there
eq_out Req_in
are four peripherals, each peripheral is ranked l, 2, 3, or 4. If two peripherals simultaneously
seek servicing, the arbiter chooses the one with the higher rank. I
\---'-------',
In rotating priority arbitration (also called round-robin), the arbiter changes priority of
----------------- - --- I
peripherals based OD the history of servicing of those peripherals. For example, one rotating
(b)
priority scheme grants service to the least-recently serviced of the contending peripherals.
This scheme obviously requires a more complex arbiter.
Figure 6.20: Arbiuation using a daisy-chain configuration: (a) Daisy-chain aware peripherals, (b) adding logic to
We prefer fixed priority when .there is a clear difference in priority among peripherals.
make a peripheral daisy-chain aware; more complex logic will typically be necessary, ,however.
However, in many cases the peripherals are somewhat equal, so arbitrarily ranking them could
cause high-ranked peripherals to get much more servicing than low-ranked ones. Rotating
its request will flow through the downstream peripherals and eventually reach the
priority ensures a more equitable distribution of servicing in this case.
Notice that the priority arbiter is connected to the system bus, since the microprocessor can configure registers within the arbiter to set the priority schemes and/or the relative priorities of the devices. However, once configured, the arbiter does not use the system bus when arbitrating.
Priority arbiters represent another instance of a standard single-purpose processor. They are also often found built into other single-purpose processors like DMA controllers. A common type of priority arbiter arbitrates interrupt requests; this peripheral is referred to as an interrupt controller.

Daisy-Chain Arbitration
The daisy-chain arbitration method builds arbitration right into the peripherals. A daisy-chain configuration is shown in Figure 6.20(a), again using vectored interrupt to illustrate the method. Each peripheral has a request output and an acknowledge input, as before. But now each peripheral also has a request input and an acknowledge output. A peripheral asserts its request output if it requires servicing or if its request input is asserted; the latter means that one of the "upstream" devices is requesting servicing. Thus, if any peripheral needs servicing, its request will flow through the downstream peripherals and eventually reach the microprocessor. Even if more than one peripheral requests servicing, the microprocessor will see only one request. The microprocessor acknowledge signal connects to the first peripheral. If this peripheral is requesting service, it proceeds to put its interrupt vector address on the system bus. But if it doesn't need service, then it instead passes the acknowledgment upstream to the next peripheral, by asserting its acknowledge output. In the same manner, the next peripheral may either begin being serviced or may instead pass the acknowledgment along. Obviously, the peripheral at the front of the chain, i.e., the one to which the microprocessor acknowledge is connected, has highest priority, and the peripheral at the end of the chain has lowest priority.
We prefer a daisy-chain priority configuration over a priority arbiter when we want to be able to add or remove peripherals from an embedded system without redesigning the system. Although conceptually we could add as many peripherals to a daisy chain as we desired, in reality the servicing response time for peripherals at the end of the chain could become intolerably slow. In contrast to a daisy chain, a priority arbiter has a fixed number of channels; once they are all used, the system needs to be redesigned in order to accommodate more peripherals. However, a daisy chain has the drawback of not supporting more advanced priority schemes, like rotating priority. A second drawback is that if a peripheral in the chain stops working, other peripherals may lose their access to the processor.
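The request and acknowledge flow just described can be sketched in C. This is a simulation of the protocol, not production code; the array encoding and function names are our own, with index 0 standing for the peripheral wired closest to the microprocessor:

```c
#include <stdbool.h>

/* A peripheral asserts req_out if it needs service or its req_in is asserted,
   so the microprocessor sees a single request if any peripheral needs service. */
bool chain_req_out(const bool needs_service[], int n) {
    for (int i = 0; i < n; i++)
        if (needs_service[i]) return true;
    return false;
}

/* The acknowledge enters at the front of the chain; each peripheral either
   consumes it (if it was requesting) or passes it upstream. Returns the index
   of the serviced peripheral, or -1 if the acknowledge falls off the end. */
int chain_service(const bool needs_service[], int n) {
    for (int i = 0; i < n; i++)
        if (needs_service[i]) return i;   /* consumes the acknowledge */
    return -1;
}
```

Note how priority is purely positional: the peripheral closest to the microprocessor's acknowledge signal always wins, which is exactly why the end of a long chain can see intolerably slow servicing.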
www.compsciz.blogspot.in
Chapter 6: Interfacing
Although it appears from Figure 6.20(a) that each peripheral must be daisy-chain aware, in fact logic external to each peripheral can be used to carry out the daisy-chain logic. Figure 6.20(b) illustrates a simple form of such logic. Peripheral1 and Peripheral3 are both daisy-chain aware, whereas Peripheral2 is not. In order to incorporate Peripheral2 into the daisy-chain configuration, we must extend it to take care of requests and acknowledgments. Regarding requests, if Peripheral3 requests service or Peripheral2 requests service, then Peripheral1's req_in needs to be asserted. To accomplish this, we OR Peripheral2's req_out and Peripheral3's req_out and input the result to Peripheral1. Regarding acknowledgments, if Peripheral1's ack_out is asserted, then if Peripheral2 requested service, it should not pass this acknowledgment to Peripheral3, per the daisy-chain protocol. However, if Peripheral2 did not request service, then it should pass the acknowledgment to Peripheral3. To accomplish this, we use an inverter and an AND gate, as shown in the figure. Only if Peripheral1's ack_out is high and Peripheral2's req_out is low do we assert Peripheral3's ack_in. However, note that this logic is very simple in this case, whereas most peripherals will require more complex logic, even implementing a state machine, to convert the peripheral to a daisy-chain aware device.

[Figure 6.21: Architecture of a system using vectored interrupt and an interrupt table — the processor and memory share the system bus with a priority arbiter, whose MASK, IDX0, IDX1, and ENABLE registers are memory-mapped; the jump table resides in memory.]
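The glue logic for the non-aware Peripheral2 described above reduces to one OR gate, one inverter, and one AND gate. A truth-table sketch in C (the signal names follow the figure; encoding asserted/high as true is our choice):

```c
#include <stdbool.h>

/* Peripheral1's req_in: asserted if Peripheral2 or Peripheral3 requests. */
bool p1_req_in(bool p2_req_out, bool p3_req_out) {
    return p2_req_out || p3_req_out;      /* OR gate */
}

/* Peripheral3's ack_in: the acknowledgment passes through only when
   Peripheral1 acknowledges AND Peripheral2 did not itself request. */
bool p3_ack_in(bool p1_ack_out, bool p2_req_out) {
    return p1_ack_out && !p2_req_out;     /* inverter + AND gate */
}
```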
Network-Oriented Arbitration Methods
The arbitration methods described are typically used to arbitrate among peripherals in an embedded system. However, many embedded systems contain multiple microprocessors communicating via a shared bus; such a bus is sometimes called a network. Arbitration in such cases is typically built right into the bus protocol, since the bus serves as the only connection among the microprocessors. A key feature of such a connection is that a processor about to write to the bus has no way of knowing whether another processor is about to simultaneously write to the bus. Because of the relatively long wires and high capacitances of such buses, a processor may write many bits of data before those bits appear at another processor. For example, Ethernet and I2C use a method in which multiple processors may write to the bus simultaneously, resulting in a collision and causing any data on the bus to be corrupted. The processors detect this collision, stop transmitting their data, wait for some time, and then try transmitting again. The protocols must ensure that the contending processors don't start sending again at the same time, or must at least use statistical methods that make the chances of them sending again at the same time small.
As another example, the CAN bus uses a clever address encoding scheme such that if two addresses are written simultaneously by different processors using the bus, the higher-priority address will override the lower-priority one. Each processor that is writing the bus also checks the bus, and if the address it is writing does not appear, then that processor realizes that a higher-priority transfer is taking place and so that processor stops writing the bus.

Example: Vectored Interrupt Using an Interrupt Table
This is an example of a system using vectored interrupts as well as a vectored interrupt table. We will describe the software programming required to handle the interrupt requests. The relevant portions of the system architecture are shown in Figure 6.21. Here, two peripheral devices are connected to a two-channel priority arbiter with a fixed priority scheme (i.e., Peripheral1 has higher priority than Peripheral2). Both the peripherals and the arbiter are connected to the processor's memory bus and communicate with it using memory-mapped I/O. The interrupt table index placed on the memory bus (a.k.a. system bus) by the arbiter is software programmable through two memory-mapped registers. Both peripheral devices receive data from the external environment and raise their interrupts accordingly.
The software to initialize the peripherals and the priority arbiter, and to process the data received by our peripherals, is given in Figure 6.22. Let us now study the code. First, we define a number of variables that correspond to the registers inside the priority arbiter and peripheral devices. However, unlike ordinary variables in a program, these variables must refer to specific memory locations, namely, those that are mapped to the peripheral's registers. Normally, a compiler will place a variable somewhere in memory where storage for that variable's data is available. By using special keywords, we can force the compiler to place these variables at specific memory locations (e.g., in our compiler the keyword _at_ followed by a memory location is used to accomplish this). The priority arbiter, thus, has four registers located at memory locations 0xfff0 through 0xfff3. Note that our processor has a 16-bit memory address.
Next, we define two procedures, Peripheral1_ISR and Peripheral2_ISR, that handle the interrupts generated by the peripherals. Since we are using an interrupt jump table, these ISRs can be ordinary C procedures. Each ISR, of course, must perform necessary processing. Often, an ISR merely reads the data from the peripheral, places the data into a buffer, and sets a flag indicating to the main program that the buffer was updated.
Finally, we define the procedure InitializePeripherals. The procedure first configures the priority arbiter. We can select, in software, which interrupts we are willing to handle. This is done through the mask register. In our case, we set the first two bits of the mask register, indicating that we are to handle interrupts generated by both peripherals. Next, we program the priority arbiter with the indices into the jump table where the locations of the ISRs are stored. We have chosen to place these in locations 13 and 17, but this choice is arbitrary.
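A minimal sketch of InitializePeripherals consistent with this description might look as follows. The register names follow Figure 6.22, but the body is our reconstruction of the prose, not the book's verbatim code; the enable value, the use of a function-pointer table, and the omission of the compiler-specific _at_ address pinning (so the sketch compiles anywhere) are all our assumptions:

```c
/* Stand-ins for the memory-mapped registers and jump table; the book's
   compiler pins these to fixed addresses with "_at_ 0xfff0" and so on. */
unsigned char ARBITER_MASK_REG;
unsigned char ARBITER_CH0_INDEX_REG;
unsigned char ARBITER_CH1_INDEX_REG;
unsigned char ARBITER_ENABLE_REG;
void (*INTERRUPT_LOOKUP_TABLE[256])(void);

void Peripheral1_ISR(void) { /* read data register, process */ }
void Peripheral2_ISR(void) { /* read data register, process */ }

void InitializePeripherals(void) {
    ARBITER_MASK_REG      = 0x03;   /* first two bits: handle both interrupts */
    ARBITER_CH0_INDEX_REG = 13;     /* jump-table slot for Peripheral1's ISR  */
    ARBITER_CH1_INDEX_REG = 17;     /* jump-table slot for Peripheral2's ISR  */
    INTERRUPT_LOOKUP_TABLE[13] = Peripheral1_ISR;
    INTERRUPT_LOOKUP_TABLE[17] = Peripheral2_ISR;
    ARBITER_ENABLE_REG    = 1;      /* assumed: nonzero enables the arbiter   */
}
```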
[Figure 6.22 (excerpt): register declarations and Peripheral1_ISR.]

    unsigned char ARBITER_MASK_REG       _at_ 0xfff0;
    unsigned char ARBITER_CH0_INDEX_REG  _at_ 0xfff1;
    unsigned char ARBITER_CH1_INDEX_REG  _at_ 0xfff2;
    unsigned char ARBITER_ENABLE_REG     _at_ 0xfff3;
    unsigned char PERIPHERAL1_DATA_REG   _at_ 0xffe0;
    unsigned char PERIPHERAL2_DATA_REG   _at_ 0xffe1;
    void* INTERRUPT_LOOKUP_TABLE[256]    _at_ 0x0100;

    void Peripheral1_ISR(void) {
        unsigned char data;
        data = PERIPHERAL1_DATA_REG;
        // do something with the data
    }

6.7 Multilevel Bus Architectures

[Figure: a multilevel bus architecture — microprocessor, cache, memory controller, and DMA controller on a high-speed processor-local bus; peripherals on a separate peripheral bus.]

…peripherals, we could try to implement a single high-speed bus for all the communications, but this approach has several disadvantages. First, it requires each peripheral…
6.8 Advanced Communication Principles
In the preceding sections, we discussed basic methods of interfacing. These interfacing methods could be applied to interconnect components within an IC via on-chip buses, or to interconnect ICs via on-board buses. In the remainder of the chapter, we study more advanced interfacing concepts and look at communication from a more abstract point of view. In particular, we study parallel, serial, and wireless communication. We also describe some advanced concepts, such as layering and error detection, which are part of many communication protocols. Furthermore, we highlight some of the popular parallel, serial, and wireless communication protocols in use today.
Communication can take place over a number of different types of media, such as a single wire, a set of wires, radio waves, or infrared waves. We refer to the medium that is used to carry data from one device to another as the physical layer. Depending on the protocol, we may refer to an actor as a device or node. In either case, a device is simply a processor that uses the physical layer to send or receive data to or from another device.
In this section, we provide a general description of serial communication, parallel communication, and wireless communication. In addition, we describe communication principles such as layering, error detection and correction, data security, and plug and play.

Parallel Communication
Parallel communication takes place when the physical layer is capable of carrying multiple bits of data from one device to another. This means that the data bus is composed of multiple data wires in addition to control and possibly power wires, running in parallel from one device to another. Each wire carries one of the bits. Parallel communication has the advantage of high data throughput, if the length of the bus is short. The length of a parallel bus must be kept short because long parallel wires will result in high capacitance values, and transmitting a bit on a bus with a higher capacitance value will require more time to charge or discharge. In addition, small variations in the length of the individual wires of a parallel bus can cause the received bits of the data word to arrive at different times. Such misalignment of data becomes more of a problem as the length of a parallel bus increases. Another problem with parallel buses is the fact that they are more costly to construct and may be bulky, especially when considering the insulation that must be used to prevent the noise from each wire from interfering with the other wires. For example, a 32-wire cable connecting two devices together will cost much more and be larger than a two-wire cable.
In general, parallel communication is used when connecting devices that reside on the same IC or devices that reside on the same circuit board. Since the length of such buses is short, the capacitance load, data misalignment, and cost problems mentioned earlier do not play an important role.

Serial Communication
Serial communication involves a physical layer that carries one bit of data at a time. This means that the data bus is composed of a single data wire, along with control and possibly power wires, running from one device to another. In serial communication, a word of data is transmitted one bit at a time. Serial buses are capable of higher throughputs than parallel buses when used to connect two physically distant devices. The reason for this is that a serial bus will have less average capacitance, enabling it to send more bits per unit of time. In addition, a serial bus cable is cheaper to build because it has fewer wires. The disadvantage of a serial bus is that the interfacing logic and communication protocol will be more complex. On the sending side, a transmitter must decompose data words into bits, and on the receiving side, the receiver must compose bits into words.
Most serial bus protocols eliminate the need for extra control signals, such as read and write signals, by using the same wire that carries data for this purpose. This is performed as follows. When data is to be sent, the sender first transmits a bit called a start bit. A start bit merely signals the receiver to wake up and start receiving data. The start bit is then followed by N data bits, where N is the size of the word, and a stop bit. The stop bit signals to the receiver the end of the transmission. Often, both the transmitter and the receiver agree on the transmission speed used to send and receive data. After sending a start bit, the transmitter sends all N bits at the predetermined transmission speed. Likewise, on seeing a start bit, a receiver simply starts sampling the data at a predetermined frequency until all N bits are assembled. Another common synchronization technique is to use an additional wire for clocking purposes (see the I2C bus protocol). Here, the transmitter and receiver devices use this clock line to determine when to send or sample the data.

Wireless Communication
Wireless communication eliminates the need for devices to be physically connected in order to communicate. The physical layer used in wireless communication is typically either an infrared (IR) channel or a radio frequency (RF) channel.
Infrared uses electromagnetic wave frequencies that are just below the visible light spectrum, thus undetectable by the human eye. These waves can be generated by using an infrared diode and detected by using an infrared transistor. An infrared diode is similar to a red or green diode except that it emits infrared light. An infrared transistor is a transistor that conducts (i.e., allows current to flow from its source to its drain) when exposed to infrared light. A simple transmitter can send 1s by turning on its infrared diode and can send 0s by turning off its infrared diode. Correspondingly, a receiver will detect 1s when current flows through its infrared transistor and 0s otherwise. The advantage of using infrared communication is that it is relatively cheap to build transmitters and receivers. The disadvantage of using infrared is the need for line of sight between the transmitter and receiver, resulting in a very restricted communication range.
Radio frequency (RF) uses electromagnetic wave frequencies in the radio spectrum. A transmitter here will need to use analog circuitry and an antenna to transmit data. Likewise, a receiver will need to use an antenna and analog circuitry to receive data. One advantage of using RF is that, generally, a line of sight is not necessary and thus longer distance communication is possible. The range of communication is, of course, dependent on the transmission power used by the transmitter.
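The start-bit/data-bits/stop-bit framing described under Serial Communication above can be sketched in C. This is a simulation only; the 8-bit word size and least-significant-bit-first order are our assumptions (real protocols fix the bit order explicitly):

```c
#include <stdbool.h>

#define WORD_BITS 8

/* Frame one word: start bit (0), N data bits, stop bit (1).
   Writes the bits into line[] in transmission order; returns the count. */
int serial_frame(unsigned char word, bool line[]) {
    int n = 0;
    line[n++] = false;                    /* start bit wakes the receiver */
    for (int i = 0; i < WORD_BITS; i++)
        line[n++] = (word >> i) & 1;      /* LSB-first data bits */
    line[n++] = true;                     /* stop bit ends the frame */
    return n;
}

/* Receiver: having seen the start bit, sample N bits at the agreed rate. */
unsigned char serial_unframe(const bool line[]) {
    unsigned char word = 0;
    for (int i = 0; i < WORD_BITS; i++)   /* line[0] is the start bit */
        word |= (unsigned char)(line[1 + i] << i);
    return word;
}
```

Note how the transmitter and receiver never exchange a clock: both sides rely only on the agreed transmission speed, which is exactly the synchronization scheme the text describes.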
6.9 Serial Protocols

…communicate with each other using simple communication hardware. Based on the original specification of the I2C, data transfer rates of up to 100 kbit/s and 7-bit addressing are possible. Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus. With increased data transfer rate requirements, the I2C specification has been enhanced to include a fast mode (400 kbit/s) and a high-speed mode (3.4 Mbit/s), along with 10-bit addressing. Common devices capable of interfacing to an I2C bus include EEPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.
A simple I2C network is depicted in Figure 6.24(a). The bus consists of two wires: a data wire called the serial data line (SDA) and a clock wire called the serial clock line (SCL). The I2C specification does not limit the length of the bus wires, as long as the total capacitance of the bus remains under 400 pF. In this example, there are four devices attached to the bus. One of these devices, the microcontroller, is a master. The other three devices, a temperature sensor, an EEPROM, and an LCD controller, are servants. Each of these servant devices is assigned a unique address, as shown in Figure 6.24(a). Only master devices can initiate a data transfer on an I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data; this will depend on the function of the device. In our example, the microcontroller and EEPROM send and receive data, while the temperature sensor sends data and the LCD controller receives data. In Figure 6.24(a), arrows connecting the devices to the I2C bus wires depict the data movement direction. Normally, all servant devices residing on an I2C bus assert high impedance on the bus while the master device maintains logic high, signaling an idle condition.
All data transfers on an I2C bus are initiated by a start condition, shown in Figure 6.24(b): a high-to-low transition of the SDA line while the SCL signal is held high. All data transfers on an I2C bus are terminated by a stop condition, also shown in Figure 6.24(b): a low-to-high transition of the SDA line while the SCL signal is held high. Actual data is transferred between start and stop conditions. A typical I2C byte write cycle works as follows. The master device initiates the transfer with a start condition. Then, the address of the device that the byte is being written to is sent, starting with the most significant bit down to the least significant bit. Ones and zeros are sent as shown in Figure 6.24(b): the bit value is placed on the SDA line by the master device while the SCL line is low, and is maintained stable until after a clock pulse on SCL. If performing a write, right after sending the address of the receiving device, the master sends a zero. The receiving device in return acknowledges the transmission by holding the SDA line low during the first ACK clock cycle. Following the acknowledgment, the master device transmits a byte of data, starting with the most significant bit down to the least significant bit. The receiving device, in this case the servant, acknowledges the reception of data by holding the SDA line low during the second ACK clock cycle. If performing a read operation, the master initiates the transfer with a start condition, sends the address of the device that is being read, sends a one (logic high on the SDA line) requesting a read, and waits to receive an acknowledgment. Then, the sender sends a byte of data. The receiver, the master device in this case, acknowledges the reception of data and terminates the transfer by generating a stop condition. The timing diagram of a typical read/write cycle is depicted in Figure 6.24(c).

[Figure 6.24: I2C bus structure. (a) A microcontroller master and three servants — EEPROM (addr = 0x01), temperature sensor (addr = 0x02), and LCD controller (addr = 0x03) — sharing the SDA and SCL wires, with total bus capacitance under 400 pF. (b) Start condition, sending 0, sending 1, and stop condition. (c) A typical read/write cycle: start, address bits, R/W bit, ACK, data bits, ACK, stop.]

Embedded System Design

CAN
The controller area network (CAN) bus is a serial communication protocol for real-time applications, possibly carried over a twisted pair of wires. The protocol was developed by Robert Bosch GmbH to enable communication among various electronic components of cars as an alternative to expensive and cumbersome wiring harnesses. The robustness of the protocol has expanded its use to many other automation and industrial applications. Some characteristics of the CAN protocol include high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/s, 11-bit addressing, error detection, and confinement capabilities. The protocol is used in both high-speed and lower-speed applications. Common applications, other than automobiles, using CAN include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments. Among devices that incorporate a CAN interface
are the 8051-compatible 8592 processor and a variety of standalone CAN controllers, such as the 82C200 from Philips.
The CAN specification does not specify the actual layout and structure of the physical bus itself. Instead, it requires that a device connected to the CAN bus be able to transmit, or detect, on the physical bus, one of two signals called dominant or recessive. For example, a dominant signal may be represented as logic 0 and recessive as logic 1 on a single data wire. Furthermore, the physical CAN bus must guarantee that if one device asserts a dominant signal and another device simultaneously asserts a recessive signal, the dominant signal prevails. Given a physical CAN bus with the above-mentioned properties, the protocol defines data packet formats and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupted messages, and distinguish between a permanent failure of a node versus temporary errors.

USB
The Universal Serial Bus (USB) protocol is designed to make it easier for PC users to connect monitors, printers, digital speakers, modems, and input devices like scanners, digital cameras, joysticks, and multimedia game equipment. USB has two data rates: 12 Mbps for devices requiring increased bandwidth, and 1.5 Mbps for lower-speed devices like joysticks and game pads. USB uses a tiered star topology, which means that some USB devices, called USB hubs, can serve as connection ports for other USB peripherals. Only one device needs to be plugged into the PC. Other devices can then be plugged into the hub. USB hubs may be embedded in such devices as monitors, printers, and keyboards. Standalone hubs could also be made available, providing a handful of convenient USB ports right on the desktop. Hubs feature an upstream connection (pointed toward the PC) as well as multiple downstream ports to allow the connection of additional peripheral devices. Up to 127 USB devices can be connected together in this way.
USB host controllers manage and control the driver software and bandwidth required by each peripheral connected to the bus. Users don't need to do a thing, because all the configuration steps happen automatically. The USB host controller even allocates electrical power to the USB devices. Like USB host controllers, USB hubs can detect attachments and detachments of peripherals occurring downstream and supply appropriate levels of power to downstream devices. Since power is distributed through USB cables, with a maximum length of 5 meters, you no longer need a clunky AC power supply box for many devices.

6.10 Parallel Protocols
While PCI is a widely used industry standard, many other bus protocols are predominantly designed and used internally by various IC design companies. One such bus is the ARM bus, designed by the ARM Corporation and documented in ARM's application note 18. This bus is designed to interface with the ARM line of processors. The ARM bus supports 32-bit data transfer and 32-bit addressing and, similar to PCI, is implemented using a synchronous data transfer architecture. The transfer rate on an ARM bus is not specified; instead, it is a function of the clock speed used in a particular application. More specifically, if the clock speed on the ARM bus is denoted as X, then the transfer rate is …
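The dominant/recessive arbitration described for CAN above can be sketched as a simulation. Encoding dominant as 0 and recessive as 1, a wired-AND bus resolves simultaneous writes; the 11-bit identifier width matches CAN, but the two-node setup and function names are our own:

```c
#include <stdbool.h>

#define ID_BITS 11   /* CAN uses 11-bit addressing */

/* Two nodes transmit their identifiers MSB-first. Each node monitors the
   bus; the moment it reads a dominant (0) bit while it sent a recessive (1)
   bit, it stops writing. Returns the identifier that wins arbitration. */
unsigned can_arbitrate(unsigned id_a, unsigned id_b) {
    bool a_active = true, b_active = true;
    for (int i = ID_BITS - 1; i >= 0; i--) {
        int bit_a = a_active ? ((id_a >> i) & 1) : 1;  /* silent = recessive */
        int bit_b = b_active ? ((id_b >> i) & 1) : 1;
        int bus = bit_a & bit_b;          /* wired-AND: dominant 0 prevails */
        if (a_active && bit_a != bus) a_active = false;
        if (b_active && bit_b != bus) b_active = false;
    }
    return a_active ? id_a : id_b;
}
```

Numerically lower identifiers thus win arbitration without corrupting the bus, which is why CAN identifiers double as message priorities.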
6.11 Wireless Protocols
In this section, we briefly introduce three new and emerging wireless protocols, namely IrDA, Bluetooth, and IEEE 802.11.

IrDA
The Infrared Data Association (IrDA) is an international organization that creates and promotes interoperable, low-cost, infrared data interconnection standards that support a walk-up, point-to-point user model. Their protocol suite, also commonly referred to as IrDA, is designed to support transmission of data between two devices over short-range point-to-point infrared at speeds between 9.6 kbps and 4 Mbps. IrDA is that small, semitransparent, red window that you may have wondered about on your notebook computer. Over the last several years, IrDA hardware has been deployed in notebook computers, printers, personal digital assistants, digital cameras, public phones, and even cell phones. One of the reasons for this has been the simplicity and low cost of IrDA hardware. Unfortunately, until recently, the hardware has not been available for applications programmers to use because of a lack of suitable protocol drivers.
Microsoft Windows CE 1.0 was the first Windows operating system to provide built-in IrDA support. Windows 2000 and Windows 98 now also include support for the same IrDA programming APIs that have enabled file sharing applications and games on Windows CE. IrDA implementations are becoming available on several popular embedded operating systems.

Bluetooth
Bluetooth is a new and global standard for wireless connectivity. This protocol is based on a low-cost, short-range radio link. The radio frequency used by Bluetooth is globally available. When two Bluetooth-equipped devices come within 10 meters of each other, they can establish a connection. Because Bluetooth uses a radio-based link, it doesn't require a line-of-sight connection in order to communicate. For example, your laptop could send information to a printer in the next room, or your microwave oven could send a message to your cordless phone telling you that your meal is ready. In the future, Bluetooth is likely to be standard in tens of millions of mobile phones, PCs, laptops, and a whole range of other electronic devices.

IEEE 802.11
IEEE 802.11 is an IEEE-proposed standard for wireless local area networks (LANs). There are two different ways to configure a network: ad hoc and infrastructure. In the ad-hoc network, computers are brought together to form a network on the fly. Here, there is no structure to the network, there are no fixed points, and usually every node is able to communicate with every other node. Although it seems that order would be difficult to maintain in this type of network, special algorithms have been designed to elect one machine as the master station of the network, with the others being servants. Another algorithm in ad-hoc network architectures uses a broadcast and flooding method to all other nodes to establish who's who. The second type of network structure used in wireless LANs is the infrastructure. This architecture uses fixed network access points with which mobile nodes can communicate. These network access points are sometimes connected to landlines to widen the LAN's capability by bridging wireless nodes to other wired nodes. If service areas overlap, handoffs can occur. This structure is very similar to the present-day cellular networks around the world.
The IEEE 802.11 protocol places specifications on the parameters of both the physical (PHY) and medium access control (MAC) layers of the network. The PHY layer, which actually handles the transmission of data between nodes, can use direct sequence spread spectrum, frequency-hopping spread spectrum, or infrared pulse position modulation. IEEE 802.11 makes provisions for data rates of either 1 Mbps or 2 Mbps, and calls for operation in the 2.4 to 2.4835 GHz frequency band, which is an unlicensed band for industrial, scientific, and medical applications, and 300 to 428,000 GHz for IR transmission. Infrared is generally considered to be more secure against eavesdropping, because IR transmissions require absolute line-of-sight links (no transmission is possible outside any simply connected space or around corners), as opposed to radio frequency transmissions, which can penetrate walls and be intercepted by third parties unknowingly. However, infrared transmissions can be adversely affected by sunlight, and the spread-spectrum protocol of IEEE 802.11 does provide some rudimentary security for typical data transfers.
The MAC layer is a set of protocols which is responsible for maintaining order in the use of a shared medium. The IEEE 802.11 standard specifies a carrier sense multiple access with collision avoidance (CSMA/CA) protocol. In this protocol, when a node receives a packet to be transmitted, it first listens to ensure no other node is transmitting. If the channel is clear, it then transmits the packet. Otherwise, it chooses a random backoff factor, which determines the amount of time the node must wait until it is allowed to transmit its packet. During periods in which the channel is clear, the transmitting node decrements its backoff counter. When the backoff counter reaches zero, the node transmits the packet. Since the probability that two nodes will choose the same backoff factor is small, collisions between packets are minimized. Collision detection, as is employed in Ethernet, cannot be used for the radio frequency transmissions of IEEE 802.11. The reason for this is that when a node is transmitting it cannot hear any other node in the system that may be transmitting, since its own signal will drown out any others arriving at the node.
Whenever a packet is to be transmitted, the transmitting node first sends out a short ready-to-send (RTS) packet containing information on the length of the packet. If the receiving node hears the RTS, it responds with a short clear-to-send (CTS) packet. After this exchange, the transmitting node sends its packet. When the packet is received successfully, as determined by a cyclic redundancy check, the receiving node transmits an acknowledgment (ACK) packet.
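The CSMA/CA backoff procedure described above can be sketched as a small simulation. The discrete-time channel model and function names are our own; real 802.11 timing rules (slot times, interframe spaces) are far more detailed:

```c
#include <stdbool.h>

/* One contender: defers while the channel is busy, counts its backoff down
   only during clear steps, and transmits once the counter reaches zero.
   channel_busy[] models the medium over discrete time steps; returns the
   step at which the node transmits, or -1 if time runs out. */
int csma_ca_transmit(int backoff, const bool channel_busy[], int steps) {
    for (int t = 0; t < steps; t++) {
        if (channel_busy[t])
            continue;          /* defer: someone else is transmitting */
        if (backoff == 0)
            return t;          /* channel clear, counter exhausted: send */
        backoff--;             /* channel clear: decrement the counter */
    }
    return -1;
}
```

Because two nodes rarely draw the same random backoff, the node with the smaller counter reaches zero first and the other still sees a busy channel, which is how collisions are minimized without collision detection.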
Chapter 6: Interfacing

6.12 Summary

Interfacing processors and memory represents a challenging design task. Timing diagrams provide a basic means for us to describe interface protocols. Thousands of protocols exist, but they can be better understood by understanding basic protocol concepts like actors, data direction, addresses, time multiplexing, and control methods. A general-purpose processor typically has either a bus-based I/O structure or a port-based I/O structure for interfacing. Interfacing with a general-purpose processor is the most common interfacing task and involves three key concepts. The first is the processor's approach for addressing external data locations, known as its I/O addressing approach, which may be memory-mapped I/O or standard I/O. The second is the processor's approach for handling requests for servicing by peripherals, known as its interrupt handling approach, which may be fixed or vectored. The third is the ability of peripherals to directly access memory, known as direct memory access. Interfacing also leads to the common problem of more than one processor simultaneously seeking access to a shared resource such as a bus, requiring arbitration. Arbitration may be carried out using a priority arbiter or using daisy chain arbitration. A system often has a hierarchy of buses, such as a high-speed processor local bus and a lower-speed peripheral bus. Communication protocols may carry out parallel or serial communication, and may use wires, infrared, or radio frequencies as the transmission medium. Communication protocols may include extra bits for error detection and correction, and typically involve layering as an abstraction mechanism. Popular serial protocols include I2C, CAN, FireWire, and USB. Popular parallel protocols include PCI and ARM. Popular serial wireless protocols include IrDA, Bluetooth, and IEEE 802.11.

6.13 References and Further Reading

• VSI Alliance, On-Chip Bus Development Working Group, Specification 1 version 1.0, "On-Chip Bus Attributes," August 1998, http://www.vsi.org.
• L. Eggebrecht, Interfacing to the IBM Personal Computer. Indianapolis, IN: SAMS, Macmillan Computer Publishing, 1990.
• Peter W. Gofton, Mastering Serial Communications. Alameda, CA: SYBEX Inc., 1994.
• Bob O'Hara and Al Petrick, IEEE 802.11 Handbook: A Designer's Companion. Piscataway, NJ: Standards Information Network, IEEE Press, 1999.
• John Hyde, USB Design by Example. New York: John Wiley & Sons, Inc., 1999.

6.14 Exercises

6.1 Draw the timing diagram for a bus protocol that is handshaked, nonaddressed, and transfers 8 bits of data over a 4-bit data bus.
6.2 Explain the difference between port-based I/O and bus-based I/O.
6.3 Show how to extend the number of ports on a 4-port 8051 to 8 by using extended parallel I/O. (a) Using block diagrams for the 8051 and the extended parallel I/O device, draw and label all interconnections and I/O ports. Clearly indicate the names and widths of all connections. (b) Give C code for a function that could be used to write to the extended ports.
6.4 Discuss the advantages and disadvantages of using memory-mapped I/O versus standard I/O.
6.5 Explain the benefits that an interrupt address table has over fixed and vectored interrupt methods.
6.6 Draw a block diagram of a processor, memory, and peripheral connected with a system bus, in which the peripheral gets serviced using vectored interrupt. Assume servicing moves data from the peripheral to the memory. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. Use symbolic values for addresses. Provide a timing diagram illustrating what happens over the system bus during the interrupt.
6.7 Draw a block diagram of a processor, memory, peripheral, and DMA controller connected with a system bus, in which the peripheral transfers 100 bytes of data to the memory using DMA. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. Draw a timing diagram showing what happens during the transfer; skip the 2nd through 99th bytes.
6.8 Repeat problem 6.7 for a daisy-chain configuration.
6.9 Design a parallel I/O peripheral for the ISA bus. Provide: (a) a state-machine description and (b) a structural description.
6.10 Design an extended parallel I/O peripheral. Provide: (a) a state-machine description and (b) a structural description.
6.11 List the three main transmission mediums described in the chapter. Give two common applications for each.
6.12 Assume an 8051 is used as a master device on an I2C bus with pin P1.0 corresponding to I2C_Data and pin P1.1 corresponding to I2C_Clock. Write a set of C routines that encapsulate the details of the I2C protocol. Specifically, write the routines StartI2C and StopI2C, which send the appropriate start/stop signal to slave devices. Likewise, write the routines ReadByte and WriteByte, each taking a device id as input and performing the appropriate I/O actions.
6.13 Select one of the following serial bus protocols, then perform an Internet search for information on transfer rate, addressing, error correction (if applicable), and plug-and-play capability (if applicable). Then give timing diagrams for a typical transfer of data (e.g., a write operation). The protocols are USB, I2C, Fibre Channel, SMBus, IrDA, or any other serial bus in use by the industry and not described in this book.
6.14 Select one of the following parallel bus protocols, then perform an Internet search for information on transfer rate, addressing, DMA and interrupt control (if applicable), and plug-and-play capability (if applicable). Then give timing diagrams for a typical transfer of data (e.g., a write operation). The protocols are STD 32, VME, SCSI, ATAPI, Micro Channel, or any other parallel bus in use by the industry and not described in this book.

CHAPTER 7: Digital Camera Example

7.1 Introduction
7.2 Introduction to a Simple Digital Camera
7.3 Requirements Specification
7.4 Design
7.5 Summary
7.6 References and Further Reading
7.7 Exercises
7.1 Introduction

In the previous chapters, we introduced general-purpose processors, custom single-purpose processors, standard single-purpose processors, memory, and techniques for interfacing processors and memory. In this chapter, we apply this knowledge to design a simple digital camera. In particular, we will examine the trade-offs of using general-purpose versus single-purpose processors to implement the necessary camera functionality. We will see that choosing a good partitioning of functionality among the different processor types is essential to building a good design. This in turn requires a unified view of different processor types, as this book has thus far stressed.

We begin with a general introduction to digital cameras and their inner workings. We then develop the camera's specifications, which describe the desired behavior as well as constraints on design metrics like performance, size, and power. We explore several alternative implementations of the digital camera and compare their design metrics. The advent of systems-on-a-chip and high-capacity flash memory has made such cameras possible.
7.2 Introduction to a Simple Digital Camera

User's Perspective

From a user's point of view, a simple digital camera works as follows. The user turns on the digital camera, points the camera lens to the scene to be photographed, and clicks the "shutter" button. The user can repeat these steps until up to N images are stored internally in the camera. Here, N is a constant that depends on the model of the camera, which in turn depends on the amount of memory in the camera and the number of bits used per image. The user may also attach the digital camera to a PC, say, by using a serial cable, to download the photos to a hard disk for permanent storage.

Designer's Perspective

From a designer's point of view, a simple digital camera performs two key tasks. The first task is that of processing images and storing them in internal memory. The second task is that of uploading the images serially to an attached PC.

The task of processing and storing images is initiated when the user presses the shutter button. At this point, the image is captured and converted to digital form by a charge-coupled device (CCD). Then, the image is processed and stored in internal memory. The task of uploading the image is initiated when the user attaches the digital camera to a PC and uses special software to command the digital camera to transmit the archived images serially. Let us look at these actions in more detail.

Figure 7.1: Internals of a charge-coupled device (CCD). (Annotations: the lens area focuses light onto the cells. When exposed to light, each cell becomes electrically charged; this charge can then be converted to an 8-bit value, where 0 represents no exposure and 255 represents very intense exposure of that cell to light. The electromechanical shutter is activated to expose the cells to light for a brief moment. Some of the columns of pixels are covered with a black strip of paint; the light intensity of these pixels is used for zero-bias adjustments of all the cells. The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.)

A CCD is a special sensor that captures an image. A CCD is a light-sensitive silicon solid-state device composed of many small cells. The light falling on a cell is converted into a small amount of electric charge, which is then measured by the CCD electronics and stored as a number. The number usually ranges from 0, meaning no light, to 255 or 65,535, meaning very intense light per pixel. Figure 7.1 illustrates the internals of a CCD. On the periphery, a CCD is composed of a mechanical shutter. This is a screen that normally blocks the light from falling on the light-sensitive surface. When activated, the screen opens momentarily and allows light to hit the light-sensitive surface, charging the cells with electrical energy that is proportional to the amount of light passed in. The screen typically sits behind an optical lens that focuses the scene observed through the viewfinder onto the light-sensitive surface of the CCD. A CCD also has internal circuitry that measures the electric charge of each cell, converts it to a digital value, and provides an interface for outputting the data.

Due to manufacturing errors, the light-sensitive cells of a CCD may always measure the light intensity to be slightly above or below the actual value. This error, called the zero-bias error, is typically the same across columns but different across rows. For this reason, some of the columns of a CCD's light-sensitive cells are blocked by a strip of black paint. The actual intensity registered by these blocked cells should be zero. Therefore, a reading other than zero indicates the zero-bias error for that row. Figure 7.1 shows the covered cells. This becomes clearer as we give an example in the next paragraphs.

A digital camera uses a CCD to capture an image. Once the image is captured, it must be corrected to eliminate the zero-bias error. Then, the image must be encoded using the JPEG encoding scheme. The task of bias adjustment is described next.

Figure 7.2 shows a raw image block of size 8 x 8 pixels that is captured using a CCD of that size. Normally, the CCD would be of much greater resolution, say 640 x 480 pixels, but we use a small one to be able to illustrate the various operations of a digital camera in this chapter. Notice in Figure 7.2(a) that there are 10 columns. As mentioned earlier, the last two columns are extra and are used to detect zero-bias. Recall that these two columns are covered and should normally read a value of zero. Looking at the last two columns of the first row, we see that the measured light intensity is on the average 13 units larger than the actual light intensity. We obtain 13 by averaging the last two columns ((12 + 14) / 2 = 13). We can thus correct the error for this row by subtracting 13 from each element of the first row. We can repeat this process for each row to obtain a block of 8 x 8 pixels that has been corrected for zero-bias errors. The corrected block is given in Figure 7.2(b).

The next step is to compress the image, which reduces the number of bits needed to store the image in memory. Compression allows us to store more images in a limited amount of memory. Compressed images can also be transmitted to a PC in less time. We'll perform JPEG encoding of the image. JPEG is a popular standard format for representing digital images in a compressed form. JPEG, pronounced "jay-peg," is short for Joint Photographic Experts Group. The word joint refers to the group's status as a committee working on both ISO and ITU-T standards. Their best-known standard is for still-image compression.
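Before turning to the encoding steps, the row-by-row bias correction described above can be sketched in C. This is our own illustrative fragment, not the book's code; it assumes an 8-row buffer with 10 columns, the last two of which are the covered cells, as in Figure 7.2(a):

```c
#define NUM_ROWS 8
#define NUM_COLS 10   /* 8 image columns plus 2 covered columns */

/* For each row, average the two covered cells to estimate that
   row's zero-bias error, then subtract the bias from every cell
   in the row, yielding a corrected 8 x 8 image block. */
void zero_bias_adjust(int image[NUM_ROWS][NUM_COLS]) {
    for (int r = 0; r < NUM_ROWS; r++) {
        int bias = (image[r][NUM_COLS - 2] + image[r][NUM_COLS - 1]) / 2;
        for (int c = 0; c < NUM_COLS; c++)
            image[r][c] -= bias;
    }
}
```

With the first row of Figure 7.2(a), the covered cells read 12 and 14, so the bias is (12 + 14) / 2 = 13, and 13 is subtracted from every element of that row.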
JPEG encoding provides for a number of different modes of operation. For a full coverage of the JPEG encoding, the reader is referred to the reference section at the end of this chapter. The mode that we discuss in this chapter is an encoding that provides for high compression ratios using the discrete cosine transform (DCT). To compress an image, the image data is divided into blocks of 8 x 8 pixels each. Each block is then processed in three steps. The first step performs the DCT, the second step performs quantization, and the last step performs Huffman encoding.

The DCT step transforms our original 8 x 8 pixel block into a cosine-frequency domain. Once in this form, the upper-left corner values of the transformed data represent more of the essence of the image while the lower-right corner values represent finer details. We can, therefore, reduce the precision of these lower-right corner values to facilitate compression while retaining reasonable overall image quality. The actual DCT operation is given in this formula:

C(h) = if (h == 0) then 1/sqrt(2) else 1.0
F(u,v) = 1/4 x C(u) x C(v) Σx=0..7 Σy=0..7 Dxy x cos(π(2x + 1)u / 16) x cos(π(2y + 1)v / 16)

Here, C(h) is simply an auxiliary function used in the main equation, namely, F(u,v). The function F(u,v) gives the encoded pixel at row u, column v. Dxy is the original pixel value at row x, column y. Of course, it would be useless to have a DCT transform if we are unable to reverse the process and obtain the original. Below is the inverse DCT (IDCT), although it is not necessary in the implementation of our simple digital camera:

C(h) = if (h == 0) then 1/sqrt(2) else 1.0
f(x,y) = 1/4 x Σu=0..7 Σv=0..7 C(u) x C(v) x Euv x cos(π(2u + 1)x / 16) x cos(π(2v + 1)y / 16)

Again, C(h) is simply an auxiliary function used in the main equation, namely, f(x,y). The function f(x,y) gives the original pixel at row x, column y. Euv is the DCT-encoded pixel value, using the previous equation, for row u and column v. Figure 7.3(a) shows the DCT-encoded values for our sample block of 8 x 8 pixels. The inverse process will obtain the block in Figure 7.2(b) from that in Figure 7.3(a). The DCT is sometimes distinguished from the IDCT by referring to the DCT as the forward DCT, or FDCT.

The next processing step is to reduce the quality of the encoded DCT image, which helps us compress the image. We do this by reducing the bit precision of the encoded data. Note that if we represent the pixels with less precision, we will need fewer bits to encode them, thus achieving compression. For example, we can divide all the values by some factor of 2 (since division by a factor of 2 is achieved simply by right shifts), such as 8. This is the step where we actually lose image quality in order to achieve high compression ratios. This process is referred to as quantization. To decompress, we would perform a dequantization. In other words, we would multiply each pixel by the same factor of 2 (i.e., 8 in our example). Figure 7.3(b) illustrates the quantization applied to the block of 8 x 8 shown in Figure 7.3(a).

The last step of the JPEG compression is the encoding of data. Here, the block of 8 x 8 pixels is first serialized. Specifically, the values are converted into a single list according to a zigzag pattern, as shown in Figure 7.4. Then, the values are Huffman encoded. Huffman encoding is a minimal variable-length encoding based on the frequency of each pixel. In other words, the frequently occurring pixels will be assigned a short binary code while those that don't occur as frequently will be assigned a longer code. Let us explain that with an example. In Figure 7.5(a), we have given the frequency of pixel occurrence of the encoded and quantized 8 x 8 block shown in Figure 7.3(b). Here, as shown, the encoded pixel value -1 occurs fifteen times while the encoded pixel value 14 occurs only one time.

From this information, we construct a Huffman tree as illustrated in Figure 7.5(b). With each node in such a tree, we associate a value that is computed as follows. For an internal node, the value is the sum of the values of the children of that node. For a leaf node, the value is the frequency of occurrence of the pixel being represented by that leaf node. The tree is constructed from the bottom up (i.e., starting from leafs and working up toward the root).
Figure 7.4: Data encoding sequence of a block of 8 x 8 pixels.

Initially, we create a leaf node for each of the pixels and initialize the values of these nodes according to the pixel's frequency. Then we create an internal node by joining any two nodes that will result in the minimum value. We repeat this process until we have a complete binary tree. Once the Huffman tree is constructed, we can obtain a binary code for each of the pixel values by traversing the tree starting at the root down to the leaf labeled with that pixel. While traversing the tree, we construct a binary string. Each time we traverse down past a right child we append a '1' to our binary string, whereas each time we traverse down a left child we append a '0' to our binary string. For example, in order to obtain the binary code for the pixel value -3 in Figure 7.5(b), we would make four right traversals and a left traversal, thus obtaining the binary string "11110". Figure 7.5(c) gives the Huffman codes for the remaining values.

Figure 7.5: Huffman encoding of the block of 8 x 8 pixels shown in Figure 7.3(b): (a) the pixel values and associated frequencies, (b) the resulting Huffman tree, and (c) the Huffman codes. The table pairs each pixel value and frequency with its code: -1 (15x) → 00; 0 (8x) → 100; -2 (6x) → 110; 1 (5x) → 010; 2 (5x) → 1110; 3 (5x) → 1010; 5 (5x) → 0110; -3 (4x) → 11110; -5 (3x) → 10110; -10 (2x) → 01110; the values 144, -9, -8, -4, 6, and 14 each occur once and receive 6-bit codes (e.g., 111111, 111110, 101111, 101110).
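The bottom-up construction just described (repeatedly joining the two roots of smallest value) can be sketched as follows. This is an illustrative fragment, not code from the book; for brevity it records only each leaf's depth, which equals the length of that pixel's Huffman code, and it uses a simple linear scan in place of a priority queue:

```c
#define MAX_NODES 64

/* Return the unused node of smallest value, or -1 if none left. */
static int pick_min(const int value[], const int used[], int count) {
    int best = -1;
    for (int i = 0; i < count; i++)
        if (!used[i] && (best < 0 || value[i] < value[best]))
            best = i;
    return best;
}

/* Nodes 0..n-1 are leaves with the given frequencies; internal
   nodes are appended as the two smallest-valued roots are joined.
   On return, depth[i] is leaf i's tree depth = its code length. */
void huffman_code_lengths(const int freq[], int n, int depth[]) {
    int value[2 * MAX_NODES], parent[2 * MAX_NODES], used[2 * MAX_NODES];
    int count = n;
    for (int i = 0; i < n; i++) {
        value[i] = freq[i];
        used[i] = 0;
        parent[i] = -1;
    }
    while (1) {
        int a = pick_min(value, used, count);
        used[a] = 1;
        int b = pick_min(value, used, count);
        if (b < 0) { used[a] = 0; break; }  /* only one root remains */
        used[b] = 1;
        value[count] = value[a] + value[b];  /* internal node's value */
        used[count] = 0;
        parent[count] = -1;
        parent[a] = parent[b] = count;
        count++;
    }
    for (int i = 0; i < n; i++) {           /* walk up to the root */
        int d = 0;
        for (int p = parent[i]; p >= 0; p = parent[p]) d++;
        depth[i] = d;
    }
}
```

The weighted sum of depth x frequency over all leaves gives the total number of bits the encoded block will occupy, which is exactly the quantity Huffman's construction minimizes (ties may permute codes of equal length between trees).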
Given these Huffman codes, we encode our block of 8 x 8 pixels by creating a long string of 0s and 1s. Here we take the sequence of pixels generated by the zigzag ordering shown in Figure 7.4, and for each pixel we output the Huffman binary code. In our example of Figure 7.4, we would obtain the binary string "111111011001110...".

As stated earlier, Huffman encoding achieves compression by assigning a short binary code to the most frequently appearing pixel values, while leaving longer binary codes for the least frequently appearing pixels. Of course, this process is reversible since Huffman encoding also ensures that no two codes are a prefix of each other.

Our next processing step is to archive our image. This step is rather easy. We simply record the starting address and size of each image. We can use a linked list data structure to record this information. If we know beforehand that the camera will hold at most N images, we can set aside a portion of memory for our N addresses and N image-size variables. In addition, we would need to keep a counter that tells us the location of the next available address in memory. For example, initially, all N addresses and image-size variables might be set to 0. Our global memory address will be set to N x 4, assuming that the address and image-size variables occupy the initial N x 4 bytes in memory. Then, the first image will be archived in memory starting at location N x 4. Assuming the image was of size 1024, then we will update our global memory address to N x 4 + 1024, and so on. Of course, there are other ways to perform such archiving. In any event, our memory requirement will be based on N, the image size, and the average compression ratio that we can obtain using JPEG encoding.

Finally, the only processing task that remains is to upload the images and free the space in memory when a PC is connected to the camera and an upload command is received. To accomplish this, we use a UART. As you'll recall, a UART transmits data serially over a single data wire. Our processing task will be to read the images from memory and transmit them using the UART. As we transmit images, we reset the pointers, image-size variables, and the global memory pointer accordingly.

It must be noted again that our description of a digital camera is very simple. A real digital camera will enable you to take pictures of varied sizes, display images on an LCD, allow image deletion, perform advanced image processing such as digitally stretching, zooming in and out, and many other things.

7.3 Requirements Specification

Our digital camera product's life begins with a requirements specification. A specification describes what a particular system should do, namely the system's requirements. Specifications include both functional and nonfunctional requirements. Functional
requirements describe the system's behavior, meaning the system's outputs as a function of inputs (e.g., "output X should equal input Y times 2"). Nonfunctional requirements describe constraints on design metrics (e.g., "the system should use 0.001 watt or less"). The initial specification of a system may be very general and may come from our company's marketing department. The initial specification for our camera might be a short document detailing the market need for "a very basic low-end digital camera capable of capturing and storing at least 50 low-resolution images and uploading such images to a PC, costing around $100, with a single medium-sized IC costing less than $25, including amortized NRE costs. Battery life should be as long as possible. Expected sales volume is 200,000 if market entry is earlier than 6 months, and 100,000 if market entry is between 6 to 12 months. Beyond 12 months, this product will not sell in significant quantities."
Let us begin by discussing the nonfunctional requirements in more detail, followed by an informal high-level functional specification, and then a more detailed description of behavior.

Figure 7.6: Functional block-diagram specification of a digital camera. (The flowchart shows the CCD input passing through zero-bias adjust, DCT, quantize, and archive-in-memory functions, and a transmit-serially function producing the serial output, e.g., 011010...)

Nonfunctional Requirements

Given our initial requirements specification, we might want to pay attention to several design metrics in particular: performance, size, power, and energy. Performance is the time required to process an image. Size is the number of elementary logic gates (such as a two-input NAND gate) in our IC. Power is a measure of the average electrical energy consumed by the IC while processing an image. Energy is power times time, which directly relates to battery lifetime. Some of these metrics will be constrained metrics: those metrics must have values below (or in some cases above) a certain threshold. Some metrics may be optimization metrics: those metrics should be improved as much as possible, since this optimization improves the product. A metric can be both a constrained and optimization metric.
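To make the energy relation concrete, the following tiny sketch uses this chapter's illustrative constraints (200 milliwatts, 1 second per image) plus a made-up 1 watt-hour battery capacity; halving either power or processing time halves the energy drawn per image:

```c
/* Energy = power x time; 0.2 W x 1 s = 0.2 J per image with this
   chapter's constraints. */
double energy_per_image(double power_watts, double seconds) {
    return power_watts * seconds;
}

/* Upper bound on images processed per charge, ignoring capture,
   standby, and upload energy. Battery capacity is hypothetical. */
double images_per_charge(double battery_watt_hours, double joules_per_image) {
    return battery_watt_hours * 3600.0 / joules_per_image;
}
```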
Regarding performance, our design must process images fast enough to be useful. We might determine that a reasonable timing constraint is 1 second per image. Note that the terms timing and performance are often used interchangeably. More time than 1 second would probably be quite annoying from a camera user's perspective. Imagine having to wait 10 seconds after pressing the shutter button before you could press the button again. A typical soccer parent would probably not buy such a camera, for fear of missing a great goal! On the other hand, since we are aiming for the low-end of the digital camera market, our performance doesn't need to be much better than 1 second. Thus, performance is a constrained metric but not an optimization metric: anything less than 1 second is equally good.

Regarding size, our design must use an IC that fits in a reasonably sized camera. Suppose that, based on current technology, we determine that our IC has a size constraint of 200,000 gates. In addition to being a constrained metric, size is also an optimization metric, since smaller ICs are generally cheaper. They are cheaper because we can either get higher yield from a current technology or use an older and hence cheaper technology.

Finally, power is a constrained metric because the IC must operate below a certain temperature. Note that our digital camera cannot use a fan to cool the IC, so low power operation is crucial. Let's assume we determine the power constraint to be 200 milliwatts. Energy will be an optimization metric because we want the battery to last as long as possible. Notice that reducing power or time each reduces energy.

Informal Functional Specification

We can describe the high-level functionality of the digital camera by using the flowchart in Figure 7.6. We see the major functions involved in image capture, namely zero-bias adjust, DCT, quantize, and archive in memory. We also see the function transmit serially. We could then describe each function's details in English; we omit such descriptions here since they were included earlier in the chapter. We'll assume a very low-quality image with a 64 x 64 resolution, meaning the CCD has 64 rows and 64 columns.

Note that Figure 7.6 does not dictate that each of the blocks be mapped onto a distinct processor. Instead, the description only aids in capturing the functionality of the digital camera by breaking that functionality down into simpler functions. The functions could be implemented on any combination of single-purpose and general-purpose processors.

Refined Functional Specification

We can now concentrate on refining the informal functional specification into one that can actually be executed. This typically consists of a C or C++ program describing the functionality. In our case, we could write C or C++ code to describe each function in Figure 7.6. Such a software prototype of the system is often referred to as a system-level model, a
prototype, or simply a model, though the prototype is also a first implementation. Keep in mind that one person's specification may be another person's implementation.

The software prototype can be executed on our development computer to verify its correctness. It can also provide insight into the operations of our system. For example, in our digital camera, we can profile our executable specification as it is running, in order to find the computationally intensive functions. Recall that a profiling tool is a tool that watches a program under execution and records the number of times a particular procedure or function call was made, or a variable was written or read. We can also use the prototype to obtain sample output that is later used to verify the correctness of our final implementation. For example, we can run an image through our executable specification and obtain the serially encoded output and store that in a file. Later, when we are testing our final IC chip, we can feed it the same image and check that the output matches the expected output.

Figure 7.7: Block-diagram of the executable model of the digital camera.

Figure 7.7 gives the block-diagram of our high-level model of the digital camera. Our executable model is composed of five modules. We start with the CCD module and its corresponding C file called CCD.C, as shown in Figure 7.8. This module is responsible for simulating a real CCD (i.e., it is designed to mimic the operations of an actual CCD). It does that by simply reading the pixels of an image directly from a file that we specify. This module exports three procedures, CcdInitialize, CcdCapture, and CcdPopPixel.

char CcdPopPixel(void) {
   char pixel;
   pixel = buffer[rowIndex][colIndex];
   if( ++colIndex == SZ_COL ) {
      colIndex = 0;
      if( ++rowIndex == SZ_ROW ) {
         colIndex = -1;
         rowIndex = -1;
      }
   }
   return pixel;
}

Figure 7.8: High-level implementation of the CCD module (CcdPopPixel shown).

The CcdInitialize procedure is called to initialize our model, just prior to execution. It takes as a parameter the name of the image file that is used to obtain the pixel data. The CcdCapture procedure is called to actually capture an image, in this case, read it from a file. The CcdPopPixel procedure is called to get the pixels out of the CCD, one at a time. At this point, you should have noted that in our executable specification, our modules communicate using procedure calls and parameter passing.

Our next module is called, rather cryptically, CCDPP, and its corresponding C file is called CCDPP.C, as shown in Figure 7.9. The PP stands for preprocessing. This module
7.3: Requirement~ Specification
Chapter 7: Digital Camera Example
flinclu:ie <stdio.h>
#define SZ_RCW 64 static FILE *ruq:utFi leHarrlle;
#define sz COL 64 void uartinitialize (ccnst char *rutp..ttFileNaire)
static diar
buffer[SZ !OfJ [SZ COL]; rutp.itFi]P.Han:lle = fopen(rutp..itFileNaire, "w") ;
static unsigned rcwirrlex, co1Index; J
void Ccdf:pinitialize() { void UartSen::i (char d) {
rcwirrlex = -1; fprintf(outµ.ltFileHarrlle, "%i\n", (int)d);
colirdex = - 1;
)
void CcclJ;p:apture (voidi
Figure 7. 10: High-level implementation of th~ UART module.
ctiar bias;
Ccdcapture () ;
for (rcwindex=O; rcwlndex<SZ IDv; rcwin:iex++) [
module is identical to that of the CCD module;:. We can think of the CCDPP as a CCD that
for (colin::iex=O; colI~<SZ COL; collndex++)
buffer(rcwin:iex) [col~) = CcdPopPi xel (); performs the zero-bias adjusunents internally
) Let us now !ook at the UART module and its corresponding C file called UART.C, as
bic:s = (CcdPopPixel () + CodPopPixel () ) / 2; shown in Figure 7. lO. Titls is really a model of a half UART (i.e., one that only transmits, bui
    for(colIndex=0; colIndex<SZ_COL; colIndex++)
      buffer[rowIndex][colIndex] -= bias;
  }
  rowIndex = 0;
  colIndex = 0;
}

char CcdppPopPixel(void) {
  char pixel;
  pixel = buffer[rowIndex][colIndex];
  if( ++colIndex == SZ_COL ) {
    colIndex = 0;
    if( ++rowIndex == SZ_ROW ) {
      colIndex = -1;
      rowIndex = -1;
    }
  }
  return pixel;
}

Figure 7.9: High-level implementation of the CCDPP module.

performs the zero-bias adjustment processing, shown in Figure 7.9 and described at the beginning of this chapter. This module also exports three procedures called CcdppInitialize, CcdppCapture, and CcdppPopPixel. The CcdppInitialize procedure performs any necessary initializations. The CcdppCapture procedure is called to actually capture an image. Note that this procedure calls on the CcdCapture and CcdPopPixel procedures of the CCD module to obtain an image. As it is obtaining the image pixels, it also performs the zero-bias adjustments. The CcdppPopPixel procedure is called to get the pixels out of the CCDPP. Note that the interface to the CCDPP ...

... does not receive). As with the other modules, the UART module exports an initialization procedure, called UartInitialize. This procedure takes a file name, where the transmitted data is written to. The other procedure, UartSend, is called when the digital camera is transmitting a byte. The procedure simply writes the transmitted byte to the output file.

Our next module is called CODEC, and its corresponding C file is called CODEC.C, as shown in Figure 7.11. This file models the forward DCT encoding that was described earlier in this chapter. The CODEC module exports the procedures CodecInitialize, CodecPushPixel, CodecPopPixel, and CodecDoFdct. The CodecInitialize procedure resets an index that is used by the push and pop procedures for traversing two buffers, described next. The CodecPushPixel is called 64 times to fill an input buffer, called ibuffer, which holds the original block of 8 x 8 pixels that is to be encoded. The CodecPopPixel is called 64 times to retrieve pixels from the output buffer, called obuffer, which holds the encoded block of 8 x 8 pixels. Once a block is placed in the input buffer, CodecDoFdct is called to actually perform the transform. Therefore, to encode a block of 8 x 8 pixels, we call CodecPushPixel 64 times, then CodecDoFdct once, followed by 64 calls to CodecPopPixel. Let us now discuss the actual implementation of this module. The module simply implements the FDCT equation given earlier and presented here again:

C(h) = if (h == 0) then 1/sqrt(2) else 1.0
F(u,v) = 1/4 x C(u) x C(v) x Σ(x=0..7) Σ(y=0..7) Dxy x cos(π(2x+1)u/16) x cos(π(2y+1)v/16)

The first thing that you may note after studying the code is the large table called COS_TABLE. If you look at the above equation, you'll notice that the argument to the cosine function is always one of 64 possible values, because the only variables in the cosine argument expression are the integers x and u (or y and v), and each of these variables can take one of 8 values, from 0 to 7. Thus, for performance purposes, we have decided to precompute the cosine value for all these 64 possibilities and store them in a table. Actually, we have done more than that. Instead of storing the floating-point values, we have converted these to an integer representation.
static short COS_TABLE[8][8] = {
  { 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
  { 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
  { 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
  { 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
  { 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
  { 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
  { 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
  { 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
};

static short ONE_OVER_SQRT_TWO = 23170, ibuffer[8][8], obuffer[8][8], idx;

static double COS(int xy, int uv) { return COS_TABLE[xy][uv] / 32768.0; }
static double C(int h) { return h ? 1.0 : ONE_OVER_SQRT_TWO / 32768.0; }

static int FDCT(int u, int v, short img[8][8]) {
  double s[8], r = 0;
  int x;
  for(x=0; x<8; x++) {
    s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
           img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
           img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
           img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
  }
  for(x=0; x<8; x++) r += s[x] * COS(x, u);
  return (short)(r * .25 * C(u) * C(v));
}

void CodecInitialize(void) { idx = 0; }

void CodecPushPixel(short p) {
  if( idx == 64 ) idx = 0;
  ibuffer[idx / 8][idx % 8] = p; idx++;
}

short CodecPopPixel(void) {
  short p;
  if( idx == 64 ) idx = 0;
  p = obuffer[idx / 8][idx % 8]; idx++;
  return p;
}

void CodecDoFdct(void) {
  int x, y;
  for(x=0; x<8; x++)
    for(y=0; y<8; y++)
      obuffer[x][y] = FDCT(x, y, ibuffer);
  idx = 0;
}

Figure 7.11: High-level implementation of the CODEC module.

#define SZ_ROW 64
#define SZ_COL 64
#define NUM_ROW_BLOCKS (SZ_ROW / 8)
#define NUM_COL_BLOCKS (SZ_COL / 8)

static short buffer[SZ_ROW][SZ_COL], i, j, k, l, temp;

void CntrlInitialize(void) {}

void CntrlCaptureImage(void) {
  CcdppCapture();
  for(i=0; i<SZ_ROW; i++)
    for(j=0; j<SZ_COL; j++)
      buffer[i][j] = CcdppPopPixel();
}

void CntrlCompressImage(void) {
  for(i=0; i<NUM_ROW_BLOCKS; i++)
    for(j=0; j<NUM_COL_BLOCKS; j++) {
      for(k=0; k<8; k++)
        for(l=0; l<8; l++)
          CodecPushPixel((char)buffer[i * 8 + k][j * 8 + l]);
      CodecDoFdct();                               /* part 1 - FDCT */
      for(k=0; k<8; k++)
        for(l=0; l<8; l++) {
          buffer[i * 8 + k][j * 8 + l] = CodecPopPixel();
          buffer[i * 8 + k][j * 8 + l] >>= 6;      /* part 2 - quantization */
        }
    }
}

void CntrlSendImage(void) {
  for(i=0; i<SZ_ROW; i++)
    for(j=0; j<SZ_COL; j++) {
      temp = buffer[i][j];
      UartSend(((char*)&temp)[0]);                 /* send upper byte */
      UartSend(((char*)&temp)[1]);                 /* send lower byte */
    }
}

Figure 7.12: High-level implementation of the CNTRL module.

More specifically, we have multiplied the 64 cosine values by 32,768 and rounded each result to the nearest integer. The value 32,768 is chosen to allow us to store each value in two bytes of memory. To convert these integers back to floating point, we need to divide the stored values by 32,768.0. This is accomplished in the procedure called COS. This is a form of fixed-point representation, which is described later in this chapter. Thus the COS procedure handles the portions of the above equation involving the cosine and its arguments. We have also implemented a procedure called C that simply corresponds to the function C(h) given above. All that remains now is the implementation of the nested summations. These summations are performed in the FDCT procedure. The inner summation is simply unrolled (i.e., we have expanded it into eight terms that are added together). The outer summation is implemented as two consecutive for loops. This choice of implementation, of course, is not unique. There are many ways to perform FDCT, and the reader is encouraged, as an exercise, to implement these DCT functions with performance in mind.
Chapter 7: Digital Camera Example
7.4: Design
(Figure: block diagram of the microcontroller, showing the instruction decoder, controller, ALU, and 4K ROM.)
Figure 7.17: The CCDPP single-purpose processor as an FSMD.

and the process either repeats, reading the next row, or stops when the entire image is processed. We assume that, as with the UART, this single-purpose processor is connected to the 8051 processor's memory bus, with the content of the internal buffer mapped to upper memory addresses of the processor.

We now have all the components of our system-on-a-chip and are ready to connect things together, making up our digital camera. This is accomplished through the 8051's memory bus, as stated before. The 8051 memory bus uses a simple read and write protocol and is composed of an 8-bit data bus, a 16-bit address bus, a read control signal, and a write control signal. A memory read works as follows. The processor places the memory address on the address bus, then asserts the read control signal for exactly one clock cycle, and reads the data from the data bus one clock cycle later. The device that is being read, either the RAM or one of our memory-mapped single-purpose processors, when detecting that the read control signal is asserted, and after checking the content of the address bus, places and holds the requested data on the data bus for exactly one clock cycle. A write operation works in a similar fashion. The processor places the memory address and the data on the address and data buses, respectively. Then, it asserts the write control signal for exactly one clock cycle. The device that is being written, when detecting that the write control signal is asserted, and after checking the content of the address bus, reads and stores the data from the data bus.

Now that we have the hardware portion of our design implemented, we need to write the software to complete the project. Fortunately, our executable specification will provide the majority of the code that we need. In fact, we will maintain the same structure of the code (i.e., we will keep the same module hierarchy, procedure names, and main program). The only thing that needs to be done is to design the UART and CCDPP custom single-purpose processors. This is rather easy to do. All that we need to do is replace the code in these procedures with memory assignments to the respective hardware devices. Let us show this with the UART example. The code for this module is given in Figure 7.18.

Figure 7.18: Rewriting the UART module to utilize the hardware UART.

Here we have defined two variables, called U_TX_REG and U_STAT_REG. There are two keywords used in defining these two variables that you may not recognize. The first one, called xdata, instructs our compiler to place these variables in external memory; in other words, the compiler will generate code that will load and store these variables over the external memory bus of the processor. The second keyword, called _at_, instructs our compiler to place these variables at the specified memory address. These two keywords allow us to declare a variable such that reading or writing it will cause appropriate read or write operations to be performed on the bus. Now, all we have to do to send a byte using our UART single-purpose processor is write the byte to be sent to U_TX_REG, causing it to be invoked. But since our processor may be much faster than the UART, we need to first make sure that the UART is in its idle state. This is accomplished by the while loop. Having designed our UART such that we can check whether it is busy or not, we can busy-wait until it becomes idle before sending the next data byte. The implementation of the CCDPP module is similarly modified to utilize the CCDPP single-purpose processor. The rest of the modules are untouched.

Now we can compile and link all our software modules and obtain the final program executable. This program executable is then translated into the VHDL representation of the ROM using a ROM generator. All that remains is to test our entire system-on-a-chip. This is done using a VHDL simulator program. A VHDL simulator takes as input the VHDL files making up our system, and functionally simulates the execution of the final IC by interpreting the descriptions. By simulating, we are able to learn whether our design is functionally correct. Moreover, we can also measure the amount of time, in clock cycles, that it takes to process a single image. This is our first metric of interest, namely, performance. Figure 7.19(a) shows how, after simulating the VHDL models, we obtain the execution time. Figure 7.19(b) shows how we synthesize the high-level VHDL models and obtain the gate-level description of the corresponding circuits. Then, we simulate the gate-level models to obtain the intermediate data necessary to compute the power consumption of the circuit. Figure 7.19(c) shows how, by adding the number of gates, we obtain the total area of the chip. Once we are satisfied that our design functions correctly, we can use our synthesis tool to translate the VHDL files down to an interconnection of logic gates. A synthesis tool is like a
compiler for single-purpose processors. It reads a VHDL file and translates it to a corresponding gate-level description. You'll learn more about this process in a later chapter of this book. At this stage, these gates can be sent to an IC fabrication company to make our IC chip. But what we are interested in is counting the total number of gates to get an idea of how big our design is. This will tell us how big of an area we need to implement the digital camera, our third metric of interest. To obtain the power consumption, our second metric of interest, we simulate the gate-level description of the digital camera and keep track of the number of times these gates switch from zero to one and from one to zero. Recall that we can estimate power consumption if we know the amount of switching that takes place in a circuit.

Figure 7.19: Obtaining design metrics of interest: (a) performance, from VHDL simulation yielding execution time; (b) power, from a gate-level simulator feeding a power equation; (c) chip area, from gate counts.

We can now analyze our first implementation using the approach outlined in Figure 7.19. Using simulation, we have measured the total execution time for processing a single image to be 9.1 seconds. The power consumption is measured to be 0.033 watt. The energy consumption is 9.1 s x 0.033 watt = 0.30 joule. The area is measured to be 98,000 gates.

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT

The previous implementation does not achieve 1-image-per-second processing. Looking at the execution of the previous implementation, we see that most of the microcontroller's compute cycles are spent performing the DCT operation. Thus, we could consider pulling this compute-intensive function out from software to custom hardware, as we did for the CCD preprocessor. However, unlike the CCD preprocessor, the DCT functionality is fairly complex and thus will likely require more design effort. We can instead speed up the DCT functionality by modifying its behavior.

Recall that each DCT operation involves numerous floating-point operations. Actually, for each pixel that is transformed, about 260 floating-point operations are performed. There are 64 x 64 = 4,096 pixels that are encoded, for a total of about one million floating-point operations. To make matters worse, our processor is an 8-bit processor with no floating-point support; thus, the compiler needs to emulate each of these floating-point operations. Floating-point emulation is performed as follows. The compiler generates procedures for each of the floating-point operations, such as multiplication and addition. These procedures may execute tens of integer instructions in order to perform a single floating-point operation. Then, when the compiler encounters floating-point operations in the source file, it places a call to these compiler-generated procedures. Consequently, our one million floating-point operations will require ten million or more integer operations. In addition, our program will be larger, since it has to accommodate the compiler-generated procedures.

We can thus consider speeding up the CODEC module by using fixed-point arithmetic. We hope to reduce the total number of integer instructions required to encode each pixel. Our implementation is shown in Figure 7.20. Let us first describe how fixed-point arithmetic works. In fixed-point arithmetic, we use an integer to represent real numbers. The bits within this integer are interpreted as follows. We use a constant and known number of these bits to represent the portion of a real number after the decimal point, and the rest of the bits to represent the portion of the real number before the decimal point.

In our implementation of the CODEC, we have chosen to use 6 bits to represent the fractional part of all arithmetic operations. The choice here has to do with the accuracy that we desire. The more bits we use for the portion after the decimal point, the more accurately we can represent a real number. However, this will leave us fewer bits to represent the portion of the real number before the decimal point (i.e., the magnitude of the real number).

Once we have chosen the number of bits to represent the portion after the decimal point, a.k.a. the fractional part, we can translate any constant to the fixed-point representation. For example, imagine that we are using 8-bit integers. Let us use 4 bits to represent the fractional part. The fixed-point representation of the real value 3.14 would be 50, or 00110010. We obtain 50 by multiplying the real value, 3.14, by 2 raised to the number of bits we are using for the fractional part, 2^4 = 16, and rounding to the nearest integer: 3.14 x 16 = 50.24 ≈ 50. Note that the 4 least significant bits equal 2. Since there are a total of 16 possibilities, each would represent 0.0625. Given that we have 2, we get 2 x 0.0625 = 0.125. The four most significant bits encode the value 3, which, when added to our fractional part, gives 3.125. Of course, our representation is not exact, but close. We can improve this by using more bits for the fractional part. In fact, the cosine table in Figure 7.20 gives the fixed-point representation of the cosine values, using 8-bit integers.

Now that we know how to represent a real number using integers, we have to define the two operations that are used in our calculations, namely addition and multiplication. Addition is straightforward. All that we have to do is add the integers. For example, assume that we have 3.14 encoded as 50, or 00110010, and 2.71 as 43, or 00101011. To add these two together, we add the integers 50 and 43 to obtain 93, or 01011101. Converting this back to a real, we get 5 + 13 x 0.0625 = 5.8125. This number is close to the actual value, which is 5.85, but not exact, as expected.

Similarly, with multiplication, we can multiply the two fixed-point values to obtain our result. But, at this point, we need to perform an additional operation. Let us multiply the value 3.14 encoded as 50, or 00110010, and 2.71 as 43, or 00101011. From this we obtain 2,150, or
7.4 Convert 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, and 1.9 to fixed-point representation using (a) two bits for the fractional part, (b) three bits for the fractional part, and (c) four bits for the fractional part.

7.5 Write two C routines that each take as input two 32-bit fixed-point numbers and perform addition and multiplication, using 4 bits for the fractional part and the remaining bits for the whole part.

7.6 Using any programming language of choice, (a) implement the FDCT and IDCT equations presented in Section 7.2 using fixed-point arithmetic with 4 bits used for the fractional part and the remaining bits used for the whole part, (b) use the block of 8 x 8 pixels given in Figure 7.2(b) as input to your FDCT and obtain the encoded block, (c) use the output of part (b) as input to your IDCT to obtain the original block, and (d) compute the percent error between your decoder's output and the original block.

7.7 List the modifications made in implementations 2 and 3 and discuss why each was beneficial in terms of performance.

CHAPTER 8: State Machine and Concurrent Process Models

8.1 Introduction
8.2 Models vs. Languages, Text vs. Graphics
8.3 An Introductory Example
8.4 A Basic State Machine Model: Finite-State Machines
8.5 Finite-State Machine with Datapath Model: FSMD
8.6 Using State Machines
8.7 HCFSM and the Statecharts Language
8.8 Program-State Machine Model (PSM)
8.9 The Role of an Appropriate Model and Language
8.10 Concurrent Process Model
8.11 Concurrent Processes
8.12 Communication among Processes
8.13 Synchronization among Processes
8.14 Implementation
8.15 Dataflow Model
8.16 Real-Time Systems
8.17 Summary
8.18 References and Further Reading
8.19 Exercises
8.1 Introduction
We implement a system's processing behavior with processors. But to accomplish this, we must have first described that processing behavior. One method we've discussed for describing processing behavior uses assembly language. Another, more powerful method uses a high-level programming language like C. Both methods use what is known as a sequential
Chapter 8: State Machine and Concurrent Process Models
8.2: Models vs. Languages, Text vs. Graphics
programs, such as state machines or dataflow. Third, certain languages may be better at capturing sequential programs than others: while C works fine, a primitive language like assembly, without constructs for "loops" or "procedures," may be cumbersome to use for capturing sequential programs. As another example, C can be used to capture state machines, as we will see later, but a language intended specifically to capture state machines might be more convenient.

Textual Languages vs. Graphical Languages

Languages may use a variety of methods to capture models, such as text or graphics. Defining a graphical language equivalent to a textual one is fairly straightforward, and vice versa. The choice of a textual language versus a graphical language is entirely independent of the choice of a computation model.

Let us return to our analogy involving recipes. We could choose to capture a particular recipe in the English textual language. On the other hand, we could choose to capture the recipe using a graphical recipe language, which might include icons of objects like eggs and bowls, as well as icons for tasks like "mix" or "simmer."

Likewise, we could choose to capture a particular sequential program in the C textual language. On the other hand, we could choose to capture the sequential program using a graphical sequential programming language, which might include icons of objects like variables and constants, as well as icons for tasks like "assign" or "loop." Graphical sequential programming languages were commonly proposed in the 1980s, but have not become very popular. The state machine model is often captured in textual languages, but it is also commonly captured in graphical languages found in numerous commercial products.

8.3 An Introductory Example

Here, we introduce an example system that we'll use in the chapter, and we'll use the sequential program model, introduced in an earlier chapter, to describe part of the system. Consider the simple elevator controller system in Figure 8.2(a). It has several control inputs corresponding to the floor buttons inside the elevator and corresponding to the up and down buttons on each of the N floors at which the elevator stops. It also has a data input representing the current floor of the elevator. It has three control outputs that make the elevator move up or down, and open the elevator door. A partial English description of the system's desired behavior is shown in Figure 8.2(b).

(a) System interface: a RequestResolver block takes the b1..bN buttons inside the elevator and the up/down buttons on each floor and produces req; a UnitControl block takes the current floor and produces the up, down, and open outputs.

(b) "Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Don't change directions unless there are no higher requests when moving up or no lower requests when moving down ..."

(c) Inputs: int floor; bit b1..bN, up1..upN-1, dn2..dnN;
    Outputs: bit up, down, open;
    Global variables: int req;

    void UnitControl() {
      up = down = 0; open = 1;
      while (1) {
        while (req == floor);
        open = 0;
        if (req > floor) { up = 1; }
        else { down = 1; }
        while (req != floor);
        up = down = 0;
        open = 1;
        delay(10);
      }
    }

    void RequestResolver() {
      while (1)
        ...
        req = ...
        ...
    }

    void main() {
      Call concurrently:
        UnitControl() and RequestResolver()
    }

Figure 8.2: Specifying an elevator controller system: (a) system interface, (b) partial English description, (c) more precise description using a sequential program model.

We decide that this system is best described as two blocks. RequestResolver resolves the various floor requests into a single requested floor. UnitControl actually moves the elevator unit to this requested floor, as shown in Figure 8.2. Figure 8.2(c) shows a sequential program description for the UnitControl process. Note that this process is more precise than the English description. It first opens the elevator door and then enters an infinite loop. In this loop, it first waits until the requested and current floors differ. It then closes the door and moves the elevator up or down. It then waits until the current floor equals the requested floor, stops moving the elevator, and opens the door for 10 seconds (assuming there's a routine called delay). It then goes back to the beginning of the infinite loop. The RequestResolver would be written similarly.

8.4 A Basic State Machine Model: Finite-State Machines

In a finite-state machine (FSM) model, we describe system behavior as a set of possible states; the system can only be in one of these states at a given time. We then describe the possible transitions from one state to another depending on input values. Finally, we describe the actions that occur when in a state or when transitioning between states.

For example, Figure 8.3 shows a state machine description of the UnitControl part of our elevator example. The initial state, Idle, sets up and down to 0 and open to 1. The state machine stays in state Idle until the requested floor differs from the current floor. If the requested floor is greater, then the machine transitions to state GoingUp, which sets up to 1, whereas if the requested floor is less, then the machine transitions to state GoingDown, which
sets down to 1. The machine stays in either state until the current floor equals the requested floor, after which the machine transitions to state DoorOpen, which sets open to 1. We assume the system includes a timer, so we start the timer while transitioning to DoorOpen. We stay in this state until the timer says 10 seconds have passed, after which we transition back to the Idle state.

Figure 8.3: The elevator's UnitControl process described using a state machine. (State diagram with states Idle, GoingUp, GoingDown, and DoorOpen; u is up, d is down, o is open, t is timer_start.)

We have described state machines somewhat informally, but now provide a more formal definition. We start by defining the well-known finite-state machine computation model, or FSM, and then we'll define extensions to that model to obtain a more useful model for embedded system design. An FSM is a 6-tuple <S, I, O, F, H, s0>, where:

  S is a set of states {s0, s1, ..., sl},
  I is a set of inputs {i0, i1, ..., im},
  O is a set of outputs {o0, o1, ..., on},
  F is a next-state function (i.e., transitions), mapping states and inputs to states (S x I -> S),
  H is an output function, mapping current states to outputs (S -> O),
  s0 is an initial state.

The above is a Moore-type FSM, which associates outputs with states. A second type of FSM is a Mealy-type FSM, which associates outputs with transitions (i.e., H maps S x I -> O). You might remember that Moore outputs are associated with states by noting that the name Moore has two o's in it, which look like states in a state diagram. Many tools that support FSMs support combinations of the two types, meaning we can associate outputs with states, transitions, or both.

We can use some shorthand notations to simplify FSM descriptions. First, there may be many system outputs, so rather than explicitly assigning every output in every state, we can say that any outputs not assigned in a state are implicitly assigned 0. Second, we often use an FSM to describe a single-purpose processor (i.e., hardware). Most hardware is synchronous, meaning that register updates are synchronized to clock pulses (e.g., registers are updated only on the rising (or falling) edge of a clock). Such an FSM would have every transition condition ANDed with the clock edge (e.g., clock'rising and x == y). To avoid having to add this clock edge to every transition condition, we can simply say that the FSM is synchronous, meaning that every transition condition is implicitly ANDed with the clock edge.

8.5 Finite-State Machine with Datapath Model: FSMD

When using an FSM for embedded system design, the inputs and outputs represent Boolean data types, and the functions therefore represent Boolean functions with Boolean operations. This model may be sufficient for many purely control systems that do not input or output data. However, when we must deal with data, two new features would be helpful: more complex data types (such as integers or floating-point numbers) and variables to store data. Gajski (see Chapter 2) refers to an FSM model extended to support more complex data types and variables as an FSM with datapath, or FSMD. Most authors refer to this model as an extended FSM, but there are many kinds of extensions, and therefore we prefer the more precise name of FSMD. One possible FSMD model definition as a 7-tuple is <S, I, O, V, F, H, s0>, where:

  S is a set of states {s0, s1, ..., sl},
  I is a set of inputs {i0, i1, ..., im},
  O is a set of outputs {o0, o1, ..., on},
  V is a set of variables {v0, v1, ..., vn},
  F is a next-state function, mapping states and inputs and variables to states (S x I x V -> S),
  H is an action function, mapping current states to outputs and variables (S -> O + V),
  s0 is an initial state.

In an FSMD, the inputs, outputs, and variables may represent various data types, perhaps as complex as the data types allowed in a typical programming language. Furthermore, the functions F and H may include arithmetic operations, such as addition, rather than just Boolean operations as in an FSM. We now call H an action function rather than an output function, since it describes not just outputs, but also variable updates. Note that the above definition is for a Moore-type FSMD, and it could easily be modified for a Mealy type or a combination of the two types. During execution of the model, the complete system state consists not only of the current state si, but also the values of all variables. Our earlier state machine description of UnitControl was an FSMD, since its input data types were integers, and it had arithmetic operations, like magnitude comparisons, in its transition conditions.

8.6 Using State Machines

Having introduced the basic FSM and FSMD models, we now discuss several issues related to using those models to describe desired system behavior.

... different way of thinking of a system's behavior.
different way of thinking of a system's behavior. Capturing State Machines in Sequential Programming Language
X common, point of confusion is the distinction between state machine and sequential As elegant as the state machine model is for describing control-v.ominated systems, the fact
progran1 models versus the distinction between graphical and textual languages. In particular, remains that the most popular embedded system development tools use sequential
a state machine description excels in many cases, not because of its graphical representation, programming languages like C, C++, Java, Ada, VIIDL, or Verilog. Such tools are typically
but rather because it provides a more natural means of computing for those cases; it can be complex and expensive, supporting tasks like compilation, synthesis, simulation, interactive
captured textually and still provide the same advantage. For example, while in Figure 8.3 we debugging, and/or in-circuit emulation. Thus, although sequential programming languages do
described the elevator's UnitControl as a state machine captured in a graphical state-machine not directly support the capture of state machines (i.e., they don't possess specific constructs
language, called a state diagram , we could have instead captured the state machine in a corresponding to states or transitions) we still want to use the popular embedded system
textual state-machine language. One textual language would be a state table, in which we list development tools to protect our financial and educational investments in them. Fortunately,
each state as an _entry in a table. Each state's row would list the state's actions. Each row we can still describe our system using a state machine model while capturing the model in a
would also list all possible input conditions, and the next state for each such condition. sequential program language, by using one of two approaches.
Conversely. while in Figure 8.2 we described the elevator's UnitContro( as a sequential In afront-end tool approach, we install an additional tool that supports a state machine
program captured using a textual sequential programming language, in tfus case C, we could ·language. These tools typically define graphical and perhaps textual state machine languages,
have instead captured the sequential program using a graphical seguential programming and include nice graphic interfaces for drawing and displaying states as circles and transitions
language, such as a flowchart. as directed arcs. they may support graphical simulation of the state machine, highlighting the
current state and active transition. Such tools automatically generate code in a sequential program language (e.g., C code) with the same functionality as the state machine. This sequential program code can then be input to our main development tool. In many cases, the front-end tool is designed to interface directly with our main development tool, so that we can control and observe simulations occurring in the development tool directly from the front-end tool. The drawback of this approach is that we must support yet another tool, which includes additional licensing costs, version upgrades, training, integration problems with our development environment, and so on.

... construct. We capture the state machine as a subroutine, in which we declare a state variable initialized to the initial state. We then create an infinite loop, containing a single switch statement that branches to the case corresponding to the value of the state variable. Each state's case starts with the actions in that state, and then the transitions from that state. Each transition is captured as an if statement that checks if the transition's condition is true and then sets the next state. Figure 8.5 shows a general template for capturing a state machine in C.

#define S0 0
#define S1 1
...
#define SN N

void StateMachine() {
  int state = S0;  // or whatever is the initial state
  while (1) {
    switch (state) {
      case S0:
        // Insert S0's actions here & insert transitions Ti leaving S0:
        if( T0's condition is true ) { state = T0's next state; /*actions*/ }
        if( T1's condition is true ) { state = T1's next state; /*actions*/ }
        ...
        if( Tm's condition is true ) { state = Tm's next state; /*actions*/ }
        break;
      case S1:
        ...
        break;
      ...
      case SN:
        ...
        break;
    }
  }
}

Figure 8.5: General template for capturing a state machine in a sequential programming language.

To be safer, we could replace the sequence of if statements representing a state's transitions with an if-then-else statement. This would ensure that if the transition conditions were mistakenly nonexclusive, the code would merely execute the first transition whose condition was true, rather than executing all such transitions.
In contrast, we can use a language subset approach. In this approach, we directly capture
our state machine model in a sequential program language, by following a strict set of rules 8.7 HCFSM and the Statecharts Language
for capturing each state machine construct in an equivalent set of ~uential pro~am Hiererarchicallconcurrent state machine models (HCFSM) are extensions to the state
constructs. This approach is by far the most common approach for captunng state ~chines, ·, machine model. Hare! proposed extensions to the state machine model to support hierarchy
both in software languages like Caswell as hardware languages like VHDL and Venlog. W.e and concurrency, and developed Statecharts, a graphical state machine language designed to
now describe how to capture a state machine model in a sequential program language. . . . caplurc that model. We refer to the model as a hierarchical/concurrent FSM, or HCFSM.
We start by capturing our UnitControl state machine in. the sequentialprogramnung · The hierarchy extension in HCFSMs allows us to decompose a state into auoiher state
language C, illustrated in Figure 8.4. We enumerate all states, in this case using the #define C machine. or com·ersely staled, to group several states into a new hierarchical· state. For
ElevatorController
. state machine in Figure 8.6(a), having three states Al, A2, ~~d 8. Al is
example, consider the . . h A 1 or A 2 and event z occurs, we trans1uon to state UnitControl RequestResolver
the initial s~te. ~en~ver we are •h~ e1tber ouping A I and A2 into a hierarchical state A, as
B we can simplify this state mac me Y gr ... 1 Al w Norma!Mode
shown in Figure 8.6(b). State A is the i~it_ial ~tate, which in~um th;sl: :t~::eani~g i:
draw the transition to B on event z as orig10at10g from state , no . . · !fire
that regardless of whether we are in A 1 or A2, event z causes a transition to state 8 · th t
As another hierarchy example consider our earlier elevator example, and suppose 1 a we
' . · h t· ed'ately moves thee evator
want to add a control input fire, along with new behavior t a 1mm I -
UnitControl Figure s·.8: Using concJrrency in an HCFSM to describe both processes of the ElevatorController.
u,d,o = 1,0,0 down to the first floor and opens the door when fire is true. As shown in Figure 8.7(a), we
can capture this.behavior by adding a transition from every state originally in UnitControl to a
u,d,o = 0,0, I u,d,o = 0,0,l new state called FireGoingDn, which moves the elevator to the first floor, followed by a state
FireDrOpen, which holds the door open on the first floor. When fire becomes false, we go lo
req=floor the Idle siate. W.lile this new state machine captures the desired behavior, the state machine is
u,d,o=0,1.0 becoming more complex due to many more transitions, and harder to comprehend due to
more states. We can use hierarchy to reduce the number of transitions· and enhaucc
understandability. As shown in Figure ·8.7(b). we can group the original state mac_!tine inlo a
!fire hierarchical state called Norma/Mode, and group the fire-related states into a state called
FireMode. This grouping reduces the number of transitions, since instead of four transitions
(a)
from each-original state to the fire-related states, we now need only one transition.in this case
Uni1Control
from Norma/Mode to FireMode. This grouping also enhances understandability, since it
Normal Mode clearly represents two main operating modes. one normal and one in case of fire.
The second extension that HCFSMs possess. concurrency, allows us lo u~ hierarchy to
decompose a state into two concurrent states, or conversely stated, to group two concurrent
states into a new hierarchical state. For example, Figure 8.6 (c), shows a state B decomposed
,d,o=0,0,1 into two concurrent states C and D. C happens to be decomposed into another state_machine,
DoorOpen as docs D. Figure 8.8 shows the entire ElevatorControiler behavior captured as a HCFSM
with two concurrent states.
Therefore, we see that there are two methods for using hierarchy to decompose a state
into substates. OR-decomposition decomposes a state into sequential states, in which only one
state is active at a time - either the first state OR the second state OR the third state. etc.
AND-decomposition decomposes a state into concurrent states, all of which are active at a
time -. the first state AND the second state AND the third .state. etc.
The Statecharts language includes numerous additional constructs t~ improve state
machine capture. A timeout is a transition with a time limit as its condition. The u-ansition is
automatically taken if the transition source state is active for an amount of time equal to the
limit. Note that we used a timeout lo simplify the UnitControl state machine in Figure 8.7;
. rather than starting and checking an external timer in state DoorOpen,. we instead created a
transition from DoorOpen to Idle ,~ith the condition timeout( 10). History is a mechanism for
Figure 8.7: The elevator's UnitControl with new behavior for a new inputjire: (a) without hierarchy (quite a mess). remembering the last substate that an OR-decomposed state A was in before transitioning to
(b) with hierarchy.
ElevatorController
we describe FireMode as a sequential program. We didn ·1 have to use scquenlial programs for
Treq; R those program-states, and could have used state machines for one or both - the point is thal
UnitControl ,Jr
NormalMode PSM allows the designer to choose whichever model is most appropriate.
up = down =0; open = 1; PSM enforces a stricter hierarchy than the HCFSM model used in Statecharts. In
while (I) { ·
req = ... Statecharts. transitions may point not just between states at the same level of hierarchy_ but
while (req = floor); may cross hierarchical levels also. An example is the transition in Figure 8.7(b) pointing from
open=O; · the ~11~eDrOpen substate of the FireMode stale to the A'orma/Mode state. Having this
if(req > floor) {up= l;} '
trans1t1on start from FireDrOpen rather than FireMode causes the elevator to always go all
else {down= l;} · I
while (req != floor); the way down to the first floor when the fire input becomes true_ even if the input is true just
open= I; momentanly. PSM. on the other hand. allows transitions only between sibling states (i.e..
delay(lO); between states with the same parent state). PSM's model of hierarchy is the same as in
I sequential program languages that use subroutines for hierarchy; namely.· we alwars enter the
!fire subro~tine from one point and when we exit the subroutine we do not specify t; where we
fire are exit.mg.
As in the sequential programmmg modeL but unlike the HCFSM modeL PSM includes
the n~tion of a program-state completing. If the program-state is a sequential program. then
reachmg the end of the code means tl1e program-state is complete. If the program-state is
OR-decomposed into substates. then a special complete substate· may be added. Transilions
may occur fro~~ substate to the complete substate_ but no transitions may leave the complete
Figure 8.9: Using PSM to describe the ElevatorController. substate. Trans1honmg to the complete substate means that the program-state is complete.
Consequently, PSM mtroduces two types of transitions. A iransition-immediateh- (Tl)
another state B. Upon reentering state A, we can start with the remembered substate rather transition is taken immediately if its condition becomes true_ regardless of the stalus· of the
than A's initial state. Thus, the transition leaving A is treated much like an interrupt and Bas source program-state - this is the same as tl1e transition type in an HCFSM. A second_ new
an interrupt service routine. type of transition, transition-on-completion (TOC). is taken only if the condition is true AND
the source program-state is complete. Graphically_ a TOC transition is drawn originating from
a filled square inside a state, rather than from tl1e state· s perimeter. We used a TOC transition
in Figure 8.9 to transition from FireN!ode to Norma/A/ode only after Firdfode completed.
8.8 Program-State Machine Model _(PSM) where such compktion meant that the elevator had reached the first flooi;. By supporting both
The program0 state machine (PSM) model· extends state machines to allow use of sequential types of transitions, PSM elegantly merges · the reactive nature of HCFSM models. using Tl
program code ·10 define a state's actions, including extensions for complex data types and transitions. with the transfom1ational nature of sequential program models, using TOC
variables. PSM also includes the hierarchy· and concurrency extensions of HCFSM. Thus, transitions.
PSM is a merger of the HCFSM and s~uential program models, subsuming both models. A The SpecCharrs language was tl1e first language designed to easily capture tlie PSM
PSM having only one state, called a program-state in PSM terminology, where that state's model. Actually, two languages were defined. one graphical and the other textual. SpecCharts
actions are defined using a sequential program, is equivalent to a sequential program. A PSM was designed as an extension of VHDL, using VHDL 's syntax and sem·an1ics for all variable
having many states, whose actions are all just assignment statements, is equivalent to an declarations and sequential program statemenls. More recenlly. the Spec(' language was
HCFSM. Lying between these two extremes are various combinations of tlie two models. developed to capture PSM, hut uses an extension of C rather than VHDL.
· For example, Figure 8.9 shows a PSM description of the ElevatorController behavior,
which ·we AND-decompose · into two concurrent program-states UnitControl and
RequestResolver, as in the earlier HCFSM example. Furthermore, we OR-decompose 8.9 The Role of an Appropriate Model and Language
UnitControl into two sequential program-states, NormalMode and FireMode, again as in the
HCFSM example. However, unlike the HCFSM example, we describe NormalMode as a Specifying embedded system functionality can be a hard task. bul an appropriate eompulation
sequential program, identical to that of Figure 8.2(c), rather than a state machine. Likewise, model can help. The model shapes the way we lhink of the system. The language should
capture the model easily.
:· ~c
I~·
g
Consider how models shaped the way we thought about the elevator controller example's UnitControl behavior. In order to create the sequential program that we captured in Figure 8.2(c), we were thinking in terms of a sequence of actions. First, we wait for the requested floor to differ from the target floor, then we close the door, then we move up or down to the desired floor, then we open the door, and then we repeat this sequence. In contrast, in order to create the state machine that we captured in Figure 8.3, we were thinking in terms of possible system states and the transitions among those states. Many individuals say that, for this example, the state machine model feels more natural than the sequential program model. When a system must react to a variety of changing inputs, a state machine model may be a good choice. Furthermore, notice that the HCFSM model was able to describe the fire behavior nicely, while the FSM or FSMD models would have become somewhat complex.

The language should capture our chosen model easily. Ideally, the language would have constructs that directly capture features of the model - a language for capturing state machines should have constructs for capturing states and transitions, for example. However, such a model/language match is not always the case. As you may have already ascertained, the most common situation of a model/language mismatch in embedded systems is that of having a language designed to support the sequential program model, but wanting to capture a system using a state machine model. In this case, we can use structured techniques for capturing the state machine model in the sequential program language, as shown earlier. To see the benefit of using the best model, think of how the fire behavior would have been incorporated into the sequential program of Figure 8.2(c). We would have had to insert checks for the signal throughout the code, making the code very complex.

The moral of the story here is that often we cannot choose the language used to capture embedded system functionality - that choice is often dictated by other factors. But we need not be limited to using the model directly supported by that language. We can use a different model if that model provides an advantage, and then capture the model in the language using structured techniques.

8.10 Concurrent Process Model

Thus far in this chapter, we have looked at computational models such as finite-state machines and drawn an important distinction between computational models and languages. As defined in the previous chapter, a computational model provides a set of objects and rules operating on those objects that help a designer describe a system's functionality. A system's functionality, in fact, may be described using multiple computational models. A language, on the other hand, provides semantics and constructs that enable a designer to capture a computational model. Some languages, in fact, capture more than one computational model. In this chapter, we present a new computational model called concurrent process. In addition, we extend our distinction between computational models and languages to include implementation.

The concurrent process model is a model that allows us to describe the functionality of a system in terms of two or more concurrently executing subtasks. Many systems are easier to describe as a set of concurrently executing tasks because they are inherently multitasking. For instance, imagine this variation on the Hello World example. This system allows a user to provide two numbers X and Y. We then want to write "Hello World" to a display every X seconds, and "How are you" to the display every Y seconds. A very simple way to describe this system using concurrent tasks is shown in Figure 8.10(a). After reading in X and Y, we call two subroutines, each describing one of the tasks, concurrently. One subroutine prints "Hello World" every X seconds, the other prints "How are you" every Y seconds. (Note that you cannot call two subroutines concurrently in a pure sequential program model, such as the model supported by the basic version of the C language.) As shown in Figure 8.10(b), these two subroutines execute simultaneously. Sample output for X = 1 and Y = 2 is shown in Figure 8.10(c). To see why concurrent processes are helpful, try describing the same system using a finite-state machine or Pascal program. You will find yourself exerting effort figuring out how to schedule the two subroutines into one sequential program. Since this example is a trivial one, this extra effort is not a serious problem, but for a complex system, this extra effort can be significant and can detract from the time you have to focus on the desired system behavior. In general, the concurrent process model is useful when describing systems that are inherently multitasking. That is to say that the function of these systems can best be described in terms of a number of subtasks each executing concurrently with one another.

    ConcurrentProcessExample()
       x = ReadX()
       y = ReadY()
       Call concurrently:
          PrintHelloWorld(x) and PrintHowAreYou(y)

    PrintHelloWorld(x)
       while( 1 ) {
          print "Hello world."
          delay(x);
       }

    PrintHowAreYou(y)
       while( 1 ) {
          print "How are you?"
          delay(y);
       }

    Sample input and output:
       Enter X: 1
       Enter Y: 2
       Hello world.  (Time = 1 s)
       Hello world.  (Time = 2 s)
       How are you?  (Time = 2 s)
       Hello world.  (Time = 3 s)
       How are you?  (Time = 4 s)
       Hello world.  (Time = 4 s)

Figure 8.10: A simple concurrent process example: (a) pseudo-code, (b) subroutine execution over time, (c) sample input and output.
Figure 8.11: Distinctions between computational models, languages, and implementations. The choice of computational model(s) - state machine, sequential program, dataflow, or concurrent processes - is based on whether it allows the designer to describe the system. The choice of language(s) is based on whether it captures the computational model(s) used by the designer. The choice of implementation is based on whether it meets power, size, performance and cost requirements.

Figure 8.12: Typical examples of embedded systems: (a) Heartbeat monitoring system - Task 1: Read pulse; If pulse < Lo then Activate Siren; If pulse > Hi then Activate Siren; Sleep 1 second; Repeat. Task 2: If B1/B2 pressed then Lo = Lo +/- 1; If B3/B4 pressed then Hi = Hi +/- 1; Sleep 500 ms; Repeat. (b) Set-top box system - Task 1: Read Signal; Separate Audio/Video; Send Audio to Task 2; Send Video to Task 3; Repeat. Task 2: Wait on Task 1; Decode/output Audio; Repeat. Task 3: Wait on Task 1; Decode/output Video; Repeat.
and decompose it into compressed audio and video streams. The second and third parts of the system, in turn, will decode the compressed audio and video signals. The three subparts in the set-top box are quite independent of one another also, and can be thought of as executing concurrently with one another, even though they share data. Trying to describe them as a single sequential program in a sequential program model could be difficult. Instead, we'd like to describe them using three sequential programs, indicating that these three programs could execute concurrently. But we don't want three entirely separate programs, since those three programs do need to communicate with one another. In fact, these three programs share large volumes of audio and video data. Thus, the need arises for a model for describing multiple communicating sequential programs. A concurrent process model achieves this goal. A process is just one of the sequential programs in such a model. The traditional definition of a process is simply a unit of execution. A process executes concurrently with the other processes in the model and is typically thought of as an infinite loop, executing its sequential statements forever.

We define a process's state to be one of running, runnable, or blocked. A process is in the running state if it is currently being executed. A process is in the runnable state if it is ready and executable. Of course, there is no reason for a runnable process not to be running. However, as we will see later in the chapter, when we discuss implementation of concurrent processes, a runnable process may be waiting its turn to be executed. A process is in the blocked state if it is not ready to be executed. There are a number of reasons for a process to be in the blocked state. One reason could be that the process needs to wait for some other process to finish its execution first. Another common reason for a process to be blocked is when it is waiting for some device to complete an operation, such as waiting for the network device to send a data packet.

Recall that a computational model defines objects and operations on those objects. In a concurrent process model, a process becomes the fundamental object encapsulating some portion of a system's functionality. The basic operations defined by the concurrent process model on processes are create, terminate, suspend, resume, and join, which we now describe.

Process Create and Terminate

Create creates a new process, initializes any associated data and starts execution of that process. In our Hello World example, shown in Figure 8.10(a), we created two processes by executing concurrently two procedures called PrintHelloWorld(x) and PrintHowAreYou(y). Each of these procedures described the sequential execution of one of the processes of our example. Conceptually, one can think of a create operation as an asynchronous procedure call. In a sequential programming model, a procedure call blocks the calling procedure and starts executing the called procedure. Once the called procedure terminates, control is transferred back to the calling procedure, and it is allowed to resume execution. In our analogy, a procedure acts like a process and the procedure call behaves like creating another process. In contrast, in the concurrent process model, an asynchronous procedure call does not block the calling procedure (process). Instead, both the calling procedure (process) and the called procedure (new process) start executing concurrently. Either one of these processes can further create other processes, and so on. Again, keep in mind that in this discussion the terms procedure and process are used interchangeably.

Terminate terminates an already executing process and destroys all data associated with that process. Terminate is an operation that is performed by one process on another. If a process does not implement an infinite loop, it is terminated automatically when it reaches the end of its execution (i.e., right after executing its last instruction). The need for terminating a process may arise when handling exceptional events. For instance, in an assembly-line monitoring system composed of multiple processes, if one process detects an error condition, it may terminate other processes, such as those controlling the conveyer belt driver motor and guide arms.

Process Suspend and Resume

Suspend suspends the execution of an already created process. Once a process, say, X, has started to execute, another process may need to stop it without terminating it. That means that the state of X (i.e., all the intermediate data values that have been computed by that process) and the location of the currently executing instruction, or the program counter, need to be saved. A suspended process can, at some later point, be allowed to execute again by restoring its state and allowing it to execute. This operation is called Resume.

Process Join

Once a process, say, X, has started to execute, another process, typically the one that created X, may need to wait until X finishes execution and terminates. That means that the process invoking the join operation is suspended until the to-be-joined process has reached the end of its execution. This operation is called Join. Join is an important operation that is used for synchronization of processes and their execution. We will discuss process synchronization in detail later in this chapter.

8.12 Communication among Processes

When a system's functionality is divided into two or more concurrently executing processes, it is essential to provide means for communication among these processes. Two common methods for communication among processes are shared memory and message passing. In shared memory, processes can read and write the same memory locations. In message passing, processes explicitly send or receive data to and from each other.

Shared Memory

Using shared memory, multiple processes communicate by reading and writing the same memory locations or common variables. This form of communication is very efficient and easy to implement. An example of using shared memory is shown in Figure 8.13. In this particular example, we have two processes that share the same memory address space. In particular, they share an array of N data items, called buffer, and a variable, called count, that holds the number of valid data items currently in the buffer.
The identifier Wliquely identifies one of the processes that are currently executing in 'ihe Condition Variables
system. An example of message passing is illustrated in Figure 8.16. Here process A , after One way to achieve svnchronization among concurrenth' executing p~occsses ·
. -· ._ . .. - 1s to use a
producing a data packet, sends it lo process B. Meanwhile, process B receives the packet, special. dconstruct called a cond111on vanable. A condition variable is an ob,iect ti . t -
k · . _ - . , . 1a pcnmts
performs some transformation on the data and sends it back to A . Process A, after receiving two m s_of operauons. called signal and wmt. to be performed on it. When wait is pcrf., cd
the data packet, consumes it and the cycle repeats. Regions of code labeled I and 2 are d" · I vnn
on a con 1110n vanab_ e. the proc~ss that performed the wait operation is blocked until another
segments that perform auxiliary functions in each process. proces~ performs a signal opera110n on lite same condition ,·ariablc. The semantics of a wait
Note that receive operations are always blocking. That means the once a process executes operallon 1s m fact a bit more complex. When a process. say. .·J. executes a wait opera lion. it
a receive operation, it is blocked until another process executes the corresponding send passes 11 a mutex Yanable that it has already acquired the lock for. The wait operation \\ill
operation. The send operations, on the other hand, may or may not be blocking. One reason th~~ cause ~e mutex to be unlock~d such that another process. say. 8. may be able to enter a
for having nonblocking send operations is to allow a process that just performed a send cnucal section and compute some value or make some condition become true. Once the
operation to continue with its execution. In our example, the regions of code labeled I and 2
are executed immediately after a send operation, even though the receiving process may not 01: data_type l:uffer[N];
have received the data item. 02: int count= O;
03: nutex cs_nutex;
04: cmditirn l:uffer_enpty, l:uffer full;
06: void p=essA() (
8.13 Synchronization among Processes 07: int i;
In order for two or more concurrent processes to accomplish a common task, they must al 08 : while( 1 ) (
times synchronize their execution. Synchronization among processes means that one process 09: prcx:h.Jce(&data);
must wait for another process to compute some value, reach a known point in its execution, or 10: cs_nutex.lock();
signal some condition, before it (the waiting process) proceeds. To clarify this concept, 11: if( camt = N) l:uffer arpty.wait( cs rrutex);
13: l:uffer[i] = data; - -
consider the consumer-producer example shown in Figure 8.14. Recall that on lines 8 and 19
14: i =- (i + 1) \ N;
processes A and Blooped waiting for some condition to become untrue. The condition for the 15: count =count+ l;
consumer process A was that the value of count becomes less than N, meaning that buffer 16: cs_nutex.unl.ock();
contained at least on~ empty slot. The condition for the producer processB was that the value 17: l:uffer_full. s ignal();
of count becomes greater than zero, meaning tliat buffer contained at least one new data item. 18:
This form of waiting on a condition is called busy-waiting. It is called busy-waiting because 19:
the waiting process is simply executing noops, instead of being blocked until the condition is 20: void processB()
met, hence making the GPU available for useful computation. In this section, we will 21: int i;
introduce constructs that are more efficient to use in place of busy-waiting. Note that we have 22: while( 1 ) {
discussed the join opemtion and blocking send and receive primitives earlier in this chapter, 23: cs_nutex.lock();
which are both forms of synchronization primitives. 24: if( ca.mt= 0) l:uffer full.wait(cs mutex );
26: data = buffer[i]; - -
The join operation that we discussed earlier is a limited form of synchronization among
27: i = (i + 1) ~ N;
two processes. Recall that here, one process performed a join operation on another process, 28: count = count - 1;
indicating that it should be blocked until that other process terminates. The blocking send and 29: cs_m:itex.unl.ock(};
receive protocols, a.k.a. synchronous send and receive, discussed in the previous section, also 30: 1:uffer_enpty.signal();
serve to synchronize processes. When one process performs a seqd or receive operation, it is 31: consune(&data); ,.
blocked until the other process reaches its receive or send point, respectively, before the 32:
blocked process is allowed to continue. We will next describe condition variables and 33:
monitors as synchronization mechanisms. 34: void Ira.in(}
35: create_proc:ess(processA); create_process (processB);
37: .-r
I
Chapter s: State Machine and Concurrent Process Models
condition becomes true, process B will signal the condition variable causing process A to
8.13: Synchronization among Processes
I
f
weather there is at least one valid data item in our buffer, called bufferJuli .. The two
processes execute as follows. Once the producer process A has produced valid data, it acquires the lock to the critical section. It then checks the value of count. If the value is N, the buffer is full, so it executes a wait operation on the buffer_empty condition variable, thus waiting until the buffer has room again. By executing the wait operation, it releases the lock on the critical section, so that the consumer process is able to enter and execute that region of code. (Otherwise, the consumer process would never be able to enter the critical section and consume data; the system would be deadlocked!) If the value of count is less than N, the producer process simply inserts the data into the buffer, increments count, releases the lock, and signals the buffer_full condition to the consumer process (possibly making it runnable).

    Monitor {
       int buffer[N];
       int count = 0;
       Condition buffer_empty, buffer_full;

       void processA() {
          int i;
          while( 1 ) {
             produce(&data);
             if( count == N ) buffer_empty.wait();
             buffer[i] = data;
             i = (i + 1) % N;
             count = count + 1;
             buffer_full.signal();
          }
       }
       void processB() {
          int i;
          while( 1 ) {
             if( count == 0 ) buffer_full.wait();
             data = buffer[i];
             i = (i + 1) % N;
             count = count - 1;
             buffer_empty.signal();
             consume(&data);
          }
       }
    } /* end monitor */
    void main() {
       create_process(processA); create_process(processB);
    }

Figure 8.19: Implementation of the producer-consumer example using a monitor.
Monitors

Another way to achieve synchronization among concurrently executing processes is to use a special construct called a monitor. A monitor is a collection of data and methods or subroutines that operate on this data, similar to an object in an object-oriented paradigm. A special guarding property of a monitor guarantees that only one process is allowed to execute inside the monitor at a given time. In other words, one and only one of the methods of a monitor can be active at any given time. A process, say, X, is allowed to enter a monitor if there are no other processes executing in that monitor. This is shown in Figure 8.18(a). Once in a monitor, X has exclusive access to the data inside the monitor. If, and when, X executes a wait operation on a condition variable, also defined inside the monitor, it will be blocked waiting, as shown in Figure 8.18(b). At this point, another process, say Y, is allowed to enter the monitor. If Y signals the condition that X is currently waiting on, Y will be blocked and X will be allowed to reenter the monitor. This is shown in Figure 8.18(c). Then, once X terminates, or waits on a condition, Y is allowed to reenter and finish its execution, as shown in Figure 8.18(d).

Figure 8.18: Producer-consumer example with monitors: (a) X is allowed to enter the monitor while Y waits, (b) X executes a wait on a condition and is blocked, and Y is allowed to enter the monitor, (c) Y signals the condition that X is waiting on and thus is blocked, allowing X to finish and exit the monitor, (d) Y is allowed to finish its execution.

To clarify this a bit more, we have implemented the producer-consumer problem using monitors, as shown in Figure 8.19. A single monitor is used to encapsulate the sequential programs of the consumer and producer processes. The shared buffer is also encapsulated in the monitor. Initially, one of the consumer or producer processes will be allowed to execute. Let us assume that the consumer process B is allowed to execute first. Once the consumer checks the buffer size and discovers that there are no data items produced, it will wait on the buffer_full condition variable and thus allow the producer process A to enter the monitor and produce a data item. Once the producer process A signals the buffer_full condition, the consumer process will be allowed to reenter and execute. This behavior will repeat, and the two processes will take turns producing and consuming data items. Here, it is left as an exercise to show that the size of the buffer will never exceed 1.
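C itself has no monitor construct, but the same discipline can be approximated with a POSIX mutex acting as the monitor lock plus condition variables; note that pthread_cond_wait atomically releases the mutex while waiting, just as a monitor's wait releases the monitor. The bounded-buffer sketch below is an illustrative analogue of Figure 8.19, not code from the book; the while loops around the waits also guard against spurious wakeups:

```c
#include <pthread.h>

#define N 4
static int buffer[N];
static int count = 0, head = 0, tail = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buffer_empty = PTHREAD_COND_INITIALIZER; /* signaled when space frees up */
static pthread_cond_t buffer_full  = PTHREAD_COND_INITIALIZER; /* signaled when data arrives   */

void put(int data) {                 /* producer side, cf. processA */
    pthread_mutex_lock(&lock);
    while (count == N)               /* buffer full: wait, releasing the lock */
        pthread_cond_wait(&buffer_empty, &lock);
    buffer[tail] = data;
    tail = (tail + 1) % N;
    count = count + 1;
    pthread_cond_signal(&buffer_full);
    pthread_mutex_unlock(&lock);
}

int get(void) {                      /* consumer side, cf. processB */
    int data;
    pthread_mutex_lock(&lock);
    while (count == 0)               /* buffer empty: wait, releasing the lock */
        pthread_cond_wait(&buffer_full, &lock);
    data = buffer[head];
    head = (head + 1) % N;
    count = count - 1;
    pthread_cond_signal(&buffer_empty);
    pthread_mutex_unlock(&lock);
    return data;
}
```

Unlike a true monitor, nothing here forces every access to the shared data to go through the lock; that discipline is the programmer's responsibility.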
8.14 Implementation

So far we have discussed numerous operations permitted by the concurrent process model. Here we will discuss how these operations are implemented using single-purpose or general-purpose processors.

Figure 8.20: Mapping processes to processors: (a) processes mapped to multiple single-purpose processors, (b) processes mapped to one general-purpose processor, (c) processes mapped to a combination of single-purpose and general-purpose processors.

Creating and Terminating Processes
One way to implement multiple processes in a system is to use multiple processors, each executing one process. Each of these processors may be a general-purpose processor, in which case we can use a programming language like C to describe the function of the process and compile it down to the instructions of that processor. Or, we can build a custom single-purpose processor that implements the function of the process. In both cases, when using processors to implement multiple processes, we can achieve true multitasking (i.e., each process will execute in parallel with the other processes in the system). Implementing each process on its own processor is common when each process is to be implemented using a single-purpose processor. However, we often decide that several processes should be implemented using general-purpose processors. While we could conceptually use one general-purpose processor per process, this would likely be very expensive, and in most cases it is not necessary: the processes likely do not require 100% of the processor's processing time; instead, many processes may share a single processor's time and still execute at the necessary rates. Different ways to map processes to processors are illustrated in Figure 8.20.

One method for sharing a processor among multiple processes is to manually rewrite the processes as a single sequential program. For example, consider our HelloWorld program from earlier. We could rewrite the concurrent process model as a sequential one by replacing the concurrent running of the PrintHelloWorld and PrintHowAreYou routines by the following:

    I = 1; T = 0;
    while (1) {
       Delay(I); T = T + I;
       if T modulo X is 0 then call PrintHelloWorld
       if T modulo Y is 0 then call PrintHowAreYou
    }
We would also modify each routine to have no parameter, no loop, and no delay; each would merely print its message. If we wanted to reduce iterations, we could set I to the greatest common divisor of X and Y rather than to one. Manually rewriting a model may be practical for simple examples, but extremely difficult for more complex examples. While some automated techniques have evolved to assist with such rewriting of concurrent processes into a sequential program, these techniques are not very commonly used.

Instead, a second, far more common method for sharing a processor among multiple processes is to rely on a multitasking operating system. An operating system is a low-level program that runs on a processor, responsible for scheduling processes, allocating storage, and interfacing to peripherals, among many other things. A real-time operating system (RTOS) is an operating system that allows one to specify constraints on the rates of processes, and that guarantees that these rate constraints will be met. In such an approach, we would describe our concurrent processes using either a language with processes built in (such as Ada or Java), or a sequential programming language (like C or C++) using a library of routines that extends the language to support concurrent processes. POSIX threads were developed for the latter purpose.

A third method for sharing a processor among multiple processes is to convert the processes to a sequential program that includes a process scheduler right in the code. This method results in less overhead, since it does not rely on an operating system, but it also yields code that may be harder to maintain.

In operating system terminology, a distinction is made between regular processes and threads. A regular process is a process that has its own virtual address space (stack, data, code) and system resources (e.g., open files). A thread, in contrast, is really a subprocess within a process. It is a lightweight process that typically has only a program counter, stack, and registers; it shares its address space and system resources with other threads. Since threads are small compared to regular processes, they can be created quickly, and switching between threads by an operating system does not incur very heavy costs. Furthermore, threads can share resources and variables, so they can communicate quickly and efficiently. Throughout this chapter, we use the term process to denote either a heavyweight process or a lightweight thread.

Suspending and Resuming Processes

If multiple processes are implemented using single-purpose processors, suspending or resuming them must be built as part of each processor's implementation. For example, the processors may be designed having an extra input: when this input is asserted, the processor is suspended; otherwise, it is executing. If multiple processes are implemented using a single general-purpose processor, then suspending or resuming the processes must be built into the programming language or multitasking library that is used to describe the processes. In both cases, the programming language or library may rely on the underlying operating system to handle these operations.

Joining a Process

If multiple processes are implemented using single-purpose processors, then for one process X to join another process Y would require building additional logic that will determine when Y has reached its completion point and, in response, resume X. Therefore, in addition to having input signals that signal when a processor should suspend, each processor must have output signals that indicate when that processor is done executing its task. If multiple processes are implemented using a single general-purpose processor, join must be built into the language or multitasking library that is used to describe the processes. In both cases, the programming language or library may rely on the underlying operating system to handle this operation.

Scheduling Processes

When multiple processes are implemented on a single general-purpose processor, the manner in which these processes are executed on the single shared processor plays an important role in meeting each process's timing requirements. This task of deciding when and for how long the processor executes a particular process is known as process scheduling. A scheduler is a special process that performs process scheduling. A scheduler can be implemented either as a nonpreemptive scheduler or as a preemptive scheduler. A nonpreemptive scheduler decides what process to select for execution on the processor only once the currently executing process completes its execution. A preemptive scheduler allows a process to execute only for a predetermined amount of time, called a time quantum, before preempting it in order to allow another process to execute on the processor. This time quantum may be 10 to 100s of milliseconds long. The length of this time quantum greatly determines the response time of a system.

We have already defined a process state as being one of running, runnable, or blocked. We further assign to each process an integer-valued priority. Without loss of generality, we assume that the process with the highest priority is always selected first by the scheduler to be executed on the processor. A process's priority is often statically determined during the creation of the process and may be dynamically changed during execution.

A very simple scheduler is one that employs a first-in first-out (FIFO) queue. Using a FIFO scheduler, processes are added to the FIFO as they are created or become runnable, and processes are removed from the FIFO to be executed on the general-purpose processor whenever the time quantum of the currently executing process ends or the process is blocked. Another type of simple scheduler maintains a priority queue of the processes that are in the runnable state. When the scheduler is ready to select a new process for execution, it simply selects the runnable process with the highest priority. When a blocked process becomes runnable, it is added to the priority queue of the scheduler, to be selected for execution at some later point. When multiple processes have equal priority, the scheduler selects among them on a first-come first-served basis. When nonpreemptive scheduling is used, this form of scheduling is called priority scheduling. When preemption is used, this form of scheduling is called round-robin scheduling.
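The selection rule just described can be sketched in a few lines of C. The process table, field names, and tie-breaking by arrival order below are our own illustrative assumptions, not a specific RTOS API:

```c
typedef enum { RUNNING, RUNNABLE, BLOCKED } State;

typedef struct {
    State state;
    int priority;   /* higher value = higher priority */
    int arrival;    /* used to break ties first-come first-served */
} Process;

/* Return the index of the next process to run, or -1 if none is runnable:
   the highest-priority runnable process, with ties broken by earliest arrival. */
int schedule(const Process p[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (p[i].state != RUNNABLE) continue;
        if (best == -1 ||
            p[i].priority > p[best].priority ||
            (p[i].priority == p[best].priority && p[i].arrival < p[best].arrival))
            best = i;
    }
    return best;
}
```

With a timer interrupt ending each quantum and re-invoking this selection, the same rule yields round-robin behavior among equal-priority processes.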
Of course, the real question is how to assign priorities to processes. Before we do this, we have to have an understanding of how often each of the processes in our system needs to
execute. Let us define the period of a process to be a repeating time interval during which that process has to execute once. For example, if we assign to process A a period of 100 ms, then process A must execute once every 100 ms. The period of a process is often obtained from the description of the system (e.g., a process responsible for refreshing the screen on a display device must run 27 times per second, which equals a period of 37 ms). This notion of period is similar to the period of a sound wave. In rate monotonic scheduling, processes are assigned priorities such that those with shorter periods are given higher priorities. We have given an example of rate monotonic priority assignment in Figure 8.21(a). Here there are six processes, labeled A through F, with the corresponding periods given in the next column. We can assign priorities to these processes as follows. We assign to the process with the largest period, D, the smallest priority, one. Then we assign to the next process with the largest period, F, the next smallest priority, two, and so on.

In the previous discussion, we have assumed that the execution deadline of a process is equal to its period. The deadline of a process is defined as the time before which the process must run to completion. For example, if a process has a deadline of 20 ms, then it must complete within 20 ms after it starts. Note that the actual execution time of a process is equal to or less than its deadline. For example, process A may have an execution time of 5 ms and a deadline of 20 ms. This means that once A is started, it can execute for 4 ms, then sleep for 14 ms, and resume to execute for the additional 1 ms. Thus, the total time since the process started would be 4 + 14 + 1 = 19 ms, which is less than the deadline, so such a schedule would be valid. If we know that the deadline of a process being scheduled is less than its period, we can use deadline monotonic priority assignment. This works as in rate monotonic priority assignment, except that, instead of the period, we use the deadline to assign priorities. In deadline monotonic scheduling, processes are assigned priorities such that those with shorter deadlines are given higher priorities. We have given an example of deadline monotonic priority assignment in Figure 8.21(b). Here there are six processes, labeled G through L, with the corresponding deadlines given in the next column. We can assign priorities to these processes as follows. We assign to the process with the largest deadline, K, the smallest priority, one. Then we assign to the next process with the largest deadline, H, the next smallest priority, two, and so on.

Figure 8.21: Priority assignment: (a) rate monotonic priority assignment, (b) deadline monotonic priority assignment.

8.15 Dataflow Model

A derivative of the concurrent process model is the dataflow model. In a dataflow model, we describe system behavior as a set of nodes representing transformations, and a set of directed edges representing the flow of data from one node to another. Each node consumes data from its input edges, performs its transformation, and produces data on its output edges. All nodes may execute concurrently. For example, Figure 8.22(a) shows a dataflow model of the computation Z = (A + B) * (C - D). Figure 8.22(b) shows another dataflow model having more complex node transformations. Each edge may or may not have data present. Data present on an edge is called a token. When all input edges to a node have at least one token, the node may fire. When a node fires, it consumes one token from each input edge, executes its data transformation on the consumed tokens, and generates a token on its output edge. Note that multiple nodes may fire simultaneously, depending only on the presence of tokens.

Figure 8.22: Simple dataflow models: (a) nodes representing arithmetic transformations, (b) nodes representing more complex transformations, (c) synchronous dataflow.

Several commercial tools support graphical languages for the capture of dataflow models. These tools can automatically translate the model to a concurrent process model for implementation on a microprocessor. We can translate a dataflow model to a concurrent process model by converting each node to a process, and each edge to a channel. This concurrent process model can be implemented either by using a real-time operating system or by mapping the concurrent processes to a sequential program.

We observe that in many digital signal-processing systems, data flows into and out of the system at a fixed rate, and that a node may consume and produce many tokens per firing. A variation of dataflow, called synchronous dataflow, was therefore created. In this model, we annotate each input and output edge of a node with the number of tokens that node consumes and produces, respectively, during one firing. The advantage of this model is that, rather than translating to a concurrent process model for implementation, we can instead statically
schedule the nodes to produce a sequential program model. This model can be captured in a sequential programming language like C, thus running without a real-time operating system and hence executing more efficiently. Much effort has gone into developing algorithms for scheduling the nodes into "single-appearance" schedules, in which the C code contains only one statement that calls each node's associated procedure (though this call may be in a loop). Such a schedule allows for procedure inlining, which further improves performance by reducing the overhead of procedure calls, without resulting in the explosion of code size that would have occurred had there been many statements calling each node's procedure.

8.16 Real-Time Systems

In most embedded systems it is important to perform some of the computations in a timely manner. For example, in the set-top box example shown in Figure 8.12(b), at least 20 video frames need to be decoded within each second for the output to appear continuous. Likewise, a digital cell phone decodes audio packets, converts digital signals to analog, and reproduces the voice in the speaker. All this takes place during strictly defined time periods, or else the sound of the remote speaker would appear delayed to the listener. Other systems that have stringent timing requirements include navigation and process control systems, assembly line monitoring systems, multimedia systems, and network systems, to name a few. Real-time systems are systems that are fundamentally composed of two or more concurrent processes that execute with stringent timing requirements and cooperate with each other in order to accomplish a common goal. In order for these concurrent processes to work together, it is essential to provide means for communication and synchronization among them. The concurrent process model addresses most of these requirements and is best suited for use in describing real-time systems. Thus, a system described using the concurrent process model, with an additional stringent execution-timing requirement imposed on each process, is a real-time system. The additional timing requirement of real-time systems is met by adopting scheduling algorithms that guarantee timely execution of each process in the system, as described earlier in this chapter.

We will now discuss some operating systems that are designed to support real-time systems. Note that the term real-time system refers to a class of applications or embedded systems that exhibit the real-time characteristics and requirements mentioned above. Real-time operating systems, on the other hand, refer to underlying implementations or systems that support real-time systems. In other words, real-time operating systems provide mechanisms, primitives, and guidelines for building embedded systems that are real-time in nature.

Windows CE

Windows CE was built specifically for the embedded system and appliance market, providing a scalable real-time 32-bit platform that can be used in a wide variety of embedded systems and products. One of the benefits of using Windows CE as an RTOS is that it supports the Windows application-programming interface (API), which has gained great popularity. This operating system provides a set of Internet browsing and serving services that make it suitable for systems that are designed to interface to the Internet. The Windows CE kernel allows for 256 priority levels per process and implements preemptive priority scheduling. The size of the Windows CE kernel is 400 Kbytes.

QNX

The QNX RTOS architecture consists of a real-time microkernel surrounded by a collection of optional processes (called resource managers) that provide POSIX- and UNIX-compatible system services. A microkernel is a name given to a kernel that supports only the most basic services and operations that typical operating systems provide. By including or excluding resource manager processes, the developer can scale QNX down for ROM-based embedded systems, or scale it up to encompass hundreds of processors connected by various networking and communication technologies. Resource manager processes are modules that can be added to or removed from the basic microkernel to best match the functionality provided by the operating system to that needed by the application. The microkernel of QNX occupies less than 10 Kbytes and complies with the POSIX real-time standard. QNX supports up to 32 priority levels per process and implements preemptive process scheduling using either FIFO, round-robin, adaptive, or priority-driven scheduling.

8.17 Summary

We have introduced the concurrent process model as a model well suited for describing a large class of embedded systems, since much of an embedded system's behavior can be described as two or more concurrently executing tasks. The concurrent process model provides operations to create, terminate, suspend, resume, and join processes. It also provides for communication and synchronization of processes, since both are essential for correctly implementing a system in terms of multiple processes. Processes must be able to share data and synchronize their execution in order to achieve a common goal. We have described communication protocols that use shared memory and send/receive primitives. In the shared memory scheme, two processes communicate by reading and writing variables that are visible to both; a mutex is used to lock a region of shared data for a period of time and allow only one process at a time to update it. Synchronization primitives such as condition variables and monitors are also used to allow processes to signal various events to each other. We have looked at the implementation of concurrent processes on single-purpose and general-purpose processors. We have defined a real-time system as a system composed of multiple concurrently executing processes, each having stringent timing requirements. Finally, we have looked at two real-time operating systems and their features, namely Windows CE and the QNX RTOS.
Figure 9.1: The goal of a control system is to force a physical system's output to track a reference input: (a) good tracking, (b) not-as-good tracking.

Figure 9.2 (diagram contents): (a) open-loop: the reference input r_t (the desired speed) feeds the control law u_t = P*r_t, which drives the plant (the automobile); the goal is to design F such that v approaches r; combined system model: v_{t+1} = 0.7*v_t + 0.5*P*r_t. (b) closed-loop: a sensor measures the speed, an error detector computes e_t (error) = r_t - v_t, and the control law u_t = P*(r_t - v_t) drives the car model v_{t+1} = 0.7*v_t + 0.5*u_t - w_t; combined system model: v_{t+1} = (0.7 - 0.5*P)*v_t + 0.5*P*r_t - w_t.
engine performance. It must correctly handle any situation presented to it, like accelerating from 20 mph to 50 mph while going down a steep hill. It should control the car in a way that is comfortable to the car's passengers, avoiding extremely fast acceleration or deceleration, and avoiding speed oscillations.
Control systems have been widely studied, and a rich theory for control system design exists. This chapter does not describe that theory in detail, since that requires a book in itself as well as a strong background in differential equations. Instead, we will introduce the basic concepts of control systems using a greatly simplified example. This introduction will lead up to PID controllers, which are extremely common. One of the goals of the chapter is to enable the reader to detect when an embedded system is an instance of a control system, so that the reader knows to turn to control theory (or to someone trained in control theory), rather than using ad hoc techniques, in those cases. However, in some cases, PID controllers can be used without extensive knowledge of control theory, and thus we will introduce some commonly used PID tuning techniques.

Figure 9.2: Control systems and automobile cruise controller example: (a) open-loop control, (b) closed-loop control.

9.2 Open-Loop and Closed-Loop Control Systems

Overview

Control systems minimally consist of several parts, illustrated in Figure 9.2:

1. The plant, also known as the process, is the physical system to be controlled. An automobile is an example of a plant, as in Figure 9.2(a).
2. The output is the particular physical system aspect that we are interested in controlling. The speed of an automobile is an example of an output.
3. The reference input is the desired value that we want to see for the output. The desired speed set by an automobile's driver is an example of a reference input.
4. The actuator is the device that we use to control the input to the plant. A stepper motor controlling a car's throttle position is an example of an actuator.
5. The controller is the system that we use to compute the input to the plant such that we achieve the desired output from the plant.
6. A disturbance is an additional, undesirable input to the plant imposed by the environment that may cause the plant output to differ from what we would have expected based on the plant input. Wind and road grade are examples of disturbances that can alter the speed of an automobile.

A control system with these components, configured as in Figure 9.2(a), is referred to as an open-loop, or feed-forward, control system. The controller reads the reference input, and then computes a setting for the actuator. The actuator modifies the input to the plant, which, along with any disturbance, results some time later in a change in the plant output. In an open-loop system, the controller does not measure how well the plant output matches the reference input. Thus, open-loop control is best suited to situations where the plant output responds very predictably to the plant input (i.e., the model is accurate and disturbance effects are minimal).

Many control systems possess some additional parts, as illustrated in Figure 9.2(b):

1. A sensor measures the plant output.
Embedded System Design
Chapter 9: Control Systems
2. An error detector determines the differenc.e between the plant output and ·the .
reference input Flo be as simple or as complex a function as desired. Let's start b . .
simple linear function of the form: Y assuming that F 1s a
A control system with these parts, configured as in Figure 9.2(b), is known a as
closed-loop, or feedback, control system. A closed-loop system monitors the error between
the plant output a.id the reference input. The controller adjusts the plant input in response to
this error. The goal is typically to minimize this tracking error given the physical constraints
u, = P * r,
of the system. Here, P is a constant that the designer must specify. This linear prop0 ...., nal
ak · · · - - . rnO contro11er
mh es intwt1ve sense smce ,t mcreases_ the throttle angle as the desired speed increases. In
A First Example: An Open-Loop Automo~iSe Cruise Controller ot er words, the throttle.angle ,s proport10nal to the desired speed.
We are primarily interested in closed-loop control in this chapter. However, let us begin by ~iven this proportional contr?l fu~ction, we can now write an equation that models the
providing a simple example of an open-loop automobile cruise controller, illustrat~ in Figure combined controller and plant, which will help us determine what value to use for P:
9.2(a). As you probably already know, the objective of a crwse-control system is to maich the v,+1 = 0. 7v, + 0.5u,
car· s speed to the desired speed set by the driver.
Developing a Model: In many cases of controller design, our first task is to develop a u,=P*r,
model of how the plant behaves. A model describes how the plant output reacts as a function v.,, = 0.7v, + 0.5P * r,
of the plant inputs and current state. For our cruise controller, the model should describe how
the car reacts to the throttle position and the current speed of the car. As we will see iater in Th~ design goal for ~e cruise controller is to keep the actual speed of the car v equal to
this chapter, we don't always have to model the plant, and instead could design a controller tl1e des1 red. speed r at all umes. Of course, it is impossible to keep these two values equal at
through somewhat ad hoc experimenting. We could see how a particular controller works and iteratively modify the controller until the desired tracking is achieved. However, for many plants, like a car, such experimenting is dangerous, so using a model for the experimenting is preferable. Furthermore, with a model, we can even design the controller using quantitative techniques, thus avoiding the need for experimentation while creating a better controller.

The car has a throttle input whose position u can vary from 0 to 45 degrees. We decide to begin by test-driving the car on a flat road and taking measurements. Suppose that with the car traveling steadily at 50 mph and the throttle set at 30 degrees, we quickly change the throttle to 40 degrees, and measure the car's speed every second thereafter, until the car's speed finally becomes constant. Based on the measured speed data, suppose we determine that the following equation describes the car's speed as a function of the current speed and throttle position:

    v_{t+1} = 0.7v_t + 0.5u_t

Here, v_t is the car's current speed, u_t is the throttle position, and v_{t+1} is the car's speed one second later. For example, v_2 = 0.7v_1 + 0.5u_1 = 0.7 * 50 + 0.5 * 40 = 55. Suppose further that we try a variety of other speeds and throttle positions, and we find that the above equation holds for all those other situations. Therefore, we decide that the above equation is a suitable first model for the car over the range of speed that is of interest. Please note that this is not actually a reasonable model of a car, and is instead used for illustrative purposes only.

Developing a Controller: Now let's turn our attention away from modeling the car and toward designing the cruise controller for the car. Suppose the only input to the controller is the desired speed r_t, as shown in Figure 9.2(a). The controller's behavior is a function F of the commanded speed, so that the throttle position is u_t = F(r_t). The control designer can choose [...] all times, since the car will require some time to react to any changes the controller makes to the throttle angle. For example, the car cannot accelerate from 0 to 50 mph instantaneously. Rather, from the moment the controller sets the throttle, a car will take several seconds to accelerate to its final speed. Therefore, the design goal can be relaxed to that of forcing the car's actual speed v to be equal to the desired speed r in steady state. Steady state means that if the controller sets the throttle to a constant value, and nothing else changes, then at some time in the future, v will also not change. So in steady state, v_{t+1} = v_t. Let's refer to this steady-state velocity as v_ss. Substituting v_ss for both v_{t+1} and v_t above, we get:

    v_{t+1} = 0.7v_t + 0.5P*r_t
    let v_{t+1} = v_t = v_ss
    v_ss = 0.7v_ss + 0.5P*r_t
    v_ss - 0.7v_ss = 0.5P*r_t
    v_ss = 1.67P*r_t

So, if we want v_ss = r_t, we merely need to set P = 1/1.67 = 0.6. We have now designed our first controller:

    u_t = F(r_t)
    u_t = P * r_t

The controller merely multiplies the desired speed r_t by 0.6 to determine the desired throttle angle.
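The model and the choice P = 0.6 can be double-checked with a short simulation in C (a sketch; the function names are mine, not the book's):

```c
#include <assert.h>
#include <math.h>

/* The text's hypothetical plant model: v[t+1] = 0.7*v[t] + 0.5*u[t]. */
double plant_step(double v, double u) {
    return 0.7 * v + 0.5 * u;
}

/* Open-loop controller u = P*r with P = 0.6, simulated for n seconds. */
double simulate_open_loop(double v0, double r, int n) {
    double v = v0;
    for (int t = 0; t < n; t++)
        v = plant_step(v, 0.6 * r);
    return v;
}
```

Starting from 20 mph with r = 50, the simulated speed converges to 50 mph, matching the steady-state prediction v_ss = 1.67P*r.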
248 Embedded System Design
9.2: Open-Loop and Closed-Loop Control Systems

    Time (t)   (a) v_t, w=0   (b) v_t, w=+5   (c) v_t, w=-5
    0          20.00          20.00           20.00
    1          29.00          24.00           34.00
    2          35.30          26.80           43.80
    3          39.71          28.76           50.66
    4          42.80          30.13           55.46
    5          44.96          31.09           58.82
    6          46.47          31.76           61.18
    7          47.53          32.24           62.82
    8          48.27          32.56           63.98
    9          48.79          32.80           64.78
    10         49.15          32.96           65.35
    11         49.41          33.07           65.74
    12         49.58          33.15           66.02

Figure 9.3: Open-loop cruise controller trying to accelerate the car from 20 mph to 50 mph, when the grade is: (a) 0%, (b) +5%, (c) -5%.

Analyzing Our First Controller: Let's analyze how well this controller achieves its goal. Two issues are of interest: (1) what is the transient behavior when r changes; and (2) what effects do disturbances have on the system? The equation representing the entire system is:

    v_{t+1} = 0.7v_t + 0.5*0.6*r_t
    v_{t+1} = 0.7v_t + 0.3r_t

To see how the system behaves, suppose a car is traveling steadily at 20 mph at time t = 0, at which time the desired speed r_0 is set to 50. Given the form of our controller above, we see that the controller will set the throttle position to 0.6 * 50 = 30 degrees, and hold it there until r_t changes again. We can "simulate" the system by evaluating the above equation for various time values (a spreadsheet program makes this task easy). Figure 9.3(a) illustrates the car's speed over time. We see that (in the absence of disturbances) the controller does well, approaching the desired speed of 50 mph to within 0.3% in 10 seconds.

Note that the simulation evaluates the controller performance relative to a model. For the simulation results to accurately predict the results of the future control experiments with the actual hardware, the model must be accurate. However, since there is expense involved with developing the model, there is always a trade-off and art to determining when the model is sufficiently accurate to complete the analytic portion of the design. Tuning of the design typically occurs during the initial hardware experiments to accommodate differences between the model and hardware.

Considering Disturbances: Suppose now that additional testing of the car is performed on roads with grades w varying from -5 degrees, corresponding to downhill roads, to +5 degrees, corresponding to uphill roads. The car goes faster downhill and slower uphill. Suppose road grade is incorporated into the earlier model for the car alone as follows:

    v_{t+1} = 0.7v_t + 0.5u_t - w_t

Since the open-loop controller has no means of sensing the road grade or its effect on the speed, this disturbance will obviously result in speed error when driving downhill or uphill. Figure 9.3(b) displays the behavior of the car with the open-loop controller when driving up a +5% grade, and Figure 9.3(c) when driving down a -5% grade. The speed error at time t = 12 is about 50 - 33 = 17 mph in the uphill case, and about 50 - 66 = -16 mph in the downhill case. This error is quite bad! Closed-loop control systems, which will be discussed shortly, can help reduce errors caused by disturbances.

Determining Performance Parameters: Using the model of the system created earlier, a designer can quickly determine various important performance parameters. Assume that the initial speed is v_0, the desired speed is r_0, and the disturbance is w_0; then we can develop an equation for v_t as follows:

    v_1 = 0.7v_0 + 0.5P*r_0 - w_0
    v_2 = 0.7*(0.7v_0 + 0.5P*r_0 - w_0) + 0.5P*r_0 - w_0
    v_2 = 0.7*0.7*v_0 + (0.7 + 1.0)*0.5P*r_0 - (0.7 + 1.0)*w_0
    v_t = 0.7^t * v_0 + (0.7^{t-1} + 0.7^{t-2} + ... + 0.7 + 1.0)*(0.5P*r_0 - w_0)

The last equation shows three important points. First, in the model v_{t+1} = 0.7v_t + 0.5u_t - w_t, let's refer to the coefficient of v_t as a; in this case a = 0.7. Looking at the last equation, we see that a determines the rate of decay of the effect of the initial speed. In other words, a bigger a will result in the car taking longer to reach its desired speed. Notice that, in open-loop control, the controller gain P has no effect on this rate of decay. In closed-loop control, it will. Also note that if |a| > 1, then v_t would grow without bound as time increased, since a is being raised to the power of t. Furthermore, note that a negative a will result in an oscillating speed. Again, in closed-loop control, we will be able to change a.

Second, the sensitivity of the speed to the disturbance is not altered by the open-loop controller.

Third, if our assumed model were not correct, then this model error would cause the steady-state speed that results from the open-loop controller u = P*r to not equal the desired speed.

A Second Example: A Closed-Loop Automobile Cruise Controller

We can reduce the speed error caused by disturbances, like grade or wind, by enabling the controller to detect speed errors and correct for them. To detect speed errors, we introduce a speed sensor into the system, as shown in Figure 9.2(b), to measure the car's speed. We also introduce a device that outputs the difference between the desired speed r_t and the actual speed v_t. This difference is the speed error e_t = r_t - v_t. Note that the penalties for this
closed-loop approach are the cost of the sensor, added controller complexity, and the addition of sensor noise. The benefits will be the ability to change the rate of response, reduction of sensitivity to disturbances, and reduction of sensitivity to model error. If we select the form of the controller to be linear and proportional as before, namely u_t = P * (r_t - v_t), then:

    v_{t+1} = 0.7v_t + 0.5u_t - w_t
    v_{t+1} = 0.7v_t + 0.5P*(r_t - v_t) - w_t
    v_{t+1} = (0.7 - 0.5P)*v_t + 0.5P*r_t - w_t

Note that the closed-loop controller results in a = 0.7 - 0.5P, and remember that a determines the rate of decay of the effect of the initial speed. Therefore, by choice of the parameter P, the control system designer can alter the rate of convergence of the closed-loop system. However, we cannot make P arbitrarily large, because if the designer selects a value of P such that |0.7 - 0.5P| > 1.0, then the speed will not converge to the commanded speed, but instead grow without bound. The constraint |0.7 - 0.5P| < 1.0 is necessary for the system to be stable. This stability constraint translates to the following:

    0.7 - 0.5P < 1.0
    -0.5P < 0.3
    P > -0.6

    0.7 - 0.5P > -1.0
    -0.5P > -1.7
    P < 3.4

    so, -0.6 < P < 3.4

We could set P close to 3.4 to obtain the fastest decay of the initial condition. However, remember that a negative a will cause oscillation, which is something we'd usually like to avoid. To keep a positive, we need:

    0.7 - 0.5P >= 0
    -0.5P >= -0.7
    P <= 1.4

So the fastest rate of convergence to steady state without oscillation, known as deadbeat [...]

    Time   (a) v_t    u_t     (b) v_t   u_t     (c) v_t   u_t
    0      20.00    99.00     20.00   45.00     20.00   30.00
    1      63.50   -44.55     36.50   44.55     29.00   21.00
    2      22.18    91.82     47.83    7.18     30.80   19.20
    3      61.43   -37.73     37.07   42.68     31.16   18.84
    4      24.14    85.34     47.29    8.95     31.23   18.77
    5      59.57   -31.58     37.58   40.99     31.25   18.75
    6      25.91    79.50     46.80   10.55     31.25   18.75
    7      57.89   -26.02     38.04   39.47     31.25   18.75
    8      27.51    74.22     46.36   12.00     31.25   18.75
    9      56.37   -21.01     38.45   38.10     31.25   18.75
    10     28.95    69.46     45.97   13.31     31.25   18.75
    ...
    45     44.53    18.06     41.70   27.39     31.25   18.75
    46     40.20    32.34     42.89   23.48     31.25   18.75
    47     44.31    18.78     41.76   27.20     31.25   18.75
    48     40.41    31.66     42.83   23.66     31.25   18.75
    49     44.11    19.42     41.81   27.02     31.25   18.75
    50     40.59    31.05     42.78   23.83     31.25   18.75
    ...
    55     42.31    25.38     42.31   25.38     31.25   18.75

Figure 9.4: Closed-loop cruise controller trying to accelerate from 20 to 50 mph, ignoring disturbance, where v is car speed and u is throttle position: (a) invalid data when throttle saturation is ignored, (b) valid data for P = 3.3, (c) valid data for P = 1.0, (d) plot for P = 3.3 and P = 1.0.
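The numbers tabulated in Figures 9.3 and 9.4 can be reproduced with a few lines of C following the chapter's difference equations (a sketch; the function names are mine, and throttle saturation is ignored, so the closed-loop case uses P = 1.0, for which the throttle stays within its 0 to 45 degree range):

```c
#include <assert.h>
#include <math.h>

/* Open-loop cruise control on v[t+1] = 0.7*v[t] + 0.5*u[t] - w[t],
   with fixed throttle command u = 0.6*r and constant grade disturbance w. */
double open_loop_speed(double v0, double r, double w, int n) {
    double v = v0;
    for (int t = 0; t < n; t++)
        v = 0.7 * v + 0.5 * (0.6 * r) - w;
    return v;
}

/* Closed-loop proportional control: u[t] = P*(r - v[t]), no disturbance. */
double closed_loop_speed(double v0, double r, double P, int n) {
    double v = v0;
    for (int t = 0; t < n; t++)
        v = 0.7 * v + 0.5 * (P * (r - v));
    return v;
}
```

open_loop_speed(20, 50, 5, 12) returns about 33.15, the uphill entry of Figure 9.3(b) at t = 12, and closed_loop_speed(20, 50, 1.0, 55) settles at 31.25, the steady-state error visible in Figure 9.4(c).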
Chapter 9: Control Systems
having introduced additional steady-state error when there is no disturbance, and of having introduced oscillation.

To allow more control objectives to be satisfied with fewer trade-offs, the complexity of the controller will have to increase, as will be described subsequently.

9.3 General Control Systems and PID Controllers

Having seen the above examples, we can now discuss control systems more generally. This section discusses objectives of control design, modeling real physical systems, and the PID approach to controller design.

Control Objectives

The objective of control system design is to make a physical system behave in a useful fashion, in particular, by causing its output to track a desired reference input even in the presence of measurement noise, model error, and disturbances. Satisfaction of this objective can be evaluated through several metrics specified relative to a step change in the control system's input:

1. Stability: The main idea of stability is that all variables in the control system remain bounded. Preferably, the error variables, like desired output minus plant output, would converge to zero. Stability is of primary importance, since without stability, all of the other objectives are immaterial.

2. Performance: Assuming stability, performance describes how well the output tracks a change in the reference input. Performance has several aspects, illustrated in Figure 9.6.

Modeling Real Physical Systems

An essential prelude to control system design is accurate modeling of the behavior of the plant. The controller will be designed based on this plant model. If the plant model is inaccurate, then the controller will be controlling the wrong plant. There are two key features that real systems display that our earlier example did not consider.

The first feature of real physical systems is that they typically respond as continuous variables and as continuous functions of time. In the cruise-controller example we assumed that the car's speed would change exactly one second after a change in the throttle. Obviously, cars do not synchronize their reactions to the discrete time intervals, but instead they are continuously reacting. Therefore, the plant dynamic model is usually a differential equation. There are methods for determining a discrete time model that is equivalent (only at the sampling instants) to the plant differential equation. Between the sampling instants, the discrete time model tells the designer nothing about the continuous time response. Therefore, the sampling period must be selected much smaller than the system reaction time so that the system cannot change significantly between sampling instants. The 1 second sample time used in the earlier examples of this chapter is not meant to be realistic. See also the subsequent discussion of aliasing.

The second feature of real physical systems is that they are typically much more complex than any model we create. The model will not include all nonlinear effects, all system states, or all state interactions. For example, the response of the speed of a car to a change in throttle depends on spark advance, manifold pressure, engine speed, and additional variables. Therefore, any model is a simplified abstraction. Modeling and control design is an iterative process, where the model of the actual plant is improved at each iteration to include key features identified during the prior iteration. Then the controller is improved to properly address the improved model. Linear models usually suffice when the variables of the model have a small operating range.
[Figure 9.8: PD step response; the responses settle at a steady-state value near the reference input of 50.]

Controller Design
The earlier closed loop example showed that increasing P caused the steady state speed v_ss to better match the desired speed r, and to resist tracking error caused by disturbances. A controller that multiplies the tracking error by a constant is known as using proportional control. To summarize, when proportional control is applied to a first order plant, the resulting closed loop model is similar to our particular cruise-controller model of:

    v_{t+1} = (0.7 - 0.5P)*v_t + 0.5P*r_t - w_t

Therefore, the controller parameter P affects transient response, steady state tracking error, and disturbance rejection. However, we saw that adjusting P resulted in trade-offs among these control objectives. We could reduce oscillation and improve convergence, but at the expense of worse steady-state error, and vice versa.

PD Control: More degrees of freedom must be introduced into the controller design to allow greater flexibility in the optimization of the trade-offs involved in the closed loop performance. We can achieve this by using a proportional plus derivative controller. In proportional plus derivative control (PD control), the form of the control law is:

    u_t = P*e_t + D*(e_t - e_{t-1})

Here, e_t = r_t - v_t is the measured speed error, and e_t - e_{t-1} is the derivative of the error (meaning the change in error over time). P is the proportional constant, and D is the derivative constant.

Intuitively, the derivative term is being used to predict the future. Consider Figure 9.7. The two plots show two different responses. In (a), we as humans can see that the system output is approaching the reference input quickly, and so we should probably reduce the plant input to prevent overshoot. In (b), we can see that the output is increasing very slowly, so we probably should increase the plant input, and actually should have increased it earlier. We see these things because we predict the system's future behavior will be similar to its past behavior, a good assumption when dealing with physical systems. The derivative term, which looks at the difference in the output between two successive time instances, can be used to achieve similar prediction, and thus can cause the controller to react accordingly. In the language of control systems, this is referred to as adding lead.

PD control implies a more complex controller, since the controller must keep track of the error derivative. However, PD control will give us more flexibility in achieving our control objectives. We can see this by deriving the equation for the complete cruise-controller system using PD control, just as we did for the simpler P controller:

    v_{t+1} = 0.7v_t + 0.5u_t - w_t
    let u_t = P*e_t + D*(e_t - e_{t-1})
    and e_t = r_t - v_t

    v_{t+1} = 0.7v_t + 0.5*(P*(r_t - v_t) + D*((r_t - v_t) - (r_{t-1} - v_{t-1}))) - w_t

    v_{t+1} = (0.7 - 0.5*(P + D))*v_t + 0.5D*v_{t-1} + 0.5*(P + D)*r_t - 0.5D*r_{t-1} - w_t

When the reference input and disturbance are constant, the steady-state speed is again:

    v_ss = (0.5P / (1 - 0.7 + 0.5P)) * r
This is the same as for proportional control, since in steady state the effect of the derivative term is zero.

The characteristics of convergence of the tracking error e to its steady-state value are determined by the roots of the polynomial z^2 - (0.7 - 0.5*(P + D))*z - 0.5D = 0, under the assumption that the magnitude of the roots (they may be complex) is less than 1. Therefore, adding the derivative term allows the transient response to be modified without affecting the steady state tracking or disturbance rejection characteristics. Figure 9.8 plots step responses for various values of P and D. Note that the steady state value of the response is affected by P, not D. The parameter D does significantly affect the character of the transient response, in other words, the rate of convergence and the oscillation. The dashed-dotted line should be compared with the response in Figure 9.4, for which P = 3.3 and for which we can treat D = 0. In summary, by building a slightly more complex controller, namely, a PD controller, which considers not just the error input but also the derivative of the error input, we can adjust the transient response and the steady-state error independently by adjusting D and P.

PI and PID Control: In proportional plus integral control (PI control), the form of the control law is:

    u_t = P*e_t + I*(e_0 + e_1 + ... + e_t)

The integral term sums up the error over time. Let's consider this term intuitively. Look at Figure 9.4(d) again. Notice that both controllers achieve a steady-state value that is below the desired value of 50 mph. As humans, we can see that we should just increase the plant input again until this error goes to zero. In other words, as long as there's error, we shouldn't rest! The integral term achieves this goal: by summing the error over time, we ensure that the [...]

We can combine the proportional, integral, and derivative terms into a single control law, known as PID control; the designer's task is then to select the PID gains to achieve the desired response. Figure 9.9 plots step responses for different values of P, I, and D. The main effect of varying I is that, as I is increased, the rate at which the response converges to its desired value increases; however, the I term does also affect the nature of the transient. If I is increased too much, then the response can become oscillatory or even unstable.

PID controllers are extremely common in embedded control systems. Several tools exist to help a designer choose the appropriate PID values for a given plant model. Off-the-shelf ICs with settable P, I, and D values, called PID controllers, are available to accomplish PID control.

[Figure 9.9: PID step response over sampling instants 1-16, for P=2.5, I=0.5, D=-0.35; P=2.5, I=0.25, D=-0.35; and P=3.3, I=0, D=-0.35.]

9.4 Software Coding of a PID Controller

A PID controller can be implemented quite easily in software. Consider writing a program in C to implement a PID controller. It might consist of a main function with the following loop:

    void main()
    {
        double sensor_value, reference_value, actuator_value;
        PID_DATA pid_data;

        PidInitialize(&pid_data);
        while (1) {
            sensor_value = SensorGetValue();
            reference_value = ReferenceGetValue();
            actuator_value =
                PidUpdate(&pid_data, sensor_value, reference_value);
            ActuatorSetValue(actuator_value);
        }
    }

We create the main function to loop forever. During each iteration, we first read the plant output sensor, read the current desired reference input value, and pass this information to function PidUpdate. PidUpdate determines the value of the plant actuator, which we then use to set the actuator. Note that reading the sensor will typically involve an analog-to-digital converter, and setting the actuator will involve a digital-to-analog converter; the details of these functions are omitted.
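The PD and PI effects described above can be checked numerically against the chapter's car model. This sketch ignores disturbances and saturation; the PI gains P = 1.0 and I = 0.1 are illustrative choices of mine, not values from the text:

```c
#include <assert.h>
#include <math.h>

/* PD control of the car model: u[t] = P*e[t] + D*(e[t] - e[t-1]). */
double pd_speed(double v0, double r, double P, double D, int n) {
    double v = v0, e_prev = r - v0;
    for (int t = 0; t < n; t++) {
        double e = r - v;
        v = 0.7 * v + 0.5 * (P * e + D * (e - e_prev));
        e_prev = e;
    }
    return v;
}

/* PI control: u[t] = P*e[t] + I*(e[0] + e[1] + ... + e[t]). */
double pi_speed(double v0, double r, double P, double I, int n) {
    double v = v0, sum = 0.0;
    for (int t = 0; t < n; t++) {
        double e = r - v;
        sum += e;                          /* running error sum */
        v = 0.7 * v + 0.5 * (P * e + I * sum);
    }
    return v;
}
```

pd_speed(20, 50, 3.3, -0.35, 60) settles at 0.5P*r / (0.3 + 0.5P), about 42.3 mph, the same steady state as pure P = 3.3 control (D does not move it), while pi_speed with even a small integral gain converges to the full 50 mph, eliminating the steady-state error.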
9.6: Practical Issues Related to Computer-Based Control
Our PID_DATA data structure has the following form:

    typedef struct PID_DATA
    {
        double Pgain, Dgain, Igain;    /* gain constants          */
        double sensor_value_previous;  /* to find the derivative  */
        double error_sum;              /* cumulative error        */
    } PID_DATA;

So PID_DATA holds the three gain constants, which we assume are set in the PidInitialize function. It also holds the previous sensor value, which will be used for the derivative term. Finally, it holds the cumulative sum of error values, used for the integral term. We can now define our PidUpdate function as follows:

    double PidUpdate(PID_DATA *pid_data, double sensor_value,
                     double reference_value)
    [...]

[...] safety is not a concern, and the cost of using the plant is not a major concern either, we can select the PID values through a somewhat ad hoc tuning process. This has two advantages. First, our model of the plant may be too complex for us to work with quantitatively. Second, we may not even have a model of the plant, perhaps because we don't have the time or knowledge to create such a model. The tuning process we'll discuss has been shown to result in PID values that are reasonably close to the values that would have been obtained through quantitative analysis.

One tuning approach is to start by setting the P gain to some small value, and the D and I gains to 0. We then increase the D gain, usually starting about 100 times greater than P, until we see oscillation, at which point we reduce D by a factor of 2 to 4. At this point, the system will probably be responding slowly. Next, we begin increasing the P gain until we see oscillation or excessive overshoot, and then we reduce P by a factor of 2 to 4. Finally, we begin increasing the I gain, starting perhaps between 0.0001 and 0.01, and again backing off when we see oscillation or excessive overshoot. These three steps can be repeated until either [...]
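The excerpt cuts off before showing the body of PidUpdate. A minimal sketch consistent with the PID_DATA fields above is given below; the sign conventions and the choice to differentiate the sensor value rather than the error are my assumptions, not necessarily the book's exact code:

```c
#include <assert.h>
#include <math.h>

typedef struct PID_DATA {
    double Pgain, Dgain, Igain;    /* gain constants          */
    double sensor_value_previous;  /* to find the derivative  */
    double error_sum;              /* cumulative error        */
} PID_DATA;

double PidUpdate(PID_DATA *pid_data, double sensor_value,
                 double reference_value)
{
    double error = reference_value - sensor_value;
    /* derivative of the measured output over one sample period */
    double deriv = sensor_value - pid_data->sensor_value_previous;

    pid_data->error_sum += error;                  /* integral term */
    pid_data->sensor_value_previous = sensor_value;

    return pid_data->Pgain * error
         + pid_data->Igain * pid_data->error_sum
         - pid_data->Dgain * deriv;  /* minus sign: d(error)/dt equals
                                        -d(sensor)/dt for a constant
                                        reference input */
}
```

Differentiating the measurement instead of the error avoids a spike in the derivative term when the reference input steps; when the reference is constant, the two differ only in sign.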
9.8 Summary
This chapter introduced control systems. A control system has several components and signals, including the actuator, controller, plant, sensor, output, reference input, and disturbance. We developed increasingly complex controllers, specifically a proportional
open-loop controller, a proportional closed-loop controller, a proportional-derivative (PD)
closed-loop controller, and a proportional-integral-derivative (PID) closed-loop controller.
There are numerous control objectives, including stability, and performance objectives such
as rise time, peak time, overshoot, and settling time. These objectives may compete with one
another. The more complex controllers help us achieve various objectives with less restrictive trade-offs between objectives. Several additional issues must be considered when using computers to implement a controller, including quantization and overflow effects, aliasing, and computation delay.
Chapter 10: IC Technology

10.1 Introduction
In Chapter 1, we introduced the idea that embedded system design includes the use of three
classes of technologies: processor technology, IC technology and design technology. Chapters
2-7 have focused mostly on processor technology, since one should understand how to build a
processing system first, before learning what IC technologies are available to implement such
a system, and before learning what design technologies are available to help build the system
more rapidly. In this chapter, we provide an overview of three key IC technologies.
Several earlier chapters focused on an embedded system's structure. A system's
structural representation describes the numbers and types of processors, memories, and buses,
with which we implement the system's functionality. In this chapter, we focus on mapping
that structure to a physical implementation. A system's physical implementation describes the
mapping of the structure to actual chips, known as integrated circuits (ICs). A given structure
can be mapped to one of several alternative physical implementations, each representing
different design trade-offs. In fact, different parts of a structure may be mapped to different
physical implementations. We might think of the structural representation as food menu for a
banquet meal, and the physical implementation as the meal itself A wedding banquet might
call for a menu of chicken and vegetables, whereas a sports team banquet might call for
spaghetti. Thus, we see trade-offs made in choosing the structure. Each meal itself can be
prepared in different ways (e.g., the vegetables could be fresh or frozen). Thus, we see further
trade-offs in choosing the physical implementation.
[Figure 10.1: (a) a CMOS transistor (nMOS): a voltage at the gate attracts electrons, turning the channel between source and drain into a conductor; (b) top-down view.]

[Figure 10.2: Depicting circuits in silicon: (a) a NAND circuit schematic (F = (xy)'), (b) layers (metal2, oxide, metal1, oxide, polysilicon, silicon substrate), (c) top-down view of the NAND circuit on an IC.]
We will consider three major categories of physical implementations, or IC technologies: full-custom, semi-custom, and programmable. We should mention that the term "technology" in the context of ICs is often used to instead refer to a particular manufacturing process technology, describing the type and generation of manufacturing equipment being used to build the IC; for example, a chip may be manufactured using a CMOS 0.3-micron process technology. Our use of the term IC technology here refers instead to different categories of ICs; each category can be implemented using any manufacturing process.

One should recall from Chapter 1 that processor technologies and IC technologies are independent of one another. Any type of processor can be mapped to any type of IC. Furthermore, a single IC may implement part of a processor, an entire processor, or, as is commonly the case today, multiple processors.

Let us begin our discussion of IC technology by again examining a basic transistor. A simplified version of a complementary metal-oxide-semiconductor (CMOS) transistor is shown in Figure 10.1(a). It consists of three terminals: the source, drain, and gate. The source and drain regions lie within the silicon itself, created by implanting ions into those regions. The gate, made from polysilicon, sits between the source and drain but above the silicon, separated from the silicon by a thin layer of insulator, silicon dioxide. The voltage at the gate controls whether current can flow between the source and the drain, while the insulator prevents current from flowing through the gate itself. For an nMOS transistor, if a high-enough voltage is applied to the gate, electrons are attracted from throughout the silicon substrate into the channel between the source and the drain, creating a field that allows current conduction between source and drain. On the other hand, if 0 V is applied to the gate, then the channel cannot conduct.

Notice that the transistor has three layers. The source and drain regions lie within the silicon substrate; these regions are known as p-diffusion or n-diffusion, depending on whether we are building an nMOS or pMOS transistor. The silicon dioxide insulating layer lies on top of the substrate, and is typically referred to as oxide. The gate region lies on top of the silicon dioxide, and is made from a substance known as polysilicon.

When drawing tens or hundreds of transistors, the three-dimensional view of Figure 10.1(a) quickly becomes cumbersome to create and is really unnecessary. Instead, we can use a top-down two-dimensional view, wherein we first assign a unique pattern to represent each layer. Thus, the transistor of Figure 10.1(a) could be represented using the top-down view shown in Figure 10.1(b). The oxide layer is implicit, since it must always exist below the polysilicon.

Transistors are not very useful unless they are connected with one another, and so we'll need to introduce at least two layers of metal, which we'll call metal1 and metal2, to serve as connections. These layers will need to be insulated from each other and from the polysilicon, requiring two more oxide layers. Figure 10.2(b) depicts the ordering of the various layers we've introduced so far. This figure only depicts the ordering of layers, and doesn't show the connections that must also exist between higher and lower layers.

Note that we always need at least two layers of metal, since otherwise we will be unable to implement all but the most trivial of circuits. Think of trying to build a system of freeways without being allowed to build any bridges and without being able to cross roads going to different places, and you'll understand why at least two metal levels are necessary. Manufacturing processes that use even more than two levels of metal are common.

Now that we have layers for representing transistors and their connections, we can build a simple circuit on an IC. Suppose that we want to build the simple NAND circuit that was introduced in Chapter 2, and is redrawn in Figure 10.2(a) for convenience, consisting of two nMOS and two pMOS transistors. We'll use a top-down view, and use the patterns shown in Figure 10.2(b) for each layer. Figure 10.2(c) shows the top-down view of the NAND circuit, with black representing metal1. Take some time to see if you can see the correspondence between (a) and (c).

Let's consider how a circuit on an IC is actually manufactured. We'll begin by revisiting the simple transistor of Figure 10.1 and considering how this transistor would actually be manufactured. Since the transistor consists of three layers, we might mistakenly assume that we could manufacture this transistor in three steps. In such an idealized manufacturing [...]
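Returning to the NAND circuit of Figure 10.2(a): its logical behavior F = (xy)' is easy to state in C (a one-line sketch for checking the truth table, not part of the book):

```c
#include <assert.h>

/* NAND: output is 0 only when both inputs are 1, i.e., F = (xy)'. */
int nand(int x, int y) {
    return !(x && y);
}
```

The output is 0 only for x = y = 1, which is exactly the condition under which the two series nMOS transistors in the circuit both conduct, pulling the output low.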
Programmable Logic Semi-custom
..,A.,__________~ Full~ustom
Device r
Gate array
I'
Standard cell
II START
p-y p-type
..
I
J I
I
Designers are provided Design= create layolJtl
I I with a library of for basic coniponen\g. · · F = (xy)'
I I
I
I
I
I
predesi ed cells.
J I
I I
I I
I
I
I
I
,
I
J
I
I
I
I
Dt:::o \ n-type
I
I
I
I
' \
I
/ SfART
:
\ (a)
: Designers are provided , I1)esign~ celli Design= place the
\\G/ ~
'place and connect theliljI
: with a set of masks of components, resulting in Figure 10.5: A more compact NANO circuit: (a) NANO circuit schematic, (b) compacted layout.
I
: ~fined ates. ~\ resulting in masks. :
• Placement: the task of placing and orienting every transistor somewhere on the IC.
• Routing: the task of running wires between the transistors, without intersecting other wires or transistors.
• Sizing: the task of deciding how big each wire and transistor will be. Larger wires and transistors provide better performance but consume more power and require more silicon area.

A good layout is typically defined by characteristics like speed and size. Speed is the longest path from input to output, or from register to register, typically measured in nanoseconds. Size is the total silicon area necessary to implement the complete circuit. Both of these features are usually improved when the circuit is highly compacted, namely, when transistors that are connected are placed close together and hence their connecting wires are shorter. Consider for example the NAND layout of Figure 10.2(c). In that example, we did not pay attention to creating a compact layout. Figure 10.5(b) shows a compacted version of the NAND circuit. Notice how much less area is wasted in this compacted version. However, such compaction must obey certain design rules. For example, two transistors must be spaced apart a minimum distance lest they electrically interfere with one another.
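As a toy illustration of the design-rule idea (our own Python sketch, not from the book; the spacing value and coordinates are made up), the following checks a placement for minimum-spacing violations between transistors:

```python
from itertools import combinations

MIN_SPACING = 2.0  # minimum allowed center-to-center distance (arbitrary units)

def spacing_violations(placement):
    """Return pairs of transistors placed closer than MIN_SPACING.
    'placement' maps a transistor name to its (x, y) coordinates."""
    bad = []
    for (n1, (x1, y1)), (n2, (x2, y2)) in combinations(placement.items(), 2):
        if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 < MIN_SPACING:
            bad.append((n1, n2))
    return bad

layout = {"t1": (0.0, 0.0), "t2": (1.0, 0.0), "t3": (4.0, 3.0)}
print(spacing_violations(layout))  # t1 and t2 are only 1.0 apart
```

A real rule deck contains many more rules (width, overlap, enclosure) per layer; this captures only the spacing check mentioned above.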
In the past, many transistor circuits were converted by hand into compact layouts. Such circuit design was a common job. However, ICs can now hold so many transistors, numbering in the hundreds of millions, that laying out complete ICs by hand would require an absurd amount of time. Thus, hand layout is usually used only for relatively small, critical components, like the ALU of a microprocessor, or for basic components like logic gates that will be heavily reused.

Instead of hand layout, most layout today is done using automated layout tools, known as physical design tools. These tools typically include powerful optimization algorithms that run for hours or days seeking to improve the speed and size of a layout.
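The flavor of such optimization loops can be sketched in a few lines (illustrative only, and our own construction; production placers use far more sophisticated algorithms such as simulated annealing and analytic placement): repeatedly propose swapping two components' locations, keeping a swap only when it shortens total Manhattan wirelength.

```python
import random

def wirelength(pos, nets):
    """Total Manhattan length of two-pin nets; 'pos' maps component -> (x, y)."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def improve_placement(pos, nets, tries=1000, seed=0):
    """Greedy pairwise-swap improvement, a crude stand-in for the
    iterative optimizers inside physical design tools."""
    rng = random.Random(seed)
    pos = dict(pos)
    names = list(pos)
    best = wirelength(pos, nets)
    for _ in range(tries):
        a, b = rng.sample(names, 2)
        pos[a], pos[b] = pos[b], pos[a]      # propose a swap
        new = wirelength(pos, nets)
        if new < best:
            best = new                       # keep the improvement
        else:
            pos[a], pos[b] = pos[b], pos[a]  # undo a non-improving swap
    return pos, best

# Two nets on a 2x2 grid of slots; the initial placement puts each
# net's endpoints diagonally opposite, the worst case (wirelength 4).
start = {"u": (0, 0), "v": (1, 1), "w": (0, 1), "x": (1, 0)}
nets = [("u", "v"), ("w", "x")]
final, cost = improve_placement(start, nets)  # reaches the optimum, 2
```

The same keep-if-better loop, run for hours over millions of components, is the essence of the tools described above.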
Figure 10.4: The three IC technologies.

The advantages of full-custom IC technology include its excellent efficiency with respect to power, performance, and size. Interconnected transistors can be placed near each other and thus be connected by very short wires, yielding good performance and power. Furthermore, only those transistors necessary for the circuit being designed appear on the IC, resulting in no wasted area due to unused transistors.

The main disadvantages of full-custom IC technology are its high NRE cost and long time-to-market. These disadvantages stem from having to design a complete layout, which even with the aid of tools can be time-consuming and error-prone. Furthermore, masks for every IC layer must be created, increasing NRE cost and delaying time-to-market. In addition, errors discovered after manufacturing the IC are common, often requiring several respins.

10.4 Programmable Logic Device (PLD) IC Technology
…the number of ICs we plan to manufacture if that number is small. In addition, manufacturing an IC is risky, since we may discover after such manufacturing that an IC doesn't work properly in its target system, either due to manufacturing problems or due to an incorrect initial design. Thus, we never know how many respins will be necessary before we get a working IC; a recent study stated that the industry average was 3.5 spins. Therefore, we would like an IC technology that allows us to implement our system's structure on an IC, but that doesn't require us to manufacture that IC. Instead, we want an IC that we can program in the field, with the field being our lab or office. The term program here does not refer to writing software that executes on a microprocessor, but rather to configuring logic circuits and interconnection switches to implement a desired structural circuit.
Programmable logic device (PLD) technology satisfies this goal. A PLD is a pre-manufactured IC that we can purchase and then configure to implement our desired circuit.

An early example of a PLD was a programmable logic array (PLA), introduced in the early 1970s. A PLA was a small PLD with two levels of logic, a programmable AND array and a programmable OR array. Every PLA input and its complement was connected to every AND gate. So if a PLA had 10 inputs, every AND gate had 20 inputs. Any of these connections could be broken, meaning that each AND gate could generate any product term. Likewise, each OR gate could generate any sum of AND gate outputs. A PAL (programmable array logic) is another PLD type that eliminates the programmability of the OR array to reduce size and delay. PLAs and PALs are often referred to as simple PLDs, or SPLDs.
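The PLA's two programmable arrays can be modeled directly in code (a hypothetical sketch of ours, not from the book): each AND-array row records which input connections survive programming, and the OR array records which product terms remain connected to the output.

```python
# Toy PLA model: 'and_rows' lists, per AND gate, the surviving input
# connections as (variable_index, complemented?) pairs; 'or_row' lists
# which AND-gate outputs remain connected to the OR gate.
# Programming the PLA = choosing which connections to break (omit).

def pla_eval(and_rows, or_row, inputs):
    """Evaluate one PLA output for a tuple of 0/1 input bits."""
    products = []
    for row in and_rows:
        # An AND gate outputs 1 only if every surviving literal is 1.
        products.append(all((not inputs[i]) if comp else inputs[i]
                            for i, comp in row))
    return any(products[g] for g in or_row)

# Program F = ab' + a'b (XOR) on a 2-input PLA:
and_rows = [[(0, False), (1, True)],   # product term a b'
            [(0, True), (1, False)]]   # product term a' b
or_row = [0, 1]                        # OR gate sums both terms

truth = [pla_eval(and_rows, or_row, (a, b)) for a in (0, 1) for b in (0, 1)]
# truth corresponds to inputs (0,0), (0,1), (1,0), (1,1)
```

A PAL would simply fix `or_row` at manufacture time, which is exactly the size/delay trade-off described above.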
As IC capacity grew over the years, SPLDs could not simply be extended by adding more inputs, since the number of required connections to the AND array inputs would grow too high. Thus, the new capacity was taken advantage of instead by integrating numerous SPLDs on a single chip and adding programmable interconnect between them, resulting in what is known as a complex PLD, or CPLD. CPLDs often contain latches to enable implementation of sequential circuits also. Figure 10.7 illustrates a sample architecture for a CPLD. The top half of the figure is an SPLD that can implement any function of the chip's input signals as well as any SPLD output signal. The bottom half represents another identical SPLD. The array on the left consists of vertical lines that can be programmed to connect with any of the horizontal lines, so that any signal's true or complemented value can be fed into any gate. The output of each SPLD feeds into an IO cell. The IO cell can be programmed to pass the latched or unlatched, true or complemented, output to the CPLD's external output and/or to the programmable array on the left as input to SPLDs.
Figure 10.7: A CPLD architecture.

While able to implement more complex circuits than SPLDs, CPLDs suffer from the problem of not scaling well as their sizes increase. For example, suppose the CPLD architecture of Figure 10.7 had 4 inputs and 2 outputs. Then there would be 6 signals in the programmable array, plus 6 more for those signals' complements, thus requiring 12-input AND gates. Likewise, suppose there were 12 inputs and 6 outputs. Then there would be 18 + 18 signals, requiring 36-input AND gates. Notice such an architecture doesn't scale well.

The logical solution is to build devices that are more modular in nature. In particular, there is no need to connect every input signal and every output signal to every AND gate. A more flexible approach can be used in which a subset of inputs and outputs are input to each SPLD. This more modular, more scalable approach to PLD design resulted in architectures known as field-programmable gate arrays (FPGAs). An FPGA consists of arrays of programmable logic blocks connected by programmable interconnect blocks. The name FPGA was intended to contrast these devices with traditional gate arrays, which need masks to create the interconnections between the already laid-out gates. FPGAs, in contrast, have their interconnections, as well as logic blocks, programmed in the field, meaning in the designer's lab. However, FPGA architectures do not have arrays of gates anywhere to be found, and thus the name can be somewhat misleading.
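The scaling arithmetic above is easy to reproduce (a trivial sketch of ours, assuming, as described, that every chip input and every SPLD output, each with its complement, can feed each AND gate):

```python
def and_gate_width(num_inputs, num_outputs):
    """Required AND-gate fan-in in the CPLD architecture described above:
    every chip input and every SPLD output runs through the programmable
    array together with its complement."""
    signals = num_inputs + num_outputs
    return 2 * signals

# The chapter's two data points:
print(and_gate_width(4, 2))    # 12-input AND gates
print(and_gate_width(12, 6))   # 36-input AND gates
```

The linear growth in fan-in (and the quadratic growth in programmable connections that follows) is what makes the monolithic array impractical at large sizes.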
Programming is done by setting bits within the logic or interconnect blocks. Those bits are stored using nonvolatile (EPROM, EEPROM) or volatile (SRAM) memory technology. Another nonvolatile technology used in PLDs is the antifuse, which, as the name implies, behaves opposite to a fuse: an antifuse is originally an open circuit but takes on low resistance when programmed.
10.5 Summary

Creating an IC circuit layout and manufacturing ICs from this layout are complex, expensive, and time-consuming processes. Embedded system designers can choose from different IC technologies in order to reduce NRE cost and time-to-market by trading off with other design metrics, like size, performance, power, and unit cost. Full-custom IC technology is the most expensive technology in terms of NRE cost and time-to-market but yields the most efficient circuits. Semi-custom ASICs involve use of predesigned basic components, reducing NRE cost and time-to-market but still providing good efficiency. PLDs come premanufactured and thus eliminate the need for the designer to manufacture an IC, greatly reducing NRE cost and time-to-market, though they are significantly inferior to custom or semi-custom ICs in terms of size, power, performance, and unit cost. The designer may choose to use PLDs early in the design process, switching to ASICs and even full-custom ICs later in the design process when the design has stabilized.

CHAPTER 11: Design Technology

11.1 Introduction
11.2 Automation: Synthesis
11.3 Verification: Hardware/Software Co-Simulation
11.4 Reuse: Intellectual Property Cores
11.5 Design Process Models
11.6 Summary
11.7 Book Summary
11.8 References and Further Reading
11.9 Exercises
[Figure 11.1 appears here, illustrating the three productivity-improving techniques of automation, reuse, and verification applied between specification and implementation.]
hard, but creating a physical implementation that satisfies constraints is also very difficult because there are so many competing, tightly constrained metrics.

These difficulties slow designer productivity. Embedded system designer productivity can be measured by software lines of code produced per month or hardware transistors produced per month. Productivity numbers are surprisingly low, with some studies showing just tens of lines of code or just hundreds of transistors produced per designer-day. In response to low production rates, the design community has focused much effort and resources on developing design technologies that improve productivity. We can classify many of those technologies into three general techniques, illustrated in Figure 11.1:
1. Automation is the task of using a computer program to replace manual design effort.
2. Reuse is the process of using predesigned components (whether designed by humans or computers) rather than designing those components oneself.
3. Verification is the task of ensuring the correctness and completeness of each design step.

Figure 11.2: The codesign ladder.
Providing thorough coverage of the advances in these productivity-improving techniques for embedded systems over the past couple of decades would require an entire book itself. Instead, we will focus in this chapter on a few advances that have enabled the unified view of hardware and software design. First, we will discuss the automation technique of synthesis, which has made hardware design look like software design. Second, we will discuss the reuse of cores in the hardware domain, which has enabled the coexistence of general-purpose processors (software) and single-purpose processors (hardware) on a single IC. Third, we will describe the verification technique of hardware/software co-simulation, which has enabled designers to verify complete hardware/software systems before they are implemented.

11.2 Automation: Synthesis

"Going up": The Parallel Evolution of Compilation and Synthesis

When processors were first being designed in the late 1940s and early 1950s, designing a computer system consisted mostly of hardware design; software, if it was used, was fairly simple. However, as the idea of the general-purpose processor began to take hold, software complexity began to grow. Because of the different techniques used to design software and hardware, a division between the fields of hardware design and software design occurred. As illustrated in Figure 11.2, design tools simultaneously evolved in both fields, albeit at different rates, to allow behavior description at progressively more abstract levels, in order to manage increasing design complexity. This simultaneous evolution has brought us to a point today where both fields use the sequential program model to describe behavior, thus a rejoining of the two fields into one field seems imminent.

As shown in Figure 11.2, early software consisted of machine instructions, coded as sequences of 0s and 1s, necessary to carry out the desired system behavior on a general-purpose processor. A collection of machine instructions was called a program. As program sizes grew from hundreds of instructions to thousands of instructions, the tediousness of dealing with 0s and 1s became evident, resulting in use of assemblers and linkers. These tools automatically translate assembly instructions, consisting of instructions written using letters and numbers to represent symbols, into equivalent machine instructions. Soon, the limitations of assembly instructions became evident for programs consisting of tens of thousands of instructions, resulting in the development of compilers. Compilers automatically translate sequential programs, written in a high-level language like C, into equivalent assembly instructions. Compilers became quite popular starting in the 1960s, and their popularity has continued to grow. Tools like assemblers/linkers, and then compilers, helped software designers climb to higher abstraction levels.
Early hardware consisted of circuits of interconnected logic gates. As circuit sizes grew from thousands of gates to tens of thousands, the tediousness of dealing with gates became apparent, resulting in the development of logic synthesis tools. These tools automatically convert logic equations or finite-state machines into logic gates. As circuit sizes continued to grow, register-transfer (RT) synthesis tools evolved. These tools automatically convert FSMDs into FSMs, logic equations, and predesigned RT components like registers and adders. In the 1990s, behavioral synthesis tools started to appear, which convert sequential programs into FSMDs.

Therefore, we now see that, while for several decades the starting point for the fields of hardware design and software design consisted of very different design descriptions, today both fields can start from sequential programs.

Figure 11.3: The abstraction pyramid: (a) a model at a higher abstraction level has more potential implementations; (b) the design process proceeds to lower abstraction levels, narrowing in on a single implementation.
Why did the hardware design field take some 30 years longer to climb the abstraction ladder to the level of sequential programs? One reason is that hardware design involves many more design dimensions. While a compiler must generate assembly instructions to implement a sequential program on a given processor, a synthesis tool must actually design the processor itself. Extensive research and more powerful computers have enabled synthesis tools to address the problem adequately. A second reason is that the very fact that one chooses to implement behavior in hardware rather than software implies that one is extremely concerned about size, performance, power, and/or other design metrics. Therefore, optimization is crucial, and humans tend to be far better at multidimensional optimization than are computers, as long as the problem size is not too large and enough design time is available. Just look, for example, at how many decades it has taken for computers to be able to seriously challenge the world's best chess players. If the game of chess had evolved such that players only had 10 seconds to think of each move, and the playing board was the size of a football field with tens of thousands of pieces, then we'd have a situation more like that of IC design, in which automation today is clearly better.

We see above that, like an elevator going up, both the hardware and software design fields have continued to focus design effort on increasingly higher abstraction levels. Starting design from a higher abstraction level has two advantages. First, descriptions at higher levels tend to be smaller and easier to capture. For example, one line of sequential program code might translate to one thousand logic gates. Second, as Figure 11.3(a) illustrates, a description at a higher abstraction level has many more possible implementations than those at lower levels. One can think of holding a flashlight higher above the ground: the higher we go, the more ground we illuminate. For example, a sequential program description may have possible implementations whose performance and transistor counts differ by orders of magnitude. However, a logic-level description may have transistor implementations varying in performance and size by only a factor of two or so.

Synthesis Levels

In the following sections, we provide brief overviews of the details of synthesis at different abstraction levels. Unlike compiler users, synthesis tool users must have a fair amount of knowledge about synthesis. Compilers tend to be fairly inexpensive and easy-to-use tools. Synthesis tools, on the other hand, range from costing hundreds of dollars to tens of thousands of dollars. The user must control perhaps hundreds of synthesis options. Furthermore, synthesis tools may take many hours to run, and their output occasionally needs to be modified. This complexity associated with synthesis stems from the fact that optimization is absolutely crucial when synthesizing hardware, and each user will have different optimization criteria. If optimization weren't so crucial, one would simply implement one's system as software rather than as hardware.

We now provide a brief overview of the various levels of synthesis. A standard definition for synthesize is "forming a complex whole by combining parts." In the context of digital hardware design, however, the term has taken on the meaning of "automatically converting a system's behavioral description into a structural implementation," where that implementation is a complex whole formed by parts. The structural implementation must optimize some set of design metrics, such as performance, size, and power.

To better understand the meaning of converting from a behavioral description to a structural implementation, Gajski developed the Y-chart, shown in Figure 11.4. The chart consists of three axes, behavioral, structural, and physical, each representing a type of description of a digital system, as follows:
• A behavioral description defines outputs as a function of inputs. It describes the algorithms we'll use to obtain those outputs, but does not say how we'll implement those algorithms.
• A structural description implements that behavior by connecting components with known behavior.
• A physical description tells us the sizes and locations on a chip or board of a system's components and their interconnecting wires.

For example, addition is a behavior, while a carry-ripple adder is a structure. Likewise, a sequential program that sequences through an array to find the array's largest-valued element is a behavior, while a controller and datapath implementing that algorithm is a structure.

The chart also shows that each description can exist at one of various levels of abstraction. For example, at the gate level of abstraction, a behavioral description consists of logic equations, a structural description consists of a connection of gates, and a physical description consists of a placement of gates/cells and a routing among them. As another example, at the system level of abstraction, a behavioral description may consist of communicating sequential programs (processes), a structural description of a connection of processors and memories, and a physical description of a placement of processor/memory cores and buses on an IC or a board.

Figure 11.4: Gajski's Y-chart.

Synthesis can generally be thought of as converting a behavioral description at a particular abstraction level to a structural description. That structural description may be at the same level or a lower one, but not a higher one. We now describe synthesis techniques at several different abstraction levels.

Logic Synthesis

Logic synthesis automatically converts a logic-level behavior, consisting of logic equations or an FSM, into a structural implementation, consisting of connected gates. Let us divide logic synthesis into combinational-logic synthesis and FSM synthesis. Combinational-logic synthesis can be further subdivided into two-level minimization and multilevel minimization.

Two-level logic minimization: We can represent any logic function as a sum of products (or a product of sums). We can implement this function directly using a level consisting of AND gates, one for each product term, and a second level consisting of a single OR gate. Thus, we have two levels, plus inverters necessary to complement some inputs to the AND gates. The longest possible path from an input signal to an output signal passes through at most two gates, not counting inverters. We cannot in general obtain faster performance. For example, the function F = abc'd' + a'cd + ab'cd would be implemented with three AND gates followed by one OR gate, as shown in Figure 11.5(b).

Since performance is already the best possible, the main goal of two-level logic minimization is to minimize size. We can set a goal of minimizing the number of AND gates in a sum-of-products implementation. We can state this goal more formally as that of finding a minimum cover of a logic expression, or function. We will now provide several definitions that lead us to the definition of a minimum cover. We are given a set of variables (inputs to the function), such as: {a, b, c, d}.
• A literal is the appearance of a variable or its complement in a function. For example, the above function has 11 literals: a, b, c', d', a', c, d, a, b', c, d.
• A minterm is a product of literals in which each variable or its complement appears exactly once. For example, in the previous function, abc'd' is a minterm, but a'cd is not, because b does not appear. Any logic function can be expressed as a sum of minterms; note that each minterm corresponds to a row in a truth table. For example, F could be expressed as abc'd' + ab'cd + a'bcd + a'b'cd.
• An implicant is a product of literals in which each variable or its complement appears no more than once, rather than exactly once as for minterms. In the earlier function, ab'cd and a'cd are examples of implicants. An implicant covers one or more minterms; for example, a'cd covers minterms a'bcd and a'b'cd.
• A cover of a logic function is a set of implicants that covers all of the function's minterms.
• Finally, a minimum cover is a cover having the minimum possible number of implicants.

Since each implicant corresponds to an AND gate, by finding a minimum cover, we have achieved our goal of minimizing the number of AND gates.

We can extend our goal by not only minimizing the number of AND gates but also minimizing the number of inputs to each AND gate. We can state this goal formally as finding a minimum cover that is prime. A prime cover's implicants are all prime implicants. A prime implicant of a logic function is an implicant that is not covered by any other implicant of the function.
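These definitions can be made concrete with a short sketch (our own Python illustration, not from the book): it expands the example F = abc'd' + a'cd + ab'cd into its minterms, then finds a minimum cover by brute force, the exhaustive approach whose cost motivates the heuristics discussed shortly.

```python
from itertools import combinations, product

VARS = ("a", "b", "c", "d")

# An implicant maps each variable it mentions to 1 (true literal) or
# 0 (complemented literal); unmentioned variables are don't-cares.
F = [{"a": 1, "b": 1, "c": 0, "d": 0},   # abc'd'
     {"a": 0, "c": 1, "d": 1},           # a'cd
     {"a": 1, "b": 0, "c": 1, "d": 1}]   # ab'cd

def minterms(implicants):
    """Truth-table rows (tuples of 0/1 for a, b, c, d) where the sum of
    the given implicants evaluates to 1."""
    rows = set()
    for row in product((0, 1), repeat=len(VARS)):
        assign = dict(zip(VARS, row))
        if any(all(assign[v] == bit for v, bit in imp.items()) for imp in implicants):
            rows.add(row)
    return rows

target = minterms(F)   # F's minterms: abc'd', ab'cd, a'bcd, a'b'cd

# Brute-force minimum cover: the smallest subset of candidate implicants
# whose minterm set equals F's. (Real tools use heuristics instead.)
cover = None
for k in range(1, len(F) + 1):
    cover = next((list(s) for s in combinations(F, k)
                  if minterms(list(s)) == target), None)
    if cover:
        break
```

With only the three given implicants as candidates, all three turn out to be needed; a full tool would first generate all prime implicants as the candidate set before searching for a cover.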
Multilevel logic minimization: We may be willing to sacrifice some performance if such a sacrifice would decrease the circuit size further than even the best two-level implementation. We can achieve such a trade-off by using multiple levels of logic.

As a simple example, consider the function F = adef + bdef + cdef + gh. The function cannot be minimized further in two levels, and would require five gates (four AND gates and one OR gate) if implemented. However, we could easily reduce the number of gates by factoring out the def term from the first three implicants, resulting in F = (a + b + c)def + gh. This function requires only four gates (two AND gates and two OR gates). Furthermore, note that the number of inputs per gate is reduced too. If each gate input requires two transistors, then we've reduced the number of transistors from 18 * 2 = 36 down to 11 * 2 = 22. The trade-off is that this implementation has slower performance, since it now has three levels rather than two, due to inputs a, b, and c passing through three gates before reaching the output.

We illustrate this trade-off of size and performance in Figure 11.6. The filled gray area represents the set of all possible circuit implementations of a particular logic expression. The x-axis represents circuit size, and the y-axis represents circuit delay. Ideally, we'd like to minimize both, but generally no such circuit exists, as illustrated by the hypothetical point in the lower left of the figure with an X through it. Two-level logic has minimum delay, and thus two-level logic minimization seeks to find the smallest-sized two-level implementation, as illustrated. Further size reduction requires an increase in delay (i.e., more than two logic levels). Multilevel logic minimization seeks to find the Pareto-optimal solution (one on the lower-left curved boundary of the filled area) for a given delay or size.

Figure 11.6: Trading off size and performance.

As another example, consider the earlier two-level logic function: F = abc'd' + b'cd + a'cd. Simple algebraic manipulation yields the equivalent function: F = abc'd' + (a' + b')cd. We now have three levels, but fewer transistors. We can simplify even further by noting that abc'd' = ((abc'd')')' = (a' + b' + c + d)' = ((a' + b') + c + d)'. So now the entire function is: F = ((a' + b') + c + d)' + (a' + b')cd. So we can reuse the (a' + b') term to further reduce transistors down to only 20, as shown in Figure 11.5(f).

But how did we come up with this new function using fewer transistors? You can probably see that it is not easy. There are many different ways to manipulate the equations. Multilevel logic minimization is thus an even harder problem than two-level minimization. Therefore, heuristics are again used by logic synthesis tools addressing this problem. Iterative improvement heuristics, drawing from a suite of equation modifications, are again the prevailing approach.

FSM synthesis: Synthesizing an FSM to gates consists of two main tasks: state minimization and state encoding. State minimization reduces the number of FSM states by identifying and merging equivalent states. Reducing the number of states may result in a smaller state register and fewer gates. Two states are equivalent if their outputs and next states are equivalent for all possible inputs. We can use an algorithm based on a tabular method to solve this problem exactly. We start with a table showing each possible pair of states as a cell in the table. We step through the cells, marking each cell as not equivalent, equivalent, or dependent on other pairs of states being equivalent, which we list in the cell. "Not equivalent" means the cell's two states either have different outputs or have a next-state pair whose cell is marked as not equivalent. "Equivalent" means the cell's two states have the same outputs and next-state pairs that are all known to be equivalent. We step through the cells several times until all cells are either marked equivalent or not equivalent.

The drawback of the above algorithm is that the table size is n², where n is the number of states in the original FSM. Although n² is not nearly as bad as 2ⁿ, it still grows quickly for larger n, requiring much computer memory and computation. An example with perhaps 500 states would require a table of size 250,000. Thus, many tools resort to heuristics.

State encoding encodes each state as a unique bit sequence, such that some design metric like size is optimized. Given n states, we require a minimum of log2(n) bits to represent n unique encodings. There are n! possible assignments of n states to n encodings (the first state has n possible encodings, the second state has n − 1 since the first state already used one encoding, the third state has n − 2, and so on). We can't possibly try all possible assignments of states to encodings for moderate-size examples, because n! grows so quickly. Heuristics are again common.

Technology mapping: We must specify the library of gates available for use in an implementation. For example, as a trivial extreme, we may have only simple two-input AND and OR gates available in our library. At the other extreme, we may have numerous sizes of AND, OR, NAND, NOR, XOR, and XNOR gates, plus efficiently implemented meta-gates (called cells or macros) such as multiplexors, decoders, and combinations of gates (like AND-OR-INVERT). Thus, logic synthesis must generate a final structure consisting of only the available library components, and should use cells and macros as much as possible to improve the overall design efficiency. This task is called technology mapping. Technology mapping is again a complex problem, requiring use of heuristics. Furthermore, a tool that integrates technology mapping with logic minimization, while making the synthesis problem harder, will likely result in a more efficient circuit.

The impact of complexity on logic synthesis users: In the previous paragraphs, we described the basic subproblems that together make up the logic synthesis problem. We saw that each problem had a number of possible solutions that was enormous for moderate-sized problems, such that enumerating all possible solutions and choosing the best resulted in prohibitive space and/or time complexity. In most cases, no algorithm of reasonable complexity exists to optimally solve those problems.
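A quick back-of-the-envelope computation (ours, not the book's) shows the growth rates behind these remarks: the exact state-minimization table grows as n², while the number of candidate state encodings grows as n!:

```python
import math

def table_cells(n):
    """Cells in the pairwise state-equivalence table (n-squared growth)."""
    return n * n

def encoding_assignments(n):
    """Ways to assign n states to n distinct encodings (n! growth)."""
    return math.factorial(n)

print(table_cells(500))            # 250000, matching the chapter's example
print(encoding_assignments(10))    # 3628800 already for just 10 states
```

The table is merely memory-hungry, but the encoding space is astronomically large even for modest FSMs, which is why exhaustive search is abandoned in favor of heuristics.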
complexity exists to optimally solve those problems. Therefore, most tools resort to heuristics·
having a far lower complexity in order to solve the problems using a reasonable amount of
memory and computation time.
The impact of complexity and of the use of heuristics ·on logic synthesis users is
significant. Logic synthesis tools differ tremendously. according to the heuristics they use.
• " .l Wire
Some 1ools use computationally expensive heuristics, thus requiring long run times measured • "
in hours or even days, and requiring huge amounts of memory typically found only on • " • Transistor
. .
. --~~-
· an ""e elay. Source: International Tee...._.___ R d r
!.
heuristic requires roughly:;, computations (times some constant factor) for a problem of size ph):s1cal_ design is no longer possible. lnslcad. we m ean scparat1~n of logi~ synthesis and
n. A super-linear-time heuristic (usually just called nonlinear, though that could refer to design sunullaneously ifwc are reallv to d ·. ffi _ust ~rform logic synthesis and physical
sublinear, too), in contrast grows more quickly than !hat, for example, requiring n3 • es1gn e 1c1ent c1rcu11s.
computations. This nonlinear growth means that a large problem may require much longer run Register-Transfer Synthesis
time than two proble1ns each half the size of the large problem. For example, I003 is more
Logic synthesis allowed us 10 describe .
than 503 ,- 503 (i.e., 1,000,000 > 250,000). Furthermore, 1003 is much more than 253 + 25 3 + H · · · , .· our S\Slem as boolean cq r
25 3 + 253 (i.c:.. 1,000,000 >> 62,500). Likewise, memery usage may grow nonlinearly. Thus, owever. many syslems are too complex to initially . . . ua_ IOns. or as an FSM.
a logic synthesis tool user must often partition a system into several smaller systems having Instead, we often describe our svst . - describe al lh1s logic level of abslraclion
compu1ation model, such as an FSMD. em using a more abs1rac1 (and hence powerful)
equivalent behavior, in order to achieve acceptable synthesis tool run times and memory
usage. . Recall that an FSMD allo\\S variable declarahons of . ,
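The arithmetic behind this partitioning advice is easy to check. The sketch below illustrates only the cubic cost model used in the text (it is not a model of any particular synthesis tool), comparing one problem of size 100 against the same work split into halves and quarters:

```python
def heuristic_cost(n, exponent=3):
    """Cost model for a nonlinear (here cubic) synthesis heuristic."""
    return n ** exponent

# One large problem vs. the same logic split into smaller equivalent pieces.
whole = heuristic_cost(100)       # 100^3 = 1,000,000
halves = 2 * heuristic_cost(50)   # 2 * 50^3 = 250,000
quarters = 4 * heuristic_cost(25) # 4 * 25^3 = 62,500

print(whole, halves, quarters)    # 1000000 250000 62500
print(whole // quarters)          # splitting into quarters wins by 16x
```

The same arithmetic explains the nonlinear memory growth: each smaller piece also fits in a fraction of the memory the whole problem would need.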
Integrating Logic Synthesis and Physical Design

In the past, transistors, and hence logic gates, had a very large time delay compared with wires. Thus, it made sense to create logic synthesis tools that evaluated performance in terms of the number of levels of gates from input to output. As the industry moves to IC manufacturing processes that involve smaller and smaller feature sizes, transistors shrink not only in their size, but also in their delay. That's the good news.

Now for the bad news. While transistor delays shrink with reduced feature sizes, wire delays have actually begun to increase! This phenomenon is illustrated in Figure 11.7. Therefore, in the past, it made sense to think of circuits as transistors connected by wires. However, in the future, it appears that we'll have to start thinking of circuits as wires connected by transistors!

This change in the ratio of transistor delay and wire delay impacts logic synthesis tremendously. To understand the delay of a given logic expression, a synthesis tool can no longer just count the number of logic gates from input to output. Instead, the tool must measure the length of the wires connecting those gates. But in order to know those lengths, the tool must know how the transistors are placed on an IC. Placing transistors was previously the job of physical design, performed after logic synthesis; such a clean separation of logic synthesis and physical design is no longer possible. Instead, we must perform logic synthesis and physical design simultaneously if we are really to design efficient circuits.

Register-Transfer Synthesis

Logic synthesis allowed us to describe our system as Boolean equations or as an FSM. However, many systems are too complex to describe initially at this logic level of abstraction. Instead, we often describe our system using a more abstract (and hence more powerful) computation model, such as an FSMD.

Recall that an FSMD allows variable declarations of complex data types, and allows arithmetic actions and conditions. Clearly, more work is necessary to convert an FSMD to gates than to convert an FSM to gates, and this extra work is performed by register-transfer synthesis. Register-transfer (RT) synthesis takes an FSMD and converts it to a custom single-purpose processor, consisting of a datapath and an FSM controller. In particular, it generates a complete datapath, consisting of register units to store variables, functional units to implement arithmetic operations, and connection units (buses and multiplexors) to connect these other units. It also generates an FSM that controls this datapath.

Creating the datapath requires solving two subproblems: allocation and binding. Allocation is the problem of instantiating storage units, functional units, and connection units. Binding is the problem of mapping FSMD operations to specific units. As in logic synthesis, both of these synthesis problems are hard to solve optimally.

Behavioral Synthesis

In RT synthesis, we describe the actions that occur on every clock cycle of the system, using an FSMD. However, for many systems, we are only interested in having the output be a correct function of the inputs; how that function is broken down into clock cycles matters less. Therefore, we may want to describe such a system using a sequential program.
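To make allocation and binding concrete, here is a toy greedy pass. This is purely an illustration under simplifying assumptions (the function name, state encoding, and unit-naming scheme are invented for this sketch; real synthesis tools use far more sophisticated heuristics). Operations performed in the same state must bind to different functional units, while operations in different states may share one:

```python
def allocate_and_bind(states):
    """states: list of lists of (op_name, op_type) performed per state.
    Returns (units, binding): units maps unit id -> unit type,
    binding maps operation name -> unit id."""
    units = {}       # unit id -> functional-unit type (e.g., 'add')
    binding = {}     # operation -> unit id
    for ops in states:
        busy = set() # units already used within this state
        for name, op_type in ops:
            # Reuse a free unit of the right type, else allocate a new one.
            free = [u for u, t in units.items()
                    if t == op_type and u not in busy]
            u = free[0] if free else f"{op_type}{len(units)}"
            units[u] = op_type
            binding[name] = u
            busy.add(u)
    return units, binding

units, binding = allocate_and_bind([
    [("a", "add"), ("b", "add")],  # state 0: two adds need two adders
    [("c", "add")],                # state 1: can reuse an adder
])
print(len(units))                  # 2 adders allocated, not 3
```

The greedy rule "reuse across states, never within a state" is what lets three additions share two adders here; an optimizing tool would additionally weigh multiplexor and wiring costs when deciding whether sharing is worthwhile.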
Behavioral synthesis converts a single sequential program into a single-purpose processor structure that executes only that one program. Behavioral synthesis has also been referred to as high-level synthesis.

A sequential program differs from an FSMD in that it does not require us to schedule the system's actions into states when describing the behavior. Therefore, implementing a sequential program requires not only allocation and binding, as in RT synthesis, but also scheduling. Scheduling is the assignment of a sequential program's operations to states.

In Chapter 2, we provided a simple technique for behavioral synthesis. First, we provided templates for converting every sequential program construct into an equivalent set of states, thus accomplishing scheduling. Second, we provided a simple allocation and binding method, namely, allocating one storage unit for every variable, one functional unit for every operation, and one connection unit for every transfer. While this approach results in a correct processor circuit, the circuit is clearly not optimized. Thus, behavioral synthesis tools use advanced techniques to carry out the tasks of scheduling, allocation, and binding in order to optimize a circuit. They also typically include standard compiler optimizations that are applied before those tasks, such as constant propagation, dead-code elimination, and loop unrolling.
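As a concrete illustration of scheduling, the following sketch implements a minimal list scheduler. It is an illustrative toy, not the algorithm of any particular behavioral synthesis tool: each operation is assigned to the earliest state in which its operands are ready and a functional unit is still free.

```python
def list_schedule(ops, deps, units_per_state):
    """ops: operation names in program order.
    deps: op -> set of ops whose results it needs.
    units_per_state: how many operations fit in one state."""
    state_of = {}
    schedule = []  # schedule[s] = list of ops performed in state s
    for op in ops:
        # Earliest state after all producers of this op's operands.
        s = max((state_of[d] + 1 for d in deps.get(op, ())), default=0)
        while True:  # advance until a state has a free functional unit
            while len(schedule) <= s:
                schedule.append([])
            if len(schedule[s]) < units_per_state:
                break
            s += 1
        schedule[s].append(op)
        state_of[op] = s
    return schedule

# t1 = a+b; t2 = c+d; t3 = t1+t2
print(list_schedule(["t1", "t2", "t3"], {"t3": {"t1", "t2"}}, 1))
# one adder:  [['t1'], ['t2'], ['t3']]  (3 states)
print(list_schedule(["t1", "t2", "t3"], {"t3": {"t1", "t2"}}, 2))
# two adders: [['t1', 't2'], ['t3']]    (2 states)
```

The example shows the classic area/time tradeoff that scheduling exposes: allocating a second adder shortens the schedule from three states to two.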
System Synthesis and Hardware/Software Codesign

Behavioral synthesis converts a single sequential program (behavior) to a single-purpose processor (structure). However, complex embedded systems may require more than this. In particular, using multiple processors may provide better performance or power. Furthermore, the original behavior may be better described using multiple concurrently executing sequential programs, known as processes. System synthesis converts multiple processes into multiple processors. The term system here refers to a collection of processors.

Given one or more processes, system synthesis involves several tasks. Transformation is the task of rewriting the processes to be more amenable to synthesis. For example, a designer may have described some behavior using two processes, but analysis might show that those two processes are really exclusive to one another, and thus could be merged into one process. Likewise, a large process might actually consist of two independent operations that could be done concurrently, so that process could be divided into two processes. Other common transformations include procedure inlining and loop unrolling.

Allocation is the task of selecting the numbers and types of processors to use to implement the processes. A designer might choose to use an 8-bit general-purpose processor along with a single-purpose processor. Alternatively, the designer might use a 32-bit general-purpose processor, an 8-bit general-purpose processor, and multiple single-purpose processors. Allocation actually includes selecting processors, memories, and buses; it is essentially the design of the system architecture.

Partitioning is the task of mapping the processes to processors. One process may be implemented on multiple processors, and multiple processes can be implemented on a single processor. Likewise, variables must be partitioned among memories, and communications among buses.

Scheduling is the task of determining when each of the multiple processes on a single processor will have its chance to execute on the processor. Likewise, memory accesses and bus communications must be scheduled.

These tasks may be performed in a variety of orders, and iteration among the tasks is common.

System synthesis, like all forms of synthesis, is driven by constraints. A typical set of constraints dictates that certain performance requirements must be met at minimum cost. In such a situation, system synthesis might seek to allocate as much behavior as possible to a general-purpose processor, since a GPP may provide for low-cost, flexible implementation. A minimum number of single-purpose processors might be used to meet the performance requirements.

System synthesis for general-purpose processors only (software) has been around for a few decades, but hasn't been called system synthesis; names like multiprocessing, parallel processing, and real-time scheduling have been more common. The maturation of behavioral synthesis in the 1990s enabled the consideration of single-purpose processors (hardware) during the allocation and partitioning tasks of system synthesis. This joint consideration of general-purpose and single-purpose processors by the same automatic tools was in stark contrast to the prior art. Thus, the term hardware/software codesign has been used extensively in the research community, to highlight research that focuses on the unique requirements of such simultaneous consideration of both hardware and software during synthesis. However, this term may be temporary in nature, as the distinction between GPPs and SPPs continues to blur.

Temporal and Spatial Thinking

As we discussed earlier, the evolution of synthesis to higher abstraction levels has had the effect of enabling a unified view of hardware and software design, since implementing functionality on general-purpose or single-purpose processors can be seen to have the same design starting point of sequential programs. In fact, some researchers think that synthesis has fundamentally changed the nature of the skills needed to build hardware.

Before synthesis, designers of hardware worked primarily in the structural domain. They connected simpler components, each having a well-defined functionality, to build more complex systems. For example, a designer might have spent most of his or her time connecting logic gates to build a controller, or connecting registers, multiplexors, and ALUs to build a datapath. Gajski referred to this era as the "capture-and-simulate" era of hardware design, since designers would capture these systems using computer-aided design tools, and then simulate the system to verify correctness, before fabricating a chip.

With the advent of synthesis, designers of hardware work primarily in the behavioral domain. They describe FSMDs or sequential programs, and they then synthesize these descriptions automatically into structural connections of components. Gajski refers to this era as the "describe-and-synthesize" era.

This paradigm shift from working in the structural domain to the behavioral domain has not only increased productivity but also had the effect of dramatically changing the skills necessary to be a good hardware designer. During the capture-and-simulate era, strong spatial
reasoning skills were needed to connect components. Structural diagrams were the main method for communicating system design information, supplemented with English descriptions of how the system worked. For example, recall that in Chapter 4, we mentioned that timers were typically described in datasheets using a diagram of the internal structure of the timer. However, during the describe-and-synthesize era, designers must have very strong temporal reasoning skills, since they aren't working so much with components as they are with things like FSMDs. FSMDs (and sequential programs) are created by composing states (or statements) that have relationships with one another over time. Although designers always had to have some temporal reasoning skills, those skills have now become extremely important to create good hardware. These skills are often associated with people who are strong programmers.

At the same time, the structure of the implementations output by today's synthesis tools is heavily influenced by the style with which a designer describes the behavior. Thus, the designer must still have a strong understanding of hardware structure and know how to write behavior that will synthesize into an efficient implementation.

11.3 Verification: Hardware/Software Co-Simulation

For example, we can verify the correctness of an ALU by providing all possible input combinations, and checking the ALU outputs for correct results, which we of course have to compute using other means. Likewise, we can verify that an elevator controller won't have the door open while the elevator is moving, by simulating the controller for all possible input sequences and checking that the door is always closed when the elevator is moving.

Unfortunately, simulating "all possible inputs" or "all possible input sequences" is impossible for all but the simplest of systems. Notice that simulating all possible inputs of a 32-bit ALU requires simulating 2³² × 2³², or 2⁶⁴, possible input combinations. Even if we could simulate one million combinations per second, simulating that number of combinations would require over half a million years. Furthermore, an ALU is only a combinational circuit; for a sequential circuit, like an elevator controller, we must simulate not only all possible input combinations but also all possible sequences of such combinations. Instead of simulating all possible inputs or sequences, designers can only simulate a tiny subset of possible inputs. This subset usually includes typical values, plus known boundary conditions. Boundary conditions for an ALU might include one case where both operands are 0s and another where both operands are all 1s. Thus, simulation increases our confidence that a design is correct and complete, but does not prove anything.

Compared with a physical implementation, simulation has several advantages with respect to testing and debugging a system. However, simulation also has drawbacks:
• The models of the environment will likely be somewhat incomplete, so they may not model complex environment behavior correctly, especially when that behavior is undocumented.
• Simulation speed can be quite slow compared to execution of a physical implementation.
Techniques for overcoming these problems, especially the speed problem, will be discussed in the next few sections.

Simulation Speed

Perhaps the most significant disadvantage of simulation is that simulation is very slow compared to execution on a physical implementation. For example, while a physical implementation of a microprocessor may execute 100 million instructions per second, a simulation of a gate-level model of that microprocessor may only execute 10 instructions per second, meaning that the gate-level simulation is 10 million times slower than actual execution. Figure 11.8 illustrates this difference using sample numbers. Using these numbers, representative of an SOC, one hour of actual execution of the SOC would require about 10 million hours of gate-level simulation, equivalent to about 1,000 years. One hour of simulation is quite a reasonable duration to want to simulate. For example, consider an automobile cruise-controller. Given the wide variety of possible speeds, road grades, and wind velocities, we might certainly want to investigate about an hour's worth of environment scenarios and cruise-controller responses.

Figure 11.8: Sample relative speeds of different types of simulation/emulation compared with real-time execution; to cover one hour of real execution, instruction-set simulation takes about 1.2 years, cycle-accurate simulation about 12 years, register-transfer-level HDL simulation more than a lifetime, and gate-level HDL simulation (10,000,000 times slower) about a millennium. These numbers depend on the system size. Source: VLSI and Philips product literature.

Simulation is slow for several reasons. One reason is that we are sequentializing a parallel design. Suppose there are 1,000,000 logic gates in a design. When implemented as an IC, all 1,000,000 gates operate in parallel. However, in simulation, we essentially have to analyze the inputs and generate the output of each gate one at a time.

A second reason simulation is slow is that we are adding several programs in between the system being simulated and real hardware. For example, suppose we want to simulate a simple operation like A = B + C. A simulator has to read and understand this operation, determine the current values of B and C, compute A, and send the results somewhere. Thus, this single operation might require 10 to 100 simulator operations. The simulator is actually running under an operating system, so each simulator operation may actually require perhaps 10 to 100 operating system operations. Finally, each operating system operation may translate to 10 hardware operations. So each operation we wish to simulate may require 1,000 to 100,000 actual hardware operations.

To overcome this problem of long simulation time, we have some options. One option is to reduce the amount of real time that we simulate. Rather than simulating 1 hour of execution, we might just simulate 1 millisecond of execution, requiring 10,000,000 × 0.001 = 10,000 seconds, or about 3 hours. However, simulating 1 millisecond of execution does not give us much confidence in the correctness and completeness of our system. For example, 1 millisecond of execution of a cruise-controller tells us very little about how the controller responds in a variety of scenarios. Nevertheless, because of the slow speed of simulation, many embedded systems are only simulated for perhaps a few seconds of real time before they are first implemented physically.

Another way to overcome this problem is to use a faster simulator. There are two common ways that simulators can be made faster. One way is to build or use special hardware for simulation purposes. These devices are known as emulators, which we'll discuss in an upcoming section. Another way is to use a simulator that is less precise or accurate. In other words, we can reduce controllability and observability in exchange for speed.

As an example of reducing precision or accuracy to gain speed, consider the earlier example where we used a gate-level microprocessor model as our simulation model. When testing the cruise-control program for correctness and completeness, we probably don't care about what's happening at the inputs and outputs of every logic gate in the microprocessor. Simulating at the gate level of detail is costing us tremendously in terms of speed, since the microprocessor may have hundreds of thousands of gates. Instead, we might replace the gate-level model by a model made up of register-transfer-level components, which might execute 10 times faster than the gate-level model, as illustrated in Figure 11.8. An even faster approach is known as cycle-based simulation, in which we design a simulator that is only accurate at clock cycle boundaries, and does not provide any information on signal changes in between cycles. As shown in the figure, this may gain us another factor of 10 in speed. Going for more speed, we may not need to model the structural components inside the microprocessor at all, and instead we might just use an instruction-set simulator, which may gain yet another factor of 10. An instruction-set simulation may thus be 10,000 times slower than real execution, so now simulating our desired 1 hour requires 10,000 hours, or just over 1 year. Such faster simulation is often coupled with the above-mentioned reduction of the real time being simulated. So if we are willing to simulate for 10 hours, we could simulate 10 × 1/10,000 = 0.001 hour of real time, or 3.6 seconds of real time.

Hardware/Software Co-Simulation

More generally, a variety of simulation approaches exist, varying in their simulation speed and precision/accuracy. For a given processor, whether general-purpose or single-purpose,
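The slowdown figures quoted in the Simulation Speed discussion above reduce to simple arithmetic, checked here with Python used purely as a calculator (the 10× steps between simulator levels and the one-million-combinations-per-second rate are the chapter's sample numbers, not measurements):

```python
HOUR = 3600  # seconds

# Sample slowdown factors from Figure 11.8:
GATE_LEVEL = 10_000_000  # gate-level HDL simulation: 10^7 x slower
ISS = 10_000             # instruction-set simulation: 10^4 x slower

# Simulating 1 ms of real time at gate-level speed:
ms_cost = GATE_LEVEL * 0.001        # simulator seconds
print(ms_cost / HOUR)               # roughly 2.8 hours of simulation

# Simulating 1 hour of real time on an instruction-set simulator:
print(ISS, "hours")                 # 10,000 hours, just over a year

# Conversely, 10 hours of simulator time buys this much real time:
print(10 * HOUR / ISS, "seconds")   # 3.6 seconds

# Exhaustively testing a 32-bit ALU at 10^6 combinations per second:
years = 2**64 / 1e6 / (HOUR * 24 * 365)
print(f"{years:,.0f} years")        # well over half a million years
```

Running the numbers this way makes the chapter's conclusion tangible: even a million-fold speedup in the test rig would still leave the exhaustive ALU test needing centuries.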
extensively tested. Ideally, synthesis and physical design tools would generate correct implementations, but that is simply not the case today. In addition, even correct implementations will vary in terms of their timing and power.

A second increased difficulty in verification stems from the fact that there is no direct access to a core once it has been integrated into a chip. In the past, a system's ICs resided on a board, and those ICs could thus be tested individually by connecting a logic analyzer to the IC's pins. Today, a system's cores are buried inside of a single IC, so directly accessing a core's ports is impossible, requiring other means for scanning port values in and out. Furthermore, one cannot simply replace a bad core by another one, the way one could replace a bad IC in the past, thus making early verification even more crucial.

11.5 Design Process Models

A designer must proceed through several steps when designing a system. We can think of describing behavior as one design step, converting behavior to structure as another step, and mapping structure to a physical implementation as yet another step. Each step will obviously consist of numerous substeps. A design process model describes the order in which these steps are taken. The term process here should not be confused with the notion of a process in the concurrent process model discussed in an earlier chapter, nor should it be confused with the IC manufacturing process. Here, process refers to the manner in which the embedded system designer proceeds through design steps.

Figure 11.9: Design process models: (a) waterfall, (b) spiral.

One process model is the waterfall model, illustrated in Figure 11.9(a). Suppose a designer has six months to build a system. In the waterfall model, the designer first exerts extensive effort, perhaps two months, describing the behavior completely. Once fully satisfied that the behavior is correct, after extensive behavioral simulation and debugging, the designer moves on to the next step of designing structure. Again, much effort is exerted, perhaps another two months, until the designer is satisfied the structure is correct. Finally, the physical implementation step is carried out, occupying perhaps the last two months. The result is a final system implementation, hopefully a correct one. In the waterfall model, when we proceed to the next step, we never come back to the earlier steps, much like water cascading down a mountain doesn't return to higher elevations.

Unfortunately, the waterfall model is not very realistic, for several reasons. First, we will almost always find bugs in the later steps that should be fixed in an earlier step. For example, when testing the structure, we may notice that we forgot to handle a certain input combination in the behavior. Second, we often do not know the complete desired behavior of the system until we have a working prototype. For example, we may build a prototype device and show it to a customer, who then gets the idea of adding several features. Third, system specifications commonly change unexpectedly. For example, we may be halfway done designing a system when our company decides that to be competitive, the product must be smaller and consume less power than originally expected, requiring several features to be dropped. Nevertheless, many designers design their systems following the waterfall model. The accompanying unexpected iterations back through the three steps often result in missed deadlines, and hence in lost revenues or products that never make it to market.

An alternative process model is the spiral model, shown in Figure 11.9(b). Suppose again that the designer has six months to build the system. In the spiral model, the designer first exerts some effort, perhaps a few weeks, to describe the basic behavior of the system. This description will be incomplete, but will have the basic functions, with many functions left to be filled in later. Next, the designer moves on to designing structure, again taking maybe a few weeks. Finally, the designer creates a physical prototype of the system. This prototype is used to test out the basic functions, and to get a better idea of what functions we should add to the system. With this experience, the designer proceeds through the three steps again, expanding the original behavioral description or even starting with a new one, creating structure, and obtaining a physical implementation again. These steps may be repeated several times until the desired system is obtained.

The spiral model has its drawbacks, too. The designer must come up with ways to obtain structure and physical implementations quickly. For example, the designer may have to use FPGAs for the physical prototypes, finally generating new silicon (a task that can take months) for the final product. Thus, the designer may have to use more tools, which itself can require extra effort and costs. Also, if a system was well defined in the beginning, and if we would have created a first-time-correct implementation using the waterfall model, then the spiral model requires more time, due to the overhead of creating numerous prototypes. Nevertheless, variations of the spiral model have become extremely popular, both in software development and in hardware development.

The preceding discussion focused implicitly on designing single-purpose processors, since we started with behavior, designed structure, and then mapped to a physical implementation. However, the discussion applies equally to using general-purpose processors. In the traditional waterfall approach illustrated in Figure 11.9(a), a general-purpose processor's architecture (structure) is developed by a particular company and acquired by an embedded system designer. The designer then develops a software application (behavior). Finally, the designer maps the application to the architecture, using compilation and manual design.
However, even this widely accepted approach is beginning to change. A spiral-like process model, illustrated in Figure 11.10, is beginning to be applied by embedded system designers. In this model, the designer develops or acquires an architecture and develops an application or set of applications. The designer then maps the application to the architecture, and analyzes the design metrics of this combination of application, architecture, and mapping. The designer can then choose to (a) modify the mapping, (b) modify the application to better suit the architecture, or (c) modify the architecture to better suit the application. This last step of modifying the architecture was previously too difficult to consider. However, with the maturation of synthesis tools, as well as compilers that can generate code for a variety of instruction sets, this last step is much more feasible. Furthermore, as mentioned above, designers are increasingly obtaining the microprocessor architecture in the form of intellectual property, which can potentially be tuned to the application. This is in stark contrast to the past, when an off-the-shelf microprocessor IC obviously could not be modified. Not coincidentally, given its depiction in Figure 11.10, this process model is referred to as the Y-chart, but it has no relation to Gajski's Y-chart defined earlier.

Refining to lower abstraction levels (whether behavioral, structural, or physical models) narrows the potential implementations, as illustrated in Figure 11.3(b). Such narrowing proceeds until a particular implementation is chosen.

Figure 11.10: A spiral-like approach represented using another Y-chart.

Embedded systems represent a large and growing class of computing systems, which some people believe will soon become even more significant than desktop computing systems. The nature of embedded systems has been changed dramatically by today's outrageously large chip capacities coupled with powerful new automation tools, but methods for teaching embedded systems design have not evolved concurrently. This book is a first attempt to remedy this situation. We started by introducing the view that computing systems are built primarily from collections of processors, some general-purpose, some single-purpose (standard or custom), which differ not in some fundamental way, but rather just in their design metrics, like power, performance, and flexibility. We introduced memories commonly used along with processors, and described how to interface processors and memories. With processors, memories, and interfacing methods, we could build complete systems, and so we gave an example of one such system: a digital camera.

During the first part of the book, we did not focus on the nitty-gritty internal details of any particular microprocessor, since modern tools greatly reduce the need for such knowledge. Instead, in the second part of this book, we focused on higher-level issues. We examined powerful higher-level computation models, like state machines and concurrent processes, which enable the capture of more complex functionality. We introduced the basics of a large class of embedded systems known as control systems. We summarized the key IC technologies available to implement embedded systems. Finally, we summarized the issues related to design technologies for mapping desired behavior to a physical implementation.

This book was intentionally broad in nature. It was designed primarily to serve as a starting point for students about to study the various subtopics of embedded systems in more detail: topics like VLSI/ASIC design, real-time programming, digital-design synthesis, control system design, and others. The hope is that the student pursuing those topics will have a unified view of hardware and software throughout their studies, and will view embedded systems design not as a field comprising mostly low-level code hacking, but rather as a unique engineering discipline demanding a balanced knowledge of hardware and software issues. We hope you have found the book useful.
11.9 Exercises
11.1 List and describe three general approaches to improving designer productivity.
11.2 Describe each tool that has enabled the elevation of software design and hardware
design to higher abstraction levels.
11.3 Show behavior and structure (at the same abstraction level) for a design that finds the minimum of three input integers, by showing the following descriptions: a sequential program behavior, a processor/memory structure, a register-transfer behavior, a register/FU/MUX structure, a logic equation/FSM behavior, and finally a gate/flip-flop structure. Label each description and associate each label with a point on Gajski's Y-chart.
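A sequential-program behavior for the minimum-of-three design in Exercise 11.3 might start like the following sketch (written in Python for brevity; the book's sequential programs are C-like, so treat this purely as the behavioral-level description, from which the structural descriptions would be derived):

```python
def min3(a, b, c):
    """Behavior: return the minimum of three input integers."""
    m = a
    if b < m:
        m = b
    if c < m:
        m = c
    return m

print(min3(7, 2, 9))  # 2
```

The two comparisons map naturally to the later descriptions in the exercise: each `if` becomes a state in the register-transfer behavior, and the comparisons become a comparator functional unit in the register/FU/MUX structure.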
11.4 Develop an example of a Boolean function that can be implemented with fewer gates
when implemented in more than two levels (your designs should have roughly 10 gates,
A.1 Introduction

We intentionally designed this textbook to be independent of any particular microprocessor, microcontroller, programming language, hardware description language, FPGA, and so on. This decision was made largely because the growing popularity and complexity of embedded systems has been accompanied by tremendous diversity. The days when most courses on microprocessor-based design used a fairly standard microcontroller are quickly giving way to a situation of tremendous diversity in lab setups. Some setups emphasize 8-bit microcontrollers, while others emphasize 32-bit platforms using one of a variety of popular processors like Intel 80x86, Motorola 68000 variations, Sun SPARCs, MIPS processors, ARM processors, digital signal processors, multimedia processors (like TriMedia's), and so on. Furthermore, these processors come on a variety of development boards, each with unique features. Some courses focus mostly on hardware prototyping, while others include extensive simulation too. Some courses integrate the use of FPGAs, which also come in diverse setups. New chips and platforms that integrate microprocessors and FPGAs are beginning to appear. This diversity, coupled with the evolution of embedded system design into a discipline, makes the need to decouple lecture material from lab material quite evident.

However, we have not simply left the instructor and students entirely on their own with respect to lab setup. Instead, we have used the World Wide Web to supplement this book with extensive lab materials. In fact, using the Web, we can provide even more than a typical processor-specific textbook might be able to provide.
APPENDIX A: Online Resources
A.3 Lab Resources
the 1-bit adder previously used.
"XS40 Tutorial: VHDL Synthesis." This tutorial shows students how to synthesize
and download VHDL code onto an XS40 board. The tutorial gives steps showing
how to synthesize the code provided using Xilinx Foundation Express to generate a
bit stream.
We have used this Web site with great success. Students are able to develop code for
complex systems in much less time than before.
Indoors
1. Cordless phone
2. Coffee maker
3. Rice cooker
4. Portable radio
5. Programmable range
6. Microwave oven
7. Smart refrigerator
8. In-home computer network switch
9. Clothes dryer
10. Clothes-washing machine
Index
nonvolatile memory, 111
NRE cost, 5, 7, 30
NVRAM, 120

O
observability, 297
one-hot, 33
one-time-programmable ROM. See OTP ROM
opcode, 62
operand, 62
operating system, 67
optimization, 19
OTP ROM, 115

P
PAL, 14, 278
parallel I/O, 134, 145
parity, 90, 168
partitioning, 294
PCI, 173
performance, 8, 29
period, 240
peripheral, 11, 84
peripheral bus, 165
photolithography, 272
PID control, 261
pipelining, 60
PLA, 14, 278
PLD, 14, 278
polling, 149
polysilicon, 270
port, 110, 138
port-based I/O, 144
process scheduling, 239
processor, 9, 21, 29
processor local bus, 165
processor technology, 9
programmable logic device. See PLD
PROM, 114
proportional control, 258
protocol, 118
PSM, 220
PSRAM, 120
pulse width modulator. See PWM
PWM, 92

Q
QNX, 243
quantization, 183, 263

R
RAM, 58, 118
Rambus, 133
range, 85
rate monotonic scheduling, 240
reaction timer, 87
reactive system, 3
real-time clock, 105
real-time system, 3, 242
receive, 231
register, 35
register addressing, 63
register-indirect addressing, 63
register-transfer level, 34
register-transfer synthesis, 293
relative addressing, 63
resolution, 85
revenue model, 6

S
safety, 5
scheduling, 48, 295
SDRAM, 132
send, 231
sequential logic, 34
sequential program, 39
sequential program model, 208
serial communication, 166
set-associative cache, 126
shared memory, 227
shift register, 36
shifter, 34
silicon spin, 273
simulation, 18, 296
single-purpose processor, 10, 29, 38
software interrupt, 152
SpecCharts, 221
speedup, 9
spiral model, 305
SPLD, 278
square wave, 92
SRAM, 119
standard cell, 13, 276
standard I/O, 146
standards, 18
state diagram, 214
state encoding, 50, 291
state machine model, 208
state minimization, 50, 291
Statecharts, 217
static RAM. See SRAM
stepper motor, 98
storage permanence, 112
strobe protocol, 141
structural description, 285
system specification, 16
system synthesis, 294
system-on-a-chip, 22

T
target processor, 69
technology, 9
technology mapping, 291
test, 18
thread, 238
throughput, 8
time multiplexing, 141
timer, 84
time-to-market, 5, 6, 30
time-to-prototype, 5
timing diagram, 139
top-down design, 16
transistor, 30
trap, 152
two-level logic minimization, 288

U
UART, 90, 185, 197
unit cost, 5, 7
USB, 172

V
vectored interrupt, 149, 160, 162
verification, 18
VHDL, 313
VLIW, 61
VLSI, 13
volatile memory, 112

W