
Embedded System Design

A Unified Hardware/Software Introduction

Frank Vahid
Department of Computer Science and Engineering
University of California, Riverside

Tony Givargis
Department of Information and Computer Science
University of California, Irvine

John Wiley & Sons, Inc.


To my world: Amy, Eric, Kelsi and Maya, and to the memory of our sixth member, Vahid Aminian. - FV

To my family: Neli, Fredrick, Odet, and Edvin. - TG


Copyright © 2002, 2003, 2004, 2005. Exclusive rights by John Wiley & Sons (Asia) Pte. Ltd., Singapore, for manufacture and export. This book cannot be re-exported from the country to which it is consigned by John Wiley & Sons.

Copyright © 2002 by John Wiley & Sons, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.

Library of Congress Cataloging-in-Publication Data

Vahid, Frank
    Embedded System Design: A Unified Hardware/Software Introduction / Frank Vahid, Tony Givargis
    ISBN 9971-51-405-2

Printed and bound in India by Replika Press Pvt. Ltd., Kundli 131028

10 9 8 7 6

Preface

Purpose

Embedded computing systems have grown tremendously in recent years, not only in their popularity, but also in their complexity. This complexity demands a new type of designer, one who can easily cross the traditional border between hardware design and software design. After investigating the availability of courses and textbooks, we felt a new course and accompanying textbook were necessary to introduce embedded computing system design using a unified view of software and hardware. This textbook portrays hardware and software not as different domains, but rather as two implementation options along a continuum of options varying in their design metrics, like cost, performance, power, size, and flexibility.

Three important trends have made such a unified view possible. First, integrated circuit (IC) capacities have increased to the point that both software processors and custom hardware processors now commonly coexist on a single IC. Second, quality compilers and program size increases have led to the common use of processor-independent C, C++, and Java compilers and integrated design environments (IDEs) in embedded system design, significantly decreasing the importance of the focus on microprocessor internals and assembly language programming that dominate most existing embedded system courses and textbooks. Third, synthesis technology has advanced to the point that synthesis tools have become commonplace in the design of digital hardware. Synthesis tools achieve nearly the same for hardware design as compilers achieve for software design: they allow the designer to describe desired functionality in a high-level programming language, and they then automatically generate an efficient custom-hardware processor implementation. The first trend makes the past separation of software and hardware design nearly impossible. Fortunately, the second and third trends enable their unified design, by turning embedded system design, at its highest level, into the problem of selecting and programming (for software), designing (for hardware), and integrating processors.
Coverage
This book presents a unified view of the various processor types used in embedded systems: general-purpose processors (software), custom single-purpose processors (hardware), standard single-purpose processors (peripherals), and so on. But nevertheless, they are all just processors, differing in their cost, power, performance, design time, flexibility, and so on, but essentially doing the same thing.

Chapter 1 provides an overview of embedded systems and their design challenges. We introduce custom single-purpose processors in Chapter 2, emphasizing a top-down technique to digital design amenable to synthesis, picking up where many digital design textbooks leave off. We introduce general-purpose processors and their use in Chapter 3, expecting this chapter to be mostly review for many readers, and ending by showing how to design a general-purpose processor using the techniques of Chapter 2. Chapter 4 describes numerous standard single-purpose processors (peripherals) common in embedded systems. Chapters 5 and 6 introduce memories and interfacing concepts, respectively, to complete the fundamental knowledge necessary to build basic embedded systems. Chapter 7 provides a digital camera example, showing how we can trade off among hardware, software, and peripherals to achieve implementations that vary in their power, performance, and size. These seven chapters form the core of this book.

Freed from the necessity of covering the nitty-gritty details of a particular microprocessor's internals and assembly language programming, this book includes coverage of some additional embedded systems topics. Chapter 8 describes advanced state machine computation models that are becoming popular when describing complex embedded system behavior, and introduces the concurrent process model and real-time systems. Chapter 9 gives a basic introduction to control systems, enough to make students aware that a rich theory exists for control systems, and to enable students to determine when an embedded system is an example of a control system. Chapter 10 introduces a variety of popular IC technologies from which a designer may choose for system implementation. Finally, Chapter 11 highlights various design technologies for building embedded systems, including discussion of hardware/software codesign, a user's introduction to synthesis (from behavioral down to logic levels), and the major trend toward design based on intellectual property (IP).

Courses

We use this book at the University of California, Riverside, in a one-quarter course called Introduction to Embedded Systems, which follows our introductory course on logic design, and which is taken by all computer science, computer engineering, and electrical engineering students at roughly the sophomore level. This early placement of the course in our curriculum represents our belief that an early unified view of hardware and software can be very beneficial to a student's mindset when later taking more specialized courses. The suggested placement of the course in an undergraduate curriculum is shown in Figure P.1. Our one-quarter course covers Chapters 1-7. We have a second quarter course on embedded systems that covers Chapters 8-11, supplemented with a textbook on real-time systems. A one-semester course might cover Chapters 1-7 plus two or three additional chapters of the instructor's choice.

Figure P.1: Suggested placement of the course in an undergraduate curriculum (the figure, not reproduced here, shows the course following introductory courses on programming and on logic/digital design).

The book can also serve existing courses on microprocessor-based system design, as those courses shift away from assembly-level programming to the use of more modern tools and to the integration of microprocessors and custom hardware (e.g., FPGAs). In other curricula, a new course on embedded systems may be necessary; we observe that numerous universities are introducing such courses, often converting a second course in digital design to a course on embedded systems (as we did at UCR). The book could also be used in a capstone senior design course as a text that brings together and organizes much of what students may have been exposed to already, as such courses often do not even have a textbook. The book should also be useful at the graduate level for an introductory embedded systems course.

Laboratory

Ideally, a course using this book should have an accompanying laboratory. The ideal lab setup would include both software development on an embedded microprocessor or microcontroller platform and hardware development on an FPGA platform (or even in a simulation environment).

We intentionally created this book to be independent of any particular microprocessor or FPGA. One reason is that embedded system tools and products are evolving rapidly, so we consider the ability to change lab environments without having to change textbooks an important one. A second reason is that the embedded system field has evolved sufficiently to warrant a book based on principles. However, a course with a hands-on lab may supplement this book with a processor-specific databook, which is typically low cost or even free, or with one of many commonly available "extended databook" processor-specific textbooks.

Likewise, the book is independent of any particular hardware description language, synthesis tool, simulator, or FPGA. Supplements that describe the particular hardware environment, again usually available for free or at low cost, may be useful.

At UCR, our labs are based on the 8051 microcontroller and Xilinx FPGAs. We use the Keil C compiler for the microcontroller, Xilinx Foundation Express synthesis software for the FPGA, and a development board from Xess Corporation for prototyping; the board contains both an 8051 and an FPGA. We also use an 8051 emulator and stand-alone 8051 chips from Philips.

We have provided extensive information on our lab setup and assignments on the book's Web page. Thus, while the book's microprocessor independence enables instructors to choose any lab environment, we have still provided instructors the option of obtaining extensive online assistance in developing an accompanying laboratory.

Additional Materials

A Web page has been established to be used in conjunction with the book: http://www.cs.ucr.edu/esd. This Web page contains supplementary material and links for each chapter. It also contains a set of lecture slides in Microsoft PowerPoint format; because the book itself was done entirely in Microsoft Word, the figures in the PowerPoint slides are PowerPoint drawings (rather than imported graphics), and thus can be modified as desired by instructors.

Furthermore, the Web page contains an extensive lab manual to accompany this textbook. Over 30 lab exercises, including detailed descriptions, schematics, and complete or partial solutions, can be found there. The exercises are organized by chapter, starting with very simple exercises and leading to progressively more complex ones. For example, Chapter 2's exercises start with a simple blinking light and end with a soda-machine controller and a calculator. Appendix A provides further information on our Web page.

Acknowledgments

We are grateful to numerous individuals for their assistance in developing this book. Sharon Hu of Notre Dame, Nikil Dutt of UC Irvine, and Smita Bakshi of UC Davis and Synplicity … contributed much of the chapter on control systems. Karen … converted our cover idea into the initial 3-D scene. Generous donations of 8051 equipment from Philips Semiconductors and of FPGA equipment from Xilinx were a big assistance. Likewise, a National Science Foundation CAREER award supported some of this book's development. We thank Caroline Sieg at Wiley for overseeing the book's production and Madelyn Lesure for overseeing the cover design. Finally, we are deeply grateful to Bill Zobrist …

About the Authors

Frank Vahid is an Associate Professor in the Department of Computer Science and Engineering at the University of California, Riverside, which he joined in 1994. He is also a faculty member of the Center for Embedded Computer Systems at the University of California, Irvine. He received his B.S. in Computer Engineering from the University of Illinois, Urbana/Champaign, and his M.S. and Ph.D. degrees in Computer Science from the University of California, Irvine, where he was recipient of the Semiconductor Research Corporation Graduate Fellowship. He was an engineer at Hewlett Packard and has consulted for numerous companies, including NEC and Motorola. He is co-author of the graduate-level textbook Specification and Design of Embedded Systems (Prentice-Hall, 1994). He has been program chair and general chair for both the International Symposium on System Synthesis and the International Symposium on Hardware/Software Codesign. He has been an active researcher in embedded system design since 1988, with more than 50 publications and several best paper awards, including an IEEE Transactions on VLSI best paper award in 2000. His research interests are in embedded system architectures, low-power design, and design methods for system-on-a-chip.

Tony Givargis is an Assistant Professor in the Department of Information and Computer Science and a member of the Center for Embedded Computer Systems at the University of California, Irvine. He received his B.S. and Ph.D. degrees from the University of California, Riverside, where he received the Department of Computer Science Best Thesis award and the UCR College of Engineering Outstanding Student award, and where he was recipient of the GAANN Graduate Fellowship, a MICRO fellowship, and a Design Automation Conference scholarship. As a consultant, he has developed numerous embedded systems for several companies, ranging from an irrigation management system to a GPS-guided, self-navigating automobile. He has published more than 20 research papers in the embedded systems field. His research interests include embedded and real-time system design, low-power design, and processor/system-on-a-chip architectures.

"THIS BOOK IS FOR SALE ONLY IN THE COUNTRY TO WHICH IT IS FIRST CONSIGNED BY JOHN WILEY & SONS (ASIA) PTE LTD AND MAY NOT BE RE-EXPORTED."

Contents

Preface
    Purpose
    Coverage
    Courses
    Laboratory
    Additional Materials
    Acknowledgments
About the Authors

CHAPTER 1: Introduction
    1.1 Embedded Systems Overview
    1.2 Design Challenge - Optimizing Design Metrics
    1.3 Processor Technology
    1.4 IC Technology
    1.5 Design Technology
        More Productivity Improvers
        Trends
    1.6 Trade-offs
        Design Productivity Gap
    1.7 Summary and Book Outline
    1.8 References and Further Reading
    1.9 Exercises

CHAPTER 2: Custom Single-Purpose Processors: Hardware
    2.1 Introduction
    2.2 Combinational Logic
        Transistors and Logic Gates
        Basic Combinational Logic Design
        RT-Level Combinational Components
    2.3 Sequential Logic
        Flip-flops
        RT-Level Sequential Components
        Sequential Logic Design
    2.4 Custom Single-Purpose Processor Design
    2.5 RT-Level Custom Single-Purpose Processor Design
    2.6 Optimizing Custom Single-Purpose Processors
        Optimizing the Original Program
        Optimizing the FSMD
        Optimizing the Datapath
        Optimizing the FSM
    2.7 Summary
    2.8 References and Further Reading
    2.9 Exercises

CHAPTER 3: General-Purpose Processors: Software
    3.1 Introduction
    3.2 Basic Architecture
        Datapath
        Control Unit
        Memory
    3.3 Operation
        Instruction Execution
        Pipelining
        Superscalar and VLIW Architectures
    3.4 Programmer's View
        Instruction Set
        Program and Data Memory Space
        Registers
        I/O
        Interrupts
        Example: Assembly-Language Programming of Device Drivers
        Operating System
    3.5 Development Environment
        Design Flow and Tools
        Example: Instruction-Set Simulator for a Simple Processor
        Testing and Debugging
    3.6 Application-Specific Instruction-Set Processors (ASIPs)
        Microcontrollers
        Digital Signal Processors (DSP)
        Less-General ASIP Environments
    3.7 Selecting a Microprocessor
    3.8 General-Purpose Processor Design
    3.9 Summary
    3.10 References and Further Reading
    3.11 Exercises

CHAPTER 4: Standard Single-Purpose Processors: Peripherals
    4.1 Introduction
    4.2 Timers, Counters, and Watchdog Timers
        Timers and Counters
        Watchdog Timers
    4.3 UART
    4.4 Pulse Width Modulators
        Overview
        Example: Controlling a DC Motor Using a PWM
    4.5 LCD Controllers
        Overview
        Example: LCD Initialization
    4.6 Keypad Controllers
    4.7 Stepper Motor Controllers
        Overview
        Example: Using a Stepper Motor Driver
        Example: Controlling a Stepper Motor Directly
    4.8 Analog-to-Digital Converters
        Example: Successive Approximation
    4.9 Real-Time Clocks
    4.10 Summary
    4.11 References and Further Reading
    4.12 Exercises

CHAPTER 5: Memory
    5.1 Introduction
    5.2 Memory Write Ability and Storage Permanence
        Write Ability
        Storage Permanence
        Trade-offs
    5.3 Common Memory Types
        Introduction to "Read-Only" Memory - ROM
        Mask-Programmed ROM
        OTP ROM - One-Time Programmable ROM
        EPROM - Erasable Programmable ROM
        EEPROM - Electrically Erasable Programmable ROM
        Flash Memory
        Introduction to Read-Write Memory - RAM
        SRAM - Static RAM
        DRAM - Dynamic RAM
        PSRAM - Pseudo-Static RAM
        NVRAM - Nonvolatile RAM
        Example: HM6264 and 27C256 RAM/ROM Devices
        Example: TC55V2325FF-100 Memory Device
    5.4 Composing Memory
    5.5 Memory Hierarchy and Cache
        Cache Mapping Techniques
        Cache Replacement Policy
        Cache Write Techniques
        Cache Impact on System Performance
    5.6 Advanced RAM
        The Basic DRAM
        Fast Page Mode DRAM (FPM DRAM)
        Extended Data Out DRAM (EDO DRAM)
        Synchronous (S) and Enhanced Synchronous (ES) DRAM
        Rambus DRAM (RDRAM)
        DRAM Integration Problem
        Memory Management Unit (MMU)
    5.7 Summary
    5.8 References and Further Reading
    5.9 Exercises

CHAPTER 6: Interfacing
    6.1 Introduction
    6.2 Communication Basics
        Basic Terminology
        Basic Protocol Concepts
        Example: The ISA Bus Protocol - Memory Access
    6.3 Microprocessor Interfacing: I/O Addressing
        Port and Bus-Based I/O
        Memory-Mapped I/O and Standard I/O
        Example: The ISA Bus Protocol - Standard I/O
        Example: A Basic Memory Protocol
        Example: A Complex Memory Protocol
    6.4 Microprocessor Interfacing: Interrupts
    6.5 Microprocessor Interfacing: Direct Memory Access
        Example: DMA I/O and the ISA Bus Protocol
    6.6 Arbitration
        Priority Arbiter
        Daisy-Chain Arbitration
        Network-Oriented Arbitration Methods
    6.7 Multilevel Bus Architectures
    6.8 Advanced Communication Principles
        Parallel Communication
        Serial Communication
        Wireless Communication
        Layering
        Error Detection and Correction
    6.9 Serial Protocols
        I2C
        CAN
        FireWire
        USB
    6.10 Parallel Protocols
        PCI Bus
        ARM Bus
    6.11 Wireless Protocols
        IrDA
        Bluetooth
        IEEE 802.11
    6.12 Summary
    6.13 References and Further Reading
    6.14 Exercises

CHAPTER 7: Digital Camera Example
    7.1 Introduction
    7.2 Introduction to a Simple Digital Camera
        User's Perspective
        Designer's Perspective
    7.3 Requirements Specification
        Nonfunctional Requirements
        Informal Functional Specification
        Refined Functional Specification
    7.4 Design
        Implementation 1: Microcontroller Alone
        Implementation 2: Microcontroller and CCDPP
        Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
        Implementation 4: Microcontroller and CCDPP/DCT
    7.5 Summary
    7.6 References and Further Reading
    7.7 Exercises

CHAPTER 8: State Machine and Concurrent Process Models
    8.1 Introduction
    8.2 Models vs. Languages, Text vs. Graphics
        Models vs. Languages
        Textual Languages vs. Graphical Languages
    8.3 An Introductory Example
    8.4 A Basic State Machine Model: Finite-State Machines
    8.5 Finite-State Machine with Datapath Model: FSMD
    8.6 Using State Machines
        Describing a System as a State Machine
        Comparing State Machine and Sequential Program Models
        Capturing State Machines in Sequential Programming Language
    8.7 HCFSM and the Statecharts Language
    8.8 Program-State Machine Model (PSM)
    8.9 The Role of an Appropriate Model and Language
    8.10 Concurrent Process Model
    8.11 Concurrent Processes
        Process Create and Terminate
        Process Suspend and Resume
        Process Join
    8.12 Communication among Processes
        Shared Memory
        Message Passing
    8.13 Synchronization among Processes
        Condition Variables
        Monitors
    8.14 Implementation
        Creating and Terminating Processes
        Suspending and Resuming Processes
        Joining a Process
        Scheduling Processes
    8.15 Dataflow Model
    8.16 Real-Time Systems
        Windows CE
        QNX
    8.17 Summary
    8.18 References and Further Reading
    8.19 Exercises

CHAPTER 9: Control Systems
    9.1 Introduction
    9.2 Open-Loop and Closed-Loop Control Systems
        Overview
        A First Example: An Open-Loop Automobile Cruise Controller
        A Second Example: A Closed-Loop Automobile Cruise Controller
    9.3 General Control Systems and PID Controllers
        Control Objectives
        Modeling Real Physical Systems
        Controller Design
    9.4 Software Coding of a PID Controller
    9.5 PID Tuning
    9.6 Practical Issues Related to Computer-Based Control
        Quantization and Overflow Effects
        Aliasing
        Computation Delay
    9.7 Benefits of Computer-Based Control Implementations
        Repeatability, Reproducibility, and Stability
        Programmability
    9.8 Summary
    9.9 References and Further Reading
    9.10 Exercises

CHAPTER 10: IC Technology
    10.1 Introduction
    10.2 Full-Custom (VLSI) IC Technology
    10.3 Semi-Custom (ASIC) IC Technology
        Gate Array Semi-Custom IC Technology
        Standard Cell Semi-Custom IC Technology
    10.4 Programmable Logic Device (PLD) IC Technology
    10.5 Summary
    10.6 References and Further Reading
    10.7 Exercises

CHAPTER 11: Design Technology
    11.1 Introduction
    11.2 Automation: Synthesis
        "Going Up": The Parallel Evolution of Compilation and Synthesis
        Synthesis Levels
        Logic Synthesis
        Register-Transfer Synthesis
        Behavioral Synthesis
        System Synthesis and Hardware/Software Codesign
        Temporal and Spatial Thinking
    11.3 Verification: Hardware/Software Co-Simulation
        Formal Verification and Simulation
        Simulation Speed
        Hardware-Software Co-Simulation
        Emulators
    11.4 Reuse: Intellectual Property Cores
        Hard, Soft, and Firm Cores
        New Challenges Posed by Cores to Processor Providers
        New Challenges Posed by Cores to Processor Users
    11.5 Design Process Models
    11.6 Summary
    11.7 Book Summary
    11.8 References and Further Reading
    11.9 Exercises

APPENDIX A: Online Resources
    A.1 Introduction
    A.2 Summary of the ESD Web Page
    A.3 Lab Resources
        Chapter 2
        Chapter 3
        Chapter 4
        Chapter 5
        Chapter 6
        Chapter 7
    A.4 About the Book Cover
        Outdoors
        Indoors

Index
CHAPTER 1: Introduction

1.1 Embedded Systems Overview
1.2 Design Challenge - Optimizing Design Metrics
1.3 Processor Technology
1.4 IC Technology
1.5 Design Technology
1.6 Trade-offs
1.7 Summary and Book Outline
1.8 References and Further Reading
1.9 Exercises

1.1 Embedded Systems Overview

Computing systems are everywhere. It's probably no surprise that millions of computing systems are built every year destined for desktop computers, workstations, mainframes and servers. What may be surprising is that billions of tiny computing systems are built every year for a very different purpose: they are embedded within larger electronic devices, repeatedly carrying out a particular function, often going completely unnoticed by the device's user.

Anti-lock brakes            Modems
Auto-focus cameras          MPEG decoders
Automatic teller machines   Network cards
Automatic toll systems      Network switches/routers
Automatic transmission      On-board navigation
Avionic systems             Pagers
Battery chargers            Photocopiers
Camcorders                  Point-of-sale systems
Cell-phone base stations    Portable video games
Cell phones                 Printers
Cordless phones             Satellite phones
Cruise control              Scanners
Curbside check-in systems   Smart ovens/dishwashers
Digital cameras             Speech recognizers
Disk drives                 Stereo systems
Electronic card readers     Teleconferencing systems
Electronic instruments      Televisions
Electronic toys/games       Temperature controllers
Factory control             Theft tracking systems
Fax machines                TV set-top boxes
Fingerprint identifiers     VCR's, DVD players
Home security systems       Video game consoles
Life-support systems        Video phones
Medical testing systems     Washers and dryers

Figure 1.1: A short list of embedded systems.

Figure 1.2: An embedded system example - a digital camera. (The diagram is not reproduced here; it shows a digital camera chip containing a CCD, a DMA controller, a memory controller, and other components.)

product scanners, and automated teller machines), and automobiles (transmission control, cruise control, fuel injection, antilock brakes, and active suspension). Figure 1.1 is a short list of embedded system examples; a more complete list would require many pages. One might say that nearly any device that runs on electricity either already has or will soon have a computing system embedded within it. Although embedded computers typically cost far less than desktop computers, their quantities are huge. For example, in 1999 a typical American household may have had one desktop computer, but each household had between 35 and 50 embedded computers, with that number expected to rise to nearly 300 by 2004. Furthermore, the average 1998 car had about 50 embedded computers costing a total of several hundred dollars in all, with an annual cost growth rate of 17%. Several billion embedded microprocessor units were sold annually in recent years, compared to a few hundred million desktop microprocessor units.

Embedded systems have several common characteristics that distinguish such systems from other computing systems:

1. Single-functioned: An embedded system usually executes a specific program repeatedly. For example, a pager is always a pager. In contrast, a desktop system executes a variety of programs, like spreadsheets, word processors, and video games, with new programs added frequently. Of course, there are exceptions. One case is where an embedded system's program is updated with a newer program version. For example, some cell phones can be updated in such a manner. A second case is where several programs are swapped in and out of a system due to size limitations. For example, some missiles run one program while in cruise mode, then load a second program for locking onto a target. Nevertheless, we can see that even these exceptions represent systems with a specific function.

2. Tightly constrained: All computing systems have constraints on design metrics, but those on embedded systems can be especially tight. A design metric is a measure of an implementation's features, such as cost, size, performance, and power. Embedded systems often must cost just a few dollars, must be sized to fit on a single chip, must perform fast enough to process data in real time, and must consume minimum power to extend battery life or prevent the necessity of a cooling fan.

3. Reactive and real-time: Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the computer user, typically does not result in a system failure.
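The reactive, real-time behavior described in the cruise-control example can be sketched as a periodic sense-compute-actuate loop. The proportional-control rule and all sensor/actuator names below are illustrative assumptions, not a design from this book:

```python
# Sketch of a reactive, real-time control loop. The proportional-control rule
# and the sensor/actuator names are illustrative assumptions only.

def cruise_adjust(current_speed, target_speed, gain=0.5):
    """Throttle adjustment proportional to the speed error. A real cruise
    controller must deliver this result within a hard deadline every control
    period; a late answer can be as bad as a wrong one."""
    return gain * (target_speed - current_speed)

# The embedded loop would repeat sense -> compute -> actuate each period, e.g.:
#   while True:
#       speed = read_speed_sensor()              # hypothetical sensor read
#       set_throttle(cruise_adjust(speed, 65.0)) # hypothetical actuator call
#       wait_for_next_period()                   # e.g., every 10 ms

print(cruise_adjust(60.0, 65.0))  # → 2.5  (accelerate)
print(cruise_adjust(70.0, 65.0))  # → -2.5 (decelerate)
```

The deadline, not just the arithmetic, is what makes the computation "real time": the adjustment must be recomputed every control period, before the environment changes again.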

2
Embedded System Design
Chapter 1: Introduction

For example, consider the digital camera chip shown in Figure 1.2. The charge-coupled device (CCD) contains an array of light-sensitive photocells that capture an image. The A2D and D2A circuits convert analog images to digital and digital to analog, respectively. The CCD preprocessor provides commands to the CCD to read the image. The JPEG codec compresses and decompresses an image using the JPEG1 compression standard, enabling compact storage of images in the limited memory of the camera. The Pixel coprocessor aids in rapidly displaying images. The Memory controller controls access to a memory chip also found in the camera, while the DMA controller enables direct memory access by other devices while the microcontroller is performing other functions. The UART enables communication with a PC's serial port for uploading video frames, while the ISA bus interface enables a faster connection with a PC's ISA bus. The LCD control and Display control circuits control the display of images on the camera's liquid-crystal display device. The Multiplier/Accumulator circuit performs a particular frequently executed multiply/accumulate computation faster than the microcontroller could. At the heart of the system is the Microcontroller, which is a programmable processor that controls the activities of all the other circuits. We can think of each device as a processor designed for a particular task, while the microcontroller is a more general processor designed for general tasks.

This example illustrates some of the embedded system characteristics described earlier. First, it performs a single function repeatedly. The system always acts as a digital camera, wherein it captures, compresses, and stores frames, decompresses and displays frames, and uploads frames. Second, it is tightly constrained. The system must be low cost since consumers must be able to afford such a camera. It must be small so that it fits within a standard-sized camera. It must be fast so that it can process numerous images in milliseconds. It must consume little power so that the camera's battery will last a long time. However, this particular system does not possess a high degree of the characteristic of being reactive and real time, as it responds only to the pressing of buttons by a user, which, even in the case of an avid photographer, is still quite slow with respect to processor speeds.

1.2 Design Challenge - Optimizing Design Metrics

The embedded-system designer must of course construct an implementation that fulfills desired functionality, but a difficult challenge is to construct an implementation that simultaneously optimizes numerous design metrics.

Common Design Metrics

For our purposes, an implementation consists either of a microprocessor with an accompanying program, a connection of digital gates, or some combination thereof. A design metric is a measurable feature of a system's implementation. Commonly used metrics include:

• NRE cost (nonrecurring engineering cost): The one-time monetary cost of designing the system. Once the system is designed, any number of units can be manufactured without incurring any additional design cost; hence the term nonrecurring.
• Unit cost: The monetary cost of manufacturing each copy of the system, excluding NRE cost.
• Size: The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.
• Performance: The execution time of the system.
• Power: The amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements of the IC, since more power means more heat.
• Flexibility: The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
• Time-to-prototype: The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but it can be used to verify the system's usefulness and correctness and to refine the system's functionality.
• Time-to-market: The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time.
• Maintainability: The ability to modify the system after its initial release, especially by designers who did not originally design the system.
• Correctness: Our confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
• Safety: The probability that the system will not cause harm.

Figure 1.3: Design metric competition - improving one may worsen others.

Metrics typically compete with one another: improving one often leads to worsening of another. For example, if we reduce an implementation's size, the implementation's performance may suffer. Some observers have compared this phenomenon to a wheel with numerous pins, as illustrated in Figure 1.3. If you push one pin in, such as size, then the other pins pop out.

1 JPEG is short for Joint Photographic Experts Group. "Joint" refers to the group's status as a committee working on both ISO and ITU-T standards. Their best-known standard is for still-image compression.

Figure 1.4: Time-to-market: (a) market window, (b) simplified revenue model for computing revenue loss from delayed entry.

Figure 1.5: Costs for technologies A, B, and C as a function of volume: (a) total cost, (b) per-product cost.

To best meet this optimization challenge, the designer must be comfortable with a variety of hardware and software implementation technologies, and must be able to migrate from one technology to another, in order to find the best implementation for a given application and constraints. Thus, a designer cannot simply be a hardware expert or a software expert, as is commonly the case today; the designer must have expertise in both areas.

The Time-to-Market Design Metric

Most of these metrics are heavily constrained in an embedded system. The time-to-market constraint has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability, since market windows for products are becoming quite short, with such windows often measured in months. For example, Figure 1.4(a) shows a sample market window during which time a product would have highest sales. Missing this window, which means that the product begins being sold further to the right on the time scale, can mean significant loss in sales. In some cases, each day that a product is delayed from introduction to the market can translate to a one-million-dollar loss. The average time-to-market constraint has been reported as having shrunk to only 8 months.

Adding to the difficulty of meeting the time-to-market constraint is the fact that embedded system complexities are growing due to increasing IC capacities, as we will see later in this chapter. Such rapid growth in IC capacity translates into pressure on designers to add more functionality to a system. Thus, designers today are being asked to do more in less time.

Let's investigate the loss of revenue that can occur due to delayed entry of a product in the market. We'll use the simplified model of revenue shown in Figure 1.4(b). This model assumes the peak of the market occurs at the halfway point, denoted as W, of the product life, and that the peak is the same even for a delayed entry. The revenue for an on-time market entry is the area of the triangle labeled On-time, and the revenue for a delayed entry product is the area of the triangle labeled Delayed. The revenue loss for a delayed entry is just the difference of these two triangles' areas. Let's derive an equation for percentage revenue loss, which equals ((On-time - Delayed) / On-time) * 100%. For simplicity, we'll assume the market rise angle is 45 degrees, meaning the height of the triangle is W, and we leave as an exercise the derivation of the same equation for any angle. The area of the On-time triangle, having a base of 2W and a height of W, is thus 1/2 * 2W * W, or W². The area of the Delayed triangle is 1/2 * (W - D + W) * (W - D). After algebraic simplification, we obtain the following equation for percentage revenue loss:

percentage revenue loss = (D(3W - D) / 2W²) * 100%

Consider a product whose lifetime is 52 weeks, so W = 26. According to the preceding equation, a delay of just D = 4 weeks results in a revenue loss of 22%, and a delay of D = 10 weeks results in a loss of 50%. Some studies claim that reaching market late has a larger negative effect on revenue than development cost overruns or even a product price that is too high.
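The revenue-loss equation can be checked numerically. The sketch below recomputes the triangle areas of the simplified model directly, so the closed-form equation and the W = 26 examples can be verified:

```python
def percentage_revenue_loss(D, W):
    """Revenue loss (%) for a market entry delayed by D, in the simplified
    model of Figure 1.4(b): market peak at the halfway point W of the product
    life, 45-degree market rise."""
    on_time = W ** 2                        # area of On-time triangle: 1/2 * 2W * W
    delayed = 0.5 * (W - D + W) * (W - D)   # area of Delayed triangle
    return (on_time - delayed) / on_time * 100
    # algebraically identical to: D * (3*W - D) / (2 * W**2) * 100

# Product lifetime of 52 weeks, so W = 26:
print(round(percentage_revenue_loss(4, 26)))   # → 22 (% lost for a 4-week delay)
print(round(percentage_revenue_loss(10, 26)))  # → 50 (% lost for a 10-week delay)
```

Evaluating the triangle areas and the closed-form equation gives the same result, which is a quick way to confirm the algebraic simplification.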
The NRE and Unit Cost Design Metrics

As another exercise, let's consider NRE cost and unit cost in more detail. Suppose three technologies are available for use in a particular product. Assume that implementing the product using technology A would result in an NRE cost of $2,000 and unit cost of $100, that technology B would have an NRE cost of $30,000 and unit cost of $30, and that technology C would have an NRE cost of $100,000 and unit cost of $2. Ignoring all other design metrics, like time-to-market, the best technology choice will depend on the number of units we plan to produce. We illustrate this concept with the plot of Figure 1.5(a). For each of the three technologies, we plot total cost versus the number of units produced, where:


total cost = NRE cost + unit cost * # of units

We see from the plot that, of the three technologies, technology A yields the lowest total cost for low volumes, namely for volumes between 1 and 400. Technology B yields the lowest total cost for volumes between 400 and 2500. Technology C yields the lowest cost for volumes above 2500.

Figure 1.5(b) illustrates how larger volumes allow us to amortize NRE costs such that lower per-product costs result. The figure plots per-product cost versus volume, where:

per-product cost = total cost / # of units = NRE cost / # of units + unit cost

For example, for technology C and a volume of 200,000, the contribution to the per-product cost due to NRE cost is $100,000 / 200,000, or $0.50. So the per-product cost would be $0.50 + $2 = $2.50. The larger the volume, the lower the per-product cost, since the NRE cost can be distributed over more products. The per-product cost for each technology approaches that technology's unit cost for very large volumes. So for very large volumes, numbering in the hundreds of thousands, we can approach a per-product cost of just $2, quite a bit less than the per-product cost of over $100 for small volumes.

Clearly, one must consider the revenue impact of both time-to-market and per-product cost, as well as all the other relevant design metrics, when evaluating different technologies.
The Performance Design Metric

Performance of a system is a measure of how long the system takes to execute our desired tasks. Performance is perhaps the most widely used design metric in marketing an embedded system, and also one of the most abused. Many metrics are commonly used in reporting system performance, such as clock frequency or instructions per second. However, what we really care about is how long the system takes to execute our application. For example, in terms of performance, we care about how long a digital camera takes to process an image. The camera's clock frequency or instructions per second are not the key issues: one camera may actually process images faster but have a lower clock frequency than another camera.

With that said, there are several measures of performance. For simplicity, suppose we have a single task that will be repeated over and over, such as processing an image in a digital camera. The two main measures of performance are:

• Latency, or response time: The time between the start of the task's execution and the end. For example, processing an image may take 0.25 second.
• Throughput: The number of tasks that can be processed per unit time. For example, a camera may be able to process 4 images per second.

However, note that throughput is not always just the inverse of latency. A system may be able to do better than this by using parallelism, either by starting one task before finishing the next one or by processing each task concurrently. A digital camera, for example, might be able to capture and compress the next image, while still storing the previous image to memory. Thus, our camera may have a latency of 0.25 second but a throughput of 8 images per second.

In embedded systems, performance at a very detailed level is also often of concern. In particular, two signal changes may have to be generated or measured within some number of nanoseconds.

Speedup is a common method of comparing the performance of two systems. The speedup of system A over system B is determined simply as:

speedup of A over B = performance of A / performance of B

Performance could be measured either as latency or as throughput, depending on what is of interest. Suppose the speedup of camera A over camera B is 2. Then we can also say that A is 2 times faster than B and B is 2 times slower than A.
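The latency/throughput distinction and the speedup ratio can be illustrated with the camera numbers above. This is a small sketch; the pipelined throughput of 8 images per second is taken as a given, as in the text:

```python
def throughput_serial(latency_s):
    """Tasks per second when each task must finish before the next starts."""
    return 1.0 / latency_s

def speedup(perf_a, perf_b):
    """speedup of A over B = performance of A / performance of B."""
    return perf_a / perf_b

latency = 0.25                      # one image takes 0.25 s end to end
print(throughput_serial(latency))   # → 4.0 images/s with no overlap of tasks

# With images overlapped (capture/compress one while storing the previous),
# the camera completes 8 images/s even though latency is still 0.25 s:
pipelined_throughput = 8.0
print(speedup(pipelined_throughput, throughput_serial(latency)))  # → 2.0
```

The same `speedup` ratio works whether performance is given as throughput (higher is better, as here) or as the inverse of latency.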
1.3 Processor Technology

We can define technology as a manner of accomplishing a task, especially using technical processes, methods, or knowledge. This book takes the perspective that three types of technologies are central to embedded system design: processor technologies, IC technologies, and design technologies. We describe all three briefly in this chapter and provide further details in subsequent chapters.

Processor technology relates to the architecture of the computation engine used to implement a system's desired functionality. Although the term processor is usually associated with programmable software processors, we can think of many other, nonprogrammable digital systems as being processors also. Each such processor differs in its specialization towards a particular function (e.g., image compression), thus manifesting design metrics different than other processors. We illustrate this concept graphically in Figure 1.6. The application requires a specific embedded functionality, symbolized as a cross, such as the summing of the items in an array, as shown in Figure 1.6(a). Several types of processors can implement this functionality, each of which we now describe. We often use a collection of such processors to optimize a system's design metrics, as in our digital camera example.

General-Purpose Processors - Software

The designer of a general-purpose processor, or microprocessor, builds a programmable device that is suitable for a variety of applications to maximize the number of devices sold. One feature of such a processor is a program memory: the designer of such a processor does not know what program will run on the processor, so the program cannot be built into the digital circuit. Another feature is a general datapath: the datapath must be general enough to handle a variety of computations, so such a datapath typically has a large register file and one or more general-purpose arithmetic-logic units (ALUs). An embedded system designer, however, need not be concerned about the design of a general-purpose processor. An embedded system designer simply uses a general-purpose processor, by programming the processor's memory to carry out the required functionality. Many people refer to this part of an implementation as the "software" portion.


Figure 1.6: Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.

Figure 1.7: Implementing desired functionality on different processor types: (a) general-purpose, (b) application-specific, (c) single-purpose.

Using a general-purpose processor in an embedded system may result in several design metric benefits. Time-to-market and NRE costs are low because the designer must only write a program but not do any digital design. Flexibility is high because changing functionality requires changing only the program. Unit cost may be low in small quantities compared with designing our own processor, since the general-purpose processor manufacturer sells large quantities to other customers and hence distributes the NRE cost over many units. Performance may be fast for computation-intensive applications, if using a fast processor, due to advanced architecture features and leading-edge IC technology.

However, there are also some design-metric drawbacks. Unit cost may be relatively high for large quantities, since in large quantities we could design our own processor and amortize our NRE costs such that our unit cost is lower. Performance may be slow for certain applications. Size and power may be large due to unnecessary processor hardware.

For example, we can use a general-purpose processor to carry out our array-summing functionality from the earlier example. Figure 1.6(b) illustrates that a general-purpose processor covers the desired functionality but not necessarily efficiently. Figure 1.7(a) shows a simple architecture of a general-purpose processor implementing the array-summing functionality. The functionality is stored in a program memory. The controller fetches the current instruction, as indicated by the program counter (PC), into the instruction register (IR). It then configures the datapath for this instruction and executes the instruction. It then determines the next instruction address, sets the PC to this address, and fetches again.
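The fetch-and-execute cycle just described can be sketched in software; stepping through this toy interpreter mirrors how the controller uses the PC and IR to run the array-summing program. The three-instruction encoding (CLR/ADDM/LOOP) is invented for illustration and is not an instruction set from this book:

```python
# Toy fetch/decode/execute loop for the array-summing program
# "total = 0; for i = 1 to N: total += M[i]".
# The three-instruction encoding (CLR/ADDM/LOOP) is invented for illustration.

def run_general_purpose(M):
    if not M:                           # nothing to sum
        return 0
    program = ["CLR", "ADDM", "LOOP"]   # held in program memory
    pc, i, total = 0, 0, 0              # program counter and two registers
    while True:
        ir = program[pc]                # fetch: instruction at the PC goes into the IR
        if ir == "CLR":                 # total <- 0
            total = 0
            pc += 1
        elif ir == "ADDM":              # total <- total + M[i]; i <- i + 1
            total += M[i]
            i += 1
            pc += 1
        elif ir == "LOOP":              # branch back until the whole array is summed
            if i < len(M):
                pc = 1
            else:
                return total

print(run_general_purpose([10, 20, 30, 40]))  # → 100
```

The overhead is visible even in this sketch: every useful addition costs extra fetch and branch steps, which is exactly the inefficiency a single-purpose processor removes by hardwiring the program into its control logic.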
Single-Purpose Processors - Hardware

A single-purpose processor is a digital circuit designed to execute exactly one program. For example, consider the digital camera example of Figure 1.2. All of the components other than the microcontroller are single-purpose processors. The JPEG codec, for example, executes a single program that compresses and decompresses video frames. An embedded system designer may create a single-purpose processor by designing a custom digital circuit, as discussed in later chapters. Alternatively, the designer may purchase a predesigned single-purpose processor. Many people refer to this part of the implementation simply as the "hardware" portion, although even software requires a hardware processor on which to run. Other common terms include coprocessor, accelerator, and peripheral.

Using a single-purpose processor in an embedded system results in several design-metric benefits and drawbacks, which are essentially the inverse of those for general-purpose processors. Performance may be fast, size and power may be small, and unit cost may be low for large quantities, while design time and NRE costs may be high, flexibility low, unit cost high for small quantities, and performance may not match general-purpose processors for some applications.

For example, Figure 1.6(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality, nothing more, nothing less. Figure 1.7(c) illustrates the architecture of such a single-purpose processor for the example. The datapath contains only the essential components for this program: two registers and one adder. Since the processor only executes this one program, we hardwire the program's instructions directly into the control logic and use a state register to step through those instructions, so no program memory is necessary.

Application-Specific Processors

An application-specific instruction-set processor (ASIP) can serve as a compromise between the other processor options. An ASIP is a programmable processor optimized for a particular class of applications having common characteristics, such as embedded control, digital-signal processing, or telecommunications. The designer of such a processor can optimize the datapath for the application class, perhaps adding special functional units for common operations and eliminating other infrequently used units.

Using an ASIP in an embedded system can provide the benefit of flexibility while still achieving good performance, power, and size. However, such processors can require large NRE cost to build the processor itself and to build a compiler, if these items don't already exist. Much research currently focuses on automatically generating application-specific processors and their associated compile and debug environments, and some commercial products that do this have recently appeared. However, due to the lack of good compilers that can exploit the unique features of most ASIPs, designers using ASIPs often write much of the software in assembly language.

Microcontrollers and digital signal processors are two well-known types of ASIPs that have been used for several decades. A microcontroller is a microprocessor that has been optimized for embedded control applications. Such applications typically monitor and set numerous single-bit control signals but do not perform large amounts of data computations. Thus, microcontrollers tend to have simple datapaths that excel at bit-level operations and at reading and writing external bits. Furthermore, they tend to incorporate on the microprocessor chip several peripheral components common in control applications, like serial communication peripherals, timers, counters, pulse-width modulators, and analog-digital converters, all of which will be covered in a later chapter. Such incorporation of peripherals enables single-chip implementations and hence smaller and lower-cost products.

Digital-signal processors (DSPs) are another common type of ASIP. A DSP is a microprocessor designed to perform common operations on digital signals, which are the digital encodings of analog signals like video and audio. These operations carry out common signal processing tasks like signal filtering, transformation, or a combination. Such operations are usually math-intensive, including operations like multiply and add or shift and add. To support such operations, a DSP may have special-purpose datapath components such as a multiply-accumulate unit, which can perform a computation like T = T + M[i] * k using only one instruction. Because DSP programs often manipulate large arrays of data, a DSP may also include special hardware to fetch sequential data memory locations in parallel with other operations, to further speed execution.
ASJP has really been used only in the past few years, as a n:sult of recent attention given to . f
Sem·custom ASIC (Gate Array and Standard Cell) .
creaµng ASIPs for much smaller application classes, some as small as just a handful of [:·;,. In an application_.:.~~cific IC (ASIC) technoloey, the lower layers are fully or partially built,
programs. 1, Ieav1~g us t tsh the upper layers. In a gate-array ASIC technology, the masks for the
Figure .6(c) illnstrates the use of lln ASIP for our example. Although partially E trans1 . and gate levels are already built (i.e., the IC already consists of arrays of gates). The
customi to the desired functionality, the ASIP yields some inefficiency sirtce it also ~ ·rung task is to connect these gates to achieve our p a r t i ~ timentation. In a
con · s features to support reprogramming. Figure L7(b) shows the general architecture of t standard-cell ASIC technology, logic-level cells, such as an AND gate or an
ASIP for the example. The datapath may be customized for the example. It may have an 1:
10,000
·····~ ·········· ····
l ,000
Q,
:.au JOO
IC package JC
g_.;- IO
"'o --=
.. 0

-~] I
Figure 1.8: !Cs consist ofseveral layers. Shown is a simplified C°l'.f0S transistor; an IC may possess millions ofthese, ;. !;=3
connected above by many layers of metal (not shown). i• u 0. 1
.QC)
r: 0
...l
AND-OR-INVERT_combination, the mask portions are predesigned, usually by hand. Thus, f 0.01

the remaining task is to arrange these portions into complete masks for the gate level, and then [: 0.001
to connect the cells. ASICs are by far the most popular IC technology, as they provide for '
good performance and size, with much less NRE cost than full-custom I Cs. However, ASICs
I 11 11 1111
. still require weeks or even months to manufacture. ·
Figure 1.9: IC c~pacity exponential increase, following "Moore's Law." Source: The International T ho I
PLO ,,,
Roadmap for s,m,conductors. ! ec O ogy

In a programmable logic device (PLD) technology, all layers already exist, so we can purchase the IC before finishing our design. The layers implement a programmable circuit, where programming has a lower-level meaning than a software program. The programming that takes place may consist of creating or destroying connections between wires that connect gates, either by blowing a fuse, or by setting a bit in a programmable switch. Small devices called programmers, connected to a desktop computer, typically perform such programming. We can divide PLDs into two types, simple and complex. One type of simple PLD is a programmable logic array (PLA), which consists of a programmable array of AND gates and a programmable array of OR gates. Another type is programmable array logic (PAL), which uses just one programmable array to reduce the number of expensive programmable components. One type of complex PLD, growing very rapidly in popularity over the past decade, is the field programmable gate array (FPGA). FPGAs offer more general connectivity among blocks of logic, rather than just arrays of logic as with PLAs and PALs, and are thus able to implement far more complex designs. PLDs offer very low NRE cost and almost instant IC availability. However, they are typically bigger than ASICs, may have higher unit cost, may consume more power, and may be slower (especially FPGAs). They still provide reasonable performance, though, so they are especially well suited to rapid prototyping.

Trends

We should be aware of what is by far the most important trend in embedded systems, a trend related to ICs: IC transistor capacity has doubled roughly every 18 months for the past several decades. This trend, illustrated in Figure 1.9, was actually predicted back in 1965 by Intel co-founder Gordon Moore. He predicted that semiconductor transistor density would double every 18 to 24 months; the trend is therefore known as Moore's Law. Moore recently predicted about another decade before such growth slows down. The trend is mainly caused by improvements in IC manufacturing that result in smaller parts, such as transistor parts and wires, on the surface of the IC. The minimum part size, commonly known as feature size, for a CMOS IC in 2002 is about 130 nanometers.

Figure 1.9 shows leading-edge chip approximate capacity per year from 1981 to 2010, using predicted data for years 2000-2010. Note that chip capacity, shown in millions of transistors per chip, is plotted on a logarithmic scale. People often underestimate, and are somewhat amazed by, the actual growth of something that doubles over short time periods, in this case 18 months. For example, this underestimation in part explains the popularity of so-called pyramid schemes. It is also the key to the popular trick question of asking someone to choose between a salary of $1,000/day for a year, or a penny on day one, 2 pennies on day two, with continued doubling each day for a year. While many people would choose the first option, the second option results in more money than exists in the world. Many people are also surprised to discover that just 20 generations ago, meaning a few hundred years, we each had about one million ancestors.

Figure 1.10 shows that in 1981, a leading-edge chip could hold about 10,000 transistors, which is roughly the complexity of an 8-bit microprocessor. In 2002, a leading-edge chip can hold about 150,000,000 transistors, the equivalent of 15,000 8-bit microprocessors. For comparison, if automobile fuel efficiency had improved at this rate since 1981, cars in 2002 would get about 500,000 miles per gallon.
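The doubling arithmetic above is easy to verify. The following sketch is ours, not the book's; the numbers (a penny doubled daily, and a 10,000-transistor chip doubling every 18 months) are the chapter's:

```python
# Back-of-the-envelope checks of the doubling examples in the text.

def penny_doubling_total(days=365):
    """Total pay for a penny on day one, doubled every day thereafter."""
    return sum(0.01 * 2 ** d for d in range(days))  # dollars

def flat_salary_total(days=365, per_day=1_000):
    """Total pay for a flat $1,000/day salary."""
    return days * per_day  # dollars

def chip_capacity(start_transistors, months, doubling_period_months=18):
    """Transistors per leading-edge chip after `months` of Moore's Law growth."""
    return start_transistors * 2 ** (months / doubling_period_months)
```

The penny option totals on the order of 10^107 dollars, dwarfing the $365,000 flat salary; and 10,000 transistors in 1981 grown for 252 months (21 years) lands near the chapter's 150,000,000 figure for 2002.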

This trend of increasing chip capacity has enabled the proliferation of the low-cost, high-performance embedded systems that we see today.

Figure 1.10: Graphical demonstration of the rapid growth in transistor density. The shaded region symbolizes the area required by a 10,000-transistor design over the years; note that this area occupies an incredibly tiny portion of a leading-edge chip in 2002. [Panels for 1981 through 2002; the leading-edge chip in 2002 holds 150,000,000 transistors.]

Figure 1.11: An ideal top-down design process, and the three main productivity improvers. [Abstraction levels: system specification, behavioral specification, RT specification, and logic specification, refined down to a final implementation by system, behavior, RT, and logic synthesis tools. Compilation/synthesis: automates exploration and insertion of implementation details for lower levels. Libraries/IP: incorporates predesigned implementations from lower abstraction levels into higher levels. Test/verification: ensures correct functionality at each level, thus reducing costly iterations between levels; each level has corresponding simulators (model simulators/checkers, hw-sw cosimulators, HDL simulators, gate simulators).]
1.5 Design Technology

Design technology involves the manner in which we convert our concept of desired system functionality into an implementation. We must not only design the implementation to optimize design metrics, but we must do so quickly. As described earlier, the designer must be able to produce larger numbers of transistors every year to keep pace with IC technology. Hence, improving design technology to enhance productivity has been a key focus of the software and hardware design communities for decades.

To understand how to improve the design process, we must first understand the design process itself. Variations of a top-down design process have become popular in the past decade, an ideal form of which is illustrated in Figure 1.11. The designer refines the system through several abstraction levels. At the system level, the designer describes the desired functionality in some language, often a natural language like English, but preferably an executable language like C; we shall call this the system specification. The designer refines this specification by distributing portions of it among several general-purpose and/or single-purpose processors, yielding behavioral specifications for each processor. The designer refines these specifications into register-transfer (RT) specifications by converting behavior on general-purpose processors to assembly code, and by converting behavior on single-purpose processors to a connection of register-transfer components and state machines. The designer then refines the register-transfer-level specification of a single-purpose processor into a logic specification consisting of Boolean equations. No refinement of a general-purpose processor's assembly code is done, since assembly code is essentially register-transfer code. Finally, the designer refines the remaining specifications into an implementation, consisting of machine code for general-purpose processors and a gate-level netlist for single-purpose processors.

There are three main approaches to improving the design process for increased productivity, which we label as compilation/synthesis, libraries/IP, and test/verification. Several other approaches also exist. We will discuss all of these approaches. Each approach can be applied at any of the four abstraction levels.

Compilation/Synthesis

Compilation/synthesis lets a designer specify desired functionality in an abstract manner and automatically generates lower-level implementation details. Describing a system at high abstraction levels can improve productivity by reducing the amount of detail, often by an order of magnitude, that a designer must specify. A logic synthesis tool converts Boolean expressions into a connection of logic gates, called a netlist. A register-transfer (RT) synthesis tool converts finite-state machines and register transfers into a datapath of RT components and a controller of Boolean equations. A behavioral synthesis tool converts a sequential program into finite-state machines and register transfers. Likewise, a software compiler converts a sequential program to assembly code, which is essentially register-transfer code. Finally, a system synthesis tool converts an abstract system specification into a set of sequential programs on general-purpose and single-purpose processors.
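To make the synthesis idea concrete, here is a toy sketch of the lowest of these steps: deriving a sum-of-products netlist from a truth table. The sketch is ours, not a tool from the book, and real logic synthesis tools also minimize the resulting logic:

```python
# Toy "logic synthesis": turn a truth table into a sum-of-products
# netlist -- one AND gate per true row, all OR-ed together.

def synthesize_sop(truth_table):
    """Return the minterms (input rows producing 1); each minterm
    corresponds to one AND gate in the synthesized netlist."""
    return [inputs for inputs, output in truth_table.items() if output == 1]

def evaluate_netlist(minterms, inputs):
    """Evaluate the AND-OR netlist on one input vector.
    The OR gate outputs 1 if any AND gate (minterm) matches."""
    return int(any(term == inputs for term in minterms))

# Example: a 2-input XOR.
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor_netlist = synthesize_sop(xor_table)  # two AND gates: (0,1) and (1,0)
```

The synthesized netlist reproduces the original truth table exactly, which is the defining requirement of any synthesis step: the lower-level description must implement the higher-level one.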
Libraries/IP

Libraries involve the reuse of preexisting implementations. Using libraries of existing implementations can improve productivity if the time it takes to find, acquire, integrate, and test a library item is less than that of designing the item oneself. A logic-level library may consist of layouts for gates and cells. An RT-level library may consist of layouts for RT components, like registers, multiplexors, decoders, and functional units. A behavioral-level library may consist of commonly used components, such as compression components, bus interfaces, display controllers, and even general-purpose processors. The advent of system-level integration has caused a great change in this level of library: rather than these components being ICs, they now must also be available in a form that we can implement on just one portion of an IC. Such components are called cores. This change from behavioral-level libraries of ICs to libraries of cores has prompted the use of the term intellectual property (IP), to emphasize the fact that cores exist in an intellectual form that must be protected from copying. Finally, a system-level library might consist of complete systems solving particular problems, such as an interconnection of processors with accompanying operating systems and programs to implement an interface to the Internet over an Ethernet network.

Figure 1.12: Design productivity exponential increase. Source: The International Technology Roadmap for Semiconductors. [Log-scale plot: productivity, in thousands of transistors per designer-month (0.01 to 100,000), versus year.]
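The exponential trends of Figures 1.9 and 1.12 imply steady compound annual growth rates, which a couple of lines of arithmetic recover. This sketch is ours; the endpoint values (100 and 5,000 transistors per designer-month, 10,000 and 150,000,000 transistors per chip, between 1981 and 2002) are the ones quoted in this chapter:

```python
# Compound annual growth rates implied by the chapter's 1981 and 2002
# data points for designer productivity and chip capacity.

def annual_growth_rate(start, end, years):
    """Compound annual growth rate between two measurements."""
    return (end / start) ** (1 / years) - 1

YEARS = 2002 - 1981  # 21 years

productivity_growth = annual_growth_rate(100, 5_000, YEARS)
capacity_growth = annual_growth_rate(10_000, 150_000_000, YEARS)
```

Productivity grew at roughly 20% per year, while capacity grew at roughly 58% per year (consistent with doubling every 18 months, since 2^(12/18) is about 1.59). The difference between these two rates is precisely the design productivity gap discussed later in the chapter.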

Test/Verification

Test/verification involves ensuring that functionality is correct. Such assurance can prevent time-consuming debugging at low abstraction levels and iterating back to high abstraction levels. Simulation is the most common method of testing for correct functionality, although more formal verification techniques are growing in popularity. At the logic level, gate-level simulators provide output signal timing waveforms given input signal waveforms. Likewise, general-purpose processor simulators execute machine code. At the RT level, hardware description language (HDL) simulators execute RT-level descriptions and provide output waveforms given input waveforms. At the behavioral level, HDL simulators simulate sequential programs, and cosimulators connect HDL and general-purpose processor simulators to enable hardware/software coverification. At the system level, a model simulator simulates the initial system specification using an abstract computation model, independent of any processor technology, to verify the correctness and completeness of the specification. Model checkers can also verify certain properties of the specification, such as ensuring that certain simultaneous conditions never occur, or that the system does not deadlock.

More Productivity Improvers

There are numerous additional approaches to improving designer productivity. Standards focus on developing well-defined methods for specification, synthesis, and libraries. Such standards can reduce the problems that arise when a designer uses multiple tools, or retrieves or provides design information from or to other designers. Common standards include language standards, synthesis standards, and library standards.

Languages focus on capturing desired functionality with minimum designer effort. For example, the sequential programming language of C is giving way to the object-oriented language of C++, which in turn has given some ground to Java. As another example, state-machine languages permit direct capture of functionality as a set of states and transitions, which can then be translated to other languages like C.

Frameworks provide a software environment for the application of numerous tools throughout the design process, and for the management of versions of implementations. For example, a framework might generate the UNIX directories needed for various simulators and synthesis tools, supporting application of those tools through menu selections in a single graphical user interface.

Trends

The combination of compilation/synthesis, libraries/IP, test/verification, standards, languages, and frameworks has improved designer productivity over the past several decades, as shown in Figure 1.12. Productivity is measured as the number of transistors that one designer can produce in one month. As the figure shows, the growth has been impressive: a designer in 1981 could produce only about 100 transistors per month, whereas in 2002 a designer should be able to produce about 5,000 transistors per month.

1.6 Trade-offs

Perhaps the key embedded system design challenge is the simultaneous optimization of competing design metrics. To address this challenge, the designer trades off among the advantages and disadvantages of the various available processor technologies and IC technologies. To optimize a system, the designer must therefore be familiar with and comfortable with the various technologies - the designer must be a "renaissance engineer," in the words of some. In the past, and to a large extent in the present, however, most designers had expertise with either general-purpose processors or with single-purpose processors, but not both - they were either software designers or hardware designers. Because of this separation of design expertise, systems had to be separated into the software and hardware subsystems very early in the design process, separately designed, and then integrated near the end of the process. However, such early and permanent separation clearly doesn't allow for the best optimization of design metrics. Instead, being able to move functions between hardware and software, at any stage of the design process, provides for better optimization.

The relatively recent maturation of RT and behavioral synthesis tools has enabled a unified view of the design process for hardware and software. In the past, the design processes were radically different - software designers wrote sequential programs, while hardware designers connected components. But today, synthesis tools have changed the hardware designer's task essentially into one of writing sequential programs, albeit with some knowledge of how the hardware will be synthesized from such programs. We can think of abstraction levels as being the rungs of a ladder, and compilation and synthesis as enabling us to step up the ladder, hence enabling designers to focus their design efforts at higher levels of abstraction, as illustrated in Figure 1.13. Thus, the starting point for either hardware or software is sequential programs, enhancing the view that system functionality can be implemented in hardware, software, or some combination thereof, leading to the following important point:

The choice of hardware versus software for a particular function is simply a trade-off among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Hardware/software codesign is the field that emphasizes a unified view of hardware and software, and develops synthesis tools and simulators that enable the co-development of systems using both hardware and software.

Figure 1.13: The codesign ladder: the recent maturation of synthesis enables a unified view of hardware and software.

In general, we can view the basic design trade-off as general versus customized implementation, with respect to either processor technology or IC technology, as illustrated in Figure 1.14. The more general, programmable technologies on the left of the figure provide greater flexibility (a design can be reprogrammed relatively easily), reduced NRE cost (designing using those technologies is generally cheaper), faster time-to-prototype and time-to-market (since designing takes less time), and lower cost in low volumes (since the IC manufacturer distributes its IC NRE cost over large quantities of ICs). On the other hand, more customized technologies provide for better power efficiency, faster performance, reduced size, and lower cost in high volumes.

Figure 1.14: The independence of processor and IC technologies: any processor technology can be mapped to any IC technology. [Spectrum from general (general-purpose processor, ASIP; PLD, semicustom) to customized (single-purpose processor; full-custom). General technologies provide improved flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, and cost in low volume; customized technologies provide improved power efficiency, performance, size, and cost in high volume.]

Recall that each of the three processor technologies can be implemented in any of the three IC technologies. For example, a general-purpose processor can be implemented on a PLD, semicustom, or full-custom IC. In fact, a company marketing a product, such as a set-top box or even a general-purpose processor, might first market a semicustom implementation to reach the market early, and then later introduce a full-custom implementation. They might also first map the processor to an older but more reliable technology, like 0.2 micron, and then later map it to a newer technology, like 0.08 micron. These two evolutions of mappings to a large extent explain why a general-purpose processor's

clock speed improves on the market over time. Likewise, a designer of an embedded system may use PLDs for prototyping a product, and even for the first few hundred instances of the product, to speed its time-to-market, switching to ASICs for larger-scale production. Furthermore, we often implement multiple processors of different types on the same IC. Figure 1.2 was an example of just such a situation - the digital camera included a microcontroller plus numerous single-purpose processors on the same IC. A single chip with multiple processors is often referred to as a system-on-a-chip. In fact, we can even implement more than one IC technology on a single IC - a portion of the IC may be custom, another portion semicustom, and yet another portion programmable logic. The need for designers comfortable with the variety of processor and IC technologies thus becomes evident.

Design Productivity Gap

While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity growth. Figure 1.15 shows the productivity growth plot superimposed on the chip capacity growth plot, illustrating the growing design productivity gap. For example, in 1981, a leading-edge chip required about 100 designer-months to design, since 100 designer-months * 100 transistors/designer-month = 10,000 transistors. However, in 2002, a leading-edge chip would require about 30,000 designer-months, since 30,000 designer-months * 5,000 transistors/designer-month = 150,000,000 transistors. So the design productivity gap has resulted in an increase from 100 to 30,000 designer-months to build a leading-edge chip. Assuming a designer costs $10,000 per month, the cost of building a leading-edge chip has risen from $1,000,000 in 1981 to an incredible $300,000,000 in 2002. Few products can justify such a large investment in a chip. Thus, most designs do not even come close to using potential chip capacity.

Figure 1.15: The growing "design productivity gap." [Chip capacity growth superimposed on designer productivity growth.]

Figure 1.16: The "mythical man-month": adding designers can decrease individual productivity, and at some point can actually delay the project completion time. [Plots of team productivity, individual productivity (transistors/month), and months until completion versus number of designers, 0 to 40.]

The situation is even worse than just stated, because the discussion assumes that designer productivity is independent of project team size, whereas in reality adding more designers to a project team can actually decrease overall productivity. Suppose 10 designers work together on a project, and each produces 5,000 transistors/month, so that their combined output is 10 * 5,000 = 50,000 transistors/month. Would 100 designers on a project then produce 100 * 5,000 = 500,000 transistors/month? Probably not. The complexity of having 100 designers work together is far greater than that of having 10 designers work together. Even calling a meeting of 100 designers is a fairly complex task, whereas a 10-designer meeting is quite straightforward. Furthermore, a 100-designer team would likely be decomposed into groups, each group having a group leader who meets with other group leaders and reports back to his or her group, thus introducing extra layers of communication, and hence more likelihood of misunderstandings and time-consuming mistakes.

This decrease in productivity as designers are added to a project was reported by Frederick Brooks in his classic 1975 book entitled The Mythical Man-Month. His book focused on writing software, but the same principle applies to designing hardware. The decrease in productivity due to team-size complexity can at some point actually lengthen the time needed to complete a project. For example, consider a hypothetical 1,000,000-transistor project, in which a designer working alone can produce 5,000 transistors per month, and each additional designer added to the project results in a productivity decrease of 100 transistors per month per designer, due to the added complexities of team communication and management. So a single designer can complete the project in 1,000,000 / 5,000 = 200 months. Ten designers can produce 4,100 transistors per month each, meaning 10 * 4,100 = 41,000 transistors per month in total, requiring 1,000,000 / 41,000 = 24.4 months to complete the project. Figure 1.16 plots individual designer productivity as designers are added to the project. The figure also plots
team productivity, computed simply as the number of designers multiplied by their individual productivity. Project completion times for different team sizes, computed as 1,000,000 transistors divided by the team's transistors/month, are also shown. A 25-designer team can produce 25 * 2,600 = 65,000 transistors per month, requiring 1,000,000 / 65,000 = 15.38 months to complete the project. However, a 26-designer team also produces 26 * 2,500 = 65,000 transistors per month, so adding a 26th designer doesn't help. Furthermore, a 27-designer team produces only 27 * 2,400 = 64,800 transistors per month, thus actually delaying the project completion time, to 15.43 months. Adding more designers beyond 26 only worsens the project completion time. Hence, man-months are in a sense mythical: we cannot always add designers to a project to decrease the project completion time.

Therefore, the growing gap between IC capacity and designer productivity shown in Figure 1.15 is even worse than the figure suggests. Designer productivity decreases as we add designers to a project, making the gap even larger. Furthermore, at some point we simply cannot decrease project completion time no matter how much money we spend on designers, since adding designers will decrease the project team's overall productivity. And therefore, leading-edge chips cannot always be designed in a given time period, no matter how much money we have to spend on designers.

Thus, a pressing need exists for new design technologies that will shrink the design gap. One partial solution proposed by many people is to educate designers not just in one subarea of embedded systems, like hardware design or software design, but instead to educate them to be comfortable with both hardware and software design. This book is intended to contribute to this solution.
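The team-size model used in this section fits in a few lines. The sketch below is ours; the parameters (a 1,000,000-transistor project, 5,000 transistors/month for a lone designer, and a 100 transistors/month penalty per added designer) are the chapter's:

```python
# The hypothetical 1,000,000-transistor project from the text: each
# added designer lowers every designer's individual output by 100
# transistors/month, so team output eventually peaks and then falls.

PROJECT_SIZE = 1_000_000  # transistors

def individual_productivity(team_size, base=5_000, penalty=100):
    """Transistors/month per designer on a team of `team_size`."""
    return base - penalty * (team_size - 1)

def team_productivity(team_size):
    """Total transistors/month for the whole team."""
    return team_size * individual_productivity(team_size)

def completion_months(team_size):
    """Months to finish the project with the given team size."""
    return PROJECT_SIZE / team_productivity(team_size)
```

Evaluating the model reproduces the numbers in the text: 10 designers yield 41,000 transistors/month, 25 and 26 designers both yield 65,000 (about 15.38 months), and a 27th designer drops output to 64,800, lengthening the schedule.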
1.7 Summary and Book Outline

The number of embedded systems is growing every year as electronic devices gain computational elements. Embedded systems possess several common characteristics that differentiate them from desktop systems and pose several challenges to designers. The key challenge is to optimize design metrics, which is particularly difficult since those metrics compete with one another. One particularly difficult design metric to optimize is time-to-market, because embedded systems are growing in complexity at a tremendous rate, and the rate at which productivity improves every year is not keeping up with that growth. This book seeks to help improve productivity by presenting a unified view of software and hardware design. It works toward this goal by presenting three key technologies for embedded systems design: processor technology, IC technology, and design technology. Processor technology is divided into general-purpose, application-specific, and single-purpose processors. IC technology is divided into custom, semicustom, and programmable logic ICs. Design technology is divided into compilation/synthesis, libraries/IP, and test/verification. This book focuses on processor technology (both hardware and software), with the last several chapters providing introductions to topics in IC and design technologies.

Chapters 2-7 discuss processor technology. Chapter 2 describes digital design techniques for building custom single-purpose processors. Chapter 3 covers general-purpose processors; the chapter is mostly a review of the features of such processors, and we assume the reader already has familiarity with programming such processors using structured languages. Chapter 4 covers standard single-purpose processors, describing a number of common peripherals used in embedded systems. Chapter 5 describes memories, which are components necessary to store data for processors. Chapter 6 describes buses, components necessary to communicate data among processors and memories, beginning with basic interfacing concepts, then introducing more advanced concepts and describing common buses. Chapter 7 provides an example of using processor technology to build an embedded system, a digital camera, illustrating the trade-offs of several different implementations.

Chapter 8 introduces some advanced techniques for programming embedded systems, including state machine models and concurrent process models; it also introduces real-time systems. Chapter 9 discusses the very common class of embedded systems known as control systems, and introduces some design techniques used for such systems.

Chapter 10 describes the three main IC technologies with which we can implement the processor-based designs we learn to create in the earlier chapters. Finally, Chapter 11 summarizes key tools and advances in design technology, and emphasizes the need for a new breed of embedded systems engineers proficient with both software and hardware design.

1.8 References and Further Reading

• Brooks Jr., F.P. The Mythical Man-Month, anniversary edition. Reading, MA: Addison-Wesley, 1995. Original edition published in 1975.
• EE Times, Oct. 11, 1999. Embedded Systems section.
• Midyear Forecast - CEO Perspectives. EE Times, May 27, 1998, Issue 1009.
• Semiconductor Industry Association. International Technology Roadmap for Semiconductors: 1999 edition. Austin, TX: International SEMATECH, 1999.
• Debardelaben, J., Madisetti, V.K., and Gadient, A.J. "Incorporating Cost Modeling into Embedded System Design." IEEE Design and Test of Computers, July 1997, pp. 24-35. Includes discussion of the revenue model.

1.9 Exercises

1.1 What is an embedded system? Why is it so hard to define?
1.2 List and define the three main characteristics of embedded systems that distinguish such systems from other computing systems.
1.3 What is a design metric?
1.4 List a pair of design metrics that may compete with one another, providing an intuitive explanation of the reason behind the competition.
1.5 What is a "market window" and why is it so important for products to reach the market early in this window?
1.6 Using the revenue model of Figure 1.4(b), derive the percentage revenue loss equation for any rise angle, rather than just for 45 degrees. (Hint: you should get the same equation.)
1.7 Using the revenue model of Figure 1.4(b), compute the percentage revenue loss if D = 5 and W = 10. If the company whose product entered the market on time earned a total revenue of $25 million, how much revenue did the company that entered the market 5 months late lose?
1.8 What is NRE cost?
1.9 The design of a particular disk drive has an NRE cost of $100,000 and a unit cost of $20. How much will we have to add to the cost of each product to cover our NRE cost, assuming we sell: (a) 100 units, and (b) 10,000 units?
1.10 Create a graph with the x-axis the number of units and the y-axis the product cost. Plot the per-product cost function for an NRE of $50,000 and a unit cost of $5.
1.11 For a particular product, you determine the NRE cost and unit cost to be the following for the three listed IC technologies: FPGA: ($10,000, $50); ASIC: ($50,000, $10); VLSI: ($200,000, $5). Determine precise volumes for which each technology yields the lowest total cost.
1.12 Give an example of a recent consumer product whose prime market window was only about one year.
1.13 Create an equation for total revenue that combines time-to-market and NRE/unit cost considerations. Use the revenue model of Figure 1.4(b). Assume a 100-month product lifetime, with peak revenue of $100,000/month. Compare use of a general-purpose processor having an NRE cost of $5,000, a unit cost of $30, and a time-to-market of 12 months (so only 88 months of the product's lifetime remain), with use of a single-purpose processor having an NRE cost of $20,000, a unit cost of $10, and a time-to-market of 24 months. Assume the amount added to each unit for profit is $5.
1.14 Using a spreadsheet, develop a tool that allows us to plug in any numbers for problem 1.13 and generates a revenue comparison of the two technologies.
1.15 List and define the three main processor technologies. What are the benefits of using each of the three different processor technologies?
1.16 List and define the three main IC technologies. What are the benefits of using each of the three different IC technologies?
1.17 List and define the three main design technologies. How are each of the three different design technologies helpful to designers?
1.18 Create a 3*3 grid with the three processor technologies along the x-axis, and the three IC technologies along the y-axis. For each axis, put the most programmable form closest to the origin, and the most customized form at the end of the axis. Explain the features and possible occasions for using each of the combinations of the two technologies.
1.21 Compute the annual growth rate of (a) IC capacity and (b) designer productivity.
1.22 If Moore's law continues to hold, predict the approximate number of transistors per leading-edge IC in (a) 2030, (b) 2050.
1.23 Explain why single-purpose processors (hardware) and general-purpose processors are essentially the same, and then describe how they differ in terms of design metrics.
1.24 What is a "renaissance engineer," and why is it so important in the current market?
1.25 What is the design gap?
1.26 Compute the rate at which the design productivity gap is growing per year. What is the implication of this growing gap?
1.27 Define what is meant by the "mythical man-month."
1.28 Assume a designer's productivity when working alone on a project is 5,000 transistors per month. Assume that each additional designer reduces productivity by 5% (and keep in mind this is an extremely simplified model of designer productivity!). (a) Plot team monthly productivity versus team size for team sizes ranging from 1 to 40 designers. (b) Plot on the same graph the project completion time versus team size for projects of sizes 100,000 and 1,000,000 transistors. (c) Provide the "optimal" number of designers for each of the two projects, indicating the number of months required in each case.
IJ9 Redraw Figure 1.9 to show the transistors per re from 1990 to 2000 on a lmear, nol[i
l ogarithmk, scale. Draw a square representing a 1990 IC and another representing al!
[
sao ~
~ ·cS 33 °'
'3 0 ,:;. ?--0
2000 IC, with correct relative proportions. · t vf '
1.20 Provide a definition of Moore's law. I! J ~ ).,,o _ =
6 ?i- (' {) 50
~ ~ - - - - - - - - - -_-------1':c
26
hl
::=~-=-=-:--~___:_---~~~---:c-=--
Embedded System Design ~ Embedded System Design Lj )°3' - "",.
CHAPTER 2: Custom Single-Purpose Processors: Hardware
2.1 Introduction
2.2 Combinational Logic
2.3 Sequential Logic
2.4 Custom Single-Purpose Processor Design
2.5 RT-Level Custom Single-Purpose Processor Design
2.6 Optimizing Custom Single-Purpose Processors
2.7 Summary
2.8 References and Further Reading
2.9 Exercises
2.1 Introduction

A processor is a digital circuit designed to perform computation tasks. A processor consists of a datapath capable of storing and manipulating data and a controller capable of moving data through the datapath. A general-purpose processor is designed such that it can carry out a wide variety of computation tasks, which are described by a set of programmer-provided instructions. In contrast, a single-purpose processor is designed specifically to carry out a particular computation task. While some tasks are so common that we can purchase standard single-purpose processors to implement those tasks, others are unique to a particular embedded system. Such custom tasks may be best implemented using custom single-purpose processors that we design ourselves.

An embedded system designer may obtain several benefits by choosing to use a custom single-purpose processor rather than a general-purpose processor to implement a computation task. First, performance may be faster, due to fewer clock cycles resulting from a customized datapath, and due to shorter clock cycles resulting from simpler functional units, fewer multiplexors, or simpler controller logic. Second, size may be smaller, due to a simpler



datapath and no program memory. Third, power consumption may be less, due to more efficient computation.

[Figure 2.1: A simplified view of a CMOS transistor on silicon (IC package, IC, gate, source, drain).]
"-"q s source

l°'"docts
1f gate=O
drain
0 --
0 --
x-l
--
F = (x+y)'

~ y

However, cost could be higher because of high NRE costs. Since we may not be able to , (b) (c) (d) (e)
afford to invest as much NRE cost as can designers of a mass-produced general-purpose '.
processor, performance and size could actually be worse. Time-to-market may be longer, and
flexibility reduced, compared to general-purpose processors. . Figure 2.2: CMOS transistor implementations of some basic logic gates: (a) nMOS transisto (b) MOS ·
(c) mverter, (d) NANO gate, (e) NOR gate. r, P transistor,
In this chapter, we describe basic techniques for designing custom processors. We start with a review of combinational and sequential design, and we describe methods for converting programs to custom single-purpose processors.

2.2 Combinational Logic

Transistors and Logic Gates

A transistor is the basic electrical component in digital systems. Combinations of transistors form more abstract components called logic gates, which designers use when building digital systems. Thus, we begin with a short description of transistors before discussing logic design.

A transistor acts as a simple on/off switch. One type of transistor, complementary metal oxide semiconductor (CMOS), is shown in Figure 2.1. Figure 2.2(a) shows the schematic of a transistor. The gate, not to be confused with a logic gate, controls whether or not current flows from the source to the drain. We can apply either low or high voltage levels to the gate. The high level may be, for example, +3 or +5 volts, which we'll refer to as logic 1. The low voltage is typically ground, drawn as several horizontal lines of decreasing width, which we'll refer to as logic 0. When logic 1 is applied to the gate, the transistor conducts and so current flows. When logic 0 is applied to the gate, the transistor does not conduct. We can also build a transistor with the opposite functionality, illustrated in Figure 2.2(b). When logic 0 is applied to the gate, the transistor conducts. When logic 1 is applied, the transistor does not conduct.

Given these two basic transistors, we can easily build a circuit whose output inverts its gate input, as shown in Figure 2.2(c). When the input x is logic 0, the top transistor conducts and the bottom transistor does not conduct, so logic 1 appears at the output F. We can also easily build a circuit whose output is logic 1 when at least one of its inputs is logic 0, as shown in Figure 2.2(d). When at least one of the inputs x and y is logic 0, then at least one of the top transistors conducts and the bottom transistors do not conduct, so logic 1 appears at F. If both inputs are logic 1, then neither of the top transistors conducts, but both of the bottom transistors do, so logic 0 appears at F. Likewise, we can easily build a circuit whose output is logic 1 when both of its inputs are logic 0, as illustrated in Figure 2.2(e). The three circuits shown implement three basic logic gates: an inverter, a NAND gate, and a NOR gate.

Digital system designers usually work at the abstraction level of logic gates rather than transistors. Figure 2.3 describes eight basic logic gates. Each gate is represented symbolically, with a Boolean equation, and with a truth table. The truth table has inputs on the left and the output on the right. The AND gate outputs 1 if and only if both inputs are 1. The OR gate outputs 1 if and only if at least one of the inputs is 1. The XOR (exclusive-OR) gate outputs 1 if and only if exactly one of its two inputs is 1. The NAND, NOR, and XNOR gates output the complement of AND, OR, and XOR, respectively.

[Figure 2.3: Basic logic gates (driver, AND, OR, XOR, inverter, NAND, NOR, XNOR), each shown with its symbol, Boolean equation, and truth table.]
Even though AND and OR gates are easier to comprehend logically, NAND and NOR gates are more commonly used, and those are the gates we built using transistors in Figure 2.2. The NAND could easily be changed to AND by changing the 1 on the top to 0 and the 0 on the bottom to 1; the NOR could be changed to OR similarly. But it turns out that pMOS transistors don't conduct 0s very well, though they do fine conducting 1s, for reasons beyond this book's scope. Likewise, nMOS transistors don't conduct 1s very well, though they do fine conducting 0s. Hence, NANDs and NORs prevail.

Basic Combinational Logic Design

A combinational circuit is a digital circuit whose output is purely a function of its present inputs. Such a circuit has no memory of past inputs. We can use a simple technique to design a combinational circuit from our basic logic gates, as illustrated in Figure 2.4. We start with a problem description, which describes the outputs in terms of the inputs, as in Figure 2.4(a). We translate that description to a truth table, with all possible combinations of input values on the left and desired output values for each combination on the right, as in Figure 2.4(b). For each output column, we can derive an output equation, with one equation term per row, as in Figure 2.4(c). We can then translate these equations to a circuit diagram. However, we usually want to minimize the logic gates in the circuit. We can minimize the output equations by algebraically manipulating the equations. Alternatively, we can use Karnaugh maps, as shown in Figure 2.4(d). Once we've obtained the desired output equations, we can draw the circuit diagram, as shown in Figure 2.4(e).

[Figure 2.4: Combinational logic design: (a) problem description ("y is 1 if a is 1, or b and c are 1; z is 1 if b or c is 1, but not both, or if a, b, and c are all 1"), (b) truth table, (c) output equations (y = a'bc + ab'c' + ab'c + abc' + abc; z = a'b'c + a'bc' + ab'c + abc' + abc), (d) minimized output equations (y = a + bc; z = ab + b'c + bc'), (e) final circuit.]

RT-Level Combinational Components

Although we can design all combinational circuits in this manner, large circuits would be very complex to design. For example, a circuit with 16 inputs would have 2^16, or 64K, rows in its truth table. One way to reduce the complexity is to use combinational components that are more powerful than logic gates. Figure 2.5 shows several such combinational components, often called register-transfer, or RT, level components. We now describe each briefly.

A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O. Thus, a multiplexor acts much like a railroad switch, allowing only one of multiple input tracks to connect to a single output track. If there are m data inputs, then there are log2(m) select lines S. We call this an m-by-1 multiplexor, meaning m data inputs and 1 data output. The binary value of S determines which data input passes through; 00...00 means I0 passes through, 00...01 means I1 passes through, 00...10 means I2 passes through, and so on. For example, an 8x1 multiplexor has eight data inputs and thus three select lines. If those three select lines have values of 110, then I6 will pass through to the output. So if I6 were 1, then the output would be 1; if I6 were 0, then the output would be 0. We commonly use a more complex device called an n-bit multiplexor, in which each data input, as well as the output, consists of n lines. Suppose the previous example used a 4-bit 8x1 multiplexor. Thus, if I6 were 1110, then the output would be 1110. Note that n is independent of the number of select lines.

Another combinational component is a decoder. A decoder converts its binary input I into a one-hot output O. "One-hot" means that exactly one of the output lines can be 1 at a given time. Thus, if there are n outputs, then there must be log2(n) inputs. We call this a log2(n)xn decoder. For example, a 3x8 decoder has three inputs and eight outputs. If the input were 000, then the output O0 would be 1 and all other outputs would be 0. If the input were 001, then the output O1 would be 1, and so on. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.

An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry. For example, a 4-bit adder would have a 4-bit A input, a 4-bit B input, a 4-bit sum output, and a 1-bit carry output. If A were 1010 and B were 1001, then sum would be 0011 and carry would be 1. An adder often comes with a carry input also, so that such adders can be cascaded to create larger adders.
[Figure 2.5: Combinational components: multiplexor, decoder, adder, comparator, and ALU, each n bits wide.]

A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B. If A were 1010 and B were 1001, then less would be 0, equal would be 0, and greater would be 1.

An arithmetic-logic unit (ALU) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B. The select lines S choose the current function; if there are m possible functions, then there must be at least log2(m) select lines. Common functions include addition, subtraction, AND, and OR.

Another common RT-level component, not shown in the figure, is a shifter. An n-bit input I can be shifted left or right and then output to an output O. For example, a 4-bit shifter with an input 1010 would output 0101 when shifting right one position. Shifters usually come with an additional input indicating what value should be shifted in and an additional output indicating the value of the bit being shifted out.

2.3 Sequential Logic

Flip-Flops

A sequential circuit is a digital circuit whose outputs are a function of the present as well as previous input values. In other words, sequential logic possesses memory. One of the most basic sequential circuits is a flip-flop. A flip-flop stores a single bit. The simplest type is the D flip-flop. It has two inputs: D and clock. When clock is 1, the value of D is stored in the flip-flop, and that value appears at the output Q. When clock is 0, the value of D is ignored, and the output Q continues to reflect the stored value. Another type of flip-flop is the SR flip-flop, which has three inputs: S, R, and clock. When clock is 0, the previously stored bit is maintained and appears at output Q. When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored. If both are 0, there's no change. If both are 1, behavior is undefined. Thus, S stands for set to 1, and R for reset to 0. Another flip-flop type is a JK flip-flop, which is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1. To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge-triggered, meaning they examine their non-clock inputs only when clock is rising from 0 to 1, or alternatively when clock is falling from 1 to 0.

[Figure 2.6: Sequential components: register, shift register, and counter.]

RT-Level Sequential Components

Just as we used more abstract combinational components to implement complex combinational systems, we also use more abstract sequential components for complex sequential systems. Figure 2.6 illustrates several sequential components, which we now describe.

A register stores n bits from its n-bit data input I, with those stored bits appearing at its output Q. A register usually has at least two control inputs, clock and load. For a rising-edge-triggered register, the inputs I are only stored when load is 1 and clock is rising from 0 to 1. The clock input is usually drawn as a small triangle, as shown in the figure. Another common register control input is clear, which resets all bits to 0, regardless of the value of I. Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register, to distinguish it from a shift register.
A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge. A shift register has a 1-bit data input I, and at least two control inputs, clock and shift. When clock is rising and shift is 1, the value of I is stored in the nth bit, while simultaneously the nth bit is stored in the (n-1)th bit, the (n-1)th bit is stored in the (n-2)th bit, and so on, down to the second bit being stored in the first bit. The first bit is typically shifted out, appearing over an output Q.

A counter is a register that can also increment, meaning add binary 1, to its stored binary value. In its simplest form, a counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on each clock edge. A counter often also has a parallel load data input and an associated load control signal. A common counter feature is both up and down counting, or incrementing and decrementing, requiring an additional control input to indicate the count direction.

These control inputs can be either synchronous or asynchronous. A synchronous input's value only has an effect during a clock edge. An asynchronous input's value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.

Sequential Logic Design

Sequential logic design can be achieved using a straightforward technique, whose steps are illustrated in Figure 2.7. We again start with a problem description, shown in Figure 2.7(a): construct a pulse divider that slows down a pre-existing pulse so that a 1 is output for every four pulses detected. We translate this description to a state diagram, also called a finite state machine (FSM), as in Figure 2.7(b). We describe FSMs further in a later chapter. Briefly, each state represents the current "mode" of the circuit, serving as the circuit's memory of past input values. The desired output values are listed next to each state. The input conditions that cause a transition from one state to another are shown next to each arc. Each arc condition is implicitly "ANDed" with a rising (or falling) clock edge. In other words, all inputs are synchronous. All inputs and outputs must be Boolean, and all operations must be Boolean operations. FSMs can also describe asynchronous systems, but we do not cover such systems in this book, since they are not very common.

We will implement this FSM using a register to store the current state, and combinational logic to generate the output values and the next state, as shown in Figure 2.7(c). We assign to each state a unique binary value, and we then create a truth table for the combinational logic, as in Figure 2.7(d). The inputs for the combinational logic are the state bits coming from the state register, and the external inputs, so we list all combinations of these inputs on the left side of the table. The outputs for the combinational logic are the state bits to be loaded into the register on the next clock edge (the next state), and the external output values, so we list desired values of these outputs for each input combination on the right side of the table. Because we used a state diagram for which outputs were a function of the current state only, and not of the inputs, we list an external output value only for each possible state, ignoring the external input values. Now that we have a truth table, we proceed with combinational logic design as described earlier, by generating minimized output equations as shown in Figure 2.7(e), and then drawing the combinational logic circuit as in Figure 2.7(f). As you can see, sequential logic design is very much like combinational logic design, as long as we draw the state table in such a way that it can be used as a combinational logic truth table also.

[Figure 2.7: Sequential logic design: (a) problem description (the pulse divider), (b) state diagram, (c) implementation model, (d) state table (Moore-type), (e) minimized output equations (I1 = Q1'Q0a + Q1a' + Q1Q0'; I0 = Q0a' + Q0'a; x = Q1Q0), (f) combinational logic.]
One of the biggest difficulties for people new to implementing FSMs is understanding FSM and controller timing. Consider the situation of being in state 0 in Figure 2.7(b). This means that the output x is being assigned 0, and the FSM is waiting for the next clock pulse to come along. When the new pulse comes, we'll transition to either state 0 or 1, depending on input a. From the implementation model perspective of Figure 2.7(c), in state 0, the state register has 0s in it, and these 0s are trickling through the combinational logic, eventually producing a stable 0 on output x, and the next-state signals I0 and I1 are being produced as a function of the state register outputs and input a. Input a needs to be stable before the next clock pulse comes along, so that the next-state signals are stable. When the next pulse does come along, the state register will be loaded with either 00 or 01. Assume 01, meaning we will now be in state 1. Then this 01 will trickle through the combinational logic, causing output x to be 0. And so on. Notice that the actions of a state occur slightly after a clock pulse causes us to enter that state.

Notice also that there is a fundamental assumption being made here regarding the clock frequency, namely, that the clock frequency is fast enough to detect events on input a. In other words, input a must be held at its value long enough so that the next clock pulse will detect it. If input a switches from 0 to 1 and back to 0, all in between two clock pulses, then the switch to 1 would never be detected. Yet the clock frequency must be slow enough to allow outputs to stabilize after being generated by the combinational logic. We recommend that one study the relationship between the FSM and the implementation model for a while, until one is comfortable with this relationship.

2.4 Custom Single-Purpose Processor Design

We now have the knowledge needed to build a basic processor. A basic processor consists of a controller and a datapath, as illustrated in Figure 2.8. The datapath stores and manipulates a system's data. Examples of data in an embedded system include binary numbers representing external conditions like temperature or speed, characters to be displayed on a screen, or a digitized photographic image to be stored and compressed. The datapath contains register units, functional units, and connection units like wires and multiplexors. The datapath can be configured to read data from particular registers, feed that data through functional units configured to carry out particular operations like add or shift, and store the operation results back into particular registers. A controller carries out such configuration of the datapath. It sets the datapath control inputs, like the register load and multiplexor select signals of the register units, functional units, and connection units, to obtain the desired configuration at a particular time. It monitors external control inputs as well as datapath control outputs, known as status signals, coming from functional units, and it sets external control outputs as well.

[Figure 2.8: A basic processor: (a) controller and datapath, (b) a view inside the controller and datapath.]

We can apply the combinational and sequential logic design techniques described earlier to build a controller and a datapath. We now describe a technique to convert a computation task into a custom single-purpose processor consisting of a controller and a datapath.

We begin with a sequential program describing the computation task that we wish to implement. Figure 2.9 provides an example task of computing a greatest common divisor (GCD). Figure 2.9(a) shows a black-box diagram of the desired system, having x_i and y_i data inputs and a data output d_o. The system's functionality is straightforward: the output should represent the GCD of the inputs. Thus, if the inputs are 12 and 8, the output should be 4. If the inputs are 13 and 5, the output should be 1. Figure 2.9(b) provides a simple program with this functionality. The reader might trace this program's execution on these examples to verify that the program does indeed compute the GCD.

To begin building our single-purpose processor implementing the GCD program, we first convert our program into a complex state diagram, in which states and arcs may include arithmetic expressions, and those expressions may use external inputs and outputs as well as variables. In contrast, our earlier state diagrams included only Boolean expressions, and those expressions could use only external inputs and outputs, not variables. This more complex state diagram is essentially a sequential program in which statements have been scheduled into states. We'll refer to a complex state diagram as a finite state machine with data (FSMD).

We can use templates to convert a program to an FSMD, as illustrated in Figure 2.10. First, we classify each statement as an assignment statement, loop statement, or branch (if-then-else or case) statement. For an assignment statement, we create a single state with that statement as its action, and we add an arc from this state to the first state of the next statement, as shown in Figure 2.10(a). For a loop statement, we create a condition state C and a join state J, both with no actions, as shown in Figure 2.10(b). We add an arc with the loop's condition from the condition state to the first statement in the loop body. We add a second arc with the complement of the loop's condition from the condition state to the next statement after the loop body. We also add an arc from the join state back to the condition state. For a branch statement, we create a condition state C and a join state J, both with no actions, as shown in Figure 2.10(c). We add an arc with the first branch's condition from the condition state to the branch's first statement. We add another arc with the complement of the first branch's condition ANDed with the second branch's condition from the condition state to the branch's first statement. We repeat this for each branch. Finally, we connect the arc leaving the last statement of each branch to the join state, and we add an arc from this state to the next statement's state.
I
r
Chapte r 2: custom Single-Purpose Pr0C::esS0(~:
Hardw are

(a) (b)
2.4: Custo m Single- Purpos e Processor Design

I I:
int X, y;

L~- ->- ---- i


IJ a=b
next statar ent
while (cxn::i) I
lcxp-b :x:ly- stat=t s
if (cl)
(c)

cl strnts
else' if c2
2: next stater ent c2 strnts
GCD else
d_o other strnts
next stater ent

, - - - - - - !cond
(a)
C:
cl !cl "c2

0: int .x, y; 4:

·.... . l. . . ·7 . . ..
next ···:
1: while (1) { (c) statement cl stmts c2 stmts : other stmts
2: ~le (!go_i );
5:

7
3: X =Xi;
4: y=y-iG

}:'~~)x;{ 6: J:

lse · 7:

I
9: _,,,~ x ;
X =' X - y;

6-J:
next
statement
next
statement I
I
., ,. .-;:>,;.;;: . -
~:·/
- .... (b) Figure 2. JO: Templates for creating a slate diagram from
program stateme nts: (a) assignment: (b) loop. (c) branch.
..--·
/ ' ~·,,. 5-J:
,:
/ /
first statement. We repeat this for each branch. Finally, we connect the arc leaving the last statement of each branch to the join state, and we add an arc from this state to the next statement's state.

Using this template approach, we convert our GCD program to the FSMD of Figure 2.9(c). Notice that variables are being assigned in some of the states, such as the action x = x - y, which also includes an arithmetic operation. Again, variables and arithmetic operations/conditions are what make FSMDs more powerful than FSMs.

We are now well on our way to designing a custom single-purpose processor that executes the GCD program. Our next step is to divide the functionality into a datapath part and a controller part, as shown in Figure 2.11. The datapath part should consist of an interconnection of combinational and sequential components. The controller part should consist of a pure FSM (i.e., one containing only Boolean actions and conditions).

Figure 2.9: Example program - Greatest Common Divisor (GCD): (a) black-box view, (b) desired functionality, (c) state diagram.

We construct the datapath through a four-step process:

1. First, we create a register for any declared variable. In the example, the variables are x and y. We treat an output port as an implicit variable, so we create a register d and connect it to the output port. We also draw the input and output ports. Figure 2.11(b) shows these three registers as light-gray rectangles.

2. Second, we create a functional unit for each arithmetic operation in the state diagram. In the example, there are two subtractions, one comparison for less than, and one comparison for inequality, yielding two subtractors and two comparators, shown as white rectangles in Figure 2.11(b).

3. Third, we connect the ports, registers, and functional units. For each write to a variable in the state diagram, we draw a connection from the write's source to the variable's register. A source may be an input port, a functional unit, or another register. For each arithmetic and logical operation, we connect sources to an input of the operation's corresponding functional unit. When more than one source is
40 Embedded System Design

Chapter 2: Custom Single-Purpose Processors: Hardware
2.4: Custom Single-Purpose Processor Design
connected to a register, we add an appropriately sized multiplexor, shown as dark-gray rectangles in Figure 2.11(b).

4. Finally, we create a unique identifier for each control input and output of the datapath components. Examples in the figure include x_sel and x_neq_y.

Now that we have a complete datapath, we can modify our FSMD of Figure 2.9(c) into the FSM of Figure 2.11(a) representing our controller. The FSM has the same states and transitions as the FSMD. However, we replace complex actions and conditions by Boolean ones, making use of our datapath. We replace every variable write by actions that set the select signals of the multiplexor in front of the variable's register such that the write's source passes through, and we assert the load signal of that register. We replace every logical operation in a condition by the corresponding functional unit control output. In this FSM, any signal not explicitly assigned in a state is implicitly assigned a 0. For example, x_ld is implicitly assigned 0 in every state except for 3 and 8, where x_ld is explicitly assigned 1.

We can then complete the controller design by implementing the FSM using our sequential design technique described earlier and illustrated in Figure 2.4. Figure 2.11(c) shows the controller implementation model. Figure 2.12 shows a state table for the controller. Note that there are seven inputs to the controller, resulting in 128 rows for the state table. We reduced rows in the state table of the figure by using * for some input combinations, but we can still see that optimizing the design using hand techniques could be quite tedious. For this reason, computer-aided design (CAD) tools that automate both the combinational and the sequential logic design can be very helpful; we'll introduce some CAD tools in the last chapter. CAD tools that automatically generate digital gates from sequential programs, FSMDs, FSMs, or logic equations are known as synthesis tools.

Figure 2.11: Example program - Greatest Common Divisor (GCD): (a) controller, (b) datapath, (c) controller implementation model.

Figure 2.12: State table for the GCD example.


Also, note that we could perform significant amounts of optimization to both the datapath and the controller. For example, we could merge functional units in the datapath, resulting in fewer units at the expense of more multiplexors. We could also merge a number of states into a single state, reducing the size of the controller. Interested readers might examine the textbook by Gajski referred to at the end of this chapter for an introduction to these optimizations.

Note that we could alternatively implement the GCD program by programming a general-purpose processor, thus eliminating the need for this design process, but possibly yielding a slower and bigger design.

Finally, we once again discuss timing, this time for FSMDs rather than FSMs. When in a particular state, all actions internal to that state are considered to be concurrent to one another. Those actions are very different from a sequential program, in which statements are executed in sequence. So, if x = 0 before entering a state A in an FSMD, and state A's actions are "x = x + 1" and "y = x," then y will equal 0, not 1, after exiting state A. This concurrency of actions also implies that the order in which we write the actions in the state does not matter.

Furthermore, note that actions consisting of writes to variables do not actually update those variables until the next clock pulse, because those variables are implemented as registers. However, arcs leaving a state may use those variables in their conditions. Thus, an arc leaving state A, but using variable x, is using the old value of x, 0 in our example in the previous paragraph. Assuming an outgoing arc is using the new value assigned in the arc's source state is by far the most common mistake that people make when creating FSMDs. If we wish to assign a value to variable x and then branch to different states depending on that value, then we must insert an additional state before branching.

2.5 RT-Level Custom Single-Purpose Processor Design

Section 2.4 described a basic technique for converting a sequential program into a custom single-purpose processor, by first converting the program to an FSMD using the provided templates for each language construct, splitting the FSMD into a simple FSM controlling a datapath, and performing sequential logic design on the FSM. However, in many cases, we prefer not to start with a program, but instead directly with an FSMD. The reason is that often the cycle-by-cycle timing of a system is central to the design, but programming languages don't typically support cycle-by-cycle description. FSMDs, in contrast, make cycle-by-cycle timing explicit.

For example, consider the design problem in Figure 2.13(a). We want one device (the sender) to send an 8-bit number to another device (the receiver). The problem is that while the receiver can receive all 8 bits at once, the sender sends 4 bits at a time; first it sends the low-order 4 bits, then the high-order 4 bits. So we need to design a bridge that will enable the two devices to communicate.

Figure 2.13: RT-level custom single-purpose processor design example: (a) problem specification (a single-purpose processor that converts two 4-bit inputs, arriving one clock at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse), (b) FSMD (states WaitFirst4, RecFirst4Start, RecFirst4End, WaitSecond4, RecSecond4Start, RecSecond4End, Send8Start, and Send8End; inputs rdy_in: bit, data_in: bit[4]; outputs rdy_out: bit, data_out: bit[8]; variables data_lo, data_hi: bit[4]).

Different designers might attack this problem at different levels of abstraction. One designer might start thinking in terms of registers, multiplexors, and flip-flops. Another might try to describe the bridge as a sequential program. But perhaps the most natural level is to describe the bridge as an FSMD, as shown in Figure 2.13(b). We begin by creating a state WaitFirst4 that waits for the first 4 bits, whose presence on data_in will be indicated by a pulse on rdy_in. Once the pulse is detected, we transition to a state RecFirst4Start that saves the contents of data_in in a variable called data_lo. We then wait for the pulse on rdy_in to end, and then wait for the other 4 bits, indicated by a second pulse on rdy_in. We save the contents of data_in in a variable called data_hi. After waiting for the second pulse on rdy_in to end, we write the full 8 bits of data to the output data_out, and we pulse rdy_out.


We assume we are building a synchronous circuit, so the bridge has a clock input; in our FSMD, every transition is implicitly ANDed with the clock.

We apply the same methods as before to convert this FSMD to a controller and a datapath implementation, as illustrated in Figure 2.14. We build a datapath, shown in Figure 2.14(b), using the four-step process outlined before. We add registers for data_hi and data_lo, as well as for the output data_out. We don't add any functional units since there are no arithmetic operations. We connect the registers according to the assignments in the FSMD; no multiplexors are necessary. We create unique identifiers for the register control signals. Having completed the datapath, we convert the FSMD into an FSM that uses the datapath, as shown in Figure 2.14(a). This conversion requires only three simple changes, as shown in bold in the figure. Having obtained the FSM, we can convert the FSM into a state register and combinational logic using the same technique as in Figure 2.7; we omit this conversion here.

This example demonstrates how a problem that consists mostly of waiting for or making changes on signals, rather than consisting mostly of performing computations on data, might most easily be described as an FSMD. The FSMD would be even more appropriate if specific numbers of clock cycles were specified (e.g., the input pulse would be held high exactly two cycles and the output pulse would have to be held high for three cycles). On the other hand, if a problem consists mostly of an algorithm with lots of computations, the detailed timing of which are not especially important, such as the GCD computation in the earlier example, then a program might be the best starting point.

The FSMD level is often referred to as the register-transfer (RT) level, since an FSMD describes in each state which registers should have their data transferred to which other registers, with that data possibly being transformed along the way. The RT level is probably the most common starting point for custom single-purpose processor design today.

Some custom single-purpose processors do not manipulate much data. These processors consist primarily of a controller, with perhaps no datapath or a trivial one with just a couple registers or counters, as in our bridge example of Figure 2.14. Likewise, other custom single-purpose processors do not exhibit much control. These processors consist primarily of a datapath configured to do one or a few things repeatedly, with no controller or a trivial one with just a couple flip-flops and gates. Nevertheless, we can still think of these circuits as processors.

Figure 2.14: RT-level custom single-purpose processor design example continued: (a) controller, (b) datapath.

2.6 Optimizing Custom Single-Purpose Processors

You may have noticed in the GCD example of Figure 2.11 that we ignored several opportunities to simplify the resulting design. For example, the FSM had several states that obviously do nothing and could have been removed. Likewise, the datapath has two subtractors whereas one would have been sufficient. We intentionally did not perform such optimizations so as not to detract from the basic idea that programs can be converted to custom single-purpose processors through a series of straightforward steps. However, when we really design such processors, we will usually also want to optimize them whenever possible. Optimization is the task of making design metric values the best possible. Optimization is an extensive subject, and we do not intend to cover it in depth here. Instead, we point out some simple optimizations that can be applied, and refer the reader to textbooks on the subject.

Optimizing the Original Program

Let us start with optimizing the initial program, such as the GCD program in Figure 2.9. At this level, we can analyze the number of computations and size of variables that are required by the algorithm. In other words, we can analyze the algorithm in terms of time complexity and space complexity. We can try to develop alternative algorithms that are more efficient. In

the GCD example, if we assume we can make use of a modulo operation %, we could write an algorithm that would use fewer steps. In particular, we could use the following algorithm:

    int x, y, r;
    while (1) {
      while (!go_i);
      if (x_i >= y_i) { x = x_i; y = y_i; }
      else { x = y_i; y = x_i; }   // x must be the larger number
      while (y != 0) {
        r = x % y;
        x = y;
        y = r;
      }
      d_o = x;
    }

Let us compare this second algorithm with the earlier one when computing the GCD of 42 and 8. The earlier algorithm would step through its inner loop with x and y values as follows: (42,8), (34,8), (26,8), (18,8), (10,8), (2,8), (2,6), (2,4), (2,2), thus outputting 2. The second algorithm would step through its inner loop with x and y values as follows: (42,8), (8,2), (2,0), thus outputting 2. The second algorithm is far more efficient in terms of time. Analysis of algorithms and their efficient design is a widely researched area. The choice of algorithm can have perhaps the biggest impact on the efficiency of the designed processor.

Optimizing the FSMD

Once an algorithm is settled upon, we convert the program describing that algorithm to an FSMD. Use of the template-based method introduced in this chapter will result in a rather inefficient FSMD. In particular, many states in the resulting FSMD could likely be merged into fewer states.

Scheduling is the task of assigning operations from the original program to states in an FSMD. The scheduling obtained using the template-based method can be improved. Consider the original FSMD for the GCD, which is redrawn in Figure 2.15(a). State 1 is clearly not necessary since its outgoing transitions have constant values. States 2 and 2-J can be merged into a single state since there are no loop operations in between them. States 3 and 4 can be merged since they perform assignment operations that are independent of one another. States 5 and 6 can be merged. States 6-J and 5-J can be eliminated, with the transitions from states 7 and 8 pointing directly to state 5. Likewise, state 1-J can be eliminated. The resulting reduced FSMD is shown in Figure 2.15(b). We reduced the FSMD from thirteen states to only six states. Be careful, though, to avoid the common mistake of assuming that a variable assigned in a state can have the newly assigned value read on an outgoing arc of that state!

Figure 2.15: Optimizing the FSMD for the GCD example: (a) original FSMD and optimizations and (b) optimized FSMD.

The original FSMD could also have had too few states to be efficient in terms of hardware size. Suppose a particular program statement had the operation a = b * c * d * e. Generating a single state for this operation will require us to use three multipliers in our datapath. However, multipliers are expensive, and thus we might instead want to break this operation down into smaller operations, like t1 = b * c, t2 = d * e, and a = t1 * t2, with each smaller operation having its own state. Thus, only one multiplier would be needed in the datapath, since the three multiplications could share one multiplier; sharing will be discussed in the next section.

In this scenario, we assumed that the timing of output operations could be changed. For example, the reduced FSMD will generate the GCD output in fewer clock cycles than the original FSMD. In many cases, changing the timing is not acceptable. For example, in our earlier clock divider example, changing the timing clearly would not be acceptable, since we

intended for the cycle-by-cycle behavior of the original FSM to be preserved during design. Thus, when optimizing the FSMD, a designer must be aware of whether output timing may or may not be modified.

Optimizing the Datapath

In our four-step datapath approach, we created a unique functional unit for every arithmetic operation in the FSMD. However, such a one-to-one mapping is often not necessary. Many arithmetic operations in the FSMD can share a single functional unit if that functional unit supports those operations, and those operations occur in different states. In the GCD example, states 7 and 8 both performed subtractions. In the datapath of Figure 2.11, each subtraction got its own subtractor. Instead, we could use a single subtractor and use multiplexors to choose whether the subtractor inputs are x and y, or instead y and x.

Furthermore, we often have a number of different RT components from which we can build our datapath. For example, we may have fast and slow adders available. We may have multifunction components, like ALUs, also. Allocation is the task of choosing which RT components to use in the datapath. Binding is the task of mapping operations from the FSMD to allocated components.

Scheduling, allocation, and binding are highly interdependent. A given schedule will affect the range of possible allocations, for example. An allocation will affect the range of possible schedules. And so on. Thus, we sometimes want to consider these tasks simultaneously.

Optimizing the FSM

Designing a sequential circuit to implement an FSM also provides some opportunities for optimization, namely, state encoding and state minimization.

State encoding is the task of assigning a unique bit pattern to each state in an FSM. Any assignment in which the encodings are unique will work properly, but the size of the state register as well as the size of the combinational logic may differ for different encodings. For example, four states A, B, C, and D can be encoded as 00, 01, 10, and 11, respectively. Alternatively, those states can be encoded as 11, 10, 00, and 01, respectively. In fact, for an FSM with n states where n is a power of 2, there are n! possible encodings. We can see this easily if we treat encoding as an ordering problem: we order the states and assign a straightforward binary encoding, starting with 00...00 for the first state, 00...01 for the second state, and so on. There are n! possible orderings of n items, and thus n! possible encodings. n! is a very large number for large n, and thus checking each encoding to determine which yields the most efficient controller is a hard problem. Even more encodings are possible, since we can use more than log2(n) bits to encode n states, up to n bits to achieve a one-hot encoding. CAD tools are therefore a great aid in searching for the best encoding.

State minimization is the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, those two states generate the same outputs and transition to the same next state. Such states are clearly equivalent, since merging them will yield exactly the same output behavior.

The state merging that we did when optimizing our FSMD was not the same as state minimization as defined here. The reason is that our state merging in the FSMD actually changed the output behavior, in particular the output timing, of the FSMD. Typically, by the time we arrive at an FSM, we assume output timing cannot be changed. State minimization does not change the output behavior in any way.

2.7 Summary

Designing a custom single-purpose processor for a given program requires an understanding of various aspects of digital design. Design of a circuit to implement Boolean functions requires combinational design, which consists of building a truth table with all possible inputs and desired outputs, optimizing the output functions, and drawing a circuit. Design of a circuit to implement a state diagram requires sequential design, which consists of drawing an implementation model with a state register and a combinational logic block, assigning a binary encoding to each state, drawing a state table with inputs and outputs, and repeating our combinational design process for this table. Finally, design of a single-purpose processor circuit to implement a program requires us to first schedule the program's statements into a complex state diagram, construct a datapath from the diagram, create a new state diagram that replaces complex actions and conditions by datapath control operations, and then design a controller circuit for the new state diagram using sequential design. The register-transfer level is the most common starting point of design today. Much optimization can be performed at each level of design, but such optimization is hard, so CAD tools can be a great designer's aid.

2.8 References and Further Reading

• De Micheli, Giovanni, Synthesis and Optimization of Digital Circuits. New York: McGraw-Hill, 1994. Covers synthesis techniques from sequential programs down to gates.
• Gajski, Daniel D., Principles of Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1997. Describes combinational and sequential logic design, with a focus on optimization techniques, CAD, and higher levels of design.
• Gajski, Daniel D., Nikil Dutt, Allen Wu, and Steve Lin, High-Level Synthesis: Introduction to Chip and System Design. Norwell, MA: Kluwer Academic Publishers, 1992. Emphasizes optimizations when converting sequential programs to a custom single-purpose processor.
• Katz, Randy, Contemporary Logic Design. Redwood City, CA: Benjamin/Cummings, 1994. Describes combinational and sequential logic design, with a focus on logic and sequential optimization and CAD.

2.9 Exercises

2.1 What is a single-purpose processor? What are the benefits of choosing a single-purpose processor over a general-purpose processor?
2.2 How do nMOS and pMOS transistors differ?
2.3 Build a 3-input NAND gate using a minimum number of CMOS transistors.
2.4 Build a 3-input NOR gate using a minimum number of CMOS transistors.
2.5 Build a 2-input AND gate using a minimum number of CMOS transistors.
2.6 Build a 2-input OR gate using a minimum number of CMOS transistors.
2.7 Explain why NAND and NOR gates are more common than AND and OR gates.
2.8 Distinguish between a combinational circuit and a sequential circuit.
2.9 Design a 2-bit comparator (compares two 2-bit words) with a single output "less-than," using the combinational design technique described in the chapter. Start from a truth table, use K-maps to minimize logic, and draw the final circuit.
2.10 Design a 3x8 decoder. Start from a truth table, use K-maps to minimize logic, and draw the final circuit.
2.11 Describe what is meant by edge-triggered and explain why it is used.
2.12 Design a 3-bit counter that counts the following sequence: 1, 2, 4, 5, 7, 1, 2, etc. This counter has an output "odd" whose value is 1 when the current count value is odd. Use the sequential design technique of the chapter. Start from a state diagram, draw the state table, minimize the logic, and draw the final circuit.
2.13 Four lights are connected to a decoder. Build a circuit that will blink the lights in the following order: 0, 2, 1, 3, 0, 2, .... Start from a state diagram, draw the state table, minimize the logic, and draw the final circuit. (The accompanying figure shows a controller whose outputs S0 and S1 drive the decoder.)
2.14 Design a soda machine controller, given that a soda costs 75 cents and your machine accepts quarters only. Draw a black-box view, come up with a state diagram and state table, minimize the logic, and then draw the final circuit.
2.15 What is the difference between a synchronous and an asynchronous circuit?
2.16 Determine whether the following are synchronous or asynchronous: (a) multiplexor, (b) register, (c) decoder.
2.17 What is the purpose of the datapath? Of the controller?
2.18 Compare the GCD custom-processor implementation to a software implementation. (a) Compare the performance. Assume a 100-ns clock for the microcontroller, and a 20-ns clock for the custom processor. Assume the microcontroller uses two-operand instructions, and each instruction requires four clock cycles. Estimates for the microcontroller are fine. (b) Estimate the number of gates for the custom design, and compare this to 10,000 gates for a simple 8-bit microcontroller. (c) Compare the custom GCD with the GCD running on a 300-MHz processor with 2-operand instructions and one clock cycle per instruction (advanced processors use parallelism to meet or exceed one clock cycle per instruction), and with an IC providing 200,000 gates.
2.19 Design a single-purpose processor that outputs Fibonacci numbers up to n places. Start with a function computing the desired result, translate it into a state diagram, and sketch a probable datapath.
2.20 Design a circuit that does the matrix multiplication of matrices A and B. Matrix A is 3x2 and matrix B is 2x3. The multiplication works as follows:

    [ a  b ]   [ g  h  i ]   [ a*g + b*j   a*h + b*k   a*i + b*l ]
    [ c  d ] x [ j  k  l ] = [ c*g + d*j   c*h + d*k   c*i + d*l ]
    [ e  f ]                 [ e*g + f*j   e*h + f*k   e*i + f*l ]

2.21 An algorithm for matrix multiplication, assuming that we have one adder and one multiplier, follows:

    main() {
      int A[3][2] = {{1, 2}, {3, 4}, {5, 6}};
      int B[2][3] = {{7, 8, 9}, {10, 11, 12}};
      int C[3][3];
      int i, j, k;
      for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
          C[i][j] = 0;
          for (k = 0; k < 2; k++) {
            C[i][j] += A[i][k] * B[k][j];
          }
        }
      }
    }

(a) Convert the matrix multiplication algorithm into a state diagram using the template provided in Figure 2.10. (b) Rewrite the matrix multiplication algorithm given the assumption that we have three adders and six multipliers. (c) If each multiplication takes two cycles to compute and each addition takes one cycle to compute, how many cycles does it take to complete the matrix multiplication given one adder and one multiplier? Three adders and six multipliers? Nine adders and 18 multipliers? (d) If each adder requires 10 transistors to implement and each multiplier requires 100 transistors to implement, what is the total number of transistors needed to implement the matrix multiplication circuit using one adder and one multiplier? Three adders and six multipliers? Nine adders and 18 multipliers? (e) Plot your results from parts (c) and (d) into a graph with latency along the x-axis and size along the y-axis.
2.22 A subway has an embedded system controlling the turnstile, which opens when two tokens are deposited. (a) Draw the FSMD state diagram for this system. (b) Separate the FSMD into an FSM+D. (c) Derive the FSM logic using truth tables and K-maps to minimize logic. (d) Draw your FSM and datapath connections.

CHAPTER 3: General-Purpose Processors: Software

3.1 Introduction
3.2 Basic Architecture
3.3 Operation
3.4 Programmer's View
3.5 Development Environment
3.6 Application-Specific Instruction-Set Processors (ASIPs)
3.7 Selecting a Microprocessor
3.8 General-Purpose Processor Design
3.9 Summary
3.10 References and Further Reading
3.11 Exercises

3.1 Introduction

A general-purpose processor is a programmable digital system intended to solve computation problems in a large variety of applications. Copies of the same processor may solve computation problems in applications as diverse as communication, automotive, and industrial embedded systems. An embedded-system designer choosing to use a general-purpose processor to implement part of a system's functionality may achieve several benefits.

First, the unit cost of the processor may be very low, often a few dollars or less. One reason for this low cost is that the processor manufacturer can spread its NRE cost for the processor's design over large numbers of units, often numbering in the millions or billions. For example, Motorola sold nearly half a billion 68HC05 microcontrollers in 1996 alone (source: Motorola 1996 Annual Report).

Second, because the processor manufacturer can spread NRE cost over large numbers of units, the manufacturer can afford to invest large NRE cost into the processor's design, without significantly increasing the unit cost. The processor manufacturer may thus use

experienced computer architects who incorporate advanced architectural features, and may use leading-edge optimization techniques, state-of-the-art IC technology, and handcrafted VLSI layouts for critical components. These factors can improve design metrics like performance, size, and power.

Third, the embedded system designer may incur low NRE cost, since the designer need only write software and then apply a compiler and/or an assembler, both of which are mature and low-cost design technologies. Likewise, time-to-prototype and time-to-market will be short, since processor ICs can be purchased and then programmed in the designer's own lab. Flexibility will be great, since the designer can perform software rewrites in a straightforward manner.

3.2 Basic Architecture

A general-purpose processor, sometimes called a CPU (central processing unit) or a microprocessor, consists of a datapath and a control unit, tightly linked with a memory. We now discuss these components briefly. Figure 3.1 illustrates the basic architecture.

Figure 3.1: General-purpose processor basic architecture. (The figure shows a processor comprising a control unit, containing a controller, and a datapath, containing an ALU and registers, connected to memory and I/O.)

Datapath

The datapath consists of the circuitry for transforming data and for storing temporary data. The datapath contains an arithmetic-logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inverting, and shifting. The ALU also generates status signals, often stored in a status register (not shown), indicating particular data conditions. Such conditions include indicating whether data is zero or whether an addition of two data items generates a carry. The datapath also contains registers capable of storing temporary data. Temporary data may include data brought in from memory but not yet sent through the ALU, data coming from the ALU that will be needed for later ALU operations or will be sent back to memory, and data that must be moved from one memory location to another. The internal data bus carries data within the datapath, while the external data bus carries data to and from the data memory.

We typically distinguish processors by their size, and we usually measure size as the bit-width of the datapath components. A bit, which stands for binary digit, is the processor's basic data unit, representing either a 0 (low or false) or a 1 (high or true), while we refer to 8 bits as a byte. An N-bit processor may have N-bit-wide registers, an N-bit-wide ALU, an N-bit-wide internal bus over which data moves among datapath components, and an N-bit-wide external bus over which data is brought in and out of the datapath. Common processor sizes include 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit. However, in some cases, a particular processor may have different sizes among its registers, ALU, internal bus, or external bus, so the processor-size definition is not an exact one. For example, a processor may have a 16-bit internal bus, ALU, and registers, but only an 8-bit external bus to reduce pins on the processor's IC.
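The ALU status conditions described above, a zero result and a generated carry, along with the fixed bit-width of a register, can be made concrete with a small sketch. The function below models an 8-bit addition; the function and flag names are our own illustration, not part of any particular processor.

```c
#include <stdint.h>

/* Hypothetical 8-bit ALU addition: returns the 8-bit result (as stored in
 * an 8-bit register) and reports the zero and carry status conditions.
 * Illustrative sketch only; names are ours. */
uint8_t alu_add8(uint8_t a, uint8_t b, int *zero, int *carry) {
    uint16_t wide = (uint16_t)a + (uint16_t)b; /* 9-bit intermediate sum */
    uint8_t result = (uint8_t)wide;            /* truncated to register width */
    *carry = (wide > 0xFF);                    /* addition generated a carry */
    *zero  = (result == 0);                    /* result is zero */
    return result;
}
```

For example, adding 200 and 100 overflows eight bits, so the stored result is 44 with the carry status set.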
Control Unit

The control unit consists of circuitry for retrieving program instructions and for moving data to, from, and through the datapath according to those instructions. The control unit has a program counter (PC) that holds the address in memory of the next program instruction to fetch, and an instruction register (IR) to hold the fetched instruction. The control unit also has a controller, consisting of a state register plus next-state and control logic, as we saw in Chapter 2. This controller sequences through the states and generates the control signals necessary to read instructions into the IR and control the flow of data in the datapath. Such flows may include inputting two particular registers into the ALU, storing ALU results into a particular register, or moving data between memory and a register. The controller also determines the next value of the PC. For a nonbranch instruction, the controller increments the PC. For a branch instruction, the controller looks at the datapath status signals and the IR to determine the appropriate next address.

The PC's bit-width represents the processor's address size. The address size is independent of the data word size; the address size is often larger. The address size determines the number of directly accessible memory locations, referred to as the address space or memory space. If the address size is M, then the address space is 2^M. Thus, a processor with a 16-bit PC can directly address 2^16 = 65,536 memory locations. We would typically refer to this address space as 64K, although if 1K = 1,000, this number would represent 64,000, not the actual 65,536. Thus, in computer-speak, 1K = 1,024.
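The address-space relationship just described is easy to check in code; a minimal sketch, assuming the address size stays below 64 bits:

```c
#include <stdint.h>

/* Number of directly addressable memory locations for an M-bit address.
 * Sketch only; assumes m < 64. */
uint64_t address_space(unsigned m) {
    return (uint64_t)1 << m;
}
```

address_space(16) yields 65,536 locations (64K with 1K = 1,024), and address_space(10) yields exactly 1,024.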
For each instruction, the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching operands, executing the instruction in the datapath, and storing results. Each stage may consist of one or more clock cycles. A clock cycle is usually the longest time required for data to travel from one register to another. The path through the datapath or controller that results in this longest time (e.g., from a datapath register through the ALU and back to a datapath register) is called the critical

path. The inverse of the clock cycle is the clock frequency, measured in cycles per second, or Hertz (Hz). For example, a clock cycle of 10 nanoseconds corresponds to a frequency of 1/(10 x 10^-9) Hz, or 100 MHz. The shorter the critical path, the higher the clock frequency. We often use clock frequency as a means of comparing processors, especially different versions of the same processor, with higher clock frequency implying faster program execution. However, using clock frequency is not always an accurate method for comparing processor speeds.
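The cycle-to-frequency conversion above is a one-line computation; a minimal sketch:

```c
/* Clock frequency (Hz) as the inverse of the clock cycle (in seconds). */
double clock_frequency_hz(double cycle_seconds) {
    return 1.0 / cycle_seconds;
}
```

For a 10-nanosecond cycle, clock_frequency_hz(10e-9) gives 100 MHz, matching the example in the text.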
Memory

While registers serve a processor's short-term storage requirements, memory serves the processor's medium- and long-term information-storage requirements. We can classify stored information as either program or data. Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output, and transformed by the program.

We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. In a Harvard architecture, the program memory space is distinct from the data memory space. Figure 3.2 illustrates these two methods. A Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.

Figure 3.2: Two memory architectures: (a) Harvard, (b) Princeton.

Figure 3.3: Cache memory. (The cache sits close to the processor; main memory uses a slower, cheaper technology, usually on a different chip.)

To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may be kept in a small but especially fast memory called cache, as illustrated in Figure 3.3. Cache memory often resides on-chip and often uses fast but expensive static RAM technology rather than slower but cheaper dynamic RAM (see Chapter 5). Cache memory is based on the principle that if at a particular time a processor accesses a particular memory location, then the processor will likely access that location and immediate neighbors of the location in the near future. Thus, when we first access a location in memory, we copy that location and some number of its neighbors (called a block) into cache, and then access the copy of the location in cache. When we access another location, we first check a cache table to see if a copy of the location resides in cache. If the copy does reside in cache, we have a cache hit, and we can read or write that location very quickly. If the copy does not reside in cache, we have a cache miss, so we must copy the location's block into cache, which takes a lot of time. Thus, for a cache to be effective in improving performance, the ratio of hits to misses must be very high, requiring intelligent caching schemes. Caches are used for both program memory (often called instruction cache, or I-cache) as well as data memory (often called D-cache).
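The importance of a high hit ratio can be quantified with a common average-access-time model; the formula and sample numbers below are a standard back-of-the-envelope sketch, not taken from this chapter.

```c
/* Average memory-access time, in cycles, given a cache hit ratio, the
 * hit access time, and the miss penalty. Illustrative model only. */
double avg_access_time(double hit_ratio, double hit_time, double miss_penalty) {
    return hit_time + (1.0 - hit_ratio) * miss_penalty;
}
```

With a 95% hit ratio, 1-cycle hits, and a 20-cycle miss penalty, the average access takes about 2 cycles; dropping the hit ratio to 50% pushes the average to about 11 cycles, which is why intelligent caching schemes matter.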
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM.

Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip memory.
3.3 Operation

Instruction Execution

We can think of a microprocessor's execution of instructions as consisting of several basic stages:

1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).

3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.

4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.

5. Store results: the task of writing a register into memory.

If each stage takes one clock cycle, then we can see that a single instruction may take several cycles to complete.

Pipelining

Pipelining is a common way to increase the instruction throughput of a microprocessor. We first make a simple analogy of two people approaching the chore of washing and drying eight dishes. In one approach, the first person washes all eight dishes, and then the second person dries all eight dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient, since at any time only one person is working and the other is idle. Obviously, a better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry. We refer to this latter approach as "pipelined."

Each dish is like an instruction, and the two tasks of washing and drying are like the five stages listed earlier. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction. The idea of pipelining is illustrated in Figure 3.4. Note that for pipelining to work well, instruction execution must be decomposable into roughly equal-length stages, and instructions should each require the same number of stages.

Figure 3.4: Pipelining: (a) nonpipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.
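The dish-washing arithmetic above generalizes to n items flowing through k equal-length stages of t time units each; a small sketch:

```c
/* Total time for n items through k stages of t time units each,
 * without and with pipelining (the dish/instruction analogy above). */
int nonpipelined_time(int n, int k, int t) { return n * k * t; }
int pipelined_time(int n, int k, int t)   { return (k + n - 1) * t; }
```

With 8 dishes and 2 one-minute stages, these give the 16 and 9 minutes computed above; with the five instruction stages, 8 instructions finish in 12 cycles instead of 40.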
Branches pose a problem for pipelining, since we don't know the next instruction until the current instruction has reached the execute stage. One solution is to stall the pipeline when a branch is in the pipeline, waiting for the execute stage before fetching the next instruction. An alternative is to guess which way the branch will go and fetch the corresponding instruction next; if right, we proceed with no penalty, but if we find out in the execute stage that we were wrong, we must ignore all instructions fetched since the branch was fetched, thus incurring a penalty. Pipelined microprocessors often have very sophisticated branch predictors built in.

Superscalar and VLIW Architectures

We can use multiple ALUs to further speed up a processor. A superscalar microprocessor can execute two or more scalar operations in parallel, requiring two or more ALUs. A scalar operation transforms one or two numbers, as opposed to vector or matrix operations that transform entire sets of numbers. Some superscalar microprocessors require that the instructions be ordered statically (at compile time), while others may reorder the instructions dynamically (during runtime) to make use of the additional ALUs. A VLIW (very long instruction word) architecture is a type of static superscalar architecture that encodes several (perhaps four or more) operations in a single machine instruction.

3.4 Programmer's View

A programmer writes the program instructions that carry out the desired functionality on the general-purpose processor. The programmer may not actually need to know detailed information about the processor's architecture or operation, but instead may deal with an architectural abstraction, which hides much of that detail. The level of abstraction depends on the level of programming. We can distinguish between two levels of programming. The first is assembly-language programming, in which one programs in a language representing processor-specific instructions as mnemonics. The second is structured-language programming, in which one programs in a language using processor-independent instructions. A compiler automatically translates those instructions to processor-specific instructions. Ideally, the structured-language programmer would need no information about the processor architecture, but in embedded systems, the programmer must usually have at least some awareness, as we shall discuss.

Actually, we can define an even lower programming level, machine-language programming, in which the programmer writes machine instructions in binary. This level has become extremely rare due to the advent of assemblers. Machine-language-programmed computers often had rows of lights representing to the programmer the current binary instructions being executed. Today's computers look more like boxes or refrigerators, but

they do not make for interesting movie props, so you may notice that in the movies, computers with rows of blinking lights live on.

Instruction Set

The assembly-language programmer must know the processor's instruction set. The instruction set describes the bit configurations allowed in the IR, indicating the atomic processor operations that the programmer may invoke. Each such configuration forms an assembly instruction, and a sequence of such instructions forms an assembly program, stored in a processor's memory, as illustrated in Figure 3.5.

Figure 3.5: Instructions stored in memory. (Each instruction consists of an opcode field and operand fields.)
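One common way to realize the opcode/operand layout of Figure 3.5 is to pack small bit fields into bytes. The format below, a 4-bit opcode and a 4-bit register operand packed into one byte, is our own illustrative assumption, not any real processor's encoding.

```c
#include <stdint.h>

/* Hypothetical instruction byte: 4-bit opcode in the high nibble,
 * 4-bit register operand in the low nibble. Illustrative format only. */
uint8_t encode(uint8_t opcode, uint8_t operand) {
    return (uint8_t)(((opcode & 0x0F) << 4) | (operand & 0x0F));
}
uint8_t opcode_of(uint8_t instr)  { return instr >> 4; }
uint8_t operand_of(uint8_t instr) { return instr & 0x0F; }
```

For instance, opcode 4 with register 2 packs into the byte 0x42, and the two helper functions recover the original fields.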
An instruction typically has two parts, an opcode field and operand fields. The opcode specifies the operation to take place during the instruction. We can classify instructions into three categories. Data-transfer instructions move data between memory and registers, between input/output channels and registers, and between registers themselves. Arithmetic/logical instructions configure the ALU to carry out a particular function, move data from the registers through the ALU, and move data from the ALU back to a particular register. Branch instructions determine the address of the next program instruction, based possibly on datapath status signals.

Branches can be further categorized as being unconditional jumps, conditional jumps, or procedure call and return instructions. Unconditional jumps always determine the address of the next instruction, while conditional jumps do so only if some condition evaluates to true, such as a particular register containing zero. A call instruction, in addition to indicating the address of the next instruction, saves the address of the current instruction so that a subsequent return instruction can jump back to the instruction immediately following the most recently invoked call instruction. This pair of instructions facilitates the implementation of procedure/function call semantics of high-level programming languages.

An operand field specifies the location of the actual data that takes part in an operation. Source operands serve as input to the operation, while a destination operand stores the output. The number of operands per instruction varies among processors. Even for a given processor, the number of operands per instruction may vary depending on the instruction type.

The operand field may indicate the data's location through one of several addressing modes, illustrated in Figure 3.6. In immediate addressing, the operand field contains the data itself. In register addressing, the operand field contains the address of a datapath register in which the data resides. In register-indirect addressing, the operand field contains the address of a register, which in turn contains the address of a memory location in which the data resides. In direct addressing, the operand field contains the address of a memory location in which the data resides. In indirect addressing, the operand field contains the address of a memory location, which in turn contains the address of a memory location in which the data resides. Those familiar with structured languages may note that direct addressing implements regular variables, and indirect addressing implements pointers. In inherent or implicit addressing, the particular register or memory location of the data is implicit in the opcode; for example, the data may reside in a register called the "accumulator." In indexed addressing, the direct or indirect operand must be added to a particular implicit register to obtain the actual operand address. Jump instructions may use relative addressing to reduce the number of bits needed to indicate the jump address. A relative address indicates how far to jump from the current address, rather than providing the complete address. Such addressing is very common, since most jumps are to nearby instructions.

Figure 3.6: Addressing modes. (For each mode, the operand field holds data, a register address, or a memory address; register-indirect and indirect modes add one further level of lookup.)
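The addressing modes of Figure 3.6 can be simulated directly. In this sketch, the register file, memory sizes, and contents are made-up illustrations.

```c
#include <stdint.h>

/* Simulation of five addressing modes against a small register file and
 * memory. Illustrative only; sizes and contents are our own. */
enum Mode { IMMEDIATE, REG_DIRECT, REG_INDIRECT, DIRECT, INDIRECT };

uint8_t regfile[4];
uint8_t mem[256];

uint8_t fetch_operand(enum Mode mode, uint8_t field) {
    switch (mode) {
    case IMMEDIATE:    return field;               /* field is the data        */
    case REG_DIRECT:   return regfile[field];      /* field names a register   */
    case REG_INDIRECT: return mem[regfile[field]]; /* register holds address   */
    case DIRECT:       return mem[field];          /* field is an address      */
    case INDIRECT:     return mem[mem[field]];     /* address of an address    */
    }
    return 0;
}
```

Note how direct addressing behaves like reading a variable, while indirect addressing behaves like following a pointer, matching the observation in the text.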

Ideally, the structured-language programmer would not need to know the instruction set of the processor. However, nearly every embedded system requires the programmer to write at least some portion of the program in assembly language. Those portions may deal with low-level input/output operations with devices outside the processor, like a display device.

Assembly instruction   First byte   Second byte   Operation
MOV Rn, direct         0000 Rn      direct        Rn = M(direct)
MOV direct, Rn         0001 Rn      direct        M(direct) = Rn
MOV @Rn, Rm            0010 Rn      Rm            M(Rn) = Rm
MOV Rn, #immed.        0011 Rn      immediate     Rn = immediate
ADD Rn, Rm             0100 Rn      Rm            Rn = Rn + Rm
SUB Rn, Rm             0101 Rn      Rm            Rn = Rn - Rm
JZ Rn, relative        0110 Rn      relative      PC = PC + relative (only if Rn is 0)

Figure 3.7: A simple (trivial) instruction set.

(a) C program:

    int total = 0;
    for (int i = 10; i != 0; i--)
        total += i;
    // next instructions...

(b) Equivalent assembly program:

    0:    MOV R0, #0;    // total = 0
    1:    MOV R1, #10;   // i = 10
    2:    MOV R2, #1;    // constant 1
    3:    MOV R3, #0;    // constant 0
    Loop: JZ R1, Next;   // Done if i == 0
    5:    ADD R0, R1;    // total += i
    6:    SUB R1, R2;    // i--
    7:    JZ R3, Loop;   // Jump always (R3 is always 0)
    Next:                // next instructions...

Figure 3.8: Sample programs: (a) C program, (b) equivalent assembly program.
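A tiny simulator makes the fetch-decode-execute behavior of this instruction set concrete. The sketch below executes the Figure 3.8(b) program; it uses a decoded structure-array representation of our own devising rather than the binary encoding of Figure 3.7, and treats the JZ operand as an instruction-relative offset.

```c
#include <stdint.h>

/* Minimal simulator for a subset of the Figure 3.7 instruction set.
 * The in-memory form below is our own sketch, not the book's encoding. */
enum Op { MOV_IMM, ADD, SUB, JZ };
struct Instr { enum Op op; int a, b; }; /* a, b: register numbers or values */

int run(const struct Instr *prog, int n) {
    int reg[4] = {0, 0, 0, 0};
    int pc = 0;
    while (pc < n) {
        struct Instr i = prog[pc++];
        switch (i.op) {
        case MOV_IMM: reg[i.a] = i.b; break;             /* Rn = immediate */
        case ADD:     reg[i.a] += reg[i.b]; break;       /* Rn = Rn + Rm   */
        case SUB:     reg[i.a] -= reg[i.b]; break;       /* Rn = Rn - Rm   */
        case JZ:      if (reg[i.a] == 0) pc += i.b - 1;  /* relative jump  */
                      break;
        }
    }
    return reg[0]; /* total */
}

/* Figure 3.8(b): sum the numbers 1 through 10 into R0. */
const struct Instr sum_prog[] = {
    {MOV_IMM, 0, 0},  /* 0: total = 0                   */
    {MOV_IMM, 1, 10}, /* 1: i = 10                      */
    {MOV_IMM, 2, 1},  /* 2: constant 1                  */
    {MOV_IMM, 3, 0},  /* 3: constant 0                  */
    {JZ, 1, 4},       /* 4: done if i == 0 (jump to 8)  */
    {ADD, 0, 1},      /* 5: total += i                  */
    {SUB, 1, 2},      /* 6: i--                         */
    {JZ, 3, -3},      /* 7: jump always back to 4       */
};
```

Running run(sum_prog, 8) leaves 55, the sum of 1 through 10, in R0, matching what the C program of Figure 3.8(a) computes.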
Such a device may require specific timing sequences of signals in order to receive data, and the programmer may find that writing assembly code achieves such timing most conveniently. A driver routine is a portion of a program written specifically to communicate with, or drive, another device. Since drivers are often written in assembly language, the structured-language programmer may still require some familiarity with at least a subset of the instruction set.

Figure 3.7 shows a (trivial) instruction set with four data-transfer instructions, two arithmetic instructions, and one branch instruction, for a hypothetical processor. Figure 3.8(a) shows a program written in C that adds the numbers 1 through 10. Figure 3.8(b) shows that same program written in assembly language using the given instruction set.

Program and Data Memory Space

The embedded systems programmer must be aware of the size of the available memory for program and for data. For example, a particular processor may have a 64K program space and a 64K data space. The programmer must not exceed these limits. In addition, the programmer will probably want to be aware of on-chip program and data memory capacity, taking care to fit the necessary program and data in on-chip memory if possible.

Registers

Assembly-language programmers must know how many registers are available for general-purpose data storage. They must also be familiar with other registers that have special functions. For example, a base register may exist, which permits the programmer to use a data-transfer instruction where the processor adds an operand field to the base register to obtain an actual memory address. Other special-function registers must be known by both the assembly-language and the structured-language programmer. Such registers may be used for configuring built-in timers, counters, and serial communication devices, or for writing and reading external pins.
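Base-register addressing as just described amounts to a single addition performed by the processor; a minimal sketch (the names are ours):

```c
#include <stdint.h>

/* Effective address for a hypothetical base register: the processor adds
 * the instruction's operand field to the base register's contents.
 * Illustrative only. */
uint32_t effective_address(uint32_t base_reg, uint16_t operand_field) {
    return base_reg + operand_field;
}
```

With the base register holding 0x1000 and an operand field of 0x20, the data transfer actually accesses memory location 0x1020.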


I/O

The programmer should be aware of the processor's input and output (I/O) facilities, with which the processor communicates with other devices. One common I/O facility is parallel I/O, in which the programmer can read or write a port (a collection of external pins) by reading or writing a special-function register. Another common I/O facility is a system bus, consisting of address and data ports that are automatically activated by certain addresses or types of instructions. I/O methods will be discussed further in Chapter 6.

Interrupts

An interrupt causes the processor to suspend execution of the main program and jump to an interrupt service routine (ISR) that fulfills a special, short-term processing need. In particular, the processor stores the current PC and sets it to the address of the ISR. After the ISR completes, the processor resumes execution of the main program by restoring the PC. The programmer should be aware of the types of interrupts supported by the processor (described in Chapter 6), and must write ISRs when necessary. The assembly-language programmer places each ISR at a specific address in program memory. The structured-language programmer must do so also; some compilers allow a programmer to force a procedure to


start at a particular memory location, while others recognize predefined names for particular ISRs.

For example, we may need to record the occurrence of an event from a peripheral device, such as the pressing of a button. We record the event by setting a variable in memory when that event occurs, although the user's main program may not process that event until later. Rather than requiring the user to insert checks for the event throughout the main program, the programmer merely writes an interrupt service routine and associates it with an input pin connected to the button. The processor will then call the routine automatically when the button is pressed.

Example: Assembly-Language Programming of Device Drivers

This example provides an application of assembly-language programming of a low-level driver, showing how the parallel port of a PC can be used to perform digital I/O. The code is given in Figure 3.9. Writing and reading three special registers accomplishes parallel communication on the PC. Those three registers are actually in an 8255A Peripheral Interface Controller chip. In unidirectional mode (the default power-on-reset mode), this device is capable of driving 12 output and 5 input lines. In Figure 3.10, we give the parallel port (known as LPT1) connector pin numbers and the corresponding register locations.

    CheckPort proc
        push ax              ; save the content
        push dx              ; save the content
        mov  dx, 3BCh + 1    ; base + 1 for register #1
        in   al, dx          ; read register #1
        and  al, 10h         ; mask out all but bit #4
        cmp  al, 0           ; is it 0?
        jne  SwitchOn        ; if not, we need to turn the LED on
    SwitchOff:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        and  al, 0feh        ; clear first bit (masking)
        out  dx, al          ; write it out to the port
        jmp  Done            ; we are done
    SwitchOn:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        or   al, 01h         ; set first bit (masking)
        out  dx, al          ; write it out to the port
    Done:
        pop  dx              ; restore the content
        pop  ax              ; restore the content
    CheckPort endp

    // defined in assembly above
    extern "C" CheckPort(void);
    void main(void) {
        while( 1 )
            CheckPort();
    }

Figure 3.9: PC parallel port example.

LPT connector pin   I/O direction   Register address
1                   Output          0th bit of register #2
2-9                 Output          0th-7th bit of register #0
10,11,12,13,15      Input           6,7,5,4,3rd bit of register #1
14,16,17            Output          1,2,3rd bit of register #2

Figure 3.10: PC parallel port signals and associated registers.
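The bit-masking idioms that CheckPort applies (and al,10h to test the switch bit, and al,0feh to clear the LED bit, or al,01h to set it) translate directly to C; a brief sketch:

```c
#include <stdint.h>

/* C equivalents of the masking idioms used by CheckPort in Figure 3.9. */
uint8_t get_bit4(uint8_t port)   { return port & 0x10; } /* and al,10h  */
uint8_t clear_bit0(uint8_t port) { return port & 0xFE; } /* and al,0feh */
uint8_t set_bit0(uint8_t port)   { return port | 0x01; } /* or  al,01h  */
uint8_t low_nibble(uint8_t d)    { return d & 0x0F; }    /* 00001111 mask */
```

For instance, 00001111 AND 10101010 yields 00001010, exactly the example given in the masking discussion below.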
A switch is connected to input pin number 13 of the parallel port. A light-emitting diode (LED) is connected to output pin number 2. Our program, running on the PC, should monitor the input switch and turn the LED on/off accordingly.

Figure 3.9 gives the code for such a program, in x86 assembly language. Note that the in and out assembly instructions read and write the internal registers of the 8255A. Both instructions take two operands, address and data. The address specifies the register we are trying to read or write. This address is calculated by adding the address of the device, called the base address, to the offset of the particular register as given in Figure 3.10. In most PCs, the base address of LPT1 is at 3BC hex (though not always). The second operand is the data. For the out instruction, the content of this 8-bit operand will be written to the addressed register. For the in instruction, the content of the addressed 8-bit register will be read into this operand.

The program makes use of masking, something quite common during low-level I/O. A mask is a bit pattern designed such that ANDing it with a data item D yields a specific part of D. For example, a mask of 00001111 can be used to yield bits 3 through 0 (e.g., 00001111 AND 10101010 yields 00001010). A mask of 00010000, or 10h in hexadecimal format, would yield bit 4.

In Figure 3.9, we have broken our program into two source files, assembly and C. The assembly program implements the low-level I/O to the parallel port, and the C program implements the high-level application.

Operating System

An operating system is a layer of software that provides low-level services to the application layer, a set of one or more programs executing on the CPU, consuming and producing input and output data. The task of managing the application layer involves the loading and executing of programs, sharing and allocating system resources to these programs, and
    file_name:  DB "out.txt"    -- store file name

        MOV R0, 1324        -- system call "open" id
        MOV R1, file_name   -- address of file-name
        INT 34              -- cause a system call
        JZ  R0, L1          -- if zero -> error

        . . .               -- read the file

        JMP L2              -- bypass error condition
    L1:
        . . .               -- handle the error
    L2:

Figure 3.11: System call invocation.
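The dispatch that the OS performs behind Figure 3.11's INT instruction can be mimicked with an ordinary C function. This is an illustrative simulation only: the "open" id 1324 comes from the figure, but the second id, the handler behavior, and the returned values are made up, and a real system-call interface is processor- and OS-specific.

```c
/* Illustrative simulation of system-call dispatch: R0 selects the
 * service, R1 carries its argument, and the return value plays the
 * role of R0 after the software interrupt. Ids/handlers are made up
 * except 1324, which appears in Figure 3.11. */
#define SYSCALL_OPEN  1324  /* id used in Figure 3.11 */
#define SYSCALL_CLOSE 1325  /* hypothetical id */

static int os_open(const char *name)  { return name != 0 ? 3 : 0; } /* fake fd; 0 = error */
static int os_close(const char *name) { (void)name; return 1; }

int software_interrupt(int r0, const char *r1) { /* plays the role of INT 34 */
    switch (r0) {
    case SYSCALL_OPEN:  return os_open(r1);
    case SYSCALL_CLOSE: return os_close(r1);
    default:            return 0; /* unknown service -> error */
    }
}
```

As in the figure, the caller then tests the returned value for zero to detect an error.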

protecting these allocated resources from corruption by non-owner programs. One of the most Profiler
important resource of a system is the central processing unit (CPLJ?, which i~ typically s~ed
among a number of executing programs. The operating system 1s responsible for deciding
what program is to run next on the CPU and for how long. This is called process (or task)
;
, I
scheduling and it is determined by the operating system's preemption policy. Another very ~~~~°-~-~-~ ---··-····-'!_______
important resource is memory, including disk storage, which is also shared among the Figure 3.12: Soft_ware deve1°1'.ment pro":"s/'
applications running on the CPU. .
In addition to implementing an environment for management of high-level applica~on
programs, the operating system provides the software required for sen:icing vario~ . - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
hardware-intenupts, and provides device drivers for driving the periphe~hdealv1ceds ~resent mh 3.5
the system. Typically, on startup, an operating system initializes all penp er eVtces, sue Development Environment
as disk controllers, timers, and inpul/output devices and installs hardware interrupt service In this section, we take a look at the general software design tools that are used by embedded
routines (ISRs) to handle various signals generated by these devices. Then it installs software system designers in design, test, and debugging of embedded software.
intenupts (intenupts generated by the software) to process system calls (calls made by
high-level applications to request opernting system services) as described next. . . Design Flow and Tools
. A system call is a mechanism for an application to invoke the operatmg system. It IS .
Several software and hardware tools -commonly support the programming of general-purpose

If
analogous to a procedure or function call, as in high-level programmjng lan~ges. When a
processors. First, we must distinguish between two processors we deal with when developing
program requires some service from the operating system, it generates a predefined software
an embedded system. One processor is the development processor, on which we write and
. intenupt that is serviced by the operating system. Parameters specific to the requested
debug our program. This processor is part of our desktop computer. The other processor is the
· services are typically passed from (to) the application program to (from) the operating system
target processor, to which we will send our program and which will form part of our
through CPU registers. Figure 3.11 illustrates how the file "open" system call may be
embedded system's implementation. For example, we may develop our system on a Pentiwn
invoked, in assembly, by a program. Languages like C and Pascal provide ~apper functions
processor but use a Motorola 68HC 11 as our target processor. Of course, sometimes the two
around the system-calls to provide a high-level mechanism for performing system calls.
In summary, the operating system abstracts away the details of the underlying hardware and provides the application layer an interface to the hardware through the system call mechanism.

3.5 Development Environment

This section describes the design flow and tools that assist embedded system designers in design, test, and debugging of embedded software.

Design Flow and Tools

Several software and hardware tools commonly support the programming of general-purpose processors. First, we must distinguish between two processors we deal with when developing an embedded system. One processor is the development processor, on which we write and debug our program. This processor is part of our desktop computer. The other processor is the target processor, to which we will send our program and which will form part of our embedded system's implementation. For example, we may develop our system on a Pentium processor but use a Motorola 68HC11 as our target processor. Of course, sometimes the two processors happen to be the same, but this is mostly a coincidence.

Programming of an embedded system's processor is similar to writing a program that runs on your desktop computer, with some subtle but important differences. Figure 3.12 depicts the standard software development process.
68 Embedded System Design | Embedded System Design 69

Chapter 3: General-Purpose Processors: Software | 3.5: Development Environment

The general design flow for programming applications that run on a desktop computer begins with writing our source code, possibly organized in a number of files for modularity, using an editor. Then we compile or assemble the code in each file, using a compiler or assembler, into corresponding binary files. Next, using a linker, we combine these binary files into a final executable. These steps, collectively, can be considered the implementation phase. Next, we test our program by running the executable file under the command of a debugger. Sometimes we may use a profiler to pinpoint performance bottlenecks of our program. During this phase, if we discover errors or performance bottlenecks, we return to the implementation phase, make improvements, and repeat the process.

[Figure 3.12 (diagram not reproduced): the software development process, showing the development processor, external tools, an implementation phase, and a verification phase.]

Typically, all of these tools have been combined into a single integrated development environment (IDE), which greatly simplifies the design process. Figure 3.13(b) shows the design flow for embedded software development, in contrast to the design flow for desktop applications in Figure 3.13(a). Here, the implementation phase, which is the process of editing, compiling, assembling, and linking our program, is the same as that used for desktop software development. However, the verification phase (i.e., the process of testing the final executable) is greatly different in embedded system design. In the following paragraphs, we briefly describe each of the development tools involved.

[Figure 3.13 (diagram not reproduced): Software design process: (a) desktop, (b) embedded. Each flow shows an implementation phase followed by a verification phase.]

Assemblers translate assembly instructions to binary machine instructions. In addition to just replacing opcode and operand mnemonics by binary equivalents, an assembler may also translate symbolic labels into actual addresses. For example, a programmer may add a symbolic label END to an instruction A and may reference END in a branch instruction. The assembler determines the actual binary address of A and replaces references to END by this address. The mapping of assembly instructions to machine instructions is one-to-one.
Compilers translate structured programs into machine (or assembly) programs. Structured programming languages possess high-level constructs that greatly simplify programming, such as loop constructs, so each high-level construct may translate to several or tens of machine instructions. Compiler technology has advanced tremendously over the past decades, applying numerous program optimizations, often yielding very size- and performance-efficient code. A cross compiler executes on one processor (our development processor) but generates code for a different processor (our target processor). Cross compilers are extremely common in embedded system development.

A linker allows a programmer to create a program in separately assembled or compiled files; it combines the machine instructions of each into a single program, perhaps incorporating instructions from standard library routines. A linker designed for embedded processors will also try to eliminate binary code associated with uncalled procedures and functions, as well as memory allocated to unused variables, in order to reduce the overall program footprint.

Example: Instruction-Set Simulator for a Sample Processor

An instruction-set simulator is a program that runs on one processor and executes the instructions of another processor. In this example, we design an instruction-set simulator for the simple processor of Figure 3.7. Our program takes as input a file name containing binary instructions of our simple processor. The code for this instruction-set simulator is given in Figure 3.14.

Testing and Debugging

Generally, the testing and debugging phase of developing programs is a major part of the overall design process. This is especially true when the program is being developed to run in an embedded system. For example, it is not acceptable for your car's engine management system to require occasional rebooting because of a software hang-up. Programming is an error-prone activity, and it is inevitable that there will exist errors and bugs in writing any reasonably large program. The most common method of verifying the correctness of a program is running (executing) it with ample input data that check the program's behavior, especially using boundary cases. This is relatively easy to do when developing programs that run on your desktop computer.


#include <stdio.h>

typedef struct {
   unsigned char first_byte, second_byte;
} instruction;

instruction program[1024];    // this is our instruction memory
unsigned char memory[256];    // this is our data memory

int run_program(int num_bytes) {
   int pc = -1;
   unsigned char reg[16], fb, sb;
   while( ++pc < (num_bytes / 2) ) {
      fb = program[pc].first_byte;
      sb = program[pc].second_byte;
      switch( fb >> 4 ) {
         case 0: reg[fb & 0x0f] = memory[sb]; break;
         case 1: memory[sb] = reg[fb & 0x0f]; break;
         case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
         case 3: reg[fb & 0x0f] = sb; break;
         case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
         case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
         case 6: if( reg[fb & 0x0f] == 0 ) pc += sb; break;
         default: return -1;
      }
   }
   return 0;
}

int main(int argc, char *argv[]) {
   FILE *ifs;
   if( argc != 2 || (ifs = fopen(argv[1], "rb")) == NULL ) return -1;
   if( run_program(fread(program, 1, sizeof(program), ifs)) == 0 ) {
      print_memory_contents();   // prints data memory; definition not shown
      return(0);
   }
   else return(-1);
}

Figure 3.14: Instruction-set simulator implementation.

For embedded system programmers, this task is a little more challenging. Specifically, a program running in an embedded system most often needs to be real-time. For example, our engine management program must generate pulses that actuate the fuel injectors with a timely and calculated pattern. A distinguishing characteristic of a real-time system is that it must compute correct results within a predetermined amount of time, while a non-real-time system only needs to compute correct results. In addition, a program running in an embedded system works in conjunction with many other components of that system as well as interacts with the environment where the embedded system is to function. Hence, debugging a program running in an embedded system requires having control over time, as well as control over the environment and the ability to trace or follow the execution of the program, in order to detect errors. In the remaining paragraphs, we take a look at some tools and methods to help us do just that. These tools, for the most part, enable us to execute and observe the behavior of our programs.

Debuggers help programmers evaluate and correct their programs. They run on the development processor and support stepwise program execution, executing one instruction and then stopping, proceeding to the next instruction when instructed by the user. They permit execution up to user-specified breakpoints, which are instructions that, when encountered, cause the program to stop executing. Whenever the program stops, the user can examine values of various memory and register locations. A source-level debugger enables step-by-step execution in the source program language, whether assembly language or a structured language. A good debugging capability is crucial, as today's programs can be quite complex and hard to write correctly. Some debuggers run on the development processor but execute code designed for the target processor; they mimic, or emulate, the function of the target processor. These debuggers are also known as instruction-set simulators (ISSs) or virtual machines (VMs).

Emulators support the debugging of the program while it executes on the target processor. An emulator typically consists of a debugger coupled with a board connected to the desktop processor via a cable. The board consists of the target processor plus some support circuitry (often another processor). The board may have another cable with a device having the same pin configuration as the target processor, allowing one to plug the device into a real embedded system. Such an in-circuit emulator enables one to control and monitor the program's execution directly in the system. In-circuit emulators are available for nearly any processor intended for embedded use, although they can be quite expensive if they are to run at real speeds.

Device programmers download a binary machine program from the development processor's memory into the target processor's memory. Once the target processor has been programmed, the entire embedded system can be tested in its most realistic form (i.e., it can be executed in its environment and the behavior observed in a realistic way). For example, a car equipped with our engine management system can be taken out for a drive!

Revisiting Figure 3.12, we see that programs intended for embedded systems can be tested in three ways, namely, debugging using an ISS, emulation using an emulator, and field testing by downloading the program directly into the target processor. The difference between these three methods is as follows. The design cycle using a debugger based on an ISS running on the development computer is fast, but it is inaccurate since it can only interact with the rest of the system and the environment to a limited degree. The design cycle using an emulator is a little longer, since the code must be downloaded into the emulator hardware; however, the emulator hardware can interact with the rest of the system, hence can allow for more accurate testing. The design cycle using a programmer to download the program into the target processor is the longest of all. Here, the target processor must be removed from its system and put into the programmer, programmed, and returned to the system. However, this method enables the system to interact with its environment most freely, hence provides the highest execution accuracy but little debug control.



The availability of low-cost or high-quality development environments for a processor often heavily influences the choice of a processor.

3.6 Application-Specific Instruction-Set Processors (ASIPs)

Today's embedded applications, such as high-definition TV, require high computing power and very specific functionality. The performance, power, cost, or size demands of these applications cannot always be dealt with efficiently by using general-purpose processors. Nonetheless, the inflexibility of custom single-purpose processors is often too prohibitive. A solution is to use an instruction-set processor that is specific to that application or application domain. Because these ASIPs are instruction-set processors, they can be programmed by writing software, resulting in short time-to-market and good flexibility, while the performance and other constraints may be efficiently satisfied.

As with most other aspects of embedded systems design, there is a trade-off here. Instruction-set processors and the associated software tools (compilers, linkers, etc.) are very expensive to develop; therefore, they are expensive to integrate into low-cost embedded systems. In contrast, the large applicability and resulting cost amortization of general-purpose processors make them very cost-effective solutions in most embedded systems. ASIPs tend to come in three major varieties, namely, microcontrollers, which are specific to applications that perform a large amount of control-oriented tasks, digital signal processors (DSPs), which are specific to applications that process large amounts of data, and everything else, which are less-general ASIPs.

Microcontrollers

Numerous processor IC manufacturers market devices specifically for the control-dominated embedded systems domain. These devices may include several features. First, they may include several peripheral devices, such as timers, analog-to-digital converters, and serial communication devices, on the same IC as the processor. Second, they may include some program and data memory on the same IC. Third, they may provide the programmer with direct access to a number of pins of the IC. Fourth, they may provide specialized instructions for common embedded system control operations, such as bit-manipulation operations. A microcontroller is a device possessing some or all of these features.

Incorporating peripherals and memory onto the same IC reduces the number of required ICs, resulting in compact and low-power implementations. Providing pin access allows programs to easily monitor sensors, to set actuators, and to transfer data with other devices. Providing specialized instructions improves performance for embedded systems applications. Thus, microcontrollers can be considered ASIPs to some degree.

Many manufacturers market devices referred to as "embedded processors." The difference between embedded processors and microcontrollers is not clear, although we note that the former term seems to be used more for large (32-bit) processors.
Digital Signal Processors (DSPs)

Digital signal processors (DSPs) are processors that are highly optimized for processing large amounts of data. The source of this large amount of data is some form of digitized signal, like a photo image captured by a digital camera, a voice packet going through a network router, or an audio clip played by a digital keyboard. A DSP may contain numerous register files, memory blocks, multipliers, and other arithmetic units. In addition, DSPs often provide instructions that are central to digital signal processing, such as filtering and transforming vectors or matrices of data. In a DSP, frequently used arithmetic functions, such as multiply-and-accumulate, are implemented in hardware and thus execute orders of magnitude faster than a software implementation running on a general-purpose processor. In addition, DSPs may allow for execution of some functions in parallel, resulting in a boost in performance.

As with microcontrollers, DSPs also tend to incorporate many peripherals that are useful in signal processing on a single IC. As an example, a DSP device may contain a number of analog-to-digital and digital-to-analog converters, pulse-width modulators, direct-memory-access controllers, timers, and counters.

Many companies offer a variety of commonly used DSPs that are well supported in terms of compiler and other development tools, making them easy and cheap to integrate into most embedded systems.
Less-General ASIP Environments

In contrast to microcontrollers and DSPs, which can be used in a variety of embedded systems, IC manufacturers have designed ASIPs that are less general in nature. These ASIPs are designed to perform some very domain-specific processing while allowing some degree of programmability. For example, an ASIP designed for networking hardware may be designed to be programmable with different network routing, checksum, and packet-processing protocols.

3.7 Selecting a Microprocessor

The embedded system designer must select a microprocessor for use in an embedded system. The choice of a processor depends on technical and nontechnical aspects. From a technical perspective, one must choose a processor that can achieve the desired speed within certain power, size, and cost constraints. Nontechnical aspects may include prior expertise with a processor and its development environment, special licensing arrangements, and so on.

Speed is a particularly difficult processor aspect to measure and compare. We could compare processor clock speeds, but the number of instructions per clock cycle may differ greatly among processors. We could instead compare instructions per second, but the complexity of each instruction may also differ greatly among processors. For example, one processor may require 100 instructions while another processor may require 300 instructions to perform the same computation.


Processor          | Clock     | Peripherals                                  | Bus Width | MIPS  | Power   | Transistors | Price
General-purpose processors:
Intel PIII         | 1 GHz     | 2x16K L1, 256K L2, MMX                       | 32        | ~900  | 97 W    | ~7M         | $900
IBM PowerPC 750X   | 550 MHz   | 2x32K L1, 256K L2                            | 32/64     | ~1300 | 5 W     | ~7M         | $900
MIPS R5000         | 250 MHz   | 2x32K, 2-way set assoc.                      | 32/64     | NA    | NA      | 3.6M        | NA
StrongARM SA-110   | 233 MHz   | None                                         | 32        | 268   | 1 W     | 2.1M        | NA
Microcontrollers:
Intel 8051         | 12 MHz    | 4K ROM, 128 RAM, 32 I/O, Timer, UART         | 8         | ~1    | ~0.2 W  | ~10K        | $7
Motorola 68HC811   | 3 MHz     | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI    | 8         | ~0.5  | ~0.1 W  | ~10K        | $5
Digital signal processors:
TI C5416           | 160 MHz   | 128K SRAM, 3 T1 ports, DMA, 13 ADC, 9 DAC    | 16/32     | ~600  | NA      | NA          | $34
Lucent DSP32C      | 80 MHz    | 16K Inst., 2K Data, Serial Ports, DMA        | 32        | 40    | NA      | NA          | $75

Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Web sites/datasheets; Embedded Systems Programming, Nov. 1998.

Figure 3.15: General-purpose processors.

One attempt to provide a means for a fairer comparison is the Dhrystone benchmark. A benchmark is a program intended to be run on different processors to compare their performance. The Dhrystone benchmark was originally developed in 1984 by Reinhold Weicker specifically as a performance benchmark; it performs no useful work but focuses on exercising a processor's integer arithmetic and string-handling capabilities. Its current version is written in C and is in the public domain. Because most processors can execute it in milliseconds, it is typically executed thousands of times, and thus a processor is said to be able to execute so many Dhrystones per second.

Another commonly used speed comparison unit, which happens to be based on the Dhrystone, is MIPS. One might think that MIPS simply means millions of instructions per second, but actually the common use of the term is based on a somewhat more complex notion. Specifically, its origin is based on the speed of Digital's VAX 11/780, thought to be the first computer able to execute one million instructions per second. A VAX 11/780 could execute 1,757 Dhrystones/second. Thus, for a VAX 11/780, 1 MIPS = 1,757 Dhrystones/second. This unit for MIPS is the one commonly used today, and it is sometimes referred to as Dhrystone MIPS. So if a machine today is said to run at 750 MIPS, that actually means it can execute 750 * 1,757 = 1,317,750 Dhrystones/second.

The use and validity of benchmark data is a subject of great controversy. There is also a clear need for benchmarks that measure performance of embedded processors. An effort underway in this area is EEMBC (pronounced "embassy"), the EDN Embedded Benchmark Consortium. The EEMBC has five benchmarking suites of programs corresponding to different embedded applications: automotive/industrial, consumer electronics, networking, office automation, and telecommunications. Each suite consists of several common algorithms found in the suite's application area. For example, two of the programs in the consumer electronics suite are JPEG compression and decompression (JPEG is a standard for still digital image compression). Another program in that suite involves infrared signal transmission and reception.

Numerous general-purpose processors have evolved in recent years and are in common use today. In Figure 3.15, we summarize some of the features of several popular processors.

3.8 General-Purpose Processor Design

A general-purpose processor is really just a single-purpose processor whose purpose is to process instructions stored in a program memory. Therefore, we can design a general-purpose processor using the single-purpose processor design technique described in Chapter 2. While real microprocessors intended for mass production are more commonly designed using custom methods rather than the general technique of this section, using the general technique here may prove a useful exercise that will illustrate the basic unity between single-purpose and general-purpose processors.

Suppose we want to design a general-purpose processor having the basic architecture of Figure 3.1 and supporting the instruction set of Figure 3.7. We can begin by creating the FSMD shown in Figure 3.16(a), which describes the desired processor's behavior. The FSMD declares several variables for storage: a 16-bit program counter PC, a 16-bit instruction register IR, a 64K x 16 bit memory M, and a 16 x 16 bit register file RF. The FSMD's initial state, Reset, clears PC to 0. The Fetch state reads M[PC] into IR. The Decode state does nothing but adds the extra cycle necessary for IR to get updated so we can then read it on an arc. Each arc leaving the Decode state detects a particular instruction opcode, causing a transition to the corresponding execute state for that opcode.

Each execute state, like Mov1, Add, and Jz, carries out the actual instruction operations by moving data between storage devices, modifying data, or updating PC.

[Figure 3.16: A simple microprocessor: (a) FSMD, (b) FSM operations that replace the FSMD operations after we create the datapath of Figure 3.17. Recoverable content: Declarations: bit PC[16], IR[16]; bit M[64k][16], RF[16][16]. Aliases: op = IR[15..12], rn = IR[11..8], rm = IR[7..4], dir/imm/rel = IR[7..0]. States: Reset (PC = 0); Fetch (IR = M[PC]; PC = PC + 1); Decode; then one execute state per opcode: Mov1 (RF[rn] = M[dir]), Mov2 (M[dir] = RF[rn]), Mov3 (M[RF[rn]] = RF[rm]), Mov4 (RF[rn] = imm), Add (RF[rn] = RF[rn] + RF[rm]), Sub (RF[rn] = RF[rn] - RF[rm]), Jz (PC = (RF[rn] == 0) ? rel : PC), each returning to Fetch. In (b), each operation becomes control-signal settings such as PCclr=1; Ms=10, Mre=1, PCinc=1; RFwa=rn, RFwe=1, RFs=01; RFr1a=rn, RFr1e=1; ALUs=00; PCld=ALUz.]

We can now build a datapath that can carry out the operation of this FSMD, as described in Chapter 2. The datapath we create using the following steps is shown in Figure 3.17. The first step is to instantiate a storage device for each declared variable, so we instantiate registers PC and IR, memory M, and register file RF. The second step is to instantiate functional units to carry out the FSMD operations. We'll use a single ALU capable of carrying out all the operations. The third step is to add the connections among the components' ports as required by the FSMD operations, adding multiplexors when there is more than one connection being input to a port. Finally, we create unique identifiers for every control signal.

[Figure 3.17: Architecture of a simple microprocessor. Recoverable content: a control unit (controller with next-state and control-output logic and a state register) driving a datapath that contains PC, IR, the 16-word register file RF, the ALU, memory read/write ports, and multiplexors, via control signals including PCld, PCinc, PCclr, Ms, Mre, Mwe, RFwa, RFwe, RFs, RFr1a/RFr1e, RFr2a/RFr2e, ALUs, and ALUz.]

Given this datapath, we can now rewrite the FSMD as an FSM representing the datapath's controller. Each FSMD operation must be replaced by binary operations on control signals, as shown in Figure 3.16(b). The states and arcs are identical for the FSMD and FSM, and only the operations change, so we do not redraw the states and arcs in the figure. As an example of operation replacement, we replace the assignment PC = 0 in state Reset by the control signal setting PCclr = 1.

We can use the FSM design technique of Chapter 2 to design a controller, consisting of a state register and next-state/control logic. We omit this step here.


Having just designed a simple general-purpose processor using the same technique we used to design a single-purpose processor, we can see the similarity between the two processor types. The key difference is that a single-purpose processor puts the "program" inside of its control logic, whereas a general-purpose processor keeps it in an external memory. So the program of a single-purpose processor cannot be changed once the processor has been implemented. But nevertheless, both processor types process programs. A second difference is that we design the datapath in a general-purpose processor without knowledge of what program will be put in the memory, whereas we know this program in a single-purpose processor. So the datapath of a single-purpose processor can be optimized to the program. We see that single-purpose and general-purpose processors both implement programs. Though they may differ in terms of design metrics like flexibility, power, performance, and cost, they fundamentally do the same thing.

3.9 Summary

General-purpose processors are popular in embedded systems due to several features, including low unit cost, good performance, and low NRE cost. A general-purpose processor consists of a controller and datapath, with a memory to store program and data. To use a general-purpose processor, the embedded system designer must write a program. The designer may write some parts of this program, such as driver routines, using assembly language, while writing other parts in a structured language. Thus, the designer should be aware of several aspects of the processor being used, such as the instruction set, available memory, registers, I/O facilities, and interrupt facilities. Many tools exist to support the designer, including assemblers, compilers, debuggers, device programmers, and emulators. The designer often makes use of microcontrollers, which are processors specifically targeted to embedded systems. These processors may include on-chip peripheral devices and memory, additional I/O ports, and instructions supporting common embedded system operations. The designer has a variety of processors from which to choose.

3.10 References and Further Reading

• Philips Semiconductors, 80C51-Based 8-Bit Microcontrollers Databook, Philips Electronics North America, 1994. Provides an overview of the 8051 architecture and on-chip peripherals, describes a large number of derivatives each with various features, describes the I2C and CAN bus protocols, and highlights development support tools.
• Rafiquzzaman, Mohamed. Microprocessors and Microcomputer-Based System Design. Boca Raton: CRC Press, 1995. Provides an overview of general-purpose processor architecture, along with detailed descriptions of various Intel 80xx and Motorola 68000 series processors.
• Embedded Systems Programming, Miller Freeman Inc., San Francisco, 1999. A monthly publication covering trends in various aspects of general-purpose processors for embedded systems, including programming, compilers, operating systems, emulators, device programmers, microcontrollers, PLDs, and memories. An annual buyer's guide provides tables of vendors for these items, including 8/16/32/64-bit microcontrollers/microprocessors and their features.
• Microprocessor Report, MicroDesign Resources, California, 1999. A monthly report providing in-depth coverage of trends, announcements, and technical details, for desktop, mobile, and embedded microprocessors.
• www.eembc.org. The Web site for the EEMBC benchmark consortium.
• SIGPLAN Notices 23, 8 (Aug. 1988), 49-62. Provides source for the Dhrystone benchmark version 2. Online source can be found at ftp.nosc.mil:pub/aburto.

3.11 Exercises

3.1 Describe why a general-purpose processor could cost less than a single-purpose processor you design yourself.
3.2 Detail the stages of executing the MOV instructions of Figure 3.7, assuming an 8-bit processor and a 16-bit IR and program memory following the model of Figure 3.1. For example, the stages for the ADD instruction are (1) fetch M[PC] into IR, (2) read Rn and Rm from register file through ALU configured for ADD, storing results back in Rn.
3.3 Add one instruction to the instruction set of Figure 3.7 that would reduce the size of our summing assembly program by 1 instruction. Hint: add a new branch instruction. Show the reduced program.
3.4 Create a table listing the address spaces for the following address sizes: (a) 8-bit, (b) 16-bit, (c) 24-bit, (d) 32-bit, (e) 64-bit.
3.5 Illustrate how program and data memory fetches can be overlapped in a Harvard architecture.
3.6 Read the entire problem before beginning. (a) Write a C program that clears an array "short int M[256]." In other words, the program sets every location to 0. Hint: your program should only be a couple lines long. (b) Assuming M starts at location 256 (and hence ends at location 511), write the same program in assembly language using the earlier instruction set. (c) Measure the time it takes you to perform parts a and b, and report those times.
3.7 Acquire a databook for a microcontroller. List the features of the basic version of that microcontroller, including key characteristics of the instruction set (number of instructions of each type, length per instruction, etc.), memory architecture and available memory, general-purpose registers, special-function registers, I/O facilities, interrupt facilities, and other salient features.
3.8 For the microcontroller in the previous exercise, create a table listing five existing variations of that microcontroller, stressing the features that differ from the basic version.

CHAPTER 4: Standard Single-Purpose Processors: Peripherals

4.1 Introduction
4.2 Timers, Counters, and Watchdog Timers
4.3 UART
4.4 Pulse Width Modulators
4.5 LCD Controllers
4.6 Keypad Controllers
4.7 Stepper Motor Controllers
4.8 Analog-to-Digital Converters
4.9 Real-Time Clocks
4.10 Summary
4.11 References and Further Reading
4.12 Exercises

4.1 Introduction
A single-purpose processor is a digital system intended to solve a specific computation task, as opposed to a general-purpose processor, which is intended to solve a wide variety of computation tasks. The single-purpose processor may be a custom one that we design ourselves, as discussed in Chapter 2. However, some computation tasks are so common that standard single-purpose processors have evolved. These processors can be purchased "off the shelf." The manufacturer of such an off-the-shelf processor sells the device in large quantities.

An embedded system designer choosing to use a standard single-purpose processor to implement a specific computation task, as opposed to choosing to design a custom single-purpose processor, may achieve several benefits. First, NRE cost will be low, since the processor is predesigned. Second, unit cost may be low, since the standard processor is mass-produced and hence the manufacturer can amortize NRE costs.


Using a standard single-purpose processor also provides benefits over using a general-purpose processor. Performance may be faster, power may be lower, and size may be smaller, all due to the fact that the standard single-purpose processor is customized for the particular task. Even if a general-purpose processor will exist in a system, adding single-purpose processors can free the general-purpose processor for other tasks.

There are of course trade-offs. If we are already using a general-purpose processor, then implementing a task on an additional single-purpose processor rather than in software may add to the system size and power consumption.

In this chapter, we describe the basic functionality of several standard single-purpose processors commonly found in embedded systems. The level of detail of the description is intended to be enough to enable use of such processors, but not necessarily the design of one. We refer to standard single-purpose processors as peripherals because they usually exist on the periphery of the CPU. However, microcontrollers tightly integrate these peripherals with the CPU, often placing them on-chip, and even assigning peripheral registers to the CPU's own register space. The result is the common term on-chip peripherals, which some may consider somewhat of an oxymoron.

[Figure 4.1: Timer structures: (a) a basic timer, (b) a timer/counter, (c) a timer with a terminal count, (d) a 16/32-bit timer, (e) a timer with a prescaler.]
4.2 Timers, Counters, and Watchdog Timers

Timers and Counters

A timer is an extremely common peripheral device that can measure time intervals. Such a device can be used to either generate events at specific times, or to determine the duration between two external events. Example applications that require generating events include keeping a traffic light green for a specified duration, or communicating bits serially between devices at a specific rate. An example of an application that determines inter-event duration is that of computing a car's speed by measuring the time the car takes to pass over two separated sensors in a road.

A timer measures time by counting pulses that occur on an input clock signal having a known period. For example, if a particular clock's period is 1 microsecond, and we've counted 2,000 pulses on the clock signal, then we know that 2,000 microseconds have passed.

A counter is a more general version of a timer. Instead of counting clock pulses, a counter counts pulses on some other input signal. For example, a counter may be used to count the number of cars that pass over a road sensor, or the number of people that pass through a turnstile. We often combine counters and timers to measure rates, such as counting the number of times a car wheel rotates in one second, in order to determine a car's speed.

To use a timer, we must configure its inputs and monitor its outputs. Such use often requires, or can be greatly aided by, an understanding of the internal structure of the timer. The internal structure can vary greatly among manufacturers. We provide a few common features of such internal structures in Figure 4.1.

Figure 4.1(a) provides the structure of a very simple timer. This timer has an internal 16-bit up counter, which increments its value on each clock pulse. Thus, the output value cnt represents the number of pulses since the counter was last reset to zero. To interpret this number as a time interval, we must know the frequency or period of the clock signal clk. For example, suppose we wish to measure the time that passes between two button presses. In this case, we could reset the timer on the occurrence of the first press, and then read the timer output on the second press. Suppose the frequency of clk were 100 MHz, meaning the period would be 1 / (100 MHz) = 10 nanoseconds, and that cnt = 20,000 at the time of the second button press. We would then compute the time that passed between the first and second button presses as 20,000 * 10 nanoseconds = 200 microseconds. We note that since this timer's counter can count from 0 to 65,535, this particular timer has a measurement range of 0 to 65,535 * 10 nanoseconds = 655.35 microseconds, with a resolution of 10 nanoseconds. We define a timer's range as the maximum time interval the timer can measure. A timer's resolution is the minimum interval it can measure.

The timer in Figure 4.1(a) has an additional output top that indicates when the top value of its range has been reached, also known as an overflow occurring, in which case the timer rolls over to 0. When we use a timer in conjunction with a general-purpose processor, and we expect time intervals to exceed the timer range, we typically connect the top signal to an interrupt pin on the processor. We create a corresponding interrupt service routine that counts the number of times the routine is called, thus effectively extending the range we can measure. Many microcontrollers that include built-in timers will have special interrupts just for their timers, with these interrupts distinct from external interrupts.

Figure 4.1(b) provides the structure of a more advanced timer that can also be configured as a counter. A mode register holds a bit, which the user sets, that uses a 2 x 1 multiplexor to


select the clock input to the internal 16-bit up counter. The clock input can be the external clk signal, in which case the device acts like a timer. Alternatively, the clock input can be the external cnt_in signal, in which case the device acts like a counter, counting the occurrences of pulses on cnt_in. cnt_in would typically be connected to an external sensor, so pulses would occur at indeterminate intervals. In other words, we could not measure time by counting such pulses.

Figure 4.1(c) provides the structure of a timer that can inform us whenever a particular interval of time has passed. A terminal count register holds a value, which the user sets, indicating the number of clock cycles in the desired interval. This number can be computed using the simple formula:

    number of clock cycles = desired time interval / clock period

For example, to obtain a duration of 3 microseconds from a clock cycle of 10 nanoseconds (100 MHz), we must count: 3 x 10^-6 s / 10 x 10^-9 s/cycle = 300 cycles. The timer structure includes a comparator that asserts its top output when the terminal count has been reached. This top output is not only used to reset the counter to 0, but also serves to inform the timer user that the desired time interval has passed. As mentioned earlier, we often connect this signal to an interrupt. The corresponding interrupt service routine would include the actions that must be taken at the specified time interval.

To improve efficiency, instead of counting up from 0 to terminal count, a timer could instead count down from terminal count to 0, meaning we would load terminal count rather than 0 into the 16-bit counter upon reset, and the counter would be a down counter rather than an up counter. The efficiency comes from the simplicity by which we can check whether our count has reached 0: we simply input the count into a 16-bit NOR gate. A single 16-bit NOR gate is far more area- and power-efficient than a 16-bit comparator.

Figure 4.1(d) provides the structure of a timer that can be configured as a 16-bit or 32-bit timer. The timer simply uses the top output of its first 16-bit up counter as the clock input of its second 16-bit counter. These are known as cascaded counters.

Finally, Figure 4.1(e) shows a timer with a prescaler. A prescaler is essentially a configurable clock-divider circuit. Depending on the mode bits being input to the prescaler, the prescaler output signal might be the same as the input signal, or it may have half the frequency (double the period), one-fourth the frequency, one-eighth the frequency, etc. Thus, a prescaler can be used to extend a timer's range, by reducing the timer's resolution. For example, consider a timer with a resolution of 10 ns and a range of 65,535 * 10 nanoseconds = 655.35 microseconds. If the prescaler of such a timer is configured to divide the clock frequency by eight, then the timer will have a resolution of 80 ns and a range of 65,535 * 80 nanoseconds = 5.24 milliseconds.

Many timers combine the above features, plus other configurable features. One such feature is a mode bit or additional input that enables or disables counting. Another feature is a mode bit that enables or disables interrupt generation when top count is reached.

Note that we could use a general-purpose processor to implement a timer. Knowing the number of cycles that each instruction requires, we could write a loop that executes the desired number of instructions; when this loop completes, we know that the desired time passed. This implementation of a timer on a dedicated general-purpose processor is obviously quite inefficient in terms of size. One could alternatively incorporate the timer functionality into a main program, but the timer functionality then occupies much of the program's run time, leaving little time for other computations. Thus, the benefit of assigning timer functionality to a special-purpose processor becomes evident.

Example: Reaction Timer

A reaction timer is an application that measures the time a person takes to respond to a visual or audio stimulus. In this example, the application turns on an LED, then measures the time a person takes to push a button in response, and displays this time on an LCD, as illustrated in Figure 4.2. We expect reaction times to be on the order of seconds, and we want to display reaction times to millisecond precision.

[Figure 4.2: Reaction timer: (a) LED indicator light, LCD showing "time: 100 ms", and reaction button, (b) pseudo-code:]

    /* main.c */
    #define MS_INIT 63535

    void main(void) {
        int count_milliseconds = 0;
        configure timer mode
        set cnt to MS_INIT
        wait a random amount of time
        turn on indicator light
        start timer
        while (user has not pressed reaction button) {
            if (top) {
                stop timer
                set cnt to MS_INIT
                start timer
                reset top
                count_milliseconds++;
            }
        }
        turn off indicator light
        printf("time: %i ms", count_milliseconds);
    }

In this example, we'll use a microcontroller with a built-in 16-bit timer. The timer is incremented once every instruction cycle, where one instruction cycle for this microcontroller equals six clock cycles. The clock frequency is 12 MHz, meaning the period is 83.33 nanoseconds. Thus, this timer has a resolution of 1 instruction cycle = 6 clock cycles = 6 * 83.33 nanoseconds = 0.5 microsecond. Furthermore, since the timer has 16 bits, its range is 65,535 * 0.5 microsecond = 32.77 milliseconds. This timer does not have a prescaler or a

terminal count register, but it does have a top signal to indicate overflow, and it also allows us to load an initial value for its internal up counter.

We note that this timer's range is smaller than our desired range of several seconds, while its resolution is finer than our required one millisecond. Thus, we must somehow extend the range, but without the convenience of a prescaler or terminal count register. Instead, we'll set the initial timer value such that overflow will occur after 1 millisecond, and then monitor the top output signal of the timer to activate code that keeps a count of the number of overflows, meaning the number of milliseconds. The number of instruction cycles corresponding to 1 millisecond is 1 millisecond / (0.5 microsecond/instruction-cycle) = 2,000 instruction cycles. Thus, the appropriate initial timer value is 65,535 - 2,000 = 63,535. Pseudocode describing the reaction timer implementation is shown in Figure 4.2(b).

Note that we did not use an interrupt service routine here, since the system does not have any other functions. Also note that waiting a random amount of time could also make use of a timer.

Notice that the method described above has some inaccuracy. Our method requires that we stop the timer, reset the timer, and then start the timer again. When we stop the timer to reset it, a certain amount of time that we are not measuring passes. However, this time is small, so we treat it as negligible.

Watchdog Timers

A special type of timer is a watchdog timer. We configure a watchdog timer with a real-time value, just as with a regular timer. However, instead of the timer generating a signal for us every X time units, we must generate a signal for the timer every X time units. If we fail to generate this signal in time, then the timer "times out" and generates a signal indicating that we failed.

One common use of a watchdog timer is to enable an embedded system to restart itself in case of a failure. In such use, we modify the system's program to include statements that reset the watchdog timer. We place these statements such that the watchdog timer will be reset at least once during every timeout interval if the program is executing normally. We connect the fail signal from the watchdog timer to the microprocessor's reset pin. Now suppose the program has an unexpected failure, such as entering an undesired infinite loop, or waiting for an input event that never arrives. The watchdog timer will time out, and thus the microprocessor will reset itself, starting its program from the beginning. In systems where such a full reset during system operation is not practical, we might instead connect the fail signal to an interrupt pin, and create an interrupt service routine that jumps to some safe part of the program. We might even combine these two responses, first jumping to an interrupt service routine to test parts of the system and record what went wrong, and then resetting the system. The interrupt service routine may record information as to the number of failures and the causes of each, so that a service technician may later evaluate this information to determine if a particular part requires replacement. Note that an embedded system often must self-recover from failures whenever possible, as the user may not have the means to reboot the system in the same manner that he or she might reboot a desktop system.

Another common use is to support timeouts in a program while keeping the program structure simple. For example, we may desire that a user respond to questions on a display within some time period. Rather than sprinkling response-time checks throughout our program, we can use a watchdog timer to check for us, thus keeping our program neater. An example in this chapter illustrates such use of a watchdog timer.

Example: ATM Timeout Using a Watchdog Timer

In this example, a watchdog timer is used to implement a timeout for an automatic teller machine, or ATM. A normal ATM session involves a user inserting a bank card, typing in a personal identification number, and then answering questions about whether to deposit or withdraw money, which account will be involved, how much money will be involved, whether another transaction is desired, and so on. We want to design the ATM such that it will terminate the session if at any time the user does not press any button for 2 minutes. In this case, the ATM will eject the bank card and terminate the session.

[Figure 4.3: ATM timeout using a watchdog timer: (a) timer structure: osc drives a prescaler producing clk; clk drives the up-counter scalereg; scalereg's overflow drives the 16-bit up-counter timereg; timereg's overflow goes to system reset or interrupt; checkreg guards loading of timereg; (b) main pseudo-code; (c) watchdog reset routine:]

    /* main.c */
    main() {
        wait until card inserted
        call watchdog_reset_routine
        while (transaction in progress) {
            if (button pressed) {
                perform corresponding action
                call watchdog_reset_routine
            }
            /* if watchdog_reset_routine not called every
               < 2 minutes, interrupt_service_routine is called */
        }
    }

    watchdog_reset_routine() {
        /* checkreg is set so we can load a value into timereg.
           Zero is loaded into scalereg and 11070 is loaded
           into timereg */
        checkreg = 1
        scalereg = 0
        timereg = 11070
    }

    void interrupt_service_routine() {
        eject card
        reset screen
    }

We will use a watchdog timer with the internal structure shown in Figure 4.3(a). An oscillator signal osc is connected to a prescaler that divides the frequency by 12 to generate a signal clk. The signal clk is connected to an 11-bit up-counter scalereg. When scalereg overflows, it rolls over to 0, and its overflow output causes the 16-bit up-counter timereg to


increment. If timereg overflows, it triggers the system reset or an interrupt. To reset the watchdog timer, checkreg must be enabled. Then a value can be loaded into timereg. When a value is loaded into timereg, the checkreg register is automatically reset. If the checkreg register is not enabled, a value cannot be loaded into timereg. This is to prevent erroneous software from unintentionally resetting the watchdog timer.

Now let's determine what value to load into timereg to achieve a timeout of 2 minutes. The osc signal frequency is 12 MHz. timereg is incremented every t seconds, where:

    t = 12 * 2^11 * 1/(osc frequency) = 12 * 2^11 * 1/(12 * 10^6)
      = 12 * 2,048 * (8.33 * 10^-8) = 0.002 second

So this watchdog timer has a resolution of 2 milliseconds. Since timereg is a 16-bit register, its range is 0 to 65,535, and thus the timer's range is 0 to 131,070 milliseconds (approximately 2.18 minutes). Because timereg counts up, to achieve a watchdog interval of X milliseconds, we load the following value into the timereg register:

    timereg value = 131,070 - X

If timereg is not reset within X milliseconds, it will overflow. Figure 4.3(b) and (c) provide pseudo-code for the main routine and the watchdog reset routine to implement the timeout functionality of the ATM. We want the timeout X to be 2 minutes (120,000 milliseconds). This means the value loaded into timereg should be 11,070 (i.e., 131,070 - 120,000).

4.3 UART

A universal asynchronous receiver/transmitter (UART) receives serial data and stores it as parallel data, usually one byte. It also takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices that are separated by long distances, or when those devices simply have few available pins. Advanced principles and protocols of serial communication will be discussed in a later chapter. For our purpose in this section, we will look at the basics of serial communication using UARTs.

[Figure 4.4: Serial transmission using UARTs: (a) a PC communicating serially with an embedded device via sending and receiving UARTs, (b) transmission protocol used by the two UARTs: start bit, data, end bit.]

Internally, a simple UART may possess some configuration registers, and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate. Likewise, the receiver receives data into a shift register, and then this data can be read in parallel. This is illustrated in Figure 4.4(a). Note that in order to shift at the appropriate rate based on the configuration register, a UART requires a timer.

The receiver constantly monitors the receive pin (rx) for a start bit. The start bit is typically signaled by a high-to-low transition on the rx pin. After the start bit has been detected, the receiver starts sampling the rx pin at predetermined intervals, shifting each sampled bit into the receive shift register. If configured to do so, the receiver also reads an additional bit called parity, which it uses to determine if the received data is correct. For example, using odd parity, if the number of 1s in the received data adds up to an even number, and the parity bit is 1, the data is assumed to be valid; otherwise it is assumed to be erroneous. Likewise, the UART can be configured to check for even parity or no parity at all. Once data is received, the UART signals its host processor. The host processor in turn reads the byte out of the receive shift register. The receiver is now ready to receive more data.

The transmitter works as follows. After the host processor writes a byte to the transmit buffer of the UART, the transmitter sends a start bit over its transmit pin (tx), signaling the beginning of a transmission to the remote UART. Then, the transmitter shifts out the data in its transmit buffer over its tx pin at a predetermined rate. If configured to do so, the transmitter also transmits an additional parity bit, used as discussed previously. At this point, the UART interrupts its host processor, indicating that it is ready to send more data if available.

In order for two serially connected UARTs to communicate with each other, they must agree on the transmission protocol in use. A sample transmission protocol is illustrated in Figure 4.4(b). The transmission protocol used by UARTs determines the rate at which bits are sent and received. This is called the baud rate. The protocol also specifies the number of bits of data and the type of parity sent during each transmission. Finally, the protocol specifies the minimum number of bits used to separate two consecutive data transmissions. Stop bits are important in serial communication, as they are used to give the receiving UART a chance to prepare itself prior to the reception of the next data transmission.

The baud rate determines the speed at which data is exchanged between two serially connected UARTs. Common baud rates include 2,400, 4,800, 9,600, and 19.2K. There is a great deal of misuse of the term baud rate, often assumed to be just the same as the term bit rate. In fact, bit rate is a true measure of the number of bits that are sent over a connection in one second, while baud rate is the measure of the number of signal changes that are transmitted over a connection in one second. Some clever techniques can be used to achieve a bit rate higher than the baud rate.

To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the received


register. Unfortunately, configuring the baud rate is usually not as simple as writing the desired rate (e.g., 4,800) to a register. For example, to configure the UART of an 8051 microcontroller, we must use the following equation:

    baud rate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))

Here, smod corresponds to 2 bits in a special-function register, oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.

Note that we could use a general-purpose processor to implement a UART completely in software. If we used a dedicated general-purpose processor, the implementation would be inefficient in terms of size. Alternatively, we could integrate the transmit and receive functionality with our main program. This would require creating a routine to send data serially over an I/O port, making use of a timer to control the rate. It would also require using an interrupt service routine to capture serial data coming from another I/O port whenever such data begins arriving. However, as with the timer functionality, adding send and receive functionality detracts from time for other computations.

4.4 Pulse Width Modulators

Overview

A pulse width modulator (PWM) generates an output signal that repeatedly switches between high and low values. We control the duration of the high value and of the low value by indicating the desired period, and the desired duty cycle, which is the percentage of time the signal is high compared to the signal's period. A square wave has a duty cycle of 50%. The pulse's width corresponds to the pulse's time high, as shown in Figure 4.5.

[Figure 4.5: Operation of a PWM: (a) 25% duty cycle, average pwm_o is 1.25 V, (b) 50% duty cycle, average pwm_o is 2.5 V, (c) 75% duty cycle, average pwm_o is 3.75 V. In the diagrams, logic high is 5 V, low is 0 V.]

Again, PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with another program's functionality, but the single-purpose processor approach has the benefits of efficiency and simplicity.

A common use of a PWM is to generate a clock-like signal to another device. For example, a PWM can be used to blink a light at a specific rate.

Another common use of a PWM is to control the average current or voltage input to a device. For example, a DC (direct current) electric motor rotates when its input voltage is set high, with the rotation speed proportional to the input voltage level. Suppose the revolutions per minute (rpm) equals 10 times the input voltage. To achieve a desired rpm of 12.5, we would need to set the input voltage to 1.25 V, whereas achieving 25.0 rpm would require an input voltage of 2.50 V.

One approach to control the average input voltage to a DC motor uses a DC-to-DC converter circuit, which converts some reference voltage to a desired voltage. However, these circuits can be expensive. Another approach uses a digital-to-analog converter. A third approach, perhaps the simplest, uses a PWM. The PWM approach makes use of the fact that a DC motor does not come to an immediate stop when its input voltage is set to 0, but rather it coasts, much like a bicycle coasts when we stop pedaling. Thus, by adjusting the average input voltage, we effectively obtain the desired speed. Using a PWM, we set the duty cycle to achieve the appropriate average voltage, and we set the period small enough for smooth operation of the motor (i.e., so the motor does not noticeably speed up and slow down). Assuming the PWM's output is 5 V when high and 0 V when low, we can obtain an average output of 1.25 V by setting the duty cycle to 25%, since 5 V * 25% = 1.25 V. This duty cycle is shown in Figure 4.5(a). Likewise, we can obtain an average output of 2.50 V by setting the duty cycle to 50%, as shown in Figure 4.5(b). A duty cycle of 75% would result in an average output of 3.75 V, as shown in Figure 4.5(c). This duty cycle adjustment principle applies to the control of a wide variety of electric devices, such as dimmer lights.

Another use of a PWM is to encode control commands in a single signal for use by another device. For example, we may control a radio-controlled car by sending pulses of different widths. Perhaps a width of 1 ms corresponds to a turn left command, a 4-ms width to turn right, and an 8-ms width to forward. The receiver can use a timer to measure the pulse width, by starting the timer when the pulse starts and stopping the timer when the pulse ends, thus determining how much time elapsed.


Example: Controlling a DC Motor Using a PWM

In this example, we wish to control the speed of a direct-current (DC) electric motor using a PWM. The speed of the DC motor is proportional to the voltage applied to the motor. Suppose that for a fixed load the motor yields the revolutions per minute (rpm) shown in Figure 4.6(a) for the given input voltage. We must set the duty cycle of the PWM such that the average output voltage equals the desired motor voltage.

[Figure 4.6: DC motor control using a PWM: (a) relationship between input voltage and motor speed, (b) PWM structure, (c) code and schematic:]

    (a)  input voltage | % of maximum voltage applied | RPM of DC motor
         0             | 0                            | 0
         2.5           | 50                           | 4,600
         3.75          | 75                           | 6,900
         5.0           | 100                          | 9,200

    (b)  an 8-bit counter counts from 0 to 254;
         counter < cycle_high:  pwm_o = 1
         counter >= cycle_high: pwm_o = 0

Suppose that we use a PWM as part of a system that includes two 8-bit registers called clk_div and cycle_high, an 8-bit counter, and an 8-bit comparator, as shown in Figure 4.6(b). The PWM works as follows. Initially, the value of clk_div is loaded into its register. The clk_div register works as a clock divider. After a specified amount of time has elapsed, a pulse is sent to the counter register. This causes the counter to increment itself. The comparator then looks at the values in the counter register and the cycle_high register. When the counter value is less than cycle_high, a 1 (5 V) is output. When the value in counter is greater than or equal to the value in the cycle_high register, a 0 (0 V) is output. When the counter value reaches 254, counter is reset to 0 and the process repeats. Thus, we see that clk_div determines the PWM's period, specifying the number of cycles in the period. The register cycle_high determines the duty cycle, that is, how many of a period's cycles should output a 1. If cycle_high is set to 255, the output signal is always high, resulting in a duty cycle of 100%. Conversely, if cycle_high is set to 0, the output signal is always low, resulting in a duty cycle of 0%.

To determine the value of clk_div, we can try various values, checking whether the resulting period is too fast or too slow for our particular motor. If the value of clk_div is too low, the value output by the PWM changes too quickly; if it is too high, the motor noticeably speeds up while the PWM outputs ones and slows down while it outputs zeros. Let's say that setting the value of clk_div to FFh works in this case. Once this value is set, the only register that needs to be considered is cycle_high.

For the motor to run at 4,600 RPM, we need a duty cycle of 50%. To compute the value needed in cycle_high for a 50% duty cycle, we multiply 254 by 0.50, yielding 127. Thus, putting 7Fh (127 in hexadecimal) into the cycle_high register should cause the motor to run at about 4,600 RPM. For the motor to run at 6,900 RPM, we need a 75% duty cycle. We compute 254 * 0.75, yielding 191. Thus, putting BFh (191 in hexadecimal) into cycle_high should cause the DC motor to run at about 6,900 RPM.

We cannot just connect the DC motor to the PWM, because the PWM does not provide enough current to drive the DC motor. To remedy this problem, we use an NPN transistor to drive the motor. The code and schematic used for this example are found in Figure 4.6(c). In the figure, the name of the clk_div register is PWMP and cycle_high is PWM1.

4.5 LCD Controllers

Overview

A liquid crystal display (LCD) is a low-cost, low-power device capable of displaying text and images. LCDs are extremely common in embedded systems, since such systems often do not have video monitors like those that come standard with desktop systems. LCDs can be found in numerous common devices like watches, fax and copy machines, and calculators.

The basic principle of one type of LCD, a reflective LCD, works as follows. First, incoming light passes through a polarizing plate. Next, that polarized light encounters liquid crystal material. If we excite a region of this material, we cause the material's molecules to align, which in turn causes the polarized light to pass through the material. Otherwise, the light does not pass through. Finally, light that has passed through hits a mirror and reflects back, so the excited region appears to light up. Another type of LCD, an absorption LCD, works similarly, but uses a black surface instead of a mirror. The surface below the excited region absorbs light, thus appearing darker than the other regions.
94 Embedded System Design · Embedded System Design 95

www.compsciz.blogspot.in ----~~~-·-· .:... · . ·· ·· -~ .. ~-


I
Chapter 4: Standard Single-Purpose Processors: Peripherals

A dot-matrix LCD consists of a matrix of dots that can display alphanumeric characters (letters and digits) as well as other symbols. A common dot-matrix LCD has five columns and eight rows of dots for one character. An LCD driver converts input data into the appropriate electrical signals necessary to excite the appropriate LCD dots.

Each type of LCD may be able to display multiple characters. In addition, each character may be displayed in normal or inverted fashion. The LCD may permit a character to be blinking (cycling through normal and inverted display) or may permit display of a cursor (such as a blinking underscore) indicating the "current" character. Such functionality would be difficult for us to implement using software. Thus, we use an LCD controller to provide us with a simple interface to an LCD, perhaps eight data inputs and one enable input. To send a byte to the LCD, we provide a value to the eight inputs and pulse the enable. This byte may be a control word, which instructs the LCD controller to initialize the LCD, clear the display, select the position of the cursor, brighten the display, and so on. Alternatively, this byte may be a data word, such as an ASCII character, instructing the LCD to display the character at the currently selected display position.

Example: LCD Initialization

In this example, a microprocessor is connected to an LCD controller, which in turn is connected to an LCD, as illustrated in Figure 4.7. The LCD controller receives control words from the microcontroller; it decodes the control words and performs the corresponding actions on the LCD.

Once the initialization sequence is done, we can send control words or send actual data to be displayed. RS is set to low to indicate that the data sent is a control word. When RS is high, this indicates that the data sent over the communication bus corresponds to a character that is to be displayed. Any time data is sent, whether it is a control word or data, the enable bit E must be toggled. Figure 4.7(c) lists some of the corresponding control words that can be sent.

Using the initialization codes of Figure 4.7(d), the LCD has been set with an 8-bit interface. In addition, the display has been cleared, the cursor is at the home position, and the cursor moves to the right as data is displayed (as opposed to the actual data shifting when we write to the LCD). The LCD is now ready to be written to. Using the table of Figure 4.7(c), we see that in order to write data, we set RS = 1. The actual data we want to write is present on DB7-DB0. The WriteChar function, shown in Figure 4.7(d), accepts a character which will be sent to the LCD controller to display on the LCD. The EnableLCD function toggles the enable bit and acts as a delay so that the command can be processed and executed.

[Figure 4.7(a): components. The LCD controller sits between the microcontroller and the LCD, connected over a communication bus with signals E (enable), R/W, RS, and data lines DB7-DB0.]

[Figure 4.7(c): control codes:

RS  R/W  DB7..DB0           Description
0   0    0000 0001          Clears all display, returns cursor home
0   0    0000 001*          Returns cursor home
0   0    0000 01 I/D S      Sets cursor move direction and specifies whether or not to shift display
0   0    0000 1 D C B       ON/OFF of all display (D), cursor ON/OFF (C), and blink cursor position (B)
0   0    0001 S/C R/L * *   Moves cursor and shifts display
0   0    001 DL N F * *     Sets interface data length, number of display lines, and character font
1   0    DATA               Writes DATA

Codes:
I/D = 1 cursor moves left      DL = 1 8-bit
I/D = 0 cursor moves right     DL = 0 4-bit
S   = 1 with display shift     N  = 1 2 rows
S/C = 1 display shift          N  = 0 1 row
S/C = 0 cursor movement        F  = 1 5x10 dots
R/L = 1 shift to right         F  = 0 5x7 dots
R/L = 0 shift to left]

Figure 4.7: Example of LCD initialization: (a) components, (b) initialization sequence, (c) control codes, (d) microcontroller pseudocode.

One of the simplest LCDs is a seven-segment LCD. Each of the seven segments can be activated, enabling the display of any digit character or one of several letters and symbols. Such an LCD may have seven inputs, each corresponding to a segment, or it may have only four inputs to represent the numbers 0 through 9. An LCD driver converts these inputs to the electrical signals necessary to excite the appropriate LCD segments.

4.6 Keypad Controllers

A keypad consists of a set of buttons that may be pressed to provide input to an embedded system. Again, keypads are extremely common in embedded systems, since such systems may lack the keyboard that comes standard with desktop systems. A simple keypad has buttons arranged in an N-column by M-row grid, as illustrated in Figure 4.8. The device has N outputs, each output corresponding to a column, and another M


[Figure 4.8 residue: an N x M button grid wired to a keypad controller, which outputs key_code and a key-pressed signal.]

[Figure 4.9 residue: schematic of an 8051 driving an MC3479P stepper motor driver: P1.0 to the driver's cw'/ccw pin and P1.1 to its clk pin; the driver's phase outputs connect to the motor coil wires A (white), B (yellow), A', and B' (black). Input sequence table (columns A, B, A', B'): step 1: +, +, -, -; step 2: -, +, +, -; step 3: -, -, +, +; step 4: +, -, -, +; step 5: +, +, -, -.]
Figure 4.8: Internal keypad structure, withN = 4 andM= 4.

outputs, each output corresponding to a row. When we press a button, one column output and one row output go high, uniquely identifying the pressed button. To read such a keypad from software, we must scan the column and row outputs. This scanning may be performed by a keypad controller. (Actually, such a device decodes rather than controls, but we'll call it a "controller" for consistency with the other peripherals discussed.) A simple form of such a controller, shown in Figure 4.8, scans the column and row outputs of the keypad. When the controller detects a button press, it stores a code corresponding to that button into a register, key_code, and sets an output high, k_pressed, indicating that a button has been pressed. Our software may poll this output every 100 milliseconds or so, and read the register when the output is high. Alternatively, this output can generate an interrupt on our general-purpose processor, eliminating the need for polling.

4.7 Stepper Motor Controllers

Overview

A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step" signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to a stop when power is removed. We specify a stepper motor either by the number of degrees in a single step of the motor, such as 1.8 degrees, or by the number of steps required to move 360 degrees, such as 200 steps. Stepper motors are common in embedded systems with moving parts, such as disk drives, printers, photocopy and fax machines, robots, camcorders, and VCRs.

Internally, a stepper motor typically has four coils. To rotate the motor one step, we pass current through one or two of the coils; which particular coil or coils depends on the present orientation of the motor. Thus, rotating the motor 360 degrees requires applying current to the coils in a specified sequence. Applying the sequence in reverse causes reversed rotation.

In some cases, the stepper motor comes with four inputs corresponding to the four coils, and with documentation that includes a table indicating the proper input sequence. To control the motor from software, we must maintain this table in software, and write a step routine that applies high values to the inputs based on the table values that follow the previously applied values.

In other cases, the stepper motor comes with a built-in controller, which is an instance of a special-purpose processor, implementing this sequence. Thus, we merely create a pulse on an input signal of the motor, causing the controller to generate the appropriate high signals to the coils that will cause the motor to rotate one step.

Example: Controlling a Stepper Motor Using a Driver

Controlling a stepper motor requires applying a series of voltages to the four (typically) coils of the stepper motor. The coils are energized one or two at a time, causing the motor to rotate one step. In this example, we are using a 9-volt, 2-phase bipolar stepper motor. Figure 4.9 shows a table indicating the input sequence required to rotate the motor. The entire sequence

/* main.c */
sbit clk = P1^1;
sbit cw  = P1^0;

void delay(void) {
   int i, j;
   for (i = 0; i < 1000; i++)
      for (j = 0; j < 50; j++)
         i = i + 0;
}

void main(void) {
   /* turn the motor forward */
   cw = 0;     /* set direction */
   clk = 0;    /* pulse clock */
   delay();
   clk = 1;

   /* turn the motor backwards */
   cw = 1;     /* set direction */
   clk = 0;    /* pulse clock */
   delay();
   clk = 1;
}

Figure 4.10: Controlling a stepper motor using a driver - software.

[Figure 4.11 residue: schematic connecting 8051 ports P2.0 through P2.4 to the stepper motor coils through buffer transistors, with GND/+V supplies shown.]

sbit notA = P2^0;
sbit isA  = P2^1;
sbit notB = P2^2;
sbit isB  = P2^3;
sbit dir  = P2^4;

int lookup[20] = {
   1, 1, 0, 0,
   0, 1, 1, 0,
   0, 0, 1, 1,
   1, 0, 0, 1,
   1, 1, 0, 0 };

void delay(void) {
   int a, b;
   for (a = 0; a < 5000; a++)
      for (b = 0; b < 10000; b++)
         ;
}

void move(int dir, int steps) {
   int y, z;
   if (dir == 1) {
      for (y = 0; y <= steps; y++)
         for (z = 0; z <= 19; z += 4) {
            isA  = lookup[z];
            isB  = lookup[z+1];
            notA = lookup[z+2];
            notB = lookup[z+3];
            delay();
         }
   }
   if (dir == 0) {
      for (y = 0; y <= steps; y++)
         for (z = 19; z >= 0; z -= 4) {
            isA  = lookup[z];
            isB  = lookup[z-1];
            notA = lookup[z-2];
            notB = lookup[z-3];
            delay();
         }
   }
}

void main(void) {
   while (1) {
      /* move forward 15 degrees */
      move(1, 2);
      /* move backwards 7.5 degrees */
      move(0, 1);
   }
}

Figure 4.12: Controlling a stepper motor directly - software.

must be applied to get the motor to rotate 7.5 degrees. To rotate the motor in the opposite direction, we simply apply the sequence in reverse order.

We can use an 8051 microcontroller and a stepper motor driver (MC3479P) chip to control the stepper motor. We need only worry about setting the direction on the clockwise/counterclockwise pin (cw'/ccw) and pulsing the clock pin (clk) on the stepper motor driver chip from the 8051 microcontroller. Figure 4.9 gives the schematic showing how to connect the stepper motor to the driver, and the driver to the 8051. Figure 4.10 gives some sample code to run the stepper motor.

Example: Controlling a Stepper Motor Directly

In the second example, the stepper motor driver is eliminated. The stepper motor is connected directly to the 8051 microcontroller. Figure 4.11 gives the schematic showing how to connect the stepper motor to the 8051. The direction of the stepper motor is controlled manually. If P2.4 is grounded, the motor rotates counterclockwise; otherwise the motor rotates clockwise. Figure 4.12 gives the code required to execute the input sequence from the table in Figure 4.9 to turn the motor.

Note that the 8051 ports are unable to directly supply the current needed to drive the motor. This can be solved by adding buffers. A possible way to implement the buffers is shown in Figure 4.11. The 8051 alone cannot drive the stepper motor, so several transistors were added to increase the current going to the stepper motor. Q1 are MJE3055T NPN transistors and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is


4.8 Analog-to-Digital Converters

An analog-to-digital converter (ADC, A/D, or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A, or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. Analog refers to a continuously valued signal, such as temperature or speed, with infinite possible values in between. Digital refers to discretely valued signals, such as integers, and in computing systems, these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment.

For example, consider an analog input signal whose value could range from 0 to 7.5 volts. We want to represent each possible voltage in this range using a 4-bit binary number. Clearly, 0000 would be the most obvious encoding for 0 V, and 1111 for 7.5 V. The encodings between 0000 and 1111 would then be evenly distributed to the range between 0 and 7.5 V, as shown in Figure 4.13(a).

Now suppose that for a particular time interval, the analog input signal's values were those shown in Figure 4.13(b), ranging from 1 V up to 4 V and then down to just over 2 V. The digital encoding of this signal, sampled at times t1, t2, t3, and t4, into four bits is shown beneath the figure's x-axis. Conversely, suppose that for a time interval, we want to output an analog signal corresponding to the digital encodings shown at the bottom of Figure 4.13(c).

[Figure 4.13 residue: (a) maps voltages 0 to Vmax = 7.5 V onto 4-bit codes 0000 through 1111 (e.g., 7.0 V to 1110, 6.5 V to 1101, 6.0 V to 1100, 5.0 V to 1010); (b) an analog waveform sampled at t1 through t4; (c) an analog waveform reconstructed from 4-bit codes.]

Figure 4.13: Conversion: (a) proportionality, (b) analog-to-digital, (c) digital-to-analog.
The analog signal is shown in the figure.

More generally, we can compute the digital values from the analog values, and vice versa, using the following ratio:

e / Vmax = d / (2^n - 1)

Here, Vmax is the maximum voltage that the analog signal can assume, n is the number of bits available for the digital encoding, d is the present digital encoding, and e is the present analog voltage. In our example of Figure 4.13, suppose Vmax is 7.5 V. Then for e = 3 V, we have the following ratio: 3 / 7.5 = d / 15, resulting in d = 6, or 0110.

We can define the resolution of a DAC or ADC as Vmax / (2^n - 1), representing the number of volts between successive digital encodings. In Figure 4.13(a), we see graphically that the resolution is 0.5 V between successive encodings.

The above discussion assumes a minimum voltage of 0 V. The equations can easily be extended to any voltage range.

Internally, DACs possess simpler designs than ADCs. A DAC has n inputs for the digital encoding d, a Vmax analog input, and an analog output e. A fairly straightforward circuit, involving resistors and an op-amp, can be used to convert d to e.

ADCs, on the other hand, require designs that are more complex, for the following reason. Given a Vmax analog input and an analog input e, how does the converter know what binary value to assign in order to satisfy the above ratio? Unlike DACs, there is no simple analog circuit to compute d from e. Instead, an ADC may itself contain a DAC also connected to Vmax. The ADC "guesses" an encoding d, and then evaluates its guess by inputting d into the DAC, and comparing the generated analog output e' with the original analog input e (using an analog comparator). If the two sufficiently match, then the ADC has found a proper encoding. So now the question remains: how do we guess the correct encoding?

This problem is analogous to the common computer-programming problem of finding an item in a list. One approach is sequential search, or "counting up" in analog-digital terminology. In this approach, we start with an encoding of 0, then 1, then 2, and so on, until we find a match. Unfortunately, while simple, this approach in the worst case requires 2^n comparisons (where n is the number of bits in the encoding), so it may be quite slow.

A faster solution uses what programmers call binary search, or "successive approximation" in analog-digital terminology. We start with an encoding corresponding to half of the maximum. We then compare the resulting analog value with the original; if the resulting value is greater (less) than the original, we set the new encoding to halfway between this one and the maximum (minimum). We continue this process, dividing the possible encoding range in half at each step, until the compared voltages are equal. This technique requires at most n comparisons, which is significantly better than 2^n. However, it requires a more complex converter.

Because ADCs must guess the correct encoding, they require some time to perform a conversion. Thus, in addition to the analog input and digital output, an ADC includes an input to start the conversion, and a done output to indicate that the conversion is complete.

Example: Successive Approximation

Given an analog input signal whose voltage should range from 0 to 15 V, and an 8-bit digital encoding, we are to calculate the correct encoding of 5 V. Let us trace through the successive-

approximation approach to find the correct encoding. We already know that the encoding should be:

5 / 15 = d / (2^8 - 1)
d = 85

Applying the successive-approximation method, we start by finding the halfway point between the maximum and minimum voltages, where Vmax = 15 V and Vmin = 0 V:

1/2 (Vmax + Vmin) = 7.5 V

Since the above voltage is higher than the input voltage, we insert a zero into the highest bit, as shown in Figure 4.14(a). We also know that the highest possible value is 7.5 V, so we set Vmax = 7.5 V. Next, we plug into the formula again and compute the next approximation:

1/2 (7.5 + 0) = 3.75 V

Since the above voltage is lower than the input voltage, we insert a one into the next most significant bit, as shown in Figure 4.14(b). We know the lowest possible value is 3.75 V, so Vmin is set to 3.75 V. Next, we plug into the formula and compute the next approximation:

1/2 (7.5 + 3.75) = 5.63 V

Since the above value is higher than the input voltage, we insert a zero into the next bit, as shown in Figure 4.14(c). Note that Vmax is set to 5.63 V. Now we plug into the formula and compute the next approximation:

1/2 (5.63 + 3.75) = 4.69 V

Since the above value is lower than the input voltage, we insert a one into the next bit, as shown in Figure 4.14(d). Note that Vmin is set to 4.69 V. Now we plug into the formula and compute the next approximation:

1/2 (5.63 + 4.69) = 5.16 V

Since the above value is higher than the input voltage, we insert a zero into the next bit, as shown in Figure 4.14(e). Note that Vmax is set to 5.16 V. Now we plug into the formula and compute the next approximation:

1/2 (5.16 + 4.69) = 4.93 V

Since the above value is lower than the input voltage, we insert a one into the next bit, as shown in Figure 4.14(f). Note that Vmin is set to 4.93 V. Now we plug into the formula and compute the next approximation:

1/2 (5.16 + 4.93) = 5.05 V

Since the above voltage is higher than the input voltage, we insert a zero into the next bit, as shown in Figure 4.14(g). Note that Vmax is set to 5.05 V. Now we plug into the formula and compute the next approximation:

1/2 (5.05 + 4.93) = 4.99 V

Since the above voltage is less than the input voltage, we insert a one into the last bit, as shown in Figure 4.14(h).

The encoding is now done. Note that the division by 2 can be done efficiently in binary arithmetic by simply shifting the number to the right. The resulting value, shown in Figure 4.14(h), is 01010101 = 85, as expected.

[Figure 4.14 residue: the bit patterns built up at each step: (a) 00000000, (b) 01000000, (c) 01000000, (d) 01010000, (e) 01010000, (f) 01010100, (g) 01010100, (h) 01010101.]

Figure 4.14: Successive approximation: given an analog input signal whose voltage should range from 0 to 15 V, and 8 bits for digital encoding, we are to calculate the correct encoding of 5 V.

4.9 Real-Time Clocks

Much like a digital wristwatch, a real-time clock (RTC) keeps the time and date in an embedded system. Real-time clocks are typically composed of a crystal-controlled oscillator,


numerous cascaded counters, and a battery backup. The crystal-controlled oscillator generates a very consistent high-frequency digital pulse that feeds the cascaded counters. The first counter, typically, counts these pulses up to the oscillator frequency, which corresponds to exactly one second. At this point, it generates a pulse that feeds the next counter. This counter counts up to 59, at which point it generates a pulse feeding the minute counter. The hour, date, month, and year counters work in a similar fashion. In addition, real-time clocks adjust for leap years. The rechargeable back-up battery is used to keep the real-time clock running while the system is powered off.

From the microcontroller's point of view, the content of these counters can be set to a desired value, which corresponds to setting the clock, and retrieved. Communication between the microcontroller and a real-time clock is typically accomplished through a serial bus, such as I2C. It should be noted that, given a timer peripheral, it is possible to implement a real-time clock in software running on a processor. In fact, many systems use this approach to maintain the time. However, the drawback of such systems is that when the processor is shut down or reset, the time is lost.

4.10 Summary

Numerous single-purpose processors are manufactured to fulfill a specific function in a variety of embedded systems. These standard single-purpose processors may be fast and small, and they have low unit and NRE costs. A timer informs us when a particular interval of time has passed, while a watchdog timer requires us to signal it within a particular interval to indicate that a program is running without error. A counter informs us when a particular number of pulses have occurred on a signal. A UART converts parallel data to serial data, and vice versa. A PWM generates pulses on an output signal, with specific high and low times. An LCD controller simplifies the writing of characters to an LCD. A keypad controller simplifies capture and decoding of a button press. A stepper-motor controller assists us to rotate a stepper motor a fixed amount forward or backward. ADCs and DACs convert analog signals to digital, and vice versa. A real-time clock keeps track of date and time. Most of these single-purpose processors could be implemented as software on a general-purpose processor, but such implementation can be burdensome. These standard single-purpose processors thus simplify embedded system design tremendously. Many microcontrollers integrate these processors on-chip.

4.11 References and Further Reading

• Embedded Systems Programming. Includes information on a variety of single-purpose processors, such as programs for implementing or using timers and UARTs on microcontrollers.
• Spasov, Peter. Microcontroller Technology: The 68HC11, 2nd edition. Englewood Cliffs, NJ: Prentice Hall, 1996. Contains descriptions of principles and details for common 68HC11 peripherals.

4.12 Exercises

4.1 Given a timer structured as in Figure 4.1(c) and a clock frequency of 10 MHz: (a) Determine its range and resolution. (b) Calculate the terminal count value needed to measure 3 ms intervals. (c) If a prescaler is added, what is the minimum division needed to measure an interval of 100 ms? (Divisions should be in powers of 2.) Determine this design's range and resolution. (d) If instead of a prescaler a second 16-bit up-counter is cascaded as in Figure 4.1(d), what is the range and resolution of this design?

4.2 A watchdog timer that uses two cascaded 16-bit up-counters as in Figure 4.1(d) is connected to an 11.981 MHz oscillator. A timeout should occur if the function watchdog_reset is not called within 5 minutes. What value should be loaded into the up-counter pair when the function is called?

4.3 Given a controller with two built-in timers designed as in Figure 4.1(b), write C code for a function "double RPM" that returns the revolutions per minute of some device, or -1 if a timer overflows. Assume all inputs to the timers have been initialized and the timers have been started before entering RPM. Timer1's cnt_in is connected to the device and is pulsed once for each revolution. Timer2's clk input is connected to a 10 MHz oscillator. The timers have the outputs cnt1, cnt2, top1, and top2, which were initialized to 0 when their respective timer began. What is the minimum (other than 0) and maximum revolutions per minute that can be measured if top is not used?

4.4 Given a 100 MHz crystal-controlled oscillator and a 32-bit and any number of 16-bit terminal-count timers, design a real-time clock that outputs the date and time down to 1 millisecond. You can ignore leap years. Draw a diagram and indicate terminal-count values for all timers.

4.5 Determine the values for smod and TH1 to generate a baud rate of 9,600 for the 8051 baud rate equation in the chapter, assuming an 11.981 MHz oscillator. Remember that smod is 2 bits and TH1 is 8 bits. There is more than one correct answer.

4.6 A particular motor operates at 10 revolutions per second when its controlling input voltage is 1.7 V. Assume that you are using a microcontroller with a PWM whose output port can be set high (5 V) or low (0 V). (a) Compute the duty cycle necessary to obtain 10 revolutions per second. (b) Provide values for a pulse width and period that achieve this duty cycle. You do not need to consider whether the frequency is too high or too low, although the values should be reasonable. There is no one correct answer.

4.7 Using the PWM described in Figure 4.6, compute the value assigned to PWM1 to achieve an RPM of 8,050, assuming the input voltage needed is 4.375 V.

4.8 Write a function in pseudocode that initializes the LCD described in Figure 4.7. After initialization, the display should be clear with a blinking cursor. The initialization


should set the following: data to shift to the left, a data length of 8 bits, a font of 5x10 dots, and display on one line.

4.9 Given a 120-step stepper motor with its own controller, write a C function Rotate(int degrees), which, given the desired rotation amount in degrees (between 0 and 360), pulses a microcontroller's output port the correct number of times to achieve the desired rotation.

4.10 Modify only the main function in Figure 4.12 to cause a 240-step stepper motor to rotate forward 60 degrees followed by a backward rotation of 33 degrees. This stepper motor uses the same input sequence as the example for each step. In other words, do not change the lookup table.

4.11 Extend the ratio and resolution equations of analog-to-digital conversion to any voltage range between Vmin and Vmax rather than 0 to Vmax.

4.12 Given an analog output signal whose voltage should range from 0 to 10 V, and an 8-bit digital encoding, provide the encodings for the following desired voltages: (a) 0 V, (b) 1 V, (c) 5.33 V, (d) 10 V. (e) What is the resolution of our conversion?

4.13 Given an analog input signal whose voltage ranges from 0 to 5 V, and an 8-bit digital encoding, calculate the correct encoding for 3.5 V, and then trace the successive-approximation approach (i.e., list all the guessed encodings in the correct order) to find the correct encoding.

4.14 Given an analog input signal whose voltage ranges from -5 to 5 V, and an 8-bit digital encoding, calculate the correct encoding for 1.2 V, and then trace the successive-approximation approach to find the correct encoding.

4.15 Compute the memory needed in bytes to store a 4-bit digital encoding of a 3-second analog audio signal sampled every 10 milliseconds.

CHAPTER 5: Memory

5.1 Introduction
5.2 Memory Write Ability and Storage Permanence
5.3 Common Memory Types
5.4 Composing Memory
5.5 Memory Hierarchy and Cache
5.6 Advanced RAM
5.7 Summary
5.8 References and Further Reading
5.9 Exercises
5.1 Introduction
Any embedded system's functionality consists of three aspects: processing, storage, and communication. Processing is the transformation of data, storage is the retention of data for later use, and communication is the transfer of data. Each of these aspects must be implemented. We use processors to implement processing, memory to implement storage, and buses to implement communication. The earlier chapters described common processor types: custom single-purpose processors, general-purpose processors, and standard single-purpose processors. This chapter describes memory.

Let's start by describing some basic memory concepts. A memory stores large numbers of bits. These bits can be viewed as m words of n bits each, for a total of m * n bits, as illustrated in Figure 5.1(a). We refer to a memory as an m x n ("m-by-n") memory. Figure 5.1(b) shows an external view of a memory. Log2(m) address input signals are required to identify a particular word. Stated another way, if a memory has k address inputs, it can have up to 2^k words. n data signals are required to output (and possibly input) a selected word. For example, a 4,096-by-8 memory can store 32,768 bits, and requires 12 address signals and eight input/output data signals. To read a memory means to retrieve the word of a particular address, while to write a memory means to store a word in a particular address. A memory access refers to either a read or write. A memory that can be both read and written requires an

A memory that can be both read and written requires an additional control input, labeled
r/w in Figure 5.1(b), to indicate which access to perform. Most memory types have an enable
control input, which when deasserted causes the memory to ignore the address, such that data
is neither written to nor read from the memory. Some types of memory, known as multiport
memory, support multiple accesses to different locations simultaneously. Such a memory has
multiple sets of control lines, address lines, and data lines, where one set of address and
corresponding data and control lines is known as a port.

Figure 5.1: Memory: (a) words and bits per word, (b) block diagram.

Memory has evolved very rapidly over the past few decades. The main advancement has
been the trend of memory-chip bit-capacity doubling every 18 months, following Moore's
Law. The importance of this trend in enabling today's sophisticated embedded systems should
not be underestimated. No matter how fast and complex processors become, those processors
still need memories to store programs and to store data to operate on. For example, a digital
camera is possible not only because of fast A2D and compression processors but also because
of memories capable of storing sufficient quantities of bits to represent quality pictures.

Further advancements to memory have blurred the distinction between the two traditional
memory categories of ROM and RAM, providing designers with the benefit of more choices.
Traditionally, the term ROM has referred to a memory that a processor can only read, and
which holds its stored bits even without a power source. The term RAM has referred to a
memory that a processor can both read and write but loses its stored bits if power is removed.
However, processors can not only read, but also write to, advanced ROMs, like EEPROM and
Flash, although such writing may be slow compared to writing RAMs. Furthermore, advanced
RAMs, like NVRAMs, can hold their bits even when power is removed.

Thus, in this chapter, we depart from the traditional ROM/RAM distinction, and instead
distinguish among memories using two characteristics, namely write ability and storage
permanence. We then introduce forms of memories commonly found in embedded systems.
We describe techniques for the common task of composing memories to build bigger
memories. We describe the use of memory hierarchy to improve memory access speed.

Figure 5.2: Write ability and storage permanence of memories, showing relative degrees
along each axis (not to scale).

5.2 Memory Write Ability and Storage Permanence

Write Ability
We use the term write ability to refer to the manner and speed that a particular memory can be
written. All types of memory can be read from by a processor, since otherwise their stored
bits would serve little purpose in an embedded system. Likewise, all types of memory can be
written, since otherwise we would have no way to store bits in such a memory. However, the
manner and speed of such writing varies greatly among memory types.

At the high end of the range of write ability, we have types of memory that a processor
can write to simply and quickly by setting such a memory's address lines, data input bits, and
control lines appropriately. Toward the middle of the range, we have types of memory that are
slower to write by a processor. Toward the other end of the range, we have types of memory
that can only be written by a separate piece of equipment called a "programmer." This device
must apply special voltage levels to write to the memory, also known as

"programming" or "burning" the memory. Do not confuse this use of the term programmer
with the use referring to someone who writes software. At the low end of the range of write
ability, we have types of memory that can only have their bits stored when the memory chip
itself is being fabricated.

Storage Permanence
Storage permanence refers to the ability of memory to hold its stored bits after those bits have
been written. At the low end of the range of storage permanence is memory that begins to lose
its bits almost immediately after those bits are written, and therefore must be continually
refreshed. Next is memory that will hold its bits as long as power is applied to the memory.
Next comes memory that can hold its bits for days, months, or even years after the memory's
power source has been turned off. At the high end of the range is memory that essentially
never loses its bits, as long as the memory chip is not damaged, of course.

The terms nonvolatile and volatile are commonly used to divide memory types into two
categories along the storage permanence axis, as shown in Figure 5.2. Nonvolatile memory
can hold its bits even when power is no longer supplied. Conversely, volatile memory requires
continual power to retain its data.

Likewise, the term in-system programmable is used to divide memories into two
categories along the write ability axis. In-system programmable memory can be written to by
a processor executing in the embedded system that uses the memory. Conversely, a memory
that is not in-system programmable must be written by some external means, rather than
during normal operation of the embedded system.

As described in Chapter 1, design metrics often compete with one another. Memory write
ability and storage permanence are two such metrics. Ideally, we want a memory with the
highest write ability and the highest storage permanence, as illustrated by the ideal memory
point in Figure 5.2. Unfortunately, write ability and storage permanence tend to be inversely
proportional to one another. Furthermore, highly writable memory typically requires more
area and/or power than less-writable memory.

5.3 Common Memory Types

Introduction to "Read-Only" Memory - ROM
ROM, or read-only memory, is a nonvolatile memory that can be read from, but not written
to, by a processor in an embedded system. Of course, there must be a mechanism for setting
the bits in the memory, but we call this programming, not writing. For traditional types of
ROM, such programming is done off-line, when the memory is not actively serving as a
memory in an embedded system. We program such a ROM before inserting it into the
embedded system. Figure 5.3(a) provides an external block diagram of a ROM.

Figure 5.3: ROM: (a) external block diagram, (b) internal view of an 8 x 4 ROM.

We can use ROM for various purposes. One use is to store a software program for a
general-purpose processor. We may write each program instruction to one ROM word. For
some processors, we write each instruction to several ROM words. For other processors, we
may pack several instructions into a single ROM word. A second common use is to store
constant data needed by a system, like large lookup tables of strings or numbers. A third, less
common, use is to implement a combinational circuit. We can implement any combinational
function of k variables by using a 2^k x 1 ROM, and we can implement n functions of the
same k variables using a 2^k x n ROM. We simply program the ROM to implement the truth
table for the functions, as shown in Figure 5.4.

Figure 5.3(b) provides a symbolic view of the internal design of an 8 x 4 ROM. To the
right of the 3 x 8 decoder in the figure is a grid of lines, with word lines running horizontally
and data lines vertically; lines that cross without a circle in the figure are not connected. Thus,
word lines only connect to data lines via the programmable connection lines shown. The
figure shows all connection lines in place except for two connections in word 2. To see how
this device acts as a read-only memory, consider an input address of 010. The decoder will
thus set word 2's line to 1. Because the lines connecting this word line with data lines 2 and 0
do not exist, the ROM output will read 1010. Note that if the ROM enable input were 0, then
no word would be read, since all decoder outputs would be 0. Also note that each data line is
shown as a wired-OR, meaning that the wire itself acts to logically OR all the connections to
it.

How do we program the programmable connections? The answer depends on the type of
ROM being used. Common types include mask-programmed ROM, one-time programmable
ROM, erasable programmable ROM, electrically erasable programmable ROM, and Flash, in
order of increasing write ability. In terms of write ability, the latter two have such a high
degree of write ability that calling them read-only memory is not really accurate. In terms of
storage permanence, all ROMs have high storage permanence, and in fact, all are nonvolatile.
We now describe each ROM type briefly.

Figure 5.4: Implementing combinational functions with a ROM: (a) truth table, (b) ROM
contents.

Mask-Programmed ROM
In a mask-programmed ROM, the connection is programmed when the chip is being
fabricated by creating an appropriate set of masks. Mask-programmed ROM obviously has
extremely low write ability, as illustrated in Figure 5.2, but has the highest storage
permanence of any memory type, since the stored bits will never change unless the chip is
damaged. Such ROM types are typically only used after a final design has been determined,
and only in high-volume systems, for which the NRE costs can be amortized to result in a
lower unit cost than other ROM types.

OTP ROM - One-Time Programmable ROM
Many systems use some form of user-programmable ROM device, meaning the ROM can be
programmed by the designer in the lab, long after the chip has been manufactured.
User-programmable ROMs are generally referred to as programmable ROMs, or PROMs.
These devices are better suited to prototyping and to low-volume applications than are
mask-programmed ROM. The most basic PROM uses a fuse for each programmable
connection. To program a PROM device, the user provides a file that indicates the desired
ROM contents. A piece of equipment called a ROM programmer then configures each
programmable connection according to the file. Note that here the programmer is a piece of
equipment, not a person who writes software. The ROM programmer blows fuses by passing
a large current wherever a connection should not exist. However, once a fuse is blown, the
connection can never be reestablished. For this reason, basic PROM is often referred to as
one-time-programmable ROM, or OTP ROM.

OTP ROMs have the lowest write ability of all PROMs, as illustrated in Figure 5.2, since
they can only be written once, and they require a programmer device. However, they have
very high storage permanence, since their stored bits won't change unless someone reconnects
the device to a programmer and blows more fuses. Because of their high storage permanence,
OTP ROMs are commonly used in final products, versus other PROMs, which are more
susceptible to having their contents inadvertently modified from radiation, maliciousness, or
just the mere passage of many years.

OTP ROMs are also cheaper per chip than other PROMs, often costing under a dollar
each. This also makes them more attractive in final products versus other types of PROM, and
also versus mask-programmed ROM when time-to-market constraints or unit costs make them
a better choice. Because the chips are so cheap, some designers even use OTP ROMs during
design development. Those designers simply throw away the used chips as they program new
ones.

EPROM - Erasable Programmable ROM
Another type of PROM is an erasable PROM, or EPROM. This device uses a MOS transistor
as its programmable component. The transistor has a "floating gate," shown in Figure 5.5(a),
meaning the transistor's gate is not connected and is instead surrounded by insulator. An
EPROM programmer injects electrons into the floating gate, using higher than normal voltage
(usually 12 V to 25 V) that causes electrons to tunnel through the insulator into the gate, as in
Figure 5.5(b). When that high voltage is removed, the electrons cannot escape, and hence the
gate has been charged and programming has occurred. Reading an EPROM is much faster
than writing, since reading doesn't require programming. To erase the program, the electrons
must be excited enough to escape from the gate. Ultraviolet (UV) light is used to fulfill this
role of erasing, as shown in Figure 5.5(c). The device must be placed under a UV eraser for a
period of time, typically ranging from 5 to 30 minutes, after which the device can be
programmed again. For the UV light to reach the chip, EPROMs come with a small quartz
window in the package through which the chip can be seen, as shown in Figure 5.5(d). For
this reason, EPROM is often referred to as a windowed ROM device. EPROMs can typically
be erased and reprogrammed thousands of times, and standard EPROMs are guaranteed to
hold their programs for at least 10 years.

Compared with OTP ROMs, EPROMs have improved write ability, as illustrated in
Figure 5.2, since they can be erased and reprogrammed thousands of times. However, they
have reduced storage permanence, since they are guaranteed to hold a program only for about
10 years, and the stored bits are susceptible to undesired changes if the chip is used in
environments with much electrical noise or radiation. Thus, use of EPROMs in production
parts is limited. If used in production, EPROMs should have their windows covered by a
sticker to reduce the likelihood of undesired changes of the memory.
Figure 5.5: EPROM internals. (a) Initially, the negative charges form a channel between
the source and drain of the transistor, storing a logic 1 at that cell's location. (b) By applying
a large positive voltage at the gate of the transistor, the negative charges move out of the
channel area and get trapped in the floating gate, storing a logic 0 at that cell's location.
(c) By shining UV rays on the surface of the floating gate, the negative charges move down
into the channel, restoring the logic 1 at the cell's location. (d) An EPROM package with a
quartz window through which UV light can pass.

EEPROM - Electrically Erasable Programmable ROM
Electrically erasable PROM, or EEPROM, developed in the early 1980s, was designed to
eliminate the time-consuming and sometimes impossible requirement of exposing the ROM
to UV light to erase the ROM. An EEPROM is not only programmed electronically, but it is
also erased electronically, typically by using higher than normal voltages. EEPROM erasing
typically only requires seconds, rather than the many minutes required for EPROMs.
Furthermore, EEPROMs can have individual words erased and reprogrammed, whereas
EPROMs can only be erased in their entirety. EEPROMs are typically more expensive than
EPROMs, but far more convenient to use. EEPROMs are often called E2s, pronounced
"E-squareds."

Because EEPROMs can be erased and programmed electronically, we can build the
circuit providing the higher-than-normal voltage levels for such electronic erasing and
programming right into the embedded system in which the EEPROM is being used. Thus, we
can treat this as a memory that can be both read and written: a write to a particular word
would consist of erasing that word followed by programming that word. Thus, an EEPROM
is in-system programmable. We can use it to store data that an embedded system should save
after power is shut off. For example, EEPROM is typically used in telephones that can store
commonly dialed phone numbers in memory for speed-dialing. If you unplug the phone, thus
shutting off power, and then plug it back in, the numbers will still be in memory. EEPROMs
can typically hold data for 10 years and can be erased and programmed tens of thousands of
times before losing their ability to store data.

In-system programming of EEPROMs has become so common that many EEPROMs
come with a built-in memory controller. A memory controller hides internal memory-access
details from the memory user, and provides a simple memory interface to the user. In this
case, the memory controller would contain the circuitry and single-purpose processor
necessary for erasing the word at the user-specified address, and then programming the
user-specified data into that word.

While read accesses may require only tens of nanoseconds, writes may require tens of
microseconds or more, because of the necessary erasing and programming. Thus, EEPROMs
with built-in memory controllers will typically latch the address and data, so that the writing
processor can move on to other tasks. Furthermore, such an EEPROM would have an extra
"busy" pin to indicate to the processor that the EEPROM is busy writing, meaning that a
processor wanting to write to the EEPROM must check the value of this busy pin before
attempting to write. Some EEPROMs support read accesses even while the memory is busy
writing.

A common use of EEPROM is to serve as the program memory for a microprocessor. In
this case, we may want to ensure that the memory cannot be in-system programmed. Thus,
EEPROM typically comes with a pin that can be used to disable programming.

EEPROMs are more writable than EPROMs, as illustrated in Figure 5.2, since
EEPROMs can be programmed in-system, and they are easier to erase. EEPROM is where
the distinction between ROM and RAM begins to blur, since EEPROMs are in-system
programmable and thus writable directly by a processor. Thus, the term "read-only memory"
for EEPROM is really a misnomer, since the processor can in fact write to an EEPROM. Such
writes are slow compared to reads and are limited in number, but nevertheless, EEPROMs can
and are commonly written by a processor during normal system operation.


Flash Memory
Flash memory is an extension of EEPROM that was developed in the late 1980s. While also
using the floating-gate principle of EEPROM, flash memory is designed such that large
blocks of memory can be erased all at once, rather than just one word at a time as in
traditional EEPROM. A block is typically several thousand bytes large. This fast erase ability
can vastly improve the performance of embedded systems where large data items must be
stored in nonvolatile memory, systems like digital cameras, TV set-top boxes, cell phones,
and medical monitoring equipment. It can also speed manufacturing throughput, since
programming the complete contents of flash may be faster than programming a similar-sized
EEPROM.

Like EEPROM, each block in a flash memory can typically be erased and reprogrammed
tens of thousands of times before the block loses its ability to store data, and can store its data
for 10 years or more.

A drawback of flash memory is that writing to a single word in flash may be slower than
writing to a single word in EEPROM, since an entire block will need to be read, the word
within it updated, and then the block written back.
Introduction to Read-Write Memory - RAM
We now turn our attention to a type of memory referred to as RAM. RAM, or random-access
memory, is a memory that can be both read and written easily. Writing to a RAM is about as
fast as reading from a RAM, in contrast to in-system programmable ROMs where writes take
much longer than reads. Furthermore, RAM is typically volatile. Unlike forms of ROM, RAM
never contains data when inserted in an embedded system. Instead, the system writes data to
and then reads data from the RAM during its execution. Figure 5.1(b) provides a block
diagram of a RAM.

A common question is, where does the term random-access come from in the name
random-access memory? RAM should really be called read-write memory, to contrast it from
read-only memory. However, when RAM was first introduced, it was in stark contrast to the
then-common sequentially accessed memory media, like magnetic tapes or drums. These
media required that the particular location to be accessed be positioned under an access device
(e.g., a head). To access another location not immediately adjacent to the current location on
the media, one would have to sequence through a number of other locations. For example, a
tape would have to be rewound or fast-forwarded. In contrast, with RAM, any "random"
memory location could be accessed in the same amount of time as any other location,
regardless of the previously accessed location. This random access feature was the key
distinguishing feature of this memory type at the time of its introduction, and the name has
stuck even today.

A RAM's internal structure is somewhat more complex than a ROM's, as shown in
Figure 5.6, which illustrates a 4 x 4 RAM. (Note: RAMs typically have thousands of words,
not just four as in the figure.) Each word consists of a number of memory cells, each storing a
bit. Each input data line connects to every cell in its column. Likewise, each output data line
connects to every cell in its column, with the output of a memory cell being ORed with the
output data line from above. Each word enable line from the decoder connects to every cell in
its row. The read/write input (rd/wr) is assumed to be connected to every cell. The memory
cell must possess logic such that it stores the input data bit when rd/wr indicates write and the
row is enabled, and such that it outputs this bit when rd/wr indicates read and the row is
enabled.

Figure 5.6: RAM internals.

Static RAM is faster but larger than dynamic RAM. Furthermore, static RAM is easily
implemented on the same chip as processors, whereas dynamic RAM is usually implemented
on a separate IC.

Figure 5.7: Memory cell internals: (a) SRAM, (b) DRAM.

SRAM - Static RAM
Static RAM, or SRAM, uses a memory cell, shown in Figure 5.7(a), consisting of a flip-flop
to store a bit. Each bit thus requires about six transistors. This RAM type is called static
because it will hold its data as long as power is supplied, in contrast to dynamic RAM. Static
RAM is typically used for high-performance parts of a system (e.g., cache).
DRAM - Dynamic RAM
Dynamic RAM, or DRAM, uses a memory cell, shown in Figure 5.7(b), consisting of a MOS
transistor and capacitor to store a bit. Each bit thus requires only one transistor, resulting in
more compact memory than SRAM. However, the charge stored in the capacitor leaks
gradually, leading to discharge and eventually to loss of data. To prevent loss of data, each
cell must regularly have its charge "refreshed." A typical DRAM cell's minimum refresh rate
is once every 15.625 microseconds. Because of the way DRAMs are designed, reading a
DRAM word refreshes that word's cells. In particular, reading a DRAM word results in the
word's data being stored in a buffer and then being written back to the word's cells. DRAMs
tend to be slower to access than SRAMs.

PSRAM - Pseudo-Static RAM
Many RAM variations exist. Pseudo-static RAMs, or PSRAMs, are DRAMs with a memory
refresh controller built in. Thus, since the RAM user need not worry about refreshing, the
device appears to behave much like an SRAM. However, in contrast to SRAM, a PSRAM
may be busy refreshing itself when accessed, which could slow access time and add some
system complexity. Nevertheless, PSRAM is a popular low-cost, high-density memory
alternative to SRAM in many embedded systems.

NVRAM - Nonvolatile RAM
Nonvolatile RAM, or NVRAM, is a special RAM variation that is able to hold its data even
after external power is removed. There are two common types of NVRAM. One type, often
called battery-backed RAM, contains a static RAM along with its own permanently connected
battery. When external power is removed or drops below a certain threshold, the internal
battery maintains power to the SRAM, and the memory continues to store its bits. Compared
with other types of nonvolatile memory, battery-backed RAM is far more writable, as
illustrated in Figure 5.2. Since no special programming is necessary, writes take place in
nanoseconds, just like reads, unlike ROM-based forms of nonvolatile memory. Batteries are
available that can last for 10 years. However, NVRAMs are more susceptible to having bits
changed inadvertently due to noise than are EEPROM or flash.

A second type of NVRAM contains a static RAM as well as an EEPROM or flash having
the same capacity as the static RAM. This type of NVRAM stores its RAM contents into the
EEPROM just before power is turned off, and then reloads that data from EEPROM into the
RAM when power is turned back on.

Example: HM6264 and 27C256 RAM/ROM Devices
In this example, we introduce a pair of low-cost, low-capacity memory devices, shown in
Figure 5.8(a), commonly used in 8-bit microcontroller-based embedded systems. The first
two numeric digits in these devices indicate whether the device is RAM (62) or ROM (27).
Subsequent digits give the memory capacity in kilobits. Both these devices are available in 4,
8, 16, 32, and 64 kilobytes, so the part numbers 62 or 27 would be followed by the number of
kilobits, which may be 32, 64, 128, 256, or 512. Figure 5.8(b) summarizes some of the
characteristics of these devices.

Memory access to and from these devices is performed through an 8-bit parallel protocol.
Placing a memory address on the address bus and asserting the read signal output enable (OE)
performs a read operation. Placing some data and a memory address on the data and address
busses and asserting the write signal write enable (WE) performs a write operation. The read
and write timing is given in Figure 5.8(c).

Device    Access Time (ns)    Standby Pwr. (mW)    Active Pwr. (mW)    Vcc Voltage (V)
HM6264    85-100              .01                  15                   5
27C256    90                  .5                   100                  5

Figure 5.8: HM6264 and 27C256 RAM/ROM devices: (a) block diagrams,
(b) characteristics, (c) timing diagrams.

Example: TC55V2325FF-100 Memory Device
In this example, we introduce a 2-megabit synchronous pipelined burst SRAM memory
device, shown in Figure 5.9(a), designed to be interfaced with 32-bit processors. This device,
made by Toshiba Inc., is organized as 64K x 32 bits. Figure 5.9(b) summarizes some of the
characteristics of this device.

Device             Access Time (ns)    Standby Pwr. (mW)    Active Pwr. (mW)    Vcc Voltage (V)
TC55V2325FF-100    10                  na                   1200                 3.3

Figure 5.9: TC55V2325FF-100 RAM device: (a) block diagram, (b) characteristics,
(c) timing diagrams.

In Figure 5.9(c), we present the timing diagram for a single read operation. Write
operation is similar. This device is capable of fast sequential reads and writes as well as single
byte I/O. The interested reader should refer to the manufacturer's datasheets for complete
timing information. The read operation can be initiated with either the address status
processor (ADSP) input or the address status controller (ADSC) input. Here, we have asserted
both. Subsequent burst addresses can be generated internally and are controlled by the address
advance (ADV) input. In other words, as long as ADV is asserted, the device will keep
incrementing its address register and output the corresponding data on the next clock cycle.

5.4 Composing Memory
An embedded system designer is often faced with the situation of needing a particular-sized
memory (ROM or RAM), but having readily available memories of a different size. For
example, the designer may need a 2k x 8 ROM, but may have 4k x 16 ROMs readily
available. Alternatively, the designer may need a 4k x 16 ROM, but may have 2k x 8 ROMs
available for use.

The case where the available memory is larger than needed is easy to deal with. We
simply use the needed lower words in the memory, thus ignoring unneeded higher words and
their high-order address bits, and we use the lower data input/output lines, thus ignoring
unneeded higher data lines. (Of course, we could use the higher data lines and ignore the
lower lines instead.)

The case where the available memory is smaller than needed requires more design effort.
In this case, we must compose several smaller memories to behave as the larger memory we
need. Suppose the available memories have the correct number of words, but each word is
not wide enough. In this case, we can simply connect the available memories side-by-side.
For example, Figure 5.10(a) illustrates the situation of needing a ROM three times wider than
that available. We connect three ROMs side-by-side, sharing the same address and enable
lines, and concatenating the data lines to form the desired word width.

Suppose instead that the available memories have the correct word width, but not enough
words. In this case, we can connect the available memories top to bottom. For example,
Figure 5.10(b) illustrates the situation of needing a ROM with twice as many words, and
hence needing one extra address line, than that available. We connect the ROMs top to
bottom, ORing the corresponding data lines of each. We use the extra high-order address line
to select the higher or lower ROM using a 1 x 2 decoder, and the remaining address lines to
offset into the selected ROM. Since only one ROM will ever be enabled at a time, the ORing
of the data lines never actually involves more than one nonzero data line.
If we instead needed four times as many words, and hence two extra address lines, we
Example: TC55V2325FF-100 Memory Device would instead use four ROM 2 x 4 decoder having the two high-order address line as
In this example, we introduce a 2-megabit synchronous pipelined burst SRAM mem~ry input »'.9!:!ld select one e four ROMs to access.
device, shown ill Figure 5.9(a), designed to be interfaced with 32-bit process?rs. This device, Suppose the -ailable memories ve a smaller word width..as well as fewer wor s t ian
made by Toahiba Inc., is organized as. 64K x 32 bits. Figure 5.9(b) sununanzes some of the necessarv. then combine the above two techpiques, fir cieaiing the number of columns.
characteristics of this device. of mem es necessary to achieve· the needed word wi and then creating thenumber of
In Figure 5.9(c), we present the timing diagram for a single read operation. Write rows o memories necessary, along with a deco s. ·
operation is similar. This device is capable of fast sequential reads· and writes as well as single The a p p r ~ t e c i in Figur

Embedded System Design    122
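These composition techniques can be mimicked behaviorally in software. The following C sketch (our own illustration; the ROM names and contents are made up) models the top-to-bottom composition of Figure 5.10(b), with the extra high-order address bit playing the role of the 1 x 2 decoder, and the side-by-side composition of Figure 5.10(a):

```c
#include <stdint.h>

/* Two hypothetical 2K x 8 ROMs with placeholder (all-zero) contents. */
#define ROM_WORDS 2048
static uint8_t rom_lo[ROM_WORDS];   /* words 0..2047 of the composed ROM    */
static uint8_t rom_hi[ROM_WORDS];   /* words 2048..4095 of the composed ROM */

/* Top-to-bottom composition, as in Figure 5.10(b): the extra high-order
   address bit acts as the 1 x 2 decoder selecting one ROM; the remaining
   eleven address bits offset into the selected ROM. */
uint8_t read_4k_x_8(uint16_t addr)          /* addr: 0..4095 */
{
    uint16_t offset = addr & 0x07FF;        /* low 11 address lines */
    return (addr & 0x0800) ? rom_hi[offset] : rom_lo[offset];
}

/* Side-by-side composition, as in Figure 5.10(a): both ROMs share the
   same address, and their data lines are concatenated into a wider word. */
uint16_t read_2k_x_16(uint16_t addr)        /* addr: 0..2047 */
{
    return ((uint16_t)rom_hi[addr] << 8) | rom_lo[addr];
}
```

The conditional expression stands in for the decoder plus the ORing of the data lines, which in hardware never involves more than one nonzero driver at a time.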
Chapter 5: Memory

Figure 5.10: Composing smaller memory parts into larger memory: (a) three 2^m x n ROMs composed side-by-side into a 2^m x 3n ROM, (b) two 2^m x n ROMs composed top to bottom, with a 1 x 2 decoder, into a 2^(m+1) x n ROM, (c) the combined approach.

Note that, when composing memories to increase the number of words, we don't necessarily have to use the highest-order address lines to select the appropriate memory, although these are the most logical choice. Sometimes, especially when we are composing just two memories, we use the lowest-order bit to select among memories - thus, one memory represents "odd" addresses, and the other represents "even" addresses.

5.5 Memory Hierarchy and Cache

When we design a memory to store an embedded system's program and data, we are often faced with a dilemma: we want an inexpensive and fast memory, but inexpensive memory tends to be slow, whereas fast memory tends to be expensive. The solution to this problem is to create a memory hierarchy, as illustrated in Figure 5.11. We use an inexpensive but slow main memory to store all of the program and data. We use a small amount of fast but expensive cache memory to store copies of likely accessed parts of main memory. Using cache is analogous to posting on a wall near a telephone a short list of important phone numbers rather than posting the entire phone book. Many systems include even larger and less expensive forms of memory, such as disk and tape, for some of their storage needs. However, we do not consider these further, as they are not especially common in embedded systems. Also, although the figure shows only one cache, we can include any number of levels of cache, those closer to the processor being smaller and faster than those closer to main memory. A two-level cache scheme is common.

Figure 5.11: An example memory hierarchy.
Cache is usually designed using static RAM rather than dynamic RAM, which is one reason that cache is more expensive but faster than main memory. Because cache usually appears on the same chip as a processor, where space is very limited, cache size is typically
only a fraction of the size of main memory. Cache access time may be as low as just one clock cycle, whereas main memory access time is typically several cycles.

Cache operates as follows. When we want the processor to access (read or write) a main memory address, we first check for a copy of that location in cache. If the copy is in the cache, called a cache hit, then we can access it quickly. If the copy is not there, called a cache miss, then we must first read the address, and perhaps some of its neighbors, into the cache. This description of cache operation leads to several cache design choices: cache mapping, cache replacement policy, and cache write techniques. These design choices can have a significant impact on system cost, performance, and power, and thus should be evaluated carefully for a given application.
Cache Mapping Techniques

Cache mapping is the method for assigning main memory addresses to the far fewer number of available cache addresses, and for determining whether a particular main memory address's contents are in the cache. Cache mapping can be accomplished using one of three basic techniques (see Figure 5.12):

1. In direct mapping, illustrated in Figure 5.12(a), the main memory address is divided into two fields, the index and the tag. The index represents the cache address, and thus the number of index bits is determined by the cache size (i.e., index size = log2(cache size)). Note that many different main memory addresses will map to the same cache address. When we store the contents of a main memory address in the cache, we also store the tag. To determine if a desired main memory address is in the cache, we go to the cache address indicated by the index, and compare the tag there with the desired tag. If the tags match, we then check the valid bit. The valid bit indicates whether the data stored in that cache slot has previously been loaded into the cache from main memory. We use the offset portion of the main memory address to grab a particular word within the cache line. A cache line, also known as a cache block, is the number of (inseparable) adjacent memory addresses loaded from or stored into main memory at a time. A typical block size is four or eight addresses.

2. In fully associative mapping, illustrated in Figure 5.12(b), each cache address contains not only the contents of a main memory address, but also the complete main memory address. To determine if a desired main memory address is in the cache, we simultaneously (associatively) compare all the addresses stored in the cache with the desired address.

3. In set-associative mapping, illustrated in Figure 5.12(c), a compromise is reached between direct and fully associative mapping. As in direct mapping, an index maps a main memory address to a cache address, but now each cache address contains the contents and tags of two or more memory locations, called a set. To determine if a desired main memory address is in the cache, we go to the cache address indicated by the index, and we then simultaneously (associatively) compare all the tags at that location (i.e., of that set) with the desired tag. A cache with a set of size N is called an N-way set-associative cache. 2-way and 4-way set-associative caches are common.

Figure 5.12: Cache mapping techniques: (a) direct-mapped, (b) fully associative, (c) two-way set associative.

Direct-mapped caches are easy to implement, but may result in numerous misses if two or more words with the same index are accessed frequently, since each will bump the other out of the cache. Fully associative caches, on the other hand, are fast, but the comparison logic is expensive to implement. Set-associative caches can reduce misses compared to direct-mapped caches, without requiring nearly as much comparison logic as fully associative caches.

Caches are usually designed to treat collections of a small number of adjacent main memory addresses as one indivisible block, also known as a line, typically consisting of about eight addresses.
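The division of a main memory address into tag, index, and offset fields can be sketched in C. The field widths below (8-word blocks, 64 cache lines) are illustrative assumptions of ours, not values from the text:

```c
#include <stdint.h>

/* Hypothetical geometry: 64-line direct-mapped cache, 8 words per block. */
#define OFFSET_BITS 3   /* log2(8 words per block) */
#define INDEX_BITS  6   /* log2(64 cache lines)    */

typedef struct { uint32_t tag, index, offset; } CacheAddr;

/* Split a main-memory word address into the three fields used by direct
   mapping: offset selects the word within the block, index selects the
   cache line, and tag disambiguates the many memory addresses that map
   to that same line. */
CacheAddr split_address(uint32_t addr)
{
    CacheAddr a;
    a.offset = addr & ((1u << OFFSET_BITS) - 1);
    a.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    a.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return a;
}
```

For example, address 0x1234 splits into offset 4, index 6, and tag 9 under this geometry.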

Figure 5.13: Sample cache performance trade-offs (average miss rate for cache sizes from 1 Kbyte to 64 Kbyte).

Cache Write Techniques

When we write to a cache, we must at some point update the main memory. Such an update is only an issue for a data cache, since an instruction cache is read-only. There are two common update techniques, write-through and write-back.

In the write-through technique, whenever we write to the cache, we also write to main memory, requiring the processor to wait until the write to main memory completes. Although easy to implement, this technique may result in several unnecessary writes to main memory. For example, suppose a program writes to a block in the cache, then reads it, and then writes it again, with the block staying in the cache during all three accesses. There would have been no need to update the main memory after the first write, since the second write overwrites this first write.

The write-back technique reduces the number of writes to main memory by writing a block to main memory only when the block is being replaced, and even then only if the block was written to during its stay in the cache. This technique requires that we associate an extra bit, called a dirty bit, with each block. We set this bit whenever we write to the block in the cache, and we then check it when replacing the block to determine if we should copy the block back to main memory.

Cache Impact on System Performance

The design and configuration of caches can have a large impact on the performance and power consumption of a system. So far, we looked at cache mapping, associativity, write-back, and replacement policies. From a performance point of view, the most important parameters in cache design are the total size of the cache, its degree of associativity, and the data block size, a.k.a. line size, that is read or written during each cache access by the microprocessor.

The total size of the cache is measured as the total number of data bytes that the cache can hold. Notice that a cache stores other information, such as the tags and valid bits, which do not contribute to the size of the cache. So, a 32 Kbyte cache has room for 32,768 bytes of data, plus additional storage for tag and housekeeping bits. By making a cache larger, one achieves lower miss rates, which is one of the design goals. However, accessing words within a large cache will be slower than accessing words in a smaller cache. To clarify this, we will give an example. First, let us assume that we are designing a small 2 Kbyte cache for our processor. With this cache, we have measured the miss rate to be 15%, meaning 15 out of every 100 accesses to the cache result in a miss on the average. The cost of going to main memory (i.e., the cost of memory access when there is a miss) is 20 cycles. The cost of going only to the cache (i.e., the cost of memory access when there is a hit) is 2 cycles. Hence, on the average, the cost of a memory access is (0.85 * 2) + (0.15 * 20) = 4.7 cycles. Now let us double the size of the cache, and assume this improves our hit rate to 93.5%, at the expense of slowing the cache down by an extra clock cycle. Now, the average cost of a memory access becomes (0.935 * 3) + (0.065 * 20) = 4.105 cycles. This second cache will perform better than our first one. Now, we double the size of our cache one more time, resulting in an additional clock cycle per hit, and achieving a hit rate of 94.435%. The average cost of a memory access thus becomes (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles. This larger cache will perform worse than our first two designs.
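The average-access-cost arithmetic used in this example is simple enough to capture in a few lines of C (a sketch of ours reproducing the three calculations):

```c
/* Average memory-access cost in cycles, given the hit rate and the
   per-access hit and miss costs, exactly as in the calculations above. */
double avg_cost(double hit_rate, double hit_cycles, double miss_cycles)
{
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

With the numbers above, avg_cost(0.85, 2, 20) yields 4.7, avg_cost(0.935, 3, 20) yields 4.105, and avg_cost(0.94435, 4, 20) yields 4.8904, matching the three designs.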


Note that the problem with making a cache larger is the additional access time penalty, which quickly offsets the benefits of improved hit rates. Designers often use other methods to improve cache hit rate without increasing the cache size. For example, they make a cache set-associative or increase the line size. These methods, too, incur additional logic and add to the access time latency. Increasing the line size can, additionally, improve main memory access time, at the expense of more complex multiplexing of data and thus increased access latency. Figure 5.13 summarizes the effects of cache size and associativity in terms of average miss rate for a number of commonly used programs under the Unix environment, such as gcc.

The behavior of caches is very dependent on the type of applications that run on the processor. Fortunately, for an embedded system, the set of applications is well defined and known at design time, so the designer has the ability to measure the performance of some candidate cache designs and choose one that best meets the performance, cost, and power constraints. One way to perform such analysis is as follows. We instrument the executable with additional code such that, when executed, it outputs a trace of memory references. Then, we feed these traces through a cache simulator, which outputs cache statistics at the end of its execution. We can perform all this analysis on our development computer.
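As a rough sketch of the cache-simulator half of this flow (the geometry and the trace below are our own choices, not from the text), the following C code counts hits for a direct-mapped cache with 16 one-word lines:

```c
#include <stdint.h>

/* A minimal trace-driven simulator for a direct-mapped cache with
   16 one-word lines (illustrative sizes only). */
#define LINES 16

static uint32_t tags[LINES];
static int      valid[LINES];

/* Feed one memory reference to the cache; returns 1 on a hit, 0 on a
   miss (a miss loads the referenced word, replacing the line's old
   contents). */
int cache_access(uint32_t addr)
{
    uint32_t index = addr % LINES;
    uint32_t tag   = addr / LINES;
    if (valid[index] && tags[index] == tag)
        return 1;
    valid[index] = 1;
    tags[index]  = tag;
    return 0;
}

/* Run a whole reference trace and return the total number of hits. */
int run_trace(const uint32_t *trace, int n)
{
    int hits = 0;
    for (int i = 0; i < n; i++)
        hits += cache_access(trace[i]);
    return hits;
}
```

A real flow would read the trace produced by the instrumented executable; here a trace such as {0, 1, 2, 0, 1, 2, 32, 0} yields 3 hits, since address 32 maps to the same line as address 0 and evicts it.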
Figure 5.14: Basic DRAM architecture.

5.6 Advanced RAM

Earlier, we described DRAM as a type of storage device that uses a single transistor/capacitor pair to store a bit. Because of such architecture, and the resulting high capacity and low cost, DRAMs are commonly used as the main memory in processor-based embedded systems. In order for DRAMs to keep pace with processor speeds, many variations on the basic DRAM interface have been proposed. In this section, we describe the structure of a basic DRAM as well as some of the more recent and advanced DRAM designs.

Basic DRAM
The basic DRAM architecture is depicted in Figure 5.14. The addressing mechanism for a memory read works as follows. The address bus is multiplexed between row and column components. Using the row address select (ras) signal, the row component of the address is latched into the row address buffer. Likewise, using the column address select (cas) signal, the column component of the address is latched into the column address buffer. (Note that in the early days, the number of I/O pins was limited, hence manufacturers of DRAMs adopted this multiplexed scheme to reduce the overall I/O requirements. In fact, some DRAM devices used the same I/O pins for multiplexed data as well as multiplexed address signals.) As soon as the row address component is latched into the row address buffer, the row decoder activates the corresponding row of bits. The length of this bit-row depends on the word size and the organization of the device. Once the column address buffer is latched, the column decoder enables the particular word (referenced by the address) in order for it to propagate to the sense amplifier. (The sense amplifier's task is to detect the voltage level of the bits (transistor/capacitor pairs) corresponding to the referenced word and amplify them to a high enough level for latching into the output buffers.) Once the data is in the output buffers, it can be read by asserting the output enable signal.

The fast page mode DRAM design is an improvement on the basic DRAM architecture. In this design, each row of the memory bit-array is viewed as a page. A page contains multiple words. Each word is addressed by a different column address. The sense amplifier in FPM DRAM amplifies the entire page once its address is strobed into the row address latch. Thereafter, each word of that page is read (or written) by strobing the corresponding column address. The timing diagram for FPM DRAM is depicted in Figure 5.15. Here, after selecting a particular page (row), three data words within that page are read consecutively.
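The multiplexed row/column addressing of the basic DRAM can be sketched behaviorally in C. The 256 x 256 bit-array organization below is a hypothetical example of ours; the functions merely mimic what the ras and cas strobes latch:

```c
#include <stdint.h>

/* Sketch of multiplexed DRAM addressing for a hypothetical 64K-cell
   device organized as a 256 x 256 bit-array: the 16-bit address is
   presented as an 8-bit row and then an 8-bit column on shared lines. */
static uint8_t row_buffer;   /* latched on the ras strobe */
static uint8_t col_buffer;   /* latched on the cas strobe */

void strobe_ras(uint16_t addr) { row_buffer = addr >> 8;   }
void strobe_cas(uint16_t addr) { col_buffer = addr & 0xFF; }

/* After both strobes, the row and column decoders can locate the cell. */
uint16_t decoded_cell(void)
{
    return (uint16_t)row_buffer * 256 + col_buffer;
}
```

After strobing both halves of an address such as 0xABCD, the decoders recover the full cell location, at the cost of two strobe cycles but half the address pins.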


Figure 5.15: FPM DRAM timing.

Figure 5.16: EDO DRAM timing.
The enhanced synchronous DRAM, or ESDRAM, is an improvement to the SDRAM design. The improvement is analogous to what was done to the FPM DRAM by the EDO DRAM: in short, cache registers have been added to the sense amplifiers to enable overlapping of the column addressing. This enables faster clocking and lower latency in reading and writing data.
Rambus DRAM (RDRAM)

Rambus is really more of a bus interface architecture than a DRAM architecture. Rambus uses multiplexed address/data lines to connect the memory controller (or processor) to the RDRAM device. The specification for this interface states that the clock runs at 300 MHz. In addition, data is latched on both the rising and falling edges of the clock. Using such a bus, theoretically, a rate of 600 million transfers per second is possible. In addition, each 64-Mbit RDRAM is broken into four banks (parts), each with its own row decoders. So, at any given time, four pages remain open. The RDRAM protocol is packet driven, where address packets are followed by data packets. The smallest transaction requires a minimum of four cycles. Because of its multiple open page scheme and fast bus I/O, RDRAM, when utilized properly, is capable of very high throughput.

DRAM Integration Problem

So far, we have discussed static and dynamic types of RAM and brought up the benefits and disadvantages of each type. In this section, we describe the problem of integrating memory and conventional logic (gates) on the same IC. While most static types of RAM can easily be integrated with other logic on a single chip (e.g., ICs containing a cache and a microprocessor), it is very difficult to integrate DRAMs and conventional logic. The difficulty arises from the different chip-making process that is involved when making DRAMs as opposed to conventional logic. When designing conventional logic ICs, the goal of the designers is to minimize the parasitic capacitance in order to reduce signal propagation delays and power consumption. In contrast, when designing DRAMs, the goal of the designers is to

Figure 5.17: SDRAM timing.

create capacitor cells in order to retain stored information. This difference in design goals leads to a design process that is considerably different between DRAM and conventional logic. However, integrated processes are beginning to appear.

Memory Management Unit (MMU)

We conclude this section by briefly discussing the duties of an MMU. A system that contains DRAM requires some processor that handles tasks such as refresh, DRAM bus interface, and arbitration and sharing of a memory among multiple processors. In addition, the MMU translates logical memory addresses, issued by attached processors, to physical memory addresses that make sense to the particular DRAM architecture in use. Modern CPUs often come with an MMU built as part of the processor's core. Otherwise, single-purpose processors can be designed or purchased to handle such memory management tasks.

5.7 Summary

Memory stores data for use by processors. We have categorized memory using two characteristics, namely, write ability and storage permanence. ROM typically is only read by an embedded system. It can be programmed during fabrication (mask-programmed) or by the user (programmable ROM, or PROM). PROM may be erasable using UV light (EPROM), or electronically erasable (EEPROM) word by word, or in large blocks (flash). RAM, on the other hand, is memory that can be read or written by an embedded system. Static RAM uses a flip-flop to store each bit, while dynamic RAM uses a transistor and capacitor, resulting in fewer transistors but the need to refresh the charge on the capacitor and slower performance. Pseudo-static RAM is a dynamic RAM with a built-in refresh controller. Nonvolatile RAM keeps its data even after power is shut off. Designers must not only choose the appropriate type of memories for a given system, but must often compose smaller memory parts into larger memory. Using a memory hierarchy can improve system performance by keeping copies of frequently accessed instructions/data in small fast memories such as cache. A cache is a small and fast memory between a processor and main memory. Several cache design features greatly influence the speed and cost of cache, including mapping techniques, replacement policies, and write techniques. Several advanced DRAMs provide high-speed memory access, like FPM RAM, EDO RAM, and SDRAM. Integrating DRAM with an on-chip processor can be difficult due to different IC processes. Thus, the choice of memory types and the design of a memory architecture is an important part of embedded system design, and can greatly impact performance, power, size, and cost.

5.8 References and Further Reading

•  http://www.instantweb.com/~foldoc/contents.html. The Free On-line Dictionary of Computing. Provides definitions of a variety of computer-related terms, including numerous ROM and RAM variations.
•  David Patterson and John Hennessy, Computer Organization and Design. San Francisco, CA: Morgan Kaufmann Publishers, Inc. Includes discussion of memory hierarchy and cache.

5.9 Exercises

5.1   Briefly define each of the following: mask-programmed ROM, PROM, EPROM, EEPROM, flash EEPROM, RAM, SRAM, DRAM, PSRAM, and NVRAM.
5.2   Define the two main characteristics of memories as discussed in this chapter. From the types of memory mentioned in Exercise 5.1, list the worst choice for each characteristic. Explain.
5.3   Sketch the internal design of a 4 x 3 ROM.
5.4   Sketch the internal design of a 4 x 3 RAM.
5.5   Compose 1K x 8 ROMs into a 1K x 32 ROM (note: 1K actually means 1,024 words).
5.6   Compose 1K x 8 ROMs into an 8K x 8 ROM.
5.7   Compose 1K x 8 ROMs into a 2K x 16 ROM.
5.8   Show how to use a 1K x 8 ROM to implement a 512 x 6 ROM.
5.9   Given the following three cache designs, find the one with the best performance by calculating the average cost of access. Show all calculations. (a) 4 Kbyte, 8-way set-associative cache with a 6% miss rate; cache hit costs one cycle, cache miss costs 12 cycles. (b) 8 Kbyte, 4-way set-associative cache with a 4% miss rate; cache hit costs two cycles, cache miss costs 12 cycles. (c) 16 Kbyte, 2-way set-associative cache with a 2% miss rate; cache hit costs three cycles, cache miss costs 12 cycles.
5.10  Given a 2-level cache design where the hit rates are 88% for the smaller cache and 97% for the larger cache, the access costs for a miss are 12 cycles and 20 cycles, respectively, and the access cost for a hit is one cycle, calculate the average cost of access.
5.11  A given design with cache implemented has a main memory access cost of 20 cycles on a miss and two cycles on a hit. The same design without the cache has a main memory access cost of 16 cycles. Calculate the minimum hit rate of the cache to make the cache implementation worthwhile.
5.12  Design your own 8K x 32 PSRAM using an 8K x 32 DRAM, by designing a refresh controller. The refresh controller should guarantee refresh of each word every 15.625 microseconds. Because the PSRAM may be busy refreshing itself when a read or write access request occurs (i.e., the enable input is set), it should have an output signal ack indicating that an access request has been completed. Make use of a timer. Design the system down to complete structure. Indicate at what frequency your clock must operate.

CHAPTER 6: Interfacing
6.1 Introduction
6.2 Communication Basics
6.3 Microprocessor Interfacing: I/O Addressing
6.4 Microprocessor Interfacing: Interrupts
6.5 Microprocessor Interfacing: Direct Memory Access
6.6 Arbitration
6.7 Multilevel Bus Architectures
6.8 Advanced Communication Principles
6.9 Serial Protocols
6.10 Parallel Protocols
6.11 Wireless Protocols
6.12 Summary
6.13 References and Further Reading
6.14 Exercises

6.1 Introduction

As stated in Chapter 5, we use processors to implement processing, memory to implement storage, and buses to implement communication. The earlier chapters described processors and memory. This chapter describes implementing communication with buses, known as interfacing. Communication is the transfer of data among processors and memories. For example, a general-purpose processor reading or writing a memory is a common form of communication. A general-purpose processor reading or writing a peripheral's register is another common form.

We begin by defining some basic communication concepts. We then introduce several issues relating to the common task of interfacing to a general-purpose processor: addressing, interrupts, and direct memory access. We also describe several schemes for arbitrating among multiple processors attempting to access a single bus or memory simultaneously. We show

that many systems may include several hierarchically organized buses. We then discuss some more advanced communication principles and survey several common serial, parallel, and wireless communication protocols.

6.2 Communication Basics

Basic Terminology
We begin by introducing a very basic communication example between a processor and a memory, shown in Figure 6.1. Figure 6.1(a) shows the bus structure, or the wires connecting the processor and the memory. A line rd'/wr indicates whether the processor is reading or writing. An enable line is used by the processor to carry out the read or write. Twelve address lines addr indicate the memory address that the processor wishes to read or write. Eight data lines data are set by the processor when writing, or set by the memory when the processor is reading. Figure 6.1(b) describes the read protocol over these wires: the processor sets rd'/wr to 0, places a valid address on addr, and strobes enable, after which the memory will place valid data on the data lines. Figure 6.1(c) shows a write protocol: the processor sets rd'/wr to 1, places a valid address on addr, places data on data, and strobes enable, causing the memory to store the data. This very simple example brings up several points that we now describe.

Figure 6.1: A simple bus example: (a) bus structure, (b) read protocol, (c) write protocol.
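Behaviorally, the read and write protocols just described can be sketched in C, with function calls standing in for the hardware handshake (a sketch of ours; the 4096-word space follows from the twelve address lines of the example):

```c
#include <stdint.h>

/* Behavioral model of the Figure 6.1 memory: 12 address lines give
   4096 locations, each 8 bits wide. */
typedef struct {
    uint8_t storage[4096];
} Memory;

/* Read: rd'/wr = 0, address placed on addr, enable strobed; the
   memory drives the data lines with the addressed byte. */
uint8_t bus_read(Memory *m, uint16_t addr)
{
    return m->storage[addr & 0x0FFF];
}

/* Write: rd'/wr = 1, address on addr, data on data, enable strobed;
   the memory stores the byte. */
void bus_write(Memory *m, uint16_t addr, uint8_t data)
{
    m->storage[addr & 0x0FFF] = data;
}
```

The masking with 0x0FFF reflects that only twelve address lines exist; higher-order address bits simply are not connected.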
Wires may be unidirectional; meaning they transmit in only one direction, as did rd'l wr, may use something other than pins for connections, such as small metallic balls. However, we
enable and addr, or they may be bidirectional, meaning they transmit in two directions, ···W
can still use the term pin to refer to a port on a processor.
though in only one direction at a time, as did data .. A set ?f wires with the same functi~n is The distinction between a bus and a port is similar to the distinction between a street and
typically drawn as a thick line and/or as a line with a small angled line drawn through 1t, as a driveway - the bus is like the street, which connects various driveways. A processor's port
was the case with addr arid data. is like a house's driveway, which provides access between the house and the street.
The tenn bus can refer to a set of wires with a single function within a communication. The most common method for describing a hardware protocol is a timing diagram, as was
For example, we can refer to the "address b~" and the "data bus" in the above e_xan:1ple. The used in Figure 6 . l(b) and(c). In the diagram, time proceeds to the right along the x-axis. The
tenn bus can also refer to the entire collection of wires used for the commurucatlon (e.g., diagram shows that the processor must set the rd'lwr line low for a read to The diagramoccur.
rd'lwr, enable, addr, and data) along with the communication protocol over those wires. Both also shows, using two vertical lines, that the processor must place the address on addr for at
uses of the term are common and are often used in conjunction with one another. For least t,ctup time before setting the enable line high. The diagram shows that the high enable
example, we may say that the processor's bus consists of an address bus and a data bus. A protocol describes the rules for communicating over those wires. We deal primarily with low-level hardware protocols in this chapter, while higher-level protocols, like IP (Internet Protocol), can be built on top of these protocols using a layered approach.

The bus connects to ports of a processor (or memory). A port is the actual conducting device, like metal, on the periphery of a processor, through which a signal is input to or output from the processor. A port may refer to a single wire, or to a set of wires with a single function, such as an address port consisting of twelve wires. A related term is pin. When a processor is packaged as its own IC, there are actual pins extending from the package, and those pins are often designed to be plugged into a socket on a printed-circuit board. Today, however, a processor commonly coexists on a single IC with other processors and memories. Such a processor does not have any actual pins on its periphery, but rather "pads" of metal in

line triggers the memory to put data on the data wires after a time t_read. Note that a timing diagram represents control lines, like rd'/wr and enable, as either being high or low, while it represents data lines, like addr and data, as either being invalid or valid, using a single horizontal line or two horizontal lines, respectively. The actual value of data lines is not normally relevant when describing a protocol, so that value is typically not shown.

In the above protocol, the control line enable is active high, meaning that a 1 on the enable line triggers the data transfer. In many protocols, control lines are instead active low, meaning that a 0 on the line triggers the transfer. Such a control line's name is typically written with a bar above it, a single quote after it (e.g., enable'), a forward slash before it (e.g., /enable), or the letter L after it (e.g., enable_L). To be general, we will use the term assert to mean setting a control line to its active value, such as to 1 for an active high line, and to 0 for an active low line. We will use the term deassert to mean setting the control line to its inactive


Chapter 6: Interfacing
6.2: Communication Basics

value. Notice that the rd'/wr of our earlier example merges two control signals into one line, so we accomplish a read by setting rd'/wr to 0 and a write by setting rd'/wr to 1.

[Figure 6.2: Time-multiplexed data transfer: (a) data serializing, (b) address/data muxing.]

[Figure 6.3: Two protocol control methods. (a) Strobe: 1. Master asserts req to receive data; 2. Servant puts data on bus within time t_access; 3. Master receives data and deasserts req; 4. Servant ready for next request. (b) Handshake: 1. Master asserts req to receive data; 2. Servant puts data on bus and asserts ack; 3. Master receives data and deasserts req; 4. Servant ready for next request. The main differences are underlined.]

A protocol typically consists of several possible subprotocols, such as a read protocol and a write protocol. Each subprotocol is known as a transaction or a bus cycle. A bus cycle may consist of several clock cycles.
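The read and write subprotocols above can be sketched as a toy C simulation of the single-master bus, assuming the merged rd'/wr control line just described (0 selects a read, 1 a write) and an active-high enable; the function name and the 256-byte memory size are illustrative, not taken from any real part.

```c
#include <stdint.h>

/* Toy servant: a 256-byte memory obeying the merged rd'/wr control
   line (0 = read, 1 = write) plus an active-high enable, as in the
   protocol described in the text. */
static uint8_t mem[256];

/* One bus cycle: the memory acts only while enable is asserted.
   For a write, it latches wdata; for a read, it drives the data
   lines, modeled here as the return value. */
uint8_t bus_cycle(int enable, int rd_wr, uint8_t addr, uint8_t wdata) {
    if (!enable)
        return 0xFF;          /* bus undriven; arbitrary value */
    if (rd_wr == 1) {         /* write subprotocol */
        mem[addr] = wdata;
        return wdata;
    }
    return mem[addr];         /* read subprotocol */
}
```

A write transaction followed by a read of the same address returns the stored byte, while a cycle with enable deasserted leaves the memory untouched.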
Basic Protocol Concepts

The processor-memory protocol described above was a simple one. Hardware protocols can be much more complex. However, we can understand them better by defining some basic protocol concepts. These concepts are: actors, data direction, addresses, time-multiplexing, and control methods.

An actor is a processor or memory involved in the data transfer. A protocol typically involves two actors: a master and a servant. A master initiates the data transfer. A servant, commonly called a slave, responds to the initiation request. In the example of Figure 6.1, the processor is the master, and the memory is the servant (i.e., the memory cannot initiate a data transfer). The servant could also be another processor. Masters are usually general-purpose processors, and servants are usually peripherals and memories.

Data direction denotes the direction that the transferred data moves between the actors. We indicate this direction by denoting each actor as either receiving or sending data. Note that actor types are independent of the direction of the data transfer. In particular, a master may either be the receiver of data, as in Figure 6.1(b), or the sender of data, as shown in Figure 6.1(c).

Addresses represent a special type of data used to indicate where regular data should go to or come from. A protocol often includes both an address and regular data, as did the memory access protocol in Figure 6.1, where the address specified the location where the data should be read from or written to in the memory. An address is also necessary when a general-purpose processor communicates with multiple peripherals over a single bus; the address not only specifies a particular peripheral, but also may specify a particular register within that peripheral.

Another protocol concept is time multiplexing. To multiplex means to share a single set of wires for multiple pieces of data. In time multiplexing, the multiple pieces of data are sent one at a time over the shared wires. For example, Figure 6.2(a) shows a master sending 16 bits of data over an 8-bit bus using a strobe protocol and time-multiplexed data. The master first sends the high-order byte and then the low-order byte. The servant must receive the bytes and then demultiplex the data. This serializing of data can be done to any extent, even down to a 1-bit bus, in order to reduce the number of wires. As another example, Figure 6.2(b) shows a master sending both an address and data to a servant, such as a memory. In this case, rather than using separate sets of lines for address and data, as was done in Figure 6.1, we can time multiplex the address and data over a shared set of lines addr/data.

Control methods are schemes for initiating and ending the transfer. Two of the most common methods are strobe and handshake. In a strobe protocol, the master uses one control line, often called the request line, to initiate the data transfer, and the transfer is considered to be complete after some fixed time interval after the initiation. For example, Figure 6.3(a) shows a strobe protocol with a master wanting to receive data from a servant. The master first asserts the request line to initiate a transfer. The servant then has time t_access to put the data on the data bus. After this time, the master reads the data bus, believing the data to be valid. The master then deasserts the request line, so that the servant can stop putting the data on the data bus, and both actors are then ready for the next transfer. An analogy is a demanding boss who tells an employee "I want that report (the data) on my desk (the data bus) in one hour (t_access)," and merely expects the report to be on the desk in one hour.

The second common control method is a handshake protocol, in which the master uses a request line to initiate the transfer, and the servant uses an acknowledge line to inform the master when the data is ready. For example, Figure 6.3(b) shows a handshake protocol with a

[Figure 6.4: A strobe/handshake compromise. (a) Fast-response case: 1. Master asserts req to receive data; 2. Servant puts data on bus within time t_access (wait line is unused); 3. Master receives data and deasserts req; 4. Servant ready for next request. (b) Slow-response case: 1. Master asserts req to receive data; 2. Servant can't put data within t_access, asserts wait; 3. Servant puts data on bus and deasserts wait; 4. Master receives data and deasserts req; 5. Servant ready for next request. The differences are underlined.]
receiving master. The master first asserts the request line to initiate the transfer. The servant takes as much time as necessary to put the data on the data bus, and then asserts the acknowledge line to inform the master that the data is valid. The master reads the data bus and then deasserts the request line so that the servant can stop putting data on the data bus. The servant deasserts the acknowledge line, and both actors are then ready for the next transfer. In our boss-employee analogy, a handshake protocol corresponds to a more tolerant boss who tells an employee "I want that report on my desk soon; let me know when it's ready." A handshake protocol can adjust to a servant, or servants, with varying response times, unlike a strobe protocol. However, when response time is known, a handshake protocol may be slower than a strobe protocol, since it requires the master to detect the acknowledgment before getting the data, possibly requiring an extra clock cycle if the master is synchronizing the bus control signals. A handshake also requires an extra line for acknowledge.

To achieve both the speed of a strobe protocol and the varying response time tolerance of a handshake protocol, a compromise protocol is often used, as illustrated in Figure 6.4. In this case, when the servant can put the data on the bus within time t_access, the protocol is identical to a strobe protocol, as shown in Figure 6.4(a). However, if the servant cannot put the data on the bus in time, it instead tells the master to wait longer, by asserting a line we've labeled wait. When the servant has finally put the data on the bus, it deasserts the wait line, thus informing the master that the data is ready. The master receives the data and deasserts the request line. Thus, the handshake only occurs if it is necessary. In our boss-employee analogy, the boss tells the employee "I want that report on my desk in an hour; if you can't finish by then, let me know that and then let me know when it's ready."

Perhaps the most common communication situation in embedded systems is the input and output (I/O) of data to and from a general-purpose processor, as it communicates with its peripherals and memories. I/O is relative to the processor: input means data comes into the processor, while output means data goes out of the processor. In the next three sections, we will discuss three microprocessor-interfacing issues: addressing, interrupts, and direct memory access. We'll use the term microprocessor to refer to a general-purpose processor.

Example: The ISA Bus Protocol - Memory Access

[Figure 6.5: ISA bus protocol: (a) read bus timing, (b) write bus timing. Signals shown: CLOCK, D[7-0], A[19-0], ALE, /MEMR, CHRDY.]

The Industry Standard Architecture (ISA) bus protocol is common in systems using an 80x86 microprocessor. Figure 6.5(a) illustrates the bus timing for performing a memory read operation, referred to as a memory read cycle. During a memory read cycle, the microprocessor drives the bus signals to read a byte of data from memory. Note that in Figure 6.5(a), several other control signals that are inactive during a memory read cycle are not


included in the timing diagram. The operation works as follows. In clock cycle C1, the microprocessor puts a 20-bit memory address on the address lines A and asserts the address latch enable signal ALE. During clock cycles C2 and C3, the processor asserts the memory read signal /MEMR to request a read operation from the memory device. After C3, the memory device holds the data on data lines D. In cycle C4, all signals are deasserted.

The ISA read bus cycle uses a compromise strobe/handshake control method. The memory device deasserted the channel ready signal CHRDY before the rising clock edge in C2, causing the microprocessor to insert wait cycles until CHRDY was reasserted. Up to six wait cycles can be inserted by a slow device.

Figure 6.5(b) illustrates the bus timing for performing a memory write operation, referred to as a memory write cycle. During a memory write bus cycle, the microprocessor drives the bus signals to write a byte of data to memory. The operation works as follows. In clock cycle C1, the processor puts the 20-bit memory address to be written on the address lines and asserts the ALE signal. During cycles C2 and C3, the processor puts the write data on the data lines and asserts the memory write signal /MEMW to indicate a write operation to the memory device. In cycle C4, all signals are deasserted. The write cycle also uses a compromise strobe/handshake control method.

[Figure 6.6: Parallel I/O: (a) adding parallel I/O to a bus-based I/O processor, (b) extended parallel I/O.]
strobe/handshake control method. · correspondi,ng to bus lines, and uses the bus to access memory as well as peripherals. The
microprocessor has the bus protocol built in lo its hardware. Specifically, the software does
not implement lhe protocol but merely executes a single instruction lhat in tum causes the

Microprocessor Interfacing: UO Addressing


hardware lo write or read data over the bus. We normaUy consider lhe access to lhe
6.3 peripherals as I/0, but don't norrnally consider the access lo memory as I/0, since lhe
memory is considered more as a part of-the microprocessor.
Port and ·Bus-Based 1/0 A system may require parallel I/0 (port-based I/0), but a microprocessor may only
A mieopr~essor may have tens or hundreds of pins. many of which are control pins, such as , support bus-based I/0. In this case, a parallel 1/0 peripheral may be used, as illustrated in
I
a pin for clock input and another input pin for resetting lhc microprocess?r. Many of the other
pins are used lo communicate daia to and fro1~ lhe microprocessor. wluch we call processor ·
1/0. There are two common methods for usmg pms lo support I/0: port-based I/0 and •
l Figure 6.6(a). The peripheral is connecteq to the system bus on one side, wilh corresponding
.address, data, and control lines, and has several ports on the other side, consisting just of a set
of data lines. The ports are connected to registers inside· lhe peripheial, and the
bus-based 1/0. · · microprocessor can read and write those registers in order lo read and write lhe ports.
In port-based ID. also known as parallel /iO. a port can be directly read and written by Even when a microprocessor supports port-based I/0, we rna)>rcquire more ports than are
processor instructions.just like any other register in the microprocessor: in fact, the port is available. In this case, a parallel I/0 peripheral can again be used, as illustrated in Figure
usually connected to a dedicated register. For cxan1plc. consider an 8-bil port named PO. A 6.6(b). The microprocessor has four ports in this example, one of which is used to interface
C-language programmer may write to PO using an instruction like: PO ' 255, which wou_ld with .a parallel I/0 peripheral, which itself has three ports. Thus, we have extended the
set all eight pins to Is. In this case, lhc C compiler manual would have defined PO as a spe~1al number of available ports from four lo six. ·using such a peripheral in this manner is often
variable that would automatically be mapped to the register PO during compilallon. referred lo as extended parallel {/0. ·
Conversely. the programmer might read the value of a port PI being written by some other
deYice by typing ·something like a •· PI. In some microprocessors. each bit of a port can be Memory-Mapped ,;0 and Standartj 1/0
configured as input or output by writing to a configuration register-for the port. For exa~ple, In bus-based I/0, there are two methods for a microprocessor lo comnumicate with
PO might have an associated configuration register called CPO. To set the high-order four bits
. peripherals; known as memory-mapped I/0 and standard I/0. ,
to input and the low-order four bits to output. we might say: CPO : 15. This writes 00001111 · in memory-mapped 110, peripherals occupy specific addresses in tbe existing address
to the CPO register. where a O means input and a I means output. Ports arc often
space. For example, consider a bus with a 16-bit address. The lower 32K addresses may
bit-addressable. meaning that a programmer can read or write specific bits of the port. For
correspond to memory ad~ses, wlu,f~ the upper 3 2K may correspond lo 1/0 addresses.
example, one might say: x ~ P0.2. giYing x the value of the number 2 pin of port /'O.


[Figure 6.7: ISA bus protocol for standard I/O. I/O read bus timing, showing signals CLOCK, D[7-0], A[19-0], ALE, /IOR, and CHRDY.]

[Figure 6.8: A basic memory protocol: (a) timing diagram for a read operation, (b) interface schematic (an 8051 connected to external memory, including a 27C256, through an 8-bit latch, with signals ALE, /PSEN, /RD, and /WR, and address lines Adr. 7...0 and Adr. 15...8 on ports P0 and P2).]

In standard I/O (also known as I/O-mapped I/O), the bus includes an additional pin, which we label M/IO, to indicate whether the access is to memory or to a peripheral (i.e., an I/O device). For example, when M/IO is 0, the address on the address bus corresponds to a memory address. When M/IO is 1, the address corresponds to a peripheral.

An advantage of memory-mapped I/O is that the microprocessor need not include special instructions for communicating with peripherals. The microprocessor's assembly instructions involving memory, such as MOV or ADD, will also work for peripherals. For example, a microprocessor may have an ADD A, B instruction that adds the data at address B to the data at address A and stores the result in A. A and B may correspond to memory locations, or registers in peripherals. In contrast, if the microprocessor uses standard I/O, the microprocessor requires special instructions for reading and writing peripherals. These instructions are often called IN and OUT. Thus, to perform the same addition of locations A and B corresponding to peripherals, the following instructions would be necessary:

IN R0, A
IN R1, B
ADD R0, R1
OUT A, R0

Advantages of standard I/O include no loss of memory addresses to the use as I/O addresses, and potentially simpler address decoding logic in peripherals. Address decoding logic can be simplified with standard I/O if we know that there will only be a small number of peripherals, because the peripherals can then ignore high-order address bits. For example, a bus may have a 16-bit address, but we may know there will never be more than 256 I/O addresses required. The peripherals can thus safely ignore the high-order 8 address bits, resulting in smaller and/or faster address comparators in each peripheral. Note that we can build a system using both standard and memory-mapped I/O, since peripherals in the memory space act just like memory themselves.

Example: The ISA Bus Protocol - Standard I/O

The ISA bus protocol introduced earlier supports standard I/O. The I/O read bus cycle is depicted in Figure 6.7. During this bus cycle, the microprocessor drives the bus signals to read a byte of data from a peripheral, according to the timing diagram shown. Note that the cycle uses a control line distinct from /MEMR, namely /IOR, which is consistent with the standard I/O approach. The I/O device address space is limited to 16 bits, as opposed to 20 bits for memory devices. The I/O write bus cycle is similar to the memory write bus cycle but uses a control signal /IOW and again limits the address to 16 bits. The I/O read and write bus cycles use the compromise strobe/handshake control method, as did the memory bus cycles.

Example: A Basic Memory Protocol

In this example, we illustrate how to interface 8K of data and 32K of program code memory to a microcontroller, specifically the Intel 8051. The 8051 uses separate memory address spaces for data and program code. Data or code address space is limited to 64K, hence addressable with 16 bits through ports P0 (least significant bits) and P2 (most significant bits). A separate signal, called PSEN (program strobe enable), is used to distinguish between data and code. For the most part, the 8051 generates all of the necessary signals to perform memory I/O; however, since port P0 is used both for the least significant address bits and for data, an 8-bit latch is required to perform the necessary multiplexing. The timing diagram depicted in Figure 6.8(a) illustrates a memory read operation. A memory write operation is performed in a similar fashion with data flow reversed and RD (read) replaced with WR (write). The memory read operation proceeds as follows. The microcontroller places the source address (i.e., the memory location to be read) on ports P2 and P0. P2, holding the eight most significant address bits, retains its value throughout the read operation. P0, holding the eight least-significant address bits, is stored inside an 8-bit latch.
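The latch's job in this multiplexing scheme can be sketched in C as a toy model, assuming a transparent latch that follows P0 while ALE is high and holds its value when ALE drops; the variable names are ours, not 8051 registers.

```c
#include <stdint.h>

static uint8_t p0;        /* multiplexed low-address/data port   */
static uint8_t p2;        /* high address byte, held all cycle   */
static uint8_t latch_q;   /* output of the external 8-bit latch  */

/* While ALE is asserted the transparent latch follows P0; when
   ALE drops, latch_q keeps the low address byte even though P0
   moves on to carry data. */
void latch_on_ale(int ale) {
    if (ale)
        latch_q = p0;
}

/* The memory device sees a full 16-bit address built from P2 and
   the latched low byte. */
uint16_t memory_address(void) {
    return ((uint16_t)p2 << 8) | latch_q;
}
```

Once the low byte is latched, P0 can be reused for the data transfer without disturbing the address the memory sees.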

The ALE signal (address latch enable) is used to trigger the latching of port P0. Now, the microcontroller asserts high impedance on P0 to allow the memory device to drive it with the requested data. The memory device outputs valid data as long as the RD signal is asserted. Meanwhile, the microcontroller reads the data and deasserts its control and port signals. Figure 6.8(b) illustrates the interface schematic.

Example: A Complex Memory Protocol

In this example, we will build a finite-state machine (FSM) controller that will generate all the necessary control signals to drive the TC55V2325FF memory chip in burst read mode (i.e., pipelined read operation), as described in Chapter 5. Our specification for this FSM is the timing diagram presented in the earlier example from Chapter 5. The input to our machine is a clock signal (CLK), the starting address (Addr0), and the enable/disable signal (GO). The output of our machine is a set of control signals specific to our memory device. We assume that the chip's enable and WE signals are asserted. Figure 6.9 gives the FSM description. From the state machine description, we can derive the next-state and output truth tables. From these truth tables, we can compute next-state and output equations. By deriving the next-state transition table, we can solve and optimize the next-state and output equations. These equations can be implemented using logic components. (See Chapter 2 for details.) Any processor that is to be interfaced with one of these memory devices must implement, internally or externally, a state machine similar to the one presented in this example.

[Figure 6.9: A complex memory protocol.]

6.4 Microprocessor Interfacing: Interrupts

Another microprocessor I/O issue is that of interrupt-driven I/O. To introduce this issue, suppose the program running on a microprocessor must, among other tasks, read and process data from a peripheral whenever that peripheral has new data; such processing is called servicing. If the peripheral gets new data at unpredictable intervals, how can the program determine when the peripheral has new data? The most straightforward approach is to interleave the microprocessor's other tasks with a routine that checks for new data in the peripheral, perhaps by checking for a 1 in a particular bit in a register of the peripheral. This repeated checking by the microprocessor for data is called polling. Polling is simple to implement, but this repeated checking wastes many clock cycles, so it may not be acceptable in many cases, especially when there are numerous peripherals to be checked. We could check at less-frequent intervals, but then we may not process the data quickly enough.

To overcome the limitations of polling, most microprocessors come with a feature called external interrupt. A microprocessor with this feature has a pin, say, Int. At the end of executing each machine instruction, the processor's controller checks Int. If Int is asserted, the microprocessor jumps to a particular address at which a subroutine exists that services the interrupt. This subroutine is called an interrupt service routine, or ISR. Such I/O is called interrupt-driven I/O.

One might wonder if interrupts have really solved the problem with polling, namely of wasting time performing excessive checking, since the interrupt pin is "polled" at the end of every microprocessor instruction. However, in this case, the polling of the pin is built right into the microprocessor's controller hardware, and therefore can be done simultaneously with the execution of an instruction, resulting in no extra clock cycles.

There are two methods by which a microprocessor using interrupts determines the address, known as the interrupt address vector, at which the ISR resides. These two methods are fixed and vectored interrupt. In fixed interrupt, the address to which the microprocessor jumps on an interrupt is built into the microprocessor, so it is fixed and cannot be changed. The assembly programmer either puts the ISR at that address, or if not enough bytes are available in that region of memory, merely puts a jump to the real ISR there. For C programmers, the compiler typically reserves a special name for the ISR and then compiles a subroutine having that name into the ISR location, or again just a jump to that subroutine. In microprocessors with fixed ISR addresses, there may be several interrupt pins to support interrupts from multiple peripherals.

Figure 6.10 provides a summary of the flow of actions for an example of interrupt-driven I/O using a fixed ISR address. Figure 6.11 illustrates this flow graphically for the example. In this example, data received by Peripheral1 must be read, transformed, and then written to Peripheral2. Peripheral1 might represent a sensor, and Peripheral2, a display. Meanwhile, the microprocessor is running its main program, located in program memory starting at address 100. When Peripheral1 receives data, it asserts Int to request that the microprocessor service the data. After the microprocessor completes execution of its current instruction, it stores its state and jumps to the ISR located at the fixed program memory location of 16. The ISR reads the data from Peripheral1, transforms it, and writes the result to Peripheral2. The last ISR instruction is a return from interrupt, causing the microprocessor to restore its state and resume execution of its main program, in this case executing instruction 101.
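The fixed-interrupt flow just described can be mimicked in C; the 0x8000/0x8001 register addresses come from the figures' example, the doubling "transform" stands in for whatever real work the ISR does, and the pin and registers are modeled as plain variables.

```c
static int int_pin;        /* the Int pin                       */
static int periph1_reg;    /* Peripheral1's register (0x8000)   */
static int periph2_reg;    /* Peripheral2's register (0x8001)   */

/* The ISR at the fixed location: read Peripheral1, transform,
   write Peripheral2; the peripheral deasserts Int once read. */
static void isr(void) {
    periph2_reg = periph1_reg * 2;  /* stand-in transform */
    int_pin = 0;
}

/* One instruction of the main program, followed by the
   controller's built-in check of Int: the "polling" that costs
   no extra clock cycles. */
void execute_instruction(void) {
    /* ... main-program work would happen here ... */
    if (int_pin)
        isr();
}
```

Asserting Int before an instruction boundary causes the ISR to run exactly once, after which the main program would resume.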


[Figure 6.10: Interrupt-driven I/O using fixed ISR location: summary of flow of actions. 1(a): µP is executing its main program. 1(b): P1 receives input data in a register with address 0x8000. 2: P1 asserts Int to request servicing by the microprocessor. 3: After completing the instruction at 100, µP sees Int asserted, saves the PC's value of 100, and sets PC to the ISR fixed location of 16. 4(a): The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. 4(b): After being read, P1 deasserts Int. 5: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.]

[Figure 6.11: Interrupt-driven I/O using fixed ISR location: flow of actions, shown graphically on the program memory. ISR at 16: MOV R0, 0x8000; 17: # modifies R0; 18: MOV 0x8001, R0; 19: RETI # ISR return. Main program: 100: instruction; 101: instruction.]

Other microprocessors use vectored interrupt to determine the address at which the ISR resides. This approach is especially common in systems with a system bus, since there may be numerous peripherals that can request service. In this method, the microprocessor has one interrupt pin, say, Int, which any peripheral can assert. After detecting the interrupt, the microprocessor asserts another pin, say, Inta, to acknowledge that it has detected the interrupt and to request that the interrupting peripheral provide the address where the relevant ISR resides. The peripheral provides this address on the data bus, and the microprocessor reads the address and jumps to the corresponding ISR. We discuss the situation where multiple peripherals simultaneously request servicing in a later section on arbitration. For now, consider an example of one peripheral using vectored interrupt. The flow of actions is shown in Figure 6.12, which represents an example very similar to the previous one. Figure 6.13 illustrates the example graphically. In contrast to the earlier example, the ISR location is not fixed at 16. Thus, Peripheral1 contains an extra register holding the ISR location. After detecting the interrupt and saving its state, the microprocessor asserts Inta in order to get Peripheral1 to place 16 on the data bus. The microprocessor reads this 16 into the PC and then jumps to the ISR, which executes and completes in the same manner as the earlier example.

As a compromise between the fixed and vectored interrupt methods, we can use an interrupt address table. In this method, we still have only one interrupt pin on the processor, but we also create in the processor's memory a table that holds ISR addresses. A typical table might have 256 entries. A peripheral, rather than providing the ISR address, instead provides a number corresponding to an entry in the table. The processor reads this entry number from the bus, and then reads the corresponding table entry to obtain the ISR address. Compared to the entire memory, the table is typically very small, so an entry number's bit encoding is small. This small bit encoding is especially important when the data bus is not wide enough to hold a complete ISR address. Furthermore, this approach allows us to assign each peripheral a unique number independent of ISR locations, meaning that we could move the ISR location without having to change anything in the peripheral.

External interrupts may be maskable or nonmaskable. In maskable interrupt, the programmer may force the microprocessor to ignore the interrupt pin, either by executing a specific instruction to disable the interrupt or by setting bits in an interrupt configuration register. A situation where a programmer might want to mask interrupts is when there exist time-critical regions of code, such as a routine that generates a pulse of a certain duration.
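The interrupt-address-table compromise described above maps naturally onto a table of function pointers. In this sketch, the 256-entry size matches the text's "typical table," while the entry numbers and the two counting handlers are invented purely for illustration.

```c
typedef void (*isr_t)(void);

/* The in-memory table: a peripheral supplies only a small entry
   number, and the processor fetches the ISR's address from the
   table, so an ISR can be relocated by rewriting one entry. */
static isr_t vector_table[256];
static int uart_events, timer_events;

static void uart_isr(void)  { uart_events++;  }
static void timer_isr(void) { timer_events++; }

void install_isr(unsigned char entry, isr_t handler) {
    vector_table[entry] = handler;
}

/* What happens after Inta: the entry number read from the data
   bus indexes the table, and the processor jumps to the ISR
   found there. */
void dispatch(unsigned char entry) {
    if (vector_table[entry])
        vector_table[entry]();
}
```

Because peripherals hold only entry numbers, moving uart_isr elsewhere in memory requires changing just the table entry, not the peripheral.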

I
Chapter 'ii: Interfacing
6.5: Microprocessor Interfacing: Direct Memory Access

programmer may include an instruction that disables interrupts at the beginning of the routine, and another instruction reenabling interrupts at the end of the routine. A nonmaskable interrupt cannot be masked by the programmer. It requires a pin distinct from maskable interrupts. It is typically used for very drastic situations, such as power failure. In this case, if power is failing, a nonmaskable interrupt can cause a jump to a subroutine that stores critical data in nonvolatile memory, before power is completely gone.

In some microprocessors, the jump to an ISR is handled just like the jump to any other subroutine, meaning that the state of the microprocessor is stored on a stack, including the contents of the program counter, datapath status register, and all other registers. The state is then restored upon completion of the ISR. In other microprocessors, only a few registers are stored, like just the program counter and status registers. The assembly programmer must be aware of which registers have been stored, so as not to overwrite nonstored register data with the ISR. These microprocessors need two types of assembly instructions for subroutine return. A regular return instruction returns from a regular subroutine, which was called using a subroutine call instruction. A return-from-interrupt instruction returns from an ISR, which was jumped to not by a call instruction but by the hardware itself, and which restores only those registers that were stored at the beginning of the interrupt. The C programmer is freed from having to worry about such considerations, as the C compiler handles them.

Figure 6.12: Interrupt-driven I/O using vectored interrupt: summary of flow of actions.
1(a): µP is executing its main program. 1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing the instruction at 100, µP sees Int asserted, saves the PC's value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. 5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Figure 6.13: Interrupt-driven I/O using vectored interrupt: flow of actions (graphical depiction of the steps summarized in Figure 6.12; the ISR at location 16 moves data from 0x8000 to 0x8001 and returns with RETI).
The reason we used the term external interrupt is to distinguish this type of interrupt from internal interrupts, also called traps. An internal interrupt results from an exceptional condition, such as divide-by-0 or execution of an invalid opcode. Internal interrupts, like external ones, result in a jump to an ISR. A third type of interrupt, called software interrupts, can be initiated by executing a special assembly instruction.

6.5 Microprocessor Interfacing: Direct Memory Access

Commonly, the data being accumulated in a peripheral should first be stored in memory before being processed by a program running on the microprocessor. Such temporary storage of data that is awaiting processing is called buffering. For example, packet data from an
Ethernet card is stored in main memory and is later processed by the different software layers (e.g., Internet Protocol stacks). We could write a simple interrupt service routine on the microprocessor, such that the peripheral device would interrupt the microprocessor whenever it had data to be stored in memory. The ISR would simply transfer data from the peripheral to the memory, and then resume running its application. For example, Figure 6.14 provides a summary of the flow of actions for an example in which peripheral P1 interrupts the microprocessor when receiving new data. Figure 6.15 illustrates the example graphically. In this example, the microprocessor jumps to ISR location 16, which moves the data from 0x8000 in the peripheral to 0x0001 in memory. Afterward, the ISR returns. However, recall that jumping to an ISR requires the microprocessor to store its state (i.e., register contents), and then to restore its state when returning from the ISR. This storing and restoring of the state may consume many clock cycles, and is thus somewhat inefficient. Furthermore, the microprocessor cannot execute its regular program while moving the data, resulting in further inefficiency.

The I/O method of direct memory access (DMA) eliminates these inefficiencies. In DMA, we use a separate single-purpose processor, called a DMA controller, whose sole purpose is to transfer data between memories and peripherals. Briefly, the peripheral requests servicing from the DMA controller, which then requests control of the system bus from the microprocessor. The microprocessor merely needs to relinquish control of the bus to the DMA controller. The microprocessor does not need to jump to an ISR, and thus the overhead of storing and restoring the microprocessor state is eliminated. Furthermore, the microprocessor can execute its regular program while the DMA controller has bus control, as long as that regular program doesn't require use of the bus (at which point the microprocessor would then have to wait for the DMA to complete). A system with a separate bus between the microprocessor and cache may be able to execute for some time from the cache while the DMA transfer takes place.

Figure 6.14: Peripheral to memory transfer without DMA, using vectored interrupt: summary of flow of actions.
1(a): µP is executing its main program. 1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing the instruction at 100, µP sees Int asserted, saves the PC's value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory. 5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Figure 6.15: Peripheral to memory transfer without DMA, using vectored interrupt: flow of actions (graphical depiction of the steps summarized in Figure 6.14).

Figure 6.16 summarizes the flow of actions for an example transfer using DMA, and Figure 6.17 depicts the example graphically. As seen in Figure 6.17, we connect the peripheral to the DMA controller rather than the microprocessor. Note that the peripheral does not recognize any difference between being connected to a DMA controller device or a
microprocessor device: all the peripheral knows is that it asserts a request signal on the device, and then that device services the peripheral's request. We connect the DMA controller to two special pins of the microprocessor. One pin, which we'll call Dreq, is used by the DMA controller to request control of the bus. The other pin, which we'll call Dack, is used by the microprocessor to acknowledge to the DMA controller that bus control has been granted. Thus, unlike the peripheral, the microprocessor must be specially designed with these two pins in order to support DMA. The DMA controller also connects to all the system bus signals, including address, data, and control lines.

To achieve this we must have configured the DMA controller to know what addresses to access in the peripheral and the memory. Such setting of addresses may be done by a routine running on the microprocessor during system initialization. In particular, during initialization, the microprocessor writes to configuration registers in the DMA controller just as it would write to any other peripheral's registers. Alternatively, in an embedded system that is guaranteed not to change, we can hardcode the addresses directly into the DMA controller. In the example of Figure 6.17, we see two registers in the DMA controller holding the peripheral register address and the memory address.

Figure 6.16: Peripheral to memory transfer with DMA: summary of flow of actions.
1(a): µP is executing its main program. It has already configured the DMA ctrl registers. 1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts req to request servicing by the DMA ctrl.
3: DMA ctrl asserts Dreq to request control of the system bus.
4: After executing instruction 100, µP sees Dreq asserted, releases the system bus, asserts Dack, and resumes execution. µP stalls only if it needs the system bus to continue executing.
5: (a) DMA ctrl asserts ack, (b) reads data from 0x8000, and (c) writes that data to 0x0001.
6: DMA ctrl deasserts Dreq and ack, completing the handshake with P1.
7(a): µP deasserts Dack and resumes control of the bus.

Figure 6.17: Peripheral to memory transfer with DMA: flow of actions (graphical depiction of the steps summarized in Figure 6.16; no ISR is needed).

During its control of the system bus, the DMA controller might transfer just one piece of data, but more commonly will transfer numerous pieces of data (called a block), one right after the other, before relinquishing the bus. This is because many peripherals, such as any peripheral that deals with storage devices (e.g., CD-ROM players or disk controllers) or that deals with network communication, send and receive data in large blocks. For example, a particular disk controller peripheral might read data in blocks of 128 words and store this data in a 128-word internal memory, after which the peripheral requests servicing (i.e., requests that this data be buffered in memory). For the example just given, the DMA controller works as follows. The DMA controller gains control of the bus, makes 128 peripheral reads and memory writes, and only then relinquishes the bus. We must therefore configure the DMA controller to operate in either
single transfer mode or block transfer mode. For block transfer mode, we must configure a base address as well as the number of words in a block.

DMA controllers typically come with numerous channels. Each channel supports one peripheral. Each channel has its own set of configuration registers. Some modern peripherals come with DMA capabilities built into the peripheral itself.

Example: DMA I/O and the ISA Bus Protocol

In an earlier example, we introduced the basic ISA memory and peripheral I/O read and write bus cycles. In this example, we will introduce the DMA-related bus cycles. Our sample architecture is extended now to include a DMA controller, as shown in Figure 6.18(a). In this figure, R denotes the DMA request signal and A denotes the DMA acknowledge signal.

DMA is used to perform memory writes/reads to/from I/O devices directly, without the intervention of the processor. Let us first look at the DMA memory write bus cycle. A DMA write bus cycle proceeds as follows. First, the processor programs the DMA controller to monitor a particular I/O device for available data. The processor also programs the DMA with the starting memory address where the data item is to be written to. Once the I/O device has available data, it generates a DMA request by asserting its DMA request line (DRQ). In response to this, the DMA controller will assert its DRQ to signal the processor. The processor then relinquishes the bus control signals and signals to the DMA controller with an acknowledgment (DACK). In response, the DMA will acknowledge the I/O device's DRQ by asserting its DACK. At this point, the actual transfer of data from the device to memory is initiated. Note that the actual DMA signals (DACKs and DRQs) are not part of the ISA protocol. The ISA protocol merely provides a scheme for performing an I/O read and a memory write in the same bus cycle. The DMA memory write bus cycle is shown in Figure 6.18(b).

Let us now look at the DMA memory read bus cycle. The DMA memory read bus cycle is almost identical to a DMA memory write bus cycle. The only difference is that IOW is replaced with IOR and MEMW is replaced with MEMR. In addition, the order in which the I/O write and memory read signals are asserted is reversed. The DMA memory read bus cycle is shown in Figure 6.18(c).

Figure 6.18: DMA using the ISA bus protocol: (a) system architecture, (b) DMA write cycle, (c) DMA read cycle.

Figure 6.19: Arbitration using a priority arbiter.
1. Microprocessor is executing its program.
2. Peripheral1 needs servicing, so asserts Ireq1. Peripheral2 also needs servicing, so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus.
8. Microprocessor jumps to the address of the ISR read from the data bus. The ISR executes and returns (and completes the handshake with the arbiter).
9. Microprocessor resumes executing its program.

6.6 Arbitration

In our earlier discussions, several situations existed in which multiple peripherals might request service from a single resource. For example, multiple peripherals might share a single microprocessor that services their interrupt requests. As another example, multiple peripherals might share a single DMA controller that services their DMA requests. In such situations, two or more peripherals may request service simultaneously. We therefore must have some
method to arbitrate among these contending requests. Specifically, we must decide which one of the contending peripherals gets service, and thus which peripherals need to wait. Several methods exist, which we now discuss.

Priority Arbiter

One arbitration method uses a single-purpose processor, called a priority arbiter. We illustrate a priority arbiter arbitrating among multiple peripherals using vectored interrupt to request servicing from a microprocessor, as illustrated in Figure 6.19. Each of the peripherals makes its request to the arbiter. The arbiter in turn asserts the microprocessor interrupt, and waits for the interrupt acknowledgment. The arbiter then provides an acknowledgment to exactly one peripheral, which permits that peripheral to put its interrupt vector address on the data bus (which, as you'll recall, causes the microprocessor to jump to a subroutine that services that peripheral).

Priority arbiters typically use one of two common schemes to determine priority among peripherals: fixed priority or rotating priority. In fixed priority arbitration, each peripheral has a unique rank among all the peripherals. The rank can be represented as a number, so if there are four peripherals, each peripheral is ranked 1, 2, 3, or 4. If two peripherals simultaneously seek servicing, the arbiter chooses the one with the higher rank.

In rotating priority arbitration (also called round-robin), the arbiter changes the priority of peripherals based on the history of servicing of those peripherals. For example, one rotating priority scheme grants service to the least-recently serviced of the contending peripherals. This scheme obviously requires a more complex arbiter.

We prefer fixed priority when there is a clear difference in priority among peripherals. However, in many cases the peripherals are somewhat equal, so arbitrarily ranking them could cause high-ranked peripherals to get much more servicing than low-ranked ones. Rotating priority ensures a more equitable distribution of servicing in this case.

Notice that the priority arbiter is connected to the system bus, since the microprocessor can configure registers within the arbiter to set the priority schemes and/or the relative priorities of the devices. However, once configured, the arbiter does not use the system bus when arbitrating.

Priority arbiters represent another instance of a standard single-purpose processor. They are also often found built into other single-purpose processors like DMA controllers. A common type of priority arbiter arbitrates interrupt requests; such a peripheral is referred to as an interrupt controller.

Daisy-Chain Arbitration

The daisy-chain arbitration method builds arbitration right into the peripherals. A daisy-chain configuration is shown in Figure 6.20(a), again using vectored interrupt to illustrate the method. Each peripheral has a request output and an acknowledge input, as before. But now each peripheral also has a request input and an acknowledge output. A peripheral asserts its request output if it requires servicing or if its request input is asserted; the latter means that one of the "upstream" devices is requesting servicing. Thus, if any peripheral needs servicing, its request will flow through the downstream peripherals and eventually reach the microprocessor. Even if more than one peripheral requests servicing, the microprocessor will see only one request. The microprocessor acknowledge signal connects to the first peripheral. If this peripheral is requesting service, it proceeds to put its interrupt vector address on the system bus. But if it doesn't need service, then it instead passes the acknowledgment upstream to the next peripheral, by asserting its acknowledge output. In the same manner, the next peripheral may either begin being serviced or may instead pass the acknowledgment along. Obviously, the peripheral at the front of the chain, i.e., the one to which the microprocessor acknowledge is connected, has highest priority, and the peripheral at the end of the chain has lowest priority.

We prefer a daisy-chain priority configuration over a priority arbiter when we want to be able to add or remove peripherals from an embedded system without redesigning the system. Although conceptually we could add as many peripherals to a daisy chain as we desired, in reality the servicing response time for peripherals at the end of the chain could become intolerably slow. In contrast to a daisy chain, a priority arbiter has a fixed number of channels; once they are all used, the system needs to be redesigned in order to accommodate more peripherals. However, a daisy chain has the drawback of not supporting more advanced priority schemes, like rotating priority. A second drawback is that if a peripheral in the chain stops working, other peripherals may lose their access to the processor.

Figure 6.20: Arbitration using a daisy-chain configuration: (a) daisy-chain aware peripherals, (b) adding logic to make a peripheral daisy-chain aware; more complex logic will typically be necessary, however.

Although it appears from Figure 6.20(a) that each peripheral must be daisy-chain aware, in fact logic external to each peripheral can be used to carry out the daisy-chain logic. Figure 6.20(b) illustrates a simple form of such logic. Peripheral1 and Peripheral3 are both daisy-chain aware, whereas Peripheral2 is not. In order to incorporate Peripheral2 into the daisy-chain configuration, we must extend it to take care of requests and acknowledgments. Regarding requests, if Peripheral3 requests service or Peripheral2 requests service, then Peripheral1's req_in needs to be asserted. To accomplish this, we OR Peripheral2's req_out and Peripheral3's req_out and input the result to Peripheral1. Regarding acknowledgments, if Peripheral1's ack_out is asserted, then if Peripheral2 requested service, it should not pass this acknowledgment to Peripheral3, per the daisy-chain protocol. However, if Peripheral2 did not request service, then it should pass the acknowledgment to Peripheral3. To accomplish this, we use an inverter and an AND gate, as shown in the figure. Only if Peripheral1's ack_out is high and Peripheral2's req_out is low do we assert Peripheral3's ack_in. However, note that this logic is very simple in this case, whereas most peripherals will require more complex logic, even implementing a state machine, to convert the peripheral to a daisy-chain aware device.

Network-Oriented Arbitration Methods

The arbitration methods described are typically used to arbitrate among peripherals in an embedded system. However, many embedded systems contain multiple microprocessors communicating via a shared bus; such a bus is sometimes called a network. Arbitration in such cases is typically built right into the bus protocol, since the bus serves as the only connection among the microprocessors. A key feature of such a connection is that a processor about to write to the bus has no way of knowing whether another processor is about to simultaneously write to the bus. Because of the relatively long wires and high capacitances of such buses, a processor may write many bits of data before those bits appear at another processor. For example, Ethernet and I2C use a method in which multiple processors may write to the bus simultaneously, resulting in a collision and causing any data on the bus to be corrupted. The processors detect this collision, stop transmitting their data, wait for some time, and then try transmitting again. The protocols must ensure that the contending processors don't start sending again at the same time, or must at least use statistical methods that make the chances of them sending again at the same time small.

As another example, the CAN bus uses a clever address encoding scheme such that if two addresses are written simultaneously by different processors using the bus, the higher-priority address will override the lower-priority one. Each processor that is writing the bus also checks the bus, and if the address it is writing does not appear, then that processor realizes that a higher-priority transfer is taking place, and so that processor stops writing the bus.

Example: Vectored Interrupt Using an Interrupt Table

This is an example of a system using vectored interrupts as well as a vectored interrupt table. We will describe the software programming required to handle the interrupt requests. The relevant portions of the system architecture are shown in Figure 6.21. Here, two peripheral devices are connected to a two-channel priority arbiter with a fixed priority scheme (i.e., Peripheral1 has higher priority than Peripheral2). Both the peripherals and the arbiter are connected to the processor's memory bus and communicate with it using memory-mapped I/O. The interrupt table index placed on the memory bus (a.k.a. system bus) by the arbiter is software programmable through two memory-mapped registers. Both peripheral devices receive data from the external environment and raise their interrupts accordingly.

Figure 6.21: Architecture of a system using vectored interrupt and an interrupt table.

The software to initialize the peripherals and the priority arbiter, and to process the data received by our peripherals, is given in Figure 6.22. Let us now study the code. First, we define a number of variables that correspond to the registers inside the priority arbiter and peripheral devices. However, unlike defining ordinary variables in a program, these variables must refer to specific memory locations, namely, those that are mapped to the peripheral's registers. Normally, a compiler will place a variable somewhere in memory where storage for that variable's data is available. By using special keywords, we can force the compiler to place these variables at specific memory locations (e.g., in our compiler the keyword _at_ followed by a memory location is used to accomplish this). The priority arbiter, thus, has four registers located at memory locations 0xfff0 through 0xfff3. Note that our processor has a 16-bit memory address.

Next, we define two procedures, Peripheral1_ISR and Peripheral2_ISR, that handle the interrupts generated by the peripherals. Since we are using an interrupt jump table, these ISRs can be ordinary C procedures. Each ISR, of course, must perform necessary processing. Often, an ISR merely reads the data from the peripheral, places the data into a buffer, and sets a flag indicating to the main program that the buffer was updated.

Finally, we define the procedure InitializePeripherals. The procedure first configures the priority arbiter. We can select, in software, which interrupts we are willing to handle. This is done through the mask register. In our case, we set the first two bits of the mask register, indicating that we are to handle interrupts generated by both peripherals. Next, we program the priority arbiter with the indices into the jump table where the location of the ISR is stored. We have chosen to place these in locations 13 and 17, but this choice is arbitrary. The

procedure then places the ISRs into the lookup table at locations 13 and 17, as shown in the code. Last, the procedure enables interrupts by setting the arbiter's interrupt enable register.

    unsigned char ARBITER_MASK_REG      _at_ 0xfff0;
    unsigned char ARBITER_CH0_INDEX_REG _at_ 0xfff1;
    unsigned char ARBITER_CH1_INDEX_REG _at_ 0xfff2;
    unsigned char ARBITER_ENABLE_REG    _at_ 0xfff3;
    unsigned char PERIPHERAL1_DATA_REG  _at_ 0xffe0;
    unsigned char PERIPHERAL2_DATA_REG  _at_ 0xffe1;
    void* INTERRUPT_LOOKUP_TABLE[256]   _at_ 0x0100;

    void Peripheral1_ISR(void) {
        unsigned char data;
        data = PERIPHERAL1_DATA_REG;
        // do something with the data
    }

    void Peripheral2_ISR(void) {
        unsigned char data;
        data = PERIPHERAL2_DATA_REG;
        // do something with the data
    }

    void InitializePeripherals(void) {
        ARBITER_MASK_REG = 0x03;    // enable both channels
        ARBITER_CH0_INDEX_REG = 13;
        ARBITER_CH1_INDEX_REG = 17;
        INTERRUPT_LOOKUP_TABLE[13] = (void*)Peripheral1_ISR;
        INTERRUPT_LOOKUP_TABLE[17] = (void*)Peripheral2_ISR;
        ARBITER_ENABLE_REG = 1;
    }

    void main() {
        InitializePeripherals();
        for (;;) {}    // main program goes here
    }

Figure 6.22: Software for a system using vectored interrupt and an interrupt table.

6.7 Multilevel Bus Architectures

A microprocessor-based embedded system will have numerous types of communications that must take place, varying in their frequencies and speed requirements. The most frequent and high-speed communications will likely be between the microprocessor and its memories. We could try to use a single high-speed bus for all communications, but this approach has several disadvantages. First, it requires each peripheral to have a high-speed bus interface. Since a peripheral may not need high-speed communication, having such an interface may result in extra gates, power consumption, and cost. Second, since a high-speed bus will be very processor-specific, a peripheral with an interface to that bus may not be very portable. Third, having too many peripherals on the bus may result in a slower bus.

Therefore, we often design systems with two levels of buses: a high-speed processor local bus and a lower-speed peripheral bus, as illustrated in Figure 6.23. The processor local bus typically connects the microprocessor, cache, memory controllers, and certain high-speed coprocessors, and is processor-specific. It is usually wide, as wide as a memory word.

The peripheral bus connects those processors that do not have fast processor local bus access as a top priority, but rather emphasize portability, low power, or low gate count. The peripheral bus is typically an industry standard bus, such as ISA or PCI, thus supporting portability of the peripherals. It is often narrower and/or slower than a processor local bus, thus requiring fewer pins, fewer gates, and less power for interfacing.

A bridge connects the two buses. A bridge is a single-purpose processor that converts communication on one bus to communication on another bus. For example, the microprocessor may generate a read on the processor local bus with an address corresponding to a peripheral. The bridge detects that the address corresponds to a peripheral, and thus it generates a read on the peripheral bus. After receiving the data, the bridge sends that data to the microprocessor. The microprocessor need not even know that a bridge exists; it receives the data, albeit a few cycles later, as if the peripheral were on the processor local bus.

Figure 6.23: A two-level bus architecture.

A three-level bus hierarchy is also possible, as proposed by the VSI Alliance. The first level is the processor local bus, the second level a system bus, and the third level a peripheral bus. The system bus would be a high-speed bus, but would offload much of the traffic from
high-speed communications will likely be between the microprocessor and its m¢iories ~e~s
frequent communication's. requiring less' speed. will be between the microprocessor and its ·
the proce~ai'fius. It may be ~/7
i1i'fomplex 7 i t h n~erous coprocessors. '6
. '½' rl, ,: ' c)

periphe~ls, _lik0A~U')we could try to i!Il.Plement a single ~!g~_--:s p~~Jius for all the J · J_,,--., 0
'l,\ } ;---1'- - J • :11 / 1 ·"
-.,_()
commurucauons, but this ;ipproach has several disadvantages. First. it re qmres ~Cb penpheraL .,I
I
" I - . ~ -=,_ ---;_.

Embedded System Design 165


164 Embedded System Design

Chapter 6: Interfacing

6.8 Advanced Communication Principles

In the preceding sections, we discussed basic methods of interfacing. These interfacing methods could be applied to interconnect components within an IC via on-chip buses, or to interconnect ICs via on-board buses. In the remainder of the chapter, we study more advanced interfacing concepts and look at communication from a more abstract point of view. In particular, we study parallel, serial, and wireless communication. We also describe some advanced concepts, such as layering and error detection, which are part of many communication protocols. Furthermore, we highlight some of the popular parallel, serial, and wireless communication protocols in use today.

Communication can take place over a number of different types of media, such as a single wire, a set of wires, radio waves, or infrared waves. We refer to the medium that is used to carry data from one device to another as the physical layer. Depending on the protocol, we may refer to an actor as a device or node. In either case, a device is simply a processor that uses the physical layer to send or receive data to or from another device. In this section, we provide a general description of serial communication, parallel communication, and wireless communication. In addition, we describe communication principles such as layering, error detection and correction, data security, and plug and play.

Parallel Communication

Parallel communication takes place when the physical layer is capable of carrying multiple bits of data from one device to another. This means that the data bus is composed of multiple data wires, in addition to control and possibly power wires, running in parallel from one device to another. Each wire carries one of the bits. Parallel communication has the advantage of high data throughput, if the length of the bus is short. The length of a parallel bus must be kept short because long parallel wires will result in high capacitance values, and transmitting a bit on a bus with a higher capacitance value will require more time to charge or discharge. In addition, small variations in the length of the individual wires of a parallel bus can cause the received bits of the data word to arrive at different times. Such misalignment of data becomes more of a problem as the length of a parallel bus increases. Another problem with parallel buses is the fact that they are more costly to construct and may be bulky, especially when considering the insulation that must be used to prevent the noise from each wire from interfering with the other wires. For example, a 32-wire cable connecting two devices together will cost much more and be larger than a two-wire cable.

In general, parallel communication is used when connecting devices that reside on the same IC or devices that reside on the same circuit board. Since the length of such buses is short, the capacitance load, data misalignment, and cost problems mentioned earlier do not play an important role.

Serial Communication

Serial communication involves a physical layer that carries one bit of data at a time. This means that the data bus is composed of a single data wire, along with control and possibly power wires, running from one device to another. In serial communication, a word of data is transmitted one bit at a time. Serial buses are capable of higher throughputs than parallel buses when used to connect two physically distant devices. The reason for this is that a serial bus will have less average capacitance, enabling it to send more bits per unit of time. In addition, a serial bus cable is cheaper to build because it has fewer wires. The disadvantage of a serial bus is that the interfacing logic and communication protocol will be more complex. On the sending side, a transmitter must decompose data words into bits, and on the receiving side, the receiver must compose bits into words.

Most serial bus protocols eliminate the need for extra control signals, such as read and write signals, by using the same wire that carries data for this purpose. This is performed as follows. When data is to be sent, the sender first transmits a bit called a start bit. A start bit merely signals the receiver to wake up and start receiving data. The start bit is then followed by N data bits, where N is the size of the word, and a stop bit. The stop bit signals to the receiver the end of the transmission. Often, both the transmitter and the receiver agree on the transmission speed used to send and receive data. After sending a start bit, the transmitter sends all N bits at the predetermined transmission speed. Likewise, on seeing a start bit, a receiver simply starts sampling the data at a predetermined frequency until all N bits are assembled. Another common synchronization technique is to use an additional wire for clocking purposes (see the I2C bus protocol). Here, the transmitter and receiver devices use this clock line to determine when to send or sample the data.
Wireless Communication

Wireless communication eliminates the need for devices to be physically connected in order to communicate. The physical layer used in wireless communication is typically either an infrared (IR) channel or a radio frequency (RF) channel.

Infrared uses electromagnetic wave frequencies that are just below the visible light spectrum, thus undetectable by the human eye. These waves can be generated by using an infrared diode and detected by using an infrared transistor. An infrared diode is similar to a red or green diode except that it emits infrared light. An infrared transistor is a transistor that conducts (i.e., allows current to flow from its source to its drain) when exposed to infrared light. A simple transmitter can send 1s by turning on its infrared diode and can send 0s by turning off its infrared diode. Correspondingly, a receiver will detect 1s when current flows through its infrared transistor and 0s otherwise. The advantage of using infrared communication is that it is relatively cheap to build transmitters and receivers. The disadvantage of using infrared is the need for line of sight between the transmitter and receiver, resulting in a very restricted communication range.

Radio frequency (RF) uses electromagnetic wave frequencies in the radio spectrum. A transmitter here will need to use analog circuitry and an antenna to transmit data. Likewise, a receiver will need to use an antenna and analog circuitry to receive data. One advantage of using RF is that, generally, a line of sight is not necessary and thus longer distance communication is possible. The range of communication is, of course, dependent on the transmission power used by the transmitter.



Typically, RF transmitters and receivers must agree on a specific frequency in order to send and receive data. Using frequency hopping, it is possible for the transmitter and receiver to communicate while constantly changing the transmission frequency. Of course, both devices must have a common understanding of the sequence for frequency hops. Frequency hopping allows more devices to share a fixed set of frequencies and is commonly used in wireless communication protocols designed for networks of computers and other electronic devices.

Layering

Layering is a hierarchical organization of a communication protocol in which lower levels of the protocol provide services to the higher levels. We have already discussed the physical layer. The physical layer provides the basic service of sending and receiving bits or words of data. The next higher-level protocol uses the physical layer to send and receive packets of data, where a packet of data is composed of possibly multiple bytes. The next higher level uses the packet transmission service of its lower level to perhaps send different types of data, such as acknowledgments, special requests, and so on. Typically, the lowest level consists of the physical layer and the highest level consists of the application layer. The application layer provides abstract services to the application, such as ftp or http.

Layering is a way to break the complexity of a communication protocol into independent pieces, thus making it easier to design and understand, much like a programmer abstracting away complexities of a program by creating objects or libraries of routines. In communication and networking, the concept of layering is very fundamental.

Error Detection and Correction

Error detection is the ability of a receiver to detect errors that may occur during the transmission of a data word or packet. The most common types of errors are bit errors and bursts of bit errors. A bit error occurs when a single bit in a word or data packet is received as its inverted value. A burst of bit errors occurs when consecutive bits of a word or data packet are received incorrectly. Given that an error is detected, error correction is the ability of a receiver and transmitter to cooperate in order to correct the problem. The ability to detect and correct errors is often part of a bus protocol. We will next discuss parity and checksum error detection algorithms, which are commonly used in bus protocols.

Parity is a single bit of information that is sent along with a word of data by the transmitter to give the receiver some additional knowledge about the data word. This additional knowledge is used by the receiver to detect, to some degree, a bit or burst of bit errors in receiving a word. Common types of parity are odd or even. Odd parity is a bit that, if set, indicates to the receiver that the data word bits plus parity bit contain an odd number of 1s. Even parity is a bit that, if set, indicates to the receiver that the data word bits plus parity bit contain an even number of 1s. Prior to sending a word of data, the transmitter will compute the parity and send that along with the data word to the receiver. On reception of the data word and parity bit, the receiver will compute the parity of the data and make sure that it agrees with the parity bit received from the transmitter. If a parity check fails, it indicates with certainty that there was at least one transmission error. Parity checks will always detect a single bit error. However, burst bit errors may or may not be detected by parity checking; an even number of errors, for example, will not be detected.

As an example of parity-based error checking, consider wanting to transmit the following 7-bit word: 0101010. Assuming even parity, we would actually transmit the 8-bit word: 01010101, where the least-significant bit is the parity bit. Now, suppose during transmission a bit gets flipped, so that a receiver receives the following 8-bit word: 11010101. The receiver detects that this word has odd parity; knowing that the word was supposed to have even parity, the receiver determines that this word has an error. Instead, suppose the receiver receives 11110101. This word has even parity, and so the receiver thinks the word is correct, even though it contains two errors.

Checksum is a stronger form of error checking that is applied to a packet of data. A packet of data will contain multiple words of data. Using parity checking, we used one extra bit per word to help us detect errors. Using checksum, we use an extra word per packet for the same purpose. For example, we may compute the XOR sum of all the data words in a packet and send this value along with the data packet. Upon receiving the data packet words and the checksum word, the receiver will compute the XOR sum of all the data words it received. If the computed checksum word equals the received checksum word, the data packet is assumed to be correct. Otherwise, it is assumed to be incorrect. Again, not all error combinations can be detected. We can of course use both parity and checksum for stronger error checking.

As an example, suppose a packet consists of four words: 0000000, 0101010, 1010101, and 0000000. The XOR checksum of these four words is 1111111. A transmitter can thus send that checksum word at the end of the packet. Now, suppose the receiver receives 1000001, 0101010, 1010101, and 0000000. Note that two bits have switched in the first word, and that parity-based error checking would not detect this error. The receiver computes the checksum of this packet and obtains 0111110. This differs from the received checksum of 1111111, and thus the receiver determines that an error has occurred.

Note that errors can also occur in the parity bit or the checksum word itself.

When using parity or checksum error detection, error correction is typically done by a retransmission and acknowledgment protocol. Here, the transmitter sends a data packet and expects to receive an acknowledgment from the receiver indicating that the data packet was received correctly. If an acknowledgment is not received, the transmitter retransmits the data packet and waits for a second acknowledgment.
6.9 Serial Protocols

In this section, we describe four popular serial protocols, namely the I2C protocol, the CAN protocol, the FireWire protocol, and the USB protocol.

I2C

Philips Semiconductors developed the Inter-IC, or I2C, bus nearly 20 years ago. I2C is a two-wire serial bus protocol.

This protocol enables peripheral ICs in electronic systems to communicate with each other using simple communication hardware. Based on the original specification of the I2C, data transfer rates of up to 100 kbits/s and 7-bit addressing are possible. Seven-bit addressing allows a total of 128 devices to communicate over a shared I2C bus. With increased data transfer rate requirements, the I2C specification has been recently enhanced to include fast-mode, 3.4 Mbits/s, with 10-bit addressing. Common devices capable of interfacing to an I2C bus include EPROMs, Flash and some RAM memory devices, real-time clocks, watchdog timers, and microcontrollers.

Figure 6.24: I2C bus structure. (a) A microcontroller (master) and three servants, an EEPROM (addr = 0x01), a temperature sensor (addr = 0x02), and an LCD controller (addr = 0x03), sharing the SDA and SCL wires, with total bus capacitance under 400 pF. (b) SDA/SCL waveforms for the start condition, sending 0, sending 1, and the stop condition. (c) A typical read/write cycle: start, 7-bit address, R/W bit, ACK, data bits, ACK, stop.

A simple I2C network is depicted in Figure 6.24(a). The bus consists of two wires: a data wire called serial-data-line (SDA) and a clock wire called serial-clock-line (SCL). The I2C specification does not limit the length of the bus wires, as long as the total capacitance of the bus remains under 400 pF. In this example, there are four devices attached to the bus. One of these devices, the microcontroller, is a master. The other three devices, a temperature sensor, an EEPROM, and an LCD controller, are servants. Each of these servant devices is assigned a unique address, as shown in Figure 6.24(a). Only master devices can initiate a data transfer on an I2C bus. The protocol does not limit the number of master devices on an I2C bus, but typically, in a microcontroller-based system, the microcontroller serves as the master. Both master and servant devices can be senders or receivers of data. This will depend on the function of the device. In our example, the microcontroller and EEPROM send and receive data, while the temperature sensor sends data and the LCD controller receives data. In Figure 6.24(a), arrows connecting the devices to the I2C bus wires depict the data movement direction. Normally, all servant devices residing on an I2C bus assert high impedance on the bus while the master device maintains logic high, signaling an idle condition.

All data transfers on an I2C bus are initiated by a start condition, shown in Figure 6.24(b): a high-to-low transition of the SDA line while the SCL signal is held high. All data transfers on an I2C bus are terminated by a stop condition, also shown in Figure 6.24(b): a low-to-high transition of the SDA line while the SCL signal is held high. Actual data is transferred in between start and stop conditions. A typical I2C byte write cycle works as follows. The master device initiates the transfer by a start condition. Then, the address of the device that the byte is being written to is sent, starting with the most significant down to the least significant bit. Ones and zeros are sent as shown in Figure 6.24(b); the bit value is placed on the SDA line by the master device while the SCL line is low, and maintained stable until after a clock pulse on SCL. If performing a write, right after sending the address of the receiving device, the master sends a zero. The receiving device in return acknowledges the transmission by holding the SDA line low during the first ACK clock cycle. Following the acknowledgment, the master device transmits a byte of data, starting with the most significant down to the least significant bit. The receiving device, in this case the servant, acknowledges the reception of data by holding the SDA line low during the second ACK clock cycle. If performing a read operation, the master initiates the transfer by a start condition, sends the address of the device that is being read, sends a one (logic high on the SDA line) requesting a read, and waits to receive an acknowledgment. Then, the sender sends a byte of data. The receiver, the master device in this case, acknowledges the reception of data and terminates the transfer by generating a stop condition. The timing diagram of a typical read/write cycle is depicted in Figure 6.24(c).
CAN

The controller area network (CAN) bus is a serial communication protocol for real-time applications, possibly carried over a twisted pair of wires. The protocol was developed by Robert Bosch GmbH to enable communication among various electronic components of cars, as an alternative to expensive and cumbersome wiring harnesses. The robustness of the protocol has expanded its use to many other automation and industrial applications. Some characteristics of the CAN protocol include high-integrity serial data communications, real-time support, data rates of up to 1 Mbit/s, 11-bit addressing, error detection, and confinement capabilities. The CAN protocol is documented as ISO 11898 for high-speed applications and ISO 11519 for lower-speed applications. Common applications, other than automobiles, using CAN include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments.

---·-:::""" - ·
Among devices that incorporate a CAN interface are the 8051-compatible 8592 processor and a variety of standalone CAN controllers, such as the 82C200 from Philips.

The CAN specification does not specify the actual layout and structure of the physical bus itself. Instead, it requires that a device connected to the CAN bus be able to transmit, or detect, on the physical bus, one of two signals called dominant or recessive. For example, a dominant signal may be represented as logic 0 and recessive as logic 1 on a single data wire. Furthermore, the physical CAN bus must guarantee that if one device asserts a dominant signal and another device simultaneously asserts a recessive signal, the dominant signal prevails. Given a physical CAN bus with the above-mentioned properties, the protocol defines the data packet format and transmission rules to prioritize messages, guarantee latency times, allow for multiple masters, handle transmission errors, retransmit corrupted messages, and distinguish between a permanent failure of a node and temporary errors.

FireWire

The FireWire (a.k.a. I-Link or Lynx) is a high-performance serial bus developed by Apple Computer Inc. Because the specification of the FireWire protocol is given by the IEEE 1394 designation, many refer to it as the IEEE 1394, or simply 1394. The need for FireWire is driven by the rapidly growing need for mass information transfer. Typical local or wide area networks (LANs/WANs) are incapable of providing cost-effective connection capabilities and do not guarantee bandwidth for real-time applications. Some characteristics of the FireWire protocol include transfer rates of 12.5 to 400 Mbits/s, 64-bit addressing, real-time connection/disconnect and address assignment (a.k.a. plug-and-play capabilities), and a packet-based layered design structure.

While I2C and CAN bus protocols are designed mostly for interfacing ICs, FireWire is designed for interfacing among independent electronic devices (e.g., a desktop computer and a digital scanner). Moreover, FireWire is capable of supporting an entire local-area network similar to one based on Ethernet. The 64-bit wide address space of FireWire is partitioned as 10 bits for network identifiers, 6 bits for node identifiers, and 48 bits for memory address. A local-area network based on FireWire can consist of 1,023 subnetworks, each consisting of 63 nodes, with each node, in turn, having 281 terabytes of distinct addressable locations! FireWire is feasible for applications such as disk drives, printers, scanners, cameras, and many other consumer electronics devices.

USB

The Universal Serial Bus (USB) protocol is designed to make it easier for PC users to connect monitors, printers, digital speakers, modems, and input devices like scanners, digital cameras, joysticks, and multimedia game equipment. USB has two data rates: 12 Mbps for devices requiring increased bandwidth, and 1.5 Mbps for lower-speed devices like joysticks and game pads. USB uses a tiered star topology, which means that some USB devices, called USB hubs, can serve as connection ports for other USB peripherals. Only one device needs to be plugged into the PC. Other devices can then be plugged into the hub. USB hubs may be embedded in such devices as monitors, printers, and keyboards. Standalone hubs could also be made available, providing a handful of convenient USB ports right on the desktop. Hubs feature an upstream connection (pointed toward the PC) as well as multiple downstream ports to allow the connection of additional peripheral devices. Up to 127 USB devices can be connected together in this way.

USB host controllers manage and control the driver software and bandwidth required by each peripheral connected to the bus. Users don't need to do a thing, because all the configuration steps happen automatically. The USB host controller even allocates electrical power to the USB devices. Like USB host controllers, USB hubs can detect attachments and detachments of peripherals occurring downstream and supply appropriate levels of power to downstream devices. Since power is distributed through USB cables, with a maximum length of 5 meters, you no longer need a clunky AC power supply box for many devices.

6.10 Parallel Protocols

In this section, we briefly describe two popular parallel protocols, namely the PCI bus protocol and the ARM bus protocol.

PCI Bus

The Peripheral Component Interconnect (PCI) bus is a high-performance bus for interconnecting chips, expansion boards (e.g., a video card that plugs into a main board like a Pentium motherboard), and processor/memory subsystems. The PCI bus originated at Intel in the early 1990s, was then adopted by the industry as a standard administered by the PCI Special Interest Group (PCISIG), and was first used in personal computers in 1994 along with Intel 486 processors. The PCI bus has since largely replaced earlier bus architectures, such as the ISA/EISA bus described earlier and the Micro Channel bus protocols. Some characteristics of the PCI bus protocol include transfer rates of 127.2 to 508.6 Mbits/s, 32-bit addressing, a synchronous bus architecture (i.e., all transfers take place with respect to a clock signal), and multiplexed 32-bit data/address lines. It must be noted that later additions to the specification of the PCI bus extend the protocol to allow 64-bit data and addressing while maintaining compatibility with the 32-bit schemes.

ARM Bus

While PCI is a widely used industry standard, many other bus protocols are predominantly designed and used internally by various IC design companies. One such bus is the ARM bus, designed by the ARM Corporation and documented in ARM's application note 18. This bus is designed to interface with the ARM line of processors. The ARM bus supports 32-bit data transfer and 32-bit addressing and, similar to PCI, is implemented using a synchronous data transfer architecture. The transfer rate on an ARM bus is not specified and instead is a function of the clock speed used in a particular application. More specifically, if the clock speed on the ARM bus is denoted as X, then the transfer rate is 16 x X bits/s.
6.11 Wireless Protocols

In this section, we briefly introduce three new and emerging wireless protocols, namely IrDA, Bluetooth, and the IEEE 802.11.

IrDA

The Infrared Data Association (IrDA) is an international organization that creates and promotes interoperable, low-cost, infrared data interconnection standards that support a walk-up, point-to-point user model. Their protocol suite, also commonly referred to as IrDA, is designed to support transmission of data between two devices over short-range point-to-point infrared at speeds between 9.6 kbps and 4 Mbps. IrDA is that small, semitransparent, red window that you may have wondered about on your notebook computer. Over the last several years, IrDA hardware has been deployed in notebook computers, printers, personal digital assistants, digital cameras, public phones, and even cell phones. One of the reasons for this has been the simplicity and low cost of IrDA hardware. Unfortunately, until recently, the hardware has not been available for applications programmers to use because of a lack of suitable protocol drivers.

Microsoft Windows CE 1.0 was the first Windows operating system to provide built-in IrDA support. Windows 2000 and Windows 98 now also include support for the same IrDA programming APIs that have enabled file-sharing applications and games on Windows CE. IrDA implementations are becoming available on several popular embedded operating systems.

Bluetooth

Bluetooth is a new and global standard for wireless connectivity. This protocol is based on a low-cost, short-range radio link. The radio frequency used by Bluetooth is globally available. When two Bluetooth-equipped devices come within 10 meters of each other, they can establish a connection. Because Bluetooth uses a radio-based link, it doesn't require a line-of-sight connection in order to communicate. For example, your laptop could send information to a printer in the next room, or your microwave oven could send a message to your cordless phone telling you that your meal is ready. In the future, Bluetooth is likely to be standard in tens of millions of mobile phones, PCs, laptops, and a whole range of other electronic devices.

IEEE 802.11

Typically, each node in an ad hoc network architecture uses a broadcast and flooding method to all other nodes to establish who's who. The second type of network structure used in wireless LANs is the infrastructure. This architecture uses fixed network access points with which mobile nodes can communicate. These network access points are sometimes connected to landlines to widen the LAN's capability by bridging wireless nodes to other wired nodes. If service areas overlap, handoffs can occur. This structure is very similar to the present-day cellular networks around the world.

The IEEE 802.11 protocol places specifications on the parameters of both the physical (PHY) and medium access control (MAC) layers of the network. The PHY layer, which actually handles the transmission of data between nodes, can use direct sequence spread spectrum, frequency-hopping spread spectrum, or infrared pulse position modulation. IEEE 802.11 makes provisions for data rates of either 1 Mbps or 2 Mbps, and calls for operation in the 2.4 to 2.4835 GHz frequency band, which is an unlicensed band for industrial, scientific, and medical applications, and 300 to 428,000 GHz for IR transmission. Infrared is generally considered to be more secure against eavesdropping, because IR transmissions require absolute line-of-sight links (no transmission is possible outside any simply connected space or around corners), as opposed to radio frequency transmissions, which can penetrate walls and be intercepted by third parties unknowingly. However, infrared transmissions can be adversely affected by sunlight, and the spread-spectrum protocol of IEEE 802.11 does provide some rudimentary security for typical data transfers.

The MAC layer is a set of protocols responsible for maintaining order in the use of a shared medium. The IEEE 802.11 standard specifies a carrier sense multiple access with collision avoidance (CSMA/CA) protocol. In this protocol, when a node receives a packet to be transmitted, it first listens to ensure no other node is transmitting. If the channel is clear, it then transmits the packet. Otherwise, it chooses a random backoff factor, which determines the amount of time the node must wait until it is allowed to transmit its packet. During periods in which the channel is clear, the transmitting node decrements its backoff counter. When the backoff counter reaches zero, the node transmits the packet. Since the probability that two nodes will choose the same backoff factor is small, collisions between packets are minimized. Collision detection, as is employed in Ethernet, cannot be used for the radio frequency transmissions of IEEE 802.11. The reason for this is that when a node is transmitting, it cannot hear any other node in the system that may be transmitting, since its own signal will drown out any others arriving at the node.
tens of millions of mobile phones, PCs, laptops and a whole range of other electromc devices.
Whenever a packet is to be transmitted, the transmitting node first sends out a short
ready-to-send RTS packet containing information on the length of the packet. If the receiving
IEEE 802.11 node hears the RTS, it responds with a short clear-to-send CTS packet. After this exchange,
li-'U'. 802. r I is an
IEEE-proposed standard for wireless local area networks (LANs). There the transmitting node sends its packet. When the packet is received successfully, as
arc two different wavs to configure a network: ad-hoc and infrastnlc!ure. In the a~-hocl determined by a cyclic redundancy check, the receiving node transmits an acknowledgment
network. computers ;re brought together to form a network on the fly, Here, ~ere 1s no ACK packet.
·structure to the network. there are no fixed points, and usually every node is ab\e to : }·:
;.'jl
communicate with every other node. Although it seems that order would be difficult_tO
maintain in this type of network, special algorithms ha~c been designed lo elect o~e ma_chm;I ·
as the master station of the network with the others bcmg servants. Another algontlun ma ·
.I,
Embedded System Design

Chapter 6: Interfacing

6.12 Summary

Interfacing processors and memory represents a challenging design task. Timing diagrams provide a basic means for us to describe interface protocols. Thousands of protocols exist, but they can be better understood by understanding basic protocol concepts like actors, data direction, addresses, time multiplexing, and control methods. A general-purpose processor typically has either a bus-based I/O structure or a port-based I/O structure for interfacing. Interfacing with a general-purpose processor is the most common interfacing task and involves three key concepts. The first is the processor's approach for addressing external data locations, known as its I/O addressing approach, which may be memory-mapped I/O or standard I/O. The second is the processor's approach for handling requests for servicing by peripherals, known as its interrupt handling approach, which may be fixed or vectored. The third is the ability of peripherals to directly access memory, known as direct memory access. Interfacing also leads to the common problem of more than one processor simultaneously seeking access to a shared resource such as a bus, requiring arbitration. Arbitration may be carried out using a priority arbiter or using daisy-chain arbitration. A system often has a hierarchy of buses, such as a high-speed processor local bus and a lower-speed peripheral bus. Communication protocols may carry out parallel or serial communication, and may use wires, infrared, or radio frequencies as the transmission medium. Communication protocols may include extra bits for error detection and correction, and typically involve layering as an abstraction mechanism. Popular serial protocols include I2C, CAN, FireWire, and USB. Popular parallel protocols include PCI and ARM. Popular serial wireless protocols include IrDA, Bluetooth, and IEEE 802.11.

6.13 References and Further Reading

• VSI Alliance, On-Chip Bus Development Working Group, Specification 1 version 1.0, "On-Chip Bus Attributes," August 1998, http://www.vsi.org.
• L. Eggebrecht, Interfacing to the IBM Personal Computer. Indianapolis, IN: SAMS, Macmillan Computer Publishing, 1990.
• Peter W. Gofton, Mastering Serial Communications. Alameda, CA: SYBEX Inc., 1994.
• Bob O'Hara and Al Petrick, IEEE 802.11 Handbook: A Designer's Companion. Piscataway, NJ: Standards Information Network, IEEE Press, 1999.
• John Hyde, USB Design by Example. New York: John Wiley & Sons, Inc., 1999.

6.14 Exercises

6.1 Draw the timing diagram for a bus protocol that is handshaked, nonaddressed, and transfers 8 bits of data over a 4-bit data bus.
6.2 Explain the difference between port-based I/O and bus-based I/O.
6.3 Show how to extend the number of ports on a 4-port 8051 to 8 by using extended parallel I/O. (a) Using block diagrams for the 8051 and the extended parallel I/O device, draw and label all interconnections and I/O ports. Clearly indicate the names and widths of all connections. (b) Give C code for a function that could be used to write to the extended ports.
6.4 Discuss the advantages and disadvantages of using memory-mapped I/O versus standard I/O.
6.5 Explain the benefits that an interrupt address table has over fixed and vectored interrupt methods.
6.6 Draw a block diagram of a processor, memory, and peripheral connected with a system bus, in which the peripheral gets serviced using vectored interrupt. Assume servicing moves data from the peripheral to the memory. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. Use symbolic values for addresses. Provide a timing diagram illustrating what happens over the system bus during the interrupt.
6.7 Draw a block diagram of a processor, memory, peripheral, and DMA controller connected with a system bus, in which the peripheral transfers 100 bytes of data to the memory using DMA. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. Draw a timing diagram showing what happens during the transfer; skip the 2nd through 99th bytes.
6.8 Repeat problem 6.7 for a daisy-chain configuration.
6.9 Design a parallel I/O peripheral for the ISA bus. Provide: (a) a state-machine description and (b) a structural description.
6.10 Design an extended parallel I/O peripheral. Provide: (a) a state-machine description and (b) a structural description.
6.11 List the three main transmission mediums described in the chapter. Give two common applications for each.
6.12 Assume an 8051 is used as a master device on an I2C bus, with pin P1.0 corresponding to I2C_Data and pin P1.1 corresponding to I2C_Clock. Write a set of C routines that encapsulate the details of the I2C protocol. Specifically, write the routines StartI2C and StopI2C, which send the appropriate start/stop signal to slave devices. Likewise, write the routines ReadByte and WriteByte, each taking a device id as input and performing the appropriate I/O actions.
6.13 Select one of the following serial bus protocols, then perform an Internet search for information on transfer rate, addressing, error correction (if applicable), and plug-and-play capability (if applicable). Then give timing diagrams for a typical transfer of data (e.g., a write operation). The protocols are USB, I2O, Fibre Channel, SMBus, IrDA, or any other serial bus in use by the industry and not described in this book.

6.14 Select one of the following parallel bus protocols, then perform an Internet search for information on transfer rate, addressing, DMA and interrupt control (if applicable), and plug-and-play capability (if applicable). Then give timing diagrams for a typical transfer of data (e.g., a write operation). The protocols are STD 32, VME, SCSI, ATAPI, Micro Channel, or any other parallel bus in use by the industry and not described in this book.

CHAPTER 7: Digital Camera Example

7.1 Introduction
7.2 Introduction to a Simple Digital Camera
7.3 Requirements Specification
7.4 Design
7.5 Summary
7.6 References and Further Reading
7.7 Exercises

7.1 Introduction

In the previous chapters, we introduced general-purpose processors, custom single-purpose processors, standard single-purpose processors, memory, and techniques for interfacing processors and memory. In this chapter, we apply this knowledge to design a simple digital camera. In particular, we will examine the trade-offs of using general-purpose versus single-purpose processors to implement the necessary camera functionality. We will see that choosing a good partitioning of functionality among the different processor types is essential to building a good design. This in turn requires a unified view of different processor types, as this book has thus far stressed.

We begin with a general introduction to digital cameras and their inner workings. We then develop the camera's specifications, which describe the desired behavior as well as constraints on design metrics like performance, size, and power. We explore several alternative implementations of the digital camera and compare their design metrics.

7.2 Introduction to a Simple Digital Camera

A digital camera is a popular consumer electronic device that can capture images, or "take pictures," and store them in a digital format. A digital camera does not contain film, but rather one or more ICs possessing processors and memories. Digital cameras were not possible over a decade ago, because small-enough ICs could not process fast enough or store enough bits to

be feasible. The advent of systems-on-a-chip and high-capacity flash memory has made such cameras possible.

User's Perspective

From a user's point of view, a simple digital camera works as follows. The user turns on the digital camera, points the camera lens to the scene to be photographed, and clicks the "shutter" button. The user can repeat these steps until up to N images are stored internally in the camera. Here, N is a constant that depends on the model of the camera, which in turn depends on the amount of memory in the camera and the number of bits used per image. The user may also attach the digital camera to a PC, say, by using a serial cable, to download the photos to a hard disk for permanent storage.

Designer's Perspective

From a designer's point of view, a simple digital camera performs two key tasks. The first task is that of processing images and storing them in internal memory. The second task is that of uploading the images serially to an attached PC.

The task of processing and storing images is initiated when the user presses the shutter button. At this point, the image is captured and converted to digital form by a charge-coupled device (CCD). Then, the image is processed and stored in internal memory. The task of uploading the image is initiated when the user attaches the digital camera to a PC and uses special software to command the digital camera to transmit the archived images serially. Let us look at these actions in more detail.

A CCD is a special sensor that captures an image. A CCD is a light-sensitive silicon solid-state device composed of many small cells. The light falling on a cell is converted into a small amount of electric charge, which is then measured by the CCD electronics and stored as a number. The number usually ranges from 0, meaning no light, to 255 or 65,535, meaning very intense light per pixel. Figure 7.1 illustrates the internals of a CCD. On the periphery, a CCD is composed of a mechanical shutter. This is a screen that normally blocks the light from falling on the light-sensitive surface. When activated, the screen opens momentarily and allows light to hit the light-sensitive surface, charging the cells with electrical energy that is proportional to the amount of light passed in. The screen typically sits behind an optical lens that focuses the scene observed through the viewfinder onto the light-sensitive surface of the CCD. A CCD also has internal circuitry that measures the electric charge of each cell, converts it to a digital value, and provides an interface for outputting the data.

Due to manufacturing errors, the light-sensitive cells of a CCD may always measure the light intensity to be slightly above or below the actual value. This error, called the zero-bias error, is typically the same across columns but different across rows. For this reason, some of the leftmost columns of a CCD's light-sensitive cells are blocked by a strip of black paint. The actual intensity registered by these blocked cells should be zero. Therefore, a reading of other than zero would indicate the zero-bias error for that row. Figure 7.1 shows the covered cells. This becomes clearer as we give an example in the next paragraphs.

Figure 7.1: Internals of a charge-coupled device (CCD). (Callouts: the lens area sits above the pixel array; the electromechanical shutter is activated to expose the cells to light for a brief moment; when exposed to light, each cell becomes electrically charged, and this charge can then be converted to an 8-bit value, where 0 represents no exposure and 255 represents very intense exposure of that cell to light; some of the columns are covered with a black strip of paint, and the light intensity of these pixels is used for zero-bias adjustment of all the cells; the electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell, and these values can be clocked out of the CCD by external logic through a standard parallel bus interface.)

A digital camera uses a CCD to capture an image. Once the image is captured, it must be corrected to eliminate the zero-bias error. Then, the image must be encoded using the JPEG encoding scheme. The task of bias adjustment is described next.

Figure 7.2 shows a raw image block of size 8 x 8 pixels that is captured using a CCD of that size. Normally, the CCD would be of much greater resolution, say 640 x 480 pixels, but we use a small one to be able to illustrate the various operations of a digital camera in this chapter. Notice in Figure 7.2(a) that there are 10 columns. As mentioned earlier, the last two columns are extra and are used to detect zero bias. Recall that these two columns are covered and should normally read a value of zero. Looking at the last two columns of the first row, we see that the measured light intensity is on the average 13 units larger than the actual light intensity. We obtain 13 by averaging the last two columns ((12 + 14) / 2 = 13). We can thus correct the error for this row by subtracting 13 from each element of the first row. We can repeat this process for each row to obtain a block of 8 x 8 pixels that has been corrected for zero-bias errors. The corrected block is given in Figure 7.2(b).

The next step is to compress the image, which reduces the number of bits needed to store the image in memory. Compression allows us to store more images in a limited amount of memory. Compressed images can also be transmitted to a PC in less time. We'll perform JPEG encoding of the image. JPEG is a popular standard format for representing digital images in a compressed form. JPEG, pronounced "jay-peg," is short for Joint Photographic Experts Group. The word joint refers to the group's status as a committee working on both ISO and ITU-T standards. Their best-known standard is for still-image compression.

Figure 7.2: A block of 8 x 8 pixels as captured using a CCD: (a) before zero-bias adjustment; the last 2 columns help represent zero bias for a given row, and (b) after zero-bias adjustment. (Pixel data not reproduced here.)

Figure 7.3: A block of 8 x 8 pixels as captured using a CCD, zero-bias corrected: (a) after being encoded using DCT, (b) then after quantization. (Pixel data not reproduced here.)

JPEG encoding provides for a number of different modes of operation. For a full coverage of the JPEG encoding, the reader is referred to the reference section at the end of this chapter. The mode that we discuss in this chapter is an encoding that provides for high compression ratios using the discrete cosine transform (DCT). To compress an image, the image data is divided into blocks of 8 x 8 pixels each. Each block is then processed in three steps. The first step performs the DCT, the second step performs quantization, and the last step performs Huffman encoding.

The DCT step transforms our original 8 x 8 pixel block into a cosine-frequency domain. Once in this form, the upper-left corner values of the transformed data represent more of the essence of the image, while the lower-right corner values represent finer details. We can therefore reduce the precision of these lower-right corner values to facilitate compression while retaining reasonable overall image quality. The actual DCT operation is given by this formula:

C(h) = if (h = 0) then 1/sqrt(2) else 1.0
F(u,v) = 1/4 x C(u) x C(v) x Σx=0..7 Σy=0..7 Dxy x cos(π(2x + 1)u / 16) x cos(π(2y + 1)v / 16)

Here, C(h) is simply an auxiliary function used in the main equation, namely F(u,v). The function F(u,v) gives the encoded pixel at row u, column v. Dxy is the original pixel value at row x, column y. Of course, it would be useless to have a DCT transform if we were unable to reverse the process and obtain the original. Below is the inverse DCT (IDCT), although it is not necessary in the implementation of our simple digital camera:

C(h) = if (h = 0) then 1/sqrt(2) else 1.0
f(x,y) = 1/4 x Σu=0..7 Σv=0..7 C(u) x C(v) x Euv x cos(π(2u + 1)x / 16) x cos(π(2v + 1)y / 16)

Again, C(h) is simply an auxiliary function used in the main equation, namely f(x,y). The function f(x,y) gives the original pixel at row x, column y. Euv is the DCT-encoded pixel value, using the previous equation, for row u and column v. Figure 7.3(a) shows the DCT-encoded values for our sample block of 8 x 8 pixels. The inverse process will obtain the block in Figure 7.2(b) from that in Figure 7.3(a).

The DCT is sometimes distinguished from the IDCT by referring to the DCT as the forward DCT, or FDCT.

The next processing step is to reduce the quality of the encoded DCT image, which helps us compress the image. We do this by reducing the bit precision of the encoded data. Note that if we represent the pixels with less precision, we will need fewer bits to encode them, thus achieving compression. For example, we can divide all the values by some factor of 2 (since division by a factor of 2 is achieved simply by right shifts), such as 8. This is the step where we actually lose image quality in order to achieve high compression ratios. This process is referred to as quantization. To decompress, we would perform a dequantization. In other words, we would multiply each pixel by the same factor of 2 (i.e., 8 in our example). Figure 7.3(b) illustrates the quantization applied to the block of 8 x 8 shown in Figure 7.3(a).

The last step of the JPEG compression is the encoding of data. Here, the block of 8 x 8 pixels is first serialized. Specifically, the values are converted into a single list according to a zigzag pattern, as shown in Figure 7.4. Then, the values are Huffman encoded. Huffman encoding is a minimal variable-length encoding based on the frequency of each pixel. In other words, the frequently occurring pixels will be assigned a short binary code, while those that don't occur as frequently will be assigned a longer code. Let us explain that with an example.

In Figure 7.5(a), we have given the frequency of pixel occurrence of the encoded and quantized 8 x 8 block shown in Figure 7.3(b). Here, as shown, the encoded pixel value -1 occurs fifteen times, while the encoded pixel value 14 occurs only one time.

From this information, we construct a Huffman tree as illustrated in Figure 7.5(b). With each node in such a tree, we associate a value that is computed as follows. For an internal node, the value is the sum of the values of the children of that node. For a leaf node, the value is the frequency of occurrence of the pixel being represented by that leaf node. The tree is constructed from the bottom up (i.e., starting from leaves and working up toward the root).
Figure 7.4: Data encoding sequence of a block of 8 x 8 pixels. (Zigzag pattern not reproduced here.)

Figure 7.5: Huffman encoding of the block of 8 x 8 pixels shown in Figure 7.3(b): (a) the pixel values and associated frequencies, (b) the resulting Huffman tree, (c) and the Huffman codes. (Tree not reproduced; the (a) and (c) columns read:)

pixel  frequency  code
-1     15x        00
0      8x         100
-2     6x         110
1      5x         010
2      5x         1110
3      5x         1010
5      5x         0110
-3     4x         11110
-5     3x         10110
-10    2x         01110
144    1x         111111
-9     1x         111110
-8     1x         101111
-4     1x         101110
6      1x         011111
14     1x         011110

Initially, we create a leaf node for each of the pixels and initialize the values of these nodes according to the pixel's frequency. Then we create an internal node by joining any two nodes that will result in the minimum value. We repeat this process until we have a complete binary tree. Once the Huffman tree is constructed, we can obtain a binary code for each of the pixel values by traversing the tree starting at the root down to the leaf labeled with that pixel. While traversing the tree, we construct a binary string. Each time we traverse down past a right child we append a '1' to our binary string, whereas each time we traverse down a left child we append a '0' to our binary string. For example, in order to obtain the binary code for the pixel value -3 in Figure 7.5(b), we would make four right traversals and a left traversal, thus obtaining the binary string "11110". Figure 7.5(c) gives the Huffman codes for the remaining values.
Given these Huffman codes, we encode our block of 8 x 8 pixels by creating a long string of 0s and 1s. Here we take the sequence of pixels generated by the zigzag ordering shown in Figure 7.4, and for each pixel we output the Huffman binary code. In our example of Figure 7.4, we would obtain the binary string "111111011001110...".

As stated earlier, Huffman encoding achieves compression by assigning a short binary code to the most frequently appearing pixel values, while leaving longer binary codes for the least frequently appearing pixels. Of course, this process is reversible, since Huffman encoding also ensures that no two codes are a prefix of each other.

Our next processing step is to archive our image. This step is rather easy. We simply record the starting address and size of each image. We can use a linked-list data structure to record this information. If we know beforehand that the camera will hold at most N images, we can set aside a portion of memory for our N addresses and N image-size variables. In addition, we would need to keep a counter that tells us the location of the next available address in memory. For example, initially, all N addresses and image-size variables might be set to 0. Our global memory address will be set to N x 4, assuming that the address and image-size variables occupy the initial N x 4 bytes in memory. Then, the first image will be archived in memory starting at location N x 4. Assuming the image was of size 1024, then we will update our global memory address to N x 4 + 1024, and so on. Of course, there are other ways to perform such archiving. In any event, our memory requirement will be based on N, the image size, and the average compression ratio that we can obtain using JPEG encoding.

Finally, the only processing task that remains is to upload the images and free the space in memory when a PC is connected to the camera and an upload command is received. To accomplish this, we use a UART. As you'll recall, a UART transmits data serially over a single data wire. Our processing task will be to read the images from memory and transmit them using the UART. As we transmit images, we reset the pointers, image-size variables, and the global memory pointer accordingly.

It must be noted again that our description of a digital camera is very simple. A real digital camera will enable you to take pictures of varied sizes, display images on an LCD, allow image deletion, perform advanced image processing such as digitally stretching, zooming in and out, and many other things.

7.3 Requirements Specification

Our digital camera product's life begins with a requirements specification. A specification describes what a particular system should do, namely the system's requirements. Specifications include both functional and nonfunctional requirements. Functional


requirements describe the system's behavior, meaning the system's outputs as a function of inputs (e.g., "output X should equal input Y times 2"). Nonfunctional requirements describe constraints on design metrics (e.g., "the system should use 0.001 watt or less"). The initial specification of a system may be very general and may come from our company's marketing department. The initial specification for our camera might be a short document detailing the market need for "a very basic low-end digital camera capable of capturing and storing at least 50 low-resolution images and uploading such images to a PC, costing around $100, with a single medium-sized IC costing less than $25, including amortized NRE costs. Battery life should be as long as possible. Expected sales volume is 200,000 if market entry is earlier than 6 months, and 100,000 if market entry is between 6 to 12 months. Beyond 12 months, this product will not sell in significant quantities."

Let us begin by discussing the nonfunctional requirements in more detail, followed by an informal high-level functional specification, and then a more detailed description of behavior.

Nonfunctional Requirements

Given our initial requirements specification, we might want to pay attention to several design metrics in particular: performance, size, power, and energy. Performance is the time required to process an image. Size is the number of elementary logic gates (such as a two-input NAND gate) in our IC. Power is a measure of the average electrical energy consumed by the IC while processing an image. Energy is power times time, which directly relates to battery lifetime. Some of these metrics will be constrained metrics: those metrics must have values below (or in some cases above) a certain threshold. Some metrics may be optimization metrics: those metrics should be improved as much as possible, since this optimization improves the product. A metric can be both a constrained and an optimization metric.

Regarding performance, our design must process images fast enough to be useful. We might determine that a reasonable timing constraint is 1 second per image. Note that the terms timing and performance are often used interchangeably. More time than 1 second would probably be quite annoying from a camera user's perspective. Imagine having to wait 10 seconds after pressing the shutter button before you could press the button again. A typical soccer parent would probably not buy such a camera, for fear of missing a great goal! On the other hand, since we are aiming for the low end of the digital camera market, our performance doesn't need to be much better than 1 second. Thus, performance is a constrained metric but not an optimization metric: anything less than 1 second is equally good.

Regarding size, our design must use an IC that fits in a reasonably sized camera. Suppose that, based on current technology, we determine that our IC has a size constraint of 200,000 gates. In addition to being a constrained metric, size is also an optimization metric, since smaller ICs are generally cheaper. They are cheaper because we can either get higher yield from a current technology or use an older and hence cheaper technology.

Finally, power is a constrained metric because the IC must operate below a certain temperature. Note that our digital camera cannot use a fan to cool the IC, so low-power operation is crucial. Let's assume we determine the power constraint to be 200 milliwatts. Energy will be an optimization metric because we want the battery to last as long as possible. Notice that reducing power or time each reduces energy.

Figure 7.6: Functional block-diagram specification of a digital camera. (Flowchart showing CCD input into zero-bias adjust, followed by DCT, quantize, and archive in memory, with a yes/no decision, and a transmit serially step producing serial output, e.g., 011010...)

Informal Functional Specification

We can describe the high-level functionality of the digital camera by using the flowchart in Figure 7.6. We see the major functions involved in image capture, namely zero-bias adjust, DCT, quantize, and archive in memory. We also see the function transmit serially. We could then describe each function's details in English; we omit such descriptions here, since they were included earlier in the chapter. We'll assume a very low-quality image with a 64 x 64 resolution, meaning the CCD has 64 rows and 64 columns.

Note that Figure 7.6 does not dictate that each of the blocks be mapped onto a distinct processor. Instead, the description only aids in capturing the functionality of the digital camera by breaking that functionality down into simpler functions. The functions could be implemented on any combination of single-purpose and general-purpose processors.

Refined Functional Specification

We can now concentrate on refining the informal functional specification into one that can actually be executed. This typically consists of a C or C++ program describing the functionality. In our case, we could write C or C++ code to describe each function in Figure 7.6. Such a software prototype of the system is often referred to as a system-level model, a

186 Embedded System Design
Chapter 7: Digital Camera Example
7.3: Requirements Specification

prototype, or simply a model, though the prototype is also a first implementation. Keep in mind that one person's specification may be another person's implementation.

The software prototype can be executed on our development computer to verify its correctness. It can also provide insight into the operations of our system. For example, in our digital camera, we can profile our executable specification as it is running, in order to find the computationally intensive functions. Recall that a profiling tool is a tool that watches a program under execution and records the number of times a particular procedure or function call was made, or a variable was written or read. We can also use the prototype to obtain sample output that is later used to verify the correctness of our final implementation. For example, we can run an image through our executable specification, obtain the serially encoded output, and store that in a file. Later, when we are testing our final IC chip, we can feed it the same image and check that the output matches the expected output.

Figure 7.7 gives the block diagram of our high-level model of the digital camera. Our executable model is composed of five modules. We start with the CCD module and its corresponding C file called CCD.C, as shown in Figure 7.8. This module is responsible for simulating a real CCD (i.e., it is designed to mimic the operations of an actual CCD). It does that by simply reading the pixels of an image directly from a file that we specify. This module exports three procedures: CcdInitialize, CcdCapture, and CcdPopPixel.

[Block diagram: an image file feeds the CCD.C module; serial bit streams (e.g., 101011010...) flow from CCD.C through CODEC.C to UART.C, which writes the output file.]

Figure 7.7: Block diagram of the executable model of the digital camera.

   #include <stdio.h>
   #define SZ_ROW 64
   #define SZ_COL (64 + 2)
   static FILE *imageFileHandle;
   static char buffer[SZ_ROW][SZ_COL];
   static unsigned rowIndex, colIndex;

   void CcdInitialize(const char *imageFileName) {
      imageFileHandle = fopen(imageFileName, "r");
      rowIndex = -1;
      colIndex = -1;
   }

   void CcdCapture(void) {
      int pixel;
      rewind(imageFileHandle);
      for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
         for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            if( fscanf(imageFileHandle, "%i", &pixel) == 1 ) {
               buffer[rowIndex][colIndex] = (char)pixel;
            }
         }
      }
      rowIndex = 0;
      colIndex = 0;
   }

   char CcdPopPixel(void) {
      char pixel;
      pixel = buffer[rowIndex][colIndex];
      if( ++colIndex == SZ_COL ) {
         colIndex = 0;
         if( ++rowIndex == SZ_ROW ) {
            colIndex = -1;
            rowIndex = -1;
         }
      }
      return pixel;
   }

Figure 7.8: High-level implementation of the CCD module.

The CcdInitialize procedure is called to initialize our model, just prior to execution. It takes as a parameter the name of the image file that is used to obtain the pixel data. The CcdCapture procedure is called to actually capture an image, in this case by reading it from a file. The CcdPopPixel procedure is called to get the pixels out of the CCD, one at a time. At this point, you should have noted that in our executable specification, our modules communicate using procedure calls and parameter passing.

Our next module is called, rather cryptically, CCDPP, and its corresponding C file is called CCDPP.C, as shown in Figure 7.9. The PP stands for preprocessing. This module


performs the zero-bias adjustment processing, shown in Figure 7.9 and described at the beginning of this chapter. This module also exports three procedures, called CcdppInitialize, CcdppCapture, and CcdppPopPixel. The CcdppInitialize procedure performs any necessary initializations. The CcdppCapture procedure is called to actually capture an image. Note that this procedure calls on the CcdCapture and CcdPopPixel procedures of the CCD module to obtain an image. As it is obtaining the image pixels, it also performs the zero-bias adjustments. The CcdppPopPixel procedure is called to get the pixels out of the CCDPP. Note that the interface to the CCDPP module is identical to that of the CCD module. We can think of the CCDPP as a CCD that performs the zero-bias adjustments internally.

   #define SZ_ROW 64
   #define SZ_COL 64
   static char buffer[SZ_ROW][SZ_COL];
   static unsigned rowIndex, colIndex;

   void CcdppInitialize() {
      rowIndex = -1;
      colIndex = -1;
   }

   void CcdppCapture(void) {
      char bias;
      CcdCapture();
      for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
         for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            buffer[rowIndex][colIndex] = CcdPopPixel();
         }
         bias = (CcdPopPixel() + CcdPopPixel()) / 2;
         for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            buffer[rowIndex][colIndex] -= bias;
         }
      }
      rowIndex = 0;
      colIndex = 0;
   }

   char CcdppPopPixel(void) {
      char pixel;
      pixel = buffer[rowIndex][colIndex];
      if( ++colIndex == SZ_COL ) {
         colIndex = 0;
         if( ++rowIndex == SZ_ROW ) {
            colIndex = -1;
            rowIndex = -1;
         }
      }
      return pixel;
   }

Figure 7.9: High-level implementation of the CCDPP module.

Let us now look at the UART module and its corresponding C file called UART.C, as shown in Figure 7.10. This is really a model of a half UART (i.e., one that only transmits, but does not receive). As with the other modules, the UART module exports an initialization procedure, called UartInitialize. This procedure takes the name of the file to which the transmitted data is written. The other procedure, UartSend, is called when the digital camera is transmitting a byte. The procedure simply writes the transmitted byte to the output file.

   #include <stdio.h>
   static FILE *outputFileHandle;

   void UartInitialize(const char *outputFileName) {
      outputFileHandle = fopen(outputFileName, "w");
   }

   void UartSend(char d) {
      fprintf(outputFileHandle, "%i\n", (int)d);
   }

Figure 7.10: High-level implementation of the UART module.

Our next module is called CODEC, and its corresponding C file is called CODEC.C, as shown in Figure 7.11. This file models the forward DCT encoding that was described earlier in this chapter. The CODEC module exports the procedures CodecInitialize, CodecPushPixel, CodecPopPixel, and CodecDoFdct. The CodecInitialize procedure resets an index that is used by the push and pop procedures for traversing two buffers, described next. The CodecPushPixel is called 64 times to fill an input buffer, called ibuffer, which holds the original block of 8 x 8 pixels that is to be encoded. The CodecPopPixel is called 64 times to retrieve pixels from the output buffer, called obuffer, which holds the encoded block of 8 x 8 pixels. Once a block is placed in the input buffer, CodecDoFdct is called to actually perform the transform. Therefore, to encode a block of 8 x 8 pixels, we call CodecPushPixel 64 times and CodecDoFdct once, followed by 64 calls to CodecPopPixel. Let us now discuss the actual implementation of this module. The module simply implements the FDCT equation given earlier and presented here again:

   C(h) = if (h == 0) then 1/sqrt(2) else 1.0
   F(u,v) = 1/4 × C(u) × C(v) × Σ(x=0..7) Σ(y=0..7) D(x,y) × cos(π(2x+1)u/16) × cos(π(2y+1)v/16)

The first thing that you may note after studying the code is the large table called COS_TABLE. If you look at the above equation, you'll notice that the argument to the cosine function is always one of 64 possible values, because the only variables in the cosine argument expression are the integers x and u (or y and v), and each of these variables can take one of 8 values, from 0 to 7. Thus, for performance purposes, we have decided to precompute the cosine value for all these 64 possibilities and store them in a table. Actually, we have done more than that. Instead of storing the floating-point values, we have converted these to an integer representation.

   static const short COS_TABLE[8][8] = {
      { 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
      { 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
      { 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
      { 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
      { 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
      { 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
      { 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
      { 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
   };
   static short ONE_OVER_SQRT_TWO = 23170, ibuffer[8][8], obuffer[8][8], idx;
   static double COS(int xy, int uv) { return COS_TABLE[xy][uv] / 32768.0; }
   static double C(int h) { return h ? 1.0 : ONE_OVER_SQRT_TWO / 32768.0; }
   static int FDCT(int u, int v, short img[8][8]) {
      double s[8], r = 0; int x;
      for(x=0; x<8; x++) {
         s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
                img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
                img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
                img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
      }
      for(x=0; x<8; x++) r += s[x] * COS(x, u);
      return (short)(r * .25 * C(u) * C(v));
   }
   void CodecInitialize(void) { idx = 0; }
   void CodecPushPixel(short p) {
      if( idx == 64 ) idx = 0;
      ibuffer[idx / 8][idx % 8] = p; idx++;
   }
   short CodecPopPixel(void) {
      short p;
      if( idx == 64 ) idx = 0;
      p = obuffer[idx / 8][idx % 8]; idx++;
      return p;
   }
   void CodecDoFdct(void) {
      int x, y;
      for(x=0; x<8; x++)
         for(y=0; y<8; y++)
            obuffer[x][y] = FDCT(x, y, ibuffer);
      idx = 0;
   }

Figure 7.11: High-level implementation of the CODEC module.

   #define SZ_ROW 64
   #define SZ_COL 64
   #define NUM_ROW_BLOCKS (SZ_ROW / 8)
   #define NUM_COL_BLOCKS (SZ_COL / 8)
   static short buffer[SZ_ROW][SZ_COL], i, j, k, l, temp;
   void CntrlInitialize(void) {}
   void CntrlCaptureImage(void) {
      CcdppCapture();
      for(i=0; i<SZ_ROW; i++)
         for(j=0; j<SZ_COL; j++)
            buffer[i][j] = CcdppPopPixel();
   }
   void CntrlCompressImage(void) {
      for(i=0; i<NUM_ROW_BLOCKS; i++)
         for(j=0; j<NUM_COL_BLOCKS; j++) {
            for(k=0; k<8; k++)
               for(l=0; l<8; l++)
                  CodecPushPixel((char)buffer[i * 8 + k][j * 8 + l]);
            CodecDoFdct();   /* part 1 - FDCT */
            for(k=0; k<8; k++)
               for(l=0; l<8; l++) {
                  buffer[i * 8 + k][j * 8 + l] = CodecPopPixel();
                  buffer[i * 8 + k][j * 8 + l] >>= 6;   /* part 2 - quantization */
               }
         }
   }
   void CntrlSendImage(void) {
      for(i=0; i<SZ_ROW; i++)
         for(j=0; j<SZ_COL; j++) {
            temp = buffer[i][j];
            UartSend(((char*)&temp)[0]);   /* send upper byte */
            UartSend(((char*)&temp)[1]);   /* send lower byte */
         }
   }

Figure 7.12: High-level implementation of the CNTRL module.

More specifically, we have multiplied the 64 cosine values by 32,768 and rounded the result to the nearest integer. The value 32,768 is chosen to allow us to store each value in two bytes of memory. To convert these integers back to floating point, we need to divide the stored values by 32,768.0. This is accomplished in the procedure called COS. This is a form of fixed-point representation, which is described later in this chapter. Thus the COS procedure handles the portions of the above equation involving the cosine and its arguments.

We have also implemented a procedure called C that simply corresponds to the function C(h) given above. All that remains now is the implementation of the nested summations. These summations are performed in the FDCT procedure. The inner summation is simply unrolled (i.e., we have expanded it into eight terms that are added together). The outer summation is implemented as two consecutive for loops. This choice of implementation, of course, is not unique. There are many ways to perform FDCT, and the reader is encouraged, as an exercise, to implement these DCT functions with performance in mind.


   int main(int argc, char *argv[]) {
      char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
      char *imageFileName = argc > 2 ? argv[2] : "image.txt";
      /* initialize the modules */
      UartInitialize(uartOutputFileName);
      CcdInitialize(imageFileName);
      CcdppInitialize();
      CodecInitialize();
      CntrlInitialize();
      /* simulate functionality */
      CntrlCaptureImage();
      CntrlCompressImage();
      CntrlSendImage();
   }

Figure 7.13: Putting it all together: the main module.

The last module that we need in order to complete the implementation of our digital camera is the heart of the system, or what we have called the CNTRL, short for controller. The corresponding C file of the CNTRL module is called CNTRL.C and is shown in Figure 7.12. This module exports three procedures named CntrlInitialize, CntrlCompressImage, and CntrlSendImage. The CntrlInitialize procedure does nothing and is provided for consistency purposes only. The CntrlCompressImage procedure uses the other modules that we have described so far, namely the CCDPP and the CODEC, to capture and perform FDCT and quantization on an image. Part of what this procedure has to do is to break the image into windows, or what we have referred to as blocks of 8 x 8 pixels. Once a block is FDCT encoded, it is quantized and stored in memory. The CntrlSendImage procedure simply transmits the encoded image, serially, using the UART module.

Putting all this together is our main program, shown in Figure 7.13, that simply initializes all the modules and calls on the controller to capture, compress, and transmit one image.

We now have a system-level model (executable specification) of our digital camera. We can experiment with this extensively. Note that any bugs we find here will be orders of magnitude easier to correct than if found at a later design stage.

7.4 Design

Design consists primarily of determining the system's architecture, and mapping the functionality to that architecture. The architecture consists of a set of processors, memories, and buses. Processors may be any combination of single-purpose (custom or standard) or general-purpose processors. Multiple functions may be mapped to a single processor, and a function may be mapped to multiple processors. We'll say that an implementation is a particular architecture and mapping. The set of possible implementations defines the solution space. Note that the solution space is usually enormous. So where do we begin?

Figure 7.14: Block diagram of our first implementation.

We might begin by examining a low-end general-purpose processor connected to flash memory, and trying to map all functionality to software running on that microprocessor. Such an implementation is often a good starting point for embedded system design, since the implementation will usually satisfy our power, size, and time-to-market constraints. If it also satisfies performance, then design is nearly complete. If this design doesn't satisfy constraints, then we could try a faster processor, we could use single-purpose processors for time-critical functions, or we could even rewrite the functional specification. We'll now start with such an implementation and then speed it up using different approaches.

Implementation 1: Microcontroller Alone

Suppose we choose an Intel 8051 microcontroller (or similar such device) as our low-end processor. We determine that total IC cost (including NRE) would be about $5, power well below 200 mW, and time-to-market only about three months. However, a rough analysis shows that there is no way an 8051 alone will satisfy our performance requirement of one image per second. Suppose the particular microcontroller we choose runs at 12 MHz and requires 12 cycles per instruction, meaning it executes one million instructions per second. Suppose we noticed during the execution of our earlier system-level model that CCD preprocessing consumed a lot of the computation time. Figure 7.9 shows the original code for the CCD preprocessor. The CcdppCapture function has a pair of nested loops that result in 64 x 64 = 4,096 iterations per image. Looking at the code, we might estimate that each iteration will require about 100 assembly instructions during execution. Thus, this function alone will require 4,096 x 100 = 409,600 instructions per image. This is nearly half of our budget of one million instructions per second, just to read the image alone, and not even considering the other more compute-intensive tasks of DCT and Huffman coding. Clearly, performance will be much worse than one image per second. We'll have to speed things up somehow.

Implementation 2: Microcontroller and CCDPP

One method for improving performance is to implement a function using a custom single-purpose processor. Normally, we resist designing custom single-purpose processors


[Block diagram: controller, instruction decoder, ALU, 4K ROM, and internal RAM, with a connection to the external memory bus.]

Figure 7.15: Block diagram of the Intel 8051 processor core.

Figure 7.16: The UART single-purpose processor as an FSMD.
because they can increase NRE cost and time to market. However, the CCDPP function is a prime candidate for such implementation - not only is it taking up many microcontroller cycles, but it looks simple to implement as a single-purpose processor. There is no complicated arithmetic, so the datapath will be very simple, and the controller doesn't look like it will have many states either, since most of the cycles come from the 64 x 64 loop iterations - these will likely translate to a couple of simple counters.

Thus, we decide to use an 8051 microcontroller coupled with a CCDPP single-purpose processor. Let's also implement a simple UART for the transmit-serially function. We'll also add an EEPROM for program memory and a RAM for data memory. Note that the CCDPP and UART processors could be implemented by finding standard components for each, but they are straightforward components, so let's implement them as custom instead. The CCDPP implements the zero-bias operations and interacts with the actual CCD chip, which resides external to our system-on-a-chip IC.² The rest of the functionality will be implemented in software on the microcontroller.

Let us briefly describe the three main processors depicted in Figure 7.14 in more detail. We begin with the microcontroller. A synthesizable implementation of this microcontroller, captured at the register-transfer level (RTL) and written in VHDL, is available to us for integration into the rest of the system. A block diagram of the main components of the 8051 is given in Figure 7.15. The controller fetches instructions from its read-only program memory and decodes them using the decoder component. The ALU component is used to actually execute arithmetic operations such as addition, multiplication, and division, among many others. The source and destination of these operations are registers that reside in the internal RAM of the processor. Special data movement instructions are used to load and store data from external memory through the external memory bus. A C compiler/linker is used to compile C programs for execution on our processor. The ROM is generated using a special program that reads the output of the C compiler/linker and outputs a VHDL description of the ROM.

The UART is a simple single-purpose processor. Its behavior is depicted in Figure 7.16 as a finite-state machine with data (FSMD). Normally, the UART is in its idle state. When invoked, it transitions into the start state, where it transmits a 0 indicating the start of a byte transmission. Then, it transitions into the data state, where it sends the 8 bits of the byte being sent. Then, it transitions into the stop state, where it transmits a 1 indicating the stop of the byte transmission. Finally, it transitions back into the idle state, ready to repeat the process when summoned again. Since the UART is memory-mapped to the processor's memory address space, it is invoked when the processor executes a store instruction with the UART's enable register as its target memory location. Of course, the UART is constantly monitoring the address bus, and when it detects the enable register's address, it captures the data on the data bus and starts the transmission process as just described. Note that we will use memory-mapped I/O for communication between the 8051 processor and any other single-purpose processor in our system. Since the 8051 processor's address space is 16 bits wide, we use lower memory addresses, those starting at 0 and going up, for RAM, and we use upper memory addresses, those starting at 65,535 and going down, for memory-mapped I/O devices.

The CCDPP is one of the single-purpose processors that has been implemented in hardware. The FSMD of the CCDPP is depicted in Figure 7.17. Internally, the CCDPP single-purpose processor has a buffer, labeled B, and three variables called R, C, and Bias. The variables R and C are used as row and column indices. The variable Bias holds the zero-bias error for each of the rows as the rows are processed. The FSMD works as follows. Once invoked, it transitions into the GetRow state, where it reads from the actual CCD a complete row, including the last two blacked-out pixels. (For details, refer to the description of a CCD given at the beginning of this chapter.) Then, the FSMD transitions into the ComputeBias state, where it computes the bias of the current row and stores it into the Bias variable. In the next state, called FixBias, the FSMD iterates over the same row, subtracting away the bias from each element in that row. In the next state, called NextRow, the row index is incremented

² We assume that the CCD chip resides external to our system-on-a-chip since, given today's mainstream technology, and mostly due to fabrication process differences, combining a CCD with ordinary logic is not feasible.


Figure 7.17: The CCDPP single-purpose processor as an FSMD.

and the process either repeats, reading the next row, or stops when the entire image is processed. We assume that, as with the UART, this single-purpose processor is connected to the 8051 processor's memory bus, with the content of the internal buffer mapped to upper memory addresses of the processor.

We now have all the components of our system-on-a-chip and are ready to connect things together, making up our digital camera. This is accomplished through the 8051's memory bus, as stated before. The 8051 memory bus uses a simple read and write protocol and is composed of an 8-bit data bus, a 16-bit address bus, a read control signal, and a write control signal. A memory read works as follows. The processor places the memory address on the address bus, then asserts the read control signal for exactly one clock cycle, and reads the data from the data bus one clock cycle later. The device that is being read, either the RAM or one of our memory-mapped single-purpose processors, when detecting that the read control signal is asserted, and after checking the content of the address bus, places and holds the requested data on the data bus for exactly one clock cycle. A write operation works in a similar fashion. The processor places the memory address and the data on the address bus and data bus, respectively. Then, it asserts the write control signal for exactly one clock cycle. The device that is being written, when detecting that the write control signal is asserted, and after checking the content of the address bus, reads and stores the data from the data bus.

Now that we have the hardware portion of our design implemented, we need to write the software to complete the project. Fortunately, our executable specification will provide the majority of the code that we need. In fact, we will maintain the same structure of the code (i.e., we will keep the same module hierarchy, procedure names, and main program). The only thing that needs to be done is to design the UART and CCDPP custom single-purpose processors. This is rather easy to do. All that we need to do is replace the code in these procedures with memory assignments to the respective hardware devices. Let us show this with the UART example. The code for this module is given in Figure 7.18.

   static unsigned char xdata U_TX_REG _at_ 65535;
   static unsigned char xdata U_STAT_REG _at_ 65534;

   void UartInitialize(void) {}

   void UartSend(unsigned char d) {
      while( U_STAT_REG == 1 ) {
         /* busy wait */
      }
      U_TX_REG = d;
   }

Figure 7.18: Rewriting the UART module to utilize the hardware UART.

Here we have defined two variables, called U_TX_REG and U_STAT_REG. There are two keywords used in defining these two variables that you may not recognize. The first one, called xdata, instructs our compiler to place these variables in external memory; in other words, the compiler will generate code that will load and store these variables over the external memory bus of the processor. The second keyword, called _at_, instructs our compiler to place these variables at the specified memory address. These two keywords allow us to declare a variable such that reading or writing it will cause appropriate read or write operations to be performed on the bus. Now, all we have to do to send a byte using our UART single-purpose processor is write the byte to be sent to U_TX_REG, causing the UART to be invoked. But since our processor may be much faster than the UART, we need to first make sure that the UART is in its idle state. This is accomplished by the while loop. Having designed our UART such that we can check whether it is busy or not, we can busy-wait until it becomes idle before sending the next data byte. The implementation of the CCDPP module is similarly modified to utilize the CCDPP single-purpose processor. The rest of the modules are untouched.

Now we can compile and link all our software modules and obtain the final program executable. This program executable is then translated into the VHDL representation of the ROM using a ROM generator. All that remains is to test our entire system-on-a-chip. This is done using a VHDL simulator program. A VHDL simulator takes as input the VHDL files making up our system, and functionally simulates the execution of the final IC by interpreting the descriptions. By simulating, we are able to learn whether our design is functionally correct. Moreover, we can also measure the amount of time, or clock cycles, that it takes to process a single image. This is our first metric of interest, namely, performance. Figure 7.19(a) shows how, after simulating the VHDL models, we obtain the execution time. Figure 7.19(b) shows how we synthesize the high-level VHDL models and obtain the gate-level description of the corresponding circuits. Then, we simulate the gate-level models to obtain the intermediate data necessary to compute the power consumption of the circuit. Figure 7.19(c) shows how, by adding the number of gates, we obtain the total area of the chip.

Once we are satisfied that our design functions correctly, we can use our synthesis tool to translate the VHDL files down to an interconnection of logic gates. A synthesis tool is like a


[Three tool flows: (a) a VHDL simulator produces the execution time; (b) a gate-level simulator, together with a power equation, produces the power; (c) summing the gate counts produces the chip area.]

Figure 7.19: Obtaining design metrics of interest: (a) performance, (b) power, (c) area.

compiler for single-purpose processors. It reads a VHDL file and translates it to a corresponding gate-level description. You'll learn more about this process in a later chapter of this book. At this stage, these gates can be sent to an IC fabrication company to make our IC chip. But what we are interested in is counting the total number of gates to get an idea of how big our design is. This will tell us how big of an area we need to implement the digital camera, our third metric of interest. To obtain the power consumption, our second metric of interest, we simulate the gate-level description of the digital camera and keep track of the number of times these gates switch from zero to one and from one to zero. Recall that we can estimate power consumption if we know the amount of switching that takes place in a circuit.

We can now analyze our first implementation using the approach outlined in Figure 7.19. Using simulation, we have measured the total execution time for processing a single image to be 9.1 seconds. The power consumption is measured to be 0.033 watt. The energy consumption is 9.1 s x 0.033 watt = 0.30 joule. The area is measured to be 98,000 gates.

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT

The previous implementation does not achieve 1-image-per-second processing. Looking at the execution of the previous implementation, we see that most of the microcontroller computation cycles are spent performing the DCT operation. Thus, we could consider pulling this compute-intensive function out from software to custom hardware, as we did for the CCD preprocessor. However, unlike the CCD preprocessor, the DCT functionality is fairly complex and thus will likely require more design effort. We can instead speed up the DCT functionality by modifying its behavior.

Recall that each DCT operation involves numerous floating-point operations. Actually, for each pixel that is transformed, about 260 floating-point operations are performed. Therefore, an entire 64 x 64 image requires roughly 4,096 x 260, or about one million, floating-point operations. To make matters worse, our processor is an 8-bit processor with no floating-point support; thus, the compiler needs to emulate each of these floating-point operations. Floating-point emulation is performed as follows. The compiler generates procedures for each of the floating-point operations, such as multiplication and addition. These procedures may execute tens of integer instructions in order to perform a single floating-point operation. Then, when the compiler encounters floating-point operations in the source file, it places a call to these compiler-generated procedures. Consequently, our one million floating-point operations will require ten million or more integer operations. In addition, our program will be larger, since it has to accommodate the compiler-generated procedures.

We can thus consider speeding up the CODEC module to use fixed-point arithmetic. We hope to reduce the total number of integer instructions required to encode each pixel. Our implementation is shown in Figure 7.20. Let us first describe how fixed-point arithmetic works. In fixed-point arithmetic, we use an integer to represent real numbers. The bits within this integer are interpreted as follows. We use a constant and known number of these bits to represent the portion of a real number after the decimal point, and the rest of the bits to represent the portion of the real number before the decimal point.

In our implementation of the CODEC, we have chosen to use 6 bits to represent the fractional part of all arithmetic operations. The choice here has to do with the accuracy that we desire. The more bits we use for the portion after the decimal point, the more accurately we can represent a real number. However, this will leave us fewer bits to represent the portion of the real number before the decimal point (i.e., the magnitude of the real number).

Once we have chosen the number of bits to represent the portion after the decimal point, a.k.a. the fractional part, we can translate any constant to the fixed-point representation. For example, imagine that we are using 8-bit integers. Let us use 4 bits to represent the fractional part. The fixed-point representation of the real value 3.14 would be 50, or 00110010. We obtain 50 by multiplying the real value, 3.14, by 2 raised to the number of bits we are using for the fractional part, 2^4 = 16, and rounding to the nearest integer: 3.14 x 16 = 50.24 ≈ 50. Note that the 4 least significant bits equal 2. Since there are a total of 16 possibilities, each would represent 0.0625. Given that we have 2, we get 2 x 0.0625 = 0.125. The four most significant bits encode the value 3, which, when added to our fractional part, gives 3.125. Of course, our representation is not exact, but close. We can improve this by using more bits for the fractional part. In fact, the cosine table in Figure 7.20 gives the fixed-point representation of the cosine values, using 8-bit integers.

Now that we know how to represent a real number using integers, we have to define the two operations that are used in our calculations, namely addition and multiplication. Addition is straightforward. All that we have to do is add the integers. For example, assume that we have 3.14 encoded as 50, or 00110010, and 2.71 encoded as 43, or 00101011. To add these two together, we add the integers 50 and 43 to obtain 93, or 01011101. Converting this back to a real number, we get 5 + 13 x 0.0625 = 5.8125. This number is close to the actual value, which is 5.85, but not exact, as expected.

Similarly, with multiplication, we can multiply the two fixed-point values to obtain our result. But, at this point we need to perform an additional operation. Let us multiply the value
are 64 x 64 = 4,096 pixels that are encoded, for a total of about one million floating-porn! 3. i4 encoded as 50, or 0011001-0 and 2.71 as 43, or 0010101 I. From this we obtain 2,150, or

Embedded System Design
Chapter 7: Digital Camera Example

7.4: Design
static const char code COS_TABLE[8][8] = {
    {  64,  62,  59,  53,  45,  35,  24,  12 },
    {  64,  53,  24, -12, -45, -62, -59, -35 },
    {  64,  35, -24, -62, -45,  12,  59,  53 },
    {  64,  12, -59, -35,  45,  53, -24, -62 },
    {  64, -12, -59,  35,  45, -53, -24,  62 },
    {  64, -35, -24,  62, -45, -12,  59, -53 },
    {  64, -53,  24,  12, -45,  62, -59,  35 },
    {  64, -62,  59, -53,  45, -35,  24, -12 }
};
static const char ONE_OVER_SQRT_TWO = 5;
static short xdata inBuffer[8][8], outBuffer[8][8], idx;

static unsigned char C(int h) { return h ? 64 : ONE_OVER_SQRT_TWO; }

static int F(int u, int v, short img[8][8]) {
    long s[8], r = 0;
    unsigned char x, j;
    for(x=0; x<8; x++) {
        s[x] = 0;
        for(j=0; j<8; j++) s[x] += (img[x][j] * COS_TABLE[j][v]) >> 6;
    }
    for(x=0; x<8; x++) r += (s[x] * COS_TABLE[x][u]) >> 6;
    return (short)((((r * (((16*C(u)) >> 6) * C(v)) >> 6)) >> 6) >> 6);
}

void CodecInitialize(void) { idx = 0; }

void CodecPushPixel(short p) {
    if( idx == 64 ) idx = 0;
    inBuffer[idx / 8][idx % 8] = p << 6;
    idx++;
}

void CodecDoFdct(void) {
    unsigned short x, y;
    for(x=0; x<8; x++)
        for(y=0; y<8; y++)
            outBuffer[x][y] = F(x, y, inBuffer);
    idx = 0;
}

Figure 7.20: Fixed-point implementation of the CODEC module.

100001100110. Note that, when multiplying two 8-bit integers, we can expect the result to be 16 bits wide. What we have to do to obtain our final result is to discard the lower 4 bits of our 16-bit result, obtaining 10000110. Converting this back to a real, we get 8 + 6 x 0.0625 = 8.375. The number is close to the correct value, which is 8.5094, but not exact, as expected.

The biggest difficulty with fixed-point arithmetic is to ensure that the resulting values, after performing addition and multiplication operations, do not exceed the bit-width of the integers that are being used. Therefore, it is important to consider the intervals, or range, of the real values that are being operated on. We have applied the fixed-point arithmetic scheme presented here in recoding the CODEC. This time our CODEC uses integer operations only, and we expect it to execute faster than our first implementation.

We can now analyze our third implementation using the approach outlined in Figure 7.19. Using simulation, we have measured the total execution time for processing a single image to be 1.5 seconds. The power consumption is measured to be 0.033 watt, the same as before. The energy consumption of this design is 1.5 s x 0.033 watt = 0.050 joule. This means that our batteries will last six times longer when compared to the previous design. The area is measured to be 90,000 gates. We have improved performance by a factor of six and reduced the chip area by about 8,000 gates over the previous design. The gate reduction is because our program no longer needs to emulate the complex floating-point operations, thus requiring less memory for storing the corresponding code.

Implementation 4: Microcontroller and CCDPP/DCT

Our third implementation's performance is close to that required by our specification, achieving 1.5 seconds per image. Let us try to improve performance further to obtain 1 second per image. In our next implementation, we will resort to implementing the CODEC in hardware. That means that we will design a single-purpose processor that performs the DCT operation on a block of 8 x 8 pixels. The block diagram of our new system-on-a-chip is given in Figure 7.21. Designing the processor for the CODEC may take some time to get correct.

To use this CODEC, we will need to make some changes to our software. Specifically, we will need to change the CODEC module, as we did the UART and the CCDPP modules. The code is presented in Figure 7.22. We have designed our hardware CODEC to have four memory-mapped registers. Two of these registers, called C_DATAI_REG and C_DATAO_REG, are used to push and pop a block of 8 x 8 pixels into and out of the CODEC. Another register, called C_CMND_REG, is used to command the CODEC. Specifically, writing a one to this register will invoke the CODEC. The last register, called C_STAT_REG, can be polled in software to tell when our CODEC is done encoding a block of pixels. The actual implementation of the CODEC is a direct translation of our fixed-point version of the CODEC written in C, and used in our third implementation, into VHDL. Using a single-purpose processor for encoding data, we expect to improve our execution time, and thus satisfy our timing constraints.

static unsigned char xdata C_STAT_REG  _at_ 65527;
static unsigned char xdata C_CMND_REG  _at_ 65528;
static unsigned char xdata C_DATAI_REG _at_ 65529;
static unsigned char xdata C_DATAO_REG _at_ 65530;

void CodecInitialize(void) {}

void CodecPushPixel(short p) { C_DATAO_REG = (char)p; }

short CodecPopPixel(void) {
    return ((C_DATAI_REG << 8) | C_DATAI_REG);
}

void CodecDoFdct(void) {
    C_CMND_REG = 1;
    while( C_STAT_REG == 1 ) { /* busy wait */ }
}

Figure 7.22: Rewriting the CODEC module to utilize the hardware CODEC.

We can now analyze our final implementation using the approach outlined in Figure 7.19. Using simulation, we have measured the total execution time for processing a single image to be 0.099 seconds. The power consumption is measured to be 0.040 watt. Notice that the power consumption increased, because our chip is now doing more (i.e., there are multiple processors working). The energy consumption of this design is 0.099 s x 0.040 watt = 0.0040 joule. This means that our batteries will last 12 times longer than with the previous design. The area is measured to be 128,000 gates. We are now well under 1 second, processing one image in about 1/10th of a second (approaching video camera speed now). However, we have increased the IC size significantly. This implementation certainly meets our timing requirements. More importantly, if we design the DCT ourselves, we will likely increase our NRE cost and time-to-market. If we purchase an existing DCT, we may increase our IC cost.

We have summarized our results in Figure 7.23. In designing an embedded system, many other metrics need to be considered. As with any other commercial product, in addition to engineering issues, a careful cost analysis of a system must be made.

Implementation 3 is close in terms of performance but a little slow, and consumes more energy, but is likely much cheaper and will be built in less time. Implementation 4 meets the performance (by a lot) and consumes much less energy (by a lot), but will be more expensive and may result in missing our time-to-market cutoff. Which is better? It's a choice that our company will have to make. As mentioned in Chapter 1, a key challenge facing the embedded system designer is to construct an implementation that simultaneously optimizes numerous design metrics. We can't always get what we want!

                      Implementation 2   Implementation 3   Implementation 4
Performance (second)  9.1                1.5                0.099
Power (watt)          0.033              0.033              0.040
Size (gate)           98,000             90,000             128,000
Energy (joule)        0.30               0.050              0.0040

Figure 7.23: Summary of design metrics.

7.5 Summary

We have introduced a digital camera and have described its various components. These components capture, digitize, process, and store images, among other things. As part of our presentation, we have described JPEG encoding, to a limited extent. We have specified our design project in an informal format using English as well as an executable specification. We have described three design metrics of interest, namely, performance, power consumption, and chip area. For each of these metrics, we have suggested optimization techniques. In the second part of the chapter, we have described several successively improved implementations. The first implementation we considered used a single microcontroller, but would have been far too slow. Our second implementation used a coprocessor to speed things up a bit, but we were still much too slow. Our third implementation gave up some accuracy during compression by using fixed instead of floating-point numbers. It came close to our performance constraint, but was still a bit slow. Our last implementation involved another coprocessor for compression, meeting performance easily but costing more and taking more design time. The better of these last two implementations is not clear.

The executable specification and the three latter implementations are available in source code format on this book's Web page.

7.6 References and Further Reading

• C. Wayne Brown and Barry J. Shepherd. Graphics File Formats - Reference and Guide. Connecticut: Manning Publications Company, 1995.
• P. van der Wolf, P. Lieverse, M. Goel, D.L. Hei, and K. Vissers. An MPEG-2 Decoder Case Study as a Driver for a System-Level Design Methodology. International Workshop on Hardware/Software Co-Design, March 1999.

7.7 Exercises

7.1 Using any programming language of choice, (a) implement the FDCT and IDCT equations presented in Section 7.2 using double-precision floating-point arithmetic. (b) Use the block of 8 x 8 pixels given in Figure 7.2(b) as input to your FDCT and obtain the encoded block. (c) Use the output of part (b) as input to your IDCT to obtain the original block. (d) Compute the percent error between your decoder's output and the original block.
7.2 Assuming 8 bits per pixel value, calculate the length, in bits, of the block given in Figure 7.3(b).
7.3 Using the Huffman codes given in Figure 7.5, encode the block given in Figure 7.3(b). (a) What is the length, in bits? (b) How much compression did we achieve by using Huffman encoding? Use the results of the last question to calculate this.

7.4 Convert 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, and 1.9 to fixed-point representation using (a) two bits for the fractional part, (b) three bits for the fractional part, and (c) four bits for the fractional part.
7.5 Write two C routines that each take as input two 32-bit fixed-point numbers and perform addition and multiplication using 4 bits for the fractional part and the remaining bits for the whole part.
7.6 Using any programming language of choice, (a) implement the FDCT and IDCT equations presented in Section 7.2 using fixed-point arithmetic with 4 bits used for the fractional part and the remaining bits used for the whole part, (b) use the block of 8 x 8 pixels given in Figure 7.2(b) as input to your FDCT and obtain the encoded block, (c) use the output of part (b) as input to your IDCT to obtain the original block, and (d) compute the percent error between your decoder's output and the original block.
7.7 List the modifications made in implementations 2 and 3 and discuss why each was beneficial in terms of performance.

CHAPTER 8: State Machine and Concurrent Process Models

8.1 Introduction
8.2 Models vs. Languages, Text vs. Graphics
8.3 An Introductory Example
8.4 A Basic State Machine Model: Finite-State Machines
8.5 Finite-State Machine with Datapath Model: FSMD
8.6 Using State Machines
8.7 HCFSM and the Statecharts Language
8.8 Program-State Machine Model (PSM)
8.9 The Role of an Appropriate Model and Language
8.10 Concurrent Process Model
8.11 Concurrent Processes
8.12 Communication among Processes
8.13 Synchronization among Processes
8.14 Implementation
8.15 Dataflow Model
8.16 Real-Time Systems
8.17 Summary
8.18 References and Further Reading
8.19 Exercises

8.1 Introduction
We implement a system's processing behavior with processors. But to accomplish this, we must have first described that processing behavior. One method we've discussed for describing processing behavior uses assembly language. Another more powerful method uses a high-level programming language like C. Both methods use what is known as a sequential


program computation model, in which a set of instructions executes sequentially. A high-level programming language provides more advanced constructs for sequencing among the instructions than does an assembly language, and the instructions are more powerful, but nevertheless, the sequential execution model (one statement at a time) is the same.

However, the increasing complexity of embedded system functionality requires more advanced computation models. The increasing complexity results from increasing IC capacity: The more we can put on an IC, the more functionality we want to put into our embedded system. Thus, while embedded systems previously encompassed simple applications, like washing machines and small games, requiring perhaps hundreds of lines of code, today they also cover sophisticated applications like television set-top boxes and cellular telephones, requiring perhaps hundreds of thousands of lines of code.

Trying to describe the behavior of such systems can be extremely difficult. The desired behavior is often not even fully understood initially. Therefore, designers must spend much time and effort simply understanding and describing the desired behavior of a system, and they often make mistakes at this stage, before a single line of code has been written. Some studies have found that many system bugs come from mistakes made describing the desired behavior rather than from mistakes in implementing that behavior. The common method today of using an English (or some other natural language) description of desired behavior provides a reasonable first step, but it is not nearly sufficient because English is not precise. Trying to describe a system precisely in English can be an arduous and often futile endeavor; many long and hard-to-read legal documents serve as examples of what happens when attempting to be precise and complete in a natural language.

A computation model assists the designer to understand and describe the behavior by providing a means to compose the behavior from simpler objects. A computation model provides a set of objects, rules for composing those objects, and execution semantics of the composed objects. Several models are commonly used for describing embedded systems. These include:
• The sequential program model, which provides a set of statements, rules for putting statements one after another, and semantics stating how the statements are executed one at a time.
• The communicating process model, which supports description of multiple sequential programs running concurrently.
• The state machine model, used commonly for control-dominated systems. A control-dominated system is one whose behavior consists mostly of monitoring control inputs and reacting by setting control outputs.
• The dataflow model, used commonly for data-dominated systems. A data-dominated system's behavior consists mostly of transforming streams of input data into streams of output data.
• The object-oriented model, which provides an elegant means for breaking complex software into simpler, well-defined pieces.
In fact, a system may be described using a combination of models. We will describe several models in this chapter.

8.2 Models vs. Languages, Text vs. Graphics

A common point of confusion is the difference between a computation model and a language. Another is the difference between a textual language and a graphical language. Thus, we will explicitly state the differences here.

Figure 8.1: Models vs. languages: (a) recipes vs. English, (b) sequential programs vs. C.

Models vs. Languages

A computation model describes desired system behavior, while a language captures models. A model is a conceptual notion, while a language captures that concept in a concrete form. A model can be captured in a variety of languages, while a language can capture a variety of models, as illustrated in Figure 8.1.

Let us consider an analogy involving cooking recipes. A recipe is like a model, a conceptual notion, consisting of a set of instructions for cooking something, and a notion of how to sequence among those instructions. For example, a particular recipe may include a requirement of first putting flour in a bowl and then mixing in two eggs. English is a language capable of capturing a recipe. This simple example illustrates three important points. First, a recipe can be captured faithfully in various languages, such as English, Spanish, or Japanese. In fact, a recipe exists independent of its capture in a particular language; some recipes are never written down! Second, a particular language can capture many different conceptual notions other than recipes, such as poetry or stories. Third, certain languages may be better at capturing recipes than others; while English works fine, a primitive language without words for "boil" or "simmer" may be cumbersome to use for capturing recipes.

Returning now from cooking to computing, consider sequential programs. A sequential program is a model, a conceptual notion, consisting of a set of program instructions for computing something, and a notion of how to sequence among those instructions. For example, a particular sequential program may include a requirement of first initializing a variable to 10, and then adding 2 to that variable. C is a language capable of capturing a sequential program. As in our analogy above, there are three important points to remember. First, a sequential program can be captured in any of various languages, such as C, C++, or Java. Second, a particular language can capture many different models other than sequential
Chapter s: state Machine and Concurrent Process Models 8.4: A Basic State Machine Model: Finite-State Machines

programs, such as state machines or dataflow. Third, certain languages may be better at capturing sequential programs than others; while C works fine, a primitive language like assembly without constructs for "loops" or "procedures" may be cumbersome to use for capturing sequential programs. As another example, C can be used to capture state machines, as we will see later, but a language intended specifically to capture state machines might be more convenient.

Textual Languages vs. Graphical Languages

Languages may use a variety of methods to capture models, such as text or graphics. Defining a graphical language equivalent to a textual one is fairly straightforward, and vice versa. The choice of a textual language versus a graphical language is entirely independent of the choice of a computation model.

Let us return to our analogy involving recipes. We could choose to capture a particular recipe in the English textual language. On the other hand, we could choose to capture the recipe using a graphical recipe language, which might include icons of objects like eggs and bowls, as well as icons for tasks like "mix" or "simmer."

Likewise, we could choose to capture a particular sequential program in the C textual language. On the other hand, we could choose to capture the sequential program using a graphical sequential programming language, which might include icons of objects like variables and constants, as well as icons for tasks like "assign" or "loop." Graphical sequential programming languages were commonly proposed in the 1980s, but have not become very popular. The state machine model is often captured in textual languages, but it is also commonly captured in graphical languages found in numerous commercial products.

8.3 An Introductory Example

Here, we introduce an example system that we'll use in the chapter, and we'll use the sequential program model, introduced in an earlier chapter, to describe part of the system. Consider the simple elevator controller system in Figure 8.2(a). It has several control inputs corresponding to the floor buttons inside the elevator and corresponding to the up and down buttons on each of the N floors at which the elevator stops. It also has a data input representing the current floor of the elevator. It has three control outputs that make the elevator move up or down, and open the elevator door. A partial English description of the system's desired behavior is shown in Figure 8.2(b).

We decide that this system is best described as two blocks. RequestResolver resolves the various floor requests into a single requested floor. UnitControl actually moves the elevator unit to this requested floor, as shown in Figure 8.2. Figure 8.2(c) shows a sequential program description for the UnitControl process. Note that this process is more precise than the English description. It first opens the elevator door and then enters an infinite loop. In this loop, it first waits until the requested and current floors differ. It then closes the door and moves the elevator up or down. It then waits until the current floor equals the requested floor, stops moving the elevator, and opens the door for 10 seconds (assuming there's a routine called delay). It then goes back to the beginning of the infinite loop. The RequestResolver would be written similarly.

[Figure 8.2(a) is a block diagram of the system interface: the UnitControl and RequestResolver blocks, with the buttons b1..bN inside the elevator and the up/down buttons on each floor as inputs, and floor as a data input.]

(b) "Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Don't change directions unless there are no higher requests when moving up or no lower requests when moving down ..."

(c)
Inputs: int floor; bit b1..bN, up1..upN-1, dn2..dnN;
Outputs: bit up, down, open;
Global variables: int req;

void UnitControl() {
   up = down = 0; open = 1;
   while (1) {
      while (req == floor);
      open = 0;
      if (req > floor) {up = 1;}
      else {down = 1;}
      while (req != floor);
      up = down = 0;
      open = 1;
      delay(10);
   }
}

void RequestResolver() {
   while (1)
      req = ...
}

void main() {
   Call concurrently: UnitControl() and RequestResolver()
}

Figure 8.2: Specifying an elevator controller system: (a) system interface, (b) partial English description, (c) more precise description using a sequential program model.

8.4 A Basic State Machine Model: Finite-State Machines

In a finite-state machine (FSM) model, we describe system behavior as a set of possible states; the system can only be in one of these states at a given time. We also describe the possible transitions from one state to another depending on input values. Finally, we describe the actions that occur when in a state or when transitioning between states.

For example, Figure 8.3 shows a state machine description of the UnitControl part of our elevator example. The initial state, Idle, sets up and down to 0 and open to 1. The state machine stays in state Idle until the requested floor differs from the current floor. If the requested floor is greater, then the machine transitions to state GoingUp, which sets up to 1, whereas if the requested floor is less, then the machine transitions to state GoingDown, which

[Figure 8.3 is a state diagram with states Idle, GoingUp, GoingDown, and DoorOpen, transition conditions such as req > floor and timer < 10, and per-state action tuples; u is up, d is down, o is open, t is timer_start.]

Figure 8.3: The elevator's UnitControl process described using a state machine.

sets down to 1. The machine stays in either state until the current floor equals the requested floor, after which the machine transitions to state DoorOpen, which sets open to 1. We assume the system includes a timer, so we start the timer while transitioning to DoorOpen. We stay in this state until the timer says 10 seconds have passed, after which we transition back to the Idle state.

We have described state machines somewhat informally, but now provide a more formal definition. We start by defining the well-known finite-state machine computation model, or FSM, and then we'll define extensions to that model to obtain a more useful model for embedded system design. An FSM is a 6-tuple <S, I, O, F, H, s0>, where:

S is a set of states {s0, s1, ..., sl},
I is a set of inputs {i0, i1, ..., im},
O is a set of outputs {o0, o1, ..., on},
F is a next-state function (i.e., transitions), mapping states and inputs to states (S x I -> S),
H is an output function, mapping current states to outputs (S -> O),
s0 is an initial state.

The above is a Moore-type FSM, which associates outputs with states. A second type of FSM is a Mealy-type FSM, which associates outputs with transitions (i.e., H maps S x I -> O). You might remember that Moore outputs are associated with states by noting that the name Moore has two o's in it, which look like states in a state diagram. Many tools that support FSMs support combinations of the two types, meaning we can associate outputs with states, transitions, or both.

We can use some shorthand notations to simplify FSM descriptions. First, there may be many system outputs, so rather than explicitly assigning every output in every state, we can say that any outputs not assigned in a state are implicitly assigned 0. Second, we often use an FSM to describe a single-purpose processor (i.e., hardware). Most hardware is synchronous, meaning that register updates are synchronized to clock pulses (e.g., registers are updated only on the rising (or falling) edge of a clock). Such an FSM would have every transition condition ANDed with the clock edge (e.g., clock'rising and x = y). To avoid having to add this clock edge to every transition condition, we can simply say that the FSM is synchronous, meaning that every transition condition is implicitly ANDed with the clock edge.

8.5 Finite-State Machine with Datapath Model: FSMD

When using an FSM for embedded system design, the inputs and outputs represent Boolean data types, and the functions therefore represent Boolean functions with Boolean operations. This model may be sufficient for many purely control systems that do not input or output data. However, when we must deal with data, two new features would be helpful: more complex data types (such as integers or floating-point numbers) and variables to store data. Gajski (see Chapter 2) refers to an FSM model extended to support more complex data types and variables as an FSM with datapath, or FSMD. Most authors refer to this model as an extended FSM, but there are many kinds of extensions and therefore we prefer the more precise name of FSMD. One possible FSMD model definition as a 7-tuple is <S, I, O, V, F, H, s0>, where:

S is a set of states {s0, s1, ..., sl},
I is a set of inputs {i0, i1, ..., im},
O is a set of outputs {o0, o1, ..., on},
V is a set of variables {v0, v1, ..., vn},
F is a next-state function, mapping states and inputs and variables to states (S x I x V -> S),
H is an action function, mapping current states to outputs and variables (S -> O + V),
s0 is an initial state.

In an FSMD, the inputs, outputs, and variables may represent various data types, perhaps as complex as the data types allowed in a typical programming language. Furthermore, the functions F and H may include arithmetic operations, such as addition, rather than just Boolean operations as in an FSM. We now call H an action function rather than an output function, since it describes not just outputs, but also variable updates. Note that the above definition is for a Moore-type FSMD, and it could easily be modified for a Mealy type or a combination of the two types. During execution of the model, the complete system state consists not only of the current state si, but also the values of all variables. Our earlier state machine description of UnitControl was an FSMD, since its input data types were integers, and it had arithmetic operations, like magnitude comparisons, in its transition conditions.

8.6 Using State Machines

Having introduced the basic FSM and FSMD models, we now discuss several issues related to using those models to describe desired system behavior.
Describing a System as a State Machine

Describing a system's behavior as a state machine, in particular as an FSMD, consists of several steps:

1. List all possible states, giving each a descriptive name.
2. Declare all variables.
3. For each state, list the possible transitions, with associated conditions, to other states.
4. For each state and/or transition, list the associated actions.
5. For each state, ensure that exiting transition conditions are exclusive, meaning that no two conditions could be true simultaneously, and complete, meaning that one of the conditions is true at any time.

If the transitions leaving a state are not exclusive, then we have a nondeterministic state machine. When the machine executes and reaches a state with more than one transition that could be taken, then one of those transitions is taken, but we don't know which one that would be. The nondeterminism prevents having to over-specify behavior in some cases, and may result in don't-cares that may reduce hardware size, but we won't focus on nondeterministic state machines in this book.

If the transitions leaving a state are not complete, then that usually means that we stay in that state until one of the conditions becomes true. This way of reducing the number of explicit transitions should probably be avoided when first learning to use state machines.

Comparing State Machine and Sequential Program Models

Many would agree that the state machine model excels over the sequential program model for describing a control-based system like the elevator controller. The state machine model is designed such that it encourages a designer to think of all possible states of the system, and to think of all possible transitions among states based on possible input conditions. The sequential program model, in contrast, is designed to transform data through a series of instructions that may be iterated and conditionally executed. Each model encourages a different way of thinking of a system's behavior.

A common point of confusion is the distinction between state machine and sequential program models versus the distinction between graphical and textual languages. In particular, a state machine description excels in many cases, not because of its graphical representation, but rather because it provides a more natural means of computing for those cases; it can be captured textually and still provide the same advantage. For example, while in Figure 8.3 we described the elevator's UnitControl as a state machine captured in a graphical state-machine language, called a state diagram, we could have instead captured the state machine in a textual state-machine language. One textual language would be a state table, in which we list each state as an entry in a table. Each state's row would list the state's actions. Each row would also list all possible input conditions, and the next state for each such condition. Conversely, while in Figure 8.2 we described the elevator's UnitControl as a sequential program captured using a textual sequential programming language, in this case C, we could have instead captured the sequential program using a graphical sequential programming language, such as a flowchart.

   #define IDLE     0
   #define GOINGUP  1
   #define GOINGDN  2
   #define DOOROPEN 3

   void UnitControl()
   {
      int state = IDLE;
      while (1) {
         switch (state) {
            case IDLE:     up=0; down=0; open=1; timer_start=0;
               if (req == floor) {state = IDLE;}
               if (req > floor)  {state = GOINGUP;}
               if (req < floor)  {state = GOINGDN;}
               break;
            case GOINGUP:  up=1; down=0; open=0; timer_start=0;
               if (req > floor)    {state = GOINGUP;}
               if (!(req > floor)) {state = DOOROPEN;}
               break;
            case GOINGDN:  up=0; down=1; open=0; timer_start=0;
               if (req < floor)    {state = GOINGDN;}
               if (!(req < floor)) {state = DOOROPEN;}
               break;
            case DOOROPEN: up=0; down=0; open=1; timer_start=1;
               if (timer < 10)    {state = DOOROPEN;}
               if (!(timer < 10)) {state = IDLE;}
               break;
         }
      }
   }

Figure 8.4: Capturing the elevator's UnitControl state machine in a sequential programming language.

Capturing State Machines in a Sequential Programming Language

As elegant as the state machine model is for describing control-dominated systems, the fact remains that the most popular embedded system development tools use sequential programming languages like C, C++, Java, Ada, VHDL, or Verilog. Such tools are typically complex and expensive, supporting tasks like compilation, synthesis, simulation, interactive debugging, and/or in-circuit emulation. Thus, although sequential programming languages do not directly support the capture of state machines (i.e., they don't possess specific constructs corresponding to states or transitions), we still want to use the popular embedded system development tools to protect our financial and educational investments in them. Fortunately, we can still describe our system using a state machine model while capturing the model in a sequential program language, by using one of two approaches.

In a front-end tool approach, we install an additional tool that supports a state machine language. These tools typically define graphical and perhaps textual state machine languages, and include nice graphic interfaces for drawing and displaying states as circles and transitions as directed arcs. They may support graphical simulation of the state machine, highlighting the current state and active transition.

Such tools automatically generate code in a sequential program language (e.g., C code) with the same functionality as the state machine. This sequential program code can then be input to our main development tool. In many cases, the front-end tool is designed to interface directly with our main development tool, so that we can control and observe simulations occurring in the development tool directly from the front-end tool. The drawback of this approach is that we must support yet another tool, which includes additional licensing costs, version upgrades, training, integration problems with our development environment, and so on.

In contrast, we can use a language subset approach. In this approach, we directly capture our state machine model in a sequential program language, by following a strict set of rules for capturing each state machine construct in an equivalent set of sequential program constructs. This approach is by far the most common approach for capturing state machines, both in software languages like C as well as hardware languages like VHDL and Verilog. We now describe how to capture a state machine model in a sequential program language.

We start by capturing our UnitControl state machine in the sequential programming language C, illustrated in Figure 8.4. We enumerate all states, in this case using the #define C construct. We capture the state machine as a subroutine, in which we declare a state variable initialized to the initial state. We then create an infinite loop, containing a single switch statement that branches to the case corresponding to the value of the state variable. Each state's case starts with the actions in that state, and then the transitions from that state. Each transition is captured as an if statement that checks if the transition's condition is true and then sets the next state. Figure 8.5 shows a general template for capturing a state machine in C.

   #define S0 0
   #define S1 1
   ...
   #define SN N

   void StateMachine()
   {
      int state = S0;  // or whatever is the initial state
      while (1) {
         switch (state) {
            case S0:
               // Insert S0's actions here
               // Insert transitions Ti leaving S0:
               if( T0's condition is true ) {state = T0's next state; /*actions*/}
               if( T1's condition is true ) {state = T1's next state; /*actions*/}
               ...
               if( Tm's condition is true ) {state = Tm's next state; /*actions*/}
               break;
            case S1:
               // Insert S1's actions here
               // Insert transitions Ti leaving S1
               break;
            ...
            case SN:
               // Insert SN's actions here
               // Insert transitions Ti leaving SN
               break;
         }
      }
   }

Figure 8.5: General template for capturing a state machine in a sequential programming language.

To be safer, we could replace the sequence of if statements representing a state's transitions by an if-then-else statement. This would ensure that if the transition conditions were mistakenly nonexclusive, the code would merely execute the first transition whose condition was true, rather than executing all such transitions.


8.7 HCFSM and the Statecharts Language

Hierarchical/concurrent state machine models (HCFSM) are extensions to the state machine model. Harel proposed extensions to the state machine model to support hierarchy and concurrency, and developed Statecharts, a graphical state machine language designed to capture that model. We refer to the model as a hierarchical/concurrent FSM, or HCFSM.

The hierarchy extension in HCFSMs allows us to decompose a state into another state machine, or conversely stated, to group several states into a new hierarchical state. For example, consider the state machine in Figure 8.6(a), having three states A1, A2, and B. A1 is the initial state. Whenever we are in either A1 or A2 and event z occurs, we transition to state B. We can simplify this state machine by grouping A1 and A2 into a hierarchical state A, as shown in Figure 8.6(b). State A is the initial state, which in turn has A1 as its initial state. We draw the transition to B on event z as originating from state A, not A1, meaning that regardless of whether we are in A1 or A2, event z causes a transition to state B.

Figure 8.6: Adding hierarchy and concurrency to the state machine model: (a) three-state example without hierarchy, (b) same example with hierarchy, (c) concurrency.

As another hierarchy example, consider our earlier elevator example, and suppose that we want to add a control input fire, along with new behavior that immediately moves the elevator down to the first floor and opens the door when fire is true. As shown in Figure 8.7(a), we can capture this behavior by adding a transition from every state originally in UnitControl to a new state called FireGoingDn, which moves the elevator to the first floor, followed by a state FireDrOpen, which holds the door open on the first floor. When fire becomes false, we go to the Idle state. While this new state machine captures the desired behavior, the state machine is becoming more complex due to many more transitions, and harder to comprehend due to more states. We can use hierarchy to reduce the number of transitions and enhance understandability. As shown in Figure 8.7(b), we can group the original state machine into a hierarchical state called NormalMode, and group the fire-related states into a state called FireMode. This grouping reduces the number of transitions, since instead of four transitions from each original state to the fire-related states, we now need only one transition, in this case from NormalMode to FireMode. This grouping also enhances understandability, since it clearly represents two main operating modes, one normal and one in case of fire.

Figure 8.7: The elevator's UnitControl with new behavior for a new input fire: (a) without hierarchy (quite a mess), (b) with hierarchy.

The second extension that HCFSMs possess, concurrency, allows us to use hierarchy to decompose a state into two concurrent states, or conversely stated, to group two concurrent states into a new hierarchical state. For example, Figure 8.6(c) shows a state B decomposed into two concurrent states C and D. C happens to be decomposed into another state machine, as does D. Figure 8.8 shows the entire ElevatorController behavior captured as an HCFSM with two concurrent states.

Figure 8.8: Using concurrency in an HCFSM to describe both processes of the ElevatorController.

Therefore, we see that there are two methods for using hierarchy to decompose a state into substates. OR-decomposition decomposes a state into sequential states, in which only one state is active at a time - either the first state OR the second state OR the third state, etc. AND-decomposition decomposes a state into concurrent states, all of which are active at a time - the first state AND the second state AND the third state, etc.

The Statecharts language includes numerous additional constructs to improve state machine capture. A timeout is a transition with a time limit as its condition. The transition is automatically taken if the transition source state is active for an amount of time equal to the limit. Note that we used a timeout to simplify the UnitControl state machine in Figure 8.7; rather than starting and checking an external timer in state DoorOpen, we instead created a transition from DoorOpen to Idle with the condition timeout(10). History is a mechanism for remembering the last substate that an OR-decomposed state A was in before transitioning to another state B. Upon reentering state A, we can start with the remembered substate rather than A's initial state. Thus, the transition leaving A is treated much like an interrupt and B as an interrupt service routine.


[Figure: ElevatorController AND-decomposed into concurrent program-states UnitControl and RequestResolver; UnitControl OR-decomposed into program-states NormalMode (the sequential program of Figure 8.2(c)) and FireMode, with a transition from NormalMode to FireMode on fire and a transition-on-completion from FireMode back to NormalMode on !fire.]

Figure 8.9: Using PSM to describe the ElevatorController.

8.8 Program-State Machine Model (PSM)

The program-state machine (PSM) model extends state machines to allow use of sequential program code to define a state's actions, including extensions for complex data types and variables. PSM also includes the hierarchy and concurrency extensions of HCFSM. Thus, PSM is a merger of the HCFSM and sequential program models, subsuming both models. A PSM having only one state, called a program-state in PSM terminology, where that state's actions are defined using a sequential program, is equivalent to a sequential program. A PSM having many states, whose actions are all just assignment statements, is equivalent to an HCFSM. Lying between these two extremes are various combinations of the two models.

For example, Figure 8.9 shows a PSM description of the ElevatorController behavior, which we AND-decompose into two concurrent program-states UnitControl and RequestResolver, as in the earlier HCFSM example. Furthermore, we OR-decompose UnitControl into two sequential program-states, NormalMode and FireMode, again as in the HCFSM example. However, unlike the HCFSM example, we describe NormalMode as a sequential program, identical to that of Figure 8.2(c), rather than a state machine. Likewise, we describe FireMode as a sequential program. We didn't have to use sequential programs for those program-states, and could have used state machines for one or both - the point is that PSM allows the designer to choose whichever model is most appropriate.

PSM enforces a stricter hierarchy than the HCFSM model used in Statecharts. In Statecharts, transitions may point not just between states at the same level of hierarchy, but may cross hierarchical levels also. An example is the transition in Figure 8.7(b) pointing from the FireDrOpen substate of the FireMode state to the NormalMode state. Having this transition start from FireDrOpen rather than FireMode causes the elevator to always go all the way down to the first floor when the fire input becomes true, even if the input is true just momentarily. PSM, on the other hand, allows transitions only between sibling states (i.e., between states with the same parent state). PSM's model of hierarchy is the same as in sequential program languages that use subroutines for hierarchy; namely, we always enter the subroutine from one point, and when we exit the subroutine we do not specify to where we are exiting.

As in the sequential programming model, but unlike the HCFSM model, PSM includes the notion of a program-state completing. If the program-state is a sequential program, then reaching the end of the code means the program-state is complete. If the program-state is OR-decomposed into substates, then a special complete substate may be added. Transitions may occur from a substate to the complete substate, but no transitions may leave the complete substate. Transitioning to the complete substate means that the program-state is complete. Consequently, PSM introduces two types of transitions. A transition-immediately (TI) transition is taken immediately if its condition becomes true, regardless of the status of the source program-state - this is the same as the transition type in an HCFSM. A second, new type of transition, transition-on-completion (TOC), is taken only if the condition is true AND the source program-state is complete. Graphically, a TOC transition is drawn originating from a filled square inside a state, rather than from the state's perimeter. We used a TOC transition in Figure 8.9 to transition from FireMode to NormalMode only after FireMode completed, where such completion meant that the elevator had reached the first floor. By supporting both types of transitions, PSM elegantly merges the reactive nature of HCFSM models, using TI transitions, with the transformational nature of sequential program models, using TOC transitions.

The SpecCharts language was the first language designed to easily capture the PSM model. Actually, two languages were defined, one graphical and the other textual. SpecCharts was designed as an extension of VHDL, using VHDL's syntax and semantics for all variable declarations and sequential program statements. More recently, the SpecC language was developed to capture PSM, but uses an extension of C rather than VHDL.
capture the model easily.


8.9 The Role of an Appropriate Model and Language

Specifying embedded system functionality can be a hard task, but an appropriate computation model can help. The model shapes the way we think of the system. The language should capture the model easily.

Consider how models shaped the way we thought about the elevator controller example's UnitControl behavior. In order to create the sequential program that we captured in Figure 8.2(c), we were thinking in terms of a sequence of actions. First, we wait for the requested floor to differ from the target floor, then we close the door, then we move up or down to the desired floor, then we open the door, and then we repeat this sequence. In contrast, in order to create the state machine that we captured in Figure 8.3, we were thinking in terms of possible system states and the transitions among those states. Many individuals say that, for this example, the state machine model feels more natural than the sequential program model. When a system must react to a variety of changing inputs, a state machine model may be a good choice. Furthermore, notice that the HCFSM model was able to describe the fire behavior nicely, while the FSM or FSMD models would have become somewhat complex.

The language should capture our chosen model easily. Ideally, the language would have constructs that directly capture features of the model - a language for capturing state machines should have constructs for capturing states and transitions, for example. However, such a model/language match is not always the case. As you may have already ascertained, the most common situation of a model/language mismatch in embedded systems is that of having a language designed to support the sequential program model, but wanting to capture a system using a state machine model. In this case, we can use structured techniques for capturing the state machine model in the sequential program language, as shown earlier. To see the benefit of using the best model, think of how the fire behavior would have been incorporated into the sequential program of Figure 8.2(c). We would have had to insert checks for the signal throughout the code, making the code very complex.

The moral of the story here is that often we cannot choose the language used to capture embedded system functionality - that choice is often dictated by other factors. But we need not be limited to using the model directly supported by that language. We can use a different model if that model provides an advantage, and then capture the model in the language using structured techniques.

8.10 Concurrent Process Model

Thus far in this chapter, we have looked at computational models such as finite-state machines and drawn an important distinction between computational models and languages. As defined in the previous chapter, a computation model provides a set of objects and rules operating on those objects that help a designer describe a system's functionality. A system's functionality, in fact, may be described using multiple computational models. A language, on the other hand, provides semantics and constructs that enable a designer to capture a computational model. Some languages, in fact, capture more than one computational model. In this chapter, we present a new computational model called concurrent process. In addition, we extend our distinction between computational models and languages to include implementation.

   ConcurrentProcessExample()
      x = ReadX()
      y = ReadY()
      Call concurrently:
         PrintHelloWorld(x) and
         PrintHowAreYou(y)

   PrintHelloWorld(x)
      while( 1 ) {
         print "Hello world."
         delay(x);
      }

   PrintHowAreYou(y)
      while( 1 ) {
         print "How are you?"
         delay(y);
      }

   Sample input and output:
      Enter X: 1
      Enter Y: 2
      Hello world.  (Time = 1 s)
      Hello world.  (Time = 2 s)
      How are you?  (Time = 2 s)
      Hello world.  (Time = 3 s)
      How are you?  (Time = 4 s)
      Hello world.  (Time = 4 s)

Figure 8.10: A simple concurrent process example: (a) pseudo-code, (b) subroutine execution over time, (c) sample input and output.

The concurrent process model is a model that allows us to describe the functionality of a system in terms of two or more concurrently executing subtasks. Many systems are easier to describe as a set of concurrently executing tasks because they are inherently multitasking. For instance, imagine this variation on the Hello World example. This system allows a user to provide two numbers X and Y. We then want to write "Hello World" to a display every X seconds, and "How are you" to the display every Y seconds. A very simple way to describe this system using concurrent tasks is shown in Figure 8.10(a). After reading in X and Y, we call two subroutines, each describing one of the tasks, concurrently. One subroutine prints "Hello World" every X seconds, the other prints "How are you" every Y seconds. (Note that you cannot call two subroutines concurrently in a pure sequential program model, such as the model supported by the basic version of the C language.) As shown in Figure 8.10(b), these two subroutines execute simultaneously. Sample output for X = 1 and Y = 2 is shown in Figure 8.10(c). To see why concurrent processes are helpful, try describing the same system using a finite-state machine or Pascal program. You will find yourself exerting effort figuring out how to schedule the two subroutines into one sequential program. Since this example is a trivial one, this extra effort is not a serious problem, but for a complex system, this extra effort can be significant and can detract from the time you have to focus on the desired system behavior. In general, the concurrent process model is useful when describing systems that are inherently multitasking. That is to say that the function of these systems can best be described in terms of a number of subtasks each executing concurrently to one another.


[Figure: three stacked decisions - "The choice of computational model(s) is based on whether it allows the designer to describe the system." (models shown: state machine, sequential program, dataflow, concurrent processes); "The choice of language(s) is based on whether it captures the computational model(s) used by the designer."; "The choice of implementation is based on whether it meets power, size, performance and cost requirements."]

Figure 8.11: Distinctions between computational models, languages, and implementations.

To describe a system as a set of concurrently executing tasks, we use a language that captures the concurrent process model. An implementation is then derived from such a description. An implementation is a mapping of a system's functionality, captured using a computational model (or models) and written in some language (or languages), onto hardware processors. This relationship is shown in Figure 8.11. The choice of the programming language is independent of the implementation. A particular language may be used because it captures the computational model used to describe our system. A particular implementation may be used because it meets all power, timing, performance, and cost requirements of the system. Once a final implementation is obtained, a designer can execute the system and observe the behavior, measure design metrics of interest, and decide if the implementation is feasible. A final implementation also serves as a blueprint or prototype for mass manufacturing of the final product.

In this chapter, we describe the concurrent process model and related implementation issues. In addition, we introduce real-time systems as systems that are inherently composed of multiple processes. However, processes of a real-time system have stringent timing requirements.

8.11 Concurrent Processes

   Heartbeat Monitoring System
      Task 1: Read pulse; If pulse < Lo then Activate Siren; If pulse > Hi then Activate Siren; Sleep 1 second; Repeat.
      Task 2: If B1/B2 pressed then Lo = Lo +/- 1; If B3/B4 pressed then Hi = Hi +/- 1; Sleep 500 ms; Repeat.

   Set-Top Box
      Task 1: Read Signal; Separate Audio/Video; Send Audio to Task 2; Send Video to Task 3; Repeat.
      Task 2: Wait on Task 1; Decode/output Audio; Repeat.
      Task 3: Wait on Task 1; Decode/output Video; Repeat.

Figure 8.12: Typical examples of embedded systems: (a) heartbeat monitoring system, (b) set-top box system.

We are familiar with describing a system's desired behavior using a sequential program model, captured perhaps in the C language. A sequential program model consists of a set of statements, and its execution consists of executing each statement one at a time. However, some systems have behavior that really consists of several somewhat independent subbehaviors. For example, a heartbeat monitoring system's behavior, shown in Figure 8.12(a), can be decomposed into two parts. The first part, once a second, samples the input pulse and computes the heartbeat of the patient, making sure that it does not exceed or drop below thresholds called Hi and Lo. The second part, twice a second, checks to see if one of the four buttons labeled B1 through B4 is pressed and if so, increments or decrements the corresponding Lo or Hi threshold. These two subparts are quite independent from each other and can be thought of as executing concurrently, even though they share access to common data (Hi, Lo). In another example, a set-top box system's behavior, shown in Figure 8.12(b), may consist of three parts. The first part would receive a digital broadcast from the antenna and decompose it into compressed audio and video streams.


The second and third parts of the system, in turn, will decode the compressed audio and video signals. The three subparts in the set-top box are quite independent of one another also, and can be thought of as executing concurrently to one another, even though they share data. Trying to describe them as a single sequential program in a sequential program model could be difficult. Instead, we'd like to describe them using three sequential programs, indicating that these three programs could execute concurrently. But we don't want three entirely separate programs, since those three programs do need to communicate with one another. In fact, these three programs share large volumes of audio and video data. Thus, the need arises for a model for describing multiple communicating sequential programs. A concurrent process model achieves this goal. A process is just one of the sequential programs in such a model. The traditional definition of a process is simply a unit of execution. A process executes concurrently with the other processes in the model and is typically thought of as an infinite loop, executing its sequential statements forever.

We define a process's state to be one of running, runnable, or blocked. A process is in the running state if it is currently being executed. A process is in the runnable state if it is ready and executable. Of course, there is no reason for a runnable process not to be running. However, as we will see later in the chapter, when we discuss implementation of concurrent processes, a runnable process may be waiting its turn to be executed. A process is in the blocked state if it is not ready to be executed. There are a number of reasons for a process to be in the blocked state. One reason could be that the process needs to wait for some other process to finish its execution first. Another common reason for a process to be blocked is when it is waiting for some device to complete an operation, such as waiting for the network device to send a data packet.

Recall that a computational model defines objects and operations on those objects. In a concurrent process model, a process becomes the fundamental object encapsulating some portion of a system's functionality. The basic operations defined by the concurrent process model on processes are create, terminate, suspend, resume, and join, which we now describe.

Process Create and Terminate

Create creates a new process, initializes any associated data and starts execution of that process. In our Hello World example, shown in Figure 8.10(a), we created two processes by executing concurrently two procedures called PrintHelloWorld(x) and PrintHowAreYou(y). Each of these procedures described the sequential execution of one of the processes of our example. Conceptually, one can think of a create operation as an asynchronous procedure call. In a sequential programming model, a procedure call blocks the calling procedure and starts executing the called procedure. Once the called procedure terminates, control is transferred back to the calling procedure, and it is allowed to resume execution. In our analogy, a procedure acts like a process and the procedure call behaves like creating another process. In contrast, in the concurrent process model, an asynchronous procedure call does not block the calling procedure (process). Instead, both the calling procedure (process) and the called procedure (process) execute concurrently, and the created process may further create other processes and so on. Again, keep in mind that in this discussion the terms procedure and process are used interchangeably.

Terminate terminates an already executing process and destroys all data associated with that process. Terminate is an operation that is performed by one process on another. If a process does not implement an infinite loop, it is terminated automatically when it reaches the end of its execution (i.e., right after executing its last instruction). The need for terminating a process may arise when handling exceptional events. For instance, in an assembly-line monitoring system composed of multiple processes, if one process detects an error condition, it may terminate other processes, such as those controlling the conveyer belt driver motor and guide arms.

Process Suspend and Resume

Suspend suspends the execution of an already created process. Once a process, say, X, has started to execute, another process may need to stop it without terminating it. That means that the state of X (i.e., all the intermediate data values that have been computed by that process) and the location of the currently executing instruction, or the program counter, need to be saved. A suspended process can, at some later point, be allowed to execute again by restoring its state and allowing it to execute. This operation is called Resume.

Process Join

Once a process, say, X, has started to execute, another process, typically the one that created X, may need to wait until X finishes execution and terminates. That means that the process invoking the join operation is suspended until the to-be-joined process has reached the end of its execution. This operation is called Join. Join is an important operation that is used for synchronization of processes and their execution. We will discuss process synchronization in detail later in this chapter.
Create creates a new process, initializes any associated data and starts execution of that When a.system's functionality is divided into two or more concurrently executing processes,
process. In our Hello World example, shown in Figure 8. lO(a), we created two processes by .. it is essential to provide means for communication among these processes. Two common
executing concurrently two procedures called PrintHelloWorld(x) and PrintHowAreYou(y}. methods for .communication among processes are shared memory and message passing. In
Each of these procedures described the sequential execution of one of the processes of our shared memory, processes can read and write the same memory locations. In message
example. Conceptually, one can think of a create operation as an asynchronous procedure call. passing, processes explicitly send or receive data to and from each other.
In a sequential programming model, a procedure call blocks the calling procedure and starts
executing tl)e called procedure. Once the called procedure terminates, coritrol is transferred _ Shared MG:nory
back to the calling procedure, and it is allowed to resume execution. In our analogy, a· Using shared mt;..,Oiy, multiple processes communicate by reading and writing the same
procedure acts like a process and the procedure call behaves like creating another process. In memory locations or common variables. This form of communication is very efficient and
contrast in the concurrent process model, an asynchronous procedure call does not block the easy to implement. An example of using shared memory is shown in Figure 8.13. In this
calling procedure (process). Instead both the calling procedure (process) and the · called particular example, we have two processes that share the same memory address space. In
procedure (new process), start executing concurrently. Either one of these. processes can
particular, they shared an array of N data items called buffer and a variable that holds the

Chapter 8: State Machine and Concurrent Process Models
01: data_type buffer[N];
02: int count = 0;
03: void processA() {
04:    int i;
05:    while( 1 ) {
06:       produce(&data);
07:       while( count == N );/* loop */
08:       buffer[i] = data;
09:       i = (i + 1) % N;
10:       count = count + 1;
11:    }
12: }
13: void processB() {
14:    int i;
15:    while( 1 ) {
16:       while( count == 0 );/* loop */
17:       data = buffer[i];
18:       i = (i + 1) % N;
19:       count = count - 1;
20:       consume(&data);
21:    }
22: }
23: void main() {
24:    create_process(processA); create_process(processB);
25: }

Figure 8.13: An incorrect solution to the consumer-producer problem.

number of valid data items in the array, called count. One of the processes, named A, produces as part of its computation data elements that are consumed by the other process, named B. (For example, the producer process is decoding video packets while the consumer process displays the decoded packets on an LCD display.) When process A has a new data element ready, it waits (line 7) for a location in the array to become available. Then, it puts the data element in the array and increments count. Likewise, process B waits (line 16) until at least one packet becomes available in the array. Then, process B removes that element, decrements count, and consumes the data. This example is known as the consumer-producer problem.

Although the code in Figure 8.13 is very simple and appears to be correct, it will not function correctly. To illustrate the problem, consider the case where count holds the value 3 and both processes A and B are about to update it concurrently (lines 10 and 19). The following execution sequence will result in an incorrect final value stored into the count variable:
• A: loads count from memory into CPU register R1 (R1 = 3),
• A: increments value in register R1 (R1 = 4),
• B: loads count from memory into CPU register R2 (R2 = 3),
• B: decrements value in register R2 (R2 = 2),
• A: stores R1 back to memory location of count (count = 4),
• B: stores R2 back to memory location of count (count = 2).

After the above execution sequence, the value of count will be incorrectly set to 2. The problem is that the execution of lines 10 and 19 should never be performed concurrently. Instead, they should execute in mutual exclusion of each other. We can consider the code segments that update the memory location of count as a critical section. A critical section is a possibly noncontiguous section of code where simultaneous updates, by multiple processes to a shared memory location, can occur. In order to guarantee such mutual exclusion, it is necessary to introduce primitives that enable us to lock a section of code, allowing only one process to be executing in that section at a time.

A mutex is a primitive that allows us to do just that. A mutex itself is a shared object with two operations, lock and unlock. A mutex is typically associated with a segment of shared data and serves as a guard, disallowing multiple read/write access to the shared memory it is guarding. Two or more processes can simultaneously perform the lock operation, but only one of the processes will actually acquire the lock, and others will be put in the blocked state, unable to enter the critical section of the code. When the process holding the lock exits the critical section of the code, it performs an unlock operation, allowing the blocked processes to be put back in the runnable state where they can compete to acquire the lock. Using a mutex, we have corrected our consumer and producer implementation as shown in Figure 8.14. Let us once again assume that the count variable currently holds the value 3 and that processes A and B are about to update it simultaneously using the following execution sequence:
• A/B: execute lock operation on count_mutex (say B acquires the lock; A is blocked and unable to execute),
• B: loads count from memory into CPU register R2 (R2 = 3),
• B: decrements value in register R2 (R2 = 2),
• B: stores R2 back to memory location of count (count = 2),
• B: executes unlock operation on count_mutex (A is made runnable again),
• A: loads count from memory into CPU register R1 (R1 = 2),
• A: increments value in register R1 (R1 = 3),
• A: stores R1 back to memory location of count (count = 3).
This time the value of count is correctly set to 3.

01: data_type buffer[N];
02: int count = 0;
03: mutex count_mutex;
04: void processA() {
05:    int i;
06:    while( 1 ) {
07:       produce(&data);
08:       while( count == N );/* loop */
09:       buffer[i] = data;
10:       i = (i + 1) % N;
11:       count_mutex.lock();
12:       count = count + 1;
13:       count_mutex.unlock();
14:    }
15: }
16: void processB() {
17:    int i;
18:    while( 1 ) {
19:       while( count == 0 );/* loop */
20:       data = buffer[i];
21:       i = (i + 1) % N;
22:       count_mutex.lock();
23:       count = count - 1;
24:       count_mutex.unlock();
25:       consume(&data);
26:    }
27: }
28: void main() {
29:    create_process(processA); create_process(processB);
30: }

Figure 8.14: A correct solution to the consumer-producer problem.

01: mutex mutex1, mutex2;
02: void processA() {
03:    while( 1 ) {
04:
05:       mutex1.lock();
06:       /* critical section 1 */
07:       mutex2.lock();
08:       /* critical section 2 */
09:       mutex2.unlock();
10:       /* critical section 1 */
11:       mutex1.unlock();
12:    }
13: }
14: void processB() {
15:    while( 1 ) {
16:
17:       mutex2.lock();
18:       /* critical section 2 */
19:       mutex1.lock();
20:       /* critical section 1 */
21:       mutex1.unlock();
22:       /* critical section 2 */
23:       mutex2.unlock();
24:    }
25: }

Figure 8.15: Deadlock among processes.

Locking sections of code is necessary to correctly implement shared-memory-based communication among processes. However, using locks may lead to a deadlock, causing a system to hang. Therefore, care must be taken in using locks. A deadlock is a name given to a condition where two or more processes are blocked waiting for each other to unlock critical sections of code. Since both are waiting, neither can proceed, and hence they will wait forever. As an example, consider the code segment in Figure 8.15. Here we have two processes, A and B, that may execute two different critical sections of code simultaneously. Therefore, two locks are used to disallow simultaneous access to these regions of code. The following sequence of execution will illustrate the deadlock problem:
• A: executes lock operation on mutex1 (A acquires the lock on mutex1),
• B: executes lock operation on mutex2 (B acquires the lock on mutex2),
• A/B: they are both executing in critical sections 1 and 2, respectively,
• A: executes lock operation on mutex2 (A is blocked until B unlocks mutex2),
• B: executes lock operation on mutex1 (B is blocked until A unlocks mutex1).
At this point, both processes are waiting for the other to unlock mutex1 or mutex2. A deadlock has occurred and the two processes will wait indefinitely.

One protocol for eliminating deadlocks is to only allow locking of mutexes in increasing order. That means that mutexes need to be numbered in an increasing order. In addition to this requirement, we must further impose the restriction that once a process unlocks a mutex, it does not acquire any more mutexes until it unlocks all the mutexes that it currently holds the lock to. In essence, a process will undergo an initial phase where it is acquiring locks of increasing order and a second phase where it is releasing the locks it has acquired in the first phase. This form of locking is known as two-phase locking (2PL). It is left as an assignment to show why it is important to have two-phase locking in addition to our initial requirement of locking mutexes in increasing order.

Message Passing

Using message passing, data is exchanged between two processes in an explicit fashion. That means that when a process wants to send data to another process, it performs a special operation, called send. Likewise, a process must explicitly perform a special operation, called receive, in order to receive data from another process. The send and receive operations both require an identifier that specifies what process data is to be sent to or received from.

void processA() {             void processB() {
   while( 1 ) {                  while( 1 ) {
      produce(&data);               receive(A, &data);
      send(B, &data);               transform(&data);
      /* region 1 */                send(A, &data);
      receive(B, &data);            /* region 2 */
      consume(&data);            }
   }                          }
}

Figure 8.16: Communication among processes using send and receive.

The identifier uniquely identifies one of the processes that are currently executing in the system. An example of message passing is illustrated in Figure 8.16. Here process A, after producing a data packet, sends it to process B. Meanwhile, process B receives the packet, performs some transformation on the data, and sends it back to A. Process A, after receiving the data packet, consumes it and the cycle repeats. Regions of code labeled 1 and 2 are segments that perform auxiliary functions in each process.

Note that receive operations are always blocking. That means that once a process executes a receive operation, it is blocked until another process executes the corresponding send operation. The send operations, on the other hand, may or may not be blocking. One reason for having nonblocking send operations is to allow a process that just performed a send operation to continue with its execution. In our example, the regions of code labeled 1 and 2 are executed immediately after a send operation, even though the receiving process may not have received the data item.

8.13 Synchronization among Processes

In order for two or more concurrent processes to accomplish a common task, they must at times synchronize their execution. Synchronization among processes means that one process must wait for another process to compute some value, reach a known point in its execution, or signal some condition, before it (the waiting process) proceeds. To clarify this concept, consider the consumer-producer example shown in Figure 8.14. Recall that on lines 8 and 19 processes A and B looped waiting for some condition to become false. The condition for the producer process A was that the value of count becomes less than N, meaning that buffer contained at least one empty slot. The condition for the consumer process B was that the value of count becomes greater than zero, meaning that buffer contained at least one new data item. This form of waiting on a condition is called busy-waiting. It is called busy-waiting because the waiting process is simply executing no-ops, instead of being blocked until the condition is met, hence making the CPU available for useful computation. In this section, we will introduce constructs that are more efficient to use in place of busy-waiting. Note that we have discussed the join operation and blocking send and receive primitives earlier in this chapter, which are both forms of synchronization primitives.

The join operation that we discussed earlier is a limited form of synchronization among two processes. Recall that here, one process performed a join operation on another process, indicating that it should be blocked until that other process terminates. The blocking send and receive protocols, a.k.a. synchronous send and receive, discussed in the previous section, also serve to synchronize processes. When one process performs a send or receive operation, it is blocked until the other process reaches its receive or send point, respectively, before the blocked process is allowed to continue. We will next describe condition variables and monitors as synchronization mechanisms.

Condition Variables

One way to achieve synchronization among concurrently executing processes is to use a special construct called a condition variable. A condition variable is an object that permits two kinds of operations, called signal and wait, to be performed on it. When wait is performed on a condition variable, the process that performed the wait operation is blocked until another process performs a signal operation on the same condition variable. The semantics of a wait operation is in fact a bit more complex. When a process, say, A, executes a wait operation, it passes it a mutex variable that it has already acquired the lock for. The wait operation will then cause the mutex to be unlocked such that another process, say, B, may be able to enter a critical section and compute some value or make some condition become true. Once the

01: data_type buffer[N];
02: int count = 0;
03: mutex cs_mutex;
04: condition buffer_empty, buffer_full;
05: void processA() {
06:    int i;
07:    while( 1 ) {
08:       produce(&data);
09:       cs_mutex.lock();
10:       if( count == N ) buffer_empty.wait(cs_mutex);
11:       buffer[i] = data;
12:       i = (i + 1) % N;
13:       count = count + 1;
14:       cs_mutex.unlock();
15:       buffer_full.signal();
16:    }
17: }
18: void processB() {
19:    int i;
20:    while( 1 ) {
21:       cs_mutex.lock();
22:       if( count == 0 ) buffer_full.wait(cs_mutex);
23:       data = buffer[i];
24:       i = (i + 1) % N;
25:       count = count - 1;
26:       cs_mutex.unlock();
27:       buffer_empty.signal();
28:       consume(&data);
29:    }
30: }
31: void main() {
32:    create_process(processA); create_process(processB);
33: }

Figure 8.17: Synchronized consumer-producer problem using condition variables.

condition becomes true, process B will signal the condition variable, causing process A to become runnable and implicitly reacquire the mutex lock.

To clarify, we will implement our consumer-producer problem using condition variables, as in Figure 8.17. We have chosen two condition variables: one that signals whether there is at least one free location available in our buffer, called buffer_empty, and another that signals whether there is at least one valid data item in our buffer, called buffer_full. The two processes execute as follows. Once the producer process A has produced valid data, it acquires the lock to the critical section. It then checks the value of count. If the value is N, the buffer is full, so it executes a wait operation on the buffer_empty condition variable, thus waiting until the buffer becomes empty. Meanwhile, by executing the wait operation, it releases the lock to the critical section such that the consumer process is able to enter and execute that region of code. (Otherwise, the consumer process will never be able to enter the critical section and consume data; therefore, the system will be deadlocked!) If the value of count is less than N, the producer process simply inserts the data into buffer, increments count, releases the lock, and signals to the consumer process (possibly making it runnable again) that there is now at least one new data item available. The consumer process B works in reverse order. It too attempts to acquire the lock to the critical section; then it checks whether count is zero or not. If count is zero, it waits on the buffer_full condition variable; otherwise, it removes a data item from buffer, decrements count, releases the lock, and signals to the producer process.

Figure 8.18: Producer-consumer example with monitors: (a) X is allowed to enter the monitor while Y waits, (b) X executes a wait on a condition and is blocked, Y is allowed to enter the monitor, (c) Y signals the condition that X is waiting on and thus is blocked, allowing X to finish and exit the monitor, (d) Y is allowed to finish its execution.

01: Monitor {
02:    data_type buffer[N];
03:    int count = 0;
04:    condition buffer_full, buffer_empty;
05:    void processA() {
06:       int i;
07:       while( 1 ) {
08:          produce(&data);
09:          if( count == N ) buffer_empty.wait();
10:          buffer[i] = data;
11:          i = (i + 1) % N;
12:          count = count + 1;
13:          buffer_full.signal();
14:       }
15:    }
16:    void processB() {
17:       int i;
18:       while( 1 ) {
19:          if( count == 0 ) buffer_full.wait();
20:          data = buffer[i];
21:          i = (i + 1) % N;
22:          count = count - 1;
23:          buffer_empty.signal();
24:          consume(&data);
25:          buffer_full.signal();
26:       }
27:    }
28: } /* end monitor */
29: void main() {
30:    create_process(processA); create_process(processB);
31: }

Figure 8.19: Synchronized consumer-producer problem using monitors.

Monitors

Another way to achieve synchronization among concurrently executing processes is to use a special construct called a monitor. A monitor is a collection of data and methods or subroutines that operate on this data, similar to an object in an object-oriented paradigm. A special guarding property of a monitor guarantees that only one process is allowed to execute inside the monitor at a given time. In other words, one and only one of the methods of a

monitor can be active at any given time. A process, say, X, is allowed to enter a monitor if there are no other processes executing in that monitor. This is shown in Figure 8.18(a). Once in a monitor, X has exclusive access to the data inside the monitor. If, and when, X executes a wait operation on a condition variable, also defined inside the monitor, it will be blocked waiting as shown in Figure 8.18(b). At this point, another process, say Y, is allowed to enter the monitor. If Y signals the condition that X is currently waiting on, Y will be blocked and X will be allowed to reenter the monitor. This is shown in Figure 8.18(c). Then, once X terminates, or waits on a condition, Y is allowed to reenter and finish its execution as shown in Figure 8.18(d).

To clarify this a bit more, we have implemented the consumer-producer problem using monitors as shown in Figure 8.19. A single monitor is used to encapsulate the sequential programs of the consumer and producer processes. The shared buffer is also encapsulated in the monitor. Initially, one of the consumer or producer processes will be allowed to execute. Let us assume that the consumer process B will be allowed to execute first. Once the consumer checks the buffer size and discovers that there are no data items produced, it will wait on the buffer_full condition variable and thus allow the producer process A to enter the monitor and produce a data item. Once the producer process A signals the buffer_full condition, the consumer process will be allowed to reenter and execute. This behavior will repeat, and the two processes will take turns producing and consuming data items. Here, it is left as an exercise to show that the size of the buffer will never exceed 1.
8.14 Implementation
So far we have discussed numerous operations permitted by the concurrent process model. Here we will discuss how these operations are implemented using single-purpose or general-purpose processors.

Creating and Terminating Processes

One way to implement multiple processes in a system is to use multiple processors, each executing one process. Each of these processors may be a general-purpose processor, in which case we can use a programming language like C to describe the function of the process and compile it down to the instructions of that processor. Or, we can build a custom single-purpose processor that implements the function of the process. In both cases, when using processors to implement multiple processes, we can achieve true multitasking (i.e., each process will execute in parallel to other processes in the system). Implementing each process on its own processor is common when each process is to be implemented using a single-purpose processor. However, we often decide that several processes should be implemented using general-purpose processors. While we could conceptually use one general-purpose processor per process, this would likely be very expensive and in most cases is not necessary. It is not necessary because the processes likely do not require 100% of the processor's processing time; instead, many processes may share a single processor's time and still execute at the necessary rates. Different ways to map processes to processors are illustrated in Figure 8.20.

Figure 8.20: Mapping processes on processors: (a) processes mapped on multiple single-purpose processors, (b) processes mapped on one general-purpose processor, (c) processes mapped to a combination of single-purpose and general-purpose processors.

One method for sharing a processor among multiple processes is to manually rewrite the processes as a single sequential program. For example, consider our Hello World program from earlier. We could rewrite the concurrent process model as a sequential one by replacing the concurrent running of the PrintHelloWorld and PrintHowAreYou routines by the following:

   I = 1; T = 0;
   while( 1 ) {
      Delay(I); T = T + I;
      if T modulo X is 0 then call PrintHelloWorld
      if T modulo Y is 0 then call PrintHowAreYou
   }
We would also modify each routine to have no parameter, no loop and no delay; each would merely print its message. If we wanted to reduce iterations, we could set I to the greatest common divisor of X and Y rather than to one. Manually rewriting a model may be practical for simple examples, but extremely difficult for more complex examples. While some automated techniques have evolved to assist with such rewriting of concurrent processes into a sequential program, these techniques are not very commonly used.

Instead, a second, far more common method for sharing a processor among multiple processes is to rely on a multitasking operating system. An operating system is a low-level program that runs on a processor, responsible for scheduling processes, allocating storage, and interfacing to peripherals, among many other things. A real-time operating system (RTOS) is an operating system that allows one to specify constraints on the rate of processes, and that guarantees that these rate constraints will be met. In such an approach, we would describe our concurrent processes using either a language with processes built in (such as Ada or Java), or a sequential programming language (like C or C++) using a library of routines that extends the language to support concurrent processes. POSIX threads were developed for the latter purpose.

A third method for sharing a processor among multiple processes is to convert the processes to a sequential program that includes a process scheduler right in the code. This method results in less overhead since it does not rely on an operating system, but it also yields code that may be harder to maintain.

In operating system terminology, a distinction is made between regular processes and threads. A regular process is a process that has its own virtual address space (stack, data, code) and system resources (e.g., open files). A thread, in contrast, is really a subprocess within a process. It is a lightweight process that typically has only a program counter, stack, and registers; it shares its address space and system resources with other threads. Since threads are small compared to regular processes, they can be created quickly, and switching between threads by an operating system does not incur very heavy costs. Furthermore, threads can share resources and variables, so they can communicate quickly and efficiently. Throughout this chapter, we use the term process to denote either a heavyweight process or a lightweight thread.

Suspending and Resuming Processes

If multiple processes are implemented using single-purpose processors, suspending or resuming them must be built as part of the processor's implementation. For example, the processors may be designed having an extra input. When this input is asserted, the processor is suspended; otherwise, it is executing. If multiple processes are implemented using a single general-purpose processor, then suspending or resuming the processes must be built into the programming language or multitasking library that is used to describe the processes. In both cases, the programming language or library may rely on the underlying operating system to handle these operations.

Joining a Process

If multiple processes are implemented using single-purpose processors, then for one process X to join another process Y would require building additional logic that will determine when Y has reached its completion point and in response resume X. Therefore, in addition to having input signals that signal when a processor should suspend, each processor must have output signals that indicate when that processor is done executing its task. If multiple processes are implemented using a single general-purpose processor, join must be built into the language or multitasking library that is used to describe the processes. In both cases, the programming language or library may rely on the underlying operating system to handle this operation.

Scheduling Processes

When multiple processes are implemented on a single general-purpose processor, the manner in which these processes are executed on the single shared processor plays an important role in meeting each process's timing requirements. This task of deciding when and for how long a processor executes a particular process is known as process scheduling. A scheduler is a special process that performs process scheduling. A scheduler can either be implemented as a nonpreemptive scheduler or a preemptive scheduler. A nonpreemptive scheduler only decides what process to select for execution on the processor once the currently executing process completes its execution. A preemptive scheduler is a scheduler that only allows a process to execute for a predetermined amount of time, called a time quantum, before preempting it in order to allow another process to execute on the processor. This time quantum may be 10 to 100s of milliseconds long. The length of this time quantum greatly determines the response time of a system.

We have already defined a process state as being one of running, runnable, and blocked. We further assign to each process an integer-valued priority. Without loss of generality, we assume that the process with highest priority is always selected first by the scheduler to be executed on the processor. A process's priority is often statically determined during the creation of the process and may be dynamically changed during execution.

A very simple scheduler is one that employs a first-in first-out (FIFO) scheduler. Using a FIFO scheduler, processes are added to the FIFO as they are created or become runnable, and processes are removed from the FIFO to be executed on the general-purpose processor whenever the time quantum of the currently executing process ends or the process is blocked.

Another type of a simple scheduler maintains a priority queue of processes that are in the runnable state. When the scheduler is ready to select a new process for execution, it simply selects the process with highest priority for execution. When a blocked process becomes runnable, it is added to the priority queue of the scheduler to be selected for execution at some later point. When multiple processes have equal priority, the scheduler uses a first-come first-served basis to select among the processes with equal priorities. When nonpreemptive scheduling is being used, this form of scheduling is called priority scheduling. When preemption is used, this form of scheduling is called round-robin scheduling.

Of course, the real question is how to assign priorities to processes. Before we do this, we have to have an understanding of how often each of the processes in our system need to
Process Period Priority Process · Deadline Priority


A B C D A B C D
A 25 ms s G 17ms s A B C D
B SO ms 3 H 50ms 2
C 12 ms 6 I 32ms 3
D IOOms I J !Oms 6
E 40 ms 4 K 140ms I ~Q
F 75 ms 2 L 32ms 4

(a) (b)

z z z
Figure 8.21 : Priority assignment; (a) rate monotonic, (b) deadline monotonic priority assignment.
(a) (b) (b)
execute. Let us define the period of a process to be a repeating time interval during which that process has to execute once. For example, if we assign to process A a period of 100 ms, then process A must execute once every 100 ms. The period of a process is often obtained from the description of a system (e.g., a process responsible for refreshing the screen on a display device must run 27 times per second, which equals a period of 37 ms). This notion of period is similar to the period of a sound wave. In rate monotonic scheduling, processes are assigned priorities such that those with shorter periods are given higher priorities. We have given an example of rate monotonic priority assignment in Figure 8.21(a). Here there are six processes, labeled A through F, with the corresponding periods given in the next column. We can assign priorities to these processes as follows. We assign to the process with the largest period, D, the smallest priority, one. Then we assign to the next process with the largest period, F, the next smallest priority, two, and so on.

In the previous discussion, we have assumed that the execution deadline of a process is equal to its period. The deadline of a process is defined as the time before which a process must run to completion. For example, if a process has a deadline of 20 ms, then it must complete 20 ms after it starts. Note that the actual execution time of a process is equal to or less than its deadline. For example, process A may have an execution time of 5 ms and a deadline of 20 ms. This means that once A is started, it can execute for 4 ms, then sleep for 14 ms, and resume to execute for the additional 1 ms. Thus, the total time since the process started would be 4 + 14 + 1 = 19 ms, which is less than the deadline, so such a schedule would be valid. If we know that the deadline of a process being scheduled is less than its period, we can use deadline monotonic priority assignment. As in rate monotonic priority assignment, but using the deadline instead of the period, priorities are assigned. In deadline monotonic scheduling, processes are assigned priorities such that those with shorter deadlines are given higher priorities. We have given an example of deadline monotonic priority assignment in Figure 8.21(b). Here there are six processes, labeled G through L, with the corresponding deadlines given in the next column. We can assign priorities to these processes as follows. We assign to the process with the largest deadline, K, the smallest priority, one. Then we assign to the next process with the largest deadline, H, the next smallest priority, two, and so on.

8.15 Dataflow Model

A derivative of the concurrent process model is the dataflow model. In a dataflow model, we describe system behavior as a set of nodes representing transformations, and a set of directed edges representing the flow of data from one node to another. Each node consumes data from its input edges, performs its transformation, and produces data on its output edge. All nodes may execute concurrently. For example, Figure 8.22(a) shows a dataflow model of the computation Z = (A + B) * (C - D). Figure 8.22(b) shows another dataflow model having more complex node transformations. Each edge may or may not have data. Data present on an edge is called a token. When all input edges to a node have at least one token, the node may fire. When a node fires, it consumes one token from each input edge, executes its data transformation on the consumed tokens, and generates a token on its output edge. Note that multiple nodes may fire simultaneously, depending only on the presence of tokens.

[Figure 8.22: Simple dataflow models: (a) nodes representing arithmetic transformations, (b) nodes representing more complex transformations, (c) synchronous dataflow.]

Several commercial tools support graphical languages for the capture of dataflow models. These tools can automatically translate the model to a concurrent process model for implementation on a microprocessor. We can translate a dataflow model to a concurrent process model by converting each node to a process, and each edge to a channel. This concurrent process model can be implemented either by using a real-time operating system or by mapping the concurrent processes to a sequential program.

We observe that in many digital signal-processing systems, data flows into and out of the system at a fixed rate, and that a node may consume and produce many tokens per firing. We therefore created a variation of dataflow called synchronous dataflow. In this model, we annotate each input and output edge of a node with the number of tokens that node consumes and produces, respectively, during one firing. The advantage of this model is that, rather than translating to a concurrent process model for implementation, we can instead statically

schedule the nodes to produce a sequential program model. This model can be captured in a sequential programming language like C, thus running without a real-time operating system and hence executing more efficiently. Much effort has gone into developing algorithms for scheduling the nodes into "single-appearance" schedules, in which the C code contains only one statement that calls each node's associated procedure (though this call may be in a loop). Such a schedule allows for procedure inlining, which further improves performance by reducing the overhead of procedure calls, without resulting in the explosion of code size that would have occurred had there been many statements that called each node's procedure.

8.16 Real-Time Systems

In most embedded systems it is important to perform some of the computations in a timely manner. For example, in the set-top box example shown in Figure 8.12(b), at least 20 video frames need to be decoded within each second for the output to appear continuous. Likewise, a digital cell phone decodes audio packets, converts digital signals to analog, and reproduces the voice in the speaker. All this takes place during strictly defined time periods, or else the sound of the remote speaker would appear to be delayed to the listener. Other systems that have stringent timing requirements include navigation and process control systems, assembly line monitoring systems, multimedia systems, and network systems, to name a few. Real-time systems are systems that are fundamentally composed of two or more concurrent processes that execute with stringent timing requirements and cooperate with each other in order to accomplish a common goal. In order for these concurrent processes to work together, it is essential to provide means for communication and synchronization among them. The concurrent process model addresses most of these requirements and is best suited for use in describing real-time systems. Thus, a system described using the concurrent process model with the additional stringent execution-timing requirement imposed on each process is a real-time system. The additional timing requirement of real-time systems is met by adopting scheduling algorithms that guarantee timely execution of each process in the system, as described earlier in this chapter.

We will discuss some operating systems that are designed to support real-time systems. Note that the term real-time system refers to a class of applications or embedded systems that exhibit the real-time characteristics and requirements mentioned above. Real-time operating systems, on the other hand, refer to underlying implementations or systems that support real-time systems. In other words, real-time operating systems provide mechanisms, primitives, and guidelines for building embedded systems that are real-time in nature.

Windows CE

Windows CE was built specifically for the embedded system and appliance market, providing a scalable real-time 32-bit platform that can be used in a wide variety of embedded systems and products. One of the benefits of using Windows CE as an RTOS is that it supports the Windows application programming interface (API), which has gained great popularity. This operating system provides a set of Internet browsing and serving services that make it suitable for systems that are designed to interface to the Internet. The Windows CE kernel allows for 256 priority levels per process and implements preemptive priority scheduling. The size of the Windows CE kernel is 400 Kbytes.

QNX

The QNX RTOS architecture consists of a real-time micro-kernel surrounded by a collection of optional processes (called resource managers) that provide POSIX- and UNIX-compatible system services. A micro-kernel is a name given to a kernel that supports only the most basic services and operations that typical operating systems provide. However, by including or excluding resource manager processes, the developer can scale QNX down for ROM-based embedded systems, or scale it up to encompass hundreds of processors connected by various networking and communication technologies. Resource manager processes are modules that can be added or removed from the basic micro-kernel to best fit the functionality provided by the operating system to that needed by the application. The micro-kernel of QNX occupies less than 10 Kbytes and complies with the POSIX real-time standard. QNX supports up to 32 priority levels per process and implements preemptive process scheduling using either FIFO, round-robin, adaptive, or priority-driven scheduling.

8.17 Summary

We have introduced the concurrent process model as a model well suited for describing a large class of embedded systems. Since much of an embedded system's behavior can be described as two or more concurrently executing tasks, the concurrent process model is well suited for describing them. The concurrent process model provides operations to create, terminate, suspend, resume, and join processes. The concurrent process model provides for communication and synchronization of processes, since both of these are essential for correctly implementing a system in terms of multiple processes. Processes must be able to share data and synchronize their execution in order to achieve a common goal. We have described communication protocols that use shared memory and send/receive primitives. In the shared memory scheme, two processes communicate by reading and writing variables that are visible to both. A mutex is used to lock a region of shared data for a period of time, allowing only one process to update it. Synchronization primitives such as condition variables and monitors are also used to allow processes to signal various events to each other. We have looked at the implementation of concurrent processes using single- and general-purpose processors. We have defined a real-time system as a system composed of multiple concurrently executing processes, each having stringent timing requirements. We have looked at two real-time operating systems and their features, namely Windows CE and the QNX RTOS.


8.18 References and Further Reading

• Abraham Silberschatz and Peter B. Galvin. Operating System Concepts. Reading, MA: Addison-Wesley, 1995. Describes concepts in operating systems.
• Jean Bacon. Concurrent Systems. New York: Addison-Wesley, 1993. Describes concurrent systems including real-time, database, and distributed systems.
• Mukesh Singhal and Niranjan G. Shivaratri. Advanced Concepts in Operating Systems. New York: McGraw-Hill, 1994. Describes advanced concepts in operating systems and multitasking environments.
• Gary Cornell and Cay S. Horstmann. Core Java. Englewood Cliffs, NJ: Prentice Hall, 1997. This book describes the Java programming language, including multithreaded programming.

8.19 Exercises

8.1 Define the following terms: finite-state machines, concurrent processes, real-time systems, and real-time operating system.
8.2 Briefly describe three computation models commonly used to describe embedded systems and/or their peripherals. For each model, list two languages that can be used to capture it.
8.3 Describe the elevator UnitControl state machine in Figure 8.4 using the FSMD model definition <S, I, O, V, F, H, s0> given in this chapter. In other words, list the set of states (S), set of inputs (I), and so on.
8.4 Show how, using the process create and join semantics, one can emulate the procedure call semantics of a sequential programming model.
8.5 List three requirements of real-time systems and briefly describe each. Give examples of actual real-time systems to support your arguments.
8.6 Show why, in addition to ordered locking, two-phase locking is necessary in order to avoid deadlocks. Give an example and execution trace that results in deadlock if two-phase locking is not used.
8.7 Give pseudo-code for a pair of functions implementing the send and receive communication constructs. You may assume that mutex and condition variables are provided.
8.8 Show that the buffer size in the consumer-producer problem implemented in Figure 8.9 will never exceed one. Re-implement this problem, using monitors, to allow the size of the buffer to reach its maximum.
8.9 Given the processes A through F in Figure 8.21(a), where their deadline equals their period, determine whether they can be scheduled on time using a non-preemptive scheduler. Assume all processes begin at the same time and their execution times are as follows: A: 8 ms, B: 25 ms, C: 6 ms, D: 25 ms, E: 10 ms, F: 25 ms. Explain your answer by showing that each process either meets or misses its deadline.

Chapter 9: Control Systems³

³ This chapter was contributed mainly by Jay Farrell of the University of California, Riverside.

9.1 Introduction
9.2 Open-Loop and Closed-Loop Control Systems
9.3 General Control Systems and PID Controllers
9.4 Software Coding of a PID Controller
9.5 PID Timing
9.6 Practical Issues Related to Computer-Based Control
9.7 Benefits of Computer-Based Control Implementations
9.8 Summary
9.9 References and Further Reading
9.10 Exercises

9.1 Introduction

Control systems represent a very common class of embedded systems. A control system seeks to make a physical system's output track a desired reference input, by setting physical system inputs. Perhaps the best-known example is an automobile cruise controller, which seeks to make a car's speed track a desired speed, by setting the car's throttle and brake inputs. Another example is a thermostat controller, which seeks to force a building's temperature to a desired temperature, by turning on the heater or air conditioner and adjusting the fan speed. More examples include controlling the speed of a spinning disk drive by varying the applied motor voltage, and maintaining the altitude of an aircraft by adjustment of the aileron and elevator positions. In contrast, digital cameras, video games, and cell phones are not examples of control systems, as they do not seek to track a reference input. Figure 9.1 illustrates the idea of tracking in a control system.

Designing control systems is not easy. Think of a car's cruise controller. It should never let the car speed deviate significantly from the reference speed specified by the driver. It must adjust to external factors like wind speed, road grade, tire pressure, brake conditions, and
[Figure 9.1: The goal of a control system is to force a physical system's output to track a reference input: (a) good tracking, (b) not-as-good tracking. The plots show the reference input (desired speed) and the system output versus time.]

engine performance. It must correctly handle any situation presented to it, like accelerating from 20 mph to 50 mph while going down a steep hill. It should control the car in a way that is comfortable to the car's passengers, avoiding extremely fast acceleration or deceleration, and avoiding speed oscillations.

Control systems have been widely studied, and a rich theory for control system design exists. This chapter does not describe that theory in detail, since that requires a book in itself as well as a strong background in differential equations. Instead, we will introduce the basic concepts of control systems using a greatly simplified example. This introduction will lead up to PID controllers, which are extremely common. One of the goals of the chapter is to enable the reader to detect when an embedded system is an instance of a control system, so that the reader knows to turn to control theory (or to someone trained in control theory), rather than using ad hoc techniques, in those cases. However, in some cases, PID controllers can be used without extensive knowledge of control theory, and thus we will introduce some commonly used PID tuning techniques.

9.2 Open-Loop and Closed-Loop Control Systems

Overview

Control systems minimally consist of several parts, illustrated in Figure 9.2:
1. The plant, also known as the process, is the physical system to be controlled. An automobile is an example of a plant, as in Figure 9.2(a).
2. The output is the particular physical system aspect that we are interested in controlling. The speed of an automobile is an example of an output.
3. The reference input is the desired value that we want to see for the output. The desired speed set by an automobile's driver is an example of a reference input.
4. The actuator is the device that we use to control the input to the plant. A stepper motor controlling a car's throttle position is an example of an actuator.
5. The controller is the system that we use to compute the input to the plant such that we achieve the desired output from the plant.
6. A disturbance is an additional undesirable input to the plant imposed by the environment that may cause the plant output to differ from what we would have expected based on the plant input. Wind and road grade are examples of disturbances that can alter the speed of an automobile.

[Figure 9.2: Control systems and automobile cruise controller example: (a) open-loop control — the control law u_t = F(r_t) = P * r_t drives the car model v_{t+1} = 0.7v_t + 0.5u_t - w_t, where r_t is the reference input (desired speed), u_t the throttle, and w_t a disturbance (road grade); the system model is v_{t+1} = 0.7v_t + 0.5P * r_t - w_t, and the goal is to design F such that v approaches r. (b) closed-loop control — an error detector forms e_t = r_t - v_t from a speed sensor, the control law is u_t = P * (r_t - v_t), and the system model becomes v_{t+1} = (0.7 - 0.5P)v_t + 0.5P * r_t - w_t.]

A control system with these components, configured as in Figure 9.2(a), is referred to as an open-loop, or feed-forward, control system. The controller reads the reference input, and then computes a setting for the actuator. The actuator modifies the input to the plant, which, along with any disturbance, results some time later in a change in the plant output. In an open-loop system, the controller does not measure how well the plant output matches the reference input. Thus, open-loop control is best suited to situations where the plant output responds very predictably to the plant input (i.e., the model is accurate and disturbance effects are minimal).

Many control systems possess some additional parts, as illustrated in Figure 9.2(b):
1. A sensor measures the plant output.


2. An error detector determines the difference between the plant output and the reference input.

A control system with these parts, configured as in Figure 9.2(b), is known as a closed-loop, or feedback, control system. A closed-loop system monitors the error between the plant output and the reference input. The controller adjusts the plant input in response to this error. The goal is typically to minimize this tracking error given the physical constraints of the system.

A First Example: An Open-Loop Automobile Cruise Controller

We are primarily interested in closed-loop control in this chapter. However, let us begin by providing a simple example of an open-loop automobile cruise controller, illustrated in Figure 9.2(a). As you probably already know, the objective of a cruise-control system is to match the car's speed to the desired speed set by the driver.

Developing a Model: In many cases of controller design, our first task is to develop a model of how the plant behaves. A model describes how the plant output reacts as a function of the plant inputs and current state. For our cruise controller, the model should describe how the car reacts to the throttle position and the current speed of the car. As we will see later in this chapter, we don't always have to model the plant, and instead could design a controller through somewhat ad hoc experimenting. We could see how a particular controller works and iteratively modify the controller until the desired tracking is achieved. However, for many plants, like a car, such experimenting is dangerous, so using a model for the experimenting is preferable. Furthermore, with a model, we can even design the controller using quantitative techniques, thus avoiding the need for experimentation while creating a better controller.

The car has a throttle input whose position u can vary from 0 to 45 degrees. We decide to begin by test-driving the car on a flat road and taking measurements. Suppose that with the car traveling steadily at 50 mph and the throttle set at 30 degrees, we quickly change the throttle to 40 degrees, and measure the car's speed every second thereafter, until the car's speed finally becomes constant. Based on the measured speed data, suppose we determine that the following equation describes the car's speed as a function of the current speed and throttle position:

v_{t+1} = 0.7v_t + 0.5u_t

Here, v_t is the car's current speed, u_t is the throttle position, and v_{t+1} is the car's speed one second later. For example, v_2 = 0.7v_1 + 0.5u_1 = 0.7 * 50 + 0.5 * 40 = 55. Suppose further that we try a variety of other speeds and throttle positions, and we find that the above equation holds for all those other situations. Therefore, we decide that the above equation is a suitable first model for the car over the range of speeds that is of interest. Please note that this is not actually a reasonable model of a car, and is instead used for illustrative purposes only.

Developing a Controller: Now let's turn our attention away from modeling the car and toward designing the cruise controller for the car. Suppose the only input to the controller is the desired speed r_t, as shown in Figure 9.2(a). The controller's behavior is a function F of the commanded speed, so that the throttle position is u_t = F(r_t). The control designer can choose F to be as simple or as complex a function as desired. Let's start by assuming that F is a simple linear function of the form:

u_t = P * r_t

Here, P is a constant that the designer must specify. This linear proportional controller makes intuitive sense since it increases the throttle angle as the desired speed increases. In other words, the throttle angle is proportional to the desired speed.

Given this proportional control function, we can now write an equation that models the combined controller and plant, which will help us determine what value to use for P:

v_{t+1} = 0.7v_t + 0.5u_t
u_t = P * r_t
v_{t+1} = 0.7v_t + 0.5P * r_t

The design goal for the cruise controller is to keep the actual speed of the car v equal to the desired speed r at all times. Of course, it is impossible to keep these two values equal at all times, since the car will require some time to react to any changes the controller makes to the throttle angle. For example, the car cannot accelerate from 0 to 50 mph instantaneously. Rather, from the moment the controller sets the throttle, a car will take several seconds to accelerate to its final speed. Therefore, the design goal can be relaxed to that of forcing the car's actual speed v to be equal to the desired speed r in steady state. Steady state means that if the controller sets the throttle to a constant value, and nothing else changes, then at some time in the future, v will also not change. So in steady state, v_{t+1} = v_t. Let's refer to this steady-state velocity as v_ss. Substituting v_ss for both v_{t+1} and v_t above, we get:

v_{t+1} = 0.7v_t + 0.5P * r_t
let v_{t+1} = v_t = v_ss
v_ss = 0.7v_ss + 0.5P * r_t
v_ss - 0.7v_ss = 0.5P * r_t
v_ss = 1.67P * r_t

So, if we want v_ss = r_t, we merely need to set P = 1/1.67 = 0.6. We have now designed our first controller:

u_t = F(r_t)
u_t = P * r_t

The controller merely multiplies the desired speed r_t by 0.6 to determine the desired throttle angle.
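We can check this design numerically. The following minimal Python sketch (illustrative only, like the car model itself) iterates the model v_{t+1} = 0.7v_t + 0.5u_t under the open-loop law u_t = P * r_t with P = 0.6, and confirms that the speed settles at the commanded 50 mph:

```python
def simulate_open_loop(v0, r, P=0.6, steps=30):
    """Iterate the car model v[t+1] = 0.7*v[t] + 0.5*u[t]
    with the open-loop control law u[t] = P*r (no disturbance)."""
    v = float(v0)
    trace = [v]
    for _ in range(steps):
        u = P * r                 # throttle set from the desired speed only
        v = 0.7 * v + 0.5 * u
        trace.append(v)
    return trace

trace = simulate_open_loop(v0=20, r=50)
print(round(trace[1], 2), round(trace[10], 2))  # 29.0 49.15
# After 30 steps the speed is within 0.001 mph of the commanded 50 mph,
# matching the steady-state prediction v_ss = 1.67 * P * r.
```

A spreadsheet evaluating the same recurrence gives identical values.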

Time (t)   (a) v_t, w = 0   (b) v_t, w = +5   (c) v_t, w = -5
0          20.00            20.00             20.00
1          29.00            24.00             34.00
2          35.30            26.80             43.80
3          39.71            28.76             50.66
4          42.80            30.13             55.46
5          44.96            31.09             58.82
6          46.47            31.76             61.18
7          47.53            32.24             62.82
8          48.27            32.56             63.98
9          48.79            32.80             64.78
10         49.15            32.96             65.35
11         49.41            33.07             65.74
12         49.58            33.15             66.02

Figure 9.3: Open-loop cruise controller trying to accelerate the car from 20 mph to 50 mph, when the grade is: (a) 0%, (b) +5%, (c) -5%.

Analyzing Our First Controller: Let's analyze how well this controller achieves its goal. Two issues are of interest: (1) what is the transient behavior when r changes; and (2) what effects do disturbances have on the system? The equation representing the entire system is:

v_{t+1} = 0.7v_t + 0.5 * 0.6 * r_t
v_{t+1} = 0.7v_t + 0.3r_t

To see how the system behaves, suppose a car is traveling steadily at 20 mph at time t = 0, at which time the desired speed r_0 is set to 50. Given the form of our controller above, we see that the controller will set the throttle position to 0.6 * 50 = 30 degrees, and hold it there until r_t changes again. We can "simulate" the system⁴ by evaluating the above equation for various time values (a spreadsheet program makes this task easy). Figure 9.3(a) illustrates the car's speed over time. We see that (in the absence of disturbances) the controller does well, approaching the desired speed of 50 mph to within 0.3% in 10 seconds.

⁴ Note that the simulation evaluates the controller performance relative to a model. For the simulation results to accurately predict the results of the future control experiments with the actual hardware, the model must be accurate. However, since there is expense involved with developing the model, there is always a trade-off and art to determining when the model is sufficiently accurate to complete the analytic portion of the design. Tuning of the design typically occurs during the initial hardware experiments to accommodate differences between the model and hardware.

Considering Disturbances: Suppose now that additional testing of the car is performed on roads with grades w varying from -5 degrees, corresponding to downhill roads, to +5 degrees, corresponding to uphill roads. The car goes faster downhill and slower uphill. Suppose road grade is incorporated into the earlier model for the car alone as follows:

v_{t+1} = 0.7v_t + 0.5u_t - w_t

Since the open-loop controller has no means of sensing the road grade or its effect on the speed, this disturbance will obviously result in speed error when driving downhill or uphill. Figure 9.3(b) displays the behavior of the car with the open-loop controller when driving up a +5% grade, and Figure 9.3(c) when driving down a -5% grade. The speed error at time t = 12 is about 50 - 33 = 17 mph in the uphill case, and about 50 - 66 = -16 mph in the downhill case. This error is quite bad! Closed-loop control systems, which will be discussed shortly, can help reduce errors caused by disturbances.

Determining Performance Parameters: Using the model of the system created earlier, a designer can quickly determine various important performance parameters. Assume that the initial speed is v_0, the desired speed is r_0, and the disturbance is w_0; then we can develop an equation for v_t as follows:

v_1 = 0.7v_0 + 0.5P * r_0 - w_0
v_2 = 0.7 * (0.7v_0 + 0.5P * r_0 - w_0) + 0.5P * r_0 - w_0
v_2 = 0.7 * 0.7 * v_0 + (0.7 + 1.0) * 0.5P * r_0 - (0.7 + 1.0) * w_0
v_t = 0.7^t * v_0 + (0.7^{t-1} + 0.7^{t-2} + ... + 0.7 + 1.0) * (0.5P * r_0 - w_0)

The last equation shows three important points. First, in the model v_{t+1} = 0.7v_t + 0.5u_t - w_t, let's refer to the coefficient of v_t as a; in this case a = 0.7. Looking at the last equation, we see that a determines the rate of decay of the effect of the initial speed. In other words, a bigger a will result in the car taking longer to reach its desired speed. Notice that, in open-loop control, the controller gain P has no effect on this rate of decay. In closed-loop control, it will.

Also note that if |a| > 1, then v_t would grow without bound as time increased, since a is being raised to the power of t. Furthermore, note that a negative a will result in an oscillating speed. Again, in closed-loop control, we will be able to change a.

Second, the sensitivity of the speed to the disturbance is not altered by the open-loop controller.

Third, if our assumed model were not correct, then this model error would cause the steady-state speed that results from the open-loop controller u = P * r to not equal the desired speed.

A Second Example: A Closed-Loop Automobile Cruise Controller

We can reduce the speed error caused by disturbances, like grade or wind, by enabling the controller to detect speed errors and correct for them. To detect speed errors, we introduce a speed sensor into the system, as shown in Figure 9.2(b), to measure the car's speed. We also introduce a device that outputs the difference between the desired speed r_t and the actual speed v_t. This difference is the speed error e_t = r_t - v_t. Note that the penalties for this
250 Embedded System Design


Embedded System Design
251

www.compsciz.blogspot.in ·--------·---~. ·------------·---- ··--·-·~·-~- .. _ ,_j


I
·- -..

9 .2: Open-Loop and Closed-loop Control Syste


Chapter 9: Control systems

closed-loop approach are the cost of the sensor, added controller complexity, and the addition of sensor noise. The benefits will be the ability to change the rate of response, reduction of sensitivity to disturbances, and reduction of sensitivity to model error. If we select the form of the controller to be linear and proportional as before, namely u_t = P * (r_t - v_t), then:

v_{t+1} = 0.7v_t + 0.5u_t - w_t
v_{t+1} = 0.7v_t + 0.5P * (r_t - v_t) - w_t
v_{t+1} = (0.7 - 0.5P) * v_t + 0.5P * r_t - w_t

Note that the closed-loop controller results in a = 0.7 - 0.5P, and remember that a determines the rate of decay of the effect of the initial speed. Therefore, by choice of the parameter P, the control system designer can alter the rate of convergence of the closed-loop system. However, we cannot make P arbitrarily large, because if the designer selects a value of P such that |0.7 - 0.5P| > 1.0, then the speed will not converge to the commanded speed, but instead grow without bound. The constraint |0.7 - 0.5P| < 1.0 is necessary for the system to be stable. This stability constraint translates to the following:

0.7 - 0.5P < 1.0
0.7 - 0.5P > -1.0
-0.5P < 0.3
-0.5P > -1.7
P > -0.6
P < 3.4
so, -0.6 < P < 3.4

We could set P close to 3.4 to obtain the fastest decay of the initial condition. However, remember that a negative a will cause oscillation, which is something we'd usually like to avoid. To keep a positive, we need:

0.7 - 0.5P >= 0
-0.5P >= -0.7
P <= 1.4

So the fastest rate of convergence to steady state without oscillation, known as deadbeat control, occurs when P = 1.4.

[Figure 9.4: Closed-loop cruise controller trying to accelerate from 20 to 50 mph, ignoring disturbance, where v is car speed and u is throttle position: (a) invalid data when throttle saturation is ignored, (b) valid data for P = 3.3, (c) valid data for P = 1.0, (d) plot for P = 3.3 and P = 1.0.]


A control design goal is to achieve v equal to r in steady state, meaning v_{t+1} = v_t = v_ss = r. To analyze the steady-state response, we again assume the commanded speed and disturbance have the constant values r_0 and w_0. Substituting into the earlier system equation yields:

v_ss = (0.7 - 0.5P) * v_ss + 0.5P * r_0 - w_0
(1 - 0.7 + 0.5P) * v_ss = 0.5P * r_0 - w_0
v_ss = (0.5P / (0.3 + 0.5P)) * r_0 - (1.0 / (0.3 + 0.5P)) * w_0

From this equation, we see that we can reduce the effect of the disturbance w_0 by making the coefficient (1.0 / (0.3 + 0.5P)) less than 1, meaning that P > 1.4. But remember that P > 1.4 will cause oscillation!

Also, if we want v_ss to be approximately equal to r_0, then we need to select P as large as possible, so that the term multiplying r_0, namely, (0.5P / (0.3 + 0.5P)), is approximately equal to 1. There is no value of P in the range -0.6 < P < 3.4 for which this coefficient equals 1, so perfect steady-state tracking is not achievable by proportional control for this example. The best we can do to minimize steady-state error, therefore, is to set P reasonably close to 3.4. Note that the designer must select P to balance the trade-offs between the conflicting constraints of convergence rate, disturbance rejection, and steady-state tracking accuracy. To continue, assuming that steady-state tracking is of primary importance, let P = 3.3.

We have now designed our second controller. The controller sets its output, the throttle angle u_t, to 3.3 times its input, as follows:

u_t = 3.3 * (r_t - v_t)

This controller will result in oscillation, but that's the price we pay to achieve the smallest steady-state error. Notice that the input to our second controller is the speed error, in contrast to our first controller, whose input was the desired speed. Let's analyze how well this second controller achieves its input-tracking goal by "simulating" the system, namely, by iterating the closed-loop equation over the time range of interest (again, a spreadsheet helps with such simulation). Initially, assume an initial speed of 20 mph, a grade w of 0, and then a desired speed setting of 50 mph. Figure 9.4(a) shows the speed v_t and the throttle position u_t from time 0 to 50 seconds. Notice that the controller generates throttle-angle commands outside of the range of possible throttle positions of 0 to 45 degrees. Thus, the data in Figure 9.4(a) is not valid. Instead, we must treat any value less than 0 as 0, and greater than 45 as 45. The throttle is said to saturate at 0 and 45.

Figure 9.4(b) shows the speed and throttle position when we include this saturation in the model. Figure 9.4(d) shows the speed versus time graphically. Notice that, for P = 3.3 (a = 0.7 - 0.5P = -0.95), the speed oscillates for many seconds until it finally reaches a steady-state speed of 42.31 mph. Recall that a negative a causes such oscillation. Intuitively, this oscillation means the cruise controller is accelerating too hard when the current speed is less than the desired speed, thus overshooting the desired speed. Also notice that the steady-state speed is not 50 mph, but rather 42.31 mph, representing an error of about 8 mph. The simulated responses in Figure 9.4(b) and (d) thus confirm our earlier analysis of oscillation and steady-state error.

Since oscillation of the car speed could be uncomfortable to the car's passengers, we would like to reduce or eliminate the oscillation. We can reduce the oscillation by decreasing the constant P in the controller. For example, Figure 9.4(c) shows the speed and throttle positions for P = 1.0. Figure 9.4(d) shows the speed graphically. The result of the smaller P is that oscillation is eliminated and convergence time is reduced; however, the steady-state speed is only 31.25 mph, representing a large error of nearly 19 mph.

We have learned an important lesson of control, namely, that system-performance objectives, such as reducing oscillation, obtaining fast convergence, and reducing steady-state error, often compete with one another.

[Figure 9.5: The same closed-loop cruise controller, trying to accelerate from 20 to 50 mph, this time in the presence of a disturbance of a grade equal to: (a) +5%, (b) -5%, (c) graphical illustration of (a) and (b).]

Recall that a motivation for using closed-loop control was to reduce the speed error caused by disturbances like grade. Figure 9.5(a) and (b) show the effects of +5% and -5% grades, respectively, using P = 3.3 in the controller. Figure 9.5(c) shows the results for both situations graphically. Notice that steady-state errors of about 10 mph and 5 mph are not too much different than the 7 mph error with a 0% grade, but are much improved over the 17 mph and -16 mph errors that resulted from the open-loop controller in Figure 9.3(b) and (c). Thus, the goal of reducing the sensitivity to disturbances has been achieved, involving a trade-off of

having introduced additional steady-state error when there is no disturbance, and of having introduced oscillation.

To allow more control objectives to be satisfied with fewer trade-offs, the complexity of the controller will have to increase, as will be described subsequently.

9.3 General Control Systems and PID Controllers

Having seen the above examples, we can now discuss control systems more generally. This section discusses objectives of control design, modeling real physical systems, and the PID approach to controller design.

Control Objectives

The objective of control system design is to make a physical system behave in a useful fashion, in particular, by causing its output to track a desired reference input even in the presence of measurement noise, model error, and disturbances. Satisfaction of this objective can be evaluated through several metrics specified relative to a step change in the control system's input:

1. Stability: The main idea of stability is that all variables in the control system remain bounded. Preferably, the error variables, like desired output minus plant output, would converge to zero. Stability is of primary importance, since without stability, all of the other objectives are immaterial.
2. Performance: Assuming stability, performance describes how well the output tracks a change in the reference input. Performance has several aspects, illustrated in Figure 9.6.
   a) Rise time T_r is the time required for the response to change from 10% to 90% of the distance from the initial value to the final value, for the first time. Different percentages may be of interest in different applications.
   b) Peak time T_p is the time required to reach the first peak of the response.
   c) Overshoot M_p is the percentage amount by which the peak of the response exceeds the final value.
   d) Settling time T_s is the time required for the system to settle down to within 1% of the final value. A different percentage may be of interest in different applications.
3. Disturbance rejection: Disturbances are undesired effects on the system behavior caused by the environment. A designer cannot eliminate disturbances, but can reduce their impact on system behavior.
4. Robustness: The plant model is a simplification of a physical system, and is never perfect. Robustness requires that the stability and performance of the controlled system should not be significantly affected by the presence of model errors.

[Figure 9.6: Control system response performance metrics.]

Modeling Real Physical Systems

An essential prelude to control system design is accurate modeling of the behavior of the plant. The controller will be designed based on this plant model. If the plant model is inaccurate, then the controller will be controlling the wrong plant. There are two key features that real systems display that our earlier example did not consider.

The first feature of real physical systems is that they typically respond as continuous variables and as continuous functions of time. In the cruise-controller example we assumed that the car's speed would change exactly one second after a change in the throttle. Obviously, cars do not synchronize their reactions to discrete time intervals, but instead they are continuously reacting. Therefore, the plant dynamic model is usually a differential equation. There are methods for determining a discrete time model that is equivalent (only at the sampling instants) to the plant differential equation. Between the sampling instants, the discrete time model tells the designer nothing about the continuous time response. Therefore, the sampling period must be selected much smaller than the system reaction time, so that the system cannot change significantly between sampling instants. The 1 second sample time used in the earlier examples of this chapter is not meant to be realistic. See also the subsequent discussion of aliasing.

The second feature of real physical systems is that they are typically much more complex than any model we create. The model will not include all nonlinear effects, all system states, or all state interactions. For example, the response of the speed of a car to a change in throttle depends on spark advance, manifold pressure, engine speed, and additional variables. Therefore, any model is a simplified abstraction. Modeling and control design is an iterative process, where the model of the actual plant is improved at each iteration to include key features identified during the prior iteration. Then the controller is improved to properly address the improved model. Linear models usually suffice when the variables of the model have a small operating range.


[Figure 9.7: A better controller could be designed if it could predict the future. Both controllers have forced the output halfway to the desired value. But the controller for (a) should start to reduce the plant input, while the controller for (b) should have increased the input earlier. Derivative control seeks to satisfy this prediction goal.]

[Figure 9.8: PD step response, for (P, D) = (2.5, -0.35), (2.5, 0.0), and (3.3, -0.35).]

Controller Design
The earlier closed-loop example showed that increasing P caused the steady-state speed v_ss to better match the desired speed r, and to resist tracking error caused by disturbances. A controller that multiplies the tracking error by a constant is known as using proportional control. To summarize, when proportional control is applied to a first-order plant, the resulting closed-loop model is similar to our particular cruise-controller model of:

v_{t+1} = (0.7 - 0.5P) * v_t + 0.5P * r_t - w_t

Therefore, the controller parameter P affects transient response, steady-state tracking error, and disturbance rejection. However, we saw that adjusting P resulted in trade-offs among these control objectives. We could reduce oscillation and improve convergence, but at the expense of worse steady-state error, and vice versa.

PD Control: More degrees of freedom must be introduced into the controller design to allow greater flexibility in the optimization of the trade-offs involved in the closed-loop performance. We can achieve this by using a proportional plus derivative controller. In proportional plus derivative control (PD control), the form of the control law is:

u_t = P * e_t + D * (e_t - e_{t-1})

Here, e_t = r_t - v_t is the measured speed error, and e_t - e_{t-1} is the derivative of the error (meaning the change in error over time). P is the proportional constant, and D is the derivative constant.

Intuitively, the derivative term is being used to predict the future. Consider Figure 9.7. The two plots show two different responses. In (a), we as humans can see that the system output is approaching the reference input quickly, and so we should probably reduce the plant input to prevent overshoot. In (b), we can see that the output is increasing very slowly, so we probably should increase the plant input, and actually should have increased it earlier. We see these things because we predict the system's future behavior will be similar to its past behavior, a good assumption when dealing with physical systems. The derivative term, which looks at the difference in the output between two successive time instances, can be used to achieve similar prediction, and thus can cause the controller to react accordingly. In the language of control systems, this is referred to as adding lead.

PD control implies a more complex controller, since the controller must keep track of the error derivative. However, PD control will give us more flexibility in achieving our control objectives. We can see this by deriving the equation for the complete cruise-controller system using PD control, just as we did for the simpler P controller:

v_{t+1} = 0.7v_t + 0.5u_t - w_t
let u_t = P * e_t + D * (e_t - e_{t-1})
and e_t = r_t - v_t
v_{t+1} = 0.7v_t + 0.5 * (P * (r_t - v_t) + D * ((r_t - v_t) - (r_{t-1} - v_{t-1}))) - w_t
v_{t+1} = (0.7 - 0.5 * (P + D)) * v_t + 0.5D * v_{t-1} + 0.5 * (P + D) * r_t - 0.5D * r_{t-1} - w_t

When the reference input and disturbance are constant, the steady-state speed is again:

v_ss = (0.5P / (1 - 0.7 + 0.5P)) * r

This is the same as for proportional control, since in steady state the effect of the derivative term is zero.

The characteristics of convergence of the tracking error e to its steady-state value are determined by the roots of the polynomial z^2 - (0.7 - 0.5 * (P + D)) * z - 0.5D = 0, under the assumption that the magnitude of the roots (they may be complex) is less than 1. Therefore, adding the derivative term allows the transient response to be modified without affecting the steady-state tracking or disturbance rejection characteristics. Figure 9.8 plots step responses for various values of P and D. Note that the steady-state value of the response is affected by P, not D. The parameter D does significantly affect the character of the transient response, in other words, the rate of convergence and the oscillation. The dashed-dotted line should be compared with the response in Figure 9.4, for which P = 3.3 and for which we can treat D = 0. In summary, by building a slightly more complex controller, namely, a PD controller, which considers not just the error input but also the derivative of the error input, we can adjust the transient response and the steady-state error independently by adjusting D and P.

PI and PID Control: In proportional plus integral control (PI control), the form of the control law is:

u_t = P * e_t + I * (e_0 + e_1 + ... + e_t)

The integral term sums up the error over time. Let's consider this term intuitively. Look at Figure 9.4(d) again. Notice that both controllers achieve a steady-state value that is below the desired value of 50 mph. As humans, we can see that we should just increase the plant input again until this error goes to zero. In other words, as long as there's error, we shouldn't rest! The integral term achieves this goal: by summing the error over time, we ensure that the error will eventually go to zero, since otherwise the controller output would increase forever. In other words, with integral control, the steady state will not even exist unless e_ss = 0, since otherwise the integral term would be increasing. Therefore, if values of P and I are found such that the system is stable, then for a constant input the steady-state tracking error is zero.

We can combine proportional, integral, and derivative control as follows:

u_t = P * e_t + I * (e_0 + e_1 + ... + e_t) + D * ((e_1 - e_0) + (e_2 - e_1) + ... + (e_t - e_{t-1}))

Such a controller is known as a PID controller, and the design problem is then to select the PID gains to achieve the desired control objectives. Figure 9.9 plots step responses for three different values of PID. The main effect of varying I is that, as I is increased, the rate at which the response converges to its desired value increases; however, the I term does also affect the nature of the transient. If I is increased too much, then the response can become oscillatory or even unstable.

PID controllers are extremely common in embedded control systems. Several tools exist to help a designer choose the appropriate PID values for a given plant model. Off-the-shelf ICs with settable P, I, and D values, called PID controllers, are available to accomplish PID control.

[Figure 9.9: PID step response, for (P, I, D) = (2.5, 0.5, -0.35), (2.5, 0.25, -0.35), and (3.3, 0, -0.35).]

9.4 Software Coding of a PID Controller

A PID controller can be implemented quite easily in software. Consider writing a program in C to implement a PID controller. It might consist of a main function with the following loop:

void main()
{
    double sensor_value, actuator_value, reference_value;
    PID_DATA pid_data;
    PidInitialize(&pid_data);
    while (1) {
        sensor_value = SensorGetValue();
        reference_value = ReferenceGetValue();
        actuator_value =
            PidUpdate(&pid_data, sensor_value, reference_value);
        ActuatorSetValue(actuator_value);
    }
}

We create the main function to loop forever. During each iteration, we first read the plant output sensor, read the current desired reference input value, and pass this information to function PidUpdate. PidUpdate determines the value of the plant actuator, which we then use to set the actuator. Note that reading the sensor will typically involve an analog-to-digital converter, and setting the actuator will involve a digital-to-analog converter; the details of these functions are omitted.


Our PID_DATA data structure has the following form:

typedef struct PID_DATA
{
    double Pgain, Dgain, Igain;   // the three gain constants
    double sensor_value_previous; // used to find the derivative
    double error_sum;             // cumulative error
} PID_DATA;

So PID_DATA holds the three gain constants, which we assume are set in the PidInitialize function. It also holds the previous sensor value, which will be used for the derivative term. Finally, it holds the cumulative sum of error values, used for the integral term.

We can now define our PidUpdate function as follows:

double PidUpdate(PID_DATA *pid_data, double sensor_value,
                 double reference_value)
{
    double Pterm, Iterm, Dterm;
    double error, difference;
    error = reference_value - sensor_value;
    Pterm = pid_data->Pgain * error;  /* proportional term */
    pid_data->error_sum += error;     /* current + cumulative */
    /* the integral term */
    Iterm = pid_data->Igain * pid_data->error_sum;
    difference = pid_data->sensor_value_previous - sensor_value;
    /* update for next iteration */
    pid_data->sensor_value_previous = sensor_value;
    /* the derivative term */
    Dterm = pid_data->Dgain * difference;
    return (Pterm + Iterm + Dterm);
}

There are some modifications that are typically made to the basic code above to improve PID controller performance. For example, the error sum is typically constrained to stay within a particular range, to avoid having the variable reach its upper limit and hence roll over to 0. Also, the accumulation of the error is typically stopped both when the tracking error is large and during periods of actuator saturation.

9.5 PID Tuning

Until now, we have discussed controller design based on a model of the plant. P, I, and D values could therefore be determined through quantitative analysis. In many cases, however, quantitatively determining the P, I, and D values is not necessary. In particular, in cases where safety is not a concern, and the cost of using the plant is not a major concern either, we can select the PID values through a somewhat ad hoc tuning process. This has two advantages. First, our model of the plant may be too complex for us to work with quantitatively. Second, we may not even have a model of the plant, perhaps because we don't have the time or knowledge to create such a model. The tuning process we'll discuss has been shown to result in PID values that are reasonably close to the values that would have been obtained through quantitative analysis.

One tuning approach is to start by setting the P gain to some small value, and the D and I gains to 0. We then increase the D gain, usually starting about 100 times greater than P, until we see oscillation, at which point we reduce D by a factor of 2 to 4. At this point, the system will probably be responding slowly. Next, we begin increasing the P gain until we see oscillation or excessive overshoot, and then we reduce P by a factor of 2 to 4. Finally, we begin increasing the I gain, starting perhaps between 0.0001 and 0.01, and again backing off when we see oscillation or excessive overshoot. These three steps can be repeated until either satisfactory performance is achieved or performance cannot be further improved. There are many more detailed tuning approaches, but the one introduced here should give an idea of the basic approach.

9.6 Practical Issues Related to Computer-Based Control

Quantization and Overflow Effects

Quantization occurs when a signal or machine number must be altered to fit the constraints of the computer memory. For example, if the number 0.36 were to be stored as a 4-bit fraction, then it would have to be quantized to one of the following machine numbers: 0.75, 0.50, 0.25, 0.00, -0.25, -0.50, -0.75, -1.00. The closest machine number is 0.25, which would result in a quantization error of 0.11. Quantization occurs for two reasons.

First, machine arithmetic can generate results requiring more precision than the original values. A simple example is the product of two 4-bit machine numbers:

0.50 * 0.25 = 0.125

This product cannot be stored as a signed 4-bit machine number. To limit the effects of quantization due to machine arithmetic, many digital processors will store intermediate results with higher precision than the final result. In such applications, arithmetic quantization only occurs when the final result of an operation is stored in a memory location. It is up to the designer to optimize the software implementation to take full advantage of such processor design features.

Second, the analog signals available from the sensors are real valued. These analog signals are quantized into machine numbers through the analog-to-digital conversion process. Accuracy and expense increase as the number of bits in the digital representation increases.

Overflow results when a machine operation outputs a number with magnitude too large to be represented by the numeric processor. In the 4-bit example above, 0.75 + 0.50 = 1.25 is too


[Figure 9.10: Illustration of aliasing. The sampling frequency is 2.5 Hz.]

large to be represented as a machine number. The results of overflow are dependent on the method that the numeric processor uses to implement the binary representation of machine numbers.

The designer has a few design choices to address the effects of quantization and overflow. The first is fixed versus floating-point representation of machine numbers. Fixed-point implementations are less expensive from the hardware standpoint, but require the designer to carefully analyze and design the system to properly address quantization and overflow effects. The hardware for floating-point implementations is typically more expensive, but the floating-point implementation may result in a faster time to market.

Aliasing

Physical systems typically evolve in continuous time, but discrete time control signals operate on samples of the evolving process. These simple facts can lead to counterintuitive behavior if the sampling processes are not properly designed.

Figure 9.10 illustrates a process referred to as aliasing. The circles every 0.4 s represent the discrete time samples. The sampling frequency is 2.5 Hz. The actual signal is y(t) = 1.0 * sin(6πt), which is periodic with frequency 3.0 Hz. The figure shows that, based on the samples from the sampling process, the signal is indistinguishable from the signal z(t) = 1.0 * sin((2π)(0.5 + 2.5n) * t) for any integer n.

Aliasing is an artifact of the sampling process. When the sampling frequency is f_s, then based on the sampled record, the computer can only resolve frequencies in a range of f_s Hz. In most applications, the system is designed so that this range is [-f_N, f_N], where f_N = f_s / 2 is the Nyquist frequency. When the actual signal has frequency content above the Nyquist frequency, it will appear to the computer as being within [-f_N, f_N]. For example, in Figure 9.10, the actual signal had a frequency of 3 Hz, which is above the Nyquist frequency of f_N = 1.25 Hz for that example. The computer treats the measured signal as if it has a frequency of 3.0 - f_s = 0.5 Hz. Therefore, due to aliasing, the computer control system would be trying to compensate for a signal at the wrong frequency. The consequences of aliasing can be significant. For example, based on the sampling process above, the signal cos((2π / 0.4) * t) would be interpreted by the numeric processor as a unit-amplitude constant signal.

This section has not addressed the theory of the aliasing phenomenon (see one of the references on discrete time control or signal processing). Instead, the objective of this section is to ensure that the reader understands that this phenomenon exists. Aliasing places two constraints on the design. First, the designer must understand the application well enough to judiciously specify the sampling frequency. Too large of a sampling frequency will result in unnecessary increases in the cost of the final product. Too small of a sampling frequency will either result in aliasing effects that can be very difficult to debug or too low of a system bandwidth. Second, analog anti-aliasing filters must be designed and used at the interface of each analog signal to the analog-to-digital converter. The purpose of the anti-aliasing filter is to attenuate frequency content above f_N so that the effect of aliasing is negligible.

Computation Delay

Time lags are of critical importance in control systems. Intuitively, delay results in the control signal being applied later than desired. Obviously, too much delay will result in performance degradation. The effect of delay can be accurately analyzed. For the designer of embedded control systems there are two important conclusions. First, analyze at an early stage in the design process the hardware platform and processor speed relative to the phase lag that can be tolerated. Second, organize the software so that only the necessary computations occur between the time the sensor signals are sampled and the time that the control signal is output. Move all possible computations to outside of this time-critical path.

9.7 Benefits of Computer-Based Control Implementations

Control systems can be implemented by either continuous time (analog component) or discrete time (computer-based) approaches. Since most processes that we are interested in controlling evolve as continuous variables in continuous time, and computer-based control approaches add additional complications such as quantization, overflow, aliasing, and computation delay, it is important to consider briefly the benefits obtained through embedded computer control.

Repeatability, Reproducibility, and Stability

The analog components in a control system are affected by aging, temperature, and manufacturing tolerance effects. Alternatively, digital systems are inherently repeatable. If two processors are loaded with the same program and data, they will compute identical results. They are also more stable than analog implementations in the presence of aging.

Programmability

Programmability allows advanced features to be easily included in computer implementations that would be very complex in analog implementations. Examples of such advanced features include: control mode and gain switching, on-line performance evaluation, data storage, performance parameter estimation, and adaptive behavior. In addition to being programmable, computer-based control systems are easily reprogrammable. Therefore, it is straightforward to periodically upgrade and enhance the system characteristics.

9.8 Summary

This chapter introduced control systems. A control system has several components and signals, including the actuator, controller, plant, sensor, output, reference input, and disturbance. We developed increasingly complex controllers, specifically a proportional open-loop controller, a proportional closed-loop controller, a proportional-derivative (PD) closed-loop controller, and a proportional-integral-derivative (PID) closed-loop controller. There are numerous control objectives, including stability, and performance objectives such as rise time, peak time, overshoot, and settling time. These objectives may compete with one another. The more complex controllers help us achieve various objectives with less restrictive trade-offs between objectives. Several additional issues must be considered when using computers to implement a controller, including quantization and overflow effects, aliasing, and computation delay.

9.9 References and Further Reading

• Åström, Karl J. and Björn Wittenmark. Computer Controlled Systems: Theory and Design, Englewood Cliffs, NJ: Prentice-Hall, 1984.
• Franklin, Gene F., J. David Powell, and Abbas Emami-Naeini. Feedback Control of Dynamic Systems, 3rd Ed., Reading, MA: Addison-Wesley, 1994.
• Marven, Craig and Gillian Ewers. A Simple Approach to Digital Signal Processing, Texas Instruments, 1994.
• Wescott, Tim. PID without a PhD, Embedded Systems Programming, Vol. 13, No. 11, October 2000.

9.10 Exercises

9.1 Explain the difference between open-looped and closed-looped control systems. Why are we more concerned with closed-looped systems?
9.2 List and describe the eight parts of the closed-loop system. Give a real-life example of each (other than those mentioned in the book).
9.3 Using a spreadsheet program, create a simulation of the cruise-control systems given in this chapter, using PI control only. Show simulations (graphs and data tables) for the following P and I values. Remember to include throttle saturation in your equations. You can ignore the disturbance. (a) P = 3.3, I = 0. (b) P = 4.0, I = 0. (How do the results differ from part (a)? Explain!) (c) P = 3.3, I = X. (d) P = 3.3, I = Y. Choose X and Y to achieve a meaningful trade-off. Explain that trade-off.
9.4 Write a generic PID controller in C.

CHAPTER 10: IC Technology


10.1 Introduction
10.2 Full-Custom (VLSI) IC Technology
10.3 Semi-Custom (ASIC) IC Technology
10.4 Programmable Logic Device (PLD) IC Technology
10.5 Summary
10.6 References and Further Reading
10.7 Exercises

10.1 Introduction
In Chapter 1, we introduced the idea that embedded system design includes the use of three
classes of technologies: processor technology, IC technology and design technology. Chapters
2-7 have focused mostly on processor technology, since one should understand how to build a
processing system first, before learning what IC technologies are available to implement such
a system, and before learning what design technologies are available to help build the system
more rapidly. In this chapter, we provide an overview of three key IC technologies.
Several earlier chapters focused on an embedded system's structure. A system's
structural representation describes the numbers and types of processors, memories, and buses,
with which we implement the system's functionality. In this chapter, we focus on mapping
that structure to a physical implementation. A system's physical implementation describes the
mapping of the structure to actual chips, known as integrated circuits (ICs). A given structure
can be mapped to one of several alternative physical implementations, each representing
different design trade-offs. In fact, different parts of a structure may be mapped to different
physical implementations. We might think of the structural representation as a food menu for a banquet meal, and the physical implementation as the meal itself. A wedding banquet might
call for a menu of chicken and vegetables, whereas a sports team banquet might call for
spaghetti. Thus, we see trade-offs made in choosing the structure. Each meal itself can be
prepared in different ways (e.g., the vegetables could be fresh or frozen). Thus, we see further
trade-offs in choosing the physical implementation.

Figure 10.1: (a) a CMOS transistor (nMOS), (b) top-down view. (A high voltage on the gate attracts electrons, turning the channel between source and drain into a conductor.)

Figure 10.2: Depicting circuits in silicon: (a) a NAND circuit schematic, (b) layers, (c) top-down view of the NAND circuit on an IC.
We will consider three major categories of physical implementations, or IC technologies: full-custom, semi-custom, and programmable. We should mention that the term "technology" in the context of ICs is often used to instead refer to a particular manufacturing process technology, describing the type and generation of manufacturing equipment being used to build the IC; for example, a chip may be manufactured using a CMOS 0.3-micron process technology. Our use of the term IC technology here refers instead to different categories of ICs; each category can be implemented using any manufacturing process.

One should recall from Chapter 1 that processor technologies and IC technologies are independent of one another. Any type of processor can be mapped to any type of IC. Furthermore, a single IC may implement part of a processor, an entire processor, or as is commonly the case today, multiple processors.

Let us begin our discussion of IC technology by again examining a basic transistor. A simplified version of a complementary metal-oxide-semiconductor (CMOS) transistor is shown in Figure 10.1(a). It consists of three terminals: the source, drain, and gate. The source and drain regions lie within the silicon itself, created by implanting ions into those regions. The gate, made from polysilicon, sits between the source and drain but above the silicon, separated from the silicon by a thin layer of insulator, silicon dioxide. The voltage at the gate controls whether current can flow between the source and the drain, while the insulator prevents current from flowing through the gate itself. For an nMOS transistor, if a high enough voltage is applied to the gate, electrons are attracted from throughout the silicon substrate into the channel between the source and the drain, creating a field that allows current conduction between source and drain. On the other hand, if 0 V is applied to the gate, then the channel cannot conduct.

Notice that the transistor has three layers. The source and drain regions lie within the silicon substrate; these regions are known as p-diffusion or n-diffusion, depending on whether we are building an nMOS or pMOS transistor. The silicon dioxide insulating layer lies on top of the substrate, and is typically referred to as oxide. The gate region lies on top of the silicon dioxide, and is made from a substance known as polysilicon.

When drawing tens or hundreds of transistors, the three-dimensional view of Figure 10.1(a) quickly becomes cumbersome to create and is really unnecessary. Instead, we can use a top-down two-dimensional view, wherein we first assign a unique pattern to represent each layer. Thus, the transistor of Figure 10.1(a) could be represented using the top-down view shown in Figure 10.1(b). The oxide layer is implicit, since it must always exist below the polysilicon.

Transistors are not very useful unless they are connected with one another, and so we'll need to introduce at least two layers of metal, which we'll call metal1 and metal2, to serve as connections. These layers will need to be insulated from each other and from the polysilicon, requiring two more oxide layers. Figure 10.2(b) depicts the ordering of the various layers we've introduced so far. This figure only depicts the ordering of layers, and doesn't show the connections that must also exist between higher and lower layers.

Note that we always need at least two layers of metal, since otherwise we will be unable to implement all but the most trivial of circuits. Think of trying to build a system of freeways without being allowed to build any bridges and without being able to cross roads going to different places, and you'll understand why at least two metal levels are necessary. Manufacturing processes that use even more than two levels of metal are common.

Now that we have layers for representing transistors and their connections, we can build a simple circuit on an IC. Suppose that we want to build the simple NAND circuit that was introduced in Chapter 2, and is redrawn in Figure 10.2(a) for convenience, consisting of two nMOS and two pMOS transistors. We'll use a top-down view, and use the patterns shown in Figure 10.2(b) for each layer. Figure 10.2(c) shows the top-down view of the NAND circuit, with black representing metal1. Take some time to see if you can see the correspondence between (a) and (c).

Let's consider how a circuit on an IC is actually manufactured. We'll begin by revisiting the simple transistor of Figure 10.1 and considering how this transistor would actually be manufactured. Since the transistor consists of three layers, we might mistakenly assume that we could manufacture this transistor in three steps. In such an idealized manufacturing
process, we would first inject into the substrate the ions necessary to create the source and drain regions. Second, we would lay the silicon dioxide over the channel. Third, we would place the gate's polysilicon on top of the silicon dioxide.

Unfortunately, IC manufacturing is not quite so simple. Many steps are necessary to create each layer. For example, in a common manufacturing process, creating the silicon dioxide layer under the gate actually consists of several steps. First, we grow silicon dioxide on top of the entire IC by exposing the IC to extreme heat and gas, akin to growing rust on metal. Second, we cover the silicon dioxide with a substance called photoresist, which becomes soluble when exposed to ultraviolet light. Third, we pass ultraviolet light through a mask, which is designed to cast a shadow on the photoresist wherever we want silicon dioxide to stay; the remaining photoresist will be exposed to light and thus become soluble. This process is called photolithography. Fourth, we wash away the soluble photoresist with a solvent, thus exposing regions of silicon dioxide. Fifth, we etch away the exposed silicon dioxide with chemicals. Sixth, we remove the remaining photoresist to expose the regions of silicon dioxide that we wanted in the first place.

A similar process is repeated for each layer of the IC. An IC may have about 20 layers. So we see that there may be hundreds of steps, involving hundreds of masks, required to manufacture an IC.

Fortunately, embedded system designers need not worry too much about the details of the IC manufacturing process. Instead, they may only have to provide the input to the IC manufacturing process, which is a layout. A layout specifies the placement of every transistor and every wire connecting those transistors on the desired IC. The top-down view that we used for depicting a circuit on an IC was in fact a layout. A layout is akin to a map showing the placement of every city and every highway connecting those cities. From a layout, an IC manufacturer can derive the appropriate set of masks and thus manufacture the IC.

Figure 10.3: IC manufacturing steps.

Figure 10.3 illustrates IC manufacturing steps. During the design phase, a designer creates a structural design and then generates a layout for that design. The design phase may take many months. Once fully satisfied, the designer provides the layout to an IC manufacturer, also known as a fabrication plant, "fab," or foundry. Because the layout is often provided to the manufacturer on a magnetic tape commonly used for storing large quantities of digital data, providing the layout to a manufacturer is commonly referred to as "tape-out." Because part of the manufacturing process involves spinning the silicon, generating ICs is also referred to as a "silicon spin."

IC manufacturing consists of several main steps. The first step is to create a set of masks corresponding to the layout. Hundreds of masks may be required. The second step is to use each of these masks to create the various layers on the silicon surface, consisting of several substeps per mask. We point out that this layering process doesn't just create a single IC, but rather numerous ICs at once. The reason is that ICs are built on a silicon wafer. A silicon wafer is a thin polished circle, sliced from a cylinder of silicon, like a pepperoni slice intended for a pizza is sliced from a cylinder of sausage. A silicon wafer may be tens of centimeters in diameter, whereas an IC is usually less than one centimeter on a side, thus meaning that a wafer can hold tens of ICs (perhaps 100). Thus, the masks actually contain tens of identical regions, so that tens of ICs are being created simultaneously on a silicon wafer, as shown in the figure. Think of this the next time that you watch a movie where everyone is trying to get their hands on some "prototype chip" that must be found lest the world be destroyed; if there's one chip, there's probably 50 or 100 more that were made on the same wafer, lying around somewhere. (Never mind. Just enjoy the movie.)

The third step is to test the ICs on the wafer. ICs determined to be bad are marked, literally, so that they will be thrown away later. The machines that perform such testing are known appropriately as testers. They use probes that contact the pads, or input and output ports, of a particular IC on the wafer. They then apply streams of input sequences and look for the appropriate output sequences. These testers are very expensive devices, and their cost per IC pin has actually increased. Unfortunately, with all the steps required to build an IC and because of the extremely small sizes of the transistors and wires involved, bad ICs are quite common. Yield is a measure of the percentage of good ICs versus bad ICs containing errors. Finally, the last step is to cut out each IC and mount the good ones in an IC package, which of course gets tested again.

Now that we have a better idea of how ICs implement circuits and how ICs are manufactured, we can survey the three main IC technologies: full-custom, semi-custom, and programmable logic device IC technology. Figure 10.4 provides an overview of the designer's tasks for each of the technologies. Full-custom provides the best size and performance but is costly to design and manufacture, while programmable-logic devices involve the simplest design process at the expense of size and performance. Semi-custom represents a compromise between these two extremes.

10.2 Full-Custom (VLSI) IC Technology

In a full-custom IC technology, the designer creates the complete layout, a task often called physical design or VLSI design (where VLSI stands for very large scale integrated circuit). The designer must design or obtain a transistor-level circuit for every processor and memory. After this point, there are several key physical design tasks necessary to obtain a good layout:

f.mbedded System Design


272 273
Embedded.System 'Design
www.compsciz.blogspot.in
Figure 10.4: The three IC technologies. (The original figure charts the designer's steps for programmable logic devices, semi-custom gate arrays, semi-custom standard cells, and full-custom ICs, from the starting design task through mask creation or chip programming to tested ICs.)

Figure 10.5: A more compact NAND circuit: (a) NAND circuit schematic, (b) compacted layout.

• Placement: the task of placing and orienting every transistor somewhere on the IC.
• Routing: the task of running wires between the transistors, without intersecting other wires or transistors.
• Sizing: the task of deciding how big each wire and transistor will be. Larger wires and transistors provide better performance but consume more power and require more silicon area.

A good layout is typically defined by characteristics like speed and size. Speed is the longest path from input to output, or from register to register, typically measured in nanoseconds. Size is the total silicon area necessary to implement the complete circuit. Both of these features are usually improved when the circuit is highly compacted, namely, when transistors that are connected are placed close together and hence their connecting wires are shorter. Consider for example the NAND layout of Figure 10.2(c). In that example, we did not pay attention to creating a compact layout. Figure 10.5(b) shows a compacted version of the NAND circuit. Notice how much less area is wasted in this compacted version. However, such compaction must obey certain design rules. For example, two transistors must be spaced apart a minimum distance lest they electrically interfere with one another.

In the past, many transistor circuits were converted by hand into compact layouts. Such circuit design was a common job. However, ICs can now hold so many transistors, numbering in the hundreds of millions, that laying out complete ICs by hand would require an absurd amount of time. Thus, hand layout is usually used only for relatively small, critical components, like the ALU of a microprocessor, or for basic components like logic gates that will be heavily reused.

Instead of hand layout, most layout today is done using automated layout tools, known as physical design tools. These tools typically include powerful optimization algorithms that run for hours or days seeking to improve the speed and size of a layout.

The advantages of full-custom IC technology include its excellent efficiency with respect to power, performance, and size. Interconnected transistors can be placed near each other and thus be connected by very short wires, yielding good performance and power. Furthermore,
only those transistors necessary for the circuit being designed appear on the IC, resulting in no wasted area due to unused transistors.

The main disadvantages of full-custom IC technology are its high NRE cost and long time-to-market. These disadvantages stem from having to design a complete layout, which even with the aid of tools can be time-consuming and error-prone. Furthermore, masks for every IC layer must be created, increasing NRE cost and delaying time-to-market. In addition, errors discovered after manufacturing the IC are common, often requiring several respins.

10.3 Semi-Custom (ASIC) IC Technology

As mentioned above, creating a full-custom layout can be quite challenging. A designer using a semi-custom IC technology has this burden partially relieved, since rather than creating a full-custom layout, the designer connects pre-layed-out building blocks. The common name for such a semi-custom IC is an application specific integrated circuit (ASIC). The term application specific was likely chosen to contrast with general-purpose processor ICs, since for many years a processor was implemented as its own IC. ASICs in contrast implemented a circuit specific to a particular application (i.e., a single-purpose processor). Today, however, a single ASIC may implement a combination of general-purpose and single-purpose processors. Needless to say, there is much confusion related to use of the term ASIC today. Thus, we prefer the term semi-custom IC.

The two main types of semi-custom IC technologies are gate array and standard cell. With either type, the main advantages versus full-custom are reduced NRE cost and faster time-to-market, since less layout and mask creation must be performed. The main disadvantage is reduced performance, power, and size efficiency. However, relative to programmable IC technology (yet to be discussed), semi-custom is extremely efficient in terms of performance, power, and size. Because of its good efficiency coupled with reduced NRE costs, semi-custom is the most popular IC technology today.

Gate Array Semi-Custom IC Technology

In a gate array IC technology, all of the logic gates of the IC have already been layed out with their placement on the IC known, leaving the designer with the task of connecting the gates (routing) in a manner implementing the desired circuit. Note that gate here refers to a logic gate (e.g., AND, OR) rather than a terminal of a CMOS transistor. A simplified gate array layout is shown in Figure 10.6(a).

Because the IC's gates are placed beforehand, many of them may go unused, since we may not need all instances of each type of gate in our particular circuit. Furthermore, routing wires between gates may be quite long, since the gate placement was decided before knowing what connections would be made.

Figure 10.6: Semi-custom IC technology: (a) gate array, (b) standard cell.

Standard Cell Semi-Custom IC Technology

In standard cell IC technology, common logic functions, or cells, have already been compactly layed out. Examples of cells include a NAND gate, a NOR gate, a 2 x 1 multiplexor, and a combination of AND-OR-INVERT gates. The transistors within a cell are already layed out, but the placement of cells has not been determined. A designer thus must decide which cells to use, where to place them, and how to route among them. A standard cell layout is shown in Figure 10.6(b).

Standard cell therefore requires more NRE cost and longer time-to-market than gate array, since there is more layout remaining to be performed and all masks must still be made. However, NRE and time-to-market is still much less than full-custom, since the intricate layout within each cell is already completed. In addition, efficiency is very good compared to gate array, since only those cells needed are actually used, and their placement can be made so as to reduce interconnect. Furthermore, each cell may implement more complex functions than in gate arrays, leading to more compact designs.

A compromise between gate array and standard cell semi-custom ICs is known as a cell array, or cell-based array. A cell array is pretty much what we'd expect it to be based on its name. Cells, which you'll remember are more complex than gates, have already been layed out, and have also already been placed. Thus, the designer need only connect the cells together.

10.4 Programmable Logic Device (PLD) IC Technology

The time required to manufacture an IC is measured in months, typically two to three months. While we may accept this time once we are ready to manufacture our final system, we probably can't wait so long to obtain a prototype of our system. Furthermore, the NRE cost to manufacture an IC (i.e., creating a layout and masks) may be too expensive to amortize over

the number of ICs we plan to manufacture if that number is small. In addition, manufacturing an IC is risky, since we may discover after such manufacturing that an IC doesn't work properly in its target system, either due to manufacturing problems or due to an incorrect initial design. Thus, we never know how many respins will be necessary before we get a working IC; a recent study stated that the industry average was 3.5 spins. Therefore, we would like an IC technology that allows us to implement our system's structure on an IC, but that doesn't require us to manufacture that IC. Instead, we want an IC that we can program in the field, with the field being our lab or office. The term program here does not refer to writing software that executes on a microprocessor, but rather to configuring logic circuits and interconnection switches to implement a desired structural circuit.

Programmable logic device (PLD) technology satisfies this goal. A PLD is a pre-manufactured IC that we can purchase and then configure to implement our desired circuit.

An early example of a PLD was a programmable logic array (PLA), introduced in the early 1970s. A PLA was a small PLD with two levels of logic, a programmable AND array and a programmable OR array. Every PLA input and its complement was connected to every AND gate. So if a PLA had 10 inputs, every AND gate had 20 inputs. Any of these connections could be broken, meaning that each AND gate could generate any product term. Likewise, each OR gate could generate any sum of AND gate outputs. A PAL (programmable array logic) is another PLD type that eliminates the programmability of the OR array to reduce size and delay. PLAs and PALs are often referred to as simple PLDs, or SPLDs.

As IC capacity grew over the years, SPLDs could not simply be extended by adding more inputs, since the number of required connections to the AND array inputs would grow too high. Thus, the new capacity was taken advantage of instead by integrating numerous SPLDs on a single chip and adding programmable interconnect between them, resulting in what is known as a complex PLD, or CPLD. CPLDs often contain latches to enable implementation of sequential circuits also. Figure 10.7 illustrates a sample architecture for a CPLD. The top half of the figure is an SPLD that can implement any function of the chip's input signals as well as any SPLD output signal. The bottom half represents another identical SPLD. The array on the left consists of vertical lines that can be programmed to connect with any of the horizontal lines, so that any signal's true or complemented value can be fed into any gate. The output of each SPLD feeds into an IO cell. The IO cell can be programmed to pass the latched or unlatched, true or complemented, output to the CPLD's external output and/or to the programmable array on the left as input to SPLDs.

Figure 10.7: A CPLD architecture.

While able to implement more complex circuits than SPLDs, CPLDs suffer from the problem of not scaling well as their sizes increase. For example, suppose the CPLD architecture of Figure 10.7 had 4 inputs and 2 outputs. Then there would be 6 signals in the programmable array, plus 6 more for those signals' complements, thus requiring 12-input AND gates. Likewise, suppose there were 12 inputs and 6 outputs. Then there would be 18 + 18 signals, requiring 36-input AND gates. Notice such an architecture doesn't scale well.

The logical solution is to build devices that are more modular in nature. In particular, there is no need to connect every input signal and every output signal to every AND gate. A more flexible approach can be used in which a subset of inputs and outputs are input to each SPLD. This more modular, more scalable approach to PLD design resulted in architectures known as field-programmable gate arrays (FPGAs). An FPGA consists of arrays of programmable logic blocks connected by programmable interconnect blocks. The name FPGA was intended to contrast these devices with traditional gate arrays, which need masks to create the interconnections between the already layed-out gates. FPGAs in contrast have their interconnections as well as logic blocks programmed in the field, meaning in the designer's lab. However, FPGA architectures do not have arrays of gates anywhere to be found, and thus the name can be somewhat misleading.


Programming is done by setting bits within the logic or interconnect blocks. Those bits are stored using nonvolatile (EPROM, EEPROM) or volatile (SRAM) memory technology. Another nonvolatile technology used in PLDs is an antifuse, which, as the name implies, behaves opposite to a fuse: an antifuse is originally an open circuit but takes on low resistance when programmed.

10.5 Summary

Creating an IC circuit layout and manufacturing ICs from this layout are complex, expensive, and time-consuming processes. Embedded system designers can choose from different IC technologies in order to reduce IC cost and time-to-market by trading off with other design metrics, like size, performance, and power. Full-custom IC technology is the most expensive technology, in terms of NRE cost and time-to-market, but yields the most efficient circuits. Semi-custom technologies, or ASICs, involve use of predesigned basic components, thus reducing NRE cost and time-to-market but still providing good efficiency. PLDs come premanufactured and thus eliminate the need for the designer to wait through any manufacturing stage, greatly reducing NRE cost and time-to-market, but are significantly inferior to custom or semi-custom ICs in terms of size, power, performance, and unit cost. The designer may choose to use PLDs early in the design process, switching to ASICs and even full-custom ICs later in the process when the design has stabilized.

10.6 References and Further Reading

Smith, M.J.S. Application-Specific Integrated Circuits. Reading, MA: Addison-Wesley, 1997.

10.7 Exercises

10.1 Using the NAND gate (shown in Figure 10.1) as a building block, (a) draw the transistor-level circuit schematic for the function F = xz + yz', and (b) draw the top-down view of the circuit on an IC (make your layout as compact as possible).
10.2 Draw (a) the transistor-level circuit schematic for a two-input multiplexor, and (b) the top-down view of the circuit on an IC (make your layout as compact as possible).
10.3 Implement the function F = xz + yz' using the gate array structure given in Figure 10.6(a).

CHAPTER 11: Design Technology

11.1 Introduction
11.2 Automation: Synthesis
11.3 Verification: Hardware/Software Co-Simulation
11.4 Reuse: Intellectual Property Cores
11.5 Design Process Models
11.6 Summary
11.7 Book Summary
11.8 References and Further Reading
11.9 Exercises

11.1 Introduction

We have described how to design embedded systems from processors, memories, and interfaces, and in Chapter 10 we described the various IC technologies available for implementing such systems. Recall that in Chapter 1, we pointed out that IC transistor capacity is growing faster than the ability of designers to produce transistors in their designs. This difference in growth rate has resulted in the well-known productivity gap. Thus, there has been tremendous interest over the past few decades in developing design technologies that will enable designers to produce transistors more rapidly. These technologies have been developed for both software and for hardware, but the recent developments in hardware design technology deserve special attention since they've brought us to a new era in embedded system design.
Design is the task of defining a system's functionality and converting that functionality into a physical implementation, while satisfying certain constrained design metrics and optimizing other design metrics. Design is hard. Just getting the functionality right is tricky because embedded system functionality can be very complex, with millions of possible environment scenarios that must be responded to properly. For example, consider an elevator controller, and in particular the many possible combinations of buttons being pressed, the elevator moving, the doors being open, and so on. Not only is getting the functionality right

Figure 11.1: Productivity improvers (automation, verification, and reuse, bridging specification to implementation).

hard, but creating a physical implementation that satisfies constraints is also very difficult because there are so many competing, tightly constrained metrics.
These difficulties slow designer productivity. Embedded system designer productivity can be measured by software lines of code produced per month or hardware transistors produced per month. Productivity numbers are surprisingly low, with some studies showing just tens of lines of code or just hundreds of transistors produced per designer-day. In response to low production rates, the design community has focused much effort and resources on developing design technologies that improve productivity. We can classify many of those technologies into three general techniques, illustrated in Figure 11.1:
1. Automation is the task of using a computer program to replace manual design effort.
2. Reuse is the process of using predesigned components (whether designed by humans or computers) rather than designing those components oneself.
3. Verification is the task of ensuring the correctness and completeness of each design step.

Figure 11.2: The codesign ladder.
Providing thorough coverage of the advances in these productivity-improving techniques for embedded systems over the past couple of decades would require an entire book itself. Instead, we will focus in this chapter on a few advances that have enabled the unified view of hardware and software design. First, we will discuss the automation technique of synthesis, which has made hardware design look like software design. Second, we will discuss the reuse of cores in the hardware domain, which has enabled the coexistence of general-purpose processors (software) and single-purpose processors (hardware) on a single IC. Third, we will describe the verification technique of hardware/software co-simulation, which has enabled designers to verify complete hardware/software systems before they are implemented.

11.2 Automation: Synthesis

"Going up": The Parallel Evolution of Compilation and Synthesis

When processors were first being designed in the late 1940s and early 1950s, designing a computer system consisted mostly of hardware design; software, if it was used, was fairly simple. However, as the idea of the general-purpose processor began to take hold, software complexity began to grow. Because of the different techniques used to design software and hardware, a division between the fields of hardware design and software design occurred. As illustrated in Figure 11.2, design tools simultaneously evolved in both fields, albeit at different rates, to allow behavior description at progressively more abstract levels, in order to manage increasing design complexity. This simultaneous evolution has brought us to a point today where both fields use the sequential program model to describe behavior, thus a rejoining of the two fields into one field seems imminent.

As shown in Figure 11.2, early software consisted of machine instructions, coded as sequences of 0s and 1s, necessary to carry out the desired system behavior on a general-purpose processor. A collection of machine instructions was called a program. As program sizes grew from hundreds of instructions to thousands of instructions, the tediousness of dealing with 0s and 1s became evident, resulting in use of assemblers and linkers. These tools automatically translate assembly instructions, consisting of instructions written using letters and numbers to represent symbols, into equivalent machine instructions. Soon, the limitations of assembly instructions became evident for programs consisting of tens of thousands of instructions, resulting in the development of compilers.


Compilers automatically translate sequential programs, written in a high-level language like C, into equivalent assembly instructions. Compilers became quite popular starting in the 1960s, and their popularity has continued to grow. Tools like assemblers/linkers, and then compilers, helped software designers climb to higher abstraction levels.
Early hardware consisted of circuits of interconnected logic gates. As circuit sizes grew from thousands of gates to tens of thousands, the tediousness of dealing with gates became apparent, resulting in the development of logic synthesis tools. These tools automatically convert logic equations or finite-state machines into logic gates. As circuit sizes continued to grow, register-transfer (RT) synthesis tools evolved. These tools automatically convert FSMDs into FSMs, logic equations, and predesigned RT components like registers and adders. In the 1990s, behavioral synthesis tools started to appear, which convert sequential programs into FSMDs.
Therefore, we now see that, while for several decades the starting point for the fields of hardware design and software design consisted of very different design descriptions, today both fields can start from sequential programs.

Figure 11.3: The abstraction pyramid: (a) a model at a higher abstraction level has more potential implementations; (b) the design process proceeds to lower abstraction levels, narrowing in on a single implementation.
Why did the hardware design field take some 30 years longer to climb the abstraction ladder to the level of sequential programs? One reason is that hardware design involves many more design dimensions. While a compiler must generate assembly instructions to implement a sequential program on a given processor, a synthesis tool must actually design the processor itself. Extensive research and more powerful computers have enabled synthesis tools to address the problem adequately. A second reason is that the very fact that one chooses to implement behavior in hardware rather than software implies that one is extremely concerned about size, performance, power, and/or other design metrics. Therefore, optimization is crucial, and humans tend to be far better at multidimensional optimization than are computers, as long as the problem size is not too large and enough design time is available. Just look, for example, at how many decades it has taken for computers to be able to seriously challenge the world's best chess players. If the game of chess had evolved such that players only had 10 seconds to think of each move, and the playing board was the size of a football field with tens of thousands of pieces, then we'd have a situation more like that of IC design, in which automation today is clearly better.
We see above that, like an elevator going up, both hardware and software design fields have continued to focus design effort on increasingly higher abstraction levels. Starting design from a higher abstraction level has two advantages. First, descriptions at higher levels tend to be smaller and easier to capture. For example, one line of sequential program code might translate to one thousand logic gates. Second, as Figure 11.3(a) illustrates, a description at a higher abstraction level has many more possible implementations than those at lower levels. One can think of holding a flashlight higher above the ground: the higher we go, the more ground we illuminate. For example, a sequential program description may have possible implementations whose performance and transistor counts differ by orders of magnitude. However, a logic-level description may have transistor implementations varying in performance and size by only a factor of two or so.

Synthesis Levels

In the following sections, we provide brief overviews of the details of synthesis at different abstraction levels. Unlike compiler users, synthesis tool users must have a fair amount of knowledge about synthesis. Compilers tend to be fairly inexpensive and easy-to-use tools. Synthesis tools, on the other hand, range from costing hundreds of dollars to tens of thousands of dollars. The user must control perhaps hundreds of synthesis options. Furthermore, synthesis tools may take many hours to run, and their output occasionally needs to be modified. This complexity associated with synthesis stems from the fact that optimization is absolutely crucial when synthesizing hardware, and each user will have different optimization criteria. If optimization wasn't so crucial, one would simply implement one's system as software rather than as hardware.
We now provide a brief overview of the various levels of synthesis. A standard definition for synthesize is "forming a complex whole by combining parts." In the context of digital hardware design, however, the term has taken on the meaning of "automatically converting a system's behavioral description into a structural implementation," where that implementation is a complex whole formed by parts. The structural implementation must optimize some set of design metrics, such as performance, size, and power.
To better understand the meaning of converting from a behavioral description to a structural implementation, Gajski developed the Y-chart, shown in Figure 11.4. The chart consists of three axes, behavioral, structural, and physical, each representing a type of a description of a digital system, as follows:
• A behavioral description defines outputs as a function of inputs. It describes the algorithms we'll use to obtain those outputs, but does not say how we'll implement those algorithms.
• A structural description implements that behavior by connecting components with known behavior.


• A physical description tells us the sizes and locations on a chip or board of a system's components and their interconnecting wires.

Figure 11.4: Gajski's Y-chart (behavior: sequential programs, register transfers, logic equations/FSM, transfer functions; structure: processors/memories, registers/FUs/MUXs, gates/flip-flops, transistors; physical: boards, chips, modules, cell layout).

For example, addition is a behavior, while a carry-ripple adder is a structure. Likewise, a sequential program that sequences through an array to find the array's largest-valued element is a behavior, while a controller and datapath implementing that algorithm is a structure.
The chart also shows that each description can exist at one of various levels of abstraction. For example, at the gate level of abstraction, a behavioral description consists of logic equations, a structural description consists of a connection of gates, and a physical description consists of a placement of gates/cells and a routing among them. As another example, at the system level of abstraction, a behavioral description may consist of communicating sequential programs (processes), a structural description of a connection of processors and memories, and a physical description of a placement of processor/memory cores and buses on an IC or a board.
Synthesis can generally be thought of as converting a behavioral description at a particular abstraction level to a structural description. That structural description may be at the same level or a lower one, but not a higher one. We now describe synthesis techniques at several different abstraction levels.

Logic Synthesis

Logic synthesis automatically converts a logic-level behavior, consisting of logic equations and/or an FSM, into a structural implementation, consisting of connected gates. Let us divide logic synthesis into combinational-logic synthesis and FSM synthesis. Combinational-logic synthesis can be further subdivided into two-level minimization and multilevel minimization.
Two-level logic minimization: We can represent any logic function as a sum of products (or a product of sums). We can implement this function directly using a level consisting of AND gates, one for each product term, and a second level consisting of a single OR gate. Thus, we have two levels, plus inverters necessary to complement some inputs to the AND gates. The longest possible path from an input signal to an output signal passes through at most two gates, not counting inverters. We cannot in general obtain faster performance. For example, the function F = abc'd' + a'cd + ab'cd would be implemented with three AND gates followed by one OR gate, as shown in Figure 11.5(b).
Since performance is already the best possible, the main goal of two-level logic minimization is to minimize size. We can set a goal of minimizing the number of AND gates in a sum of products implementation. We can state this goal more formally as that of finding a minimum cover of a logic expression, or function. We will now provide several definitions that lead us to the definition of a minimum cover. We are given a set of variables (inputs to the function), such as: {a, b, c, d}.
• A literal is the appearance of a variable or its complement in a function. For example, the above function has 11 literals: a, b, c', d', a', c, d, a, b', c, d.
• A minterm is a product of literals in which each variable or its complement appears exactly once. For example, in the previous function, abc'd' is a minterm, but a'cd is not, because b does not appear. Any logic function can be expressed as a sum of minterms; note that each minterm corresponds to a row in a truth table. For example, F could be expressed as abc'd' + ab'cd + a'bcd + a'b'cd.
• An implicant is a product of literals in which each variable or its complement appears no more than once, rather than exactly once as for minterms. In the earlier function, ab'cd and a'cd are examples of implicants. An implicant covers one or more minterms; for example, a'cd covers minterms a'bcd and a'b'cd.
• A cover of a logic function is a set of implicants that covers all of the function's minterms.
• Finally, a minimum cover is a cover having the minimum possible number of implicants.
Since each implicant corresponds to an AND gate, by finding a minimum cover, we have achieved our goal of minimizing the number of AND gates.
We can extend our goal by not only minimizing the number of AND gates but also minimizing the number of inputs to each AND gate. We can state this goal formally as finding a minimum cover that is prime. A prime cover's implicants are all prime implicants.
A prime implicant of a logic function is an implicant that is not covered by any other implicant of the function. For example, in the earlier function, abc'd', b'cd, and a'cd are all prime implicants, but a'b'cd is not, because even though it is an implicant of the function, it is covered by a'cd, as well as by b'cd.

Figure 11.5: Logic minimization: (a) original function F = abc'd' + a'b'cd + a'bcd + ab'cd, (b) direct implementation (4 4-input AND gates and 1 4-input OR gate: 40 transistors), (c) Karnaugh map representation, (d) minimum cover and its (e) implementation (28 transistors), (f) minimum cover that is prime and its (g) implementation (26 transistors), (h) multilevel implementation using three levels but fewer transistors.

We can optimally solve the two-level logic minimization problem of finding a minimum cover that is prime. One popular pencil-and-paper approach uses Karnaugh maps (K-maps), illustrated in Figure 11.5(c) and (d). While we won't describe them in detail here, we note for those familiar with K-maps that the 1s in the chart correspond to minterms, and drawing the minimal number of maximum-sized circles covering the 1s corresponds to finding the minimum number of prime implicants, where each maximum-sized circle represents one prime implicant. Figure 11.5(d) illustrates a minimum cover, and Figure 11.5(e) illustrates the corresponding logic circuit. Note that this circuit has fewer gates and wires than the unoptimized circuit of Figure 11.5(b). Figure 11.5(f) and (g) show a minimum cover that is prime (all circles are maximum size) and the corresponding circuit, having a smaller gate and fewer wires than the minimum cover that was not prime. This circuit represents the optimum two-level circuit.
However, the K-map approach becomes too complicated for functions with more than 5 or 6 inputs. Another popular, computer-based, optimal approach uses an algorithm based on a two-step tabular method. The first step finds all of a function's prime implicants, and the second step finds a minimum number of these implicants that covers the function. Unfortunately, the first step requires that we first list all the minterms of a function, but there may be a prohibitively large number of minterms. In particular, if there are n inputs, there may be up to 2^n minterms. For a function with 8 inputs, there may be 256 minterms, which is reasonable. For a function with 32 inputs, there may be over 4 billion minterms, which would exceed the memory limits of most computers. Larger sized examples, which are extremely common, would require trillions of years on the fastest computers just to enumerate the minterms. This phenomenon is called exponential complexity, and it limits the tabular method to functions with relatively few inputs. Adding to the problem is the fact that the second step of finding a minimum cover also has exponential complexity.
Because solving the two-level logic minimization problem optimally is very hard for examples with even a moderate number of inputs, most logic synthesis tools include inexact approaches using heuristics. A heuristic is a solution technique that is not guaranteed to result in the optimal solution but hopefully will come close. A popular heuristic approach is iterative improvement. In this approach, we start with an initial solution, such as the original logic equation, and we repeatedly (iteratively) make modifications to that solution to bring us toward a better solution. For example, suppose we are given the function F = a'b'cde + a'bcde + defghij + klmnopqrstuv. We might try to improve this by trying to merge pairs of implicants into a single implicant that covers the pair. For example, we could merge the first two implicants into the single implicant a'cde, which is obviously an improvement. Note that we didn't have to enumerate all the minterms to find this improvement.
There are several common modifications used in heuristic two-level logic synthesis. Expand replaces each nonprime implicant by a prime implicant that covers it, deleting all other implicants covered by the new prime implicant. Reduce does basically the opposite of expand. Reshape expands one implicant while reducing another, thus maintaining the same total number of implicants. Irredundant selects a minimum number of implicants from the existing ones while still covering the function. Logic synthesis tools differ by which of these modifications they use and the order in which they apply these modifications.
Multilevel Logic Minimization: The previous paragraphs dealt with minimizing the AND gates and their sizes in a two-level sum-of-products implementation. We noted that a two-level implementation has excellent performance, with the longest path being only two gates. However, perhaps we don't need such great performance. Rather, perhaps we are willing to sacrifice some performance if such a sacrifice would decrease the circuit size further than even the best two-level implementation. We can achieve such a trade-off by using multiple levels of logic.
As a simple example, consider the function F = adef + bdef + cdef + gh. The function cannot be minimized further in two levels, and would require five gates (four AND gates and one OR gate) if implemented. However, we could easily reduce the number of gates by factoring out the def term from the first three implicants, resulting in F = (a + b + c)def + gh. This function requires only four gates (two AND gates and two OR gates). Furthermore, note that the number of inputs per gate is reduced too. If each gate input requires two transistors, then we've reduced the number of transistors from 18 * 2 = 36 down to 11 * 2 = 22. The trade-off is that this implementation has slower performance, since it now has three levels rather than two, due to inputs a, b, and c passing through three gates before reaching the output.

Figure 11.6: Trading off size and performance.

We illustrate this trade-off of size and performance in Figure 11.6. The filled gray area represents the set of all possible circuit implementations of a particular logic expression. The x-axis represents circuit size, and the y-axis represents circuit delay. Ideally, we'd like to minimize both, but generally no such circuit exists, as illustrated by the hypothetical point in the lower left of the figure with an X through it. Two-level logic has minimum delay, and thus two-level logic minimization seeks to find the smallest-sized two-level implementation, as illustrated. Further size reduction requires an increase in delay (i.e., more than two logic levels). Multilevel logic minimization seeks to find the Pareto-optimal solution (one on the lower-left curved boundary of the filled area) for a given delay or size.
As another example, consider the earlier two-level logic function: F = abc'd' + b'cd + a'cd. Simple algebraic manipulation yields the equivalent function: F = abc'd' + (a' + b')cd. We now have three levels, but fewer transistors. We can simplify even further by noting that abc'd' = ((abc'd')')' = (a' + b' + c + d)'. So now the entire function is: F = ((a' + b') + c + d)' + (a' + b')cd. So we can reuse the (a' + b') term to further reduce transistors down to only 20, as shown in Figure 11.5(h).
But how did we come up with this new function using fewer transistors? You can probably see that it is not easy. There are many different ways to manipulate the equations. Multilevel logic minimization is thus an even harder problem than two-level minimization. Therefore, heuristics are again used by logic synthesis tools addressing this problem. Iterative improvement heuristics drawing from a suite of equation modifications are again the prevailing approach.
FSM Synthesis: Synthesizing an FSM to gates consists of two main tasks: state minimization and state encoding. State minimization reduces the number of FSM states by identifying and merging equivalent states. Reducing the number of states may result in a smaller state register and fewer gates. Two states are equivalent if their outputs and next states are equivalent for all possible inputs. We can use an algorithm based on a tabular method to solve this problem exactly. We start with a table showing each possible pair of states as a cell in the table. We step through the cells, marking each cell as not equivalent, equivalent, or dependent on other pairs of states being equivalent, which we list in the cell. "Not equivalent" means the cell's two states either have different outputs or have a next state pair whose cell is marked as not equivalent. "Equivalent" means the cell's two states have the same outputs and next state pairs that are all known to be equivalent. We step through the cells several times until all cells are either marked equivalent or not equivalent.
The drawback of the above algorithm is that the table size is n^2, where n is the number of states in the original FSM. Although n^2 is not nearly as bad as 2^n, it still grows quickly for larger n, requiring much computer memory and computations. An example with perhaps 500 states would require a table of size 250,000. Thus, many tools resort to heuristics.
State encoding encodes each state as a unique bit sequence, such that some design metric like size is optimized. Given n states, we require a minimum of log2(n) bits to represent n unique encodings. There are n! possible assignments of n states to n encodings (the first state has n possible encodings, the second state has n - 1 since the first state already used one encoding, the third state has n - 2, and so on). We can't possibly try all possible assignments of states to encodings for moderate size examples, because n! grows so quickly. Heuristics are again common.
Technology Mapping: We must specify the library of gates available for use in an implementation. For example, as a trivial extreme, we may have only simple two-input AND and OR gates available in our library. At the other extreme, we may have numerous sizes of AND, OR, NAND, NOR, XOR, and XNOR gates, plus efficiently implemented meta-gates (called cells or macros) such as multiplexors, decoders, and combinations of gates (like AND-OR-INVERT). Thus, logic synthesis must generate a final structure consisting of only the available library components, and should use cells and macros as much as possible to improve the overall design efficiency. This task is called technology mapping. Technology mapping is again a complex problem, requiring use of heuristics. Furthermore, a tool that integrates technology mapping with logic minimization, while making the synthesis problem harder, will likely result in a more efficient circuit.

complexity exists to optimally solve those problems. Therefore, most tools resort to heuristics·
having a far lower complexity in order to solve the problems using a reasonable amount of
memory and computation time.
The impact of complexity and of the use of heuristics on logic synthesis users is significant. Logic synthesis tools differ tremendously according to the heuristics they use. Some tools use computationally expensive heuristics, thus requiring long run times measured in hours or even days, and requiring huge amounts of memory typically found only on expensive computer servers or engineering workstations. In contrast, other tools use fast heuristics, requiring run times measured in minutes and requiring only small amounts of memory typically found on PCs. Users should not expect the same quality of results from these different tools. Furthermore, the tools with expensive heuristics usually allow a user to control the optimization effort that the tool will apply. When just trying to develop prototypes, the user should therefore select low optimization to get fast synthesis run times, while the near-final product should use high optimization. Additionally, these tools may allow the designer to indicate the relative importance of various design metrics, like performance, size, and power, in which case the designer must indicate this information.

[Figure 11.7: The changing values of wire delay and transistor delay (delay versus reduced feature size, for wires and transistors). Source: International Technology Roadmap for Semiconductors, 1999.]

To achieve decent results, nearly all tools use super-linear-time heuristics. A linear-time heuristic requires roughly n computations (times some constant factor) for a problem of size n. A super-linear-time heuristic (usually just called nonlinear, though that could refer to sublinear, too), in contrast, grows more quickly than that, for example, requiring n³ computations. This nonlinear growth means that a large problem may require much longer run time than two problems each half the size of the large problem. For example, 100³ is more than 50³ + 50³ (i.e., 1,000,000 > 250,000). Furthermore, 100³ is much more than 25³ + 25³ + 25³ + 25³ (i.e., 1,000,000 >> 62,500). Likewise, memory usage may grow nonlinearly. Thus, a logic synthesis tool user must often partition a system into several smaller systems having equivalent behavior, in order to achieve acceptable synthesis tool run times and memory usage.
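As a rough illustration of this nonlinear growth, the arithmetic above can be checked directly. The cubic cost model below is purely hypothetical; it stands in for whatever super-linear heuristic a real tool uses.

```python
# Back-of-the-envelope sketch (hypothetical cost model, not any real tool's
# behavior) of why partitioning helps when run time grows as n^3.

def cubic_cost(n):
    """Run-time units for a problem of size n under an n^3 heuristic."""
    return n ** 3

whole = cubic_cost(100)        # one problem of size 100
halves = 2 * cubic_cost(50)    # two problems, each half the size
quarters = 4 * cubic_cost(25)  # four problems, each a quarter of the size

print(whole, halves, quarters)  # 1000000 250000 62500
print(whole // halves)          # 4  -- halving the pieces saves 4x
print(whole // quarters)        # 16 -- quartering saves 16x
```

The same reasoning explains why memory usage, when it also grows nonlinearly, benefits from partitioning the system into smaller equivalent pieces.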
Integrating Logic Synthesis and Physical Design: In the past, transistors, and hence logic gates, had a very large time delay compared with wires. Thus, it made sense to create synthesis tools that evaluated performance in terms of the number of levels of gates from input to output. As the industry moves to IC manufacturing processes that involve smaller and smaller feature sizes, transistors shrink not only in their size, but also in their delay. That's the good news.

Now for the bad news. While transistor delays shrink with reduced feature sizes, wire delays have actually begun to increase! This phenomenon is illustrated in Figure 11.7. Therefore, in the past, it made sense to think of circuits as transistors connected by wires. However, in the future, it appears that we'll have to start thinking of circuits as wires connected by transistors!

This change in the ratio of transistor delay and wire delay impacts logic synthesis tremendously. To understand the delay of a given logic expression, a synthesis tool can no longer just count the number of logic gates from input to output. Instead, the tool must measure the length of the wires connecting those gates. But in order to know those lengths, the tool must know how the transistors are placed on an IC. Placing transistors was previously the domain of physical design. Thus, we see that the clean separation of logic synthesis and physical design is no longer possible. Instead, we must perform logic synthesis and physical design simultaneously if we are really to design efficient circuits.

Register-Transfer Synthesis

Logic synthesis allowed us to describe our system as Boolean equations or as an FSM. However, many systems are too complex to initially describe at this logic level of abstraction. Instead, we often describe our system using a more abstract (and hence powerful) computation model, such as an FSMD.

Recall that an FSMD allows variable declarations of complex data types, and allows arithmetic actions and conditions. Clearly, more work is necessary to convert an FSMD to gates than to convert an FSM to gates, and this extra work is performed by register-transfer synthesis. Register-transfer (RT) synthesis takes an FSMD and converts it to a custom single-purpose processor, consisting of a datapath and an FSM controller. In particular, it generates a complete datapath, consisting of register units to store variables, functional units to implement arithmetic operations, and connection units (buses and multiplexors) to connect these other units. It also generates an FSM that controls this datapath.

Creating the datapath requires solving two key subproblems: allocation and binding. Allocation is the problem of instantiating storage units, functional units, and connection units. Binding is the problem of mapping FSMD operations to specific units. As in logic synthesis, both of these synthesis problems are hard to solve optimally.

Behavioral Synthesis

In RT synthesis, we describe the actions that occur on every clock cycle of the system, using an FSMD. However, for many systems, we are only interested in having the output be a correct function of the inputs, and not in how that function is broken down into clock cycles. Therefore, we may want to describe such a system using a sequential program.
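The allocation and binding subproblems from the RT-synthesis discussion above can be made concrete with a toy sketch. The FSMD states, operation names, and the one-unit-per-concurrent-operation policy below are illustrative assumptions only, not the book's (or any tool's) actual algorithm.

```python
# Toy sketch of allocation and binding for a tiny, hypothetical FSMD.
# All state names, operations, and policies here are made up for illustration.

# Operations required in each FSMD state (all ALU-class operations here).
ops_per_state = {
    "S0": ["add"],
    "S1": ["add", "sub"],   # two operations needed in the same state
    "S2": ["mul"],
}

# Allocation: instantiate as many functional units as the maximum number
# of operations that must execute concurrently (i.e., in the same state).
num_alus = max(len(ops) for ops in ops_per_state.values())

# Binding: map each operation in each state to a specific unit instance.
binding = {
    state: {op: f"ALU{i}" for i, op in enumerate(ops)}
    for state, ops in ops_per_state.items()
}

print(num_alus)          # 2
print(binding["S1"])     # {'add': 'ALU0', 'sub': 'ALU1'}
```

Real RT-synthesis tools must weigh area, delay, and interconnect cost when making these choices, which is what makes the problems hard to solve optimally.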
292 Embedded System Design
Chapter 11: Design Technology

Behavioral synthesis converts a single sequential program into a single-purpose processor structure that executes only that one program. Behavioral synthesis has also been referred to as high-level synthesis.

A sequential program differs from an FSMD in that it does not require us to schedule the system's actions into states when describing the behavior. Therefore, implementing a sequential program requires not only allocation and binding, as in RT synthesis, but also scheduling. Scheduling is the assignment of a sequential program's operations to states.

In Chapter 2, we provided a simple technique for behavioral synthesis. First, we provided templates for converting every sequential program construct into an equivalent set of states, thus accomplishing scheduling. Second, we provided a simple allocation and binding method, namely, allocating one storage unit for every variable, one functional unit for every operation, and one connection unit for every transfer. While this approach results in a correct processor circuit, the circuit is clearly not optimized. Thus, behavioral synthesis tools use advanced techniques to carry out the tasks of scheduling, allocation, and binding in order to optimize a circuit. They also typically include standard compiler optimizations that are applied before those tasks, such as constant propagation, dead-code elimination, and loop unrolling.

System Synthesis and Hardware/Software Codesign

Behavioral synthesis converts a single sequential program (behavior) to a single-purpose processor (structure). However, complex embedded systems may require more than this. In particular, using multiple processors may provide better performance or power. Furthermore, the original behavior may be better described using multiple concurrently executing sequential programs, known as processes. System synthesis converts multiple processes into multiple processors; the term system here refers to a collection of processors.

Given one or more processes, system synthesis involves several tasks. Transformation is the task of rewriting the processes to be more amenable to synthesis. For example, a designer may have described some behavior using two processes, but analysis might show that those two processes are really exclusive to one another and thus could be merged into one process. Likewise, a large process might actually consist of two independent operations that could be done concurrently, so that process could be divided into two processes. Other common transformations include procedure inlining and loop unrolling.

Allocation is the task of selecting the numbers and types of processors to use to implement the processes. A designer might choose to use an 8-bit general-purpose processor along with a single-purpose processor. Alternatively, the designer might use a 32-bit general-purpose processor, an 8-bit general-purpose processor, and multiple single-purpose processors. Allocation actually includes selecting processors, memories, and buses. Allocation is essentially the design of the system architecture.

Partitioning is the task of mapping the processes to processors. One process may be implemented on multiple processors, and multiple processes can be implemented on a single processor. Likewise, variables must be partitioned among memories, and communications among buses.

Scheduling is the task of determining when each of the multiple processes on a single processor will have its chance to execute on the processor. Likewise, memory accesses and bus communications must be scheduled.

These tasks may be performed in a variety of orders, and iteration among the tasks is common.

System synthesis, like all forms of synthesis, is driven by constraints. A typical set of constraints dictates that certain performance requirements must be met at minimum cost. In such a situation, system synthesis might seek to allocate as much behavior as possible to a general-purpose processor, since a GPP may provide for low-cost, flexible implementation. A minimum number of single-purpose processors might be used to meet the performance requirements.

System synthesis for general-purpose processors only (software) has been around for a few decades, but hasn't been called system synthesis. Names like multiprocessing, parallel processing, and real-time scheduling have been more common. The maturation of behavioral synthesis in the 1990s has enabled the consideration of single-purpose processors (hardware) during the allocation and partitioning tasks of system synthesis. This joint consideration of general-purpose and single-purpose processors by the same automatic tools was in stark contrast to the prior art. Thus, the term hardware/software codesign has been used extensively in the research community, to highlight research that focuses on the unique requirements of simultaneous consideration of both hardware and software during synthesis. However, this term may be temporary in nature, as the distinction between GPPs and SPPs continues to blur.
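A minimal sketch of the partitioning task described above follows, using a greedy rule and made-up performance numbers. Real system-synthesis tools use far more sophisticated heuristics and iterate with allocation and scheduling; this only shows the shape of the problem.

```python
# Hypothetical sketch of partitioning: map processes onto an allocated set of
# processors, preferring the low-cost GPP unless a process's performance
# requirement exceeds what the GPP can deliver. All numbers are made up.

processes = {            # process name -> required MIPS (illustrative)
    "ui":        5,
    "logging":   2,
    "filter":  120,      # too demanding for the GPP below
}
gpp_capacity = 50        # MIPS available on the general-purpose processor

partition = {}
gpp_load = 0
for name, mips in processes.items():
    if gpp_load + mips <= gpp_capacity:
        partition[name] = "GPP"          # cheap, flexible implementation
        gpp_load += mips
    else:
        partition[name] = "SPP_" + name  # dedicated single-purpose processor

print(partition)
# {'ui': 'GPP', 'logging': 'GPP', 'filter': 'SPP_filter'}
```

This mirrors the constraint-driven policy in the text: put as much behavior as possible on the GPP, and introduce single-purpose processors only to meet performance requirements.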
Temporal and Spatial Thinking

As we discussed earlier, the evolution of synthesis to higher abstraction levels has had the effect of enabling a unified view of hardware and software design, since implementing functionality on general-purpose or single-purpose processors can be seen to have the same design starting point of sequential programs. In fact, some researchers think that synthesis has fundamentally changed the nature of the skills needed to build hardware.

Before synthesis, designers of hardware worked primarily in the structural domain. They connected simpler components, each having a well-defined functionality, to build more complex systems. For example, a designer might have spent most of his/her time connecting logic gates to build a controller, or connecting registers, multiplexors, and ALUs to build a datapath. Gajski referred to this era as the "capture-and-simulate" era of hardware design, since designers would capture these systems using computer-aided design tools, and then simulate the system to verify correctness, before fabricating a chip.

With the advent of synthesis, designers of hardware work primarily in the behavioral domain. They describe FSMDs or sequential programs, and they then synthesize these items automatically into structural connections of components. Gajski refers to this era as the "describe-and-synthesize" era.

This paradigm shift from working in the structural domain to the behavioral domain has not only increased productivity but also had the effect of dramatically changing the skills necessary to be a good hardware designer. During the capture-and-simulate era, strong spatial


reasoning skills were needed to connect components. Structural diagrams were the main method for communicating system design information, supplemented with English descriptions of how the system worked. For example, recall that in Chapter 4, we mentioned that timers were typically described in datasheets using a diagram of the internal structure of the timer. However, during the describe-and-synthesize era, designers must have very strong temporal reasoning skills, since they aren't working so much with components as they are with things like FSMDs. FSMDs (and sequential programs) are created by composing states (or statements) that have relationships with one another over time. Although designers always had to have some temporal reasoning skills, those skills have now become extremely important to create good hardware. These skills are often associated with people who are strong programmers.

At the same time, the structure of the implementations output by today's synthesis tools is heavily influenced by the style with which a designer describes the behavior. Thus, the designer must still have a strong understanding of hardware structure and know how to write behavior that will synthesize into an efficient implementation.

11.3 Verification: Hardware/Software Co-Simulation

Formal Verification and Simulation

Verification is the task of ensuring that a design is correct and complete. Correctness means that the design implements its specification accurately. Completeness means that the design's specification described appropriate output responses to all relevant input sequences.

The two main approaches to verification are known as formal verification and simulation. Formal verification is an approach to verification that analyzes a design to prove or disprove certain properties. We might seek to formally verify correctness of a particular design step, such as verifying that a particular structural description correctly implements a particular behavioral description, by proving the equivalence of the two descriptions. For example, we might describe an ALU behaviorally and then create a structural implementation using gates. We can prove the correctness of the structure by deriving a Boolean equation for the outputs, creating a truth table for those equations, and showing that this truth table is identical to the table created from the original behavior. Alternatively, we might seek to formally verify completeness of a behavioral description, by proving formally that certain situations always or never occur. For example, we might prove that for an elevator controller, the elevator door can never be open while the elevator is moving, by deriving the conditions for the door being open and showing that these conditions conflict with those for the elevator moving.
The more common approach to verification is simulation. Formal verification is a very hard problem, and as such has been limited in practice to either small designs or to verifying only certain key properties. Instead, by far the most common approach to verification in practice is simulation. Simulation is an approach in which we create a model of the design that can be executed on a computer. We provide sample input values to this model, and check that the output values generated by the model match our expectations. For example, we can verify the correctness of an ALU by providing all possible input combinations, and checking the ALU outputs for correct results, which we of course have to compute using other means. Likewise, we can verify that an elevator controller won't have the door open while the elevator is moving, by simulating the controller for all possible input sequences and checking that the door is always closed when the elevator is moving.

Unfortunately, simulating "all possible inputs" or "all possible input sequences" is impossible for all but the simplest of systems. Notice that simulating all possible inputs of a 32-bit ALU requires simulating 2^32 * 2^32, or 2^64, possible input combinations. Even if we could simulate one million combinations per second, simulating that number of combinations would require over half-a-million years. Furthermore, an ALU is only a combinational circuit; for a sequential circuit, like an elevator controller, we must simulate not only all possible input combinations but also all possible sequences of such combinations. Instead of simulating all possible inputs or input sequences, designers can only simulate a tiny subset of possible inputs. This subset usually includes sample typical values, plus known boundary conditions. Boundary conditions for an ALU might include one case where both operands are 0s and another where both operands are all 1s. Thus, simulation increases our confidence that a design is correct and complete, but does not prove anything.

Compared with a physical implementation, simulation has several advantages with respect to testing and debugging a system. The two most important advantages are excellent controllability and observability. Controllability is the ability to control the execution of the system. A designer can control time as well as data values. As for time, a designer can stop or start a simulation whenever desired. As for data values, a designer can set a system's inputs, or even a system's internal values, to any desired quantities. Observability is the ability to examine system values. A designer can stop a simulation and examine any system or environment values. With excellent controllability and observability, simulation allows a designer to perform debugging that would have been nearly impossible on a physical implementation. A designer can, for example, stop a simulation after, say, 2 seconds of simulated time, observe internal system values, and modify system or environment values before restarting the simulation. The designer could also step through small intervals, say, 500 nanoseconds, observing values at each step.

Simulation has some other advantages. Setting up a simulation of a system may require less time than setting up a physical implementation. For example, setting up a simulation of a behavioral description may take hours or days, versus weeks or months to obtain a physical implementation. Furthermore, simulation is safe, so if the system doesn't work properly, no property damage or threat to lives occurs. For example, we would most certainly want to simulate an automobile cruise-controller before testing one in an automobile.

Unfortunately, simulation also has several disadvantages compared with a physical implementation:
• Setting up simulation could take much time for systems with complex external environments. A designer may spend more time modeling the external environment than the system itself.

• The models of the environment will likely be somewhat incomplete, and so may not model complex environment behavior correctly, especially when that behavior is undocumented.
• Simulation speed can be quite slow compared to execution of a physical implementation.

Techniques for overcoming these problems, especially the speed problem, will be discussed in the next few sections.

[Figure 11.8: Sample relative speeds of different types of simulation/emulation compared with real-time execution: 1 hour of real-time execution corresponds to roughly 1.2 years of instruction-set simulation, 12 years of cycle-accurate simulation, more than a lifetime of register-transfer-level HDL simulation, and about 1 millennium (a factor of 10,000,000) of gate-level HDL simulation. These numbers depend on the system size. Source: VLSI and Philips product literature.]

Simulation Speed

Perhaps the most significant disadvantage of simulation is that simulation is very slow compared to execution on a physical implementation. For example, while a physical implementation of a microprocessor may execute 100 million instructions per second, a simulation of a gate-level model of that microprocessor may only execute 10 instructions per second, meaning that the gate-level simulation is 10 million times slower than actual execution. Figure 11.8 illustrates this difference using sample numbers representative of an SOC. One hour of actual execution of an SOC would require 10 million hours of gate-level simulation, equivalent to about 1,000 years. One hour is quite a reasonable duration to want to simulate. For example, consider an automobile cruise-controller. Given the wide variety of possible speeds, road grades, and wind velocities, we might certainly want to investigate about an hour's worth of environment scenarios and cruise-controller responses.

Simulation is slow for several reasons. One reason is because we are sequentializing a parallel design. Suppose there are 1,000,000 logic gates in a design. When implemented as an IC, all 1,000,000 gates operate in parallel. However, in simulation, we essentially have to analyze the inputs and generate the output of each gate one at a time.

A second reason simulation is slow is because we are adding several programs in between the system being simulated and the real hardware. For example, suppose we want to simulate a simple operation like A = B + C. A simulator has to read and understand this operation, determine the current values of B and C, compute A, and send the results somewhere. Thus, this single operation might require 10 to 100 simulator operations. The simulator is actually running under an operating system, so each simulator operation may actually require perhaps 10 to 100 operating system operations. Finally, each operating system operation may translate to 10 hardware operations. So each operation we wish to simulate may require 1,000 to 100,000 actual hardware operations.

To overcome this problem of long simulation time, we have some options. One option is to reduce the amount of real time that we simulate. So rather than simulating 1 hour of execution, we might just simulate 1 millisecond of execution, requiring 10,000,000 * 0.001 = 10,000 seconds, or about 3 hours. However, simulating 1 millisecond of execution does not give us much confidence in the correctness and completeness of our system. For example, 1 millisecond of execution of a cruise-controller tells us very little about how the controller responds in a variety of scenarios. Nevertheless, because of the slow speed of simulation, many embedded systems are only simulated for perhaps a few seconds of real time before they are first implemented physically.

Another way to overcome this problem is to use a faster simulator. There are two common ways that simulators can be made faster. One way is to build or use special hardware for simulation purposes. These devices are known as emulators, which we'll discuss in an upcoming section. Another way is to use a simulator that is less precise or accurate. In other words, we can reduce controllability and observability in exchange for speed.

As an example of reducing precision or accuracy to gain speed, consider the earlier example where we used a gate-level microprocessor model as our simulation model. When testing the cruise-control program for correctness and completeness, we probably don't care about what's happening at the inputs and outputs of every logic gate in the microprocessor. Simulating at the gate level of detail is costing us tremendously in terms of speed, since the microprocessor may have hundreds of thousands of gates. Instead, we might replace the gate-level model by a model made up of register-transfer-level components, which might execute 10 times faster than the gate-level model, as illustrated in Figure 11.8. An even faster simulation approach is known as cycle-based simulation, in which we design a simulator that is only accurate at clock cycle boundaries, and does not provide any information about signal changes in between cycles. As shown in the figure, this may gain us another factor of 10 speed improvement. Going for more speed, we may not need to model the structural components inside the microprocessor at all, and instead we might just use an instruction-set simulator, which may gain yet another factor of 10. An instruction-set simulation may thus be 10,000 times slower than real execution, so now simulating our desired 1 hour requires 10,000 hours, or just over 1 year. Such faster simulation is often coupled with the above-mentioned reduction of the real time being simulated. So if we are willing to simulate for 10 hours, we could simulate 10 x 1/10,000 = 0.001 hour of real time, or 3.6 seconds of real time.
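The trade-off arithmetic above can be written out directly. The slowdown factors below follow the text's rough powers of ten; they are illustrative only and vary widely with system size and tools.

```python
# Sketch of the simulation speed/accuracy trade-off arithmetic.
# Slowdown factors are the text's rough powers of ten, not measured values.

slowdown = {
    "instruction-set":       10_000,
    "cycle-based":          100_000,
    "register-transfer":  1_000_000,
    "gate-level":        10_000_000,
}

def real_seconds_covered(sim_hours, model):
    """Seconds of real execution covered by sim_hours of wall-clock simulation."""
    return sim_hours * 3600 / slowdown[model]

# 10 wall-clock hours of instruction-set simulation cover only:
print(real_seconds_covered(10, "instruction-set"))   # 3.6 (seconds of real time)

# Simulating 1 full hour of real execution at gate level would take:
print(slowdown["gate-level"] / 24 / 365, "years")    # roughly a millennium
```

This is why designers choose the least detailed model that still answers the question at hand, and simulate only short windows of real time.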
Hardware-Software Co-Simulation

More generally, a variety of simulation approaches exist, varying in their simulation speed and precision/accuracy. For a given processor, whether general-purpose or single-purpose, simulation can vary from very detailed, like a gate-level model, to very abstract, like an instruction-level model. An instruction-level model of a general-purpose processor is known as an instruction-set simulator (ISS). An instruction-level model of a single-purpose processor is simply known as a system-level model. Lower-level simulation of either type of processor is usually done by creating a behavior, RT, or gate-level model in a hardware description language (HDL) environment. Because of the past separation of software design and hardware design, the simulation tools for each domain have evolved quite independently. The emphasis in software simulation has been on ISSs. The emphasis of hardware simulation has been on models in hardware description languages (HDLs).

The integration of general-purpose and single-purpose processors onto a single IC has increased the need for an integrated method for simultaneously simulating these different types of processors. Thus, there is much interest in merging previously distinct software and hardware simulation tools.

One simple but naive form of integration is to create an HDL model of the microprocessor that will run the software of a system, and then integrate that model with the HDL models of the remaining single-purpose processors. While straightforward to implement, simulating a microprocessor in an HDL has two key disadvantages. First, this approach will be much slower than an ISS, since the HDL simulator represents an extra layer of software that must be executed. Second, such an approach ignores the fact that ISSs have excellent controllability and observability features that designers have become accustomed to.

Another approach to integrating general-purpose and single-purpose processor simulations is to create communication between an ISS and an HDL simulator. Thus, each simulator runs independently of the other, except when data needs to be transferred between a general-purpose processor and a single-purpose processor. A simulator that is designed to hide the details of the integration of an ISS and HDL simulator is known as a hardware-software co-simulator. While faster than HDL-only simulation and while capitalizing on the popularity of ISSs, co-simulators can still be quite slow if the general-purpose and single-purpose processors must communicate with one another frequently.

As it turns out, in many embedded systems, those processors do have frequent communication. Therefore, modern hardware-software co-simulators do more than just integrate two simulators. They also seek to minimize the communication between those simulators. Consider, for example, a system having one microprocessor, one single-purpose processor representing a coprocessor, and one memory, all connected using a single shared bus. Suppose the microprocessor's program is stored in this memory, and that the coprocessor uses the memory extensively also. We can simulate the microprocessor using an ISS and the coprocessor using an HDL. But where should the shared memory be modeled, in the ISS or the HDL? If in the HDL, then on every instruction, the ISS will need to stall in order to communicate with the HDL simulator to fetch the next instruction from memory. If in the ISS, then the HDL simulator will need to stall in order to interrupt the ISS for access to the memory. However, note that most of these stalls are probably not necessary. For example, the ISS accesses of its instructions in memory are really irrelevant to the coprocessor. Likewise, the coprocessor's manipulation of data in memory is not relevant to the microprocessor, except in cases where that data is being transferred between the processors using the memory. In order to minimize this communication, we can model the memory in both the ISS and the HDL simulator. Each simulator can use its own copy of the memory without bothering the other simulator most of the time. The co-simulator must ensure that the memories remain consistent and that shared data does get communicated properly. Co-simulators using this speedup technique exhibit much faster performance, with some reports indicating a factor of 100 or more.
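The duplicated-memory speedup can be sketched in a few lines. The shared-address set, class names, and propagation scheme below are hypothetical simplifications; real co-simulators track consistency far more carefully.

```python
# Hypothetical sketch of the co-simulation speedup: each simulator keeps a
# private copy of memory and synchronizes only addresses marked as shared,
# avoiding a slow cross-simulator stall on every access.

SHARED = {0x2000}             # addresses used to pass data between processors

class SimMemory:
    def __init__(self, peer=None):
        self.mem = {}
        self.peer = peer      # the other simulator's memory copy, if any

    def write(self, addr, value):
        self.mem[addr] = value
        # Only shared locations trigger (slow) cross-simulator communication.
        if self.peer is not None and addr in SHARED:
            self.peer.mem[addr] = value

    def read(self, addr):
        return self.mem.get(addr, 0)

iss_mem = SimMemory()                 # memory copy inside the ISS
hdl_mem = SimMemory(peer=iss_mem)     # memory copy inside the HDL simulator
iss_mem.peer = hdl_mem

iss_mem.write(0x1000, 42)             # instruction-fetch area: stays local
hdl_mem.write(0x2000, 99)             # shared data: propagated to the ISS

print(hdl_mem.read(0x1000))           # 0  -- never communicated
print(iss_mem.read(0x2000))           # 99 -- kept consistent
```

The point of the sketch is the asymmetry: the common case (local accesses, like instruction fetches) costs nothing across simulators, while the rare shared-data case pays the synchronization price.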
Emulators

Emulators were created to help solve the problems associated with simulation listed earlier, namely, expensive environment setup, incomplete environment models, and slow simulation speed. An emulator is a general physical device onto which a system can be mapped relatively quickly, perhaps in hours or days, and which usually can be placed into the system's real eventual environment. A microprocessor emulator typically consists of a microprocessor IC with some monitoring and control circuitry. An emulator for single-purpose processors typically consists of tens or hundreds of FPGAs. Both types of emulators usually support designer debug tasks, like stopping execution and viewing internal values.

Emulation has several advantages over simulation. The environment setup needed with simulation is not necessary, and obviously the incomplete environment problem is not a problem since we aren't modeling the environment. Furthermore, because the emulator is a physical implementation, it is typically much faster than simulation.

However, emulators have some disadvantages, too. First, they are still not as fast as real implementations, which could lead to timing problems in the real environment. For example, an emulated cruise-controller system may not respond quickly enough to keep control of the car it controls. Second, mapping a system to an emulator can still be time-consuming. For example, mapping a complex SOC description to 10 FPGAs requires partitioning the system into 10 parts, a task which itself could take weeks. Third, emulators can be very expensive. For example, a top-of-the-line FPGA-based emulator can cost between $100,000 and $1,000,000. Not only is the cost a problem in itself, but it can also lead to a resource bottleneck. Specifically, a company may only purchase one emulator that must be shared by several different design groups, requiring one group to wait days or weeks until another group finishes.

11.4 Reuse: Intellectual Property Cores

Designers have always had at their disposal commercial off-the-shelf (COTS) components, which they could purchase and use in building a given system. Using such predesigned and prepackaged ICs, each implementing general-purpose or single-purpose processors, greatly reduced design and debug time, as compared to building all system components from scratch.

As discussed in Chapter 1, the trend of growing IC capacities is leading to all the components of a system being implemented on a single chip, known as a system-on-a-chip (SOC). This trend, therefore, is leading to a major change in the distribution of such off-the-shelf components. Rather than being sold as ICs, such components are increasingly
c~apte~ 11: Design reci,no109y 11.4: Reuse: Intellectual Property Cores

being sold in the form of intellectual property, or IP. Specifically, they are sold as behavioral, structural, or physical descriptions, rather than actual ICs. A designer can integrate those descriptions with other descriptions to form one large SOC description that can then be fabricated into a new IC.

Processor-level components that are available in the form of IP are known as cores. Initially, the term core referred only to microprocessors, but now is used for nearly any general-purpose or single-purpose processor IP component.

Hard, soft, and firm cores

Cores come in three forms:
• A soft core is a synthesizable behavioral description of a component, typically written in a hardware description language (HDL) like VHDL or Verilog.
• A firm core is a structural description of a component, again typically provided in an HDL.
• A hard core is a physical description, provided in any of a variety of physical layout file formats.
Note that the three forms of cores, namely soft, firm, and hard, correspond to the three axes in Gajski's Y-chart in Figure 11.4.

A hard core has the advantages of ease of use and predictability. Since the core developer has already designed and tested the core, the core can be used right away and can be expected to work correctly. Furthermore, the size, power, and performance of the core can be predicted quite accurately. However, a hard core is specific to a particular IC process, and thus cannot be easily mapped to a different process. For example, a hard core A may be available for IC vendor X's 0.25 micrometer CMOS process. If a designer wishes to use vendor X's 0.18 micrometer process, or wishes to use vendor Y, then hard core A cannot be used.

On the other hand, a soft core has the advantages of retargeting and optimization potential. A hard core must be designed using a particular IC technology, and thus can't be used in a different technology. In contrast, a soft core can be synthesized (targeted) to nearly any technology, as long as the user has access to the synthesis and physical design tools for the desired technology. Furthermore, a designer can modify the behavior to be optimized for a particular use (for example, deleting unused functions of the core), resulting in lower-power and smaller designs. But soft cores obviously require more design effort, and may not work properly in a technology for which they have never been tested. Furthermore, a soft core will likely not be as optimized as a hard core for the same processor, since hard cores typically have been given much more design attention.

Firm cores are a compromise between soft and hard cores, providing some retargetability and some limited optimization, but also providing better predictability and ease of use.

New Challenges Posed by Cores to Processor Providers

The advent of cores has dramatically changed the business model of vendors of general-purpose processors and standard single-purpose processors. Two key areas that have changed significantly are pricing models and IP protection.

Pricing models have proliferated. In the past, vendors could provide their products to designers in the form of an IC. Designers could not (economically) copy these ICs, so if they wanted more copies they had to buy more ICs. The more ICs the designer purchased, the more money the vendor earned. Today, vendors provide their product to designers in the form of IP, transmitted via an electronic format like the World Wide Web or CD-ROM. The designer incorporates this IP into an SOC and then produces as many copies as he or she needs. The vendor can now choose whether to sell the IP to the designer using different pricing models. One pricing model follows that of ICs, namely that the designer must pay a certain amount for each copy he or she creates. This is known as a royalty-based model. A very different model is that of a fixed price, namely that the designer pays for the right to use the IP and then creates as many copies as desired. Many companies now give out cores for free when their products are purchased, such as synthesis tools or FPGAs. Countless variations and combinations of these models exist in today's IP licensing arrangements. Each pricing model comes with accompanying challenges of enforcing those models. A royalty-based model requires, for example, that the IP vendor be aware of how many products incorporating its IP are being sold. Extensive contracts must often be created to enforce these licensing models.

IP protection has become a key concern of core providers. In the past, illegal copying of an IC would have required a tremendous amount of deliberate reverse-engineering effort. Accidental copying of an IC was not even possible. Today, cores are sold in an electronic format, and so deliberate and even accidental unauthorized copying of a core are easier. The advent of cores has therefore greatly increased the safeguards that vendors must consider when selling their products. Contracts must be created to ensure that designers do not copy or distribute the IP. Some vendors use encryption techniques to limit the actual exposure to the IP that designers can achieve. Techniques known as watermarking are being developed to help vendors determine whether a particular instance of a processor in an IC was copied from the vendor and whether this copy was authorized.

These challenges, of course, come with benefits. One key benefit to core providers is that they can eliminate manufacturing from their business entirely. Many processor manufacturers, both general-purpose and standard single-purpose, have gone to a core-only business model.

New Challenges Posed by Cores to Processor Users

The advent of cores also poses new challenges to designers seeking to use a general-purpose or standard single-purpose processor. These include licensing arrangements, extra design effort, and verification.

Licensing arrangements are more complicated for cores. A designer purchasing a core typically cannot just order one as easily as purchasing an IC. Contracts covering pricing models and IP protection must be drawn up and signed, perhaps requiring legal assistance. Extra design effort will likely be necessary, especially for soft cores. A soft core must still be synthesized and tested. Even minor differences in synthesis tools can yield problems. Verification requirements have become much more difficult. One increased difficulty is that soft cores that are synthesized must be tested extensively to ensure correct synthesis output.



Furthermore, soft and firm cores mapped to a particular technology must again be extensively tested. Ideally, synthesis and physical design tools would generate correct implementations, but that is simply not the case today. In addition, even correct implementations will vary in terms of their timing and power.

A second increased difficulty in verification stems from the fact that there is no direct access to a core once it has been integrated into a chip. In the past, a system's ICs resided on a board, and those ICs could thus be tested individually by connecting a logic analyzer to the IC's pins. Today, a system's cores are buried inside of a single IC, so directly accessing a core's ports is impossible, requiring other means for scanning port values in and out. Furthermore, one cannot simply replace a bad core by another one, the way one could replace a bad IC in the past, thus making early verification even more crucial.

11.5 Design Process Models

A designer must proceed through several steps when designing a system. We can think of describing behavior as one design step, converting behavior to structure as another step, and mapping structure to a physical implementation as another step. Each step will obviously consist of numerous substeps. A design process model describes the order in which these steps are taken. The term process here should not be confused with the notion of a process in the concurrent process model discussed in an earlier chapter, nor should it be confused with the IC manufacturing process. Here, process refers to the manner in which the embedded system designer proceeds through design steps.

One process model is the waterfall model, illustrated in Figure 11.9(a). Suppose a designer has six months to build a system. In the waterfall model, the designer first exerts extensive effort, perhaps two months, describing the behavior completely. Once fully satisfied that the behavior is correct, after extensive behavioral simulation and debugging, the designer moves on to the next step of designing structure. Again, much effort is exerted, perhaps another two months, until the designer is satisfied the structure is correct. Finally, the physical implementation step is carried out, occupying perhaps the last two months. The result is a final system implementation, hopefully a correct one. In the waterfall model, when we proceed to the next step, we never come back to the earlier steps, much like water cascading down a mountain doesn't return to higher elevations.

Figure 11.9: Design process models: (a) waterfall, (b) spiral. (Figure labels: Behavioral, Structural.)

Unfortunately, the waterfall model is not very realistic, for several reasons. First, we will almost always find bugs in the later steps that should be fixed in an earlier step. For example, when testing the structure, we may notice that we forgot to handle a certain input combination in the behavior. Second, we often do not know the complete desired behavior of the system until we have a working prototype. For example, we may build a prototype device and show it to a customer, who then gets the idea of adding several features. Third, system specifications commonly change unexpectedly. For example, we may be halfway done designing a system when our company decides that to be competitive, the product must be smaller and consume less power than originally expected, requiring several features to be dropped. Nevertheless, many designers design their systems following the waterfall model. The accompanying unexpected iterations back through the three steps often result in missed deadlines, and hence in lost revenues or products that never make it to market.

An alternative process model is the spiral model, shown in Figure 11.9(b). Suppose again that the designer has six months to build the system. In the spiral model, the designer first exerts some effort to describe the basic behavior of the system, perhaps a few weeks. This description will be incomplete, but have the basic functions, with many functions left to be filled in later. Next, the designer moves on to designing structure, again taking maybe a few weeks. Finally, the designer creates a physical prototype of the system. This prototype is used to test out the basic functions, and to get a better idea of what functions we should add to the system. With this experience, the designer proceeds through the three steps again, expanding the original behavioral description or even starting with a new one, creating structure, and obtaining a physical implementation again. These steps may be repeated several times until the desired system is obtained.

The spiral model has its drawbacks, too. The designer must come up with ways to obtain structure and physical implementations quickly. For example, the designer may have to use FPGAs for the physical prototypes, finally generating new silicon (a task that can take months) for the final product. Thus, the designer may have to use more tools, which itself can require extra effort and costs. Also, if a system was well defined in the beginning and if we would have created a first-time correct implementation using the waterfall model, then the spiral model requires more time due to the overhead of creating numerous prototypes. Nevertheless, variations of the spiral model have become extremely popular, both in software development as well as hardware development.

The preceding discussion focused implicitly on designing single-purpose processors, since we started with behavior, designed structure, and then mapped to a physical implementation. However, the discussion applies equally to using general-purpose processors. In the traditional waterfall approach illustrated in Figure 11.9(a), a general-purpose processor's architecture (structure) is developed by a particular company and acquired by an embedded system designer. The designer then develops a software application (behavior). Finally, the designer maps the application to the architecture, using compilation and manual design.



However, even this widely accepted approach is beginning to change. A spiral-like process model, illustrated in Figure 11.10, is beginning to be applied by embedded system designers. In this model, the designer develops or acquires an architecture and develops an application or set of applications. The designer then maps the application to the architecture, and analyzes the design metrics of this combination of application, architecture, and mapping. The designer can then choose to (a) modify the mapping, (b) modify the application to better suit the architecture, or (c) modify the architecture to better suit the application. This third step of modifying the architecture was previously too difficult to consider. However, with the maturation of synthesis tools as well as compilers that can generate code for a variety of instruction sets, this last step is much more feasible. Furthermore, as mentioned above, designers are increasingly obtaining the microprocessor architecture in the form of intellectual property, which can thus potentially be tuned to the application. This is in stark contrast to the past, when a fixed microprocessor IC obviously could not be modified. By no coincidence, the depiction in Figure 11.10 of this process model is referred to as the Y-chart, but it has no relation to Gajski's Y-chart defined earlier. Refining to lower abstraction levels (whether behavioral, structural, or physical models) narrows the potential implementations, as illustrated in Figure 11.3(b). Such narrowing proceeds until a particular implementation is chosen.

Figure 11.10: A spiral-like approach represented using another Y-chart. (Figure labels: Architecture, Application(s).)

11.6 Summary

Tremendous effort is being exerted to advance design technology, so that the gap between designer productivity and IC capacity can be reduced. Synthesis has made dramatic changes in the way that single-purpose processors are designed, making such design much more similar to software design. Increasing IC capacity means that software and hardware components coexist on single-chip SOCs, requiring a design paradigm shift toward extensive reuse of predesigned cores. Simulating an SOC is beneficial but very hard to do quickly, and it is helped by co-simulation and emulation techniques. Whether designing software or hardware, a spiral design process model is a popular approach to design.

11.7 Book Summary

Embedded systems represent a large and growing class of computing systems, which some people believe will soon become even more significant than desktop computing systems. The nature of embedded systems has been changed dramatically by today's outrageously large chip capacities coupled with powerful new automation tools, but methods for teaching embedded systems design have not evolved concurrently. This book is a first attempt to remedy this situation. We started by introducing the view that computing systems are built primarily from collections of processors, some general-purpose, some single-purpose (standard or custom), which differ not in some fundamental way, but rather just in their design metrics like power, performance, and flexibility. We introduced memories commonly used along with processors and described how to interface processors and memories. With processors, memories, and interfacing methods, we could build complete systems, and so we gave an example of one such system: a digital camera.

During the first part of the book, we did not focus on the nitty-gritty internal details of any particular microprocessor, since modern tools greatly reduce the need for such knowledge. Instead, in the second part of this book, we focused on higher-level issues. We examined powerful higher-level computation models like state machines and concurrent processes, which enable the capture of more complex functionality. We introduced the basics of a large class of embedded systems, known as control systems. We summarized the key IC technologies available to implement embedded systems. Finally, we summarized the issues related to design technologies for mapping desired behavior to a physical implementation.

This book was intentionally broad in nature. It was designed primarily to serve as a starting point for students about to study the various subtopics of embedded systems in more detail, topics like VLSI/ASIC design, real-time programming, digital-design synthesis, control system design, and other topics. The hope is that the student pursuing those topics will have a unified view of hardware and software throughout their studies, and view embedded systems design not as a field comprising mostly low-level code hacking but rather as a unique engineering discipline demanding a balanced knowledge of hardware and software issues. We hope you have found the book useful.

11.8 References and Further Reading

• Balarin, F., M. Chiodo, A. Jurecska, H. Hsieh, L. Lavagno, C. Passerone, A. L. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara. Hardware-Software Co-Design of Embedded Systems: A Polis Approach. Norwell, MA: Kluwer Academic Press, June 1997.
• De Micheli, Giovanni. Synthesis and Optimization of Digital Circuits. New York: McGraw-Hill, 1994. An introduction to the models and algorithms inside synthesis tools, ranging from high-level down to logic-level synthesis tools.


• Gajski, Daniel D. Principles of Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1997. Introduces combinational and sequential logic design, with a unique focus on synthesis and higher levels of design in the later chapters.
• Gajski, Daniel D., Nikil Dutt, Allen Wu, and Steve Lin. High-Level Synthesis: Introduction to Chip Design. Norwell, MA: Kluwer Academic Publishers, 1992. A description of the methods and algorithms underlying high-level synthesis. Includes discussion of the Gajski Y-chart.
• Gajski, Daniel D., Frank Vahid, Sanjiv Narayan, and Jie Gong. Specification and Design of Embedded Systems. Englewood Cliffs, NJ: Prentice Hall, 1994. Introduces a top-down specify-explore-refine approach to design.
• Katz, Randy. Contemporary Logic Design. Redwood City, CA: Benjamin/Cummings, 1994. Describes combinational and sequential logic design, with a focus on logic and sequential optimization and CAD.
• Kienhuis, B., E. Deprettere, K. A. Vissers, and P. van der Wolf. An Approach for Quantitative Analysis of Application-Specific Dataflow Architectures. In Proceedings of the 11th Int. Conference on Application-Specific Systems, Architectures and Processors (ASAP 1997), pp. 338-349, 1997. Describes the Y-chart design process.
• Kienhuis, A. C. J. "Design Space Exploration of Stream-based Dataflow Architectures: Methods and Tools." PhD thesis, Delft University of Technology, The Netherlands, January 1999; ISBN 90-5326-029-3. Discusses advantages of working at a higher abstraction level; includes the abstraction pyramid and the Y-chart design process.
• Klein, Russ. Hardware/Software Co-Simulation. Mentor Graphics Corporation, technical white paper, http://www.mentorg.com/seamless. Describes the basics and some experiences with hardware/software co-simulation.
• Sommerville, Ian. Software Engineering. Reading, MA: Addison Wesley, 2000. A survey of the many different aspects of software engineering, including the spiral design process model.

11.9 Exercises

11.1 List and describe three general approaches to improving designer productivity.
11.2 Describe each tool that has enabled the elevation of software design and hardware design to higher abstraction levels.
11.3 Show behavior and structure (at the same abstraction level) for a design that finds the minimum of three input integers, by showing the following descriptions: a sequential program behavior, a processor/memory structure, a register-transfer behavior, a register/FU/MUX structure, a logic equation/FSM behavior, and finally a gate/flip-flop structure. Label each description and associate each label with a point on Gajski's Y-chart.
11.4 Develop an example of a Boolean function that can be implemented with fewer gates when implemented in more than two levels (your designs should have roughly 10 gates, not hundreds!). Assuming two transistors per gate input, and a gate delay of 10 nanoseconds, create a single plot showing size versus delay for both designs.
11.5 Show how to partition a single finite-state machine into two smaller finite-state machines, which might be necessary to achieve acceptable synthesis tool run time. Hint: you'll need to introduce a couple of new signals and states.
11.6 Define hardware/software codesign.
11.7 Write a small program that reads a file of integers and outputs their sum. Write another program that does not add the integers using the built-in addition operator of a programming language, but instead "simulates" addition by using an addition function that converts each integer to a string of 0s and 1s, adds the strings by mimicking binary addition, and converts the binary result back to an integer. Compare the performance of the native program to the performance of the simulation program on a large file.
11.8 If a simulation environment can simulate 1,000 instructions per second, estimate how long the environment would take to simulate the boot sequence of Windows running on a modern PC. Even a very rough estimate is acceptable, since we are interested in the order of magnitude of such simulation.
11.9 What is hardware/software co-simulation? What is a key method for speeding up such simulation?
11.10 Show the correspondence of the three types of cores with Gajski's Y-chart.
11.11 Describe the new challenges created by cores for processor developers as well as users.
11.12 List major design steps for building the digital camera example of Chapter 7, assuming: (a) a waterfall process model, (b) a spiral-like Y-chart process model.

APPENDIX A: Online Resources


A.1 Introduction
A.2 Summary of the ESD Web Page
A.3 Lab Resources
A.4 About the Book Cover

A.1 Introduction

We intentionally designed this textbook to be independent of any particular microprocessor, microcontroller, programming language, hardware description language, FPGA, and so on. This decision was made largely because the growing popularity and complexity of embedded systems has been accompanied by tremendous diversity. The days when most courses on microprocessor-based design used a fairly standard microcontroller are quickly giving way to the situation of tremendous diversity in lab setups. Some setups emphasize 8-bit microcontrollers, while others emphasize 32-bit platforms using one of a variety of popular processors like Intel 80x86, Motorola 68000 variations, Sun Sparcs, MIPS processors, ARM processors, digital signal processors, multimedia processors (like TriMedia's), and so on. Furthermore, these processors come on a variety of development boards, each with unique features. Some courses focus mostly on hardware prototyping while others include extensive simulation too. Some courses integrate the use of FPGAs, which also come in diverse setups. New chips and platforms that integrate microprocessors and FPGAs are beginning to appear. This diversity, coupled with the evolution of embedded system design into a discipline, makes the need to decouple lecture material from lab material quite evident.

However, we have not simply left the instructor and students entirely on their own with respect to lab setup. Instead, we have used the World Wide Web to supplement this book with extensive lab materials. In fact, using the Web, we can provide even more than a typical processor-specific textbook might be able to provide.


A.2 Summary of the ESD Web Page

The Web site accompanying Embedded System Design can be found at: http://www.cs.ucr.edu/esd. It currently includes items like:
• Lab resources, including setup details and tutorials for the setup used at UCR, lab assignments, and solutions
• Links to related Web sites, data-sheets, and industry standards documents
• PowerPoint lecture slides
Of course, the Web site will continually evolve, so more items may be added in the future.

A.3 Lab Resources

Because the lab resources are of great interest to many instructors, we provide a brief summary of the items on the Web site at the time this book went to press. The items are basically the items used at UC Riverside, generalized for use by other schools. Again, these items will likely evolve and expand. We encourage instructors to submit material that could be used to broaden the platforms that can be used in conjunction with the textbook.

For our lab setup, we chose to focus on the Intel 8051 and Xilinx FPGAs. For the FPGA side, we use VHDL as our hardware description language, and use Xilinx's Foundation Express for synthesis and mapping to FPGAs. For the 8051 side, we use the Keil C compiler, and emulators, programmers and chips from Philips. We also currently use a platform from XESS Corporation, consisting of a single board including both a Xilinx FPGA and an 8051 derivative.

We chose to use an 8-bit microcontroller, as opposed to a 32-bit platform, for two reasons. First, 8-bit platforms are typically quite simple, with no operating system, BIOS, or other more advanced features. Thus, we felt they are more suitable for an introductory course. Second, we have found that students enjoy being able to build inexpensive standalone systems; many students have been quite creative in developing projects that they have actually taken home and put into use. For a more advanced course on real-time embedded systems, we use a 32-bit processing platform.

The lab material is categorized by chapter, with a brief summary of associated labs.

Chapter 2
• "Tutorial: Aldec Active-HDL Simulation." In this lab, code is provided for a 1-bit adder and a corresponding testbench. Code is provided for a 4-bit adder built using the 1-bit adder previously used.
• "XS40 Tutorial: VHDL Synthesis." This tutorial shows students how to synthesize and download VHDL code onto an XS40 board. The tutorial gives steps showing how to synthesize the code provided using Xilinx Foundation Express to generate a bit stream.
• "XS40 Tutorial: Onboard Microcontroller 8031." This tutorial shows students how to control the onboard microcontroller (8031) and connect to the FPGA.
• "XS40 Tutorial: Sending Signals from the PC." This tutorial shows students how to inject signals from the PC into the XS40 board.
• "Introduction to FPGAs with Schematic Capture." The purpose of this lab is to design a seven-segment decoder using AND, OR, NAND, NOR, XOR, and XNOR gates.
• "Introduction to FPGAs Using VHDL." This lab is a follow-up to the "Introduction to FPGAs with Schematic Capture" lab. The seven-segment decoder circuit drawn in the previous lab is translated into VHDL.
• "Introduction to VHDL Simulation and the Blinking LEDs Lab." This is the lab in which the student will implement a structural description of a 2-bit counter whose output is fed to a 2-to-4 decoder. The decoder is then wired to four LEDs.
• "Seven-Segment Decoder: Behavioral Description." This lab is a follow-up to the "Introduction to FPGAs Using VHDL" lab. In this lab the seven-segment decoder is rewritten so that it is a behavioral description.
• "ALU Design." The purpose of this lab is to build a 2-bit ALU. The ALU is written behaviorally.
• "2-bit Counter." The purpose of this lab is to write a VHDL description of a 2-bit counter as a finite-state machine (FSM).
• "FSM + D: GCD Calculator." The purpose of this lab is to write a VHDL description of a GCD (greatest common divisor) calculator. The calculator is divided into two parts: a controller and a datapath.
• "FSM + D: Parallel to Serial Converter." The purpose of this lab is to write a VHDL description of a parallel-to-serial converter as an FSMD.
• "FSM to FSM+D: Soda Machine Controller." The purpose of this lab is to implement a soda machine controller.
• "VHDL Calculator." The purpose of this lab is to implement a finite-state machine in VHDL to perform simple calculations: addition, subtraction, and multiplication.
• "Watchdog Timer." In this lab, students will be designing a hardware watchdog timer.
In addition to the above labs, we include the following:
• "VHDL by Example." This Web site is designed to teach students VHDL through a set of increasingly complex examples, beginning with a simple logic gate and ending with a microprocessor and a digital filter. We developed this site after observing that learning a hardware description language was too difficult for most students in an introductory course, simply because VHDL textbooks go into too much detail regarding the language itself and don't spend enough time giving examples directly relevant to what the students need. We have used this Web site with great success. Students are able to write VHDL code for complex systems in much less time than before.


Chapter 3

• "Microprocessor." The purpose of this lab is to implement a microprocessor in VHDL.
• "8051 Tutorial." This tutorial provides code to blink an LED using the 8051. The tutorial shows students how to compile the program using the C51 compiler, and then run the program using the PDS51 emulation software.
• "The 8051 Standalone Chip Tutorial." This tutorial provides code to blink an LED using the 8051 standalone chip.
• "Music Generator." The purpose of this lab is to design a peripheral device that plays musical notes.
• "Day of the Week." The purpose of this lab is to design an FSM that, when given day, month, and year, will output the day of the week.
• "Prefix Length." The purpose is to describe, at the FSMD level, an entity that will compute the length of the prefix of two 16-bit binary strings.
• "Virtual Clock." In this lab, students implement a software real-time clock (i.e., a virtual clock (VC)).
• "Instruction Set Simulator." The purpose of this lab is for the student to design and experiment with a simple instruction-set simulator.

In addition, we include the following:

• An 8051 instruction-set simulator, provided as C++ source code.
• An 8051 synthesizable core, written in VHDL at the register-transfer level.

Chapter 4

• "Implementing a 4-bit Counter and Interfacing It to an LCD." In this lab, students will learn how to write a simple C program for the 80x51 microcontroller, compile it using the C51 compiler, emulate it on an emulator using PDS51, and learn how to use an LCD (liquid crystal display).
• "Implementing a Calculator Using Peripherals." In this lab, the student will build a simple calculator using the keypad as an input and the LCD as an output peripheral.
• "A/D Conversion." The purpose of this lab is to be able to implement analog-to-digital conversion using the ADC0804LCN 8-bit A/D converter.
• "Stepper Motor." The purpose of this lab is to control a stepper motor, with instructions received from the PC via a serial communication link to the 8051.
• "4-Bit Counter with Seven-Segment Display." Here, students implement a 4-bit counter using an 8051 and a seven-segment display.
• "Decimal Counter with Output Multiplexing." The purpose of this lab is to implement a decimal counter, which counts from 0 to 99, using two seven-segment displays and an 8051.
• "Decimal Counter and Time Multiplexing." The purpose of this lab is to implement a decimal counter, which counts from 0 to 99, using two seven-segment displays and an 8051. Unlike the previous lab, in this lab only one port is used, so we must multiplex the output.
• "Keypad Reader." The purpose of this lab is to read input from a keypad and display the corresponding key pressed onto a seven-segment display.

Chapter 5

• "Using EEPROMs," where the student connects an EEPROM device to a microcontroller and writes the necessary code to complete the interface.
• "8051 External Memory." In this lab, the student interfaces an 8051 with an external memory device.

In addition, there are links to several memory datasheets.

Chapter 6

• "Serial Communication." The purpose of this lab is to establish serial communication between the PC and the 8051.
• "ISA Bus." Here, students implement the ISA bus using VHDL.
• "I2C Bus." In this lab, students work on a VHDL implementation of the I2C bus protocol.
• "Bus Invert." In this lab, students will implement a bus encoding scheme called bus invert.

In addition, there are links to numerous documents outlining major bus interfaces such as AGP, CAN, FireWire, and ARM.

Chapter 7

• VHDL code for the digital camera example that is described in Chapter 7 is provided on the Web page.

A.4 About the Book Cover

A quick look around our environment turns up embedded systems in a surprising number of places. This book's cover shows just a few such systems in common environments. A numbering of those systems appears in Figure A.1. A listing of those systems follows.

Outdoors
1. Helicopter: control, navigation, communication, etc.
2. Medicine administering systems
3. Smart hospital bed with sensors and communication
4. Patient monitoring system
5. Surgical displays
6. Ventilator
7. Digital thermometer
8. Portable data entry systems
9. Pacemaker
10. Automatic door
11. Electric wheelchair
12. Smart briefcase with fingerprint-enabled lock
13. Ambulance: medical and communication equipment
14. Automatic irrigation systems
15. Jet aircraft: control, navigation, communication, autopilot, collision-avoidance, in-flight entertainment, passenger telephones, etc.
16. Laptop computer (contains embedded systems)
17. Cellular telephone

18. Portable stereo (boom-box)
19. Satellite receiver system
20. Credit / debit card reader
21. Barcode scanner
22. Cash register
23. ATM machine
24. Automobile (engine control, cruise control, temperature control, music system, anti-lock brakes, active suspension, navigation, toll transponder, etc.)
25. Automatic lighting
26. Pump monitoring system
27. Lottery ticket dispenser
28. Cell-phone/pager
29. Traffic-light controller
30. Police vehicle (data lookup, communication, sirens, radar detector, etc.)
31. Cell-phone base station
32. Handheld communicator (walkie-talkie)
33. Fire-control onboard computer

Indoors
1. Cordless phone
2. Coffee maker
3. Rice cooker
4. Portable radio
5. Programmable range
6. Microwave oven
7. Smart refrigerator
8. In-home computer network switch
9. Clothes dryer
10. Clothes-washing machine
11. Portable MP3 player
12. Digital camera
13. Electronic book
14. Trash compactor
15. Hearing aid
16. Dishwasher
17. Electronic clock
18. Video camera
19. Electronic wristwatch
20. Pager
21. Cell phone
22. CD player
23. DVD player
24. Smart speakers
25. Stereo receiver
26. TV set-top box
27. Television
28. VCR
29. TV-based Web access box
30. House temperature control
31. Home alarm system
32. Point-of-sale system
33. Video-game console
34. TV remote control
35. Electronic keyboard/synthesizer
36. Fax machine
37. Scanner
38. Wireless networking
39. Telephone modem
40. Cable modem
41. Printer
42. Portable video game
43. Personal digital assistant
44. Portable digital picture viewer
45. Phone with answering machine

Figure A.1: Embedded systems in common environments, numbered for reference.

Index

A
accelerator, 11
actuator, 247
adder, 33
address space, 57
aliasing, 264
allocation, 50, 294
ALU, 34, 56
analog-to-digital converter, 102
application-specific IC. See ASIC
application-specific instruction-set processor. See ASIP
arbitration, 160
ASIC, 13, 276
ASIP, 12, 74
assembler, 71
assembly-language programming, 61
asynchronous, 36

B
battery-backed RAM, 120
baud rate, 91
behavioral description, 285
behavioral synthesis, 294
benchmark, 76
binding, 50
bit, 57
Bluetooth, 174
bridge, 165
buffering, 153
bus, 138
bus-based I/O, 145
busy-waiting, 232
byte, 57

C
cache, 125
cache block, 59, 126
cache hit, 59, 126
cache line, 126
cache memory, 59
cache miss, 59, 126
cache replacement policy, 128
CAD, 43
CAN bus, 171
CCD, 180
cell-based array, 277
checksum, 169
clock cycle, 57
closed-loop control, 248
CMOS transistor, 30, 270
codesign, 21, 295
combinational logic, 33
communicating process model, 208
comparator, 34
compiler, 71
computation model, 209
concurrent process model, 223
condition variable, 233
consumer-producer problem, 228
control system, 245
control unit, 57
control-dominated system, 208
controllability, 297
controller, 39, 42
coprocessor, 11
core, 302
correctness, 5
co-simulation, 18
counter, 36, 84
CPLD, 278
CPU, 56
critical path, 57
critical section, 229
cross compiler, 71
cruise-control system, 248

D
daisy-chain arbitration, 160
data-dominated system, 208
dataflow model, 208, 241
datapath, 9, 38, 41, 56
DC motor, 94
D-cache, 59
DCT, 182, 200
deadline, 240
deadline monotonic priority assignment, 240
deadlock, 231
debugger, 73
decoder, 33
design metric, 4
design process model, 304
design productivity gap, 22
design technology, 16
development processor, 69
device programmer, 73
Dhrystone benchmark, 76
Dhrystone MIPS, 77
digital camera, 4, 179, 315
digital signal processor, 12, 75
digital-to-analog converter, 102
direct addressing, 63
direct memory access. See DMA
direct-mapped cache, 126
DMA, 154
DRAM, 120, 130, 134
driver, 64, 66
DSP. See digital signal processor
duty cycle, 92
dynamic RAM. See DRAM

E
EEMBC, 77
EEPROM, 116
embedded processors, 74
embedded system, 1
emulator, 73, 301
EPROM, 115
ESDRAM, 133
extended data out DRAM, 132
extended FSM, 213
extended parallel I/O, 145

F
fast page mode DRAM, 131
feature size, 13
field programmable gate array. See FPGA
FIFO scheduler, 239
finite-state machine. See FSM
finite-state machine with data. See FSMD
FireWire bus, 172
fixed interrupt, 149
fixed priority arbitration, 160
fixed-point arithmetic, 201
flash memory, 117
flexibility, 5
flip-flop, 35
formal verification, 296
FPGA, 14, 279, 312
frameworks, 19
FSM, 36, 211
FSMD, 39, 44, 47, 48, 213
full-custom IC, 13
fully-associative cache, 126

G
gate array, 13, 276
gates, 30
GCD, 48
general-purpose processor, 9, 29, 55, 77
greatest common divisor, 39

H
handshake protocol, 141
hardware/software codesign. See codesign
hardware-software co-simulator, 300
Harvard architecture, 58
HCFSM, 217
high-level synthesis, 294
Huffman encoding, 183

I
I2C bus, 169, 315
IC, 13, 21
IC technology, 13
I-cache, 59
IEEE 802.11, 174
immediate addressing, 63
implicant, 287
implicit addressing, 63
indexed addressing, 63
indirect addressing, 63
Industry Standard Architecture. See ISA bus
infrared, 167
inherent addressing, 63
instruction-set simulator, 71, 300
in-system programmable memory, 112
integrated circuit. See IC
integrated development environment (IDE), 70
Intel 8051, 147, 195, 312
intellectual property. See IP
interrupt, 65, 149
interrupt address table, 150
interrupt address vector, 149
interrupt controller, 160
interrupt service routine. See ISR
interrupt-driven I/O, 149
IP, 18, 302
IrDA, 174
ISA bus, 143, 147, 158, 315
ISR, 65, 68, 149, 152, 163

J
JPEG, 4, 181

K
Karnaugh map, 33, 288
keypad, 97

L
languages, 19
latency, 8
layout, 272
LCD, 95
linker, 71
liquid crystal display. See LCD
logic gates, 30
logic synthesis, 287

M
machine-language programming, 61
maintainability, 5
market window, 6
mask, 67
maskable interrupt, 151
mask-programmed ROM, 114
Mealy-type FSM, 212
memory hierarchy, 125
memory programmer, 111
memory-mapped I/O, 145
message passing, 231
microcontroller, 12, 74, 312
microprocessor. See general-purpose processor
minterm, 287
MIPS, 77
MMU, 134
monitor, 235
Moore's Law, 15, 110
Moore-type FSM, 212
multilevel logic minimization, 290
multiplexor, 33
multiport memory, 110
mutex, 229
mutual exclusion, 229
mythical man month, 23

N
nonmaskable interrupt, 152
nonrecurring engineering cost. See NRE cost
nonvolatile memory, 111
NRE cost, 5, 7, 30
NVRAM, 120

O
observability, 297
one-hot, 33
one-time-programmable ROM. See OTP ROM
opcode, 62
operand, 62
operating system, 67
optimization, 19
OTP ROM, 115

P
PAL, 14, 278
parallel I/O, 134, 145
parity, 90, 168
partitioning, 294
PCI, 173
performance, 8, 29
period, 240
peripheral, 11, 84
peripheral bus, 165
photolithography, 272
PID control, 261
pipelining, 60
PLA, 14, 278
PLD, 14, 278
polling, 149
polysilicon, 270
port, 130, 138
port-based I/O, 144
POSIX, 238
power, 5, 30
prescaler, 86
Princeton architecture, 58
priority arbiter, 160
priority queue, 239
priority scheduling, 239
process, 226
process scheduling, 239
processor, 9, 21, 29
processor local bus, 165
processor technology, 9
programmable logic device. See PLD
PROM, 114
proportional control, 258
protocol, 118
PSM, 220
PSRAM, 120
pulse width modulator. See PWM
PWM, 92

Q
QNX, 243
quantization, 183, 263

R
RAM, 58, 118
Rambus, 133
range, 85
rate monotonic scheduling, 240
reaction timer, 87
reactive system, 3
real-time clock, 105
real-time system, 3, 242
receive, 231
register, 35
register addressing, 63
register-indirect addressing, 63
register-transfer level, 35
register-transfer synthesis, 293
relative addressing, 63
resolution, 85
revenue model, 6
RF, 167
ROM, 58, 112
rotating priority arbitration, 160
round-robin scheduling, 239
RTL components, 35
RTOS, 238

S
safety, 5
scheduling, 48, 295
SDRAM, 132
send, 231
sequential logic, 34
sequential program, 39
sequential program model, 208
serial communication, 166
set-associative cache, 126
shared memory, 227
shift register, 36
shifter, 34
silicon spin, 273
simulation, 18, 296
single-purpose processor, 10, 29, 38
software interrupt, 152
SpecCharts, 221
speedup, 9
spiral model, 305
SPLD, 278
square wave, 92
SRAM, 119
standard cell, 13, 276
standard I/O, 146
standards, 18
state diagram, 214
state encoding, 50, 291
state machine model, 208
state minimization, 50, 291
Statecharts, 217
static RAM. See SRAM
stepper motor, 98
storage permanence, 112
strobe protocol, 141
structural description, 285
successive approximation, 103
superscalar microprocessor, 61
synchronization, 232
synchronous, 36
synchronous DRAM. See SDRAM
synthesis, 17, 285
system call, 68
system specification, 16
system synthesis, 294
system-on-a-chip, 22

T
target processor, 69
technology, 9
technology mapping, 291
test, 18
thread, 238
throughput, 8
time multiplexing, 141
timer, 84
time-to-market, 5, 6, 30
time-to-prototype, 5
timing diagram, 139
top-down design, 16
transistor, 30
trap, 152
two-level logic minimization, 288

U
UART, 90, 185, 197
unit cost, 5, 7
USB, 172

V
vectored interrupt, 149, 160, 162
verification, 18
VHDL, 313
VLIW, 61
VLSI, 13
volatile memory, 112

W
watchdog timer, 88
waterfall model, 304
windowed ROM, 115
Windows CE, 242
write ability, 111
write-back cache, 128
write-through cache, 128

Y
Y-chart, 285, 306

