Module 1
Introduction
Version 2 EE IIT, Kharagpur 1
Lesson 1
Introduction to Real Time
Embedded Systems Part I
Downloaded from www.citystudentsgroup.blogspot.com
Instructional Objectives
After going through this lesson the student would be able to
Fig. 1.1 Mobile Phones
From the above specifications it is clear that a mobile phone is a very complex device which houses a number of miniature gadgets functioning coherently in a single device.
Moreover, each of these embedded gadgets, such as the digital camera or the FM radio along with the telephone, has a number of operating modes. For example:
you may like to adjust the zoom of the digital camera,
you may like to reduce the screen brightness,
you may like to change the ring tone,
you may like to relay a specific song from your favorite FM station to your friend using your mobile,
you may like to use it as a calculator, address book, emailing device etc.
These variations in the functionality can only be achieved by a very flexible device.

This flexible device sitting at the heart of the circuits is none other than a Customized Microprocessor, better known as an Embedded Processor, and the mobile phone housing a number of functionalities is known as an Embedded System.

Since it satisfies the requirements of a number of users at the same time (you and your friend, you and the radio station, you and the telephone network etc.) it is working within a time-constraint, i.e. it has to satisfy everyone with the minimum acceptable delay. We call this working in Real Time. This is unlike your holidaying attitude when you take the clock in your stride.

We can also say that it does not make us wait long for taking our words and relaying them as well as receiving them, unlike an email server, which might take days to receive/deliver your message when the network is congested or slow.

Thus we can name the mobile telephone a Real Time Embedded System (RTES).

Definitions

Now we are ready to take some definitions.
Real Time

Real-time usually means time as prescribed by external sources.
For example, the time struck by a clock (however fast or late it might be), or the timings generated by your requirements. You may like to call someone at midnight and send him a picture. These external timing requirements imposed by the user are the real time for the embedded system.
Embedded (Embodiment)

Embodied phenomena are those that by their very nature occur in real time and real space.
In other words, a number of systems coexist to discharge a specific function in real time.
Thus a Real Time Embedded System (RTES) is precisely the union of subsystems that discharge a specific task coherently. Henceforth we shall call them RTES. RTES as a generic term may mean a wide variety of systems in the real world. However, we will be concerned with those which use programmable devices such as microprocessors or microcontrollers and have specific functions. We shall characterize them as follows.
Characteristics of an RTES

Single-Functioned

Here single-functioned means specific functions. The RTES is usually meant for very specific functions. Generally a special purpose microprocessor executes a program over and over again for a specific purpose. If the user wants to change the functionality, e.g. changing the mobile phone from conversation mode to camera or calculator mode, the program gets flushed out and a new program is loaded which carries out the requisite function. These operations are monitored and controlled by an operating system called a Real Time Operating System (RTOS), which has much simpler complexity but more rigid constraints as compared to conventional operating systems such as Microsoft Windows and Unix.
Tightly Constrained

The constraints on the design and marketability of an RTES are more rigid than those of its non-real-time counterparts.
Reactive and Real Time

Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the computer user, typically does not result in a system failure.
4. Very few in India will be interested in buying a mobile phone if it costs Rs 50,000/- even if it provides a faster processor with 200MB of memory to store your addresses and your favorite mp3 music, plays them, acts as a small-screen TV whenever you desire, and takes your calls intelligently. However, in the USA a majority can afford it!
However, for the sake of our understanding we can discuss some common forms of systems at the block diagram level. Any system can be hierarchically divided into subsystems. Each subsystem may be further segregated into smaller systems. And each of these smaller systems may consist of some discrete parts. This is called the Hardware configuration.

Some of these parts may be programmable and therefore must have some place to keep their programs. In an RTES the on-chip or on-board non-volatile memory keeps these programs. These programs are part of the Real Time Operating System (RTOS) and continually run as long as the gadget is receiving power. A part of the RTOS also executes in stand-by mode while taking very little power from the battery. This is also called the sleep mode of the system.
Both the hardware and software coexist in a coherent manner. Tasks which can be carried out by both software and hardware affect the design process of the system. For example a multiplication may be done by hardware, or it can be done by software through repeated additions. Hardware based multiplication improves the speed at the cost of increased complexity of the arithmetic logic unit (ALU) of the embedded processor. On the other hand software based multiplication is slower but the ALU is simpler to design. These are some of the conflicting requirements which need to be resolved against the requirements imposed by the overall system. This is known as Hardware-Software Codesign or simply Codesign.
Let us treat both the hardware and the imbibed software in the same spirit and treat them as systems or subsystems. Later on we shall see where to put them together and how. Thus we can now draw a hierarchical block diagram representation of the whole system as follows:
System
Subsystems
Components
= interfaces
= key interface standards
= uses open standards
Fig. 1.2 The System Interface and Architecture
t
The red and grey spheres in Fig.1.2n represent interface standards. When a system is
assembled it starts with some chassis ordaesingle subsystem. Subsequently subsystems are added
onto it to make it a complete system. u
s t
Let us take the example of i tya Desktop Computer. Though not an Embedded System it can
.c a system from its subsystems.
give us a nice example of assembling
w a desktop computer (Fig.1.3) starting with the chassis and then
w
You can start assembling
w mode power supply), motherboard, followed by hard disk drive,
take the SMPS (switched
CDROM drive, Graphic Cards, Ethernet Cards etc. Each of these subsystems consists of several
components e.g. Application Specific Integrated Circuits (ASICs), microprocessors, Analog as
well as Digital VLSI circuits, Miniature Motor and its control electronics, Multilevel Power
supply units crystal clock generators, Surface mounted capacitors and resistors etc. In the end
you close the chassis and connect Keyboard, Mouse, Speakers, Visual Display Units, Ethernet
Cable, Microphone, Camera etc fitting them into certain well-defined sockets.
As we can see that each of the subsystems inside or outside the Desktop has cables fitting
well into the slots meant for them. These cables and slots are uniform for almost any Desktop
you choose to assemble. The connection of one subsystem into the other and vice-versa is known
as Interfacing. It is so easy to assemble because they are all standardized. Therefore,
standardization of the interfaces is most essential for the universal applicability of the system and
its compatibility with other systems. There can be open standards which makes it exchange
information with products from other companies. It may have certain key standards, which is
only meant for the specific company which manufactures them.
SMPS
CDROM drive
Interface Cables
Mother Board
Fig. 1.3 Inside a Desktop Computer
i t
A Desktop Computer will c
. have more open standards than an Embedded System. This is
w in the later. Many of the components of the embedded systems
because of the level of integration
w chip. This concept is known as System on Chip (SOC) design. Thus
are integrated on to a single
w
there are only few subsystems left to be connected.
Analyzing the assembling process of a Desktop, let us comparatively assess the possible subsystems of a typical RTES. One such segregation is shown in Fig. 1.4. The explanation of the various parts is as follows:

User Interface: for interacting with users. May consist of keyboard, touch pad etc.
ASIC (Application Specific Integrated Circuit): for specific functions like motor control, data modulation etc.
Microcontroller (µC): a family of microprocessors.
Real Time Operating System (RTOS): contains all the software for the system control and user interface.
Controller Process: the overall control algorithm for the external process. It also provides timing and control for the various units inside the embedded system.
Digital Signal Processor (DSP): a typical family of microprocessors.
DSP assembly code: code for the DSP stored in program memory.
Dual Ported Memory: data memory accessible by two processors at the same time.
CODEC: compressor/decompressor of the data.
User Interface Process: the part of the RTOS that runs the software for user interface activities.
Controller Process: the part of the RTOS that runs the software for timing and control amongst the various units of the embedded system.
User Interface Process | Controller Process
ASIC | µC | RTOS
System Bus
DSP assembly code | Digital Signal Processor | Digital Signal Processor | DSP assembly code
Dual-port memory | CODEC
Hardware | Software
Fig. 1.4 Architecture of an Embedded System
The above architecture represents a hypothetical Embedded System (we will see more realistic ones in subsequent examples). More than one microprocessor (2 DSPs and 1 µC) are employed here to carry out different tasks. As we will learn later, the µC is generally meant for simpler and slower jobs such as carrying out a Proportional Integral (PI) control action or interpreting the user commands etc. The DSP is a more heavy duty processor capable of doing real time signal processing and control. Both the DSPs along with their operating systems and codes are independent of each other. They share the same memory without interfering with each other. This kind of memory is known as dual ported memory or two-way post-box memory. The Real Time Operating System (RTOS) controls the timing requirements of all the devices. It executes the overall control algorithm of the process while diverting more complex tasks to the DSPs. It also specifically controls the µC for the necessary user interactivity. The ASICs are specialized circuits for specific functions like motor control and data modulation.
Ans:
(a) Ceiling Fans: These are not programmable.
(b) & (e) obey all definitions of Embedded Systems such as: (i) working in real time, (ii) programmable, (iii) a number of systems coexist on a single platform to discharge one function (single-functioned).
(c) Television Set: Only a small part of it is programmable. It can work without being programmable. It is not tightly constrained.
(d) Desktop Keyboard: Though it has a processor, normally it is not programmable.
Definition of Real Time Systems

An operation within a larger dynamic system is called a real-time operation if the combined reaction- and operation-time of a task operating on current events or input is no longer than the maximum delay allowed, in view of circumstances outside the operation. The task must also occur before the system to be controlled becomes unstable. A real-time operation is not necessarily fast, as slow systems can allow slow real-time operations. This applies to all types of dynamically changing systems. The polar opposite of a real-time operation is a batch job, with interactive timesharing falling somewhere in between the two extremes.
Alternately, a system is said to be hard real-time if the correctness of an operation depends not only upon the logical correctness of the operation but also upon the time at which it is performed. An operation performed after the deadline is, by definition, incorrect, and usually has no value. In a soft real-time system the value of an operation declines steadily after the deadline expires.
Embedded System
An embedded system is a special-purpose system in which the computer is completely
encapsulated by the device it controls. Unlike a general-purpose computer, such as a personal
computer, an embedded system performs pre-defined tasks, usually with very specific
requirements. Since the system is dedicated to a specific task, design engineers can optimize it,
reducing the size and cost of the product. Embedded systems are often mass-produced, so the
cost savings may be multiplied by millions of items.
Handheld computers or PDAs are generally considered embedded devices because of the
nature of their hardware design, even though they are more expandable in software terms. This
line of definition continues to blur as devices expand.
Ans:
Five advantages:
1. Smaller Size
2. Smaller Weight
3. Lower Power Consumption
4. Lower Electromagnetic Interference
5. Lower Price
Five disadvantages:
1. Lower Mean Time Between Failures
2. Repair and Maintenance is not possible
3. Faster Obsolescence
4. Unmanageable Heat Loss
5. Difficult to Design
Q3. What do you mean by Reactive in Real Time? Cite an example.

Ans:
Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer typically focuses on computations, with relatively infrequent reactions to input devices.
(i) Mobile Telephone (ii)Digital Camera (iii) A programmable calculator (iv) An iPod (v) A
digital blood pressure machine
iPod: The iPod is a brand of portable media players designed and marketed by Apple Computer.
Devices in the iPod family are designed around a central scroll wheel (except for the iPod
shuffle) and provide a simple user interface. The full-sized model stores media on a built-in hard
drive, while the smaller iPods use flash memory. Like many digital audio players, iPods can serve
as external data storage devices when connected to a computer.
Q5. Write the model number and detailed specification of your/friend's mobile telephone.

Manufacturer:
Model:
Network Types: EGSM/GSM/CDMA
Form Factor: The industry standard that defines the physical, external dimensions of a particular device. The size, configuration, and other specifications used to describe hardware.
Battery Life Talk (hrs):
Battery Life Standby (hrs):
Battery Type:

Measurements
Weight:
Dimensions:

Display
Display Type: Colour or Black & White
Display Size (px):
Display Colours:

General Options
Camera:
Mega Pixel:
Email Client:
Games: Yes
High Speed Data:
MP3 Player:
PC Sync: Yes
Phonebook:
Platform Series:
Polyphonic Ring tones:
Predictive Text:
Streaming Multimedia:
Text Messages:
Wireless Internet: Opera

Other Options
Alarm:
Bluetooth:
Calculator:
Calendar:
Data Capable:
EMS:
FM Radio:
Graphics (Custom):
Infrared:
Speaker Phone:
USB:
Vibrate:
Lesson 2
Introduction to Real Time
Embedded Systems Part II
Instructional Objectives
After going through this lesson the student will
Learn more about the numerous day-to-day real time embedded systems
Learn the internal hardware of a typical mobile phone
Learn about the important components of an RTES
Learn more about a mobile phone
Learn about the various important design issues
Also learn the design flow
Pre-Requisite

Digital Electronics, Microprocessors
Common Examples Of Embedded Systems

Some of the common examples of Embedded Systems are given below:

Consumer electronics: cell phones, pagers, digital cameras, camcorders, DVD players, portable video games, calculators, and personal digital assistants etc.

Fig. 2.1(a) Digital Camera
Fig. 2.1(c) Personal Digital Assistants
Home appliances: microwave ovens, answering machines, thermostats, home security systems, washing machines, and lighting systems etc.

Fig. 2.1(d) Microwave Oven
Business equipment: electronic cash registers, curbside check-in, alarm systems, card readers, product scanners, and automated teller machines.

Fig. 2.1(g) Electronic Cash Registers
Fig. 2.1(h) Electronic Card Readers
Automobiles: Electronic Control Units (ECU), which include transmission control, cruise control, fuel injection, antilock brakes, and active suspension in the same or separate modules.
Mobile Phone

Let us take the same mobile phone as discussed in Lesson 1 as an example for illustrating the typical architecture of an RTES.

In general, a cell phone is composed of the following components:
A circuit board (Fig. 2.2)
Antenna
Microphone
Speaker
Liquid crystal display (LCD)
Keyboard
Battery

Fig. 2.2 The Cell Phone Circuitry
RF receiver (Rx)
RF transmitter (Tx)
Microphone
Micro-controller
Display
Keyboard
Fig. 2.3 The block diagram
A typical mobile phone handset (Fig. 2.3) should include standard I/O devices (keyboard, LCD), plus a microphone, speaker and antenna for wireless communication. The Digital Signal Processor (DSP) performs the signal processing, and the micro-controller controls the user interface, battery management, call setup etc. The performance specification of the DSP is very crucial since the conversion has to take place in real time. This is why almost all cell phones contain such a special processor dedicated to making digital-to-analog (DA) and analog-to-digital (AD) conversions and real time processing such as modulation and demodulation etc. The Read Only Memory (ROM) and flash memory (electrically erasable and programmable memory) chips provide storage for the phone's operating system (RTOS) and various data such as phone numbers, calendar information, games etc.
1. Microprocessor

This is the heart of any RTES. The microprocessors used here are different from general purpose microprocessors like the Pentium, Sun SPARC etc. They are designed to meet some specific requirements. For example the Intel 8048 is a special purpose microprocessor which you will find in the keyboard of your Desktop computer. It is used to scan the keystrokes and send them in a synchronous manner to your PC. Similarly mobile phones and digital cameras use special purpose processors for voice and image processing. A washer and dryer may use some other type of processor for real time control and instrumentation.
2. Memory

The microprocessor and memory must coexist on the same Printed Circuit Board (PCB) or the same chip. Compactness, speed and low power consumption are the characteristics required of the memory to be used in an RTES. Therefore, very low power semiconductor memories are used in almost all such devices. For housing the operating system, Read Only Memory (ROM) is used.

The program or data loaded might exist for a considerable duration. User defined setups also exist in an RTES, much like changing the setup of your Desktop Computer. For example you may like to change the ring tone of your mobile and keep it for some time, or you may like to change the screen color etc. In these cases the memory should be capable of retaining the information even after the power is removed. In other words the memory should be non-volatile.
3. Input Output Devices and Interfaces

Input/Output interfaces are necessary to make the RTES interact with the external world. They could be Visual Display Units such as TFT screens in a mobile phone, touch pads, keyboards, speakers, antennas, microphones etc. These RTES should also have open interfaces to other devices such as Desktop Computers, Local Area Networks (LAN) and other RTES. For example you may like to download your address book into your personal digital assistant (PDA). Or you may like to download some mp3 songs from your favorite internet site into your mp3 player. These input/output devices along with standard software protocols in the RTOS provide the necessary interface to these standards.
1. A memory technology similar in characteristics to EPROM (Erasable Programmable Read Only Memory) memory,
with the exception that erasing is performed electrically instead of via ultraviolet light, and, depending upon the
organization of the flash memory device, erasing may be accomplished in blocks (typically 64k bytes at a time)
instead of the entire device.
4. Software

The RTES is just a physical body as long as it is not programmed. It is like the human body without life. Whenever you switch on your mobile telephone you might have marked some activities on the screen. Whenever you move from one city to another you might have noticed the changes on your screen. Or when you have gone for a picnic away from your city you might have marked the no-signal sign. These activities are taken care of by the Real Time Operating System sitting in the non-volatile memory of the RTES.
Besides the above an RTES may have various other components and Application Specific
Integrated Circuits (ASIC) for specialized functions such as motor control, modulation,
demodulation, CODEC.
The design of a Real Time Embedded System has a number of constraints. The following section discusses these issues.

Design Issues

The constraints in embedded systems design are imposed by external as well as internal specifications. Design metrics are introduced to measure the cost function, taking into account the technical as well as economic considerations.
Design Metrics

A Design Metric is a measurable feature of the system's performance, cost, time for implementation, safety etc. Most of these are conflicting requirements, i.e. optimizing one shall not optimize the other: e.g. a cheaper processor may have a lousy performance as far as speed and throughput are concerned.

The following metrics are generally taken into account while designing embedded systems.
NRE cost (nonrecurring engineering cost)

It is the one-time cost of designing the system. Once the system is designed, any number of units can be manufactured without incurring any additional design cost; hence the term nonrecurring.
Suppose three technologies are available for use in a particular product. Assume that
implementing the product using technology A would result in an NRE cost of $2,000 and unit
cost of $100, that technology B would have an NRE cost of $30,000 and unit cost of $30, and
that technology C would have an NRE cost of $100,000 and unit cost of $2. Ignoring all other
design metrics, like time-to-market, the best technology choice will depend on the number of
units we plan to produce.
Unit cost
The monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size
The physical space required by the system, often measured in bytes for software, and gates or
transistors for hardware.
Performance
The execution time of the system
Power Consumption
It is the amount of power consumed by the system, which may determine the lifetime of a
battery, or the cooling requirements of the IC, since more power means more heat.
Flexibility

The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
Time-to-prototype

The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but which can be used to verify the system's usefulness and correctness and to refine the system's functionality.
Time-to-market

The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.
Maintainability
It is the ability to modify the system after its initial release, especially by designers who did not
originally design the system.
Correctness
This is the measure of the confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
Throughput
This is the number of tasks that can be processed per unit time. For example, a camera may be able to process 4 images per second.
These are some of the cost measures for developing an RTES. Optimization of the overall cost of design includes each of these factors taken with some multiplying factors depending on their importance. And the importance of each of these factors depends on the type of application. For instance in defense related applications, while designing an anti-ballistic system, the execution time is the deciding factor. On the other hand, for de-noising a photograph in an embedded camera in your mobile handset, the execution time may be a little relaxed if it can bring down the cost and complexity of the embedded Digital Signal Processor.

The design flow of an RTES involves several steps. The cost and performance are tuned and fine-tuned in a recursive manner. An overall design methodology is enumerated below.
Design Methodology (Fig. 2.4)

System Requirement and Specifications
Define the problem: what is your embedded system required to do?
Define the requirements (inputs, outputs, control): what are the inputs and outputs of your system?
Write down the specifications for them: specify if the signals are in digital or analogue form; specify the voltage levels, frequency etc.

The design task can be further segregated into the following steps.
Questions

Q1. Give one example of a typical embedded system other than those listed in this lecture. Draw the block diagram and discuss the function of the various blocks. What type of embedded processor do they use?
Ans:
A GPS receiver receives signals from a constellation of at least four out of a total of 24 satellites.
Based on the timing and other information signals sent by these satellites the digital signal
processor calculates the position using triangulation.
The major block diagram is divided into (1) Active Antenna System (2)RF/IF front end (3) The
Digital Signal Processor(DSP)
The Active Antenna System houses the antenna, a band pass filter and a low noise amplifier (LNA).
The RF/IF front end houses another band pass filter, the RF amplifier and the demodulator and
A/D converter.
The DSP accepts the digital data and decodes the signal to retrieve the information sent by the
GPS satellites.
Q2. Discuss about the Hard Disk Drive housed in your PC. Is it an RTES?
Ans:
Hard drives have two kinds of components: internal and external. External components are located on a printed circuit board called the logic board, while internal components are located in a sealed chamber called the HDA or Hard Drive Assembly.

For details browse http://www.hardwaresecrets.com/article/177/3
The big circuit is the controller. It is in charge of everything: exchanging data between the hard
drive and the computer, controlling the motors on the hard drive, commanding the heads to read
or write data, etc.
All these tasks are carried out as demanded by the processor sitting on the motherboard. It can be verified to be single-functioned, tightly constrained, and reactive.
Ans:
The time required to develop a system to the point that it can be released and sold to customers.
The main contributors are design time, manufacturing time, and testing time. This metric has
become especially demanding in recent years. Introducing an embedded system to the
marketplace early can make a big difference in the system's profitability.
Moore's law is the empirical observation that the complexity of integrated circuits, with respect to minimum component cost, doubles every 24 months. It is attributed to Gordon E. Moore, a co-founder of Intel.
Lesson 3
Embedded Systems Components Part I
Instructional Objectives
After going through this lesson the student would
Pre-Requisite

Digital Electronics, Microprocessors

Introduction
The various components of an Embedded System can be hierarchically grouped, from System
Level Components down to Transistor Level Components. A system (subsystem) component is
different from what is considered a "standard" electronic component. Standard components are
the familiar active devices such as integrated circuits, microprocessors, memory, diodes,
transistors, etc., along with passives such as resistors, capacitors, and inductors. These are the
basic elements needed to mount on a circuit board for a customized, application-specific design.
A system component, on the other hand, has active and passive components mounted on
circuit boards that are configured for a specific task (Fig. 3.1). System components can be either
single- or multi-function modules that serve as highly integrated building blocks of a system. A
system component can be as simple as a digital I/O board or as complex as a computer with
video, memory, networking, and I/O all on a single board. System components support industry
standards and are available from multiple sources worldwide.
Fig. 3.1 The Hierarchical Components: System, Subsystems (PCBs), and Gate Level Components (generally inside the Integrated Circuits, rarely outside)

Structure of an Embedded System
The typical structure of an embedded system is shown in Fig. 3.2. This can be compared
with that of a Desktop Computer as shown in Fig. 3.3. Normally in an embedded system the
primary memory, central processing unit and many peripheral components including analog-to-
digital converters are housed on a single chip. These single chips are called Microcontrollers.
This is shown by the dotted lines in Fig. 3.2.
On the other hand, a desktop computer may contain all these units on a single Printed
Circuit Board (PCB) called the Mother Board. Since these computers handle a much larger
dimension of data as compared to embedded systems, there have to be elaborate arrangements
for storage and faster data transfer between the CPU and memory, between the CPU and
input/output devices, and between memory and input/output devices. The storage is accomplished
by cheaper secondary memories like Hard Disks and CDROM drives. The data transfer process
is improved by incorporating multi-level cache and direct memory access methods. Generally no
such arrangements are necessary for embedded systems. Because of the number of heterogeneous
components in a desktop computer, the power supply is required at multiple voltage levels
(typically 12, 5, 3.3, 2.5 volts). On the other hand an Embedded System chip may need just
one DC power supply level (typically +5V).
In a desktop computer various units operate at different speeds. Even the units inside a
typical CPU such as the Pentium-IV may operate at different speeds. The timing and control units
are complex and provide multi-phase clock signals to the CPU and other peripherals at different
voltage levels. The timing and control unit for an Embedded System may be much simpler.
Fig. 3.2 The typical structure of an Embedded System: Central Processing Unit, Primary Memory, Power Supply, AD Converter (Analog to Digital Converter), UART (Universal Asynchronous Receiver and Transmitter)
Fig. 3.3 The typical structure of a Desktop Computer: Microprocessor, Cache Memory, Primary Memory, Direct Memory Access, Power Supply, and Input/Output Interfaces (Keyboard, Hard Disk Drive, Network Card, Video Display Units)
Typical Example
A Single Board Computer (SBC)
Since you are familiar with Desktop Computers, we should see how to make a desktop
PC on a single printed circuit board. Such boards are called Single Board Computers (SBCs).
These SBCs are typical embedded systems, generally custom-made for industrial
applications. In the introductory lessons you should have done some exercises on your PC.
Now try to compare this SBC with your desktop.
Let us look at an example of a single board computer, the EBC-C3PLUS SBC from
WinSystems [1].
Fig. 3.4 The Single Board Computer (SBC)
Let us discuss and try to understand the features of the above single-board embedded computer.
This will pave the way to understanding more complex System-on-Chip (SoC) type systems.
The various units and their specifications are as follows:
VIA 733MHz or 1 GHz low-power C3 processor, EBX-compliant board (Fig. 3.5)
This is the processor on this SBC. VIA is the company which manufactures the
processor (www.via.com.tw); 733MHz or 1GHz is the clock frequency of this processor. C3 is
the brand name, as P3 and P4 are for Intel. (You must be familiar with Intel processors, as your PC
has one.)

[1] Courtesy WinSystems, Inc., 715 Stadium Drive, Arlington, Texas 76011
http://sbc.winsystems.com/products/sbcs/ebcc3plus.html
Fig. 3.5 The Processor
32 to 512MB of system PC133 SDRAM supported in a 168-pin DIMM socket
32 to 512 MB indicates the possible Random Access Memory size on the SBC. SDRAM stands for
Synchronous Dynamic RAM. We will learn more about this in the memory chapter. A 168-pin
DIMM is a Dual-In-line Memory Module, which holds the memory chips and can fit into
the board easily.
DIMMs look like this:

Fig. 3.6 DIMM
Socket for up to 1 Gigabyte bootable DiskOnChip, or 512KB SRAM, or 1MB EPROM
These are Static RAMs (SRAM) or EPROMs which house the operating system, just like the
Hard Disk in a Desktop computer.
Type I and Type II Compact Flash (CF) cards supported
This is otherwise known as a semiconductor hard disk or floppy disk.
Flash memory is an advanced form of Electrically Erasable and Programmable Read Only
Memory (EEPROM). Type I and Type II are just two different designs, Type II being more
compact and more recent.
AC'97 audio
AC'97 provides system developers with a standardized specification for integrated PC audio devices. AC'97
defined a high-quality audio architecture for the PC and is capable of delivering up to 96kHz/20-bit
playback in stereo and 48kHz/20-bit in multi-channel playback modes.
PC/104 expansion
PC/104 cards are much smaller than the ISA-bus cards found in PCs and stack together, which
eliminates the need for a motherboard, backplane, and/or card cage.
AT keyboard controller and PS/2 mouse support
The AT keyboard was an 84-key keyboard introduced with the PC/AT. It was later replaced with the 101-key
Enhanced Keyboard.
Two interrupt controllers and 7 DMA channels; three 16-bit counter/timers; Real Time Clock;
Watchdog Timer; and Power-On Self Test
The interrupt controllers, DMA channels, counter/timers and Real Time Clock are used for real-time
applications.
Specifications
+5 volt only operation
Mechanical
Dimensions: 5.75" x 8.0" (146mm x 203mm)
Jumpers: 0.025" square posts
Connectors
Serial, Parallel, Keyboard: 50-pin on 0.100" grid
COM3 & 4: 20-pin on 0.100" grid
Floppy Disk Interface: 34-pin on 0.100" grid
EIDE Interface: 40-pin on 0.100" grid (Primary)
44-pin on 2mm grid (Primary)
40-pin on 0.100" grid (Secondary)
50-pin 2mm Flash connector
Parallel I/O: Two, 50-pin on 0.100" grid
Conclusion
It is apparent from the above example that a typical embedded system consists, by and large, of the
following units housed on a single board or chip:
1. Processor
2. Memory
3. Input/Output interface chips
4. I/O Devices, including Sensors and Actuators
5. A-D and D-A converters
6. Software as operating system
7. Application Software
One or more of the above units can be housed on a single PCB or a single chip.
In a typical Embedded System the Microprocessor, a large part of the memory and major I/O
devices are housed on a single chip called a microcontroller. Being custom-made, embedded
systems are required to function for specific purposes with little user programmability. The user
interaction is converted into a series of commands which are executed by the RTOS by calling
various subroutines. The RTOS is stored in a flash memory or read-only memory. There will be
additional scratch-pad memory for temporary data storage. If the CPU sits on the same chip as the
memory, then a part of the memory can be used for scratch-pad purposes. Otherwise a number of
CPU registers will be required for the same. The CPU communicates with the memory through the
address and data bus. The timing and control of these data exchanges is handled by the control
unit of the CPU via the control lines. The memory which is housed on the same chip as the CPU
has the fastest transfer rate. This is also known as the memory bandwidth or bit rate. The
memory outside the processor chip is slower and hence has a lower transfer rate. On the other
hand, Input/Output devices have varied degrees of bandwidth. These varying data
transfer rates are handled in different ways by the processor. The slower devices need interface
chips. Generally chips which are faster than the microprocessor are not used.
The architecture of a typical embedded system is shown in Fig. 3.8. The hardware unit consists of
the above units along with a digital as well as an analog subsystem. The software, in the form of an
RTOS, resides in the memory.
Fig. 3.8 Architecture of a typical embedded system: hardware (digital subsystem, analog subsystem, mechanical subsystem, optical subsystem, sensors and actuators) and software
Questions and Answers
Q1. What are the hierarchical components in an embedded system design?
Ans:
The hierarchical components, as shown in Fig. 3.1, are: System, Subsystems (PCBs), and Gate Level Components.
Q2. What is LVDS?
Ans:
LVDS stands for Low Voltage Differential Signaling. The advantages of such a standard are low noise
and low interference, so that one can increase the data transmission rate. Instead of 0 and 5 V,
a voltage level of 1.5 or 3.3 V is used for High and 0 or 1 V is used for Low. The lower Low-to-High
voltage swing reduces interference, and the differential mode rejects common-mode noise.
Q.3. Is there any actuator in your mobile phone?
Ans:
There is a vibrator in a mobile phone which can be activated to indicate an incoming call or
message. Generally there is a coreless motor which is operated by the microcontroller for
generating the vibration.
Lesson
4
Embedded Systems
Components Part II
Overview of Components
Instructional Objectives
After going through this lesson the student would be able to:
Pre-Requisite
Digital Electronics, Microprocessors
You are now almost familiar with the various components of an embedded system. In this
chapter we shall discuss some of the general components, such as:
Processors
Memory
Input/Output Devices
Processors
The central processing unit is the most important component in an embedded system. It exists in
an integrated manner along with memory and other peripherals. Depending on the type of
application, processors are broadly classified into 3 major categories:
1. General Purpose Microprocessors
2. Microcontrollers
3. Digital Signal Processors
For more specific applications customized processors can also be designed. Unless the demand is
high the design and manufacturing cost of such processors will be high. Therefore, in most of the
applications the design is carried out using already available processors in the market. However,
the Field Programmable Gate Arrays (FPGA) can be used to implement simple customized
processors easily. An FPGA is a type of logic chip that can be programmed. They support
thousands of gates which can be connected and disconnected like an EPROM (Erasable
Programmable Read Only Memory). They are especially popular for prototyping integrated
circuit designs. Once the design is set, hardwired chips are produced for faster performance.
General purpose processors are generally cheap because they are manufactured in large numbers. The NRE (Non-Recurring
Engineering) cost (Lesson 1) is spread over a large number of units. Being cheaper, the
manufacturer can invest more in improving the VLSI design with advanced, optimized
architectural features. Thus the performance, size and power consumption can be improved.
In most cases the design tools for such processors are provided by the manufacturer. Also, the
supporting hardware is cheap and easily available. However, only a part of the processor's
capability may be needed for a specific design, and hence the overall embedded system will not
be as optimized as it could have been as far as space, power and reliability are concerned.
Fig. 4.1 The architecture of a General Purpose Processor: a Control Unit (Controller, Program Counter PC, Instruction Register IR) and a Datapath (ALU, Registers, Control/Status), connected to Memory and I/O
The Pentium IV is such a general purpose processor, with the most advanced architectural features.
Compared to its overall performance, its cost is also low.
A general purpose processor consists of a datapath and a control unit, tightly linked with the
memory (Fig. 4.1).
The Datapath consists of circuitry for transforming data and for storing temporary data. It
contains an arithmetic logic unit (ALU) capable of transforming data through operations such as
addition, subtraction, logical AND, logical OR, inverting, shifting etc. The datapath also
contains registers capable of storing temporary data generated by the ALU or related operations.
The internal data bus carries data within the datapath, while the external data bus carries data to
and from the data memory. The size of the datapath indicates the bit size of the CPU; an 8-bit
datapath means an 8-bit CPU, such as the 8085.
The Control Unit consists of circuitry for retrieving program instructions and for moving data to,
from, and through the datapath according to those instructions. It has a program counter (PC) to
hold the address of the next program instruction to fetch and an instruction register (IR) to hold
the fetched instruction. It also has a timing unit in the form of state registers and control logic.
The controller sequences through the states and generates the control signals necessary to read
instructions into the IR and to control the flow of data in the datapath. Generally the address size is
specified by the control unit, as it is responsible for communicating with the memory. For each
instruction the controller typically sequences through several stages, such as fetching the
instruction from memory, decoding it, fetching the operands, executing the instruction in the
datapath, and storing the results. Each stage takes a few clock cycles.
Microcontroller
Just as you put all the major components of a Desktop PC onto a Single Board Computer (SBC),
if you put all the major components of a Single Board Computer onto a single chip, it is
called a Microcontroller. Because of the limitations of VLSI design, most of the
input/output functions exist in a simplified manner. The typical architecture of such a
microcontroller is shown in Fig. 4.2.
Fig. 4.2 The architecture of a typical microcontroller, the C500 from Infineon Technologies, Germany: the C500 core (1 or 8 datapointers) with on-chip ROM, IRAM, XRAM, Serial Port, Parallel Ports, Timers, Interrupt Controller, Peripheral Bus Access Control, External Control (RST, EA, PSEN, ALE, XTAL, Port0/Port2), Housekeeper, MDU and WDU
*The double-lined blocks are core to the processor. Other blocks are on-chip
The external control block handles the external control signals and clock generation.
The access control unit is responsible for the selection of the on-chip memory resources.
The IRAM provides the internal RAM, which includes the general purpose registers.
The XRAM is an additional internal RAM that is sometimes provided.
The interrupt requests from the peripheral units are handled by an Interrupt Controller Unit.
Serial interfaces, timers, capture/compare units, A/D converters, watchdog units (WDU), and
multiply/divide units (MDU) are typical examples of on-chip peripheral units. The external
signals of these peripheral units are available at multifunctional parallel I/O ports or at dedicated
pins.
Digital Signal Processors
Digital Signal Processors are designed based on the modified Harvard Architecture to handle
real-time signals. The features of these processors are suitable for implementing signal processing
algorithms. One of the common operations required in such applications is array multiplication;
for example, convolution and correlation require array multiplication. This is accomplished by
multiplication followed by accumulation and addition, generally carried out by Multiplier
and Accumulator (MAC) units. Sometimes this is known as MACD, where D stands for Data
move. Generally all the instructions are executed in a single cycle.
[Figure: the modified Harvard architecture of a DSP. The Processing Unit exchanges results/operands with a Data Memory over its own address lines, while the Control Unit fetches instructions from a separate Program Memory over its own address lines; status and opcode signals pass between the two units.]
The Very Long Instruction Word (VLIW) architecture is also suitable for Signal Processing
applications. This has got a number of functional units and data paths as seen in Fig. 4.5. The
long instruction words are fetched from the memory. The operands and the operation to be
performed by the various units are specified in the instruction itself. The multiple functional
units share a common multi-ported register file for fetching the operands and storing the results.
Parallel random access to the register file is possible through the read/write crossbar.

Fig. 4.5 Block diagram of the VLIW architecture: a Program Control Unit and Instruction Cache feeding Functional Units 1 ... n, which share a Multi-ported Register File through a Read/Write Cross Bar
Microprocessors vs Microcontrollers
A microprocessor is a general-purpose digital computer's central processing unit. To make a
complete microcomputer, you add memory (ROM and RAM), memory decoders, an oscillator,
and a number of I/O devices. The prime use of a microprocessor is to read data, perform
extensive calculations on that data, and store the results in a mass storage device or display
them. These processors have complex architectures with multiple stages of pipelining and
parallel processing. The memory is divided into stages, such as multi-level cache and RAM. The
development time of General Purpose Microprocessors is high because of the very complex VLSI
design.
Fig. 4.6 A Microprocessor-based System: the microprocessor with external ROM, EEPROM, RAM, Serial I/O, Parallel I/O, A/D and D/A converters (analog I/O), PWM, Timer, and input and output ports
The design of the microcontroller is driven by the desire to make it as expandable and flexible
as possible. Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to
on-chip I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip
hardware for I/O, RAM and ROM, they usually have a fairly low-performance CPU.
Microcontrollers also often have timers that generate interrupts and can thus be used with the
CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a
microcontroller is to control the operations of a machine using a fixed program that is stored in
ROM and does not change over the lifetime of the system. The microcontroller is concerned with
getting data from and to its own pins; the architecture and instruction set are optimized to handle
data in bit and byte sizes.
Fig. 4.7 A Microcontroller: on-chip ROM, EEPROM, RAM, Parallel I/O and Timer, with a PWM unit whose filtered output provides an analog output and whose raw output provides digital PWM
The contrast between a microcontroller and a microprocessor is best exemplified by the fact that
most microprocessors have many operation codes (opcodes) for moving data from external
memory to the CPU, while microcontrollers may have one or two. Conversely, microprocessors
may have one or two types of bit-handling instructions, while microcontrollers will have many.
A basic Microprocessor vs a basic DSP

[Figure: memory organization in a DSP, with separate Program Memory and Data Memory connected to the processor.]
5. Very limited SIMD (Single Instruction Multiple Data) features, and specialized, complex instructions
6. Multiple operations per instruction
7. Dedicated address generation units
8. Specialized addressing: auto-increment, modulo (circular), bit-reversed
9. Hardware looping
10. Interrupts disabled during certain operations
11. Limited or no register shadowing
12. Rarely have dynamic features
13. Relatively narrow range of DSP-oriented on-chip peripherals and I/O interfaces
14. Synchronous serial port
Fig. 4.9 Memory Organization in a General Purpose Processor: a single memory, holding both program and data, connected to the processor
Characterization of a General Purpose Processor
1. CPUs for PCs and workstations, e.g., Intel Pentium IV
2. Von Neumann architecture
3. Typically 1 memory access per cycle
4. Most operations take more than 1 cycle
5. General-purpose instructions; typically only one operation per instruction
6. Often, no separate address generation units
7. General-purpose addressing modes
8. Software loops only
9. Interrupts rarely disabled
10. Register shadowing common
11. Dynamic caches are common
12. Wide range of on-chip and off-chip peripherals and I/O interfaces
13. Asynchronous serial port
Memory
Memory serves the processor's short- and long-term information storage requirements, while
registers serve the processor's short-term storage requirements. Both the program and the data
are stored in the memory. When the data and program occupy the same memory, this is known as
the Princeton Architecture. In the Harvard Architecture the program and the data occupy separate
memory blocks. The former leads to a simpler architecture; the latter needs two separate sets of
connections, so the data and program accesses can proceed in parallel, leading to parallel
processing. General purpose processors use the Princeton Architecture.
In a typical processor, when the CPU needs data it first looks in its own data registers. If
the data isn't there, the CPU looks to see if it's in the nearby Level 1 cache. If that fails, it's off to
the Level 2 cache. If it's nowhere in cache, the CPU looks in main memory. Not there? The CPU
gets it from disk. All the while, the clock is ticking and the CPU is sitting there waiting.
Input/Output Devices and Interface Chips
Typical RTES interact with the environment and users through some inbuilt hardware.
Occasionally external circuits are required for communicating with the user, other computers or a
network.
In the mobile handset discussed earlier, the input/output devices are the keyboard, the display
screen, the antenna, the microphone, the speaker, LED indicators etc. The signals to these units may
be analog or digital in nature. To generate an analog signal from the microprocessor we need a
Digital-to-Analog Converter (DAC), and to accept an analog signal we need an Analog-to-Digital
Converter (ADC). These DACs and ADCs again have certain control modes. They may also
operate at a different speed than the microprocessor. To synchronize and control these interface
chips we may need another interface chip. Similarly we may have interface chips for the keyboard,
screen and antenna. These chips serve as relaying units to transfer data between the processor
and the input/output devices. The input/output devices are generally slower than the processor;
therefore the processor may have to wait till they respond to any request for data transfer, and a
number of idle clock cycles may be wasted in doing so. However, the input/output interface
chips carry out this task without making the processor wait or idle.
Conclusion
Besides the above units, some real-time embedded systems may have specific circuits included on
the same chip or circuit board. These are known as Application Specific Integrated Circuits
(ASICs).
Questions-Answers
Q1. Enumerate the similarities and differences between the Microcontroller and Digital Signal
Processor
Ans:
Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to on-chip
I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip
hardware for I/O, RAM and ROM, they usually have a fairly low-performance CPU.
Microcontrollers also often have timers that generate interrupts and can thus be used with
the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of
a microcontroller is to control the operations of a machine using a fixed program that is
stored in ROM and does not change over the lifetime of the system. The microcontroller is
concerned with getting data from and to its own pins; the architecture and instruction set are
optimized to handle data in bit and byte sizes.
Digital Signal Processors have been designed based on the modified Harvard Architecture to
handle real-time signals. The features of these processors are suitable for implementing
signal processing algorithms. One of the common operations required in such applications is
array multiplication; for example, convolution and correlation require array multiplication.
This is accomplished by multiplication followed by accumulation and addition, generally
carried out by Multiplier and Accumulator (MAC) units. Sometimes this is known as MACD,
where D stands for Data move. Generally all the instructions are executed in a single cycle.
These DSP units generally use Multiple-Access and Multi-Ported Memory units. Multiple-access
memory allows more than one access in one clock period. Multi-ported memory provides
multiple address as well as data ports. This also increases the number of accesses per clock
cycle.
Q2. Name a few chips in each of the following processor families: Microcontroller, Digital Signal
Processor, General Purpose Processor
Ans:
Microcontroller: Intel 8051, Intel 80196, Motorola 68705
Digital Signal Processors: TI TMS320C6711, TI TMS320C5000 series
General Purpose Processor: Intel Pentium IV, PowerPC
Q3. List the following in increasing order of access speed:
Flash Memory, Dynamic Memory, Cache Memory, CDROM, Hard Disk, Magnetic Tape,
Processor Memory
Ans:
Magnetic Tape, CDROM, Hard Disk, Dynamic Memory, Flash Memory, Cache Memory,
Processor Memory
Ans:
A Low-Pass Sallen-Key Butterworth Filter
Q5. Is it possible to implement an anti-aliasing filter in digital form?
Ans:
No, it is not possible to implement an anti-aliasing filter in digital form, because aliasing is an
error introduced at the sampling phase of the analog-to-digital converter. If the sampling frequency
is less than twice the highest frequency present, the higher signal frequencies fold back into the
lower frequency band and hence cannot be distinguished in the digital/discrete domain.
Q6. Download any free emulator of a simple microcontroller such as the 8051, 68705 etc. and
learn about it.
Homework
Q7. Draw the internal architecture of the 8051 and explain the functions of its various units.
See http://www.atmel.com/products/8051/
Q8. State with justification if the following statements are right (or wrong)
Cache memory can be a static RAM
Dynamic RAMs occupy more space per word storage
The full-form of SDRAM is static-dynamic RAM
BIOS in your PC is not a Random Access Memory (RAM)
Ans:
Cache memory can be a static RAM: right.
The cache memory needs to have a very fast access time, which is possible with static RAM.
Q9. Explain the function of the following units in a general purpose processor
Instruction Register
Program Counter
Instruction Queue
Control Unit
Ans:
Instruction Register: A register inside the CPU which holds the instruction code temporarily
before sending it to the decoding unit.
Program Counter: A register inside the CPU which holds the address of the next instruction
code in a program. It gets updated automatically by the address generation unit.
Instruction Queue: A set of memory locations inside the CPU which holds the instructions in a
pipeline before sending them to the instruction decoding unit.
Control Unit: Responsible for generating the timing and control signals for various operations
inside the CPU. It is very closely associated with the instruction decoding unit.
Module
2
Embedded Processors and
Memory
Lesson
5
Memory-I
Instructional Objectives
After going through this lesson the student would be able to:
Pre-Requisite
Digital Electronics, Microprocessors
5.1 Introduction
This chapter describes memory. Most modern computer systems have been
designed on the basis of an architecture called the Von Neumann Architecture [1].

Fig. 5.1 The Von Neumann Architecture: Input and Output Devices connected to a Central Processing Unit, which is connected to Memory

The Memory stores the instructions as well as the data; nothing in the memory itself
distinguishes an instruction from data. The CPU has to be directed to the
address of the instruction codes.
The memory is connected to the CPU through the following lines:
1. Address
2. Data
3. Control
[1] http://en.wikipedia.org/wiki/John_von_Neumann. The so-called von Neumann architecture is a model for a
computing machine that uses a single storage structure to hold both the set of instructions on how to perform the
computation and the data required or generated by the computation. Such machines are also known as stored-
program computers. The separation of storage from the processing unit is implicit in this model.
By treating the instructions in the same way as the data, a stored-program machine can easily change the
instructions. In other words the machine is reprogrammable. One important motivation for such a facility was the
need for a program to increment or otherwise modify the address portion of instructions. This became less important
when index registers and indirect addressing became customary features of machine architecture.
Fig. 5.2 The Memory Interface: address lines, data lines and control lines between the CPU and memory
In a memory read operation the CPU loads the address onto the address bus. In most cases
these lines are fed to a decoder which selects the proper memory location. The CPU then sends a
read control signal, and the data stored in that location is transferred to the processor via the data
lines.
In a memory write operation, after the address is loaded the CPU sends the write control
signal, followed by the data, to the requested memory location.
Memory can be classified in various ways, e.g. based on location, power
consumption, manner of data storage etc. At the basic level memory can be classified as:
1. Processor Memory (Register Array)
2. Internal on-chip Memory
3. Primary Memory
4. Cache Memory
5. Secondary Memory
Primary Memory
This is the memory that sits just outside the CPU. It can also reside on the same chip as the CPU.
These memories can be static or dynamic.
Cache Memory
This is situated between the processor and the primary memory. It serves as a buffer for the
immediate instructions or data which the processor anticipates. There can be more than one
level of cache memory.
Secondary Memory
These are generally treated as Input/Output devices. They are much cheaper mass-storage
devices, slower, and connected through input/output interface circuits. They are generally
magnetic or optical memories, such as Hard Disk and CDROM devices.
Memory can also be divided into Volatile and Non-volatile memory.
Volatile Memory
The contents are erased when the power is switched off. Semiconductor Random Access
Memories fall into this category.
Non-volatile Memory
The contents remain intact even if the power is switched off. Magnetic Memories (Hard Disks),
Optical Disks (CDROMs), and Read Only Memories (ROM) fall under this category.
Fig. 5.3 The Internal Registers: the CPU (Control Unit, ALU, Registers) with Input, Output and Memory
5.2 Data Storage
An m word memory can store m x n: m wordssof n bits each. One word is located at one address
therefore to address m words we need. n t
deaddress m = 2 words
k = Log2(m) address input signals
or k number address linesucan k
s t
Example 4,096 x 8 memory: ty
i
.c
32,768 bits
w input signals
12 address
8w w
input/output data signals
m n memory
m words
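The address-line arithmetic above can be checked with a short script (an illustrative sketch, not part of the original lesson):

```python
import math

def address_lines(words):
    """Number of address input signals k needed so that 2**k >= words."""
    return math.ceil(math.log2(words))

def total_bits(words, width):
    """Total storage capacity of a words x width memory."""
    return words * width

# The 4,096 x 8 example from the text:
print(address_lines(4096))   # -> 12 address input signals
print(total_bits(4096, 8))   # -> 32768 bits
```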
Memory access

The memory location can be accessed by placing the address on the address lines. The
read/write control line selects read or write. Some memory devices are multi-port, i.e. they
allow multiple simultaneous accesses to different locations.
[Fig. 5.5 Memory Array — external view with address inputs A0 ... Ak-1 and data lines Q0 ... Qn-1]
Memory Specifications

The specification of a typical memory is as follows:

The storage capacity: the number of bits/bytes or words it can store.
The memory access time (read access and write access): how long the memory takes to load
the data onto its data lines after it has been addressed, or how fast it can store the data
supplied through its data lines. The reciprocal of the memory access time is known as the
Memory Bandwidth.
The power consumption and voltage levels: power consumption is a major factor in embedded
systems. The lower the power consumption, the higher the packing density.
Size: size is directly related to the power consumption and data storage capacity.
There are two important specifications for the memory as far as Real Time Embedded Systems
are concerned:
  Write ability
  Storage permanence
Write ability

It is the manner and speed in which a particular memory can be written.

Ranges of write ability:
High end
  processor writes to memory simply and quickly, e.g., RAM
Middle range
  processor writes to memory, but slower, e.g., FLASH, EEPROM (Electrically
  Erasable and Programmable Read Only Memory)
Lower range
  special equipment, a programmer, must be used to write to memory, e.g.,
  EPROM, OTP ROM (One Time Programmable Read Only Memory)
Low end
  bits stored only during fabrication, e.g., Mask-programmed ROM

In-system programmable memory
  Can be written to by a processor in the embedded system using the memory
  Memories in the high end and middle range of write ability
Storage permanence
It is the ability to hold the stored bits.
Range of storage permanence
High end
essentially never loses bits
e.g., mask-programmed ROM
Middle range
holds bits days, months, or years after the memory's power source is turned off
e.g., NVRAM
Lower range
holds bits as long as power supplied to memory
e.g., SRAM
Low end
begins to lose bits almost immediately after written
e.g., DRAM
Nonvolatile memory
Holds bits after power is no longer supplied
High end and middle range of storage permanence
5.3 Common Memory Types

Read Only Memory (ROM)

This is a nonvolatile memory. It can only be read from, but not written to, by a processor in an
embedded system. It is traditionally written to, i.e. programmed, before being inserted into the
embedded system.

Uses:
Store software program for a general-purpose processor
  program instructions can be one or more ROM words
Store constant data needed by the system
Implement combinational circuits

[Fig. 5.7 The ROM Structure — external view: a 2^k x n ROM with enable, address inputs A0 ... Ak-1 and data outputs Q0 ... Qn-1]
Example
The figure below shows the internal structure of a ROM. Horizontal lines represent the words;
the vertical lines give out data. These lines are connected only at the circles. If the address
input is 010, the decoder sets the 2nd word line to 1. The data lines Q3 and Q1 are set to 1
because there is a programmed connection with word 2's line. Word 2 is not connected with data
lines Q2 and Q0. Thus the output is 1010.
[Fig. 5.8 The example of a ROM with decoder and data storage — internal view of an 8 x 4 ROM: a 3-to-8 decoder driven by A0-A2 and enable selects one of eight word lines; programmable wired-OR connections drive the data lines Q3-Q0]
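The decoder-and-word-line behaviour in the example can be sketched as a lookup table; the contents of the words other than word 2 are not given in the text, so they are filled with zeros here purely as placeholders:

```python
# A ROM is just a lookup table: the address selects a word, and the word's
# programmed connections determine the output bits. Word 2 is programmed
# as 1010, matching the example (address 010 -> output 1010).
rom = ["0000", "0000", "1010", "0000", "0000", "0000", "0000", "0000"]

def rom_read(address_bits):
    """Decode a 3-bit address string (the 3-to-8 decoder) and drive Q3..Q0."""
    word_index = int(address_bits, 2)
    return rom[word_index]

print(rom_read("010"))  # -> 1010
```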
Implementation of Combinatorial Functions

Any combinational circuit of n functions of the same k variables can be implemented with a
2^k x n ROM: the k inputs form the address, and each word stores the n output values for that
input combination.

[Figure: truth table of two functions y and z of inputs a, b, c implemented in an 8 x 2 ROM — each input combination (a, b, c) addresses one word, word 0 to word 7, whose stored bits are the outputs y and z]
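As a sketch of this idea, the ROM below implements two illustrative functions, y = a AND b and z = a XOR c (these are stand-ins, since the exact truth table of the original figure is not recoverable):

```python
# Build an 8 x 2 ROM implementing two functions of (a, b, c):
# y = a AND b, z = a XOR c. One 2-bit word per input combination.
rom = []
for addr in range(8):
    a, b, c = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    rom.append((a & b, a ^ c))

def evaluate(a, b, c):
    """The inputs form the ROM address; the stored word is the outputs (y, z)."""
    return rom[(a << 2) | (b << 1) | c]

print(evaluate(1, 1, 0))  # -> (1, 1): y = 1 AND 1, z = 1 XOR 0
```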
Mask-programmed ROM
The connections are programmed at fabrication time using a set of masks. It can be written only
once (in the factory), but it stores data forever, so it has the highest storage permanence; the
bits never change unless damaged. These are typically used for the final design of high-volume
systems.
EPROM (Erasable Programmable Read Only Memory)

The storage cell is a floating-gate transistor. (Program) Applying a high voltage to the
gate causes negative charges to move out of the channel and get trapped in the floating gate,
storing a logic 0. (Erase) Shining UV rays on the surface of the floating gate causes the
negative charges to return to the channel from the floating gate, restoring the logic 1. The
EPROM package has a quartz window through which the UV light can pass; a typical erase takes
5-30 min. The EPROM has:
Better write ability
  can be erased and reprogrammed thousands of times
Reduced storage permanence
  program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development

[Figure: (a) the floating-gate transistor cell, (b) programming with a high gate voltage (+15V), (c) UV erase, 5-30 min, (d) an EPROM package with its quartz window]
EEPROM
EEPROM is otherwise known as Electrically Erasable and Programmable Read Only Memory. It
is erased electrically, typically by using a higher than normal voltage, and it can program and
erase individual words, unlike the EPROMs where exposure to the UV light erases everything.
Flash Memory

It is an extension of EEPROM. It has the same floating-gate principle and the same write
ability and storage permanence. It can be erased at a faster rate, i.e. large blocks of memory
are erased at once. To update a single word, the entire block must be read, the word updated,
and then the entire block written back.
Used with embedded systems storing large data items in nonvolatile memory
  e.g., digital cameras, TV set-top boxes, cell phones
RAM: Random-access memory

Typically volatile memory
  bits are not held without a power supply
Read and written to easily by the embedded system during execution
Internal structure more complex than ROM
  a word consists of several memory cells, each storing 1 bit
  each input and output data line connects to each cell in its column
  rd/wr is connected to every cell
  when a row is enabled by the decoder, each cell has logic that stores the input data bit when
  rd/wr indicates write, or outputs the stored bit when rd/wr indicates read
[Fig. 5.11 The structure of RAM — external view: a 2^k x n read-and-write memory with r/w, enable, address inputs A0 ... Ak-1 and data lines Q0 ... Qn-1]
[Fig. 5.12 The RAM decoder and access — internal view of a 4 x 4 RAM: a 2-to-4 decoder (A0, A1, enable) selects a row of memory cells; rd/wr runs to every cell; data enters on I3-I0 and leaves on Q3-Q0]
Basic types of RAM

SRAM: Static RAM
  Memory cell uses a flip-flop to store the bit
  Requires 6 transistors
  Holds data as long as power is supplied
DRAM: Dynamic RAM
  Memory cell uses a MOS transistor and a capacitor to store the bit
  More compact than SRAM
  Refresh required due to capacitor leakage
    a word's cells are refreshed when it is read
    typical refresh rate 15.625 microseconds
  Slower to access than SRAM
RAM variations
PSRAM: Pseudo-static RAM
  DRAM with a built-in memory refresh controller
  Popular low-cost high-density alternative to SRAM
NVRAM: Nonvolatile RAM
  Holds data after external power is removed
  Battery-backed RAM
    SRAM with its own permanently connected battery
    writes as fast as reads
    no limit on the number of writes, unlike nonvolatile ROM-based memory
  SRAM with EEPROM or flash
    stores the complete RAM contents on EEPROM or flash before power is turned off
5.4 Example: HM6264 & 27C256 RAM/ROM devices

Low-cost low-capacity memory devices
Commonly used in 8-bit microcontroller-based embedded systems
First two numeric digits indicate device type
  RAM: 62
  ROM: 27
Subsequent digits indicate capacity in kilobits

Device   Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
HM6264   85-100            .01                15                 5
27C256   90                .5                 100                5
[Figure: device characteristics — HM6264/27C256 block symbols with chip selects /CS1 and CS2, and read/write timing diagrams]
5.5 Example: TC55V2325FF-100 memory device

2-megabit synchronous pipelined burst SRAM memory device
Designed to be interfaced with 32-bit processors
Capable of fast sequential reads and writes as well as single-byte I/O

Device           Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
TC55V2325FF-100  10                na                 1200              3.3

[Figure: block diagram — data<31...0>, addr<15...0>, /CS1, /CS2, CS3, /ADV, /OE and CLK into the TC55V2325FF-100 — and timing diagram]
5.6 Composing Memories

[Figure: composing larger memories from smaller ones — a 2^m x n ROM as the base unit; placing ROMs side by side (sharing address lines A0 ... Am-1 and enable) increases the width of words, giving a 2^m x 3n memory with outputs Q3n-1 ... Q0; adding a decoder on the high-order address bits increases the number of words; both together increase the number and width of words]
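The idea of increasing the number of words with a decoder on the high-order address bits can be sketched in a few lines; the bank contents and sizes below are illustrative, not values from the text:

```python
# Two 4-word x 8-bit "ROMs" composed into an 8-word memory: the extra
# high-order address bit acts as the decoder/enable selecting the bank.
bank0 = [0x10, 0x11, 0x12, 0x13]
bank1 = [0x20, 0x21, 0x22, 0x23]

def read(address):
    """3-bit address: the top bit selects the bank, the low bits index within it."""
    bank = bank1 if (address >> 2) & 1 else bank0
    return bank[address & 0b11]

print(hex(read(1)))  # -> 0x11 (bank 0, offset 1)
print(hex(read(6)))  # -> 0x22 (bank 1, offset 2)
```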
5.7 Conclusion
In this chapter you have learnt about the following
1. Basic Memory types
2. Basic Memory Organization
3. Definitions of RAM, ROM and Cache Memory
5.8 Questions
Q1. Discuss the various control signals in a typical RAM device (say HM6264)
Ans:

[Figure: HM6264 pin assignment — addr<15...0> on pins 2, 23, 21, 24, 25, 3-10; /OE on pin 22; /WE on pin 27; /CS1 on pin 20; CS2 on pin 26]

/OE: output enable bar: the output is enabled when it is low. It is the same as the read bar line.
/WE: write enable bar: this line has to be made low while writing to this device.
/CS1: chip select 1 bar: this line has to be made low, along with CS2 (which is active high), to enable this chip.
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
6
Memory-II
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn about
Memory Hierarchy
Cache Memory
- Different types of Cache Mappings
- Cache Impact on System Performance
Dynamic Memory
- Different types of Dynamic RAMs
Memory Management Unit
Pre-Requisite
Digital Electronics, Microprocessors
6.1 Memory Hierarchy

The objective is to use inexpensive, fast memory.

Main memory
  Large, inexpensive, slow memory stores the entire program and data
Cache
  Small, expensive, fast memory stores a copy of likely accessed parts of larger memory
  Can be multiple levels of cache

[Figure: the memory hierarchy — Processor Registers, Cache, Main memory, Disk, Tape]
6.2 Cache
Usually designed with SRAM
faster but more expensive than DRAM
Usually on same chip as processor
space limited, so much smaller than off-chip main memory
faster access (1 cycle vs. several cycles for main memory)
Cache operation
Request for main memory access (read or write)
First, check cache for copy
cache hit
- copy is in cache, quick access
cache miss
- copy not in cache, read address and possibly its neighbors into cache
Several cache design choices:
  cache mapping, replacement policies, and write techniques

6.3 Cache Mapping

Cache mapping is necessary as there are far fewer available cache addresses than memory
addresses. It is used to assign a main memory address to a cache address and to determine hit
or miss: are the address contents in the cache?

Three basic techniques:
  Direct mapping
  Fully associative mapping
  Set-associative mapping

Caches are partitioned into indivisible blocks or lines of adjacent memory addresses
  usually 4 or 8 addresses per line
Direct Mapping

Main memory address divided into 2 fields
Index
  - contains the cache address
  - number of bits determined by cache size
Tag
  - compared with the tag stored in cache at the address indicated by the index
  - if tags match, check valid bit
Valid bit
  - indicates whether the data in the slot has been loaded from memory
Offset
  - used to find the particular word in the cache line
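As a concrete sketch of the tag/index/offset split — with illustrative sizes (a 4-word line and an 8-line cache), which are not values given in the text:

```python
# Split a main-memory address into tag / index / offset fields for a
# direct-mapped cache with 4-address lines and 8 cache lines.
OFFSET_BITS = 2   # 4 addresses per line
INDEX_BITS = 3    # 8 lines -> 3 index bits

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Address 0b1101_101_10: tag = 0b1101 = 13, index = 0b101 = 5, offset = 0b10 = 2
print(split_address(0b110110110))  # -> (13, 5, 2)
```

A hit occurs when the tag stored at the selected index matches the address tag and the valid bit is set.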
[Figure: direct mapping — each cache line holds valid (V), tag (T) and data (D) fields; the index selects one line, the stored tag is compared (=) with the address tag, and the offset selects the word]

Fully Associative Mapping

[Fig. 6.3 Fully Associative Mapping — the address has tag and offset fields only; the tag is compared (=) simultaneously against the tags (T) of all valid (V) cache lines]
Set-Associative Mapping
Compromise between direct mapping and fully associative mapping
Index same as in direct mapping
But, each cache address contains content and tags of 2 or more memory address locations
Tags of that set simultaneously compared as in fully associative mapping
Cache with set size N called N-way set-associative
2-way, 4-way, 8-way are common
[Figure: 2-way set-associative mapping — each set holds two (V, T, D) entries whose tags are compared (= =) in parallel]
[Figure: cache miss rate (%) vs. cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way and 8-way set-associative caches]
The DRAM address is time-multiplexed into row and column parts, latched using the ras (row
address strobe) and cas (column address strobe) signals, respectively.
Refresh circuitry can be external or internal to the DRAM device
  it strobes consecutive memory addresses periodically, causing the memory content to be
  refreshed
  Refresh circuitry is disabled during a read or write operation

[Figure: DRAM internal organization — address buffers, row and column decoders, bit storage array, sense amplifiers, refresh circuit, data in/out buffers, controlled by ras, cas, rd/wr and clock]

[Figure: DRAM access timing — ras, cas, address and data]
(S)ynchronous and Enhanced Synchronous (ES) DRAM

SDRAM latches data on the active edge of the clock
  Eliminates the time needed to detect the ras/cas and rd/wr signals
  A counter is initialized to the column address, then incremented on the active edge of the
  clock to access consecutive memory locations
ESDRAM improves on SDRAM
  added buffers enable overlapping of column addressing
  faster clocking and lower read/write latency possible

[Figure: SDRAM timing — clock, ras, cas]
6.12 Question
Q1. Discuss different types of cache mappings.
Ans:

[Figure: cache miss rate (%) vs. cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way and 8-way set-associative caches]

[Figure: SDRAM timing — clock, ras, cas]
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
7
Digital Signal Processors
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn
o Architecture of a Real time Signal Processing Platform
o Different Errors introduced during the A-D and D-A converter stages
o Digital Signal Processor Architecture
o Difference in the complexity of programs between a General Purpose Processor
  and a Digital Signal Processor
o Evolution of Digital Signal Processors
o Comparative Performance with a General Purpose Processor
Pre-Requisite
Digital Electronics, Microprocessors

7.1 Introduction

Digital Signal Processing deals with algorithms for handling large chunks of data. The branch
identified itself as a separate subject in the 70s, when engineers thought about processing the
signals arising from nature in discrete form. The development of Sampling Theory followed, and
the design of Analog-to-Digital converters gave an impetus in this direction. The contemporary
applications of digital signal processing were mainly in speech, followed by Communication,
Seismology, Biomedical applications etc. Later the field of Image Processing emerged as another
important area in signal processing.

The following broadly defines different processor classes:
General Purpose - high performance
Pentiums, Alpha's, SPARC
Used for general purpose software
Heavy weight OS - UNIX, NT
Workstations, PC's
Embedded processors and processor cores
ARM, 486SX, Hitachi SH7000, NEC V800
Single program
Lightweight, real-time OS
DSP support
Cellular phones, consumer electronics (e. g. CD players)
Microcontrollers
Extremely cost sensitive
Small word size - 8 bit common
Highest volume processors by far
Automobiles, toasters, thermostats, ...
A Digital Signal Processor is required to do the following Digital Signal Processing tasks in real
time
Signal Modeling
Difference Equation
Convolution
Transfer Function
Frequency Response
Signal Processing
Data Manipulation
Algorithms
Filtering
Estimation
[Figure: a Real Time digital signal processing system — sensor, conditioner, analog processing, A-D conversion, digital processing, D-A conversion, analog processing]
The above figure represents a Real Time digital signal processing system. The measurand can be
temperature, pressure or speech signal which is picked up by a sensor (may be a thermocouple,
microphone, a load cell etc). The conditioner is required to filter, demodulate and amplify the
signal. The analog processor is generally a low-pass filter used for anti-aliasing effect. The ADC
block converts the analog signals into digital form. The DSP block represents the signal
processor. The DAC is the Digital-to-Analog Converter, which converts the digital signals into
analog form. The analog low-pass filter eliminates noise introduced by the interpolation in the
DAC.
[Fig. 7.2 D-A and A-D Conversion Process — ADC: sampler, quantizer and coder transform x(t) into xs(t), xq(n) and the b-bit code xb(n); DAC: decoder and sample/hold reconstruct y(n) from xb(n)]
The performance of the signal processing system depends to a large extent on the ADC. The
ADC is specified by the number of bits, which defines the resolution. The conversion time
decides the sampling time. The errors in the ADC are due to the finite number of bits and the
finite conversion time. Sometimes noise may be introduced by the switching circuits.
Similarly, the DAC is characterized by the number of bits and the settling time at the output.

A DSP task requires
  Repetitive numeric computations
  Attention to numeric fidelity
  High memory bandwidth, mostly via array accesses
  Real-time processing
And the DSP design should minimize
  Cost
  Power
  Memory use
  Development time
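The error due to the finite number of bits can be illustrated with a small sketch; the 5 V full-scale range and 10-bit resolution are assumed values, not figures from the text:

```python
# Resolution and worst-case quantization error of a b-bit ADC over a
# given full-scale voltage range.
def lsb(full_scale_volts, bits):
    """Size of one quantization step (one LSB)."""
    return full_scale_volts / (2 ** bits)

def max_quantization_error(full_scale_volts, bits):
    """Worst-case error is half a step when rounding to the nearest level."""
    return lsb(full_scale_volts, bits) / 2

print(lsb(5.0, 10))                     # ~4.88 mV per step
print(max_quantization_error(5.0, 10))  # ~2.44 mV worst case
```

Each extra bit halves the step size, which is why the number of bits directly sets the resolution.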
Take an example of FIR filtering, both by a General Purpose Processor as well as a DSP.

Example: FIR Filtering

x(k) -> h(k) -> y(k)

y(k) = (h0 + h1 z^-1 + h2 z^-2 + ... + hN-1 z^-(N-1)) x(k)
     = h0 x(k) + h1 x(k-1) + h2 x(k-2) + ... + hN-1 x(k-N+1)
     = sum_{i=0}^{N-1} hi x(k-i) = h(k) * x(k)
An FIR (Finite Impulse Response filter) is represented as shown in the following figure. The
output of the filter is a linear combination of the present and past values of the input. It has
several advantages such as:
Linear Phase
Stability
Improved Computational Time
[Fig. 7.3 Tapped Delay Line representation of an FIR filter — x(k) feeds a chain of z^-1 delay elements; the tap outputs are weighted by h0, h1, h2, ..., hN-1 and summed to form y(k)]
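The tapped-delay-line computation can be written directly as code; this is a plain reference implementation of the sum above, not the processor programs discussed next:

```python
# Direct-form FIR filter: y(k) = sum_{i=0}^{N-1} h[i] * x(k - i),
# with x(k) taken as 0 for k < 0 (empty delay line at start-up).
def fir(h, x):
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if k - i >= 0:
                acc += hi * x[k - i]
        y.append(acc)
    return y

# A 2-tap averaging filter as a small check:
print(fir([0.5, 0.5], [2, 4, 6]))  # -> [1.0, 3.0, 5.0]
```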
loop: lw x0, (r0)
      lw y0, (r1)
      mul a, x0, y0
      add b, a, b
      inc r0
      inc r1
      dec ctr
      tst ctr
      jnz loop
      sw b, (r2)
      inc r2
This program assumes that the finite window of the input signal is stored at the memory location
starting from the address specified by r1, and that an equal number of filter coefficients are
stored at the memory location starting from the address specified by r0. The result will be
stored at the memory location starting from the address specified by r2. The program assumes
the content of the register b is 0 before the start of the loop.
lw x0, (r0)
lw y0, (r1)
These two instructions load the registers x0 and y0 with values from the memory locations
pointed to by the registers r0 and r1.
mul a, x0, y0

This instruction multiplies x0 with y0 and stores the result in a.

add b, a, b

This instruction adds a to b (which already contains the accumulated result from the previous
operation) and stores the result in b.
inc r0
inc r1
dec ctr
tst ctr
jnz loop

The above portion of the program increments the registers to point to the next memory
locations, decrements the counter, and tests it for 0 to see whether the filter order has been
reached; if not, it jumps to the start of the loop.

sw b, (r2)
inc r2

This stores the final result and increments the register r2 to point to the next location.
[Fig. 7.4 Basic TMS32010 Architecture — separate Instruction Memory and Data Memory; the datapath contains a T-Register, a Multiplier feeding a P-Register, an ALU and an Accumulator]
The program for the FIR filter (for a 3rd order) is given as follows.
Here X4, H4, ... are direct (absolute) memory addresses:
LT X4 ;Load T with x(n-4)
MPY H4 ;P = H4*X4
;Acc = Acc + P
LTD X3 ;Load T with x(n-3); x(n-4) = x(n-3);
MPY H3 ; P = H3*X3
; Acc = Acc + P
LTD X2
MPY H2
...
Two instructions per tap, but this requires loop unrolling. (The ';' marks comment lines.)
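The way the LTD/MPY pair evaluates one tap can be modelled with a small sketch. For brevity the coefficient is passed to MPY as a value rather than a memory address, and only the instructions used in the listing (plus APAC, which flushes the final product) are modelled:

```python
# Sketch of the TMS32010 multiply-accumulate datapath: MPY forms
# P = T * coefficient, and LTD accumulates the previous product into ACC
# while loading T and shifting the delay line (the DMOV side effect).
class TMS32010Sketch:
    def __init__(self, data):
        self.mem = dict(data)   # data memory, e.g. {"X3": ..., "X4": ...}
        self.T = 0
        self.P = 0
        self.ACC = 0

    def LT(self, addr):
        self.T = self.mem[addr]

    def MPY(self, coeff):
        self.P = self.T * coeff

    def LTD(self, addr, shift_to):
        self.ACC += self.P                    # accumulate previous product
        self.T = self.mem[addr]               # load T with the next sample
        self.mem[shift_to] = self.mem[addr]   # delay-line shift

    def APAC(self):
        self.ACC += self.P

# y = H4*X4 + H3*X3 with X4 = 2, X3 = 5, H4 = 3, H3 = 7 -> 6 + 35 = 41
cpu = TMS32010Sketch({"X3": 5, "X4": 2})
cpu.LT("X4"); cpu.MPY(3)
cpu.LTD("X3", "X4"); cpu.MPY(7)
cpu.APAC()
print(cpu.ACC)  # -> 41
```

The single-cycle multiply-accumulate plus automatic delay-line shift is exactly what makes the DSP loop body so much shorter than the general-purpose program above.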
II. Questions
1. Discuss the different errors introduced in a typical real time signal processing system.
Answers
o m
o t.c
s p
o g
. bl
u p
r o
s g
ent
u d
st
it y
.c
w
w
w
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
8
General Purpose
Processors - I
Version 2 EE IIT, Kharagpur 2
Pre-requisite
Digital Electronics
8.1 Introduction
The first single chip microprocessor came in 1971 from Intel Corporation. It was called the Intel
4004 and was the first single chip CPU ever built. We can say that it was the first general
purpose processor; now the terms microprocessor and processor are synonymous. The 4004 was a
4-bit processor, capable of addressing 1K data memory and 4K program memory. It was meant to
be used for a simple calculator. The 4004 had 46 instructions, using only 2,300 transistors in a
16-pin DIP. It ran at a clock rate of 740 kHz (eight clock cycles per CPU cycle of 10.8
microseconds). In 1975, Motorola introduced the 6800, a chip with 78 instructions and probably
the first microprocessor with an index register. In 1979, Motorola introduced the 68000. With
internal 32-bit registers and a 32-bit address space, its bus was still 16 bits due to hardware
prices. On the other hand, in 1976 Intel designed the 8085 with more instructions to
enable/disable three added interrupt pins (and the serial I/O pins). They also simplified the
hardware so that it used only +5V power, and added clock-generator and bus-controller circuits
on the chip. In 1978, Intel introduced the 8086, a 16-bit processor which gave rise to the x86
architecture. It did not contain floating-point instructions. In 1980 the company released the
8087, the first math co-processor they'd developed. Next came the 8088, the processor for the
first IBM PC. Even though IBM engineers at the time wanted to use the Motorola 68000 in the
PC, the company already had the rights to produce the 8086 line (by trading rights to Intel for
its bubble memory) and it could use modified 8085-type components (68000-style components were
much more scarce).

Table 1 Development History of Intel Microprocessors
Intel Processor   Year of Introduction   Initial Clock Speed   Number of Transistors   Circuit Line Width
4004 1971 108 kHz 2300 10 micron
8008 1972 500-800 KHz 3500 10 micron
8080 1974 2 MHz 4500 6 micron
8086 1978 5 MHz 29000 3 micron
8088 1979 5 MHz 29000 3 micron
Intel286TM 1982 6 MHz 134,000 1.5 micron
Intel386TM 1985 16 MHz 275,000 1.5 micron
Intel486TM 1989 25 MHz 1.2 Million 1 Micron
PentiumTM 1993 66 MHz 3.1 Million 0.8 Micron
PentiumTM Pro 1995 200 MHz 5.5 Million 0.35 Micron
PentiumTM II 1997 300 MHz 7.5 Million 0.25 Micron
The development history of the Intel family of processors is shown in Table 1. The Very Large
Scale Integration (VLSI) technology has been the main driving force behind the development.
[Fig. 8.2 The photograph of the processor]

[Figure: pipeline block diagram — BTB, Translate (X), D-Cache & D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC) stages D and G, Execute stage with Integer ALU, FP unit and MMX/3D unit, Store-Branch (S), Write-back (W), store and write buffers]
Specification
Name: VIA C3TM in EBGA: VIA is the name of the company, C3 the processor, and EBGA stands for
Enhanced Ball Grid Array; the clock speed is 1 GHz.
Ball Grid Array (abbreviated BGA): a ball grid array is a type of microchip connection
methodology. Ball grid array chips typically use a group of solder dots, or balls, arranged in
concentric rectangles to connect to a circuit board. BGA chips are often used in mobile
applications where Pin Grid Array (PGA) chips would take up too much space due to the length of
the pins used to connect the chips to the circuit board.
[Fig. 8.4 Pin Grid Array (PGA) — shown with other package types: SIMM, DIP and SIP]
[Fig. 8.6 The Bottom View of the Processor]

The Architecture

The processor has a 12-stage integer pipelined structure:
s characteristic of a modern general purpose processor. A
t y
ci stored in memory. During execution a processor has to fetch
Pipe Line: This is a very important
.
program is a set of instructions
these instructions from thew memory, decode it and execute them. This process takes few clock
cycles. To increase thewspeed of such processes the processor divide itself into different units.
While one unit gets wthe instructions from the memory, another unit decodes them and some other
unit executes them. This is called pipelining. This can be termed as segmenting a functional unit
such that it can accept new operands every cycle while the total execution of the instruction may
take many cycles. The pipeline construction works like a conveyor belt accepting units until the
pipeline is filled and than producing results every cycle. The above processors has got such a
pipeline divided into 12stages
There are four major functional groups: I-fetch, decode and translate, execution, and data
cache.
The I-fetch components deliver instruction bytes from the large I-cache or the
external bus.
The decode and translate components convert these instruction bytes into internal
execution forms. If there is any branching operation in the program it is identified
here and the processor starts getting new instructions from a different location.
The execution components issue, execute, and retire internal instructions
The data cache components manage the efficient loading and storing of execution
data to and from the caches, bus, and internal components
Fig. 8.7
The first three pipeline stages (I, B, V) deliver aligned instruction data from the I-cache
(Instruction Cache) or external bus into the instruction decode buffers. The primary I-cache
contains 64 KB organized as four-way set associative with 32-byte lines. The associated large
I-TLB (Instruction Translation Look-aside Buffer) contains 128 entries organized as 8-way set
associative.
TLB: translation look-aside buffer — a table in the processor's memory that contains
information about the pages in memory the processor has accessed recently. The table
cross-references a program's virtual addresses with the corresponding absolute addresses in
physical memory that the program has most recently used. The TLB enables faster computing
because it allows the address processing to take place independent of the normal
address-translation pipeline.
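A TLB in front of a page table can be sketched as follows; the page size, table contents and addresses are illustrative assumptions, not details of this processor:

```python
# Virtual-to-physical translation consults the small TLB first and falls
# back to the page table only on a miss. Page size: 4 KB (12 offset bits).
PAGE_OFFSET_BITS = 12

page_table = {0x1: 0x40, 0x2: 0x80}  # virtual page -> physical frame
tlb = {}

def translate(vaddr):
    vpage = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpage in tlb:
        frame = tlb[vpage]           # TLB hit: no page-table walk needed
    else:
        frame = page_table[vpage]    # TLB miss: walk the table, then cache it
        tlb[vpage] = frame
    return (frame << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x1234)))  # vpage 0x1 -> frame 0x40 -> 0x40234
```

After the first access to a page, subsequent translations for that page hit the TLB, which is what makes the address processing fast.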
The instruction data is predecoded as it comes out of the cache; this predecode is overlapped
with other required operations and, thus, effectively takes no time. The fetched instruction data is
placed sequentially into multiple buffers. Starting with a branch, the first branch-target byte is
left adjusted into the instruction decode buffer.
[Fig. 8.8 — predecode (V), BTB and Translate stages]
Instruction bytes are decoded and translated into the internal format by two pipeline stages
(F, X). The F stage decodes and formats an instruction into an intermediate format. The
internal-format instructions are placed into a five-deep FIFO (First-In-First-Out) queue: the
FIQ. The X-stage translates an intermediate-form instruction from the FIQ into the internal
microinstruction format. Instruction fetch, decode, and translation are made asynchronous from
execution via a five-entry FIFO queue (the XIQ) between the translator and the execution unit.
[Fig. 8.9 — BTB and the Translate stage]

Integer Unit

[Fig. 8.10 — the integer pipeline: Translate (X), Register (R), address calculation (A), D-Cache & D-TLB access (D, G), Integer ALU execute (E), Store-Branch (S) and Write-back (W), supported by the bus unit, L2 cache, 4-entry instruction queue and ROM]
Decode stage (R): Micro-instructions are decoded, integer register files are accessed and
resource dependencies are evaluated.
Addressing stage (A): Memory addresses are calculated and sent to the D-cache (Data Cache).
Cache Access stages (D, G): The D-cache and D-TLB (Data Translation Look aside Buffer) are
accessed and aligned load data returned at the end of the G-stage.
Execute stage (E): Integer ALU operations are performed. All basic ALU functions take one
clock except multiply and divide.
Store stage (S): Integer store data is grabbed in this stage and placed in a store buffer.
Write-back stage (W): The results of operations are committed to the register file.
[Fig. 8.11 — the data-access path: Translate (X), address calculation (A), D-Cache & D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC) access stages (D, G), Integer ALU (E)]
The D-cache contains 64 KB organized as four-way set associative with 32-byte lines. The
associated large D-TLB contains 128 entries organized as 8-way set associative. The cache,
TLB, and page directory cache all use a pseudo-LRU (Least Recently Used) replacement
algorithm
[Fig. 8.12 — bus unit on the Socket 370 bus, L2 cache, 4-entry instruction queue, register stage, address calculation, and the 64-KB 4-way caches with 128-entry 8-way TLBs and 8-entry PDC]
The contents of the L2 cache at any point in time are not contained in the two 64-KB L1 caches.
As lines are displaced from the L1 caches (due to bringing in new lines from memory), the
displaced lines are placed in the L2 cache. Thus, a future L1-cache miss on this displaced line
can be satisfied by returning the line from the L2 cache instead of having to access the
external memory.
FP, MMX and 3D

[Fig. 8.13 — the execute stage issues to the Integer ALU (E), the FP unit and the MMX/3D unit through a queue, followed by Store-Branch (S) and Write-back (W)]
FP: Floating Point Processing Unit
MMX: Multimedia Extension or Matrix Math Extension Unit
In addition to the integer execution unit, there is a separate 80-bit floating-point execution unit
that can execute floating-point instructions in parallel with integer instructions. Floating-point
instructions proceed through the integer R, A, D, and G stages. Floating-point instructions are
passed from the integer pipeline to the FP-unit through a FIFO queue. This queue, which runs at
the processor clock speed, decouples the slower running FP unit from the integer pipeline so that
the integer pipeline can continue to process instructions overlapped with FP instructions. Basic
arithmetic floating-point instructions (add, multiply, divide, square root, compare, etc.) are
represented by a single internal floating-point instruction. Certain little-used and complex
floating point instructions (sin, tan, etc.), however, are implemented in microcode and are
represented by a long stream of instructions coming from the ROM. These instructions tie up
the integer instruction pipeline such that integer execution cannot proceed until they complete.
This processor contains a separate execution unit for the MMX-compatible instructions. MMX
instructions proceed through the integer R, A, D, and G stages. One MMX instruction can issue
into the MMX unit every clock. The MMX multiplier is fully pipelined and can start one
non-dependent MMX multiply[-add] instruction (which consists of up to four separate multiplies)
every clock. Other MMX instructions execute in one clock. Multiplies followed by a dependent
MMX instruction require two clocks. Architecturally, the MMX registers are the same as the
floating-point registers. However, there are actually two different register files (one in the
FP-unit and one in the MMX units) that are kept synchronized by hardware.
p
There is a separate execution unit for some specific 3D instructions. These instructions provide assistance for graphics transformations via new SIMD (Single Instruction Multiple Data) single-precision floating-point capabilities. These instruction codes proceed through the integer R, A, D, and G stages. One 3D instruction can issue into the 3D unit every clock. The 3D unit has two single-precision floating-point multipliers and two single-precision floating-point adders. Other functions such as conversions, reciprocal, and reciprocal square root are provided. The multiplier and adder are fully pipelined and can start any non-dependent 3D instruction every clock.
8.3 Conclusion
This lesson discussed the architecture of a typical modern general-purpose processor (the VIA C3), which is similar to the x86 family of microprocessors from Intel. In fact, this processor uses the same x86 instruction set as the Intel processors. It is a pipelined architecture. The general-purpose processor architecture has the following characteristics:
Multiple stages of pipeline
More than one level of cache memory
Branch prediction mechanism at an early stage of the pipeline
Separate and independent processing units (Integer, Floating Point, MMX, 3D, etc.)
Because of the uncertainties associated with branching, the overall instruction execution time is not fixed (therefore it is not suitable for some real-time applications which need accurate execution speed)
It handles a very complex instruction set
The overall power consumption is higher because of the complexity of the processor
In the next lesson we shall discuss the signals associated with such a processor.
[Figure: processor block diagram — optional 3rd-level cache, 8-way 2nd-level cache, 4-way 1st-level cache, front end (fetch/decode, trace cache, microcode ROM), out-of-order execution core, and retirement stage]
Q.2 Superscalar architecture refers to the use of multiple execution units, to allow the
processing of more than one instruction at a time. This can be thought of as a form of
"internal multiprocessing", since there really are multiple parallel processors inside the
CPU. Most modern processors are superscalar; some have more parallel execution units
than others. It can be said to consist of multiple pipelines.
Symbol   Parameter               Limit
I(LI)    Input Leakage Current   100 µA
I(LO)    Output Leakage Current  100 µA
Module
2
Embedded Processors and
Memory
Lesson
9
General Purpose
Processors - II
Signals
In this lesson the student will learn the following
Signals of a General Purpose Processor
Multiplexing
Address Signals
Data Signals
Control Signals
Bus Arbitration Signals
Status Signal Indicators
Sleep State Indicators
Interrupts
Pre-requisite
Digital Electronics
9.1 Introduction
The input/output signals of a processor chip are the matter of discussion in this chapter. We shall take up the same VIA C3 processor as discussed in the last chapter.
In the design flow of a processor the internal architecture is determined and simulated for
optimal performance.
s
APPLICATION REQUIREMENT
ent
CAPTURE
u d INSTRUCTION SET
DESIGN AND CODING
FUNCTIONAL
st
it y
.c ASIC SW
INITIAL ABSTRACT
w FINAL INSTRUCTION SET & HW TOOLS
w
INSTRUCTION SET INITIAL ARCHITECTURE FLOW FLOW
w
ENVIRONMENT REQUIREMENT EXPLORATION OF
CAPTURE ARCHITECTURES
AUGMENTED ABSTRACT
FINAL INSTRUCTION SET & PROCESSOR TOOLS &
INSTRUCTION SET
FINAL ARCHITECTURE HW IMPLEMENTATION
ARCHITECTURE
The basic architecture decides the signals. Broadly the signals can be classified as:
1. Address Signals
2. Data Signals
3. Control Signals
4. Power Supply Signals
Some of these signals are multiplexed in time to make the VLSI design easier and more efficient without affecting the overall performance.
[Fig. 9.2: Bottom view of the processor]
9.2 Signals of the VIA Processor discussed earlier
The following lines discuss the various signals associated with the processor.
A[31:3]#: The address bus provides addresses for physical memory and external I/O devices. During cache inquiry cycles, A31#-A3# are used as inputs to perform snoop cycles. This is an output signal when the processor sends an address to a memory or I/O device. It serves as both input and output during snoop cycles. It is synchronized with the Bus Clock (BCLK).
Snoop cycles: The term "snooping" commonly refers to at least three different actions.
Inquire Cycles: These are bus cycles, initiated by external logic, that cause the processor
to look up an address in its physical cache tags.
Internal Snooping: These are internal actions by the processor (rather than external
logic) that are taken during certain types of cache accesses in order to detect self-
modifying code.
Bus Watching: Some caching devices watch their address and data bus continuously
while they are held off the bus, comparing every address driven by another bus master
with their internal cache tags and optionally updating their cached lines on the fly,
during write backs by the other master.
A20M#: A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1-MByte address wrap-around that occurs on the 8086. Snoop addressing is not affected. It is an input signal. If it is not used, it is connected to the power supply. It is not synchronized with the bus clock.
ADS#: Address Strobe begins a memory/I/O cycle and indicates that the address bus (A31#-A3#) and transaction request signals (REQ#) are valid. This is an output signal during the addressing cycle and an input/output signal during transaction request cycles. It is synchronized with the bus clock.
Memory/I/O cycle: The memory and input/output data transfers (read or write) are carried out in different clock cycles. The address is first loaded on the address bus. The processor, being faster, waits until the memory or input/output device is ready to send or receive the data through the data bus. Normally this takes more than one clock cycle.
BCLK: Bus Clock provides the fundamental timing for the CPU. The frequency of the input clock determines the operating frequency of the CPU's bus. External timing is defined with reference to the rising edge of BCLK. It is an input clock signal.
BNR#: Block Next Request signals a bus stall by a bus agent unable to accept new transactions. This is an input or output signal and is synchronized with the bus clock.
BPRI#: Priority Agent Bus Request arbitrates for ownership of the system bus. It is an input and is synchronized with the bus clock.
w
w
Bus Arbitration: At times external devices signal the processor to release the system
w
address/data/control bus from its control. This is achieved by an external request which
normally comes from the external devices such as a DMA controller or a Coprocessor.
BR[4:0]: Hardware strapping options for setting the processor's internal clock multiplier. These balls are strapped to the supply or ground (sometimes they can be left open to make them 1). This strap sets the ratio between the input bus clock and the internal core clock.
BSEL[1:0]: Bus frequency select balls (BSEL 0 and BSEL 1) identify the appropriate bus speed
(100 MHz or 133 MHz). It is an output signal.
BR0#: It drives the BREQ[0]# signal in the system to request access to the system bus.
D[63:0]#: Data Bus signals are bi-directional signals which provide the data path between the CPU and external memory and I/O devices. The agent driving the data bus must assert DRDY# to indicate a valid data transfer. These are both input and output.
DBSY#: Data Bus Busy is asserted by the data bus driver to indicate data bus is in use. This is
both input as well as output.
DEFER#: Defer is asserted by target agent and indicates the transaction cannot be guaranteed
as an in-order completion. This is an input signal.
DRDY#: Data Ready is asserted by data driver to indicate that a valid signal is on the data bus.
This is both input and output signal.
FERR#: FPU Error Status indicates an unmasked floating-point error has occurred. FERR# is
asserted during execution of the FPU instruction that caused the error. This is an output signal.
FLUSH#: Flush Internal Caches causes the CPU to flush its internal caches, writing back all data in the modified state. This is an input signal to the CPU.
HIT#: Snoop Hit indicates that the current cache inquiry address has been found in the cache. This is both an input and an output signal.
HITM#: Snoop Hit Modified indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). (Both input and output.)
INIT#: Initialization resets the integer registers and does not affect the internal cache or floating-point registers. (Input)
INTR: Maskable Interrupt input. This is an input signal to the CPU.
NMI: Non-Maskable Interrupt input.
LOCK#: Lock Status is used by the CPU to signal to the target that the operation is atomic.
An atomic operation is any operation that a CPU can perform such that all results will be made visible to each CPU at the same time and whose operation is safe from interference by other CPUs. For example, reading or writing a word of memory is an atomic operation.
NCHCTRL: The CPU uses this ball to control integrated I/O pull-ups. A resistance is to be connected here to control the current on the input/output pins.
PWRGD: Power Good indicates that the processor's VCC is stable. It is an input signal.
REQ[4:0]#: Request Command is asserted by bus driver to define current transaction type.
RESET#: This is an input that resets the processor and invalidates internal cache without writing
back.
RTTCTRL: The CPU uses this ball to control the output impedance.
RS[2:0]#: Response Status is an input that signals the completion status of the current
transaction when the CPU is the response agent.
SLP#: Sleep when asserted in the stop grant state, causes the CPU to enter the sleep state.
"Suspend to RAM"
All power to the CPU is shut off, and the contents of its registers are flushed to RAM, which
remains on. This system state is the most prone to errors and instability.
"Suspend to Disk"
CPU power is shut off, but RAM is written to disk and shut off as well. In Microsoft Windows, the "Hibernate" command is associated with this state. Because the contents of RAM are written out to disk, system context is maintained. For example, unsaved files would not be lost following this.
"Soft Off"
The system is shut down; however, some power may be supplied to certain devices to generate a wake event, for example to support automatic startup from a LAN or USB device. In Microsoft Windows, the "Shut down" command is associated with this state. Mechanical power can usually be removed or restored with no ill effects.
Processor "C" power states
Processor "C" power states are also defined. These are typically implemented in laptop platforms only. Here the CPU consumes less power while still doing work, and the tradeoff comes between power and performance, rather than power and latency.
SMI#: System Management (SMM) Interrupt forces the processor to save the CPU state to the top of SMM memory and to begin execution of the SMI service routine at the beginning of the defined SMM memory space. An SMI is a higher-priority interrupt than NMI.
STPCLK#: Stop Clock input causes the CPU to enter the stop grant state.
TRDY#: Target Ready input indicates that the target is ready to receive a write or write-back transfer from the CPU.
VID[3:0]: The Voltage Identification Bus informs the voltage regulator system on the motherboard of the CPU core voltage requirements. This is an output signal.
9.3 Conclusion
In this chapter the various signals of a typical general purpose processor have been discussed.
Broadly we can classify them into the following categories.
Address Signals: They are used to address the memory as well as input/output devices. They are
often multiplexed with other control signals. In such cases External Bus controllers latch these
address lines and make them available for a longer time for the memory and input/output devices
while the CPU changes the status of the same. The Bus controllers drive their inputs which are
connected to the CPU to high impedance so as not to interfere with the current state of these lines
from the CPU.
Data Signals: These lines carry the data to and from the processor and memory or I/O devices. Transceivers are connected on the data path to control the data flow. The data flow might follow some bus transaction signals. These bus transaction signals are necessary to negotiate the speed mismatch between the input/output and the processor.
Control Signals: These can be generally divided into the following groups:
Read/Write Control
Memory Write: The processor issues this signal while sending data to the memory.
Memory Read: The processor issues this signal while reading data from the memory.
I/O Read: The input/output read signal, which is generally preceded by some bus transaction signals.
I/O Write: The input/output write signal, which is generally succeeded by some bus transaction signals.
These read/write signals are not generally directly available from the CPU. They are decoded from a set of status signals by an external bus controller.
Bus Transaction Control
Master versus Slave: The bus master sends the address on the bus.
Requesting to obtain access to the bus is achieved by the following lines:
Bus Request: The slave requests the access grant.
Bus Grant: The requesting device gets the grant signal.
Lock: For specific operations the bus requests are not granted, as the CPU might be performing some important operations.
Interrupt Control
In a multitasking environment, interrupts are external signals to the CPU for emergency operations. The CPU executes the interrupt service routines while acknowledging the interrupts.
The interrupts are processed according to their priority. More discussion is available in
subsequent lessons.
Processor Control
These lines are activated when there is a power on or the processor comes up from a power-
saving mode such as sleep. These are
Reset
Test lines etc.
Some of the above signals will be discussed in the subsequent lessons.
9.4 Questions and Answers
Q1. What is the maximum memory addressing capability of the processor discussed in this lecture?
Ans: The number of address lines is 32. Therefore it can address 2^32 locations, which is 4 Gbytes.
Q2. What do you understand by POST in a desktop computer?
Ans: It is called Power-On Self Test. This is a routine executed to check the proper functioning of the hard disk, CD-ROM, floppy disk and many other on-board and off-board components when the computer is powered on.
Q3. Describe the various power-saving modes in a general purpose CPU.
Ans: Refer to the Sleep Mode discussion in the text.
Q4. What could be the differences in design of a processor to be used in the following applications?
Laptop
Desktop
Motor Control
Ans:
Laptop processor: a complex general purpose processor with low power consumption and various power-saving modes.
Motor Control: a simple low-power specialized processor with on-chip peripherals, running a Real Time Operating System.
Q5. What is the advantage of reducing the high-state voltage from 5 V to 3.5 V? What are the disadvantages?
Ans: It reduces the interference but decreases the noise margin.
Ans: It is used to know the quality of the supply inside the CPU. If it is not good, there may be mal-operations and data loss.
Lesson
10
Embedded Processors - I
Pre-requisite
Digital Electronics
10.1 Introduction
It is generally difficult to draw a clear-cut boundary between the class of microcontrollers and general purpose microprocessors. Distinctions can be made or assumed on the following grounds.
Microcontrollers are generally associated with embedded applications.
Microprocessors are associated with desktop computers.
Microcontrollers have a simpler memory hierarchy, i.e. the RAM and ROM may exist on the same chip, and generally the cache memory will be absent.
The power consumption and temperature rise of a microcontroller are restricted because of the constraints on the physical dimensions.
8-bit and 16-bit microcontrollers are very popular, with a simpler design as compared to large bit-length (32-bit, 64-bit) complex general purpose processors.
However, recently, the market for 32-bit embedded processors has been growing.
Further, issues such as power consumption, cost, and integrated peripherals differentiate a desktop CPU from an embedded processor. Other important features include the interrupt response time, the amount of on-chip RAM or ROM, and the number of parallel ports. The desktop world values processing power, whereas an embedded microprocessor must do the job for a particular application at the lowest possible cost.
[Fig. 10.1: The Performance vs Cost regions — 4-bit controllers occupy the low-cost, low-performance corner; 8- or 16-bit controllers sit in the middle; 32- or 64-bit desktop processors occupy the high-performance, high-cost corner]
[Fig. 10.2 Microprocessor versus microcontroller: (a) a microprocessor-based system with separate ROM, EEPROM, RAM, serial I/O, parallel I/O, timer, A/D and D/A chips around the microprocessor, plus input and output ports for digital and analog I/O; (b) a microcontroller-based system with the CPU core, serial I/O, parallel I/O, timer, A/D, PWM and analog filter integrated on a single chip]
Fig. 10.1 shows the performance-cost plot of the available microprocessors. Naturally, the higher the performance, the higher the cost. The embedded controllers occupy the lower left-hand corner of the plot.
Fig. 10.2 shows the architectural difference between two systems, one with a general purpose microprocessor and one with a microcontroller. The hardware requirement in the former system is more than that of the latter. Separate chips or circuits for serial interface, parallel interface, memory and AD/DA converters are necessary. On the other hand, the functionality, flexibility and the complexity of information handling are greater in the case of the former.
[Fig. 10.3: The Architectural Block diagram of the Intel 8XC196 Microcontroller — showing the A/D, I/O, EPA, PWM, WDT, WG, FG and SIO blocks around the core]
PTS: Peripheral Transaction Server; I/O: Input/Output Interface; EPA: Event Processor Array;
[Fig. 10.4: The Architectural Block diagram of the core — CPU, registers, PSW, SFRs and the bus controller]
CPU: Central Processing Unit; RALU: Register Arithmetic Logic Unit; ALU: Arithmetic Logic Unit;
Master PC: Master Program Counter; PSW: Processor Status Word; SFR: Special Function Registers
CPU Control
The CPU is controlled by the microcode engine, which instructs the RALU to perform operations using bytes, words, or double-words from either the 256-byte lower register file or through a window that directly accesses the upper register file. Windowing is a technique that maps blocks of the upper register file into a window in the lower register file. CPU instructions move from the 4-byte prefetch queue in the memory controller into the RALU's instruction register. The microcode engine decodes the instructions and then generates the sequence of events that cause desired functions to occur.
Register File
The register file is divided into an upper and a lower file. In the lower register file, the lowest 24
bytes are allocated to the CPU's special-function registers (SFRs) and the stack pointer, while
the remainder is available as general-purpose register RAM. The upper register file contains only
general-purpose register RAM. The register RAM can be accessed as bytes, words, or double
words. The RALU accesses the upper and lower register files differently. The lower register file
is always directly accessible with direct addressing. The upper register file is accessible with
direct addressing only when windowing is enabled.
The six-bit loop counter counts repetitive shifts. The second-operand register stores the second operand for two-operand instructions, including the multiplier during multiply operations and the divisor during divide operations. During subtraction operations, the output of this register is complemented before it is moved into the ALU. The RALU speeds up calculations by storing constants (e.g., 0, 1, and 2) in the constants register so that they are readily available when complementing, incrementing, or decrementing bytes or words. In addition, the constants register generates single-bit masks, based on the bit-select register, for bit-test instructions.
Code Execution
The RALU performs most calculations for the microcontroller, but it does not use an accumulator. Instead it operates directly on the lower register file, which essentially provides 256 accumulators. Because data does not flow through a single accumulator, the microcontroller's code executes faster and more efficiently.
Instruction Format
These microcontrollers combine general-purpose registers with a three-operand instruction format. This format allows a single instruction to specify two source registers and a separate destination register. For example, the following instruction multiplies two 16-bit variables and stores the 32-bit result in a third variable.
When the bus controller receives a request from the queue, it fetches the code from the address
contained in the slave PC. The slave PC increases execution speed because the next instruction
byte is available immediately and the processor need not wait for the master PC to send the
address to the memory controller. If a jump, interrupt, call, or return changes the address sequence, the master PC loads the new address into the slave PC, then the CPU flushes the queue and continues processing.
Interrupt Service
The interrupt-handling system has two main components: the programmable interrupt controller
and the peripheral transaction server (PTS). The programmable interrupt controller has a
hardware priority scheme that can be modified by the software. Interrupts that go through the interrupt controller are serviced by interrupt service routines that you provide. The peripheral transaction server (PTS), which is a microcoded hardware interrupt processor, provides efficient interrupt handling.
[Fig. 10.5: The clock circuitry — an external crystal or oscillator on XTAL1/XTAL2 feeds a divide-by-two circuit; the clock generators then produce the peripheral clocks (PH1, PH2), the CPU clocks (PH1, PH2) and CLKOUT, each of which can be disabled in idle or powerdown mode]
Internal Timing
The clock circuitry (Fig. 10.5) receives an input clock signal on XTAL1 provided by an
external crystal or oscillator and divides the frequency by two. The clock generators accept the
divided input frequency from the divide-by-two circuit and produce two non-overlapping
internal timing signals, Phase 1 (PH1) and Phase 2 (PH2). These signals are active when high.
[Timing diagram: XTAL1 toggles with period TXTAL1; two XTAL1 periods make up one state time, during which PH1 and PH2 pulse alternately and CLKOUT toggles]
Analog-to-digital Converter
The analog-to-digital (A/D) converter converts an analog input voltage to a digital equivalent.
Resolution is either 8 or 10 bits; sample and convert times are programmable. Conversions can
be performed on the analog ground and reference voltage, and the results can be used to calculate
gain and zero-offset errors. The internal zero-offset compensation circuit enables automatic zero
offset adjustment. The A/D also has a threshold-detection mode, which can be used to generate
an interrupt when a programmable threshold voltage is crossed in either direction. The A/D scan
mode of the PTS facilitates automated A/D conversions and result storage.
Watchdog Timer
The watchdog timer is a 16-bit internal timer that resets the microcontroller if the software fails
to operate properly.
In idle mode, the CPU stops executing instructions, but the peripheral clocks remain active. Power consumption drops to about 40% of normal execution mode consumption. Either a hardware reset or any enabled interrupt source will bring the microcontroller out of idle mode. In power-down mode, all internal clocks are frozen at logic state zero and the internal oscillator is shut off. The register file and most peripherals retain their data if VCC is maintained. Power consumption drops into the µW range.
10.3 Conclusion
This lesson discussed the architecture of a typical high-performance microcontroller. The next lesson shall discuss the signals of a typical microcontroller from the Intel MCS-96 family.
Ans: This is where instructions are broken down into smaller micro-instructions and executed.
Microprogramming was one of the key breakthroughs that allowed system architects to implement complex instructions in hardware. To understand what microprogramming is, it helps to first consider the alternative: direct execution. With direct execution, the machine fetches an instruction from memory and feeds it into a hardwired control unit. This control unit takes the instruction as its input and activates some circuitry that carries out the task. For instance, if the machine fetches a floating-point ADD and feeds it to the control unit, there's a circuit somewhere in there that kicks in and directs the execution units to make sure that all of the shifting, adding, and normalization gets done. Direct execution is actually pretty much what you'd expect to go on inside a computer if you didn't know about microcoding.
The main advantage of direct execution is that it's fast. There's no extra abstraction or translation going on; the machine is just decoding and executing the instructions right in hardware. The problem with it is that it can take up quite a bit of space. If every instruction has to have some circuitry that executes it, then the more instructions you have, the more space the control unit will take up. This problem is compounded if some of the instructions are big and complex, and take a lot of work to execute. So directly executing instructions for a CISC machine just wasn't feasible with the limited transistor resources of the day.
With microprogramming, it's almost like there's a mini-CPU on the CPU. The control unit is a microcode engine that executes microcode instructions. The CPU designer uses these microinstructions to write microprograms, which are stored in a special control memory. When a normal program instruction is fetched from memory and fed into the microcode engine, the microcode engine executes the proper microcode subroutine. This subroutine tells the various functional units what to do and how to do it.
As you can probably guess, in the beginning microcode was a pretty slow way to do things. The ROM used for control memory was about 10 times faster than magnetic core-based main memory, so the microcode engine could stay far enough ahead to offer decent performance. As microcode technology evolved, however, it got faster and faster. (The microcode engines on current CPUs are about 95% as fast as direct execution.) Since microcode technology was getting better and better, it made more and more sense to just move functionality from (slower and more expensive) software to (faster and cheaper) hardware. So ISA instruction counts grew, and program instruction counts shrank.
As microprograms got bigger and bigger to accommodate the growing instruction sets, however, some serious problems started to emerge. To keep performance up, microcode had to be highly optimized with no inefficiencies, and it had to be extremely compact in order to keep memory costs down. And since microcode programs were so large now, it became much harder to test and debug the code. As a result, the microcode that shipped with machines was often buggy and had to be patched numerous times out in the field. It was the difficulties involved with using microcode for control that spurred Patterson and others to question whether implementing all of these complex, elaborate instructions in microcode was really the best use of limited transistor resources.
Ans: A fail-safe mechanism that intervenes if a system stops functioning. A hardware timer
that is periodically reset by software. If the software crashes or hangs, the watchdog timer
will expire, and the entire system will be reset automatically.
The Watch Dog Unit contains a Watch Dog Timer.
A watchdog timer (WDT) is a device or electronic card that performs a specific operation
after a certain period of time if something goes wrong with an electronic system and the
system does not recover on its own.
A common problem is for a machine or operating system to lock up if two parts or programs conflict, or, in an operating system, if memory management trouble occurs. In some cases, the system will eventually recover on its own, but this may take an unknown and perhaps extended length of time. A watchdog timer can be programmed to perform a warm boot (restarting the system) after a certain number of seconds during which a program or computer fails to respond following the most recent mouse click or keyboard action. The timer can also be used for other purposes, for example, to actuate the refresh (or reload) button in a Web browser if a Web site does not fully load after a certain length of time following the entry of a Uniform Resource Locator (URL).
A WDT contains a digital counter that counts down to zero at a constant speed from a preset number. The counter speed is kept constant by a clock circuit. If the counter reaches zero before the computer recovers, a signal is sent to designated circuits to perform the desired action.
Lesson 11: Embedded Processors - II
Pre-requisite
Digital Electronics
11.1 Introduction
Microcontrollers are required to operate in the real world without much interface circuitry. The input-output signals of such a processor are both analog and digital. The digital data transmission can be both parallel and serial. The voltage levels also could be different.

The architecture of a basic microcontroller is shown in Fig. 11.1. It illustrates the various modules inside a microcontroller. Common processors will have Digital Input/Output, Timer and Serial Input/Output lines. Some of the microcontrollers also support multi-channel Analog to Digital Converter (ADC) as well as Digital to Analog Converter (DAC) units. Thus analog signal input and output pins are also present in typical microcontroller units. For external memory and I/O chips the address as well as data lines are also supported.
[Fig. 11.1: Block diagram of a typical microcontroller: a 16-bit CPU with RAM and ROM areas, timer, watchdog timer, A/D converter, pulse-width modulators, serial port (Tx/Rx), bus controller, interrupt handler, event processor array (EPA) with capture/compare units, register RAM, microcode engine and ALU, plus ports A, B, C and others. Legend: address/data lines, bus control signals, interrupt signals, timer/event-manager signals, digital I/O ports, analog I/O ports.]
A20:16 Address Pins 16 to 20: These are output pins used during external memory cycles. They are multiplexed with EPORT.4:0, part of the 8-bit extended addressing port, and are used to support extended addressing. The EPORT is an 8-bit port which can operate either as a general-purpose I/O signal (I/O mode) or as a special-function signal (special-function mode).
AD15:0 Address/Data Lines: These lines serve as input as well as output pins. The function of these pins depends on the bus width and mode. When a bus access is not occurring, these pins revert to their I/O port function. AD15:0 drive address bits 0 to 15 during the first half of the bus cycle and drive or receive data during the second half of the bus cycle.
INST Output signal: When high, INST indicates that an instruction is being fetched from
external memory. The signal remains high during the entire bus cycle of an external instruction
fetch.
RD: Read Signal: Output: It is asserted only during external memory reads.
READY: Ready Input: This active-high input can be used to insert wait states in addition to
those programmed in the chip configuration.
WR: Write: Output Signal: This active-low output indicates that an external write is occurring.
This signal is asserted only during external memory writes.
WRH Write High: Output Signal: During 16-bit bus cycles, this active-low output signal is
asserted for high-byte writes and word writes to external memory.
WRL Write Low: Output Signal: During 16-bit bus cycles, this active-low output signal is
asserted for low-byte writes and word writes to external memory.
EA: External Access: Input Signal: This input determines whether memory accesses to the upper 7 Kbytes of ROM (FF2400H to FF3FFFH) are directed to internal or external memory. These accesses are directed to internal memory if EA# is held high and to external memory if EA# is held low. For an access to any other memory location, the value of EA# is irrelevant.
EXTINT: External Interrupt Input: In normal operating mode, a rising edge on EXTINT sets the EXTINT interrupt pending bit. EXTINT is sampled during phase 2 (CLKOUT high). The minimum high time is one state time. If the EXTINT interrupt is enabled, the CPU executes the interrupt service routine.
NMI: Nonmaskable Interrupt Input: In normal operating mode, a rising edge on NMI generates a nonmaskable interrupt. NMI has the highest priority of all prioritized interrupts.

ONCE: Input: On-circuit emulation (ONCE) mode electrically isolates the microcontroller from the system. By invoking the ONCE mode, you can test the printed circuit board while the microcontroller is soldered onto the board.
PLLEN: Input Signal: Phase-Locked Loop Enable: This active-high input pin enables the on-chip clock multiplier. The PLLEN pin must be held low along with the ONCE# pin to enter on-circuit emulation (ONCE) mode.

RESET: I/O Reset: A level-sensitive reset input to, and an open-drain system reset output from, the microcontroller. Either a falling edge on RESET or an internal reset turns on a pull-down transistor connected to the RESET pin for 16 state times. In the powerdown and idle modes, asserting RESET causes the microcontroller to reset and return to normal operating mode.
RPD: Return-from-Power-Down Input Signal: Timing pin for the return-from-powerdown circuit.
TMODE: Test-Mode Entry Input: If this pin is held low during reset, the microcontroller will
enter a test mode. The value of several other pins defines the actual test mode.
XTAL1: Input: Crystal/Resonator or External Clock Input: Input to the on-chip oscillator and the
internal clock generators. The internal clock generators provide the peripheral clocks, CPU
clock, and CLKOUT signal. When using an external clock source instead of the on-chip
oscillator, connect the clock input to XTAL1.
XTAL2: Output: Inverted Output for the Crystal/Resonator Output of the on-chip oscillator
inverter. Leave XTAL2 floating when the design uses an external clock source instead of the on-
chip oscillator.
P3.7:0 I/O Port 3: This is a memory-mapped, 8-bit, bidirectional port with programmable open
drain or complementary output modes.
P4.7:0 I/O Port 4 This is a memory-mapped, 8-bit, bidirectional port with programmable open
drain or complementary output modes.
P5.7:0 I/O Port 5: This is a memory-mapped, 8-bit, bidirectional port.

P7.7:0 I/O Port 7: This is a standard, 8-bit, bidirectional port that shares package pins with individually selectable special-function signals.

P8.7:0 I/O Port 8: This is a standard, 8-bit, bidirectional port.

P9.7:0 I/O Port 9: This is a standard, 8-bit, bidirectional port.

P10.5:0 I/O Port 10: This is a standard, 6-bit, bidirectional port that is multiplexed with individually selectable special-function signals.

P11.7:0 I/O Port 11: This is a standard, 8-bit, bidirectional port that is multiplexed with individually selectable special-function signals.

P12.4:0 I/O Port 12: This is a memory-mapped, 5-bit, bidirectional port. P12.2:0 select the TROM.
Most of the above ports are shared with other important signals discussed here. For instance Port
3 pins P3.7:0 share package pins with AD7:0. That means by writing a specific word to the
configuration register the pins can change their function.
Analog Inputs
ACH15:0: Input Analog Channels: These signals are analog inputs to the A/D converter. The
ANGND and VREF pins are also required for the standard A/D converter to function.
Other important signals of a typical microcontroller include
Power Supply and Ground pins at multiple points
Signals from the internal programmable Timer
Debug Pins
The multiple power supply points ensure the following:
The voltages at devices (transistors and cells) are better than a set target under a specified set of varying load conditions in the design. This is to ensure correct operation of circuits at the expected level of performance.
The current supplied by a pad, pin, or voltage regulator is within a specified limit under any of the specified loading conditions. This is required (a) so as not to exceed the design capacity of regulators and pads, and (b) to distribute currents more uniformly among the pads, so that the L di/dt voltage variations due to parasitic inductance in the package's substrate, ball-grid array, and bond wires are minimized.
Module 2: Embedded Processors and Memory
Lesson 12: Memory Interfacing
Instructional Objectives
After going through this lesson the student would learn
Pre-Requisite
Digital Electronics, Microprocessors
12.1 Introduction
A Single-Chip Microcontroller

[Fig. 12.1: The basic architecture of a microcontroller: a 16-bit CPU with internal RAM and ROM areas, a timer, an ADC, a serial port (Tx/Rx), and parallel ports A, B and C.]

CPU: The processing module of the microcontroller
Fig. 12.1 shows the internal architecture of single chip microcontroller with internal RAM as
well as ROM. Most of these microcontrollers do not require external memory for simpler tasks.
The program lengths being small can easily fit into the internal memory. Therefore it often
provides single chip solutions. However the amount of internal memory cannot be increased
beyond a certain limit because of the following reasons.
Power Consumption
Size
The presence of extra memory means more power consumption and hence a higher temperature rise, and the size has to be increased to house the additional memory. The need for extra memory space nevertheless arises in some specific applications. Fig. 12.3 shows the basic block diagram of the memory interface to a processor.
[Fig. 12.3: External memory interface: the EMI bus-interface logic connects the processor to the external memory through data lines, control lines, and address/control signals.]
The above family of microcontrollers can have both on-chip as well as off-chip external memory. At times the on-chip memory is of a programmable flash type. A special register inside the microcontroller can be programmed (by writing an 8-bit or 16-bit binary number) to use this external memory in various modes. In the case of the PIC family the following modes are possible.

Microcontroller Mode
The processor accesses only on-chip FLASH memory. External Memory Interface functions are disabled. Attempts to read above the physical limit of the on-chip FLASH cause a read of all 0s (a NOP instruction).
Microprocessor Mode
The processor permits execution and access only through external program memory; the contents
of the on-chip FLASH memory are ignored.
[Fig. 12.4: The memory map (000000h to 1FFFFFh) in the different modes: Microprocessor Mode (MP), Microprocessor with Boot Block Mode (MPBB), Microcontroller Mode (MC) and Extended Microcontroller Mode (EMC), showing, for each mode, which regions are served by on-chip FLASH program memory and which by external program memory, and where reads return 0s or access is disallowed.]
[Fig. 12.5: The address, data and control lines of the PIC18F8XXX microcontroller required for external memory interfacing: AD<15:0> and A<19:16>, with the control lines ALE, OE, WRL, WRH, UB, LB, BA0 and CE, plus VDD and VSS.]
The address, data and control lines of a PIC family microcontroller are shown in Fig. 12.5 and are explained below.

AD0-AD15: 16 bits of data and 16 bits of address, multiplexed
A16-A19: The 4 most significant bits of the address
ALE: Address Latch Enable: Signal to latch the multiplexed address in the first clock cycle
WRL: Write Low: Control pin to make the memory write the lower byte of the data when it is low
WRH: Write High: Control pin to make the memory write the higher byte of the data when it is low
OE: Output Enable: Made low when valid data is made available to the external memory
CE: Chip Enable: Made low to access the external memory chip
BA0: Byte Address 0
LB: Lower Byte Enable: Control kept low when the lower byte is available for the memory
UB: Upper Byte Enable: Control kept low when the upper byte is available for the memory
The microcontroller has a 16-bit wide bus for data transfer. These data lines are shared with
address lines and are labeled AD<15:0>. Because of this, 16 bits of latching are necessary to
demultiplex the address and data. There are four additional address lines labeled A<19:16>. The
PIC18 architecture provides an internal program counter of 21 bits, offering a capability of 2
Mbytes of addressing.
There are seven control lines that are used in the External Memory Interface: ALE, WRL
, WRH , OE , CE , LB , UB . All of these lines except OE may be used during data writes. All
of these lines except WRL and WRH may be used during fetches and reads. The application
will determine which control lines are necessary. The basic connection diagram is shown in Fig.
12.6. The 16-bit byte select mode is shown here.
[Fig. 12.6: The connection diagram for the external memory interface in 16-bit byte select mode: AD<15:0> from the PIC18F8XXX feeds both an address latch (clocked by ALE) that drives the address bus Ax:A0 and the data bus D15:D0 of the memory, while CE, OE, WRH, WRL, BA0, UB and LB serve as the control lines.]
The PIC18 family runs from a clock that is four times faster than its instruction cycle. The four clock pulses are a quarter of the instruction cycle in length and are referred to as Q1, Q2, Q3, and Q4. During Q1, ALE is enabled while address information A<15:0> is placed on pins AD<15:0>. At the same time, the upper address information A<19:16> is available on the upper address bus. On the negative edge of ALE, the address is latched in the external latch. At the beginning of Q3, the OE output enable (active low) signal is generated. Also at the beginning of Q3, BA0 is generated. This signal will be active high only during Q3, indicating the state of the program counter's least significant bit. At the end of Q4, OE goes high and the data (a 16-bit word) is fetched from memory at the low-to-high transition edge of OE. The timing diagram for all signals during external memory code execution and table reads is shown in Fig. 12.7.
[Fig. 12.7: Timing diagram for memory read: BA0, ALE and OE toggle within the Q1-Q4 phases while WRH and WRL stay at 1 and CE, UB and LB stay at 0.]
12.3 Conclusion

This lesson discussed a typical external memory interface example for the PIC family of microcontrollers. A typical timing diagram for the memory read operation was presented.

12.4 Questions

Q1. Draw the read timing diagram for a typical memory operation.
Ans: Refer to text.
Q2. Draw the write timing diagram for a typical memory operation.
Ans: For the 16-bit write operation in the MCS96 family, refer to Lessons 10 and 11.
Module 3: Embedded Systems I/O
Lesson 13: Interfacing Bus, Protocols, ISA Bus etc.
Instructional Objectives
After going through this lesson the student would learn

Pre-Requisite
Digital Electronics, Microprocessors

13.1 Introduction

The traditional definition of input-output covers the devices that create a medium of interaction with the human users. They fall into categories such as:
1. Printers
[Fig. 13.1: A typical embedded multimedia device built around a TI digital media processor: a CCD lens module with a TI analog front end and V/H timing generator, a TFT panel with controller and motor drivers, an audio codec module with power amplifier, SDRAM and flash memory with removable storage, RS232C/USB/1394 interfaces, and a power-management section (low-dropout, buck, boost, charge-pump and inverter regulators) supplying the 1.5/1.8/2.5 V core, 3.3/5 V system and 7.5/12/15 V LCD/CCD rails.]
Processing
Transformation of data
Implemented using processors
Storage
Retention of data
Implemented using memory
And Communication (also called Interfacing)
Transfer of data between processors and memories
Implemented using buses
Interfacing

Interfacing is a way to communicate and transfer information in either direction without ending in deadlocks. In our context it is a way of effective communication in real time. This involves:
Addressing
Arbitration
Protocols
[Fig. 13.2(a): The bus structure: a master and a slave connected by control lines and data lines.]
Addressing: The data sent by the master over a specified set of lines enables just the device for which it is meant.

Protocols: The literal meaning of protocol is a set of rules. Here it is a set of formal rules describing how to transfer data, especially between two devices. A simple example is the memory read and write protocol. The set of rules, or the protocol, for a read is (Fig. 13.2(b)):
The CPU must send the memory address
The read line must be enabled
The processor must wait till the memory is ready
Then accept the bits on the data lines
[Fig. 13.2(b): Read protocol: the enable line is asserted with the address on the addr lines; after tsetup and tread the data appears on the data lines.]
For a write (Fig. 13.2(c)):
The CPU must send the memory address
The write line must be enabled
The processor sends the data over the data lines
The processor must wait till the memory is ready

[Fig. 13.2(c): Write protocol: the enable line is asserted with the address and data on the addr and data lines; the write completes after tsetup and twrite.]
Arbitration: When the same set of address/data/control lines is shared by different units, the bus arbitration logic comes into play. Access to a bus is arbitrated by a bus master. Each node on a bus has a bus master which requests access to the bus, called a bus request, when that node needs to use the bus. This is a global request sent to all nodes on the bus. The node that currently has access to the bus responds with either a bus grant or a bus busy signal, which is also globally known to all bus masters. (Fig. 13.3)
[Fig. 13.3: Bus arbitration with I/O devices and a DMA (direct memory access) controller, which is responsible for transferring data between an I/O device and memory without involving the CPU. It starts with a bus request to the CPU and, after the request is granted, takes over the address/data and control bus to initiate the data transfer. After the data transfer is complete it passes control back to the CPU.]
Before learning more details about each of these concepts, a concrete definition of the following terms is necessary.

Wire: It is just a passive physical connection with least resistance. It may be augmented with buffers, latches etc.

Bus: A group of signals (such as data, address etc.). A bus has standard specifications such as the number of bits, the clock speed etc.

Port: It is the set of physical wires made available so that any device which meets the specified standard can be directly plugged in. Examples are the serial, parallel and USB ports of the PC.

Time multiplexing: Sharing a single set of wires for multiple pieces of data. It saves wires at the expense of time.
The Handshaking Protocol

Strobe Protocol: The master and servant are connected by req and data lines. The master asserts req to receive data; the servant puts the data on the bus within a known access time, after which the master reads it and deasserts req.

Handshake Protocol (Fig. 13.5(b)): The master and servant are connected by req, ack and data lines.
1. Master asserts req to receive data
2. Servant puts data on bus and asserts ack
3. Master receives data and deasserts req
4. Servant ready for next request

Strobe and Handshake Combined (Fig. 13.5(c)): The master asserts req to receive data. If the servant can put the data on the bus within the time taccess, the transfer completes as in the strobe protocol; otherwise the servant asserts a wait line until the data is ready, as in the handshake protocol.

Handshaking Example in the ISA Bus
The Industry Standard Architecture (ISA) bus is described below. This is a standard bus architecture developed to help the various designers customize their products and interfaces. The pin configuration and the signals are discussed below.

[Fig. 13.6: The ISA bus connector pin-out.]

ISA Signal Descriptions
SA19 to SA0 (SA for System Address)
System Address bits 19:0 are used to address memory and I/O devices within the system. These signals may be used along with LA23 to LA17 to address up to 16 megabytes of memory. Only the lower 16 bits are used during I/O operations to address up to 64K I/O locations. SA19 is the most significant bit. SA0 is the least significant bit. These signals are gated on the system bus when BALE is high and are latched on the falling edge of BALE. They remain valid throughout a read or write command. These signals are normally driven by the system microprocessor or DMA controller, but may also be driven by a bus master on an ISA board that takes ownership of the bus.
LA23 to LA17
Unlatched Address bits 23:17 are used to address memory within the system. They are used
along with SA19 to SA0 to address up to 16 megabytes of memory. These signals are valid when
BALE is high. They are "unlatched" and do not stay valid for the entire bus cycle. Decodes of
these signals should be latched on the falling edge of BALE.
AEN
Address Enable is used to degate the system microprocessor and other devices from the bus
during DMA transfers. When this signal is active the system DMA controller has control of the
address, data, and read/write signals. This signal should be included as part of ISA board select
decodes to prevent incorrect board selects during DMA cycles.
BALE
Buffered Address Latch Enable is used to latch the LA23 to LA17 signals or decodes of these
signals. Addresses are latched on the falling edge of BALE. It is forced high during DMA cycles.
When used with AEN, it indicates a valid microprocessor or DMA address.
CLK
System Clock is a free running clock typically in the 8 MHz to 10 MHz range, although its exact frequency is not guaranteed. It is used in some ISA board applications to allow synchronization with the system microprocessor.

SD15 to SD0
System Data serves as the data bus bits for devices on the ISA bus. SD15 is the most significant bit. SD0 is the least significant bit. SD7 to SD0 are used for transfer of data with 8-bit devices. SD15 to SD0 are used for transfer of data with 16-bit devices. 16-bit devices transferring data with 8-bit devices shall convert the transfer into two 8-bit cycles using SD7 to SD0.

DACK0 to DACK3 and DACK5 to DACK7
DMA Acknowledge 0 to 3 and 5 to 7 are used to acknowledge DMA requests on DRQ0 to DRQ3 and DRQ5 to DRQ7.

DRQ0 to DRQ3 and DRQ5 to DRQ7
DMA Requests are used by ISA boards to request service from the system DMA controller or to request ownership of the bus as a bus master device. These signals may be asserted asynchronously. The requesting device must hold the request signal active until the system board asserts the corresponding DACK signal.
I/O CH CK
I/O Channel Check may be activated by ISA boards to request that a non-maskable interrupt (NMI) be generated to the system microprocessor. It is driven active to indicate that an uncorrectable error has been detected.

I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
IOR
I/O Read is driven by the owner of the bus and instructs the selected I/O device to drive read data
onto the data bus.
IOW
I/O Write is driven by the owner of the bus and instructs the selected I/O device to capture the
write data on the data bus.
SMEMW
System Memory Write instructs a selected memory device to store the data currently on the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMW is derived from MEMW and a decode of the low 1 megabyte of memory.
MEMR
Memory Read instructs a selected memory device to drive data onto the data bus. It is active on
all memory read cycles.
MEMW
Memory Write instructs a selected memory device to store the data currently on the data bus. It is
active on all memory write cycles.
REFRESH
Memory Refresh is driven low to indicate a memory refresh operation is in progress.
OSC
Oscillator is a clock with a 70ns period (14.31818 MHz). This signal is not synchronous with the
system clock (CLK).
RESET DRV
Reset Drive is driven high to reset or initialize system logic upon power up or subsequent system
reset.
TC
Terminal Count provides a pulse to signal that a terminal count has been reached on a DMA channel operation.

MASTER
Master is used by an ISA board along with a DRQ line to gain ownership of the ISA bus. Upon receiving a -DACK, a device can pull -MASTER low, which will allow it to control the system address, data, and control lines. After MASTER is low, the device should wait one CLK period before driving the address and data lines, and two clock periods before issuing a read or write command.
MEM CS16
Memory Chip Select 16 is driven low by a memory slave device to indicate it is capable of performing a 16-bit memory data transfer. This signal is driven from a decode of the LA23 to LA17 address lines.

I/O CS16
I/O Chip Select 16 is driven low by an I/O slave device to indicate it is capable of performing a 16-bit I/O data transfer. This signal is driven from a decode of the SA15 to SA0 address lines.
0WS
Zero Wait State is driven low by a bus slave device to indicate it is capable of performing a bus
cycle without inserting any additional wait states. To perform a 16-bit memory cycle without
wait states, -0WS is derived from an address decode.
SBHE
System Byte High Enable is driven low to indicate a transfer of data on the high half of the data
bus (D15 to D8).
[Fig. 13.7(a): The handshaking mode of data transfer in the ISA bus (memory read): across cycles C1, C2, WAIT, C3 and C4, the address is placed on A[19-0] and latched by ALE, /MEMR is asserted, and the slave pulls CHRDY low to insert the wait state before the data appears on D[7-0].]

The memory write bus cycle in the ISA bus:

[Memory write timing: as above, but with /MEMW asserted while the master drives the data on D[7-0]; CHRDY again inserts the wait state.]
[Fig. 13.8: Parallel I/O and extended parallel I/O: a parallel I/O peripheral on the system bus provides ports A, B and C; in extended parallel I/O, processor ports 2 and 3 interface with a further parallel I/O peripheral.]

Extended parallel I/O
When the processor supports port-based I/O but more ports are needed, one or more processor ports interface with a parallel I/O peripheral, extending the total number of ports available for I/O, e.g., extending 4 ports to 6 ports in the figure.
Types of bus-based I/O: memory-mapped I/O and standard I/O
The processor talks to both memory and peripherals using the same bus; there are two ways to talk to peripherals.
Memory-mapped I/O
Peripheral registers occupy addresses in the same address space as memory.
e.g., the bus has a 16-bit address:
the lower 32K addresses may correspond to memory
the upper 32K addresses may correspond to peripherals
Standard I/O (I/O-mapped I/O)
An additional pin (M/IO) on the bus indicates whether it is a memory or a peripheral access.
e.g., the bus has a 16-bit address:
all 64K addresses correspond to memory when M/IO is set to 0
all 64K addresses correspond to peripherals when M/IO is set to 1
[Fig. 13.9(a): Interfacing an 8051 to external memory: a 74373 latch, gated by ALE, demultiplexes the low address byte A<7:0> from port P0; port P2 supplies the high address byte A<15:8>; the HM6264 RAM is accessed through /RD, /WR and its chip selects, and the 27C256 program ROM is read through /PSEN and /OE.]

[Fig. 13.9(b): The timing diagram: the clock, the address on P2, ALE and /RD during an external read cycle.]
The timing of the various signals is shown in Fig. 13.9(b). The lower byte of the address is placed on P0 and the address latch enable signal is asserted. The higher byte of the address is placed on P2. The ALE signal enables the 74373 chip to latch the address, as the P0 bus will be used for data. The P0 bus goes into tri-state (high impedance state) and switches internally to the data path. The RD (read) line is enabled; the bar over the read line indicates that it is active when low. The data is received from the memory over the P0 bus. A memory write cycle can be explained similarly.
13.3 Conclusion

In this lesson you learnt about the basics of input-output interfacing. In the previous chapter you also studied some input-output concepts, but most of those I/O units, such as the timer, watchdog circuits, PWM generator, and serial and parallel ports, were part of the microcontroller. In this lesson the basics of interfacing with external devices have been discussed. The difference between a bus and a port should be kept in mind. The ISA bus is discussed to give an idea about the various bus architectures, which will be discussed in the later part of this course. You may browse the websites listed below for further knowledge.

http://esd.cs.ucr.edu/slide_index.html
http://esd.cs.ucr.edu/wres.html
www.techfest.com/hardware/bus/isa.htm

You should now be in a position to learn any microcontroller and its interfacing protocols.
13.4 Questions
1. List at least 4 differences between the I/O devices for a Real Time Embedded System (RTES) and a Desktop PC.
Ans: An additional handshaking signal from the memory, namely /READY, is necessary. The microcontroller inserts wait states as long as the /READY line is not asserted. The READY line in this case is sampled at the rising edge of the third clock phase. Fig. Q2 shows the timing of such an operation.

[Fig. Q2: Timing with wait states: over clock periods T1, T2, Twait, T4 and T5, the address is placed on the bus and /RD is asserted; /Ready is sampled, and the data transfer completes only after /Ready becomes active.]
3. Enlist the handshaking signals in the ISA bus for dealing with slower I/O devices.
Ans:
I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
4. What additional handshaking signals are necessary for bidirectional data transfer over the same set of data lines?

Ans:
For an 8-bit data transfer we need at least 4 additional lines for handshaking. As shown in Fig. Q4 there are two ports. Port A acts as the 8-bit bidirectional data bus. Port C carries the handshaking signals.

Write operation: When the data is ready, the /OBFA (PC7, output buffer full acknowledge, active low) signal is made 0. The connected device acknowledges through /ACKA (PC6, acknowledging that it is ready to accept data; active low). The data transfer takes place over PA0-PA7.

Read operation: When the data is ready, the external device makes the /STBA (PC4, strobe acknowledge, active low) line low. The acknowledgement is sent through IBFA (PC5, input buffer full, acknowledging that the data has been accepted; active high). The data transfer takes place.

[Fig. Q4: Port A (PA7-PA0) as the bidirectional data bus, with PC7 (/OBFA), PC6 (/ACKA), PC4 (/STBA) and PC5 (IBFA) as the handshaking lines on Port C.]
Ans:
ISA Bus
The Industry Standard Architecture (ISA) bus is an open, 8-bit (PC and XT) or 16-bit (AT)
asymmetrical I/O channel with numerous compatible hardware implementations.
EISA Bus
The Extended Industry Standard Architecture (EISA) bus is an open, 32-bit, asymmetrical I/O channel with numerous compatible hardware implementations. The system bus allows data transfer rates at a bandwidth of up to 33 MB per second, supports a 4 GB address space and 8 DMA channels, and is backward compatible with the Industry Standard Architecture (ISA) bus.
PCI Bus
The Peripheral Component Interconnect Local Bus (PCI) is an open, high-performance 32-bit or 64-bit synchronous bus with multiplexed address and data lines, and numerous compatible hardware implementations. The PCI bus supports a frequency of 33 MHz and a transfer rate of 132 MB per second.
Futurebus+
Futurebus+ is an open bus, designed by the IEEE 896 committee, whose architecture and interfaces are publicly documented, and that is independent of any underlying architecture. It has broad-based, cross-industry support and very high throughput (the maximum rate for 64-bit bandwidth is 160 MB per second; for 128-bit bandwidth, 180 MB per second). Futurebus+ supports a 64-bit address space and a set of control and status registers (CSRs) that provides all the necessary ability to enable or disable features, thus supporting multivendor interoperability.
SCSI Bus
The Small Computer Systems Interface (SCSI) bus is an ANSI standard for the interconnection of computers with each other and with disks, floppies, tapes, printers, optical disks, and scanners. The SCSI standard covers the mechanical, electrical, and protocol aspects of the interface.
Data transfer rates are individually negotiated with each device attached to a given SCSI bus. For example, a 4 MB per second device and a 10 MB per second device may share a fast narrow bus. When the 4 MB per second device is using the bus, the transfer rate is 4 MB per second. When the 10 MB per second device is using the bus, the transfer rate is 10 MB per second. However, when faster devices are placed on a slower bus, their transfer rate is reduced to allow for proper operation in that slower environment.
Note that the speed of the SCSI bus is a function of cable length, with slow, single-ended SCSI
buses supporting a maximum cable length of 6 meters, and fast, single-ended SCSI buses
supporting a maximum cable length of 3 meters.
TURBOchannel Bus
The TURBOchannel bus is a synchronous, 32-bit, asymmetrical I/O channel that can be operated
at any fixed frequency in the range 12.5 MHz to 25 MHz. It is also an open bus, developed by
Digital, whose architecture and interfaces are publicly documented.
At 12.5 MHz, the peak data rate is 50 MB per second. At 25 MHz, the peak data rate is 100 MB
per second.
The TURBOchannel is asymmetrical in that the base system processor and system memory are
defined separately from the TURBOchannel architecture. The I/O operations do not directly
address each other. All data is entered into system memory before being transferred to another
I/O option. The design facilitates a concise and compact protocol with very high performance.
XMI Bus
The XMI bus is a 64-bit wide parallel bus that can sustain a 100 MB per second bandwidth in a
single processor configuration. The bandwidth is exclusive of addressing overhead; the XMI bus
can transmit 100 MB per second of data.
The XMI bus implements a "pended protocol" design so that the bus does not stall between
requests and transmissions of data. Several transactions can be in progress at a given time. Bus
cycles not used by the requesting device are available to other devices on the bus. Arbitration
and data transfers occur simultaneously, with multiplexed data and address lines. These design
features are particularly significant when a combination of multiple devices has a wider
bandwidth than the bus itself.
VME Bus
Digital UNIX includes a generic VME interface layer that provides customers with a consistent interface to VME devices across Alpha AXP workstation and server platforms. Currently, VME adapters are only supported on the TURBOchannel bus. To use the VME interface layer to write VMEbus device drivers, you must have the Digital UNIX TURBOchannel/VME Adapter Driver Version 2.0 software (Software Product Description 48.50.00) and its required processor and/or hardware configurations (Software Support Addendum 48.50.00-A).
Module
3
Embedded Systems I/O
Lesson
14
Timers
Instructional Objectives
After going through this lesson the student would learn about the Standard Peripheral Devices most commonly used as single-purpose processors.
Pre-Requisite
Digital Electronics, Microprocessors
14 Introduction
The Peripherals of an embedded processor can either be on the same chip as the processor or can
be connected externally.
(Fig. 14.1 A typical embedded processor: on-chip interrupt control with external interrupt inputs, on-chip Flash and RAM, Timer 0 and Timer 1 with counter inputs, the CPU, oscillator, bus control, four I/O ports P0-P3, and a serial port with TXD/RXD.)
For example, in a typical embedded processor as shown in Fig.14.1, the timer, interrupt controller, serial port and parallel ports reside on a single chip. These dedicated units are otherwise termed single-purpose processors. They can be a part of the microcontroller or can reside outside the chip, and in the latter case must be properly interfaced with the processor.
The tasks generally carried out by such units are
Timers, counters, watchdog timers
serial transmission
analog/digital conversions
Timer
A timer is a very common and useful peripheral. It is used to generate events at specific times or to measure the duration of specific events which are external to the processor. It is a programmable device, i.e. the time period can be adjusted by writing specific bit patterns to registers called timer-control registers.
Counter
A counter is a more general version of the timer. It is used to count events in the form of pulses which are fed to it.
Fig.14.2(a) shows the block diagram of a simple timer. This has a 16-bit up counter which increments with each input clock pulse. Thus the output value Cnt represents the number of pulses since the counter was last reset to zero. An additional output Top indicates when the terminal count has been reached. It may go high for a predetermined time as set by the programmable control word inside the timer unit. The count can be loaded by the external program.
Fig.14.2(b) provides the structure of another timer where a multiplexer is used to choose between an internal clock and an external clock. The mode bit, when set or reset, decides the selection. For the internal clock (Clk) it behaves like the timer in Fig.14.2(a). For the external count input (Cnt_in) it simply counts the number of occurrences.
(Fig. 14.2(a) Basic timer: Clk drives a 16-bit up counter with a Reset input, producing a 16-bit Cnt output and a Top output. Fig. 14.2(b) Timer/counter: a 2x1 mux, controlled by a Mode bit, selects between Clk and Cnt_in to drive the 16-bit up counter.)
Fig.14.2(c) shows a timer with a terminal count. This can generate an event when a particular interval of time has elapsed. The counter restarts after every terminal count.
(Fig. 14.2(c) Timer with a terminal count: the Cnt output of the 16-bit up counter is compared (=) with the terminal count; Top is asserted on a match and the counter restarts.)
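The behaviour of the terminal-count timer of Fig.14.2(c) can be modelled with a few lines of C. This is a host-side simulation sketch, not firmware; the type and function names are illustrative only:

```c
#include <stdint.h>

/* Model of the timer in Fig.14.2(c): a 16-bit up counter that asserts
 * Top and restarts each time it reaches the programmed terminal count. */
typedef struct {
    uint16_t cnt;       /* current count                 */
    uint16_t terminal;  /* programmed terminal count     */
} tc_timer;

/* Apply one clock pulse; returns 1 when Top fires (counter restarts). */
static int tc_clock(tc_timer *t)
{
    t->cnt++;
    if (t->cnt == t->terminal) {
        t->cnt = 0;     /* counter restarts after terminal count */
        return 1;       /* Top goes high for this pulse          */
    }
    return 0;
}

/* Count how many Top events occur in n clock pulses. */
static int tc_run(tc_timer *t, int n)
{
    int tops = 0;
    for (int i = 0; i < n; i++)
        tops += tc_clock(t);
    return tops;
}
```

With a terminal count of 4, twelve clock pulses produce exactly three Top events, i.e. one event every four pulses, which is how such a timer generates a periodic tick.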
(Fig. 14.3 plots, against clock pulse number: the clock waveform, the counter value (the timer is reset and reloaded with a new count each time it reaches zero), and the output pulse.)
Fig. 14.3 The Timer Count and Output. The timer is in count-down mode. On every clock pulse the count is decremented by 1. When the count value reaches zero, the output of the counter, i.e. Top, goes high for a predetermined time. The counter has to be loaded with a new or the previous value of the count by the external program, or it can be loaded automatically every time the count reaches zero.
Timer in 8051 Microcontroller
Fig.14.1 shows the architecture of the 8051, which has two timer units.
The 8051 comes equipped with two timers, both of which may be controlled, set, read, and configured individually. The 8051 timers have three general functions: 1) keeping time and/or calculating the amount of time between events, 2) counting the events themselves, or 3) generating baud rates for the serial port.
As mentioned before, the 8051 has two timers which each function essentially the same way. One timer is TIMER0 and the other is TIMER1. The two timers share two Special Function Registers (SFRs), TMOD and TCON, which control the timers, and each timer also has two SFRs dedicated solely to itself (TH0/TL0 and TH1/TL1).
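As a concrete instance of function 3) above, the Timer 1 reload value for a given baud rate follows from the standard 8051 relation baud = Fosc / (12 x 32 x (256 - TH1)), assuming Timer 1 in mode 2 (8-bit auto-reload) and SMOD = 0. The helpers below are an illustrative host-side calculation, not part of any vendor library:

```c
#include <stdint.h>

/* Reload value TH1 for 8051 Timer 1 in mode 2 (8-bit auto-reload),
 * SMOD = 0:  baud = fosc / (12 * 32 * (256 - TH1)).                */
static uint8_t th1_reload(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t divisor = fosc_hz / (12UL * 32UL * baud);  /* = 256 - TH1 */
    return (uint8_t)(256UL - divisor);
}

/* The actual baud rate produced by a given reload value. */
static uint32_t actual_baud(uint32_t fosc_hz, uint8_t th1)
{
    return fosc_hz / (12UL * 32UL * (256UL - th1));
}
```

With the classic 11.0592 MHz crystal, 9600 baud gives a reload value of 0xFD (253), and the achieved rate is exact; this is precisely why that odd-looking crystal frequency is so common on 8051 boards.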
MODE 0
Either timer in Mode 0 is an 8-bit counter with a divide-by-32 prescaler. In this mode, the Timer register is configured as a 13-bit register. As the count rolls over from all 1s to all 0s, it sets the Timer interrupt flag TF1. The counted input is enabled to the Timer when TR1 = 1 and either GATE = 0 or INT1 = 1. (Setting GATE = 1 allows the Timer to be controlled by external input INT1, to facilitate pulse-width measurements.)
(Fig. 14.4 Timer/Counter Mode 0: 13-bit counter. The clock source, OSC/12, is gated into the counter under the control of GATE, the INT1 pin, and TR1.)
Timer Mode Register (TMOD):
(MSB) GATE C/T M1 M0 GATE C/T M1 M0 (LSB)
(the high nibble controls Timer 1, the low nibble Timer 0)
GATE: Gating control. When set, Timer/Counter x is enabled only while the INTx pin is high and the TRx control bit is set. When cleared, Timer x is enabled whenever the TRx control bit is set.
C/T: Timer or Counter selector. Cleared for Timer operation (input from the internal system clock); set for Counter operation (input from the Tx input pin).
M1 M0, Operating Mode:
0 0: 8-bit Timer/Counter THx with TLx as 5-bit prescaler.
0 1: 16-bit Timer/Counter; THx and TLx are cascaded; there is no prescaler.
1 0: 8-bit auto-reload Timer/Counter; THx holds a value which is reloaded into TLx each time it overflows.
1 1: Timer 0: TL0 and TH0 act as two separate counters; Timer 1: the timer is stopped.
Timer/Counter Control Register (TCON):
(MSB) TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0 (LSB)
TF1 (TCON.7): Timer 1 overflow flag. Set by hardware on Timer/Counter overflow. Cleared by hardware when the processor vectors to the interrupt routine.
TR1 (TCON.6): Timer 1 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TF0 (TCON.5): Timer 0 overflow flag. Set by hardware on Timer/Counter overflow. Cleared by hardware when the processor vectors to the interrupt routine.
TR0 (TCON.4): Timer 0 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
IE1 (TCON.3): Interrupt 1 edge flag. Set by hardware when an external interrupt edge is detected. Cleared when the interrupt is processed.
IT1 (TCON.2): Interrupt 1 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
IE0 (TCON.1): Interrupt 0 edge flag. Set by hardware when an external interrupt edge is detected. Cleared when the interrupt is processed.
IT0 (TCON.0): Interrupt 0 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
MODE 1: Mode 1 is the same as Mode 0, except that the Timer register is run with all 16 bits.
(Fig. 14.5 Mode 2 configures the Timer register as an 8-bit counter (TL1, 8 bits) with automatic reload from TH1. The clock source is OSC/12 when C/T = 0, or the T1 pin when C/T = 1; the count is gated by TR1, GATE, and the INT1 pin, and overflow sets TF1 to raise an interrupt.)
(Fig. 14.6 Mode 3: Timer 1 simply holds its count. Timer 0 in Mode 3 establishes TL0 and TH0 as two separate counters: TL0 (8 bits) is clocked by OSC/12 or the T0 pin under the control of TR0, GATE, and INT0, and sets TF0 to raise an interrupt; TH0 (8 bits) is clocked by OSC/12 under the control of TR1, and sets TF1 to raise an interrupt.)
The Programmable Interval Timer 8253
For processors where the timer unit is not internal, the programmable interval timer can be used. Fig.14.7 shows the signals of the 8253 programmable interval timer.
(Fig. 14.7 The pin configuration of the 8253 timer, a 24-pin device: a microprocessor interface consisting of the data lines D7-D0 and the control lines /RD, /WR, /CS, A0 and A1, plus the counter input/output signals CLK, GATE and OUT for each of counters 0, 1 and 2, along with Vcc and GND.)
Fig.14.8 shows the internal block diagram. There are three separate counter units controlled by a configuration register (Fig.14.9). Each counter has two inputs, clock and gate, and one output. The clock is the signal that drives the counting by decrementing a preloaded value in the respective counter register. The gate serves as an enable input: if the gate is held low, counting is disabled. The timing diagrams below explain in detail the various modes of operation of the timer.
e n
u d CLK0
st Counter GATE0
it
Data y
D D
c
.Buffer
Bus #0 OUT0
w
Bus
7 0
w
w
RD
CLK1
WR Read
Counter GATE1
Write
A1 #1
Control
Internal
OUT1
A0 Logic
CS
CLK2
Power supplies Control
Counter GATE2
Vcc
Word
Register #2
OUT2
GND
Fig. 14.8 The internal block diagram of 8253 Table The address map
CS A1 A0 Port
0 0 0 Counter 0
0 0 1 Counter 1
0 1 0 Counter 2
0 1 1 Control register
Fig. 14.9 The control word:
D7 D6 D5 D4 D3 D2 D1 D0
SC1 SC0 RL1 RL0 M2 M1 M0 BCD
BCD (D0): 0 = binary counter (16-bit); 1 = BCD counter (4 decades).
M2 M1 M0 (D3-D1): 0 0 0 = Mode 0; 0 0 1 = Mode 1; x 1 0 = Mode 2; x 1 1 = Mode 3; 1 0 0 = Mode 4; 1 0 1 = Mode 5.
SC1 SC0 (D7-D6): 0 0 = select counter 0; 0 1 = select counter 1; 1 0 = select counter 2; 1 1 = illegal.
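The control-word fields above can be packed into the single byte written to the control register. The following helper is a sketch (the parameter names follow the field names in the table; it is not a vendor API):

```c
#include <stdint.h>

/* Build an 8253 control word:
 *   sc   (D7-D6): counter select, 0..2
 *   rl   (D5-D4): read/load format
 *   mode (D3-D1): operating mode, 0..5
 *   bcd  (D0)   : 0 = 16-bit binary count, 1 = BCD (4 decades)     */
static uint8_t ctrl_word(uint8_t sc, uint8_t rl, uint8_t mode, uint8_t bcd)
{
    return (uint8_t)(((sc & 3u) << 6) | ((rl & 3u) << 4) |
                     ((mode & 7u) << 1) | (bcd & 1u));
}
```

For example, selecting counter 0 with rl = 3, mode 3 (square wave) and binary counting yields 0x36, a value that would then be written to the control register port.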
Mode 0: The output goes high after the terminal count is reached. The counter stops if the Gate is low (Fig.14.10(a) and (b)). The timer count register is loaded with a count (say 6) when the WR line is made low by the processor. The counter unit starts counting down with each clock pulse. The output goes high when the register value reaches zero. In the meantime, if the GATE is made low (Fig.14.10(b)), the count is suspended at its current value (3 here) till the GATE is enabled again.
(Fig. 14.10(a) Mode 0 count when Gate is high (enabled): after WR loads the count, the counter steps 6, 5, 4, 3, 2, 1 on successive clocks and OUT then goes high.)
(Fig. 14.10(b) Mode 0 count when Gate is low temporarily (disabled): the count holds at 3 while GATE is low, stepping 6, 5, 4, 3, 3, 3, 2, 1.)
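The gate behaviour of Mode 0 can be reproduced with a small simulation of the waveforms in Fig.14.10. This is a model for understanding, not 8253 driver code; the function name is illustrative:

```c
#include <stdint.h>

/* Simulate an 8253 counter in mode 0: loaded with 'count', clocked
 * 'nclk' times with a per-clock gate level in gate[].  Returns the
 * remaining count; *out is set to 1 once the count reaches zero.   */
static uint16_t mode0_run(uint16_t count, const int *gate, int nclk, int *out)
{
    *out = 0;
    for (int i = 0; i < nclk; i++) {
        if (!gate[i])              /* GATE low suspends the count     */
            continue;
        if (count > 0 && --count == 0)
            *out = 1;              /* OUT goes high at terminal count */
    }
    return count;
}
```

With a count of 6 and the gate held high, six clocks bring the count to zero and OUT high; if the gate drops for two clocks mid-way, as in Fig.14.10(b), the same six-clock window leaves the count suspended at 2 with OUT still low.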
Mode 1: The count-down is started by a trigger on the GATE. If the GATE goes low before the count-down is completed, then the counter will be suspended in that state as long as GATE is low (Fig.14.11(b)). Thus it works as a mono-shot.
(Fig. 14.11(a) Mode 1: after WR loads the count, the GATE trigger goes high and the output goes low for a period depending on the count, the counter stepping 5, 4, 3, 2, 1.)
(Fig. 14.11(b) Mode 1: the GATE pulse is disabled momentarily, causing the counter to stop; the counter steps 4, 3, 3, then 4, 3, 2, 1 after a fresh trigger.)
(Fig. 14.12(a) Mode 2 operation when the GATE is kept high: the count 3, 2, 1 reloads automatically and repeats, producing a periodic output.)
(Fig. 14.12(b) Mode 2 operation when the GATE is disabled momentarily: counting is suspended while GATE is low and resumes from a reloaded count, stepping 3, 2, then 3, 3, 2, 1.)
Mode 3 Programmable Square Wave Rate Generator
It is similar to Mode 2 but the output high and low periods are symmetrical. The output goes high after the count is loaded and remains high for a period which equals the count-down period of the counter register. The output subsequently goes low for an equal period, and hence generates a symmetrical square wave, unlike Mode 2. The GATE has no role here (Fig.14.13).
(Fig. 14.13 Mode 3 square-wave outputs after WR loads the count, shown for n = 4 and n = 5.)
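For an even count n, the Mode 3 output can be sketched as a waveform that is high for n/2 clocks and low for n/2 clocks, at a frequency of clk/n. The model below is a host-side sketch under that assumption (odd counts behave slightly asymmetrically on real hardware and are ignored here):

```c
/* Generate 'nclk' samples of an 8253 mode 3 square wave for an even
 * count n: OUT is high for n/2 clocks, low for n/2 clocks, repeating. */
static void mode3_wave(int n, int nclk, int *out)
{
    int phase = 0;                  /* position within one period       */
    for (int i = 0; i < nclk; i++) {
        out[i] = (phase < n / 2);   /* first half high, second half low */
        phase = (phase + 1) % n;
    }
}

/* Output frequency of mode 3 for input clock fclk and count n. */
static double mode3_freq(double fclk_hz, int n)
{
    return fclk_hz / n;
}
```

With n = 4 the output alternates two clocks high, two clocks low, matching the n = 4 trace in Fig.14.13; dividing a 1.19318 MHz input clock by the maximum count 65536 gives the familiar PC timer tick of roughly 18.2 Hz.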
Mode 4 Software Triggered Strobe
The output goes low for one clock period after the count-down is complete. The count-down can be suspended by making the GATE low (Fig.14.14(a) and (b)). This is called a software-triggered strobe because the count-down is initiated by a program.
(Fig. 14.14(a) Mode 4 software-triggered strobe with GATE high: after WR loads the count, the counter steps 4, 3, 2, 1 and OUT then pulses low for one clock period.)
(Fig. 14.14(b) Mode 4 Software Triggered Strobe when GATE is momentarily low: the count is suspended while GATE is low, stepping 4, 3, 3, 2, 1.)
Mode 5 Hardware Triggered Strobe
The count is loaded by the processor but the count-down is initiated by the GATE pulse. The low-to-high transition of the GATE pulse enables the count-down. The output goes low for one clock period after the count-down is complete (Fig.14.15).
(Fig. 14.15 Mode 5: after WR loads the count, the GATE trigger starts the count-down 5, 4, 3, 2, 1 and OUT pulses low at the end.)
Watchdog timer
A Watchdog Timer is a circuit that automatically invokes a reset unless the system being watched sends regular hold-off signals to the Watchdog.
Watchdog Circuit
To make sure that a particular program is executing properly, a Watchdog circuit is used. For instance, the program may reset a particular flip-flop periodically while the flip-flop is set by an external circuit. If the flip-flop is not reset for a long time, this can be detected by external hardware, indicating that the program is not executing properly; an exception or interrupt can then be generated.
A Watch Dog Timer (WDT) provides a unique clock, which is independent of any external clock. When the WDT is enabled, a counter starts at 00 and increments by 1 until it reaches FF. When it rolls over from FF to 00 (which is FF + 1), the processor is reset or an exception is generated. The only way to stop the WDT from resetting the processor or generating an exception or interrupt is to periodically reset the WDT back to 00 throughout the program. If the program gets stuck for some reason, the WDT will not be reset; the WDT will then reset or interrupt the processor. An interrupt service routine can be invoked to take into account the erroneous operation of the program (getting stuck or going into an infinite loop).
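The WDT behaviour described above can be modelled directly: an 8-bit counter that the running program must "kick" back to 00 before it rolls over from FF. This is a simulation sketch with illustrative names, not any particular microcontroller's WDT interface:

```c
#include <stdint.h>

typedef struct {
    uint8_t count;    /* free-running WDT counter            */
    int     expired;  /* set when the counter rolls over     */
} wdt;

/* One WDT clock tick: increment; rollover FF -> 00 fires a reset. */
static void wdt_tick(wdt *w)
{
    if (w->count == 0xFF) {
        w->count = 0;
        w->expired = 1;   /* processor reset / exception generated */
    } else {
        w->count++;
    }
}

/* The hold-off ("kick") the running program must issue periodically. */
static void wdt_kick(wdt *w)
{
    w->count = 0;
}
```

A program that kicks the watchdog at least once every 255 ticks never sees `expired` set; a program stuck in a loop that never kicks sees `expired` after 256 ticks, which is exactly the reset/interrupt described above.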
Conclusion
In this chapter you have learnt about the programmable timer/counter. For most embedded processors the timer is internal and exists along with the processor on the same chip. The 8051 microcontroller has two internal timers which can be programmed in various modes by the configuration and mode control registers. An external timer chip, namely the 8253, has also been discussed. It has 8 data lines, 2 address lines, 1 chip-select line, and one read and one write control line. The 16-bit counts of the corresponding registers can be loaded with two consecutive write operations. Counters and timers are used for triggering, trapping and managing various real-time events. The least count of the timer depends on the clock, and the stability of the clock decides the accuracy of the timings. Timers can be used to generate specific baud-rate clocks for asynchronous serial communications. They can be used to measure speed, frequency and analog voltages after voltage-to-frequency conversion. One important application of timers is to generate Pulse-Width-Modulated (PWM) waveforms: in the 8253 the GATE and the count together can be used to generate pulses with different widths. These modulated pulses are used in electronic power control to reduce harmonics and hence distortions.
You have also learnt about the Watchdog circuit and Watchdog timers. These are used to monitor the activity of a program and the processor.
Questions
Q1. Design a circuit using the 8253 to measure the speed of a motor by counting the number of pulses in a definite period.
Q2. Write pseudo code (or any assembly code) to generate a sinusoidal pulse-width-modulated waveform from the 8253 timer.
Q3. Design a scheme to read temperature from a thermistor circuit using a V/F converter and a Timer.
Q4. What are the differences in Mode 4 and Mode 5 operation of 8253 Timer?
Q5. Explain the circuit given in Fig.14.5.
Module
3
Embedded Systems I/O
Lesson
15
Interrupts
Instructional Objectives
After going through this lesson the student would learn
Interrupts
Interrupt Service Subroutines
Polling
Priority Resolving
Daisy Chain Interrupts
Interrupt Structure in 8051 Microcontroller
Programmable Interrupt Controller
Pre-Requisite
Digital Electronics, Microprocessors
15 Introduction
Real Time Embedded System design requires that I/O devices receive servicing in an efficient manner, so that large amounts of the total system tasks can be assumed by the processor with little or no effect on throughput. The most common method of servicing such devices is the polled approach, where the processor must test each device in sequence and, in effect, ask each one if it needs servicing. It is easy to see that a large portion of the main program is spent looping through this continuous polling cycle, and that such a method would have a serious, detrimental effect on system throughput, thus limiting the tasks that could be assumed by the microcomputer and reducing the cost effectiveness of using such devices. A more desirable method would be one that allows the microprocessor to execute its main program and stop to service peripheral devices only when told to do so by the device itself. In effect, the method would provide an external asynchronous input that informs the processor that it should complete whatever instruction is currently being executed and fetch a new routine that will service the requesting device. Once this servicing is complete, the processor would resume exactly where it left off. This can be effectively handled by interrupts.
An interrupt is a signal informing a program, or a device connected to the processor, that an event has occurred. When a processor receives an interrupt signal, it takes a specified action depending on the priority and importance of the entity generating the signal. Interrupt signals can cause a program to suspend itself temporarily to service the interrupt, by branching into another routine called an Interrupt Service Subroutine (ISS) for the specific device which has caused the interrupt.
Types of Interrupts
Interrupts can be broadly classified as
- Hardware Interrupts
These are interrupts caused by the connected devices.
- Software Interrupts
These are interrupts deliberately introduced by software instructions to generate user
defined exceptions
- Trap
These are interrupts used by the processor alone to detect any exception such as divide by
zero
Depending on the service, interrupts can also be classified as
- Fixed interrupt
The address of the ISR is built into the microprocessor and cannot be changed. Either the ISR is stored at that address, or a jump to the actual ISR is stored there if not enough bytes are available.
- Vectored interrupt
The peripheral must provide the address of the ISR. This is common when the microprocessor has multiple peripherals connected by a system bus.
- Compromise between fixed and vectored interrupts
One interrupt pin is used, with a table in memory holding the ISR addresses (maybe 256 words). The peripheral doesn't provide the ISR address but rather an index into the table. Fewer bits are sent by the peripheral, and the ISR location can be moved without changing the peripheral.
Maskable vs. Non-maskable interrupts
- Maskable: the programmer can set a bit that causes the processor to ignore the interrupt. This is important when the processor is executing time-critical code.
- Non-maskable: a separate interrupt pin that can't be masked. It is typically reserved for drastic situations, like a power failure requiring immediate backup of data to non-volatile memory.
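The "index into a table" compromise described above can be illustrated with a function-pointer table. This is a sketch of the idea, not any particular processor's vector layout; the ISR names and indices are hypothetical:

```c
#include <stdint.h>

/* The peripheral supplies a small index; the processor looks up the
 * ISR address in a table in memory, so ISRs can be moved freely
 * without reprogramming the peripheral.                             */
typedef void (*isr_t)(void);

static int uart_hits, timer_hits;          /* counters for demonstration */
static void uart_isr(void)  { uart_hits++;  }
static void timer_isr(void) { timer_hits++; }

static isr_t vector_table[256] = { 0 };    /* table of ISR addresses     */

/* Dispatch the interrupt whose table index the peripheral sent. */
static void dispatch(uint8_t index)
{
    if (vector_table[index])
        vector_table[index]();
}
```

After installing `uart_isr` at index 3, `dispatch(3)` invokes it; relocating the ISR only means rewriting one table entry, which is exactly why this scheme needs fewer bits from the peripheral than full vectoring.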
Example: Interrupt Driven Data Transfer (Fixed Interrupt)
Fig.15.1(a) shows the block diagram of a system where it is required to read data from an input port P1, modify it (according to some given algorithm) and send it to port P2. The input port generates data at a very slow pace. There are two ways to transfer the data:
(a) The processor waits till the input is ready with the data and performs a read operation from P1 followed by a write operation to P2. This is called Programmed Data Transfer.
(b) The other option, when the input/output device is slow, is for the device to interrupt the microprocessor through an Int pin, as shown in Fig.15.1, whenever it is ready. The processor, which may otherwise be busy executing another program (the main program here), calls an Interrupt Service Subroutine (ISR) after receiving the interrupt to accomplish the required data transfer. This is known as Interrupt Driven Data Transfer.
(Fig. 15.1(a) The Interrupt Driven Data Transfer. PC: Program Counter, P1: Port 1, P2: Port 2, µC: Microcontroller.)
The sequence of events (Fig. 15.2(a)) is as follows:
1. The µC is executing its main program while P1 receives input data in a register with address 0x8000.
2. P1 asserts Int to request servicing by the microprocessor.
3. After completing the instruction at address 100, the µC sees Int asserted, saves the PC's value of 100, and asserts Inta.
4. P1 detects Inta and puts the interrupt address vector 16 on the data bus.
5. The µC jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. After being read, P1 deasserts Int.
6. The ISR returns, thus restoring the PC to 100+1=101, where the µC resumes executing.
(Fig. 15.3 The 8051 Architecture: the CPU, oscillator, bus control, four I/O ports P0-P3, and a serial port with TXD/RXD, connected over the address/data bus.)
The 8051 has 5 interrupt sources: 2 external interrupts, 2 timer interrupts, and the serial port interrupt. These interrupts occur because of
1. timers overflowing
2. receiving a character via the serial port
3. transmitting a character via the serial port
4. two external events
Interrupt Enables
Each interrupt source can be individually enabled or disabled by setting or clearing a bit in a Special Function Register (SFR) named IE (Interrupt Enable). This register also contains a global disable bit, which can be cleared to disable all interrupts at once.
Interrupt Priorities
Each interrupt source can also be individually programmed to one of two priority levels by
setting or clearing a bit in the SFR named IP (Interrupt Priority). A low-priority interrupt can be
interrupted by a high-priority interrupt, but not by another low-priority interrupt. A high-priority
interrupt can't be interrupted by any other interrupt source. If two interrupt requests of different
priority levels are received simultaneously, the request of higher priority is serviced. If interrupt
requests of the same priority level are received simultaneously, an internal polling sequence
determines which request is serviced. Thus within each priority level there is a second priority
structure determined by the polling sequence. In operation, all the interrupt flags are latched into
the interrupt control system during State 5 of every machine cycle. The samples are polled
during the following machine cycle. If the flag for an enabled interrupt is found to be set (1), the
interrupt system generates a CALL to the appropriate location in Program Memory, unless some
other condition blocks the interrupt. Several conditions can block an interrupt, among them that
an interrupt of equal or higher priority level is already in progress. The hardware-generated
CALL causes the contents of the Program Counter to be pushed into the stack, and reloads the
PC with the beginning address of the service routine.
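The two-level scheme with a fixed polling order inside each level can be modelled in C as follows. This is a sketch of the selection logic only; the polling order assumed is IE0, TF0, IE1, TF1, serial, and the array-based interface is illustrative, not the 8051's actual hardware:

```c
/* 8051-style selection among pending interrupts.  Sources are in
 * polling order: 0 = IE0, 1 = TF0, 2 = IE1, 3 = TF1, 4 = serial.
 * high[i] is the IP priority bit (1 = high priority).
 * Returns the index of the source to service, or -1 if none pending. */
static int select_interrupt(const int pending[5], const int high[5])
{
    /* A high-priority request always beats a low-priority one;       */
    /* within a level, the polling order breaks ties.                 */
    for (int level = 1; level >= 0; level--)
        for (int i = 0; i < 5; i++)
            if (pending[i] && high[i] == level)
                return i;
    return -1;
}
```

For example, if TF1 (index 3) is pending at high priority and IE0 (index 0) at low priority, TF1 is serviced first even though IE0 comes earlier in the polling sequence; with all priorities equal, the polling order alone decides.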
(Figure: the 8051 interrupt control system. The sources INT0 (via IT0, flag IE0), TF0, INT1 (via IT1, flag IE1), TF1, and RI/TI pass through the IE register and the IP register into the high-priority and low-priority interrupt polling sequences.)
The service routine for each interrupt begins at a fixed location (fixed-address interrupts). Only the Program Counter (PC) is automatically pushed onto the stack, not the Processor Status Word (which includes the contents of the accumulator and flag register) or any other register. Having only the PC automatically saved allows the programmer to decide how much time should be spent saving other registers. This enhances the interrupt response time, albeit at the expense of increasing the programmer's burden of responsibility. As a result, many interrupt functions that are typical in control applications (toggling a port pin, for example, or reloading a timer, or unloading a serial buffer) can often be completed in less time than it takes other architectures to complete them.
Interrupt Number   Vector Address   Description
0                  0003h            EXTERNAL 0
1                  000Bh            TIMER/COUNTER 0
2                  0013h            EXTERNAL 1
3                  001Bh            TIMER/COUNTER 1
4                  0023h            SERIAL PORT
Priority Arbiter
(Fig. 15.5 The Priority Arbitration: the µC's Int and Inta lines connect to a priority arbiter on the system bus, which exchanges Ireq1/Iack1 with Peripheral 1 and Ireq2/Iack2 with Peripheral 2.)
Let us assume that the priority of the devices is Device 1 > Device 2.
1. The processor is executing its program.
2. Peripheral 1 needs servicing, so it asserts Ireq1; Peripheral 2 also needs servicing, so it asserts Ireq2.
3. The priority arbiter sees at least one Ireq input asserted, so it asserts Int.
4. The processor stops executing its program and stores its state.
5. The processor asserts Inta.
6. The priority arbiter passes the acknowledgement to the higher-priority Peripheral 1 by asserting Iack1.
7. Peripheral 1 puts the address of its ISR on the data bus.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns.
9. The flag is reset.
The processor now checks for the next device which has interrupted simultaneously.
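The arbiter's decision itself is simple combinational logic: fixed priority, lowest request number wins, matching Device 1 > Device 2 above. Sketched in C (an 8-input generalization; the function name is illustrative):

```c
#include <stdint.h>

/* Fixed-priority resolver: given a request mask (bit i = Ireq(i+1)),
 * return the number of the device to acknowledge, or 0 if no request
 * is pending.  Lower-numbered devices have higher priority.          */
static int arbitrate(uint8_t ireq)
{
    for (int i = 0; i < 8; i++)
        if (ireq & (1u << i))
            return i + 1;     /* assert Iack(i+1) for this device  */
    return 0;                 /* no request: Int stays deasserted  */
}
```

With both Ireq1 and Ireq2 asserted (mask 0x03), device 1 is acknowledged first; after it is serviced and its request cleared (mask 0x02), device 2 wins the next arbitration, reproducing the simultaneous-interrupt sequence above.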
(Figures: a daisy-chain arrangement in which the µC's Int/Inta lines pass through Peripheral 1 and Peripheral 2 via Req_out/Req_in and Ack_in/Ack_out; and a system in which a Programmable Interrupt Controller (PIC) sits between the CPU's INT input and I/O devices 1..N on the bus, alongside RAM.)
Each peripheral device or structure usually has a special program or routine that is associated with its specific functional or operational requirements; this is referred to as a service routine. The PIC, after issuing an interrupt to the CPU, must somehow input information into the CPU that can point (vector) the Program Counter to the service routine associated with the requesting device.
The PIC manages eight levels of requests and has built-in features for expandability to other PICs (up to 64 levels). It is programmed by system software as an I/O peripheral. The priority modes can be changed or reconfigured dynamically at any time during main program operation.
(Fig. 15.9 The functional block diagram of the Intel 8259: a data bus buffer (D7-D0); read/write logic (/RD, /WR, A0, /CS); control logic (INT, /INTA); the Interrupt Request Register (IRR) with inputs IR0-IR7; the Priority Resolver; the In-Service Register (ISR); the Interrupt Mask Register (IMR); and the cascade buffer/comparator (CAS0-CAS2, SP/EN), all connected by an internal bus.)
Table of Signals of the PIC
D[7..0]: These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A[0..0]: This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to decipher the various command words the microprocessor writes and the status the microprocessor wishes to read.
WR: When this write signal is asserted, the 8259 accepts the command on the data lines, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD: When this read signal is asserted, the 8259 provides its status on the data lines, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT: This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA: This signal is used to enable the 8259's interrupt-vector data onto the data bus by a sequence of interrupt-acknowledge pulses issued by the microprocessor.
IR 0,1,2,3,4,5,6,7: An interrupt request is asserted by a peripheral device on one of these signals.
CAS[2..0]: These are cascade signals that enable multiple 8259 chips to be chained together.
SP/EN: This function is used in conjunction with the CAS signals for cascading purposes.
Fig.15.10 shows the daisy chain connection of a number of PICs. The extreme right PIC
interrupts the processor. In this figure the processor can entertain up to 24 different interrupt
requests. The SP/EN signal has been connected to Vcc for the master and grounded for the
slaves.
(Fig. 15.10 residue: three 82C59A devices — one master and two slaves, A and B — share the CS, A0 and D7-D0 lines. The INT output of each slave drives an IR input of the master, whose INT and INTA connect to the processor; CAS0-CAS2 are bused between all three devices, and SP/EN is tied to VCC on the master and to GND on the slaves. The 24 IR inputs along the bottom receive the interrupt requests.)
Fig. 15.10 Nested Connection of Interrupts
Software Interrupts
These are initiated by the program through specific instructions. On encountering such an instruction the CPU executes an interrupt service subroutine.
Conclusion
In this chapter you have learnt about interrupts and the Programmable Interrupt Controller. Different methods of interrupt service, such as priority arbitration and daisy-chain arbitration, have been discussed. In real-time systems interrupts are used for specific cases, and the execution times of these interrupt service subroutines are almost fixed. Too many interrupts are discouraged in real-time systems, as they may severely disrupt the services. Please look at problem no. 1 in the exercise.
Most embedded processors are equipped with an interrupt structure, so there is rarely a need to use a separate PIC. Some entry-level microcontrollers do not have an inbuilt exception handler called a trap. A trap is an interrupt used to handle extreme processor conditions such as divide-by-zero, overflow, etc.
Questions and Answers
Q1. A computer system has three devices whose characteristics are summarized in the following
table:
Service time indicates how long it takes to run the interrupt handler for each device. The maximum time allowed to elapse between an interrupt request and the start of the interrupt handler is indicated by allowable latency. If a program P takes 100 seconds to execute when interrupts are disabled, how long will P take to run when interrupts are enabled?
Ans:
The CPU time taken to service the interrupts must first be found. Consider Device 1: its interrupt occurs once every 800 µs, i.e. 1250 times a second. Take a time quantum of 1 unit.

Device 1 takes (150+50)/800 = 1/4 unit
Device 2 takes (50+50)/1000 = 1/10 unit
Device 3 takes (100+100)/800 = 1/4 unit

In one unit of real time the CPU time taken by all these devices is (1/4 + 1/10 + 1/4) = 0.6 units.

The remaining 0.4 units of CPU time can be used by the program P. For 100 seconds of CPU time the real time required will be 100/0.4 = 250 seconds.
Q.2 What is TRAP?
Ans:
The term trap denotes a programmer initiated and expected transfer of control to a special
handler routine. In many respects, a trap is nothing more than a specialized subroutine call.
Many texts refer to traps as software interrupts. Traps are usually unconditional; that is, when
you execute an Interrupt instruction, control always transfers to the procedure associated with
the trap. Since traps execute via an explicit instruction, it is easy to determine exactly which
instructions in a program will invoke a trap handling routine.
Ans:
For vectored interrupts the processor expects the address from the external device. Once it receives the interrupt it starts an interrupt acknowledge cycle as shown in the figure. In the figure, TN is the last clock state of the previous instruction, immediately after which the processor checks the status of the Intr pin, which has already been driven high by the external device. The processor then starts an INTA cycle in which it brings in the interrupt vector through the data lines. If the data lines are 8 bits wide and the required address is 16 bits, there will be two I/O reads. If the interrupt vector is a number that indexes a look-up table, then only 8 bits are required and hence there will be a single I/O read.
(Timing diagram: clock states TN, T1, T2, T3. INTREQ is already high at TN, the last clock state of the previous instruction's machine cycle; the processor then runs an interrupt acknowledge machine cycle, asserting INTACK while the address code of the interrupt vector is placed on the Data lines.)
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
o m
o t.c
s p
o g
. bl
u p
r o
s g
ent
u d
st
it y
.c
w
w
w
Lesson
16
DMA
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn
Pre-Requisite
Digital Electronics, Microprocessors
16(I) Introduction
Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. During such a transfer the processor would be unavailable for any other task involving processor-bus access, though it can continue with any work that does not require bus access. DMA transfers are essential for high-performance embedded systems where large chunks of data need to be transferred between the input/output devices and the primary memory.
16(II) DMA Controller

A DMA controller is a device, usually peripheral to a CPU, that is programmed to perform a sequence of data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request (DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a DMA request, it responds by performing one or more data transfers from that I/O device into system memory or vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The number of transfers performed, the transfer modes used, and the memory locations accessed depend on how the DMA channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and has both bus-master and bus-slave capability. Fig.16.1 shows the DMA controller architecture and how the DMA controller interacts with the CPU. In bus-master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the DMA transfers. Because the CPU releases the system bus for the duration of the transfer, the process is sometimes referred to as cycle stealing.

In bus-slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers. The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.
(Fig. 16.1 residue: for each DMA channel X the controller holds base/current address and base/current count registers with a terminal count (TC) output; a mask register enables or disables each channel and a status register reports channel state to the CPU; DMA arbitration logic exchanges bus request/grant signals with the CPU, while DRQX and DACKX handshake with the I/O device on the PC bus.)
direction of the transfer. In other words, a flyby DMA transfer looks like a memory read or write cycle with the DMA controller supplying the address and the I/O device reading or writing the data. Because flyby DMA transfers involve a single memory cycle per data transfer, these transfers are very efficient. Fig.16.2 shows the flyby DMA transfer signal protocol.
(Signals in Fig. 16.2: the I/O device holds DMA Request high for additional transfers; the DMA controller drives DMA Acknowledge*, I/O Read*, Memory Write* and the memory address, while the data passes directly from the I/O device to memory.)
Fig. 16.2 Flyby DMA transfer
Unlike the flyby operation, the fetch-and-deposit type of DMA transfer, in which the DMA controller first reads the data into an internal register and then writes it out in a second cycle, is suitable for both memory-to-memory and I/O transfers.

(Signals in Fig. 16.3: the I/O device raises DMA Request; the DMA controller issues I/O Read* and Memory Write* in separate cycles.)

Fig. 16.3 Fetch-and-Deposit DMA Transfer
Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers once it has gained the bus. For block mode transfers, the DMA controller performs the entire DMA sequence, as specified by the transfer count register, at the fastest possible rate in response to a single DMA request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device deasserts this DMA request, transfers are held off.

DMA Controller Operation
For each channel, the DMA controller saves the programmed address and count in the base registers and maintains copies of the information in the current address and current count registers, as shown in Fig.16.1. Each DMA channel is enabled and disabled via a DMA mask
register. When DMA is started by writing to the base registers and enabling the DMA channel,
the current registers are loaded from the base registers. With each DMA transfer, the value in the
current address register is driven onto the address bus, and the current address register is
automatically incremented or decremented. The current count register determines the number of
transfers remaining and is automatically decremented after each transfer. When the value in the
current count register goes from 0 to -1, a terminal count (TC) signal is generated, which
signifies the completion of the DMA transfer sequence. This termination event is referred to as
reaching terminal count. DMA controllers often generate a hardware TC pulse during the last
cycle of a DMA transfer sequence. This signal can be monitored by the I/O devices participating
in the DMA transfers. DMA controllers require reprogramming when a DMA channel reaches
TC. Thus, DMA controllers require some CPU time, but far less than is required for the CPU to
service device I/O interrupts. When a DMA channel reaches TC, the processor may need to
reprogram the controller for additional DMA transfers. Some DMA controllers interrupt the
processor whenever a channel terminates. DMA controllers also have mechanisms for
automatically reprogramming a DMA channel when the DMA transfer sequence completes.
These mechanisms include auto initialization and buffer chaining. The auto initialization feature
repeats the DMA transfer sequence by reloading the DMA channel's current registers from the
base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is
useful for transferring blocks of data into noncontiguous buffer areas or for handling double-
buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is programmed
with the next address and count parameters while DMA transfers are being performed on the
current buffer. Some DMA controllers minimize CPU intervention further by having a chain
address register that points to a chain control table in memory. The DMA controller then loads
its own channel parameters from memory. Generally, the more sophisticated the DMA
controller, the less servicing the CPU has to perform.
A DMA controller has one or more status registers that are read by the CPU to determine
the state of each DMA channel. The status register typically indicates whether a DMA request is
asserted on a channel and whether a channel has reached TC. Reading the status register often
clears the terminal count information in the register, which leads to problems when multiple
programs are trying to use different DMA channels.
Steps in a Typical DMA cycle

1. The device wishing to perform DMA asserts the processor's bus request signal.
2. The processor completes the current bus cycle and then asserts the bus grant signal to the device.
3. The device then asserts the bus grant acknowledge signal.
4. The processor senses the change in the state of the bus grant acknowledge signal and starts listening to the data and address bus for DMA activity.
5. The DMA device performs the transfer from the source to the destination address.
6. During these transfers, the processor monitors the addresses on the bus and checks if any location modified during the DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions:
   o the processor invalidates the internal cache entry for the address involved in the DMA write operation, or
   o the processor updates the internal cache when a DMA write is detected.
7. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal.
8. The processor acknowledges the bus release and resumes its bus cycles from the point it left off.
(Fig. 16.4 residue: only a few pin assignments survive, e.g. DREQ0 on pin 19, VSS (GND) on pin 20, DB7 on pin 21 and DB6 on pin 22.)
Fig. 16.4 The DMA pin-out
(82C37A internal block diagram residue: a timing and control block driven by CLK, RESET and READY generates AEN, ADSTB, MEMR, MEMW, IOR and IOW; priority encoder and rotating priority logic serve DREQ0-DREQ3, DACK0-DACK3 and the HRQ/HLDA handshake; each channel has 16-bit base and current address registers and base and current word count registers; command, mode, request, mask, status and temporary registers sit on the internal data bus, buffered onto DB0-DB7 and A0-A15.)
HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has relinquished control of the system busses.
DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel. DACK will acknowledge the recognition of a DREQ signal. The polarity of DREQ is programmable. RESET initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active. DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low (inactive) and the corresponding mask bit set.
DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is programming the 82C37A control registers. During DMA cycles, the most significant 8 bits of the address are output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data from the memory enters the 82C37A on the data bus during the read-from-memory transfer; then, during the write-to-memory transfer, the data bus outputs write the data into the new memory location.
IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the 82C37A to access data from the peripheral during a DMA Write transfer.
IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an
input control signal used by the CPU to load information into the 82C37A. In the Active cycle, it
is an output control signal used by the 82C37A to load data to the peripheral during a DMA Read
transfer.
EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal.
Information concerning the completion of DMA services is available at the bidirectional EOP
pin. The 82C37A allows an external signal to terminate an active DMA service by pulling the
EOP pin low. A pulse is generated by the 82C37A when terminal count (TC) for any channel is
reached, except for channel 0 in memory-to-memory mode. During memory-to-memory
transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by an
open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP
pulse occurs, whether internally or externally generated, the 82C37A will terminate the service,
and if auto-initialize is enabled, the base registers will be written to the current registers of that
channel. The mask bit and TC bit in the status word will be set for the currently active channel
by EOP unless the channel is programmed for autoinitialize. In that case, the mask bit remains
clear.
A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals.
In the Idle cycle, they are inputs and are used by the 82C37A to address the control register to be
loaded or read. In the Active cycle, they are outputs and provide the lower 4-bits of the output
address.
A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide
4-bits of address. These lines are enabled only during the DMA service.
HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues HRQ. The HLDA signal then informs the controller when access to the system busses is permitted. For stand-alone operation where the 82C37A always controls the busses, HRQ may be tied to HLDA. This will result in one S0 state before the transfer.
DACK0-DACK3: DMA ACKNOWLEDGE: DMA Acknowledge is used to notify the individual peripherals when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to active low.
AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is active high.
ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It will directly drive the strobe input of external transparent octal latches, such as the 82C82. During block operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock.
MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data from the selected memory location during a DMA Read or a memory-to-memory transfer.
MEMW: MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to the selected memory location during a DMA Write or a memory-to-memory transfer.
NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.
Functional Description
The 82C37A direct memory access controller is designed to improve the data transfer rate in
systems which must transfer data from an I/O device to memory, or move a block of memory to
an I/O device. It will also perform memory-to-memory block moves, or fill a block of memory
with data from a single location. Operating modes are provided to handle single byte transfers as
well as discontinuous data streams, which allows the 82C37A to control data movement with
software transparency. The DMA controller is a state-driven address and control signal
generator, which permits data to be transferred directly from an I/O device to memory or vice
versa without ever being stored in a temporary register. This can greatly increase the data
transfer rate for sequential operations, compared with processor move or repeated string
instructions. Memory-to-memory operations require temporary internal storage of the data byte
between generation of the source and destination addresses, so memory-to-memory transfers take
place at less than half the rate of I/O operations, but still much faster than with central processor
techniques. The block diagram of the 82C37A is shown in Fig.16.6. The timing and control
block, priority block, and internal registers are the main components. The timing and control
block derives internal timing from clock input, and generates external control signals. The
Priority Encoder block resolves priority contention between DMA channels requesting service
simultaneously.
DMA Operation

In a system, the 82C37A address and control outputs and data bus pins are basically connected in parallel with the system busses. An external latch is required for the upper address byte. While inactive, the controller's outputs are in a high-impedance state. When activated by a DMA request and bus control is relinquished by the host, the 82C37A drives the busses and generates the control signals to perform the data transfer. The operation performed by activating one of the four DMA request inputs has previously been programmed into the controller via the Command, Mode, Address, and Word Count registers. For example, if a block of data is to be transferred from RAM to an I/O device, the starting address of the data is loaded into the 82C37A Current and Base Address registers for a particular channel, and the length of the block is loaded into the channel's Word Count register. The corresponding Mode register is programmed for a memory-to-I/O operation (read transfer), and various options are selected by the Command register and the other Mode register bits. The channel's mask bit is cleared to enable recognition of a DMA request (DREQ). The DREQ can either be a hardware signal or a software command. Once initiated, the block DMA transfer will proceed as the controller outputs the data address, simultaneous MEMR and IOW pulses, and selects an I/O device via the DMA acknowledge (DACK) outputs. The data byte flows directly from the RAM to the I/O device. After each byte is transferred, the address is automatically incremented (or decremented) and the word count is decremented.
To further understand 82C37A operation, the states generated by each clock cycle must
be considered. The DMA controller operates in two major cycles, active and idle. After being
programmed, the controller is normally idle until a DMA request occurs on an unmasked
channel, or a software request is given. The 82C37A will then request control of the system
busses and enter the active cycle. The active cycle is composed of several internal states,
depending on what options have been selected and what type of operation has been requested.
The 82C37A can assume seven separate states, each composed of one full clock period. State I
(SI) is the idle state. It is entered when the 82C37A has no valid DMA requests pending, at the
end of a transfer sequence, or when a Reset or Master Clear has occurred. While in SI, the DMA
controller is inactive but may be in the Program Condition (being programmed by the processor).
State 0 (S0) is the first state of a DMA service. The 82C37A has requested a hold but the
processor has not yet returned an acknowledge. The 82C37A may still be programmed until it
has received HLDA from the CPU. An acknowledge from the CPU will signal the DMA transfer
may begin. S1, S2, S3, and S4 are the working states of the DMA service. If more time is needed
to complete a transfer than is available with normal timing, wait states (SW) can be inserted
between S3 and S4 in normal transfers by the use of the Ready line on the 82C37A. For
compressed transfers, wait states can be inserted between S2 and S4. Note that the data is
transferred directly from the I/O device to memory (or vice versa) with IOR and MEMW (or
MEMR and IOW) being active at the same time. The data is not read into or driven out of the
82C37A in I/O-to-memory or memory-to-I/O DMA transfers. Memory-to-memory transfers
require a read-from and a write-to memory to complete each transfer. The states, which resemble
the normal working states, use two-digit numbers for identification. Eight states are required for
a single transfer. The first four states (S11, S12, S13, S14) are used for the read-from-memory
half and the last four states (S21, S22, S23, S24) for the write-to-memory half of the transfer.
16(IV) Conclusion

This lesson has given an overview of the DMA controller. Such controllers are normally used in high-performance embedded systems where large bulks of data need to be transferred from the input to the memory. One such system is the on-board Digital Signal Processor in a mobile telephone. Besides fast digital coding and decoding, at times this processor is required to process the voice signals to improve their quality. This has to take place in real time. While the voice message is streaming in through the A/D converter it needs to be transferred and windowed for filtering. DMA offers great help here. For simpler systems DMA is not normally used.

The signals and functional architecture of a very familiar DMA controller (the 8237) used in personal computers have been discussed. For more detailed discussions the readers are requested to visit www.intel.com or any other manufacturer's site and read the datasheet.
16(V) Questions and Answers

Q.1. Can you use the 82C37A in embedded systems? Justify your answer.
Ans: Only in high-performance systems where the power supply constraints are not stringent. The supply voltage is 5V and the current may reach up to 16 mA, resulting in 80 mW of power consumption.

Q.2 Highlight the different modes of DMA data transfer. Which mode consumes the least power and which mode is the fastest?

Q.3. Draw the architecture of the 8237 and explain the various parts.
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
Lesson
17
USB and IrDA
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to learn basics of
The Universal Serial Bus Signals
The IrDA standard
Pre-Requisite
Digital Electronics, Microprocessors
buses are not enough to carry the data as fast as is desired. So a group of leading computer and telecom firms including IBM, Intel, Microsoft, Compaq, Digital Equipment, NEC and Northern Telecom got together and developed USB.

The USB is a medium-speed serial data bus designed to carry relatively large amounts of data over relatively short cables: up to about five meters long. It can support data rates of up to 12Mb/s (megabits per second). The USB is an addressable bus system with a seven-bit address code, so it can support up to 127 different devices or nodes at once (the all-zeroes code is not a valid address). However it can have only one host. The host with its peripherals connected via the USB forms a star network. On the other hand any device connected to the USB can have a number of other nodes connected to it in daisy-chain fashion, so it can also form the hub for a mini-star sub-network. Similarly you can have a device which purely functions as a hub for other node devices, with no separate function of its own. This expansion via hubs is possible because the USB supports a tiered star topology, as shown in Fig.17.1. Each USB hub acts as a kind of traffic cop for its part of the network, routing data from the host to its correct address and preventing bus contention clashes between devices trying to send data at the same time. On a USB hub device, the single port used to connect to the host PC, either directly or via another hub, is known as the upstream port, while the ports used for connecting other devices to the USB are known as the downstream ports. This is illustrated in Fig.17.2. USB hubs work transparently as far as the host PC and its operating system are concerned. Most hubs provide either four or seven downstream ports, or fewer if they already include a USB device of their own. Another important feature of the USB is that it is designed to allow hot swapping, i.e. devices can be plugged into and unplugged from the bus without having to turn the power off and on again, re-boot the PC or even manually start a driver program. A new device can simply be connected to the USB, and the PC's operating system should recognize it and automatically set up the necessary driver to service it.
(Figure: a USB Host (PC) at the top of a tiered star, with USB hubs fanning out to peripheral devices.)

Fig. 17.1 The USB is a medium speed serial bus used to transfer data between a PC and its peripherals. It uses a tiered star configuration, with expansion via hubs (either separate, or in USB devices).
(Figure: a PC connects to the upstream port of a USB mini hub; Ports 1 to 4 are the downstream ports leading to more devices.)
Fig. 17.2 The port on a USB device or hub which connects to the PC host (either directly or
via another hub) is known as the upstream port, while hub ports which connect to
additional USB devices are downstream ports. Downstream ports use Type A sockets, while
upstream ports use Type B sockets.
A USB device can draw up to 500mA at 5V from its port, so if it requires less than this figure for operation it can be bus powered. If it needs more, it has to use its own power supply such as a plug-pack adaptor. Hubs should be able to supply up to 500mA at 5V from each downstream port, if they are not bus powered. Serial data is sent along the USB in differential or push-pull mode, with opposite polarities on the two signal lines. This improves the signal-to-noise ratio (SNR) by doubling the effective signal amplitude and also allowing the cancellation of any common-mode noise induced into the cable. The data is sent in non-return-to-zero (NRZ) format, with signal levels of 3.3V peak (i.e., 6V peak differential). USB cables use two different types of connectors: Type A plugs for the upstream end, and Type B plugs for the downstream end. Hence the USB ports of PCs are provided with matching Type A sockets, as are the downstream ports of hubs, while the upstream ports of USB devices (including hubs) have Type B sockets. Type A plugs and sockets are flat in shape and have the four connections in line, while Type B plugs and sockets are much squarer in shape and have two connections on either side of the centre spigot (Fig. 17.3). Both types of connector are polarized so they cannot be inserted the wrong way around. Fig. 17.3 shows the pin connections for both types of connector, with sockets shown as viewed from the front. Note that although USB cables having a Type A plug at each end are available, they should never be used to connect two PCs together via their USB ports. This is because a USB network can only have one host, and both would try to claim that role. In any case, the cable would also short their 5V power rails together, which could cause a damaging current to flow. USB is not designed for direct data transfer between PCs. All normal USB connections should be made using cables with a Type A plug at one end and a Type B plug at the other, although extension cables with a Type A plug at one end and a Type A socket at the other can also be used, provided the total extended length of a cable doesn't exceed 5m. By the way, USB cables are usually easy to identify, as the plugs have a distinctive symbol molded into them (Fig. 17.4).
Data formats (Fig. 17.5)
USB data transfer is essentially in the form of packets of data, sent back and forth between the host and peripheral devices. However, because USB is designed to handle many different types of data, it can use four different data formats as appropriate. One of the two main formats is bulk asynchronous mode, which is used for transferring data that is not time critical. The packets can be interleaved on the USB with others being sent to or from other devices. The other main format is isochronous mode, used to transfer data that is time critical, such as audio data to digital speakers, or to/from a modem. These packets must not be delayed by those from other devices. The two other data formats are interrupt format, used by devices to request servicing from the PC/host, and control format, used by the PC/host to send token packets to control bus operation, and by all devices to send handshake packets to indicate whether the data they have just received was OK (ACK) or had errors (NAK). Some of the data formats are illustrated in Fig. 17.5. Note that all data packets begin with a sync byte (01 hex), used to synchronize the PLL (phase-locked loop) in the receiving device's USB controller. This is followed by the packet identifier (PID), containing a four-bit nibble (sent in both normal and inverted form) which indicates the type of data and the direction it is going in (i.e., to or from the host). Token packets then have the 7-bit address of the destination device and a 4-bit endpoint field to indicate which of that device's registers it is to be sent to. On the other hand, data packets have a data field of up to 1023 bytes of data following the PID field, while Start of Frame (SOF) packets have an 11-bit frame identifier instead, and handshake packets have no other field. Most packets end with a cyclic redundancy check (CRC) field of either five or 16 bits for error checking, except handshake packets, which rely on the redundancy in the PID field. All USB data is sent serially, of course, and least-significant-bit (LSB) first. Luckily, all of the fine details of USB handshaking and data transfer are looked after by the driver software in the host and the firmware built into the USB controller inside each USB peripheral device and hub.
Pin connections (both connector types):

Pin No.  Signal
1        +5V Power
2        - Data
3        + Data
4        Ground

Fig. 17.3 Pin connections for the two different types of USB socket, as viewed from the front.
Fig. 17.4 Most USB plugs have this distinctive marking symbol.
SYNC (00000001) followed by the PID (xxxx,xxxx) is the complete structure of a handshake packet.

Packet identifier nibble codes:
OUTPUT = 0001 (token)
INPUT  = 1001 (token)
SET UP = 1101 (token)
DATA0  = 0011 (data)
DATA1  = 1011 (data)
ACK    = 0010 (handshake)
NAK    = 1010 (handshake)
STALL  = 1110 (handshake)

Fig. 17.5 Examples of the various kinds of USB signaling and data packets.
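Since every PID nibble travels together with its bitwise complement in one byte, a receiver can sanity-check a PID before any CRC is examined. A small illustrative sketch (plain Python, not any real USB stack) using the nibble codes listed above:

```python
PID_NIBBLES = {            # four-bit PID codes from Fig. 17.5
    "OUTPUT": 0b0001, "INPUT": 0b1001, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}

def pid_byte(nibble):
    """A PID is sent as the nibble plus its bitwise complement in the upper half."""
    return (nibble & 0xF) | ((~nibble & 0xF) << 4)

def pid_is_valid(byte):
    """The upper nibble must be the exact complement of the lower one."""
    return ((byte >> 4) ^ (byte & 0xF)) == 0xF
```

For example, pid_byte(PID_NIBBLES["ACK"]) gives 0xD2 and pid_byte(PID_NIBBLES["INPUT"]) gives 0x69; flipping any single bit of such a byte makes pid_is_valid() fail, which is how corrupted PIDs are rejected without a CRC.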
17(II) IrDA Standard
IrDA is the abbreviation for the Infrared Data Association, a nonprofit organization for setting standards in IR serial computer connections.
Transmission in an IrDA-compatible mode (sometimes called SIR, for serial IR) uses, in the simplest case, the RS232 port, a built-in standard of all compatible PCs. With a simple interface, shortening the bit length to a maximum of 3/16 of its original length for power-saving requirements, an infrared emitting diode is driven to transmit an optical signal to the receiver. This type of transmission covers the data range up to 115.2 kbit/s, which is the maximum data rate supported by standard UARTs (Fig. 17.7). The minimum demanded transmission speed for IrDA is only 9600 bit/s. All transmissions must be started at this rate to enable compatibility. Higher speeds are a matter of negotiation between the ports after the link is established.
Fig. 17.7 One end of the overall serial link: a pulse shaping stage drives the IR transmitter.

Please browse www.irda.org for details
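The 3/16 rule above translates directly into pulse widths. A short sketch (plain Python, purely illustrative) of the maximum IR pulse length at a given UART baud rate:

```python
def sir_pulse_width_us(baud):
    """IrDA SIR shortens each transmitted bit to at most 3/16 of the bit time."""
    bit_time_us = 1e6 / baud      # one UART bit period, in microseconds
    return bit_time_us * 3 / 16

# At the mandatory start-up rate of 9600 bit/s the pulse is ~19.5 us;
# at the 115.2 kbit/s UART maximum it shrinks to ~1.63 us.
```

This is why the transmitter in Fig. 17.7 needs a pulse shaping stage: the UART produces full-width bits, and the shaper narrows them before they reach the IR diode.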
Serial Port Infrared Receiver: an IR receiver module (TSOP1838) feeds the PC serial port through a MAX232 level converter, powered via a 78L05 regulator.

The 7805 is a voltage regulator which supplies 5V to the MAX232 level converter. The MAX232 converts the signal, which swings between 5V and ground, to the ±12V levels compatible with the RS232 standard.
(Internal block diagram of the IR receiver module: a PIN photodiode feeds an input circuit, AGC, band pass filter and demodulator, with a control circuit; pins: 1 OUT, 2 GND, 3 VS.)
The architecture of a typical microcontroller from Atmel with an on-chip USB controller
Q.2 Draw the circuit diagram for interfacing an IrDA receiver with a typical microcontroller
Ans:
(A typical application circuit: the TSOP18.. module's pin 3 goes to +5V through a 330 ohm resistor, pin 2 to GND, and pin 1 (OUT) to the microcontroller input; a pull-up resistor greater than 10K is recommended, along with a decoupling capacitor C.)
A typical application circuit The Receiver Interface to a Microcontroller
Further Reference

1. www.usb.org
2. www.irda.org
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
Lesson
18
AD and DA Converters
Instructional Objectives
After going through this lesson the student would be able to
Pre-Requisite

Digital Electronics, Microprocessors

18 Introduction
The real time embedded controller is expected to process real world signals within a specified time. Most real world signals are analog in nature. Take the example of your mobile phone. The overall architecture is shown in Fig. 18.1. The Digital Signal Processor (DSP) is fed with the analog data from the microphone. It also receives the digital signals after demodulation from the RF receiver, and generates the filtered and noise-free analog signal through the speaker. All the processing is done in real time. The processing of signals in real time is termed Real Time Signal Processing, a term which has been coined beautifully in the signal processing industry.
Fig. 18.1 The architecture of a mobile phone: the antenna feeds the RF receiver (Rx) and RF transmitter (Tx), which exchange data with the DSP; the DSP connects to the microphone and speaker, while a microcontroller handles the display and keyboard. The microphone/speaker and RF sides are analog processing; the DSP performs the digital processing.

Fig. 18.2 The DA conversion chain in the analog processor: the b-bit words yb(n) pass through a decoder (DAC) and a sample/hold stage, followed by a low-pass filter (LPF), to produce the analog signal y(n).
The DA Converter
In theory, the simplest method for digital-to-analog conversion is to pull the samples from
memory and convert them into an impulse train.
Fig. 18.4(a) The impulse train: the analog equivalent of the digital words
Fig. 18.4(b) The analog voltage after the zeroth-order hold
Fig. 18.4(c) The reconstructed analog signal after filtering
A digital word (8-bit or 16-bit) can be converted to its analog equivalent by weighted averaging. Fig. 18.5(a) shows the weighted averaging method for a 3-bit converter. A switch connects an input either to a common voltage V or to a common ground. Only the switches currently connected to the voltage source contribute current to the summing node at the amplifier's inverting input. The output voltage is given by the expression drawn below the circuit diagram; SX = 1 if switch X connects to V, SX = 0 if it connects to ground. There are eight possible combinations of connections for the three switches, and these are indicated in the columns of the table to the right of the diagram. Each combination is associated with a decimal integer as shown. The inputs are weighted in a 4:2:1 relationship, so that the sequence of values for 4S3 + 2S2 + S1 forms a binary number representation (decimal values 0 to 7). The magnitude of Vo varies in units (steps) of (Rf/4R)V from 0 to 7. This circuit provides a simplified Digital to Analog Converter (DAC). The digital input controls the switches, and the amplifier provides the analog output.
V0 = -Rf (S3·V/R + S2·V/(2R) + S1·V/(4R)) = -(Rf/4R)·V·(4S3 + 2S2 + S1)

S3 S2 S1 | value
0  0  0  | 0
0  0  1  | 1
0  1  0  | 2
0  1  1  | 3
1  0  0  | 4
1  0  1  | 5
1  1  0  | 6
1  1  1  | 7

Fig. 18.5(a) The binary weighted resistor method
I(S3) = (V/3R)·(1/2)·S3,  I(S2) = (V/3R)·(1/4)·S2,  I(S1) = (V/3R)·(1/8)·S1

V0 = -Rf (S3·(V/3R)·(1/2) + S2·(V/3R)·(1/4) + S1·(V/3R)·(1/8)) = -(Rf/24R)·V·(4S3 + 2S2 + S1)

Fig. 18.5(b) R-2R ladder D-A conversion circuit
Fig. 18.5(b) depicts the R-2R ladder network. The disadvantage of the binary weighted resistor method is the availability and manufacture of exact values of the resistances. Here also the output is proportional to the binary number. The outputs of the circuits in Fig. 18.5(a) and 18.5(b) are equivalent analog values, as shown in Fig. 18.4(a). However, to reconstruct the original signal this output is further passed through a zero order hold (ZOH) circuit followed by a filter (Fig. 18.2). The reconstructed waveforms are shown in Fig. 18.4(b) and 18.4(c).
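Both DAC expressions reduce to an output proportional to the binary value 4S3 + 2S2 + S1; only the scale factor differs. A quick numeric check (plain Python; the component values default to 1 purely for illustration):

```python
def vo_binary_weighted(s3, s2, s1, v=1.0, r=1.0, rf=1.0):
    """Binary weighted resistor DAC of Fig. 18.5(a): Vo = -(Rf/4R) V (4S3+2S2+S1)."""
    return -rf * (s3 * v / r + s2 * v / (2 * r) + s1 * v / (4 * r))

def vo_r2r(s3, s2, s1, v=1.0, r=1.0, rf=1.0):
    """R-2R ladder DAC of Fig. 18.5(b): Vo = -(Rf/24R) V (4S3+2S2+S1)."""
    return -rf * v / (3 * r) * (s3 / 2 + s2 / 4 + s1 / 8)

# All eight switch combinations of the 3-bit input:
codes = [(s3, s2, s1) for s3 in (0, 1) for s2 in (0, 1) for s1 in (0, 1)]
```

Stepping through `codes`, each circuit produces eight uniformly spaced output levels, and for equal component values the two outputs differ only by the constant factor 6 (i.e. (1/4) / (1/24)).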
The AD Converter
The ADC consists of a sampler, quantizer and a coder. Each of them is explained below.
Sampler
The sampler in the simplest form is a semiconductor switch, as shown below. It is followed by a hold circuit, which is a capacitor with a very low leakage path.
Fig. 18.6 The sample and hold circuit: a semiconductor switch, driven by the control signal, converts the analog signal to the sampled signal held on the capacitor.
Fig. 18.7 Sample and hold signals: the analog signal, and the sampled signal after the capacitor.
Quantizer
The hold circuit tries to maintain a constant voltage till the next switching. The quantizer is responsible for converting this voltage to a binary number. The number of bits in the binary number decides the approximation and accuracy. The sample and hold output can assume any real number in a given range. However, because of the finite word length, only the 2^N values 0 to 2^N - 1 are possible in the digital domain, and each held voltage corresponds to the nearest of these levels.
Fig. 18.8(a) Hold Circuit Output (the sampled analog signal, amplitude in volts)
Fig. 18.8(b) The Quantized Value (the digitized signal)
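The mapping from a held voltage to the nearest of the 2^N levels can be sketched as follows (plain Python; the 0-5V full-scale range is an assumed parameter, not something fixed by the text):

```python
def quantize(v, n_bits, v_min=0.0, v_max=5.0):
    """Map a held voltage to the nearest of the 2**n_bits quantization levels."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels           # 1 LSB
    code = int(round((v - v_min) / step))
    return max(0, min(levels - 1, code))      # clamp to 0 .. 2**n_bits - 1

# With 8 bits over 0-5V, 1 LSB is about 19.5mV, so the rounding error
# of each sample is at most about half an LSB (except at full scale).
```

Increasing n_bits shrinks the step size and hence the approximation error, which is exactly the accuracy trade-off described above.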
Coder

This is an optional device which is used after the conversion is complete. In microprocessor based systems the coder is responsible for packing several samples and transmitting them onwards in either a synchronous or an asynchronous manner. For example, in TI DSK kits you will find the AD converters with CODECs interfaced to McBSP ports (short for Multi-channel Buffered Serial Ports). Several 16-bit sampled values are packed into a frame and transmitted to the processor or to the memory by Direct Memory Access (DMA). The coder is responsible for controlling the ADC and transferring the data quickly for processing. Sometimes the codec is responsible for compressing several samples together before transmitting them. In your desktop computer you will find audio interfaces which can digitize and record your voice and store it in .wav format. This is basically AD conversion followed by coding. The .wav format is the Pulse-Code-Modulated (PCM) format of the original digital voice samples.
The Sampling Theorem

The definition of proper sampling is quite simple. Suppose you sample a continuous signal in some manner. If you can exactly reconstruct the analog signal from the samples, you must have done the sampling properly. Even if the sampled data appears confusing or incomplete, the key information has been captured if you can reverse the process. Fig. 18.9 shows several sinusoids before and after digitization. The continuous line represents the analog signal entering the ADC, while the square markers are the digital signal leaving the ADC. In (a), the analog signal is a constant DC value, a cosine wave of zero frequency. Since the analog signal is a series of straight lines between each of the samples, all of the information needed to reconstruct the analog signal is contained in the digital data. According to our definition, this is proper sampling. The sine wave shown in (b) has a frequency of 0.09 of the sampling rate. This might represent, for example, a 90 cycles/second sine wave being sampled at 1000 samples/second. Expressed another way, there are 11.1 samples taken over each complete cycle of the sinusoid. This situation is more complicated than the previous case, because the analog signal cannot be reconstructed by simply drawing straight lines between the data points. Do these samples properly represent the analog signal? The answer is yes, because no other sinusoid, or combination of sinusoids, will produce this pattern of samples (within the reasonable constraints listed below). These samples correspond to only one analog signal, and therefore the analog signal can be exactly reconstructed. Again, an instance of proper sampling. In (c), the situation is made more difficult by increasing the sine wave's frequency to 0.31 of the sampling rate. This results in only 3.2 samples per sine wave cycle. Here the samples are so sparse that they don't even appear to follow the general trend of the analog signal. Do these samples properly represent the analog waveform? Again, the answer is yes, and for exactly the same reason. The samples are a unique representation of the analog signal. All of the information needed to reconstruct the continuous waveform is contained in the digital data. Obviously, it must be more sophisticated than just drawing straight lines between the data points. As strange as it seems, this is proper sampling according to our definition. In (d), the analog frequency is pushed even higher, to 0.95 of the sampling rate, with a mere 1.05 samples per sine wave cycle. Do these samples properly represent the data? No, they don't! The samples represent a different sine wave from the one contained in the analog signal. In particular, the original sine wave of 0.95 frequency misrepresents itself as a sine wave of 0.05 frequency in the digital signal. This phenomenon of sinusoids changing frequency during sampling is called aliasing. Just as a criminal might take on an assumed name or identity (an alias), the sinusoid assumes another frequency that is not its own. Since the digital data is no longer uniquely related to a particular analog signal, an unambiguous reconstruction is impossible. There is nothing in the sampled data to suggest that the original analog signal had a frequency of 0.95 rather than 0.05. The sine wave has hidden its true identity completely; the perfect crime has been committed! According to our definition, this is an example of improper sampling. This line of reasoning leads to a milestone in DSP, the sampling theorem. Frequently this is called the Shannon sampling theorem, or the Nyquist sampling theorem, after the authors of 1940s papers on the topic. The sampling theorem indicates that a continuous signal can be properly sampled only if it does not contain frequency components above one-half of the sampling rate. For instance, a sampling rate of 2,000 samples/second requires the analog signal to be composed of frequencies below 1000 cycles/second. If frequencies above this limit are present in the signal, they will be aliased to frequencies between 0 and 1000 cycles/second, combining with whatever information was legitimately there.
Fig. 18.9 Sampling a sine wave at different frequencies: (a) analog frequency = 0.0 (i.e., DC); (b) analog frequency = 0.09 of sampling rate; (c) analog frequency = 0.31 of sampling rate; (d) analog frequency = 0.95 of sampling rate.
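The aliasing of panel (d) is easy to reproduce numerically. A short sketch (plain Python) showing that a tone at 0.95 of the sampling rate produces exactly the same samples as a 0.05 tone (with a phase flip):

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency of a tone f after sampling at rate fs."""
    f = f % fs
    return min(f, fs - f)

fs = 1000.0
n = range(50)
f_high = [math.sin(2 * math.pi * 950 * k / fs) for k in n]  # 0.95 of fs
f_low = [-math.sin(2 * math.pi * 50 * k / fs) for k in n]   # 0.05 of fs, inverted
# f_high and f_low are identical sample sequences: once sampled at 1000
# samples/second, a 950 Hz sine is indistinguishable from a 50 Hz one.
```

This is exactly why the text insists on band-limiting the input: `alias_frequency(950, 1000)` gives 50, while a legitimate in-band tone such as 90 Hz passes through unchanged.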
Methods of AD Conversion

The analog voltage samples are converted to their digital equivalents at the quantizer. There are various ways to convert the analog values to the nearest finite length digital word. Some of these methods are explained below.
Fig. 18.10 The counter converter: a DA converter (DAC) drives one input of a uA741 comparator, while the unknown voltage is applied to the other.
The AD conversion is indirectly carried out through DA conversion. The 3-bit input to the DA converter shown in Fig. 18.10 may be changed sequentially from 000 to 111 by a 3-bit counter. The unknown voltage (V8) is applied to one input of the comparator. When the DA output exceeds the unknown voltage, the comparator output, which was negative, becomes positive. This transition can be used to latch the counter value, which is approximately the equivalent digital value of the unknown voltage.

The drawback of sequential counting is that the time taken to reach the highest count is large. For instance, an 8-bit converter has to count up to 256 for converting the maximum input, and therefore consumes up to 256 clock cycles. Hence a different method, called successive approximation, is used instead, as shown in Fig. 18.11.
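The sequential counting scheme described above can be modelled in a few lines (plain Python, with an idealized DAC and comparator; names and values are illustrative):

```python
def counter_adc(v_in, n_bits, v_ref):
    """Count up until the ideal DAC output first reaches the unknown voltage."""
    lsb = v_ref / 2 ** n_bits
    for code in range(2 ** n_bits):
        if code * lsb >= v_in:     # comparator flips: latch this count
            return code
    return 2 ** n_bits - 1         # input at or above full scale

# Worst case takes all 2**n_bits comparisons - e.g. 256 for an 8-bit converter.
```

The linear search is what makes this converter slow: the number of clock cycles grows exponentially with the word length.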
Fig. 18.11 The successive approximation search tree (000, 010, 100, 110, ...).

With each successive comparison, half of the remaining DAC output states are eliminated. Instead of having to step through 2^N states for an N-bit conversion, only N comparisons are needed. The SAR ADC is perhaps the most common of the converters, providing a relatively rapid and relatively inexpensive conversion.
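The binary search can be sketched directly (plain Python, again with an idealized DAC), mirroring the tree of Fig. 18.11:

```python
def sar_adc(v_in, n_bits, v_ref):
    """Successive approximation: one comparison per bit instead of 2**n counts."""
    code = 0
    for bit in reversed(range(n_bits)):     # test the MSB first
        trial = code | (1 << bit)           # tentatively set this bit
        if trial * v_ref / 2 ** n_bits <= v_in:
            code = trial                    # DAC output still below input: keep it
    return code                             # exactly n_bits comparisons in total
```

An 8-bit conversion therefore needs 8 comparisons rather than up to 256, which is the speed advantage claimed above.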
Flash Converter
Making all the comparisons between the digital states and the analog signal concurrently makes
for a fast conversion cycle. A resistive voltage divider (see figure) can provide all the digital
reference states required. There are eight reference values (including zero) for the three-bit
converter illustrated. Note that the voltage reference states are offset so that they are midway
between reference step values. The analog signal is compared concurrently with each reference
state; therefore a separate comparator is required for each comparison. Digital logic then
combines the several comparator outputs to determine the appropriate binary code to present.
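The comparator bank produces a "thermometer" pattern that the digital logic collapses into a binary code. A sketch of that encoding step (plain Python; an idealized model of the circuit just described):

```python
def flash_encode(v_in, v_ref, n_bits=3):
    """Compare v_in with every mid-step reference level at once, then count."""
    levels = 2 ** n_bits
    # References offset to lie midway between step values: 0.5/8 .. 6.5/8 of v_ref.
    refs = [(i + 0.5) * v_ref / levels for i in range(levels - 1)]
    comparators = [v_in > r for r in refs]   # one comparator per reference
    return sum(comparators)                  # thermometer code -> binary count
```

All 2^N - 1 comparisons happen in parallel in hardware, which is why the flash converter completes a conversion in a single cycle at the cost of one comparator per reference level.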
Fig. 18.12 The 3-bit flash converter: a resistive divider across Vo generates the reference levels 0.5Vo/8, 1.5Vo/8, ... 6.5Vo/8 (codes 001 to 111), each feeding its own comparator against the analog input; encoding logic combines the comparator outputs into the 3-bit result (MSB 2^2 to LSB 2^0).
Sigma-Delta (ΣΔ) AD converters
The analog side of a sigma-delta converter (a 1-bit ADC) is very simple. The digital side, which is what makes the sigma-delta ADC inexpensive to produce, is more complex. It performs filtering and decimation. The concepts of over-sampling, noise shaping, digital filtering, and decimation are used to make a sigma-delta ADC.
Over-sampling
First, consider the frequency-domain transfer function of a traditional multi-bit ADC with a sine-
wave input signal. This input is sampled at a frequency Fs. According to Nyquist theory, Fs must
be at least twice the bandwidth of the input signal. When observing the result of an FFT analysis
on the digital output, we see a single tone and lots of random noise extending from DC to Fs/2
(Fig.18.13). Known as quantization noise, this effect results from the following consideration:
the ADC input is a continuous signal with an infinite number of possible states, but the digital
output is a discrete function, whose number of different states is determined by the converter's
resolution. So, the conversion from analog to digital loses some information and introduces some
distortion into the signal. The magnitude of this error is random, with values up to ±½ LSB.
Fig. 18.13 FFT diagram of a multi-bit ADC with a sampling frequency FS
If we divide the fundamental amplitude by the RMS sum of all the frequencies representing noise, we obtain the signal to noise ratio (SNR). For an N-bit ADC, SNR = 6.02N + 1.76 dB. To improve the SNR in a conventional ADC (and consequently the accuracy of signal reproduction) you must increase the number of bits. Consider again the above example, but with a sampling frequency increased by the oversampling ratio k, to kFs (Fig. 18.14). An FFT analysis shows that the noise floor has dropped. SNR is the same as before, but the noise energy has been spread over a wider frequency range. Sigma-delta converters exploit this effect by following the 1-bit ADC with a digital filter (Fig. 18.14). The in-band RMS noise is less, because most of the noise is removed by the digital filter. This action enables sigma-delta converters to achieve wide dynamic range from a low-resolution ADC.
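The SNR formula is easy to check numerically. A standard result (assumed here, not derived in the text) is that oversampling by a ratio k and filtering away the out-of-band noise adds 10·log10(k) dB to the in-band SNR:

```python
import math

def snr_db(n_bits, oversampling_ratio=1):
    """Ideal SNR of an N-bit ADC, plus the gain from oversampling and filtering."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10(oversampling_ratio)

# A 16-bit converter gives ~98.1 dB; oversampling by 4 adds ~6 dB,
# i.e. roughly one extra effective bit of resolution.
```

This is why a sigma-delta converter can trade a very high sampling rate for resolution: each quadrupling of the rate is worth about one bit, even before noise shaping improves matters further.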
Fig. 18.14 FFT diagram of a multi-bit ADC with a sampling frequency kFS and effect of Digital Filter on Noise Bandwidth
Noise Shaping

The sigma-delta modulator includes a difference amplifier, an integrator, and a comparator with a feedback loop that
contains a 1-bit DAC. (This DAC is simply a switch that connects the negative input of the
difference amplifier to a positive or a negative reference voltage.) The purpose of the feedback
DAC is to maintain the average output of the integrator near the comparator's reference level.
The density of "ones" at the modulator output is proportional to the input signal. For an
increasing input the comparator generates a greater number of "ones," and vice versa for a
decreasing input. By summing the error voltage, the integrator acts as a lowpass filter to the input
signal and a highpass filter to the quantization noise. Thus, most of the quantization noise is
pushed into higher frequencies. Oversampling has changed not the total noise power, but its
distribution. If we apply a digital filter to the noise-shaped delta-sigma modulator, it removes
more noise than does simple oversampling (Fig. 18.16).
(The modulator: the signal input X1 and the feedback X5 from the 1-bit DAC feed the difference amplifier (X2), the integrator (X3) and the comparator acting as a 1-bit ADC, whose output X4 goes to the digital filter.)

Fig. 18.16 The Effect of Integrator and Digital Filter on the Spectrum
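The first-order modulator loop described above is straightforward to simulate. A sketch (plain Python, with idealized components) showing that the density of "ones" at the output tracks the input, as the text claims:

```python
def sigma_delta_ones_density(x, n_samples=20000):
    """First-order sigma-delta modulator driven by a constant input x in (-1, 1)."""
    integrator, ones = 0.0, 0
    for _ in range(n_samples):
        bit = 1 if integrator >= 0 else 0   # comparator (the 1-bit ADC)
        dac = 1.0 if bit else -1.0          # 1-bit feedback DAC
        integrator += x - dac               # difference amp + integrator
        ones += bit
    return ones / n_samples

# For x = 0.5 the density of ones settles near (0.5 + 1) / 2 = 0.75;
# a larger input gives more ones, a smaller input fewer, as described above.
```

Averaging this bit stream (the job of the digital filter that follows) recovers the input value, which is the whole principle of the converter.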
Digital Filtering

The output of the sigma-delta modulator is a 1-bit data stream at the sampling rate, which can be in the megahertz range. The purpose of the digital-and-decimation filter (Fig. 18.17) is to extract information from this data stream and reduce the data rate to a more useful value. In a sigma-delta ADC, the digital filter averages the 1-bit data stream, improves the ADC resolution, and removes the quantization noise that lies outside the band of interest.
Conclusion

In this chapter you have learnt about the basics of Real Time Signal Processing and the DA and AD conversion methods. Some microcontrollers are already equipped with DA and AD converters on the same chip. Generally, real world signals are broad band. For instance, a triangular wave, though periodic, has frequency components extending to infinity. Therefore an anti-aliasing filter is always desirable before AD conversion. This limits the signal bandwidth and hence permits a finite sampling frequency. The question and answer session shall discuss the quantization error, the specifications of the AD and DA converters, and the errors at the various stages of real time signal processing. The details of interfacing shall be discussed in the next lesson.

The AD and DA converters fall under mixed-signal VLSI circuits: the digital and analog circuits coexist on the same chip. This poses design difficulties for VLSI engineers when embedding fast and high resolution AD converters along with the processors. Sigma-Delta ADCs are the most complex, and hence are rarely found embedded on microcontrollers.
Question Answers
Q1. What are the errors at different stages in a Real Time Signal Processing system? Elaborate
on the quantization error.
Ans: No. of bits (8-bits, 16-bits etc), Settling Time, Power Supply range, Power
Consumption, Various Temperature ratings, Packaging
Lesson
19
Analog Interfacing
Instructional Objectives
After going through this lesson the student would be able to
Pre-Requisite
Digital Electronics, Microprocessors
19(I) Introduction

Fig. 19.1 shows a typical sensor network. You will find a number of sensors and actuators connected to a common bus to share information and derive a collective decision. This is a complex embedded system; a digital camera falls under such a system. Only the analog signals are shown here. The last lesson discussed the AD and DA conversion methods in detail. This chapter shall discuss inbuilt AD-DA converters, standalone converters, and their interfacing.
Fig. 19.2 The Analog-Digital-Analog signal path with real time processing
19(II) Embedded AD Converters in Intel 80196

Fig. 19.3 shows the block diagram of the AD converter built into the 80196 embedded processor. The details of the subsystems are as follows:
Fig. 19.3 The 80196 on-chip A/D converter: the analog inputs, together with VREF and ANGND, feed an analog mux and a sample and hold stage into a successive approximation A/D converter; the control logic accepts EPA or PTS commands and provides status.
ANGND: The analog ground, connected separately to the circuit from which the analog voltage is brought into the processor.
Vref: The reference voltage, which decides the range of the input voltage. By making it negative, bipolar inputs can be used.
PTS: The Peripheral Transaction Server can complete specific tasks in less time than an equivalent interrupt service routine can. It can transfer bytes or words, either individually or in blocks, between any memory locations; manage multiple analog-to-digital (A/D) conversions; and transmit and receive serial data in either asynchronous or synchronous mode.

Analog Mux: The analog multiplexer. It selects a particular analog channel for conversion. Only after completing the conversion of one channel does it switch to subsequent channels.
AD_RESULT
For an A/D conversion, the high byte contains the eight MSBs from the conversion, while the
low byte contains the two LSBs from a 10-bit conversion (undefined for an 8-bit conversion),
indicates which A/D channel was used, and indicates whether the channel is idle. For a
threshold-detection, calculate the value for the successive approximation register and write that
value to the high byte of AD_RESULT. Clear the low byte or leave it in its default state.
During a conversion, the successive approximation circuitry compares a sequence of reference voltages with the analog input, performing a binary search for the reference voltage that most closely matches the input. The ½ full scale reference voltage is the first tested. This corresponds to a 10-bit result where the most-significant bit is zero and all other bits are ones (0111111111). If the analog input was less than the test voltage, bit 10 of the SAR is left at zero, and a new test voltage of ¼ full scale (0011111111) is tried. If the analog input was greater than the test voltage, bit 9 of
SAR is set. Bit 8 is then cleared for the next test (0101111111). This binary search continues
until 10 (or 8) tests have occurred, at which time the valid conversion result resides in the
AD_RESULT register where it can be read by software. The result is equal to the ratio of the
input voltage divided by the analog supply voltage. If the ratio is 1.00, the result will be all ones.
The following A/D converter parameters are programmable:

conversion input: input channel
zero-offset adjustment: no adjustment, plus 2.5 mV, minus 2.5 mV, or minus 5.0 mV
conversion times: sample window time and conversion time for each bit
operating mode: 8- or 10-bit conversion, or 8-bit high or low threshold detection
conversion trigger: immediate or EPA starts
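The statement above that the result equals the ratio of the input voltage to the reference can be sketched numerically (plain Python; an idealized 10-bit conversion for illustration, not the actual 80196 register behavior):

```python
def ad_result_code(v_in, v_ref, n_bits=10):
    """Ideal conversion: code / full-scale equals v_in / v_ref; all ones at ratio 1.0."""
    full_scale = 2 ** n_bits - 1
    ratio = min(max(v_in / v_ref, 0.0), 1.0)   # clamp out-of-range inputs
    return round(ratio * full_scale)

# A ratio of 1.00 yields 0b1111111111 (all ones) for a 10-bit conversion.
```

Reading back, v_in can be estimated as code / 1023 × v_ref, which is how software would interpret the contents of AD_RESULT.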
19(III) The External AD Converters (ADC0809)
(Block diagram of the 0809 AD converter: 8 analog inputs pass through analog switches, an 8-channel multiplexer selected via an address latch and decoder, to the comparator; the S.A.R., switch tree and 256R resistor ladder perform the 8-bit conversion, with timing and control logic (START, CLOCK, end-of-conversion interrupt) and a tri-state output latch buffer driving the 8-bit outputs.)
IN3            1 | 28  IN2
IN4            2 | 27  IN1
IN5            3 | 26  IN0
IN6            4 | 25  ADD A
IN7            5 | 24  ADD B
START          6 | 23  ADD C
EOC            7 | 22  ALE
2^-5           8 | 21  2^-1 MSB
OUTPUT ENABLE  9 | 20  2^-2
CLOCK         10 | 19  2^-3
VCC           11 | 18  2^-4
VREF(+)       12 | 17  2^-8 LSB
GND           13 | 16  VREF(-)
2^-7          14 | 15  2^-6

Fig. 19.5 The signals of the 0809 AD converter
Functional Description

Multiplexer

The device contains an 8-channel single-ended analog signal multiplexer. A particular input channel is selected by using the address decoder. Table 1 shows the states of the address lines for selecting any channel. The address is latched into the decoder on the low-to-high transition of the address latch enable signal.
TABLE 1

SELECTED ANALOG CHANNEL | ADDRESS LINE C B A
IN0                     | L L L
IN1                     | L L H
IN2                     | L H L
IN3                     | L H H
IN4                     | H L L
IN5                     | H L H
IN6                     | H H L
IN7                     | H H H
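The address decoding of Table 1 is just a 3-bit binary select. A tiny sketch (plain Python) of the mapping:

```python
def selected_channel(c, b, a):
    """Decode the ADD C/B/A lines (H = 1, L = 0) of Table 1 to a channel name."""
    return "IN%d" % (4 * c + 2 * b + a)
```

For example, driving ADD C high, ADD B low and ADD A high (H L H) selects IN5, matching the sixth row of the table.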
The Converter
This 8-bit converter is partitioned into 3 major sections: the 256R ladder network, the successive approximation register, and the comparator. The converter's digital outputs are positive true. The 256R ladder network approach (Fig. 19.6) was chosen over the conventional R/2R ladder because of its inherent monotonicity, which guarantees no missing digital codes. Monotonicity is particularly important in closed loop feedback control systems, where a non-monotonic relationship can cause oscillations that would be catastrophic for the system. Additionally, the 256R network does not cause load variations on the reference voltage.
Fig. 19.6 The 256R ladder network [figure: 256 resistors in series between VREF(+) and VREF(-), with taps routed through a switch tree to the comparator input]
The bottom resistor and the top resistor of the ladder network in Fig. 19.6 are not the same value as the remainder of the network. The difference in these resistors causes the output characteristic to be symmetrical with the zero and full-scale points of the transfer curve. The first output transition occurs when the analog signal has reached +1/2 LSB, and succeeding output transitions occur every 1 LSB later up to full-scale. The successive approximation register (SAR) performs 8 iterations to approximate the input voltage. For any SAR-type converter, n iterations are required for an n-bit converter. Fig. 19.7 shows a typical example of a 3-bit converter. The A/D converter's successive approximation register (SAR) is reset on the positive edge of the start conversion (SC) pulse. The conversion is begun on the falling edge of the start conversion pulse. A conversion in process will be interrupted by receipt of a new start conversion pulse. Continuous conversion may be accomplished by tying the end-of-conversion (EOC) output to the SC input. If used in this mode, an external start conversion pulse should be applied after power up. End-of-conversion will go low between 0 and 8 clock pulses after the rising edge of start conversion. The most important section of the A/D converter is the comparator. It is this section which is responsible for the ultimate accuracy of the entire converter.
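The successive approximation loop described above can be modelled in a few lines. This is an algorithmic sketch, not device firmware; the comparator is modelled as a simple comparison of the input against the ladder tap voltage:

```python
def sar_convert(vin, vref, bits=8):
    """Successive approximation: try each bit from MSB to LSB and
    keep it only if the ladder/DAC voltage stays at or below vin."""
    code = 0
    for i in range(bits - 1, -1, -1):          # n iterations for n bits
        trial = code | (1 << i)                # tentatively set this bit
        ladder = (trial / (1 << bits)) * vref  # ladder tap voltage
        if ladder <= vin:                      # comparator decision
            code = trial                       # keep the bit
    return code

print(sar_convert(2.5, 5.0))  # 128: half-scale sets only the MSB
print(sar_convert(5.0, 5.0))  # 255: full-scale code
```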
[Figure: typical interfacing of the ADC0808/ADC0809 to the processor — address lines AD0-AD2 drive the channel-select inputs A, B, C; VREF(+) = 5.000 V and VREF(-) = 0.000 V; EOC is decoded (with AD4-AD15) to generate an interrupt; the data outputs 2^-1 (MSB) through 2^-8 (LSB) drive the data bus lines DB7-DB0; IN0 receives VIN, with a 5 V supply.]
19(IV) The DA Converter (DAC0808)

[Figure: internal block diagram of the DAC0808 — input bits A1 (MSB) through A8 (LSB) drive range-controlled current switches into an R-2R ladder; an NPN reference current amplifier (VREF(+), VREF(-)), bias circuit and compensation complete the device, producing the output current I0.]
Fig. 19.9 The DAC0808 signals [16-pin package: NC pin 1, GND pin 2, VEE pin 3, I0 output current pin 4, digital inputs A1 (MSB, pin 5) through A8 (LSB, pin 12), VCC pin 13, VREF(+) pin 14, VREF(-) pin 15, COMPENSATION pin 16]
The pins are labeled A1 through A8, but note that A1 is the Most Significant Bit and A8 is the Least Significant Bit (the opposite of the normal convention). The D/A converter has an output current instead of an output voltage; an op-amp converts the current to a voltage. The output current from pin 4 ranges from 0 (when the inputs are all 0) to Imax*255/256 (when all the inputs are 1). The current Imax is determined by the current into pin 14 (which is at 0 volts). Since we are using 8 bits, the maximum value is Imax*255/256. The output of the D/A converter takes some time to settle, so there should be a small delay before sending the next data to the DA converter. However, this delay is very small compared to the conversion time of an AD converter and therefore does not matter in most real-time signal processing platforms. Fig. 19.10 shows a typical interface.
Fig. 19.10 Typical connection of DAC0808 [figure: digital inputs A1 (MSB, pin 5) through A8 (LSB, pin 12); VREF = 10.000 V through a 5.000 kOhm resistor into pin 14, with pin 15 grounded through 5 kOhm; the output current from pin 4 feeds an LF351 op-amp with a 5.000 kOhm feedback resistor to produce V0; VCC = 5 V, VEE = -15 V, 0.1 uF capacitor on the compensation pin]
The LF351 is an operational amplifier used as a current-to-proportional-voltage converter. The 8 digital inputs at A8-A1 are converted into a proportional current at pin 4 of the DAC. The reference voltage (10 V) is supplied at pin 14, and pin 15 is grounded through a resistance. A capacitor is connected across the compensation pin 16 and the negative supply to bypass high-frequency noise.
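The output relationship can be checked numerically. The sketch below uses the component values shown in Fig. 19.10 (10 V reference through 5 kOhm into pin 14, 5 kOhm op-amp feedback); it is an idealized model for illustration, not a device driver:

```python
def dac_vout(code, vref=10.0, r_ref=5000.0, r_fb=5000.0):
    """Ideal DAC0808 + op-amp model:
    Imax = Vref / Rref, Iout = Imax * code / 256, Vout = Iout * Rfb."""
    imax = vref / r_ref            # 2 mA with the values shown
    iout = imax * code / 256.0     # ranges 0 ... Imax*255/256
    return iout * r_fb

print(dac_vout(128))  # ~5.0 V at half scale
print(dac_vout(255))  # ~9.96 V full scale (Imax*255/256 * Rfb)
```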
Important Specifications

Relative accuracy: 0.19% error
Settling time: 150 ns
Slew rate: 8 mA/us
Power supply voltage range: 4.5 V to 18 V
Power consumption: 33 mW @ 5 V
19(V) Conclusion

In this lesson you learnt about the following:
The internal AD converters of the 80196 family of processors
The external microprocessor-compatible AD0809 converter
A typical 8-bit DA converter

Both the ADCs use the successive approximation technique. Flash ADCs are complex and therefore lead to VLSI circuits unsuitable for coexistence with a processor on the same chip; sigma-delta converters need a very high sampling rate.
Questions and Answers

Q.1 What are the possible errors in a system as shown in Fig. 19.2?

Ans:
Stage-1 Signal Amplification and Conditioning: this can also amplify the noise.
Stage-2 Anti-aliasing Filter: some useful information, such as transients in the real system, cannot be captured.
Stage-3 Sample and Hold: leakage and electromagnetic interference due to switching.
Stage-4 Analog to Digital Converter: quantization error due to finite bit length.
Stage-5 Digital Processing and Data Manipulation in a Processor: numerical round-off errors due to finite word length, and the delay caused by the algorithm.
Stage-6 Processed digital values are temporarily stored in a latch before D-A conversion: error in reconstruction due to zero-order approximation.
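The Stage-4 quantization error can be made concrete with a quick calculation: for an n-bit converter over a full-scale range V, one LSB is V/2^n and the worst-case quantization error is half of that. The numbers below assume an 8-bit converter over 5 V (an illustrative choice, matching the converters in this lesson):

```python
def quantization(v_fullscale, bits):
    """Return (LSB size, worst-case quantization error) in volts."""
    lsb = v_fullscale / (1 << bits)
    return lsb, lsb / 2.0

lsb, err = quantization(5.0, 8)
print(lsb)  # 0.01953125 V  (~19.5 mV per step)
print(err)  # 0.009765625 V (~9.8 mV worst case)
```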
Q.2 Why is it necessary to separate the digital ground from the analog ground in a typical ADC?

Ans: Digital circuit noise can get into the analog signal path if separate grounding systems are not used for the digital and analog parts. Digital grounds are invariably noisier than analog grounds because of the switching noise generated in digital chips when they change state. For large current transients, PCB trace inductance causes voltage drops between various ground points on the board (ground bounce). Ground bounce translates into varying voltage levels on signal lines. For digital lines this isn't a problem unless it crosses a logic threshold; for analog lines it is just plain noise added to the signals.
Module 4
Design of Embedded Processors

Lesson 20
Field Programmable Gate Arrays and Applications
Instructional Objectives
After going through this lesson the student will be able to
Introduction

An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry. When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system. FPGAs are truly parallel in nature, so different processing operations do not have to compete for the same resources. As a result, the performance of one part of the application is not affected when additional processing is added. Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by an operator. However, unlike hard-wired printed circuit board (PCB) designs, which have fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow reconfiguration after the control system is deployed to the field. FPGA devices deliver the performance and reliability of dedicated hardware circuitry.

A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.
Fig. 20.1 Internal structure of an FPGA [figure: an array of logic blocks with programmable interconnect, surrounded by I/O blocks]
In an FPGA, logic blocks are implemented using multiple-level low fan-in gates, which gives it a more compact design compared to an implementation with two-level AND-OR logic. An FPGA provides its user a way to configure:

1. The interconnection between the logic blocks, and
2. The function of each logic block.

A logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:

1. Transistor pairs
2. Combinational gates like basic NAND gates or XOR gates
3. n-input lookup tables
4. Multiplexers
5. Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. The density of logic blocks used in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a tradeoff between the density of logic blocks and the amount of area used up for routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 20.2.
Fig. 20.2 Simplified internal structure of an FPGA [figure: logic blocks and I/O blocks connected by routing wires]
There are two kinds of costs involved in the development of custom ICs:

1. Cost of development and design (increased design time)
2. Cost of manufacture

(A tradeoff usually exists between the two costs.)

Therefore the custom IC approach was only viable for products with very high volume, and which were not time-to-market sensitive. FPGAs were introduced as an alternative to custom ICs for implementing an entire system on one chip and to provide the flexibility of reprogrammability to the user. The introduction of FPGAs resulted in an improvement of density relative to discrete SSI/MSI components (within around 10x of custom ICs). Another advantage of FPGAs over custom ICs is that, with the help of computer-aided design (CAD) tools, circuits can be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing).
Evolution of FPGAs

In the world of digital electronic systems, there are three basic kinds of devices: memory, microprocessors, and logic. Memory devices store random information such as the contents of a spreadsheet or database.

The size of the AND matrix of a PLA is twice the number of inputs times the number of product-terms (each input appears in both true and complemented form). When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they were expensive to manufacture and offered somewhat poor speed-performance. Both disadvantages were due to the two levels of configurable logic, because programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were developed. PALs provide only a single level of programmability, consisting of a programmable wired AND plane that feeds fixed OR gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential circuits can be realized. These are often referred to as Simple Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of a PLA and a PAL.
e
u d
Inputs st PLA
ity Inputs PAL
.c
w
w
Outputs
w
Outputs
With the advancement of technology, it has become possible to produce devices with higher capacities than SPLDs. As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, but not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows designers to implement either more logic equations or a more complicated design.
Fig. 20.4 Internal structure of a CPLD [figure: four logic blocks connected through a central switch matrix]
Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than four logic blocks. These logic blocks are themselves comprised of macrocells and interconnect wiring, just like an ordinary PLD.

Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than PLDs, their potential uses are more varied. They are still sometimes used for simple applications like address decoding, but more often contain high-performance control logic or complex finite state machines. At the high end (in terms of numbers of gates), there is also a lot of overlap in potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required. Because of its less flexible internal architecture, the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter.
The development of the FPGA was distinct from the SPLD/CPLD evolution just described. This is apparent from the architecture of the FPGA shown in Fig. 20.1. FPGAs offer the highest amount of
logic density, the most features, and the highest performance. The largest FPGA now shipping,
part of the Xilinx Virtex line of devices, provides eight million "system gates" (the relative
density of logic). These advanced devices also offer features such as built-in hardwired
processors (such as the IBM Power PC), substantial amounts of memory, clock management
systems, and support for many of the latest, very fast device-to-device signaling technologies.
FPGAs are used in a wide variety of applications ranging from data processing and storage, to
instrumentation, telecommunications, and digital signal processing. The value of programmable
logic has always been its ability to shorten development cycles for electronic equipment
manufacturers and help them get their product to market faster. As PLD (Programmable Logic
Device) suppliers continue to integrate more functions inside their devices, reduce costs, and
increase the availability of time-saving IP cores, programmable logic is certain to expand its
popularity with digital designers.
Symmetrical arrays

This architecture consists of logic elements (called CLBs) arranged in the rows and columns of a matrix, with interconnect laid out between them, as shown in Fig. 20.2. This symmetrical matrix is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input lookup table and a pair of programmable flip-flops. I/O blocks also control functions such as tri-state control and output transition speed. Interconnects provide the routing paths; direct interconnects between adjacent logic elements have smaller delay compared to general-purpose interconnect.
Row-based architecture

The row-based architecture shown in Fig. 20.5 consists of alternating rows of logic modules and programmable interconnect tracks. Input/output blocks are located in the periphery of the rows. One row may be connected to adjacent rows via vertical interconnect. Logic modules can be implemented in various combinations: combinatorial modules contain only combinational elements, while sequential modules contain combinational elements along with flip-flops. A sequential module can implement complex combinatorial-sequential functions. Routing tracks are divided into smaller segments connected by anti-fuse elements between them.
Hierarchical PLDs

This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects. Each logic block contains a number of logic modules, and each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input/output blocks surround this scheme of logic blocks and interconnects. This type of architecture is shown in Fig. 20.6.
Fig. 20.5 Row-based architecture [figure: rows of logic blocks separated by routing channels, surrounded on all sides by I/O blocks]

Fig. 20.6 Hierarchical PLD [figure: logic blocks containing logic modules, connected by programmable interconnects and surrounded by I/O blocks]
FPGA classification by user-programmable switch technologies

FPGAs are based on an array of logic modules and a supply of uncommitted wires to route signals. In gate arrays these wires are connected by a mask design during manufacture. In FPGAs, however, these wires are connected by the user and therefore must use an electronic device to connect them. Three types of devices have been commonly used to do this: pass transistors controlled by an SRAM cell, a flash or EEPROM cell to pass the signal, or a direct connection using antifuses. Each of these interconnect devices has its own advantages and disadvantages, and this has a major effect on the design, architecture, and performance of the FPGA. The classification of FPGAs by user-programmable switch technology is given in Fig. 20.7.
[Fig. 20.7: classification tree of FPGAs by programmable switch technology — SRAM-based, antifuse-based, and EEPROM/flash-based]
SRAM Based

…a look-up table (LUT). The other disadvantages are that SRAM-based parts need to be reprogrammed each time power is applied, need an external memory to store the program, and require a large area. Fig. 20.8 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor switches and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.
Fig. 20.8 SRAM-controlled programmable switches [figure: SRAM cells driving the gates of pass transistors and the select line of a multiplexer between two logic cells]
Antifuse Based

The antifuse-based cell is the highest-density interconnect, being a true cross point. Thus the designer has a much larger number of interconnects, so logic modules can be smaller and more efficient, and place-and-route software also has a much easier time. These devices, however, are only one-time programmable and therefore have to be thrown out every time a change is made in the design. The antifuse has an inherently low capacitance and resistance, such that the fastest parts are all antifuse-based. The disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses into the IC process, which means the process will always lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built using modified CMOS technology. As an example, Actel's antifuse structure is depicted in Fig. 20.9. The figure shows that an antifuse is positioned between two interconnect wires and physically consists of three sandwiched layers: the top and bottom layers are conductors, and the middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom layers, but when programmed the insulator changes to become a low-resistance link. It uses poly-Si and n+ diffusion as conductors and ONO as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.
[Fig. 20.9: Actel antifuse structure — an antifuse (ONO dielectric) between a poly-Si wire and an n+ diffusion wire, separated by oxide, on a silicon substrate]
EEPROM Based

The EEPROM/flash cell in FPGAs can be used in two ways: as a control device, as in an SRAM cell, or as a directly programmable switch. When used as switches they can be very efficient as interconnect and are reprogrammable at the same time. They are also non-volatile, so they do not require an extra PROM for loading. They do, however, have their detractions: the EEPROM process is complicated and therefore also lags SRAM technology.
Logic Block and Routing Techniques

Crosspoint FPGA: consists of two types of logic blocks. One is the transistor-pair tile, in which transistor pairs run in parallel lines, as shown in Fig. 20.10.

Fig. 20.10 Transistor pair tiles in a cross-point FPGA [figure: rows of transistor pairs]

The second type of logic block is RAM logic, which can be used to implement random access memory.
Plessey FPGA: the basic building block here is a 2-input NAND gate, which is connected to others to implement the desired function.

[Figure: Plessey logic block — an 8-to-2 multiplexer selects from 8 interconnect lines and feeds a NAND gate and latch, clocked by CLK and configured by RAM]

Both Crosspoint and Plessey are fine-grain logic blocks. Fine-grain logic blocks have the advantage of a high percentage usage of logic blocks, but they require a large number of wire segments and programmable switches, which occupy a lot of area.
Actel Logic Block: if the inputs of a multiplexer are connected to constants or to signals, it can be used to implement different logic functions. For example, a 2-input multiplexer with inputs a and b and select line c will implement the function ac' + bc. If b = 0 it will implement ac', and if a = 0 it will implement bc.
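This use of a multiplexer as a universal logic element is easy to verify exhaustively. The sketch below models a 2-input mux and derives AND and OR from it by tying inputs to constants; it is an illustrative model of the idea, not vendor code:

```python
def mux2(a, b, sel):
    """2-input multiplexer: output = a when sel is 0, b when sel is 1
    (i.e. a*sel' + b*sel)."""
    return b if sel else a

def and2(x, y):
    return mux2(0, y, x)   # a=0: output is y only when sel (= x) is 1

def or2(x, y):
    return mux2(y, 1, x)   # b=1: output is 1 when x, otherwise y

# Exhaustive check over all input combinations.
for x in (0, 1):
    for y in (0, 1):
        assert and2(x, y) == (x & y)
        assert or2(x, y) == (x | y)
```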
Fig. 20.12 Actel logic block [figure: a tree of 2-input multiplexers (nodes n1-n4) with data inputs w, x, y, z and constant 0/1 select inputs]

Typically an Actel logic block consists of a number of multiplexers and logic gates.
Xilinx Logic Block

In the Xilinx logic block, a lookup table (LUT) is used to implement any number of different functionalities. The input lines go into the input and enable of the lookup table, and the output of the lookup table gives the result of the logic function that it implements. The lookup table is implemented using SRAM.
Fig. 20.13 A 5-input LUT-based logic block [figure: inputs A-E feed a look-up table; the LUT output and a direct data-in line pass through multiplexers into flip-flops (with set/reset, clock and enable) producing outputs X and Y; a global reset is also provided]
A k-input logic function is implemented using a 2^k x 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports the implementation of very many logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large. Fig. 20.13 shows a 5-input LUT-based implementation of a logic block. LUT-based design provides better logic block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with a tradeoff between performance and logic density.
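The two formulas above are easy to tabulate, which shows how fast the memory cost grows with k:

```python
def lut_cost(k):
    """For a k-input LUT, return (SRAM bits needed, number of distinct
    logic functions realizable) = (2^k, 2^(2^k))."""
    return 1 << k, 1 << (1 << k)

for k in (2, 3, 4, 5):
    bits, funcs = lut_cost(k)
    print(k, bits, funcs)
# A 4-input LUT needs 16 SRAM bits and covers 65,536 functions;
# a 5-input LUT needs 32 bits and covers 2^32 functions.
```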
[Figure: a 4-input LUT feeding a flip-flop, with a multiplexer selecting either the registered or the combinational output]

4-input lookup table
An n-LUT can be seen as a direct implementation of a function truth table: each latch holds the value of the function corresponding to one input combination. For example, the 2-LUT shown below implements the 2-input AND and OR functions.

Example: 2-LUT

Inputs  AND  OR
00       0    0
01       0    1
10       0    1
11       1    1
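The truth-table view of an LUT maps directly to code: the SRAM contents are just the output column, indexed by the input bits. A minimal illustrative model:

```python
def lut(config, inputs):
    """Evaluate an n-input LUT: 'config' is the truth-table output
    column (length 2^n) and 'inputs' is the bit tuple (MSB first)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

AND2 = (0, 0, 0, 1)   # rows 00, 01, 10, 11 of the AND column
OR2 = (0, 1, 1, 1)    # rows 00, 01, 10, 11 of the OR column

print(lut(AND2, (1, 1)))  # 1
print(lut(OR2, (1, 0)))   # 1
```

Reprogramming the FPGA amounts to loading a different `config` vector into the same cell.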
Altera Logic Block

Altera's logic block has evolved from earlier PLDs. It consists of wide fan-in (up to 100-input) AND gates feeding into an OR gate with 3-8 inputs. The advantage of a large fan-in AND-gate-based implementation is that a few logic blocks can implement the entire functionality, thereby reducing the amount of area required by interconnects. The disadvantage, on the other hand, is the low-density usage of logic blocks in a design that requires fewer-input logic. Another disadvantage is the use of pull-up devices (AND gates) that consume static power. To improve power, manufacturers provide low-power logic blocks at the expense of delay: such logic blocks have gates with a high threshold and as a result consume less power, and they can be used in non-critical paths.

Altera and Xilinx use coarse-grain architectures.

Example: Altera's FLEX 8000 series consists of a three-level hierarchy. However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that the FLEX 8000 is
a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a
four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more
than 15,000 for the 8000 series. The overall architecture of FLEX 8000 is illustrated in Fig.
20.14.
Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs [figure: LABs, each containing 8 logic elements and local interconnect, arranged in a grid and linked by FastTrack interconnect, with I/O around the periphery]
The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions. Details of the LE are illustrated in Fig. 20.15.
[Fig. 20.15: Altera FLEX 8000 Logic Element (LE) — data1-data4 feed a look-up table; cascade-in/cascade-out chain circuitry; a D flip-flop with set/clear, clock, and control inputs (cntrl1-cntrl4) produces the LE output]
In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB. Local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. All FastTrack horizontal wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many smaller-length segments, because there are fewer programmable switches in the longer path.
Fig. 20.16 Altera FLEX 8000 Logic Array Block (LAB) [figure: LEs joined by local interconnect; data and control signals arrive from the FastTrack interconnect, LE outputs return to FastTrack, and cascade/carry chains run to the adjacent LAB]
FPGA Design Flow

One of the most important advantages of FPGA-based design is that users can design with CAD tools provided by design automation companies. A generic FPGA design flow includes the following steps.
System Design

At this stage the designer has to decide what portion of the functionality has to be implemented on the FPGA and how to integrate that functionality with the rest of the system.

Design Description

The designer describes the design functionality either by using schematic editors or by using one of the various Hardware Description Languages (HDLs) like Verilog or VHDL.

Synthesis

Once the design has been defined, CAD tools are used to implement the design on a given FPGA. Synthesis includes generic optimizations, slack optimizations, and power optimizations, followed by placement and routing. Implementation includes partition, place, and route. The output of the design implementation phase is a bit-stream file.
Design Verification

The bit-stream file is fed to a simulator, which simulates the design functionality and reports deviations from the desired behavior of the design. Timing tools are used to determine the maximum clock frequency of the design. Finally the design is loaded onto the target FPGA device and testing is done in a real environment.
Hardware design and development

The process of creating digital logic is not unlike the embedded software development process. A description of the hardware's structure and behavior is written in a high-level hardware description language (usually VHDL or Verilog), and that code is then compiled and downloaded prior to execution. Of course, schematic capture is also an option for design entry, but it has become less popular as designs have become more complex and the language-based tools have improved. The overall process of hardware development for programmable logic is shown in Fig. 20.17 and described in the paragraphs that follow.

Perhaps the most striking difference between hardware and software design is the way a developer must think about the problem. Software developers tend to think sequentially, even when they are developing a multithreaded application. The lines of source code that they write are always executed in that order, at least within a given thread. If there is an operating system, it is used to create the appearance of parallelism, but there is still just one execution engine. During design entry, hardware designers must think, and program, in parallel. All of the input signals are processed in parallel as they travel through a set of execution engines (each one a series of macrocells and interconnections) toward their destination output signals. Therefore, the statements of a hardware description language create structures, all of which are "executed" at the very same time.
[Fig. 20.17: the programmable logic development flow — design entry, followed by synthesis under design constraints, verified by simulation]
Things to Ponder

Q.1 Define the following acronyms as they apply to digital logic circuits:
ASIC
PAL
PLA
PLD
CPLD
FPGA

Q.3 Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA, etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are there any applications where hard-wired logic would do a better job than a programmable device?

Q.4 Some programmable logic devices (and PROM memory devices as well) use tiny fuses which are intentionally "blown" in specific patterns to represent the desired program. Programming a device by blowing tiny fuses inside of it carries certain advantages and disadvantages; describe what some of these are.

Q.5 Use one 4 x 8 x 4 PLA to implement the function

F(w, x, y, z) = wx'y'z + wx'yz' + wxy'
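A PLA realizes such a function as an AND plane (the product terms) feeding an OR plane. The function above can be checked in software before committing it to hardware; this is a quick model, not a synthesis tool:

```python
def F(w, x, y, z):
    """F = w*x'*y'*z + w*x'*y*z' + w*x*y' as three product terms
    feeding one OR output (fits a 4-input x 8-term x 4-output PLA)."""
    products = (
        w and (not x) and (not y) and z,   # w x' y' z
        w and (not x) and y and (not z),   # w x' y z'
        w and x and (not y),               # w x y'
    )
    return int(any(products))              # OR plane

print(F(1, 0, 0, 1))  # 1 (first product term fires)
print(F(0, 1, 1, 0))  # 0 (no term fires)
```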
Module 4
Design of Embedded Processors

Lesson 21
Introduction to Hardware Description Languages - I
Instructional Objectives

Describe a digital IC design flow and explain its various abstraction levels.
Explain the need for a hardware description language in the IC design flow.
Model simple hardware devices at various levels of abstraction using Verilog (Gate/Switch/Behavioral).
Write Verilog code meeting the prescribed requirements at a specified level.
1.1 Introduction

1.1.1 What is an HDL and where does Verilog come in?

HDL is an abbreviation of Hardware Description Language. Any digital system can be represented at the REGISTER TRANSFER LEVEL (RTL), and HDLs are used to describe this RTL. Verilog is one such HDL, and it is a general-purpose language that is easy to learn and use. Its syntax is similar to C. The idea is to specify how the data flows between registers and how the design processes the data. To define the RTL of a digital design, hierarchical design concepts play a very significant role. Hierarchical design methodology facilitates the digital design flow with several levels of abstraction, and Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient representation of the RTL description of any digital design.

For example, an HDL might describe the layout of the wires, resistors, and transistors on an Integrated Circuit (IC) chip, i.e., the switch level; or it may describe the design at a more micro level in terms of logical gates and flip-flops in a digital system, i.e., the gate level. Verilog supports all of these levels.
1.1.2 Hierarchy of design methodologies

Bottom-Up Design
The traditional method of electronic design is bottom-up (designing from transistors and moving
to a higher level of gates and, finally, the system). But with the increase in design complexity
traditional bottom-up designs have to give way to new structural, hierarchical design methods.
Top-Down Design
For HDL representation it is convenient and efficient to adapt this design-style. A real top-down
design allows early testing, fabrication technology independence, a structured system design and
offers many other advantages. But it is very difficult to follow a pure top-down design. Due to this fact, most designs are a mix of both methods, implementing some key elements of both design styles.
Modules

A module is the basic building block in Verilog. It can be an element or a collection of lower-level design blocks. Typically, elements are grouped into modules to provide common functionality that is used in many places of the design. A module provides its functionality to higher-level blocks through its port interface, but hides the internal implementation.
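As an illustrative sketch (the module name and ports here are hypothetical), a simple 2-to-1 multiplexer module might look like this; only the port interface is visible to other modules:

```verilog
// Hypothetical 2-to-1 multiplexer: the ports (out, a, b, sel) form the
// public interface, while the implementation stays hidden inside.
module mux2to1 (output out, input a, b, sel);
  assign out = sel ? b : a; // select b when sel is 1, else a
endmodule
```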
Switch Level
This is the lowest level of abstraction. A module can be implemented in terms of switches,
storage nodes and interconnection between them.
However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. RTL is frequently used for a Verilog description that is a combination of the behavioral and dataflow styles while still being acceptable for synthesis.
Instances
A module provides a template from where one can create objects. When a module is invoked
Verilog creates a unique object from the template, each having its own name, variables,
parameters and I/O interfaces. These are known as instances.
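As a sketch (module and instance names are hypothetical), one module template can be instantiated several times, each instance getting its own name and connections:

```verilog
// Hypothetical two-input AND wrapper, instantiated twice with distinct names.
module and2 (output y, input a, b);
  assign y = a & b;
endmodule

module top (output p, q, input w, x, y, z);
  and2 u1 (p, w, x); // instance u1 of module and2
  and2 u2 (q, y, z); // instance u2 of module and2
endmodule
```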
1.1.5 The Design Flow

This block diagram describes a typical design flow for the description of the digital design for both ASIC and FPGA realizations.
RTL Coding

In RTL coding, the micro design is converted into Verilog/VHDL code using synthesizable constructs of the language. Normally the vim editor is used; conTEXT, Nedit and Emacs are other choices.
p o
Simulation
g s
lo
. b
Simulation is the process of verifying the functional characteristics of models at any level of
u
meets the functional requirements of the specification,
psee if all the RTL blocks are functionally
abstraction. We use simulators to simulate the the Hardware models. To test if the RTL code
r
correct. To achieve this we need to write testbench, owhich generates clk, reset and required test
vectors. A sample testbench for a counter is asgshown below. Normally, we spend 60-70% of
time in verification of design. ts
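The sample counter testbench referred to above did not survive extraction; a minimal sketch (the counter's port names are assumptions) might look like:

```verilog
// Minimal testbench sketch for a 4-bit counter DUT (port names assumed).
module counter_tb;
  reg clk, reset, enable;
  wire [3:0] count;

  counter dut (.clk(clk), .reset(reset), .enable(enable), .count(count));

  always #5 clk = ~clk;   // 10-time-unit clock period

  initial begin
    clk = 0; reset = 1; enable = 0;
    #20 reset = 0;        // release reset after two clock cycles
    #10 enable = 1;       // start counting
    #100 $finish;
  end
endmodule
```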
We use the waveform output from the simulator to see if the DUT (Device Under Test) is functionally correct. Most of the simulators come with a waveform viewer. As the design becomes complex, we write a self-checking testbench, where the testbench applies the test vectors and then compares the output of the DUT with the expected value.
There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place and Route). Here we include the gate delays and wire delays and see if the DUT works at the rated clock speed. This is also called SDF simulation or gate-level simulation.
Synthesis

Synthesis is the process in which a synthesis tool like Design Compiler takes in the RTL (in Verilog or VHDL), the target technology, and constraints as input, and maps the RTL to target technology primitives. After mapping the RTL to gates, the synthesis tool also does a minimal amount of timing analysis to see if the mapped design meets the timing requirements. (An important thing to note is that synthesis tools are not aware of wire delays; they know only gate delays.) After synthesis there are a couple of things that are normally done before passing the netlist to the backend (Place and Route):

Verification: Check if the RTL to gate mapping is correct.
Scan insertion: Insert the scan chain in the case of an ASIC.
The netlist is then passed to the foundry for fabricating the ASIC. Normally the P&R tools are used to output the SDF file, which is back-annotated along with the gate-level netlist from P&R into a static timing analysis tool like PrimeTime to do timing analysis.
1.2.3 Apart from these there are vector, integer, real and time register data types.
Some examples are as follows:

Integer
integer counter; // general purpose variable used as a counter

initial
  counter = -1; // a negative one is stored in the counter
Real

real delta; // define a real variable called delta

initial
begin
  delta = 4e10; // delta is assigned in scientific notation
  delta = 2.13; // delta is assigned the value 2.13
end

integer i; // define an integer i

initial
  i = delta; // i gets the value 2 (rounded value of 2.13)
Time
time save_sim_time; // define a time variable save_sim_time
initial
save_sim_time = $time; // save the current simulation time.
n.b. $time is invoked to get the current simulation time
Arrays
integer count [0:7]; // an array of 8 count variables
reg [4:0] port_id[0:7]; // array of 8 port_ids, each 5 bits wide
integer matrix[4:0] [0:255] ; // two dimensional array of integers.
Memories
Memories are modeled simply as a one-dimensional array of registers. Each element of the array is known as a word and is addressed by a single array index.

reg membit [0:1023]; // memory membit with 1K 1-bit words
reg [7:0] membyte [0:1023]; // memory membyte with 1K 8-bit words
membyte[511] // fetches the 1-byte word whose address is 511
Strings

A string is a sequence of characters enclosed by double quotes and all contained on a single line. Strings used as operands in expressions and assignments are treated as a sequence of eight-bit ASCII values, with one eight-bit ASCII value representing one character. To declare a variable to store a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required to hold a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators.

When a variable is larger than required to hold a value being assigned, Verilog pads the contents on the left with zeros after the assignment. This is consistent with the padding that occurs during assignment of non-string values. Certain characters can be used in strings only when preceded by an introductory character called an escape character. The following table lists these characters in the right-hand column, with the escape sequence that represents the character in the left-hand column.
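As a sketch (the variable name is hypothetical), storing the 13-character string below needs a register of 8 x 13 bits:

```verilog
// A register wide enough for 13 eight-bit ASCII characters.
module string_demo;
  reg [8*13:1] string_value;
  initial begin
    string_value = "Hello Verilog"; // 13 characters, stored as 8-bit ASCII
    $display("%s", string_value);
  end
endmodule
```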
Modules

Modules are the building blocks of Verilog designs.
You create a design hierarchy by instantiating modules in other modules.
An instance of a module can be called in another, higher-level module.
Ports

Ports allow communication between a module and its environment.
Ports can be associated by order or by name.
All but the top-level modules in a hierarchy have ports.
You declare ports to be input, output or inout. The port declaration syntax is:

input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;
Schematic
Width matching: it is legal to connect internal and external ports of different sizes; but beware, synthesis tools could report problems.
Unconnected ports: unconnected ports are allowed, by using a ",".
Net data types are used to connect structure. A net data type is required if a signal can be driven by a structural connection.
Example Implicit

dff u0 (q, , clk, d, rst, pre); // here the second port (q_bar) is not connected

Example Explicit

dff u0 (.q (q_out),
        .q_bar (),      // here the q_bar port is not connected
        .clk (clk_in),
        .d (d_in),
        .rst (rst_in),
        .pre (pre_in));
1.3 Gate Level Modeling
At this level of abstraction the system modeling is done at the gate level, i.e., the properties of the gates etc. to be used by the behavioral description of the system are defined. These definitions are known as primitives. Verilog has built-in primitives for gates, transmission gates, switches, buffers etc. These primitives are instantiated like modules, except that they are predefined in Verilog and do not need a module definition. Two basic families of gates are the and/or gates and the buf/not gates.
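As a sketch (module, instance and net names are hypothetical), built-in and/or primitives are instantiated like modules, with the output terminal listed first:

```verilog
// Gate-level AND-OR structure built from Verilog's built-in primitives.
module aoi_sketch (output out, input a, b, c);
  wire w1;             // internal net between the gates
  and a1 (w1, a, b);   // output first, then the inputs
  or  o1 (out, w1, c);
endmodule
```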
Buf/Not Gates: these gates have one scalar input and one or more scalar outputs.

// basic gate instantiations for bufif
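The instantiations that followed this comment were lost in extraction; a sketch of what bufif instantiations look like (instance and net names are assumptions):

```verilog
// bufif gates are tri-state buffers with a control terminal.
module bufif_sketch (output out1, out0, input in, ctrl);
  bufif1 b1 (out1, in, ctrl); // drives in onto out1 when ctrl is 1, else z
  bufif0 b0 (out0, in, ctrl); // drives in onto out0 when ctrl is 0, else z
endmodule
```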
Rise Delay

The rise delay is associated with a gate output transition to 1 from another value (0, x, z).

Fall Delay

The fall delay is associated with a gate output transition to 0 from another value (1, x, z).

Turn-off Delay

The turn-off delay is associated with a gate output transition to z from another value (0, 1, x).
Min Value

The min value is the minimum delay value that the gate is expected to have.

Typ Value

The typ value is the typical delay value that the gate is expected to have.

Max Value

The max value is the maximum delay value that the gate is expected to have.
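These values can be sketched in a gate instantiation as follows (module and instance names are hypothetical); each delay gets a min:typ:max triplet:

```verilog
// and gate with (min:typ:max) values for the rise and fall delays.
module delay_sketch (output out, input a, b);
  and #(1:2:3, 2:3:4) a1 (out, a, b); // rise = 1:2:3, fall = 2:3:4
endmodule
```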
initial : initial blocks execute only once at time zero (start execution at time zero).
always : always blocks loop to execute over and over again; in other words, as the name suggests, an always block executes always.
Example initial
module initial_example();
reg clk,reset,enable,data;
initial begin
clk = 0;
reset = 0;
enable = 0;
data = 0;
end
endmodule
In the above example, the initial block starts execution at time 0 and, without waiting, executes all the statements between begin and end. An always block, by contrast, waits for its triggering event (such as the positive edge of a clock), as the next example shows.
Example always

module always_example();
reg clk,reset,enable,q_in,data;
always @ (posedge clk)
  if (reset) begin
    data <= 0;
  end else if (enable) begin
    data <= q_in;
  end
endmodule
In an always block, when the trigger event occurs, the code between begin and end is executed; then the always block once again waits for the next posedge of the clock. This process of waiting and executing on the event is repeated till the simulation stops.
1.4.2 Procedural Assignment Statements

Procedural assignment statements assign values to reg, integer, real, or time variables and cannot assign values to nets (wire data types).
You can assign to a register (reg data type) the value of a net (wire), a constant, another register, or a specific value.
Example - "begin-end"
module initial_begin_end();
reg clk,reset,enable,data;
initial begin
#1 clk = 0;
#10 reset = 0;
#5 enable = 0;
#3 data = 0;
end
endmodule
begin-end: clk gets 0 after 1 time unit, reset after 11 time units, enable after 16, and data after 19 time units. All the statements are executed sequentially, so the delays accumulate.
Example - "fork-join"

module initial_fork_join();
reg clk,reset,enable,data;
initial fork
  #1 clk = 0;
  #10 reset = 0;
  #5 enable = 0;
  #3 data = 0;
join
endmodule

fork-join: all four statements start in parallel at time 0, so clk gets 0 at time 1, data at time 3, enable at time 5 and reset at time 10.
1.4.4 Sequential Statement Groups

The begin - end keywords:

Group several statements together.
Cause the statements to be evaluated sequentially (one at a time).
  o Any timing within the sequential group is relative to the previous statement.
  o Delays in the sequence accumulate (each delay is added to the previous delay).
  o The block finishes after the last statement in the block.
1.4.5 Parallel Statement Groups

The fork - join keywords:

Group several statements together.
Cause the statements to be evaluated in parallel (all at the same time).
  o Timing within the parallel group is absolute to the beginning of the group.
  o The block finishes after the last statement completes (the statement with the highest delay; it can be the first statement in the block).
Example Parallel

module parallel();
reg a;
initial
fork
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
  #14 $finish;
join
endmodule
Example

a <= b;
c = #12 0;
c = #13 1;
end
initial begin
d <= #10 0;
d <= #11 1;
d <= #12 0;
d <= #13 1;
end
initial begin
$monitor( " TIME = %t A = %b B = %b C = %b D = %b" ,$time, a, b, c, d );
#50 $finish(1);
end
endmodule
Example- if-else
module if_else();
reg dff;
wire clk,din,reset;
always @ (posedge clk)
if (reset) begin
dff <= 0;
end else begin
dff <= din;
end
endmodule
Example - nested if-else-if

module nested_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
  // If reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  // If the counter is enabled and up count mode is selected
  end else if (enable == 1'b1 && up_en == 1'b1) begin
    counter <= counter + 1'b1;
  // If the counter is enabled and down count mode is selected
  end else if (enable == 1'b1 && down_en == 1'b1) begin
    counter <= counter - 1'b1;
  // If counting is disabled
  end else begin
    counter <= counter; // Redundant code
  end
endmodule
Parallel if-else

In the above example, the condition (enable == 1'b1 && up_en == 1'b1) is given the highest priority and the condition (enable == 1'b1 && down_en == 1'b1) the lowest priority. We normally don't include reset checking in the priority, as reset does not fall in the combinational logic input to the flip-flop, as shown in the figure below.
So when we need priority logic, we use nested if-else statements. On the other hand, if we don't want to implement priority logic, knowing that only one input is active at a time (i.e., all inputs are mutually exclusive), then we can write the code as shown below.
It is a known fact that a priority implementation takes more logic to implement than a parallel implementation. So if you know the inputs are mutually exclusive, you can code the logic with parallel ifs.
module parallel_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
  // If reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  end else begin
    // If the counter is enabled and up count mode is selected
    if (enable == 1'b1 && up_en == 1'b1) begin
      counter <= counter + 1'b1;
    end
    // If the counter is enabled and down count mode is selected
    if (enable == 1'b1 && down_en == 1'b1) begin
      counter <= counter - 1'b1;
    end
  end
endmodule
1.4.9 The Case Statement

The case statement compares an expression with a series of cases and executes the statement or statement group associated with the first matching case. The case statement supports single or multiple statements.
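A sketch of the case statement in use (module and signal names are hypothetical), here selecting one of four inputs:

```verilog
// 4-to-1 multiplexer written with a case statement.
module mux4_case (output reg out, input a, b, c, d, input [1:0] sel);
  always @ (a or b or c or d or sel)
    case (sel)
      2'b00: out = a;
      2'b01: out = b;
      2'b10: out = c;
      2'b11: out = d;
      default: out = 1'bx; // unknown select value
    endcase
endmodule
```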
The for loop is the same as the for loop used in any other programming language:

Executes an < initial assignment > once at the start of the loop.
Executes the loop as long as an < expression > evaluates as true.
Executes a < step assignment > at the end of each pass through the loop.

Syntax : for (< initial assignment >; < expression >; < step assignment >) < statement >

Note : Verilog does not have the ++ operator as in the C language.
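A sketch of the for loop in use (module and variable names are hypothetical), initializing a small memory:

```verilog
// Initialize an 8-word memory with a for loop.
module for_sketch;
  reg [7:0] mem [0:7];
  integer i;
  initial
    for (i = 0; i < 8; i = i + 1) // note: i = i + 1, not i++
      mem[i] = 8'h00;
endmodule
```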
1.5 Switch Level Modeling

1.5.1 Verilog provides the ability to design at the MOS-transistor level. However, with the increase in the complexity of circuits, design at this level is growing tough. Verilog provides only digital design capability, with drive strengths associated with the switches; analog capability is still not in the picture. As a matter of fact, the transistors are only used as switches.
MOS switches

// MOS switch keywords
nmos
pmos

The keyword nmos is used to model an NMOS transistor, and pmos is used for PMOS transistors.
CMOS switches
Instantiation of a CMOS switch.
The ncontrol and pcontrol signals are normally complements of each other
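The instantiation referred to above is missing from the source; a sketch (instance and net names are assumptions):

```verilog
// CMOS switch: conducts when ncontrol = 1 and pcontrol = 0.
module cmos_sketch (output out, input data, ncontrol, pcontrol);
  cmos c1 (out, data, ncontrol, pcontrol); // instance name c1 is optional
endmodule
```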
Bidirectional switches

These switches allow signal flow in both directions and are defined by the keywords tran, tranif0 and tranif1.

Instantiation

tran t1 (inout1, inout2); // instance name t1 is optional
tranif0 (inout1, inout2, control); // instance name is not specified
tranif1 (inout1, inout2, control); // instance name is not specified
1.5.2 Delay specification of switches

pmos, nmos, rpmos, rnmos:

Zero (no delay): pmos p1 (out, data, control);
One (same delay for all transitions): pmos #(1) p1 (out, data, control);
Two (rise, fall): nmos #(1,2) n1 (out, data, control);
Three (rise, fall, turnoff): nmos #(1,3,2) n1 (out, data, control);
// internal wires
wire c;

// set up power and ground lines
supply1 pwr; // pwr is connected to Vdd
supply0 gnd; // gnd is connected to Vss (ground)
module stimulus;
reg A, B;
wire OUT;
//Apply stimulus
initial
begin
  //test all possible combinations
     A = 1'b0; B = 1'b0;
  #5 A = 1'b0; B = 1'b1;
  #5 A = 1'b1; B = 1'b0;
  #5 A = 1'b1; B = 1'b1;
end
//check results
initial
  $monitor($time, " OUT = %b A = %b B = %b", OUT, A, B);
endmodule
Write the Verilog description for the RS latch, including delays of 1 unit when instantiating the nor gates. Write the stimulus module for the RS latch using the following table and verify the outputs.
iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below.

The delay specifications for gates b1 and b2 are as follows:

          Min  Typ  Max
Rise       1    2    3
Fall       3    4    5
Turnoff    5    6    7
1.6.2 Behavioral modelling

i) Using a while loop, design a clk generator whose initial value is 0. The time period of the clk is 10.
ii) Using a forever statement, design a clk with time period = 10 and duty cycle = 40%. The initial value of clk is 0.
iii) Using the repeat loop, delay the statement a = a + 1 by 20 positive edges of clk.
iv) Design a negative edge triggered D-FF with synchronous clear, active high (the D-FF clears only at the negative edge of clk when clear is high). Use behavioral statements only. (Hint: the output q of the D-FF must be declared as reg.) Design a clock with a period of 10 units and test the D-FF.
v) Design a 4-to-1 multiplexer using if and else statements.
vi) Design an 8-bit counter using a forever loop, a named block, and disabling of the named block. The counter starts counting at count = 5 and finishes at count = 67. The count is incremented at the positive edge of the clock. The clock has a time period of 10. The counter runs through the loop only once and is then disabled. (Hint: use the disable statement.)
Lesson 22
Introduction to Hardware Description Languages - II
Instructional Objectives
At the end of the lesson the student should be able to
Call a task and a function in a Verilog code and distinguish between them
Plan and write test benches to a Verilog code such that it can be simulated to check the
desired results and also test the source code
Explain what User Defined Primitives are, classify them and use them in code
2.1.1 Task

Tasks are used in all programming languages, generally known as procedures or subroutines. Many lines of code are enclosed within task ... endtask brackets. Data is passed to the task, the processing is done, and the result is returned to the main program. Tasks have to be specifically called, with data in and out, rather than just wired in to the general netlist. Included in the main body of code, they can be called many times, reducing code repetition.
Tasks are defined in the module in which they are used. It is possible to define a task in a separate file and use the compile directive `include to include the task in the file which instantiates it.
Tasks can include timing delays, like posedge, negedge, # delay and wait.
Tasks can have any number of inputs and outputs.
The variables declared within the task are local to that task. The order of declaration within the task defines how the variables passed to the task by the caller are used.
A task can take, drive and source global variables when no local variables are used. When local variables are used, it assigns the output only at the end of task execution.
One task can call another task or function.
Tasks can be used for modeling both combinational and sequential logic.
A task must be specifically called with a statement; it cannot be used within an expression as a function can.
Syntax

A task begins with the keyword task and ends with the keyword endtask.
Inputs and outputs are declared after the keyword task.
Local variables are declared after the input and output declarations.

module simple_task();
task convert;
input [7:0] temp_in;
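The example above is truncated in the source; completed along these lines (the temperature-conversion body is an assumption), it would read:

```verilog
// A task that converts a Celsius temperature to Fahrenheit
// (the conversion body is an assumed example, not from the source).
module simple_task();
  task convert;
    input  [7:0] temp_in;
    output [7:0] temp_out;
    begin
      temp_out = (9 * temp_in) / 5 + 32;
    end
  endtask
endmodule
```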
Example

// Module that contains an automatic (re-entrant) task.
// There are two clocks; clk2 runs at twice the frequency of clk and is synchronous with it.
module top;
reg [15:0] cd_xor, ef_xor; // variables in module top
reg [15:0] c, d, e, f; // variables in module top
task automatic bitwise_xor;
output [15:0] ab_xor; // output from the task
input [15:0] a, b; // inputs to the task
begin
  #10 ab_xor = a ^ b; // delay of 10 time units (the original delay parameter is assumed)
end
endtask
// These two always blocks will call the bitwise_xor task concurrently at each
// positive edge of clk; the concurrent calls work correctly since the task is re-entrant.

2.1.2 Function

Functions are defined in the module in which they are used. It is possible to define a function in a separate file and use the compile directive `include to include the function in the file which instantiates it.
A function cannot include timing delays, like posedge, negedge or # delay. This means that a function should execute in "zero" time delay.
A function can have any number of inputs but only one output.
The variables declared within the function are local to that function. The order of declaration within the function defines how the variables are passed to it by the caller.
A function can take, drive and source global variables when no local variables are used. When local variables are used, it basically assigns the output only at the end of function execution.
Functions can be used for modeling combinational logic.
A function can call other functions, but cannot call a task.
Syntax
A function begins with the keyword function and ends with the keyword endfunction
Constant function

A constant function is a regular Verilog function, used to reference complex values; it can be used instead of constants.
Signed function
These functions allow the use of signed operation on function return values.
module top;
// signed function declaration
// returns a 64-bit signed value
function signed [63:0] compute_signed (input [63:0] vector);
--
--
endfunction

// call to the signed function from a higher module
if (compute_signed(vector) < -3)
begin
--
end
--
endmodule
2.1.3 System tasks and functions

Introduction

There are tasks and functions that are used to generate inputs and check the outputs during simulation. Their names begin with a dollar sign ($). Synthesis tools parse and ignore system tasks and functions, and hence they can be included even in synthesizable models.
$display, $strobe, $monitor

These commands have the same syntax and display text on the screen during simulation. They are much less convenient than waveform display tools like GTKWave or Undertow. $display and $strobe display once every time they are executed, whereas $monitor displays every time one of its parameters changes. The difference between $display and $strobe is that $strobe displays the parameters at the very end of the current simulation time unit, rather than exactly where the change took place. The format string is like that in C/C++ and may contain format characters: %d (decimal), %h (hexadecimal), %b (binary), %c (character), %s (string), %t (time) and %m (hierarchy level). A width can be specified, as in %5d or %5b. The letters b, o or h can be appended to the task names to change the default format to binary, octal or hexadecimal.
Syntax
$display ("format_string", par_1, par_2, ... );
$random

$random generates a random integer every time it is called. If the sequence is to be repeatable, the first time one invokes $random, give it a numerical argument (a seed). Otherwise the seed is derived from the computer clock.
$dumpfile, $dumpvars, $dumpon, $dumpoff, $dumpall

These can dump variable changes to a simulation viewer like Debussy. The dump files are capable of dumping all the variables in a simulation. This is convenient for debugging, but can be very slow.
Syntax

$dumpfile("filename.dmp")
$dumpvars dumps all variables in the design.
$dumpvars(1, top) dumps all the variables in module top, but not in the modules instantiated by top.
$dumpvars(2, top) dumps all the variables in module top and 1 level below.
$dumpvars(n, top) dumps all the variables in module top and n-1 levels below.
$dumpvars(0, top) dumps all the variables in module top and all levels below.
$dumpon initiates the dump.
$dumpoff stops dumping.
$fopen opens an output file and gives the open file a handle for use by the other commands.
$fclose closes the file and lets other programs access it.
$fdisplay and $fwrite write formatted data to a file whenever they are executed. They are the same, except that $fdisplay inserts a new line after every execution and $fwrite does not.
$fstrobe also writes to a file when executed, but it waits until all other operations in the time step are complete before writing. Thus

initial begin
  #1 a = 1; b = 0;
  $fstrobe(handle1, a, b);
  b = 1;
end

will write 1 1 for a and b. $fmonitor writes to a file whenever any one of its arguments changes.

Syntax

handle1 = $fopen("filenam1.suffix")
handle2 = $fopen("filenam2.suffix")
$fstrobe(handle1, format, variable list) // strobe data into filenam1.suffix
$fdisplay(handle2, format, variable list) // write data into filenam2.suffix
$fwrite(handle2, format, variable list) // write data into filenam2.suffix, all on one line; put \n in the format string where a new line is desired
2.2 Writing Testbenches

2.2.1 Testbenches

Testbenches are codes written in HDL to test the design blocks. A testbench is also known as a stimulus, because the coding is such that a stimulus is applied to the designed block and its functionality is tested by checking the results. For writing a testbench it is important to have the design specifications of the "design under test" (DUT). The specifications need to be understood clearly and a test plan made accordingly. The test plan, basically, documents the testbench architecture and the test scenarios (test cases) in detail.
Example Counter

Consider a simple 4-bit up counter, which increments its count whenever enable is high and resets to zero when reset is asserted high. Reset is synchronous with the clock.
The first way is to simply instantiate the design block (DUT) and write the code such that it directly drives the signals in the design block. In this case the stimulus block itself is the top-level block.
In the second style, a dummy module acts as the top-level module and both the design (DUT) and the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to the DUT are defined as reg and the outputs from the DUT are defined as wire. An important point is that there is no port list for the test bench.
An example of the stimulus block is given below.
Note that the initial block below is used to set the various inputs of the DUT to a predefined
logic state.
Another elaborated instance of the testbench is shown below. In this instance the usage of system
tasks has been explored.
module counter_tb;
reg clk, reset, enable;
wire [3:0] count;
counter U0 (
.clk (clk),
.reset (reset),
.enable (enable),
.count (count)
);
initial begin
clk = 0;
reset = 0;
enable = 0;
end
always
#5 clk = !clk;
initial begin
$dumpfile ( "counter.vcd" );
$dumpvars;
end
initial begin
$display( "\t\ttime,\tclk,\treset,\tenable,\tcount" );
$monitor( "%d,\t%b,\t%b,\t%b,\t%d" ,$time, clk,reset,enable,count);
end
initial
#100 $finish;
//Rest of testbench code after this line
endmodule
$dumpfile is used for specifying the file that the simulator will use to store the waveform, which can be viewed later with a waveform viewer. (Please refer to the tools section for freeware versions of viewers.) $dumpvars basically instructs the Verilog compiler to start dumping all the signals to "counter.vcd".
$display is used for printing text or variables to stdout (the screen); \t is for inserting a tab. The syntax is the same as printf. $monitor is a bit different: it keeps track of changes to the variables in its list (clk, reset, enable, count). Whenever any one of them changes, it prints their values, in the respective radix specified.
$finish is used for terminating the simulation after #100 time units (note: all the initial and always blocks start execution at time 0).
Adding the Reset Logic

Once we have the basic logic that lets us see what our testbench is doing, we can next add the reset logic. If we look at the test cases, we see that we had added a constraint that it should be possible to activate reset anytime during simulation. To achieve this we have many approaches, but the following one works quite well. There is something called 'events' in Verilog: events can be triggered, and also monitored to see if an event has occurred.
Let's code our reset logic in such a way that it waits for the trigger event "reset_trigger" to happen. When this event happens, the reset logic asserts reset at the negative edge of the clock and de-asserts it on the next negative edge, as shown in the code below. Also, after de-asserting the reset, the reset logic triggers another event called "reset_done_trigger". This trigger event can then be used somewhere else in the test bench to sync up.
event reset_trigger;
event reset_done_trigger;
initial begin
  forever begin
    @ (reset_trigger);
    @ (negedge clk);
    reset = 1;
    @ (negedge clk);
    reset = 0;
    -> reset_done_trigger;
  end
end
Moving forward, let's add the logic to generate the test cases. We have three test cases, as in the first part of this tutorial. Let's list them again.

Reset Test: Start with reset de-asserted, then assert reset for a few clock ticks and de-assert it. See if the counter sets its output to zero.
Enable Test: Assert/de-assert enable after reset is applied.
Random Test: Randomly assert/de-assert enable and reset.
Adding Compare Logic

To make any testbench self-checking/automated, a model that mimics the DUT in functionality needs to be designed. For the counter defined previously, the model looks similar to:

reg [3:0] count_compare;
always @ (posedge clk)
  if (reset == 1'b1)
    count_compare <= 0;
  else if (enable == 1'b1)
    count_compare <= count_compare + 1;
Once the logic to mimic the DUT functionality has been defined, the next step is to add the checker logic. The checker logic at any given point keeps checking the expected value against the actual value. Whenever there is an error, it prints out the expected and the actual values, and also terminates the simulation by triggering the event terminate_sim. This can be appended to the code above as follows:
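A sketch of such checker logic (the event name terminate_sim comes from the text above; the message format is an assumption):

```verilog
// Compare the DUT output against the reference model on every clock edge.
event terminate_sim;
always @ (posedge clk)
  if (count != count_compare) begin
    $display("DUT ERROR at time %t: expected %d, got %d",
             $time, count_compare, count);
    -> terminate_sim; // signal the testbench to end the simulation
  end
```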
Syntax

A UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be defined outside the main module definition.
This code shows how the input/output ports and the primitive are declared.

primitive udp_syntax (
a, // Port a
b, // Port b
c, // Port c
d // Port d
);
output a;
input b, c, d;
// UDP function code here
endprimitive
Note:
A UDP can contain only one output and up to 10 inputs.
The output port should be the first port, followed by one or more input ports.
All UDP ports are scalar, i.e. vector ports are not allowed.
UDPs cannot have bidirectional ports.

Body
The functionality of a primitive (both combinational and sequential) is described inside a table, which ends with the reserved word endtable (as shown in the code below). For sequential UDPs, an initial statement can be used to assign an initial value to the output.
Note: A UDP cannot use 'z' in the input table; 'x' is used instead.
In combinational UDPs, the output is determined as a function of the current inputs. Whenever an input changes value, the UDP is evaluated and one of the state table rows is matched. The output state is set to the value indicated by that row. Let us consider the previously mentioned UDP.
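The file udp_body.v itself is not reproduced in this excerpt. As an illustration only, a two-input combinational UDP matching the port list used by the testbench below (output a, inputs b and c) could be written as an AND primitive; the behavior chosen here is an assumption, not the original file:

```verilog
// Hypothetical udp_body.v: a 2-input AND user-defined primitive.
primitive udp_body (a, b, c);
output a;
input b, c;
table
// b c : a
   0 ? : 0 ;  // either input 0 forces the output to 0
   ? 0 : 0 ;
   1 1 : 1 ;  // both inputs 1 give 1; unmatched rows yield x
endtable
endprimitive
```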
TestBench to Check the above UDP
`include "udp_body.v"
module udp_body_tb();
reg b, c;
wire a;

udp_body udp (a, b, c);

initial begin
  $monitor(" B = %b C = %b A = %b", b, c, a);
  b = 0;
  c = 0;
  #1 b = 1;
  #1 c = 1;
  #1 b = 1'bx;
  #1 c = 0;
  #1 b = 1;
  #1 c = 1'bx;
  #1 b = 0;
  #10 $finish;
end
endmodule
Sequential UDPs
Sequential UDPs differ from combinational UDPs in the following manner: the output is declared as a reg to create internal storage, the current state appears as an extra column in the state table, and a single initial statement may assign the output its initial value. The fragment below (the declarations and table of a level-sensitive latch with clear) illustrates this:
// declarations
output q;
reg q; // q declared as reg to create internal storage
input d, clock, clear;

// sequential UDP initialization
// only one initial statement allowed
initial
  q = 0; // initialize output to value 0

// state table
table
// d  clock  clear : q : q+ ;   (q+ is the new output value)
   ?    ?      1   : ? : 0 ; // clear condition
   1    1      0   : ? : 1 ; // latch q = data = 1
   0    1      0   : ? : 0 ; // latch q = data = 0
   ?    0      0   : ? : - ; // retain original state if clock = 0
endtable
endprimitive
Edge-sensitive UDPs
// state table
table
// d  clock  clear : q : q+ ;
   ?    ?      1   : ? : 0 ; // output = 0 if clear = 1
endtable
endprimitive
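Only the clear row of the edge-sensitive table survives in this excerpt. For illustration, a complete negative-edge-triggered D flip-flop with active-high clear, showing the (10)/(0?) edge specifiers that distinguish edge-sensitive UDPs, might look like the following sketch:

```verilog
// Sketch: negative-edge-triggered D flip-flop UDP with active-high clear.
primitive edge_dff (q, d, clock, clear);
output q;
reg q;
input d, clock, clear;
initial q = 0;
table
//  d   clock  clear : q : q+ ;
    ?     ?      1   : ? : 0 ; // clear overrides everything
    1   (10)     0   : ? : 1 ; // latch d = 1 on falling clock edge
    0   (10)     0   : ? : 0 ; // latch d = 0 on falling clock edge
    ?   (0?)     0   : ? : - ; // ignore rising/other clock edges
  (??)    ?      0   : ? : - ; // ignore data changes while clock is steady
endtable
endprimitive
```

Only one edge specifier is allowed per table row, which is why the data-change and clock-edge cases appear as separate rows.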
Some Exercises
ii) Assume that a six-delay specification is to be given for all the path delays, and that all path delays are equal. In the specify block, define the parameters t_01=4, t_10=5, t_0z=7, t_z1=2, t_z0=8. Using the previous DFF, write the six delay specifications for all the paths.
3. UDP
i. Define a positive-edge-triggered D flip-flop with clear as a UDP. The clear signal is active low.
ii. Define a level-sensitive latch with a preset signal. Inputs are d, clock, and preset; the output is q. If clock = 0, then q = d. If clock = 1 or x, then q is unchanged. If preset = 1, then q = 1. If preset = 0, then q is decided by the clock and d signals. If preset = x, then q = x.
iii. Define a negative-edge-triggered JK FF, jk_ff, with asynchronous preset and clear as a UDP. q = 1 when preset = 1, and q = 0 when clear = 1.
The table for the JK FF is as follows:

J  K  qn+1
0  0  qn
0  1  0
1  0  1
1  1  q̄n (toggle)
Module
4
Design of Embedded
Processors
Version 2 EE IIT, Kharagpur 1
Downloaded from www.citystudentsgroup.blogspot.com
Lesson
23
Introduction to Hardware
Description Languages-III
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
3.1.1 Verilog
PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog code. The function invoked in Verilog code is called a system call. Examples of built-in system calls are $display, $stop and $random. PLI allows the user to create custom system calls, something that Verilog syntax alone does not allow. Some applications of PLI are:
Power analysis.
Code coverage tools.
Modifying the Verilog simulation data structure (e.g. for more accurate delays).
Custom output displays.
Co-simulation.
Design debug utilities.
Simulation analysis.
C-model interface to accelerate simulation.
Testbench modeling.
To achieve the above applications of PLI, the C code must have access to the internal data structure of the Verilog simulator. To facilitate this, Verilog PLI provides what are called acc routines, or access routines.
How it Works
Write the functions in C/C++ code.
Compile them to generate a shared library (*.dll on Windows, *.so on UNIX). Simulators like VCS also allow static linking.
Use these functions in the Verilog code (mostly in the Verilog testbench).
Depending on the simulator, pass the C/C++ function details to the simulator during compilation of the Verilog code (this is called linking; refer to the simulator user guide to see how it is done).
Once linked, just run the simulation like any other Verilog simulation.
During execution of the Verilog code, whenever the simulator encounters a user-defined system task (the ones starting with $), execution control is passed to the PLI routine (the C/C++ function).

Example - Hello World
Define a function hello( ) which, when called, will print "Hello World". This example does not use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the simulator manuals must be referred to; each simulator implements its own strategy for linking with the C/C++ functions.
C Code
#include <stdio.h>
void hello () {
    printf("\nHello World\n");
}
Verilog Code
module hello_pli ();
initial begin
$hello;
#10 $finish;
end
endmodule
Write the DUT reference model and checker in C and link them to the Verilog testbench. This requires:
A means of calling the C model whenever there is a change in the input signals (which could be of wire or reg type).
A means of getting the value of the changed signals, or of any other signals in the Verilog code, from inside the C code.
A means of driving a value onto any signal inside the Verilog code from the C code.
Verilog PLI provides a set of routines (functions) that satisfy the above requirements.
3.1.3 PLI Application Specification

This can be well understood in the context of the above counter logic. The objective is to design the PLI function $counter_monitor and check the response of the designed counter using it. This problem can be addressed in the following steps:
Implement the counter logic in C.
Use the acc_vcl_add routine, which monitors a list of signals and, whenever any of the monitored signals changes, calls a user-defined function (called the consumer C routine). The vcl routine has four arguments.
C Code Basics
The desired C function is counter_monitor, which is called from the Verilog testbench. As in any other C code, header files specific to the application are included; here the include file provides the acc routines.
The access routine acc_initialize initializes the environment for access routines and must be called from the C-language application program before any other access routine is invoked. Before exiting a C-language application program that calls access routines, the access routine environment must be closed by calling acc_close at the end of the program.
#include <stdio.h>
#include "acc_user.h"

typedef char *string;
handle clk;
handle reset;
handle enable;
handle dut_count;
int count;

void counter_monitor()
{
  acc_initialize();
  clk = acc_handle_tfarg(1);
  reset = acc_handle_tfarg(2);
  enable = acc_handle_tfarg(3);
  dut_count = acc_handle_tfarg(4);
  acc_vcl_add(clk, counter, null, vcl_verilog_logic);
  acc_close();
}

void counter ()
{
  printf("Clock changed state\n");
}
Handles are used for accessing the Verilog objects. A handle is a predefined data type that is a pointer to a specific object in the design hierarchy. Each handle conveys to the access routines information about a unique instance of an accessible object: the object type, and how and where the data pertaining to it can be obtained. The information about which object a handle refers to is passed from the Verilog code as a parameter to the function $counter_monitor. These parameters can be accessed in the C program with the acc_handle_tfarg( ) routine.
For instance, clk = acc_handle_tfarg(1) makes clk a handle to the first parameter passed. Similarly, all the other handles are assigned. clk can now be added to the list of signals to be monitored using the routine acc_vcl_add(clk, counter, null, vcl_verilog_logic); here clk is the handle and counter is the user function to execute when clk changes.
Verilog Code
Below is the code of a simple testbench for the counter example. If the object being passed is an
instance, then it should be passed inside double quotes. Since here all the objects are nets or
wires, there is no need to pass them inside the double quotes.
module counter_tb();
reg enable;
reg reset;
reg clk_reg;
wire clk;
wire [3:0] count;

initial begin
  clk_reg = 0;
  reset = 0;
  $display("Asserting reset");
  #10 reset = 1;
  #10 reset = 0;
  $display("Asserting Enable");
  #10 enable = 1;
  #20 enable = 0;
  $display("Terminating Simulator");
  #10 $finish;
end

always
  #5 clk_reg = !clk_reg;

assign clk = clk_reg;

initial begin
  $counter_monitor(top.clk, top.reset, top.enable, top.count);
end

counter U (
  .clk (clk),
  .reset (reset),
  .enable (enable),
  .count (count)
);
endmodule
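The counter module itself is not shown in this excerpt. A minimal sketch consistent with the testbench's port names (clk, reset, enable, count) and the compare model shown earlier might be:

```verilog
// Sketch of the 4-bit counter DUT assumed by the testbench above.
module counter (clk, reset, enable, count);
input clk, reset, enable;
output [3:0] count;
reg [3:0] count;

always @ (posedge clk)
  if (reset)            // synchronous, active-high reset
    count <= 4'b0000;
  else if (enable)
    count <= count + 1;
endmodule
```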
Access Routines
Access routines are C programming language routines that provide procedural access to information within Verilog. Access routines perform one of two operations: reading design information or modifying it.
acc_initialize( ): Sets up the environment for the access routines.
acc_close( ): Undoes the actions taken by the function acc_initialize( ).
Utility Routines

Interaction between the Verilog tool and the user's routines is handled by a set of programs supplied with the Verilog toolset. Library functions defined in PLI 1.0 perform a wide variety of operations on the parameters passed to the system call and are used for simulation synchronization or for implementing conditional program breakpoints.
3.2 Verilog and Synthesis
3.2.1 What is logic synthesis?
Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level netlist representation. Logic synthesis uses standard cell libraries, which consist of simple cells (basic logic gates like and, or and nor) and macro cells (such as adders, muxes, memories and flip-flops). The standard cells put together form the technology library. Normally, a technology library is known by its minimum feature size (0.18u, 90nm).
A circuit description is written in a hardware description language (HDL) such as Verilog. Design constraints such as timing, area, testability and power are considered during synthesis. A typical design flow with a large example is given in the last example of this lesson.
3.2.2 Impact of automation on logic synthesis

For large designs, manual conversion of the behavioral description to a gate-level representation is prone to error. Prior to the development of modern sophisticated synthesis tools, designers could never be sure, until after fabrication, whether the design constraints would be met. Moreover, a significant part of the design cycle was consumed in converting the high-level design into its gate-level representation. On account of this, if the gate-level design did not meet the requirements, the turnaround time for redesigning the blocks was also very high. Each designer implemented design blocks differently and there was very little consistency in design cycles; hence, although the individual blocks were optimized, the overall design still contained redundant logic. Moreover, timing, area and power dissipation were fabrication-process specific, and with a change of process the entire design methodology needed to be changed.

Automated logic synthesis has solved these problems. The high-level design is less prone to human error because designs are described at higher levels of abstraction. High-level design is done without much concentration on the constraints; the tool takes care of all the constraints and sees to it that they are met. The designer can go back, redesign and synthesize once again very easily if some aspect is found unaddressed. The turnaround time has also fallen considerably. Automated logic synthesis tools synthesize the design as a whole, and thus an overall design optimization is achieved. Logic synthesis allows technology-independent design: the tools convert the design into gates using cells from the standard cell library provided by the vendor.
Design reuse is possible for technology-independent designs; if the technology changes, the tool is capable of mapping the design accordingly.
force and release: force and release of data types are not supported.
assign and deassign: assign and deassign of reg data types are not supported, but assign on wire data types is supported.
module synthesis_initial (clk, q, d);
input clk, d;
output q;
reg q;

initial begin
  q <= 0;
end

always @ (posedge clk)
begin
  q <= d;
end
endmodule
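Since the initial block above is not synthesizable, a common workaround (shown here as a sketch, not taken from the original text) is to replace the power-on initialization with an explicit reset input:

```verilog
// Sketch: synthesizable alternative using a synchronous reset instead of initial.
module synthesis_reset (clk, reset, d, q);
input clk, reset, d;
output q;
reg q;

always @ (posedge clk)
  if (reset)
    q <= 0;   // takes the place of the initial block
  else
    q <= d;
endmodule
```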
Delays are also non-synthesizable, e.g. a = #10 b; such code is useful only for simulation purposes.
Synthesis tools normally ignore such constructs, assuming there is no #10 in the above statement and treating the code as just a = b.
3.2.3 Constructs and Their Description

Construct Type         Keyword                              Description
ports                  input, inout, output                 Use inout only at the IO level
parameters             parameter                            Makes the design more generic
module definition      module
signals and variables  wire, reg, tri                       Vectors are allowed
instantiation          module instances, primitive          E.g. nand (out, a, b); a bad idea
                       gate instances                       to code RTL this way
functions and tasks    function, task                       Timing constructs are ignored
operators              -                                    Unary minus
                       !                                    Logical negation
procedural             always, if, then, else, case,        initial is not supported
                       casex, casez
procedural blocks      begin, end, named blocks, disable    Disabling of named blocks allowed
data flow              assign                               Delay information is ignored
named blocks           disable                              Disabling of named block supported
Translation
The RTL description is converted by the logic synthesis tool to an optimized, intermediate,
internal representation. It understands the basic primitives and operators in the Verilog RTL
description but overlooks any of the constraints.
Logic optimization
The logic is optimized to remove the redundant logic. It generates the optimized internal
representation.
Technology library
The technology library contains standard library cells which are used during synthesis to replace the behavioral description with actual circuit components; these are the basic building blocks. The physical layout of these cells is done first and then the area is estimated. Finally, modeling techniques are used to estimate their power and timing characteristics.
The library includes the following:
Functionality of the cells
Area of the different cell layouts
Timing information of the various cells
Power information of the various cells
The synthesis tools use these cells to implement the design.
// Library cells for abc_100 technology
VNAND // 2-input nand gate
VAND  // 2-input and gate
VNOR  // 2-input nor gate
VOR   // 2-input or gate
VNOT  // not gate
VBUF  // buffer
Design constraints
Any circuit must satisfy at least three constraints: area, power and timing. Optimization demands a compromise among these three constraints. Apart from these, operating conditions such as temperature also contribute to synthesis complexity.
Logic synthesis
The logic synthesis tool takes in the RTL design and generates an optimized gate-level description with the help of the technology library, while keeping to the design constraints.
Functional verification
Identical stimulus is run with the original RTL and synthesized gate-level description of the
design. The output is compared for matches.
module stimulus;
reg [3:0] A, B;
wire A_GT_B, A_LT_B, A_EQ_B;

// instantiate the magnitude comparator
magnitude_comparator MC (A_GT_B, A_LT_B, A_EQ_B, A, B);

initial
  $monitor($time, " A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b",
           A, B, A_GT_B, A_LT_B, A_EQ_B);

// stimulate the magnitude comparator

endmodule
3.3 Verification
Traditional verification follows the following steps in general.
1. To verify, first a design specification must be set. This requires analysis of architectural
trade-offs and is usually done by simulating various architectural models of the design.
2. Based on this specification a functional test plan is created. This forms the framework for
verification. Based on this plan various test vectors are applied to the DUT (design under
test), written in verilog. Functional test environments are needed to apply these test
vectors.
3. The DUT is then simulated using traditional software simulators.
4. The output is then analyzed and checked against the expected results. This can be done
manually using waveform viewers and debugging tools or else can be done automatically
by verification tools. If the output matches expected results then verification is complete.
5. Optionally, additional steps can be taken to decrease the risk of future design respin.
These include Hardware Acceleration, Hardware Emulation and assertion based
Verification.
Functional verification
When the specifications for a design are ready, a functional test plan is created based on them. This is the fundamental framework of functional verification. Based on this test plan, test vectors are selected and given as input to the design under test (DUT). The DUT is simulated and its output is compared with the desired results. If the observed results match the expected values, the verification is complete.
3.3.3 Semi-formal verification

Semi-formal verification combines the traditional verification flow using test vectors with the power and thoroughness of formal verification.
Semi-formal methods supplement simulation with test vectors.
Embedded assertion checks define the properties targeted by the formal methods.
Embedded assertion checks also define the input constraints.
Semi-formal methods explore a limited space exhaustively from the states reached by simulation, thus maximizing the effect of simulation. The exploration is limited to a certain neighborhood around the states reached by simulation.
3.3.4 Equivalence checking

After logic synthesis and place-and-route tools create, respectively, a gate-level netlist and a physical implementation of the RTL design, it is necessary to check whether their functionalities match the original RTL design. This is where equivalence checking comes in. It is an application of formal verification: it ensures that the gate-level or physical netlist has the same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and gate-level representations is constructed, and it is mathematically proved that their functionalities are the same.
3.4.1 PLI
i) Write a user-defined system task, $count_and_gates, which counts the number of and gate primitives in a module instance. The hierarchical module instance name is the input to the task. Use this task to count the number of and gates in a 4-to-1 multiplexer.
ii) Design a 3-to-8 decoder using a Verilog RTL description. A 3-bit input a[2:0] is provided to the decoder. The output of the decoder is out[7:0]. The output bit indexed by a[2:0] gets the value 1; the other bits are 0. Synthesize the decoder using any technology library available to you. Optimize for the smallest area. Apply identical stimulus to the RTL and gate-level netlist and compare the outputs.
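One possible RTL sketch for such a decoder (one of many acceptable solutions to the exercise):

```verilog
// Sketch: 3-to-8 decoder; the bit selected by a[2:0] is driven to 1.
module decoder3to8 (a, out);
input  [2:0] a;
output [7:0] out;

assign out = 8'b0000_0001 << a;  // shift the single 1 to position a
endmodule
```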
iii) Write the Verilog RTL description for a 4-bit binary counter with a synchronous reset that is active high. (Hint: use an always block with the @(posedge clock) statement.) Synthesize the counter using any technology library available to you. Optimize for the smallest area. Apply identical stimulus to the RTL and gate-level netlist and compare the outputs.
Module
5
Embedded
Communications
Version 2 EE IIT, Kharagpur 1
Lesson
24
Parallel Data
Communication
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to
Explain why a parallel interface is needed in an embedded system
List the names of common parallel bus standards along with their important features
Distinguish between the GPIB and other parallel data communication standards
Describe how data communication takes place between the controller, talker and listener
devices connected via a GPIB interface
Questions

Question (Visual, if any)                                   A      B     C    D     Ans.
Parallel Data Communication is preferred when the
following conditions are satisfied:
  i) the distance between the devices is small              T      T     F    T
  ii) the volume of traffic is small                        F      T     F    T
  iii) the required data rate is high                       T      F     T    T     D
The IEEE 488 standard was originally developed by           Intel  IBM   HP   Sun   C
The devices connected in a GPIB system are classified
into the following number of categories                     1      2     3    4     C
Each device connected in a GPIB system has an n-bit
address, where n =                                          3      5     6    7     B
Parallel Data Communication
Data processed by an embedded processor needs to be conveyed to other components in the system, namely an instrument, a smart actuator, a hard disk or a communication network for onward transmission to a central data warehouse. Similarly, data may have to be fetched from a digital oscilloscope, a CD-ROM drive or a sensor in the field. Typically, when the physical distance between the processor and the other component is small, say within a few meters, and a high volume of data needs to be conveyed in a short time, parallel bus interfaces are used.
In this lesson, we first learn about one of the most popular parallel bus standards, namely the IEEE 488 standard, also known as the GPIB (formerly HPIB). Next we compare and contrast it with other similar standards. Finally, we discuss its future, particularly in view of the recently emerging high-speed serial bus standards such as USB.
Because of its success and proven reliability, in 1973 the HPIB bus became an American
Standard, adopted by the IEEE and renamed as GPIB, for General Purpose Interface Bus. The
standard's number is IEEE488.1.
In parallel, the International Electrotechnical Commission (IEC), responsible for international standardization outside the U.S., approved the standard and called it IEC625.1. Due to the introduction of a new naming scheme for all standards, it was later renamed IEC60625.1.
There was a slight difference between IEEE488.1 and IEC625.1: the IEC625.1 standard used a 25-pin D-SUB connector for the bus, while the IEEE488.1 standard favored a Centronics-like 24-pin connector. Today the 24-pin connector is always used, but adaptors are available in case older instruments are equipped with a 25-pin D-SUB connector.
The '.1' extension of IEEE488.1 / IEC60625.1 indicates that there are several layers of interface standards. In fact, there is a whole family of standards:
IEEE488.1 / IEC60625.1 defines the physical layer of the bus system.
IEEE488.2 / IEC60625.2 is not a revision of the '.1' standard; it extends its functionality. A command language (syntax) is defined and common properties of instruments are specified, so that the same command names result in similar actions. In contrast to the '.1' standard, which defines physical means like cables, timing and so on, the '.2' standard focuses on the instrument model.
An application of IEEE488.2 / IEC60625.2 is IEEE1174, which is currently being adopted. Briefly stated, it translates GPIB functionality to a serial RS232 line, albeit without networking capability. It is intended for low-cost instruments.
Thus GPIB has several versions and makes which reflect the same thing, courtesy of the various developments pertaining to its history.
The BUS actually comprises a 24 Wire Cable with both MALE and FEMALE Connectors at
each of the individual ends to facilitate the connectivity in a daisy-chain network topology.
Standard TTL level signals are assumed for the ACTIVE, INACTIVE and TRANSITION states
both for Control and Communication.
Specified Transfer Rate: 1 Mega Byte per second.
Cable length:
Twenty meters between Controller and one Device or
Two meters between two devices
CLASSIFICATION of Instruments or Devices (as are called in the Standard) connected through
this bus system:
TALKER: Designated to send data to other instruments eg., Tape Readers, Data Recorders,
Digital Voltmeters, Digital Oscilloscopes etc.
LISTENER:Designated to receive data from other instruments or Controllers, eg., Printers,
Display devices, Programmable Power Supplies, Programmable Signal Generators etc.
CONTROLLER: Decision maker for the designation of an instrument either as a TALKER or a LISTENER. Usually this role is carried out by a computer.
All the Talkers, Listeners and the Controller are connected to each other via the following three different SYSTEM BUSES (also see A TYPICAL SEQUENCE of DATA FLOW):
Bidirectional Data bus
Bus Management Lines
Handshake Lines
The eight BI-DIRECTIONAL DATA LINES have the following functionalities. They are used to transfer Data, Addresses, Commands and Status information in the form of bytes.
DATA: Transferred as BYTES, with the reception of each data byte being duly acknowledged.
ADDRESSES: Instruments intended for use on a GPIB usually have some switches which allow selection of the 5-bit address the instrument will assume on the BUS. Addresses are characterized as:
o TALK ADDRESSES
o LISTEN ADDRESSES
CONTROL and COMMAND: BYTES containing information for orienting the devices to perform functions like listen, talk etc. These commands can be referred to as the CONTROL WORDs necessary for establishing efficient communication between the Controller and the other classes of devices.
The various commands are (also see the COMMAND TABLE):
o UNIVERSAL Commands
o UNLISTEN Commands
o UNTALK Commands
o SECONDARY Commands
o Power On: The Controller takes control of the buses and sends out the IFC signal to set all instruments on the bus to a known state.
o The Controller starts performing the desired series of measurements or tests.
o The Controller asserts the ATN line low and starts sending the command address codes to the talkers and the listeners.
o The CONTROL WORD structure:
The Control Words are given in brief in the Command Table:

The Command Table
COMMAND              CONTROL WORD
Ignored              X1111111
Listen Command       X01 + 5 LSBs (actual address)
Talk Command         X10 + 5 LSBs (actual address)
Universal Command    X000 + 4 LSBs (16 commands)
Unlisten Command     X0111111
Untalk Command       X1011111
Secondary Commands   X11 + 5 LSBs (actual address)

Note:
All the Command Control Words are activated only if the ATN line is asserted low; otherwise, they are in a disabled state.
X here represents the don't-care condition.
+ here represents the NEXT indicated number of LSBs.
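As an illustration of the encodings in the Command Table (this decoder is a sketch for clarity, not part of the standard; it assumes ATN is already asserted low and ignores the don't-care X bit):

```verilog
// Sketch: combinational decoder for the GPIB command-byte formats above.
module gpib_cmd_decode (byte_in, listen, talk, universal,
                        unlisten, untalk, secondary);
input  [7:0] byte_in;
output listen, talk, universal, unlisten, untalk, secondary;

wire [6:0] b = byte_in[6:0];                       // bit 7 is the don't-care X

assign unlisten  = (b == 7'b0111111);              // X0111111
assign untalk    = (b == 7'b1011111);              // X1011111
assign listen    = (b[6:5] == 2'b01) && !unlisten; // X01 + 5-bit address
assign talk      = (b[6:5] == 2'b10) && !untalk;   // X10 + 5-bit address
assign universal = (b[6:4] == 3'b000);             // X000 + 4-bit command
assign secondary = (b[6:5] == 2'b11)
                   && (b != 7'b1111111);           // X11 + 5 LSBs, excluding "Ignored"
endmodule
```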
Some information about DAV, NRFD, and NDAC: DAV (Data Valid) is driven by the talker to indicate that valid data has been placed on the data lines, while NRFD (Not Ready For Data) and NDAC (Not Data Accepted) are driven by the listeners to pace the handshake.
o On completion of the data transfer, the talker pulls the EOI line of the management group of signals low to indicate transfer completion.
o Finally, the controller takes control of the data bus and sends Untalk and Unlisten commands to all the talkers and listeners, and continues executing its pre-specified internal instructions.
Other Parallel Bus Standards
The following are some other popular parallel bus standards. They have been designed mainly for a particular type of application, namely interfacing various peripherals within a processor motherboard.
1. ISA (Industry Standard Architecture) Bus. This was primarily designed for the IBM PC (8086/186/286 processor based) and uses a 16-bit data bus. It allows only up to 1024 port addresses. An extension, EISA (Extended ISA), allows up to 32-bit data and addresses.
2. PCI (Peripheral Component Interconnect), PCI-X and PCI Super buses. This is an advanced version of the IBM-PC bus designed for the Pentium range of processors. It has 32-bit/33 MHz and 64-bit/66 MHz versions (64-bit/100 MHz in PCI-X). A current standard, PCI Super, allows up to 800 Mbps on a 64-bit bus. It supports automatic detection of devices via a 64-byte configuration register, which makes it easy to interface plug-and-play devices in a system.
3. IEEE-796 (Multibus): Originally introduced by Intel as a means of connecting multiple processors on the system board, this bus is no longer very popular. It works with 16-bit data and 24-bit address buses.
4. VME Bus (Euro-standard): Introduced for the same purpose as the Intel Multibus, it works with a 24-bit address bus and 8/16/32-bit data buses.
5. SCSI Bus (Small Computer System Interface): This standard was originally designed for use with Apple Macintosh computers and was then popularized by workstation vendors. The main purpose is to interface peripherals like hard disks, CD-ROM drives and similar relatively slow peripherals which use a data rate of less than 100 Mbps. The following varieties of SCSI are currently implemented:
SCSI-1: Uses an 8-bit bus, and supports data rates of 4 Mbps.
SCSI-2: Same as SCSI-1, but uses a 50-pin connector instead of a 25-pin connector, and supports multiple devices. This is what most people mean when they refer to plain SCSI.
Wide SCSI: Uses a wider cable (168 cable lines to 68 pins) to support 16-bit transfers.
Fast SCSI: Uses an 8-bit bus, but doubles the clock rate to support data rates of 10 Mbps.
Fast Wide SCSI: Uses a 16-bit bus and supports data rates of 20 Mbps.
Ultra SCSI: Uses an 8-bit bus, and supports data rates of 20 Mbps.
SCSI-3: Uses a 16-bit bus and supports data rates of 40 Mbps. Also called Ultra Wide SCSI.
Ultra2 SCSI: Uses an 8-bit bus and supports data rates of 40 Mbps.
Wide Ultra2 SCSI: Uses a 16-bit bus and supports data rates of 80 Mbps.
However, for the kind of applications targeted by GPIB, it is now facing very strong competition from the recently introduced high-speed serial bus standards. Currently there are four major candidates for future bus systems in Test & Measurement:
The Universal Serial Bus (USB) is now very popular. The current implementation provides transfer rates of up to 12 Mbit/s. From that viewpoint there is no speed enhancement in comparison to GPIB; in fact, it is a drawback.
USB II is an enhanced USB bus capable of transferring up to 480 Mbit/s. It is backwards compatible with USB. The IEC SC65C Working Group 3 (which also developed the IEC625.1 and IEC625.2 standards) is planning to work on this.
Module 5
Embedded Communications
Version 2 EE IIT, Kharagpur
Downloaded from www.citystudentsgroup.blogspot.com
Lesson 25
Serial Data Communication
Instructional Objectives
After going through this lesson the student would be able to
Distinguish between serial and parallel data communication
Explain why a communication protocol is needed
Distinguish between the RS-232 and other serial communication standards
Describe how serial communication can be used to interconnect two remote computers
using the telephone line
(Review questions)
The number of lines used in two-way (full-duplex) serial data transmission is ____
The digital signals need to be converted to audio tones for transmission through
telephone lines because the bandwidth of these lines is low (True/False)
A DCE transmits its digital output data through the line: TXD / RXD / DTR / DSR
Differential signaling is used to reduce the effect of signal attenuation in the
transmission line (True/False)
DATA COMMUNICATION
SERIAL DATA COMMUNICATION: An overview
PC-PC COMMUNICATION (short) (detail)
ASYNCHRONOUS COMMUNICATION PROTOCOL
RS232: WHAT IS IT?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
  DISADVANTAGE
RS422 AND RS423: WHAT ARE THEY?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
RS485: WHAT IS IT?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
CONNECTORS AND PIN DESCRIPTION
Differences between the various standards at a glance
(home..)
And when many such systems need to share the same information or different information
through the same medium, there arises a need for proper organization (rather, socialization) of
the whole network of systems, so that the whole system works in a cohesive fashion.
Therefore, for proper interaction between the data transmitter (the device needing to
commence data communication) and the data receiver (the system which has to receive the data
sent by a transmitter), there has to be some set of rules (or protocols) which all the interested
parties must obey.
The requirement above finally paves the way for some DATA COMMUNICATION
STANDARDS.
Depending on the requirement of the application, one has to choose the type of communication
strategy. There are basically two major classifications, namely SERIAL and PARALLEL, each
with its variants. The discussion about serial communication will be undertaken in this lesson.
Any data communication standard comprises:
The protocol.
Signal/data/port specifications for the devices or additional electronic circuitry
involved.
What is Serial Communication? (home..)
Serial data communication strategies and standards are used in situations having a limitation on
the number of lines that can be spared for communication. This is the primary mode of transfer
in long-distance communication. But it is also the situation in embedded systems, where various
subsystems share the communication channel and speed is not a very critical issue.
Standards incorporate both the software and hardware aspects of the system, while buses mainly
define the cable characteristics for the same communication type.
Serial data communication is the most common low-level protocol for communicating between
two or more devices. Normally, one device is a computer, while the other device can be a
modem, a printer, another computer, or a scientific instrument such as an oscilloscope or a
function generator.
As the name suggests, the serial port sends and receives bytes of information one bit at a
time, rather than a whole character at once as in the other (parallel) modes of communication.
These bytes are transmitted using either a binary (numerical) format or a text format.
All data communication systems follow some specific set of standards defined for their
communication capabilities, so that the systems are not vendor-specific and the user has the
advantage of selecting the device and interface according to his own choice of make
and range.
The most common serial communication system protocols can be studied under the following
categories: Asynchronous, Synchronous and Bit-Synchronous communication standards.
The Protocol
This protocol allows bits of information to be transmitted between two devices at an
arbitrary point of time.
The protocol defines that the data, more appropriately a character, is sent as frames,
which in turn are collections of bits.
The start of a frame is identified by START bit(s), and STOP bit(s) identify
the end of the data frame. Thus, the START and the STOP bits are part of the frame being
sent or received.
The protocol assumes that both the transmitter and the receiver are configured in the
same way, i.e., follow the same definitions for the start, stop and the actual data bits.
Both devices, namely the transmitter and the receiver, need to communicate at an agreed-
upon data rate (baud rate) such as 19,200 or 115,200 bits per second.
This protocol has been in use for 15 years and is used to connect PC peripherals such as
modems, and the applications include the classic Internet dial-up modem systems.
Asynchronous systems allow a number of variations, including the number of bits in a
character (5, 6, 7 or 8 bits), the number of stop bits used (1, 1.5 or 2) and an optional
parity bit. Today the most common standard has 8-bit characters, with 1 stop bit and no
parity, and this is frequently abbreviated as '8-N-1'. A single 8-bit character, therefore,
consists of 10 bits on the line, i.e., one Start bit, eight Data bits and one Stop bit (as
shown in the figure below).
The most important observation here is that the individual characters are framed (unlike
all the other standards of serial communication) and NO CLOCK data is communicated
between the two ends.
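The 10-bit framing just described can be made concrete with a short sketch (a hypothetical illustration, not part of the lesson; the function names are invented). The data bits of a character are placed on the line least-significant bit first, bracketed by a start bit (0) and a stop bit (1):

```python
def frame_8n1(byte):
    """Frame one character for 8-N-1 asynchronous transmission.

    Returns the 10 line bits: one start bit (0), eight data bits
    sent least-significant-bit first, and one stop bit (1).
    """
    assert 0 <= byte <= 0xFF
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]                     # start + data + stop

def deframe_8n1(bits):
    """Recover the character from a 10-bit 8-N-1 frame."""
    assert len(bits) == 10 and bits[0] == 0 and bits[9] == 1
    byte = 0
    for i, b in enumerate(bits[1:9]):
        byte |= b << i
    return byte

frame = frame_8n1(ord('A'))      # 'A' = 0x41
print(frame)                     # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chr(deframe_8n1(frame)))   # A
```

Because no clock is sent, the receiver relies on the agreed baud rate and on spotting the start-bit edge to know when to sample each of these 10 bits.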
RS-232, RS-422, RS-423 and RS-485 are each a recommended standard (RS-XXX) of the
Electronic Industries Association (EIA) for asynchronous serial communication, and have more
recently been rebranded as EIA-232, EIA-422, EIA-423 and EIA-485, each a
standard published by the Telecommunications Industry Association; both the physical and
electrical characteristics of the interfaces have been detailed in these publications.
It must be mentioned here that, although some of the more advanced standards for serial
communication, like USB and FireWire, are being popularized these days to fill the gap for
high-speed, relatively short-run, heavy-data-handling applications, the above four still
satisfy the needs of the longer-run applications found most often in
industrial settings for plant-wide security and equipment networking.
RS-232, 423, 422 and 485 specify the communication system characteristics of the hardware,
such as voltage levels, terminating resistances, cable lengths, etc. The standards, however, say
nothing about the software protocol or how data is framed, addressed, checked for errors or
interpreted.
THE RS-232 (home..)
This is the original serial port interface standard. It stands for Recommended Standard
Number 232 (or, more appropriately, EIA Recommended Standard 232) and is the oldest and most
popular serial communication standard. It was first introduced in 1962 to help ensure
connectivity and compatibility across manufacturers for simple serial data communications.
Applications (home..)
Peripheral connectivity for PCs (the PC COM port hardware), which can range beyond
modems and printers to many different handheld devices and modern scientific
instruments.
All the various characteristics and definitions pertaining to this standard can be summarized
according to:
The maximum bit transfer rate capability and cable length.
Communication technique: names, electrical characteristics and functions of signals.
The mechanical connections and pin assignments.
The Standard
Signals can be in either an active state or an inactive state. RS-232 is an active-LOW voltage-
driven interface where:
ACTIVE STATE: An active state corresponds to the binary value 1. An active signal state can
also be indicated as logic 1, on, true, or a mark.
INACTIVE STATE: An inactive signal state is stated as logic 0, off, false, or a space.
For data signals, the "true" state occurs when the received signal voltage is more
negative than -3 volts, while the "false" state occurs for voltages more positive than +3
volts.
For control signals, the "true" state occurs when the received signal voltage is more
positive than +3 volts, while the "false" state occurs for voltages more negative than -3
volts.
Transition or Dead Area
The signal voltage region greater than -3.0 V and less than +3.0 V is regarded as the 'dead area'
and allows for the absorption of noise. This same region is considered a transition region, and the
signal state there is undefined.
To bring the signal to the "true" state, the controlling device unasserts (or lowers) the value for
data pins and asserts (or raises) the value for control pins. Conversely, to bring the signal to the
"false" state, the controlling device asserts the value for data pins and unasserts the value for
control pins. The "true" and "false" states for a data signal and for a control signal are as shown
below.
(Figure: received signal voltage versus time, showing the transition/dead region between -3 V and +3 V)
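The voltage thresholds above can be captured in a small helper (a hypothetical sketch, not from the standard's text; it follows the data-signal convention just described, where "true"/mark lies below -3 V):

```python
def rs232_data_state(voltage):
    """Interpret a received RS-232 data-line voltage.

    For data signals the 'true' (mark, logic 1) state lies below -3 V
    and the 'false' (space, logic 0) state lies above +3 V; anything in
    between falls in the transition/dead region and is undefined.
    """
    if voltage < -3.0:
        return "true (mark, logic 1)"
    if voltage > 3.0:
        return "false (space, logic 0)"
    return "undefined (transition region)"

print(rs232_data_state(-9.0))  # true (mark, logic 1)
print(rs232_data_state(1.5))   # undefined (transition region)
```

The 6-volt-wide dead band is what gives the interface its noise absorption: a small disturbance around 0 V never flips the decoded logic state.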
The communication technique
RS-232 is designed for a unidirectional, half-duplex communications mode. That simply
means that a transmitter (driver) is feeding the data to a receiver over a copper line. The
data always follows the direction from driver to receiver over that line. If return
transmission is desired, another driver-receiver pair and separate wires are needed.
In other words, if bi-directional or full-duplex capabilities are needed, two separate
communications paths are required.
(Figure: a driver Tx feeding data to a receiver Rx over a single line)
RS-232 Single-Ended, Unidirectional, Half Duplex
Disadvantage (home..)
Being a single-ended system, it is more susceptible to induced noise, ground loops and ground
shifts (a ground at one end not at the same potential as at the other end of the cable), e.g. in
applications in the proximity of heavy electrical installations and machinery. These
vulnerabilities worsen at very high data rates, and for those applications a different standard,
like RS-422 etc., is required, as explained further below.
RS-422 and RS-423 (EIA Recommended Standard 422 and 423)
These were designed specifically to overcome the distance and speed limitations of RS-232.
Although they are similar to the more advanced RS-232C, they can accommodate higher
baud rates and longer cable lengths, and accommodate multiple receivers.
The Standard (home..)
Maximum Bit Transfer Rate, Signal Voltages and Cable Length
For both of these standards the data lines can be up to 4,000 feet long with a data rate
around 100 kbps.
The maximum data rate is around 10 Mbps for short runs, trading off distance for
speed.
The maximum signal voltage levels are ±6 volts.
The signaling technique of RS-422 and RS-423 is mainly responsible for their
superiority over RS-232 in terms of speed and length of transmission, as explained in the
next subsection.
Communication Technique
The strength of this standard lies in its capability of tolerating the ground voltage differences
between sender and receiver. Ground voltage differences can occur in electrically noisy
environments where heavy electrical machinery is operating.
The criterion here is the differential-data communication technique, also referred to as
balanced-differential signaling. In this, the driver uses two wires over which the signal
is transmitted. However, each wire is driven and floats separate from ground, meaning
neither is grounded, and in this respect this system is different from single-ended
systems. Correspondingly, the receiver has two inputs, each floating above ground and
electrically balanced with the other when no data is being transmitted. Data on the line
causes a desired electrical imbalance, which is recognized and amplified by the receiver.
The common-mode signals, such as electrical noise induced on the lines by
machinery or radio transmissions, are, for the most part, canceled by the receiver. That is
because the induced noise is identical on each wire, and the receiver inverts the signal on
one wire to place it out of phase with the other, causing a subtraction to occur which
results in a zero difference. Thus, noise picked up by the long data lines is eliminated at
the receiver and does not interfere with data transfer. Also, because the line is balanced
and separate from ground, there is no problem associated with ground shifts or ground
loops.
(Figure: a driver Tx feeding several receivers Rx over a balanced two-wire line)
RS-422 Differential Signaling, Unidirectional, Half Duplex, Multi-drop
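The cancellation argument above can be demonstrated numerically. The sketch below is purely illustrative (the ±2 V drive level and function names are assumptions, not taken from the standard): noise coupled identically onto both wires leaves the wire-to-wire difference, and hence the received bit, unchanged:

```python
import random

def transmit_differential(bit):
    """Drive the balanced pair with opposite-polarity levels."""
    level = 2.0 if bit else -2.0
    return level, -level          # (wire A, wire B)

def receive_differential(a, b):
    """The receiver decides only on the difference A - B."""
    return 1 if (a - b) > 0 else 0

# Identical (common-mode) noise induced on both wires cancels out.
for bit in (0, 1, 1, 0):
    a, b = transmit_differential(bit)
    noise = random.uniform(-5.0, 5.0)  # same spike coupled onto both wires
    assert receive_differential(a + noise, b + noise) == bit
```

A single-ended line has no second wire to subtract against, which is exactly why RS-232 is the more noise-prone of the two schemes.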
To avoid any ambiguity in understanding the RS-422 and RS-423 standards, it may be
mentioned that RS-423 is an advanced counterpart of RS-422 which
has been designed to tolerate the ground voltage differences between the sender and the
receiver for the more advanced version of RS-232, that is, RS-232C.
Unlike RS-232, an RS-422 driver can service up to 10 receivers on the same line (bus).
This is often referred to as a half-duplex single-source multi-drop network (not to be
confused with the multi-point networks associated with RS-485); this will be explained
further in conjunction with RS-485.
Like RS-232, however, RS-422 is still half-duplex one-way data communication over a
two-wire line. If bi-directional or full-duplex operation is desired, another set of driver,
receiver(s) and two-wire line is needed. In that case, RS-485 is worth considering.
Applications
This fits well in process control applications in which instructions are sent out to many actuators
or responders, often in electrically noisy environments where heavy electrical machinery is
operating.
RS-485
This is an improved RS-422 with the capability of connecting a number of devices (transceivers)
on one serial bus to form a network.
The Standard
Maximum Bit Transfer Rate, Signal Voltages and Cable Length
Such a network can have a "daisy chain" topology where each device is connected to two
other devices, except for the devices at the ends.
Only one device may drive data onto the bus at a time. The standard does not specify the
rules for deciding who transmits and when on such a network; that is solely left for
the system designer to define.
Variable data rates are available for this standard, but the standard maximum data rate is 10
Mbps. However, some manufacturers do offer up to double the standard rate, i.e. around
20 Mbps, but of course at the expense of cable length.
It can connect up to 32 drivers and receivers in fully differential mode, similar to RS-422.
Communication Technique (home)
EIA Recommended Standard 485 is designed to provide bi-directional half-duplex
multi-point data communications over a single two-wire bus.
Like RS-232 and RS-422, full-duplex operation is possible using a four-wire, two-bus
network, but the RS-485 transceiver ICs must have separate transmit and receive pins to
accomplish this.
RS-485 has the same distance and data rate specifications as RS-422 and uses
differential signaling but, unlike RS-422, allows multiple drivers on the same bus. As
depicted in the figure below, each node on the bus can include both a driver and a receiver,
forming a multi-point star network. The driver at each node remains in a disabled high-
impedance state until called upon to transmit. This is different from drivers made for RS-422,
where there is only one driver and it is always enabled and cannot be disabled.
With automatic repeaters and tri-state drivers the 32-node limit can be greatly exceeded.
In fact, the ANSI-based SCSI-2 and SCSI-3 bus specifications use RS-485 for the
physical (hardware) layer.
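The driver-enable discipline can be modeled in a few lines. This is a toy behavioral model (the class names are invented for illustration), not the electrical behavior of a real transceiver: only one enabled driver may drive the bus at a time, and all other drivers stay in high impedance:

```python
class RS485Node:
    """A transceiver node: its driver stays in high impedance until enabled."""
    def __init__(self, name):
        self.name = name
        self.driver_enabled = False
        self.tx_level = None      # differential level when driving

class RS485Bus:
    """A shared two-wire bus; only one enabled driver may put data on it."""
    def __init__(self):
        self.nodes = []

    def sample(self):
        drivers = [n for n in self.nodes if n.driver_enabled]
        if len(drivers) > 1:
            raise RuntimeError("bus contention: multiple drivers enabled")
        return drivers[0].tx_level if drivers else None  # None = bus idle

bus = RS485Bus()
a, b, c = RS485Node("A"), RS485Node("B"), RS485Node("C")
bus.nodes = [a, b, c]

a.driver_enabled, a.tx_level = True, 1
print(bus.sample())       # 1
a.driver_enabled = False  # back to high impedance before B transmits
b.driver_enabled, b.tx_level = True, 0
print(bus.sample())       # 0
```

The `RuntimeError` branch corresponds to bus contention, which the standard leaves to the system designer's arbitration rules to prevent.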
(Figure: several transceiver nodes, each with a driver D and receiver R with enable lines, sharing the two-wire bus)
RS-485 Differential Signaling, Bi-directional, Half Duplex, Multi-point
Advantages
Among all of the asynchronous standards mentioned above, this standard offers the
maximum data rate.
Apart from that, special hardware for avoiding bus contention, and
a higher receiver input impedance with lower driver load impedances, are its other
assets.
Differences between the various standards at a glance (home..)
Altogether, the important electrical and mechanical characteristics for application purposes may
be classified and summarized according to the table below.
                            RS-232           RS-422/423       RS-485
Signaling Technique         Single-Ended     Differential     Differential
                            (Unbalanced)     (Balanced)       (Balanced)
Drivers and Receivers       1 Driver         1 Driver         32 Drivers
on Bus                      1 Receiver       10 Receivers     32 Receivers
Maximum Cable Length        50 feet          4000 feet        4000 feet
Maximum Data Rate           20 kbps          10 Mbps down     10 Mbps down
                            (original std)   to 100 kbps      to 100 kbps
Minimum Loaded Driver       +/-5.0 V         +/-2.0 V         +/-1.5 V
Output Voltage Levels
Driver Load Impedance       3 to 7 kΩ        100 Ω            54 Ω
Receiver Input Impedance    3 to 7 kΩ        4 kΩ or greater  12 kΩ or greater
(home..)
Primary communication is accomplished using three pins: the Transmit Data (TD) pin, the
Receive Data (RD) pin, and the Ground pin (not shown). Other pins are available for data flow
control. The serial port pins and the signal assignments for a typical asynchronous serial
communication can be shown in the scheme for a 9-pin male connector (DB9) on the DTE as
under:
Serial Port Pin and Signal Assignments (the DB9 male connector)
Pin   Label   Signal Name           Signal Type
4     DTR     Data Terminal Ready   Control
(The RS-232 standard can be referred to for a description of the signals and pin assignments used
for a 25-pin connector.)
Because RS-232 mainly involves connecting a DTE to a DCE, the pin assignments are defined
such that straight-through cabling is used, where pin 1 is connected to pin 1, pin 2 is connected
to pin 2, and so on. A DTE to DCE serial connection using the Transmit Data (TD) pin and the
Receive Data (RD) pin is shown below.
DTE  TD (pin 3) ----------> RD (pin 3)  DCE
DTE  RD (pin 2) <---------- TD (pin 2)  DCE
Connecting two DTEs or two DCEs using a straight serial cable means that the TD pins on each
device are connected to each other, and the RD pins on each device are connected to each other.
Therefore, to connect two like devices, a null modem cable has to be used. As shown below, a
null modem cable crosses the transmit and receive lines in the cable.
DTE  TD (pin 3) ----------> RD (pin 2)  DTE
DTE  RD (pin 2) <---------- TD (pin 3)  DTE
Serial ports consist of two signal types: data signals and control signals. To support these signal
types, as well as the signal ground, the RS-232 standard defines a 25-pin connection. However,
most PCs and UNIX platforms use a 9-pin connection. In fact, only three pins are required for
serial port communications: one for receiving data, one for transmitting data, and one for the
signal ground.
Throughout this discussion the computer is considered a DTE, while peripheral devices such as
modems and printers are considered DCEs. Note that many scientific instruments function as
DTEs.
The term "data set" is synonymous with "modem" or "device," while the term "data terminal" is
synonymous with "computer."
(Detail PC-PC communication.) (home..)
The schematic for a connection between the PC UART port and the modem serial port is as
shown below:
(Figure: the PC UART COM port (DTE) connected to the modem serial port (DCE) through the TxD, RxD, CD, DSR, DTR, RTS and CTS lines)
Note: The serial port pin and signal assignments are with respect to the DTE. For example, data
is transmitted from the TD pin of the DTE to the RD pin of the DCE.
The control signals are used to:
Signal the presence of connected devices
Control the flow of data
The control pins include RTS and CTS, DTR and DSR, CD, and RI.
1. The DTE asserts the RTS pin to instruct the DCE that it is ready to receive data.
2. The DCE asserts the CTS pin indicating that it is clear to send data over the TD pin. If
data can no longer be sent, the CTS pin is unasserted.
3. The data is transmitted to the DTE over the TD pin. If data can no longer be accepted, the
RTS pin is unasserted by the DTE and the data transmission is stopped.
1. The DTE asserts the DTR pin to request that the DCE connect to the communication line.
2. The DCE asserts the DSR pin to indicate it's connected.
3. DCE unasserts the DSR pin when it's disconnected from the communication line.
The DTR and DSR pins were originally designed to provide an alternative method of hardware
handshaking. In practice, however, it is the RTS and CTS pins that are usually used in this way,
and not the DSR and DTR pins. You should refer to your device documentation to determine its
specific pin behavior.
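The three RTS/CTS steps above can be walked through in a small simulation (purely illustrative; the function and its parameters are invented for this sketch and belong to no real serial API):

```python
def rts_cts_transfer(dce_ready, data):
    """Walk through the RTS/CTS exchange described in the steps above.

    'dce_ready' is the number of bytes the DCE can accept before it
    drops CTS; the returned list holds the bytes that actually got through.
    """
    log, received = [], []
    log.append("DTE asserts RTS")            # step 1: request to send
    log.append("DCE asserts CTS")            # step 2: clear to send
    for byte in data:
        if len(received) >= dce_ready:
            log.append("DCE unasserts CTS")  # buffer full, halt the DTE
            break
        received.append(byte)                # step 3: data flows over TD
    log.append("DTE unasserts RTS")          # transfer finished
    return received, log

received, log = rts_cts_transfer(dce_ready=2, data=b"ABC")
print(received)  # [65, 66]
```

Dropping CTS mid-transfer is exactly the mechanism a real DCE uses to throttle a faster DTE.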
For very short distances, devices like the UART (Universal Asynchronous Receiver Transmitter,
e.g. the INS8250 from National Semiconductor Corporation) and the USART (Universal
Synchronous Asynchronous Receiver Transmitter, e.g. the Intel 8251A from Intel Corporation)
incorporate the essential circuitry for handling this serial communication with handshaking.
For long distances, telephone lines (switched lines) are more practically feasible because of
their pre-availability.
REMEDY: Convert the digital signal to audio tones. The device which is used to do this
conversion, and vice versa, is known as a MODEM.
(Figure: two terminal-modem pairs connected over a telephone line; each DTE-DCE pair exchanges the TXD, RXD, RTS, CTS, CD, DTR and DSR signals)
A TYPICAL DIGITAL TRANSMISSION SYSTEM
To start with, it should be mentioned that the signals alongside the arrowheads represent the
minimum number of necessary signals for the execution of a typical communication standard or
a protocol; being elaborated later. These signals occur when the main control terminal wants to
send some control signal to the end device or if the end device wants to send some data, say an
alarm or some process output, to the main controller.
Both the main microcomputer and the end-device or the time-shared device can be referred to as
terminals.
The modem then replies to the terminal by asserting the DSR (data-set ready) signal low. Here
the direction of the arrows is of prime importance and must be kept in mind to fully understand
the whole procedure.
If the terminal actually has some valuable data to convey to the end terminal, it will assert
the RTS (request-to-send) signal low back to the modem and, in turn, the modem will assert the
CD (carrier-detect) signal to the terminal, indicating that it has now established the connection
with the terminal computer.
It may be possible, though, that the modem is not fully ready to transmit the actual data onto the
telephone line; this may be because of buffer saturation or several other reasons. When the
modem is fully ready to send the data along the telephone line, it will assert the CTS (Clear-to-
send) signal back to the terminal.
The terminal then starts sending the serial data to the modem. When the terminal runs out of
data, it asserts the RTS signal low, indicating to the modem that it has no more data to be sent.
The modem in turn unasserts its CTS signal and stops transmitting.
In the same way, the initialization and handshaking processes are executed at the other end.
Therefore, it must be noted here that a very important aspect of data communication is the
definition of the handshaking signals for transferring serial data to and from the modem.
Current loops (home..)
Current loops are a standard which is used widely in process automation. 20 mA loops are
widely used for transmitting serial communication data to programmable process controlling
devices. Another widely used standard is the 4-20 mA current loop, which is used for
transmitting analogue measurement signals between the sensor and the measurement device.
In digital communications the 20 mA current loop is a standard. The transmitters will only
source 20 mA and the receivers will only sink 20 mA. Current loops often use opto-couplers.
Here it is the current which matters and not the voltages.
For measurement purposes a small resistance, say of value 1 kΩ, is connected in series with the
receiver/transmitter and the current meter. The current flowing into the receiver indicates the
scaled data which is actually going into it. The data transmitted through this kind of interface is
usually a standard RS-232 signal, just converted to current pulses. Current on and off the
transmission line depends on how the RS-232 circuit distinguishes between the values of current
and in what way it interprets the logic state thus obtained.
4-20 mA current loop interface is the standard for almost all the process control instruments.
This interface works as follows. The sensor is connected to a process controlling equipment,
which reads the sensor value and supplies a voltage to the loop where the sensor is connected
and reads the amount of current it takes. The typical supply voltage for this arrangement is
around 12-24 Volts through a resistor and the measured output is the voltage drop across that
resistor converted into its current counterpart.
The current loop is designed so that a sensor takes 4 mA current when it is at its minimum value
and 20 mA when it is at its maximum value.
Because the sensor will always pass at least 4 mA current, and there is usually a voltage drop of
many volts over the sensor, many sensor types can be made to be powered from only that loop
current.
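The 4 mA and 20 mA endpoints translate into a simple linear scaling. Here is a hedged sketch (the function names are illustrative, and the 3.8 mA fault threshold is a common industry convention, not something stated in the lesson):

```python
def current_to_value(current_ma, lo, hi):
    """Convert a 4-20 mA loop current to the engineering value it encodes.

    4 mA maps to the sensor minimum 'lo', 20 mA to the maximum 'hi'.
    A current well below 4 mA usually indicates a broken loop or sensor.
    """
    if current_ma < 3.8:           # small margin below the 4 mA live zero
        raise ValueError("loop fault: current below live zero")
    return lo + (current_ma - 4.0) * (hi - lo) / 16.0

def sense_current(loop_voltage, resistor_ohms):
    """Recover the loop current (mA) from the drop across a sense resistor."""
    return 1000.0 * loop_voltage / resistor_ohms

# A 0-100 degC transmitter reporting 12 mA is exactly mid-scale:
print(current_to_value(12.0, 0.0, 100.0))   # 50.0
```

The non-zero "live zero" at 4 mA is what makes both loop-powered sensors and broken-wire detection possible: 0 mA can never be a legitimate reading.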
Lesson 26
Network Communication
Instructional Objectives
After going through this lesson the student would be able to
Describe the need and importance of networking in an embedded system
List the commonly adopted network communication standards and explain their basic
features
Distinguish between the CAN Bus, Field Bus and other network communication
standards for embedded applications
Choose a particular network standard to suit an application
(Review questions)
Ethernet-type networks are not suitable in an embedded system because
(i) these are very slow (ii) these do not provide any guarantee on service times
(iii) these are expensive
Foundation Fieldbus implements the following layers of the OSI protocol:
(i) 2 (ii) 3 (iii) 4 (iv) 7
The I2C Bus has the following features:
(i) two-wire (ii) full-duplex (iii) master-slave
The CAN Bus standard was originally developed for chemical processes (True/False)
Network Communication
The role of networking in present-day data communication hardly needs any elaboration. The
situation is also similar in the case of embedded systems, particularly those which are distributed
over a larger geographical region, the so-called distributed embedded systems. Unfortunately,
the most common network standard, namely the Ethernet, is not suitable for such distributed
systems, especially when there are real-time constraints to be satisfied. This is due to the lack of
any service time guarantee in the Ethernet standard. On the other hand, alternatives like Token
Ring, which do provide a service-time guarantee, are not very suitable because of the
requirement of a ring-type topology, which is not very convenient to implement in an industrial
environment.
The industry therefore proposed a standard called Token-bus (and got it approved as the IEEE
802.4 specification) to cater to such requirements. However, the standard became too complex.
The I2C (Inter-Integrated Circuit) Bus Standard
This standard was introduced by Philips primarily to connect a number of integrated circuits
using a single serial communication link. It uses a two-wire serial protocol: one of the wires
carries the Data while the other carries the Clock. As shown in the figure below, one of the
integrated circuits (IC-1 in this case) is configured as the master while all the others are
configured as slaves. Usually a microprocessor or a microcontroller serves as the master. The
protocol does not limit the number of masters, but only master devices can initiate a data
transfer. Both master and slave devices can act as senders or receivers of data. Normally, all
slave devices go into a high-impedance state while the master maintains logic high.
(Figure: I2C frame format on the Clock and Data lines: Start condition, 7-bit slave address, Read/Write bit, 8-bit data, acknowledge bits asserted by the receiver to indicate successful reception, and a Stop bit asserted by the master)
The original specifications for this standard were quite modest, namely 100 kbps with 7-bit
addressing. The recent specifications have raised the data rate to 3.4 Mbps with 10-bit
addressing.
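The address phase of the frame shown above can be sketched as follows (an illustrative helper, not a real driver API): under 7-bit addressing, the slave address occupies the upper seven bits of the first byte and the Read/Write bit is the least-significant bit:

```python
def i2c_address_byte(slave_address, read):
    """Build the first byte of an I2C transfer: 7-bit address + R/W bit.

    The master sends the 7-bit slave address in the upper bits,
    followed by the Read (1) / Write (0) bit, then waits for the
    addressed slave to acknowledge.
    """
    assert 0 <= slave_address < 0x80, "7-bit addressing"
    return (slave_address << 1) | (1 if read else 0)

# Write to, then read from, a slave at address 0x50:
print(hex(i2c_address_byte(0x50, read=False)))  # 0xa0
print(hex(i2c_address_byte(0x50, read=True)))   # 0xa1
```

Because the address travels on the shared Data wire, every slave sees it; only the one whose address matches pulls the line low to acknowledge.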
The Field Bus
The Fieldbus comprises several versions, of which the PROFI (Process Field) BUS is the
standard for local area networks for integrated communications from the field level to the cell
level. It enables large numbers of field devices to be networked, and carries signals from the
distributed I/Os to the programmable controller, which might be several kilometers distant, in a
matter of milliseconds.
Initiatives such as the Interoperable Systems Project (ISP) from manufacturers under the
leadership of Siemens, Fisher-Rosemount and Yokogawa, and its counterpart, the WorldFIP,
mainly supported by Honeywell, wanted to establish a de-facto Fieldbus standard by introducing
their products onto the market. Both organisations merged into the Fieldbus Foundation (FF). This
foundation strives to get a single world standard worked out. Industrial applications range from
pulp and paper production and wastewater treatment right through to power station technology.
PROFIBUS operations are processed by standard telegrams passing between master and slave
using predefined channels called communication relations. Data is stored as objects which can
be addressed in the object directory via an index. PROFIBUS specifies an RS 485 interface with
a baud rate of 9.6 kbit/s over a cable length of 1200 m and up to 500 kbit/s over a cable length of
200 m. Telegrams consist of communication relations of the target device, the PROFIBUS
partner address as well as the indices of the object to be addressed along with any data. With the
exception of broadcasts, all telegrams are answered with a positive or negative
acknowledgement. This ensures rapid recognition of faulty or non-existent stations.
o m
Transmission technology (Physical Layer) of the PROFIBUS-PA can be c
t.characterized as
follows:
o Digital, synchronous bit data transmission.
o Data rate 31.25 kbit/s.
o Manchester coding.
o Signal transmission and remote power supply with transposed two-wire cabling (screened/unscreened).
o Remote power supply DC voltage 9 V...32 V.
o Signal AC voltage 0.75 Vpp...1 Vpp (send voltage).
o Line and tree topology.
o Up to 1.9 km total cabling.
o Up to 32 members per cable segment.
o Can be expanded with a maximum of four repeaters.
The FOUNDATION fieldbus model is based on the IEC Open Systems Interconnect (OSI) layered communication model.
w
The Physical layer w
w
The fieldbus physical layer is OSI layer 1. Layer 1 receives encoded messages from the upper
layers and converts the messages to physical signals on the fieldbus transmission medium.
Physical layer requirements are defined by the approved IEC 1158-2 and ISA S50.02-1992
Physical Layer Standards. Communications rates supported are 31.25 kbit/s, 1.0 Mbit/s and 2.5
Mbit/s.
The fieldbus physical layer operating at 31.25 kbit/s is intended to replace the 4-20 mA analog
standard currently used to connect field devices to control systems. Like the 4-20 mA standard,
the FOUNDATION fieldbus supports single wire pair operation, bus powered devices, and
intrinsic safety options.
Fieldbus has additional advantages over 4-20 mA because many devices can connect to a single
wire pair resulting in significant savings in wiring costs.
Communication stack
The communications stack comprises OSI Layers 2 and 7. The FOUNDATION fieldbus does
not use the OSI layers 3, 4, 5 and 6 because the functions of these layers are not needed. Instead
of these layers, the Fieldbus Access Sublayer (FAS) is used to map layer 7 directly to layer 2.
Layer 2, the Data Link Layer (DLL), controls transmission of messages onto the fieldbus. The DLL manages access to the fieldbus through a deterministic centralised bus scheduler called the Link Active Scheduler (LAS).
A fieldbus may have multiple Link Masters. If the current LAS fails, one of the Link Masters takes over as the LAS, and operation of the fieldbus continues.
The DLL is a subset of the emerging ISA/IEC DLL standards committee work.
The Fieldbus Message Specification (FMS) is modeled after the OSI layer 7 Application Layer. FMS provides the communications services needed by the User Layer for remote access of data across the fieldbus network.
User Layer

The User Layer is not defined by the OSI model. However, for the first time, the FOUNDATION fieldbus specification defines a complete user layer based on function blocks. Function blocks provide the elements necessary for manufacturers to construct interoperable instruments and controllers.
Device descriptions

Each fieldbus device is described by a device description (DD) written in a special programming language known as the Device Description Language (DDL). The DD can be thought of as a "driver" for the device.
The DD provides all of the information needed for a control system or host to interpret
communications coming from the device, including configuration, and diagnostic information.
Any control system or host can communicate with a device if it "knows" the DD for the device.
The host device uses an interpreter called Device Description Services (DDS) to read the DD for
the device.
New FOUNDATION fieldbus devices can be added to the fieldbus at any time by simply connecting the device to the fieldbus wire, provided the control system or host can read the identification of the fieldbus device, including the DD identifier, over the fieldbus. Once the DD identifier is known, the host reads the DD from a CD-ROM and supplies it to DDS for interpretation.
Version 2 EE IIT, Kharagpur 7
The completion of the technical specifications for an interoperable fieldbus system is a major
milestone in the history of automation. The FOUNDATION fieldbus specification was
developed by a consortium of instrument and control system manufacturers that represent over
90% of the instrumentation and control systems provided to end-users worldwide. The
specifications will allow many manufacturers to deliver a wide range of interoperable fieldbus
devices. These devices will usher in the next major technology transition in process and
manufacturing automation.
Controller Area Network (CAN) is a very reliable, message-oriented serial network that was originally designed for the automotive industry, but has become a sought-after bus in industrial automation as well as in other applications. The CAN bus is primarily used in embedded systems, and is actually a network established among microcontrollers. The main features are a two-wire, half-duplex, high-speed network system mainly suited for high-speed applications using short messages. Its robustness, reliability and compatibility with the design issues of the semiconductor industry are some of the remarkable aspects of CAN technology.
Main Features

CAN can link up to 2032 devices (assuming one node with one identifier) on a single network. But owing to the practical limitations of the hardware (transceivers), it may only link up to 110 nodes (with the 82C250, Philips) on a single network.
It offers a high-speed communication rate of up to 1 Mbit/s, thus facilitating real-time control.
It embodies unique error confinement and error detection features, making it more trustworthy and adaptable in noise-critical environments.
CAN Versions
Originally, Bosch provided the specifications. The modern counterpart is designated as Version 2.0 of this specification, which is divided into two parts:
Version 2.0A or Standard CAN, using 11-bit identifiers.
Version 2.0B or Extended CAN, using 29-bit identifiers.
The main aspect of these versions is the format of the MESSAGE FRAME, the main difference being the IDENTIFIER LENGTH.
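The identifier also determines bus access priority in CAN: contending nodes transmit their identifiers bit by bit, a dominant '0' overrides a recessive '1' on the wired-AND bus, and the frame with the numerically lowest identifier wins arbitration. A rough sketch of that rule (the function name and the simplified loop are this example's own, not part of the specification):

```python
# Sketch of CAN bitwise arbitration: contending nodes send their identifiers
# most significant bit first. A '0' is dominant and a '1' recessive; a node
# that sends a recessive bit but observes a dominant bit on the bus loses
# and withdraws, so the lowest (highest-priority) identifier wins.

def arbitrate(identifiers, bits=11):     # 11 bits for CAN 2.0A, 29 for 2.0B
    contenders = list(identifiers)
    for bit in range(bits - 1, -1, -1):
        # wired-AND bus: the bus level is dominant (0) if any node sends 0
        bus = min((ident >> bit) & 1 for ident in contenders)
        contenders = [i for i in contenders if (i >> bit) & 1 == bus]
    assert len(contenders) == 1          # unique identifiers -> unique winner
    return contenders[0]

print(hex(arbitrate([0x65A, 0x123, 0x7FF])))   # 0x123 -- lowest ID wins
```

Because losing nodes simply fall silent and retry later, arbitration is non-destructive: the winning frame is transmitted without delay.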
CAN Standards
There are two ISO standards for CAN. The two differ in their physical layer descriptions.
ISO 11898 handles high-speed applications up to 1 Mbit/s.
ISO 11519 can go up to an upper limit of 125 kbit/s.
Possesses sophisticated error detection and handling capability.
Has high immunity to electromagnetic interference.
Has a short latency time for high-priority messages.
The total number of nodes is not limited by the protocol itself.
Allows very easy adaptation and entails flexible extension and modification features.
BASIC CAN Controller

The basic topology of the CAN controller is shown in Figure 2 below. The basic controller involves FIFOs for message transfers; it has an enhanced counterpart in the Full-CAN controller, which uses message BUFFERS instead.
THE CAN 2.0B PROTOCOL / MESSAGE FRAME: Idle | Arbitration Field (SOF, 11-bit identifier, RTR) | Control Field (r1, r0, DLC) | Data Field (0-8 bytes) | CRC Field (15 bits, delimiter) | ACK (slot, delimiter) | EOF | Intr | Idle
FIGURE 2: BASIC CAN controller topology — a protocol controller with global status and control registers, a 10-byte transmit buffer, and an acceptance filter feeding a 10-byte receive buffer, connected through bus interfaces to the CAN bus on one side and the host CPU system on the other.
Module
5
Embedded
Communications
Lesson
27
Wireless Communication
Instructional Objectives
After going through this lesson the student would be able to
Describe the benefits and issues in wireless communication
Distinguish between WLAN, WPAN and their different implementations like Ricochet,
HiperLAN, HomeRF and Bluetooth
Choose a particular wireless communication standard to suit an application
Wireless Communication
Third generation wireless technologies are being developed to enable personal, high-speed interactive connectivity to wide area networks (WANs). The IEEE 802.11x wireless technologies find themselves with an increasing presence in corporate and academic office spaces, buildings, and campuses, and are making slow but steady inroads into public areas such as airports and coffee bars. WAN, LAN and PAN technologies enable device connectivity to infrastructure-based services, either through a campus or a corporate backbone intranet.

The other end of the coverage spectrum is occupied by the short-range embedded wireless connectivity technologies that allow devices to communicate with each other directly, without the need for an established infrastructure. At this end of the coverage spectrum, wireless technologies like Ricochet, Bluetooth, etc. offer the benefits of RF-based connectivity: omni-directionality and the elimination of the line-of-sight requirement. The embedded connectivity space resembles a communication bubble that follows people around and empowers them to connect their personal devices with other devices that enter the bubble. Connectivity in this bubble is spontaneous and ephemeral, and can involve several devices of diverse computing capabilities, unlike wireless LAN solutions, which are designed for communication between devices of sufficient computing power and battery capacity.

The table below shows a short comparison of various technologies in the wireless arena.
In this lesson we look at the most commonly adopted variants of the different wireless technologies mentioned above.
WLANs-IEEE 802.11X
This is the most prominent technology standard for WLANs (Wireless Local Area Networks). It comprises a PHY (Physical Layer) and a MAC (Medium Access Control) layer. It allows specific carrier frequencies in the 2.4 GHz range with data rates of 1 or 2 Mbps. Further enhancements to the same technology have led to the modern-day protocol known as 802.11b, which provides a basic data rate of 11 Mbps and a fall-back rate of 5.5 Mbps. All these technologies operate in the internationally available 2.4 GHz ISM band. Both the IEEE 802.11 and 802.11b standards are capable of providing communications between a number of terminals as an ad hoc network using peer-to-peer mode (see figures at the end), as a client/server wireless configuration (see figures at the end), or as a complicated distributed network (see figures at the end). All these networks require wireless cards (PCMCIA — Personal Computer Memory Card International Association — cards) and wireless LAN access points. There are two transmission types for these technologies: Frequency Hopping Spread Spectrum (FHSS) and Direct Sequence Spread Spectrum (DSSS). Whereas FHSS is primarily used for low-power, low-range applications, DSSS is popular with Ethernet-like data rates. In the ad-hoc network mode, as there is no central controller, the wireless access cards use the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) protocol to resolve shared access to the channel. In the client/server configuration, many PCs and laptops, physically close to each other (20 to 500 meters), can be linked to a central hub (known as the access point) that serves as a bridge between them and the wired network. The wireless access cards provide the interface between the PCs and the antenna, while the access point serves as the wireless LAN hub. The access point is typically mounted high up, near the ceiling, and can support 115-250 users for receiving, buffering and transmitting data between the WLAN and the wired network. Access points can be programmed to select one of the hopping sequences, and the PCMCIA cards tune in to the corresponding sequence. The WLAN bridge could also be implemented using line-of-sight directional antennas. Handover and roaming can also be supported across the various access points. Encryption is also supported using the optional shared-key RC4 (Ron's Code 4 or Rivest's Cipher) algorithm.
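The CSMA/CA contention described above can be sketched as a toy simulation. This is a simplified model with invented names (`send`, `channel_busy`) and is not the actual 802.11 DCF procedure, which involves slot times, inter-frame spaces and ACK timeouts:

```python
import random

# Simplified sketch of CSMA/CA: before transmitting, a station senses the
# channel; if it is busy, the station defers and draws a random backoff from
# a contention window that doubles after each deferral (binary exponential
# backoff). Toy model only; slot timing is not simulated.

def send(station, channel_busy, cw_min=16, cw_max=1024, max_attempts=6):
    cw = cw_min
    for attempt in range(max_attempts):
        if not channel_busy():                 # carrier sense: channel idle?
            return attempt                     # transmit; report deferrals made
        random.randrange(cw)                   # draw a random backoff count
        cw = min(2 * cw, cw_max)               # widen the contention window
    raise RuntimeError(f"{station}: channel never became free")

# Toy channel that reports busy for the first two sensing attempts.
state = {"busy_polls": 2}
def channel_busy():
    if state["busy_polls"] > 0:
        state["busy_polls"] -= 1
        return True
    return False

print(send("laptop-1", channel_busy))   # 2 -> transmitted after two deferrals
```

Randomising the backoff is what makes collision *avoidance* work: two stations that deferred to the same busy period are unlikely to retry in the same slot.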
Peer-to-peer wireless mode: handheld devices such as Palm Pilot PDAs communicating directly with one another. Client/server configuration: stations linked through a wireless LAN access point that bridges to the wired network.
Wired distributed network: stations attached to access points that are interconnected through a wired distribution system.
WPANs-802.15X

WPANs (Wireless Personal Area Networks) work as short-range wireless networks. The various WPAN protocols and their interfaces have been and are being standardized by the IEEE 802.15 WG (WPAN Working Group). There are four divisions of this standardization.

1. Under the IEEE 802.15 WPAN/Bluetooth Task Group
This group deals with support and development of applications requiring medium-rate WPANs (e.g. Bluetooth). These WPANs are supposed to handle the technicalities of PDA and cell-phone communications, and also possess the QoS needed for voice applications.

2. Under the IEEE 802.15 Coexistence Task Group
This division deals with developing specifications for the unlicensed ISM band. This standard, also called 802.15.2, is developing recommendations to facilitate the coexistence of WPANs (802.15) and WLANs (802.11), such that applications like Bluetooth and microwave devices can operate flawlessly in the ISM range.
Ricochet

This provides secure mobile access to the desktop from outside an office. The service is provided by Metricom, a commercial Internet Service Provider (ISP), and was primarily offered at airports and in some selected areas. The Ricochet Network is a wide area wireless network system using a spread spectrum packet switching technique and Metricom's patented frequency-hopping, checkerboard architecture. The network operates within the license-free (902-928 MHz) ISM band. A Ricochet wireless microcellular data network (MCDN) is shown in the figure below.
Ricochet wireless microcellular data network: computers with modem radios communicate with wireless access points, which connect through routers and a network interconnection facility to a gateway and a name server.
st
it y
It consists of shoebox sized radio transceivers, also called microcell radios, and are typically
.c
mounted to streetlights or utility poles. The microcells require only a small amount of power
from the streetlight itself with the help of a special adapter. Each micro cell radio employs 162
w
frequency-hopping channels and uses a randomly selected hopping sequence. This allows for a
w
very secure network to all subscribers. Within a 20-sq-mile radius containing about 100
w
microcell radios Richochet installs wired access points (WAPs) to collect and convert RF
packets into a format for transmission through a T1 connection. The Richochet Network has a
backbone called the name server, by checking the subscriber serial number. Data packets
between a Ricochet modem and a micro cell radio may take different routes during
transmissions. They can be routed to another Richochet modem or to one of the Internet
gateways, a telephone system, an X.25 network, and LANs or other corporate intranets, The
telephone system gateway provides telephone modem access (TMA), which can also be used to
connect to online internet services.
Services
Ricochet provides immediate, dependable, and secure connections without the cost and complexities of land-based phone lines, dial-up connections, or cellular modems. Ricochet modem features include 28,800 bps operation and 24-hour access. The Ricochet wireless network is based on frequency-hopping, spread-spectrum packet radio technology, with transmissions randomly hopping every two-fifths of a second over 162 channels.
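The hopping behaviour described above (random hops over 162 channels every two-fifths of a second) can be illustrated with a shared-seed hop-sequence sketch. Metricom's actual hopping sequences are proprietary; the seed-based scheme below is only an illustration of the principle:

```python
import random

# Sketch of frequency-hopping spread spectrum in the Ricochet style:
# transmitter and receiver derive the same pseudo-random hopping sequence
# over 162 channels from a shared seed, retuning every 0.4 s. A listener
# that does not know the seed cannot predict the next channel, which is
# what makes the scheme hard to intercept. (Illustrative model only.)

CHANNELS = 162
DWELL_TIME_S = 0.4      # two-fifths of a second per hop

def hop_sequence(seed, hops):
    rng = random.Random(seed)                  # shared secret seed
    return [rng.randrange(CHANNELS) for _ in range(hops)]

# Both ends derive the same sequence, so they stay in lock-step:
tx = hop_sequence(seed=0xC0FFEE, hops=10)
rx = hop_sequence(seed=0xC0FFEE, hops=10)
assert tx == rx
print(tx)                                      # the same 10 channels at both ends
print(f"one pass over this sequence takes {len(tx) * DWELL_TIME_S:.1f} s")
```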
HomeRF
This technology comes under ad-hoc networking, which spans an area such as an enclosed home, an office building, or a warehouse floor in a workshop. A specification for wireless communications in the home, called the Shared Wireless Access Protocol (SWAP), has been developed. Some common applications targeted are:
access to a public network telephone (isochronous multimedia) and the Internet (data),
entertainment networks (cable television, digital audio and video with IEEE 1394),
transfer and sharing of data and resources (printer, Internet connection, etc.), and
home control and automation.

Advantages of HomeRF

In HomeRF, the same connection can be shared for both voice and data among the devices at the same time. This technology provides a platform for a broad range of interoperable consumer devices for wireless digital communication between PCs and consumer electronic devices anywhere in and around the home.

The Working Group

The working group comprises Compaq Computer Corp., Ericsson Enterprise Networks, Hewlett-Packard Co., IBM, Intel Corp., Microsoft Corp., Motorola Corp. and several others. A typical HomeRF network is shown below.
Architecture-HomeRF: a main PC, with a cable modem and phone connection, wirelessly networked with other PCs, a cell phone, a wireless headset, a pager, a data pad, a television, a handheld communicator, and appliances such as a microwave oven and a fridge.
Typical characteristics

Uses the 2.4 GHz ISM band
Data rate: 2 Mbps and 1 Mbps
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized (infrastructure) and ad-hoc (infrastructure-less) communication
Support for simultaneous voice and data transmissions
Provides six audio connections at 32 kbps with 20 ms latency
Maximum data throughput: 1.2 Mbps
Supports a low-power paging mode
Provides QoS to voice-only devices and best effort for data-only devices.
HiperLAN

"HiperLAN" or "High-Performance LAN" has been designed specifically for an ad-hoc environment.
Topology: packet-oriented
Supports both centralized and ad-hoc communication.
Supports 25 audio connections at 32 kbps with 10 ms latency, a video connection of 2 Mbps with 100 ms latency, and a data rate of 13.4 Mbps. It supports MPEG and other state-of-the-art real-time digital audio and video standards.
HiperLANs are available in two types:
o TYPE 1: This has a distributed MAC with QoS provisions and is based on GMSK (Gaussian minimum shift keying).
o TYPE 2: This has a centralized, scheduled MAC and is based on OFDM.

Objectives of HiperLAN
Provide QoS to build multiservice networks
Provide strong security
Handoff when moving between local area and wide area
Increased throughput
Ease of use, deployment, and maintenance
Affordability and scalability
A typical HiperLAN system is shown in the figure below:
HiperLAN System: multiple access points (APs) connected to a fixed network.
A Bluetooth Connection

Bluetooth also provides a universal bridge to existing data networks and a mechanism to form small private ad hoc groups of connected devices away from fixed network architectures.
Bluetooth wireless communication technologies operate in the 2.4 GHz range. There are certain propositions related to RF communication in the 2.4 GHz spectrum which device developers must follow. This is important for an organized use of the spectrum, because it is globally unlicensed; as such, it is bound by specific regulations put forth by various countries in their respective territories. In the context of wireless communications, the RF spectrum has been divided into 79 channels, where bandwidth is limited to 1 MHz per channel. Frequency-hopping spread-spectrum communication must be incorporated, and proper mechanisms for interference anticipation and removal should also be present. This is essential because the 2.4 GHz spectrum is unlicensed and hence more vulnerable to signal congestion from the increasing number of new users trying to communicate within the bandwidth.
The two different communication topologies of Bluetooth PANs are the piconet and the scatternet. They are described in brief below.
The Piconet

A piconet within a proximity sphere.

The slave names starting with 'A' represent the active slaves, and these are linked to the master with continuous lines, meaning 'ACTIVE'. The slave names starting with 'P' represent the parked slaves; dashed lines connect them to the master, meaning that the connection is not continuous but the devices are still in the piconet, i.e., 'PARKED'. Other slaves, with names starting with 'S', are in STAND-BY; these are actually outside the piconet but inside the proximity sphere.
The Scatternet

A scatternet formed by two piconets, A and B, each with its own master and its active (AS) and parked (PS) slaves.
A scatternet is formed when two or more piconets fall into each other's proximity. More precisely, a scatternet is formed when two or more piconets at least partially overlap in time and space. Within a scatternet, a slave can participate in multiple piconets by establishing connections and synchronizing with different masters in its proximity. A single device may act as master in one piconet and at the same time as slave in another. A practical example of a scatternet is mobile communication, in which devices move frequently in and out of the proximity of other devices. The figure above shows a typical scatternet.
Bluetooth Specifications

Typical Bluetooth specifications have been characterized in the table below.

Bluetooth Core Protocols: the radio layer at the bottom, with the baseband layer above it.
A brief description is as follows.

The Service Discovery Protocol (SDP) provides the means for an application to discover which services are provided by, or available through, a Bluetooth device. It also allows applications to determine the characteristics of those available services. The Logical Link Control and Adaptation Layer Protocol (L2CAP) supports higher-level protocol multiplexing, packet segmentation and reassembly, and the conveying of QoS (Quality of Service) information. The link managers (on either side) use the Link Manager Protocol (LMP) for link setup and control. The baseband and link control layer enables the physical RF link between Bluetooth units forming a piconet. It provides two different packet types, SCO (synchronous connection-oriented) and ACL (asynchronous connectionless), which can be transmitted in a multiplexed manner on the same RF link. Different master/slave pairs of the same piconet can use different link types, and the link type may change arbitrarily during a session. Each link type supports up to sixteen different packet types. Four of these are control packets and are common to both SCO and ACL links. Both link types use a TDD scheme for full-duplex transmissions. The SCO link is symmetric and typically supports time-bounded voice traffic. SCO packets are transmitted over reserved intervals. Once the connection is established, both master and slave units may send SCO packet types, which allow both voice and data transmission, with only the data portion being retransmitted when corrupted.
Operational States

The operational state machine: from STAND-BY a device may enter INQUIRY (with an inquiry response) or PAGE (with master response and slave response sub-states), leading to the CONNECTION state.

State Description

STANDBY: This is the default state and the lowest power-consuming one too. Only the Bluetooth clock operates, in the low-power mode.

INQUIRY: In this state a device seeks out and gets familiar with the identities of other devices in its proximity. The other devices must have their Inquiry Scan state ENABLED if they want to entertain queries from other devices.

PAGE: In this state the master of a piconet invites other devices to join in. To entertain this request, the invitee must have its Page Scan state ENABLED.
A device may bypass the inquiry state if the identity of the device it wants to page is previously known (see the figure above). The figure above also indicates that any member of a piconet, not necessarily the master, may still perform INQUIRY and PAGE operations for additional devices, thus paving the way for a scatternet.
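The STANDBY/INQUIRY/PAGE/CONNECTION transitions described above can be summarized as a small state machine. This sketch is simplified (the real baseband also has scan and response sub-states), and the event names are invented for the example:

```python
# Minimal sketch of the Bluetooth operational state machine described above.
# Simplified: scan and master/slave response sub-states are omitted.

TRANSITIONS = {
    "STANDBY":    {"inquire": "INQUIRY",
                   "page": "PAGE"},            # page directly if identity known
    "INQUIRY":    {"found": "PAGE", "abort": "STANDBY"},
    "PAGE":       {"accepted": "CONNECTION", "abort": "STANDBY"},
    "CONNECTION": {"detach": "STANDBY",
                   "inquire": "INQUIRY",       # a connected member may still
                   "page": "PAGE"},            # inquire/page -> scatternet
}

class BluetoothDevice:
    def __init__(self):
        self.state = "STANDBY"                 # default, lowest-power state

    def event(self, name):
        try:
            self.state = TRANSITIONS[self.state][name]
        except KeyError:
            raise ValueError(f"event {name!r} not allowed in {self.state}")
        return self.state

dev = BluetoothDevice()
dev.event("inquire")          # discover identities of devices in proximity
dev.event("found")
dev.event("accepted")
print(dev.state)              # CONNECTION
```

Note that `CONNECTION` still allows `inquire` and `page` events, which is exactly the property that lets a piconet member grow the network into a scatternet.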
Module
6
Embedded System
Software
Lesson
28
Introduction to Real-Time
Systems
... was marked by expensive computers that were quite unaffordable by individuals, and each computer served a large number of users. The PC era saw the emergence ...

It can be easily inferred from the above discussion that in recent times real-time computers
have become ubiquitous and have permeated large number of application areas. At present, the
computers used in real-time applications vastly outnumber the computers that are being used in
conventional applications. According to an estimate [3], 70% of all processors manufactured
world-wide are deployed in real-time embedded applications. While it is already true that an
overwhelming majority of all processors being manufactured are getting deployed in real-time
applications, what is more remarkable is the unmistakable trend of steady rise in the fraction
of all processors manufactured world-wide finding their way to real-time applications.
Some of the reasons attributable to the phenomenal growth in the use of real-time
systems in the recent years are the manifold reductions in the size and the cost of the
computers, coupled with the magical improvements to their performance. The availability
of computers at rapidly falling prices, reduced weight, rapidly shrinking sizes, and their
increasing processing power have together contributed to the present scenario. Applications
which not too far back were considered prohibitively expensive to automate can now be
affordably automated. For instance, when microprocessors cost several tens of thousands of
rupees, they were considered to be too expensive to be put inside a washing machine; but when
they cost only a few hundred rupees, their use makes commercial sense.
The rapid growth of applications deploying real-time technologies has been matched by the evolutionary growth of the underlying technologies supporting the development of real-time systems. In this book, we discuss some of the core technologies used in developing real-time systems. However, we restrict ourselves to software issues only and keep hardware discussions to the bare minimum. The software issues that we address are quite expansive, in the sense that besides the operating system and program development issues, we discuss the networking and database issues.

In this chapter, we restrict ourselves to some introductory and fundamental issues. In the next three chapters, we discuss some core theories underlying the development of practical real-time and embedded systems. In the subsequent chapter, we discuss some important features of commercial real-time operating systems. After that, we shift our attention to real-time communication technologies and databases.
1.1. What is Real-Time?

Real-time is a quantitative notion of time. Real-time is measured using a physical (real) clock. Whenever we quantify time using a physical clock, we deal with real time. An example use of this quantitative notion of time can be observed in a description of an automated chemical plant. Consider this: when the temperature of the chemical reaction chamber attains a certain predetermined temperature, say 250°C, the system automatically switches off the heater within a predetermined time interval, say within 30 milliseconds. In this description of a part of the behavior of a chemical plant, the time value that was referred to denotes the readings of some physical clock present in the plant automation system.
In contrast to real time, logical time (also known as virtual time) deals with a qualitative
notion of time and is expressed using event ordering relations such as before, after, sometimes,
eventually, precedes, succeeds, etc. While dealing with logical time, time readings from a
physical clock are not necessary for ordering the events. As an example, consider the following
part of the behavior of library automation software used to automate the book-keeping activities
of a college library: after a "query book" command is given by the user, details of all matching books are displayed by the software. In this example, the events "issue of query book command" and "display of results" are logically ordered in terms of which event follows the other. But no quantitative expression of time was required. Clearly, this example behavior is
devoid of any real-time considerations. We are now in a position to define what a real-time
system is:
A system is called a real-time system, when we need quantitative expression of time (i.e.
real-time) to describe the behavior of the system.
Remember that in this definition of a real-time system, it is implicit that all quantitative time measurements are carried out using a physical clock. A chemical plant, whose part behavior description is — when the temperature of the reaction chamber attains a certain predetermined temperature value, say 250°C, the system automatically switches off the heater within, say, 30 milliseconds — is clearly a real-time system. Our examples so far were restricted to the description of partial behavior of systems. The complete behavior of a system can be described by listing its response to various external stimuli. It may be noted that all the clauses in the description of the behavior of a real-time system need not involve quantitative measures of time. That is, large parts of a description of the behavior of a system may not have any quantitative expressions of time at all, and it would still qualify as a real-time system. Any system whose behavior can completely be described without using any quantitative expression of time is of course not a real-time system.
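The chemical-plant clause above is exactly where the quantitative notion of time enters: the 30 ms bound is measured on a physical clock. A minimal sketch, with the temperature sensor and heater actuator as illustrative stand-ins:

```python
import time

# Sketch of the chemical-plant example above: the quantitative, physical-clock
# notion of time appears as an explicit deadline. When the chamber reaches
# 250 °C, the heater must be switched off within 30 ms, and that interval is
# measured on a real (monotonic) clock. The sensor/actuator callables are
# illustrative stand-ins, not a real plant interface.

DEADLINE_S = 0.030                     # 30 milliseconds

def control_step(read_temperature, switch_off_heater):
    if read_temperature() >= 250.0:    # threshold condition from the example
        start = time.monotonic()       # physical clock reading
        switch_off_heater()
        elapsed = time.monotonic() - start
        if elapsed > DEADLINE_S:
            raise RuntimeError(f"deadline miss: {elapsed * 1e3:.1f} ms > 30 ms")
        return elapsed
    return None                        # below threshold: nothing to do

elapsed = control_step(lambda: 251.7, lambda: None)
print(f"heater switched off in {elapsed * 1e3:.3f} ms")
```

Contrast this with the library example: the query/display events there are merely ordered, and no clock reading like `time.monotonic()` ever needs to appear.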
1.2. Applications of Real-Time Systems

Real-time systems have, of late, found applications in wide-ranging areas. In the following, we list some of the prominent areas of application of real-time systems and, in each identified case, discuss a few example applications in some detail. As we can imagine, the list would become very vast if we tried to exhaustively list all areas of application of real-time systems. We have therefore restricted our list to only a handful of areas, and out of these we have explained only a few selected applications, to conserve space. We have pointed out the quantitative notions of time used in the discussed applications. The examples we present are important to our subsequent discussions and will be referred to in later chapters whenever required.
1.2.1. Industrial Applications
Industrial applications constitute a major usage area of real-time systems. A few examples of
industrial applications of real-time systems are: process control systems, industrial automation systems, SCADA applications, test and measurement equipment, and robotic equipment.
Chemical plant control systems are essentially a type of process control application. In an
automated chemical plant, a real-time computer periodically monitors plant conditions. The
plant conditions are determined based on current readings of pressure, temperature, and
chemical concentration of the reaction chamber. These parameters are sampled
periodically. Based on the values sampled at any time, the automation system decides on the
corrective actions necessary at that instant to maintain the chemical reaction at a certain rate.
Each time the plant conditions are sampled, the automation system should decide on the
exact instantaneous corrective actions required such as changing the pressure,
temperature, or chemical concentration, and carry out these actions within certain predefined time bounds. Typically, the time bounds in such a chemical plant control application range from a few microseconds to several milliseconds.
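The decision taken at each sampling instant can be sketched in code. This is only an illustration: the function name, setpoint, and tolerance below are invented for the example and are not taken from any particular plant.

```c
/* Hypothetical sketch of one sampling instant in a plant-control loop:
   compare a sampled parameter (e.g. chamber temperature) against its
   setpoint and return the corrective adjustment to apply before the
   next sample is taken. The setpoint and tolerance are illustrative. */
static double corrective_action(double sampled, double setpoint, double tolerance)
{
    double error = sampled - setpoint;
    if (error > tolerance || error < -tolerance)
        return -error;   /* drive the parameter back toward the setpoint */
    return 0.0;          /* within bounds: no action needed this cycle */
}
```

In a real plant the value returned would be applied to the actuators, and the whole call would have to complete within the predefined time bound before the next sample arrives.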
(Figure: engine, paint, and chassis stations along a conveyor belt, leading to the finished car)
Fig. 28.1 Schematic Representation of an Automated Car Assembly Plant
Example 3: Supervisory Control And Data Acquisition (SCADA)
SCADA systems are a category of distributed control systems used in many industries. A
SCADA system helps monitor and control a large number of distributed events of interest. In
SCADA systems, sensors are scattered at various geographic locations to collect raw data
(called events of interest). These data are then processed and stored in a real-time database.
The database models (or reflects) the current state of the environment. The database is
updated frequently to make it a realistic model of the up-to-date state of the environment. An
example of a SCADA application is an Energy Management System (EMS). An EMS helps
to carry out load balancing in an electrical energy distribution network. The EMS senses the
energy consumption at the distribution points and computes the load across different phases
of power supply. It also helps dynamically balance the load. Another example of a SCADA
system is a system that monitors and controls traffic in a computer network. Depending on
the sensed load in different segments of the network, the SCADA system makes the
router change its traffic routing policy dynamically. The time constraint in such a SCADA
application is that the sensors must sense the system state at regular intervals (say every few
milliseconds) and the same must be processed before the next state is sensed.
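The timing constraint just stated can be expressed as a simple feasibility check; the function name and the millisecond figures in the test are invented for illustration.

```c
#include <stdbool.h>

/* Minimal sketch of the SCADA timing constraint: the processing of one
   sensed state must finish before the next state is sensed, i.e. the
   processing time must not exceed one sampling period. */
static bool sampling_feasible(double processing_ms, double period_ms)
{
    return processing_ms <= period_ms;
}
```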
1.2.2. Medical
A few examples of medical applications of real-time systems are: robots, MRI scanners, radiation therapy equipment, bedside monitors, and computerized axial tomography (CAT) scanners.
Robots have become very popular nowadays and are being used in a wide variety of medical applications. An application that we discuss here is a robot used in retrieving displaced radioactive materials. Radioactive materials such as Cobalt and Radium are used for the treatment of cancer. At times during treatment, the radioactive Cobalt (or Radium) gets dislocated and falls down. Since human beings cannot come near a radioactive material, a robot is used to restore it to its proper position. The robot walks into the room containing the radioactive material, picks it up, and restores it to its proper position. The robot has to sense its environment frequently and, based on this information, plan its path. The real-time constraint on the path-planning task of the robot is that unless it plans the path fast enough after an obstacle is detected, it may collide with the obstacle. The time constraints involved here are of the order of a few milliseconds.
Fig. 28.2 A Real-Time System Embedded in an MPFI Car
Cellular systems have become a very popular means of mobile communication. A cellular
system usually maps a city into cells. In each cell, a base station monitors the mobile
handsets present in the cell. Besides, the base station performs several tasks such as
locating a user, sending and receiving control messages to a handset, keeping track of
call details for billing purposes, and hand-off of calls as the mobile moves. Call
hand-off is required when a mobile moves away from a base station. As a mobile moves
away, its received signal strength (RSS) falls at the base station. The base station monitors
this and as soon as the RSS falls below a certain threshold value, it hands-off the
details of the on-going call of the mobile to the base station of the cell to which the mobile
has moved. The hand-off must be completed within a sufficiently small predefined time
interval so that the user does not feel any temporary disruption of service during the hand-off.
Typically call hand-off is required to be achieved within a few milliseconds.
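The hand-off trigger described above can be sketched as follows. The structure, field names, and the RSS threshold used in the example are invented for illustration; a real base station tracks far more state per call.

```c
#include <stdbool.h>

/* Hypothetical sketch of the hand-off decision: the base station
   monitors a mobile's received signal strength (RSS) and initiates
   hand-off once the RSS falls below a threshold. */
typedef struct {
    double rss_dbm;     /* last measured RSS for the mobile */
    bool   handed_off;  /* set once hand-off has been initiated */
} call_state;

static void monitor_rss(call_state *c, double measured_dbm, double threshold_dbm)
{
    c->rss_dbm = measured_dbm;
    if (!c->handed_off && measured_dbm < threshold_dbm)
        c->handed_off = true;  /* transfer call details to the next cell's base station */
}
```

The real-time constraint is on how quickly the hand-off completes after this flag is raised, so that the user perceives no disruption of service.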
1.2.6. Aerospace
A few important uses of real-time systems in aerospace applications are: avionics, flight simulation, airline cabin management systems, satellite tracking systems, and on-board flight computers.
sampled data, the on-board computer computes the X, Y, and Z co-ordinates of the current aircraft position and compares them with the pre-specified track data. Before the next sample values are obtained, it computes the deviation from the specified track values and takes any corrective actions that may be necessary. In this case, the sampling of the various parameters and their processing need to be completed within a few microseconds.
1.2.7. Internet and Multimedia Applications
Important uses of real-time systems in multimedia and Internet applications include: video conferencing and multimedia multicast, and Internet routers and switches.
Example 9: Video Conferencing
In a video conferencing application, video and audio signals are generated by cameras and
microphones respectively. The data are sampled at a certain pre-specified frame rate. These
are then compressed and sent as packets to the receiver over a network. At the receiver-end,
packets are ordered, decompressed, and then played. The time constraint at the receiver-end
is that the receiver must process and play the received frames at a predetermined constant
rate. Thus if thirty frames are to be shown every minute, once a frame play-out is complete,
the next frame must be played within two seconds.
Cell phones are possibly the fastest growing segment of consumer electronics. A cell phone at any point of time carries out a number of tasks simultaneously. These include: converting the electrical signals generated by the microphone into digital form using digital signal processing (DSP) techniques, converting received digital signals back into audible voice output, and sampling incoming base station signals in the control channel. A cell phone responds to the communications received from the base station within certain specified time bounds. For example, a base station might command a cell phone to switch the on-going communication to a specific frequency. The cell phone must comply with such commands from the base station within a few milliseconds.
In a railway reservation system, a central repository maintains the up-to-date data on booking
status of various trains. Ticket booking counters are distributed across different geographic
locations. Customers queue up at different booking counters and submit their reservation
requests. After a reservation request is made at a counter, it normally takes only a few
seconds for the system to confirm the reservation and print the ticket. A real-time constraint
in this application is that once a request is made to the computer, it must print the ticket or
display the seat unavailability message before the average human response time (about 20
seconds) expires, so that the customers do not notice any delay and get a feeling of having
obtained instant results. However, as we discuss a little later (in Section 1.6), this application
is an example of a category of applications that is in some aspects different from the other
discussed applications. For example, even if the results are produced just after 20 seconds,
nothing untoward is going to happen - this may not be the case with the other discussed
applications.
conditioning block, which in turn is connected to the input interface. The output interface, output conditioning unit, and the actuator are interfaced in a complementary manner. In the following, we briefly describe the roles of the different functional blocks of a real-time system.
Sensor: A sensor converts some physical characteristic of its environment into electrical signals. An example of a sensor is a photo-voltaic cell, which converts light energy into electrical energy. A wide variety of temperature and pressure sensors are also used. A temperature sensor typically operates based on the principle of a thermocouple. Temperature sensors based on many other physical principles also exist. For example, one type of temperature sensor employs the principle of variation of electrical resistance with temperature (called a thermistor). A pressure sensor typically operates based on the piezoelectricity principle. Pressure sensors based on other physical principles also exist.
(Figure: a Sensor feeds the Input Conditioning Unit and Input Interface of the Real-Time Computer; the Output Interface and Output Conditioning Unit drive an Actuator; a Human Computer Interface connects the computer to the Operators)
Signal Conditioning Units: The electrical signals produced by a computer can rarely
be used to directly drive an actuator. The computer signals usually need conditioning
before they can be used by the actuator. This is termed output conditioning. Similarly, input
conditioning is required to be carried out on sensor signals before they can be accepted
by the computer. For example, analog signals generated by a photo-voltaic cell are normally
in the millivolt range and need to be conditioned before they can be processed by a
computer. The following are some important types of conditioning carried out on raw
signals generated by sensors and digital signals generated by computers:
4. Signal Mode Conversion: A type of signal mode conversion that is frequently carried out during signal conditioning involves changing direct current into alternating current and vice-versa. Another type of signal mode conversion that is frequently used is conversion of analog signals to a constant-amplitude pulse train such that the pulse rate or pulse width is proportional to the voltage level. Conversion of analog signals to a pulse train is often necessary for input to systems such as transformer-coupled circuits that do not pass direct current.
(Figure: from the processor bus, a D/A register feeds a D/A converter, whose output goes to the output signal conditioning unit)
Fig. 28.4 An Output Interface
Interface Unit: Normally commands from the CPU are delivered to the actuator through an
output interface. An output interface converts the stored voltage into analog form and
then outputs this to the actuator circuitry. This of course would require the value
generated to be written on a register (see Fig. 28.4). In an output interface, in order to
produce an analog output, the CPU selects a data register of the output interface and writes
the necessary data to it. The two main functional blocks of an output interface are shown in
Fig. 28.4. The interface takes care of the buffering and the handshake control aspects. Analog
to digital conversion is frequently deployed in an input interface. Similarly, digital to analog
conversion is frequently used in an output interface.
In the following, we discuss the important steps of analog to digital signal conversion (ADC).
Analog to Digital Conversion: Digital computers cannot process analog signals. Therefore, analog signals need to be converted to digital form. Analog signals can be converted to digital form using circuitry whose block diagram is shown in Fig. 28.7. Using
the block diagram shown in Fig. 28.7, analog signals are normally converted to digital form
through the following two main steps:
Fig. 28.5 Continuous Analog Voltage
Sample the analog signal (shown in Fig. 28.5) at regular intervals. This sampling can be done by a capacitor circuit that stores the voltage levels. The stored voltage levels can be made discrete. After sampling the analog signal (shown in Fig. 28.5), a step waveform as shown in Fig. 28.6 is obtained.
Convert the stored value to a binary number by using an analog to digital converter (ADC) as shown in Fig. 28.7 and store the digital value in a register.
Fig. 28.6 Analog Voltage Converted to Discrete Form
Digital to analog conversion can be carried out through a complementary set of operations. We leave it as an exercise to the reader to figure out the details of the circuitry that can perform the digital to analog conversion (DAC).
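The quantization step of the conversion described above can be sketched in code. The 8-bit resolution and the 0-5 V range used in the example are assumptions chosen for illustration, not values from the text.

```c
/* Illustrative sketch of the quantization step of A/D conversion:
   map a sampled voltage in [v_min, v_max] onto an n-bit binary code.
   Out-of-range samples clamp to the lowest or highest code. */
static unsigned quantize(double sample, double v_min, double v_max, unsigned bits)
{
    unsigned max_code = (1u << bits) - 1u;     /* highest code value */
    if (sample <= v_min) return 0u;
    if (sample >= v_max) return max_code;
    double frac = (sample - v_min) / (v_max - v_min);
    return (unsigned)(frac * max_code + 0.5);  /* round to nearest code */
}
```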
1.4. Characteristics of Real-Time Systems
We now discuss a few key characteristics of real-time systems. These characteristics distinguish real-time systems from non-real-time systems. However, the reader may note that all the discussed characteristics may not be applicable to every real-time system. Real-time systems cover such an enormous range of applications and products that a generalization of the characteristics into a set that is applicable to each and every system is difficult. Different categories of real-time systems may exhibit the characteristics that we identify to different extents, or may not exhibit some of the characteristics at all.
1. Time constraints: Every real-time task is associated with some time constraints. One form of time constraint that is very common is a deadline associated with a task. A task deadline specifies the time before which the task must complete and produce its results. Other types of timing constraints are delay and duration (see Section 1.7). It is the responsibility of the real-time operating system (RTOS) to ensure that all tasks meet their respective time constraints. We shall examine in later chapters how an RTOS can ensure that tasks meet their respective timing constraints through appropriate task-scheduling strategies.
2. New Correctness Criterion: The notion of correctness in real-time systems differs from that used in the context of traditional systems. In real-time systems, correctness implies not only logical correctness of the results, but also that the results are produced on time. A logically correct result produced after the deadline is considered an incorrect result.
discussed in Section 1.2. An example of an embedded system that we shall often refer to is the Multi-Point Fuel Injection (MPFI) system discussed in Example 6 of Sec. 1.2.
4. Safety-Criticality: For traditional non-real-time systems, safety and reliability are independent issues. However, in many real-time systems these two issues are intricately bound together, making them safety-critical. Note that a safe system is one that does not cause any damage even when it fails. A reliable system, on the other hand, is one that can operate for long durations of time without exhibiting any failures. A safety-critical system is required to be highly reliable, since any failure of the system can cause extensive damage. We elaborate on this issue in Section 1.5.
5. Concurrency: A real-time system usually needs to respond to several independent events within very short and strict time bounds. For instance, consider a chemical plant automation system (see Example 1 of Sec. 1.2), which monitors the progress of a chemical reaction and controls the rate of reaction by changing different parameters of the reaction such as pressure, temperature, and chemical concentration. These parameters are sensed using sensors fixed in the chemical reaction chamber. These sensors may generate data asynchronously at different rates. Therefore, the real-time system must process data from all the sensors concurrently, otherwise signals may be lost and the system may malfunction. Such systems can be considered non-deterministic, since the behavior of the system depends on the exact timing of its inputs. A non-deterministic computation is one in which two runs using the same set of input data can produce two distinct sets of output data.
6. Distributed and Feedback Structure: In many real-time systems, the different components
of the system are naturally distributed across widely spread geographic locations. In such
systems, the different events of interest arise at geographically separate locations. These events may often have to be handled locally, with responses produced to them, to prevent overloading of the underlying communication network. Hence, the sensors and
the actuators may be located at the places where events are generated. An example of such a
system is a petroleum refinery plant distributed over a large geographic area. At each data
source, it makes good design sense to locally process the data before being passed on to a
central processor.
Many distributed as well as centralized real-time systems have a feedback structure as shown
in Fig. 28.9. In these systems, the sensors usually sense the environment periodically. The
sensed data about the environment is processed to determine the corrective actions necessary.
The results of the processing are used to carry out the necessary corrective actions on the
environment through the actuators, which in turn again cause a change to the required
characteristics of the controlled environment, and so on.
(Figure: sensors sense the Environment; sensor processing feeds the computation block, whose results drive actuator processing and the Actuator, closing the loop on the Environment)
Fig. 28.9 Feedback Structure of Real-Time Systems
7. Task Criticality: Task criticality is a measure of the cost of failure of a task. Task criticality is determined by examining how critical the results produced by the task are to the proper functioning of the system. A real-time system may have tasks of very different criticalities. It is therefore natural to expect that the criticalities of the different tasks must be taken into consideration while designing for fault-tolerance. The higher the criticality of a task, the more reliable it should be made. Further, in the event of a failure of a highly critical task, immediate failure detection and recovery are important. However, it should be realized that task priority is a different concept, and task criticality does not solely determine the task priority or the order in which various tasks are to be executed (these issues shall be elaborated in later chapters).
in these car engines do not deal with processing frills such as screen-savers or a dozen of
different applications running at the same time. All that the processor in an MPFI system
needs to do is to compute the required fuel injection rate that is most efficient for a given
speed and acceleration.
9. Reactive: Real-time systems are often reactive. A reactive system is one in which an on-going interaction between the computer and the environment is maintained. Ordinary systems compute functions on the input data to generate the output data (see Fig. 28.10(a)). In other words, traditional systems compute the output data as some function of the input data. That is, the output data can mathematically be expressed as: output data = f(input data). For example, if some data I1 is given as the input, the system computes O1 as the result: O1 = f(I1). To elaborate this concept, consider an example involving library automation software. In a library automation software, when the query book function is invoked and Real-Time Systems is entered as the input book name, the software displays Author name: R. Mall, Rack Number: 001, Number of Copies: 1.
(Figure: (a) a Traditional System maps input data to output data; (b) a Reactive System starts from initial parameters and interacts continually with its environment)
Fig. 28.10 Traditional versus Reactive Systems
In contrast to the traditional computation of the output as a simple function of the input data, real-time systems do not produce any output data but enter into an on-going interaction with their environment. In each interaction step, the results computed are used to carry out some actions on the environment. The reaction of the environment is sampled and fed back to the system. Therefore the computations in a real-time system can be considered to be non-terminating. This reactive nature of real-time systems is schematically shown in Fig. 28.10(b).
10. Stability: Under overload conditions, real-time systems need to continue to meet the deadlines of the most critical tasks, though the deadlines of non-critical tasks may not be met. This is in contrast to the requirement of fairness for traditional systems even under overload conditions.
11. Exception Handling: Many real-time systems work round-the-clock and often operate
without human operators. For example, consider a small automated chemical plant that is set
up to work non-stop. When there are no human operators, taking corrective actions on a
failure becomes difficult. Even if no corrective action can be taken immediately, it is desirable
that a failure does not result in catastrophic situations. A failure should be detected and the
system should continue to operate in a gracefully degraded mode rather than shutting off
abruptly.
In real-time systems, on the other hand, safety and reliability are coupled together. Before
A fail-safe state of a system is one which, if entered when the system fails, ensures that no damage results.
To give an example, the fail-safe state of a word processing program is one where the document being processed has been saved onto the disk. All traditional non-real-time systems have one or more fail-safe states, which help separate the issues of safety and reliability: even if a system is known to be unreliable, it can always be made to fail in a fail-safe state, and consequently it would still be considered a safe system.
If no damage can result when a system enters a fail-safe state just before it fails, then through careful transit to a fail-safe state upon a failure, it is possible to turn an extremely unreliable and unsafe system into a safe system. In many traditional systems this technique is in fact frequently adopted to turn an unreliable system into a safe system. For example, consider a traffic light controller that controls the flow of traffic at a road intersection. Suppose the traffic light controller fails frequently and is known to be highly unreliable. Though unreliable, it can still be considered safe if, whenever it fails, it enters a fail-safe state where all the traffic lights are orange and blinking. This is a fail-safe state, since motorists on seeing blinking orange traffic lights become aware that the traffic light controller is not working and proceed with caution. Of course, a fail-safe state would not be to make all lights green, in which case severe accidents could occur. Similarly, all lights turned red is also not a fail-safe state: it may not cause accidents, but it would bring all traffic to a standstill, leading to traffic jams. However, in many real-time systems there are no fail-safe states. Therefore, any failure of the system can cause severe damage. Such systems are said to be safety-critical systems.
can only be ensured through increased reliability. It should now be clear why safety-critical
systems need to be highly reliable.
Just to give an example of the level of reliability required of safety-critical systems, consider the following. For any fly-by-wire aircraft, most of its vital parts are controlled by a computer. Any failure of the controlling computer is clearly not acceptable. The standard reliability requirement for such aircraft is at most 1 failure per 10^9 flying hours (that is, over a hundred thousand years of continuous flying!). We examine how a highly reliable system can be developed in the next section.
Error Detection and Removal: In spite of using the best available error avoidance
techniques, many errors still manage to creep into the code. These errors need to be
detected and removed. This can be achieved to a large extent by conducting thorough
reviews and testing. Once errors are detected, they can be easily fixed.
Built In Self Test (BIST): In BIST, the system periodically performs self tests of
its components. Upon detection of a failure, the system automatically reconfigures itself
by switching out the faulty component and switching in one of the redundant good
components.
Triple Modular Redundancy (TMR): In TMR, as the name suggests, three redundant copies of all critical components are made to run concurrently (see Fig. 28.11). Observe that in Fig. 28.11, C1, C2, and C3 are the redundant copies of the same critical component. The system performs voting on the results produced by the redundant components to select the majority result. TMR can help tolerate the occurrence of only a single failure at any time. (Can you answer why a TMR scheme can effectively tolerate a single component failure only?) An assumption that is implicit in the TMR technique is that at any time only one of the three redundant components can produce erroneous results. The majority result after voting would be erroneous if two or more components were to fail simultaneously (more precisely, before a repair can be carried out). In situations where two or more components are likely to fail (or produce erroneous results), greater amounts of redundancy would be required. A little thinking shows that at least 2n+1 redundant components are required to tolerate simultaneous failures of n components.
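The voter at the heart of TMR can be sketched in a few lines. This is an illustrative sketch only; real voters compare richer results than single integers and must themselves be protected against failure.

```c
/* Illustrative sketch of TMR majority voting: three redundant copies
   compute the same result, and the voter returns the value produced by
   at least two of them. With a single faulty copy, the two good copies
   always form the majority, which is why TMR masks exactly one failure. */
static int tmr_vote(int c1, int c2, int c3)
{
    if (c1 == c2 || c1 == c3)
        return c1;   /* c1 agrees with at least one other copy */
    return c2;       /* otherwise c2 and c3 must agree, or there
                        is no majority at all                   */
}
```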
As compared to hardware, software fault-tolerance is much harder to achieve. To investigate the reason behind this, let us first discuss the techniques currently being used to achieve software fault-tolerance. We do this in the following subsection.
1.6. Software Fault-Tolerance Techniques
Two methods are now popularly used to achieve software fault-tolerance: N-version programming and recovery block techniques. These two techniques are simple adaptations of the basic techniques used to provide hardware fault-tolerance. We discuss these two techniques in the following.
to statistical correlation of failures. Statistical correlation of failures means that even though
individual teams worked in isolation to develop the different versions of a software component,
still the different versions fail for identical reasons. In other words, the different versions of a
component show similar failure patterns. Does this not mean that the different modules developed by independent programmers, after all, contain identical errors? The reason for this is not far to seek: programmers commit errors in those parts of a problem which they perceive to be
difficult - and what is difficult to one team is usually difficult to all teams. So, identical errors
remain in the most complex and least understood parts of a software component.
Recovery Blocks: In the recovery block scheme, the redundant components are called try blocks. Each try block computes the same end result as the others but is intentionally written using a different algorithm compared to the other try blocks. In N-version programming, the different versions of a component are written by different teams of programmers, whereas in the recovery block approach different algorithms are used in different try blocks. Also, in contrast to the N-version programming approach where the redundant copies are run concurrently, in the recovery block approach they are run one after another (as shown in Fig. 28.12). The results produced by a try block are subjected to an acceptance test (see Fig. 28.12). If the test fails, then the next try block is tried. This is repeated in sequence until the result produced by a try block successfully passes the acceptance test. Note that in Fig. 28.12 we have shown acceptance tests separately for the different try blocks to help convey that the tests are applied to the try blocks one after the other, though the same test may in fact be applied to each try block.
(Fig. 28.12: the input is fed to try block TB1; each try block's result is checked by an acceptance test; on failure the next try block, TB2, TB3, TB4, is tried in turn; success yields the result, and exhausting all try blocks raises an exception. Legend: TB = try block)
As was the case with N-version programming, the recovery block approach also does not achieve much success in providing effective fault-tolerance. The reason behind this is again statistical correlation of failures: different try blocks fail for identical reasons, as was explained in the case of the N-version programming approach. Besides, this approach suffers from a further limitation: it can only be used if the task deadlines are much larger than the task computation times (i.e. tasks have large laxity), since the different try blocks are put to execution one after the other when failures occur. The recovery block approach poses special difficulty when used with real-time tasks with very short slack times (i.e. short deadlines and considerable execution times); as the try blocks are tried out one after the other, deadlines may be missed. Therefore, in such cases the later try blocks usually contain only skeletal code.
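The scheme of running try blocks in sequence against an acceptance test can be sketched as follows. Everything here is invented for illustration: the acceptance test simply requires a non-negative result, and the two sample try blocks stand in for a faulty primary algorithm and a working alternate.

```c
#include <stddef.h>

/* Illustrative sketch of the recovery-block scheme: try blocks that
   compute the same result by different algorithms are run one after
   another until one of them passes the acceptance test. */
typedef int (*try_block)(int input);

/* Acceptance test: here, simply require a non-negative result. */
static int acceptable(int result) { return result >= 0; }

/* Returns the first acceptable result, or -1 if every try block fails
   (the point at which a real system would raise an exception). */
static int recovery_block(try_block blocks[], size_t n, int input)
{
    for (size_t i = 0; i < n; i++) {
        int r = blocks[i](input);
        if (acceptable(r))
            return r;   /* this try block passed the acceptance test */
    }
    return -1;
}

/* Two illustrative try blocks: the primary deliberately produces an
   unacceptable result; the alternate succeeds. */
static int primary_try(int x)   { (void)x; return -1; }
static int alternate_try(int x) { return 2 * x; }
```

Note how the sequential structure makes the short-slack-time limitation visible: each failed try block consumes execution time before the next one even starts.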
(Fig. 28.13: progress of computation over time, with periodic checkpoints and rollback recovery to the last checkpoint on failure)
system state is tested each time after some meaningful progress in the computation is made. Immediately after a state-check test succeeds, the state of the system is backed up on stable storage (see Fig. 28.13). In case the next test does not succeed, the system can be made to roll back to the last checkpointed state. After a rollback, a fresh computation can be initiated from the checkpointed state. This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds, such as through data corruption or processor failure.
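A minimal sketch of this checkpoint-and-rollback idea follows. All names are invented, and a single static variable stands in for the stable storage that a real system would use.

```c
/* Illustrative sketch of checkpointing and rollback: after a state
   check succeeds, the state is copied to a checkpoint; when a later
   check fails, the state is rolled back to the last checkpoint and
   computation restarts from there. */
typedef struct { int step; double value; } state_t;

static state_t checkpoint;  /* stands in for stable storage */

static void save_checkpoint(const state_t *s) { checkpoint = *s; }
static void rollback(state_t *s)              { *s = checkpoint; }
```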
1.7. Types of Real-Time Tasks
We have already seen that a real-time task is one for which quantitative expressions of time are needed to describe its behavior. This quantitative expression of time usually appears in the form of a constraint on the time at which the task produces results. The most frequently occurring timing constraint is a deadline constraint, which is used to express that a task is required to compute its results within some deadline. We therefore implicitly assume only deadline-type timing constraints on tasks in this section, though other types of constraints (as explained in Sec. .) may occur in practice. Real-time tasks can be classified into the following three broad categories:
A real-time task can be classified as either a hard, soft, or firm real-time task, depending on the consequences of the task missing its deadline.
It is not necessary that all tasks of a real-time application belong to the same category. It is
possible that different tasks of a real-time system can belong to different categories. We now
elaborate these three types of real-time tasks.
An example of a system having hard real-time tasks is a robot. The robot cyclically carries
out a number of activities including communication with the host system, logging all completed
activities, sensing the environment to detect any obstacles present, tracking the objects of
interest, path planning, effecting the next move, etc. Now consider that the robot suddenly encounters an obstacle. The robot must detect it and as soon as possible try to avoid colliding with it. If it fails to respond quickly (i.e. the concerned tasks are not completed before the required time bound), then it would collide with the obstacle and the robot would be considered to have failed. Therefore detecting an obstacle and reacting to it are hard real-time tasks.
1. Some computer games have hard real-time tasks; these are not safety-critical though. Whenever a timing constraint is not met, the game may fail, but the failure may at best be a mild irritant to the user.
(Figure: utility of the result is 100% up to the deadline and drops abruptly to 0 beyond it)
Fig. 28.14 Utility of Result of a Firm Real-Time Task with Time
Firm real-time tasks typically abound in multimedia applications. The following are two examples of firm real-time tasks:
Video conferencing: In a video conferencing application, video frames and the accompanying audio are converted into packets and transmitted to the receiver over a network. However, some frames may get delayed at different nodes during transit on a packet-switched network due to congestion at different nodes. This may result in varying queuing delays experienced by packets traveling along different routes. Even when packets traverse the same route, some packets can take much more time than the other packets due to the specific transmission strategy used at the nodes. When a certain frame is being played, if some preceding frame arrives at the receiver, then this frame is of no use and is discarded. Due to this reason, when a frame is delayed by more than, say, one second, it is simply discarded at the receiver end without carrying out any processing on it.
Satellite-based tracking of enemy movements: Consider a satellite that takes pictures of an enemy territory and beams them to a ground station computer frame by frame. The ground computer processes each frame to find the positional difference of different objects of interest with respect to their position in the previous frame, to determine the movements of the enemy. When the ground computer is overloaded, a new image may be received even before an older image is taken up for processing. In this case, the older image is not of much use. Hence the older images may be discarded and the recently received image processed.
For firm real-time tasks, the associated time bounds typically range from a few milliseconds to several hundreds of milliseconds.
Fig. 28.15 Utility of the Results Produced by a Soft Real-Time Task as a Function of Time (the utility is 100% up to the deadline and falls off gradually afterwards)
An example of a soft real-time task is web browsing. Normally, after a URL (Uniform Resource Locator) is clicked, the corresponding web page is fetched and displayed within a couple of seconds on the average. However, when it takes several minutes to display a requested page, we still do not consider the system to have failed, but merely say that the performance of the system has degraded.
Another example of a soft real-time task is a task handling a request for a seat reservation in a railway reservation application. Once a request for reservation is made, the response should occur within 20 seconds on the average. The response may either be in the form of a printed ticket or an apology message on account of unavailability of seats. Alternatively, we might state the constraint on the ticketing task as: in at least 95% of reservation requests, the ticket should be processed and printed in less than 20 seconds.
Let us now analyze the impact of the failure of a soft real-time task to meet its deadline, taking the example of the railway reservation task. If the ticket is printed in about 20 seconds, we feel that the system is working fine and get a feel of having obtained instant results. As already stated, missed deadlines of soft real-time tasks do not result in system failures. However, the utility of the results produced by a soft real-time task falls continuously with time after the expiry of the deadline, as shown in Fig. 28.15: the utility of the results is 100% if they are produced before the deadline, and after the deadline is passed the utility slowly falls off with time. For soft real-time tasks that typically occur in practical applications, the time bounds usually range from a fraction of a second to a few seconds.
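The qualitative difference between firm and soft tasks lies entirely in the shape of their utility functions (Figs. 28.14 and 28.15). A minimal Python sketch of the two shapes; the linear decay rate for the soft case is an illustrative assumption, since the text only says the utility falls off gradually:

```python
def firm_utility(response_time, deadline):
    # Fig. 28.14: full utility up to the deadline, zero immediately after.
    return 100.0 if response_time <= deadline else 0.0

def soft_utility(response_time, deadline, decay_per_sec=20.0):
    # Fig. 28.15: full utility up to the deadline, then a gradual fall-off.
    # The decay rate (utility % lost per second of lateness) is a made-up
    # illustrative value, not one taken from the text.
    if response_time <= deadline:
        return 100.0
    return max(0.0, 100.0 - decay_per_sec * (response_time - deadline))

# A result 0.5 s late is worthless to a firm task but still largely
# useful to a soft task (e.g. the railway reservation example).
print(firm_utility(1.5, 1.0))   # 0.0
print(soft_utility(1.5, 1.0))   # 90.0
```

A hard real-time task has no utility curve at all: a single miss is a system failure, so there is nothing to plot past the deadline.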
1.7.4. Non-Real-Time Tasks
A non-real-time task is not associated with any time bounds. Can you think of any example of a non-real-time task? Most of the interactive computations you perform nowadays are handled by soft real-time tasks. However, about two or three decades back, when computers
were not interactive, almost all tasks were non-real-time. A few examples of non-real-time tasks are: batch processing jobs, e-mail, and background tasks such as event loggers. You may
however argue that even these tasks, in the strict sense of the term, do have certain time bounds.
For example, an e-mail is expected to reach its destination at least within a couple of hours of
being sent. Similar is the case with a batch processing job such as pay-slip printing. What then
really is the difference between a non-real-time task and a soft real-time task? For non-real-time
tasks, the associated time bounds are typically of the order of a few minutes, hours or even days.
In contrast, the time bounds associated with soft real-time tasks are at most of the order of a few
seconds.
1.8. Exercises
1. State whether you consider the following statements to be TRUE or FALSE. Justify your
answer in each case.
a. A hard real-time application is made up of only hard real-time tasks.
b. Every safety-critical real-time system has a fail-safe state.
c. A deadline constraint between two stimuli can be considered to be a behavioral
constraint on the environment of the system.
d. Hardware fault-tolerance techniques can easily be adapted to provide software fault-
tolerance.
e. A good algorithm for scheduling hard real-time tasks must try to complete each task in
the shortest time possible.
f. All hard real-time systems are safety-critical in nature.
g. Performance constraints on a real-time system ensure that the environment of the
system is well-behaved.
h. Soft real-time tasks are those which do not have any time bounds associated with them.
i. Minimization of average task response times is the objective of any good hard real-time task-scheduling algorithm.
j. It should be the goal of any good real-time operating system to complete every hard real-time task as far ahead of its deadline as possible.
2. What do you understand by the term real-time? How is the concept of real-time different from the traditional notion of time? Explain your answer using a suitable example.
3. Using a block diagram, show the important hardware components of a real-time system and their interactions. Explain the roles of the different components.
4. In a real-time system, raw sensor signals need to be preprocessed before they can be used by a computer. Why is it necessary to preprocess the raw sensor signals before they can be used by a computer? Explain the different types of preprocessing that are normally carried out on sensor signals to make them suitable to be used directly by a computer.
5. Identify the key differences between hard real-time, soft real-time, and firm real-time systems. Give at least one example of real-time tasks corresponding to these three categories. Identify the timing constraints in your tasks and justify why the tasks should be categorized into the categories you have indicated.
6. Give an example of a soft real-time task and a non-real-time task. Explain the key difference between the characteristics of these two types of tasks.
7. Draw a schematic model showing the important components of a typical hard real-time
system. Explain the working of the input interface using a suitable schematic diagram.
Explain using a suitable circuit diagram how analog-to-digital (ADC) conversion is achieved in an input interface.
8. Explain the checkpointing and rollback recovery scheme to provide fault-tolerant real-time computing. Explain the types of faults it can help tolerate and the faults it cannot tolerate. Explain the situations in which this technique is useful.
9. Answer the following questions concerning fault-tolerance of real-time systems.
a. Explain why hardware fault-tolerance is easier to achieve compared to software fault-
tolerance.
b. Explain the main techniques available to achieve hardware fault-tolerance.
c. What are the main techniques available to achieve software fault-tolerance? What are
the shortcomings of these techniques?
10. What do you understand by the fail-safe state of a system? Safety-critical real-time
systems do not have a fail-safe state. What is the implication of this?
11. Is it possible to have an extremely safe but unreliable system? If your answer is
affirmative, then give an example of such a system. If you answer in the negative, then
justify why it is not possible for such a system to exist.
12. What is a safety-critical system? Give a few practical examples of safety-critical hard real-time systems. Are all hard real-time systems safety-critical? If not, give at least one example of a hard real-time system that is not safety-critical.
13. Explain with the help of a schematic diagram how the recovery block scheme can be used to achieve fault-tolerance of real-time tasks. What are the shortcomings of this scheme? Explain situations where it can be satisfactorily used and situations where it cannot be used.
14. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint into either a performance or a behavioral constraint.
15. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired after 0.1 seconds of its engagement but no later than 1 sec.
16. Represent a washing machine having the following specification by means of an extended state machine diagram. The washing machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water depending upon the setting of the Hot Wash switch. The water filling continues until the high level is sensed. The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the start Drying switch. After the user presses the start Drying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.
17. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities.
a. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 mSec of receipt of the signal.
b. The track record is transmitted to the data processor within 1 mSec after the track
record is determined.
c. A subtask on the data processor correlates the received track record with the track records of other nearby targets to detect any potential collision that might occur within the next 500 mSec.
d. If a collision is anticipated, then the corrective action is determined within 10 mSec by
another subtask running on the data processor.
e. The corrective action is transmitted to the track correction task within 25 mSec.
18. Consider the following (partial) specification of a real-time system:
The velocity of a space-craft must be sampled by a computer on-board the space-craft at
least once every second (the sampling event is denoted by S). After sampling the velocity,
the current position is computed (denoted by event C) within 100msec. Concurrently, the
expected position of the space-craft is retrieved from the database within 200msec
(denoted by event R). Using these data, the deviation from the normal course of the space-
craft must be determined within 100 msec (denoted by event D) and corrective velocity
adjustments must be carried out before a new velocity value is sampled in (the velocity
adjustment event is denoted by A). Calculated positions must be transmitted to the earth
station at least once every minute (position transmission event is denoted by the event T).
Identify the different timing constraints in the system. Classify these into either
performance or behavioral constraints. Construct an EFSM to model the system.
19. Construct the EFSM model of a telephone system whose (partial) behavior is described
below:
After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialed within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialing of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
Module
6
Embedded System
Software
Lesson
29
Real-Time Task
Scheduling Part 1
Task Instance: Each time an event occurs, it triggers the task that handles this event to run.
In other words, a task is generated when some specific event occurs. Real-time tasks therefore
normally recur a large number of times at different instants of time depending on the event
occurrence times. It is possible that real-time tasks recur at random instants. However, most
real-time tasks recur with certain fixed periods. For example, a temperature sensing task in a
chemical plant might recur indefinitely with a certain period because the temperature is sampled
periodically, whereas a task handling a device interrupt might recur at random instants. Each
time a task recurs, it is called an instance of the task. The first time a task occurs, it is
called the first instance of the task. The next occurrence of the task is called its second
instance, and so on. The jth instance of a task Ti would be denoted as Ti(j). Each instance of a real-time task is associated with a deadline by which it needs to complete and produce its results.
Fig. 29.1 Relative and Absolute Deadlines of a Task (Ti(1) arrives at time φ measured from 0; its relative deadline is d, its absolute deadline is φ + d, and Ti(2) arrives at φ + pi)
Relative Deadline versus Absolute Deadline: The absolute deadline of a task is the
absolute time value (counted from time 0) by which the results from the task are
expected. Thus, absolute deadline is equal to the interval of time between the time 0 and the
actual instant at which the deadline occurs as measured by some physical clock. Whereas,
relative deadline is the time interval between the start of the task and the instant at which
deadline occurs. In other words, relative deadline is the time interval between the arrival
of a task and the corresponding deadline. The difference between relative and absolute
deadlines is illustrated in Fig. 29.1. It can be observed from Fig. 29.1 that the relative deadline of the task Ti(1) is d, whereas its absolute deadline is φ + d, where φ is the arrival time of Ti(1) measured from time 0.
Response Time: The response time of a task is the time it takes (as measured from the task
arrival time) for the task to produce its results. As already remarked, task instances get generated
due to occurrence of events. These events may be internal to the system, such as clock interrupts,
or external to the system such as a robot encountering an obstacle.
The response time is the time duration from the occurrence of the event generating the task to
the time the task produces its results.
For hard real-time tasks, as long as all their deadlines are met, there is no special advantage
of completing the tasks early. However, for soft real-time tasks, average response time of tasks
is an important metric to measure the performance of a scheduler. A scheduler for soft real-
time tasks should try to execute the tasks in an order that minimizes the average response
time of tasks.
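The effect of execution order on the average response time metric can be seen in a toy calculation. The sketch below assumes a non-preemptive run of tasks that all arrive at time 0, with hypothetical execution times; it is only an illustration of the metric, not a scheduler from the text:

```python
def avg_response_time(exec_times):
    # Tasks arrive together at t = 0 and run back-to-back, so each task's
    # response time equals the sum of execution times up to and including it.
    finish, total = 0.0, 0.0
    for e in exec_times:
        finish += e
        total += finish
    return total / len(exec_times)

# Running shorter tasks first lowers the average response time,
# which is why a soft real-time scheduler cares about ordering.
assert avg_response_time([1, 3, 5]) < avg_response_time([5, 3, 1])
```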
Task Precedence: A task is said to precede another task, if the first task must complete
before the second task can start. When a task Ti precedes another task Tj, then each instance of
Ti precedes the corresponding instance of Tj. That is, if T1 precedes T2, then T1(1) precedes
T2(1), T1(2) precedes T2(2), and so on. A precedence order defines a partial order among tasks. Recollect from a first course on discrete mathematics that a partial order relation is reflexive, antisymmetric, and transitive. An example partial ordering among tasks is shown in Fig. 29.2. Here T1 precedes T2, but we cannot relate T1 with either T3 or T4. We shall later use the task precedence relation to develop appropriate task scheduling algorithms.
Fig. 29.2 Precedence Relation among Tasks (T1 precedes T2; T3 and T4 are not related to T1)
Data Sharing: Tasks often need to share their results among each other, i.e. one task needs to use the results produced by another task; clearly, the second task must precede the first task.
In fact, precedence relation between two tasks sometimes implies data sharing between the two
tasks (e.g. first task passing some results to the second task). However, this is not always true.
A task may be required to precede another even when there is no data sharing. For
example, in a chemical plant it may be required that the reaction chamber must be filled with
water before chemicals are introduced. In this case, the task handling filling up the reaction
chamber with water must complete, before the task handling introduction of the chemicals
is activated. It is therefore not appropriate to represent data sharing using precedence relation.
Further, data sharing may occur not only when one task precedes the other, but might occur
among truly concurrent tasks, and overlapping tasks. In other words, data sharing among tasks
does not necessarily impose any particular ordering among tasks. Therefore, data sharing relation
among tasks needs to be represented using a different symbol. We shall represent data sharing
among two tasks using a dashed arrow. In the example of data sharing among tasks represented
in Fig. 29.2, T2 uses the results of T3, but T2 and T3 may execute concurrently. T2 may even start executing first; after some time it may receive some data from T3 and continue its execution, and so on.
Periodic Task: A periodic task is one that repeats after a certain fixed time interval. The
precise time instants at which periodic tasks recur are usually demarcated by clock interrupts.
For this reason, periodic tasks are sometimes referred to as clock-driven tasks. The fixed time
interval after which a task repeats is called the period of the task. If Ti is a periodic task, then the time from 0 till the occurrence of the first instance of Ti (i.e. Ti(1)) is denoted by φi, and is called the phase of the task. The second instance (i.e. Ti(2)) occurs at φi + pi. The third instance (i.e. Ti(3)) occurs at φi + 2*pi, and so on. Formally, a periodic task Ti can be represented by a four-tuple (φi, pi, ei, di), where φi is the phase, pi is the period of the task, ei is the worst case execution time of the task, and di is the relative deadline of the task. We shall use this notation extensively in future discussions.
o g
. bl
u p
eir o
s g
= 2000 di ent
u d
st
0 i
t y + pi + 2*pi
.c
w
Fig. 29.3 Track Correction Task (2000mSec; pi; ei; di) of a Rocket
w
w
To illustrate the above notation to represent real-time periodic tasks, let us consider
the track correction task typically found in a rocket control software. Assume the following
characteristics of the track correction task. The track correction task starts 2000 milliseconds after the launch of the rocket, and recurs periodically every 50 milliseconds from then on. Each instance of the task requires a processing time of 8 milliseconds and its relative deadline is 50 milliseconds. Recall that the phase of a task is defined by the occurrence time of the first
instance of the task. Therefore, the phase of this task is 2000 milliseconds. This task can formally be represented as (2000 mSec, 50 mSec, 8 mSec, 50 mSec). This task is pictorially shown in Fig. 29.3. When the deadline of a task equals its period (i.e. pi = di), we can omit the fourth element of the tuple. In this case, we can represent the task as Ti = (2000 mSec, 50 mSec, 8 mSec); this would automatically mean pi = di = 50 mSec. Similarly, when φi = 0, it can be omitted when no confusion arises. So, Ti = (20 mSec, 100 mSec) would indicate a task with φi = 0, pi = 100 mSec, ei = 20 mSec, and di = 100 mSec. Whenever there is any scope for confusion, we shall explicitly write out the parameters, e.g. Ti = (pi = 50 mSec, ei = 8 mSec, di = 40 mSec).
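Under the four-tuple notation, the arrival time and absolute deadline of any instance follow directly from the phase, period, and relative deadline. A minimal sketch, using the rocket's track correction task from the text as the worked example:

```python
def arrival(phase, period, j):
    # The jth instance Ti(j) arrives at phase + (j - 1) * period.
    return phase + (j - 1) * period

def absolute_deadline(phase, period, rel_deadline, j):
    # ...and must complete within its relative deadline of arriving.
    return arrival(phase, period, j) + rel_deadline

# Track correction task: (phase=2000 mSec, p=50 mSec, e=8 mSec, d=50 mSec)
assert arrival(2000, 50, 1) == 2000            # first instance at the phase
assert arrival(2000, 50, 3) == 2100            # third instance at phase + 2*p
assert absolute_deadline(2000, 50, 50, 1) == 2050
```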
A vast majority of the tasks present in a typical real-time system are periodic. The reason for
this is that many activities carried out by real-time systems are periodic in nature, for example, monitoring certain conditions and polling information from sensors at regular intervals in order to carry out certain actions at regular intervals (such as driving some actuators). We shall consider
examples of such tasks found in a typical chemical plant. In a chemical plant several
temperature monitors, pressure monitors, and chemical concentration monitors periodically
sample the current temperature, pressure, and chemical concentration values which are then
communicated to the plant controller. The instances of the temperature, pressure, and chemical
concentration monitoring tasks normally get generated through the interrupts received from a
periodic timer. These inputs are used to compute corrective actions required to maintain the
chemical reaction at a certain rate. The corrective actions are then carried out through actuators.
Sporadic Task: A sporadic task is one that recurs at random instants. A sporadic task Ti can be represented by a three-tuple:
Ti = (ei, gi, di)
where ei is the worst case execution time of an instance of the task, gi denotes the minimum separation between two consecutive instances of the task, and di is the relative deadline. The minimum separation (gi) between two consecutive instances of the task implies that once an instance of a sporadic task occurs, the next instance cannot occur before gi time units have elapsed. That is, gi restricts the rate at which sporadic tasks can arise. As done for periodic tasks, we shall use the convention that the first instance of a sporadic task Ti is denoted by Ti(1) and the successive instances by Ti(2), Ti(3), etc.
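The minimum-separation constraint gi can be checked mechanically against a trace of observed arrival times. A small sketch; the arrival times below are made up for illustration:

```python
def respects_min_separation(arrival_times, g):
    # A sporadic arrival trace is legal only if every pair of consecutive
    # instances is at least g time units apart.
    return all(later - earlier >= g
               for earlier, later in zip(arrival_times, arrival_times[1:]))

assert respects_min_separation([0, 12, 25, 40], 10)   # legal sporadic trace
assert not respects_min_separation([0, 12, 15], 10)   # 3-unit gap violates gi
assert respects_min_separation([0, 0, 5], 0)          # gi = 0: the aperiodic case
```

Note how setting g to 0 makes every trace legal, which is exactly the aperiodic case discussed next.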
Many sporadic tasks, such as emergency message arrivals, are highly critical in nature. For example, in a robot, a task that gets generated to handle an obstacle that suddenly appears is a sporadic task. In a factory, the task that handles fire conditions is a sporadic task. The times of occurrence of these tasks cannot be predicted.
The criticality of sporadic tasks varies from highly critical to moderately critical. For example, an I/O device interrupt or a DMA interrupt is moderately critical. However, a task handling the reporting of fire conditions is highly critical.
Aperiodic Task: An aperiodic task is in many ways similar to a sporadic task. An aperiodic task can arise at random instants. However, in the case of an aperiodic task, the minimum separation gi between two consecutive instances can be 0. That is, two or more instances of an aperiodic task might occur at the same time instant. Also, the deadline for aperiodic tasks is expressed as either an average value or is expressed statistically. Aperiodic tasks are generally soft real-time tasks.
It is easy to realize why aperiodic tasks need to be soft real-time tasks. Aperiodic
tasks can recur in quick succession. It therefore becomes very difficult to meet the deadlines
of all instances of an aperiodic task. When several aperiodic tasks recur in quick succession, there is a bunching of the task instances, which might lead to a few deadline misses.
As already discussed, soft real-time tasks can tolerate a few deadline misses. An example of an
aperiodic task is a logging task in a distributed system. The logging task can be started by
different tasks running on different nodes. The logging requests from different tasks may arrive
at the logger almost at the same time, or the requests may be spaced out in time. Other examples
of aperiodic tasks include operator requests, keyboard presses, mouse movements, etc. In fact,
all interactive commands issued by users are handled by aperiodic tasks.
Scheduling Points: The scheduling points of a scheduler are the points on the time line at which the scheduler makes decisions regarding which task is to be run next. It is important to note that a task scheduler does not need to run continuously; it is activated by the operating system only at the scheduling points to make the scheduling decision as to which task is to be run next. In a
clock-driven scheduler, the scheduling points are defined at the time instants marked by
interrupts generated by a periodic timer. The scheduling points in an event-driven scheduler are
determined by occurrence of certain events.
Preemptive Scheduler: A preemptive scheduler is one which, when a higher priority task arrives, suspends any lower priority task that may be executing and takes up the higher priority task for execution. Thus, in a preemptive scheduler, it cannot be the case that a higher priority task is ready and waiting for execution while a lower priority task is executing. A preempted lower priority task can resume its execution only when no higher priority task is ready.
Utilization: The processor utilization (or simply utilization) of a task is the average time for
which it executes per unit time interval. In notation: for a periodic task Ti, the utilization is ui = ei/pi, where ei is the execution time and pi is the period of Ti. For a set of n periodic tasks {Ti}, the total utilization due to all tasks is U = Σi=1..n ei/pi. It is the objective of any good scheduling
algorithm to feasibly schedule even those task sets that have very high utilization, i.e. utilization
approaching 1. Of course, on a uniprocessor it is not possible to schedule task sets having
utilization more than 1.
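The utilization formula is a one-liner in code. A sketch, using a hypothetical task set (the (ei, pi) pairs are illustrative, not from the text):

```python
def utilization(tasks):
    # tasks: list of (e_i, p_i) pairs; U = sum over all tasks of e_i / p_i
    return sum(e / p for e, p in tasks)

tasks = [(8, 50), (10, 100), (25, 200)]   # hypothetical (ei, pi) pairs
u = utilization(tasks)                    # 0.16 + 0.10 + 0.125 = 0.385
assert abs(u - 0.385) < 1e-9
assert u <= 1.0  # U <= 1 is a necessary condition on a uniprocessor
```

Note that U <= 1 is only necessary, not sufficient: whether a task set with U close to 1 can actually be scheduled depends on the scheduling algorithm used.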
Jitter: Jitter is the deviation of a periodic task from its strict periodic behavior. The arrival time jitter is the deviation of the task from arriving at the precise periodic time of arrival. It may be caused by imprecise clocks, or other factors such as network congestion. Similarly, completion time jitter is the deviation of the completion of a task from precise periodic points. The completion time jitter may be caused by the specific scheduling algorithm employed, which takes up a task for scheduling as per convenience and the load at an instant, rather than scheduling at strict time instants. Jitter is undesirable for some applications.
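Arrival time jitter can be measured from an observed arrival trace as the worst-case deviation from the ideal periodic arrival instants. A minimal sketch with made-up numbers:

```python
def max_arrival_jitter(arrivals, period, phase=0.0):
    # Deviation of each observed arrival from its ideal instant
    # phase + k * period, taking the worst case over the trace.
    return max(abs(t - (phase + k * period))
               for k, t in enumerate(arrivals))

# Ideal arrivals at 0, 100, 200, 300 ms; the observed ones drift slightly.
assert max_arrival_jitter([0.0, 101.5, 199.0, 302.0], 100) == 2.0
```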
1.4. Classification of Real-Time Task Scheduling Algorithms
Several schemes of classification of real-time task scheduling algorithms exist. A popular scheme classifies the real-time task scheduling algorithms based on how the scheduling points are defined. The three main types of schedulers according to this classification scheme are: clock-driven, event-driven, and hybrid.
The clock-driven schedulers are those in which the scheduling points are determined by the interrupts received from a clock. In the event-driven ones, the scheduling points are defined by certain events, which precludes clock interrupts. The hybrid ones use both clock interrupts as well as event occurrences to define their scheduling points.
A few important members of each of these three broad classes of scheduling algorithms are
the following:
1. Clock Driven
Table-driven
Cyclic
2. Event Driven
Simple priority-based
Rate Monotonic Analysis (RMA)
Earliest Deadline First (EDF)
3. Hybrid
Round-robin
Important members of clock-driven schedulers that we discuss in this text are table-driven
and cyclic schedulers. Clock-driven schedulers are simple and efficient. Therefore, these are
frequently used in embedded applications. We investigate these two schedulers in some detail in
Sec. 2.5.
Important examples of event-driven schedulers are Earliest Deadline First (EDF) and Rate
Monotonic Analysis (RMA). Event-driven schedulers are more sophisticated than clock-driven
schedulers and usually are more proficient and flexible than clock-driven schedulers. These are
more proficient because they can feasibly schedule some task sets which clock-driven
schedulers cannot. These are more flexible because they can feasibly schedule sporadic and
aperiodic tasks in addition to periodic tasks, whereas clock-driven schedulers can satisfactorily
handle only periodic tasks. Event-driven scheduling of real-time tasks in a uniprocessor environment was a subject of intense research during the early 1970s, leading to publication of a
large number of research results. Out of the large number of research results that were
published, the following two popular algorithms are the essence of all those results: Earliest
Deadline First (EDF), and Rate Monotonic Analysis (RMA). If we understand these two
schedulers well, we would get a good grip on real-time task scheduling on uniprocessors. Several
variations to these two basic algorithms exist.
Another classification of real-time task scheduling algorithms can be made based upon the type of task acceptance test that a scheduler carries out before it takes up a task for scheduling. The acceptance test is used to decide whether a newly arrived task would at all be taken up for scheduling or be rejected. Based on the task acceptance test used, there are two broad categories of task schedulers:
Planning-based
Best effort
In planning-based schedulers, when a task arrives, the scheduler first determines whether the task can meet its deadlines if it is taken up for execution. If not, it is rejected. If the task can meet its deadline and does not cause other already scheduled tasks to miss their respective deadlines, then the task is accepted for scheduling. Otherwise, it is rejected. In best effort schedulers, no acceptance test is applied. All tasks that arrive are taken up for scheduling, and a best effort is made to meet their deadlines. However, no guarantee is given as to whether a task's deadline would be met.
A third type of classification of real-time task scheduling algorithms is based on the target platform on which the tasks are to be run. The different classes of scheduling algorithms according to this scheme are:
Uniprocessor
Multiprocessor
Distributed
Uniprocessor scheduling algorithms are possibly the simplest of the three classes of
algorithms. In contrast to uniprocessor algorithms, in multiprocessor and distributed scheduling
algorithms first a decision has to be made regarding which task needs to run on which processor
and then these tasks are scheduled. In contrast to multiprocessors, the processors in a distributed
system do not possess shared memory. Also in contrast to multiprocessors, there is no global up-
to-date state information available in distributed systems. This makes uniprocessor scheduling
algorithms that assume central state information of all tasks and processors to exist unsuitable for
use in distributed systems. Further in distributed systems, the communication among tasks is
through message passing. Communication through message passing is costly. This means that a
scheduling algorithm should not incur too much communication over- head. So carefully
designed distributed algorithms are normally considered suitable for use in a distributed system.
In the following sections, we study the different classes of schedulers in more detail.
In table-driven scheduling, the developer is
given the freedom to select his own schedule for the set of tasks in the application and store the
schedule in a table (called schedule table) to be used by the scheduler at run time.
An example of a schedule table is shown in Table 1. Table 1 shows that task T1 would be
taken up for execution at time instant 0, T2 would start execution 3 milliseconds afterwards, and
so on. An important question that needs to be addressed at this point is what would be the size of
the schedule table that would be required for some given set of periodic real-time tasks to be run
on a system? An answer to this question can be given as follows: if a set ST = {Ti} of n tasks is
to be scheduled, then the entries in the table would replicate themselves after
LCM({p1, p2, ..., pn}) time units. This duration is called the major cycle of the task set.
In the reasoning we presented above for the computation of the size of a schedule table, one
assumption that we implicitly made is that φi = 0. That is, all tasks are in phase.
Table 1: An Example Schedule Table

Task    Start time in millisecs
T1      0
T2      3
T3      10
T4      12
T5      17
However, tasks often do have non-zero phase. It would be interesting to determine what
would be the major cycle when tasks have non-zero phase. The result of an investigation into
this issue has been given as Theorem 1.

1.5.2. Theorem 1

The major cycle of a set of tasks ST = {T1, T2, ..., Tn} is LCM({p1, p2, ..., pn}) even when the
tasks have arbitrary phasing.
Proof: As per our definition of a major cycle, even when tasks have non-zero phasing, task
instances would repeat the same way in each major cycle. Let us consider an example in which
the occurrences of a task Ti in a major cycle are as shown in Fig. 29.4. As shown in the example
of Fig. 29.4, there are k-1 occurrences of the task Ti during a major cycle. The first occurrence
of Ti starts φ time units from the start of the major cycle. The major cycle ends x time units after
the last occurrence of Ti in it.

Fig. 29.4 Major Cycle When a Task Ti has Non-Zero Phasing
Assume that the size of each major cycle is M. Then, from an inspection of Fig. 29.4, for the
task to repeat identically in each major cycle:

M = (k-1)pi + φ + x        (2.1)

Now, for the task Ti to have identical occurrence times in each major cycle, φ + x must equal
pi (see Fig. 29.4).
Substituting this in Expr. 2.1, we get M = (k-1)pi + pi = k×pi        (2.2)
So, the major cycle M contains an integral multiple of pi. This argument holds for each task
in the task set irrespective of its phase. Therefore M = LCM({p1, p2, ..., pn}).
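Theorem 1 is easy to check numerically. A minimal sketch in Python (the helper name is ours; the periods are taken from the example task set used later in this lesson):

```python
from math import gcd
from functools import reduce

def major_cycle(periods):
    """Major cycle of a periodic task set: LCM of all task periods (Theorem 1)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

# Periods 4, 5, 20 as in Example 1 of this lesson:
print(major_cycle([4, 5, 20]))  # → 20
```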
Fig. 29.6 An Example Schedule Table for a Cyclic Scheduler

The size of the frame to be used by the scheduler is an important design parameter and needs
to be chosen very carefully. A selected frame size should satisfy the following three constraints.
2. Minimization of Table Size: This constraint requires that the number of entries in the
schedule table should be minimum, in order to minimize the storage requirement of the
schedule table. Remember that cyclic schedulers are used in small embedded applications
with a very small storage capacity. So, this constraint is important to the commercial
success of a product. The number of entries to be stored in the schedule table can be
minimized when the minor cycle squarely divides the major cycle. When the minor cycle
squarely divides the major cycle, the major cycle contains an integral number of minor
cycles (no fractional minor cycles). Unless the minor cycle squarely divides the major
cycle, storing the schedule for one major cycle would not be sufficient, as the schedules
in the major cycle would not repeat, and this would make the size of the schedule table
large. We can formulate this constraint as:

⌊M/F⌋ = M/F        (2.3)

In other words, if the floor of M/F equals M/F, then the major cycle would
contain an integral number of frames.
Fig. 29.7 Satisfaction of a Task Deadline
(timeline showing a task's arrival and its deadline against the frame boundaries 0, kF, (k+1)F, (k+2)F)
3. Satisfaction of Task Deadline: This third constraint on frame size is necessary to meet
the task deadlines. This constraint imposes that between the arrival of a task and its
deadline, at least one full frame must exist (see Fig. 29.8).

Fig. 29.8 A Full Frame Exists Between the Arrival and Deadline of a Task
(timeline with frame boundaries 0, kF, (k+1)F, (k+2)F)
More formally, this constraint can be formulated as follows: Suppose a task arrives Δt
time units after the start of a frame (see Fig. 29.8). Then, assuming that a
single frame is sufficient to complete the task, the task can complete before its deadline
iff (2F - Δt) ≤ di, or 2F ≤ (di + Δt).        (2.4)

Remember that the value of Δt might vary from one instance of the task to another. The
worst case scenario (where the task is likely to miss its deadline) occurs for the task
instance having the minimum value of Δt, such that Δt > 0. This is the worst case
scenario, since under this the task would have to wait the longest before its execution can
start.
It should be clear that if a task arrives just after a frame has started, then the task would
have to wait for the full duration of the current frame before it can be taken up for
execution. If a task at all misses its deadline, then certainly it would be under such
situations. In other words, the worst case scenario for a task to meet its deadline occurs
for its instance that has the minimum separation from the start of a frame. The
determination of the minimum separation value (i.e. min(Δt)) for a task among all
instances of the task would help in determining a feasible frame size. We show by
Theorem 2 that min(Δt) is equal to gcd(F, pi). Consequently, this constraint can be
written as:

for every Ti, 2F - gcd(F, pi) ≤ di        (2.5)

Note that this constraint defines an upper bound on frame size for a task Ti, i.e.,
if the frame size is any larger than the defined upper bound, then tasks might miss their
deadlines. Expr. 2.5 defines the frame size from the consideration of one task only.
Considering all tasks, the frame size must satisfy F ≤ min((gcd(F, pi) + di)/2), where the
minimum is taken over all tasks Ti.
1.5.4. Theorem 2

The minimum separation of the task arrival from the corresponding frame start time
(min(Δt)), considering all instances of a task Ti, is equal to gcd(F, pi).

Proof: Let g = gcd(F, pi), where gcd is the function determining the greatest common
divisor of its arguments. It follows from the definition of gcd that g must squarely divide each
of F and pi. Let Ti be a task with zero phasing. Now, assume that this theorem is violated for
certain integers m and n, such that Ti(n) occurs in the mth frame and the difference between
the start time of the mth frame and the nth task arrival time is less than g. That is,

0 < (m×F - n×pi) < g

Dividing this expression throughout by g, we get:

0 < (m×F/g - n×pi/g) < 1        (2.6)

However, F/g and pi/g are both integers because g is gcd(F, pi). Therefore, we can write
F/g = I1 and pi/g = I2 for some integral values I1 and I2. Substituting this in Expr. 2.6, we get
0 < m×I1 - n×I2 < 1. Since m×I1 and n×I2 are both integers, their difference cannot be a
fractional value lying between 0 and 1. Therefore, this expression can never be satisfied.
It can therefore be concluded that the minimum time between a frame boundary and
the arrival of the corresponding instance of Ti cannot be less than gcd(F, pi).
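Theorem 2 can also be verified by brute force: enumerate the arrivals of a zero-phase task over many periods and measure each arrival's offset from the start of the frame containing it. A small sketch; the function name and the values of F and pi below are illustrative:

```python
from math import gcd

def min_separation(F, p, n_arrivals=1000):
    """Minimum non-zero offset of an arrival of a zero-phase task with
    period p from the start of the enclosing frame of size F."""
    offsets = {(n * p) % F for n in range(1, n_arrivals)}
    nonzero = [d for d in offsets if d > 0]
    # If every arrival falls exactly on a frame boundary, gcd(F, p) = F.
    return min(nonzero) if nonzero else F

# The measured minimum matches gcd(F, p):
print(min_separation(4, 10), gcd(4, 10))  # → 2 2
```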
For a given task set it is possible that more than one frame size satisfies all the three
constraints. In such cases, it is better to choose the shortest frame size. This is because
the schedulability of a task set increases as more frames become available over a major
cycle.
It should however be remembered that the mere fact that a suitable frame size can be
determined does not mean that a feasible schedule would be found. It may so happen that there
are not enough frames available in a major cycle to be assigned to all the task instances.
We now illustrate how an appropriate frame size can be selected for cyclic schedulers
through a few examples.
1.5.5. Examples

Example 1: A cyclic scheduler is to be used to run the following set of periodic tasks on a
uniprocessor: T1 = (e1=1, p1=4), T2 = (e2=1.5, p2=5), T3 = (e3=1, p3=20), T4 = (e4=2,
p4=20). Select an appropriate frame size.

Solution: For the given task set, an appropriate frame size is the one that satisfies all the
three required constraints. In the following, we determine a suitable frame size F which
satisfies all the three required constraints.

Constraint 1: Let F be an appropriate frame size; then F ≥ max{ei}. From this constraint, we
get F ≥ 2.
Constraint 2: The major cycle M for the given task set is given by M = LCM(4, 5, 20) = 20.
M should be an integral multiple of the frame size F, i.e., M mod F = 0. This consideration
implies that F can take on the values 2, 4, 5, 10, 20. Frame size 1 has been ruled out since
it would violate constraint 1.

Constraint 3: To satisfy this constraint, we need to check whether a selected frame size F
satisfies the inequality 2F - gcd(F, pi) ≤ di for each pi.

Let us first try frame size 2.
For F = 2 and task T1:
2×2 - gcd(2, 4) ≤ 4, i.e. 4 - 2 ≤ 4
Therefore, for p1 the inequality is satisfied.
Let us try F = 2 and task T2:
2×2 - gcd(2, 5) ≤ 5, i.e. 4 - 1 ≤ 5
Therefore, for p2 the inequality is satisfied.
Let us try F = 2 and task T3:
Version 2 EE IIT, Kharagpur 16
Downloaded from www.citystudentsgroup.blogspot.com
2×2 - gcd(2, 20) ≤ 20, i.e. 4 - 2 ≤ 20
Therefore, for p3 the inequality is satisfied.
For F = 2 and task T4:
2×2 - gcd(2, 20) ≤ 20, i.e. 4 - 2 ≤ 20
For p4 the inequality is satisfied.
Thus, constraint 3 is satisfied by all tasks for frame size 2. So, frame size 2 satisfies all the
three constraints. Hence, 2 is a feasible frame size.

Let us try frame size 4.
For F = 4 and task T1:
2×4 - gcd(4, 4) ≤ 4, i.e. 8 - 4 ≤ 4
Therefore, for p1 the inequality is satisfied.
Let us try F = 4 and task T2:
2×4 - gcd(4, 5) ≤ 5, i.e. 8 - 1 > 5
For p2 the inequality is not satisfied. Therefore, we need not look any further. Clearly, F = 4
is not a suitable frame size.

Let us now try frame size 5, to check if that is also feasible.
For F = 5 and task T1, we have
2×5 - gcd(5, 4) ≤ 4, i.e. 10 - 1 > 4
The inequality is not satisfied for T1. We need not look any further. Clearly, F = 5 is not a
suitable frame size.

Let us now try frame size 10.
For F = 10 and task T1, we have
2×10 - gcd(10, 4) ≤ 4, i.e. 20 - 2 > 4
The inequality is not satisfied for T1. We need not look any further. Clearly, F = 10 is not a
suitable frame size.

Let us check whether 20 is a feasible frame size.
For F = 20 and task T1, we have
2×20 - gcd(20, 4) ≤ 4, i.e. 40 - 4 > 4
Therefore, F = 20 is also not suitable.

So, only the frame size 2 is suitable for scheduling.
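The three constraints applied in Example 1 can be mechanized. A sketch of such a checker (tasks are given as (ei, pi) pairs with di = pi, as assumed in the example; the function name is ours):

```python
from math import gcd
from functools import reduce

def feasible_frame_sizes(tasks):
    """Integer frame sizes satisfying the three cyclic-scheduler constraints.
    tasks: list of (e_i, p_i) pairs; deadlines are taken equal to periods."""
    periods = [p for _, p in tasks]
    M = reduce(lambda a, b: a * b // gcd(a, b), periods)  # major cycle
    e_max = max(e for e, _ in tasks)
    result = []
    for F in range(1, M + 1):
        ok = (F >= e_max                                        # constraint 1
              and M % F == 0                                    # constraint 2
              and all(2 * F - gcd(F, p) <= p for p in periods)) # constraint 3
        if ok:
            result.append(F)
    return result

# Example 1 task set: only F = 2 survives all three constraints.
print(feasible_frame_sizes([(1, 4), (1.5, 5), (1, 20), (2, 20)]))  # → [2]
```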
Even though for Example 1 we could successfully find a suitable frame size that satisfies all
the three constraints, it is quite probable that a suitable frame size may not exist for many
problems. In such cases, to find a feasible frame size we might have to split the task (or
a few tasks) that is (are) causing violation of the constraints into smaller sub-tasks
that can be scheduled in different frames.
Example 2: Consider the following set of periodic real-time tasks to be scheduled by a cyclic
scheduler: T1 = (e1=1, p1=4), T2 = (e2=2, p2=5), T3 = (e3=5, p3=20). Determine a
suitable frame size for the task set.

Solution:
Using the first constraint, we have F ≥ 5.
Using the second constraint, we have the major cycle M = LCM(4, 5, 20) = 20. So, the
permissible values of F are 5, 10 and 20.
Checking for a frame size that satisfies the third constraint, we find that no value of F is
suitable. To overcome this problem, we need to split the task that is making the task set not
schedulable. It is easy to observe that the task T3 has the largest execution time and,
consequently, due to constraint 1, makes the feasible frame sizes quite large.
We try splitting T3 into two or three tasks. After splitting T3 into three tasks, we have:
T3.1 = (e3.1=1, p3.1=20), T3.2 = (e3.2=2, p3.2=20), T3.3 = (e3.3=2, p3.3=20).
The possible values of F now are 2 and 4. Checking constraint 3 again, F = 2 is now a
feasible frame size (F = 4 still fails constraint 3 for T2, since 2×4 - gcd(4, 5) = 7 > 5).
It is very difficult to come up with a clear set of guidelines to identify the exact task that is to
be split, and the parts into which it needs to be split. Therefore, this needs to be done by trial
and error. Further, as the number of tasks to be scheduled increases, this method of trial and
error becomes impractical since each task needs to be checked separately. However, when
the task set consists of only a few tasks we can easily apply this technique to find a feasible
frame size for a set of tasks otherwise not schedulable by a cyclic scheduler.
Our discussion on cyclic schedulers has so far been restricted to
scheduling periodic real-time tasks. However, many practical applications
typically consist of a mixture of several periodic, aperiodic, and sporadic tasks. In this
section, we discuss how aperiodic and sporadic tasks can be accommodated by cyclic schedulers.
Recall that the arrival times of aperiodic and sporadic tasks are expressed
statistically. Therefore, there is no way to assign aperiodic and sporadic tasks to frames without
significantly lowering the overall achievable utilization of the system. In a generalized
scheduler, initially a schedule (assignment of tasks to frames) for only the periodic tasks is
prepared. The sporadic and aperiodic tasks are then scheduled in the slack times that may be
available in the frames. Slack time in a frame is the time left in the frame after the periodic task
allocated to the frame completes its execution. Non-zero slack time in a frame can exist only
when the execution time of the task allocated to it is smaller than the frame size.
A sporadic task is taken up for scheduling only if enough slack time is available for
the arriving sporadic task to complete before its deadline. Therefore, a sporadic task on
its arrival is subjected to an acceptance test. The acceptance test checks whether the task is
likely to be completed within its deadline when executed in the available slack times. If it is
not possible to meet the task's deadline, then the scheduler rejects it and the
corresponding recovery routines for the task are run. Since aperiodic tasks do not have strict
deadlines, they can be taken up for scheduling without any acceptance test, and best effort can
be made to schedule them in the slack times available. Though for aperiodic tasks no acceptance
test is done, no guarantee is given for a task's completion time, and best effort is made to
complete the task as early as possible.
An efficient implementation of this scheme is that the slack times are stored in a
table, and during the acceptance test this table is used to check the schedulability of the arriving
tasks.
Another popular alternative is that the aperiodic and sporadic tasks are accepted without any
acceptance test, and best effort is made to meet their respective deadlines.
The pseudocode below assumes that the schedule table has already been prepared and,
if required, the sporadic tasks have already been subjected to an acceptance test and only
those which have passed the test are available for scheduling.
cyclic-scheduler() {
    current-task T = Schedule-Table[k];
    k = k + 1;
    k = k mod N;                 // N is the total number of tasks in the schedule table
    dispatch-current-task(T);
    schedule-sporadic-tasks();   // Current task T completed early;
                                 // sporadic tasks can be taken up
    schedule-aperiodic-tasks();  // At the end of the frame, the running task
                                 // is pre-empted if not complete
    idle();                      // No task to run; idle
}
The cyclic scheduler routine cyclic-scheduler() is activated at the end of every frame by a
periodic timer. If the current task is not complete by the end of the frame, then it is
suspended and the task to be run in the next frame is dispatched by invoking the routine
cyclic-scheduler(). If the task scheduled in a frame completes early, then any existing sporadic
or aperiodic task is taken up for execution.
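The behaviour of this generalized scheduler can be sketched as a toy simulation. The frame-timer interrupt is modelled by one call per frame; the task names, the unit-length sporadic/aperiodic jobs, and the dispatch log are illustrative assumptions, not part of the original pseudocode:

```python
schedule_table = ["T1", "T2", "T1", "T3"]  # one periodic entry per frame
sporadic_queue, aperiodic_queue = [], []   # sporadic entries pre-accepted
k = 0                                      # next schedule-table index

def cyclic_scheduler(frame_size, periodic_exec_time):
    """Run one frame: dispatch the table entry, then fill the remaining
    slack with sporadic tasks first, then best-effort aperiodic tasks."""
    global k
    log = [("periodic", schedule_table[k])]
    k = (k + 1) % len(schedule_table)
    slack = frame_size - periodic_exec_time
    while slack >= 1 and sporadic_queue:   # accepted sporadic tasks first
        log.append(("sporadic", sporadic_queue.pop(0)))
        slack -= 1                         # each job assumed to take 1 time unit
    while slack >= 1 and aperiodic_queue:  # then best-effort aperiodic tasks
        log.append(("aperiodic", aperiodic_queue.pop(0)))
        slack -= 1
    return log                             # any remaining slack is idle time

sporadic_queue.append("S1")
print(cyclic_scheduler(frame_size=4, periodic_exec_time=2))
# → [('periodic', 'T1'), ('sporadic', 'S1')]
```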
1.5.7. Comparison of Cyclic with Table-Driven Scheduling

Both table-driven and cyclic schedulers are important clock-driven schedulers. A cyclic
scheduler needs to set a periodic timer only once, at the application initialization time. This
timer continues to give an interrupt exactly at every frame boundary. But in table-driven
scheduling, a timer has to be set every time a task starts to run. The execution time of a typical
real-time task is usually of the order of a few milliseconds. Therefore, a call to a timer is made
every few milliseconds. This represents a significant overhead and results in degraded system
performance, so in this respect a cyclic scheduler is superior. On the other hand, a table-driven
scheduler is more proficient than a cyclic scheduler in the following respect: in cyclic
scheduling, the size of the frame that needs to be chosen should be at least as long as the size
of the largest execution time of a task in the task set. This is a source of inefficiency, since this
results in processor time being wasted in case of those tasks whose execution times are smaller
than the chosen frame size.
1.6. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to
justify your choice in each case.
a. Average response time is an important performance metric for real-time operating
systems handling running of hard real-time tasks.
b. Unlike table-driven schedulers, cyclic schedulers do not need to store a pre-computed
schedule.
c. The minimum period for which a table-driven scheduler scheduling n periodic tasks
needs to pre-store the schedule is given by max{p1, p2, ..., pn}, where pi is the period
of the task Ti.
d. A cyclic scheduler is more proficient than a pure table-driven scheduler for
scheduling a set of hard real-time tasks.
e. A suitable figure of merit to compare the performance of different hard real-time task
scheduling algorithms can be the average task response times resulting from each
algorithm.
f. Cyclic schedulers are more proficient than table-driven schedulers.
g. While using a cyclic scheduler to schedule a set of real-time tasks on a uniprocessor,
when a suitable frame size satisfying all the three required constraints has been found,
it is guaranteed that the task set would be feasibly scheduled by the cyclic scheduler.
h. When more than one frame satisfies all the constraints on frame size while scheduling a
set of hard real-time periodic tasks using a cyclic scheduler, the largest of these frame
sizes should be chosen.
i. In table-driven scheduling of three periodic tasks T1, T2, T3, the scheduling table must
j. When a set of hard real-time periodic tasks are being scheduled using a cyclic
scheduler, if a certain frame size is found to be not suitable, then any frame size
smaller than this would not also be suitable for scheduling the tasks.
k. When a set of hard real-time periodic tasks are being scheduled using a cyclic
scheduler, if a candidate frame size exceeds the execution time of every task and
squarely divides the major cycle, then it would be a suitable frame size to schedule the
given set of tasks.
l. Finding an optimal schedule for a set of independent periodic hard real-time tasks
2. Real-time tasks are normally classified into periodic, aperiodic, and sporadic real-time
tasks.
a. What are the basic criteria based on which a real-time task can be determined to belong
to one of the three categories?
b. Identify some characteristics that are unique to each of the three categories of tasks.
c. Give examples of tasks in practical systems which belong to each of the three
categories.
3. What do you understand by an optimal scheduling algorithm? Is it true that the time
complexity of an optimal scheduling algorithm for scheduling a set of real-time tasks on a
uniprocessor is prohibitively expensive to be of any practical use? Explain your answer.
4. Suppose a set of three periodic tasks is to be scheduled using a cyclic scheduler on a
uniprocessor. Assume that the CPU utilization due to the three tasks is less than 1. Also,
assume that for each of the three tasks, the deadline equals the respective period.
Suppose that we are able to find an appropriate frame size (without having to split any of
the tasks) that satisfies the three constraints of minimization of context switches,
minimization of schedule table size, and satisfaction of deadlines. Does this imply that it is
possible to assert that we can feasibly schedule the three tasks using the cyclic scheduler?
If you answer affirmatively, then prove your answer. If you answer negatively, then show
an example involving three tasks that disproves the assertion.
5. Consider a real-time system which consists of three tasks T1, T2, and T3, which have been
characterized in the following table.
If the tasks are to be scheduled using a table-driven scheduler, what is the length of time
for which the schedules have to be stored in the pre-computed schedule table of the
scheduler?
6. A cyclic real-time scheduler is to be used to schedule three periodic tasks T1, T2,
and T3 with the following characteristics:
Module
6
Embedded System
Software

Lesson
30
Real-Time Task
Scheduling Part 2
ctB = eB / (1 - Σi=1..n ei/pi)        (3.1/2.7)

This expression is easy to interpret. When any foreground task is executing, the background
task waits. The average CPU utilization due to the foreground task Ti is ei/pi, since ei amount of
processing time is required over every period pi. It follows that all foreground tasks together
would result in a CPU utilization of Σi=1..n ei/pi. Therefore, the average time available for
execution of the background tasks in every unit of time is 1 - Σi=1..n ei/pi. Hence, Expr. 2.7
follows easily.
We now illustrate the applicability of Expr. 2.7 through the following three simple examples.

1.3. Examples
Example 1: Consider a real-time system in which tasks are scheduled using foreground-
background scheduling. There is only one periodic foreground task Tf : (φf=0, pf=100 msec,
ef=50 msec, df=100 msec), and the background task is TB = (eB=1000 msec). Compute the
completion time of the background task.

Solution: Using Expr. 2.7 to compute the background task completion time, we have
ctB = 1000 / (1 - 50/100) = 2000 msec
Solution: The total utilization due to the foreground tasks is Σi=1..2 ei/pi = 10/20 + 20/50 =
90/100.
This implies that the fraction of time remaining for the background task to execute is given
by:
1 - Σi=1..2 ei/pi = 10/100.
Therefore, the background task gets 1 millisecond every 10 milliseconds. Thus, the
background task would take 10 × (100/1) = 1000 milliseconds to complete.
Solution: Assume that every context switch incurs a
switching overhead of 1 msec. This has been shown as a shaded rectangle in Fig. 30.1.
Subsequently, each time the foreground task runs, it preempts the background task and incurs
one context switch. On completion of each instance of the foreground task, the background
task runs and incurs another context switch. With this observation, to simplify our
computation of the actual completion time of TB, we can imagine that the execution time of
every foreground task is increased by two context switch times (one due to itself and
the other due to the background task running after each time it completes). Thus, the
net effect of context switches can be imagined to be causing the execution time of
the foreground task to increase by 2 context switch times, i.e. to 52 milliseconds from
50 milliseconds. This has pictorially been shown in Fig. 30.1.
Now, using Expr. 2.7, we get the time required by the background task to complete:
1000/(1 - 52/100) ≈ 2083.3 milliseconds
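Expr. 2.7 and the context-switch adjustment used above take only a few lines to reproduce. A sketch; the helper name and the convention of charging two context switches per foreground instance follow the reasoning in the text:

```python
def background_completion_time(e_bg, foreground, cs=0.0):
    """Completion time of a background task under foreground-background
    scheduling (Expr. 2.7).  foreground: list of (e_i, p_i) pairs; every
    foreground instance is charged two context-switch overheads of cs each."""
    utilization = sum((e + 2 * cs) / p for e, p in foreground)
    return e_bg / (1 - utilization)

print(background_completion_time(1000, [(50, 100)]))                  # → 2000.0
print(round(background_completion_time(1000, [(50, 100)], cs=1), 1))  # → 2083.3
```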
In the following two sections, we examine two important event-driven schedulers: EDF
(Earliest Deadline First) and RMA (Rate Monotonic Algorithm). EDF is the optimal dynamic
priority real-time task scheduling algorithm and RMA is the optimal static priority real-time task
scheduling algorithm.
where ui is the average utilization due to the task Ti and n is the total number of tasks in the
task set. Expr. 3.2 is both a necessary and a sufficient condition for a set of tasks to be EDF
schedulable.
EDF has been proven to be an optimal uniprocessor scheduling algorithm. This means that, if
a set of tasks is not schedulable under EDF, then no other scheduling algorithm can feasibly
schedule this task set. In the simple schedulability test for EDF (Expr. 3.2), we assumed that the
period of each task is the same as its deadline. However, in practical problems the period of a
task may at times be different from its deadline. In such cases, the schedulability test needs to be
changed. If pi > di, then each task needs ei amount of computing time every min(pi, di)
duration of time. Therefore, we can rewrite Expr. 3.2 as:

Σi=1..n ei / min(pi, di) ≤ 1        (3.3/2.9)

However, if pi < di, it is possible that a set of tasks is EDF schedulable even when the task
set fails to meet Expr. 3.3. Therefore, Expr. 3.3 is conservative when pi < di: it is not a
necessary condition, but only a sufficient condition, for a given task set to be EDF schedulable.
Example 4: Consider the following three periodic real-time tasks to be scheduled using EDF
on a uniprocessor: T1 = (e1=10, p1=20), T2 = (e2=5, p2=50), T3 = (e3=10, p3=35). Determine
whether the task set is schedulable.

Solution: The total utilization due to the three tasks is given by:
Σi=1..3 ei/pi = 10/20 + 5/50 + 10/35 = 0.89
This is less than 1. Therefore, the task set is EDF schedulable.
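The schedulability tests of Exprs. 3.2 and 3.3 amount to a one-line utilization sum. A sketch (tasks as (ei, pi, di) triples; the function name is ours):

```python
def edf_schedulable(tasks):
    """EDF test: the sum of e_i / min(p_i, d_i) must not exceed 1.
    Necessary and sufficient when d_i = p_i (Expr. 3.2); only
    sufficient when some d_i > p_i (Expr. 3.3)."""
    return sum(e / min(p, d) for e, p, d in tasks) <= 1

# Example 4 task set (d_i = p_i): utilization ≈ 0.89, hence schedulable.
print(edf_schedulable([(10, 20, 20), (5, 50, 50), (10, 35, 35)]))  # → True
```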
Though EDF is a simple as well as an optimal algorithm, it has a few shortcomings
which render it almost unusable in practical applications. The main problems with EDF are
discussed in Sec. 3.4.3. Next, we discuss the concept of task priority in EDF and then discuss
how EDF can be practically implemented.

1.4.1. Is EDF Really a Dynamic Priority Scheduling Algorithm?

We stated in Sec. 3.3 that EDF is a dynamic priority scheduling algorithm. Was it after all
correct on our part to assert that EDF is a dynamic priority task scheduling algorithm? If EDF
were to be considered a dynamic priority algorithm, we should be able to determine the precise
priority value of a task at any point of time and also be able to show how it changes with time. If
we reflect on our discussions of EDF in this section, EDF scheduling does not require any
priority value to be computed for any task at any time. In fact, EDF has no notion of a priority
value for a task. Tasks are scheduled solely based on the proximity of their deadline.
However, the longer a task waits in a ready queue, the higher is the chance (probability) of
its being taken up for scheduling. So, we can imagine that a virtual priority value associated with
a task keeps increasing with time until the task is taken up for scheduling. However, it is
important to understand that in EDF the tasks neither have any priority value associated with
them, nor does the scheduler perform any priority computations to determine the schedulability
of a task at either run time or compile time.
A simple implementation of EDF would maintain all ready tasks in a queue; each record in the
queue would contain the absolute deadline of the task. At every preemption point, the entire
queue would be scanned from the beginning to determine the task having the shortest deadline.
However, this implementation would be very inefficient. Let us analyze the complexity of this
scheme. Each task insertion will be achieved in O(1) or constant time, but task selection (to run
next) and its deletion would require O(n) time, where n is the number of tasks in the queue.
A more efficient implementation of EDF would be as follows. EDF can be implemented by
maintaining all ready tasks in a sorted priority queue. A sorted priority queue can efficiently be
implemented by using a heap data structure. In the priority queue, the tasks are always kept
sorted according to the proximity of their deadline. When a task arrives, a record for it can be
inserted into the heap in O(log2 n) time, where n is the total number of tasks in the priority
queue. At every scheduling point, the next task to be run can be found at the top of the heap in
O(1) time. When a task is taken up for scheduling, it needs to be removed from the priority
queue. This removal can be achieved in O(log2 n) time.
A still more efficient implementation of EDF can be achieved as follows, under the
assumption that the number of distinct deadlines that tasks in an application can have is
restricted. In this approach, whenever a task arrives, its absolute deadline is computed from its
release time and its relative deadline. A separate FIFO queue is maintained for each distinct
relative deadline that tasks can have. The scheduler inserts a newly arrived task at the end of the
corresponding relative deadline queue. Clearly, tasks in each queue are ordered according to
their absolute deadlines.
To find the task with the earliest absolute deadline, the scheduler only needs to examine
the tasks at the heads of all the FIFO queues. If the number of FIFO queues maintained by the
scheduler is Q, then the order of searching would be O(Q), which is effectively constant since Q
is small and fixed. The time to insert a task would also be O(1).
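The heap-based ready queue described above can be sketched with Python's heapq module; the class name and its two methods are illustrative:

```python
import heapq

class EDFReadyQueue:
    """Ready tasks keyed by absolute deadline; the earliest-deadline task
    is always at the top of the binary heap."""
    def __init__(self):
        self._heap = []

    def arrive(self, name, release, relative_deadline):
        # O(log n): insert keyed on the absolute deadline
        heapq.heappush(self._heap, (release + relative_deadline, name))

    def take_next(self):
        # O(log n): remove and return the task with the earliest deadline
        deadline, name = heapq.heappop(self._heap)
        return name

q = EDFReadyQueue()
q.arrive("T1", release=0, relative_deadline=20)
q.arrive("T2", release=0, relative_deadline=5)
q.arrive("T3", release=3, relative_deadline=10)
print(q.take_next())  # → T2 (absolute deadline 5 is the earliest)
```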
1.4.3. Shortcomings of EDF

In this subsection, we highlight some of the important shortcomings of EDF when used for
scheduling real-time tasks in practical applications.

Transient Overload Problem: Transient overload denotes the overload of a system for a
very short time. Transient overload occurs when some task takes more time to
complete than what was originally planned during the design time. A task may take
longer to complete due to many reasons. For example, it might enter an infinite loop or
encounter an unusual condition and enter a rarely used branch due to some abnormal input
values. When EDF is used to schedule a set of periodic real-time tasks, a task overshooting
its completion time can cause some other task(s) to miss their deadlines. It is usually very
difficult to predict during program design which task might miss its deadline when a transient
overload occurs in the system due to a low priority task overshooting its deadline. The only
prediction that can be made is that the task (or tasks) that would run immediately after
the task causing the transient overload would get delayed and might miss its (their)
respective deadline(s). However, at different times a task might be followed by different
tasks in execution, so this observation does not help us find which task might miss its
deadline. Even the most critical task might miss its deadline due to a very low priority task
overshooting its planned completion time. So, it should be clear that under EDF any amount
of careful design will not guarantee that the most critical task would not miss its deadline
under transient overload. This is a serious drawback of the EDF scheduling algorithm.

Resource Sharing Problem: When EDF is used to schedule a set of real-time tasks,
unacceptably high overheads might have to be incurred to support resource sharing among
the tasks without making tasks miss their respective deadlines. We examine this issue in
some detail in the next lesson.
In RMA, the priority of a task is directly proportional to its rate (or, inversely proportional to its
period). That is, the priority of any task Ti is computed as: priority = k / pi, where pi is the
period of the task Ti and k is a constant. Using this simple expression, plots of priority values of
tasks under RMA for tasks of different periods can easily be obtained. These plots have been
shown in Fig. 30.2(a) and Fig. 30.2(b). It can be observed from these figures that the priority
of a task increases linearly with the arrival rate of the task and inversely with its period.

Fig. 30.2 Priority Assignment to Tasks in RMA
(priority vs. rate in (a); priority vs. period in (b))
worst-case execution times and periods of the tasks. A pertinent question at this point is how a system developer can determine the worst-case execution time of a task even before the system is developed. The worst-case execution times are usually determined experimentally or through
simulation studies.
The following are some important criteria that can be used to check the schedulability of a set of tasks under RMA.
Σ(i=1 to n) ui ≤ n(2^(1/n) − 1)   … (3.4/2.10)

where ui is the utilization due to task Ti. Let us now examine the implications of this result. If a set of tasks satisfies the sufficient condition, then it is guaranteed that the set of tasks would be RMA schedulable.

Consider the case where there is only one task in the system, i.e. n = 1. Substituting n = 1 in Expr. 3.4, we get: u1 ≤ 1 × (2^(1/1) − 1), or u1 ≤ 1.

For n = 2, we get: Σ(i=1 to 2) ui ≤ 2(2^(1/2) − 1), or Σ ui ≤ 0.828.

For n = 3, we get: Σ(i=1 to 3) ui ≤ 3(2^(1/3) − 1), or Σ ui ≤ 0.78.

For n → ∞, we get: Σ(i=1 to n) ui ≤ n(2^(1/n) − 1), or Σ ui ≤ loge 2 ≈ 0.692.
[Plot: achievable utilization falls from 1.0 toward 0.692 as the number of tasks grows]
Fig. 30.3 Achievable Utilization with the Number of Tasks under RMA
Evaluation of Expr. 3.4 as n → ∞ involves an indeterminate expression of the type ∞ × 0. By applying L'Hospital's rule, we can verify that the right hand side of the expression evaluates to loge 2 ≈ 0.692. From the above computations, it is clear that the maximum CPU utilization that
can be achieved under RMA is 1. This is achieved when there is only a single task in the system.
As the number of tasks increases, the achievable CPU utilization falls and, as n → ∞, the achievable utilization stabilizes at loge 2, which is approximately 0.692. This is pictorially shown in Fig. 30.3. We now illustrate the applicability of the RMA schedulability criteria through a few examples.

1.5.2. Examples
Example 5: Check whether the following set of periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=60, p3=200).

Solution: Let us first compute the total CPU utilization achieved due to the three given tasks:

Σ(i=1 to 3) ui = 20/100 + 30/150 + 60/200 = 0.7

This is less than 1; therefore the necessary condition for schedulability of the tasks is satisfied. Further, 0.7 is less than the Liu and Layland bound 3(2^(1/3) − 1) = 0.78, so the sufficient condition is also satisfied and the task set is RMA schedulable.
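The Liu and Layland check used above is easy to automate. The following Python sketch (the function name and task representation are illustrative, not part of the lesson) applies Expr. 3.4 to a list of (execution time, period) pairs:

```python
def liu_layland_schedulable(tasks):
    """Sufficient (not necessary) RMA test of Liu and Layland.

    tasks: list of (execution_time, period) pairs.
    Returns True when total utilization <= n(2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(e / p for e, p in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

# Task set of Example 5: utilization 0.7 <= 0.78, so the test passes.
print(liu_layland_schedulable([(20, 100), (30, 150), (60, 200)]))  # True
```

Remember that a False result here is inconclusive: the test is only sufficient, so a failing task set may still be RMA schedulable and must be checked further (for example, with Lehoczky's test).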
Example 6: Check whether the following set of three periodic real-time tasks is
schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3
= (e3=90, p3=200).
Solution: Let us first compute the total CPU utilization due to the given task set:
Σ(i=1 to 3) ui = 20/100 + 30/150 + 90/200 = 0.85

This exceeds the Liu and Layland bound 3(2^(1/3) − 1) = 0.78, so the task set fails the sufficient condition. If a task set passes the Liu and Layland test, then it is guaranteed to be RMA schedulable. On the other hand, even if a task set fails the Liu and Layland test, it may still be RMA schedulable.
It follows from this that even when a task set fails Liu and Layland's test, we should not conclude that it is not schedulable under RMA. We need to test further to check if the task set is RMA schedulable. A test that can be performed to check whether a task set is RMA schedulable when it fails the Liu and Layland test is Lehoczky's test, which has been expressed as Theorem 3.
1.5.3. Theorem 3

A set of periodic real-time tasks is RMA schedulable under any task phasing, iff all the tasks meet their respective first deadlines under zero phasing.
[Timeline diagrams, time in msec: (a) T1 in phase with T2; T2 completes at 90. (b) T1 phased by 20 msec; T2 completes at 80.]
Fig. 30.4 Worst Case Response Time for a Task Occurs When It is in Phase with Its Higher Priority Tasks
A formal proof of this theorem is beyond the scope of this discussion. However, we can understand the result intuitively from the following reasoning. First let us try to understand the following fact.
The worst case response time for a task occurs when it is in phase with its higher priority tasks.

To see why this statement must be true, consider the following. Under RMA, whenever a higher priority task is ready, the lower priority tasks cannot execute and have to wait. This implies that a lower priority task will have to wait for the entire duration of execution of each higher priority task instance that arises during its own execution. More instances of a higher priority task occur when a task is in phase with it than when it is out of phase with it. This has been illustrated through a simple example in Fig. 30.4. In Fig. 30.4(a), a higher priority task T1=(10,30) is in phase with a lower priority task T2=(60,120), and the response time of T2 is 90 msec. However, in Fig. 30.4(b), when T1 has a 20 msec phase, the response time of T2 becomes 80 msec. Therefore, if a task meets its first deadline under zero phasing, then it will meet all its deadlines.
Example 7: Check whether the task set of Example 6 is actually schedulable under RMA.

Solution: Though the result of the Liu and Layland test was negative in Example 6, we can apply the Lehoczky test and observe the following:

For the task T1: e1 < p1 holds, since 20 msec < 100 msec. Therefore, it would meet its first deadline (it does not have any tasks that have higher priority).
[Timeline diagrams under zero phasing: (a) T1 meets its first deadline (executes 0–20, deadline 100); (b) T2 meets its first deadline, completing at 50 (deadline 150); (c) T3 meets its first deadline (deadline 200), the execution sequence being T1 T2 T3 T1 T3 T2 T3]
For the task T2: T1 is its higher priority task and considering 0 phasing, it would occur once
before the deadline of T2. Therefore, (e1 + e2) < p2 holds, since 20 + 30 = 50 msec < 150
msec. Therefore, T2 meets its first deadline.
For the task T3: (2e1 + 2e2 + e3) < p3 holds, since 2×20 + 2×30 + 90 = 190 msec < 200 msec. We have considered 2e1 and 2e2 since T1 and T2 occur twice within the first deadline of T3. Therefore, T3 meets its first deadline. So, the given task set is schedulable under RMA. The schedulability test for T3 has been shown pictorially in Fig. 30.5. Since all the tasks meet their first deadlines under zero phasing, they are RMA schedulable according to Lehoczky's results.
[Timeline diagram: three instances T1(1), T1(2), T1(3) occur within a single instance Ti(1)]
Fig. 30.6 Instances of T1 over a single instance of Ti
Let us now try to derive a formal expression for this important result of Lehoczky. Let {T1, T2, …, Ti} be the set of tasks to be scheduled, and assume that the tasks have been ordered in decreasing order of their priorities and that all tasks start at time instant 0. Consider the example shown in Fig. 30.6, in which three instances of the task T1 occur within a single instance of Ti. Each time T1 occurs, Ti has to wait, since T1 has higher priority. The exact number of times that T1 occurs within a single instance of Ti is given by ⌈pi / p1⌉. Since T1's execution time is e1, the total execution time required due to task T1 before the deadline of Ti is ⌈pi / p1⌉ × e1. This expression can easily be generalized to consider the execution times of all tasks having higher priority than Ti (i.e. T1, T2, …, Ti−1). Therefore, the time for which Ti will have to wait due to all its higher priority tasks can be expressed as:

Σ(k=1 to i−1) ⌈pi / pk⌉ ek   … (3.5/2.11)
Expression 3.5 gives the total time required to execute Ti's higher priority tasks, for which Ti would have to wait. So, the task Ti would meet its first deadline iff

ei + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi   … (3.6/2.12)

That is, if the sum of the execution times of all higher priority task instances occurring before Ti's first deadline, together with the execution time of the task itself, is at most its period pi, then Ti would complete before its first deadline. Note that in Expr. 3.6 we have implicitly assumed that the task periods equal their respective deadlines, i.e. pi = di. If di < pi, then Expr. 3.6 would need to be modified as follows:

ei + Σ(k=1 to i−1) ⌈di / pk⌉ ek ≤ di   … (3.7/2.13)
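Exprs. 3.6 and 3.7 translate directly into a per-task first-deadline check. The Python sketch below (function names are illustrative) assumes the tasks are supplied as (ei, pi, di) triples already sorted in decreasing order of priority:

```python
import math

def meets_first_deadline(i, tasks):
    """Zero-phasing check of Expr. 3.7 for task i (0-based).

    tasks: (execution_time, period, deadline) triples in
    decreasing priority order; reduces to Expr. 3.6 when d == p."""
    e_i, _, d_i = tasks[i]
    # Interference from each higher priority task T_k: ceil(d_i/p_k) * e_k
    interference = sum(math.ceil(d_i / p_k) * e_k
                       for e_k, p_k, _ in tasks[:i])
    return e_i + interference <= d_i

def lehoczky_schedulable(tasks):
    return all(meets_first_deadline(i, tasks) for i in range(len(tasks)))

# Task set of Example 6 (deadlines equal periods): 190 <= 200 for T3.
print(lehoczky_schedulable([(20, 100, 100), (30, 150, 150), (90, 200, 200)]))  # True
```

Note that every task must be checked individually; as Example 9 later shows, the success of the lowest priority task alone proves nothing about the rest of the set.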
Note that even if Expr. 3.7 is not satisfied, there is some possibility that the task set
may still be schedulable. This might happen because in Expr. 3.7 we have considered zero
phasing among all the tasks, which is the worst case. In a given problem, some tasks may have
non-zero phasing. Therefore, even when a task set narrowly fails to meet Expr 3.7, there is some
chance that it may in fact be schedulable under RMA. To understand why this is so, consider a
task set where one particular task Ti fails Expr. 3.7, making the task set not schedulable. The task misses its deadline when it is in phase with all its higher priority tasks. However, when the task has non-zero phasing with at least some of its higher priority tasks, it might actually meet its first deadline, contrary to the negative result of Expr. 3.7.
Let us now consider two examples to illustrate the applicability of Lehoczky's results.
Example 8: Consider the following set of three periodic real-time tasks: T1=(10,20), T2=(15,60), T3=(20,120) to be run on a uniprocessor. Determine whether the task set is schedulable under RMA.

Solution: First let us try the sufficiency test for RMA schedulability. By Expr. 3.4 (Liu and Layland test), the task set is schedulable if Σ ui ≤ 0.78.

Σ ui = 10/20 + 15/60 + 20/120 ≈ 0.92

Since this exceeds 0.78, the sufficiency test fails and is inconclusive. Let us now try Lehoczky's test. The tasks T1, T2, T3 are already ordered in decreasing order of their priorities.

Testing for task T1: Since e1 (10 msec) is less than d1 (20 msec), T1 would meet its first deadline.

Testing for task T2: 15 + ⌈60/20⌉ × 10 ≤ 60, or 15 + 30 = 45 ≤ 60 msec. The condition is satisfied. Therefore, T2 would meet its first deadline.

Testing for task T3: 20 + ⌈120/20⌉ × 10 + ⌈120/60⌉ × 15 ≤ 120, or 20 + 60 + 30 = 110 ≤ 120 msec. The condition is satisfied. Therefore, T3 would meet its first deadline, and the task set is schedulable under RMA.
Example 9: RMA is used to schedule a set of periodic hard real-time tasks in a system. Is it
possible in this system that a higher priority task misses its deadline, whereas a lower
priority task meets its deadlines? If your answer is negative, prove your denial. If your
answer is affirmative, give an example involving two or three tasks scheduled using RMA
where the lower priority task meets all its deadlines whereas the higher priority task misses
its deadline.
Solution: Yes. It is possible that under RMA a higher priority task misses its deadline whereas a lower priority task meets its deadline. We show this by constructing an example.
Consider the following task set: T1 = (e1=15, p1=20), T2 = (e2=6, p2=35), T3 = (e3=3,
p3=100). For the given task set, it is easy to observe that pr(T1) > pr(T2) > pr(T3). That is, T1,
T2, T3 are ordered in decreasing order of their priorities.
Version 2 EE IIT, Kharagpur 14
For this task set, T3 meets its deadline according to Lehoczky's test, since

e3 + ⌈p3 / p2⌉ e2 + ⌈p3 / p1⌉ e1 = 3 + (⌈100/35⌉ × 6) + (⌈100/20⌉ × 15)
= 3 + (3 × 6) + (5 × 15) = 96 ≤ 100 msec.

But T2 does not meet its deadline, since

e2 + ⌈p2 / p1⌉ e1 = 6 + (⌈35/20⌉ × 15) = 6 + (2 × 15) = 36 msec.

This is greater than the deadline of T2 (35 msec).
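The per-task checks of Example 9 can be reproduced with a small zero-phasing test (an illustrative sketch, assuming each task's deadline equals its period):

```python
import math

def first_deadline_ok(i, tasks):
    """Zero-phasing first-deadline check (Expr. 3.6) for task i,
    with tasks given as (e, p) pairs in decreasing priority order."""
    e_i, p_i = tasks[i]
    # Waiting time due to all higher priority tasks before p_i.
    wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return e_i + wait <= p_i

tasks = [(15, 20), (6, 35), (3, 100)]   # T1, T2, T3 of Example 9
print([first_deadline_ok(i, tasks) for i in range(3)])  # [True, False, True]
```

The middle result confirms that T2 misses its deadline (36 > 35 msec) even though the lowest priority task T3 passes (96 ≤ 100 msec).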
As a consequence of the results of Example 9, by observing that the lowest priority task of a given task set meets its first deadline, we cannot conclude that the entire task set is RMA schedulable. On the contrary, it is necessary to check each task individually as to whether it meets its first deadline under zero phasing. Concluding that the entire task set can be feasibly scheduled under RMA merely because the lowest priority task meets its deadline is likely to be flawed.
periods, the maximum utilization below which a task set can feasibly be scheduled is on the average close to 88%.

For harmonic tasks, the maximum achievable utilization (for a task set to have a feasible schedule) can still be higher. Let us first understand when the periods of a task set are said to be harmonically related. The task periods in a task set are said to be harmonically related iff, for any two arbitrary tasks Ti and Tk in the task set, whenever pi > pk, pi is an integral multiple of pk. In fact, if all the task periods are harmonically related, then even a task set having 100% utilization can be feasibly scheduled. It is easy to prove this, as the following theorem shows.
1.5.5. Theorem 4

For a set of harmonically related tasks HS = {Ti}, the RMA schedulability criterion is given by Σ(i=1 to n) ui ≤ 1.

Proof: Let T1, T2, …, Tn be the tasks in the given task set, arranged in increasing order of their periods. That is, for any i and j, pi < pj whenever i < j. If this relationship is not satisfied, a simple renaming of the tasks can achieve it. Now, according to Expr. 3.6, a task Ti meets its first deadline if

ei + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi.

However, since the task set is harmonically related, pi can be written as m × pk for some integer m. Using this, ⌈pi / pk⌉ = pi / pk. Now, Expr. 3.6 can be written as:

ei + Σ(k=1 to i−1) (pi / pk) ek ≤ pi

For Ti = Tn, we can write: en + Σ(k=1 to n−1) (pn / pk) ek ≤ pn. Dividing both sides of this expression by pn, we get en/pn + Σ(k=1 to n−1) ek/pk ≤ 1. Hence, the task set would be schedulable iff Σ(k=1 to n) ek/pk ≤ 1, or Σ(i=1 to n) ui ≤ 1.
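Theorem 4 can be sketched in code (illustrative names; the example task set below is an assumption chosen to exercise the 100% utilization case):

```python
def harmonically_related(periods):
    """True iff, for every pair, the larger period is an integral
    multiple of the smaller (the definition given above)."""
    ps = sorted(periods)
    return all(ps[j] % ps[i] == 0
               for i in range(len(ps)) for j in range(i + 1, len(ps)))

def harmonic_rma_schedulable(tasks):
    """Theorem 4: for harmonically related tasks, RMA schedulability
    reduces to total utilization <= 1. tasks: (e, p) pairs."""
    assert harmonically_related([p for _, p in tasks])
    return sum(e / p for e, p in tasks) <= 1

# Periods 25, 50, 100 are harmonically related; utilization is exactly 1.
print(harmonic_rma_schedulable([(10, 25), (10, 50), (40, 100)]))  # True
```

Contrast this with the general bound of Expr. 3.4: for three arbitrary periods the test would only admit utilizations up to 0.78.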
Under RMA, a lower priority task overshooting its completion time cannot make a higher priority task miss its deadline.

[Fig. 30.7 Multi-Level Feedback Queue]

The disadvantages of RMA include the following: it is very difficult to support aperiodic and sporadic tasks under RMA. Further, RMA is not optimal when task periods and deadlines differ.

1.6. Deadline Monotonic Algorithm (DMA)

RMA no longer remains an optimal scheduling algorithm for periodic real-time tasks when task deadlines and periods differ (i.e. di ≠ pi for some tasks in the task set to be scheduled). For such task sets, the Deadline Monotonic Algorithm (DMA) is used: under DMA, tasks are assigned priorities in inverse order of their relative deadlines, so the task with the shortest relative deadline gets the highest priority.
Example 10: Is the following task set schedulable by DMA? Also check whether it is schedulable using RMA. T1 = (e1=10, p1=50, d1=35), T2 = (e2=15, p2=100, d2=20), T3 = (e3=20, p3=200, d3=200) [time in msec].
Solution: First, let us check RMA schedulability of the given set of tasks by checking Lehoczky's criterion. The tasks are already ordered in descending order of their priorities.
Checking for T1:
10 msec < 35 msec. Hence, T1 would meet its first deadline.
Checking for T2:
(10 + 15) > 20 (exceeds deadline)
Thus, T2 will miss its first deadline. Hence, the given task set can not be feasibly scheduled
under RMA.
Now let us check the schedulability using DMA:
Under DMA, the priority ordering of the tasks is as follows: pr(T2) > pr(T1) > pr(T3).
Checking for T2:
15 msec < 20 msec. Hence, T2 will meet its first deadline.
Checking for T1:
(15 + 10) < 35
Hence T1 will meet its first deadline.
Checking for T3:
(20 + 30 + 40) < 200
Therefore, T3 will meet its deadline.
Therefore, the given task set is schedulable under DMA but not under RMA.
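The DMA check of Example 10 can be automated by sorting on relative deadlines and then reusing the first-deadline test of Expr. 3.7 (an illustrative sketch, not part of the lesson):

```python
import math

def dma_schedulable(tasks):
    """Deadline Monotonic check: shortest relative deadline gets the
    highest priority, then each task must pass Expr. 3.7.
    tasks: (execution_time, period, deadline) triples."""
    ordered = sorted(tasks, key=lambda t: t[2])  # by relative deadline
    for i, (e_i, _, d_i) in enumerate(ordered):
        wait = sum(math.ceil(d_i / p_k) * e_k
                   for e_k, p_k, _ in ordered[:i])
        if e_i + wait > d_i:
            return False
    return True

# Task set of Example 10: fails under RMA but passes under DMA.
print(dma_schedulable([(10, 50, 35), (15, 100, 20), (20, 200, 200)]))  # True
```

The sort reproduces the priority ordering pr(T2) > pr(T1) > pr(T3) derived in the example, since d2 = 20 < d1 = 35 < d3 = 200.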
Solution: The net effect of context switches is to increase the execution time of each
task by two context switching times. Therefore, the utilization due to the task set is:
Σ(i=1 to 3) ui = 22/100 + 32/150 + 92/200 = 0.89

Since Σ(i=1 to 3) ui > 0.78, the task set is not RMA schedulable according to the Liu and Layland
test.
Let us try Lehoczky's test. The tasks are already ordered in descending order of their priorities.
Checking for task T1:
22 < 100
removes it from the ready queue, places it in the blocked queue, and takes up the next eligible task for scheduling. Thus, self-suspension introduces an additional scheduling point, which we did not consider earlier. In event-driven scheduling, the scheduling points are defined by task completion, task arrival, and self-suspension events.

Let us now determine the effect of self-suspension on the schedulability of a task set. Consider a set of periodic real-time tasks {T1, T2, …, Tn}, which have been arranged in decreasing order of their priorities (or increasing order of their periods). Let the worst case self-suspension time of a task Ti be bi, and let the delay that the task Ti might incur due to its own self-suspension and the self-suspension of all higher priority tasks be bti. Then bti can be expressed as:

bti = bi + Σ(k=1 to i−1) min(ek, bk)   … (3.8/2.15)

Self-suspension of a higher priority task Tk may affect the response time of a lower priority task Ti by as much as Tk's execution time ek, if ek < bk. This worst case delay might occur when the higher priority task, after self-suspension, starts its execution exactly at the time instant the lower priority task would otherwise have executed. That is, after self-suspension, the execution of the higher priority task overlaps with the lower priority task, with which it would otherwise not have overlapped. However, if ek > bk, then the self-suspension of a higher priority task can delay a lower priority task by at most bk, since the maximum overlap period of the execution of a higher priority task due to self-suspension is restricted to bk.
Note that in a system where some of the tasks are non-preemptable, the effect of self-suspension is much more severe than that computed by Expr. 3.8. The reason is that every time a task self-suspends, it loses the processor. It may then be blocked by a non-preemptable lower priority task after the completion of its self-suspension. Thus, in a non-preemptable scenario, a task incurs delays due to self-suspension of itself and its higher priority tasks, and also the delay caused by non-preemptable lower priority tasks. Obviously, a task cannot get delayed due to the self-suspension of a lower priority non-preemptable task.
The RMA task schedulability condition of Liu and Layland (Expr. 3.4) needs to change when
we consider the effect of self-suspension of tasks. To consider the effect of self-suspension in
Expr. 3.4, we need to substitute ei by (ei + bti). If we consider the effect of self-suspension on
task completion time, the Lehoczky criterion (Expr. 3.6) would also have to be generalized:
ei + bti + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi   … (3.9/2.16)
We have so far implicitly assumed that a task undergoes at most a single self-suspension. However, if a task undergoes multiple self-suspensions, then expression 3.9 derived above would need to be changed. We leave this as an exercise for the reader.
Example 14: Consider the following set of periodic real-time tasks: T1 = (e1=10, p1=50), T2 = (e2=25, p2=150), T3 = (e3=50, p3=200) [all in msec]. Assume that the self-suspension times of T1, T2, and T3 are 3 msec, 3 msec, and 5 msec, respectively. Determine whether the tasks would meet their respective deadlines if scheduled using RMA.

Solution: The tasks are already ordered in descending order of their priorities. By using the generalized Lehoczky condition given by Expr. 3.9, we get:

For T1 to be schedulable: (10 + 3) < 50. Therefore, T1 would meet its first deadline.

For T2 to be schedulable: (25 + 6 + 10×3) = 61 < 150. Therefore, T2 meets its first deadline.

For T3 to be schedulable: (50 + 11 + (10×4 + 25×2)) = 151 < 200. This inequality is also satisfied. Therefore, T3 would also meet its first deadline.

It can therefore be concluded that the given task set is schedulable under RMA even when self-suspension of tasks is considered.
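The computation of Example 14 follows Exprs. 3.8 and 3.9 mechanically, and can be sketched as follows (illustrative names; tasks are (ei, pi, bi) triples in decreasing priority order):

```python
import math

def self_suspension_schedulable(tasks):
    """Generalized Lehoczky test (Expr. 3.9) assuming at most one
    self-suspension per task. tasks: (e, p, b) triples, highest
    priority first."""
    for i, (e_i, p_i, b_i) in enumerate(tasks):
        # Expr. 3.8: bt_i = b_i + sum of min(e_k, b_k) over higher
        # priority tasks.
        bt_i = b_i + sum(min(e_k, b_k) for e_k, _, b_k in tasks[:i])
        wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k, _ in tasks[:i])
        if e_i + bt_i + wait > p_i:
            return False
    return True

# Task set of Example 14: 13 <= 50, 61 <= 150, 151 <= 200.
print(self_suspension_schedulable([(10, 50, 3), (25, 150, 3), (50, 200, 5)]))  # True
```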
1.9. Self Suspension with Context Switching Overhead

Let us examine the effect of context switches on the generalized Lehoczky test (Expr. 3.9) for schedulability of a task set, which takes self-suspension by tasks into account. In a fixed priority preemptable system, each task preempts at most one other task if there is no self-suspension. Therefore, each task suffers at most two context switches: one context switch when it starts and another when it completes. It is easy to realize that any time a task self-suspends, it causes at most two additional context switches. Using a similar reasoning, we can determine that when each task is allowed to self-suspend twice, four additional context switching overheads are incurred. Let us denote the maximum context switch time as c. The effect of a single self-suspension of a task is to effectively increase its execution time in the worst case from ei to (ei + 4c). Thus, context switching overhead in the presence of a single self-suspension of tasks can be taken care of by replacing the execution time of a task Ti by (ei + 4c) in Expr. 3.9. We can easily extend this argument to consider two, three, or more self-suspensions.
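Combining the two effects, the adjusted test simply inflates each execution time by 4c before applying Expr. 3.9. A sketch under these assumptions (one self-suspension per task; names illustrative):

```python
import math

def schedulable_with_switching(tasks, c):
    """Expr. 3.9 after replacing each e_i by e_i + 4c to account for
    context switches with a single self-suspension per task.
    tasks: (e, p, b) triples in decreasing priority order; c is the
    maximum context switch time."""
    inflated = [(e + 4 * c, p, b) for e, p, b in tasks]
    for i, (e_i, p_i, b_i) in enumerate(inflated):
        bt_i = b_i + sum(min(e_k, b_k) for e_k, _, b_k in inflated[:i])
        wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k, _ in inflated[:i])
        if e_i + bt_i + wait > p_i:
            return False
    return True

# Task set of Example 14 with a 1 msec context switch time.
print(schedulable_with_switching([(10, 50, 3), (25, 150, 3), (50, 200, 5)], 1))  # True
```

With c = 1 msec the task set of Example 14 remains schedulable (the T3 check becomes 54 + 11 + 114 = 179 ≤ 200), but a sufficiently large c would make even the highest priority task fail.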
1.10. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to
justify your choice in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. If a set of periodic real-time tasks fails Lehoczky's test, then it can safely be concluded that this task set cannot be feasibly scheduled under RMA.
c. A time-sliced round-robin scheduler uses preemptive scheduling.
d. RMA is an optimal static priority scheduling algorithm to schedule a set of periodic
real-time tasks on a non-preemptive operating system.
e. Self-suspension of tasks impacts the worst case response times of the individual tasks
much more adversely when preemption of tasks is supported by the operating system
compared to the case when preemption is not supported.
f. When a set of periodic real-time tasks is being scheduled using RMA, it can not be
the case that a lower priority task meets its deadline, whereas some higher priority
task does not.
g. EDF (Earliest Deadline First) algorithm possesses good transient overload handling
capability.
h. A time-sliced round robin scheduler is an example of a non-preemptive scheduler.
i. EDF algorithm is an optimal algorithm for scheduling hard real-time tasks on a uniprocessor when the task set is a mixture of periodic and aperiodic tasks.
j. In a non-preemptable operating system employing RMA scheduling for a set of real-time periodic tasks, self-suspension of a higher priority task (due to I/O etc.) may increase the response time of a lower priority task.
k. The worst-case response time for a task occurs when it is out of phase with its higher priority tasks.
l. Good real-time task scheduling algorithms ensure fairness to real-time tasks while scheduling.
2. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. The EDF algorithm is optimal for scheduling real-time tasks on a uniprocessor in a non-preemptive environment.
b. When RMA is used to schedule a set of hard real-time periodic tasks in a uniprocessor environment, if the processor becomes overloaded at any time during system execution due to overrun by the lowest priority task, it would be very difficult to predict which task would miss its deadline.
c. While scheduling a set of real-time periodic tasks whose task periods are harmonically related, the upper bound on the achievable CPU utilization is the same for both EDF and RMA algorithms.
d. In a non-preemptive event-driven task scheduler, scheduling decisions are made
only at the arrival and completion of tasks.
e. The following is the correct arrangement of the three major classes of real-time
scheduling algorithms in ascending order of their run-time overheads.
static priority preemptive scheduling algorithms
table-driven algorithms
dynamic priority algorithms
f. While scheduling a set of independent hard real-time periodic tasks on a
uniprocessor, RMA can be as proficient as EDF under some constraints on the task
set.
g. RMA should be preferred over the time-sliced round-robin algorithm for scheduling a
set of soft real-time tasks on a uniprocessor.
h. Under RMA, the achievable utilization of a set of hard real-time periodic tasks
would drop when task periods are multiples of each other compared to the case
when they are not.
i. RMA scheduling of a set of real-time periodic tasks using the Liu and Layland
criterion might produce infeasible schedules when the task periods are different from
the task deadlines.
3. What do you understand by scheduling point of a task scheduling algorithm? How are the
scheduling points determined in (i) clock-driven, (ii) event-driven, (iii) hybrid schedulers?
How will your definition of scheduling points for the three classes of schedulers change
when (a) self-suspension of tasks, and (b) context switching overheads of tasks are taken
into account.
4. What do you understand by jitter associated with a periodic task? How are these
jitters caused?
5. Is the EDF algorithm used for scheduling real-time tasks a dynamic priority scheduling algorithm? Does EDF compute any priority value of tasks at any time? If you answer affirmatively, then explain when the priority is computed and how it is computed. If you answer in the negative, then explain the concept of priority in EDF.
6. What is the sufficient condition for EDF schedulability of a set of periodic tasks whose periods and deadlines are different? Construct an example involving a set of three periodic tasks whose periods differ from their respective deadlines such that the task set fails the sufficient condition and yet is EDF schedulable. Verify your answer. Show all your intermediate steps.
7. A preemptive static priority real-time task scheduler is used to schedule two periodic tasks T1 and T2 with the following characteristics:
Task  Phase (mSec)  Execution Time (mSec)  Period (mSec)  Relative Deadline (mSec)
T1    0             10                      20             20
T2    0             20                      50             50
Assume that T1 has higher priority than T2. A background task arrives at time 0 and would require 1000 mSec to complete. Compute the completion time of the background task assuming that context switching takes no more than 0.5 mSec.
8. Assume that a preemptive priority-based system consists of three periodic foreground tasks
T1, T2, and T3 with the following characteristics:
T1 has higher priority than T2 and T2 has higher priority than T3. A background task Tb
arrives at time 0 and would require 2000mSec to complete. Compute the completion time
of the background task Tb assuming that context switching time takes no more than 1
mSec.
9. Assume that task T3 is more critical than task T2. Check whether the task set can be feasibly scheduled using RMA.
10. What is the worst case response time of the background task of a system in which the background task requires 1000 msec to complete? There are two foreground tasks. The higher priority foreground task executes once every 100 msec and each time requires 25 msec to complete. The lower priority foreground task executes once every 50 msec and requires 15 msec to complete. Context switching requires no more than 1 msec.
11. Construct an example involving more than one hard real-time periodic task whose aggregate processor utilization is 1, and yet is schedulable under RMA.
12. Determine whether the following set of periodic tasks is schedulable on a uniprocessor using DMA (Deadline Monotonic Algorithm). Show all intermediate steps in your computation.
Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            140
T2    60                 10                      60             40
T3    40                 20                      200            120
T4    25                 10                      80             25
13. Consider the following set of three independent real-time periodic tasks.

Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            100
T2    60                 10                      50             30
T3    40                 50                      200            150
Determine whether the task set is schedulable on a uniprocessor using EDF. Show
all intermediate steps in your computation.
14. Determine whether the following set of periodic real-time tasks is schedulable on a
uniprocessor using RMA. Show the intermediate steps in your computation. Is RMA
optimal when the task deadlines differ from the task periods?
15. Construct an example involving two periodic real-time tasks which can be feasibly
scheduled by both RMA and EDF, but the schedule generated by RMA differs from that
generated by EDF. Draw the two schedules on a time line and highlight how the two
schedules differ. Consider the two tasks such that for each task:
a. the period is the same as deadline
b. period is different from deadline
16. Can multiprocessor real-time task scheduling algorithms be used satisfactorily in distributed systems? Explain the basic difference between the characteristics of a real-time
17. Construct an example involving a set of hard real-time periodic tasks that are not schedulable under RMA but could be feasibly scheduled by DMA. Verify your answer, showing all intermediate steps.
18. Three hard real-time periodic tasks T1 = (50, 100, 100), T2 = (70, 200, 200), and T3 = (60, 400, 400) [time in msec] are to be scheduled on a uniprocessor using RMA. Can the task set be feasibly scheduled? Supposing a context switch overhead of 1 millisecond is to be taken into account, determine the schedulability.
19. Consider the following set of three real-time periodic tasks.
Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            100
T2    40                 10                      50             50
T3    60                 50                      200            200

a. Check whether the three given tasks are schedulable under RMA. Show all
intermediate steps in your computation.
b. Assuming that each context switch incurs an overhead of 1 msec, determine whether
the tasks are schedulable under RMA. Also, determine the average context switching
overhead per unit of task execution.
c. Assume that T1, T2, and T3 self-suspend for 10 msec, 20 msec, and 15 msec
respectively. Determine whether the task set remains schedulable under RMA. The
context switching overhead of 1 msec should be considered in your result. You can
assume that each task undergoes self-suspension only once during each of its
executions.
d. Assuming that T1 and T2 are assigned the same priority value, determine the
additional delay in response time that T2 would incur compared to the case when they
are assigned distinct priorities. Ignore the self-suspension times and the context
switch overhead for this part of the question.
Module 6
Embedded System Software
Lesson 31
Concepts in Real-Time Operating Systems
1. Introduction
In the last three lessons, we discussed the important real-time task scheduling techniques. We highlighted that timely production of results in accordance with a physical clock is vital to the satisfactory operation of a real-time system. We had also pointed out that real-time operating systems are primarily responsible for ensuring that every real-time task meets its timeliness requirements. A real-time operating system in turn achieves this by using appropriate task scheduling techniques. Normally, real-time operating systems provide flexibility to the programmers to select an appropriate scheduling policy among several supported policies. Deployment of an appropriate task scheduling technique out of the supported techniques is therefore an important concern for every real-time programmer. To be able to determine the suitability of a scheduling algorithm for a given problem, a thorough understanding of the characteristics of various real-time task scheduling algorithms is important. We therefore had a rather elaborate discussion on real-time task scheduling techniques and certain related issues, such as sharing of critical resources and handling task dependencies.

In this lesson, we examine the important features that a real-time operating system is expected to support. We start by discussing the time service supports provided by real-time operating systems, since accurate and high precision clocks are very important to the successful operation of any real-time application. Next, we point out the important features that a real-time operating system needs to support. Finally, we discuss the issues that would arise if we attempt to use a general purpose operating system such as UNIX or Windows in real-time applications.
The system clock should have a sufficiently fine resolution 1 to support the necessary time services.
However, designers of real-time operating systems find it very difficult to support very fine
resolution system clocks. In current technology, the resolution of hardware clocks is usually finer
than a nanosecond (contemporary processor speeds exceed 3GHz). But, the clock resolution
being made available by modern real-time operating systems to the programmers is of the order
of several milliseconds or worse. Let us first investigate why real-time operating system
designers find it difficult to maintain system clocks with sufficiently fine resolution. We then
examine various time services that are built based on the system clock, and made available to the
real-time programmers.
The hardware clock periodically generates interrupts (often called time service interrupts). After each clock interrupt, the kernel updates the software clock and also performs certain other work (explained in Sec. 4.1.1). A thread can get the current time reading of the system clock by invoking a system call supported by the operating system (such as the POSIX clock_gettime()). The finer the resolution of the clock, the more frequent the time service interrupts need to be, and the larger the amount of processor time the kernel spends in responding to these interrupts. This overhead places a limitation on how
fine a system clock resolution a computer can support. Another issue that caps the resolution of the system clock is that the response time of the clock_gettime() system call is not deterministic: every system call (or, for that matter, a function call) has some associated jitter. The jitter arises because interrupts have higher priority than system calls; when an interrupt occurs, the processing of a system call is stalled. Also, the preemption time of system calls can vary, because many operating systems disable interrupts while processing a system call. This variation in the response time (jitter) introduces an error in the accuracy of the time value that the calling thread gets from the kernel. Remember that jitter was defined as the difference between the worst-case response time and the best-case response time (see Sec. 2.3.1). In commercially available operating systems, the jitter associated with system calls can be several milliseconds. A software clock resolution finer than this error is therefore not meaningful.
We now examine the different activities that are carried out by a handler routine after a clock interrupt occurs. Subsequently, we discuss how sufficiently fine resolution can be provided in the presence of jitter in function calls.

1.1.1. Clock Interrupt Processing
Fig. 31.1 Structure of a Timer Queue (timers ordered by expiration time, each with an associated handler)

1 Clock resolution denotes the time granularity provided by the clock of a computer. It corresponds to the duration of time that elapses between two successive clock ticks.
Each time a clock interrupt occurs, besides incrementing the software clock, the handler
routine carries out the following activities:
Process timer events: Real-time operating systems maintain either per-process timer
queues or a single system-wide timer queue. The structure of such a timer queue has been
shown in Fig. 31.1. A timer queue contains all timers arranged in order of their expiration
times. Each timer is associated with a handler routine. The handler routine is the function that
should be invoked when the timer expires. At each clock interrupt, the kernel checks the
timer data structures in the timer queue to see if any timer event has occurred. If it finds that
a timer event has occurred, then it queues the corresponding handler routine in the ready
queue.
Update ready list: Since the occurrence of the last clock event, some tasks might have arrived or become ready due to the fulfillment of certain conditions they were waiting for. The tasks in the wait queue are checked, and the tasks which are found to have become ready are queued in the ready queue. If a task having higher priority than the currently running task is found to have become ready, then the currently running task is preempted and the scheduler is invoked.

Update execution budget: At each clock interrupt, the scheduler decrements the time slice (budget) remaining for the executing task. If the remaining budget becomes zero and the task is not complete, then the task is preempted and the scheduler is invoked to select another task to run.
1.1.2. Providing High Clock Resolution

We had pointed out in Sec. 4.1 that there are two main difficulties in providing a high resolution timer. First, the overhead associated with processing clock interrupts becomes excessive. Secondly, the jitter associated with the time lookup system call (clock_gettime()) is often of the order of several milliseconds. Therefore, it is not useful to provide a clock with a resolution any finer than this. However, some real-time applications need to deal with timing constraints of the order of a few nanoseconds. Is it at all possible to support time measurement with nanosecond resolution? A way to provide sufficiently fine clock resolution is to map the hardware clock directly into the address space of applications. An application can then read the hardware clock (through a normal memory read operation) without having to make a system call. On a Pentium processor, a user thread can be made to read the Pentium time stamp counter. This counter starts at 0 when the system is powered up and increments after each processor cycle. At today's processor speeds, this means that the counter increments several times during every nanosecond interval.
However, making the hardware clock readable by an application significantly reduces
the portability of the application. Processors other than Pentium may not have a high
resolution counter, and certainly the memory address map and resolution would differ.
1.1.3. Timers
We had pointed out that timer service is a vital service that is provided to applications
by all real-time operating systems. Real-time operating systems normally support two main
types of timers: periodic timers and aperiodic (or one shot) timers. We now discuss some basic
concepts about these two types of timers.
Periodic Timers: Periodic timers are used mainly for sampling events at regular intervals or for performing some activities periodically. Once a periodic timer is set, each time it expires the corresponding handler routine is invoked and the timer is reinserted into the timer queue. For example, a periodic timer may be set to 100 msec and its handler set to poll the temperature sensor after every 100 msec interval.
Aperiodic (or One Shot) Timers: These timers are set to expire only once. Watchdog
timers are popular examples of one shot timers.
f() {
    wd_start(t1, exception-handler);    /* start */
    ...
    wd_tickle();                        /* end */
}

Fig. 31.2 Use of a Watchdog Timer
Watchdog timers are used extensively in real-time programs to detect when a task misses its deadline, and then to initiate exception handling procedures upon a deadline miss. An example use of a watchdog timer has been illustrated in Fig. 31.2. In Fig. 31.2, a watchdog timer is set at the start of a certain critical function f() through a wd_start(t1) call. The wd_start(t1) call sets the watchdog timer to expire by the specified deadline (t1) of the starting of the task. If the function f() does not complete even after t1 time units have elapsed, then the watchdog timer fires, indicating that the task deadline must have been missed, and the exception handling procedure is initiated. In case the task completes before the watchdog timer expires (i.e. the task completes within its deadline), then the watchdog timer is reset using a wd_tickle() call.
1.2. Features of a Real-Time Operating System

Before discussing commercial real-time operating systems, we must clearly understand the features normally expected of a real-time operating system. This will also let us compare different real-time operating systems and understand the differences between a traditional operating system and a real-time operating system. In the following, we identify some important features required of a real-time operating system, especially those that are normally absent in traditional operating systems.
Clock and Timer Support: Clock and timer services with adequate resolution are one of the
most important issues in real-time programming. Hard real-time application development often
requires support of timer services with resolution of the order of a few microseconds. And even
finer resolution may be required in case of certain special applications. Clocks and timers are a
vital part of every real-time operating system. On the other hand, traditional operating systems
often do not provide time services with sufficiently high resolution.
Real-Time Priority Levels: A real-time operating system must support static priority levels.
A priority level supported by an operating system is called static, when once the
programmer assigns a priority value to a task, the operating system does not change it by
itself. Static priority levels are also called real-time priority levels. This is because, as we
discuss in section 4.3, all traditional operating systems dynamically change the priority levels of
tasks from programmer assigned values to maximize system throughput. Such priority levels that
are changed by the operating system dynamically are obviously not static priorities.
Predictable and Fast Interrupt Latency: Interrupt latency is defined as the time delay between the occurrence of an interrupt and the running of the corresponding ISR (Interrupt Service Routine). In real-time operating systems, the upper bound on interrupt latency must be bounded and is expected to be less than a few microseconds. Low interrupt latency is achieved by performing the bulk of the activities of the ISR in a deferred procedure call (DPC). A DPC is essentially a task that performs most of the ISR activity and is executed later at a certain priority value. Further, support for nested interrupts is usually desired. That is, a real-time operating system should not only be preemptive while executing kernel routines, but should be preemptive during interrupt servicing as well. This is especially important for hard real-time applications with sub-microsecond timing requirements.
Support for Resource Sharing Among Real-Time Tasks: If real-time tasks are allowed to share critical resources among themselves using the traditional resource sharing techniques, then the response times of tasks can become unbounded, leading to deadline misses. This is one compelling reason why every commercial real-time operating system should at the minimum provide the basic priority inheritance mechanism. Support of the priority ceiling protocol (PCP) is also desirable if large and moderate sized applications are to be supported.
applications with some means of controlling paging, such as memory locking. Memory locking
prevents a page from being swapped from memory to hard disk. In the absence of memory
locking feature, memory access times of even critical real-time tasks can show large jitter, as the
access time would greatly depend on whether the required page is in the physical memory or has
been swapped out.
Memory protection is another important issue that needs to be carefully considered. Lack of
support for memory protection among tasks leads to a single address space for the tasks.
Arguments for having only a single address space include simplicity, saving memory bits,
and light weight system calls. For small embedded applications, the overhead of a few Kilo
Bytes of memory per process can be unacceptable. However, when no memory protection is
provided by the operating system, the cost of developing and testing a program without memory
protection becomes very high when the complexity of the application increases. Also,
maintenance cost increases as any change in one module would require retesting the entire
system.
Embedded real-time operating systems usually do not support virtual memory; they create physically contiguous blocks of memory for an application upon request. However, memory fragmentation is a potential problem for a system that does not support virtual memory. Also, memory protection becomes difficult to support in a non-virtual memory management system. For this reason, in many embedded systems the kernel and the user processes execute in the same address space, i.e. there is no memory protection. Hence, a system call and a function call within an application are indistinguishable. This makes debugging applications difficult, since a runaway pointer can corrupt the operating system code, making the system freeze.
r
Additional Requirements for Embedded Real-Time Operating Systems: Embedded
g
s
applications usually have constraints on cost, size, and power consumption. Embedded real-time
ent
operating systems should be capable of diskless operation, since many times disks are either too
bulky to use, or increase the cost of deployment. Further, embedded operating systems should
u d
minimize total power consumption of the system. Embedded operating systems usually reside on
st
ROM. For certain applications which require faster response, it may be necessary to run the real-
t y
time operating system on a RAM. Since the access time of a RAM is lower than that of a ROM,
i
.c
this would result in faster execution. Irrespective of whether ROM or RAM is used, all ICs are
expensive. Therefore, for real-time operating systems for embedded applications it is desirable to
w
w
have as small a foot print (memory usage) as possible. Since embedded products are typically
w
manufactured large scale, every rupee saved on memory and other hardware requirements
impacts millions in profit.
The two most troublesome problems that a real-time programmer faces while using Unix for real-time applications are its non-preemptive kernel and the dynamically changing priorities of tasks.
When an application invokes an operating system service through a system call, a special instruction called a trap (or a software interrupt) is executed. As soon as the trap instruction is executed, the handler routine changes the processor state from user mode to kernel mode (or supervisor mode), and the execution of the required kernel routine starts. The change of mode during a system call has been depicted schematically in Fig. 31.3.
Fig. 31.3 Invocation of an Operating System Service through a System Call (the application program, running in user mode, executes a trap; the kernel checks the parameters and runs the requested OS service in kernel mode, after which control returns to the next statement)
At the risk of digressing from the focus of this discussion, let us understand an important operating systems concept. Certain operations such as handling devices, creating processes, file operations, etc., need to be done in the kernel mode only. That is, application programs are prevented from carrying out these operations directly, and need to request the operating system (through a system call) to carry out the required operation. This restriction enables the kernel to enforce discipline among different programs in accessing these objects. If such operations were not performed in the kernel mode, different application programs might interfere with each other's operation. An example of an operating system where all operations were performed in user mode is the once popular operating system DOS (though DOS is nearly obsolete now). In DOS, application programs are free to carry out any operation in user mode 2, including crashing the system by deleting the system files. The instability this can bring about is clearly unacceptable in a real-time environment, and is undesirable in general applications as well.
2 In fact, in DOS there is only one mode of operation; kernel mode and user mode are indistinguishable.
interrupts, but it would have resulted in increasing the average task preemption time. In Sec.
4.4.4 we investigate how modern real-time operating systems make the kernel preemptive
without unduly increasing the task preemption time.
1.3.2. Dynamic Priority Levels
In Unix systems, real-time tasks cannot be assigned static priority values. Soon after a programmer sets a priority value, the operating system alters it. This makes it very difficult to schedule real-time tasks using algorithms such as RMA or EDF, since both these schedulers assume that once task priorities are assigned, they should not be altered by any other part of the operating system. It is instructive to understand why Unix dynamically changes the priority values of tasks in the first place.
Unix uses round-robin scheduling with multilevel feedback. This scheduler arranges tasks in multilevel queues as shown in Fig. 31.4. At every preemption point, the scheduler scans the multilevel queue from the top (highest priority) and selects the task at the head of the first non-empty queue. Each task is allowed to run for a fixed time quantum (or time slice) at a time. Unix normally uses a one-second time slice. That is, if the running process neither blocks nor completes within one second of starting execution, it is preempted and the scheduler selects the next task for dispatching. Unix however allows the default one-second time slice to be configured during system generation. The kernel preempts a process that does not complete within its assigned time quantum, recomputes its priority, and inserts it back into one of the priority queues depending on the recomputed priority value of the task.
Fig. 31.4 Multi-Level Feedback Queues
Unix periodically computes the priority of a task based on the type of the task and its execution history. The priority of a task (Ti) is recomputed at the end of its j-th time slice using the following two expressions:

Pr(Ti, j) = Base(Ti) + CPU(Ti, j) + nice(Ti)    (4.1)

CPU(Ti, j) = U(Ti, j-1) / 2 + CPU(Ti, j-1) / 2    (4.2)

where Pr(Ti, j) is the priority of the task Ti at the end of its j-th time slice; U(Ti, j) is the utilization of the task Ti for its j-th time slice, and CPU(Ti, j) is the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Base(Ti) is the base priority of the task Ti, and nice(Ti) is the nice value set for the task (through which a programmer can voluntarily lower the priority of the task, i.e. be nice to the other processes).

Expr. 4.2 has been recursively defined. Unfolding the recursion, we get:

CPU(Ti, j) = U(Ti, j-1) / 2 + U(Ti, j-2) / 4 + ...    (4.3)
It can be easily seen from Expr. 4.3 that, in the computation of the weighted history of CPU
utilization of a task, the activity (i.e. processing or I/O) of the task in the immediately concluded
interval is given the maximum weightage. If the task used up CPU for the full duration of the
slice (i.e. 100% CPU utilization), then CPU(Ti, j) gets a higher value indicating a lower priority.
Observe that the activities of the task in the preceding intervals get progressively lower
weightage. It should be clear that CPU(Ti, j) captures the weighted history of CPU utilization of
the task Ti at the end of its j-th time slice.
Now, substituting Expr 4.3 in Expr. 4.1, we get:
Pr(Ti, j) = Base(Ti) + U(Ti, j-1) / 2 + U(Ti, j-2) / 4 + ... + nice(Ti)    (4.4)
The purpose of the base priority term in the priority computation expression (Expr.
4.4) is to divide all tasks into a set of fixed bands of priority levels. The values of
U(Ti , j) and nice components are restricted to be small enough to prevent a process from
migrating from its assigned band. The bands have been designed to optimize I/O, especially
block I/O. The different priority bands under Unix in decreasing order of priorities are:
swapper, block I/O, file manipulation, character I/O and device control, and user processes.
Tasks performing block I/O are assigned the highest priority band. As an example of block I/O, consider the I/O that occurs while handling a page fault in a virtual memory system. Such block I/O uses DMA-based transfer and hence makes efficient use of the I/O channel. Character I/O includes mouse and keyboard transfers. The priority bands were designed to provide the most effective use of the I/O channels.
Dynamic re-computation of priorities was motivated by the following consideration. The Unix designers observed that in any computer system, I/O is the bottleneck. Processors are extremely fast compared to the transfer rates of I/O devices. I/O devices such as keyboards are necessarily slow, to cope with human response times. Other devices such as printers and disks deploy mechanical components that are inherently slow and therefore cannot sustain very high rates of data transfer. Therefore, effective use of the I/O channels is very important for increasing the overall system throughput. The I/O channels should be kept as busy as possible to let the interactive tasks get good response times. To keep the I/O channels busy, any task performing I/O should not be kept waiting for the CPU. For this reason, as soon as a task blocks for I/O, its priority is increased by the priority re-computation rule given in Expr. 4.4. However, if a task makes full use of its last assigned time slice, it is determined to be computation-bound and its priority is reduced. Thus the basic philosophy of the Unix operating system is that interactive tasks are made to assume higher priority levels and are processed at the earliest; this gives the interactive users good response times. This technique has now become an accepted way of scheduling soft real-time tasks across almost all available general purpose operating systems.

From the above observations, we can state the overall effect of re-computation of priority values using Expr. 4.4 as follows:

In Unix, I/O intensive tasks migrate to higher and higher priorities, whereas CPU-intensive tasks seek lower priority levels.

No doubt the approach taken by Unix is very appropriate for maximizing average task throughput, and it does indeed provide good average response times to interactive (soft real-time) tasks. In fact, almost every modern operating system does a very similar dynamic re-computation of task priorities to maximize overall system throughput and to provide good average response times to the interactive tasks. However, for hard real-time tasks, such dynamic shifting of priority values is clearly unacceptable.
Insufficient Device Driver Support: In Unix (remember that we are talking of the original Unix System V), device drivers run in kernel mode. Therefore, if support for a new device is to be added, the driver module has to be linked to the kernel modules, necessitating a system generation step. As a result, providing support for a new device in an already deployed application is cumbersome.
Lack of Real-Time File Services: In Unix, file blocks are allocated as and when they are
requested by an application. As a consequence, while a task is writing to a file, it may encounter
an error when the disk runs out of space. In other words, no guarantee is given that disk space
would be available when a task writes a block to a file. Traditional file writing approaches also
result in slow writes since required space has to be allocated before writing a block. Another
problem with the traditional file systems is that blocks of the same file may not be contiguously
located on the disk. This would result in read operations taking unpredictable times, resulting in
jitter in data access. In real-time file systems significant performance improvement can be
achieved by storing files contiguously on the disk. Since the file system pre-allocates space, the
times for read and write operations are more predictable.
Fig. 31.5 Schematic Representation of a Host-Target System (the host system and the target board are connected by a serial link or TCP/IP)
The main idea behind this approach is that the real-time operating system running on the target board is kept as small and simple as possible. This implies that the operating system on the target board would lack virtual memory management support, nor would it support any program development utilities. The host system must have the program development environment, including compilers, editors, libraries, cross-compilers, debuggers, etc. These are memory-demanding applications that require virtual memory support. The host is usually connected to the target using a serial port or a TCP/IP connection (see Fig. 31.5). The real-time program is developed on the host. It is then cross-compiled to generate code for the target processor. Subsequently, the executable module is downloaded to the target board. Tasks are executed on the target board and the execution is controlled at the host side using a symbolic cross-debugger. Once the program works successfully, it is fused on a ROM or flash memory and becomes ready to be deployed in applications.
Commercial examples of host-target real-time operating systems include PSOS, VxWorks, and VRTX. We examine these commercial products in Lesson 5. We would point out that these operating systems, due to their small size, limited functionality, and optimal design, achieve much better performance figures than full-fledged operating systems. For example, the task preemption times of these systems are of the order of a few microseconds, compared to several hundreds of milliseconds for traditional Unix systems.
1.4.3. Preemption Point Approach
We have already pointed out that one of the major shortcomings of the traditional
Unix V code is that during a system call, all interrupts are masked (disabled) for the entire duration of execution of the system call. This leads to unacceptable worst-case task response times of the order of a second, making Unix-based systems unsuitable for most hard real-time applications.
An approach that has been taken by a few vendors to improve the real-time performance of
non-preemptive kernels is the introduction of preemption points in system routines. Preemption
points in the execution of a system routine are the instants at which the kernel data structures are
consistent. At these points, the kernel can safely be preempted to make way for any waiting
higher priority real-time tasks without corrupting any kernel data structures. In this approach,
when the execution of a system call reaches a preemption point, the kernel checks to
see if any higher priority tasks have become ready. If there is at least one, it preempts
the processing of the kernel routine and dispatches the waiting highest priority task
immediately. The worst-case preemption latency in this technique therefore becomes the longest
time between two consecutive preemption points. As a result, the worst-case response times of
tasks are now several folds lower than those for traditional operating systems without preemption
points. This makes preemption point-based operating systems suitable for use in many categories of hard real-time applications, though still not suitable for applications requiring preemption latencies of the order of a few microseconds or less. Another advantage of
this approach is that it involves only minor changes to be made to the kernel code.
Many operating systems have taken the preemption point approach in the past, a prominent
example being HP-UX.
Self-host systems take a different approach: instead of developing the application on a separate host system machine running traditional Unix, in self-host systems a real-time application is developed on the same system on which the real-time application would finally run. Of course, while deploying the application, the operating system modules that are not essential during task execution are excluded to minimize the size of the operating system in the embedded application. Remember that in the host-target approach, the target real-time operating system was a lean and efficient system that could only run the application but did not include program development facilities; program development was carried out on the host system. This made application development and debugging difficult and required cross-compiler and cross-debugger support. In the self-host approach, the real-time application is developed on a full-fledged operating system, and once the application runs satisfactorily it is fused on the target board in a ROM or flash memory along with a stripped-down version of the same operating system.
Most of the self-host operating systems that are available now are based on microkernel architecture. Use of microkernel architecture for a self-host operating system entails several advantages. In microkernel architecture, only the core functionalities such as interrupt handling and process management are implemented as kernel routines. All other functionalities such as memory management, file management, device management, etc. are implemented as add-on modules which operate in user mode. As a result, it becomes very easy to configure the operating system. Also, the microkernel is lean and therefore much more efficient. A monolithic operating system, by contrast, binds most drivers, file systems, and protocol stacks to the operating system kernel, and all kernel processes share the same address space; hence a single programming error in any of these components can cause a fatal kernel fault. In microkernel-based operating systems, these components run in separate memory-protected address spaces. So, system crashes on this count are very rare, and microkernel-based operating systems are very reliable.
We had discussed earlier that any Unix-based system has to overcome the following
two main shortcomings of the traditional Unix kernel in order to be useful in hard real-time
applications: non-preemptive kernel and dynamic priority values. We now examine how these
problems are overcome in self-host systems.
done from efficiency considerations and worked well for non-real-time and uniprocessor
applications.
Masking interrupts during kernel processing can cause even very small critical routines to have worst-case response times of the order of a second. Further, this approach does not work in multiprocessor environments: masking the interrupts for one processor does not help, as the tasks running on the other processors can still corrupt the kernel data structures.
It is now clear that in order to make the kernel preemptive, locks must be used at appropriate
places in the kernel code. In fully preemptive Unix systems, normally two types of locks are
used: kernel-level locks, and spin locks.
Fig. 31.6 Operation of a Spin Lock (task T1 holds the spin lock guarding a critical resource while task T2 busy-waits)
A kernel-level lock is similar to a traditional lock. When a task waits for a kernel-level lock to be released, it is blocked and undergoes a context switch. It becomes ready only after the required lock is released by the holding task and becomes available. This type of lock is inefficient when critical resources are required for short durations, of the order of a few milliseconds or less; in such situations the context switching overheads are not acceptable. Consider a task that requires the lock for carrying out some very small processing (possibly a single arithmetic operation) on some critical resource. If a kernel-level lock is used, another task requesting the lock at that time would be blocked and a context switch would be incurred; in addition, the cache contents, pages of the task, etc. may be swapped. Here the context switching time is comparable to, or even greater than, the time for which the task needs the resource. In such a situation, a spin lock is appropriate. Let us now understand the operation of a spin lock, shown schematically in Fig. 31.6. In Fig. 31.6, a critical resource is required by the tasks T1 and T2 for very short times (comparable to a context switching time). The resource is protected by a spin lock. The task T1 has acquired the spin lock guarding the resource. Meanwhile, the task T2 requests the resource. When task T2 cannot get access to the resource, it simply busy-waits (shown as a loop in the figure) rather than blocking and suffering a context switch. T2 gets the resource as soon as T1 relinquishes it.
Real-Time Priorities: Let us now examine how self-host systems address the problem of
dynamic priority levels of the traditional Unix systems. In Unix based real-time operating
systems, in addition to dynamic priorities, real-time and idle priorities are supported. Fig. 31.7
schematically shows the three available priority levels.
Real-time priorities (highest): up to 127
Dynamic priorities: up to 254
Idle (non-migrating) priority (lowest): 255

Fig. 31.7 The three priority classes in Unix-based real-time operating systems
Idle (Non-Migrating): This is the lowest priority. The task that runs when there are no other
tasks to run (the idle task) runs at this level. Idle priorities are static and are not recomputed
periodically.
Dynamic: Dynamic priorities are recomputed periodically to improve the average response
time of soft real-time tasks. Dynamic recomputation of priorities ensures that I/O-bound
tasks migrate to higher priorities and CPU-bound tasks operate at lower priority levels. As
shown in Fig. 31.7, dynamic priority levels are higher than the idle priority, but are lower than
the real-time priorities.
Real-Time: Real-time priorities are static priorities and are not recomputed. Hard real-time
tasks operate at these levels. Tasks having real-time priorities operate at higher priorities than the
tasks with dynamic priority levels.
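The periodic recomputation of dynamic priorities can be illustrated with a small sketch, loosely modeled on the classic 4.3BSD rule in which a numerically larger value means a lower priority. The constants, decay factor, and function names below are our simplifications, not those of any particular Unix kernel:

```c
/* Sketch of periodic dynamic-priority recomputation. Larger priority
   value = lower priority, as in classic Unix schedulers. */

#define PUSER 50   /* base user priority (assumed constant) */

/* Called each scheduling tick the task spends on the CPU. */
int charge_tick(int p_cpu) {
    return p_cpu + 1;             /* CPU-bound tasks accumulate usage */
}

/* Called periodically (say once per second): usage decays, so tasks that
   blocked for I/O and accumulated little p_cpu drift back upward. */
int decay(int p_cpu) {
    return p_cpu / 2;             /* simplified decay factor */
}

/* Recomputed priority: more accumulated CPU time gives a numerically
   larger, i.e. lower, priority. I/O-bound tasks therefore migrate to
   higher priorities; CPU-bound tasks sink to lower ones. */
int recompute_priority(int p_cpu, int nice) {
    return PUSER + p_cpu / 4 + 2 * nice;
}
```

A CPU-bound task that ran for 40 ticks ends up at a worse (larger) priority value than one that decayed to 20 ticks after blocking for I/O, which is exactly the migration behaviour described above.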
1.5. Windows As A Real-Time Operating System

Microsoft's Windows operating systems are extremely popular in desktop computers.
Windows operating systems have evolved over the last twenty-five years from the naive
DOS (Disk Operating System). Microsoft developed DOS in the early eighties. Microsoft kept
on announcing new versions of DOS almost every year and kept on adding new features to DOS
in the successive versions. DOS evolved to the Windows operating systems, whose main
distinguishing feature was a graphical front-end. As several new versions of Windows kept
appearing by way of upgrades, the Windows code was completely rewritten in the early nineties
to develop the Windows NT system. Since the code was completely rewritten, the Windows NT
system was much more stable (does not crash) than the earlier DOS-based systems. The later
versions of Microsoft's operating systems were descendants of Windows NT; the DOS-based
systems were scrapped. Fig. 31.8 shows the genealogy of the various operating systems from the
Microsoft stable. Because stability is a major requirement for hard real-time applications, we
consider only Windows NT and its descendants in our study and do not include the DOS line
of products.
Fig. 31.8 Genealogy of the operating systems from the Microsoft stable (the DOS line: DOS, Windows 3.1, Windows 95, Windows 98; the new code line: Windows NT, Windows 2000, Windows XP)

Windows NT is often used in real-time applications on account of either cost saving or
convenience. This is especially true in prototype application development and also when only a
limited number of deployments are required. In the following, we critically analyze the
suitability of Windows NT for real-time application development. First, we highlight some
features of Windows NT that are very relevant and useful to a real-time application developer.
In the subsequent subsection, we point out some of the lacunae of Windows NT when used in
real-time application development.
1.5.1. Features of Windows NT

Windows NT has several features which are very desirable for real-time applications, such as
support for multithreading, real-time priority levels, and timers. Moreover, the clock resolutions
are sufficiently fine for most real-time applications.
Windows NT supports 32 priority levels (see Fig. 31.9). Each process belongs to one
of the following priority classes: idle, normal, high, real-time. By default, the priority
class at which an application runs is normal. Both normal and high are variable-type classes,
where the priority is recomputed periodically. NT uses priority-driven preemptive scheduling,
and threads of real-time priorities have precedence over all other threads, including kernel
threads. Processes such as the screen saver use the idle priority class. NT lowers the priority of
a task (belonging to a variable type) if it used all of its last time slice. It raises the priority of a
task if it blocked for I/O and could not use its last time slice in full. However, the change of a
task from its base priority is restricted to ±2.
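This boost-and-decay rule for variable-type threads can be captured as a small pure function. The function name and the exact clamping behaviour are our simplification of what NT does:

```c
#include <stdbool.h>

/* Sketch of NT's adjustment rule for variable-type (normal/high) threads:
   boost after blocking for I/O, decay after consuming a full time slice,
   never drifting more than 2 levels away from the base priority. */
int adjust_priority(int current, int base, bool used_full_slice) {
    int next = used_full_slice ? current - 1   /* CPU-bound: lower it    */
                               : current + 1;  /* blocked for I/O: raise */
    if (next > base + 2) next = base + 2;      /* clamp to base + 2 ...  */
    if (next < base - 2) next = base - 2;      /* ... and to base - 2    */
    return next;
}
```

For example, a thread at its base priority 8 that keeps blocking for I/O is boosted to 9, then 10, and no further; one that keeps using its full slice sinks no lower than 6.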
Real-time critical: 31
Real-time idle: 16
Dynamic time-critical: 15
Dynamic idle: 1
Idle: 0

Fig. 31.9 Task Priorities in Windows NT
1.5.2. Shortcomings of Windows NT

In spite of the impressive support that Windows provides for real-time program development,
as discussed in Section 1.5.1, a programmer trying to use Windows in real-time system
development has to cope with several problems. Of these, the following two main problems
are the most troublesome.
1. Interrupt Processing: The priority level of interrupts is always higher than that of user-
level threads, including the threads of the real-time class. When an interrupt occurs, the
handler routine saves the machine's state and makes the system execute an Interrupt
Service Routine (ISR). Only critical processing is performed in the ISR, and the bulk of
the processing is done as a Deferred Procedure Call (DPC). DPCs for various interrupts
are queued in the DPC queue in a FIFO manner. While this separation of ISR and DPC has
the advantage of providing quick response to further interrupts, it has the disadvantage of
maintaining all DPCs at the same priority. A DPC cannot be preempted by another
DPC, but only by an interrupt. DPCs are executed in FIFO order at a priority lower than the
hardware interrupt priorities but higher than the priority of the scheduler/dispatcher.
Further, it is not possible for a user-level thread to execute at a priority higher
than that of ISRs or DPCs. Therefore, even ISRs and DPCs corresponding to very
low-priority tasks can preempt real-time processes, and the potential blocking
of real-time tasks due to DPCs can be large. For example, interrupts due to page faults
generated by low-priority tasks would get processed faster than real-time processes.
Also, ISRs and DPCs generated due to keyboard and mouse interactions would operate
at higher priority levels compared to real-time tasks. If there are processes doing network
or disk I/O, the effect of system-wide FIFO queues may lead to unbounded response
times for even real-time threads.
These problems have been avoided by the Windows CE operating system through a priority
inheritance mechanism.
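The troublesome property is easy to see in a model of the DPC queue: it is a plain FIFO ring with no notion of the priority of the task each deferred item serves. The names and sizes below are illustrative, not Windows internals:

```c
#include <stddef.h>

/* Minimal model of the system-wide DPC queue: a FIFO ring of deferred
   work items. Note there is no priority field at all: a DPC queued on
   behalf of, say, a mouse interrupt drains before a later DPC that a
   real-time task is waiting on. */
typedef void (*dpc_fn)(void);

#define DPC_QUEUE_LEN 64
static dpc_fn queue[DPC_QUEUE_LEN];
static size_t head = 0, tail = 0;

int dpc_enqueue(dpc_fn fn) {         /* called from the ISR */
    size_t next = (tail + 1) % DPC_QUEUE_LEN;
    if (next == head) return -1;     /* queue full */
    queue[tail] = fn;
    tail = next;
    return 0;
}

void dpc_drain(void) {               /* runs below interrupt level but
                                        above the scheduler/dispatcher */
    while (head != tail) {           /* strict FIFO: no reordering */
        dpc_fn fn = queue[head];
        head = (head + 1) % DPC_QUEUE_LEN;
        fn();
    }
}

/* Demo work items recording their execution order. */
static int run_order[2];
static int runs = 0;
static void mouse_dpc(void)    { run_order[runs++] = 1; }
static void realtime_dpc(void) { run_order[runs++] = 2; }
```

Enqueuing mouse_dpc() before realtime_dpc() guarantees the mouse work runs first, however unimportant the task it serves; this is the source of the unbounded blocking discussed above.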
2. Support for Resource Sharing Protocols: We had discussed in Chapter 3 that unless
appropriate resource sharing protocols are used, tasks may suffer unbounded priority
inversions while accessing shared resources, leading to deadline misses and even system
failure. Windows NT does not provide any support (such as priority inheritance) to
help real-time tasks share critical resources among themselves. This is a major
shortcoming of Windows NT when used in real-time applications.
Since most real-time applications do involve resource sharing among tasks, we outline
below the possible ways in which such functionality can be added at the user level on a
Windows NT system.
The simplest approach to let real-time tasks share critical resources without unbounded
priority inversions is as follows. As soon as a task is successful in locking a non-preemptable
resource, its priority can be raised to the highest priority (31). As soon as the
task releases the resource, its priority is restored. However, we know that this
arrangement would lead to large inheritance-related inversions.
Another possibility is to implement the priority ceiling protocol (PCP). To implement this
protocol, we need to restrict the real-time tasks to even priorities (i.e. 16, 18, ..., 30).
The reason for this restriction is that NT does not support FIFO scheduling among equal-priority
tasks. If the highest priority among all tasks needing a resource is 2n, then the
ceiling priority of the resource is 2n+1. In Unix, a FIFO option among equal-priority tasks
is available; therefore all available priority levels can be used.
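A user-level sketch of this scheme follows. The helper set_thread_priority() is a stand-in for the platform call (on NT this would be the Win32 SetThreadPriority() API) so that the sketch stays self-contained; the even-priority convention and the 2n+1 ceiling follow the text above:

```c
/* Sketch of user-level priority-ceiling emulation. Real-time tasks use
   only even priorities 16, 18, ..., 30; ceilings are the odd levels. */

static int current_priority = 16;           /* stand-in for thread state */
static void set_thread_priority(int p) { current_priority = p; }
static int  get_thread_priority(void)  { return current_priority; }

/* If the highest priority among the tasks using a resource is 2n, its
   ceiling is 2n + 1: odd, hence strictly above every even user of it. */
int ceiling_priority(int highest_even_user) {
    return highest_even_user + 1;
}

typedef struct { int ceiling; int saved; } ceiled_resource;

void resource_lock(ceiled_resource *r) {
    r->saved = get_thread_priority();
    set_thread_priority(r->ceiling);        /* boost to the ceiling */
    /* ... acquire the underlying mutex here ... */
}

void resource_unlock(ceiled_resource *r) {
    /* ... release the underlying mutex here ... */
    set_thread_priority(r->saved);          /* restore the base priority */
}
```

A task at priority 18 locking a resource whose highest-priority user runs at 24 is boosted to the ceiling 25 for the duration of the critical section and restored to 18 on release.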
1.6. Windows vs Unix

Table 31.1 Windows NT versus Unix V

Real-Time Feature          Windows NT    Unix V
DPCs                       Yes           No
Real-time priorities       Yes           No
Locking virtual memory     Yes           Yes
Timer precision            1 msec        10 msec
Asynchronous I/O           Yes           No
Though Windows NT has many of the features desired of a real-time operating system, its
implementation of DPCs together with its lack of protocol support for resource sharing among
equal-priority tasks makes it unsuitable for use in safety-critical real-time applications. A
comparison of the extent to which some of the basic features required for real-time programming
are provided by Windows NT and Unix V is indicated in Table 31.1. With careful programming,
Windows NT may be useful for applications that can tolerate occasional deadline misses and
have deadlines of the order of hundreds of milliseconds rather than microseconds. Of course, to
be used in such applications, the processor utilization must be kept sufficiently low and priority
inversion control must be provided at the user level.
1.7. Exercises
1. State whether the following assertions are True or False. Justify your answer in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper
bound on achievable utilization improves as the number of tasks in the system being
developed increases.
b. Under the Unix operating system, computation intensive tasks dynamically
gravitate towards higher priorities.
c. Normally, task switching time is larger than task preemption time.
d. Suppose a real-time operating system does not support memory protection, then a
procedure call and a system call are indistinguishable in that system.
e. Watchdog timers are typically used to start certain tasks at regular intervals.
f. For memory of the same size under segmented and virtual addressing schemes, the
segmented addressing scheme would in general incur lower memory access jitter
compared to the virtual addressing scheme.
2. Even though the clock frequency of modern processors is of the order of several GHz, why do
many modern real-time operating systems not support nanosecond or even microsecond
resolution clocks? Is it possible for an operating system to support nanosecond resolution
clocks at present? Explain how this can be achieved.
3. Give an example of a real-time application for which simple segmented memory
management support by the RTOS is preferred, and another example of an application for
which virtual memory management support is essential. Justify your choices.
4. Is it possible to meet the service requirements of hard real-time applications by writing
additional layers over the Unix System V kernel? If your answer is no, explain the
reason. If your answer is yes, explain what additional features you would implement in
the external layer of the Unix System V kernel for supporting hard real-time applications.
5. Briefly indicate how Unix dynamically recomputes task priority values. Why is such re-
computation of task priorities required? What are the implications of such priority re-
computations on real-time application development?
6. Why is Unix V non-preemptive in kernel mode? How do fully preemptive kernels based
on Unix (e.g. Linux) overcome this problem? Briefly describe an experimental set-up that
can be used to determine the preemptability of different operating systems by high-priority
real-time tasks when a low-priority task has made a system call.
7. Explain how interrupts are handled in Windows NT. Explain how the interrupt processing
scheme of Windows NT makes it unsuitable for hard real-time applications. How has this
problem been overcome in WinCE?
8. Would you recommend Unix System V to be used for a few real-time tasks for running a
data acquisition application? Assume that the computation time for these tasks is of the
order of few hundreds of milliseconds and the deadline of these tasks is of the order of
several tens of seconds. Justify your answer.
9. Explain the problems that you would encounter if you try to develop and run a hard real-
time system on the Windows NT operating system.
10. Briefly explain why the traditional Unix kernel is not suitable to be used in
multiprocessor environments. Define a spin lock and a kernel-level lock and explain their
use in realizing a preemptive kernel.
11. What do you understand by a microkernel-based operating system? Explain the advantages
of a microkernel-based real-time operating system over a monolithic operating system.
12. What is the difference between a self-host and a host-target based embedded operating
system? Give at least one example of a commercial operating system from each category.
What problems might a real-time application developer face while using RT-Linux
for developing hard real-time applications?
13. What are the important features required in a real-time operating system? Analyze to what
extent these features are provided by Windows NT and Unix V.
Module
6
Embedded System
Software
Version 2 EE IIT, Kharagpur 1
Lesson
32
Commercial Real-Time
Operating Systems
1. Introduction

Many real-time operating systems are at present available commercially. In this lesson, we
analyze some of the popular real-time operating systems and investigate why these popular
systems cannot be used across all applications. We also examine the POSIX standards for RTOS
and their implications.
1.1. POSIX

POSIX stands for Portable Operating System Interface. The X has been suffixed to the
abbreviation to make it sound Unix-like. Over the last decade, POSIX has become an important
standard in the operating systems area, including real-time operating systems. The importance of
POSIX can be gauged from the fact that nowadays it has become uncommon to come across a
commercial operating system that is not POSIX-compliant. POSIX started as an open software
initiative. Since POSIX has now become overwhelmingly popular, we discuss the POSIX
requirements on real-time operating systems. We start with a brief introduction to the open
software movement and then trace the historical events that have led to the emergence of
POSIX. Subsequently, we highlight the important requirements of real-time POSIX.
1.2. Open Software

An open system is a vendor-neutral environment, which allows users to intermix hardware,
software, and networking solutions from different vendors. Open systems are based on open
standards and are not copyrighted, saving users from expensive intellectual property right (IPR)
law suits. The most important characteristics of open systems are interoperability and
portability. Interoperability means systems from multiple vendors can exchange information
among each other. A system is portable if it can be moved from one environment to another
without modifications. As part of the open system initiative, the open software movement has
become popular.
Advantages of open software include the following: it reduces the cost of development and
the time to market a product; it helps increase the availability of add-on software packages; it
enhances the ease of programming; and it facilitates easy integration of separately developed
modules. POSIX is an off-shoot of the open software movement.
Open Software standards can be divided into three categories:
Open Source: Provides portability at the source code level. To run an application on a new
platform would require only compilation and linking. ANSI and POSIX are important open
source standards.
Open Object: This standard provides portability of unlinked object modules across different
platforms. To run an application in a new environment, relinking of the object modules
would be required.
Open Binary: This standard provides complete software portability across hardware
platforms based on a common binary language structure. An open binary product can be
portable at the executable code level. At the moment, however, no open binary standards exist.
The main goal of POSIX is application portability at the source code level. Before we discuss
RT-POSIX, let us explore the historical background under which POSIX was developed.
1.6.1. PSOS
PSOS is a popular real-time operating system that is being primarily used in embedded
applications. It is available from Wind River Systems, a large player in the real-time operating
system arena. It is a host-target type of real-time operating system. PSOS is being used in
several commercial embedded products. An example application of PSOS is in the base stations
of cellular systems.
Fig. 32.1 PSOS-based Development of Embedded Software (legend: XRAY+ is the source-level debugger and PROBE the target debugger; the host computer runs the editor, cross-compiler, XRAY+, and libraries, and communicates with the target over TCP/IP; the target runs the application over the pSOS+ kernel together with the pNA (networking), pHILE (file system), and pROBE components)

PSOS-based application development has been shown schematically in Fig. 32.1. The host
computer is typically a desktop; both Unix and Windows hosts are supported. The target board
contains the embedded processor, ROM, RAM, etc. The host computer runs the editor, cross-
compiler, source-level debugger, and libraries.
1.6.2. VRTX
VRTX is a POSIX-RT compliant operating system from Mentor Graphics. VRTX has been
certified by the US FAA (Federal Aviation Administration) for use in mission and life critical
applications such as avionics. VRTX has two multitasking kernels: VRTXsa and VRTXmc.
VRTXsa is used for large and medium applications. It supports virtual memory. It has a
POSIX-compliant library and supports priority inheritance. Its system calls are deterministic and
fully preemptable. VRTXmc is optimized for power consumption and for ROM and RAM sizes.
It therefore has a very small footprint. The kernel typically requires only 4 to 8 Kbytes of ROM
and 1 Kbyte of RAM. It does not support virtual memory. This version is targeted at cell
phones and other small hand-held devices.
1.6.3. VxWorks

VxWorks is a product from Wind River Systems. It is a host-target system. The host can be
either a Windows or a Unix machine. It supports most POSIX-RT functionalities. VxWorks
comes with an integrated development environment (IDE) called Tornado. In addition to the
standard support for program development tools such as an editor, cross-compiler, cross-
debugger, etc., Tornado contains VxSim and WindView. VxSim simulates a VxWorks target for
use as a prototyping and testing environment. WindView provides debugging tools for the
simulator environment. VxMP is the multiprocessor version of VxWorks.
VxWorks was deployed in the Mars Pathfinder, which was sent to Mars in 1997. Pathfinder
landed on Mars, responded to ground commands, and started to send science and engineering
data. However, there was a hitch: it repeatedly reset itself. Remotely using the trace generation,
logging, and debugging tools of VxWorks, it was found that the cause was unbounded priority
inversion. The unbounded priority inversion caused real-time tasks to miss their deadlines, and
as a result, the exception handler reset the system each time. Although VxWorks supports
priority inheritance, using the remote debugging tool it was found to have been disabled in the
configuration file. The problem was fixed by enabling it.
1.6.4. QNX

QNX is a product from QNX Software Systems Ltd. QNX Neutrino offers POSIX-compliant
APIs and is implemented using a microkernel architecture.
The microkernel architecture of QNX is shown in Fig. 32.2. Because of the fine-grained
scalability of the microkernel architecture, it can be configured to a very small size, a critical
advantage in high-volume devices, where even a 1% reduction in memory costs can return
millions of dollars in profit.

Fig. 32.2 The microkernel architecture of QNX (file system, device driver, application, and TCP/IP manager modules around the microkernel)
1.6.5. µC/OS-II

µC/OS-II is a free RTOS, easily available on the Internet. It is written in ANSI C and contains
a small portion of assembly code. The assembly language portion has been kept to a minimum to
make it easy to port to different processors. To date, µC/OS-II has been ported to over 100
different processor architectures ranging from 8-bit to 64-bit microprocessors, microcontrollers,
and DSPs. Some important features of µC/OS-II are highlighted in the following.
µC/OS-II was designed so that the programmer can use just a few of the offered services
or select the entire range of services. This allows the programmer to minimize the
amount of memory needed by µC/OS-II on a per-product basis.
µC/OS-II has a fully preemptive kernel. This means that µC/OS-II always ensures that
the highest priority task that is ready is taken up for execution.
µC/OS-II allows up to 64 tasks to be created. Each task operates at a unique priority
level; there are 64 priority levels. This means that round-robin scheduling is not
supported. The priority levels are also used as the PIDs (Process Identifiers) for the tasks.
µC/OS-II uses partitioned memory management. Each memory partition consists of
several fixed-size blocks. A task obtains memory blocks from a memory partition, and
the task must create a memory partition before it can be used. Allocation and
deallocation of fixed-size memory blocks is done in constant time and is deterministic.
A task can create and use multiple memory partitions, so that it can use memory blocks
of different sizes.
µC/OS-II has been certified by the Federal Aviation Administration (FAA) for use in
commercial aircraft by meeting the demanding requirements of its standard for software
used in avionics. To meet the requirements of this standard, it was demonstrated through
documentation and testing that it is robust and safe.
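The constant-time partitioned allocator described above can be sketched as a free list threaded through fixed-size blocks. The names below are ours (in µC/OS-II itself the corresponding calls are OSMemCreate(), OSMemGet(), and OSMemPut()):

```c
#include <stddef.h>

/* A memory partition: a region carved into fixed-size blocks threaded
   onto a singly linked free list, so both get and put are O(1). */
typedef struct mem_partition {
    void *free_list;              /* head of the free-block list */
    size_t blk_size;
    size_t nblks_free;
} mem_partition;

/* Carve `nblks` blocks of `blk_size` bytes out of `storage`.
   blk_size must be at least sizeof(void*) to hold the list link. */
void part_create(mem_partition *p, void *storage, size_t nblks, size_t blk_size) {
    char *blk = storage;
    p->free_list = NULL;
    p->blk_size = blk_size;
    p->nblks_free = nblks;
    for (size_t i = 0; i < nblks; i++) {      /* thread the free list */
        *(void **)blk = p->free_list;
        p->free_list = blk;
        blk += blk_size;
    }
}

void *part_get(mem_partition *p) {            /* O(1), deterministic */
    void *blk = p->free_list;
    if (blk) {
        p->free_list = *(void **)blk;
        p->nblks_free--;
    }
    return blk;                               /* NULL if exhausted */
}

void part_put(mem_partition *p, void *blk) {  /* O(1), deterministic */
    *(void **)blk = p->free_list;
    p->free_list = blk;
    p->nblks_free++;
}
```

Because every block in a partition has the same size, neither call ever searches or splits anything, which is what makes allocation time constant and deterministic; a task needing blocks of two sizes simply creates two partitions.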
1.6.6. RT Linux

Linux is by and large a free operating system. It is robust, feature-rich, and efficient. Several
real-time implementations of Linux (RT-Linux) are available. RT-Linux is a self-host operating
system (see Fig. 32.3). RT-Linux runs along with a Linux system: the real-time kernel sits
between the hardware and the Linux system and intercepts all interrupts generated by the
hardware. If an interrupt is to cause a real-time task to run, the real-time kernel preempts Linux,
if Linux is running at that time, and lets the real-time task run. Thus, in effect, Linux runs as a
task of RT-Linux.

Fig. 32.3 Structure of RT-Linux (the real-time kernel sits between the Linux kernel and the hardware)

The real-time applications are written as loadable kernel modules. In essence, real-time
applications run in the kernel space.
In the approach taken by RT-Linux, there are effectively two independent kernels: the real-time
kernel and the Linux kernel. Therefore, this approach is also known as the dual-kernel approach,
as the real-time kernel is implemented outside the Linux kernel. Any task that requires
deterministic scheduling is run as a real-time task. These tasks preempt Linux whenever they
need to execute and yield the CPU to Linux only when no real-time task is ready to run.
Compared to the microkernel approach, the following are the shortcomings of the dual-kernel
approach.
Duplicated Coding Efforts: Tasks running in the real-time kernel cannot make full use
of the Linux system services: file systems, networking, and so on. In fact, if a real-time
task invokes a Linux service, it will be subject to the same preemption problems that
prohibit Linux processes from behaving deterministically. As a result, new drivers and
system services must be created specifically for the real-time kernel even when
equivalent services already exist for Linux.
Fragile Execution Environment: Tasks running in the real-time kernel do not benefit
from the MMU-protected environment that Linux provides to the regular non-real-time
processes. Instead, they run unprotected in kernel space. Consequently, any real-time
task that contains a coding error, such as a corrupt C pointer, can easily cause a fatal
kernel fault. This is a serious problem, since many embedded applications are
safety-critical in nature.
Limited Portability: In the dual-kernel approach, the real-time tasks are not Linux
processes at all, but programs written using a small subset of POSIX APIs. To aggravate
the matter, different implementations of dual kernels use different APIs. As a result, real-
time programs written using one vendor's RT-Linux version may not run on another's.
Programming Difficulty: RT-Linux kernels support only a limited subset of POSIX
APIs. Therefore, application development takes more effort and time.
1.6.7. Lynx

Lynx is a self-host system. The currently available version of Lynx (Lynx 3.0) is a
microkernel-based real-time operating system, though the earlier versions were based on a
monolithic design. Lynx is fully compatible with Linux: with Lynx's binary compatibility, a
Linux program's binary image can be run directly on Lynx. On the other hand, for other Linux-
compatible operating systems such as QNX, Linux applications need to be recompiled in order
to run on them.
run on them. The Lynx microkernel is 28KBytes in size and provides the essential services in
scheduling, interrupt dispatch, and synchronization. The other services are provided as kernel
plug-ins (KPIs). By adding KPIs to the microkernel, the system can be configured to support I/O,
file systems, sockets, and so on. With full configuration, it can function as a multipurpose Unix
machine on which both hard and soft real-time tasks can run. Unlike many embedded real-time
operating systems, Lynx supports memory protection.
1.6.8. Windows CE

Windows CE is a stripped-down version of Windows, and has a minimum footprint of only
400 KBytes. It provides 256 priority levels. To optimize performance, all threads are run in
kernel mode. The timer accuracy is 1 msec for the sleep- and wait-related APIs. The different
functionalities of the kernel are broken down into small non-preemptive sections. So, during a
system call, preemption is turned off for only short periods of time. Also, interrupt servicing is
preemptable; that is, it supports nested interrupts. It uses the memory management unit (MMU)
for virtual memory management.
Windows CE uses a priority inheritance scheme to avoid the priority inversion problem present
in Windows NT. Normally, the kernel thread handling a page fault (i.e. the DPC) runs at a
priority level higher than NORMAL (refer Sec. 1.5.2). When a thread with priority level NORMAL
suffers a page fault, the priority of the corresponding kernel thread handling this page fault is
set to the priority of the thread causing the page fault. This ensures that a thread is not
blocked by another lower-priority thread even when it suffers a page fault.
1.6.9. Exercises

1. State whether the following statements are True or False. Justify your answer in each case.
a. In real-time Linux (RT-Linux), real-time processes are scheduled at priorities
higher than the kernel processes.
b. EDF scheduling of tasks is commonly supported in commercial real-time operating
systems such as PSOS and VRTX.
c. POSIX 1003.4 (real-time standard) requires that real-time processes be scheduled at
priorities higher than kernel processes.
d. POSIX is an attempt by ANSI/IEEE to enable executable files to be portable
across different Unix machines.
2. What is the difference between block I/O and character I/O? Give examples of each.
Which type of I/O is accorded higher priority by Unix? Why?
3. List four important features that a POSIX 1003.4 (Real-Time standard) compliant
operating system must support. Is preemptability of kernel processes required by POSIX
1003.4? Can a Unix-based operating system using the preemption-point technique claim to
be POSIX 1003.4 compliant? Explain your answers.
4. Suppose you are the manufacturer of small embedded components used mainly in
consumer electronics goods such as automobiles, MP3 players, and computer-based toys.
Would you prefer to use PSOS, WinCE, or RT-Linux in your embedded component?
Explain the reasons behind your answer.
5. What is the difference between a system call and a function call? What problems, if any,
might arise if the system calls are invoked as procedure calls?
6. Explain how a real-time operating system differs from a traditional operating system.
Name a few real-time operating systems that are commercially available.
7. What is open software? Does an open software mandate portability of the executable files
across different platforms? Name an open software standard for real-time operating
systems. What is the advantage of using an open software operating system for real-time
application development? What are the pros and cons of using an open software product in
program development compared to a proprietary product?
8. Identify at least four important advantages of using VxWorks as the operating system for
real-time applications compared to using Unix V.3.
9. What is an open source standard? How is it different from open object and open binary
standards? Give some examples of popular open source software products.
10. Can multithreading result in faster response times (compared to single-threaded tasks) even
in uniprocessor systems? Explain your answer and identify the reasons to support your
answer.
o g
References (Lessons 24 - 28) . bl
u p
1. o
C.M. Krishna and Shin K.G., Real-Time Systems, Tata McGraw-Hill, 1999.
r
2. Philip A. Laplante, Real-Time System Design
s g and Analysis, Prentice Hall of India, 1996.
3. n
Jane W.S. Liu, Real-Time Systems, Pearson t Press, 2000.
4. Alan C. Shaw, Real-Time Systems d eand Software, John Wiley and Sons, 2001.
t u
5.
s
C. SivaRam Murthy and G. Manimaran, Resource Management in Real-Time Systems and
Networks, MIT Press, 2001.
i ty
6. c
B. Dasarathy, Timing .Constraints of Real-Time Systems: Constructs for Expressing Them,
w80-86.
Methods for Validating Them, IEEE Transactions on Software Engineering, January 1985,
w
Vol. 11, No. 1, pages
7. w
Lui Sha, Ragunathan Rajkumar, John P. Lehoczky, Priority inheritance protocols: An
approach to real-time synchronization,, IEEE Transactions on Computers, 1990, Vol. 39,
pages 1175-1185.
Module 7
Software Engineering Issues
Version 2 EE IIT, Kharagpur 1
Lesson 33
Introduction to Software Engineering
1. Introduction

With the advancement of technology, computers have become more powerful and sophisticated. The more powerful a computer is, the more sophisticated the programs it can run. Thus, programmers have been tasked to solve larger and more complex problems. They have coped with this challenge by innovating and by building on their past programming experience.
All those past innovations and the experience of writing good quality programs in efficient and cost-effective ways have been systematically organized into a body of knowledge. This body of knowledge forms the basis of software engineering principles. Thus, we can view software engineering as a systematic collection of past experience. The experience is arranged in the form of methodologies and guidelines.
Suppose you have a friend who asked you to build a small wall as shown in fig. 33.1. You would be able to do that using your common sense. You will get building materials like bricks, cement, etc., and you will then build the wall.
But what would happen if the same friend asked you to build a large multistoried building as
shown in fig. 33.2?
Fig. 33.2 A Multistoried Building

You don't have a very good idea about building such a huge complex. It would be very difficult to extend your idea about a small wall construction into constructing a large building. Even if you tried to build a large building, it would collapse because you would not have the requisite knowledge about the strength of materials, testing, planning, architectural design, etc. Building a small wall and building a large building are entirely different ball games. You can use your intuition and still be successful in building a small wall, but building a large building requires knowledge of civil, architectural, and other engineering principles.
The principle of abstraction (in fig. 33.4) implies that a problem can be simplified by
omitting irrelevant details. Once the simpler problem is solved then the omitted details can be
taken into consideration to solve the next lower level abstraction, and so on.
Fig. 33.3 Increase in development time and effort with problem size

1.1.1. Abstraction and Decomposition

Fig. 33.4 Levels of abstraction (the full problem at the bottom, with the 1st, 2nd, and 3rd abstractions above it)
The principle of decomposition implies that a complex problem can be divided into smaller parts, whose solved components can be combined to get the full solution. A good decomposition of a problem as shown in fig. 33.5 should minimize interactions among the various components. If the different subcomponents are interrelated, then the different components cannot be solved separately and the desired reduction in complexity will not be realized.
Fig. 33.5 Decomposition of a large problem into a set of smaller problems
1.2. The Software Crisis

Software engineering appears to be among the few options available to tackle the present software crisis.
To explain the present software crisis in simple words, consider the following. The expenses that organizations all around the world are incurring on software purchases compared to those on hardware purchases have been showing a worrying trend over the years (as shown in fig. 33.6).
Fig. 33.6 Change in the relative cost of hardware and software over time
Organizations are spending larger and larger portions of their budget on software. Not only
are the software products turning out to be more expensive than hardware, but they also present a
host of other problems to the customers: software products are difficult to alter, debug, and
enhance; use resources non-optimally; often fail to meet the user requirements; are far from
being reliable; frequently crash; and are often delivered late. Among these, the trend of
increasing software costs is probably the most important symptom of the present software crisis.
Remember that the cost we are talking of here is not on account of increased features, but due to
ineffective development of the product characterized by inefficient resource usage, and time and
cost over-runs.
There are many factors that have contributed to the making of the present software crisis. These factors include larger problem sizes, lack of adequate training in software engineering, an increasing skill shortage, and low productivity improvements.
It is believed that the only satisfactory solution to the present software crisis can possibly come from a spread of software engineering practices among the engineers, coupled with further advancements to the software engineering discipline itself.
For a program, the user interface may not be very important, because the programmer who develops it is the sole user. On the other hand, for a software product, the user interface must be carefully designed and implemented, because the developers of that product and the users of that product are totally different. In the case of a program, very little documentation is expected, but a software product must be well documented. A program can be developed according to the programmer's individual style of development, but a software product must be developed using the accepted software engineering principles.
2. Evolution of Program Design Techniques
During the 1950s, most programs were being written in assembly language. These programs were limited to a few hundred lines of assembly code, i.e. they were very small in size. Every programmer developed programs in his own individual style, based on his intuition. This type of programming was called Exploratory Programming.
The next significant development, which occurred during the early 1960s in the area of computer programming, was high-level language programming. Use of high-level language programming reduced development efforts and development time significantly. Languages like FORTRAN, ALGOL, and COBOL were introduced at that time.
In 1968, Dijkstra published his famous article "Go To Statement Considered Harmful". Expectedly, many programmers were enraged to read this article. They published several counter articles highlighting the advantages and inevitable use of GOTO statements. But soon it was conclusively proved that only three programming constructs (sequence, selection, and iteration) were sufficient to express any programming logic. This formed the basis of the structured programming methodology.
2.1.1. Features of Structured Programming

A structured program uses three types of program constructs, i.e. selection, sequence, and iteration. Structured programs avoid unstructured control flows by restricting the use of GOTO statements. A structured program consists of a well partitioned set of modules. Structured programming uses single-entry, single-exit program constructs such as if-then-else, do-while, etc. Thus, the structured programming principle emphasizes designing neat control structures for programs.
2.1.2. Advantages of Structured Programming

Structured programs are easier to read and understand. Structured programs are easier to maintain. They require less effort and time for development. They are amenable to easier debugging, and usually fewer errors are made in the course of writing such programs.
A lot of attention is being paid to requirements specification. Significant effort is now being
devoted to develop a clear specification of the problem before any development activity is
started.
Now, there is a distinct design phase where standard design techniques are employed.
Periodic reviews are being carried out during all stages of the development process. The
main objective of carrying out reviews is phase containment of errors, i.e. detect and correct
errors as soon as possible. Defects are usually not detected as soon as they occur, rather they are
noticed much later in the life cycle. Once a defect is detected, we have to go back to the phase
where it was introduced and rework those phases - possibly change the design or change the code
and so on.
Today, software testing has become very systematic and standard testing techniques are
available. Testing activity has also become all encompassing in the sense that test cases are being
developed right from the requirements specification stage.
There is better visibility of design and code. By visibility we mean production of good
quality, consistent and standard documents during every phase. In the past, very little attention
was paid to producing good quality and consistent documents. In the exploratory style, the design and test activities, even if carried out (in whatever way), were not documented satisfactorily. Today, good quality documents are consciously being developed during product development. This has made fault diagnosis and maintenance smoother.
Now, projects are first thoroughly planned. Project planning normally includes preparation of various types of estimates, resource scheduling, and development of project tracking plans. Several techniques and tools for tasks such as configuration management, cost estimation, scheduling, etc. are used for effective software project management.
Several metrics are being used to help in software project management and software quality assurance.

3. Software Life Cycle Model
A software life cycle model (also called a process model) is a descriptive and diagrammatic representation of the software life cycle. A life cycle model represents all the activities required to make a software product transit through its life cycle phases. It also captures the order in which these activities are to be undertaken. In other words, a life cycle model maps the different activities performed on a software product from its inception to its retirement. Different life cycle models may map the basic development activities to phases in different ways. Thus, no matter which life cycle model is followed, the basic activities are included in all life cycle models, though the activities may be carried out in different orders in different life cycle models. During any life cycle phase, more than one activity may also be carried out. For example, the design phase might consist of the structured analysis activity followed by the structured design activity.
3.1. The Need for a Life Cycle Model
The development team must identify a suitable life cycle model for the particular project and
then adhere to it. Without using a particular life cycle model, the development of a software product would not be carried out in a systematic and disciplined manner. When a software product is being
developed by a team there must be a clear understanding among team members about when and
what to do. Otherwise it would lead to chaos and project failure. Let us try to illustrate this
problem using an example. Suppose a software development problem is divided into several
parts and the parts are assigned to the team members. From then on, suppose the team members
are allowed the freedom to develop the parts assigned to them in whatever way they like. It is
possible that one member might start writing the code for his part, another might decide to
prepare the test documents first, and some other engineer might begin with the design phase of
the parts assigned to him. This would be one of the perfect recipes for project failure.
A software life cycle model defines entry and exit criteria for every phase. A phase can start
only if its phase-entry criteria have been satisfied. So without a software life cycle model, the
entry and exit criteria for a phase cannot be recognized. Without models (such as classical
waterfall model, iterative waterfall model, prototyping model, evolutionary model, spiral model
etc.), it becomes difficult for software project managers to monitor the progress of the project.
Many life cycle models have been proposed so far. Each of them has some advantages as
well as some disadvantages. A few important and commonly used life cycle models are as
follows:
The classical waterfall model divides the life cycle into the following phases, as shown in fig. 33.7:
Feasibility study
Requirements analysis and specification
Design
Coding and unit testing
Integration and system testing
Maintenance
Fig. 33.7 Classical Waterfall Model (feasibility study → requirements analysis and specification → design → coding → testing → maintenance)
Case Study
A mining company named Galaxy Mining Company Ltd. (GMC) has mines located at various
places in India. It has about fifty different mine sites spread across eight states. The company
employs a large number of miners at each mine site. Mining being a risky profession, the
company intends to operate a special provident fund, which would exist in addition to the
standard provident fund that the miners already enjoy. The main objective of having the special
provident fund (SPF) would be to quickly distribute some compensation before the standard
provident amount is paid. According to this scheme, each mine site would deduct SPF
instalments from each miner every month and deposit the same with the CSPFC (Central Special
Provident Fund Commissioner). The CSPFC will maintain all details regarding the SPF
instalments collected from the miners. GMC employed a reputed software vendor Adventure
Software Inc. to undertake the task of developing the software for automating the maintenance of
SPF records of all employees. GMC realized that besides saving manpower on bookkeeping
work, the software would help in speedy settlement of claim cases. GMC indicated that the
amount it could afford for this software to be developed and installed was 1 million rupees.
Adventure Software Inc. deputed their project manager to carry out the feasibility study. The project manager discussed the matter with the top managers of GMC to get an overview of the project. He also discussed the issues involved with the field PF officers at the various mine sites to determine the exact details of the project. The project manager identified two broad approaches to solve the problem. One was to have a central database which could be accessed from every mine site; the other was to maintain local databases at each mine site, so that the mine sites could still operate even when the communication link to the central database temporarily failed.
The goal of the requirements gathering activity is to collect all relevant information from the
customer regarding the product to be developed with a view to clearly understand the customer
requirements and weed out the incompleteness and inconsistencies in these requirements.
The requirements analysis activity is begun by collecting all relevant data regarding the
product to be developed from the users of the product and from the customer through interviews
and discussions. For example, to perform the requirements analysis of a business accounting
software required by an organization, the analyst might interview all the accountants of the
organization to ascertain their requirements. The data collected from such a group of users
usually contain several contradictions and ambiguities, since each user typically has only a
partial and incomplete view of the system. Therefore it is necessary to identify all ambiguities
and contradictions in the requirements and resolve them through further discussions with the
customer. After all ambiguities, inconsistencies, and incompleteness have been resolved and all
the requirements properly understood, the requirements specification activity can start. During
this activity, the user requirements are systematically organized into a Software Requirements
Specification (SRS) document.
The customer requirements identified during the requirements gathering and analysis activity are organized into an SRS document. The important components of this document are the functional requirements, the non-functional requirements, and the goals of implementation.
3.2.3. Design

The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. In technical terms, during the design phase the software architecture is derived from the SRS document. Two distinctly different approaches are available: the traditional design approach and the object-oriented design approach.
Traditional design approach: Traditional design consists of two different activities; first a structured analysis of the requirements specification is carried out, where the detailed structure of the problem is examined. This is followed by a structured design activity. During structured design, the results of structured analysis are transformed into the software design.
Object-oriented design approach: In this technique, the various objects that occur in the problem domain and the solution domain are first identified, and the different relationships that exist among these objects are identified. The object structure is further refined to obtain the detailed design.
3.2.4. Coding and Unit Testing
The purpose of the coding and unit testing phase (sometimes called the implementation
phase) of software development is to translate the software design into source code. Each
component of the design is implemented as a program module. The end-product of this phase is a
set of program modules that have been individually tested.
During this phase, each module is unit tested to determine the correct working of all the
individual modules. It involves testing each module in isolation as this is the most efficient way
to debug the errors identified at this stage.
3.2.5. Integration and System Testing

The different modules making up a software product are almost never integrated in one shot.
Integration is normally carried out incrementally over a number of steps. During each integration
step, the partially integrated system is tested and a set of previously planned modules are added
to it. Finally, when all the modules have been successfully integrated and tested, system testing is
carried out. The goal of system testing is to ensure that the developed system conforms to the
requirements laid out in the SRS document. System testing usually consists of three different
kinds of testing activities:
Alpha testing: It is the system testing performed by the development team.
Beta testing: It is the system testing performed by a friendly set of customers.
Acceptance testing: It is the system testing performed by the customer himself after
product delivery to determine whether to accept or reject the delivered product.
System testing is normally carried out in a planned manner according to the system test plan document. The system test plan identifies all testing-related activities that must be performed, specifies the schedule of testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.
3.2.6. Maintenance

Maintenance of a typical software product requires much more effort than the effort necessary to develop the product itself. Many studies carried out in the past confirm this and indicate that the relative effort of development of a typical software product to its maintenance effort is roughly in the 40:60 ratio. Maintenance involves performing any one or more of the following three kinds of activities:
Correcting errors that were not discovered during the product development phase. This is called corrective maintenance.
Improving the implementation of the system, and enhancing the functionalities of the system according to the customer's requirements. This is called perfective maintenance.
Porting the software to work in a new environment. For example, porting may be required to get the software to work on a new computer platform or with a new operating system. This is called adaptive maintenance.
3.2.7. Shortcomings of the Classical Waterfall Model
The classical waterfall model is an idealistic one since it assumes that no development error
is ever committed by the engineers during any of the life cycle phases. However, in practical
development environments, the engineers do commit a large number of errors in almost every
phase of the life cycle. The source of the defects can be many: oversight, wrong assumptions, use
of inappropriate technology, communication gap among the project engineers, etc. These defects
usually get detected much later in the life cycle. For example, a design defect might go unnoticed
till we reach the coding or testing phase. Once a defect is detected, the engineers need to go back
to the phase where the defect had occurred and redo some of the work done during that phase
and the subsequent phases to correct the defect and its effect on the later phases. Therefore, in
any practical software development work, it is not possible to strictly follow the classical
waterfall model.
At the start of the feasibility study, project managers or team leaders try to understand what the actual problem is by visiting the client site. At the end of that phase, they pick the best solution and determine whether the solution is feasible financially and technically.
At the start of the requirements analysis and specification phase, the required data is collected. After that, requirements specification is carried out. Finally, the SRS document is produced.
At the start of the design phase, the context diagram and different levels of DFDs are produced according to the SRS document. At the end of this phase, the module structure (structure chart) is produced.
During the coding phase each module (an independently compilable unit) of the design is coded. Then each module is tested independently as a stand-alone unit and debugged separately. After this each module is documented individually. The end product of the implementation phase is a set of program modules that have been tested individually but not tested together.
After the implementation phase, the different modules which have been tested individually are integrated in a planned manner. After all the modules have been successfully integrated and tested, system testing is carried out.
Software maintenance denotes any changes made to a software product after it has been delivered to the customer. Maintenance is inevitable for almost any kind of product. However, most products need maintenance due to the wear and tear caused by use.
3.3. Prototyping Model

A prototype is a toy implementation of the system. A prototype usually exhibits limited functional capabilities, low reliability, and inefficient performance compared to the actual software. A prototype is usually built using several shortcuts. The shortcuts might involve using inefficient, inaccurate, or dummy functions. The shortcut implementation of a function, for example, may produce the desired results by using a table look-up instead of performing the actual computations. A prototype usually turns out to be a very crude version of the actual system.
3.3.1. The Need for a Prototype
There are several uses of a prototype. An important purpose is to illustrate the input data formats, messages, reports, and the interactive dialogues to the customer. This is a valuable mechanism for gaining a better understanding of the customer's needs:
how the screens might look
how the user interface would behave
how the system would produce outputs, etc.
This is something similar to what the architectural designers of a building do; they show a prototype of the building to their customer. The customer can evaluate whether he likes it, and identify the changes that he would need in the actual product. A similar thing happens in the case of a software product and its prototyping model.
The Spiral model of software development is shown in fig. 33.8. The diagrammatic representation of this model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the software process. For example, the innermost loop might be concerned with feasibility study; the next loop with requirements specification; the next one with design, and so on. Each phase in this model is split into four sectors (or quadrants) as shown in fig. 33.8. The following activities are carried out during each phase of a spiral model.
First quadrant (Objective Setting):
During the first quadrant, we need to identify the objectives of the phase.
Examine the risks associated with these objectives.
Second quadrant (Risk Assessment and Reduction):
A detailed analysis is carried out for each identified project risk.
Steps are taken to reduce the risks. For example, if there is a risk that the requirements are inappropriate, a prototype system may be developed.
Third quadrant (Development and Validation):
Develop and validate the next level of the product after resolving the identified risks.
Fourth quadrant (Review and Planning):
Review the results achieved so far with the customer and plan the next iteration around the spiral.
The prototyping model is suitable for projects for which either the user requirements or the underlying technical aspects are not well understood. This model is especially popular for development of the user-interface part of projects.
The evolutionary approach is suitable for large problems which can be decomposed into a set of modules for incremental development and delivery. This model is also widely used for object-oriented development projects. Of course, this model can only be used if the incremental delivery of the system is acceptable to the customer.
The spiral model is called a meta-model since it encompasses all other life cycle models. Risk handling is inherently built into this model. The spiral model is suitable for development of technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.
The different software life cycle models can be compared from the viewpoint of the customer. Initially, customer confidence in the development team is usually high irrespective of the development model followed. During the long development process, customer confidence normally drops, as no working product is immediately visible. Developers answer customer queries using technical slang, and delays are announced. This gives rise to customer resentment.
On the other hand, an evolutionary approach lets the customer experiment with a working product much earlier than the monolithic approaches. Another important advantage of the incremental model is that it reduces the customer's trauma of getting used to an entirely new system. The gradual introduction of the product via incremental phases provides time to the customer to adjust to the new product. Also, from the customer's financial viewpoint, incremental development does not require a large upfront capital outlay. The customer can order the incremental versions as and when he can afford them.
3.6. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. There are well defined steps through which a problem is solved using an exploratory
style.
c. Evolutionary life cycle model is ideally suited for development of very small software
products typically requiring a few months of development effort.
d. Prototyping life cycle model is the most suitable one for undertaking a software
development project susceptible to schedule slippage.
e. Spiral life cycle model is not suitable for products that are vulnerable to a large
number of risks.
2. For the following, mark all options which are true.
a. Which of the following problems can be considered to be contributing to the present
software crisis?
large problem size
lack of rapid progress of software engineering
lack of intelligent engineers
shortage of skilled manpower
b. Which of the following are essential program constructs (i.e. it would not be possible
to develop programs for any given problem without using the construct)?
Sequence
Selection
Jump
Iteration
c. In a classical waterfall model, which phase precedes the design phase?
Coding and unit testing
Maintenance
Requirements analysis and specification
Feasibility study
d. Among the development phases of the software life cycle, which phase typically consumes
the maximum effort?
Requirements analysis and specification
Design
Coding
Testing
e. Among all the phases of the software life cycle, which phase consumes the maximum
effort?
Design
Maintenance
Testing
Coding
f. In the classical waterfall model, during which phase is the Software Requirement
Specification (SRS) document produced?
Design
Maintenance
Requirements analysis and specification
Coding
g. Which phase is the last development phase in the classical waterfall software life
cycle?
Design
Maintenance
Testing
Coding
h. Which development phase in classical waterfall life cycle immediately follows coding
phase?
Design
Maintenance
Testing
Requirement analysis and specification
3. Identify the problem one would face, if he tries to develop a large software product without
using software engineering principles.
4. Identify the two important techniques that software engineering uses to tackle the problem
of exponential growth of problem complexity with its size.
5. State five symptoms of the present software crisis.
6. State four factors that have contributed to the making of the present software crisis.
7. Suggest at least two possible solutions to the present software crisis.
8. Identify at least four basic characteristics that differentiate a simple program from a
software product.
9. Identify two important features that a program must satisfy to be called a structured
program.
10. Explain the exploratory program development style.
11. Show at least three important drawbacks of the exploratory programming style.
12. Identify at least two advantages of using high-level languages over assembly languages.
13. State at least two basic differences between control flow-oriented and data flow-oriented
design techniques.
14. State at least five advantages of object-oriented design techniques.
15. State at least three differences between the exploratory style and modern styles of software
development.
16. Explain the problems that might be faced by an organization if it does not follow any
software life cycle model.
17. Differentiate between structured analysis and structured design.
18. Identify at least three activities undertaken in an object-oriented software design approach.
19. State why it is a good idea to test a module in isolation from other modules.
20. Identify why the different modules making up a software product are almost never integrated
in one shot.
21. Identify the necessity of integration and system testing.
22. Identify the six different phases of the classical waterfall model. Mention the reasons for
which the classical waterfall model can be considered impractical and cannot be used in real
projects.
23. Explain what a software prototype is. Identify three reasons for the necessity of developing
a prototype during software development.
24. Explain the situations under which it is beneficial to develop a prototype during software
development.
25. Identify the activities carried out during each phase of a spiral model. Discuss the
advantages of using the spiral model.
Module
7
Software Engineering
Issues
Version 2 EE IIT, Kharagpur 1
Lesson
34
Requirements Analysis
and Specification
What are the likely complexities that might arise while solving the problem?
If there are external software or hardware with which the developed software has to
interface, then what exactly would the data interchange formats with the external system
be?
After the analyst has understood the exact customer requirements, he proceeds to identify and
resolve the various requirements problems. The most important requirements problems that the
analyst has to identify and eliminate are the problems of anomalies, inconsistencies, and
incompleteness. When the analyst detects any inconsistencies, anomalies or incompleteness in
the gathered requirements, he resolves them by carrying out further discussions with the end-
users and the customers.
3. SRS Document
After the analyst has collected all the requirements information regarding the software to be
developed, and has removed all the incompleteness, inconsistencies, and anomalies from the
specification, he starts to systematically organize the requirements in the form of an SRS
document.
A high-level function, as shown in fig. 34.1, is considered as a transformation of a set of input
data to some corresponding output data. The user can get some meaningful piece of work done
using a high-level function.

Fig. 34.1 Function fi (input data transformed by fi into output data)
So, the function Search Book (F1) takes the author's name and transforms it into book details.
Functional requirements actually describe a set of high-level requirements, where each high-
level requirement takes some data from the user and provides some data to the user as an output.
Also each high-level requirement might consist of several other functions.
Each function is documented by identifying the data to be input to the system, its input data
domain, the output data domain, and the type of processing to be carried out on the input data
to obtain the output data. Let us first try to document
the withdraw-cash function of an ATM (Automated Teller Machine) system. The withdraw-cash
is a high-level requirement. It has several sub-requirements corresponding to the different user
interactions. These different interaction sequences capture the different scenarios.
1. Decision Trees
A decision tree gives a graphic view of the processing logic involved in decision making and
the corresponding actions taken. The edges of a decision tree represent conditions and the leaf
nodes represent the actions to be performed depending on the outcome of testing the condition.
Example
Consider Library Membership Automation Software (LMS) where it should support the
following three options:
New member
Renewal
Cancel membership
Renewal option
Decision: If the 'renewal' option is chosen, the LMS asks for the member's name and his
membership number to check whether he is a valid member or not.
Action: If the membership is valid, then the membership expiry date is updated and the annual
membership bill is printed; otherwise an error message is displayed.

Cancel membership option
Decision: If the 'cancel membership' option is selected, then the software asks for the member's
name and his membership number.
Action: The membership is cancelled, a cheque for the balance amount due to the member is
printed, and finally the membership record is deleted from the database.

Decision tree representation of the above example
The following tree (fig. 34.3) shows the graphical representation of the above example. After
getting information from the user, the system makes a decision and then performs the
corresponding actions.
Fig. 34.3 Decision tree for LMS. From the user's selection the tree branches as follows:
New member: get details, create records, print bills. Renewal: get details, update record,
print bills. Cancellation: get details, print cheque, delete record. Invalid option: print
error message.
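The branching logic of fig. 34.3 translates directly into ordinary conditional code. The following Python sketch is illustrative only; the `members` dictionary and the returned message strings are assumptions standing in for the real membership database and printer.

```python
# Sketch of the LMS decision logic of fig. 34.3 (illustrative only;
# the 'members' dict stands in for the real membership database).
members = {101: {"name": "A. Rao", "expiry": "2024-12-31", "balance": 50}}

def lms(option, member_no=None, name=None):
    if option == "new":
        # Get details, create record, print bill.
        new_no = max(members, default=100) + 1
        members[new_no] = {"name": name, "expiry": "2025-12-31", "balance": 0}
        return f"bill printed for member {new_no}"
    elif option == "renewal":
        if member_no in members:                        # valid member?
            members[member_no]["expiry"] = "2025-12-31"  # update record
            return "annual membership bill printed"
        return "error: invalid membership"
    elif option == "cancel":
        rec = members.pop(member_no)                    # delete record
        return f"cheque for {rec['balance']} printed"   # print cheque
    return "error: invalid option"
```

Note how each decision node of the tree becomes a conditional test and each leaf becomes the action code under that branch.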
u p
2. Decision Tables
A decision table is used to represent the complex processing logic in a tabular or a matrix
form. The upper rows of the table specify the variables or conditions to be evaluated. The lower
rows of the table specify the actions to be taken when the corresponding conditions are satisfied.
Example
Consider the previously discussed LMS example. The decision table shown in fig. 34.4
represents the LMS problem in a tabular form. Here the table is divided into two parts: the
upper part shows the conditions and the lower part shows what actions are taken.
Conditions
  Valid selection                        No    Yes   Yes   Yes
  New member                             -     Yes   No    No
  Renewal                                -     No    Yes   No
  Cancellation                           -     No    No    Yes
Actions
  Display error message                  x     -     -     -
  Ask member's details                   -     x     -     -
  Build customer record                  -     x     -     -
  Generate bill                          -     x     x     -
  Ask member's name & membership number  -     -     x     x
  Update expiry date                     -     -     x     -
  Print cheque                           -     -     -     x
  Delete record                          -     -     -     x

Fig. 34.4 Decision table for LMS
From the above table you can easily understand that, if the valid selection condition is false,
then the action taken for this condition is 'display error message' and so on.
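Because each column of a decision table pairs one combination of condition outcomes with a fixed list of actions, a decision table translates naturally into a table lookup. The sketch below follows the LMS decision logic described earlier; the tuple encoding and the exact action strings are assumptions made for this illustration.

```python
# A decision table as a lookup from condition tuples to action lists
# (an illustrative sketch; the encoding is an assumption of this example).
# Condition order: (valid_selection, new_member, renewal, cancellation)
DECISION_TABLE = {
    (False, False, False, False): ["print error message"],
    (True,  True,  False, False): ["get details", "create record", "print bill"],
    (True,  False, True,  False): ["get details", "update record", "print bill"],
    (True,  False, False, True ): ["get details", "print cheque", "delete record"],
}

def actions(valid, new=False, renew=False, cancel=False):
    # An invalid selection ignores the remaining conditions ('-' in the table).
    key = (valid, new, renew, cancel) if valid else (False, False, False, False)
    return DECISION_TABLE[key]
```

A lookup like this makes the completeness of the table easy to check: any condition combination missing from the dictionary raises a `KeyError`.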
4. Formal Requirements Specification
A formal technique is a mathematical method to specify a hardware and/or software system,
verify whether a specification is realizable, verify that an implementation satisfies its
specification, prove properties of a system without necessarily running the system, etc. The
mathematical basis of a formal method is provided by the specification language.
4.1. Formal Specification Language
A formal specification language consists of two sets, syn and sem, and a relation sat between
them. The set syn is called the syntactic domain, the set sem is called the semantic domain, and
the relation sat is called the satisfaction relation. For a given specification syn, and model of the
system sem, if sat(syn, sem) as shown in fig.34.5, then syn is said to be the specification of sem,
and sem is said to be the specificand of syn.
Fig. 34.5 The sat relation between the syntactic domain SYN and the semantic domain SEM
Formal methods are usually classified into two broad categories: model-oriented and
property-oriented approaches. In a model-oriented style, one defines a system's behaviour
directly by constructing a model of the system in terms of mathematical structures such as
tuples, relations, functions, sets, sequences, etc. In the property-oriented style, the system's
behaviour is defined indirectly by stating its properties, usually in the form of a set of axioms
that the system must satisfy.
Example
Let us consider a simple producer/consumer example. In a property-oriented style, we would
probably start by listing the properties of the system like: the consumer can start consuming
only after the producer has produced an item, the producer starts to produce an item only
after the consumer has consumed the last item, etc. Examples of property-oriented
specification styles are axiomatic specification and algebraic specification. In a model-
oriented approach, we start by defining the basic operations, p (produce) and c (consume).
Then we can state that: S1 + p → S, S + c → S1. Thus the model-oriented approaches
essentially specify a program by writing another, presumably simpler program. Examples of
popular model-oriented specification techniques are Z, CSP, CCS, etc.
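The two-state producer/consumer model above (S1 + p → S, S + c → S1) can be written down directly as a small program, which is exactly the point of the model-oriented style. The class name, state labels, and use of assertions below are assumptions of this sketch.

```python
# Model-oriented sketch of the producer/consumer: two states and two
# operations. S1 = "empty" (producer may act), S = "full" (consumer may act).
class ProducerConsumer:
    def __init__(self):
        self.state = "S1"          # start in the empty state

    def produce(self):             # operation p: S1 -> S
        assert self.state == "S1", "producer must wait for consumer"
        self.state = "S"

    def consume(self):             # operation c: S -> S1
        assert self.state == "S", "consumer must wait for producer"
        self.state = "S1"
```

The assertions enforce the two axioms stated in the property-oriented description: the consumer can only act after the producer, and vice versa.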
Model-oriented approaches are more suited to use in later phases of life cycle because here
even minor changes to a specification may lead to drastic changes to the entire specification.
They do not support logical conjunctions (AND) and disjunctions (OR).
Property-oriented approaches are suitable for requirements specification because they can be
easily changed. They specify a system as a conjunction of axioms and you can easily replace one
axiom with another one.
4.3.2. Branching Semantics
In this approach, the behaviour of a system is represented by a directed graph as shown in
fig. 34.6. The nodes of the graph represent the possible states in the evolution of a system. The
descendants of each node of the graph represent the states which can be generated by any of the
atomic actions enabled at that state. Although this semantic model distinguishes the branching
points in a computation, it still represents concurrency by interleaving.
Fig. 34.7 Partial order semantics (a directed graph over the nodes A, B, C, D, E, and F)
For example, fig. 34.7 shows that we can compare node B with node D, but we cannot
compare node D with node A.

4.4. Merits of Formal Requirements Specification
Formal methods possess several positive features, some of which are discussed below.
Formal specifications encourage rigour. Often, the very process of construction of a
rigorous specification is more important than the formal specification itself. The
construction of a rigorous specification clarifies several aspects of system behaviour
that are not obvious in an informal specification.
Formal methods usually have a well-founded mathematical basis. Thus, formal
specifications are not only more precise, but also mathematically sound and can be
used to reason about the properties of a specification and to rigorously prove that an
implementation satisfies its specifications.
Formal methods have well-defined semantics. Therefore, ambiguity in specifications
is automatically avoided when one formally specifies a system.
The mathematical basis of the formal methods facilitates automating the analysis of
specifications. For example, a tableau-based technique has been used to automatically
check the consistency of specifications. Also, automatic theorem proving techniques
can be used to verify that an implementation satisfies its specifications. The possibility
of automatic verification is one of the most important advantages of formal methods.
Formal specifications can be executed to obtain immediate feedback on the features of
the specified system. This concept of executable specifications is related to rapid
prototyping. Informally, a prototype is a toy working model of a system that can
provide immediate feedback on the behaviour of the specified system, and is
especially useful in checking the completeness of specifications.
It is clear that formal methods provide mathematically sound frameworks using which
systems can be specified, developed and verified in a systematic manner. However, formal
methods suffer from several shortcomings, some of which are the following:
Formal methods are difficult to learn and use.
The basic incompleteness results of first-order logic suggest that it is impossible to
check the absolute correctness of systems using theorem proving techniques.
Formal techniques are not able to handle complex problems. This shortcoming results
from the fact that even moderately complicated problems blow up the complexity of
the formal specification and its analysis. Also, a large unstructured set of mathematical
formulae is difficult to comprehend.
5. Axiomatic Specification
In axiomatic specification of a system, first-order logic is used to write the pre- and post-
conditions that specify the operations of the system in the form of axioms. The pre-conditions
basically capture the conditions that must be satisfied before an operation can successfully be
invoked. In essence, the pre-conditions capture the requirements on the input parameters of a
function. The post-conditions are the conditions that must be satisfied when a function completes
execution and the function is considered to have been executed successfully. Thus, the post-
conditions are essentially the constraints on the results produced for the function execution to be
considered successful.
Establish the changes made to the function's input parameters after execution of the
function. Pure mathematical functions do not change their input, and therefore this type of
assertion is not necessary for pure functions.
Combine all of the above into the pre- and post-conditions of the function.
5.2. Examples
Example 1
Specify the pre- and post-conditions of a function that takes a real number as argument and
returns half the input value if the input is less than or equal to 100, or else returns double the
value.
f (x : real) : real
pre : x ∈ R
post : {(x ≤ 100) ∧ (f(x) = x/2)} ∨ {(x > 100) ∧ (f(x) = 2x)}
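Pre- and post-conditions of this kind translate directly into runtime assertions. The Python sketch below is illustrative; the function name follows the specification, and using `assert` for the checks is an implementation choice.

```python
# Runtime-checked version of Example 1's axiomatic specification (a sketch).
def f(x: float) -> float:
    # pre: x is a real number
    assert isinstance(x, (int, float))
    result = x / 2 if x <= 100 else 2 * x
    # post: (x <= 100 and f(x) = x/2) or (x > 100 and f(x) = 2x)
    assert (x <= 100 and result == x / 2) or (x > 100 and result == 2 * x)
    return result
```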
Example 2
Axiomatically specify a function named search which takes an integer array and an integer
key value as its arguments and returns the index in the array where the key value is present.
search(X : IntArray, key : Integer) : Integer
pre : ∃ i ∈ [Xfirst … Xlast], X[i] = key
post : {(X′[search(X, key)] = key) ∧ (X = X′)}
Here, the convention that has been followed is that if a function changes any of its input
parameters, and if that parameter is named X, then it is referred to as X′ after the function
completes execution.
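Example 2 can be checked the same way. In this sketch the array is a Python list; since the function never mutates its argument, X = X′ holds trivially, and the use of `list.index` is an implementation choice rather than part of the specification.

```python
# Runtime-checked version of Example 2's specification (a sketch).
def search(X, key):
    # pre: there exists i in [first..last] with X[i] = key
    assert key in X, "pre-condition violated: key not in array"
    i = X.index(key)
    # post: X[search(X, key)] = key
    assert X[i] == key
    return i
```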
6. Algebraic Specification
In the algebraic specification technique, an object class or type is specified in terms of
relationships existing between the operations defined on that type. It was first brought into
prominence by Guttag [1980, 1985] in the specification of abstract data types. Various notations
of algebraic specifications have evolved, including those based on OBJ and Larch languages.

6.1. Representation of Algebraic Specification
Essentially, algebraic specifications define a system as a heterogeneous algebra. A
heterogeneous algebra is a collection of different sets on which several operations are defined.
Traditional algebras are homogeneous. A homogeneous algebra consists of a single set and
several operations, for example {I, +, -, *, /}. In contrast, the set of alphabetic strings together
with the operations of concatenation and length, {A, I, con, len}, is not a homogeneous algebra,
since the range of the length operation is the set of integers. To define a heterogeneous algebra,
we first need to specify its signature, the involved operations, and their domains and ranges.
Using algebraic specification, we define the meaning of a set of interface procedures by using
equations. An algebraic specification is usually presented in four sections.
1. Types section
In this section, the sorts (or the data types) being used are specified.
2. Exceptions section
This section gives the names of the exceptional conditions that might occur when different
operations are carried out. These exception conditions are used in the later sections of an
algebraic specification.
3. Syntax section
This section defines the signatures of the interface procedures. The collection of sets that
form the input domain of an operator, together with the sort where the output is produced, is
called the signature of the operator. For example, PUSH takes a stack and an element and
returns a new stack:
push : stack × element → stack
4. Equations section
This section gives a set of rewrite rules (or equations) defining the meaning of the interface
procedures in terms of each other. In general, this section is allowed to contain conditional
expressions.
6.2. Operators
By convention, each equation is implicitly universally quantified over all possible values of
the variables. Names not mentioned in the syntax section, such as r or e, are variables. The first
step in defining an algebraic specification is to identify the set of required operations. After
having identified the required operators, it is helpful to classify them as basic construction
operators, extra construction operators, basic inspection operators, or extra inspection operators.
The definition of these categories of operators is as follows:
1. Basic construction operators: These operators are used to create or modify entities of a
type. The basic construction operators are essential to generate all possible elements of the
type being specified. For example, create and append are basic construction operators.
2. Extra construction operators: These are the construction operators other than the basic
construction operators. For example, the operator remove is an extra construction operator,
because even without using remove it is possible to generate all values of the type being
specified.
3. Basic inspection operators: These operators evaluate attributes of a type without
modifying them, e.g., eval, get, etc. Let S be the set of operators whose range is not the data
type being specified. The set of basic operators S1 is a subset of S, such that each operator
from S-S1 can be expressed in terms of the operators from S1.
4. Extra inspection operators: These are the inspection operators that are not basic
inspectors.
Example
Let us specify a data type point supporting the operations create, xcoord, ycoord, and isequal,
where the operations have their usual meaning.
Types:
  defines point
  uses boolean, integer
Syntax:
  create : integer × integer → point
  xcoord : point → integer
  ycoord : point → integer
  isequal : point × point → boolean
Equations:
  xcoord(create(x, y)) = x
  ycoord(create(x, y)) = y
  isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2))
In this example, we have only one basic constructor (create) and three basic inspectors
(xcoord, ycoord, and isequal). Therefore, we have only three equations.
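A quick way to validate such a specification is to implement the interface procedures and check the equations on sample values. The sketch below models points as Python tuples, which is an assumption of this illustration, not part of the specification.

```python
# A direct model of the 'point' algebraic specification (a sketch).
# Each interface procedure is a function; the equations become checkable.
def create(x, y):            # create : integer x integer -> point
    return (x, y)

def xcoord(p):               # xcoord : point -> integer
    return p[0]

def ycoord(p):               # ycoord : point -> integer
    return p[1]

def isequal(p1, p2):         # isequal : point x point -> boolean
    return p1 == p2

def equations_hold(x1, y1, x2, y2):
    # The three equations from the specification, checked on given values.
    return (xcoord(create(x1, y1)) == x1
            and ycoord(create(x1, y1)) == y1
            and isequal(create(x1, y1), create(x2, y2))
                == ((x1 == x2) and (y1 == y2)))
```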
6.4. Properties of Algebraic Specifications
Three important properties that every good algebraic specification should possess are:
Completeness: This property ensures that using the equations, it should be possible to reduce
any arbitrary sequence of operations on the interface procedures. There is no simple
procedure to ensure that an algebraic specification is complete.
Finite termination property: This property essentially addresses the following question: Do
applications of the rewrite rules to arbitrary expressions involving the interface procedures
always terminate? For arbitrary algebraic equations, convergence (finite termination) is un-
decidable. But, if the right hand side of each rewrite rule has fewer terms than the left, then
the rewrite process must terminate.
Unique termination property: This property indicates whether application of the rewrite rules
in different orders always results in the same answer. Essentially, to determine this property,
the answer to the following question needs to be checked: can all possible sequences of
choices in application of the rewrite rules to an arbitrary expression involving the interface
procedures always give the same answer? Checking the unique termination property is a
very difficult problem.
8. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. Functional requirements address maintainability, portability, and usability issues.
c. The edges of decision tree represent corresponding actions to be performed according
to conditions.
d. The upper rows of the decision table specify the corresponding actions to be taken
when an evaluation test is satisfied.
e. A column in a decision table is called an attribute.
f. Pre-conditions of axiomatic specifications state the requirements on the parameters of
the function before the function can start executing.
g. Post-conditions of axiomatic specifications state the requirements on the parameters
of the function when the function is completed.
k. The structured specification technique that is used to reduce the effort in writing
specification is
Incremental specification
Specification instantiation
Both of the above
None of the above
l. Examples of executable specifications are
Third generation languages
Fourth generation languages
Second generation languages
First generation languages
3. Identify the roles of a system analyst.
4. Identify the important parts of an SRS document. Identify the problems an organization
might face without developing an SRS document.
5. Identify the non-functional requirement issues that are considered for a given problem
description.
6. Discuss the problems that an unstructured specification would create during software
development.
7. Identify the necessity of using formal techniques in the context of requirements
specification.
8. Identify the differences between model-oriented and property-oriented approaches in the
context of requirements specification.
9. Explain the use of operational semantics.
10. Explain the use of algebraic specifications in the context of requirements specification.
Identify the requirements of algebraic specifications to define a system.
11. Identify the essential sections of an algebraic specification to define a system. Explain the
steps for developing the algebraic specification of simple problems.
12. Identify the properties that every good algebraic specification should possess.
13. Identify the basic properties of a structured specification.
14. Discuss the advantages and disadvantages of algebraic specification.
15. Write down the important features of an executable specification language with examples.
Lesson
35
Modelling Timing
Constraints
An event may either be instantaneous or may have certain duration. For example, a
button press event is described by the duration for which the button was kept pressed.
Some authors argue that durational events are really not a basic type of event, but can be
expressed using other events. In fact, it is possible to consider a duration event as a combination
of two events: a start event and an end event. For example, the button press event can be
described by a combination of start button press and end button press events. However, it is
often convenient to retain the notion of a durational event. In this text, we consider durational
events as a special class of events. Using the preliminary notions about events discussed in this
subsection, we classify various types of timing constraints in subsection 1.7.1.
but it can also help us to quickly identify the different timing constraints that can exist
from a casual examination of a problem. That is, in addition to a better understanding of the
behavior of a system, it can also let us work out the specification of a real-time system
accurately.
Different timing constraints associated with a real-time system can broadly be classified into
performance and behavioral constraints.
Performance constraints are the constraints that are imposed on the response of the system.
Behavioral constraints are the constraints that are imposed on the stimuli generated by the
environment.
Behavioral constraints ensure that the environment of a system is well behaved, whereas
performance constraints ensure that the computer system performs satisfactorily.
Each of the performance and behavioral constraints can further be classified into the
following three types:
Delay Constraint
Deadline Constraint
Duration Constraint
These three classes of constraints are explained in the subsequent sections.

1.2.1. Delay Constraints
A delay constraint captures the minimum time (delay) that must elapse between the
occurrence of two arbitrary events e1 and e2. After e1 occurs, if e2 occurs earlier than the
minimum delay, then a delay violation is said to occur. A delay constraint on the event e2 can be
expressed more formally as follows:
t(e2) − t(e1) ≥ d
where t(e2) and t(e1) are the time stamps on the events e2 and e1 respectively, and d is the
minimum delay specified on e2. A delay constraint on the event e2 with respect to the event
e1 is shown pictorially in fig. 35.1, where Δ denotes the actual separation in time between the
occurrence of the two events e1 and e2, and d is the required minimum separation (delay)
between the two events. It is easy to see that e2 must occur only after at least d time units have
elapsed since the occurrence of e1; otherwise we shall have a delay violation.
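Checking a delay constraint from recorded time stamps is a one-line comparison, and the deadline constraint discussed next simply reverses the inequality. The function names below are assumptions of this sketch.

```python
# Checking timing constraints between two time-stamped events (a sketch).
def delay_ok(t_e1: float, t_e2: float, d: float) -> bool:
    """True iff e2 occurs at least d time units after e1 (delay constraint)."""
    return t_e2 - t_e1 >= d

def deadline_ok(t_e1: float, t_e2: float, d: float) -> bool:
    """True iff e2 occurs within d time units of e1 (deadline constraint)."""
    return t_e2 - t_e1 <= d
```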
Fig. 35.1 A delay constraint between events e1 and e2 (the actual separation between
t(e1) and t(e2) must be at least d)
Fig. 35.2 A deadline constraint between events e1 and e2 (the actual separation between
t(e1) and t(e2) must be at most d)
The deadline and delay constraints can e further be classified into two types each based on
whether the constraint is imposed on
t u 1.3.stimulus or on the response event. This has been
the
s
explained with some examples in section
ti y
1.2.3. Duration Constraints
A duration constraint on an event specifies the time period over which the event acts. A
duration constraint can either be of minimum type or maximum type. The minimum type
duration constraint requires that once the event starts, the event must not end before a certain
minimum duration; whereas a maximum type duration constraint requires that once the event
starts, the event must end before a certain maximum duration elapses.
Fig. 35.3 Schematic representation of a telephone system (a call initiator and a call
receiver in the environment, connected through the Public Switched Telephone Network)
1.3. Examples of Different Types of Timing Constraints
We illustrate the different classes of timing constraints by using examples from a telephone
system. A schematic diagram of a telephone system is given in fig. 35.3. Note that I have
intentionally drawn an old-styled telephone, because its operation is easier to understand! Here,
the telephone handset and the Public Switched Telephone Network (PSTN) are considered as
constituting the computer system, and the users as forming the environment. In the following,
we give a few simple examples of operations of the telephone system to illustrate the different
types of timing constraints.
Deadline constraints: In the following, we discuss four different types of deadline constraints
that may be identified in a real-time system, depending on whether the two events involved
in a deadline constraint are of stimulus type or response type.
Stimulus-Stimulus (SS): In this case, the deadline is defined between two stimuli. This is a
behavioral constraint, since the constraint is imposed on the second event which is a stimulus.
An example of an SS type of deadline constraint is the following:
Once a user completes dialling a digit, he must dial the next digit within the next 5 seconds;
otherwise an idle tone is produced.
In this example, the dialings of the two consecutive digits represent the two stimuli to the
telephone system.
Stimulus-Response (SR): In this case, the deadline is defined on the response event, measured
from the occurrence of the corresponding stimulus event. This is a performance constraint,
since the constraint is imposed on a response event. An example of an SR type of deadline
constraint is the following:
Once the receiver of the hand set is lifted, the dial tone must be produced by the
system within 2 seconds, otherwise a beeping sound is produced until the handset is
replaced. In this example, the lifting of the receiver hand set represents a stimulus to the
telephone system and production of the dial tone is the response.
Response-Stimulus (RS): Here the deadline is on the production of a stimulus, counted
from the occurrence of the corresponding response. This is a behavioral constraint, since the
constraint is imposed on the stimulus event. An example of an RS type of deadline constraint is
the following:
Once the dial tone appears, the first digit must be dialed within 30 seconds, otherwise
the system enters an idle state and an idle tone is produced.
Response-Response (RR): An RR type of deadline constraint is defined on two response
events. In this case, once the first response event occurs, the second response event must occur
before a certain deadline. This is a performance constraint, since the timing constraint has been
defined on a response event. An example of an RR type of deadline constraint is the following:
Once the ring tone is given to the callee, the corresponding ring-back tone must be
given to the caller within two seconds; otherwise the call is terminated.
Here the ring-back tone and the corresponding ring tone are the two response events.
Delay constraints: We can identify only one type of delay constraint (SS type) in the
telephone system.
We have already discussed that events can be considered to be of two types: stimulus events
and response events. We had also discussed different types of timing constraints in Section 1.3.
Now we explain how these constraints can be modelled by using EFSMs.
The EFSM model for this constraint is shown in Fig. 35.7. In Fig. 35.7, as soon as dial tone
appears, a timer is set to expire in 30 seconds and the system transits to the Await First Digit
state. If the timer expires before the first digit arrives, then the system transits to an idle state
where an idle tone is produced. Otherwise, if the digit appears first, then the system transits to
the Await Second Digit state.
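The EFSM of fig. 35.7 can be sketched as a small state machine. The state and event names below follow the figure, while the class name and method signatures are assumptions of this illustration; the timer is represented only symbolically, with its expiry delivered as an event.

```python
# Sketch of the EFSM for the RS deadline of fig. 35.7: once the dial tone
# appears, the first digit must arrive before the 30-second timer expires.
class DialEFSM:
    def __init__(self):
        self.state = "Idle"

    def dial_tone(self):
        self.state = "Await First Digit"
        self.timer = 30                    # timer set to expire in 30 s

    def event(self, name):
        if self.state == "Await First Digit":
            if name == "first_digit":
                self.state = "Await Second Digit"
            elif name == "timer_expired":
                self.state = "Idle"        # idle tone is produced
```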
Fig. 35.7 Model of an RS type of deadline (the dial tone sets a 30 s timer; the first digit
leads to the Await Second Digit state, while timer expiry leads to an idle state)
2.2.3. Stimulus-Response (SR)
In Sec. 1.3, we had considered the following example of an SR type of deadline constraint:
Once the receiver of the handset is lifted, the dial tone must be produced by the
system within 2 seconds, otherwise a beeping sound is produced until the handset is
replaced.
The EFSM model for this constraint is shown in fig. 35.8. As soon as the handset is lifted, a
timer is set to expire after 2 seconds, and the system transits to the Await Dial Tone state. If the
dial tone appears first, then the system transits to the Await First Digit state. Otherwise, it
transits to the Await Receiver On-hook state.
Fig. 35.8 Model of an SR type of deadline (lifting the handset sets a 2 s timer in the Await
Dial Tone state; the dial tone leads to Await First Digit, while the timer alarm produces
beeping and leads to Await Receiver On-hook)
u p Await
First
r o Digit
g
Ring-backtstone
e n
u d
st
it y Timer alarm/terminate call
.c
Await
Ring-back
w
Ring-tone/ Tone
w
set timer
w
(2 s)
Await
Receiver
On-hook
As soon as the ring tone is produced, a timer is set to expire in 2 seconds. If the ring-back tone
appears first, the system transits to the Await First Digit state; else it enters the Await Receiver
On-hook state, and the call is terminated.
Fig. 35.10 Model of an SS type of delay constraint (the first digit sets the timer; a next
digit arriving before the timer alarm is a delay violation and causes beeping in the Await
Caller On-hook state, while the timer alarm leads to the Await Next Digit state)
2.2.6. Durational Constraint
In case of a durational constraint, an event is required to occur for a specific duration. The
example of a durational constraint we had considered in Sec. 1.3 is the following: if you press
the button of the handset for less than 15 seconds, it connects to the local operator. If you press
the button for any duration lasting between 15 and 30 seconds, it connects to the international
operator. If you keep the button pressed for more than 30 seconds, then on releasing it the dial
tone is produced.
Fig. 35.11 A Model of a Durational Constraint
s
ent
The EFSM model for this example is shown in Fig. 35.11. Note that we have introduced two
u d
intermediate states Await Event 1 and Await Event 2 to model a durational constraint.
st
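The timer-and-event transitions of such an EFSM map naturally onto a switch-based implementation. Below is a minimal sketch in C that follows the state and event names of Fig. 35.11; the enum encoding and the `efsm_step()` function are our own illustrative assumptions, not part of the original text.

```c
#include <assert.h>

/* States of the durational-constraint EFSM of Fig. 35.11 (encoding assumed). */
typedef enum {
    IDLE,
    AWAIT_EVENT_1,          /* button held, first 15 s timer running   */
    AWAIT_EVENT_2,          /* 15 s elapsed, second 15 s timer running */
    AWAIT_BUTTON_RELEASE,   /* held for more than 30 s                 */
    LOCAL_OPERATOR,
    INTERNATIONAL_OPERATOR,
    DIAL_TONE
} state_t;

typedef enum { BUTTON_PRESS, BUTTON_RELEASE, TIMER_ALARM } event_t;

/* One transition of the EFSM; timers are assumed to be set as a side
   effect of entering AWAIT_EVENT_1 and AWAIT_EVENT_2. */
state_t efsm_step(state_t s, event_t e)
{
    switch (s) {
    case IDLE:
        if (e == BUTTON_PRESS)   return AWAIT_EVENT_1;          /* set 15 s alarm */
        break;
    case AWAIT_EVENT_1:
        if (e == BUTTON_RELEASE) return LOCAL_OPERATOR;         /* held < 15 s    */
        if (e == TIMER_ALARM)    return AWAIT_EVENT_2;          /* set 15 s alarm */
        break;
    case AWAIT_EVENT_2:
        if (e == BUTTON_RELEASE) return INTERNATIONAL_OPERATOR; /* 15 s to 30 s   */
        if (e == TIMER_ALARM)    return AWAIT_BUTTON_RELEASE;   /* held > 30 s    */
        break;
    case AWAIT_BUTTON_RELEASE:
        if (e == BUTTON_RELEASE) return DIAL_TONE;              /* produce dial tone */
        break;
    default:
        break;
    }
    return s;   /* events not shown in the figure are ignored */
}
```

Driving this function from a periodic timer interrupt and a debounced button input would realize the constraint on an embedded processor.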
3. Exercises

1. Mark the following as True or False. Justify your answer.
a. A deadline constraint between two stimuli can be considered to be a behavioural constraint on the environment of the system.
2. Identify and represent the timing constraints in the following air-defense system by means
of an extended state machine diagram. Classify each constraint into either performance or
behavioral constraint.
Every incoming missile must be detected within 0.2 seconds of its entering the radar
coverage area. The intercept missile should be engaged within 5 seconds of detection of
the target missile. The intercept missile should be fired after 0.1 seconds of its engagement, but no later than 1 second.
3. Represent a wash-machine having the following specification by means of an extended
state machine diagram.
The wash-machine waits for the start switch to be pressed. After the user presses the start
switch, the machine fills the wash tub with either hot or cold water depending upon the
setting of the HotWash switch. The water filling continues until the high level is sensed.
The machine starts the agitation motor and continues agitating the wash tub until either the
preset timer expires or the user presses the stop switch. After the agitation stops, the
machine waits for the user to press the startDrying switch. After the user presses the
startDrying switch, the machine starts the hot air blower and continues blowing hot air into
the drying chamber until either the user presses the Stop switch or the preset timer expires.
4. What is the difference between a performance constraint and a behavioral constraint? Give
practical examples of each type of constraint.
5. Represent the timing constraints in a collision avoidance task in an air surveillance system
as an extended finite state machine (EFSM) diagram. The collision avoidance task
consists of the following activities.
The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 msec of receipt of the signal.
The track record is transmitted to the data processor within 1 msec after the track record is determined.
A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 msec.
If a collision is anticipated, then the corrective action is determined within 10 msec by another subtask running on the data processor.
The corrective action is transmitted to the track correction task within 25 msec.
6. Consider the following (partial) specification of a real-time system:

The velocity of a space-craft must be sampled by a computer on-board the space-craft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 msec; in parallel, the expected position of the space-craft is retrieved from the database within 200 msec (denoted by event R). Using these data, the deviation from the normal course of the space-craft must be determined within 100 msec (denoted by event D), and corrective velocity adjustments must be carried out before a new velocity value is sampled (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by the event T).

Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.
7. Construct the EFSM model of a telephone system whose (partial) behavior is described below:

After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialled within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialling of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
8. What are the different types of timing constraints that can occur in a system? Give
examples of each.
Module
7
Software Engineering
Issues
Version 2 EE IIT, Kharagpur 1
Lesson
36
Software Design Part 1
Version 2 EE IIT, Kharagpur 2
Most researchers and engineers agree that a good software design implies clean decomposition of the problem into modules, and the neat arrangement of these modules in a hierarchy. The primary characteristics of neat module decomposition are high cohesion and low coupling.

1.2.1. Cohesion
Cohesion is a measure of the functional strength of a module. A module having high cohesion and low coupling is said to be functionally independent of other modules. By the term functional independence, we mean that a cohesive module performs a single task or function. The different classes of cohesion that a module may possess are depicted in fig. 36.1.
Coincidental (low) → Logical → Temporal → Procedural → Communicational → Sequential → Functional (high)

Fig. 36.1 Classification of Cohesion
Temporal cohesion: A module is said to exhibit temporal cohesion when all the functions of the module are executed within the same time span. The set of functions responsible for initialization, start-up, shutdown of some process, etc. exhibit temporal cohesion.
Procedural cohesion: A module is said to possess procedural cohesion, if the set of functions of
the module are all part of a procedure (algorithm) in which a certain sequence of steps have to be
carried out for achieving an objective, e.g. the algorithm for decoding a message.
Communicational cohesion: A module is said to have communicational cohesion, if all
functions of the module refer to or update the same data structure, e.g. the set of functions
defined on an array or a stack.
Sequential cohesion: A module is said to possess sequential cohesion, if the elements of the module form the parts of a sequence, where the output from one element of the sequence is input to the next.
Functional cohesion: Functional cohesion is said to exist, if the different elements of a module cooperate to achieve a single function. For example, a module containing all the functions required to manage employees' pay-roll displays functional cohesion. Suppose a module displays functional cohesion, and we are asked to describe what the module does; then we would be able to describe it using a single sentence.
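To make these classes concrete, here is a small illustrative C module (our own example, not from the text): every function refers to or updates the same stack data structure, so the module exhibits communicational cohesion, and because all its elements cooperate to provide a single stack abstraction, the module can indeed be described in one sentence.

```c
#include <assert.h>

#define STACK_MAX 100

/* The single data structure shared by every function of the module. */
static int stack_items[STACK_MAX];
static int stack_top = 0;     /* number of items currently stored */

void stack_push(int v)
{
    if (stack_top < STACK_MAX)
        stack_items[stack_top++] = v;
}

int stack_pop(void)
{
    return stack_top > 0 ? stack_items[--stack_top] : -1;
}

int stack_size(void)
{
    return stack_top;
}
```

Taking this module out and reusing it in a different program requires nothing beyond copying the file, which is exactly the reuse benefit of functional independence discussed below.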
1.2.2. Coupling

Coupling between two modules is a measure of the degree of interdependence or interaction between the two modules. A module having high cohesion and low coupling is said to be functionally independent of other modules. If two modules interchange large amounts of data, then they are highly interdependent. The degree of coupling between two modules depends on their interface complexity. The interface complexity is basically determined by the number of types of parameters that are interchanged while invoking the functions of the module. Even if no techniques to precisely and quantitatively estimate the coupling between two modules exist today, a classification of the different types of coupling will help to quantitatively estimate the degree of coupling between two modules. Five types of coupling can occur between any two modules, as shown in fig. 36.2.

Data (low) → Stamp → Control → Common → Content (high)

Fig. 36.2 Classification of Coupling
Data coupling: Two modules are data coupled, if they communicate through an elementary data item, such as an integer or a character, that is passed as a parameter between them.
Stamp coupling: Two modules are stamp coupled, if they communicate using a composite data item such as a record in PASCAL or a structure in C.
Control coupling: Control coupling exists between two modules, if data from one module is used to direct the order of instruction execution in another. An example of control coupling is a flag set in one module and tested in another module.
Common coupling: Two modules are common coupled, if they share some global data items.
Content coupling: Content coupling exists between two modules, if their code is shared, e.g. a
branch from one module into another module.
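The flag example of control coupling can be made concrete with a short C sketch (all names here are hypothetical): one module sets a flag, and the flag then directs which branch of code executes in the other module.

```c
#include <assert.h>

/* Flag set in one module ... */
static int sort_ascending = 1;

void set_sort_order(int ascending)
{
    sort_ascending = ascending;
}

/* ... and tested in another: the flag directs the order of instruction
   execution, so the two modules are control coupled. */
int compare(int a, int b)
{
    if (sort_ascending)
        return a - b;       /* ascending order  */
    return b - a;           /* descending order */
}
```

Passing the ordering as an explicit parameter of `compare()` would reduce this to data coupling, which is the lowest (most desirable) point on the scale of fig. 36.2.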
simple and minimal. Therefore, a cohesive module can be easily taken out and reused in a different program.
Understandability: Complexity of the design is reduced, because different modules can be understood in isolation, as modules are more or less independent of each other.
s
1.2.4. Function-Oriented Design Approachog
b l
p .
The following are the salient features of a typical function-oriented design approach:
1. A system is viewed as something that performs
o u a set of functions. Starting at this high-
g r
level view of the system, each function is successively refined into more detailed functions.
s membership number to him, and prints a bill
For example, consider a function create-new-library member which essentially creates the
t
towards his membership charge. Thise
n
record for a new member, assigns a unique
function may consist of the following sub-functions:
assign-membership-number d
create-member-record t u
print-bill y s
t
Each of these sub-functionsi may be split into more detailed sub-functions and so on.
.c
2. The system state is centralized and shared among different functions, e.g. data such as
w for reference
w
member-records is available
create-new-member
and updating to several functions such as:
w
delete-member
update-member-record
Unlike function-oriented design methods, in OOD the basic abstractions are not real-world functions such as sort, display, track, etc., but real-world entities such as employee, picture, machine, radar system, etc. For example, in OOD an employee pay-roll software is not developed by designing functions such as update-employee-record, get-employee-address, etc., but by designing objects such as employees, departments, etc.
In OOD, state information is not represented in a centralized shared memory but is distributed among the objects of the system. For example, while developing an employee pay-roll system, the employee data such as the names of the employees, their code numbers, basic salaries, etc. are usually implemented as global data in a traditional programming system; whereas in an object-oriented system these data are distributed among the different employee objects of the system. Objects communicate by passing messages. Therefore, one object may discover the state information of another object by interrogating it. Of course, somewhere or the other the real-world functions must be implemented.

Function-oriented techniques such as SA/SD group functions together if, as a group, they constitute a higher-level function. On the other hand, object-oriented techniques group functions together on the basis of the data they operate on.
Function-Oriented Approach:

BOOL detector_status[MAX_ROOMS];
int detector_locs[MAX_ROOMS];
BOOL alarm_status[MAX_ROOMS];      /* alarm activated when status is set */
int alarm_locs[MAX_ROOMS];         /* room number where alarm is located */
int neighbor_alarm[MAX_ROOMS][10]; /* each detector has at most 10 neighboring locations */
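For contrast, an object-oriented style would bundle the data of each detector together with the functions that operate on it, instead of spreading the state across parallel global arrays. A hedged sketch in C follows; the struct and function names are our own, modelled on the arrays above.

```c
#include <assert.h>

typedef int BOOL;
#define MAX_NEIGHBORS 10

/* One detector "object": the data the function-oriented version keeps
   in parallel global arrays indexed by room number. */
typedef struct {
    BOOL status;                        /* detector activated?           */
    int  location;                      /* room number of the detector   */
    int  neighbor_alarm[MAX_NEIGHBORS]; /* at most 10 neighboring alarms */
} Detector;

/* A "method": operates only on the detector it is handed, so the state
   is distributed among objects rather than held in shared globals. */
void detector_trigger(Detector *d)
{
    d->status = 1;
}
```

Grouping the state this way is exactly the OOD principle described earlier: functions are grouped on the basis of the data they operate on.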
Divide and conquer principle. Each function is decomposed independently.
Graphical representation of the analysis results using Data Flow Diagrams (DFDs).
2.2. Data Flow Diagrams

The DFD (also known as a bubble chart) is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system. A DFD model uses a very limited number of primitive symbols (as shown in fig. 36.3) to represent the functions performed by a system and the data flow among these functions.

Fig. 36.3 Symbols used for designing DFDs: Data Store, Process, External Entity, Data Flow, Output
The main reason why the DFD technique is so popular is probably that a DFD is a very simple formalism: it is simple to understand and use. Starting with a set of high-level functions that a system performs, a DFD model hierarchically represents various sub-functions. In fact, any hierarchical model is simple to understand. The human mind is such that it
can easily understand any hierarchical model of a system because in a hierarchical model,
starting with a very simple and abstract model of a system, different details of the system are
slowly introduced through different hierarchies. The data flow diagramming technique also
follows a very simple set of intuitive concepts and rules. DFD is an elegant modeling technique
that turns out to be useful not only to represent the results of structured analysis of a software
problem but also for several other applications such as showing the flow of documents or items
in an organization.
The data dictionary provides the analyst with a means to determine the definition of different data structures in terms of their component elements.
2.3. DFD: Levels and Model

The DFD model of a system typically consists of several DFDs, viz., the level 0 DFD, level 1 DFD, level 2 DFDs, etc. A single data dictionary should capture all the data appearing in all the DFDs constituting the DFD model of the system.
2.3.1. Balancing DFDs

The data that flow into or out of a bubble must match the data flow at the next level of the DFD. This is known as balancing a DFD. The concept of balancing a DFD is illustrated in fig. 36.4. In level 1 of the DFD, the data items d1 and d3 flow out of the bubble 0.1 and the data item d2 flows into the bubble 0.1 (labelled P1). In the next level, bubble 0.1 is decomposed. The decomposition is balanced, as d1 and d3 flow out of the level 2 diagram and d2 flows in.
Fig. 36.4 An example showing balanced decomposition: (a) Level 1 DFD, (b) Level 2 DFD
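The balancing rule can also be stated operationally: the set of data items crossing the parent bubble's boundary must reappear at the boundary of the child diagram. A small illustrative check in C follows; representing flows as arrays of names is our own encoding, not part of the DFD notation.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if every flow at the parent bubble's boundary also appears
   at the boundary of the child diagram (and the counts agree), which is
   the balancing condition illustrated in fig. 36.4. */
int flows_balanced(const char *parent[], int np, const char *child[], int nc)
{
    if (np != nc)
        return 0;                       /* different number of boundary flows */
    for (int i = 0; i < np; i++) {
        int found = 0;
        for (int j = 0; j < nc; j++) {
            if (strcmp(parent[i], child[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;                   /* a parent flow is missing below */
    }
    return 1;
}
```

For the decomposition of fig. 36.4, the boundary flows d1, d2 and d3 of bubble 0.1 reappear at the level 2 boundary, so the check succeeds.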
2.3.2. Context Diagram
The context diagram is the most abstract data flow representation of a system. It represents
the entire system as a single bubble. This bubble is labeled according to the main function of the
system. The various external entities with which the system interacts and the data flow occurring
between the system and the external entities are also represented. The data input to the system
and the data output from the system are represented as incoming and outgoing arrows. These
data flow arrows should be annotated with the corresponding data names. The name context
diagram is well justified because it represents the context in which the system is to exist, i.e. the
external entities who would interact with the system and the specific data items they would be supplying to the system and the data items they would be receiving from the system. The context
diagram is also called the level 0 DFD.
To develop the context diagram of the system, we have to analyse the SRS document to
identify the different types of users who would be using the system and the kinds of data they
would be inputting to the system and the data they would be receiving from the system. Here, the
term users of the system also includes the external systems which supply data to or receive
data from the system.
The bubble in the context diagram is annotated with the name of the software system being
developed (usually a noun). This is in contrast with the bubbles in all other levels which are
annotated with verbs. This is expected since the purpose of the context diagram is to capture the
context of the system rather than its functionality.
Example 1: RMS Calculating Software

A software system called RMS calculating software would read three integral numbers from the user in the range of -1000 to +1000, and then determine the root mean square (rms) of the three input numbers and display it. In this example, the context diagram (fig. 36.5) is simple to draw. The system accepts three integers from the user and returns the result to him.
Fig. 36.5 Context Diagram: the User supplies data-items to the single bubble 0 and receives the rms value.
Example 2: Tic-Tac-Toe Computer Game
Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 × 3 square. A move consists of marking a previously unmarked square. The player who is first to place three consecutive marks along a straight line (i.e. along a row, column, or diagonal) on the square wins. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line before all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game. The context diagram of this problem is shown in fig. 36.6.
Fig. 36.6 Context diagram for the tic-tac-toe software: the Human Player supplies moves and receives the display.
After developing the context diagram, the higher-level DFDs of the problem have to be worked out.

Level 1 DFD: To develop the level 1 DFD, examine the high-level functional requirements. If there are between 3 to 7 high-level functional requirements, then these can be directly represented as bubbles in the level 1 DFD. We can then examine the input data to these functions and the data output by these functions, and represent them appropriately in the diagram.

If a system has more than 7 high-level functional requirements, then some of the related requirements have to be combined and represented in the form of a bubble in the level 1 DFD. Such a bubble can be split in the lower DFD levels. If a system has less than three high-level functional requirements, then some of them need to be split into their sub-functions so that we have roughly about 5 to 7 bubbles on the diagram.
Decomposition: Each bubble in the DFD represents a function performed by the system. The
bubbles are decomposed into sub-functions at the successive levels of the DFD. Decomposition
of a bubble is also known as factoring or exploding a bubble. Each bubble at any level of DFD is
usually decomposed to anything between 3 to 7 bubbles. Too few bubbles at any level make that
level superfluous. For example, if a bubble is decomposed to just one bubble or two bubbles,
then this decomposition becomes redundant. Also, too many bubbles, i.e. more than 7 bubbles at
any level of a DFD makes the DFD model hard to understand. Decomposition of a bubble should
be carried on until a level is reached at which the function of the bubble can be described using a
simple algorithm.
Numbering the Bubbles: It is necessary to number the different bubbles occurring in the
DFD. These numbers help in uniquely identifying any bubble in the DFD from its bubble
number. The bubble at the context level is usually assigned the number 0 to indicate that it is the
0 level DFD. Bubbles at level 1 are numbered 0.1, 0.2, 0.3, etc. When a bubble numbered x is decomposed, its children bubbles are numbered x.1, x.2, x.3, etc. In this numbering scheme, by looking at the number of a bubble, we can unambiguously determine its level, its ancestors, and its successors.
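The claim that the level and ancestors of a bubble can be read directly off its number is easy to demonstrate in code. The sketch below, in C, treats bubble numbers as strings such as "0.3.2"; this string encoding is our own assumption.

```c
#include <assert.h>
#include <string.h>

/* The level of a bubble equals the number of dots in its number:
   "0" is the context (level 0) bubble, "0.3" is level 1, "0.3.2" level 2. */
int bubble_level(const char *num)
{
    int level = 0;
    for (; *num != '\0'; num++)
        if (*num == '.')
            level++;
    return level;
}

/* The parent of "0.3.2" is "0.3"; the context bubble "0" has no parent. */
void bubble_parent(const char *num, char *out)
{
    const char *dot = strrchr(num, '.');   /* last '.' separates the child index */
    if (dot == NULL) {
        out[0] = '\0';                     /* context bubble: no ancestor */
        return;
    }
    size_t n = (size_t)(dot - num);
    memcpy(out, num, n);
    out[n] = '\0';
}
```

Repeatedly applying `bubble_parent()` enumerates all ancestors of a bubble up to the context bubble 0.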
A supermarket needs to develop the following software to encourage regular customers. For this, the customer needs to supply his/her residence address, telephone number and the driving license number. Each customer who registers for this scheme is assigned a unique customer number (CN) by the computer. A customer can present his CN to the check-out staff when he makes any purchase. In this case, the value of his purchase is credited against his CN. At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchase over the year. Also, it intends to award a 22 carat gold coin to every customer whose purchase exceeds Rs. 10,000. The entries against the CN are reset on the last day of every year after the prize winners' lists are generated.
Fig. 36.7 Context diagram for the supermarket prize scheme software: the Sales-clerk supplies sales details, the Manager issues the gen-winner command and receives the winner-list, and the Customer supplies customer-details and receives the CN.
The context diagram for this problem is shown in fig. 36.7, the level 1 DFD in fig. 36.8, and the
level 2 DFD in fig. 36.9.
Fig. 36.8 Level 1 diagram for the supermarket problem: bubbles register-customer (0.1), generate-winner-list (0.2) and register-sales (0.3), with data stores customer-data and sales-info.
Fig. 36.9 Level 2 diagram for the supermarket problem: bubble 0.2 is decomposed into gen-surprise-gift-winner (0.2.1), gen-gold-coin-gift-winner (0.2.2) and find-total-sales (0.2.3).
Although DFDs are simple to understand and draw, students and practitioners alike encounter similar types of problems while modelling software problems using DFDs. While learning from experience is a powerful thing, it is an expensive pedagogical technique in the business world. It is therefore helpful to understand the different types of mistakes that users usually make while constructing the DFD model of systems.

Many beginners commit the mistake of drawing more than one bubble in the context diagram. A context diagram should depict the system as a single bubble.

Many beginners have external entities appearing at all levels of DFDs. All external entities interacting with the system should be represented only in the context diagram. The external entities should not appear at other levels of the DFD.

It is a common oversight to have either too few or too many bubbles in a DFD. Only 3 to 7 bubbles per diagram should be allowed, i.e. each bubble should be decomposed into between 3 and 7 bubbles.
Fig. 36.10 To show control information (such as an error-message flow from a search-book bubble) on a DFD is a mistake.
The method of carrying out decomposition to arrive at the successive levels and the
ultimate level to which decomposition is carried out are highly subjective and depend
on the choice and judgment of the analyst. Due to this reason, even for the same problem, several alternative DFD representations are possible. Further, many times it is not possible to say which DFD representation is superior or preferable to another.
The data flow diagramming technique does not provide any specific guidance as to
how exactly to decompose a given function into its sub-functions and we have to use
subjective judgment to carry out decomposition.
Structure chart for the RMS software: the root module main invokes get-good-data, compute-rms and write-result; get-good-data in turn invokes read-input and validate-input, with the data items data-items, valid-data and rms passed between the modules.
transaction a module. Every transaction carries a tag, which identifies its type. Transaction analysis uses this tag to divide the system into transaction modules and a transaction-centre module.

The structure chart for the supermarket prize scheme software is shown in fig. 36.13.
Fig. 36.13 Structure chart for the supermarket prize scheme software
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. Coupling between two modules is nothing but a measure of the degree of dependence
between them.
b. The primary characteristic of a good design is low cohesion and high coupling.
c. A module having high cohesion and low coupling is said to be functionally
independent of other modules.
d. The degree of coupling between two modules does not depend on their interface
complexity.
e. In the function-oriented design approach, the system state is decentralized and not
shared among different functions.
f. The essence of any good function-oriented design technique is to map the functions
performing similar activities into a module.
g. In object-oriented design, the basic abstraction is real-world functions.
h. An OOD (Object-Oriented Design) can be implemented using object-oriented languages only.
i. A DFD model of a system represents the functions performed by the system and the data flow taking place among these functions.
j. A data dictionary lists all data items appearing in the DFD model of a system but does not capture the composition relationship among the data.
k. The context diagram of a system represents it using more than one bubble.
l. A DFD captures the order in which the processes (bubbles) operate.
m. There should be at most one control relationship between any two modules in a properly designed structure chart.
2. For the following, mark all options which are true.
a. The desirable characteristics that every good software design needs are
Correctness
Understandability
Efficiency
Maintainability
All of the above
b. A module is said to have logical cohesion, if
it performs a set of tasks that relate to each other very loosely.
all the functions of the module are executed within the same time span.
all elements of the module perform similar operations, e.g. error handling, data input, data output, etc.
None of the above.
c. High coupling among modules makes it
difficult to understand and maintain the product
difficult to implement and debug
expensive to develop the product as the modules having high coupling cannot be
developed independently
all of the above
d. The desirable characteristics that every good software design needs are
error isolation
scope of reuse
understandability
all of the above
e. The purpose of structured analysis is
to capture the detailed structure of the system as perceived by the user
to define the structure of the solution that is suitable for implementation in some
programming language
all of the above
f. Structured analysis technique is based on
top-down decomposition approach
bottom-up approach
divide and conquer principle
none of the above
g. Data Flow Diagram (DFD) is also known as a:
structure chart
bubble chart
Gantt chart
PERT chart
h. The context diagram of a DFD is also known as
level 0 DFD
level 1 DFD
level 2 DFD
none of the above
i. Decomposition of a bubble is also known as
classification
factoring
exploding
aggregation
j. Decomposition of a bubble should be carried on
till the atomic program instructions are reached
up to two levels
until a level is reached at which the function of the bubble can be described using a simple algorithm
none of the above
k. The bubbles in a level-1 DFD represent
exactly one high-level functional requirement described in SRS document
more than one high-level functional requirement
part of a high-level functional requirement
any of the above depending on the problem
l. By looking at the structure chart, we can
say whether a module calls another module just once or many times
not say whether a module calls another module just once or many times
tell the order in which the different modules are invoked
not tell the order in which the different modules are invoked
m. In which of the following ways does a structure chart differ from a flow chart?
it is always difficult to identify the different modules of the software from its flow
chart representation
Lesson
37
Software Design Part 2
the system. Each object essentially consists of some data that are private to the object and a set of functions that operate on those data, as shown in fig. 37.1. In fact, the functions of an object have the sole authority to operate on the private data of that object. Therefore, an object cannot directly access the data internal to another object. However, an object can indirectly access the internal data of other objects by invoking the operations (i.e. methods) supported by those objects.

Fig. 37.1 A model of an object: private data surrounded by the methods m1 to m8 that operate on it.
The data internal to an object are called the attributes of the object, and the functions supported by an object are called its methods. Fig. 37.2 shows the LibraryMember class with eight attributes and five methods.

Fig. 37.2 The LibraryMember class, with attributes such as E-Mail, Membership Admission Date, Membership Expiry Date and Books Issued, and the methods issueBook(), findPendingBooks(), findOverdueBooks(), returnBook() and findMembershipDetails().
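The rule that the methods have the sole authority to operate on an object's private data can be sketched in C. This is a hedged illustration: only three of the five methods of Fig. 37.2 are shown, and the attribute layout is an assumption.

```c
#include <assert.h>

/* A cut-down LibraryMember "object": the Books Issued attribute is only
   ever touched through the methods below, never directly by other code. */
typedef struct {
    int books_issued;           /* Books Issued attribute of Fig. 37.2 */
} LibraryMember;

/* issueBook(): the only operation that increases the count. */
void issueBook(LibraryMember *m)
{
    m->books_issued++;
}

/* returnBook(): the only operation that decreases the count. */
void returnBook(LibraryMember *m)
{
    if (m->books_issued > 0)
        m->books_issued--;
}

/* findPendingBooks(): read access also goes through a method. */
int findPendingBooks(const LibraryMember *m)
{
    return m->books_issued;
}
```

Another object wishing to know how many books a member holds must interrogate it through `findPendingBooks()`, which is the message-passing style of access described earlier.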
1.3. Inheritance

The inheritance feature allows us to define a new class by extending or modifying an existing class. The original class is called the base class (or super class) and the new class obtained through inheritance is called the derived class (or sub class). A base class is a generalization of
its derived classes. This means that the base class contains only those properties that are common
to all the derived classes. Again each derived class is a specialization of its base class because it
modifies or extends the basic properties of the base class in certain ways. Thus, the inheritance
relationship can be viewed as a generalization-specialization relationship.
Using the inheritance relationship, different classes can be arranged in a class hierarchy (or
class tree). In addition to inheriting all properties of the base class, a derived class can define
new properties. That is, it can define new data and methods. It can even give new definitions to
methods which already exist in the base class. Redefinition of methods which existed in the base class is called method overriding. In fig. 37.3, LibraryMember is the base class for the
derived classes Faculty, Student, and Staff. Similarly, Student is the base class for the derived
classes Undergraduate, Postgraduate, and Research. Each derived class inherits all the data and
methods of the base class. It can also define additional data and methods or modify some of the
inherited data and methods. The different classes in a library automation system and the
inheritance relationship among them are shown in the fig. 37.3. The inheritance relationship has
been represented in fig. 37.3 using a directed arrow drawn from a derived class to its base class.
In fig. 37.3, the LibraryMember base class might define the data for name, address, and library
membership number for each member. Though the Faculty, Student, and Staff classes inherit these
data, they might have to redefine the respective issue-book methods, because the number of
books that can be borrowed and the duration of loan may be different for the different categories
of library members. Thus, the issue-book method is overridden by each of the derived classes,
and the derived classes might define additional data max-number-books and max-duration-of-issue,
which may vary for the different member categories.
[Fig. 37.3 Library Information System example: LibraryMember is the base class; Faculty,
Student, and Staff are its derived classes; Under Graduate, Post Graduate, and Research are
in turn derived from Student.]
An important advantage of inheritance is code reuse. Instead of defining the common methods and data in each of the derived
classes separately, these methods and data are defined only once in the base class and are
inherited by each of its subclasses. For example, in the Library Information System example of
fig. 37.3, each category of member objects Faculty, Student, and Staff need the data member-
name, member-address, and membership-number and therefore these data are defined in the base
class LibraryMember and inherited by its subclasses.
Another advantage of the inheritance mechanism is the conceptual simplification that comes
from reducing the number of independent features of the classes.
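The overriding of the issue-book method described above can be sketched in Python. The borrowing limits below are illustrative assumptions (the text says only that they differ per member category):

```python
class LibraryMember:
    max_number_books = 2                     # default limit (assumed value)

    def __init__(self, name):
        self.name = name
        self.books_issued = []

    def issue_book(self, title):
        if len(self.books_issued) >= self.max_number_books:
            raise ValueError("borrowing limit reached")
        self.books_issued.append(title)

class Student(LibraryMember):
    # The derived class redefines inherited data: max-number-books
    max_number_books = 5                     # assumed value

class Faculty(LibraryMember):
    # Method overriding: issue_book is given a new definition
    def issue_book(self, title):
        if len(self.books_issued) >= 10:     # faculty limit (assumed value)
            raise ValueError("faculty borrowing limit reached")
        self.books_issued.append(title)

student = Student("Rina")
for n in range(5):
    student.issue_book(f"Book {n}")
print(len(student.books_issued))             # 5
```

Student specializes the base class by redefining data only, while Faculty overrides the method itself; both inherit everything else from LibraryMember unchanged.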
In some situations, a class may need to inherit features from more than one base class. This is
called multiple inheritance and can be represented as in fig. 37.4. Multiple inheritance is
represented by arrows drawn from the subclass to each of the base classes. The class Research
inherits features from both the classes Student and Staff.
[Fig. 37.4 Library Information System example with multiple inheritance: LibraryMember is
the base class; Faculty, Student, and Staff are derived classes; Under Graduate, Post Graduate,
and Research are derived from Student, and Research additionally inherits from Staff
(multiple inheritance).]
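In a language that supports multiple inheritance, the Research class of fig. 37.4 can be sketched as below. The features given to Student and Staff here are illustrative assumptions:

```python
class Student:
    def register_courses(self):
        return "courses registered"          # assumed Student feature

class Staff:
    def draw_salary(self):
        return "salary drawn"                # assumed Staff feature

class Research(Student, Staff):
    """Inherits features from both Student and Staff (fig. 37.4)."""
    pass

r = Research()
print(r.register_courses())                  # inherited from Student
print(r.draw_salary())                       # inherited from Staff
```

A Research object can thus respond to messages defined in either base class; how a language resolves features defined in both bases (e.g. Python's method resolution order) is a language-specific detail.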
1.4. Encapsulation
The property of an object by which it interfaces with the outside world only through
messages is referred to as encapsulation. The data of an object are encapsulated within its
methods and are available only through message-based communication. This concept is
schematically represented in fig. 37.5.
[Fig. 37.5 Schematic representation of encapsulation: the data of the object sit at the centre
and are accessible only through the ring of methods m1 to m6 surrounding them.]
1.4.1. Advantages of Encapsulation

Encapsulation offers three important advantages:

It protects an object's internal data from corruption by other objects. This protection
includes protection from unauthorized access and protection from the different types of
problems that arise from concurrent access to data, such as deadlock and inconsistent
values.

Encapsulation hides the internal structure of an object, so that interaction with the
object is simple and standardized. This facilitates reuse of objects across different
projects. Furthermore, if the internal structure or procedures of an object are modified,
other objects are not affected. This results in easy maintenance.

Since objects communicate among each other using messages only, they are weakly
coupled. The fact that objects are inherently weakly coupled enhances understanding
of a design, since each object can be studied and understood almost in isolation from
the other objects.
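A minimal sketch of encapsulation in Python (which enforces privacy only by convention and name mangling; the account example is an illustrative assumption, not from the text):

```python
class Account:
    def __init__(self, opening_balance):
        # Double-underscore name mangling discourages direct external access
        self.__balance = opening_balance

    # The object interfaces with the outside world only through its methods
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def get_balance(self):
        return self.__balance

account = Account(100)
account.deposit(50)
print(account.get_balance())    # 150
# Direct access such as account.__balance raises AttributeError,
# so the data cannot be corrupted by bypassing the methods.
```

Because callers can reach the balance only through deposit() and get_balance(), the invariant checks in those methods cannot be bypassed, which is the protection-from-corruption advantage listed above.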
1.5. Polymorphism
Polymorphism literally means poly (many) morphs (forms). Broadly speaking,
polymorphism denotes the following:
The same message can result in different actions when received by different objects. This
is also referred to as static binding. This occurs when multiple methods with the same
operation name exist.
When we have an inheritance hierarchy, an object of a derived class can be assigned to a
reference of its ancestor class. When such an assignment occurs, a method call through the
ancestor reference would result in the invocation of the appropriate method of the object of the derived
class. The exact method to which a method call would be bound cannot be known at
compile time, and is dynamically decided at the runtime. This is also known as dynamic
binding.
[Fig. 37.6 Circle class with an overloaded create method]

1.5.2. Dynamic Binding
ys can send a generic message to a set of objects which
Using dynamic binding a programmer
may be of different types (i.e.,itbelonging to different classes) and leave the exact way in which
the message would be handled. c to the receiving objects. Suppose we have a class hierarchy of
win a drawing as shown in fig. 37.7. Now, suppose the display method
different geometric objects
w and is overridden in each derived class. If the different types of
is declared in the shape
w class
geometric objects making up a drawing are stored in an array of type shape, then a single call to
the display method for each object would take care to display the appropriate drawing element.
That is, the same draw call to a shape object would take care of drawing the appropriate shape.
This code segment is shown in fig. 37.8.
[Fig. 37.7 Class hierarchy of geometric objects: Shape is the base class, with derived classes
including Ellipse, Square, and Cube.]

[Fig. 37.8 Traditional code and object-oriented code using dynamic binding]
1.5.3. Advantages of Dynamic Binding
The main advantage of dynamic binding is that it leads to elegant programming and
facilitates code reuse and maintenance. With dynamic binding, new derived objects can be added
with minimal changes to existing objects. This advantage of polymorphism can be illustrated by
comparing the code segments of an object-oriented program and a traditional program for
drawing various graphic objects on the screen. It can be assumed that the shape is the base class,
and the classes Circle, Rectangle, and Ellipse are derived from it. Now, shape can be assigned
any objects of type Circle, Rectangle, Ellipse, etc. But, a draw method invocation of the shape
object would invoke the appropriate method. It can be easily seen in fig. 37.8 that, because of
dynamic binding, the object-oriented code is much more concise and intellectually appealing.
Also, suppose in the example program segment, it is later found necessary to handle a new
graphics drawing primitive, say Ellipse, then, the procedural code has to be changed by adding a
new if-then-else clause. However, in case of the object-oriented program, the code need not
change. Only a new class called Ellipse has to be defined.
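The comparison drawn above (the two code boxes of fig. 37.8) can be sketched in Python; the class and method names follow the Shape example, while the strings returned are illustrative assumptions:

```python
# Traditional code: a type tag and an if-then-else chain that must be
# edited every time a new kind of shape is added.
def draw_traditional(kind):
    if kind == "circle":
        return "drawing circle"
    elif kind == "rectangle":
        return "drawing rectangle"
    else:
        raise ValueError("unknown shape")

# Object-oriented code: dynamic binding selects the right method at run
# time; adding a new shape means adding a class, not editing existing code.
class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):
        return "drawing circle"

class Rectangle(Shape):
    def draw(self):
        return "drawing rectangle"

class Ellipse(Shape):      # new drawing primitive: no other code changes
    def draw(self):
        return "drawing ellipse"

drawing = [Circle(), Rectangle(), Ellipse()]
for shape in drawing:
    print(shape.draw())    # the appropriate draw method is bound at run time
```

Note that the loop over the drawing never names a concrete class, which is why introducing Ellipse required no change to it, whereas draw_traditional would need a new elif clause.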
2. Object Modelling using UML

2.1. Model and its uses

A model captures aspects important for some application while omitting (or abstracting) the
rest. A model in the context of software development can be graphical, textual, mathematical, or
program code-based. Models are very useful in documenting design and analysis results. Models
also facilitate the analysis and design procedures themselves. Graphical models are very popular
because they are easy to understand and construct. UML is primarily a graphical modelling tool.
However, it often requires text explanations to accompany the graphical models.

An important reason behind constructing a model is that it helps manage complexity. Once
models of a system have been constructed, they can be used for a variety of purposes during
software development, including the following:

Code reuse by the use of predefined class libraries
Analysis
Specification
Code generation
Design
Visualize and understand the problem and the working of a system
Testing, etc.
In all these applications, the UML models can not only be used to document the results but
also to arrive at the results themselves. Since a model can be used for a variety of purposes, it is
reasonable to expect that the model would vary depending on the purpose for which it is being
constructed. For example, a model developed for initial analysis and specification should be very
different from the one used for design. A model that is being used for analysis and specification
would not show any of the design decisions that would be made later on during the design stage.
On the other hand, a model used for design purposes should capture all the design decisions.
Therefore, it is a good idea to explicitly mention the purpose for which a model has been
developed, along with the model.
Fig. 37.9 shows the UML diagrams responsible for providing the different views.

Users' view: This view defines the functionalities (facilities) made available by the system to
its users. The users' view captures the external users' view of the system in terms of the
functionalities offered by the system. The users' view is a black-box view of the system,
where the internal structure, the dynamic behaviour of different system components, the
implementation, etc. are not visible. The users' view is very different from all the other views in
the sense that it is a functional model, compared to the object model of all the other views. The
users' view can be considered as the central view, and all the other views are expected to conform
to this view. This thinking is in fact the crux of any user-centric development style.
[Fig. 37.9 Different types of diagrams and views supported in UML:
Users' view - Use Case Diagram;
Structural view - Class Diagram, Object Diagram;
Behavioural view - Sequence Diagram, Collaboration Diagram, State-chart Diagram, Activity Diagram;
Implementation view - Component Diagram;
Environmental view - Deployment Diagram]
Structural view: The structural view defines the kinds of objects (classes) important to the
understanding of the working of a system and to its implementation. It also captures the
relationships among the classes (objects). The structural model is also called the static model,
since the structure of a system does not change with time.
Behavioral view: The behavioural view captures how objects interact with each other to
realize the system behaviour. The system behaviour captures the time-dependent (dynamic)
behaviour of the system.
Implementation view: This view captures the important components of the system and their
dependencies.
Environmental view: This view models how the different components are implemented on
different pieces of hardware.
2.3. Use Case Model

Use cases correspond to the high-level functional requirements. The use cases partition the
system behaviour into transactions, so that each transaction performs some useful action from the
user's point of view.
2.3.1. Purpose of Use Cases

The purpose of a use case is to define a piece of coherent behaviour without revealing the
internal structure of the system. The use cases do not mention any specific algorithm to be used
or the internal data representation, internal structure of the software, etc. A use case typically
represents a sequence of interactions between the user and the system. These interactions consist
of one mainline sequence. The mainline sequence represents the normal interaction between a
user and the system, and is the most frequently occurring sequence of interaction. For
example, the mainline sequence of the withdraw-cash use case supported by a bank ATM would be
to insert the card, enter the password, select the withdrawal option and the amount,
complete the transaction, and get the amount. Several variations to the mainline sequence may
also exist. Typically, a variation from the mainline sequence occurs when some specific
conditions hold. For the bank ATM example, variations or alternative scenarios may occur if the
password is invalid or the amount to be withdrawn exceeds the account balance. The variations
are also called alternative paths. A use case can be viewed as a set of related scenarios tied
together by a common goal. The mainline sequence and each of the variations are called
scenarios or instances of the use case. Each scenario is a single path of user events and system
activity through the use case.
An actor is connected to a use case by a line known as the communication relationship. It
indicates that the actor makes use of the functionality provided by the use case. Both the human
users and the external systems can be represented by stick person icons. When a stick person
icon represents an external system, it is annotated by the stereotype <<external system>>.
Example

The use case model for the Tic-Tac-Toe problem is shown in fig. 37.10. This software has
only one use case, "play move". Note that a use case named get-user-move is not used here;
the name get-user-move would be inappropriate, because use cases should be named from
the users' perspective.

[Fig. 37.10 Use case model for the tic-tac-toe game: the actor Player communicates with the
single use case "play move" inside the system boundary Tic-tac-toe game.]
Text Description

Each ellipse on the use case diagram should be accompanied by a text description. The text
description should define the details of the interaction between the user and the computer and
other aspects of the use case. It should include all the behaviour associated with the use case
in terms of the mainline sequence, the different variations to the normal behaviour, the system
responses associated with the use case, the exceptional conditions that may occur in the
behaviour, etc. The behaviour description is often written in a conversational style, describing
the interactions between the actor and the system. The text description may be informal, but
some structuring is recommended. The following are some of the information which may be
included in a use case text description in addition to the mainline sequence and the
alternative scenarios.
Contact persons: This section lists personnel of the client organization with whom the use
case was discussed, date and time of the meeting, etc.
Actors: In addition to identifying the actors, some information about actors using this use
case which may help the implementation of the use case may be recorded.
Pre-condition: The preconditions would describe the state of the system before the use case
execution starts.
Post-condition: This captures the state of the system after the use case has successfully
completed.
Non-functional requirements: This could contain the important constraints for the design
and implementation, such as platform and environment conditions, qualitative statements,
response time requirements, etc.
Exceptions, error situations: This contains only the domain-related errors, such as lack of
the user's access rights, invalid entry in the input fields, etc. Obviously, errors that are not
domain-related, such as software errors, need not be discussed here.
Sample dialogs: These serve as examples illustrating the use case.
Specific user interface requirements: These contain specific requirements for the user
interface of the use case. For example, it may contain forms to be used, screen shots,
interaction style, etc.
Document references: This part contains references to specific domain-related documents
which may be useful to understand the system operation.
2.3.4. Factoring of Commonality Among Use Cases

It is often desirable to factor use cases into component use cases. Actually, factoring of use
cases is required under two situations. First, complex use cases need to be factored into simpler
use cases. This would not only make the behaviour associated with the use case much more
comprehensible, but would also make the corresponding interaction diagrams more tractable. Without
decomposition, the interaction diagrams for complex use cases may become too large to be
accommodated on a single standard-sized (A4) sheet of paper. Secondly, use cases need to be factored
whenever there is common behaviour across different use cases. Factoring would make it possible to define
such behaviour only once and reuse it whenever required. It is desirable to factor out common
usage, such as error handling, from a set of use cases. This makes analysis of the class design
much simpler and more elegant. However, a word of caution here: factoring of use cases should not be
done except for achieving the above two objectives. From the design point of view, it is not
advantageous to break up a use case into many smaller parts just for the sake of it.

UML offers three mechanisms for factoring of use cases, as follows:
Generalization

Use case generalization can be used when one use case is similar to another, but does
Use case generalization
something slightly differently or something more. Generalization works the same way with
use cases as it does with classes. The child use case inherits the behaviour and meaning of the
parent use case. The notation is the same too (as shown in fig. 37.11). It is important to
remember that the base and the derived use cases are separate use cases and should have
separate text descriptions.
[Fig. 37.11 Representation of use case generalization]
[Fig. 37.12 Representation of use case inclusion: the base use case is connected to the
common use case by a dashed arrow labelled <<include>>.]

Includes
The includes relationship in the older versions of UML (prior to UML 1.1) was known as the
uses relationship. The includes relationship involves one use case including the behaviour of
another use case in its sequence of events and actions. The includes relationship occurs when
a chunk of behaviour is similar across a number of use cases. The factoring of such
behaviour will help in not repeating the specification and implementation across different use
cases. Thus, the includes relationship explores the issue of reuse by factoring out the
commonality across use cases. It can also be gainfully employed to decompose a large and
complex use case into more manageable parts. As shown in fig. 37.12, the includes
relationship is represented using the predefined stereotype <<include>>. In the includes
relationship, a base use case compulsorily and automatically includes the behaviour of the
common use cases. As shown in the example of fig. 37.13, issue-book and renew-book both include
the check-reservation use case. The base use case may include several use cases. In such cases, it
may interleave their associated common use cases together. The common use case becomes a
separate use case, and an independent text description should be provided for it.
[Fig. 37.14 Representation of use case extension]

Organization of Use Cases
When the use cases are factored, they are organized hierarchically. The high-level use cases
are refined into a set of smaller and more refined use cases as shown in fig. 37.15. Top-level
use cases are super-ordinate to the refined use cases. The refined use cases are sub-ordinate
to the top-level use cases. Note that only the complex use cases should be decomposed and
organized in a hierarchy. It is not necessary to decompose simple use cases. The functionality
of the super-ordinate use cases is traceable to their sub-ordinate use cases. Thus, the
functionality provided by the super-ordinate use cases is a composite of the functionality of the
sub-ordinate use cases. In the highest level of the use case model, only the fundamental use
cases are shown. The focus is on the application context. Therefore, this level is also referred
to as the context diagram. In the context diagram, the system limits are emphasized. The top-
level diagram contains only those use cases with which the external users of the system
interact. The subsystem-level use cases specify the services offered by the subsystems to the
other subsystems. Any number of levels involving the subsystems may be utilized. In the
lowest level of the use case hierarchy, the class-level use cases specify the functional
fragments or operations offered by the classes.
[Fig. 37.15 Hierarchical organization of use cases: top-level use cases (use case 1, use case 2,
use case 3) are refined into subsystem-level use cases (e.g. use case 3.2), which are in turn
refined into class-level use cases (methods).]

2.4. Class Diagrams
A class diagram describes the static structure of a system. It shows how a system is
structured, rather than how it behaves. The static structure of a system comprises a number of
class diagrams and their dependencies. The main constituents of a class diagram are classes and
their relationships: generalization, aggregation, association, and various kinds of dependencies.
The classes represent entities with common features, i.e. attributes and operations. Classes
are represented as solid outline rectangles with compartments. Classes have a mandatory name
compartment where the name is written centered in boldface. The class name is usually written
using the mixed-case convention and begins with an uppercase letter. The class names are usually chosen
to be singular nouns. An example of a class is shown in fig. 37.1.2. Classes have optional
attributes and operations compartments. A class may appear on several diagrams. Its attributes
and operations are suppressed on all but one diagram.
2.4.1. Association
Associations are needed to enable objects to communicate with each other. An association
describes a connection between classes. The association relation between two objects is called
object connection or link. Links are instances of associations. A link is a physical or conceptual
connection between object instances. For example, suppose Amit has borrowed the book Graph
Theory. Here, borrowed is the connection between the objects Amit and Graph Theory book.
Mathematically, a link can be considered to be a tuple, i.e. an ordered list of object instances. An
association describes a group of links with a common structure and common semantics. For
example, consider the statement that Library Member borrows Books. Here, borrows is the
association between the class LibraryMember and the class Book. Usually, an association is a
binary relation (between two classes). However, three or more different classes can be involved
in an association. A class can have an association relationship with itself (called recursive
association). In this case, it is usually assumed that two different objects of the class are linked
by the association relationship.
Association between two classes is represented by drawing a straight line between the
concerned classes. Fig. 37.16 illustrates the graphical representation of the association relation.
The name of the association is written alongside the association line. An arrowhead may be
placed on the association line to indicate the reading direction of the association. The arrowhead
should not be misunderstood to be indicating the direction of a pointer implementing an
association. On each side of the association relation, the multiplicity is noted as an individual
number or as a value range. The multiplicity indicates how many instances of one class are
associated with each other. Value ranges of multiplicity are noted by specifying the minimum
and maximum values, separated by two dots, e.g. 1..5. An asterisk is a wild card and means many
(zero or more). The association of fig. 37.16 should be read as "Many books may be borrowed
by a Library Member". Observe that associations (and links) appear as verbs in the problem
statement.
Library Member 1 ---- borrowed by ---- * Book

Fig. 37.16 Association between two classes

Associations are usually realized by assigning appropriate reference attributes to the classes
involved. Thus, associations can be implemented using pointers from one object class to another.
Links and associations can also be implemented by using a separate class that stores which
objects of a class are linked to which objects of another class. Some CASE tools use the role
names of the association relation for the corresponding automatically generated attribute.
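The reference-attribute implementation of an association mentioned above can be sketched in Python for the 1-to-many "borrowed by" relation of fig. 37.16. The attribute names and the borrow method are illustrative assumptions:

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.borrowed_by = None        # reference attribute: at most one member

class LibraryMember:
    def __init__(self, name):
        self.name = name
        self.borrowed_books = []       # reference attribute: many books

    def borrow(self, book):
        # Establishing a link means setting both ends of the association
        book.borrowed_by = self
        self.borrowed_books.append(book)

amit = LibraryMember("Amit")
graph_theory = Book("Graph Theory")
amit.borrow(graph_theory)
print(graph_theory.borrowed_by.name)   # Amit
```

Each (member, book) pair linked this way is one link, i.e. one instance of the borrows association; the lists and references play the role of the pointers described in the text.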
2.4.2. Aggregation
Aggregation is a special type of association where the involved classes represent a whole-part
relationship. The aggregate takes the responsibility of forwarding messages to the appropriate
parts. Thus, the aggregate takes the responsibility of delegation and leadership. When an instance
of one object contains instances of some other objects, then aggregation (or composition)
relationship exists between the composite object and the component object. Aggregation is
represented by the diamond symbol at the composite end of a relationship. The number of
instances of the component class aggregated can also be shown as in fig. 37.17 (a).
Document 1 ---- * Paragraph 1 ---- * Line

Fig. 37.17 (a) Representation of aggregation
The aggregation relationship cannot be reflexive (i.e. recursive). That is, an object cannot
contain objects of the same class as itself. Also, the aggregation relation is not symmetric. That
is, two classes A and B cannot contain instances of each other. However, the aggregation
relationship can be transitive. In this case, aggregation may consist of an arbitrary number of
levels.
2.4.3. Composition

Composition is a stricter form of aggregation, in which the parts are existence-dependent on
the whole. This means that the life of the parts is closely tied to the life of the whole. When the
whole is created, the parts are created, and when the whole is destroyed, the parts are destroyed.
A typical example of composition is an invoice object with invoice items. As soon as the invoice
object is created, all the invoice items in it are created, and as soon as the invoice object is
destroyed, all invoice items in it are also destroyed. The composition relationship is represented
as a filled diamond drawn at the composite end. An example of the composition relationship is
shown in fig. 37.17 (b).
Order 1 ---- * Item

Fig. 37.17 (b) Representation of composition
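The existence dependency that distinguishes composition can be sketched in Python using the Order/Item example of fig. 37.17 (b); the constructor arguments and the total method are illustrative assumptions:

```python
class Item:
    def __init__(self, description, price):
        self.description = description
        self.price = price

class Order:
    """Composite: it creates and exclusively owns its Item parts.

    The parts come into existence together with the whole and are not
    shared with other objects; when the Order is discarded, its Items
    become unreachable and go with it.
    """
    def __init__(self, lines):
        # Composition: the whole constructs its parts itself, rather than
        # being handed independently created objects (as aggregation allows)
        self.items = [Item(desc, price) for desc, price in lines]

    def total(self):
        return sum(item.price for item in self.items)

order = Order([("pen", 10), ("notebook", 40)])
print(order.total())       # 50
```

Constructing the Items inside Order's own constructor, instead of accepting pre-built Item objects, is what ties the parts' lifetime to the whole and makes this composition rather than plain aggregation.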
The objects participating in an interaction are shown in a sequence diagram at the top of the
chart as boxes attached to a vertical dashed line. Inside the box, the name of the object is written
with a colon separating it from the name of the class, and both the name of the object and class
are underlined. The objects appearing at the top signify that the object already existed when the
use case execution was initiated. However, if some object is created during the execution of the
use case and participates in the interaction (e.g. a method call), then the object should be shown
at the appropriate place on the diagram where it is created. The vertical dashed line is called the
object's lifeline. The lifeline indicates the existence of the object at any particular point of time.
The rectangle drawn on the lifeline is called the activation symbol and indicates that the object
is active as long as the rectangle exists. Each message is indicated as an arrow between the
lifelines of two objects. The messages are shown in chronological order from the top to the
bottom. That is, reading the diagram from the top to the bottom would show the sequence in
which the messages occur. Each message is labeled with the message name. Some control
information can also be included. Two types of control information are particularly valuable.
A condition (e.g. [invalid]) indicates that a message is sent only if the condition is true.

An iteration marker shows that the message is sent many times to multiple receiver objects, as
would happen when a collection or the elements of an array are being iterated. The basis
of the iteration can also be indicated, e.g. [for every book object].
[Fig. 37.18 Sequence diagram for the renew book use case. The participating objects are the
Library Boundary, the Library Book Renewal controller, the Library Book Register, Book, and
Library Member. The messages exchanged include renewBook, findMemberBorrowing,
displayBorrowing, selectBooks, bookSelected, *find, [reserved] apology, update, confirm, and
updateMemberBorrowing.]
The sequence diagram for the book renewal use case for the Library Automation Software is
shown in fig. 37.18. The development of the sequence diagram in the development methodology
would help us in determining the responsibilities of the different classes; i.e. what methods
should be supported by each class.
In a collaboration diagram, the messages are numbered, since numbering is the only
way to describe the relative sequencing of the messages in this diagram. The collaboration
diagram for the example of fig. 37.18 is shown in fig. 37.19. The use of the collaboration
diagram in the development process helps in determining which classes are associated with
which other classes.
[Fig. 37.19 Collaboration diagram for the renew book use case. The numbered messages among
the Library Boundary, Library Book Renewal controller, Library Book Register, Book, and
Library Member objects are: 1: renewBook; 2: findMemberBorrowing; 3: displayBorrowing;
4: selectBooks; 5: bookSelected; 6: *find; 7: apology [reserved]; 8: apology [reserved];
9: update; 10: confirm; 12: confirm.]
An activity is a state with an internal action and one or more outgoing transitions which automatically follow
the termination of the internal activity. If an activity has more than one outgoing transition, then
these must be identified through conditions. An interesting feature of the activity diagrams is the
swim lanes. Swim lanes enable you to group activities based on who is performing them, e.g.
academic department vs. hostel office. Thus swim lanes subdivide activities based on the
responsibilities of some components. The activities in a swim lane can be assigned to some
model elements, e.g. classes or some component, etc.
Activity diagrams are normally employed in business process modelling. This is carried out
during the initial stages of requirements analysis and specification. Activity diagrams can be very
useful to understand complex processing activities involving many components. Later these
diagrams can be used to develop interaction diagrams which help to allocate activities
(responsibilities) to classes.
[Fig. 37.20 Activity diagram for the student admission procedure at IIT: receive fees; then, in
parallel, register in courses, conduct medical examination, and allot room; after all of these
complete, issue identity card.]
The student admission process in IIT is shown as an activity diagram in fig. 37.20. This
shows the part played by different components of the Institute in the admission procedure. After
the fees are received at the account section, parallel activities start at the hostel office, hospital,
and the Department. After all these activities are completed (this synchronization is represented
as a horizontal line), the identity card can be issued to a student by the Academic section.
This problem is overcome in UML by using state charts. The state chart formalism was proposed by
David Harel [1990]. A state chart is a hierarchical model of a system and introduces the concept
of a composite state (also called a nested state).
Actions are associated with transitions and are considered to be processes that occur quickly
and are not interruptible. Activities are associated with states and can take a longer time. An
activity can be interrupted by an event.
[State chart diagram for an order object: a received order enters the Unprocessed order state;
it is then checked, with [reject] leading to Rejected order and [accept] leading to Accepted
order; when an accepted order is processed, [some items not available] leads to Pending order,
while [all items available] processed/deliver leads to Fulfilled order; on a new supply, a
Pending order moves to Fulfilled order once all items are available.]
Object-oriented design (OOD) advocates a very different design approach. The OOD paradigm
suggests that the natural objects (i.e. the entities) occurring in a problem should be identified
first and then implemented. Object-oriented design techniques not only identify objects, but also
identify the internal details of these identified objects. Also, the relationships existing among
the different objects are identified and represented in such a way that the objects can be easily
implemented using a programming language.
The term object-oriented analysis (OOA) refers to a method of developing an initial model of
the software from the requirements specification. The analysis model is refined into a design
model. The design model can be implemented using a programming language. The term object-oriented
programming refers to the implementation of programs using object-oriented concepts.
3.1. Design Patterns gr
Design patterns are reusable solutions to problems that recur in many applications. A pattern serves as a guide for creating a good design. Patterns are based on sound common sense and the application of fundamental design principles. They are created by people who spot repeating themes across designs. The pattern solutions are typically described in terms of class and interaction diagrams. Examples of design patterns are the expert pattern, creator pattern, controller pattern, etc.
In addition to providing the model of a good solution, design patterns include a clear specification of the problem, and also explain the circumstances in which the solution would and would not work. Thus, a design pattern has four important parts:
The problem
The context in which the problem occurs
The solution
The context within which the solution works
Expert Pattern
Problem: Which class should be responsible for carrying out an operation, e.g. computing the total sales?
Solution: Assign the responsibility to the information expert, that is, the class that has the information necessary to fulfill the required responsibility. The expert pattern expresses the common intuition that objects do things related to the information they have. The class diagram and collaboration diagram for this solution to the problem of which class should compute the total sales are shown in fig. 37.22.
[Figure: (a) classes SaleTransaction, SaleItem, and ItemSpecification; (b) collaboration messages 1: total, 2: subtotal, 3: price.]
Fig. 37.22 Expert pattern: (a) Class diagram (b) Collaboration diagram
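The expert pattern of fig. 37.22 can be sketched as follows. This is only an illustrative Python sketch: the class and message names follow the figure, but the price and quantity data are invented for the example.

```python
# Hypothetical sketch of the expert pattern: each class computes only
# what its own data allow, and delegates the rest.

class ItemSpecification:
    """Knows the unit price of a commodity."""
    def __init__(self, price):
        self._price = price

    def price(self):                     # message 3: price
        return self._price


class SaleItem:
    """Knows its quantity and its item specification."""
    def __init__(self, spec, quantity):
        self._spec = spec
        self._quantity = quantity

    def subtotal(self):                  # message 2: subtotal
        return self._quantity * self._spec.price()


class SaleTransaction:
    """Knows the items sold; it is the information expert for the total."""
    def __init__(self, items):
        self._items = items

    def total(self):                     # message 1: total
        return sum(item.subtotal() for item in self._items)


# Usage: the total is computed by the class that has the information.
items = [SaleItem(ItemSpecification(10.0), 3), SaleItem(ItemSpecification(2.5), 4)]
print(SaleTransaction(items).total())    # 40.0
```

Note how no object asks another for its raw data: each object is asked to do the thing its own information supports, which is exactly the intuition the expert pattern expresses.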
Creator Pattern
Problem: Which class should be responsible for creating a new instance of some class?
Solution: Assign a class C1 the responsibility to create an instance of class C2, if one or more
of the following are true:
C1 is an aggregation of objects of type C2
C1 contains objects of type C2
C1 closely uses objects of type C2
C1 has the data that would be required to initialize the objects of type C2, when they are created
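The rules above can be illustrated with a minimal Python sketch. The class names are assumed for the example: a SaleTransaction aggregates SaleItem objects and holds the data to initialize them, so by the creator pattern it is the class given the responsibility of creating them.

```python
# Hypothetical sketch of the creator pattern: SaleTransaction aggregates
# SaleItem objects, so it creates them rather than leaving that to clients.

class SaleItem:
    def __init__(self, name, quantity):
        self.name = name
        self.quantity = quantity


class SaleTransaction:
    def __init__(self):
        self._items = []

    def add_item(self, name, quantity):
        # The transaction, not the client code, instantiates SaleItem:
        # it contains the items and has the data needed to initialize them.
        item = SaleItem(name, quantity)
        self._items.append(item)
        return item

    def item_count(self):
        return len(self._items)


t = SaleTransaction()
t.add_item("pen", 2)
t.add_item("pad", 1)
print(t.item_count())   # 2
```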
Controller Pattern
Problem: Who should be responsible for handling the actor requests?
Solution: For every use case, there should be a separate controller object which would be
responsible for handling requests from the actor. Also, the same controller should be used for
all the actor requests pertaining to one use case so that it becomes possible to maintain the
necessary information about the state of the use case. The state information maintained by a
controller can be used to identify out-of-sequence actor requests, e.g. whether a voucher request is received before an arrange-payment request.
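The state-keeping role of a controller can be sketched as follows. This is a hypothetical Python sketch: the use case and request names (arrange payment, voucher request) follow the example in the text, but the class and method names are invented.

```python
# Hypothetical sketch of the controller pattern: one controller object per
# use case keeps state, so out-of-sequence actor requests can be detected.

class PaymentController:
    """Controller for a single 'make payment' use case."""
    def __init__(self):
        self._payment_arranged = False

    def arrange_payment(self):
        self._payment_arranged = True
        return "payment arranged"

    def request_voucher(self):
        # The state maintained by the controller identifies an
        # out-of-sequence actor request.
        if not self._payment_arranged:
            return "error: voucher requested before payment arranged"
        return "voucher issued"


c = PaymentController()
print(c.request_voucher())   # rejected: out of sequence
c.arrange_payment()
print(c.request_voucher())   # voucher issued
```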
Model View Separation Pattern
Problem: How should the non-GUI classes communicate with the GUI classes?
Context in which the problem occurs: This is a very commonly occurring pattern which is
found in almost every problem. Here, model is a synonym for the domain layer objects, view
is a synonym for the presentation layer objects such as the GUI objects.
Solution: The model view separation pattern states that model objects should not have direct knowledge of (or be directly coupled to) the view objects. This means that there should not be
any direct calls from other objects to the GUI objects. This results in a good solution, because
the GUI classes are related to a particular application whereas the other classes may be
reused.
There are actually two solutions to this problem which work in different circumstances.
These are as follows:
Solution 1: Polling or Pull from above
It is the responsibility of a GUI object to ask for the relevant information from the other
objects, i.e. the GUI objects pull the necessary information from the other objects whenever
required.
This model is frequently used. However, it is inefficient for certain applications. For example, in simulation applications which require visualization, the GUI objects would not know when the necessary information becomes available. Other examples are monitoring applications such as network monitoring, stock market quotes, and so on. In these situations, a push-from-below model of display update is required. Since push-from-below is not an acceptable solution, an indirect mode of communication from the other objects to the GUI objects is required.
Solution 2: Publish-subscribe pattern
An event manager class can be defined as one which keeps track of the subscribers and the types of events they are interested in. An event is published by the publisher by sending a message to the event manager object. The event manager notifies all registered subscribers, usually via a parameterized message (called a callback). Some languages specifically support event manager classes. For example, Java provides the EventListener interface for such purposes.
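The publish-subscribe mechanism described above can be sketched as follows. Python is used here rather than Java, and all names (EventManager, the "quote" event) are invented for illustration; only the event-manager idea itself comes from the text.

```python
# Hypothetical sketch of publish-subscribe: an event manager tracks
# subscribers by event type and notifies them via callbacks.

class EventManager:
    def __init__(self):
        self._subscribers = {}   # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, data):
        # Notify every registered subscriber via a parameterized
        # message (the callback).
        for callback in self._subscribers.get(event_type, []):
            callback(data)


received = []
manager = EventManager()
# A GUI object subscribes to the events it must display.
manager.subscribe("quote", lambda price: received.append(price))

# A model object publishes without any direct knowledge of the GUI object.
manager.publish("quote", 101.5)
manager.publish("volume", 9000)   # no subscriber; silently ignored
print(received)   # [101.5]
```

The model object only ever talks to the event manager, so model-view separation is preserved while still supporting push-from-below display updates.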
they normally do not include any processing logic. However, they may be responsible for validating inputs, formatting outputs, etc. The boundary objects were earlier called interface objects. However, the term interface class is used with different meanings in Java, COM/DCOM, and UML. A recommendation for the initial identification of the boundary classes is to define one boundary class per actor/use case pair.
Fig. 37.23 A typical realization of a use case through the collaboration of boundary,
controller, and entity objects
3.2.5. Identification of Entity Objects
One of the most important steps in any object-oriented design methodology is the identification of objects. In fact, the quality of the final design depends to a great extent on the appropriateness of the objects identified. However, to date no formal methodology exists for the identification of objects. Several semi-formal and informal approaches have been proposed for object identification. These can be classified into the following broad classes:
Grammatical analysis of the problem description
Derivation from data flows
Derivation from the entity relationship (E-R) diagram
A widely accepted object identification approach is the grammatical analysis approach. Grady Booch originated the grammatical analysis approach [1991]. In Booch's approach, the nouns occurring in the extended problem description statement (processing narrative) are mapped to objects and the verbs are mapped to methods.
3.3. Booch's Object Identification Method
Booch's object identification approach requires a processing narrative of the given problem
to be first developed. The processing narrative describes the problem and discusses how it can be
solved. The objects are identified by noting down the nouns in the processing narrative.
Synonyms of a noun must be eliminated. If an object is required to implement a solution, then it is
said to be part of the solution space. Otherwise, if an object is necessary only to describe the
problem, then it is said to be a part of the problem space. However, several of the nouns may not
be objects. An imperative procedure name, i.e., noun form of a verb actually represents an action
and should not be considered as an object. A potential object found after lexical analysis is
usually considered legitimate, only if it satisfies the following criteria:
Retained information: Some information about the object should be remembered for the
system to function. If an object does not contain any private data, it cannot be expected to
play any important role in the system.
Multiple attributes: Usually objects have multiple attributes and support multiple methods.
It is very rare to find useful objects which store only a single data element or support only a
single method, because an object having only a single data element or method is usually
implemented as a part of another object.
Common operations: A set of operations can be defined for potential objects. If these
operations apply to all occurrences of the object, then a class can be defined. An attribute or
operation defined for a class must apply to each instance of the class. If some of the attributes
or operations apply only to some specific instances of the class, then one or more subclasses
may be needed for these special objects.
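The first, mechanical step of the grammatical analysis described above can be illustrated with a toy sketch. This is purely hypothetical Python: a real analysis works on a full processing narrative and needs a part-of-speech tagger plus human judgment to prune non-objects, so a tiny hand-made noun list stands in for both here.

```python
# Hypothetical sketch of the lexical step of Booch's approach: list the
# nouns of a (tiny) processing narrative as candidate objects, merging
# the crudest of synonyms. Verbs would similarly suggest methods.

narrative = ("The player marks a square on the board. "
             "The board displays the marks and announces the result.")

NOUNS = {"player", "square", "board", "mark", "marks", "result"}

def candidate_objects(text):
    words = [w.strip(".,").lower() for w in text.split()]
    seen = []
    for w in words:
        base = "mark" if w in ("mark", "marks") else w   # crude synonym merge
        if w in NOUNS and base not in seen:
            seen.append(base)
    return seen

print(candidate_objects(narrative))
```

Each candidate would then be screened against the criteria above (retained information, multiple attributes, common operations) before being accepted as an object.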
Normally, the actors themselves and the interactions among themselves should be excluded
from the entity identification exercise. However, sometimes there is a need to maintain
information about an actor within the system. This is not the same as modeling the actor. These
classes are sometimes called surrogates. For example, in the Library Information System (LIS)
we would need to store information about each library member. This is independent of the fact
that the library member also plays the role of an actor of the system.
Although the grammatical approach is simple and intuitively appealing, through a naive use of the approach it is very difficult to achieve high-quality results. In particular, it is very difficult to come up with useful abstractions simply by doing a grammatical analysis of the problem description.
3.3.1. An Example: Tic-Tac-Toe
Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 x 3 square. A move consists of marking a previously unmarked square. A player who first places three consecutive marks along a straight line (i.e., along a row, column, or diagonal) on the square wins the game. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line, but all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game.
The class diagram is shown in fig. 37.25. The messages of the sequence diagram have been populated as methods of the corresponding classes.
Fig. 37.24 (a) Initial domain model (b) Refined domain model
[Figure: Board class (attribute int position[9]; methods checkMoveValidity, checkResult, announceResult, playMove), PlayMoveBoundary class (methods announceInvalidMove, displayBoard), and Controller class (methods announceResult, announceInvalidMove).]
Fig. 37.25 Class diagram
[Figure: messages exchanged among the :playMoveBoundary, :playMoveController, and :Board objects: acceptMove, checkMoveValidity, [invalid move] announceInvalidMove, playMove, checkWinner, [game over] announceResult, getBoardPositions, displayBoardPosition, [game not over] promptNextMove.]
Fig. 37.26 Sequence diagram for the play move use case
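The Board responsibilities named in the diagrams (move validation and checking for a winner) can be sketched as follows. This is a hypothetical Python sketch: the design in the text places these in a Board class with a nine-element position array, but the method bodies here are this author's illustration, not the course's implementation.

```python
# Hypothetical sketch of the Board logic: a nine-element position list,
# move validation, and a winner check over rows, columns and diagonals.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

class Board:
    def __init__(self):
        self.position = [" "] * 9

    def check_move_validity(self, square):
        # A move marks a previously unmarked square.
        return 0 <= square < 9 and self.position[square] == " "

    def play_move(self, square, mark):
        if not self.check_move_validity(square):
            raise ValueError("invalid move")
        self.position[square] = mark

    def check_winner(self):
        # Three consecutive identical marks along a straight line win.
        for a, b, c in LINES:
            if self.position[a] != " " and \
               self.position[a] == self.position[b] == self.position[c]:
                return self.position[a]
        if " " not in self.position:
            return "draw"
        return None        # game not over


b = Board()
for square, mark in [(0, "X"), (3, "O"), (1, "X"), (4, "O"), (2, "X")]:
    b.play_move(square, mark)
print(b.check_winner())   # X
```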
4. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. Data abstraction helps in easy code maintenance and code reuse.
c. Classes can be considered equivalent to Abstract Data Types (ADTs).
d. The inheritance relationship describes a 'has a' relationship among classes.
e. Inheritance feature of the object oriented paradigm helps in code reuse.
f. An important advantage of polymorphism is facilitation of reuse.
g. Using dynamic binding a programmer can send a generic message to a set of objects
which may be of different types i.e. belonging to different classes.
h. In dynamic binding, the address of an invoked method is known only at compile time.
i. For any given problem, one should construct all the views using all the diagrams
provided by UML.
j. Use cases are explicitly dependent among themselves.
k. Each actor can participate in one and only one use case.
l. Class diagrams developed using UML can serve as the functional specification of a
system.
m. The terms method and operation are equivalent concepts and can be used
interchangeably.
n. The aggregation relationship can be recursively defined, i.e. an object can contain
instances of itself.
o. In a UML class diagram, the aggregation relationship defines an equivalence
relationship among objects.
p. The aggregation relationship can be considered to be a special type of association
relationship.
q. Normally, you use an interaction diagram to represent how the behaviour of an object
changes over its life time.
r. The interaction diagrams can be effectively used to describe how the behaviour of an
object changes across several use cases.
s. A state chart diagram is good at describing behaviour that involves multiple objects cooperating with each other to achieve some behaviour.
t. Facade pattern tells how non-GUI classes should communicate with the GUI classes.
u. The use cases should be tightly tied to the GUI.
v. The responsibilities assigned to a controller object are closely related to the realization of a specific use case.
x. A large number of message exchanges between objects indicates good delegation and is a sure sign of a design well-done.
y. Deep class hierarchies are the hallmark of any good OOD.
z. Cohesiveness of the data and methods within a class is a sign of good OOD.
2. For the following, mark all options which are true.
a. In the object-oriented approach, each object essentially consists of
some data that are private to the object
a set of functions (or operations) that operate on those data
the set of methods it provides to the other objects for accessing and manipulating the data
none of the above
b. Redefinition of methods in a derived class which existed in the base class is called
function overloading
operator overloading
method overriding
none of the above
c. The mechanism by which a subclass inherits attributes and methods from more than
one base class is called
single inheritance
multiple inheritance
multi-level inheritance
hierarchical inheritance
d. In the object-oriented approach, the same message can result in different actions
when received by different objects. This feature is referred to as
static binding
dynamic binding
genericity
overloading
e. UML is
a language to model syntax
an object-oriented development methodology
an automatic code generation tool
none of the above
f. In the context of use case diagram, the stick person icon is used to represent
human users
external systems
internal systems
none of the above
g. The design pattern solutions are typically described in terms of
class diagrams
object diagrams
interaction diagrams
both class and interaction diagrams
h. The class that should be responsible for doing certain things for which it has the necessary information is the solution proposed by
creator pattern
controller pattern
expert pattern
facade pattern
i. The class that should be responsible for creating a new instance of some class is the
solution proposed by
creator pattern
controller pattern
expert pattern
facade pattern
j. The objects identified during domain analysis can be classified into
boundary objects
controller objects
entity objects
all of the above
k. The most critical part of the domain modelling activity is to identify
controller objects
boundary objects
entity objects
none of the above
l. The objects which effectively decouple the boundary and entity objects from one
another making the system tolerant to changes of the user interface and processing
logic are
controller objects
boundary objects
entity objects
teachers.
b. A bill contains a number of items. Each item describes some commodity, the price per unit, and the total price.
c. An order consists of one or more order items. Each order item contains the name of the item, its quantity and the date by which it is required. Each order item is described by an item type specification object having details such as its vendor addresses, its unit price, and the manufacturer.
15. How should you identify the use cases of a system?
16. What is the difference between an operation and a method in the context of the OOD technique?
17. What does the association relationship among classes represent? Give examples of the association relationship.
18. What does the aggregation relationship between classes represent? Give examples of the aggregation relationship between classes.
19. Why are objects always passed by reference in all popular programming languages?
20. What are design patterns? What are the advantages of using design patterns? Write down some popular design patterns and their necessities.
21. Give an outline of object-oriented development process.
22. What is meant by domain modelling? Differentiate the different types of objects that are
identified during domain analysis.
Module 8
Testing of Embedded System

Lesson 38
Testing Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to
Verification vs. Testing:
Verification: verifies the correctness of the design; performed by simulation, hardware emulation, or formal methods.
Testing: verifies the correctness of the manufactured system; a two-part process: (1) test generation, a software process executed once during design, and (2) test application, electrical tests applied to the hardware.
station, routers and firewalls, telecommunication exchanges, robotics and industrial automation, smart cards, personal digital assistants (PDAs) and cellular phones are examples of embedded systems.
Real-Time System
Most, if not all, embedded systems are "real-time". The terms "real-time" and "embedded" are often used interchangeably. A real-time system is one in which the correctness of a computation not only depends on its logical correctness, but also on the time at which the result is produced.
In hard real-time systems, if the timing constraints of the system are not met, a system crash could be the consequence. For example, in mission-critical applications where failure is not an option, time deadlines must be followed.
In contrast, hardware testing is concerned mainly with functional verification and self-test after the chip is manufactured. Hardware developers use tools to simulate the correct behavior of circuit
models. Vendors design chips for self-test which mainly ensures proper operation of circuit
models after their implementation. Test engineers who are not the original hardware developers
test the integrated system.
This conventional, divided approach to software and hardware development does not address the
embedded system as a whole during the system design process. It instead focuses on these two
critical issues of testing separately. New problems arise when developers integrate the
components from these different domains.
In theory, unsatisfactory performance of the system under test should lead to a redesign. In
practice, a redesign is rarely feasible because of the cost and delay involved in another complete
design iteration. A common engineering practice is to compensate for problems within the
integrated system prototype by using software patches. These changes can unintentionally affect
the behavior of other parts in the computing system.
At a higher abstraction level, executable specification languages provide an excellent means to assess embedded-systems designs. Developers can then test system-level prototypes with either formal verification techniques or simulation. A current shortcoming of many approaches is, however, that the transition from testing at the system level to testing at the implementation level is largely ad hoc. To date, system testing at the implementation level has received attention in the research community only as coverification, which simulates both hardware and software components conjointly. Coverification runs simulations of specifications on powerful computer systems. Commercially available coverification tools link hardware simulators and software debuggers in the implementation phase of the design process.
Since embedded systems are frequently employed in mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Some embedded systems, such as those in automotive applications, are exposed to extremely harsh environments. Preparing embedded systems to meet new and more stringent requirements of safety and reliability is a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for on-line testing.
3. Faults in Embedded Systems
Incorrectness in hardware systems may be described in different terms: defect, error and fault. These three terms can be confusing. We will define them as follows [1]:
Defect: A defect in a hardware system is the unintended difference between the implemented
hardware and its intended design. These may be process defects, material defects, age defects or packaging defects.
Error: A wrong output signal produced by a defective system is called an error. An error is an
effect whose cause is some defect. Errors induce failures, that is, a deviation from
appropriate system behavior. If the failure can lead to an accident, it is a hazard.
Fault: A representation of a defect at the abstraction level is called a fault. Faults are physical
or logical defects in the design or implementation of a device.
temperature and vibration. Some design defects and manufacturing faults escape detection and
combine with wearout and environmental disturbances to cause problems in the field.
Hardware faults are classified as stuck-at faults, bridging faults, open faults, power disturbance faults, spurious current faults, memory faults, transistor faults, etc. The most commonly used fault model is the stuck-at fault model [1]. This is modeled by having a line segment stuck at logic 0 or 1 (stuck-at-1 or stuck-at-0).
Stuck-at Fault: These are due to flaws in the hardware, and they represent faults of the signal lines. A signal line is the input or output of a logic gate. Each connecting line can have two types of faults: stuck-at-0 (s-a-0) or stuck-at-1 (s-a-1). In general, several stuck-at faults can be simultaneously present in the circuit. A circuit with n lines can have 3^n - 1 possible stuck-line combinations, as each line can be in one of three states: s-a-0, s-a-1 or fault-free. Even a moderate value of n will give a large number of multiple stuck-at faults. It is a common practice, therefore, to model only single stuck-at faults. An n-line circuit can have at most 2n single stuck-at faults. This number can be further reduced by the fault collapsing technique.
A single stuck-at fault is characterized by the following properties:
1. The fault occurs in only one line.
2. The faulty line is permanently set to either 0 or 1.
3. The fault can be at an input or output of a gate.
4. Every fan-out branch is to be considered as a separate line.
Figure 38.1 gives an example of a single stuck-at fault. A stuck-at-1 fault as marked at the output
of the OR gate implies that the faulty signal remains 1 irrespective of the input state of the OR gate.
[Fig. 38.1: two AND gates feed an OR gate; a stuck-at-1 fault at the OR-gate output forces the faulty response to 1 while the true response is 0 (faulty values shown in parentheses).]
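The effect of the stuck-at-1 fault of fig. 38.1, and the fault counts quoted above, can be checked with a short sketch. This is hypothetical Python: the two-AND/one-OR circuit follows the figure, but the input pattern and the line count n = 7 are assumptions made for the example.

```python
# Hypothetical sketch: simulate the two-AND/one-OR circuit of fig. 38.1,
# with and without a stuck-at-1 fault on the OR-gate output.

def circuit(a, b, c, d, or_output_stuck_at=None):
    y = (a & b) | (c & d)                # fault-free (true) response
    if or_output_stuck_at is not None:
        y = or_output_stuck_at           # the faulty line is permanently set
    return y

print(circuit(1, 0, 0, 0))                        # true response: 0
print(circuit(1, 0, 0, 0, or_output_stuck_at=1))  # faulty response: 1

# Fault counts quoted in the text, for an assumed n = 7 lines
# (four primary inputs, two AND outputs, one OR output):
n = 7
print(3 ** n - 1)   # 2186 possible multiple stuck-line combinations
print(2 * n)        # 14 possible single stuck-at faults
```

A test pattern detects the fault exactly when the true and faulty responses differ, which is why the faulty value is shown in parentheses beside the true one in the figure.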
Bridging faults: These are due to a short between a group of signal lines. The logic value of the shorted net may be modeled as 1-dominant (OR bridge), 0-dominant (AND bridge), or intermediate, depending upon the technology in which the circuit is implemented.
Stuck-Open and Stuck-Short faults: The MOS transistor is considered as an ideal switch, and two types of faults are modeled. In a stuck-open fault, a single transistor is permanently stuck in the open state, and in a stuck-short fault, a single transistor is permanently shorted irrespective of its gate voltage. These are caused by bad connections of signal lines.
Power disturbance faults: These are caused by inconsistent power supplies and affect the whole system.
Spurious current faults: These are caused by exposure to heavy ions and affect the whole system.
Operational faults are usually classified according to their duration:
Permanent faults exist indefinitely if no corrective action is taken. These are mainly manufacturing faults and do not usually occur due to changes in system operation or environmental disturbances.
Intermittent faults appear, disappear, and reappear frequently. They are difficult to predict, but their effects are highly correlated. Most of these faults are due to marginal design or manufacturing steps. These faults occur under atypical environmental disturbances.
Transient faults appear for an instant and disappear quickly. These are not correlated with each other. They occur due to random environmental disturbances. Power disturbance faults and spurious current faults are transient faults.
currently applied to hardware-software designs have their origins in either the hardware [9] or
the software [10] domains.
The core internal test developed by a core provider needs to be adequately described, ported and ready for plug and play, i.e., for interoperability, with the system chip test. For an internal test to accompany its corresponding core and be interoperable, it needs to be described in a commonly accepted, i.e., standard, format. Such a standard format is currently being developed by IEEE P1500 and is referred to as the standardization of a core test description language [22].
In SOCs, cores are often embedded in several layers of user-defined or other core-based logic, and direct physical access to their peripheries is not available from the chip I/Os. Hence, an electronic access mechanism is needed. This access mechanism requires additional logic, such as a wrapper around the core, and wiring, such as a test access mechanism, to connect core peripheries to the test sources and sinks. The wrapper performs switching between the normal mode and the test mode(s), and the wiring connects the wrapper which surrounds the core to the test source and sink. The wrapper can also be utilized for core isolation. Typically, a core needs to be isolated from its surroundings in certain test modes. Core isolation is often required on the input side, the output side, or both.
[Figure: a test pattern source and sink connected through test access mechanisms to an embedded core surrounded by a wrapper.]
Fig. 38.2 Overview of the three elements in an embedded-core test approach: (1) test pattern source, (2) test access mechanism, and (3) core test wrapper [5].
A conceptual architecture for testing embedded-core-based SOCs is shown in Figure 38.2. It consists of three structural elements:
1. Test Pattern Source and Sink
The test pattern source generates the test stimuli for the embedded core, and the test pattern sink compares the response(s) to the expected response(s). The test pattern source as well as the sink can be implemented either off-chip by external Automatic Test Equipment (ATE), on-chip by Built-In Self-Test (or Embedded ATE), or as a combination of both. Source and sink do not need to be of the same type, e.g., the source of an embedded core can be implemented off-chip, while the sink of the same core is implemented on-chip. The choice for a certain type of source or sink is determined by (1) the type of circuitry in the core, (2) the type of pre-defined tests that come with the core, and (3) quality and cost considerations. The type of circuitry of a certain core and the type of predefined tests that come with the core determine which implementation options are left open for the test pattern source and sink. The actual choice for a particular source or sink is in general determined by quality and cost considerations. On-chip sources and sinks provide better accuracy and performance-related defect coverage, but at the same time increase the silicon area and hence might reduce manufacturing yield.
2. Test Access Mechanism
The test access mechanism takes care of on-chip test pattern transport. It can be used (1) to
transport test stimuli from the test pattern source to the core-under-test, and (2) to transport test
responses from the core-under-test to the test pattern sink. The test access mechanism is by
definition, implemented on-chip. Although for one core often the same type of test access
mechanism is used for both stimulus as well as response transportation, this is not required and
various combinations may co-exist. Designing a test access mechanism involves making a trade-
off between the transport capacity (bandwidth) of the mechanism and the test application cost it
induces. The bandwidth is limited by the bandwidth of source and sink and the amount of silicon
area one wants to spend on the test access mechanism itself.
Apart from these mandatory modes, a core test wrapper might have several optional modes, e.g., a detach mode to disconnect the core from its system chip environment and the test access mechanism, or a bypass mode for the test access mechanisms. Depending on the implementation of the test access mechanism, some of the above modes may coincide. For example, if the test access mechanism uses existing functionality, normal operation and core test mode may coincide.
Pre-designed cores have their own internal clock distribution systems. Different cores have different clock propagation delays, which might result in clock skew for inter-core communication. The system-IC designer should take care of this clock skew issue in the functional communication between cores. However, clock skew might also corrupt the data transfer over the test access mechanism, especially if this mechanism is shared by multiple cores. The core test wrapper is the best place to have provisions for clock skew prevention in the test access paths between the cores.
In addition to the test integration and interdependence issues, the system chip composite test requires adequate test scheduling. Effective test scheduling for SOCs is challenging because it must address several conflicting goals: (1) total SOC testing time minimization, (2) power dissipation, (3) precedence constraints among tests, and (4) area overhead constraints [2]. Also, test scheduling is necessary to run intra-core and inter-core tests in a certain order so as not to impact the initialization and final contents of individual cores.
5. On-Line Testing
On-line testing addresses the detection of operational faults, and is found in computers that
support critical or high-availability applications [23]. The goal of on-line testing is to detect fault
effects, that is, errors, and take appropriate corrective action. On-line testing can be performed by
external or internal monitoring, using either hardware or software; internal monitoring is referred
to as self-testing. Monitoring is internal if it takes place on the same substrate as the circuit under
test (CUT); nowadays, this usually means inside a single IC, a system-on-a-chip (SOC).
There are four primary parameters to consider in the design of an on-line testing scheme:
Error coverage (EC): This is defined as the fraction of all modeled errors that are detected,
usually expressed in percent. Critical and highly available systems require very good error
detection or error coverage to minimize the impact of errors that lead to system failure.
Error latency (EL): This is the difference between the first time the error is activated and the
first time it is detected. EL is affected by the time taken to perform a test and by how often tests
are executed. A related parameter is fault latency (FL), defined as the difference between the
onset of the fault and its detection. Clearly, FL ≥ EL, so when EL is difficult to determine, FL is
often used instead.
Space redundancy (SR): This is the extra hardware or firmware needed to perform on-line
testing.
Time redundancy (TR): This is the extra time needed to perform on-line testing.
An ideal on-line testing scheme would have 100% error coverage, error latency of 1
clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the
CUT, and impose no functional or structural restrictions on the CUT. To cover all of the fault
types described earlier, two different modes of on-line testing are employed: concurrent testing,
which takes place during normal system operation, and non-concurrent testing, which takes place
while normal operation is temporarily suspended. These operating modes must often be
overlapped to provide a comprehensive on-line testing strategy at acceptable cost.
5.1 Non-concurrent testing
This form of testing is either event-triggered (sporadic) or time-triggered (periodic), and is
characterized by low space and time redundancy. Event-triggered testing is initiated by key
events or state changes in the life of a system, such as start-up or shutdown, and its goal is to
detect permanent faults. It is usually advisable to detect and repair permanent faults as soon as
possible. Event-triggered tests resemble manufacturing tests.
A common method of providing hardware support for concurrent testing, especially for
detecting control errors, is a watchdog timer. This is a counter that must be reset by the system
on a repetitive basis to indicate that the system is functioning properly. A watchdog timer is
based on the assumption that the system is fault-free, or at least alive, if it is able to perform
the simple task of resetting the timer at appropriate intervals, which implies that control flow is
correctly traversing timer reset points.
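The watchdog principle can be sketched as a small simulation (not real embedded code; the timeout and reset schedule are arbitrary):

```python
# Simulated watchdog timer: the system must call kick() at least every
# TIMEOUT ticks, otherwise the watchdog declares a control-flow failure.
TIMEOUT = 5

class Watchdog:
    def __init__(self):
        self.count = 0
        self.expired = False

    def kick(self):          # reset point reached: control flow is alive
        self.count = 0

    def tick(self):          # called once per clock tick
        self.count += 1
        if self.count > TIMEOUT:
            self.expired = True   # corrective action would be taken here

wd = Watchdog()
for t in range(20):
    wd.tick()
    if t < 10 and t % 3 == 0:    # healthy phase: periodic timer resets
        wd.kick()
print(wd.expired)  # True: resets stop after t=9, so the timer overflows
```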
Version 2 EE IIT, Kharagpur 12
6.2 Test Programming
The test program comprises modules for the generation of the test vectors and the corresponding
expected responses from a circuit with normal behavior. CAD tools are used to automate the
generation of optimized test vectors for the purpose [1,24]. Figure 38.3 illustrates the basic steps
in the development of a test program.
[Figure content: chip specifications, the test plan, test types, and timing specs, together with test vectors from logic/physical design and simulators, feed a test program generator that produces the test program.]
Fig. 38.3 Test program generation
(with or without heuristics), or by pseudo-random methods. On the other hand, for (2), a test is
subsequently applied many times to each integrated circuit and thus must be efficient both in
space (storage requirements for the patterns) and in time. The main considerations in evaluating
a test set are: (i) the time to construct a minimal test set; (ii) the size of the test set; (iii) the time
involved to carry out the test; and (iv) the equipment required (if external). Most algorithmic test
pattern generators are based on the concept of sensitized paths.
The Sensitized Path Method is a heuristic approach to generating tests for general
combinational logic networks. The circuit is assumed to have only a single fault in it. The
sensitized path method consists of two parts:
1. The creation of a SENSITIZED PATH from the fault to the primary output. This involves
assigning logic values to the gate inputs in the path from the fault site to a primary output, such
that the fault effect is propagated to the output.
2. The JUSTIFICATION operation, where the assignments made to gate inputs on the sensitized
path are traced back to the primary inputs. This may require several backtracks and iterations.
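A minimal sketch of the idea, using an invented one-gate-deep circuit and brute-force search in place of real path tracing:

```python
from itertools import product

# Tiny illustration of test generation for z = (a AND b) OR c with line
# "a" stuck-at-0 (circuit invented). Sensitizing the path through the
# AND gate requires b=1 and c=0; the fault is excited by a=1. Brute
# force over all inputs confirms this is the only detecting vector.
def good(a, b, c):
    return (a & b) | c

def faulty(a, b, c):            # same circuit with line a stuck at 0
    return (0 & b) | c

tests = [v for v in product((0, 1), repeat=3) if good(*v) != faulty(*v)]
print(tests)  # [(1, 1, 0)]
```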
In the case of sequential circuits the same logic is applied, but before that the sequential elements
are explicitly driven to a required state using scan-based design-for-test (DFT) circuitry [1,24].
The best-known algorithms are the D-algorithm, PODEM and FAN [1,24]. Three steps can be
identified in most automatic test pattern generation (ATPG) programs: (a) listing the signals on
the inputs of a gate controlling the line on which a fault should be detected; (b) determining the
primary input conditions necessary to obtain these signals (back propagation) and sensitizing the
path to the primary outputs such that the signals and faults can be observed; (c) repeating this
procedure until all detectable faults in a given fault set have been covered.
6.4 ATPG for Hardware-Software Covalidation
Several automatic test generation (ATG) approaches have been developed which vary in the
class of search algorithm used, the fault model assumed, the search space technique used, and the
design abstraction level used. In order to perform test generation for the entire system, both
hardware and software component behaviors must be described in a uniform manner. Although
many behavioral formats are possible, ATG approaches have focused on CDFG and FSM
behavioral models.
Two classes of search algorithms have been explored, fault directed and coverage
directed. Fault directed techniques successively target a specific fault and construct a test
sequence to detect that fault. Each new test sequence is merged with the current test sequence
(typically through concatenation) and the resulting fault coverage is evaluated to determine if test
generation is complete. Fault directed algorithms have the advantage that they are complete in
the sense that a test sequence will be found for a fault if a test sequence exists, assuming that
sufficient CPU time is allowed. For test generation, each CDFG path can be associated with a set
of constraints which must be satisfied to traverse the path. Because the operations found in a
hardware-software description can be either boolean or arithmetic, the solution method chosen
must be able to handle both types of operations. Constraint logic programming (CLP) techniques
[27] are capable of handling a broad range of constraints, including non-linear constraints on both
boolean and arithmetic variables. State machine testing has been accomplished by defining a
transition tour, which is a path that traverses each state machine transition at least once [26].
Transition tours have been generated by iteratively improving an existing partial tour.
Coverage directed algorithms seek to improve coverage without targeting any specific
fault. These algorithms heuristically modify an existing test set to improve total coverage, and
then evaluate the fault coverage produced by the modified test set. If the modified test set
corresponds to an improvement in fault coverage then the modification is accepted. Otherwise
the modification is either rejected or another heuristic is used to determine the acceptability of
the modification. The modification method is typically either random or directed random. An
example of such a technique is presented in [25] which uses a genetic algorithm to successively
improve the population of test sequences.
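The accept-if-no-worse loop described above can be sketched as follows (the coverage function is a stand-in for a real fault simulator; everything here is invented for illustration):

```python
import random

# Coverage-directed improvement sketch: randomly modify one test in the
# set and keep the change only if total coverage does not decrease.
def coverage(tests):
    return len({t % 8 for t in tests})      # pretend: 8 fault classes

def improve(tests, steps=200, seed=1):
    rng = random.Random(seed)
    best = coverage(tests)
    for _ in range(steps):
        candidate = tests[:]
        candidate[rng.randrange(len(tests))] = rng.randrange(256)
        if coverage(candidate) >= best:     # accept improvements (and ties)
            tests, best = candidate, coverage(candidate)
    return tests, best

tests, cov = improve([0] * 8)
print(cov)   # coverage can only go up from the initial value of 1
```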
is a significant part of the overall system. This is considered as white-box testing. Therefore,
software validation testing is also the responsibility of the developer.
system execution. For this aspect, grey-box testing is the preferred testing method. In most cases,
only a knowledge of the interface to the module is required to implement and execute
7.5 System Integration Testing
The module to be tested starts from a set of components within a single node and eventually
encompasses all system nodes up to a set of distributed nodes. The Points of Control and
Observation (PCOs) are a mix of RTOS and network-related communication protocols, such as
RTOS events and network messages. In addition to a component, a Virtual Tester can also play
the role of a node. As for software integration, the focus is on validating the various interfaces.
Grey-box testing is the preferred testing method. System integration testing is typically the
responsibility of the system integration team.
7.6 System Validation Testing
The module to be tested is now a complete implementation subsystem or the complete embedded
system. The objectives of this final aspect are several:
Meet external-actor functional requirements. Note that an external-actor might either be a
device in a telecom network (say if our embedded system is an Internet Router), or a
person (if the system is a consumer device), or both (an Internet Router that can be
administered by an end user).
Perform final non-functional testing such as load and robustness testing. Virtual testers
can be duplicated to simulate load, and be programmed to generate failures in the system.
Ensure interoperability with other connected equipment. Check conformance to
applicable interconnection standards. Going into details for these objectives is not in the
scope of this article. Black-box testing is the preferred method: The tester typically
concentrates on both frequently used and potentially risky or dangerous use-case
instances.
The test data selection technique discussed in [21] first simulates the behavior of the embedded
system as a software program derived from the requirement specification. Then hardware faults,
after being converted to software faults, are injected into the simulated program. Finally,
effective test data are selected to detect faults caused by the interactions between hardware and
software.
9. Conclusion
Rapid advances in test development techniques are needed to reduce the test cost of million-gate
SOC devices. In this chapter a number of state-of-the-art techniques are discussed for testing of
embedded systems. Modular test techniques for digital, mixed-signal, and hierarchical SOCs
must develop further to keep pace with design complexity and integration density. The test data
bandwidth needs of analog cores are significantly different from those of digital cores; therefore,
unified top-level testing of mixed-signal SOCs remains a major challenge. This chapter also
described a granularity-based embedded software testing technique.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic
Publishers, Norwell, MA, 2000.
[2] E. A. Lee, What's Ahead for Embedded Software?, IEEE Computer, pp. 18-26,
September 2000.
[3] E. A. Lee, Computing for embedded systems, in Proceedings of the IEEE Instrumentation
and Measurement Technology Conference, Budapest, Hungary, May 2001.
[4] Semiconductor Industry Association, International Technology Roadmap for
Semiconductors, 2001 Edition, http://public.itrs.net/Files/2001ITRS/Home.html
[5] Y. Zorian, E. J. Marinissen, and S. Dey, Testing Embedded-Core Based System Chips,
IEEE Computer, vol. 32, pp. 52-60, 1999.
[6] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, Fault Injection Techniques and Tools, IEEE
Computer, pp. 75-82, April 1997.
[7] V. Encontre, Testing Embedded Systems: Do You Have The GuTs for It?, www-
128.ibm.com/developerworks/rational/library/content/03July/1000/1050/1050.pdf
[8] D. D. Gajski and F. Vahid, Specification and design of embedded hardware-software
systems, IEEE Design and Test of Computers, vol. 12, pp. 53-67, 1995.
[9] S. Dey, A. Raghunathan, and K. D. Wagner, Design for testability techniques at the
behavioral and register-transfer level, Journal of Electronic Testing: Theory and
Applications (JETTA), vol. 13, pp. 79-91, October 1998.
[10] B. Beizer, Software Testing Techniques, Second Edition, Van Nostrand Reinhold, 1990.
[18]
47, pp. 214, January 1998.
N. Malik, S. Roberts, A. Pita, and R. Dobson, Automaton: an autonomous coverage-
based multiprocessor system verification environment, in IEEE International Workshop
on Rapid System Prototyping, pp. 168-172, June 1997.
[19] K.-T. Cheng and A. S. Krishnakumar, Automatic functional test bench generation using
the extended finite state machine model, in Design Automation Conference, pp. 1-6,
1993.
[20] J. P. Bergmann and M. A. Horowitz, Improving coverage analysis and test generation
for large designs, in International Conference on Computer-Aided Design, pp. 580-583,
1999.
[21] A. Sung and B. Choi, An Interaction Testing Technique between Hardware and
Software in Embedded Systems, in Proceedings of the Ninth Asia-Pacific Software
Engineering Conference, 4-6 Dec. 2002, pp. 457-464.
[22] IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/.
[23] H. Al-Asaad, B. T. Murray, and J. P. Hayes, On-line BIST for embedded systems, IEEE
Design & Test of Computers, vol. 15, no. 4, Oct.-Dec. 1998, pp. 17-24.
[24] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and
Testable Design, IEEE Press, 1990.
[25] F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, and A. Pincetti, Automatic test
bench generation for validation of RT-level descriptions: an industrial experience, in
Design Automation and Test in Europe, pp. 385-389, 2000.
[26] R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, Architecture validation for
processors, in International Symposium on Computer Architecture, pp. 404-413, 1995.
[27] P. Van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, 1989.
Problems
1. How does testing differ from verification?
2. What is an embedded system? Define hard real-time and soft real-time systems
with examples.
3. Why is testing an embedded system difficult?
4. How does hardware testing differ from software testing?
5. What is co-testing?
6. Distinguish between defects, errors and faults with example.
7. Calculate the total number of single and multiple stuck-at faults for a logic circuit
with n lines.
8. For the circuit shown below (Fig. P1), which of the following tests detect the fault x1
s-a-0?
a) (0,1,1,1)
b) (1,0,1,1)
c) (1,1,0,1)
d) (1,0,1,0)
[Circuit with inputs x1, x2, x3, x4 and output z]
Fig. P1
9. Define the following fault models, using examples where possible:
a) Single and multiple stuck-at fault
b) Bridging fault
c) Stuck-open and stuck-short fault
d) Operational fault
10. What is meant by a co-validation fault model?
11. Describe different software fault models.
12. Describe the basic structure of the core-based testing approach for embedded systems.
13. What is concurrent or on-line testing? How does it differ from non-concurrent testing?
14. Define error coverage, error latency, space redundancy and time redundancy in view
of on-line testing.
15. What is a test vector? How are test vectors generated? Describe different techniques
for test pattern generation.
16. Define the following for software testing:
a) Software unit testing
b) Software integration testing
c) Software validation testing
d) System unit testing
e) System integration testing
f) System validation testing
Module
8
Testing of Embedded
System
Lesson
39
Design for Testability
Instructional Objectives
After going through this lesson the student would be able to
1. Introduction
The embedded system is an information processing system that consists of hardware and
software components. Nowadays, the number of embedded computing systems in areas such as
telecommunications, automotive electronics, office automation, and military applications is
steadily growing. This market expansion arises from greater memory densities as well as
improvements in embeddable processor cores, intellectual-property modules, and sensing
technologies. At the same time, these improvements have increased the amount of software
needed to manage the hardware components, leading to a higher level of system complexity.
Designers can no longer develop high-performance systems from scratch but must use
sophisticated system modeling tools.
The increased complexity of embedded systems and the reduced access to internal nodes have
made it not only more difficult to diagnose and locate faulty components, but also made the
functions of embedded components difficult to measure. Creating testable designs is key to
developing complex hardware and/or software systems that function reliably throughout their
operational life. Testability can be defined with respect to a fault. A fault is testable if there
exists a well-specified procedure (e.g., test pattern generation, evaluation, and application) to
expose it, and the procedure is implementable at a reasonable cost using current technologies.
Testability of the fault therefore represents the inverse of the cost of detecting the fault. A circuit
is testable with respect to a fault set when each and every fault in this set is testable.
Design-for-testability techniques improve the controllability and observability of internal nodes,
so that embedded functions can be tested. Two basic properties determine the testability of a
node: 1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0
or 1 by assigning values to primary inputs (PIs), and 2) observability, which is a measure of the
difficulty of propagating a node's value to a primary output (PO) [1-3]. A node is said to be
testable if it is easily controlled and observed. For sequential circuits, some have added
predictability, which represents the ability to obtain known output values in response to given
input stimuli. The factors affecting predictability include initializability, races, hazards,
oscillations, etc. DFT techniques include analog test busses and scan methods. Testability can
also be improved with BIST circuitry, where signal generators and analysis circuitry are
implemented on chip [1, 3-4]. Without testability, design flaws may escape detection until a
product is in the hands of users; equally, operational failures may prove difficult to detect and
diagnose.
Increased embedded system complexity makes thorough assessment of system integrity by
testing external black-box behavior almost impossible. System complexity also complicates test
equipment and procedures. Design for testability should increase a system's testability, resulting
in improved quality while reducing time to market and test costs.
Traditionally, hardware designers and test engineers have focused on proving the correct
manufacture of a design and on locating and repairing field failures. They have developed
several highly structured and effective solutions to this problem, including scan design and self
test. Design verification has been a less formal task, based on the designer's skills. However,
designers have found that structured design-for-test features aiding manufacture and repair can
significantly simplify design verification. These features reduce verification cycles from weeks
to days in some cases.
In contrast, software designers and test engineers have targeted design validation and
verification. Unlike hardware, software does not break during field use. Design errors, rather
than incorrect replication or wear-out, cause operational bugs. Efforts have focused on improving
specifications and programming styles rather than on adding explicit test facilities. For example,
modular design, structured programming, formal specification, and object orientation have all
proven effective in simplifying test.
Although these different approaches are effective when we can cleanly separate a design's
hardware and software parts, problems arise when boundaries blur. For example, in the early
design stages of a complex system, we must define system-level test strategies. Yet, we may not
have decided which parts to implement in hardware and which in software. In other cases,
software running on general-purpose hardware may initially deliver certain functions that we
subsequently move to firmware or hardware to improve performance. Designers must ensure a
testable, finished design regardless of implementation decisions. Supporting hardware-software
codesign requires cotesting techniques, which draw hardware and software test techniques
together into a cohesive whole.
2. Design for Testability Techniques
Design for testability (DFT) refers to those design techniques that make the task of subsequent
testing easier. There is definitely no single methodology that solves all embedded system-testing
problems. There is also no single DFT technique that is effective for all kinds of circuits. DFT
techniques can largely be divided into two categories, i.e., ad hoc techniques and structured
(systematic) techniques.
DFT methods for digital circuits:
Ad-hoc methods
Structured methods:
Scan
Partial Scan
Built-in self-test (discussed in Lesson 34)
Boundary scan (discussed in Lesson 34)
Things to be followed
Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of
the most important steps in designing a testable chip is to first partition the chip in an
appropriate way such that for each functional module there is an effective (DFT)
technique to test it. Partitioning must be done at every level of the design process, from
architecture to circuit, whether testing is considered or not. Partitioning can be functional
(according to functional module boundaries) or physical (based on circuit topology).
Partitioning can be done by using multiplexers and/or scan chains.
Test access points must be inserted to enhance controllability and observability of the
circuit. Test points include control points (CPs) and observation points (OPs). The CPs
are active test points, while the OPs are passive ones. There are also test points which are
both CPs and OPs. Before exercising tests through test points that are not PIs and POs,
one should investigate additional requirements on the test points raised by the use of test
equipment.
Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on
reset mechanism controllable from primary inputs is the most effective and widely used
approach.
Test control must be provided for difficult-to-control signals.
Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing
resolution, speed, memory depth, driving capability, analog/mixed-signal support,
internal/boundary scan support, etc., should be considered during the design process to
avoid delay of the project and unnecessary investment in equipment.
Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester
synchronization, internal oscillator and clock generator circuitry should be isolated
during the test of the functional circuitry. The internal oscillators and clocks should also
be tested separately.
Analog and digital circuits should be kept physically separate. Analog circuit testing is
very much different from digital circuit testing. Testing of analog circuits refers to real
measurement, since analog signals are continuous (as opposed to discrete or logic signals
in digital circuits). They require different test equipment and different test
methodologies. Therefore they should be tested separately.
Things to be avoided
Asynchronous (unclocked) logic feedback in the circuit must be avoided. A feedback in
the combinational logic can give rise to oscillation for certain inputs. Since no clocking is
employed, timing is continuous instead of discrete, which makes tester synchronization
virtually impossible, and therefore only functional test by application board can be used.
accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the
scan insertion operation.
Input/output of each scan shift register must be available on PI/PO.
Combinational ATPG is used to obtain tests for all testable faults in the combinational
logic.
Shift register tests are applied and ATPG tests are converted into scan sequences for use
in manufacturing test.
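The last step, converting ATPG patterns into scan sequences, can be sketched as follows (operation names are invented for illustration):

```python
# Sketch of converting one combinational ATPG pattern into a scan test:
# serially shift the state bits into the chain, apply the primary-input
# bits, pulse the clock to capture, then shift the response back out.
# (Real testers overlap shift-out with the next shift-in; omitted here.)
def scan_sequence(pi_bits, state_bits):
    ops = [("shift_in", b) for b in reversed(state_bits)]  # farthest FF first
    ops.append(("apply_pi", pi_bits))
    ops.append(("capture",))
    ops.extend(("shift_out",) for _ in state_bits)
    return ops

seq = scan_sequence([1, 0], [1, 1, 0])
print(len(seq))  # 8 operations: 3 in + apply + capture + 3 out
```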
[Figure content: combinational logic with primary inputs and outputs, and a chain of scan flip-flops (SFF) threaded from SCANIN to SCANOUT under TC and CLK control]
Fig. 39.1 Scan structure added to a design
Fig. 39.1 shows a scan structure connected to a design. The scan flip-flops (FFs) must be
interconnected in a particular way. This approach effectively turns the sequential testing problem
into a combinational one, and the circuit can be fully tested by compact ATPG patterns.
Unfortunately, there are two types of overheads associated with this technique that designers
care about very much. These are the hardware overhead (including three extra pins, multiplexers
for all FFs, and extra routing area) and performance overhead (including multiplexer delay and
FF delay due to extra load).
2.2.3 Scan Design Rules
Only clocked D-type master-slave flip-flops for all state variables should be used.
At least one PI pin must be available for test. It is better if more pins are available.
All clock inputs to flip-flops must be controlled from primary inputs (PIs). There will be
no gated clock. This is necessary for FFs to function as a scan register.
Clocks must not feed data inputs of flip-flops. A violation of this can lead to a race
condition in the normal mode.
2.3 Scan Variations
There have been many variations of scan, as listed below; a few of these are discussed here.
MUXed Scan
Scan path
Scan-Hold Flip-Flop
Serial scan
Level-Sensitive Scan Design (LSSD)
Scan set
Random access scan
2.3.1 MUX Scan
It was invented at Stanford in 1973 by M. Williams and Angell.
In this approach a MUX is inserted in front of each FF to be placed in the scan chain.
[Figure content: scan input SI feeds a chain of MUX-FF pairs through the combinational logic C/L to scan output SO, controlled by test pin T and clock C]
Fig. 39.2 The Shift-Register Modification approach
Fig. 39.2 shows that when the test mode pin T=0, the circuit is in normal operation mode,
and when T=1, it is in test mode (or shift-register mode).
The scan flip-flops (FFs) must be interconnected in a particular way. This approach
effectively turns the sequential testing problem into a combinational one and can be fully
tested by compact ATPG patterns.
There are two types of overheads associated with this method. The hardware overhead
[Figure content: latch with data input DI, scan input SI, clocks C1 and C2, latches L1 and L2, outputs DO and SO]
It uses two latches (one for normal operation and one for scan) and three clocks.
Furthermore, to enjoy the luxury of race-free and hazard-free system operation and test,
the designer has to follow a set of complicated design rules.
A logic circuit is level sensitive (LS) iff the steady-state response to any allowed input
change is independent of the delays within the circuit. Also, the response is independent
of the order in which the inputs change.
[Figure content: the level-sensitive polarity-hold latch (data D, clock C, output L) with its excitation table, and the shift-register latch built from it with scan input SI, clocks A and B, and latches L1 and L2]
Fig. 39.5 The polarity-hold shift-register latch (SRL)
.c
LSSD requires that the circuit be LS, so we need LS memory elements as defined above. Figure
w
39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) is not dependent
w
on the rise/fall time of C, but only on C being `1' for a period of time greater than or equal to data
w
propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL)
used in LSSD as the scan cell.
The scan cell is controlled in the following way:
Normal mode: A=B=0, C=0 1.
SR (test) mode: C=0, AB=10 01 to shift SI through L1 and L2.
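The two-phase shift just described can be modeled behaviorally (signal names follow the text; the model itself is an invented sketch):

```python
# Behavioral sketch of the LSSD polarity-hold SRL. Clock C loads system
# data D into L1; scan clock A loads SI into L1; shift clock B copies
# L1 into L2, whose output feeds the next SRL's scan input.
class SRL:
    def __init__(self):
        self.L1 = 0
        self.L2 = 0

    def pulse_C(self, D):       # normal mode: A = B = 0, C = 0 -> 1
        self.L1 = D

    def pulse_A(self, SI):      # test mode, AB = 10: sample scan input
        self.L1 = SI

    def pulse_B(self):          # test mode, AB = 01: move L1 into L2
        self.L2 = self.L1

cell = SRL()
cell.pulse_A(1)                 # AB = 10
cell.pulse_B()                  # AB = 01
print(cell.L2)  # 1: the scan bit has been shifted through L1 and L2
```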
Advantages of LSSD
1. Correct operation independent of AC characteristics is guaranteed.
2. FSM is reduced to combinational logic as far as testing is concerned.
3. Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
1. Complex design rules are imposed on designers. There is no freedom to vary from the
overall schemes. It increases the design complexity and hardware costs (4-20% more
hardware and 4 extra pins).
2. Asynchronous designs are not allowed in this approach.
3. Sequential routing of latches can introduce irregular structures.
4. Faults changing the combinational function to a sequential one may cause trouble, e.g., bridging
and CMOS stuck-open faults.
5. Test application becomes a slow process, and normal-speed testing of the entire test
sequence is impossible.
6. It is not good for memory intensive designs.
2.3.4 Random Access Scan
This approach was developed by Fujitsu and was used by Fujitsu, Amdahl, and TI.
It uses an address decoder. By using the address decoder we can select a particular FF and
either set it to any desired value or read out its value. Figure 39.6 shows a random access
structure and Figure 39.7 shows the RAM cell [1,6-7].
[Figure content: Fig. 39.6, a random access scan structure with combinational logic between PI and PO, an nff-bit RAM of scan flip-flops addressed through a log2(nff)-bit address decoder, and SCANIN/SCANOUT under CK and TC control; Fig. 39.7, the scan flip-flop (SFF) cell with D, SD, CK and TC connections]
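The set-or-read behavior of random access scan can be modeled in a few lines (structure and names invented for illustration):

```python
# Toy model of random access scan: an address decoder selects one scan
# flip-flop, which can then be set to any desired value or read out
# directly, without serial shifting through a chain.
class RandomAccessScan:
    def __init__(self, nff):
        self.ff = [0] * nff      # the nff addressable scan flip-flops

    def write(self, addr, bit):
        self.ff[addr] = bit      # decoder selects FF 'addr'; set its value

    def read(self, addr):
        return self.ff[addr]     # observe the selected FF

ras = RandomAccessScan(nff=8)
ras.write(5, 1)
print(ras.read(5), ras.read(0))  # 1 0
```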
The control input HOLD keeps the output steady at the previous state of the flip-flop.
For HOLD = 0, the latch holds its state, and for HOLD = 1, the hold latch becomes
transparent.
For normal mode operation, TC = HOLD = 1, and for scan mode, TC = 1 and HOLD = 0.
Hardware overhead increases by about 30% due to the extra hardware of the hold latch.
This approach reduces power dissipation and isolates the asynchronous part during scan.
It is suitable for delay testing [8].
[Figure content: scan-hold flip-flop built from an SFF plus a hold latch, with D, CK, TC and HOLD inputs and a connection to the SD input of the next SHFF]
Fig. 39.8 Scan-hold flip-flop (SHFF)
Partial Scan Design
In this approach only a subset of flip-flops is scanned. The main objectives of this
approach are to minimize the area overhead and scan sequence length while still
achieving the required fault coverage.
In this approach sequential ATPG is used to generate test patterns. Sequential ATPG has
a number of difficulties, such as poor initializability and poor controllability and
observability of the state variables. The number of gates, number of FFs and sequential
depth give little idea regarding testability, and the presence of cycles makes testing
difficult. Therefore the sequential circuit must be simplified in such a way that test
generation becomes easier.
Removal of selected flip-flops from scan improves performance and allows limited scan
design rule violations.
It also allows automation in scan flip-flop selection and test generation.
Figure 39.9 shows a design using partial scan architecture [1].
Sequential depth is calculated as the maximum number of FFs encountered from a PI line
to a PO line.
[Figure content: combinational circuit with PI and PO, non-scanned FFs clocked by CK1 and CK2, and scan flip-flops (SFF) on a SCANIN/SCANOUT chain controlled by TC]
Fig. 39.9 Design using partial scan structure
Things to be followed for a partial scan method
A minimum set of flip-flops must be selected, removal of which would eliminate all
cycles.
Break only the long cycles to keep overhead low.
All cycles other than self-loops should be removed.
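A greedy sketch of these guidelines, selecting scan flip-flops until only self-loops remain (an illustrative heuristic, not a published algorithm):

```python
# Greedy partial-scan selection: pick FFs whose removal breaks every
# cycle except self-loops in the FF dependency graph. The graph maps
# each FF to the set of FFs it feeds.
def has_cycle(graph, removed):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph[v]:
            if w == v or v in removed or w in removed:
                continue        # skip self-loops and already-scanned FFs
            if color[w] == GRAY or (color[w] == WHITE and dfs(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)

def select_scan_ffs(graph):
    scanned = set()
    while has_cycle(graph, scanned):
        # break ties by picking the FF that feeds the most other FFs
        v = max((v for v in graph if v not in scanned),
                key=lambda v: len(graph[v]))
        scanned.add(v)
    return scanned

g = {"f1": {"f2"}, "f2": {"f3"}, "f3": {"f1"}, "f4": {"f4"}}
scan_ffs = select_scan_ffs(g)
print(scan_ffs)  # one FF from the f1-f2-f3 cycle; the self-loop is kept
```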
3. Conclusions
Accessibility to internal nodes in complex circuitry is becoming a greater problem, and thus
it is essential that a designer consider how the IC will be tested and what extra structures will
be incorporated in the design. Scan design has been the backbone of design for testability in
the industry for a long time. Design automation tools are available for inserting scan into a
circuit and then generating test patterns. Overhead increases due to the scan insertion in a
circuit. In ASIC design, 10 to 15% scan overhead is generally accepted.
References
8. Consider the random-access scan architecture. How would you organize the test data to
minimize the total test time? Describe a simple heuristic for ordering these data.
9. Make a comparison of different scan variations in terms of scan overhead.
10. Consider the combinational circuit below, which has been partitioned into 3 cones (two
CONE Xs and one CONE Y) and one Exclusive-OR gate.
[Figure: circuit with primary inputs A–F; two CONE X blocks and one CONE Y produce internal signals G and H, which are combined through an XOR gate to drive outputs J and K]
For these two cones, we have the following information.
CONE X has a structure which can be tested 100% by using the following 4 vectors; its output is also specified.
A/G B/H C/F OUTPUT
0   0   1   0
0   1   1   0
1   1   0   1
1   0   0   1
CONE Y has a structure which can be tested 100% by using the following 4 vectors; its output is also specified.
C D E OUTPUT
0 0 1 0
0 1 0 1
1 0 1 1
1 1 1 0
Derive a smallest test set to test this circuit so that each partition is applied the required 4
test vectors. Also, the XOR gate should be exhaustively tested.
Fill in the blank entries below. (You may not add additional vectors).
A B C D E F G H J K
0 0 1 1 0
0 1 1 0
1 1 0 1 1
1 0 0 1
Module
8
Testing of Embedded
System
Version 2 EE IIT, Kharagpur 1
Lesson 40
Built-In-Self-Test (BIST) for Embedded Systems
Version 2 EE IIT, Kharagpur
Instructional Objectives
After going through this lesson the student would be able to
Explain the meaning of the term Built-in Self-Test (BIST)
Identify the main components of BIST functionality
Describe the various methods of test pattern generation for designing embedded systems with BIST
Define what a Signature Analysis Register is and describe some methods of designing such units
Explain what a Built-in Logic Block Observer (BILBO) is and describe how to use this block for designing BIST
[Figure: a test pattern generator and the primary inputs (PI) feed the CUT through a MUX; the CUT output drives the primary outputs (PO) and an output response compactor, whose signature is compared against a golden signature to give a Good/Faulty decision]
Fig. 40.1 A Typical BIST Architecture
As shown in Figure 40.1, the wires from primary inputs (PIs) to the MUX and the wires from the circuit output to primary outputs (POs) cannot be tested by BIST. In normal operation, the CUT receives its inputs from other modules and performs the function for which it was designed. During test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT, and the test responses are evaluated by an output response compactor. In the most common type of BIST, test responses are compacted in the output response compactor to form (fault) signatures. The response signatures are compared with reference golden signatures generated or stored on-chip, and the error signal indicates whether the chip is good or faulty.
Four primary parameters must be considered in developing a BIST methodology for embedded
systems; these correspond with the design parameters for on-line testing techniques discussed in
earlier chapter [2].
Fault coverage: This is the fraction of faults of interest that can be exposed by the test patterns produced by the pattern generator and detected by the output response monitor. In the presence of input bit stream errors there is a chance that the computed signature matches the golden signature and the circuit is reported as fault-free. This undesirable property is called masking or aliasing.
Test set size: This is the number of test patterns produced by the test generator, and is
closely linked to fault coverage: generally, large test sets imply high fault coverage.
Hardware overhead: The extra hardware required for BIST is considered to be overhead. In most embedded systems, high hardware overhead is not acceptable.
Performance overhead: This refers to the impact of BIST hardware on normal circuit performance, such as its worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
o g
Issues for BIST
Area overhead: Additional active area due to the test controller, pattern generator, response evaluator and testing of BIST hardware.
Pin overhead: At least 1 additional pin is needed to activate the BIST operation. The input MUX adds extra pin overhead.
Performance overhead: Extra path delays are added due to BIST.
Yield loss increases due to increased chip area.
Design effort and time increase due to BIST design.
The BIST hardware complexity increases when the BIST hardware is made testable.
Benefits of BIST
It reduces testing and maintenance cost, as it requires simpler and less expensive ATE.
BIST significantly reduces cost of automatic test pattern generation (ATPG).
It reduces storage and maintenance of test patterns.
It can test many units in parallel.
It takes shorter test application times.
It can test at functional system speed.
BIST can be used for non-concurrent, on-line testing of the logic and memory parts of a system
[2]. It can readily be configured for event-triggered testing, in which case, the BIST control can
be tied to the system reset so that testing occurs during system start-up or shutdown. BIST can
also be designed for periodic testing with low fault latency. This requires incorporating a testing
process into the CUT that guarantees the detection of all target faults within a fixed time.
On-line BIST is usually implemented with the twin goals of complete fault coverage and low
fault latency. Hence, the test generation (TG) and response monitor (RM) are generally designed
to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable set
size. These goals are met by different techniques in different parts of the system.
TG and RM are often implemented by simple, counter-like circuits, especially linear-feedback
shift registers (LFSRs) [3]. The LFSR is simply a shift register formed from standard flip-flops,
with the outputs of selected flip-flops being fed back (modulo-2) to the shift register's inputs.
When used as a TG, an LFSR is set to cycle rapidly through a large number of its states. These
states, whose choice and order depend on the design parameters of the LFSR, define the test
patterns. In this mode of operation, an LFSR is seen as a source of (pseudo) random tests that
are, in principle, applicable to any fault and circuit types. An LFSR can also serve as an RM by
counting (in a special sense) the responses produced by the tests. An LFSR RM's final contents
after applying a sequence of test responses forms a fault signature, which can be compared to a
known or generated good signature, to see if a fault is present. Ensuring that the fault coverage is
sufficiently high and the number of tests is sufficiently low are the main problems with random
BIST methods. Two general approaches have been proposed to preserve the cost advantages of LFSRs while making the generated test sequence much shorter. Test points can be inserted in the CUT to improve controllability and observability; however, they can also result in performance loss. Alternatively, some determinism can be introduced into the generated test sequence, for example, by inserting specific seed tests that are known to detect hard faults.
A typical BIST architecture using an LFSR is shown in Figure 40.2 [4]. Since the output patterns of the LFSR are time-shifted and repeated, they become correlated; this reduces the effectiveness of the fault detection. Therefore a phase shifter (a network of XOR gates) is often used to decorrelate the output patterns of the LFSR. The response of the CUT is usually compacted by a multiple-input signature register (MISR) to a small signature, which is compared with a known fault-free signature to determine whether the CUT is faulty.
[Figure: an LFSR drives a phase shifter, which feeds scan chains 1 through n (l bits each); the chain outputs are compacted by a MISR]
Fig. 40.2 A generic BIST architecture based on an LFSR, an MISR, and a phase shifter
[Figure: a binary counter with Clock and Reset inputs and outputs Q1, Q2, Q3 cycling through all input combinations]
Fig. 40.3 Exhaustive pattern generator
2.3 Pseudo-exhaustive patterns
In pseudo-exhaustive pattern generation, the circuit is partitioned into several smaller sub-circuits based on the output cones of influence, possibly overlapping blocks with fewer than n inputs. Then all possible test patterns are exhaustively applied to each sub-circuit. The main goal of pseudo-exhaustive test is to obtain the same fault coverage as exhaustive testing and, at the same time, minimize the testing time. Since close to 100% fault coverage is guaranteed, there is no need for fault simulation for exhaustive testing and pseudo-exhaustive testing. However, such a method requires extra design effort to partition the circuits into pseudo-exhaustive testable sub-circuits. Moreover, the delivery of test patterns and test responses is also a major consideration. The added hardware may also increase the overhead and decrease the performance.
[Figure 40.4: two five-bit binary counters drive inputs X1–X8 through 2-to-1 MUXes (select 0 for counter 1, 1 for counter 2); cone 1 produces output h from X1–X5 and cone 2 produces output f from X4–X8]
Circuit partitioning for pseudo-exhaustive pattern generation can be done by cone segmentation as shown in Figure 40.4. Here, a cone is defined as the fan-ins of an output pin. If the size of the largest cone is K, the patterns must guarantee that the patterns applied to any K inputs contain all possible combinations. In Figure 40.4, the total circuit is divided into two cones based on the cones of influence. For cone 1 the PO h is influenced by X1, X2, X3, X4 and X5, while PO f is influenced by inputs X4, X5, X6, X7 and X8. Therefore the total number of test patterns needed for exhaustive testing of cone 1 and cone 2 is (2^5 + 2^5) = 64, but the original circuit with 8 inputs requires 2^8 = 256 test patterns for an exhaustive test.
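The saving claimed above is easy to check numerically. A minimal sketch (cone input sets taken from the Figure 40.4 description):

```python
# Pseudo-exhaustive vs. exhaustive pattern counts for the circuit of Fig. 40.4.
cone1_inputs = {"X1", "X2", "X3", "X4", "X5"}   # cone 1 drives PO h
cone2_inputs = {"X4", "X5", "X6", "X7", "X8"}   # cone 2 drives PO f

# Each cone is tested exhaustively on its own inputs:
pseudo_exhaustive = 2 ** len(cone1_inputs) + 2 ** len(cone2_inputs)

# The whole circuit tested exhaustively on all 8 inputs:
exhaustive = 2 ** len(cone1_inputs | cone2_inputs)

print(pseudo_exhaustive, exhaustive)   # 64 256
```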
In general, this requires more patterns than deterministic ATPG, but fewer than the exhaustive test. In contrast with other methods, pseudo-random pattern BIST may require a long test time and necessitate evaluation of fault coverage by fault simulation. This pattern type, however, has the potential for lower hardware and performance overheads and less design effort than the preceding methods. In pseudorandom test patterns, each bit has an approximately equal probability of being a 0 or a 1. The number of patterns applied is typically of the order of 10^3 to 10^7 and is related to the circuit's testability and the fault coverage required.
Linear feedback shift register reseeding [5] is an example of a BIST technique that is based on controlling the LFSR state. LFSR reseeding may be static, that is, the LFSR stops generating patterns while loading seeds, or dynamic, that is, test generation and seed loading can proceed simultaneously. The length of the seed can be either equal to the size of the LFSR (full reseeding) or less than the LFSR (partial reseeding). In [5], a dynamic reseeding technique that allows partial reseeding is proposed to encode test vectors. A set of linear equations is solved to obtain the seeds, and test vectors are ordered to facilitate the solution of this set of linear equations.
[Figure: a chain of D flip-flops X_{n-1}, X_{n-2}, ..., X_1, X_0 with feedback coefficients h_{n-1}, h_{n-2}, ..., h_2, h_1 selecting which stage outputs are XORed back into the input]
Fig. 40.5 Standard Linear Feedback Shift Register
Figure 40.5 shows a standard, external exclusive-OR linear feedback shift register. There are n flip-flops (X_{n-1}, ..., X_0), and this is called an n-stage LFSR. It can be a near-exhaustive test pattern generator, as it cycles through 2^n - 1 states, excluding the all-0 state. This is known as a maximal-length LFSR. Figure 40.6 shows the implementation of an n-stage LFSR with an actual digital circuit [1].
[Figure: the n-stage LFSR of Fig. 40.5 realized with D flip-flops, XOR gates and feedback coefficients h_{n-1}, ..., h_1, driven by a common Clock]
Fig. 40.6 n-stage LFSR implementation with actual digital circuit
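The maximal-length property can be checked with a short simulation. A sketch of a 4-stage external-XOR LFSR; the tap positions are an illustrative choice corresponding to a primitive polynomial:

```python
def lfsr_patterns(taps, seed, n_bits, count):
    """Generate `count` states of an n_bits-stage external-XOR LFSR.
    `taps` lists the bit positions whose modulo-2 sum is shifted in."""
    state, patterns = seed, []
    for _ in range(count):
        patterns.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1            # modulo-2 feedback sum
        state = ((state << 1) | fb) & ((1 << n_bits) - 1)
    return patterns

# A 4-stage LFSR with a primitive polynomial cycles through all
# 2^4 - 1 = 15 non-zero states before repeating.
pats = lfsr_patterns(taps=[3, 0], seed=0b0001, n_bits=4, count=15)
print(len(set(pats)))   # 15 distinct non-zero states
```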
exploited to find the seeds needed to cover the given set of deterministic patterns. Width compression is combined with reseeding to reduce the hardware overhead. In a two-dimensional test data compression technique, an LFSR and a folding counter are combined for scan-based BIST. LFSR reseeding is used to reduce the number of bits to be stored for each pattern (horizontal compression) and folding counter reseeding is used to reduce the number of patterns (vertical compression).
2.6 Weighted Pseudo-random Pattern Generation
Bit-flipping [9], bit-fixing, and weighted random BIST [1,8] are examples of techniques that rely on altering the patterns generated by the LFSR to embed deterministic test cubes. A hybrid between pseudorandom and stored-pattern BIST, weighted pseudorandom pattern BIST is effective for dealing with hard-to-detect faults. In a pseudorandom test, each input bit has a probability of 1/2 of being either a 0 or a 1. In a weighted pseudorandom test, the probabilities, or input weights, can differ. The essence of weighted pseudorandom testing is to bias the probabilities of the input bits so that the tests needed for hard-to-detect faults are more likely to occur. One approach uses software that determines a single or multiple weight sets based on a probabilistic analysis of the hard-to-detect faults. Another approach uses a heuristic-based initial weight set followed by additional weight sets produced with the help of an ATPG system. The weights are either realized by logic or stored in on-chip ROM. With these techniques, researchers obtained fault coverage over 98% for 10 designs, which is the same as the coverage of deterministic test vectors.
In a hybrid BIST method based on weighted pseudorandom testing, a weight of 0, 1, or 1/2 (unbiased) is assigned to each scan chain in the CUT. The weight sets are compressed and stored on the tester. During test application, an on-chip lookup table is used to decompress the data from the tester and generate weight sets. In order to reduce the hardware overhead, scan cells are carefully reordered and a special ATPG approach is used to generate suitable test cubes.
[Figure: an 8-stage shift register (X7–X0) with gating and inversion logic on selected outputs to realize weighted pattern bits]
Fig. 40.7 Weighted pseudo-random pattern generator
[Figure: (a) an LFSR whose outputs are combined through AND/OR gates to produce weights such as 1/8, 3/4, 1/2 and 7/8; (b) a cellular automaton producing weights such as 0.8, 0.6, 0.5, 0.4 and 0.3]
Fig. 40.8 Weighted pseudorandom patterns
Figure 40.7 shows a weighted pseudo-random pattern generator implemented with programmable probabilities of generating zeros and ones at the PIs. As we know, an LFSR generates patterns with equal probability of 1s and 0s. As shown in Figure 40.8(a), if a 3-input AND gate is used, the probability of 1s becomes 0.125; if a 2-input OR gate is used, the probability becomes 0.75. Second, one can use cellular automata to produce patterns of desired weights, as shown in Figure 40.8(b).
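These weight values follow directly from the gate truth tables: ANDing k equiprobable bits gives weight 2^(-k) and ORing them gives 1 - 2^(-k). A quick enumeration confirms the two cases quoted above:

```python
from itertools import product

def weight(gate, k):
    """Probability that a k-input gate outputs 1 when each input bit
    is 0 or 1 with equal probability (enumerate all 2^k combinations)."""
    ones = sum(gate(bits) for bits in product((0, 1), repeat=k))
    return ones / 2 ** k

print(weight(lambda b: int(all(b)), 3))   # 3-input AND -> 0.125
print(weight(lambda b: int(any(b)), 2))   # 2-input OR  -> 0.75
```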
2.7 Cellular Automata for Pattern Generation
Cellular automata are excellent for pattern generation because they have a better randomness distribution than LFSRs; there is no shift-induced bit value correlation. A cellular automaton is a collection of cells with regular connections. Each pattern generator cell has a few logic gates and a flip-flop, and is connected only to its local neighbors. If C_i is the state of the current CA cell, C_{i+1} and C_{i-1} are the states of its neighboring cells. The next state of cell C_i is determined by (C_{i-1}, C_i, C_{i+1}). The cell is replicated to produce the cellular automaton. The two commonly used CA structures are shown in Figure 40.9.
[Figure 40.9: two common one-dimensional CA structures built from D flip-flops with XOR next-state logic; the boundary cells are tied to 0]
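The neighborhood update rule described above can be modeled in a few lines. A sketch of a one-dimensional CA with null (zero) boundary cells using the common rule-90 update, next C_i = C_{i-1} XOR C_{i+1}; the rule choice is an illustrative assumption (rule-90/150 hybrids are typical in practice):

```python
def ca_step(cells):
    """One rule-90 update with zero boundaries: each cell's next state
    is the XOR of its left and right neighbors."""
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ padded[i + 1] for i in range(1, len(cells) + 1)]

state = [0, 0, 0, 1, 0, 0, 0]        # seed with a single 1
for _ in range(3):
    state = ca_step(state)
print(state)   # [1, 0, 1, 0, 1, 0, 1]
```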
patterns, then the CUT response to RM will be 1 billion bits. This is not manageable in practice.
So it is necessary to compact this enormous amount of circuit responses to a manageable size
that can be stored on the chip. The response analyzer compresses a very long test response into a
single word. Such a word is called a signature. The signature is then compared with the prestored
golden signature obtained from the fault-free responses using the same compression mechanism.
If the signature matches the golden copy, the CUT is regarded fault-free. Otherwise, it is faulty.
There are different response analysis methods such as ones count, transition count, syndrome
count, and signature analysis.
Compression: A reversible process used to reduce the size of the response. It is difficult in hardware.
Compaction: An irreversible (lossy) process used to reduce the size of the response.
d) Cyclic Redundancy Check (CRC): It is also called a signature. It computes a CRC check word on the bit stream.
Signature analysis: Compacts the good-machine response into a good-machine signature. The actual signature is generated during testing and compared with the good-machine signature.
Aliasing: Compression is like a function that maps a large input space (the response) into a small output space (the signature). It is a many-to-one mapping. Errors may occur in the input bit stream; therefore, a faulty response may have a signature that matches the golden signature, and the circuit is reported as fault-free. Such a situation is referred to as aliasing or masking. The aliasing probability is the probability that a faulty response is treated as fault-free. It is defined as follows:
Let us assume that the possible input patterns are uniformly distributed over the possible mapped signature values. There are 2^m input patterns, 2^r signatures, and 2^(m-r) input patterns map into a given signature. Then the aliasing or masking probability is

P(M) = (number of erroneous inputs that map into the golden signature) / (number of faulty input responses)
     = (2^(m-r) - 1) / (2^m - 1)
     ≈ 2^(m-r) / 2^m   (for large m)
     = 1 / 2^r
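Plugging small numbers into the expression shows how quickly the exact value approaches 1/2^r. A sketch (the m, r values are illustrative):

```python
def aliasing_probability(m, r):
    """Exact aliasing probability when 2^m responses map uniformly
    onto 2^r signatures: (2^(m-r) - 1) / (2^m - 1)."""
    return (2 ** (m - r) - 1) / (2 ** m - 1)

for m, r in [(16, 8), (32, 16), (64, 16)]:
    print(m, r, aliasing_probability(m, r), 2 ** -r)
```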
The aliasing probability is one of the major considerations in response analysis. Due to the many-to-one mapping property of the compression, it is difficult to perform diagnosis after compression; the diagnosis resolution is very poor after compression. In addition to the aliasing probability, hardware overhead and hardware compatibility are also important issues. Here, hardware compatibility refers to how well the BIST hardware can be incorporated in the CUT or DFT.
[Figure: test patterns drive the CUT; the CUT output enables a counter, clocked with the patterns, that accumulates the number of 1s]
Fig. 40.10 Ones count compression circuit structure
For an N-bit test length with r ones, the masking probability is obtained as follows:

Number of masking sequences = C(N, r) - 1

There are 2^N possible output sequences, only one of which is fault-free.

Masking probability: P(M) = (C(N, r) - 1) / (2^N - 1)

where C(N, r) is the binomial coefficient "N choose r".
[Figure: transition count compression — the CUT output and a one-cycle delayed copy (D flip-flop) are compared, and the counter accumulates the number of transitions]
Number of sequences with r transition counts: 2·C(N-1, r). Again, only one of them is fault-free.

Masking probability: P(M) = (2·C(N-1, r) - 1) / (2^N - 1)
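Both counting compactors, and the two masking-sequence formulas above, can be checked by brute-force enumeration over all 2^N responses. A sketch with an illustrative 8-bit fault-free response:

```python
from itertools import product
from math import comb

def ones_count(resp):
    return sum(resp)

def transition_count(resp):
    return sum(a != b for a, b in zip(resp, resp[1:]))

N = 8
good = (1, 0, 1, 1, 0, 0, 1, 0)      # assumed fault-free response

# Faulty responses that alias under each compaction scheme:
alias_ones = sum(1 for r in product((0, 1), repeat=N)
                 if ones_count(r) == ones_count(good) and r != good)
alias_trans = sum(1 for r in product((0, 1), repeat=N)
                  if transition_count(r) == transition_count(good) and r != good)

print(alias_ones == comb(N, ones_count(good)) - 1)                 # True
print(alias_trans == 2 * comb(N - 1, transition_count(good)) - 1)  # True
```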
3.3 Syndrome Testing
Syndrome is defined as the probability of ones of the CUT output response. The syndrome is 1/8 for a 3-input AND gate and 7/8 for a 3-input OR gate if the inputs have equal probability of ones and zeros. Figure 40.12 shows a BIST circuit structure for the syndrome count. It is very similar to ones count and transition count; the difference is that the final count is divided by the number of patterns applied. The most distinguishing feature of syndrome testing is that the syndrome is independent of the implementation: it is solely determined by the function of the circuit.
[Figure: a random test pattern source drives the CUT; a counter accumulates the ones count, which is divided by the pattern count to give the syndrome]
Fig. 40.12 Syndrome testing circuit structure
The original design of syndrome test applies exhaustive patterns. Hence, the syndrome is S = K / 2^n, where n is the number of inputs and K is the number of minterms. A circuit is syndrome testable if all single stuck-at faults are syndrome detectable. The interesting part of syndrome testing is that any function can be designed to be syndrome testable.
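Because the syndrome depends only on the function, it can be computed by truth-table enumeration. A sketch, covering the AND/OR values quoted earlier and the function f = AB + BC used in problem 14:

```python
from itertools import product

def syndrome(f, n):
    """Syndrome S = K / 2^n: the fraction of input combinations for
    which the n-input Boolean function f is 1 (K = number of minterms)."""
    K = sum(f(*bits) for bits in product((0, 1), repeat=n))
    return K / 2 ** n

print(syndrome(lambda a, b, c: a & b & c, 3))          # 0.125 (= 1/8)
print(syndrome(lambda a, b, c: a | b | c, 3))          # 0.875 (= 7/8)
print(syndrome(lambda a, b, c: (a & b) | (b & c), 3))  # 0.375 (= 3/8)
```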
Let the LFSR output sequence be

{a_m} = {a_0, a_1, a_2, ...}

where a_i = 1 or 0 depending on the output state at time t_i. The initial states are a_{-n}, a_{-n+1}, ..., a_{-2}, a_{-1}. The recurrence relation defining {a_m} is

a_m = Σ_{i=1..n} c_i a_{m-i}   (modulo 2)

where c_i = 0 means the corresponding output is not fed back, and c_i = 1 otherwise. Define the generating function

G(x) = Σ_{m=0..∞} a_m x^m = Σ_{m=0..∞} Σ_{i=1..n} c_i a_{m-i} x^m
     = Σ_{i=1..n} c_i x^i Σ_{m=0..∞} a_{m-i} x^{m-i}
     = Σ_{i=1..n} c_i x^i [ (a_{-i} x^{-i} + ... + a_{-1} x^{-1}) + Σ_{m=0..∞} a_m x^m ]

Solving for G(x),

G(x) = [ Σ_{i=1..n} c_i x^i (a_{-i} x^{-i} + ... + a_{-1} x^{-1}) ] / [ 1 - Σ_{i=1..n} c_i x^i ]

G(x) has been expressed in terms of the initial state and the feedback coefficients. The denominator of G(x), f(x) = 1 - Σ_{i=1..n} c_i x^i, is called the characteristic polynomial of the LFSR.
3.5 LFSR for Response Compaction: Signature Analysis
It uses a cyclic redundancy check code (CRCC) generator (LFSR) as the response compacter.
In this method, data bits from the circuit POs to be compacted are represented as a decreasing-order coefficient polynomial.
The CRCC divides the PO polynomial by its characteristic polynomial, which leaves the remainder of the division in the LFSR. The LFSR must be initialized to a known seed value (usually 0) before testing.
After testing, the signature in the LFSR is compared to the good-machine signature.
For an output sequence of length N, there is a total of 2^N - 1 faulty sequences. Let the input sequence be represented as P(x), with P(x) = Q(x)G(x) + R(x), where G(x) is the characteristic polynomial, Q(x) is the quotient, and R(x) is the remainder, or signature. For an aliasing faulty sequence, the remainder R(x) is the same as the fault-free one. Since P(x) is of order N and G(x) is of order r, Q(x) has an order of N - r. Hence, there are 2^(N-r) possible Q(x) or P(x), one of which is fault-free. Therefore, the aliasing probability is:

P(M) = (2^(N-r) - 1) / (2^N - 1) ≈ 2^(-r) for large N.

The masking probability is independent of the input sequence.
[Figure: the serial stream 01010001 is shifted into a five-stage modular LFSR (X0–X4) whose feedback taps correspond to the polynomial terms 1, x, x^2, x^3, x^4]
Fig. 40.14 Modular LFSR as a response compactor
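The division P(x) = Q(x)G(x) + R(x) that the LFSR performs can be mimicked with polynomial arithmetic over GF(2). A sketch representing polynomials as coefficient lists, highest degree first; the 8-bit stream and the polynomial G(x) = x^4 + x + 1 are illustrative choices:

```python
def signature(bits, poly):
    """Remainder R(x) of the response polynomial P(x) divided by the
    characteristic polynomial G(x), all coefficients modulo 2.
    `poly` lists G(x)'s coefficients, e.g. x^4 + x + 1 -> [1, 0, 0, 1, 1]."""
    r = len(poly) - 1                 # degree of G(x) = signature width
    work = list(bits)
    for i in range(len(work) - r):
        if work[i]:                   # cancel the leading term:
            for j, c in enumerate(poly):
                work[i + j] ^= c      # subtract (XOR) a shifted copy of G(x)
    return work[-r:]                  # r-bit signature

# P(x) = x^6 + x^4 + 1 divided by G(x) = x^4 + x + 1
print(signature([0, 1, 0, 1, 0, 0, 0, 1], [1, 0, 0, 1, 1]))  # [1, 1, 1, 0]
```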
Any divisor polynomial G(x) with two or more non-zero coefficients will detect all
single-bit errors.
S_i(x) = S_{i,m-1} x^{m-1} + S_{i,m-2} x^{m-2} + ... + S_{i,1} x + S_{i,0}
S_{i+1}(x) = R_i(x) + x·S_i(x) mod G(x)
where G(x) is the characteristic polynomial. Assume the initial state of the MISR is 0. So,
S_0(x) = 0
S_1(x) = R_0(x) + x·S_0(x) mod G(x) = R_0(x)
S_2(x) = R_1(x) + x·S_1(x) mod G(x) = R_1(x) + x·R_0(x) mod G(x)
...
S_n(x) = x^{n-1} R_0(x) + x^{n-2} R_1(x) + ... + x·R_{n-2}(x) + R_{n-1}(x) mod G(x)
This is the signature left in the MISR after n patterns are applied. Let us consider an n-bit response compactor with an m-bit error polynomial. Then the error polynomial is of degree (m+n-2), which gives (2^(m+n-1) - 1) non-zero values. G(x) has 2^(n-1) - 1 nonzero multiples that result in polynomials of degree <= m+n-2.

Probability of masking: P(M) = (2^(n-1) - 1) / (2^(m+n-1) - 1) ≈ 1 / 2^m
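The recurrence S_{i+1}(x) = R_i(x) + x·S_i(x) mod G(x) maps directly onto integer bit operations. A sketch of an m-bit MISR; the response words and G(x) = x^4 + x + 1 are illustrative choices:

```python
def misr_signature(responses, poly, m):
    """Fold parallel m-bit response words into an m-bit MISR signature
    using S_{i+1} = R_i XOR (x * S_i mod G).  `poly` encodes G(x)
    including its x^m term, e.g. x^4 + x + 1 -> 0b10011."""
    s = 0
    for r in responses:
        s <<= 1                # multiply S_i(x) by x
        if s >> m:             # degree reached m: reduce modulo G(x)
            s ^= poly
        s ^= r                 # add the next response word R_i(x)
    return s

sig = misr_signature([0b1011, 0b0110, 0b1110, 0b0001], poly=0b10011, m=4)
print(bin(sig))   # 0b10
```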
[Figure: a BILBO register placed between two CUTs; control inputs B1 and B2 select among normal, scan (S1 to SO), pattern generation (LFSR) and response compaction (MISR) modes for the flip-flop chain D1–Dn]
Fig. 40.17 BILBO Example
[Figure: (a) example test configuration — an LFSR, CUT A, BILBO 1, CUT B, BILBO 2, CUT C and a MISR connected in a chain]
Fig. 40.18 Circuit configured with BILBO
Phase 1
In this mode of operation BILBO1 operates in MISR mode and BILBO2 operates in LFSR mode. CUT A and CUT C are tested in parallel.
Phase 2
In this mode of operation BILBO1 operates in LFSR mode and BILBO2 operates in MISR mode. Only CUT B is tested in this mode of operation.
responses, the test speed is much slower than the test-per-clock approach. The number of clocks required for a test cycle is the maximum of the scan stages of the input and output scan registers. Also falling in this category are CEBS, LOCST, and STUMPS.
[Figure: two test-per-scan configurations — an LFSR scan register (SRI) feeds the CUT through SI, and the CUT output SO is captured by a MISR scan register (SRO)]
[Figure: STUMPS architecture — a pseudo-random test pattern generator and input phase-shifting network drive scan chains SR1 ... SRn through the CUTs, and the chain outputs are compacted by a MISR]
[Figure: PLA inputs pass through input decoders and the AND/OR array to output buffers and PLA outputs]
Fig. 40.21 A general structure of a PLA
As mentioned earlier in the fault model section, PLAs have the following faults: stuck-at faults, bridging faults, and crosspoint faults. Test generation for PLAs is more difficult than that for conventional logic, because PLAs have more complicated fault models. Further, a typical PLA may have as many as 50 inputs, 67 outputs, and 190 product terms [10-11]. Functional testing of such PLAs can be a difficult task. PLAs often contain unintentional and unidentifiable redundancy which might cause fault masking. Furthermore, PLAs are often embedded in logic, which complicates test application and response observation. Therefore, many people have proposed the use of BIST to handle the testing of PLAs.
5. BIST Applications
Manufacturers are increasingly employing BIST in real products. Examples of such applications are given to illustrate the use of BIST in the semiconductor, communications, and computer industries.
meets the burn-in requirements with little additional logic.
5.4 ALU-Based Programmable MISR of MC68HC11 [15]
Broseghini and Lenhert implemented an ALU-based self-test system on an MC68HC11 Family microcontroller. A fully programmable pseudorandom pattern generator and MISR are used to reduce test length and aliasing probabilities. They added microcode to configure the ALU into an LFSR or MISR; the adder is transformed into an LFSR by forcing the carry input to 0. With this feature the hardware overhead is minimized: it is only 25% compared to an implementation with dedicated hardware.
References
[5] C. V. Krishna, A. Jas, and N. A. Touba, Test vector encoding using partial LFSR reseeding, in Proceedings of the International Test Conference, 2001, pp. 885-893.
[6] J. Rajski, J. Tyszer, and N. Zacharia, Test data decompression for multiple scan designs with boundary scan, IEEE Transactions on Computers, 47, pp. 1188-1200, 1998.
[7] N. A. Touba and E. J. McCluskey, Altering a pseudo-random bit sequence for scan-based BIST, in Proceedings of the International Test Conference, 1996, pp. 167-175.
[8] S. Wang, Low hardware overhead scan based 3-weight weighted random BIST, in Proceedings of the International Test Conference, 2001, pp. 868-877.
[9] H. J. Wunderlich and G. Kiefer, Bit-flipping BIST, in Proceedings of the International Conference on Computer-Aided Design, 1996, pp. 337-343.
[10] C. Y. Liu, K. K. Saluja, and J. S. Upadhyaya, BIST-PLA: A Built-in Self-Test Design of Large Programmable Logic Arrays, Proc. 24th Design Automation Conf., June 1987, pp. 385-391.
[11] C. Y. Liu and K. K. Saluja, Built-In Self-Test Techniques for Programmable Logic Arrays, in VLSI Fault Modeling and Testing Techniques, G. W. Zobrist, ed., Ablex Publishing, Norwood, N.J., 1993.
[12] P. Gelsinger, Design and Test of the 80386, IEEE Design & Test of Computers, Vol. 4, No. 3, June 1987, pp. 42-50.
[13] I. M. Ratiu and H. B. Bakoglu, Pseudorandom Built-In Self-Test Methodology and Implementation for the IBM RISC System/6000 Processor, IBM J. Research and Development, Vol. 34, 1990, pp. 78-84.
[14] A. L. Crouch, M. Pressly, and J. Circello, Testability Features of the MC68060 Microprocessor, Proc. Intl Test Conf., 1994, pp. 60-69.
[15] J. Broseghini and D. H. Lenhert, An ALU-Based Programmable MISR/Pseudorandom Generator for a MC68HC11 Family Self-Test, Proc. Intl Test Conf., 1993, pp. 349-358.
Problems
1. What is Built-In-Self-Test? Discuss the issues and benefits of BIST. Describe BIST
architecture and its operation.
it y
2. Excluding the circuit under test, what are the four basic components of BIST and what
.c
function does each component perform?
w
3. Which two BIST components are necessary for system-level testing and why?
w
4. What are the different techniques for test pattern generation?
w
5. Discuss exhaustive and pseudo-exhaustive pattern generation. Give an example to show that pseudo-exhaustive testing requires fewer test patterns than exhaustive testing.
6. What is pseudorandom pattern generation? What is an LFSR? Describe pattern
generation using LFSR.
7. Make a comparison of different test strategies based on fault coverage, hardware
overhead, test time overhead and design effort.
8. An LFSR based signature register compresses an n-bit input pattern into an m-bit
signature. Derive an expression for the probability of aliasing. Clearly state any
assumptions you make.
9. Design a weighted pseudo-random pattern generator with programmable weights 1/2, 1/4,
11/32 and 1/16.
10. Prove that the number of 1s in an m-sequence differs from the number of 0s by one.
11. Consider a LFSR based pattern generator where the feedback network is a single XOR
gate before the first stage. If the number of (feedback) inputs to the XOR is odd, is it
possible for the LFSR to generate maximal length sequence? Justify or contradict.
12. Show the schematic diagram of a 4-bit BILBO register.
13. A given data path has p number of n-bit registers. For having BIST capability, suppose
a% of the registers are converted to BILBO. Estimate the percentage overhead in the
registers in terms of extra hardware. All gates may be assumed to have unit cost in your
calculation.
14. It is said that by adding some extra hardware, a combinational circuit can be made
syndrome testable for single stuck-at faults. Illustrate the process for a circuit realizing
the Boolean function f = AB + BC.
15. Define the following:
a) Compression
b) Compaction
c) Signature analysis
d) Aliasing or masking
o m
16. Describe different response compaction techniques.
o t.c
17. What are different types of LFSR? What is modular LFSR? What is characteristic
polynomial?
s p
18. Implement a standard LFSR for the characteristic polynomial f(x) = x^8 + x^7 + x^2 + 1.
19. Given the polynomial P(x) = x^4 + x^2 + x + 1:
a) Design an external-feedback LFSR with characteristic polynomial P(x). Draw a logic diagram for the complete register.
b) Determine the resultant signature that would be obtained for the following serial sequence of output responses produced by a known good CUT, assuming the SAR is initialized to the all-0s state. Give the binary value of the resultant signature as it would be contained in the SAR in your logic diagram above.
101001010010 (in time order)
22. What is MISR? Give architecture of an m-stage MISR and derive its signature. What is
the masking probability of MISR?
23. Describe with example and diagram what are test-per-clock system and test-per-scan
system. What is the difference between them?
24. What is BILBO? Describe the BILBO architecture and its operation.
25. Describe how BILBO is implemented in digital circuits.
26. Describe STUMPS testing system and its test procedure.
27. Give some examples of practical BIST application in industry.
Lesson 41
Boundary Scan Methods and Standards
Instructional Objectives
After going through this lesson the student would be able to
circuits to analog or mixed-mode circuits. It is now widely accepted in industry and has been considered an industry standard in most large IC system designs. Boundary scan, as defined by the IEEE Std. 1149.1 standard [1-3], is an integrated method for testing interconnects on printed circuit boards that is implemented at the IC level. Earlier, most Printed Circuit Board (PCB) testing was done using bed-of-nails in-circuit test equipment. Recent advances in VLSI technology now enable microprocessors and Application Specific Integrated Circuits (ASICs) to be packaged into fine-pitch, high-count packages. The miniaturization of device packaging, the development of surface-mounted packaging, and double-sided and multi-layer boards to accommodate the extra interconnects between the increased density of devices on the board reduce the physical accessibility of test points for traditional bed-of-nails in-circuit testers and pose a great challenge to testing manufacturing defects in the future. The long-term solution to this reduction in physical probe access was to consider building the access inside the device, i.e., a boundary scan register. In 1985, a group of European companies formed the Joint European Test Action Group (JETAG), and by 1988 the Joint Test Action Group (JTAG) had been formed by several companies to tackle these challenges. The JTAG developed a specification for boundary-scan testing that was standardized in 1990 by the IEEE as IEEE Std. 1149.1-1990. In 1993 a new revision of the IEEE Std. 1149.1 standard was introduced (1149.1a); it contained many clarifications, corrections, and enhancements. In 1994, a supplement containing a description of the Boundary-Scan Description Language (BSDL) was added to the standard. Since that time, this standard has been adopted by major electronics companies all over the world. Applications are found in high-volume, high-end consumer products, telecommunication products, defense systems, computers, peripherals, and avionics. Now, due to its economic advantages, smaller companies that cannot afford expensive in-circuit testers are using boundary scan. Figure 41.1 gives an overview of the boundary scan family, now known as the IEEE 1149.x standards.
required to test advanced digital networks that are not fully covered by IEEE Std. 1149.1, such as networks that are AC-coupled, differential, or both.

The IEEE 1532 Standard is developed for In-System Configuration of Programmable Devices [5]. This extension of 1149.1 standardizes the programming access and methodology for programmable integrated circuit devices. Devices such as CPLDs and FPGAs, regardless of vendor, that implement this standard may be configured (written), read back, erased and verified, singly or concurrently, with a standardized set of resources based upon the algorithm description contained in the 1532 BSDL file. JTAG Technologies programming tools contain support for 1532-compliant devices and automatically generate the applications.

Clearly, the testing of mixed-mode circuits at the various levels of integration will be a critical test issue for system-on-chip design. Therefore there is a demand to combine all the boundary scan standards into an integrated one.
circuits on a board without using physical test probes. It adds a boundary-scan cell, which includes a multiplexer and latches, to each pin on the device. Figure 41.2 [1] illustrates the main elements of a universal boundary-scan device.
Figure 41.2 shows the following elements:

A Test Access Port (TAP) with a set of four dedicated test pins: Test Data In (TDI), Test Mode Select (TMS), Test Clock (TCK), Test Data Out (TDO), and one optional test pin, Test Reset (TRST*).
A boundary-scan cell on each device primary input and primary output pin, connected internally to form a serial boundary-scan register (Boundary Scan).
A TAP controller with inputs TCK, TMS, and TRST*.
An n-bit (n >= 2) instruction register holding the current instruction.
A 1-bit Bypass register (Bypass).
An optional 32-bit Identification register capable of being loaded with a permanent device identification code.
Test Clock (TCK): test instructions and data are loaded from system input pins on the rising edge of TCK and driven through system output pins on its falling edge. TCK is pulsed by the equipment controlling the test and not by the tested device. It can be pulsed at any frequency (up to a maximum of some MHz), and can even be pulsed at varying rates.
Test Data Input (TDI): an input line to allow the test instruction and test data to be loaded
into the instruction register and the various test data registers, respectively.
Test Data Output (TDO): an output line used to serially output the data from the JTAG
registers to the equipment controlling the test.
Test Mode Selector (TMS): the test control input to the TAP controller. It controls the
transitions of the test interface state machine. The test operations are controlled by the
sequence of 1s and 0s applied to this input. Usually this is the most important input that
has to be controlled by external testers or the on-board test controller.
Test Reset Input (TRST*): The optional TRST* pin is used to initialize the TAP controller; that is, if the TRST* pin is used, the TAP controller can be asynchronously reset to the Test-Logic-Reset state when a 0 is applied at TRST*. This pin can also be used to reset the circuit under test; however, it is not recommended for this application.

2.2 Boundary Scan Cell
The IEEE Std. 1149.1a specifies the design of four test data registers, as shown in Figure 41.2. Two mandatory test data registers, the Bypass and the Boundary-Scan registers, must be included in any boundary scan architecture. The boundary scan register, though its name may be a little confusing, refers to the collection of the boundary scan cells. The other registers, such as the device identification register and the design-specific test data registers, can be added optionally.
Basic Boundary Scan Cell (BC_1)

[Fig. 41.3: Data_In (PI) feeds a Capture Scan cell (clocked by ClockDR) and an Update Hold cell (clocked by UpdateDR); a multiplexer controlled by Mode selects between the functional path (Mode = 0) and the held test value (Mode = 1) at Data_Out (PO); Scan_In (SI) and Scan_Out (SO) chain cells together under ShiftDR control.]
Figure 41.3 [1] shows a basic universal boundary-scan cell, known as a BC_1. The cell has four
modes of operation: normal, update, capture, and serial shift. The memory elements are two D-
type flip-flops with front-end and back-end multiplexing of data. It is important to note that the
circuit shown in Figure 41.3 is only an example of how the requirement defined in the Standard
could be realized. The IEEE 1149.1 Standard does not mandate the design of the circuit, only its
functional specification. The four modes of operation are as follows:
1) During normal (functional) mode, Data_In is passed straight through to Data_Out.
2) During update mode, the content of the Update Hold cell is passed through to Data_Out. Signal values already present in the output scan cells are passed out through the device output pins, and signal values already present in the input scan cells are passed into the internal logic.
3) During capture mode, the Data_In signal is routed to the input Capture Scan cell and the value is captured by the next ClockDR. ClockDR is a derivative of TCK. Signal values on device input pins are loaded into input cells, and signal values passing from the internal logic to device output pins are loaded into output cells.
4) During shift mode, the Scan_Out of one Capture Scan cell is passed to the Scan_In of the next Capture Scan cell via a hard-wired path.
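These four modes can be summarized in a small behavioral model (a Python sketch with invented names, not RTL from the Standard):

```python
class BC1Cell:
    """Behavioral model of a basic boundary-scan cell (BC_1).

    capture_ff models the Capture Scan cell (clocked by ClockDR);
    update_ff models the Update Hold cell (clocked by UpdateDR);
    mode selects functional (0) or test (1) data at Data_Out.
    """

    def __init__(self):
        self.capture_ff = 0   # Capture Scan cell
        self.update_ff = 0    # Update Hold cell

    def capture(self, data_in):
        # Capture mode: sample the parallel input on ClockDR
        self.capture_ff = data_in

    def shift(self, scan_in):
        # Shift mode: shift one bit in; the old content becomes Scan_Out
        scan_out = self.capture_ff
        self.capture_ff = scan_in
        return scan_out

    def update(self):
        # Update mode: move the shifted-in value to the hold stage
        self.update_ff = self.capture_ff

    def data_out(self, data_in, mode):
        # mode = 0: functional pass-through; mode = 1: drive the held test value
        return self.update_ff if mode else data_in


cell = BC1Cell()
cell.capture(1)                        # sample a functional value
assert cell.shift(0) == 1              # captured value appears at Scan_Out
cell.update()                          # the shifted-in 0 moves to the hold stage
assert cell.data_out(1, mode=1) == 0   # test mode drives the held value
assert cell.data_out(1, mode=0) == 1   # normal mode passes Data_In through
```

Note how capture and shift touch only the Capture Scan stage, so the functional Data_In to Data_Out path is untouched until an explicit update, exactly the property the next paragraph relies on.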
The Test ClocK, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal. Note that both capture and shift operations do not interfere with the normal passing of data from the parallel-in terminal to the parallel-out terminal. This allows on-the-fly capture of operational values and the shifting out of these values for inspection without interference. This application of the boundary-scan register has tremendous potential for real-time monitoring of the operational status of a system (a sort of electronic camera taking snapshots) and is one reason why TCK is kept separate from any system clocks.
2.3 Boundary Scan Path

At the device level, the boundary-scan elements contribute nothing to the functionality of the internal logic. In fact, the boundary-scan path is independent of the function of the device. The value of the scan path is at the board level, as shown in Figure 41.4 [1].
The figure shows a board containing four boundary-scan devices. It is seen that there is an edge-
connector input called TDI connected to the TDI of the first device. TDO from the first device is
permanently connected to TDI of the second device, and so on, creating a global serial scan path
terminating at the edge connector output called TDO. TCK is connected in parallel to each
device TCK input. TMS is connected in parallel to each device TMS input. All boundary-scan cell data registers are serially loaded and read through this single chain.
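This single-chain behavior can be sketched as a toy model (the helper name is invented for this note, not part of the Standard): the boundary registers of all devices act as one long shift register between the board TDI and TDO.

```python
def shift_chain(chain, bits):
    """Shift `bits` into a concatenated scan chain.

    `chain` is a list of register bits with index 0 nearest TDO;
    returns the bits that fall out of TDO, one per TCK cycle.
    """
    out = []
    for b in bits:
        out.append(chain.pop(0))   # bit nearest TDO falls out
        chain.append(b)            # new bit enters at the TDI end
    return out


# Four chips, each with a 3-bit boundary register: one 12-bit global chain.
chain = [0] * 12
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
captured = shift_chain(chain, pattern)
assert captured == [0] * 12   # the old (all-zero) contents fall out of TDO
assert chain == pattern       # the new pattern now sits across all four chips
```

Loading any pattern therefore always costs as many TCK cycles as the total chain length, which is the cost the next paragraphs discuss.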
[Fig. 41.4 MCM with Serial Boundary Scan Chain: four chips (Chip 1 to Chip 4), each ringed by boundary-scan cells; serial data in at the board TDI, TDO of each chip wired to TDI of the next, serial data out at the board TDO; TCK and TMS bused in parallel to every chip; the serial test interconnect is shown alongside the system interconnect.]
The advantage of this configuration is that only two pins on the PCB/MCM are needed for boundary scan data register support. The disadvantage is very long shifting sequences to deliver test patterns to each component and to shift out test responses, which leads to expensive time on the external tester. As shown in Figure 41.5 [1], the single scan chain can be broken into two parallel boundary scan chains, which share a common test clock (TCK). The extra pin overhead is one more pin. As there are two boundary scan chains, the test patterns are half as long and test time is roughly halved. Here both chains share common TDI and TDO pins, so when the top two chips are being shifted, the bottom two chips must be disabled so that they do not drive their TDO lines. The opposite must hold true when the bottom two chips are being tested.
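The halving claim can be sanity-checked with back-of-envelope arithmetic (the chip, cell, and pattern counts below are invented for illustration only):

```python
# Shift-cycle comparison for the wiring styles of Figs. 41.4 and 41.5.
cells_per_chip = 100   # assumed boundary cells per chip
chips = 4              # assumed chip count
patterns = 50          # hypothetical number of test patterns

# Single chain: every pattern traverses all chips (one 400-bit chain).
single = patterns * chips * cells_per_chip

# Two parallel half-chains: each pattern traverses only a 200-bit chain.
dual = patterns * (chips // 2) * cells_per_chip

assert single == 20000
assert dual == 10000
assert dual * 2 == single   # test time roughly halved, as stated above
```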
[Fig. 41.5 MCM with two parallel boundary scan chains: shared TDI, TDO and TCK; separate TMS1 and TMS2 select which half-chain is actively shifting.]
2.4 TAP Controller

The operation of the test interface is controlled by the Test Access Port (TAP) controller. This is a 16-state finite state machine whose state transitions are controlled by the TMS signal; the state-transition diagram is shown in Figure 41.7. The TAP controller can change state only on the rising edge of TCK, and the next state is determined by the logic level of TMS. In other words, a state transition in Figure 41.7 follows the edge labeled 1 when the TMS line is set to 1; otherwise the edge labeled 0 is followed. The output signals of the TAP controller correspond to a subset of the labels associated with the various states. As shown in Figure 41.2, the TAP consists of four mandatory terminals plus one optional terminal. The main functions of the TAP controller are:

To reset the boundary scan architecture,
To select the output of instruction or test data to shift out to TDO,
To provide control signals to load instructions into the Instruction Register,
To provide signals to shift test data in from TDI and test responses out to TDO, and
To provide signals to perform test functions such as capture and application of test data.
[Fig. 41.6 Top level view of TAP Controller: inputs TMS, TCK and TRST* feed a 16-state FSM (Moore machine) that produces ClockDR, ShiftDR, UpdateDR, Reset*, Select, ClockIR, ShiftIR, UpdateIR and Enable.]
Figure 41.6 shows a top-level view of the TAP Controller. TMS and TCK (and the optional TRST*) go to a 16-state finite-state machine controller, which produces various control signals. These include dedicated signals to the Instruction register (ClockIR, ShiftIR, UpdateIR) and generic signals to all data registers (ClockDR, ShiftDR, UpdateDR). The particular data register that actually responds is the one enabled by the conditional control signals generated at the parallel outputs of the Instruction register, according to the instruction.

The other signals, Reset, Select and Enable, are distributed as follows:

Reset is distributed to the Instruction register and to the target Data Register,
Select is distributed to the output multiplexer,
Enable is distributed to the output driver amplifier.

It must be noted that the Standard uses the term Data Register to mean any target register except the Instruction register.
[Fig. 41.7 State transition diagram of the TAP controller: 16 states comprising Test-Logic-Reset and Run-Test/Idle plus two symmetric columns, one for data registers (Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR) and one for the corresponding -IR states; each arc is labeled with the TMS value (0 or 1) that selects it.]
Figure 41.7 shows the 16-state state diagram for the TAP controller. The value on each state-transition arc is the value of TMS. A state transition occurs on the positive edge of TCK, and the controller output values change on the negative edge of TCK. The 16 states can be divided into three parts: the first part contains the reset and idle states, while the second and third parts control the operations of the data and instruction registers, respectively. Since the only difference between the second and third parts is the registers they deal with, only the states in the first and second parts are described below; a similar description applies to the third part.
1. Test-Logic-Reset: In this state, the boundary scan circuitry is disabled and the system is in its normal function. Whenever a Reset* signal is applied to the BS circuit, it also goes back to this state. Note also that whatever state the TAP controller is in, it will go back to this state if 5 consecutive 1s are applied through TMS.
2. Run-Test/Idle: This is a state in which the boundary scan circuitry waits for some test operations, such as BIST operations, to complete. A typical example: if a BIST operation requires 2^16 cycles to complete, then after setting up the initial condition for the BIST operation, the TAP controller will go back to this state and wait for 2^16 cycles before it starts to shift out the test results.
3. Select-DR-Scan: This is a temporary state to allow the test data sequence for the selected
test-data register to be initiated.
4. Capture-DR: In this state, data can be loaded in parallel to the data registers selected by the
current instruction.
5. Shift-DR: In this state, test data are scanned in series through the data registers selected by
the current instruction. The TAP controller may stay at this state as long as TMS=0. For
each clock cycle, one data bit is shifted into (out of) the selected data register through TDI
(TDO).
6. Exit1-DR: All parallel-loaded (from the Capture-DR state) or shifted (from the Shift-DR state) data are held in the selected data register in this state.
7. Pause-DR: The boundary scan logic pauses here to wait for some external operation. For example, when long test data must be loaded into the chip(s) under test, the external tester may need to reload the data from time to time. Pause-DR is a state that allows the boundary scan architecture to wait for more data to shift in.
8. Exit2-DR: This state marks the end of the Pause-DR operation and allows the TAP controller to go back to the Shift-DR state for more data to shift in.
9. Update-DR: The test data stored in the first stage of the boundary scan cells is loaded into the second stage in this state.
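The state sequencing above can be captured in a small next-state table and checked against two well-known properties of the 1149.1 TAP: five consecutive 1s on TMS always reach Test-Logic-Reset, and the TMS sequence 0, 1, 0, 0 from reset enters Shift-DR. (A behavioral sketch only; state names follow Figure 41.7.)

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1)
NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms_bits):
    """Apply a sequence of TMS values, one per rising TCK edge."""
    for tms in tms_bits:
        state = NEXT[state][tms]
    return state


# Five consecutive 1s on TMS reach Test-Logic-Reset from *any* state.
assert all(step(s, [1] * 5) == "Test-Logic-Reset" for s in NEXT)

# From reset, TMS = 0, 1, 0, 0 walks to Shift-DR for a data-register scan.
assert step("Test-Logic-Reset", [0, 1, 0, 0]) == "Shift-DR"
```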
2.5 Bypass and Identification Registers

Figure 41.8 shows a typical design for a Bypass register. It is a 1-bit register, selected by the Bypass instruction, and provides a basic serial-shift function. There is no parallel output, which means that the Update_DR control has no effect on its operation.

2.6 Instruction Register

It is also possible to load (Capture) internal hard-wired values into the shift section of the Instruction register. The Instruction register must be at least two bits long to allow coding of the four mandatory instructions (Extest, Bypass, Sample, Preload), but the maximum length of the Instruction register is not defined. In capture mode, the two least-significant bits must capture a 01 pattern. (Note: by convention, the least-significant bit of any register connected between the device TDI and TDO pins is always the bit closest to TDO.) The values captured into the higher-order bits of the Instruction register are not defined in the Standard. One possible use of these higher-order bits is to capture an informal identification code if the optional 32-bit Identification register is not implemented. In practice, the only mandated bits for Instruction register capture are the 01 pattern in the two least-significant bits. We will return to the value of capturing this pattern later in the lesson.
[Fig. 41.9 Instruction register: a shift scan register between TDI and TDO whose two least-significant capture bits are hard-wired to 01; higher-order bits may capture the current instruction, status bits, an informal ident, or the results of a power-up self-test; a hold register keeps the current instruction, and decode logic routes DR select and control signals to the selected target register under TAP controller IR control.]
2.7 Instruction Set

The IEEE 1149.1 Standard describes four mandatory instructions: Extest, Bypass, Sample, and Preload, and six optional instructions: Intest, Idcode, Usercode, Runbist, Clamp and HighZ. Whenever a register is selected to become active between TDI and TDO, it is always possible to perform three operations on the register: parallel Capture, followed by serial Shift, followed by parallel Update. The order of these operations is fixed by the state-sequencing design of the TAP controller. For some target Data registers, some of these operations will be effectively null operations (no-ops).
Standard Instructions

Instruction   Selected Data Register
Mandatory:
Extest        Boundary scan (formerly all-0s code)
Bypass        Bypass (initialized state, all-1s code)
Sample        Boundary scan (device in functional mode)
Preload       Boundary scan (device in functional mode)
Optional:
Intest        Boundary scan
Idcode        Identification (initialized state, if present)
Usercode      Identification (for PLDs)
Runbist       Result register
Clamp         Bypass (output pins in safe state)
HighZ         Bypass (output pins in high-Z state)

NB. All unused instruction codes must default to Bypass.
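The table maps naturally onto a lookup with a Bypass default (a sketch of the decode behavior only, not a real instruction decoder):

```python
# Which data register each instruction places between TDI and TDO.
SELECTED_DR = {
    "EXTEST":   "boundary-scan",
    "SAMPLE":   "boundary-scan",
    "PRELOAD":  "boundary-scan",
    "INTEST":   "boundary-scan",
    "BYPASS":   "bypass",
    "CLAMP":    "bypass",
    "HIGHZ":    "bypass",
    "IDCODE":   "identification",
    "USERCODE": "identification",
    "RUNBIST":  "result",
}


def selected_register(instruction):
    # All unused or unknown instruction codes must default to Bypass.
    return SELECTED_DR.get(instruction, "bypass")


assert selected_register("EXTEST") == "boundary-scan"
assert selected_register("SOME_UNUSED_CODE") == "bypass"
```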
EXTEST: This instruction is used to test the interconnect between two chips. The code for Extest used to be defined as the all-0s code. The EXTEST instruction places an IEEE 1149.1 compliant device into an external boundary test mode and selects the boundary scan register to be connected between TDI and TDO. During this instruction, the boundary scan cells associated with outputs are preloaded with test patterns to test downstream devices. The input boundary cells are set up to capture the input data for later analysis.
BYPASS: A device's boundary scan chain can be skipped using the BYPASS instruction, allowing the data to pass through the 1-bit Bypass register. The Bypass instruction must be assigned the all-1s code and, when executed, causes the Bypass register to be placed between the TDI and TDO pins. This allows efficient testing of a selected device without incurring the overhead of traversing through other devices. The BYPASS instruction allows an IEEE 1149.1 compliant device to remain in functional mode while serial data is transferred through the device from the TDI pin to the TDO pin without affecting its operation.
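The effect of bypassing is easy to model: each bypassed device inserts exactly one TCK cycle of delay between its TDI and TDO. A toy Python sketch (the helper name is invented here):

```python
def through_bypass(bits, n_devices):
    """Shift `bits` through n_devices 1-bit bypass registers in series."""
    regs = [0] * n_devices       # one bypass flip-flop per bypassed device
    out = []
    for b in bits:
        out.append(regs[-1])     # the bit nearest TDO emerges
        regs = [b] + regs[:-1]   # everything shifts one stage toward TDO
    return out


stream = [1, 0, 1, 1]
# Through 2 bypassed devices, the stream appears 2 cycles late (zero-padded):
assert through_bypass(stream + [0, 0], 2) == [0, 0, 1, 0, 1, 1]
```

So bypassing N idle devices costs only N extra cycles per shift, instead of the full length of their boundary registers.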
SAMPLE/PRELOAD: The Sample and Preload instructions, and their predecessor the Sample/Preload instruction, select the Boundary-Scan register when executed. The instruction sets up the boundary-scan cells either to sample (capture) values or to preload known values into
the boundary-scan cells prior to some follow-on operation. During this instruction, the boundary
scan register can be accessed via a data scan operation, to take a sample of the functional data
entering and leaving the device. This instruction is also used to preload test data into the
boundary-scan register prior to loading an EXTEST instruction.
INTEST: With this command the boundary scan register (BSR) is connected between the TDI
and the TDO signals. The chip's internal core-logic signals are sampled and captured by the BSR
cells at the entry to the "Capture_DR" state as shown in TAP state transition diagram. The
contents of the BSR register are shifted out via the TDO line at exits from the "Shift_DR" state.
As the contents of the BSR (the captured data) are shifted out, new data are shifted in at the entries
to the "Shift_DR" state. The new contents of the BSR are applied to the chip's core-logic signals
during the "Update_DR" state.
IDCODE: This is used to select the Identification register between TDI and TDO, preparatory to
loading the internally-held 32-bit identification code and reading it out through TDO. The 32 bits
are used to identify the manufacturer of the device, its part number and its version number.
USERCODE: This instruction selects the same 32-bit register as IDCODE, but allows an
alternative 32 bits of identity data to be loaded and serially shifted out. This instruction is used
for dual-personality devices, such as Complex Programmable Logic Devices and Field
Programmable Gate Arrays.
RUNBIST: An important optional instruction is RunBist. Because of the growing importance of
internal self-test structures, the behavior of RunBist is defined in the Standard. The self-test
routine must be self-initializing (i.e., no external seed values are allowed), and the execution of
RunBist essentially targets a self-test result register between TDI and TDO. At the end of the
self-test cycle, the targeted data register holds the Pass/Fail result. With this instruction one can control the execution of a memory BIST from the TAP controller, thereby reducing the hardware overhead of the BIST controller.
CLAMP: Clamp is an instruction that uses boundary-scan cells to drive preset values, established initially with the Preload instruction, onto the outputs of devices, and then selects the Bypass register between TDI and TDO (unlike the Preload instruction, which leaves the device with the boundary-scan register still selected until a new instruction is executed or the device is returned to the Test-Logic-Reset state). Clamp can be used, for example, to set up safe guarding values on the outputs of certain devices in order to avoid bus contention problems.
HIGH-Z: This instruction is similar to the Clamp instruction, but it leaves the device output pins in a high-impedance state rather than driving fixed logic-1 or logic-0 values. HighZ also selects the Bypass register between TDI and TDO.
3. On Board Test Controller

So far the test architecture of boundary scan inside the chip under test has been discussed. A major problem remains: who is going to control the whole boundary scan test procedure? In general there are two solutions to this problem: using an external tester, or using a special on-board controller. The former is usually expensive because it involves an IC tester. The latter provides an economical way to complete the whole test procedure. As is clear from the above description, in addition to the test data, the most important signal that a test controller has to provide is the TMS signal. There exist two methods to provide this signal on a board: the star configuration and the ring configuration, as shown in Figure 41.10. In the star configuration the TMS is broadcast to all chips; hence all chips must execute the same operation at any time. In the ring structure, the test controller provides one independent TMS signal for each chip, so great flexibility of the test procedure is facilitated.
[Fig. 41.10 BUS master for chips with BS: (a) star structure, in which one TMS is broadcast to all N chips; (b) ring structure, with an independent TMS per chip; TDI, TCK and TDO are common to both arrangements.]
Figure 41.11, "Single Boundary Scan Chain on a Board," illustrates the onboard TAP controllers connected to an offboard TAP control device, such as a personal computer, through a TAP access connector. The offboard TAP control device can perform different tests during board manufacturing without the need for bed-of-nails equipment.
[Fig. 41.11 Single Boundary Scan Chain on a Board: three devices, each containing core logic with Bypass (BP), Instruction (IR) and Data (DR) registers and a TAP; TDI-to-TDO chained across the devices, TCK and TMS bused to every TAP, and a test connector linking the chain to a TAP control device (test software on a PC/workstation).]

5. Simple Board Level Test Sequence
One of the first tests that should be performed on a PCB is called the infrastructure test. This test is used to determine whether all the components are installed correctly. It relies on the fact that the last two bits of the instruction register (IR) are always "01". By shifting out the IR of each device in the chain, it can be determined whether the device is properly installed. This is accomplished by sequencing the TAP controller for an IR read.

After the infrastructure test is successful, the board-level interconnect test can begin. This is accomplished through the EXTEST command. This test can be used to check for "opens" and "shorts" on the PCB. The test patterns are preloaded into the output pins of the driving devices. They are then propagated to the receiving devices and captured in the input boundary scan cells. The result can then be shifted out through the TDO pin for analysis.

These patterns can be generated and analyzed automatically via software programs. This feature is normally offered through tools like Automatic Test Pattern Generation (ATPG) or Boundary Scan Test Pattern Generation (BTPG).
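The opens/shorts idea can be illustrated with a toy model (net indices and helper names are invented; a real flow would use ATPG-generated patterns): drive a pattern from the output cells, capture at the input cells, and compare against the netlist expectation.

```python
def interconnect_test(drive, wiring):
    """Model an EXTEST-style interconnect check.

    drive:  bits preloaded into the driver output cells, indexed by net.
    wiring: receiver_net -> driver_net map modeling the *actual* board;
            a short makes two nets see the same driver, an open sees None.
    Returns the bits captured at the receiving input cells.
    """
    captured = []
    for net in range(len(drive)):
        src = wiring.get(net)
        captured.append(drive[src] if src is not None else None)
    return captured


drive = [0, 1, 0, 1]                    # walking pattern on four nets
good = {0: 0, 1: 1, 2: 2, 3: 3}         # every net wired straight through
shorted = {0: 0, 1: 0, 2: 2, 3: 3}      # net 1 shorted onto net 0's driver
open_net = {0: 0, 1: 1, 2: None, 3: 3}  # net 2 open

assert interconnect_test(drive, good) == drive      # board passes
assert interconnect_test(drive, shorted)[1] == 0    # short detected (expected 1)
assert interconnect_test(drive, open_net)[2] is None  # open detected
```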
language can greatly reduce the effort to incorporate boundary scan into a chip, and hence is quite useful when a designer wishes to design boundary scan in his own style. Basically, for those parts that are mandatory in Std. 1149.1a, such as the TAP controller and the BYPASS register, the designer does not need to describe them; they can be automatically generated. The designer only has to describe the specifications related to his own design, such as the length of the boundary scan register, the user-defined boundary scan instructions, the decoder for those instructions, and the I/O pin assignment. In general these descriptions are quite easy to prepare. In fact, many CAD tools already implement the boundary scan generation procedure, and thus a designer may not even need to write the BSDL file: the tools can automatically generate the needed boundary scan circuitry for any circuit design as long as the I/O of the design is specified.
Any manufacturer of a JTAG-compliant device must provide a BSDL file for that device. The BSDL file contains information on the function of each of the pins on the device: which are used as I/Os, power or ground. BSDL files describe the boundary scan architecture of a JTAG-compliant device, and are written in VHDL. The BSDL file includes:

1. Entity Declaration: a VHDL construct that identifies the name of the device described by the BSDL file.
2. Generic Parameter: specifies which package is described by the BSDL file.
3. Logical Port Description: lists all of the pads on a device, and states whether each pin is an input (in bit;), an output (out bit;), bidirectional (inout bit;) or unavailable for boundary scan (linkage bit;).
[Fig. 41.12 Example to illustrate BSDL: (a) core logic, a clocked register block with inputs D1-D6, outputs Q1-Q6 and CLK; (b) the same device after BS insertion, with a TAP controller and the TDI, TCK, TMS and TDO pins added.]
7. Benefits and Penalties of Boundary Scan

The decision whether to use boundary scan usually involves economics. Designers often hesitate to use boundary scan due to the additional silicon involved. In many cases it may appear that the penalties outweigh the benefits for an ASIC. However, in an analysis spanning all assembly levels and all test phases during the system's life, the benefits will usually outweigh the penalties.
Benefits

The benefits provided by boundary scan include the following:

lower test generation costs
reduced test time
reduced time to market
simpler and less costly testers
compatibility with tester interfaces
accommodation of high-density device packaging

By providing access to the scan chain I/Os, the need for physical test points on the board is eliminated or greatly reduced, leading to significant savings as a result of simpler board layouts, less costly test fixtures, reduced time on in-circuit test systems, increased use of standard interfaces, and faster time-to-market. In addition to board testing, boundary scan allows programming almost all types of CPLDs and flash memories, regardless of size or package type, on the board, after PCB assembly. In-system programming saves money and improves throughput by reducing device handling, simplifying inventory management, and integrating the programming steps into the board production line.
Penalties
The penalties incurred in using boundary-scan include the following:
extra silicon due to boundary scan circuitry
added pins
additional design effort
degradation in performance due to gate delays through the additional circuitry
increased power consumption
Boundary Scan Example
Since boundary-scan design is new to many designers, an example of gate count for a circuit
with boundary scan is discussed here. This provides an estimate for the circuitry sizes required to
implement the IEEE 1149.1 standard, but without the extensions defined in the standard. The
example uses a library-based gate array design environment. The gate counts given are based on
commercial cells and relate to a 10000-gate design in a 40-pin package. Table 1 gives the gate requirement.

Table 1: Gate requirements for a Gate Array Boundary-scan Design
It must be noted that in Table 1 the boundary-scan implementation requires 868 gates, an estimated 8 percent overhead. It should also be noted that the cells used in this example were created prior to publication of the IEEE 1149.1 standard. If specific cell designs had been available to support the standard, or if the vendor had placed the boundary-scan circuitry in areas of the ASIC not available to the user, then the design would have required less.
9. Conclusion

Board level testing has become more complex with the increasing use of fine-pitch, high pin-count devices. With the use of boundary scan, however, board level testing can be implemented more efficiently and at lower cost. This standard provides a unique opportunity to simplify the design, debug, and test processes by enabling a simple and standard means of automatically creating and applying tests at the device, board, and system levels. Boundary scan is the only practical solution for MCMs and limited-access SMT/multi-layer boards. The standard supports external testing with an ATE. The IEEE 1532-2000 In-System Configuration (ISC) standard makes use of 1149.1 boundary-scan structures within CPLD and FPGA devices.
References

[1] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001 (Revision of IEEE Std 1149.1-1990), IEEE-SA Standards Board, 3 Park Avenue, New York, NY 10016-5997, USA. http://grouper.ieee.org/groups/1149/1 or http://standards.ieee.org/catalog/
[2] K. P. Parker, The Boundary-Scan Handbook: Analog and Digital, 2nd Edition, Kluwer Academic Publishers, 1998.
[3] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, Norwell, MA, 2000.
[4] IEEE 1149.4 Mixed-Signal Test Bus Standard web site: http://grouper.ieee.org/groups/1149/4
[5] IEEE 1532 In-System Configuration Standard web site: http://grouper.ieee.org/groups/1532/
[6] Agilent Technologies BSDL verification service: http://www.agilent.com/see/bsdl_service
Problems

1. What is boundary scan? What is the motivation for boundary scan?
2. How does the boundary-scan technique differ from so-called bed-of-nails techniques?
3. What are the different device packaging styles?
4. What is JTAG?
5. Give an overview of the boundary-scan family of standards, i.e., IEEE 1149.
6. Show the boundary-scan architecture and describe the functions of its elements.
7. Show the basic cell of a boundary-scan register. Describe the different modes of its
operation.
8. A board is composed of 100 chips with 100 pins each. The length of the total scan chain
is 10,000 bits. Find a possible testing strategy to reduce the scan chain length.
9. What is the TAP controller? What are the main functions of the TAP controller?
10. Describe a serial boundary-scan chain and its operation. What are its disadvantages, and
what strategy can overcome them?
11. Discuss the different instruction sets and their functions.
12. Considering a board populated by IEEE 1149.1-compliant devices (a "pure" boundary-
scan board), summarize a board-test strategy.
13. What is the goal of the infrastructure test? Is the infrastructure test mandatory or
optional? What are the main steps of an infrastructure test?
14. Consider the example depicted in the following figure.
[Figure: two devices, IC1 and IC2, in a chain, with primary inputs A and B, interconnect nets C
and D, primary outputs E and F, and TDI/TDO connections.]
This circuit has two primary inputs, two primary outputs, and two nets that connect the ICs one
to the other. There is only one TAP, which connects the TDI and TDO of both ICs. Prepare a
test plan for this circuit.
15. Consider a board composed of 100 40-pin boundary-scan devices, 2,000 interconnects,
an 8-bit instruction register per device, a 32-bit identification register per device, and a
10 MHz test application rate. Compute the test time to execute a test session.
16. What is BSDL? What are the different BSDL files?
Module
8
Testing of Embedded
System
Version 2 EE IIT, Kharagpur 1
Lesson
42
On-line Testing of
Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to
1. Introduction

EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to
perform application-specific functions. The product user is usually not even aware of the
existence of these systems. From toys to medical devices, from ovens to automobiles, the range
of products incorporating microprocessor-based, software-controlled systems has expanded
rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is
clear: they promise previously impossible functions that enhance the performance of people or
machines. As these systems gain sophistication, manufacturers are using them in increasingly
critical applications: products that can result in injury, economic loss, or unacceptable
inconvenience when they do not perform as required.

Embedded systems can contain a variety of computing devices, such as microcontrollers,
application-specific integrated circuits, and digital signal processors. A key requirement is that
these computing devices continuously respond to external events in real time. Makers of
embedded systems take many measures to ensure safety and reliability throughout the lifetime
of products incorporating the systems. Here, we consider techniques for identifying faults
during normal operation of the product, that is, online-testing techniques. We evaluate them on
the basis of error coverage, error latency, space redundancy, and time redundancy.

2. Embedded-system test issues
Cost constraints in consumer products typically translate into stringent constraints on product
components. Thus, embedded systems are particularly cost sensitive. In many applications, low
production and maintenance costs are as important as performance.

Moreover, as people become dependent on computer-based systems, their expectations of these
systems' availability increase dramatically. Nevertheless, most people still expect significant
downtime with computer systems, perhaps a few hours per month. People are much less patient
with computer downtime in other consumer products, since the items in question did not
demonstrate this type of failure before embedded systems were added. Thus, complex consumer
products with high availability requirements must be quickly and easily repaired. For this
reason, automobile manufacturers, among others, are increasingly providing online detection
and diagnosis, capabilities previously found only in very complex and expensive applications.
Software testing relies on two basic methods: acceptance testing and diversity [1]. Acceptance
testing checks for the presence or absence of well-defined events or conditions, usually
expressed as true-or-false conditions (predicates), related to the correctness or safety of
preceding computations. Diversity techniques compare replicated computations, either with
minor variations in data (data diversity) or with procedures written by separate, unrelated design
teams (design diversity). This chapter focuses on digital hardware testing, including techniques
by which hardware tests itself, built-in self-test (BIST). Nevertheless, we must consider the role
of software in detecting, diagnosing, and handling hardware faults. If we can use software to
test hardware, why should we add hardware to test hardware? There are two possible answers.
First, it may be cheaper or more practical to use hardware for some tasks and software for
others. In an embedded system, programs are stored online in hardware-implemented memories
such as ROMs (for this reason, embedded software is sometimes called firmware). This program
storage space is a finite resource whose cost is measured in exactly the same way as other
hardware. A function such as a test is soft only in the sense that it can easily be modified or
omitted in the final implementation.

The second answer involves the time that elapses between a fault's occurrence and a problem
arising from that fault. For instance, a fault may induce an erroneous system state that can
ultimately lead to an accident. If the elapsed time between the fault's occurrence and the
corresponding accident is short, the fault must be detected immediately. Acceptance tests can
detect many faults and errors in both software and hardware. However, their exact fault
coverage is hard to measure, and even when coverage is complete, acceptance tests may take a
long time to detect some faults. BIST typically targets relatively few hardware faults, but it
detects them quickly.

These two issues, cost and latency, are the main parameters in deciding whether to use
hardware or software for testing and which hardware or software technique to use. This
decision requires system-level analysis. We do not consider software methods here. Rather, we
emphasize the appropriate use of widely implemented BIST methods for online hardware
testing. These methods are components in the hardware-software trade-off.
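The acceptance tests described above are easy to illustrate. The sketch below is a minimal
Python model (function and parameter names are ours, chosen for illustration): a square-root
routine is guarded by a true-or-false predicate relating its result back to its input, so a fault in
the preceding computation trips the check.

```python
import math

def sqrt_with_acceptance_test(x, tol=1e-9):
    """Compute sqrt(x), then apply an acceptance test, a true-or-false
    predicate relating the result to the input, before releasing it."""
    y = math.sqrt(x)
    # Acceptance test: squaring the result must reproduce the input.
    # A violation indicates a fault in the preceding computation.
    if abs(y * y - x) > tol * max(1.0, abs(x)):
        raise RuntimeError("acceptance test failed for sqrt")
    return y

assert sqrt_with_acceptance_test(4.0) == 2.0
```

The same pattern applies to any computation whose output can be checked far more cheaply
than it can be produced.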
3. Online testing

Faults are physical or logical defects in the design or implementation of a digital device. Under
certain conditions, they lead to errors, that is, incorrect system states. Errors induce failures,
deviations from appropriate system behavior. If the failure can lead to an accident, it is a
hazard. Faults can be classified into three groups: design, fabrication, and operational. Design
faults are made by human designers or CAD software (simulators, translators, or layout
generators) during the design process. Fabrication defects result from an imperfect
manufacturing process. For example, shorts and opens are common manufacturing defects in
VLSI circuits. Operational faults result from wear or environmental disturbances during normal
system operation. Such disturbances include electromagnetic interference, operator mistakes,
and extremes of temperature and vibration. Some design defects and manufacturing faults
escape detection and combine with wear and environmental disturbances to cause problems in
the field.
Operational faults are usually classified by their duration:

- Permanent faults remain in existence indefinitely if no corrective action is taken. Many are
residual design or manufacturing faults. The rest usually occur during changes in system
operation, such as system start-up or shutdown, or as a result of a catastrophic environmental
disturbance such as a collision.
- Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict,
but their effects are highly correlated. When intermittent faults are present, the system works
well most of the time but fails under atypical environmental conditions.
- Transient faults appear and disappear quickly and are not correlated with each other. They
are most commonly induced by random environmental disturbances.
One generally uses online testing to detect operational faults in computers that support critical
or high-availability applications. The goal of online testing is to detect fault effects, or errors,
and take appropriate corrective action. For example, in some critical applications, the system
shuts down after an error is detected. In other applications, error detection triggers a
reconfiguration mechanism that allows the system to continue operating, perhaps with some
performance degradation. Online testing can take the form of external or internal monitoring,
using either hardware or software. Internal monitoring, also called self-testing, takes place on
the same substrate as the circuit under test (CUT). Today, this usually means inside a single IC:
a system on a chip. There are four primary parameters to consider in designing an online-
testing scheme:
- Error coverage: the fraction of modeled errors detected, usually expressed as a percentage.
Critical and highly available systems require very good error coverage to minimize the
probability of system failure.
- Error latency: the difference between the first time an error becomes active and the first time
it is detected. Error latency depends on the time taken to perform a test and how often tests are
executed. A related parameter is fault latency, the difference between the onset of the fault and
its detection. Clearly, fault latency is greater than or equal to error latency, so when error
latency is difficult to determine, test designers often consider fault latency instead.
- Space redundancy: the extra hardware or firmware needed for online testing.
- Time redundancy: the extra time needed for online testing.
The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock
cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT
and impose no functional or structural restrictions on it. Most BIST methods meet some of
these constraints without addressing others. Considering all four parameters in the design of an
online-testing scheme may create conflicting goals. High coverage requires high error latency,
space redundancy, and/or time redundancy. Schemes with immediate detection (error latency
equaling 1) minimize time redundancy but require more hardware. On the other hand, schemes
with delayed detection (error latency greater than 1) reduce time and space redundancy at the
expense of increased error latency. Several proposed delayed-detection techniques assume
equiprobability of input combinations and try to establish a probabilistic bound on error latency
[2]. As a result, certain faults remain undetected for a long time because tests for them rarely
appear at the CUT's inputs.

To cover all the operational fault types described earlier, test engineers use two different modes
of online testing: concurrent and non-concurrent. Concurrent testing takes place during normal
system operation, and non-concurrent testing takes place while normal operation is temporarily
suspended. One must often overlap these test modes to provide a comprehensive online-testing
strategy at acceptable cost.
4. Non-concurrent testing

This form of testing is either event-triggered (sporadic) or time-triggered (periodic) and is
characterized by low space and time redundancy. Event-triggered testing is initiated by key
events or state changes such as start-up or shutdown, and its goal is to detect permanent faults.
Detecting and repairing permanent faults as soon as possible is usually advisable. Event-
triggered tests resemble manufacturing tests. Any such test can be applied online, as long as the
required testing resources are available. Typically, the hardware is partitioned into components,
each exercised by specific tests. RAMs, for instance, are tested with manufacturing tests such
as March tests [3].

Time-triggered testing occurs at predetermined times in the operation of the system. It detects
permanent faults, often using the same types of tests applied by event-triggered testing. The
periodic approach is especially useful in systems that run for extended periods during which no
significant events occur to trigger testing. Periodic testing is also essential for detecting
intermittent faults. Such faults typically behave as permanent faults for short periods. Since
they usually represent conditions that must be corrected, diagnostic resolution is important.
Periodic testing can identify latent design or manufacturing flaws that appear only under certain
environmental conditions. Time-triggered tests are frequently partitioned and interleaved so
that only part of the test is applied during each test period.
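A March test sweeps the memory with fixed sequences of reads and writes. As a behavioral
illustration (a Python simulation, not production test code; the read/write callback interface is
our own), the sketch below runs the well-known March C- element sequence over a simulated
RAM and shows that it catches a cell stuck at 0:

```python
def march_c_minus(read, write, n):
    """Apply the March C- element sequence to an n-cell RAM through
    read/write callbacks; return True if no fault is detected."""
    up, down = range(n), range(n - 1, -1, -1)
    for i in up:                      # (w0): initialize all cells to 0
        write(i, 0)
    # (r0,w1) up; (r1,w0) up; (r0,w1) down; (r1,w0) down
    for order, (rv, wv) in [(up, (0, 1)), (up, (1, 0)),
                            (down, (0, 1)), (down, (1, 0))]:
        for i in order:
            if read(i) != rv:         # unexpected value: fault detected
                return False
            write(i, wv)
    for i in up:                      # final (r0)
        if read(i) != 0:
            return False
    return True

ram = [0] * 16                        # fault-free RAM passes
assert march_c_minus(lambda i: ram[i], lambda i, v: ram.__setitem__(i, v), 16)

bad = [0] * 16                        # a cell stuck at 0 is caught
def faulty_write(i, v):
    if i != 3:                        # cell 3 never takes a new value
        bad[i] = v
assert not march_c_minus(lambda i: bad[i], faulty_write, 16)
```

The transparent BIST of [3] adapts such tests so that the RAM contents are restored afterward,
which is what makes them usable online.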
5. Concurrent testing

Non-concurrent testing cannot detect transient or intermittent faults whose effects disappear
quickly. Concurrent testing, on the other hand, continuously checks for errors due to such
faults. However, concurrent testing is not particularly useful for diagnosing the source of
errors, so test designers often combine it with diagnostic software. They may also combine
concurrent and non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for
detecting control errors, is a watchdog timer [4]. This is a counter that the system resets
repeatedly to indicate that the system is functioning properly. The watchdog concept assumes
that the system is fault-free, or at least alive, if it can reset the timer at appropriate intervals.
The ability to perform this simple task implies that control flow is correctly traversing timer-
reset points. One can monitor system sequencing very precisely by guarding the watchdog-reset
operations with software-based acceptance tests that check signatures computed while control
flow traverses various checkpoints. To implement this last approach in hardware, one can
construct more complex hardware watchdogs.
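The watchdog behavior just described can be modeled in a few lines. The sketch below is a
Python simulation (class and method names are ours): the watchdog stays quiet while the task
keeps resetting ("kicking") it within the timeout, and fires once the kicks stop, as they would if
control flow hung.

```python
class WatchdogTimer:
    """Down-counter watchdog: the system must call kick() within
    'timeout' ticks or the watchdog fires."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.count = timeout
        self.fired = False
    def kick(self):                   # healthy control flow resets the counter
        self.count = self.timeout
    def tick(self):                   # called once per clock tick
        self.count -= 1
        if self.count <= 0:
            self.fired = True

wd = WatchdogTimer(timeout=3)
for _ in range(10):                   # a live task kicks every 2 ticks
    wd.tick(); wd.tick(); wd.kick()
assert not wd.fired
for _ in range(3):                    # a hung task stops kicking
    wd.tick()
assert wd.fired
```

In a real system the "fire" action would be a reset or a transition to a safe state rather than a
flag.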
A key element of concurrent testing for data errors is redundancy. For example, the
duplication-with-comparison (DWC) technique detects any single error at the expense of 100%
space redundancy. This technique requires two copies of the CUT, which operate in tandem
with identical inputs. Any discrepancy in their outputs indicates an error. In many applications,
DWC's high hardware overhead is unacceptable. Moreover, it is difficult to prevent minor
timing variations between duplicated modules from invalidating the comparison.
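DWC reduces to a single comparison. The sketch below (a Python model; the "modules" are
plain functions standing in for hardware copies) shows how a fault in one copy surfaces as a
mismatch:

```python
def dwc(copy_a, copy_b, x):
    """Duplication with comparison: run two copies of the module on
    identical inputs; any output discrepancy raises the error flag."""
    ya, yb = copy_a(x), copy_b(x)
    return ya, ya != yb               # (output, error flag)

square = lambda x: x * x              # fault-free module
broken = lambda x: 0 if x == 5 else x * x   # fault visible only for x == 5

assert dwc(square, square, 5) == (25, False)
_, error = dwc(square, broken, 5)
assert error                          # the discrepancy is detected
```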
A possible lower-cost alternative is time redundancy. A technique called double execution, or
retry, executes critical operations more than once at diverse time points and compares their
results. Transient faults are likely to affect only one instance of the operation and thus can be
detected. Another technique, recomputing with shifted operands (RESO) [5], achieves almost
the same error coverage as DWC with 100% time redundancy but very little space redundancy.
However, no one has demonstrated the practicality of double execution and RESO for online
testing of general logic circuits.
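For an arithmetic unit, RESO can be sketched as follows (a Python simulation; in hardware the
point is that a stuck-at fault corrupts different result bits in the shifted and unshifted runs, so
the comparison exposes it, whereas in this fault-free software model the two runs simply
agree):

```python
def reso_add(a, b, shift=2):
    """Recomputing with shifted operands (RESO) for an adder: compute
    a + b once normally and once with both operands shifted left by
    'shift' bits, then shift the second result back and compare."""
    first = a + b
    second = ((a << shift) + (b << shift)) >> shift
    return first, first != second     # (sum, error flag)

total, error = reso_add(1234, 5678)
assert total == 6912 and not error    # fault-free runs agree
```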
A third, widely used form of redundancy is information redundancy: the addition of redundant
coded information such as a parity-check bit [5]. Such codes are particularly effective for
detecting memory and data transmission errors, since memories and networks are susceptible
to transient errors. Coding methods can also detect errors in data computed during critical
operations.
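The simplest such code is a single parity bit. As a sketch (Python; the word-plus-parity framing
is illustrative), appending an even-parity bit to a stored word lets any single-bit flip be detected
on retrieval:

```python
def parity_bit(word):
    """Even-parity bit: XOR of all data bits of the word."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

def store(word):                      # append the parity bit before storing
    return (word << 1) | parity_bit(word)

def check(coded):                     # verify parity on retrieval
    word, p = coded >> 1, coded & 1
    return word, parity_bit(word) != p    # (data, error flag)

coded = store(0b1011)                 # three 1-bits, so parity is 1
word, error = check(coded)
assert word == 0b1011 and not error
_, error = check(coded ^ 0b100)       # a single-bit flip is detected
assert error
```

A single parity bit detects any odd number of bit errors but misses even-sized error patterns;
stronger codes (e.g., Hamming codes) trade more check bits for detection and correction.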
6. Built-in self-test

For critical or highly available systems, a comprehensive online-testing approach that covers
all expected permanent, intermittent, and transient faults is essential. In recent years, BIST has
emerged as an important method of testing manufacturing faults, and researchers increasingly
promote it for online testing as well.

BIST is a design-for-testability technique that places test functions physically on chip with the
CUT, as illustrated in Figure 42.1. In normal operating mode, the CUT receives its inputs from
other modules and performs the function for which it was designed. In test mode, a test pattern
generator circuit applies a sequence of test patterns to the CUT, and a response monitor
evaluates the test responses. In the most common type of BIST, the response monitor compacts
the test responses to form fault signatures. It compares the fault signatures with reference
signatures generated or stored on chip, and an error signal indicates any discrepancies detected.
We assume this type of BIST in the following discussion.
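Response compaction is typically done with an LFSR-based signature register. The sketch
below (a Python simulation; the 7-bit register and its tap positions are an illustrative choice)
shifts a response bit stream into such a register and shows that flipping a single response bit
changes the final signature; as noted below for real monitors, some multi-bit error patterns can
still alias to the reference signature.

```python
def signature(response_bits, nbits=7):
    """Serial signature register: shift each response bit into an LFSR
    whose feedback XORs the two top stages; the final register state
    is the fault signature."""
    state = 0
    for bit in response_bits:
        fb = bit ^ ((state >> 6) & 1) ^ ((state >> 5) & 1)
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
    return state

good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
reference = signature(good)           # fault-free reference signature
bad = list(good)
bad[4] ^= 1                           # single response-bit error
assert signature(bad) != reference    # the signature exposes the error
```

Because the compactor is linear, any single-bit response error always changes the signature;
aliasing can only arise from particular combinations of multiple errors.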
In developing a BIST methodology for embedded systems, we must consider four primary
parameters related to those listed earlier for online-testing techniques:

- Fault coverage: the fraction of faults of interest that the test patterns produced by the test
generator can expose and the response monitor can detect. Most monitors produce a fault-free
signature for some faulty response sequences, an undesirable property called aliasing.
- Test set size: the number of test patterns produced by the test generator. Test set size is
closely linked to fault coverage; generally, large test sets imply high fault coverage. However,
for online testing, test set size must be small to reduce fault and error latency.
- Hardware overhead: the extra hardware needed for BIST. In most embedded systems, high
hardware overhead is not acceptable.
Ensuring that fault coverage is sufficiently high and the number of tests is sufficiently low are
the main problems with random BIST methods. Researchers have proposed two general
approaches to preserve the cost advantages of LFSRs while greatly shortening the generated
test sequence. One approach is to insert test points in the CUT to improve controllability and
observability. However, this approach can result in performance loss. Alternatively, one can
introduce some determinism into the generated test sequence, for example, by inserting specific
seed tests known to detect hard faults.
Some CUTs, including data path circuits, contain hard-to-detect faults that are detectable by
only a few test patterns, denoted T_hard. An N-bit LFSR can generate a sequence that
eventually includes 2^N - 1 patterns (essentially all possibilities). However, the probability that
the tests in T_hard will appear early in the sequence is low. In such cases, one can use
deterministic testing, which tailors the generated test sequence to the CUT's functional
properties, instead of random testing. Deterministic testing is especially suited to RAMs,
ROMs, and other highly regular components. A deterministic technique called transparent
BIST [3] applies BIST to RAMs while preserving the RAM contents, a particularly desirable
feature for online testing. Keeping hardware overhead acceptably low is the main difficulty
with deterministic BIST.
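The 2^N - 1 property is easy to demonstrate. The sketch below (Python; a 4-bit Fibonacci
LFSR whose feedback polynomial x^4 + x^3 + 1 is primitive) enumerates the pattern sequence
and confirms that every nonzero 4-bit pattern appears exactly once per period:

```python
def lfsr_patterns(nbits=4, taps=(4, 3), seed=0b0001):
    """Fibonacci LFSR over GF(2): with a primitive feedback polynomial
    (here x^4 + x^3 + 1) it cycles through all 2^N - 1 nonzero states,
    giving a cheap source of pseudo-random test patterns."""
    mask = (1 << nbits) - 1
    state, patterns = seed, []
    for _ in range((1 << nbits) - 1):
        patterns.append(state)
        feedback = 0
        for t in taps:                # XOR of the tapped stages
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return patterns

patterns = lfsr_patterns()
assert len(set(patterns)) == 15       # all 15 nonzero 4-bit patterns
assert 0 not in patterns              # the all-zero state never occurs
```

The hardware cost is just N flip-flops and a few XOR gates, which is why LFSRs dominate
random BIST; the drawback, as the text notes, is that the few patterns in T_hard may appear
very late in the sequence.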
A straightforward way to generate a specific test set is to store it in a ROM and address each
stored test pattern with a counter. Unfortunately, ROMs tend to be much too expensive for
storing entire test sequences. An alternative method is to synthesize a finite-state machine that
directly generates the test set. However, the relatively large test set size and test vector width,
as well as the test set's irregular structure, are much more than current FSM synthesis programs
can handle.

Another group of test generator design methods, loosely called deterministic, attempts to
embed a complete test set in a specific generated sequence. Again, the generated tests must
meet the coverage, overhead, and test size constraints we've discussed. An earlier article [7]
presents a
representative BIST design method for data path circuits that meets these requirements.
[Figure: in normal mode, a multiplexer passes the functional inputs to the circuit under test
(CUT); in test mode, it passes the test pattern sequence from the test generator. A response
monitor observes the CUT outputs and raises an error signal; a control block coordinates the
test.]
Fig. 42.1 A General BIST Scheme
An Example

IEEE 1149.4-based Architecture for OLT of a Mixed-Signal SoC

Analog/mixed-signal blocks like DC-DC converters, PLLs, ADCs, etc. and digital modules

motion of the dual tandem hydraulic jack. The motion of the spool of the hydraulic servo valve
(master control valve) regulates the flow of oil to the tandem jacks, thereby determining the
ram position. The spool and ram positions are controlled by means of feedback loops. The
actuator system is controlled by the on-board flight electronics. A lot of work has been done on
on-line fault detection and diagnosis of the mechanical system; however, OLT of the electronic
systems has hardly been looked into. It is to be noted that since electro-hydraulic actuators are
mainly used in mission-critical systems like avionics, for reliable operation on-line fault
detection and diagnosis is required for both the mechanical and the electronic sub-systems.

The IEEE 1149.1 and 1149.4 circuitry is utilized to perform the BIST of the interconnecting
buses between the cores. It may be noted that on-line tests are carried out only for cores that are
more susceptible to failures. However, the interconnecting buses are tested during start-up and
at intervals when the cores connected by them are idle. The test scheduling logic can be
designed as suggested in [10].
The following three classes of tests are carried out in the SoC:

1. Interconnect test of the interconnecting buses (BIST)

Interconnect testing is to detect open circuits in the interconnect between the cores, and to
detect and diagnose bridging faults anywhere in the interconnect, regardless of whether the
lines normally carry digital or analog signals. This test is performed by the EXTEST
instruction, and digital test patterns are generated from the pre-programmed test controller.
2. Parametric test of the interconnecting buses (BIST)

Parametric test permits analog measurements using analog stimuli and responses. This test is
also performed by the EXTEST instruction. For this, only three values of analog voltage, viz.
VH = VDD, VL = VDD/3, and VG = VSS, are given as test inputs by the controller, and the
voltage at the output of the line under test is sampled after one-bit coarse digitization, as
mentioned in the IEEE 1149.4 standard.
3. Internal test of the cores (concurrent tests)

This test is performed by the INTEST instruction and enables the on-line monitors placed on
each of the cores present in the SoC. This test can be enabled concurrently with SoC operation
and need not be synchronized to the start-up of normal SoC operation. The asynchronous start-
up/shutdown of the on-line testers facilitates power saving and higher reliability of the test
circuitry compared to the functional circuit.
7. References

1) M.R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995.
2) K.K. Saluja, R. Sharma, and C.R. Kime, "A Concurrent Testing Technique for Digital
Circuits," IEEE Trans. Computer-Aided Design, Vol. 7, No. 12, Dec. 1988, pp. 1250-1259.
3) M. Nicolaidis, "Theory of Transparent BIST for RAMs," IEEE Trans. Computers, Vol. 45,
No. 10, Oct. 1996, pp. 1141-1156.
[Figure: SoC block diagram. An application-specific processor with a 16 kB data RAM, XTAL
timing, and a clock divider connects over the system bus to an ADC and a DAC, which
interface to the electro-hydraulic actuator system (simulated in LabVIEW on a PC). An on-
chip test controller (JTAG interface: TDI, TMS, TCK, TDO) drives the test voltages VH, VL,
and VG and the analog buses AB1 and AB2. A DC/DC converter with battery and charger
supplies power to the cores. The legend distinguishes data and control paths, the IEEE
1149.4/1149.1 boundary-scan bus, and the 1149.4 analog buses AB1 and AB2.]
Fig. 42.2 Block Diagram of the SoC Representing On-Line Test Capability