Module 1
Introduction
Version 2 EE IIT, Kharagpur 1
Lesson 1
Introduction to Real Time
Embedded Systems Part I
Downloaded from www.citystudentsgroup.blogspot.com
Instructional Objectives
After going through this lesson the student would be able to
Fig. 1.1 Mobile Phones
From the above specifications it is clear that a mobile phone is a very complex device which houses a number of miniature gadgets functioning coherently in a single device.
Moreover, each of these embedded gadgets, such as the digital camera or the FM radio along with the telephone, has a number of operating modes. For example:
you may like to adjust the zoom of the digital camera,
you may like to reduce the screen brightness,
you may like to change the ring tone,
you may like to relay a specific song from your favorite FM station to your friend using your mobile,
you may like to use it as a calculator, address book, emailing device etc.
These variations in the functionality can only be achieved by a very flexible device.

This flexible device sitting at the heart of the circuits is none other than a Customized Microprocessor, better known as an Embedded Processor, and the mobile phone housing a number of functionalities is known as an Embedded System.

Since it satisfies the requirements of a number of users at the same time (you and your friend, you and the radio station, you and the telephone network etc.) it is working within a time-constraint, i.e. it has to satisfy everyone with the minimum acceptable delay. We call this working in Real Time. This is unlike your holidaying attitude when you take the clock in your stride.

We can also say that it does not make us wait long for taking our words and relaying them as well as receiving them, unlike an email server, which might take days to receive/deliver your message when the network is congested or slow.

Thus we can name the mobile telephone a Real Time Embedded System (RTES).

Definitions

Now we are ready to take some definitions.
Real Time

Real-time usually means time as prescribed by external sources.
For example, the time struck by a clock (however fast or late it might be), or the timings generated by your requirements. You may like to call someone at midnight and send him a picture. These external timing requirements imposed by the user are the real time for the embedded system.
Embedded (Embodiment)

Embodied phenomena are those that by their very nature occur in real time and real space.
In other words, a number of systems coexist to discharge a specific function in real time.
Thus a Real Time Embedded System (RTES) is precisely the union of subsystems that discharge a specific task coherently. Henceforth we shall call them RTES. RTES as a generic term may mean a wide variety of systems in the real world. However, we will be concerned with those which use programmable devices such as microprocessors or microcontrollers and have specific functions. We shall characterize them as follows.
Characteristics of an RTES

Single-Functioned

Here single-functioned means specific functions. The RTES is usually meant for very specific functions. Generally a special purpose microprocessor executes a program over and over again for a specific purpose. If the user wants to change the functionality, e.g. changing the mobile phone from conversation mode to camera or calculator mode, the program gets flushed out and a new program is loaded which carries out the requisite function. These operations are monitored and controlled by an operating system called a Real Time Operating System (RTOS), which has much simpler complexity but more rigid constraints as compared to conventional operating systems such as Microsoft Windows and Unix.
Tightly Constrained

The constraints on the design and marketability of an RTES are more rigid than those of its non-real-time counterparts.
Reactive and Real Time

Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the computer user, typically does not result in a system failure.
4. Very few in India will be interested in buying a mobile phone if it costs Rs 50,000/- even if it provides a faster processor with 200MB of memory to store your addresses and your favorite mp3 music, plays them, acts as a small-screen TV whenever you desire, and takes your calls intelligently. However, in the USA a majority can afford it!
However, for the sake of our understanding we can discuss some common forms of systems at the block diagram level. Any system can be hierarchically divided into subsystems. Each subsystem may be further segregated into smaller systems. And each of these smaller systems may consist of some discrete parts. This is called the Hardware configuration.

Some of these parts may be programmable and therefore must have some place to keep their programs. In an RTES the on-chip or on-board non-volatile memory keeps these programs. These programs are part of the Real Time Operating System (RTOS) and continually run as long as the gadget is receiving power. A part of the RTOS also executes in stand-by mode while taking very little power from the battery. This is also called the sleep mode of the system.
Both the hardware and software coexist in a coherent manner. Tasks which can be carried out by both software and hardware affect the design process of the system. For example a multiplication may be done by hardware, or it can be done by software through repeated additions. Hardware based multiplication improves the speed at the cost of increased complexity of the arithmetic logic unit (ALU) of the embedded processor. On the other hand software based multiplication is slower but the ALU is simpler to design. These are some of the conflicting requirements which need to be resolved against the requirements imposed by the overall system. This is known as Hardware-Software Codesign or simply Codesign.
Let us treat both the hardware and the imbibed software in the same spirit and treat them as systems or subsystems. Later on we shall see where to put them together and how. Thus we can now draw a hierarchical block diagram representation of the whole system as follows:
System
Subsystems
Components
= interfaces
= key interface standards
= uses open standards
Fig. 1.2 The System Interface and Architecture
t
The red and grey spheres in Fig.1.2n represent interface standards. When a system is
assembled it starts with some chassis ordaesingle subsystem. Subsequently subsystems are added
onto it to make it a complete system. u
s t
Let us take the example of i tya Desktop Computer. Though not an Embedded System it can
.c a system from its subsystems.
give us a nice example of assembling
w a desktop computer (Fig.1.3) starting with the chassis and then
w
You can start assembling
w mode power supply), motherboard, followed by hard disk drive,
take the SMPS (switched
CDROM drive, Graphic Cards, Ethernet Cards etc. Each of these subsystems consists of several
components e.g. Application Specific Integrated Circuits (ASICs), microprocessors, Analog as
well as Digital VLSI circuits, Miniature Motor and its control electronics, Multilevel Power
supply units crystal clock generators, Surface mounted capacitors and resistors etc. In the end
you close the chassis and connect Keyboard, Mouse, Speakers, Visual Display Units, Ethernet
Cable, Microphone, Camera etc fitting them into certain well-defined sockets.
As we can see that each of the subsystems inside or outside the Desktop has cables fitting
well into the slots meant for them. These cables and slots are uniform for almost any Desktop
you choose to assemble. The connection of one subsystem into the other and vice-versa is known
as Interfacing. It is so easy to assemble because they are all standardized. Therefore,
standardization of the interfaces is most essential for the universal applicability of the system and
its compatibility with other systems. There can be open standards which makes it exchange
information with products from other companies. It may have certain key standards, which is
only meant for the specific company which manufactures them.
SMPS
CDROM drive
Interface Cables
Mother Board
Fig. 1.3 Inside a Desktop Computer
i t
A Desktop Computer will c
. have more open standards than an Embedded System. This is
w in the later. Many of the components of the embedded systems
because of the level of integration
w chip. This concept is known as System on Chip (SOC) design. Thus
are integrated on to a single
w
there are only few subsystems left to be connected.
Analyzing the assembling process of a Desktop, let us comparatively assess the possible subsystems of a typical RTES. One such segregation is shown in Fig. 1.4. The explanation of the various parts is as follows:

User Interface: for interacting with users. May consist of keyboard, touch pad etc.
ASIC (Application Specific Integrated Circuit): for specific functions like motor control, data modulation etc.
Microcontroller (µC): a family of microprocessors.
Real Time Operating System (RTOS): contains all the software for the system control and user interface.
Controller Process: the overall control algorithm for the external process. It also provides timing and control for the various units inside the embedded system.
Digital Signal Processor (DSP): a typical family of microprocessors.
DSP assembly code: code for the DSP stored in program memory.
Dual Ported Memory: data memory accessible by two processors at the same time.
CODEC: compressor/decompressor of the data.
User Interface Process: the part of the RTOS that runs the software for user interface activities.
Controller Process: the part of the RTOS that runs the software for timing and control amongst the various units of the embedded system.
User Interface Process | Controller Process
ASIC | µC | RTOS
System Bus
DSP assembly code | Digital Signal Processor | Digital Signal Processor | DSP assembly code
Dual-port memory | CODEC
Hardware | Software
Fig. 1.4 Architecture of an Embedded System
The above architecture represents a hypothetical Embedded System (we will see more realistic ones in subsequent examples). More than one microprocessor (2 DSPs and 1 µC) are employed here to carry out different tasks. As we will learn later, the µC is generally meant for simpler and slower jobs such as carrying out a Proportional Integral (PI) control action or interpreting the user commands etc. The DSP is a more heavy duty processor capable of doing real time signal processing and control. Both the DSPs along with their operating systems and codes are independent of each other. They share the same memory without interfering with each other. This kind of memory is known as dual ported memory or two-way post-box memory. The Real Time Operating System (RTOS) controls the timing requirements of all the devices. It executes the overall control algorithm of the process while diverting more complex tasks to the DSPs. It also specifically controls the µC for the necessary user interactivity. The ASICs are specialized circuits for specific functions like motor control and data modulation.
Ans:
(a) Ceiling Fans: These are not programmable.
(b) & (e) obey all definitions of Embedded Systems such as: (i) working in real time, (ii) programmable, (iii) a number of systems coexist on a single platform to discharge one function (single-functioned).
(c) Television Set: Only a small part of it is programmable. It can work without being programmable. It is not tightly constrained.
(d) Desktop Keyboard: Though it has a processor, normally it is not programmable.
Definition of Real Time Systems

An operation within a larger dynamic system is called a real-time operation if the combined reaction- and operation-time of a task operating on current events or input is no longer than the maximum delay allowed, in view of circumstances outside the operation. The task must also occur before the system to be controlled becomes unstable. A real-time operation is not necessarily fast, as slow systems can allow slow real-time operations. This applies to all types of dynamically changing systems. The polar opposite of a real-time operation is a batch job, with interactive timesharing falling somewhere in between the two extremes.
Alternately, a system is said to be hard real-time if the correctness of an operation depends not only upon the logical correctness of the operation but also upon the time at which it is performed. An operation performed after the deadline is, by definition, incorrect, and usually has no value. In a soft real-time system the value of an operation declines steadily after the deadline expires.
Embedded System
An embedded system is a special-purpose system in which the computer is completely
encapsulated by the device it controls. Unlike a general-purpose computer, such as a personal
computer, an embedded system performs pre-defined tasks, usually with very specific
requirements. Since the system is dedicated to a specific task, design engineers can optimize it,
reducing the size and cost of the product. Embedded systems are often mass-produced, so the
cost savings may be multiplied by millions of items.
Handheld computers or PDAs are generally considered embedded devices because of the
nature of their hardware design, even though they are more expandable in software terms. This
line of definition continues to blur as devices expand.
Ans:
Five advantages:
1. Smaller Size
2. Smaller Weight
3. Lower Power Consumption
4. Lower Electromagnetic Interference
5. Lower Price
Five disadvantages:
1. Lower Mean Time Between Failures
2. Repair and Maintenance is not possible
3. Faster Obsolescence
4. Unmanageable Heat Loss
5. Difficult to Design
Q3. What do you mean by Reactive in Real Time? Cite an example.

Ans:
Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer typically focuses on computations, with relatively infrequent reactions to input devices.
(i) Mobile Telephone (ii)Digital Camera (iii) A programmable calculator (iv) An iPod (v) A
digital blood pressure machine
iPod: The iPod is a brand of portable media players designed and marketed by Apple Computer.
Devices in the iPod family are designed around a central scroll wheel (except for the iPod
shuffle) and provide a simple user interface. The full-sized model stores media on a built-in hard
drive, while the smaller iPods use flash memory. Like many digital audio players, iPods can serve
as external data storage devices when connected to a computer.
Q5. Write the model number and detailed specification of your/friend's mobile telephone.

Manufacturer:
Model:
Network Types: EGSM/GSM/CDMA
Form Factor: The industry standard that defines the physical, external dimensions of a particular device. The size, configuration, and other specifications used to describe hardware.
Battery Life Talk (hrs):
Battery Life Standby (hrs):
Battery Type:

Measurements
Weight:
Dimensions:

Display
Display Type: Colour or Black & White
Display Size (px):
Display Colours:

General Options
Camera:
Mega Pixel:
Email Client:
Games: Yes
High Speed Data:
MP3 Player:
PC Sync: Yes
Phonebook:
Platform Series:
Polyphonic Ring tones:
Predictive Text:
Streaming Multimedia:
Text Messages:
Wireless Internet: Opera

Other Options
Alarm:
Bluetooth:
Calculator:
Calendar:
Data Capable:
EMS:
FM Radio:
Graphics (Custom):
Infrared:
Speaker Phone:
USB:
Vibrate:
Lesson 2
Introduction to Real Time
Embedded Systems Part II
Instructional Objectives
After going through this lesson the student will
Learn more about the numerous day-to-day real time embedded systems
Learn the internal hardware of a typical mobile phone
Learn about the important components of an RTES
Learn more about a mobile phone
Learn about the various important design issues
Also learn the design flow
Pre-Requisite

Digital Electronics, Microprocessors
Common Examples Of Embedded Systems

Some of the common examples of Embedded Systems are given below:

Consumer electronics: cell phones, pagers, digital cameras, camcorders, DVD players, portable video games, calculators, and personal digital assistants etc.

Fig. 2.1(a) Digital Camera
Fig. 2.1(c) Personal Digital Assistants
Home appliances: microwave ovens, answering machines, thermostats, home security systems, washing machines, and lighting systems etc.

Fig. 2.1(d) Microwave Oven
Business equipment: electronic cash registers, curbside check-in, alarm systems, card readers, product scanners, and automated teller machines.

Fig. 2.1(g) Electronic Cash Registers
Fig. 2.1(h) Electronic Card Readers
Automobiles: Electronic Control Units (ECU), which include transmission control, cruise control, fuel injection, antilock brakes, and active suspension in the same or separate modules.
Mobile Phone

Let us take the same mobile phone as discussed in Lesson 1 as an example for illustrating the typical architecture of an RTES.

In general, a cell phone is composed of the following components:
A circuit board (Fig. 2.2)
Antenna
Microphone
Speaker
Liquid crystal display (LCD)
Keyboard
Battery

Fig. 2.2 The Cell Phone Circuitry
RF receiver (Rx)
RF transmitter (Tx)
Microphone
Micro-controller
Display
Keyboard
Fig. 2.3 The block diagram
A typical mobile phone handset (Fig. 2.3) should include standard I/O devices (keyboard, LCD), plus a microphone, speaker and antenna for wireless communication. The Digital Signal Processor (DSP) performs the signal processing, and the micro-controller controls the user interface, battery management, call setup etc. The performance specification of the DSP is very crucial since the conversion has to take place in real time. This is why almost all cell phones contain such a special processor dedicated to making digital-to-analog (DA) and analog-to-digital (AD) conversions and real time processing such as modulation and demodulation etc. The Read Only Memory (ROM) and flash memory (electrically erasable and programmable memory) chips provide storage for the phone's operating system (RTOS) and various data such as phone numbers, calendar information, games etc.
1. Microprocessor

This is the heart of any RTES. The microprocessors used here are different from general purpose microprocessors like the Pentium, Sun SPARC etc. They are designed to meet some specific requirements. For example the Intel 8048 is a special purpose microprocessor which you will find in the keyboard of your Desktop computer. It is used to scan the keystrokes and send them in a synchronous manner to your PC. Similarly mobile phones and digital cameras use special purpose processors for voice and image processing. A washer and dryer may use some other type of processor for real time control and instrumentation.
2. Memory

The microprocessor and memory must coexist on the same Printed Circuit Board (PCB) or the same chip. Compactness, speed and low power consumption are the characteristics required of the memory to be used in an RTES. Therefore, very low power semiconductor memories are used in almost all such devices. For housing the operating system, Read Only Memory (ROM) is used.

The program or data loaded might exist for a considerable duration. User defined setups also exist in an RTES, much like changing the setup of your Desktop Computer. For example you may like to change the ring tone of your mobile and keep it for some time, or you may like to change the screen color etc. In these cases the memory should be capable of retaining the information even after the power is removed. In other words the memory should be non-volatile.
3. Input Output Devices and Interfaces

Input/Output interfaces are necessary to make the RTES interact with the external world. They could be Visual Display Units such as TFT screens in a mobile phone, touch pads, keyboards, speakers, antennas, microphones etc. These RTES should also have open interfaces to other devices such as Desktop Computers, Local Area Networks (LAN) and other RTES. For example you may like to download your address book into your personal digital assistant (PDA). Or you may like to download some mp3 songs from your favorite internet site into your mp3 player. These input/output devices along with standard software protocols in the RTOS provide the necessary interface to these standards.
1. A memory technology similar in characteristics to EPROM (Erasable Programmable Read Only Memory) memory,
with the exception that erasing is performed electrically instead of via ultraviolet light, and, depending upon the
organization of the flash memory device, erasing may be accomplished in blocks (typically 64k bytes at a time)
instead of the entire device.
4. Software

The RTES is just a physical body as long as it is not programmed. It is like the human body without life. Whenever you switch on your mobile telephone you might have marked some activities on the screen. Whenever you move from one city to another you might have noticed the changes on your screen. Or when you have gone for a picnic away from your city you might have marked the no-signal sign. These activities are taken care of by the Real Time Operating System sitting in the non-volatile memory of the RTES.
Besides the above an RTES may have various other components and Application Specific
Integrated Circuits (ASIC) for specialized functions such as motor control, modulation,
demodulation, CODEC.
The design of a Real Time Embedded System has a number of constraints. The following section discusses these issues.

Design Issues

The constraints in embedded systems design are imposed by external as well as internal specifications. Design metrics are introduced to measure the cost function, taking into account the technical as well as economic considerations.
Design Metrics

A Design Metric is a measurable feature of the system's performance, cost, time for implementation, safety etc. Most of these are conflicting requirements, i.e. optimizing one shall not optimize the other: e.g. a cheaper processor may have a lousy performance as far as speed and throughput are concerned.

The following metrics are generally taken into account while designing embedded systems.
NRE cost (nonrecurring engineering cost)

It is the one-time cost of designing the system. Once the system is designed, any number of units can be manufactured without incurring any additional design cost; hence the term nonrecurring.
Suppose three technologies are available for use in a particular product. Assume that
implementing the product using technology A would result in an NRE cost of $2,000 and unit
cost of $100, that technology B would have an NRE cost of $30,000 and unit cost of $30, and
that technology C would have an NRE cost of $100,000 and unit cost of $2. Ignoring all other
design metrics, like time-to-market, the best technology choice will depend on the number of
units we plan to produce.
Unit cost
The monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size
The physical space required by the system, often measured in bytes for software, and gates or
transistors for hardware.
Performance
The execution time of the system
Power Consumption
It is the amount of power consumed by the system, which may determine the lifetime of a
battery, or the cooling requirements of the IC, since more power means more heat.
Flexibility

The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
Time-to-prototype

The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but which can be used to verify the system's usefulness and correctness and to refine the system's functionality.
Time-to-market

The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.
Maintainability
It is the ability to modify the system after its initial release, especially by designers who did not
originally design the system.
Correctness
This is the measure of the confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
Throughput
This is the number of tasks that can be processed per unit time. For example, a camera may be able to process 4 images per second.
These are some of the cost measures for developing an RTES. Optimization of the overall cost of design includes each of these factors taken with some multiplying factors depending on their importance. And the importance of each of these factors depends on the type of application. For instance in defense related applications, while designing an anti-ballistic system, the execution time is the deciding factor. On the other hand, for de-noising a photograph in an embedded camera in your mobile handset, the execution time may be a little relaxed if it can bring down the cost and complexity of the embedded Digital Signal Processor.

The design flow of an RTES involves several steps. The cost and performance are tuned and fine-tuned in a recursive manner. An overall design methodology is enumerated below.
Design Methodology (Fig. 2.4)

System Requirement and Specifications
Define the problem: what is your embedded system required to do?
Define the requirements (inputs, outputs, control): what are the inputs and outputs of your system?
Write down the specifications for them: specify if the signals are in digital or analogue form; specify the voltage levels, frequency etc.

The design task can be further segregated into the following steps.
Questions

Q1. Give one example of a typical embedded system other than those listed in this lecture. Draw the block diagram and discuss the function of the various blocks. What type of embedded processor do they use?
Ans:
A GPS receiver receives signals from a constellation of at least four out of a total of 24 satellites.
Based on the timing and other information signals sent by these satellites the digital signal
processor calculates the position using triangulation.
The major block diagram is divided into (1) Active Antenna System (2)RF/IF front end (3) The
Digital Signal Processor(DSP)
The Active Antenna System houses the antenna, a band pass filter and a low noise amplifier (LNA).
The RF/IF front end houses another band pass filter, the RF amplifier and the demodulator and
A/D converter.
The DSP accepts the digital data and decodes the signal to retrieve the information sent by the
GPS satellites.
Q2. Discuss about the Hard Disk Drive housed in your PC. Is it an RTES?
Ans:
Hard drives have two kinds of components: internal and external. External components are located on a printed circuit board called the logic board, while internal components are located in a sealed chamber called the HDA or Hard Drive Assembly.

For details browse http://www.hardwaresecrets.com/article/177/3
The big circuit is the controller. It is in charge of everything: exchanging data between the hard
drive and the computer, controlling the motors on the hard drive, commanding the heads to read
or write data, etc.
All these tasks are carried out as demanded by the processor sitting on the motherboard. It can be verified to be single-functioned, tightly constrained, and reactive.
Ans:
The time required to develop a system to the point that it can be released and sold to customers.
The main contributors are design time, manufacturing time, and testing time. This metric has
become especially demanding in recent years. Introducing an embedded system to the
marketplace early can make a big difference in the system's profitability.
Moore's law is the empirical observation that the complexity of integrated circuits, with respect to minimum component cost, doubles every 24 months. It is attributed to Gordon E. Moore, a co-founder of Intel.
Lesson 3
Embedded Systems Components Part I
Instructional Objectives
After going through this lesson the student would
Pre-Requisite

Digital Electronics, Microprocessors

Introduction
The various components of an Embedded System can be hierarchically grouped, from System
Level Components down to Transistor Level Components. A system (subsystem) component is
different from what is considered a "standard" electronic component. Standard components are
the familiar active devices such as integrated circuits, microprocessors, memory, diodes,
transistors, etc., along with passives such as resistors, capacitors, and inductors. These are the
basic elements needed to mount on a circuit board for a customized, application-specific design.
A system component, on the other hand, has active and passive components mounted on
circuit boards that are configured for a specific task (Fig. 3.1). System components can be either
single- or multi-function modules that serve as highly integrated building blocks of a system. A
system component can be as simple as a digital I/O board or as complex as a computer with
video, memory, networking, and I/O all on a single board. System components support industry
standards and are available from multiple sources worldwide.
Fig. 3.1 The Hierarchical Components: System, Subsystems (PCBs), and Gate Level Components (generally inside the Integrated Circuits, rarely outside)

Structure of an Embedded System
The typical structure of an embedded system is shown in Fig. 3.2. This can be compared
with that of a Desktop Computer as shown in Fig. 3.3. Normally in an embedded system the
primary memory, central processing unit and many peripheral components including analog-to-
digital converters are housed on a single chip. These single chips are called Microcontrollers.
This is shown by the dotted lines in Fig. 3.2.
On the other hand, a desktop computer may contain all these units on a single Printed
Circuit Board (PCB) called the Mother Board. Since these computers handle a much larger
dimension of data as compared to embedded systems, there have to be elaborate arrangements
for storage and faster data transfer between the CPU and memory, between the CPU and
input/output devices, and between memory and input/output devices. The storage is accomplished
by cheaper secondary memories like Hard Disks and CDROM drives. The data transfer process
is improved by incorporating multi-level cache and direct memory access methods. Generally no
such arrangements are necessary for embedded systems. Because of the number of heterogeneous
components in a desktop computer, the power supply is required at multiple voltage levels
(typically 12, 5, 3.3, 2.5 volts). On the other hand an Embedded System chip may need just
one DC power supply level (typically +5V).
In a desktop computer various units operate at different speeds. Even the units inside a
typical CPU such as the Pentium-IV may operate at different speeds. The timing and control units
are complex and provide multi-phase clock signals to the CPU and other peripherals at different
voltage levels. The timing and control unit for an Embedded System may be much simpler.
Fig. 3.2 The typical structure of an Embedded System: Central Processing Unit, Primary Memory, Power Supply, AD Converter (Analog to Digital Converter), UART (Universal Asynchronous Receiver and Transmitter)
Fig. 3.3 The typical structure of a Desktop Computer: Microprocessor, Cache Memory, Primary Memory, Direct Memory Access, Power Supply, and Input/Output Interfaces (Keyboard, Hard Disk Drive, Network Card, Video Display Units)
Typical Example
A Single Board Computer (SBC)
Since you are familiar with Desktop Computers, we should see how to make a desktop
PC on a single printed circuit board. Such boards are called Single Board Computers (SBCs).
These SBCs are typical embedded systems, generally custom-made for industrial
applications. In the introductory lessons you should have done some exercises on your PC.
Now try to compare this SBC with your desktop.
Let us look at an example of a single board computer, the EBC-C3PLUS SBC from
WinSystems [1].
Fig. 3.4 The Single Board Computer (SBC)
Let us discuss and try to understand the features of the above single-board embedded computer.
This will pave the way to understanding more complex System-on-Chip (SoC) type systems.
The various units and their specifications are as follows:
VIA 733MHz or 1 GHz low-power C3 processor, EBX-compliant board (Fig. 3.5)
This is the processor on this SBC. VIA is the company which manufactures the
processor (www.via.com.tw); 733MHz or 1GHz is the clock frequency of this processor. C3 is
the brand name, as P3 and P4 are for Intel. (You must be familiar with Intel processors, as your PC
has one.)

[1] Courtesy WinSystems, Inc., 715 Stadium Drive, Arlington, Texas 76011
http://sbc.winsystems.com/products/sbcs/ebcc3plus.html
Fig. 3.5 The Processor
32 to 512MB of system PC133 SDRAM supported in a 168-pin DIMM socket
32 to 512 MB indicates the possible Random Access Memory size on the SBC. SDRAM stands for
Synchronous Dynamic RAM. We will learn more about this in the memory chapter. A 168-pin
DIMM is a Dual-In-line Memory Module, which holds the memory chips and can fit into
the board easily.
DIMMs look like this:

Fig. 3.6 DIMM
Socket for up to 1 Gigabyte bootable DiskOnChip, or 512KB SRAM, or 1MB EPROM
These are Static RAMs (SRAM) or EPROMs which house the operating system, just like the
Hard Disk in a Desktop computer.
Type I and Type II Compact Flash (CF) cards supported
This is otherwise known as a semiconductor hard disk or floppy disk.
Flash memory is an advanced form of Electrically Erasable and Programmable Read Only
Memory (EEPROM). Type I and Type II are just two different designs, Type II being more
compact and more recent.
AC'97 audio
AC'97 provides system developers with a standardized specification for integrated PC audio devices. AC'97
defined a high-quality audio architecture for the PC and is capable of delivering up to 96kHz/20-bit
playback in stereo and 48kHz/20-bit in multi-channel playback modes.
PC/104 expansion
PC/104 cards are much smaller than the ISA-bus cards found in PCs and stack together, which
eliminates the need for a motherboard, backplane, and/or card cage.
AT keyboard controller and PS/2 mouse support
The AT keyboard was an 84-key keyboard introduced with the PC/AT. It was later replaced with the 101-key
Enhanced Keyboard.
Two interrupt controllers and 7 DMA channels; three 16-bit counter/timers; Real Time Clock;
Watchdog Timer; and Power-On Self Test
The interrupt controllers, DMA channels, counter/timers and Real Time Clock are used for real-time
applications.
Specifications
+5 volt only operation
Mechanical
Dimensions: 5.75" x 8.0" (146mm x 203mm)
Jumpers: 0.025" square posts
Connectors
Serial, Parallel, Keyboard: 50-pin on 0.100" grid
COM3 & 4: 20-pin on 0.100" grid
Floppy Disk Interface: 34-pin on 0.100" grid
EIDE Interface: 40-pin on 0.100" grid (Primary)
44-pin on 2mm grid (Primary)
40-pin on 0.100" grid (Secondary)
50-pin 2mm Flash connector
Parallel I/O: Two, 50-pin on 0.100" grid
Conclusion
It is apparent from the above example that a typical embedded system consists, by and large, of the
following units housed on a single board or chip:
1. Processor
2. Memory
3. Input/Output interface chips
4. I/O Devices, including Sensors and Actuators
5. A-D and D-A converters
6. Software as operating system
7. Application Software
One or more of the above units can be housed on a single PCB or a single chip.
In a typical Embedded System the Microprocessor, a large part of the memory and major I/O
devices are housed on a single chip called a microcontroller. Being custom-made, embedded
systems are required to function for specific purposes with little user programmability. The user
interaction is converted into a series of commands which are executed by the RTOS by calling
various subroutines. The RTOS is stored in a flash memory or read-only memory. There will be
additional scratch-pad memory for temporary data storage. If the CPU sits on the same chip as the
memory, then a part of the memory can be used for scratch-pad purposes. Otherwise a number of
CPU registers will be required for the same. The CPU communicates with the memory through the
address and data bus. The timing and control of these data exchanges is handled by the control
unit of the CPU via the control lines. The memory which is housed on the same chip as the CPU
has the fastest transfer rate. This is also known as the memory bandwidth or bit rate. The
memory outside the processor chip is slower and hence has a lower transfer rate. On the other
hand, Input/Output devices have varied degrees of bandwidth. These varying data
transfer rates are handled in different ways by the processor. The slower devices need interface
chips. Generally chips which are faster than the microprocessor are not used.
The architecture of a typical embedded system is shown in Fig. 3.8. The hardware unit consists of
the above units along with a digital as well as an analog subsystem. The software, in the form of an
RTOS, resides in the memory.
Fig. 3.8 Architecture of a typical embedded system: hardware (digital subsystem, analog subsystem, mechanical subsystem, optical subsystem, sensors and actuators) and software
Questions and Answers
Q1. What are the hierarchical components in an embedded system design?
Ans:
The hierarchical components, as shown in Fig. 3.1, are: System, Subsystems (PCBs), and Gate Level Components.
Q2. What is LVDS?
Ans:
LVDS stands for Low Voltage Differential Signaling. The advantages of such a standard are low noise
and low interference, so that one can increase the data transmission rate. Instead of 0 and 5 V,
a voltage level of 1.5 or 3.3 V is used for High and 0 or 1 V is used for Low. The lower Low-to-High
voltage swing reduces interference, and the differential mode rejects common-mode noise.
Q.3. Is there any actuator in your mobile phone?
Ans:
There is a vibrator in a mobile phone which can be activated to indicate an incoming call or
message. Generally there is a coreless motor which is operated by the microcontroller for
generating the vibration.
Lesson
4
Embedded Systems
Components Part II
Overview of Components
Instructional Objectives
After going through this lesson the student would be able to:
Pre-Requisite
Digital Electronics, Microprocessors
You are now almost familiar with the various components of an embedded system. In this
chapter we shall discuss some of the general components, such as:
Processors
Memory
Input/Output Devices
Processors
The central processing unit is the most important component in an embedded system. It exists in
an integrated manner along with memory and other peripherals. Depending on the type of
application, processors are broadly classified into 3 major categories:
1. General Purpose Microprocessors
2. Microcontrollers
3. Digital Signal Processors
For more specific applications customized processors can also be designed. Unless the demand is
high the design and manufacturing cost of such processors will be high. Therefore, in most of the
applications the design is carried out using already available processors in the market. However,
the Field Programmable Gate Arrays (FPGA) can be used to implement simple customized
processors easily. An FPGA is a type of logic chip that can be programmed. They support
thousands of gates which can be connected and disconnected like an EPROM (Erasable
Programmable Read Only Memory). They are especially popular for prototyping integrated
circuit designs. Once the design is set, hardwired chips are produced for faster performance.
General purpose processors are generally cheap because they are manufactured in large numbers. The NRE (Non-Recurring
Engineering) cost (Lesson 1) is spread over a large number of units. Being cheaper, the
manufacturer can invest more in improving the VLSI design with advanced, optimized
architectural features. Thus the performance, size and power consumption can be improved.
In most cases the design tools for such processors are provided by the manufacturer. Also, the
supporting hardware is cheap and easily available. However, only a part of the processor's
capability may be needed for a specific design, and hence the overall embedded system will not
be as optimized as it could have been as far as space, power and reliability are concerned.
Fig. 4.1 The architecture of a General Purpose Processor: a Control Unit (Controller, Program Counter PC, Instruction Register IR) and a Datapath (ALU, Registers, Control/Status), connected to Memory and I/O
The Pentium IV is such a general purpose processor, with the most advanced architectural features.
Compared to its overall performance, its cost is also low.
A general purpose processor consists of a datapath and a control unit, tightly linked with the
memory (Fig. 4.1).
The Datapath consists of circuitry for transforming data and for storing temporary data. It
contains an arithmetic logic unit (ALU) capable of transforming data through operations such as
addition, subtraction, logical AND, logical OR, inverting, shifting etc. The datapath also
contains registers capable of storing temporary data generated by the ALU or related operations.
The internal data bus carries data within the datapath, while the external data bus carries data to
and from the data memory. The size of the datapath indicates the bit size of the CPU; an 8-bit
datapath means an 8-bit CPU, such as the 8085.
The Control Unit consists of circuitry for retrieving program instructions and for moving data to,
from, and through the datapath according to those instructions. It has a program counter (PC) to
hold the address of the next program instruction to fetch and an instruction register (IR) to hold
the fetched instruction. It also has a timing unit in the form of state registers and control logic.
The controller sequences through the states and generates the control signals necessary to read
instructions into the IR and to control the flow of data in the datapath. Generally the address size is
specified by the control unit, as it is responsible for communicating with the memory. For each
instruction the controller typically sequences through several stages, such as fetching the
instruction from memory, decoding it, fetching the operands, executing the instruction in the
datapath, and storing the results. Each stage takes a few clock cycles.
Microcontroller
Just as you put all the major components of a Desktop PC onto a Single Board Computer (SBC),
if you put all the major components of a Single Board Computer onto a single chip, it is
called a Microcontroller. Because of the limitations of VLSI design, most of the
input/output functions exist in a simplified manner. The typical architecture of such a
microcontroller is shown in Fig. 4.2.
Fig. 4.2 The architecture of a typical microcontroller, the C500 from Infineon Technologies, Germany: the C500 core (1 or 8 datapointers) with on-chip ROM, IRAM, XRAM, Serial Port, Parallel Ports, Timers, Interrupt Controller, Peripheral Bus Access Control, External Control (RST, EA, PSEN, ALE, XTAL, Port0/Port2), Housekeeper, MDU and WDU
*The double-lined blocks are core to the processor. Other blocks are on-chip
The external control block handles the external control signals and clock generation.
The access control unit is responsible for the selection of the on-chip memory resources.
The IRAM provides the internal RAM, which includes the general purpose registers.
The XRAM is an additional internal RAM that is sometimes provided.
The interrupt requests from the peripheral units are handled by an Interrupt Controller Unit.
Serial interfaces, timers, capture/compare units, A/D converters, watchdog units (WDU), and
multiply/divide units (MDU) are typical examples of on-chip peripheral units. The external
signals of these peripheral units are available at multifunctional parallel I/O ports or at dedicated
pins.
Digital Signal Processors
Digital Signal Processors are designed based on the modified Harvard Architecture to handle
real-time signals. The features of these processors are suitable for implementing signal processing
algorithms. One of the common operations required in such applications is array multiplication;
for example, convolution and correlation require array multiplication. This is accomplished by
multiplication followed by accumulation and addition, generally carried out by Multiplier
and Accumulator (MAC) units. Sometimes this is known as MACD, where D stands for Data
move. Generally all the instructions are executed in a single cycle.
[Figure: the modified Harvard architecture of a DSP. The Processing Unit exchanges results/operands with a Data Memory over its own address lines, while the Control Unit fetches instructions from a separate Program Memory over its own address lines; status and opcode signals pass between the two units.]
The Very Long Instruction Word (VLIW) architecture is also suitable for Signal Processing
applications. This has got a number of functional units and data paths as seen in Fig. 4.5. The
long instruction words are fetched from the memory. The operands and the operation to be
performed by the various units are specified in the instruction itself. The multiple functional
units share a common multi-ported register file for fetching the operands and storing the results.
Parallel random access to the register file is possible through the read/write crossbar.

Fig. 4.5 Block diagram of the VLIW architecture: a Program Control Unit and Instruction Cache feeding Functional Units 1 ... n, which share a Multi-ported Register File through a Read/Write Cross Bar
Microprocessors vs Microcontrollers
A microprocessor is a general-purpose digital computer's central processing unit. To make a
complete microcomputer, you add memory (ROM and RAM), memory decoders, an oscillator,
and a number of I/O devices. The prime use of a microprocessor is to read data, perform
extensive calculations on that data, and store the results in a mass storage device or display
them. These processors have complex architectures with multiple stages of pipelining and
parallel processing. The memory is divided into stages, such as multi-level cache and RAM. The
development time of General Purpose Microprocessors is high because of the very complex VLSI
design.
Fig. 4.6 A Microprocessor-based System: the microprocessor with external ROM, EEPROM, RAM, Serial I/O, Parallel I/O, A/D and D/A converters (analog I/O), PWM, Timer, and input and output ports
The design of the microcontroller is driven by the desire to make it as expandable and flexible
as possible. Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to
on-chip I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip
hardware for I/O, RAM and ROM, they usually have a fairly low-performance CPU.
Microcontrollers also often have timers that generate interrupts and can thus be used with the
CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a
microcontroller is to control the operations of a machine using a fixed program that is stored in
ROM and does not change over the lifetime of the system. The microcontroller is concerned with
getting data from and to its own pins; the architecture and instruction set are optimized to handle
data in bit and byte sizes.
Fig. 4.7 A Microcontroller: on-chip ROM, EEPROM, RAM, Parallel I/O and Timer, with a PWM unit whose filtered output provides an analog output and whose raw output provides digital PWM
The contrast between a microcontroller and a microprocessor is best exemplified by the fact that
most microprocessors have many operation codes (opcodes) for moving data from external
memory to the CPU, while microcontrollers may have one or two. Conversely, microprocessors
may have one or two types of bit-handling instructions, while microcontrollers will have many.
A basic Microprocessor vs a basic DSP

[Figure: memory organization in a DSP, with separate Program Memory and Data Memory connected to the processor.]
5. Very limited SIMD (Single Instruction Multiple Data) features, and specialized, complex instructions
6. Multiple operations per instruction
7. Dedicated address generation units
8. Specialized addressing: auto-increment, modulo (circular), bit-reversed
9. Hardware looping
10. Interrupts disabled during certain operations
11. Limited or no register shadowing
12. Rarely have dynamic features
13. Relatively narrow range of DSP-oriented on-chip peripherals and I/O interfaces
14. Synchronous serial port
Fig. 4.9 Memory Organization in a General Purpose Processor: a single memory, holding both program and data, connected to the processor
Characterization of a General Purpose Processor
1. CPUs for PCs and workstations, e.g., Intel Pentium IV
2. Von Neumann architecture
3. Typically 1 memory access per cycle
4. Most operations take more than 1 cycle
5. General-purpose instructions; typically only one operation per instruction
6. Often, no separate address generation units
7. General-purpose addressing modes
8. Software loops only
9. Interrupts rarely disabled
10. Register shadowing common
11. Dynamic caches are common
12. Wide range of on-chip and off-chip peripherals and I/O interfaces
13. Asynchronous serial port
Memory
Memory serves the processor's short- and long-term information storage requirements, while
registers serve the processor's short-term storage requirements. Both the program and the data
are stored in the memory. When the data and program occupy the same memory, this is known as
the Princeton Architecture. In the Harvard Architecture the program and the data occupy separate
memory blocks. The former leads to a simpler architecture; the latter needs two separate sets of
connections, so the data and program accesses can proceed in parallel, leading to parallel
processing. General purpose processors use the Princeton Architecture.
In a typical processor, when the CPU needs data it first looks in its own data registers. If
the data isn't there, the CPU looks to see if it's in the nearby Level 1 cache. If that fails, it's off to
the Level 2 cache. If it's nowhere in cache, the CPU looks in main memory. Not there? The CPU
gets it from disk. All the while, the clock is ticking and the CPU is sitting there waiting.
Input/Output Devices and Interface Chips
Typical RTES interact with the environment and users through some inbuilt hardware.
Occasionally external circuits are required for communicating with the user, other computers or a
network.
In the mobile handset discussed earlier, the input/output devices are the keyboard, the display
screen, the antenna, the microphone, the speaker, LED indicators etc. The signals to these units may
be analog or digital in nature. To generate an analog signal from the microprocessor we need a
Digital-to-Analog Converter (DAC), and to accept an analog signal we need an Analog-to-Digital
Converter (ADC). These DACs and ADCs again have certain control modes. They may also
operate at a different speed than the microprocessor. To synchronize and control these interface
chips we may need another interface chip. Similarly we may have interface chips for the keyboard,
screen and antenna. These chips serve as relaying units to transfer data between the processor
and the input/output devices. The input/output devices are generally slower than the processor;
therefore the processor may have to wait till they respond to any request for data transfer, and a
number of idle clock cycles may be wasted in doing so. However, the input/output interface
chips carry out this task without making the processor wait or idle.
Conclusion
Besides the above units, some real-time embedded systems may have specific circuits included on
the same chip or circuit board. These are known as Application Specific Integrated Circuits
(ASICs).
Questions-Answers
Q1. Enumerate the similarities and differences between the Microcontroller and Digital Signal
Processor
Ans:
Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to on-chip
I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip
hardware for I/O, RAM and ROM, they usually have a fairly low-performance CPU.
Microcontrollers also often have timers that generate interrupts and can thus be used with
the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of
a microcontroller is to control the operations of a machine using a fixed program that is
stored in ROM and does not change over the lifetime of the system. The microcontroller is
concerned with getting data from and to its own pins; the architecture and instruction set are
optimized to handle data in bit and byte sizes.
Digital Signal Processors have been designed based on the modified Harvard Architecture to
handle real-time signals. The features of these processors are suitable for implementing
signal processing algorithms. One of the common operations required in such applications is
array multiplication; for example, convolution and correlation require array multiplication.
This is accomplished by multiplication followed by accumulation and addition, generally
carried out by Multiplier and Accumulator (MAC) units. Sometimes this is known as MACD,
where D stands for Data move. Generally all the instructions are executed in a single cycle.
These DSP units generally use Multiple-Access and Multi-Ported Memory units. Multiple-access
memory allows more than one access in one clock period. Multi-ported memory provides
multiple address as well as data ports. This also increases the number of accesses per clock
cycle.
Q2. Name a few chips in each of the following processor families: Microcontroller, Digital Signal
Processor, General Purpose Processor
Ans:
Microcontroller: Intel 8051, Intel 80196, Motorola 68705
Digital Signal Processors: TI TMS320C6711, TI TMS320C5000 series
General Purpose Processor: Intel Pentium IV, PowerPC
Q3. List the following in increasing order of access speed:
Flash Memory, Dynamic Memory, Cache Memory, CDROM, Hard Disk, Magnetic Tape,
Processor Memory
Ans:
Magnetic Tape, CDROM, Hard Disk, Dynamic Memory, Flash Memory, Cache Memory,
Processor Memory
Ans:
A Low-Pass Sallen-Key Butterworth Filter
Q5. Is it possible to implement an anti-aliasing filter in digital form?
Ans:
No, it is not possible to implement an anti-aliasing filter in digital form, because aliasing is an
error introduced at the sampling phase of the analog-to-digital converter. If the sampling frequency
is less than twice the highest frequency present, the higher signal frequencies fold back into the
lower frequency band and hence cannot be distinguished in the digital/discrete domain.
Q6. Download any free emulator of a simple microcontroller such as the 8051, 68705 etc. and
learn about it.
Homework
Q7. Draw the internal architecture of the 8051 and explain the functions of its various units.
See http://www.atmel.com/products/8051/
Q8. State with justification if the following statements are right (or wrong)
Cache memory can be a static RAM
Dynamic RAMs occupy more space per word storage
The full-form of SDRAM is static-dynamic RAM
BIOS in your PC is not a Random Access Memory (RAM)
Ans:
Cache memory can be a static RAM: right.
The cache memory needs to have a very fast access time, which is possible with static RAM.
Q9. Explain the function of the following units in a general purpose processor
Instruction Register
Program Counter
Instruction Queue
Control Unit
Ans:
Instruction Register: A register inside the CPU which holds the instruction code temporarily
before sending it to the decoding unit.
Program Counter: A register inside the CPU which holds the address of the next instruction
code in a program. It gets updated automatically by the address generation unit.
Instruction Queue: A set of memory locations inside the CPU which holds the instructions in a
pipeline before sending them to the instruction decoding unit.
Control Unit: Responsible for generating the timing and control signals for various operations
inside the CPU. It is very closely associated with the instruction decoding unit.
Module
2
Embedded Processors and
Memory
Lesson
5
Memory-I
Instructional Objectives
After going through this lesson the student would be able to:
Pre-Requisite
Digital Electronics, Microprocessors
5.1 Introduction
This chapter describes memory. Most modern computer systems have been
designed on the basis of an architecture called the Von Neumann Architecture [1].

Fig. 5.1 The Von Neumann Architecture: Input and Output Devices connected to a Central Processing Unit, which is connected to Memory

The Memory stores the instructions as well as the data; nothing in the memory itself
distinguishes an instruction from data. The CPU has to be directed to the
address of the instruction codes.
The memory is connected to the CPU through the following lines:
1. Address
2. Data
3. Control
[1] http://en.wikipedia.org/wiki/John_von_Neumann. The so-called von Neumann architecture is a model for a
computing machine that uses a single storage structure to hold both the set of instructions on how to perform the
computation and the data required or generated by the computation. Such machines are also known as stored-
program computers. The separation of storage from the processing unit is implicit in this model.
By treating the instructions in the same way as the data, a stored-program machine can easily change the
instructions. In other words the machine is reprogrammable. One important motivation for such a facility was the
need for a program to increment or otherwise modify the address portion of instructions. This became less important
when index registers and indirect addressing became customary features of machine architecture.
Fig. 5.2 The Memory Interface: address lines, data lines and control lines between the CPU and memory
In a memory read operation the CPU loads the address onto the address bus. In most cases
these lines are fed to a decoder which selects the proper memory location. The CPU then sends a
read control signal, and the data stored in that location is transferred to the processor via the data
lines.
In a memory write operation, after the address is loaded the CPU sends the write control
signal, followed by the data, to the requested memory location.
Memory can be classified in various ways, e.g. based on location, power
consumption, manner of data storage etc. At the basic level memory can be classified as:
1. Processor Memory (Register Array)
2. Internal on-chip Memory
3. Primary Memory
4. Cache Memory
5. Secondary Memory
Primary Memory
This is the memory that sits just outside the CPU. It can also reside on the same chip as the CPU.
These memories can be static or dynamic.
Cache Memory
This is situated between the processor and the primary memory. It serves as a buffer for the
immediate instructions or data which the processor anticipates. There can be more than one
level of cache memory.
Secondary Memory
These are generally treated as Input/Output devices. They are much cheaper mass-storage
devices, slower, and connected through input/output interface circuits. They are generally
magnetic or optical memories, such as Hard Disk and CDROM devices.
Memory can also be divided into Volatile and Non-volatile memory.
Volatile Memory
The contents are erased when the power is switched off. Semiconductor Random Access
Memories fall into this category.
Non-volatile Memory
The contents remain intact even if the power is switched off. Magnetic Memories (Hard Disks),
Optical Disks (CDROMs), and Read Only Memories (ROM) fall under this category.
Fig. 5.3 The Internal Registers: the CPU (Control Unit, ALU, Registers) with Input, Output and Memory
5.2 Data Storage
An m word memory can store m x n: m wordssof n bits each. One word is located at one address
therefore to address m words we need. n t
deaddress m = 2 words
k = Log2(m) address input signals
or k number address linesucan k
s t
Example 4,096 x 8 memory: ty
i
.c
32,768 bits
w input signals
12 address
8w w
input/output data signals
m n memory
m words
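The address-line arithmetic above can be checked with a short script (an illustrative sketch, not part of the original lesson):

```python
import math

def address_lines(words):
    """Number of address input signals k needed so that 2**k >= words."""
    return math.ceil(math.log2(words))

def total_bits(words, width):
    """Total storage capacity of a words x width memory."""
    return words * width

# The 4,096 x 8 example from the text:
print(address_lines(4096))   # -> 12 address input signals
print(total_bits(4096, 8))   # -> 32768 bits
```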
Memory access

The memory location can be accessed by placing the address on the address lines. The
read/write control line selects read or write. Some memory devices are multi-port, i.e. they
allow multiple simultaneous accesses to different locations.
[Fig. 5.5 Memory Array — external view with address inputs A0 ... Ak-1 and data lines Q0 ... Qn-1]
Memory Specifications

The specification of a typical memory is as follows:

The storage capacity: the number of bits/bytes or words it can store.
The memory access time (read access and write access): how long the memory takes to load
the data onto its data lines after it has been addressed, or how fast it can store the data
supplied through its data lines. The reciprocal of the memory access time is known as the
Memory Bandwidth.
The power consumption and voltage levels: power consumption is a major factor in embedded
systems. The lower the power consumption, the higher the packing density.
Size: size is directly related to the power consumption and data storage capacity.
There are two important specifications for the memory as far as Real Time Embedded Systems
are concerned:
  Write ability
  Storage permanence
Write ability

It is the manner and speed in which a particular memory can be written.

Ranges of write ability:
High end
  processor writes to memory simply and quickly, e.g., RAM
Middle range
  processor writes to memory, but slower, e.g., FLASH, EEPROM (Electrically
  Erasable and Programmable Read Only Memory)
Lower range
  special equipment, a programmer, must be used to write to memory, e.g.,
  EPROM, OTP ROM (One Time Programmable Read Only Memory)
Low end
  bits stored only during fabrication, e.g., Mask-programmed ROM

In-system programmable memory
  Can be written to by a processor in the embedded system using the memory
  Memories in the high end and middle range of write ability
Storage permanence
It is the ability to hold the stored bits.
Range of storage permanence
High end
essentially never loses bits
e.g., mask-programmed ROM
Middle range
holds bits days, months, or years after the memory's power source is turned off
e.g., NVRAM
Lower range
holds bits as long as power supplied to memory
e.g., SRAM
Low end
begins to lose bits almost immediately after written
e.g., DRAM
Nonvolatile memory
Holds bits after power is no longer supplied
High end and middle range of storage permanence
5.3 Common Memory Types

Read Only Memory (ROM)

This is a nonvolatile memory. It can only be read from, but not written to, by a processor in an
embedded system. It is traditionally written to, i.e. programmed, before being inserted into the
embedded system.

Uses:
Store software program for a general-purpose processor
  program instructions can be one or more ROM words
Store constant data needed by the system
Implement combinational circuits

[Fig. 5.7 The ROM Structure — external view: a 2^k x n ROM with enable, address inputs A0 ... Ak-1 and data outputs Q0 ... Qn-1]
Example
The figure below shows the internal structure of a ROM. Horizontal lines represent the words;
the vertical lines give out data. These lines are connected only at the circles. If the address
input is 010, the decoder sets the 2nd word line to 1. The data lines Q3 and Q1 are set to 1
because there is a programmed connection with word 2's line. Word 2 is not connected with data
lines Q2 and Q0. Thus the output is 1010.
[Fig. 5.8 The example of a ROM with decoder and data storage — internal view of an 8 x 4 ROM: a 3-to-8 decoder driven by A0-A2 and enable selects one of eight word lines; programmable wired-OR connections drive the data lines Q3-Q0]
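The decoder-and-word-line behaviour in the example can be sketched as a lookup table; the contents of the words other than word 2 are not given in the text, so they are filled with zeros here purely as placeholders:

```python
# A ROM is just a lookup table: the address selects a word, and the word's
# programmed connections determine the output bits. Word 2 is programmed
# as 1010, matching the example (address 010 -> output 1010).
rom = ["0000", "0000", "1010", "0000", "0000", "0000", "0000", "0000"]

def rom_read(address_bits):
    """Decode a 3-bit address string (the 3-to-8 decoder) and drive Q3..Q0."""
    word_index = int(address_bits, 2)
    return rom[word_index]

print(rom_read("010"))  # -> 1010
```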
Implementation of Combinatorial Functions

Any combinational circuit of n functions of the same k variables can be implemented with a
2^k x n ROM: the k inputs form the address, and each word stores the n output values for that
input combination.

[Figure: truth table of two functions y and z of inputs a, b, c implemented in an 8 x 2 ROM — each input combination (a, b, c) addresses one word, word 0 to word 7, whose stored bits are the outputs y and z]
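As a sketch of this idea, the ROM below implements two illustrative functions, y = a AND b and z = a XOR c (these are stand-ins, since the exact truth table of the original figure is not recoverable):

```python
# Build an 8 x 2 ROM implementing two functions of (a, b, c):
# y = a AND b, z = a XOR c. One 2-bit word per input combination.
rom = []
for addr in range(8):
    a, b, c = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    rom.append((a & b, a ^ c))

def evaluate(a, b, c):
    """The inputs form the ROM address; the stored word is the outputs (y, z)."""
    return rom[(a << 2) | (b << 1) | c]

print(evaluate(1, 1, 0))  # -> (1, 1): y = 1 AND 1, z = 1 XOR 0
```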
Mask-programmed ROM
The connections are programmed at fabrication time using a set of masks. It can be written only
once (in the factory), but it stores data forever, so it has the highest storage permanence; the
bits never change unless damaged. These are typically used for the final design of high-volume
systems.
EPROM (Erasable Programmable Read Only Memory)

The storage cell is a floating-gate transistor. (Program) Applying a high voltage to the
gate causes negative charges to move out of the channel and get trapped in the floating gate,
storing a logic 0. (Erase) Shining UV rays on the surface of the floating gate causes the
negative charges to return to the channel from the floating gate, restoring the logic 1. The
EPROM package has a quartz window through which the UV light can pass; a typical erase takes
5-30 min. The EPROM has:
Better write ability
  can be erased and reprogrammed thousands of times
Reduced storage permanence
  program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development

[Figure: (a) the floating-gate transistor cell, (b) programming with a high gate voltage (+15V), (c) UV erase, 5-30 min, (d) an EPROM package with its quartz window]
EEPROM
EEPROM is otherwise known as Electrically Erasable and Programmable Read Only Memory. It
is erased electrically, typically by using a higher than normal voltage, and it can program and
erase individual words, unlike the EPROMs where exposure to the UV light erases everything.
Flash Memory

It is an extension of EEPROM. It has the same floating-gate principle and the same write
ability and storage permanence. It can be erased at a faster rate, i.e. large blocks of memory
are erased at once. To update a single word, the entire block must be read, the word updated,
and then the entire block written back.
Used with embedded systems storing large data items in nonvolatile memory
  e.g., digital cameras, TV set-top boxes, cell phones
RAM: Random-access memory

Typically volatile memory
  bits are not held without a power supply
Read and written to easily by the embedded system during execution
Internal structure more complex than ROM
  a word consists of several memory cells, each storing 1 bit
  each input and output data line connects to each cell in its column
  rd/wr is connected to every cell
  when a row is enabled by the decoder, each cell has logic that stores the input data bit when
  rd/wr indicates write, or outputs the stored bit when rd/wr indicates read
[Fig. 5.11 The structure of RAM — external view: a 2^k x n read-and-write memory with r/w, enable, address inputs A0 ... Ak-1 and data lines Q0 ... Qn-1]
[Fig. 5.12 The RAM decoder and access — internal view of a 4 x 4 RAM: a 2-to-4 decoder (A0, A1, enable) selects a row of memory cells; rd/wr runs to every cell; data enters on I3-I0 and leaves on Q3-Q0]
Basic types of RAM

SRAM: Static RAM
  Memory cell uses a flip-flop to store the bit
  Requires 6 transistors
  Holds data as long as power is supplied
DRAM: Dynamic RAM
  Memory cell uses a MOS transistor and a capacitor to store the bit
  More compact than SRAM
  Refresh required due to capacitor leakage
    a word's cells are refreshed when it is read
    typical refresh rate 15.625 microseconds
  Slower to access than SRAM
RAM variations
PSRAM: Pseudo-static RAM
  DRAM with a built-in memory refresh controller
  Popular low-cost high-density alternative to SRAM
NVRAM: Nonvolatile RAM
  Holds data after external power is removed
  Battery-backed RAM
    SRAM with its own permanently connected battery
    writes as fast as reads
    no limit on the number of writes, unlike nonvolatile ROM-based memory
  SRAM with EEPROM or flash
    stores the complete RAM contents on EEPROM or flash before power is turned off
5.4 Example: HM6264 & 27C256 RAM/ROM devices

Low-cost low-capacity memory devices
Commonly used in 8-bit microcontroller-based embedded systems
First two numeric digits indicate device type
  RAM: 62
  ROM: 27
Subsequent digits indicate capacity in kilobits

Device   Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
HM6264   85-100            .01                15                 5
27C256   90                .5                 100                5
[Figure: device characteristics — HM6264/27C256 block symbols with chip selects /CS1 and CS2, and read/write timing diagrams]
5.5 Example: TC55V2325FF-100 memory device

2-megabit synchronous pipelined burst SRAM memory device
Designed to be interfaced with 32-bit processors
Capable of fast sequential reads and writes as well as single-byte I/O

Device           Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
TC55V2325FF-100  10                na                 1200              3.3

[Figure: block diagram — data<31...0>, addr<15...0>, /CS1, /CS2, CS3, /ADV, /OE and CLK into the TC55V2325FF-100 — and timing diagram]
5.6 Composing Memories

[Figure: composing larger memories from smaller ones — a 2^m x n ROM as the base unit; placing ROMs side by side (sharing address lines A0 ... Am-1 and enable) increases the width of words, giving a 2^m x 3n memory with outputs Q3n-1 ... Q0; adding a decoder on the high-order address bits increases the number of words; both together increase the number and width of words]
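The idea of increasing the number of words with a decoder on the high-order address bits can be sketched in a few lines; the bank contents and sizes below are illustrative, not values from the text:

```python
# Two 4-word x 8-bit "ROMs" composed into an 8-word memory: the extra
# high-order address bit acts as the decoder/enable selecting the bank.
bank0 = [0x10, 0x11, 0x12, 0x13]
bank1 = [0x20, 0x21, 0x22, 0x23]

def read(address):
    """3-bit address: the top bit selects the bank, the low bits index within it."""
    bank = bank1 if (address >> 2) & 1 else bank0
    return bank[address & 0b11]

print(hex(read(1)))  # -> 0x11 (bank 0, offset 1)
print(hex(read(6)))  # -> 0x22 (bank 1, offset 2)
```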
5.7 Conclusion
In this chapter you have learnt about the following
1. Basic Memory types
2. Basic Memory Organization
3. Definitions of RAM, ROM and Cache Memory
5.8 Questions
Q1. Discuss the various control signals in a typical RAM device (say HM6264)
Ans:

[Figure: HM6264 pin assignment — addr<15...0> on pins 2, 23, 21, 24, 25, 3-10; /OE on pin 22; /WE on pin 27; /CS1 on pin 20; CS2 on pin 26]

/OE: output enable bar: the output is enabled when it is low. It is the same as the read bar line.
/WE: write enable bar: this line has to be made low while writing to this device.
/CS1: chip select 1 bar: this line has to be made low, along with CS2 (which is active high), to enable this chip.
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
6
Memory-II
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn about
Memory Hierarchy
Cache Memory
- Different types of Cache Mappings
- Cache Impact on System Performance
Dynamic Memory
- Different types of Dynamic RAMs
Memory Management Unit
Pre-Requisite
Digital Electronics, Microprocessors
6.1 Memory Hierarchy

The objective is to use inexpensive, fast memory.

Main memory
  Large, inexpensive, slow memory stores the entire program and data
Cache
  Small, expensive, fast memory stores a copy of likely accessed parts of larger memory
  Can be multiple levels of cache

[Figure: the memory hierarchy — Processor Registers, Cache, Main memory, Disk, Tape]
6.2 Cache
Usually designed with SRAM
faster but more expensive than DRAM
Usually on same chip as processor
space limited, so much smaller than off-chip main memory
faster access (1 cycle vs. several cycles for main memory)
Cache operation
Request for main memory access (read or write)
First, check cache for copy
cache hit
- copy is in cache, quick access
cache miss
- copy not in cache, read address and possibly its neighbors into cache
Several cache design choices:
  cache mapping, replacement policies, and write techniques

6.3 Cache Mapping

Cache mapping is necessary as there are far fewer available cache addresses than memory
addresses. It is used to assign a main memory address to a cache address and to determine hit
or miss: are the address contents in the cache?

Three basic techniques:
  Direct mapping
  Fully associative mapping
  Set-associative mapping

Caches are partitioned into indivisible blocks or lines of adjacent memory addresses
  usually 4 or 8 addresses per line
Direct Mapping

Main memory address divided into 2 fields
Index
  - contains the cache address
  - number of bits determined by cache size
Tag
  - compared with the tag stored in cache at the address indicated by the index
  - if tags match, check valid bit
Valid bit
  - indicates whether the data in the slot has been loaded from memory
Offset
  - used to find the particular word in the cache line
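As a concrete sketch of the tag/index/offset split — with illustrative sizes (a 4-word line and an 8-line cache), which are not values given in the text:

```python
# Split a main-memory address into tag / index / offset fields for a
# direct-mapped cache with 4-address lines and 8 cache lines.
OFFSET_BITS = 2   # 4 addresses per line
INDEX_BITS = 3    # 8 lines -> 3 index bits

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Address 0b1101_101_10: tag = 0b1101 = 13, index = 0b101 = 5, offset = 0b10 = 2
print(split_address(0b110110110))  # -> (13, 5, 2)
```

A hit occurs when the tag stored at the selected index matches the address tag and the valid bit is set.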
[Figure: direct mapping — each cache line holds valid (V), tag (T) and data (D) fields; the index selects one line, the stored tag is compared (=) with the address tag, and the offset selects the word]

Fully Associative Mapping

[Fig. 6.3 Fully Associative Mapping — the address has tag and offset fields only; the tag is compared (=) simultaneously against the tags (T) of all valid (V) cache lines]
Set-Associative Mapping
Compromise between direct mapping and fully associative mapping
Index same as in direct mapping
But, each cache address contains content and tags of 2 or more memory address locations
Tags of that set simultaneously compared as in fully associative mapping
Cache with set size N called N-way set-associative
2-way, 4-way, 8-way are common
[Figure: 2-way set-associative mapping — each set holds two (V, T, D) entries whose tags are compared (= =) in parallel]
[Figure: cache miss rate (%) vs. cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way and 8-way set-associative caches]
The DRAM address is time-multiplexed into row and column parts, latched using the ras (row
address strobe) and cas (column address strobe) signals, respectively.
Refresh circuitry can be external or internal to the DRAM device
  it strobes consecutive memory addresses periodically, causing the memory content to be
  refreshed
  Refresh circuitry is disabled during a read or write operation

[Figure: DRAM internal organization — address buffers, row and column decoders, bit storage array, sense amplifiers, refresh circuit, data in/out buffers, controlled by ras, cas, rd/wr and clock]

[Figure: DRAM access timing — ras, cas, address and data]
(S)ynchronous and Enhanced Synchronous (ES) DRAM

SDRAM latches data on the active edge of the clock
  Eliminates the time needed to detect the ras/cas and rd/wr signals
  A counter is initialized to the column address, then incremented on the active edge of the
  clock to access consecutive memory locations
ESDRAM improves on SDRAM
  added buffers enable overlapping of column addressing
  faster clocking and lower read/write latency possible

[Figure: SDRAM timing — clock, ras, cas]
6.12 Question
Q1. Discuss different types of cache mappings.
Ans:

[Figure: cache miss rate (%) vs. cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way and 8-way set-associative caches]

[Figure: SDRAM timing — clock, ras, cas]
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
7
Digital Signal Processors
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn
o Architecture of a Real time Signal Processing Platform
o Different Errors introduced during the A-D and D-A converter stages
o Digital Signal Processor Architecture
o Difference in the complexity of programs between a General Purpose Processor
  and a Digital Signal Processor
o Evolution of Digital Signal Processors
o Comparative Performance with a General Purpose Processor
Pre-Requisite
Digital Electronics, Microprocessors

7.1 Introduction

Digital Signal Processing deals with algorithms for handling large chunks of data. The branch
identified itself as a separate subject in the 70s, when engineers thought about processing the
signals arising from nature in discrete form. The development of Sampling Theory followed, and
the design of Analog-to-Digital converters gave an impetus in this direction. The contemporary
applications of digital signal processing were mainly in speech, followed by Communication,
Seismology, Biomedical applications etc. Later the field of Image Processing emerged as another
important area in signal processing.

The following broadly defines different processor classes:
General Purpose - high performance
Pentiums, Alpha's, SPARC
Used for general purpose software
Heavy weight OS - UNIX, NT
Workstations, PC's
Embedded processors and processor cores
ARM, 486SX, Hitachi SH7000, NEC V800
Single program
Lightweight, real-time OS
DSP support
Cellular phones, consumer electronics (e. g. CD players)
Microcontrollers
Extremely cost sensitive
Small word size - 8 bit common
Highest volume processors by far
Automobiles, toasters, thermostats, ...
A Digital Signal Processor is required to do the following Digital Signal Processing tasks in real
time
Signal Modeling
Difference Equation
Convolution
Transfer Function
Frequency Response
Signal Processing
Data Manipulation
Algorithms
Filtering
Estimation
[Figure: a Real Time digital signal processing system — sensor, conditioner, analog processing, A-D conversion, digital processing, D-A conversion, analog processing]
The above figure represents a Real Time digital signal processing system. The measurand can be
temperature, pressure or speech signal which is picked up by a sensor (may be a thermocouple,
microphone, a load cell etc). The conditioner is required to filter, demodulate and amplify the
signal. The analog processor is generally a low-pass filter used for anti-aliasing effect. The ADC
block converts the analog signals into digital form. The DSP block represents the signal
processor. The DAC is the Digital-to-Analog Converter, which converts the digital signals into
analog form. The analog low-pass filter eliminates noise introduced by the interpolation in the
DAC.
[Fig. 7.2 D-A and A-D Conversion Process — ADC: sampler, quantizer and coder transform x(t) into xs(t), xq(n) and the b-bit code xb(n); DAC: decoder and sample/hold reconstruct y(n) from xb(n)]
The performance of the signal processing system depends to a large extent on the ADC. The
ADC is specified by the number of bits, which defines the resolution. The conversion time
decides the sampling time. The errors in the ADC are due to the finite number of bits and the
finite conversion time. Sometimes noise may be introduced by the switching circuits.
Similarly, the DAC is characterized by the number of bits and the settling time at the output.

A DSP task requires
  Repetitive numeric computations
  Attention to numeric fidelity
  High memory bandwidth, mostly via array accesses
  Real-time processing
And the DSP design should minimize
  Cost
  Power
  Memory use
  Development time
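The error due to the finite number of bits can be illustrated with a small sketch; the 5 V full-scale range and 10-bit resolution are assumed values, not figures from the text:

```python
# Resolution and worst-case quantization error of a b-bit ADC over a
# given full-scale voltage range.
def lsb(full_scale_volts, bits):
    """Size of one quantization step (one LSB)."""
    return full_scale_volts / (2 ** bits)

def max_quantization_error(full_scale_volts, bits):
    """Worst-case error is half a step when rounding to the nearest level."""
    return lsb(full_scale_volts, bits) / 2

print(lsb(5.0, 10))                     # ~4.88 mV per step
print(max_quantization_error(5.0, 10))  # ~2.44 mV worst case
```

Each extra bit halves the step size, which is why the number of bits directly sets the resolution.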
Take an example of FIR filtering, both by a General Purpose Processor as well as a DSP.

Example: FIR Filtering

x(k) -> h(k) -> y(k)

y(k) = (h0 + h1 z^-1 + h2 z^-2 + ... + hN-1 z^-(N-1)) x(k)
     = h0 x(k) + h1 x(k-1) + h2 x(k-2) + ... + hN-1 x(k-N+1)
     = sum_{i=0}^{N-1} hi x(k-i) = h(k) * x(k)
An FIR (Finite Impulse Response filter) is represented as shown in the following figure. The
output of the filter is a linear combination of the present and past values of the input. It has
several advantages such as:
Linear Phase
Stability
Improved Computational Time
[Fig. 7.3 Tapped Delay Line representation of an FIR filter — x(k) feeds a chain of z^-1 delay elements; the tap outputs are weighted by h0, h1, h2, ..., hN-1 and summed to form y(k)]
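The tapped-delay-line computation can be written directly as code; this is a plain reference implementation of the sum above, not the processor programs discussed next:

```python
# Direct-form FIR filter: y(k) = sum_{i=0}^{N-1} h[i] * x(k - i),
# with x(k) taken as 0 for k < 0 (empty delay line at start-up).
def fir(h, x):
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if k - i >= 0:
                acc += hi * x[k - i]
        y.append(acc)
    return y

# A 2-tap averaging filter as a small check:
print(fir([0.5, 0.5], [2, 4, 6]))  # -> [1.0, 3.0, 5.0]
```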
loop: lw x0, (r0)
      lw y0, (r1)
      mul a, x0, y0
      add b, a, b
      inc r0
      inc r1
      dec ctr
      tst ctr
      jnz loop
      sw b, (r2)
      inc r2
This program assumes that the finite window of the input signal is stored at the memory location
starting from the address specified by r1, and that an equal number of filter coefficients are
stored at the memory location starting from the address specified by r0. The result will be
stored at the memory location starting from the address specified by r2. The program assumes
the content of the register b is 0 before the start of the loop.
lw x0, (r0)
lw y0, (r1)
These two instructions load the registers x0 and y0 with values from the memory locations
pointed to by the registers r0 and r1.
mul a, x0, y0

This instruction multiplies x0 with y0 and stores the result in a.

add b, a, b

This instruction adds a to b (which already contains the accumulated result from the previous
operation) and stores the result in b.
inc r0
inc r1
dec ctr
tst ctr
jnz loop

The above portion of the program increments the registers to point to the next memory
locations, decrements the counter, and tests it for 0 to see whether the filter order has been
reached; if not, it jumps to the start of the loop.

sw b, (r2)
inc r2

This stores the final result and increments the register r2 to point to the next location.
[Fig. 7.4 Basic TMS32010 Architecture — separate Instruction Memory and Data Memory; the datapath contains a T-Register, a Multiplier feeding a P-Register, an ALU and an Accumulator]
The program for the FIR filter (for a 3rd order) is given as follows.
Here X4, H4, ... are direct (absolute) memory addresses:
LT X4 ;Load T with x(n-4)
MPY H4 ;P = H4*X4
;Acc = Acc + P
LTD X3 ;Load T with x(n-3); x(n-4) = x(n-3);
MPY H3 ; P = H3*X3
; Acc = Acc + P
LTD X2
MPY H2
...
Two instructions per tap, but this requires loop unrolling. (The ';' marks comment lines.)
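The way the LTD/MPY pair evaluates one tap can be modelled with a small sketch. For brevity the coefficient is passed to MPY as a value rather than a memory address, and only the instructions used in the listing (plus APAC, which flushes the final product) are modelled:

```python
# Sketch of the TMS32010 multiply-accumulate datapath: MPY forms
# P = T * coefficient, and LTD accumulates the previous product into ACC
# while loading T and shifting the delay line (the DMOV side effect).
class TMS32010Sketch:
    def __init__(self, data):
        self.mem = dict(data)   # data memory, e.g. {"X3": ..., "X4": ...}
        self.T = 0
        self.P = 0
        self.ACC = 0

    def LT(self, addr):
        self.T = self.mem[addr]

    def MPY(self, coeff):
        self.P = self.T * coeff

    def LTD(self, addr, shift_to):
        self.ACC += self.P                    # accumulate previous product
        self.T = self.mem[addr]               # load T with the next sample
        self.mem[shift_to] = self.mem[addr]   # delay-line shift

    def APAC(self):
        self.ACC += self.P

# y = H4*X4 + H3*X3 with X4 = 2, X3 = 5, H4 = 3, H3 = 7 -> 6 + 35 = 41
cpu = TMS32010Sketch({"X3": 5, "X4": 2})
cpu.LT("X4"); cpu.MPY(3)
cpu.LTD("X3", "X4"); cpu.MPY(7)
cpu.APAC()
print(cpu.ACC)  # -> 41
```

The single-cycle multiply-accumulate plus automatic delay-line shift is exactly what makes the DSP loop body so much shorter than the general-purpose program above.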
II. Questions
1. Discuss the different errors introduced in a typical real time signal processing system.
Answers
o m
o t.c
s p
o g
. bl
u p
r o
s g
ent
u d
st
it y
.c
w
w
w
Module
2
Embedded Processors and
Memory
Version 2 EE IIT, Kharagpur 1
Lesson
8
General Purpose
Processors - I
Version 2 EE IIT, Kharagpur 2
Pre-requisite
Digital Electronics
8.1 Introduction
The first single chip microprocessor came in 1971 from Intel Corporation. It was called the Intel
4004 and was the first single chip CPU ever built. We can say that it was the first general
purpose processor; now the terms microprocessor and processor are synonymous. The 4004 was a
4-bit processor, capable of addressing 1K data memory and 4K program memory. It was meant to
be used for a simple calculator. The 4004 had 46 instructions, using only 2,300 transistors in a
16-pin DIP. It ran at a clock rate of 740 kHz (eight clock cycles per CPU cycle of 10.8
microseconds). In 1975, Motorola introduced the 6800, a chip with 78 instructions and probably
the first microprocessor with an index register. In 1979, Motorola introduced the 68000. With
internal 32-bit registers and a 32-bit address space, its bus was still 16 bits due to hardware
prices. On the other hand, in 1976 Intel designed the 8085 with more instructions to
enable/disable three added interrupt pins (and the serial I/O pins). They also simplified the
hardware so that it used only +5V power, and added clock-generator and bus-controller circuits
on the chip. In 1978, Intel introduced the 8086, a 16-bit processor which gave rise to the x86
architecture. It did not contain floating-point instructions. In 1980 the company released the
8087, the first math co-processor they'd developed. Next came the 8088, the processor for the
first IBM PC. Even though IBM engineers at the time wanted to use the Motorola 68000 in the
PC, the company already had the rights to produce the 8086 line (by trading rights to Intel for
its bubble memory) and it could use modified 8085-type components (68000-style components were
much more scarce).

Table 1 Development History of Intel Microprocessors
Intel Processor   Year of Introduction   Initial Clock Speed   Number of Transistors   Circuit Line Width
4004 1971 108 kHz 2300 10 micron
8008 1972 500-800 KHz 3500 10 micron
8080 1974 2 MHz 4500 6 micron
8086 1978 5 MHz 29000 3 micron
8088 1979 5 MHz 29000 3 micron
Intel286TM 1982 6 MHz 134,000 1.5 micron
Intel386TM 1985 16 MHz 275,000 1.5 micron
Intel486TM 1989 25 MHz 1.2 Million 1 Micron
PentiumTM 1993 66 MHz 3.1 Million 0.8 Micron
PentiumTM Pro 1995 200 MHz 5.5 Million 0.35 Micron
PentiumTM II 1997 300 MHz 7.5 Million 0.25 Micron
The development history of the Intel family of processors is shown in Table 1. The Very Large
Scale Integration (VLSI) technology has been the main driving force behind the development.
[Fig. 8.2 The photograph of the processor]

[Figure: pipeline block diagram — BTB, Translate (X), D-Cache & D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC) stages D and G, Execute stage with Integer ALU, FP unit and MMX/3D unit, Store-Branch (S), Write-back (W), store and write buffers]
Specification
Name: VIA C3TM in EBGA: VIA is the name of the company, C3 the processor, and EBGA stands for
Enhanced Ball Grid Array; the clock speed is 1 GHz.
Ball Grid Array (abbreviated BGA): a ball grid array is a type of microchip connection
methodology. Ball grid array chips typically use a group of solder dots, or balls, arranged in
concentric rectangles to connect to a circuit board. BGA chips are often used in mobile
applications where Pin Grid Array (PGA) chips would take up too much space due to the length of
the pins used to connect the chips to the circuit board.
[Fig. 8.4 Pin Grid Array (PGA) — shown with other package types: SIMM, DIP and SIP]
[Fig. 8.6 The Bottom View of the Processor]

The Architecture

The processor has a 12-stage integer pipelined structure:
s characteristic of a modern general purpose processor. A
t y
ci stored in memory. During execution a processor has to fetch
Pipe Line: This is a very important
.
program is a set of instructions
these instructions from thew memory, decode it and execute them. This process takes few clock
cycles. To increase thewspeed of such processes the processor divide itself into different units.
While one unit gets wthe instructions from the memory, another unit decodes them and some other
unit executes them. This is called pipelining. This can be termed as segmenting a functional unit
such that it can accept new operands every cycle while the total execution of the instruction may
take many cycles. The pipeline construction works like a conveyor belt accepting units until the
pipeline is filled and than producing results every cycle. The above processors has got such a
pipeline divided into 12stages
There are four major functional groups: I-fetch, decode and translate, execution, and data
cache.
The I-fetch components deliver instruction bytes from the large I-cache or the
external bus.
The decode and translate components convert these instruction bytes into internal
execution forms. If there is any branching operation in the program it is identified
here and the processor starts getting new instructions from a different location.
The execution components issue, execute, and retire internal instructions
The data cache components manage the efficient loading and storing of execution
data to and from the caches, bus, and internal components
Fig. 8.7
The first three pipeline stages (I, B, V) deliver aligned instruction data from the I-cache
(Instruction Cache) or external bus into the instruction decode buffers. The primary I-cache
contains 64 KB organized as four-way set associative with 32-byte lines. The associated large
I-TLB (Instruction Translation Look-aside Buffer) contains 128 entries organized as 8-way set
associative.
TLB: translation look-aside buffer — a table in the processor's memory that contains
information about the pages in memory the processor has accessed recently. The table
cross-references a program's virtual addresses with the corresponding absolute addresses in
physical memory that the program has most recently used. The TLB enables faster computing
because it allows the address processing to take place independent of the normal
address-translation pipeline.
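A TLB in front of a page table can be sketched as follows; the page size, table contents and addresses are illustrative assumptions, not details of this processor:

```python
# Virtual-to-physical translation consults the small TLB first and falls
# back to the page table only on a miss. Page size: 4 KB (12 offset bits).
PAGE_OFFSET_BITS = 12

page_table = {0x1: 0x40, 0x2: 0x80}  # virtual page -> physical frame
tlb = {}

def translate(vaddr):
    vpage = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpage in tlb:
        frame = tlb[vpage]           # TLB hit: no page-table walk needed
    else:
        frame = page_table[vpage]    # TLB miss: walk the table, then cache it
        tlb[vpage] = frame
    return (frame << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x1234)))  # vpage 0x1 -> frame 0x40 -> 0x40234
```

After the first access to a page, subsequent translations for that page hit the TLB, which is what makes the address processing fast.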
The instruction data is predecoded as it comes out of the cache; this predecode is overlapped
with other required operations and, thus, effectively takes no time. The fetched instruction data is
placed sequentially into multiple buffers. Starting with a branch, the first branch-target byte is
left adjusted into the instruction decode buffer.
[Fig. 8.8 — predecode (V), BTB and Translate stages]
Instruction bytes are decoded and translated into the internal format by two pipeline stages
(F, X). The F stage decodes and formats an instruction into an intermediate format. The
internal-format instructions are placed into a five-deep FIFO (First-In-First-Out) queue: the
FIQ. The X-stage translates an intermediate-form instruction from the FIQ into the internal
microinstruction format. Instruction fetch, decode, and translation are made asynchronous from
execution via a five-entry FIFO queue (the XIQ) between the translator and the execution unit.
[Fig. 8.9 — BTB and the Translate stage]

Integer Unit

[Fig. 8.10 — the integer pipeline: Translate (X), Register (R), address calculation (A), D-Cache & D-TLB access (D, G), Integer ALU execute (E), Store-Branch (S) and Write-back (W), supported by the bus unit, L2 cache, 4-entry instruction queue and ROM]
Decode stage (R): Micro-instructions are decoded, integer register files are accessed and
resource dependencies are evaluated.
Addressing stage (A): Memory addresses are calculated and sent to the D-cache (Data Cache).
Cache Access stages (D, G): The D-cache and D-TLB (Data Translation Look aside Buffer) are
accessed and aligned load data returned at the end of the G-stage.
Execute stage (E): Integer ALU operations are performed. All basic ALU functions take one
clock except multiply and divide.
Store stage (S): Integer store data is grabbed in this stage and placed in a store buffer.
Write-back stage (W): The results of operations are committed to the register file.
[Fig. 8.11 — the data-access path: Translate (X), address calculation (A), D-Cache & D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC) access stages (D, G), Integer ALU (E)]
The D-cache contains 64 KB organized as four-way set associative with 32-byte lines. The
associated large D-TLB contains 128 entries organized as 8-way set associative. The cache,
TLB, and page directory cache all use a pseudo-LRU (Least Recently Used) replacement
algorithm
[Fig. 8.12 — bus unit on the Socket 370 bus, L2 cache, 4-entry instruction queue, register stage, address calculation, and the 64-KB 4-way caches with 128-entry 8-way TLBs and 8-entry PDC]
The contents of the L2 cache at any point in time are not contained in the two 64-KB L1 caches.
As lines are displaced from the L1 caches (due to bringing in new lines from memory), the
displaced lines are placed in the L2 cache. Thus, a future L1-cache miss on this displaced line
can be satisfied by returning the line from the L2 cache instead of having to access the
external memory.
FP, MMX and 3D

[Fig. 8.13 — the execute stage issues to the Integer ALU (E), the FP unit and the MMX/3D unit through a queue, followed by Store-Branch (S) and Write-back (W)]
FP: Floating Point Processing Unit
MMX: Multimedia Extension or Matrix Math Extension Unit
In addition to the integer execution unit, there is a separate 80-bit floating-point execution unit
that can execute floating-point instructions in parallel with integer instructions. Floating-point
instructions proceed through the integer R, A, D, and G stages. Floating-point instructions are
passed from the integer pipeline to the FP-unit through a FIFO queue. This queue, which runs at
the processor clock speed, decouples the slower running FP unit from the integer pipeline so that
the integer pipeline can continue to process instructions overlapped with FP instructions. Basic
arithmetic floating-point instructions (add, multiply, divide, square root, compare, etc.) are
represented by a single internal floating-point instruction. Certain little-used and complex
floating point instructions (sin, tan, etc.), however, are implemented in microcode and are
represented by a long stream of instructions coming from the ROM. These instructions tie up
the integer instruction pipeline such that integer execution cannot proceed until they complete.
This processor contains a separate execution unit for the MMX-compatible instructions. MMX
instructions proceed through the integer R, A, D, and G stages. One MMX instruction can issue
into the MMX unit every clock. The MMX multiplier is fully pipelined and can start one
non-dependent MMX multiply[-add] instruction (which consists of up to four separate multiplies)
every clock. Other MMX instructions execute in one clock. Multiplies followed by a dependent
MMX instruction require two clocks. Architecturally, the MMX registers are the same as the
floating-point registers. However, there are actually two different register files (one in the
FP-unit and one in the MMX units) that are kept synchronized by hardware.
p
There is a separate execution unit for some specific 3D instructions. These instructions provide assistance for graphics transformations via new SIMD (Single Instruction Multiple Data) single-precision floating-point capabilities. These instruction codes proceed through the integer R, A, D, and G stages. One 3D instruction can issue into the 3D unit every clock. The 3D unit has two single-precision floating-point multipliers and two single-precision floating-point adders. Other functions such as conversions, reciprocal, and reciprocal square root are provided. The multiplier and adder are fully pipelined and can start any non-dependent 3D instruction every clock.
8.3 Conclusion
This lesson discussed the architecture of a typical modern general-purpose processor (the VIA C3), which is similar to the x86 family of microprocessors from Intel. In fact, this processor uses the same x86 instruction set as the Intel processors. It is a pipelined architecture. The general-purpose processor architecture has the following characteristics:
Multiple stages of pipeline
More than one level of cache memory
Branch prediction mechanism at an early stage of the pipeline
Separate and independent processing units (Integer, Floating Point, MMX, 3D, etc.)
Because of the uncertainties associated with branching, the overall instruction execution time is not fixed (therefore it is not suitable for some real-time applications which need accurate execution speed)
It handles a very complex instruction set
The overall power consumption is higher because of the complexity of the processor
In the next lesson we shall discuss the signals associated with such a processor.
[Figure: processor block diagram — optional 3rd-level cache, 8-way 2nd-level cache, 4-way 1st-level cache, front end (fetch/decode, trace cache, microcode ROM), out-of-order execution core, and retirement stage]
Q.2 Superscalar architecture refers to the use of multiple execution units, to allow the
processing of more than one instruction at a time. This can be thought of as a form of
"internal multiprocessing", since there really are multiple parallel processors inside the
CPU. Most modern processors are superscalar; some have more parallel execution units
than others. It can be said to consist of multiple pipelines.
Symbol   Parameter               Limit
I(LI)    Input Leakage Current   100 µA
I(LO)    Output Leakage Current  100 µA
Module
2
Embedded Processors and
Memory
Lesson
9
General Purpose
Processors - II
Signals
In this lesson the student will learn the following
Signals of a General Purpose Processor
Multiplexing
Address Signals
Data Signals
Control Signals
Bus Arbitration Signals
Status Signal Indicators
Sleep State Indicators
Interrupts
Pre-requisite
Digital Electronics
9.1 Introduction
The input/output signals of a processor chip are the matter of discussion in this chapter. We shall take up the same VIA C3 processor as discussed in the last chapter.
In the design flow of a processor the internal architecture is determined and simulated for
optimal performance.
s
APPLICATION REQUIREMENT
ent
CAPTURE
u d INSTRUCTION SET
DESIGN AND CODING
FUNCTIONAL
st
it y
.c ASIC SW
INITIAL ABSTRACT
w FINAL INSTRUCTION SET & HW TOOLS
w
INSTRUCTION SET INITIAL ARCHITECTURE FLOW FLOW
w
ENVIRONMENT REQUIREMENT EXPLORATION OF
CAPTURE ARCHITECTURES
AUGMENTED ABSTRACT
FINAL INSTRUCTION SET & PROCESSOR TOOLS &
INSTRUCTION SET
FINAL ARCHITECTURE HW IMPLEMENTATION
ARCHITECTURE
The basic architecture decides the signals. Broadly the signals can be classified as:
1. Address Signals
2. Data Signals
3. Control Signals
4. Power Supply Signals
Some of these signals are multiplexed in time to make the VLSI design easier and more efficient without affecting the overall performance.
[Fig. 9.2: Bottom view of the processor]
9.2 Signals of the VIA Processor discussed earlier
The following lines discuss the various signals associated with the processor.
A[31:3]#: The address bus provides addresses for physical memory and external I/O devices. During cache inquiry cycles, A31#-A3# are used as inputs to perform snoop cycles. This is an output signal when the processor sends an address to a memory or I/O device. It serves as both input and output during snoop cycles. It is synchronized with the Bus Clock (BCLK).
Snoop cycles: The term "snooping" commonly refers to at least three different actions.
Inquire Cycles: These are bus cycles, initiated by external logic, that cause the processor
to look up an address in its physical cache tags.
Internal Snooping: These are internal actions by the processor (rather than external
logic) that are taken during certain types of cache accesses in order to detect self-
modifying code.
Bus Watching: Some caching devices watch their address and data bus continuously
while they are held off the bus, comparing every address driven by another bus master
with their internal cache tags and optionally updating their cached lines on the fly,
during write backs by the other master.
A20M#: A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1-MByte address wrap-around that occurs on the 8086. Snoop addressing is not affected. It is an input signal. If it is not used, it is connected to the power supply. It is not synchronized with the bus clock.
ADS#: Address Strobe begins a memory/I/O cycle and indicates that the address bus (A31#-A3#) and transaction request signals (REQ#) are valid. This is an output signal during the addressing cycle and an input/output signal during transaction request cycles. It is synchronized with the bus clock.
Memory/I/O cycle: The memory and input/output data transfers (read or write) are carried out in different clock cycles. The address is first loaded on the address bus. The processor, being faster, waits until the memory or input/output device is ready to send or receive the data through the data bus. Normally this takes more than one clock cycle.
BCLK: Bus Clock provides the fundamental timing for the CPU. The frequency of the input clock determines the operating frequency of the CPU's bus. External timing is defined with reference to the rising edge of BCLK. It is an input clock signal.
BNR#: Block Next Request signals a bus stall by a bus agent unable to accept new transactions. This is an input or output signal and is synchronized with the bus clock.
BPRI#: Priority Agent Bus Request arbitrates for ownership of the system bus. It is an input and is synchronized with the bus clock.
w
w
Bus Arbitration: At times external devices signal the processor to release the system
w
address/data/control bus from its control. This is achieved by an external request which
normally comes from the external devices such as a DMA controller or a Coprocessor.
BR[4:0]: Hardware strapping options for setting the processor's internal clock multiplier. These balls are strapped to the supply or ground (sometimes they can be left open to make them 1). This strap sets the ratio between the input bus clock and the internal core clock.
BSEL[1:0]: Bus frequency select balls (BSEL 0 and BSEL 1) identify the appropriate bus speed
(100 MHz or 133 MHz). It is an output signal.
BR0#: It drives the BREQ[0]# signal in the system to request access to the system bus.
D[63:0]#: Data Bus signals are bi-directional signals which provide the data path between the CPU and external memory and I/O devices. The agent driving the data bus must assert DRDY# to indicate a valid data transfer. These are both input and output.
DBSY#: Data Bus Busy is asserted by the data bus driver to indicate data bus is in use. This is
both input as well as output.
DEFER#: Defer is asserted by target agent and indicates the transaction cannot be guaranteed
as an in-order completion. This is an input signal.
DRDY#: Data Ready is asserted by data driver to indicate that a valid signal is on the data bus.
This is both input and output signal.
FERR#: FPU Error Status indicates an unmasked floating-point error has occurred. FERR# is
asserted during execution of the FPU instruction that caused the error. This is an output signal.
FLUSH#: Flush Internal Caches causes the CPU to flush its internal caches, writing back all data in the modified state. This is an input signal to the CPU.
HIT#: Snoop Hit indicates that the current cache inquiry address has been found in the cache. This is both an input and an output signal.
HITM#: Snoop Hit Modified indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). (Both input and output.)
INIT#: Initialization resets the integer registers and does not affect the internal cache or floating-point registers. (Input)
INTR: Maskable Interrupt input. This is an input signal to the CPU.
NMI: Non-Maskable Interrupt input.
LOCK#: Lock Status is used by the CPU to signal to the target that the operation is atomic.
An atomic operation is any operation that a CPU can perform such that all results will be made visible to each CPU at the same time and whose operation is safe from interference by other CPUs. For example, reading or writing a word of memory is an atomic operation.
NCHCTRL: The CPU uses this ball to control integrated I/O pull-ups. A resistance is to be connected here to control the current on the input/output pins.
PWRGD: Power Good indicates that the processor's VCC is stable. It is an input signal.
REQ[4:0]#: Request Command is asserted by bus driver to define current transaction type.
RESET#: This is an input that resets the processor and invalidates internal cache without writing
back.
RTTCTRL: The CPU uses this ball to control the output impedance.
RS[2:0]#: Response Status is an input that signals the completion status of the current
transaction when the CPU is the response agent.
SLP#: Sleep when asserted in the stop grant state, causes the CPU to enter the sleep state.
"Suspend to RAM"
All power to the CPU is shut off, and the contents of its registers are flushed to RAM, which
remains on. This system state is the most prone to errors and instability.
"Suspend to Disk"
CPU power is shut off, but RAM is written to disk and shut off as well. In Microsoft Windows, the "Hibernate" command is associated with this state. Because the contents of RAM are written out to disk, system context is maintained. For example, unsaved files would not be lost following this.
"Soft Off"
The system is shut down; however, some power may be supplied to certain devices to generate a wake event, for example to support automatic startup from a LAN or USB device. In Microsoft Windows, the "Shut down" command is associated with this state. Mechanical power can usually be removed or restored with no ill effects.
Processor "C" power states
Processor "C" power states are also defined. These are typically implemented in laptop platforms only. Here the CPU consumes less power while still doing work, and the tradeoff comes between power and performance, rather than power and latency.
SMI#: System Management (SMM) Interrupt forces the processor to save the CPU state to the top of SMM memory and to begin execution of the SMI service routine at the beginning of the defined SMM memory space. An SMI is a higher-priority interrupt than NMI.
STPCLK#: Stop Clock input causes the CPU to enter the stop grant state.
TRDY#: Target Ready input indicates that the target is ready to receive a write or write-back transfer from the CPU.
VID[3:0]: The Voltage Identification Bus informs the voltage regulator system on the motherboard of the CPU core voltage requirements. This is an output signal.
9.3 Conclusion
In this chapter the various signals of a typical general purpose processor have been discussed.
Broadly we can classify them into the following categories.
Address Signals: They are used to address the memory as well as input/output devices. They are
often multiplexed with other control signals. In such cases External Bus controllers latch these
address lines and make them available for a longer time for the memory and input/output devices
while the CPU changes the status of the same. The Bus controllers drive their inputs which are
connected to the CPU to high impedance so as not to interfere with the current state of these lines
from the CPU.
Data Signals: These lines carry the data to and from the processor and memory or I/O devices. Transceivers are connected on the data path to control the data flow. The data flow might follow some bus transaction signals. These bus transaction signals are necessary to negotiate the speed mismatch between the input/output and the processor.
Control Signals: These can be generally divided into the following groups:
Read/Write Control
Memory Write: The processor issues this signal while sending data to the memory.
Memory Read: The processor issues this signal while reading data from the memory.
I/O Read: The input/output read signal, which is generally preceded by some bus transaction signals.
I/O Write: The input/output write signal, which is generally succeeded by some bus transaction signals.
These read/write signals are not generally directly available from the CPU. They are decoded from a set of status signals by an external bus controller.
Bus Transaction Control
Master versus Slave: The bus master sends the address on the bus.
Requesting to obtain access to the bus is achieved by the following lines:
Bus Request: The slave requests the access grant.
Bus Grant: The requesting device gets the grant signal.
Lock: For specific operations the bus requests are not granted, as the CPU might be performing some important operations.
Interrupt Control
In a multitasking environment, interrupts are external signals to the CPU for emergency operations. The CPU executes the interrupt service routines while acknowledging the interrupts.
The interrupts are processed according to their priority. More discussion is available in
subsequent lessons.
Processor Control
These lines are activated when there is a power on or the processor comes up from a power-
saving mode such as sleep. These are
Reset
Test lines etc.
Some of the above signals will be discussed in the subsequent lessons.
9.4 Questions and Answers
Q1. What is the maximum memory addressing capability of the processor discussed in this lecture?
Ans: The number of address lines is 32. Therefore it can address 2^32 locations, which is 4 Gbytes.
Q2. What do you understand by POST in a desktop computer?
Ans: It is called Power-On Self Test. This is a routine executed to check the proper functioning of the hard disk, CD-ROM, floppy disk and many other on-board and off-board components when the computer is powered on.
Q3. Describe the various power-saving modes in a general purpose CPU.
Ans: Refer to the Sleep Mode discussion in the text.
Q4. What could be the differences in design of a processor to be used in the following applications?
Laptop
Desktop
Motor Control
Ans:
Laptop processor: a complex general purpose processor with low power consumption and various power-saving modes.
Motor Control: a simple low-power specialized processor with on-chip peripherals, running a Real Time Operating System.
Q5. What is the advantage of reducing the high-state voltage from 5 V to 3.5 V? What are the disadvantages?
Ans: It reduces the interference but decreases the noise margin.
Ans: It is used to know the quality of the supply inside the CPU. If it is not good, there may be mal-operations and data loss.
Lesson
10
Embedded Processors - I
Pre-requisite
Digital Electronics
10.1 Introduction
It is generally difficult to draw a clear-cut boundary between the class of microcontrollers and general purpose microprocessors. Distinctions can be made or assumed on the following grounds.
Microcontrollers are generally associated with embedded applications.
Microprocessors are associated with desktop computers.
Microcontrollers have a simpler memory hierarchy, i.e. the RAM and ROM may exist on the same chip, and generally the cache memory will be absent.
The power consumption and temperature rise of a microcontroller are restricted because of the constraints on the physical dimensions.
8-bit and 16-bit microcontrollers are very popular, with a simpler design as compared to large bit-length (32-bit, 64-bit) complex general purpose processors.
However, recently, the market for 32-bit embedded processors has been growing.
Further, issues such as power consumption, cost, and integrated peripherals differentiate a desktop CPU from an embedded processor. Other important features include the interrupt response time, the amount of on-chip RAM or ROM, and the number of parallel ports. The desktop world values processing power, whereas an embedded microprocessor must do the job for a particular application at the lowest possible cost.
[Fig. 10.1: The Performance vs Cost regions — 4-bit controllers occupy the low-cost, low-performance corner; 8- or 16-bit controllers sit in the middle; 32- or 64-bit desktop processors occupy the high-performance, high-cost corner]
[Fig. 10.2 Microprocessor versus microcontroller: (a) a microprocessor-based system with separate ROM, EEPROM, RAM, serial I/O, parallel I/O, timer, A/D and D/A chips around the microprocessor, plus input and output ports for digital and analog I/O; (b) a microcontroller-based system with the CPU core, serial I/O, parallel I/O, timer, A/D, PWM and analog filter integrated on a single chip]
Fig. 10.1 shows the performance-cost plot of the available microprocessors. Naturally, the higher the performance, the higher the cost. The embedded controllers occupy the lower left-hand corner of the plot.
Fig. 10.2 shows the architectural difference between two systems, one with a general purpose microprocessor and one with a microcontroller. The hardware requirement in the former system is more than that of the latter. Separate chips or circuits for serial interface, parallel interface, memory and AD/DA converters are necessary. On the other hand, the functionality, flexibility and the complexity of information handling are greater in the case of the former.
[Fig. 10.3: The Architectural Block diagram of the Intel 8XC196 Microcontroller — showing the A/D, I/O, EPA, PWM, WDT, WG, FG and SIO blocks around the core]
PTS: Peripheral Transaction Server; I/O: Input/Output Interface; EPA: Event Processor Array;
[Fig. 10.4: The Architectural Block diagram of the core — CPU, registers, PSW, SFRs and the bus controller]
CPU: Central Processing Unit; RALU: Register Arithmetic Logic Unit; ALU: Arithmetic Logic Unit;
Master PC: Master Program Counter; PSW: Processor Status Word; SFR: Special Function Registers
CPU Control
The CPU is controlled by the microcode engine, which instructs the RALU to perform operations using bytes, words, or double-words from either the 256-byte lower register file or through a window that directly accesses the upper register file. Windowing is a technique that maps blocks of the upper register file into a window in the lower register file. CPU instructions move from the 4-byte prefetch queue in the memory controller into the RALU's instruction register. The microcode engine decodes the instructions and then generates the sequence of events that cause desired functions to occur.
Register File
The register file is divided into an upper and a lower file. In the lower register file, the lowest 24
bytes are allocated to the CPU's special-function registers (SFRs) and the stack pointer, while
the remainder is available as general-purpose register RAM. The upper register file contains only
general-purpose register RAM. The register RAM can be accessed as bytes, words, or double
words. The RALU accesses the upper and lower register files differently. The lower register file
is always directly accessible with direct addressing. The upper register file is accessible with
direct addressing only when windowing is enabled.
The six-bit loop counter counts repetitive shifts. The second-operand register stores the second operand for two-operand instructions, including the multiplier during multiply operations and the divisor during divide operations. During subtraction operations, the output of this register is complemented before it is moved into the ALU. The RALU speeds up calculations by storing constants (e.g., 0, 1, and 2) in the constants register so that they are readily available when complementing, incrementing, or decrementing bytes or words. In addition, the constants register generates single-bit masks, based on the bit-select register, for bit-test instructions.
Code Execution
The RALU performs most calculations for the microcontroller, but it does not use an accumulator. Instead it operates directly on the lower register file, which essentially provides 256 accumulators. Because data does not flow through a single accumulator, the microcontroller's code executes faster and more efficiently.
Instruction Format
These microcontrollers combine general-purpose registers with a three-operand instruction format. This format allows a single instruction to specify two source registers and a separate destination register. For example, the following instruction multiplies two 16-bit variables and stores the 32-bit result in a third variable.
When the bus controller receives a request from the queue, it fetches the code from the address
contained in the slave PC. The slave PC increases execution speed because the next instruction
byte is available immediately and the processor need not wait for the master PC to send the
address to the memory controller. If a jump, interrupt, call, or return changes the address sequence, the master PC loads the new address into the slave PC, then the CPU flushes the queue and continues processing.
Interrupt Service
The interrupt-handling system has two main components: the programmable interrupt controller
and the peripheral transaction server (PTS). The programmable interrupt controller has a
hardware priority scheme that can be modified by the software. Interrupts that go through the interrupt controller are serviced by interrupt service routines that you provide. The peripheral transaction server (PTS), which is a microcoded hardware interrupt processor, provides efficient interrupt handling.
[Fig. 10.5: The clock circuitry — an external crystal or oscillator on XTAL1/XTAL2 feeds a divide-by-two circuit; the clock generators then produce the peripheral clocks (PH1, PH2), the CPU clocks (PH1, PH2) and CLKOUT, each of which can be disabled in idle or powerdown mode]
Internal Timing
The clock circuitry (Fig. 10.5) receives an input clock signal on XTAL1 provided by an
external crystal or oscillator and divides the frequency by two. The clock generators accept the
divided input frequency from the divide-by-two circuit and produce two non-overlapping
internal timing signals, Phase 1 (PH1) and Phase 2 (PH2). These signals are active when high.
[Timing diagram: XTAL1 toggles with period TXTAL1; two XTAL1 periods make up one state time, during which PH1 and PH2 pulse alternately and CLKOUT toggles]
Analog-to-digital Converter
The analog-to-digital (A/D) converter converts an analog input voltage to a digital equivalent.
Resolution is either 8 or 10 bits; sample and convert times are programmable. Conversions can
be performed on the analog ground and reference voltage, and the results can be used to calculate
gain and zero-offset errors. The internal zero-offset compensation circuit enables automatic zero
offset adjustment. The A/D also has a threshold-detection mode, which can be used to generate
an interrupt when a programmable threshold voltage is crossed in either direction. The A/D scan
mode of the PTS facilitates automated A/D conversions and result storage.
Watchdog Timer
The watchdog timer is a 16-bit internal timer that resets the microcontroller if the software fails
to operate properly.
In idle mode, the CPU stops executing instructions, but the peripheral clocks remain active. Power consumption drops to about 40% of normal execution mode consumption. Either a hardware reset or any enabled interrupt source will bring the microcontroller out of idle mode. In power-down mode, all internal clocks are frozen at logic state zero and the internal oscillator is shut off. The register file and most peripherals retain their data if VCC is maintained. Power consumption drops into the µW range.
10.3 Conclusion
This lesson discussed the architecture of a typical high-performance microcontroller. The next lesson shall discuss the signals of a typical microcontroller from the Intel MCS-96 family.
Ans: This is where instructions are broken down into smaller micro-instructions and executed.
Microprogramming was one of the key breakthroughs that allowed system architects to implement complex instructions in hardware. To understand what microprogramming is, it helps to first consider the alternative: direct execution. With direct execution, the machine fetches an instruction from memory and feeds it into a hardwired control unit. This control unit takes the instruction as its input and activates some circuitry that carries out the task. For instance, if the machine fetches a floating-point ADD and feeds it to the control unit, there's a circuit somewhere in there that kicks in and directs the execution units to make sure that all of the shifting, adding, and normalization gets done. Direct execution is actually pretty much what you'd expect to go on inside a computer if you didn't know about microcoding.
The main advantage of direct execution is that it's fast. There's no extra abstraction or translation going on; the machine is just decoding and executing the instructions right in hardware. The problem with it is that it can take up quite a bit of space. If every instruction has to have some circuitry that executes it, then the more instructions you have, the more space the control unit will take up. This problem is compounded if some of the instructions are big and complex, and take a lot of work to execute. So directly executing instructions for a CISC machine just wasn't feasible with the limited transistor resources of the day.
With microprogramming, it's almost like there's a mini-CPU on the CPU. The control unit is a microcode engine that executes microcode instructions. The CPU designer uses these microinstructions to write microprograms, which are stored in a special control memory. When a normal program instruction is fetched from memory and fed into the microcode engine, the microcode engine executes the proper microcode subroutine. This subroutine tells the various functional units what to do and how to do it.
As you can probably guess, in the beginning microcode was a pretty slow way to do things. The ROM used for control memory was about 10 times faster than magnetic core-based main memory, so the microcode engine could stay far enough ahead to offer decent performance. As microcode technology evolved, however, it got faster and faster. (The microcode engines on current CPUs are about 95% as fast as direct execution.) Since microcode technology was getting better and better, it made more and more sense to just move functionality from (slower and more expensive) software to (faster and cheaper) hardware. So ISA instruction counts grew, and program instruction counts shrank.
As microprograms got bigger and bigger to accommodate the growing instruction sets, however, some serious problems started to emerge. To keep performance up, microcode had to be highly optimized with no inefficiencies, and it had to be extremely compact in order to keep memory costs down. And since microcode programs were so large now, it became much harder to test and debug the code. As a result, the microcode that shipped with machines was often buggy and had to be patched numerous times out in the field. It was the difficulties involved with using microcode for control that spurred Patterson and others to question whether implementing all of these complex, elaborate instructions in microcode was really the best use of limited transistor resources.
Ans: A fail-safe mechanism that intervenes if a system stops functioning. A hardware timer
that is periodically reset by software. If the software crashes or hangs, the watchdog timer
will expire, and the entire system will be reset automatically.
The Watch Dog Unit contains a Watch Dog Timer.
A watchdog timer (WDT) is a device or electronic card that performs a specific operation
after a certain period of time if something goes wrong with an electronic system and the
system does not recover on its own.
A common problem is for a machine or operating system to lock up if two parts or programs conflict, or, in an operating system, if memory management trouble occurs. In some cases, the system will eventually recover on its own, but this may take an unknown and perhaps extended length of time. A watchdog timer can be programmed to perform a warm boot (restarting the system) after a certain number of seconds during which a program or computer fails to respond following the most recent mouse click or keyboard action. The timer can also be used for other purposes, for example, to actuate the refresh (or reload) button in a Web browser if a Web site does not fully load after a certain length of time following the entry of a Uniform Resource Locator (URL).
A WDT contains a digital counter that counts down to zero at a constant speed from a preset number. The counter speed is kept constant by a clock circuit. If the counter reaches zero before the computer recovers, a signal is sent to designated circuits to perform the desired action.
Lesson 11: Embedded Processors - II
Pre-requisite
Digital Electronics
11.1 Introduction
Microcontrollers are required to operate in the real world without much interface circuitry. The input-output signals of such a processor are both analog and digital. The digital data transmission can be both parallel and serial. The voltage levels also could be different.

The architecture of a basic microcontroller is shown in Fig. 11.1. It illustrates the various modules inside a microcontroller. Common processors will have Digital Input/Output, Timer and Serial Input/Output lines. Some of the microcontrollers also support multi-channel Analog to Digital Converter (ADC) as well as Digital to Analog Converter (DAC) units. Thus analog signal input and output pins are also present in typical microcontroller units. For external memory and I/O chips the address as well as data lines are also supported.
[Fig. 11.1: Block diagram of a typical microcontroller: a 16-bit CPU with RAM and ROM areas, timer, watchdog timer, A/D converter, pulse-width modulators, serial port (Tx/Rx), bus controller, interrupt handler, event processor array (EPA) with capture/compare units, register RAM, microcode engine and ALU, plus ports A, B, C and others. Legend: address/data lines, bus control signals, interrupt signals, timer/event-manager signals, digital I/O ports, analog I/O ports.]
A20:16 Address Pins 16 to 20: These are output pins used during external memory cycles. They are multiplexed with EPORT.4:0, part of the 8-bit extended addressing port, and are used to support extended addressing. The EPORT is an 8-bit port which can operate either as a general-purpose I/O signal (I/O mode) or as a special-function signal (special-function mode).
AD15:0 Address/Data Lines: These lines serve as input as well as output pins. The function of these pins depends on the bus width and mode. When a bus access is not occurring, these pins revert to their I/O port function. AD15:0 drive address bits 0 to 15 during the first half of the bus cycle and drive or receive data during the second half of the bus cycle.
INST Output signal: When high, INST indicates that an instruction is being fetched from
external memory. The signal remains high during the entire bus cycle of an external instruction
fetch.
RD: Read Signal: Output: It is asserted only during external memory reads.
READY: Ready Input: This active-high input can be used to insert wait states in addition to
those programmed in the chip configuration.
WR: Write: Output Signal: This active-low output indicates that an external write is occurring.
This signal is asserted only during external memory writes.
WRH Write High: Output Signal: During 16-bit bus cycles, this active-low output signal is
asserted for high-byte writes and word writes to external memory.
WRL Write Low: Output Signal: During 16-bit bus cycles, this active-low output signal is
asserted for low-byte writes and word writes to external memory.
EA: External Access: Input Signal: This input determines whether memory accesses to the upper 7 Kbytes of ROM (FF2400H to FF3FFFH) are directed to internal or external memory. These accesses are directed to internal memory if EA# is held high and to external memory if EA# is held low. For an access to any other memory location, the value of EA# is irrelevant.
EXTINT: External Interrupt Input: In normal operating mode, a rising edge on EXTINT sets the EXTINT interrupt pending bit. EXTINT is sampled during phase 2 (CLKOUT high). The minimum high time is one state time. If the EXTINT interrupt is enabled, the CPU executes the interrupt service routine.
NMI: Nonmaskable Interrupt Input: In normal operating mode, a rising edge on NMI generates a nonmaskable interrupt. NMI has the highest priority of all prioritized interrupts.

ONCE: Input: On-circuit emulation (ONCE) mode electrically isolates the microcontroller from the system. By invoking the ONCE mode, you can test the printed circuit board while the microcontroller is soldered onto the board.
PLLEN: Input Signal: Phase-Locked Loop Enable: This active-high input pin enables the on-chip clock multiplier. The PLLEN pin must be held low along with the ONCE# pin to enter on-circuit emulation (ONCE) mode.

RESET: I/O Reset: A level-sensitive reset input to, and an open-drain system reset output from, the microcontroller. Either a falling edge on RESET or an internal reset turns on a pull-down transistor connected to the RESET pin for 16 state times. In the powerdown and idle modes, asserting RESET causes the microcontroller to reset and return to normal operating mode.
RPD: Return-from-Power-Down Input Signal: Timing pin for the return-from-powerdown circuit.
TMODE: Test-Mode Entry Input: If this pin is held low during reset, the microcontroller will
enter a test mode. The value of several other pins defines the actual test mode.
XTAL1: Input: Crystal/Resonator or External Clock Input: Input to the on-chip oscillator and the
internal clock generators. The internal clock generators provide the peripheral clocks, CPU
clock, and CLKOUT signal. When using an external clock source instead of the on-chip
oscillator, connect the clock input to XTAL1.
XTAL2: Output: Inverted Output for the Crystal/Resonator Output of the on-chip oscillator
inverter. Leave XTAL2 floating when the design uses an external clock source instead of the on-
chip oscillator.
P3.7:0 I/O Port 3: This is a memory-mapped, 8-bit, bidirectional port with programmable open
drain or complementary output modes.
P4.7:0 I/O Port 4 This is a memory-mapped, 8-bit, bidirectional port with programmable open
drain or complementary output modes.
P5.7:0 I/O Port 5: This is a memory-mapped, 8-bit, bidirectional port.

P7.7:0 I/O Port 7: This is a standard, 8-bit, bidirectional port that shares package pins with individually selectable special-function signals.

P8.7:0 I/O Port 8: This is a standard, 8-bit, bidirectional port.

P9.7:0 I/O Port 9: This is a standard, 8-bit, bidirectional port.

P10.5:0 I/O Port 10: This is a standard, 6-bit, bidirectional port that is multiplexed with individually selectable special-function signals.

P11.7:0 I/O Port 11: This is a standard, 8-bit, bidirectional port that is multiplexed with individually selectable special-function signals.

P12.4:0 I/O Port 12: This is a memory-mapped, 5-bit, bidirectional port. P12.2:0 select the TROM.
Most of the above ports are shared with other important signals discussed here. For instance Port
3 pins P3.7:0 share package pins with AD7:0. That means by writing a specific word to the
configuration register the pins can change their function.
Analog Inputs
ACH15:0: Input Analog Channels: These signals are analog inputs to the A/D converter. The
ANGND and VREF pins are also required for the standard A/D converter to function.
Other important signals of a typical microcontroller include
Power Supply and Ground pins at multiple points
Signals from the internal programmable Timer
Debug Pins
The multiple power supply points ensure the following:
The voltages at devices (transistors and cells) are better than a set target under a specified set of varying load conditions in the design. This is to ensure correct operation of circuits at the expected level of performance.
The current supplied by a pad, pin, or voltage regulator is within a specified limit under any of the specified loading conditions. This is required (a) so as not to exceed the design capacity of regulators and pads, and (b) to distribute currents more uniformly among the pads, so that the L di/dt voltage variations due to parasitic inductance in the package's substrate, ball-grid array, and bond wires are minimized.
Module 2: Embedded Processors and Memory
Lesson 12: Memory Interfacing
Instructional Objectives
After going through this lesson the student would learn
Pre-Requisite
Digital Electronics, Microprocessors
12.1 Introduction
A Single-Chip Microcontroller

[Fig. 12.1: The basic architecture of a microcontroller: a 16-bit CPU with internal RAM and ROM areas, a timer, an ADC, a serial port (Tx/Rx), and parallel ports A, B and C.]

CPU: The processing module of the microcontroller
Fig. 12.1 shows the internal architecture of single chip microcontroller with internal RAM as
well as ROM. Most of these microcontrollers do not require external memory for simpler tasks.
The program lengths being small can easily fit into the internal memory. Therefore it often
provides single chip solutions. However the amount of internal memory cannot be increased
beyond a certain limit because of the following reasons.
Power Consumption
Size
The presence of extra memory means more power consumption and hence a higher temperature rise, and the size has to be increased to house the additional memory. The need for extra memory space nevertheless arises in some specific applications. Fig. 12.3 shows the basic block diagram of the memory interface to a processor.
[Fig. 12.3: External memory interface: the EMI bus-interface logic connects the processor to the external memory through data lines, control lines, and address/control signals.]
The above family of microcontrollers can have both on-chip as well as off-chip external memory. At times the on-chip memory is of a programmable flash type. A special register inside the microcontroller can be programmed (by writing an 8-bit or 16-bit binary number) to use this external memory in various modes. In the case of the PIC family the following modes are possible.

Microcontroller Mode
The processor accesses only on-chip FLASH memory. External Memory Interface functions are disabled. Attempts to read above the physical limit of the on-chip FLASH cause a read of all 0s (a NOP instruction).
Microprocessor Mode
The processor permits execution and access only through external program memory; the contents
of the on-chip FLASH memory are ignored.
[Fig. 12.4: The memory map (000000h to 1FFFFFh) in the different modes: Microprocessor Mode (MP), Microprocessor with Boot Block Mode (MPBB), Microcontroller Mode (MC) and Extended Microcontroller Mode (EMC), showing, for each mode, which regions are served by on-chip FLASH program memory and which by external program memory, and where reads return 0s or access is disallowed.]
[Fig. 12.5: The address, data and control lines of the PIC18F8XXX microcontroller required for external memory interfacing: AD<15:0> and A<19:16>, with the control lines ALE, OE, WRL, WRH, UB, LB, BA0 and CE, plus VDD and VSS.]
The address, data and control lines of a PIC family microcontroller are shown in Fig. 12.5 and are explained below.

AD0-AD15: 16 bits of data and 16 bits of address, multiplexed
A16-A19: The 4 most significant bits of the address
ALE: Address Latch Enable: Signal to latch the multiplexed address in the first clock cycle
WRL: Write Low: Control pin to make the memory write the lower byte of the data when it is low
WRH: Write High: Control pin to make the memory write the higher byte of the data when it is low
OE: Output Enable: Made low when valid data is made available to the external memory
CE: Chip Enable: Made low to access the external memory chip
BA0: Byte Address 0
LB: Lower Byte Enable: Control kept low when the lower byte is available for the memory
UB: Upper Byte Enable: Control kept low when the upper byte is available for the memory
The microcontroller has a 16-bit wide bus for data transfer. These data lines are shared with
address lines and are labeled AD<15:0>. Because of this, 16 bits of latching are necessary to
demultiplex the address and data. There are four additional address lines labeled A<19:16>. The
PIC18 architecture provides an internal program counter of 21 bits, offering a capability of 2
Mbytes of addressing.
There are seven control lines that are used in the External Memory Interface: ALE, WRL
, WRH , OE , CE , LB , UB . All of these lines except OE may be used during data writes. All
of these lines except WRL and WRH may be used during fetches and reads. The application
will determine which control lines are necessary. The basic connection diagram is shown in Fig.
12.6. The 16-bit byte select mode is shown here.
[Fig. 12.6: The connection diagram for the external memory interface in 16-bit byte select mode: AD<15:0> from the PIC18F8XXX feeds both an address latch (clocked by ALE) that drives the address bus Ax:A0 and the data bus D15:D0 of the memory, while CE, OE, WRH, WRL, BA0, UB and LB serve as the control lines.]
The PIC18 family runs from a clock that is four times faster than its instruction cycle. The four clock pulses are a quarter of the instruction cycle in length and are referred to as Q1, Q2, Q3, and Q4. During Q1, ALE is enabled while address information A<15:0> is placed on pins AD<15:0>. At the same time, the upper address information A<19:16> is available on the upper address bus. On the negative edge of ALE, the address is latched in the external latch. At the beginning of Q3, the OE output enable (active low) signal is generated. Also at the beginning of Q3, BA0 is generated. This signal will be active high only during Q3, indicating the state of the program counter's least significant bit. At the end of Q4, OE goes high and the data (a 16-bit word) is fetched from memory at the low-to-high transition edge of OE. The timing diagram for all signals during external memory code execution and table reads is shown in Fig. 12.7.
[Fig. 12.7: Timing diagram for memory read: BA0, ALE and OE toggle within the Q1-Q4 phases while WRH and WRL stay at 1 and CE, UB and LB stay at 0.]
12.3 Conclusion

This lesson discussed a typical external memory interface example for the PIC family of microcontrollers. A typical timing diagram for the memory read operation was presented.

12.4 Questions

Q1. Draw the read timing diagram for a typical memory operation.
Ans: Refer to text.
Q2. Draw the write timing diagram for a typical memory operation.
Ans: For the 16-bit write operation in the MCS96 family, refer to Lessons 10 and 11.
Module 3: Embedded Systems I/O
Lesson 13: Interfacing Bus, Protocols, ISA Bus etc.
Instructional Objectives
After going through this lesson the student would learn

Pre-Requisite
Digital Electronics, Microprocessors

13.1 Introduction

The traditional definition of input-output covers the devices that create a medium of interaction with the human users. They fall into categories such as:
1. Printers
[Fig. 13.1: A typical embedded multimedia device built around a TI digital media processor: a CCD lens module with a TI analog front end and V/H timing generator, a TFT panel with controller and motor drivers, an audio codec module with power amplifier, SDRAM and flash memory with removable storage, RS232C/USB/1394 interfaces, and a power-management section (low-dropout, buck, boost, charge-pump and inverter regulators) supplying the 1.5/1.8/2.5 V core, 3.3/5 V system and 7.5/12/15 V LCD/CCD rails.]
Processing
Transformation of data
Implemented using processors
Storage
Retention of data
Implemented using memory
And Communication (also called Interfacing)
Transfer of data between processors and memories
Implemented using buses
Interfacing

Interfacing is a way to communicate and transfer information in either direction without ending in deadlocks. In our context it is a way of effective communication in real time. This involves:
Addressing
Arbitration
Protocols
[Fig. 13.2(a): The bus structure: a master and a slave connected by control lines and data lines.]
Addressing: The data sent by the master over a specified set of lines enables just the device for which it is meant.

Protocols: The literal meaning of protocol is a set of rules. Here it is a set of formal rules describing how to transfer data, especially between two devices. A simple example is the memory read and write protocol. The set of rules, or the protocol, for a read is (Fig. 13.2(b)):
The CPU must send the memory address
The read line must be enabled
The processor must wait till the memory is ready
Then accept the bits on the data lines
[Fig. 13.2(b): Read protocol: the enable line is asserted with the address on the addr lines; after tsetup and tread the data appears on the data lines.]
For a write (Fig. 13.2(c)):
The CPU must send the memory address
The write line must be enabled
The processor sends the data over the data lines
The processor must wait till the memory is ready

[Fig. 13.2(c): Write protocol: the enable line is asserted with the address and data on the addr and data lines; the write completes after tsetup and twrite.]
Arbitration: When the same set of address/data/control lines is shared by different units, the bus arbitration logic comes into play. Access to a bus is arbitrated by a bus master. Each node on a bus has a bus master which requests access to the bus, called a bus request, when that node needs to use the bus. This is a global request sent to all nodes on the bus. The node that currently has access to the bus responds with either a bus grant or a bus busy signal, which is also globally known to all bus masters. (Fig. 13.3)
[Fig. 13.3: Bus arbitration with I/O devices and a DMA (direct memory access) controller, which is responsible for transferring data between an I/O device and memory without involving the CPU. It starts with a bus request to the CPU and, after the request is granted, takes over the address/data and control bus to initiate the data transfer. After the data transfer is complete it passes control back to the CPU.]
Before learning more details about each of these concepts, a concrete definition of the following terms is necessary.

Wire: It is just a passive physical connection with least resistance. It may be augmented with buffers, latches etc.

Bus: A group of signals (such as data, address etc.). A bus has standard specifications such as the number of bits, the clock speed etc.

Port: It is the set of physical wires made available so that any device which meets the specified standard can be directly plugged in. Examples are the serial, parallel and USB ports of the PC.

Time multiplexing: Sharing a single set of wires for multiple pieces of data. It saves wires at the expense of time.
The Handshaking Protocol

Strobe Protocol: The master and servant are connected by req and data lines. The master asserts req to receive data; the servant puts the data on the bus within a known access time, after which the master reads it and deasserts req.

Handshake Protocol (Fig. 13.5(b)): The master and servant are connected by req, ack and data lines.
1. Master asserts req to receive data
2. Servant puts data on bus and asserts ack
3. Master receives data and deasserts req
4. Servant ready for next request

Strobe and Handshake Combined (Fig. 13.5(c)): The master asserts req to receive data. If the servant can put the data on the bus within the time taccess, the transfer completes as in the strobe protocol; otherwise the servant asserts a wait line until the data is ready, as in the handshake protocol.

Handshaking Example in the ISA Bus
The Industry Standard Architecture (ISA) bus is described below. This is a standard bus architecture developed to help the various designers customize their products and interfaces. The pin configuration and the signals are discussed below.

[Fig. 13.6: The ISA bus connector pin-out.]

ISA Signal Descriptions
SA19 to SA0 (SA for System Address)
System Address bits 19:0 are used to address memory and I/O devices within the system. These signals may be used along with LA23 to LA17 to address up to 16 megabytes of memory. Only the lower 16 bits are used during I/O operations to address up to 64K I/O locations. SA19 is the most significant bit. SA0 is the least significant bit. These signals are gated on the system bus when BALE is high and are latched on the falling edge of BALE. They remain valid throughout a read or write command. These signals are normally driven by the system microprocessor or DMA controller, but may also be driven by a bus master on an ISA board that takes ownership of the bus.
LA23 to LA17
Unlatched Address bits 23:17 are used to address memory within the system. They are used
along with SA19 to SA0 to address up to 16 megabytes of memory. These signals are valid when
BALE is high. They are "unlatched" and do not stay valid for the entire bus cycle. Decodes of
these signals should be latched on the falling edge of BALE.
AEN
Address Enable is used to degate the system microprocessor and other devices from the bus
during DMA transfers. When this signal is active the system DMA controller has control of the
address, data, and read/write signals. This signal should be included as part of ISA board select
decodes to prevent incorrect board selects during DMA cycles.
BALE
Buffered Address Latch Enable is used to latch the LA23 to LA17 signals or decodes of these
signals. Addresses are latched on the falling edge of BALE. It is forced high during DMA cycles.
When used with AEN, it indicates a valid microprocessor or DMA address.
CLK
System Clock is a free running clock typically in the 8 MHz to 10 MHz range, although its exact frequency is not guaranteed. It is used in some ISA board applications to allow synchronization with the system microprocessor.

SD15 to SD0
System Data serves as the data bus bits for devices on the ISA bus. SD15 is the most significant bit. SD0 is the least significant bit. SD7 to SD0 are used for transfer of data with 8-bit devices. SD15 to SD0 are used for transfer of data with 16-bit devices. 16-bit devices transferring data with 8-bit devices shall convert the transfer into two 8-bit cycles using SD7 to SD0.

DACK0 to DACK3 and DACK5 to DACK7
DMA Acknowledge 0 to 3 and 5 to 7 are used to acknowledge DMA requests on DRQ0 to DRQ3 and DRQ5 to DRQ7.

DRQ0 to DRQ3 and DRQ5 to DRQ7
DMA Requests are used by ISA boards to request service from the system DMA controller or to request ownership of the bus as a bus master device. These signals may be asserted asynchronously. The requesting device must hold the request signal active until the system board asserts the corresponding DACK signal.
I/O CH CK
I/O Channel Check may be activated by ISA boards to request that a non-maskable interrupt (NMI) be generated to the system microprocessor. It is driven active to indicate that an uncorrectable error has been detected.

I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
IOR
I/O Read is driven by the owner of the bus and instructs the selected I/O device to drive read data
onto the data bus.
IOW
I/O Write is driven by the owner of the bus and instructs the selected I/O device to capture the
write data on the data bus.
SMEMW
System Memory Write instructs a selected memory device to store the data currently on the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMW is derived from MEMW and a decode of the low 1 megabyte of memory.
MEMR
Memory Read instructs a selected memory device to drive data onto the data bus. It is active on
all memory read cycles.
MEMW
Memory Write instructs a selected memory device to store the data currently on the data bus. It is
active on all memory write cycles.
REFRESH
Memory Refresh is driven low to indicate a memory refresh operation is in progress.
OSC
Oscillator is a clock with a 70ns period (14.31818 MHz). This signal is not synchronous with the
system clock (CLK).
RESET DRV
Reset Drive is driven high to reset or initialize system logic upon power up or subsequent system
reset.
TC
Terminal Count provides a pulse to signal that a terminal count has been reached on a DMA channel operation.

MASTER
Master is used by an ISA board along with a DRQ line to gain ownership of the ISA bus. Upon receiving a -DACK, a device can pull -MASTER low, which will allow it to control the system address, data, and control lines. After MASTER is low, the device should wait one CLK period before driving the address and data lines, and two clock periods before issuing a read or write command.
MEM CS16
Memory Chip Select 16 is driven low by a memory slave device to indicate it is capable of performing a 16-bit memory data transfer. This signal is driven from a decode of the LA23 to LA17 address lines.

I/O CS16
I/O Chip Select 16 is driven low by an I/O slave device to indicate it is capable of performing a 16-bit I/O data transfer. This signal is driven from a decode of the SA15 to SA0 address lines.
0WS
Zero Wait State is driven low by a bus slave device to indicate it is capable of performing a bus
cycle without inserting any additional wait states. To perform a 16-bit memory cycle without
wait states, -0WS is derived from an address decode.
SBHE
System Byte High Enable is driven low to indicate a transfer of data on the high half of the data
bus (D15 to D8).
[Fig. 13.7(a): The handshaking mode of data transfer in the ISA bus (memory read): across cycles C1, C2, WAIT, C3 and C4, the address is placed on A[19-0] and latched by ALE, /MEMR is asserted, and the slave pulls CHRDY low to insert the wait state before the data appears on D[7-0].]

The memory write bus cycle in the ISA bus:

[Memory write timing: as above, but with /MEMW asserted while the master drives the data on D[7-0]; CHRDY again inserts the wait state.]
[Fig. 13.8: Parallel I/O and extended parallel I/O: a parallel I/O peripheral on the system bus provides ports A, B and C; in extended parallel I/O, processor ports 2 and 3 interface with a further parallel I/O peripheral.]

Extended parallel I/O
When the processor supports port-based I/O but more ports are needed, one or more processor ports interface with a parallel I/O peripheral, extending the total number of ports available for I/O, e.g., extending 4 ports to 6 ports in the figure.
Types of bus-based I/O: memory-mapped I/O and standard I/O
The processor talks to both memory and peripherals using the same bus; there are two ways to talk to peripherals.
Memory-mapped I/O
Peripheral registers occupy addresses in the same address space as memory.
e.g., the bus has a 16-bit address:
the lower 32K addresses may correspond to memory
the upper 32K addresses may correspond to peripherals
Standard I/O (I/O-mapped I/O)
An additional pin (M/IO) on the bus indicates whether it is a memory or a peripheral access.
e.g., the bus has a 16-bit address:
all 64K addresses correspond to memory when M/IO is set to 0
all 64K addresses correspond to peripherals when M/IO is set to 1
[Fig. 13.9(a): Interfacing an 8051 to external memory: a 74373 latch, gated by ALE, demultiplexes the low address byte A<7:0> from port P0; port P2 supplies the high address byte A<15:8>; the HM6264 RAM is accessed through /RD, /WR and its chip selects, and the 27C256 program ROM is read through /PSEN and /OE.]

[Fig. 13.9(b): The timing diagram: the clock, the address on P2, ALE and /RD during an external read cycle.]
The timing of the various signals is shown in Fig. 13.9(b). The lower byte of the address is placed on P0 and the address latch enable signal is asserted. The higher byte of the address is placed on P2. The ALE signal enables the 74373 chip to latch the address, as the P0 bus will be used for data. The P0 bus goes into tri-state (high impedance state) and switches internally to the data path. The RD (read) line is enabled; the bar over the read line indicates that it is active when low. The data is received from the memory over the P0 bus. A memory write cycle can be explained similarly.
13.3 Conclusion

In this lesson you learnt about the basics of input-output interfacing. In the previous chapter you also studied some input-output concepts, but most of those I/O units, such as the timer, watchdog circuits, PWM generator, and serial and parallel ports, were part of the microcontroller. In this lesson the basics of interfacing with external devices have been discussed. The difference between a bus and a port should be kept in mind. The ISA bus is discussed to give an idea about the various bus architectures, which will be discussed in the later part of this course. You may browse the websites listed below for further knowledge.

http://esd.cs.ucr.edu/slide_index.html
http://esd.cs.ucr.edu/wres.html
www.techfest.com/hardware/bus/isa.htm

You should now be in a position to learn any microcontroller and its interfacing protocols.
13.4 Questions
1. List at least 4 differences between the I/O devices for a Real Time Embedded System (RTES) and a Desktop PC.
Ans: An additional handshaking signal from the memory, namely /READY, is necessary. The microcontroller inserts wait states as long as the /READY line is not asserted. The READY line in this case is sampled at the rising edge of the third clock phase. Fig. Q2 shows the timing of such an operation.

[Fig. Q2: Timing with wait states: over clock periods T1, T2, Twait, T4 and T5, the address is placed on the bus and /RD is asserted; /Ready is sampled, and the data transfer completes only after /Ready becomes active.]
3. Enlist the handshaking signals in the ISA bus for dealing with slower I/O devices.
Ans:
I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
4. What additional handshaking signals are necessary for bidirectional data transfer over the same set of data lines?

Ans:
For an 8-bit data transfer we need at least 4 additional lines for handshaking. As shown in Fig. Q4 there are two ports. Port A acts as the 8-bit bidirectional data bus. Port C carries the handshaking signals.

Write operation: When the data is ready, the /OBFA (PC7, output buffer full acknowledge, active low) signal is made 0. The connected device acknowledges through /ACKA (PC6, acknowledging that it is ready to accept data; active low). The data transfer takes place over PA0-PA7.

Read operation: When the data is ready, the external device makes the /STBA (PC4, strobe acknowledge, active low) line low. The acknowledgement is sent through IBFA (PC5, input buffer full, acknowledging that the data has been accepted; active high). The data transfer takes place.

[Fig. Q4: Port A (PA7-PA0) as the bidirectional data bus, with PC7 (/OBFA), PC6 (/ACKA), PC4 (/STBA) and PC5 (IBFA) as the handshaking lines on Port C.]
Ans:
ISA Bus
The Industry Standard Architecture (ISA) bus is an open, 8-bit (PC and XT) or 16-bit (AT)
asymmetrical I/O channel with numerous compatible hardware implementations.
EISA Bus
The Extended Industry Standard Architecture (EISA) bus is an open, 32-bit, asymmetrical I/O channel with numerous compatible hardware implementations. The system bus allows data transfer rates at a bandwidth of up to 33 MB per second, supports a 4 GB address space and 8 DMA channels, and is backward compatible with the Industry Standard Architecture (ISA) bus.
PCI Bus
The Peripheral Component Interconnect Local Bus (PCI) is an open, high-performance 32-bit or 64-bit synchronous bus with multiplexed address and data lines, and numerous compatible hardware implementations. The PCI bus supports a frequency of 33 MHz and a transfer rate of 132 MB per second.
Futurebus+
Futurebus+ is an open bus, designed by the IEEE 896 committee, whose architecture and interfaces are publicly documented, and that is independent of any underlying architecture. It has broad-based, cross-industry support and very high throughput (the maximum rate for 64-bit bandwidth is 160 MB per second; for 128-bit bandwidth, 180 MB per second). Futurebus+ supports a 64-bit address space and a set of control and status registers (CSRs) that provides all the necessary ability to enable or disable features, thus supporting multivendor interoperability.
SCSI Bus
The Small Computer Systems Interface (SCSI) bus is an ANSI standard for the interconnection of computers with each other and with disks, floppies, tapes, printers, optical disks, and scanners. The SCSI standard covers the mechanical, electrical, and protocol aspects of the interface.
Data transfer rates are individually negotiated with each device attached to a given SCSI bus. For example, a 4 MB per second device and a 10 MB per second device may share a fast narrow bus. When the 4 MB per second device is using the bus, the transfer rate is 4 MB per second. When the 10 MB per second device is using the bus, the transfer rate is 10 MB per second. However, when faster devices are placed on a slower bus, their transfer rate is reduced to allow for proper operation in that slower environment.
Note that the speed of the SCSI bus is a function of cable length, with slow, single-ended SCSI
buses supporting a maximum cable length of 6 meters, and fast, single-ended SCSI buses
supporting a maximum cable length of 3 meters.
TURBOchannel Bus
The TURBOchannel bus is a synchronous, 32-bit, asymmetrical I/O channel that can be operated
at any fixed frequency in the range 12.5 MHz to 25 MHz. It is also an open bus, developed by
Digital, whose architecture and interfaces are publicly documented.
At 12.5 MHz, the peak data rate is 50 MB per second. At 25 MHz, the peak data rate is 100 MB
per second.
The TURBOchannel is asymmetrical in that the base system processor and system memory are
defined separately from the TURBOchannel architecture. The I/O operations do not directly
address each other. All data is entered into system memory before being transferred to another
I/O option. The design facilitates a concise and compact protocol with very high performance.
XMI Bus
The XMI bus is a 64-bit wide parallel bus that can sustain a 100 MB per second bandwidth in a
single processor configuration. The bandwidth is exclusive of addressing overhead; the XMI bus
can transmit 100 MB per second of data.
The XMI bus implements a "pended protocol" design so that the bus does not stall between
requests and transmissions of data. Several transactions can be in progress at a given time. Bus
cycles not used by the requesting device are available to other devices on the bus. Arbitration
and data transfers occur simultaneously, with multiplexed data and address lines. These design
features are particularly significant when a combination of multiple devices has a wider
bandwidth than the bus itself.
VME Bus
Digital UNIX includes a generic VME interface layer that provides customers with a consistent interface to VME devices across Alpha AXP workstation and server platforms. Currently, VME adapters are only supported on the TURBOchannel bus. To use the VME interface layer to write VMEbus device drivers, you must have the Digital UNIX TURBOchannel/VME Adapter Driver Version 2.0 software (Software Product Description 48.50.00) and its required processor and/or hardware configurations (Software Support Addendum 48.50.00-A).
Module
3
Embedded Systems I/O
Lesson
14
Timers
Instructional Objectives
After going through this lesson the student would learn about the Standard Peripheral Devices most commonly used as single-purpose processors.
Pre-Requisite
Digital Electronics, Microprocessors
14 Introduction
The Peripherals of an embedded processor can either be on the same chip as the processor or can
be connected externally.
(Fig. 14.1 A typical embedded processor: on-chip interrupt control with external interrupt inputs, on-chip Flash and RAM, Timer 0 and Timer 1 with counter inputs, the CPU, oscillator, bus control, four I/O ports P0-P3, and a serial port with TXD/RXD.)
For example, in a typical embedded processor as shown in Fig.14.1, the timer, interrupt controller, serial port and parallel ports reside on a single chip. These dedicated units are otherwise termed single-purpose processors. They can be a part of the microcontroller or can reside outside the chip, and in the latter case must be properly interfaced with the processor.
The tasks generally carried out by such units are
Timers, counters, watchdog timers
serial transmission
analog/digital conversions
Timer
A timer is a very common and useful peripheral. It is used to generate events at specific times or to measure the duration of specific events which are external to the processor. It is a programmable device, i.e. the time period can be adjusted by writing specific bit patterns to registers called timer-control registers.
Counter
A counter is a more general version of the timer. It is used to count events in the form of pulses which are fed to it.
Fig.14.2(a) shows the block diagram of a simple timer. This has a 16-bit up counter which increments with each input clock pulse. Thus the output value Cnt represents the number of pulses since the counter was last reset to zero. An additional output Top indicates when the terminal count has been reached. It may go high for a predetermined time as set by the programmable control word inside the timer unit. The count can be loaded by the external program.
Fig.14.2(b) provides the structure of another timer where a multiplexer is used to choose between an internal clock and an external clock. The mode bit, when set or reset, decides the selection. For the internal clock (Clk) it behaves like the timer in Fig.14.2(a). For the external count input (Cnt_in) it simply counts the number of occurrences.
(Fig. 14.2(a) Basic timer: Clk drives a 16-bit up counter with a Reset input, producing a 16-bit Cnt output and a Top output. Fig. 14.2(b) Timer/counter: a 2x1 mux, controlled by a Mode bit, selects between Clk and Cnt_in to drive the 16-bit up counter.)
Fig.14.2(c) shows a timer with a terminal count. This can generate an event when a particular interval of time has elapsed. The counter restarts after every terminal count.
(Fig. 14.2(c) Timer with a terminal count: the Cnt output of the 16-bit up counter is compared (=) with the terminal count; Top is asserted on a match and the counter restarts.)
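The behaviour of the terminal-count timer of Fig.14.2(c) can be modelled with a few lines of C. This is a host-side simulation sketch, not firmware; the type and function names are illustrative only:

```c
#include <stdint.h>

/* Model of the timer in Fig.14.2(c): a 16-bit up counter that asserts
 * Top and restarts each time it reaches the programmed terminal count. */
typedef struct {
    uint16_t cnt;       /* current count                 */
    uint16_t terminal;  /* programmed terminal count     */
} tc_timer;

/* Apply one clock pulse; returns 1 when Top fires (counter restarts). */
static int tc_clock(tc_timer *t)
{
    t->cnt++;
    if (t->cnt == t->terminal) {
        t->cnt = 0;     /* counter restarts after terminal count */
        return 1;       /* Top goes high for this pulse          */
    }
    return 0;
}

/* Count how many Top events occur in n clock pulses. */
static int tc_run(tc_timer *t, int n)
{
    int tops = 0;
    for (int i = 0; i < n; i++)
        tops += tc_clock(t);
    return tops;
}
```

With a terminal count of 4, twelve clock pulses produce exactly three Top events, i.e. one event every four pulses, which is how such a timer generates a periodic tick.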
(Fig. 14.3 plots, against clock pulse number: the clock waveform, the counter value (the timer is reset and reloaded with a new count each time it reaches zero), and the output pulse.)
Fig. 14.3 The Timer Count and Output. The timer is in count-down mode. On every clock pulse the count is decremented by 1. When the count value reaches zero, the output of the counter, i.e. Top, goes high for a predetermined time. The counter has to be loaded with a new or the previous value of the count by the external program, or it can be loaded automatically every time the count reaches zero.
Timer in 8051 Microcontroller
Fig.14.1 shows the architecture of the 8051, which has two timer units.
The 8051 comes equipped with two timers, both of which may be controlled, set, read, and configured individually. The 8051 timers have three general functions: 1) keeping time and/or calculating the amount of time between events, 2) counting the events themselves, or 3) generating baud rates for the serial port.
As mentioned before, the 8051 has two timers which each function essentially the same way. One timer is TIMER0 and the other is TIMER1. The two timers share two Special Function Registers (SFRs), TMOD and TCON, which control the timers, and each timer also has two SFRs dedicated solely to itself (TH0/TL0 and TH1/TL1).
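As a concrete instance of function 3) above, the Timer 1 reload value for a given baud rate follows from the standard 8051 relation baud = Fosc / (12 x 32 x (256 - TH1)), assuming Timer 1 in mode 2 (8-bit auto-reload) and SMOD = 0. The helpers below are an illustrative host-side calculation, not part of any vendor library:

```c
#include <stdint.h>

/* Reload value TH1 for 8051 Timer 1 in mode 2 (8-bit auto-reload),
 * SMOD = 0:  baud = fosc / (12 * 32 * (256 - TH1)).                */
static uint8_t th1_reload(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t divisor = fosc_hz / (12UL * 32UL * baud);  /* = 256 - TH1 */
    return (uint8_t)(256UL - divisor);
}

/* The actual baud rate produced by a given reload value. */
static uint32_t actual_baud(uint32_t fosc_hz, uint8_t th1)
{
    return fosc_hz / (12UL * 32UL * (256UL - th1));
}
```

With the classic 11.0592 MHz crystal, 9600 baud gives a reload value of 0xFD (253), and the achieved rate is exact; this is precisely why that odd-looking crystal frequency is so common on 8051 boards.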
MODE 0
Either timer in Mode 0 is an 8-bit counter with a divide-by-32 prescaler. In this mode, the Timer register is configured as a 13-bit register. As the count rolls over from all 1s to all 0s, it sets the Timer interrupt flag TF1. The counted input is enabled to the Timer when TR1 = 1 and either GATE = 0 or INT1 = 1. (Setting GATE = 1 allows the Timer to be controlled by external input INT1, to facilitate pulse-width measurements.)
(Fig. 14.4 Timer/Counter Mode 0: 13-bit counter. The clock source, OSC/12, is gated into the counter under the control of GATE, the INT1 pin, and TR1.)
Timer Mode Register (TMOD):
(MSB) GATE C/T M1 M0 GATE C/T M1 M0 (LSB)
(the high nibble controls Timer 1, the low nibble Timer 0)
GATE: Gating control. When set, Timer/Counter x is enabled only while the INTx pin is high and the TRx control bit is set. When cleared, Timer x is enabled whenever the TRx control bit is set.
C/T: Timer or Counter selector. Cleared for Timer operation (input from the internal system clock); set for Counter operation (input from the Tx input pin).
M1 M0, Operating Mode:
0 0: 8-bit Timer/Counter THx with TLx as 5-bit prescaler.
0 1: 16-bit Timer/Counter; THx and TLx are cascaded; there is no prescaler.
1 0: 8-bit auto-reload Timer/Counter; THx holds a value which is reloaded into TLx each time it overflows.
1 1: Timer 0: TL0 and TH0 act as two separate counters; Timer 1: the timer is stopped.
Timer/Counter Control Register (TCON):
(MSB) TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0 (LSB)
TF1 (TCON.7): Timer 1 overflow flag. Set by hardware on Timer/Counter overflow. Cleared by hardware when the processor vectors to the interrupt routine.
TR1 (TCON.6): Timer 1 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TF0 (TCON.5): Timer 0 overflow flag. Set by hardware on Timer/Counter overflow. Cleared by hardware when the processor vectors to the interrupt routine.
TR0 (TCON.4): Timer 0 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
IE1 (TCON.3): Interrupt 1 edge flag. Set by hardware when an external interrupt edge is detected. Cleared when the interrupt is processed.
IT1 (TCON.2): Interrupt 1 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
IE0 (TCON.1): Interrupt 0 edge flag. Set by hardware when an external interrupt edge is detected. Cleared when the interrupt is processed.
IT0 (TCON.0): Interrupt 0 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
MODE 1: Mode 1 is the same as Mode 0, except that the Timer register is run with all 16 bits.
(Fig. 14.5 Mode 2 configures the Timer register as an 8-bit counter (TL1, 8 bits) with automatic reload from TH1. The clock source is OSC/12 when C/T = 0, or the T1 pin when C/T = 1; the count is gated by TR1, GATE, and the INT1 pin, and overflow sets TF1 to raise an interrupt.)
(Fig. 14.6 Mode 3: Timer 1 simply holds its count. Timer 0 in Mode 3 establishes TL0 and TH0 as two separate counters: TL0 (8 bits) is clocked by OSC/12 or the T0 pin under the control of TR0, GATE, and INT0, and sets TF0 to raise an interrupt; TH0 (8 bits) is clocked by OSC/12 under the control of TR1, and sets TF1 to raise an interrupt.)
The Programmable Interval Timer 8253
For processors where the timer unit is not internal, the programmable interval timer can be used. Fig.14.7 shows the signals of the 8253 programmable interval timer.
(Fig. 14.7 The pin configuration of the 8253 timer, a 24-pin device: a microprocessor interface consisting of the data lines D7-D0 and the control lines /RD, /WR, /CS, A0 and A1, plus the counter input/output signals CLK, GATE and OUT for each of counters 0, 1 and 2, along with Vcc and GND.)
Fig.14.8 shows the internal block diagram. There are three separate counter units controlled by a configuration register (Fig.14.9). Each counter has two inputs, clock and gate, and one output. The clock is the signal that drives the counting by decrementing a preloaded value in the respective counter register. The gate serves as an enable input: if the gate is held low, counting is disabled. The timing diagrams below explain in detail the various modes of operation of the timer.
e n
u d CLK0
st Counter GATE0
it
Data y
D D
c
.Buffer
Bus #0 OUT0
w
Bus
7 0
w
w
RD
CLK1
WR Read
Counter GATE1
Write
A1 #1
Control
Internal
OUT1
A0 Logic
CS
CLK2
Power supplies Control
Counter GATE2
Vcc
Word
Register #2
OUT2
GND
Fig. 14.8 The internal block diagram of 8253 Table The address map
CS A1 A0 Port
0 0 0 Counter 0
0 0 1 Counter 1
0 1 0 Counter 2
0 1 1 Control register
Fig. 14.9 The control word:
D7 D6 D5 D4 D3 D2 D1 D0
SC1 SC0 RL1 RL0 M2 M1 M0 BCD
BCD (D0): 0 = binary counter (16-bit); 1 = BCD counter (4 decades).
M2 M1 M0 (D3-D1): 0 0 0 = Mode 0; 0 0 1 = Mode 1; x 1 0 = Mode 2; x 1 1 = Mode 3; 1 0 0 = Mode 4; 1 0 1 = Mode 5.
SC1 SC0 (D7-D6): 0 0 = select counter 0; 0 1 = select counter 1; 1 0 = select counter 2; 1 1 = illegal.
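The control-word fields above can be packed into the single byte written to the control register. The following helper is a sketch (the parameter names follow the field names in the table; it is not a vendor API):

```c
#include <stdint.h>

/* Build an 8253 control word:
 *   sc   (D7-D6): counter select, 0..2
 *   rl   (D5-D4): read/load format
 *   mode (D3-D1): operating mode, 0..5
 *   bcd  (D0)   : 0 = 16-bit binary count, 1 = BCD (4 decades)     */
static uint8_t ctrl_word(uint8_t sc, uint8_t rl, uint8_t mode, uint8_t bcd)
{
    return (uint8_t)(((sc & 3u) << 6) | ((rl & 3u) << 4) |
                     ((mode & 7u) << 1) | (bcd & 1u));
}
```

For example, selecting counter 0 with rl = 3, mode 3 (square wave) and binary counting yields 0x36, a value that would then be written to the control register port.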
Mode 0: The output goes high after the terminal count is reached. The counter stops if the Gate is low (Fig.14.10(a) and (b)). The timer count register is loaded with a count (say 6) when the WR line is made low by the processor. The counter unit starts counting down with each clock pulse. The output goes high when the register value reaches zero. In the meantime, if the GATE is made low (Fig.14.10(b)), the count is suspended at its current value (3 here) till the GATE is enabled again.
(Fig. 14.10(a) Mode 0 count when Gate is high (enabled): after WR loads the count, the counter steps 6, 5, 4, 3, 2, 1 on successive clocks and OUT then goes high.)
(Fig. 14.10(b) Mode 0 count when Gate is low temporarily (disabled): the count holds at 3 while GATE is low, stepping 6, 5, 4, 3, 3, 3, 2, 1.)
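The gate behaviour of Mode 0 can be reproduced with a small simulation of the waveforms in Fig.14.10. This is a model for understanding, not 8253 driver code; the function name is illustrative:

```c
#include <stdint.h>

/* Simulate an 8253 counter in mode 0: loaded with 'count', clocked
 * 'nclk' times with a per-clock gate level in gate[].  Returns the
 * remaining count; *out is set to 1 once the count reaches zero.   */
static uint16_t mode0_run(uint16_t count, const int *gate, int nclk, int *out)
{
    *out = 0;
    for (int i = 0; i < nclk; i++) {
        if (!gate[i])              /* GATE low suspends the count     */
            continue;
        if (count > 0 && --count == 0)
            *out = 1;              /* OUT goes high at terminal count */
    }
    return count;
}
```

With a count of 6 and the gate held high, six clocks bring the count to zero and OUT high; if the gate drops for two clocks mid-way, as in Fig.14.10(b), the same six-clock window leaves the count suspended at 2 with OUT still low.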
Mode 1: The count-down is started by a trigger on the GATE. If the GATE goes low before the count-down is completed, then the counter will be suspended in that state as long as GATE is low (Fig.14.11(b)). Thus it works as a mono-shot.
(Fig. 14.11(a) Mode 1: after WR loads the count, the GATE trigger goes high and the output goes low for a period depending on the count, the counter stepping 5, 4, 3, 2, 1.)
(Fig. 14.11(b) Mode 1: the GATE pulse is disabled momentarily, causing the counter to stop; the counter steps 4, 3, 3, then 4, 3, 2, 1 after a fresh trigger.)
(Fig. 14.12(a) Mode 2 operation when the GATE is kept high: the count 3, 2, 1 reloads automatically and repeats, producing a periodic output.)
(Fig. 14.12(b) Mode 2 operation when the GATE is disabled momentarily: counting is suspended while GATE is low and resumes from a reloaded count, stepping 3, 2, then 3, 3, 2, 1.)
Mode 3 Programmable Square Wave Rate Generator
It is similar to Mode 2 but the output high and low periods are symmetrical. The output goes high after the count is loaded and remains high for a period which equals the count-down period of the counter register. The output subsequently goes low for an equal period, and hence generates a symmetrical square wave, unlike Mode 2. The GATE has no role here (Fig.14.13).
(Fig. 14.13 Mode 3 square-wave outputs after WR loads the count, shown for n = 4 and n = 5.)
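For an even count n, the Mode 3 output can be sketched as a waveform that is high for n/2 clocks and low for n/2 clocks, at a frequency of clk/n. The model below is a host-side sketch under that assumption (odd counts behave slightly asymmetrically on real hardware and are ignored here):

```c
/* Generate 'nclk' samples of an 8253 mode 3 square wave for an even
 * count n: OUT is high for n/2 clocks, low for n/2 clocks, repeating. */
static void mode3_wave(int n, int nclk, int *out)
{
    int phase = 0;                  /* position within one period       */
    for (int i = 0; i < nclk; i++) {
        out[i] = (phase < n / 2);   /* first half high, second half low */
        phase = (phase + 1) % n;
    }
}

/* Output frequency of mode 3 for input clock fclk and count n. */
static double mode3_freq(double fclk_hz, int n)
{
    return fclk_hz / n;
}
```

With n = 4 the output alternates two clocks high, two clocks low, matching the n = 4 trace in Fig.14.13; dividing a 1.19318 MHz input clock by the maximum count 65536 gives the familiar PC timer tick of roughly 18.2 Hz.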
Mode 4 Software Triggered Strobe
The output goes low for one clock period after the count-down is complete. The count-down can be suspended by making the GATE low (Fig.14.14(a) and (b)). This is called a software-triggered strobe because the count-down is initiated by a program.
(Fig. 14.14(a) Mode 4 software-triggered strobe with GATE high: after WR loads the count, the counter steps 4, 3, 2, 1 and OUT then pulses low for one clock period.)
(Fig. 14.14(b) Mode 4 Software Triggered Strobe when GATE is momentarily low: the count is suspended while GATE is low, stepping 4, 3, 3, 2, 1.)
Mode 5 Hardware Triggered Strobe
The count is loaded by the processor but the count-down is initiated by the GATE pulse. The low-to-high transition of the GATE pulse enables the count-down. The output goes low for one clock period after the count-down is complete (Fig.14.15).
(Fig. 14.15 Mode 5: after WR loads the count, the GATE trigger starts the count-down 5, 4, 3, 2, 1 and OUT pulses low at the end.)
Watchdog timer
A Watchdog Timer is a circuit that automatically invokes a reset unless the system being watched sends regular hold-off signals to the Watchdog.
Watchdog Circuit
To make sure that a particular program is executing properly, a Watchdog circuit is used. For instance, the program may reset a particular flip-flop periodically while the flip-flop is set by an external circuit. If the flip-flop is not reset for a long time, this can be detected by external hardware, indicating that the program is not executing properly; an exception or interrupt can then be generated.
A Watch Dog Timer (WDT) provides a unique clock, which is independent of any external clock. When the WDT is enabled, a counter starts at 00 and increments by 1 until it reaches FF. When it rolls over from FF to 00 (which is FF + 1), the processor is reset or an exception is generated. The only way to stop the WDT from resetting the processor or generating an exception or interrupt is to periodically reset the WDT back to 00 throughout the program. If the program gets stuck for some reason, the WDT will not be reset; the WDT will then reset or interrupt the processor. An interrupt service routine can be invoked to take into account the erroneous operation of the program (getting stuck or going into an infinite loop).
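The WDT behaviour described above can be modelled directly: an 8-bit counter that the running program must "kick" back to 00 before it rolls over from FF. This is a simulation sketch with illustrative names, not any particular microcontroller's WDT interface:

```c
#include <stdint.h>

typedef struct {
    uint8_t count;    /* free-running WDT counter            */
    int     expired;  /* set when the counter rolls over     */
} wdt;

/* One WDT clock tick: increment; rollover FF -> 00 fires a reset. */
static void wdt_tick(wdt *w)
{
    if (w->count == 0xFF) {
        w->count = 0;
        w->expired = 1;   /* processor reset / exception generated */
    } else {
        w->count++;
    }
}

/* The hold-off ("kick") the running program must issue periodically. */
static void wdt_kick(wdt *w)
{
    w->count = 0;
}
```

A program that kicks the watchdog at least once every 255 ticks never sees `expired` set; a program stuck in a loop that never kicks sees `expired` after 256 ticks, which is exactly the reset/interrupt described above.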
Conclusion
In this chapter you have learnt about the programmable timer/counter. For most embedded processors the timer is internal and exists along with the processor on the same chip. The 8051 microcontroller has two internal timers which can be programmed in various modes by the configuration and mode control registers. An external timer chip, namely the 8253, has also been discussed. It has 8 data lines, 2 address lines, 1 chip-select line, and one read and one write control line. The 16-bit counts of the corresponding registers can be loaded with two consecutive write operations. Counters and timers are used for triggering, trapping and managing various real-time events. The least count of the timer depends on the clock, and the stability of the clock decides the accuracy of the timings. Timers can be used to generate specific baud-rate clocks for asynchronous serial communications. They can be used to measure speed, frequency and analog voltages after voltage-to-frequency conversion. One important application of timers is to generate Pulse-Width-Modulated (PWM) waveforms: in the 8253 the GATE and the count together can be used to generate pulses with different widths. These modulated pulses are used in electronic power control to reduce harmonics and hence distortions.
You have also learnt about the Watchdog circuit and Watchdog timers. These are used to monitor the activity of a program and the processor.
Questions
Q1. Design a circuit using the 8253 to measure the speed of a motor by counting the number of pulses in a definite period.
Q2. Write pseudo code (or any assembly code) to generate a sinusoidal pulse-width-modulated waveform from the 8253 timer.
Q3. Design a scheme to read temperature from a thermistor circuit using a V/F converter and a Timer.
Q4. What are the differences in Mode 4 and Mode 5 operation of 8253 Timer?
Q5. Explain the circuit given in Fig.14.5.
Module
3
Embedded Systems I/O
Lesson
15
Interrupts
Instructional Objectives
After going through this lesson the student would learn
Interrupts
Interrupt Service Subroutines
Polling
Priority Resolving
Daisy Chain Interrupts
Interrupt Structure in 8051 Microcontroller
Programmable Interrupt Controller
Pre-Requisite
Digital Electronics, Microprocessors
15 Introduction
Real Time Embedded System design requires that I/O devices receive servicing in an efficient manner, so that large amounts of the total system tasks can be assumed by the processor with little or no effect on throughput. The most common method of servicing such devices is the polled approach, where the processor must test each device in sequence and, in effect, ask each one if it needs servicing. It is easy to see that a large portion of the main program is spent looping through this continuous polling cycle, and that such a method would have a serious, detrimental effect on system throughput, thus limiting the tasks that could be assumed by the microcomputer and reducing the cost effectiveness of using such devices. A more desirable method would be one that allows the microprocessor to execute its main program and stop to service peripheral devices only when told to do so by the device itself. In effect, the method would provide an external asynchronous input that informs the processor that it should complete whatever instruction is currently being executed and fetch a new routine that will service the requesting device. Once this servicing is complete, the processor would resume exactly where it left off. This can be effectively handled by interrupts.
An interrupt is a signal informing a program, or a device connected to the processor, that an event has occurred. When a processor receives an interrupt signal, it takes a specified action depending on the priority and importance of the entity generating the signal. Interrupt signals can cause a program to suspend itself temporarily to service the interrupt, by branching into another routine called an Interrupt Service Subroutine (ISS) for the specific device which has caused the interrupt.
Types of Interrupts
Interrupts can be broadly classified as
- Hardware Interrupts
These are interrupts caused by the connected devices.
- Software Interrupts
These are interrupts deliberately introduced by software instructions to generate user
defined exceptions
- Trap
These are interrupts used by the processor alone to detect any exception such as divide by
zero
Depending on the service, interrupts can also be classified as
- Fixed interrupt
The address of the ISR is built into the microprocessor and cannot be changed. Either the ISR is stored at that address, or a jump to the actual ISR is stored there if not enough bytes are available.
- Vectored interrupt
The peripheral must provide the address of the ISR. This is common when the microprocessor has multiple peripherals connected by a system bus.
- Compromise between fixed and vectored interrupts
One interrupt pin is used, with a table in memory holding the ISR addresses (maybe 256 words). The peripheral doesn't provide the ISR address but rather an index into the table. Fewer bits are sent by the peripheral, and the ISR location can be moved without changing the peripheral.
Maskable vs. Non-maskable interrupts
- Maskable: the programmer can set a bit that causes the processor to ignore the interrupt. This is important when the processor is executing time-critical code.
- Non-maskable: a separate interrupt pin that can't be masked. It is typically reserved for drastic situations, like a power failure requiring immediate backup of data to non-volatile memory.
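The "index into a table" compromise described above can be illustrated with a function-pointer table. This is a sketch of the idea, not any particular processor's vector layout; the ISR names and indices are hypothetical:

```c
#include <stdint.h>

/* The peripheral supplies a small index; the processor looks up the
 * ISR address in a table in memory, so ISRs can be moved freely
 * without reprogramming the peripheral.                             */
typedef void (*isr_t)(void);

static int uart_hits, timer_hits;          /* counters for demonstration */
static void uart_isr(void)  { uart_hits++;  }
static void timer_isr(void) { timer_hits++; }

static isr_t vector_table[256] = { 0 };    /* table of ISR addresses     */

/* Dispatch the interrupt whose table index the peripheral sent. */
static void dispatch(uint8_t index)
{
    if (vector_table[index])
        vector_table[index]();
}
```

After installing `uart_isr` at index 3, `dispatch(3)` invokes it; relocating the ISR only means rewriting one table entry, which is exactly why this scheme needs fewer bits from the peripheral than full vectoring.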
Example: Interrupt Driven Data Transfer (Fixed Interrupt)
Fig.15.1(a) shows the block diagram of a system where it is required to read data from an input port P1, modify it (according to some given algorithm) and send it to port P2. The input port generates data at a very slow pace. There are two ways to transfer the data:
(a) The processor waits till the input is ready with the data and performs a read operation from P1 followed by a write operation to P2. This is called Programmed Data Transfer.
(b) The other option, when the input/output device is slow, is for the device to interrupt the microprocessor through an Int pin, as shown in Fig.15.1, whenever it is ready. The processor, which may otherwise be busy executing another program (the main program here), calls an Interrupt Service Subroutine (ISR) after receiving the interrupt to accomplish the required data transfer. This is known as Interrupt Driven Data Transfer.
(Fig. 15.1(a) The Interrupt Driven Data Transfer. PC: Program Counter, P1: Port 1, P2: Port 2, µC: Microcontroller.)
The sequence of events (Fig. 15.2(a)) is as follows:
1. The µC is executing its main program while P1 receives input data in a register with address 0x8000.
2. P1 asserts Int to request servicing by the microprocessor.
3. After completing the instruction at address 100, the µC sees Int asserted, saves the PC's value of 100, and asserts Inta.
4. P1 detects Inta and puts the interrupt address vector 16 on the data bus.
5. The µC jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. After being read, P1 deasserts Int.
6. The ISR returns, thus restoring the PC to 100+1=101, where the µC resumes executing.
(Fig. 15.3 The 8051 Architecture: the CPU, oscillator, bus control, four I/O ports P0-P3, and a serial port with TXD/RXD, connected over the address/data bus.)
The 8051 has 5 interrupt sources: 2 external interrupts, 2 timer interrupts, and the serial port interrupt. These interrupts occur because of
1. timers overflowing
2. receiving a character via the serial port
3. transmitting a character via the serial port
4. two external events
Interrupt Enables
Each interrupt source can be individually enabled or disabled by setting or clearing a bit in a Special Function Register (SFR) named IE (Interrupt Enable). This register also contains a global disable bit, which can be cleared to disable all interrupts at once.
Interrupt Priorities
Each interrupt source can also be individually programmed to one of two priority levels by
setting or clearing a bit in the SFR named IP (Interrupt Priority). A low-priority interrupt can be
interrupted by a high-priority interrupt, but not by another low-priority interrupt. A high-priority
interrupt can't be interrupted by any other interrupt source. If two interrupt requests of different
priority levels are received simultaneously, the request of higher priority is serviced. If interrupt
requests of the same priority level are received simultaneously, an internal polling sequence
determines which request is serviced. Thus within each priority level there is a second priority
structure determined by the polling sequence. In operation, all the interrupt flags are latched into
the interrupt control system during State 5 of every machine cycle. The samples are polled
during the following machine cycle. If the flag for an enabled interrupt is found to be set (1), the
interrupt system generates a CALL to the appropriate location in Program Memory, unless some
other condition blocks the interrupt. Several conditions can block an interrupt, among them that
an interrupt of equal or higher priority level is already in progress. The hardware-generated
CALL causes the contents of the Program Counter to be pushed into the stack, and reloads the
PC with the beginning address of the service routine.
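The two-level scheme with a fixed polling order inside each level can be modelled in C as follows. This is a sketch of the selection logic only; the polling order assumed is IE0, TF0, IE1, TF1, serial, and the array-based interface is illustrative, not the 8051's actual hardware:

```c
/* 8051-style selection among pending interrupts.  Sources are in
 * polling order: 0 = IE0, 1 = TF0, 2 = IE1, 3 = TF1, 4 = serial.
 * high[i] is the IP priority bit (1 = high priority).
 * Returns the index of the source to service, or -1 if none pending. */
static int select_interrupt(const int pending[5], const int high[5])
{
    /* A high-priority request always beats a low-priority one;       */
    /* within a level, the polling order breaks ties.                 */
    for (int level = 1; level >= 0; level--)
        for (int i = 0; i < 5; i++)
            if (pending[i] && high[i] == level)
                return i;
    return -1;
}
```

For example, if TF1 (index 3) is pending at high priority and IE0 (index 0) at low priority, TF1 is serviced first even though IE0 comes earlier in the polling sequence; with all priorities equal, the polling order alone decides.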
(Figure: the 8051 interrupt control system. The sources INT0 (via IT0, flag IE0), TF0, INT1 (via IT1, flag IE1), TF1, and RI/TI pass through the IE register and the IP register into the high-priority and low-priority interrupt polling sequences.)
The service routine for each interrupt begins at a fixed location (fixed-address interrupts). Only the Program Counter (PC) is automatically pushed onto the stack, not the Processor Status Word (which includes the contents of the accumulator and flag register) or any other register. Having only the PC automatically saved allows the programmer to decide how much time should be spent saving other registers. This enhances the interrupt response time, albeit at the expense of increasing the programmer's burden of responsibility. As a result, many interrupt functions that are typical in control applications (toggling a port pin, for example, or reloading a timer, or unloading a serial buffer) can often be completed in less time than it takes other architectures to complete them.
Interrupt Number   Vector Address   Description
0                  0003h            EXTERNAL 0
1                  000Bh            TIMER/COUNTER 0
2                  0013h            EXTERNAL 1
3                  001Bh            TIMER/COUNTER 1
4                  0023h            SERIAL PORT
Priority Arbiter
(Fig. 15.5 The Priority Arbitration: the µC's Int and Inta lines connect to a priority arbiter on the system bus, which exchanges Ireq1/Iack1 with Peripheral 1 and Ireq2/Iack2 with Peripheral 2.)
Let us assume that the priority of the devices is Device 1 > Device 2.
1. The processor is executing its program.
2. Peripheral 1 needs servicing, so it asserts Ireq1; Peripheral 2 also needs servicing, so it asserts Ireq2.
3. The priority arbiter sees at least one Ireq input asserted, so it asserts Int.
4. The processor stops executing its program and stores its state.
5. The processor asserts Inta.
6. The priority arbiter passes the acknowledgement to the higher-priority Peripheral 1 by asserting Iack1.
7. Peripheral 1 puts the address of its ISR on the data bus.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns.
9. The flag is reset.
The processor now checks for the next device which has interrupted simultaneously.
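The arbiter's decision itself is simple combinational logic: fixed priority, lowest request number wins, matching Device 1 > Device 2 above. Sketched in C (an 8-input generalization; the function name is illustrative):

```c
#include <stdint.h>

/* Fixed-priority resolver: given a request mask (bit i = Ireq(i+1)),
 * return the number of the device to acknowledge, or 0 if no request
 * is pending.  Lower-numbered devices have higher priority.          */
static int arbitrate(uint8_t ireq)
{
    for (int i = 0; i < 8; i++)
        if (ireq & (1u << i))
            return i + 1;     /* assert Iack(i+1) for this device  */
    return 0;                 /* no request: Int stays deasserted  */
}
```

With both Ireq1 and Ireq2 asserted (mask 0x03), device 1 is acknowledged first; after it is serviced and its request cleared (mask 0x02), device 2 wins the next arbitration, reproducing the simultaneous-interrupt sequence above.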
(Figures: a daisy-chain arrangement in which the µC's Int/Inta lines pass through Peripheral 1 and Peripheral 2 via Req_out/Req_in and Ack_in/Ack_out; and a system in which a Programmable Interrupt Controller (PIC) sits between the CPU's INT input and I/O devices 1..N on the bus, alongside RAM.)
Each peripheral device or structure usually has a special program or routine that is associated with its specific functional or operational requirements; this is referred to as a service routine. The PIC, after issuing an interrupt to the CPU, must somehow input information into the CPU that can point (vector) the Program Counter to the service routine associated with the requesting device.
The PIC manages eight levels of requests and has built-in features for expandability to other PICs (up to 64 levels). It is programmed by system software as an I/O peripheral. The priority modes can be changed or reconfigured dynamically at any time during main program operation.
(Fig. 15.9 The functional block diagram of the Intel 8259: a data bus buffer (D7-D0); read/write logic (/RD, /WR, A0, /CS); control logic (INT, /INTA); the Interrupt Request Register (IRR) with inputs IR0-IR7; the Priority Resolver; the In-Service Register (ISR); the Interrupt Mask Register (IMR); and the cascade buffer/comparator (CAS0-CAS2, SP/EN), all connected by an internal bus.)
Table of Signals of the PIC
D[7..0]: These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A[0..0]: This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to decipher the various command words the microprocessor writes and the status the microprocessor wishes to read.
WR: When this write signal is asserted, the 8259 accepts the command on the data lines, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD: When this read signal is asserted, the 8259 provides its status on the data lines, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT: This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA: This signal is used to enable the 8259's interrupt-vector data onto the data bus by a sequence of interrupt-acknowledge pulses issued by the microprocessor.
IR 0,1,2,3,4,5,6,7: An interrupt request is asserted by a peripheral device on one of these signals.
CAS[2..0]: These are cascade signals that enable multiple 8259 chips to be chained together.
SP/EN: This function is used in conjunction with the CAS signals for cascading purposes.
Fig.15.10 shows the daisy chain connection of a number of PICs. The extreme right PIC
interrupts the processor. In this figure the processor can entertain up to 24 different interrupt
requests. The SP/EN signal has been connected to Vcc for the master and grounded for the
slaves.
(Fig. 15.10 residue: three 82C59A devices — one master and two slaves, A and B — share the CS, A0 and D7-D0 lines. The INT output of each slave drives an IR input of the master, whose INT and INTA connect to the processor; CAS0-CAS2 are bused between all three devices, and SP/EN is tied to VCC on the master and to GND on the slaves. The 24 IR inputs along the bottom receive the interrupt requests.)
Fig. 15.10 Nested Connection of Interrupts
Software Interrupts
These are initiated by the program through specific instructions. On encountering such an instruction the CPU executes an interrupt service subroutine.
Conclusion
In this chapter you have learnt about interrupts and the Programmable Interrupt Controller. Different methods of interrupt service, such as priority arbitration and daisy-chain arbitration, have been discussed. In real-time systems interrupts are used for specific cases, and the execution times of these interrupt service subroutines are almost fixed. Too many interrupts are discouraged in real-time systems, as they may severely disrupt the services. Please look at problem no. 1 in the exercise.
Most embedded processors are equipped with an interrupt structure, so there is rarely a need to use a separate PIC. Some entry-level microcontrollers do not have an inbuilt exception handler called a trap. A trap is an interrupt used to handle extreme processor conditions such as divide-by-zero, overflow, etc.
Questions and Answers
Q1. A computer system has three devices whose characteristics are summarized in the following
table:
Service time indicates how long it takes to run the interrupt handler for each device. The maximum time allowed to elapse between an interrupt request and the start of the interrupt handler is indicated by allowable latency. If a program P takes 100 seconds to execute when interrupts are disabled, how long will P take to run when interrupts are enabled?
Ans:
The CPU time taken to service the interrupts must first be found. Consider Device 1: its interrupt occurs once every 800 µs, i.e. 1250 times a second. Take a time quantum of 1 unit.

Device 1 takes (150+50)/800 = 1/4 unit
Device 2 takes (50+50)/1000 = 1/10 unit
Device 3 takes (100+100)/800 = 1/4 unit

In one unit of real time the CPU time taken by all these devices is (1/4 + 1/10 + 1/4) = 0.6 units.

The remaining 0.4 units of CPU time can be used by the program P. For 100 seconds of CPU time the real time required will be 100/0.4 = 250 seconds.
Q.2 What is TRAP?
Ans:
The term trap denotes a programmer initiated and expected transfer of control to a special
handler routine. In many respects, a trap is nothing more than a specialized subroutine call.
Many texts refer to traps as software interrupts. Traps are usually unconditional; that is, when
you execute an Interrupt instruction, control always transfers to the procedure associated with
the trap. Since traps execute via an explicit instruction, it is easy to determine exactly which
instructions in a program will invoke a trap handling routine.
Ans:
For vectored interrupts the processor expects the address from the external device. Once it receives the interrupt it starts an interrupt acknowledge cycle as shown in the figure. In the figure, TN is the last clock state of the previous instruction, immediately after which the processor checks the status of the Intr pin, which has already been driven high by the external device. The processor then starts an INTA cycle in which it brings in the interrupt vector through the data lines. If the data lines are 8 bits wide and the required address is 16 bits, there will be two I/O reads. If the interrupt vector is a number that indexes a look-up table, then only 8 bits are required and hence there will be a single I/O read.
(Timing diagram: clock states TN, T1, T2, T3. INTREQ is already high at TN, the last clock state of the previous instruction's machine cycle; the processor then runs an interrupt acknowledge machine cycle, asserting INTACK while the address code of the interrupt vector is placed on the Data lines.)
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
o m
o t.c
s p
o g
. bl
u p
r o
s g
ent
u d
st
it y
.c
w
w
w
Lesson
16
DMA
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would learn
Pre-Requisite
Digital Electronics, Microprocessors
16(I) Introduction
Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. During such a transfer the processor would be unavailable for any other task involving processor-bus access, though it can continue with any work that does not require bus access. DMA transfers are essential for high-performance embedded systems where large chunks of data need to be transferred between the input/output devices and the primary memory.
16(II) DMA Controller

A DMA controller is a device, usually peripheral to a CPU, that is programmed to perform a sequence of data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request (DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a DMA request, it responds by performing one or more data transfers from that I/O device into system memory or vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The number of transfers performed, the transfer modes used, and the memory locations accessed depend on how the DMA channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and has both bus-master and bus-slave capability. Fig.16.1 shows the DMA controller architecture and how the DMA controller interacts with the CPU. In bus-master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the DMA transfers. Because the CPU releases the system bus for the duration of the transfer, the process is sometimes referred to as cycle stealing.

In bus-slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers. The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.
(Fig. 16.1 residue: for each DMA channel X the controller holds base/current address and base/current count registers with a terminal count (TC) output; a mask register enables or disables each channel and a status register reports channel state to the CPU; DMA arbitration logic exchanges bus request/grant signals with the CPU, while DRQX and DACKX handshake with the I/O device on the PC bus.)
direction of the transfer. In other words, a flyby DMA transfer looks like a memory read or write cycle with the DMA controller supplying the address and the I/O device reading or writing the data. Because flyby DMA transfers involve a single memory cycle per data transfer, these transfers are very efficient. Fig.16.2 shows the flyby DMA transfer signal protocol.
(Signals in Fig. 16.2: the I/O device holds DMA Request high for additional transfers; the DMA controller drives DMA Acknowledge*, I/O Read*, Memory Write* and the memory address, while the data passes directly from the I/O device to memory.)
Fig. 16.2 Flyby DMA transfer
Unlike the flyby operation, the fetch-and-deposit type of DMA transfer, in which the DMA controller first reads the data into an internal register and then writes it out in a second cycle, is suitable for both memory-to-memory and I/O transfers.

(Signals in Fig. 16.3: the I/O device raises DMA Request; the DMA controller issues I/O Read* and Memory Write* in separate cycles.)

Fig. 16.3 Fetch-and-Deposit DMA Transfer
Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers once it has gained the bus. For block mode transfers, the DMA controller performs the entire DMA sequence, as specified by the transfer count register, at the fastest possible rate in response to a single DMA request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device deasserts this DMA request, transfers are held off.

DMA Controller Operation
For each channel, the DMA controller saves the programmed address and count in the base registers and maintains copies of the information in the current address and current count registers, as shown in Fig.16.1. Each DMA channel is enabled and disabled via a DMA mask
register. When DMA is started by writing to the base registers and enabling the DMA channel,
the current registers are loaded from the base registers. With each DMA transfer, the value in the
current address register is driven onto the address bus, and the current address register is
automatically incremented or decremented. The current count register determines the number of
transfers remaining and is automatically decremented after each transfer. When the value in the
current count register goes from 0 to -1, a terminal count (TC) signal is generated, which
signifies the completion of the DMA transfer sequence. This termination event is referred to as
reaching terminal count. DMA controllers often generate a hardware TC pulse during the last
cycle of a DMA transfer sequence. This signal can be monitored by the I/O devices participating
in the DMA transfers. DMA controllers require reprogramming when a DMA channel reaches
TC. Thus, DMA controllers require some CPU time, but far less than is required for the CPU to
service device I/O interrupts. When a DMA channel reaches TC, the processor may need to
reprogram the controller for additional DMA transfers. Some DMA controllers interrupt the
processor whenever a channel terminates. DMA controllers also have mechanisms for
automatically reprogramming a DMA channel when the DMA transfer sequence completes.
These mechanisms include auto initialization and buffer chaining. The auto initialization feature
repeats the DMA transfer sequence by reloading the DMA channel's current registers from the
base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is
useful for transferring blocks of data into noncontiguous buffer areas or for handling double-
buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is programmed
with the next address and count parameters while DMA transfers are being performed on the
current buffer. Some DMA controllers minimize CPU intervention further by having a chain
address register that points to a chain control table in memory. The DMA controller then loads
its own channel parameters from memory. Generally, the more sophisticated the DMA
controller, the less servicing the CPU has to perform.
A DMA controller has one or more status registers that are read by the CPU to determine
the state of each DMA channel. The status register typically indicates whether a DMA request is
asserted on a channel and whether a channel has reached TC. Reading the status register often
clears the terminal count information in the register, which leads to problems when multiple
programs are trying to use different DMA channels.
Steps in a Typical DMA cycle

1. The device wishing to perform DMA asserts the processor's bus request signal.
2. The processor completes the current bus cycle and then asserts the bus grant signal to the device.
3. The device then asserts the bus grant acknowledge signal.
4. The processor senses the change in the state of the bus grant acknowledge signal and starts listening to the data and address bus for DMA activity.
5. The DMA device performs the transfer from the source to the destination address.
6. During these transfers, the processor monitors the addresses on the bus and checks if any location modified during the DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions:
   o the processor invalidates the internal cache entry for the address involved in the DMA write operation, or
   o the processor updates the internal cache when a DMA write is detected.
7. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal.
8. The processor acknowledges the bus release and resumes its bus cycles from the point it left off.
(Fig. 16.4 residue: only a few pin assignments survive, e.g. DREQ0 on pin 19, VSS (GND) on pin 20, DB7 on pin 21 and DB6 on pin 22.)
Fig. 16.4 The DMA pin-out
(82C37A internal block diagram residue: a timing and control block driven by CLK, RESET and READY generates AEN, ADSTB, MEMR, MEMW, IOR and IOW; priority encoder and rotating priority logic serve DREQ0-DREQ3, DACK0-DACK3 and the HRQ/HLDA handshake; each channel has 16-bit base and current address registers and base and current word count registers; command, mode, request, mask, status and temporary registers sit on the internal data bus, buffered onto DB0-DB7 and A0-A15.)
HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has relinquished control of the system busses.
DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel. DACK will acknowledge the recognition of a DREQ signal. The polarity of DREQ is programmable. RESET initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active. DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low (inactive) and the corresponding mask bit set.
DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is programming the 82C37A control registers. During DMA cycles, the most significant 8 bits of the address are output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data from the memory enters the 82C37A on the data bus during the read-from-memory transfer; then, during the write-to-memory transfer, the data bus outputs write the data into the new memory location.
IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the 82C37A to access data from the peripheral during a DMA Write transfer.
IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an
input control signal used by the CPU to load information into the 82C37A. In the Active cycle, it
is an output control signal used by the 82C37A to load data to the peripheral during a DMA Read
transfer.
EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal.
Information concerning the completion of DMA services is available at the bidirectional EOP
pin. The 82C37A allows an external signal to terminate an active DMA service by pulling the
EOP pin low. A pulse is generated by the 82C37A when terminal count (TC) for any channel is
reached, except for channel 0 in memory-to-memory mode. During memory-to-memory
transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by an
open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP
pulse occurs, whether internally or externally generated, the 82C37A will terminate the service,
and if auto-initialize is enabled, the base registers will be written to the current registers of that
channel. The mask bit and TC bit in the status word will be set for the currently active channel
by EOP unless the channel is programmed for autoinitialize. In that case, the mask bit remains
clear.
A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals.
In the Idle cycle, they are inputs and are used by the 82C37A to address the control register to be
loaded or read. In the Active cycle, they are outputs and provide the lower 4-bits of the output
address.
A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide
4-bits of address. These lines are enabled only during the DMA service.
HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues HRQ. The HLDA signal then informs the controller when access to the system busses is permitted. For stand-alone operation where the 82C37A always controls the busses, HRQ may be tied to HLDA. This will result in one S0 state before the transfer.
DACK0-DACK3: DMA ACKNOWLEDGE: DMA Acknowledge is used to notify the individual peripherals when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to active low.
AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is active high.
ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It will directly drive the strobe input of external transparent octal latches, such as the 82C82. During block operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock.
MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data from the selected memory location during a DMA Read or a memory-to-memory transfer.
MEMW: MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to the selected memory location during a DMA Write or a memory-to-memory transfer.
NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.
Functional Description
The 82C37A direct memory access controller is designed to improve the data transfer rate in
systems which must transfer data from an I/O device to memory, or move a block of memory to
an I/O device. It will also perform memory-to-memory block moves, or fill a block of memory
with data from a single location. Operating modes are provided to handle single byte transfers as
well as discontinuous data streams, which allows the 82C37A to control data movement with
software transparency. The DMA controller is a state-driven address and control signal
generator, which permits data to be transferred directly from an I/O device to memory or vice
versa without ever being stored in a temporary register. This can greatly increase the data
transfer rate for sequential operations, compared with processor move or repeated string
instructions. Memory-to-memory operations require temporary internal storage of the data byte
between generation of the source and destination addresses, so memory-to-memory transfers take
place at less than half the rate of I/O operations, but still much faster than with central processor
techniques. The block diagram of the 82C37A is shown in Fig.16.6. The timing and control
block, priority block, and internal registers are the main components. The timing and control
block derives internal timing from clock input, and generates external control signals. The
Priority Encoder block resolves priority contention between DMA channels requesting service
simultaneously.
DMA Operation

In a system, the 82C37A address and control outputs and data bus pins are basically connected in parallel with the system busses. An external latch is required for the upper address byte. While inactive, the controller's outputs are in a high-impedance state. When activated by a DMA request and bus control is relinquished by the host, the 82C37A drives the busses and generates the control signals to perform the data transfer. The operation performed by activating one of the four DMA request inputs has previously been programmed into the controller via the Command, Mode, Address, and Word Count registers. For example, if a block of data is to be transferred from RAM to an I/O device, the starting address of the data is loaded into the 82C37A Current and Base Address registers for a particular channel, and the length of the block is loaded into the channel's Word Count register. The corresponding Mode register is programmed for a memory-to-I/O operation (read transfer), and various options are selected by the Command register and the other Mode register bits. The channel's mask bit is cleared to enable recognition of a DMA request (DREQ). The DREQ can either be a hardware signal or a software command. Once initiated, the block DMA transfer will proceed as the controller outputs the data address, simultaneous MEMR and IOW pulses, and selects an I/O device via the DMA acknowledge (DACK) outputs. The data byte flows directly from the RAM to the I/O device. After each byte is transferred, the address is automatically incremented (or decremented) and the word count is decremented.
To further understand 82C37A operation, the states generated by each clock cycle must
be considered. The DMA controller operates in two major cycles, active and idle. After being
programmed, the controller is normally idle until a DMA request occurs on an unmasked
channel, or a software request is given. The 82C37A will then request control of the system
busses and enter the active cycle. The active cycle is composed of several internal states,
depending on what options have been selected and what type of operation has been requested.
The 82C37A can assume seven separate states, each composed of one full clock period. State I
(SI) is the idle state. It is entered when the 82C37A has no valid DMA requests pending, at the
end of a transfer sequence, or when a Reset or Master Clear has occurred. While in SI, the DMA
controller is inactive but may be in the Program Condition (being programmed by the processor).
State 0 (S0) is the first state of a DMA service. The 82C37A has requested a hold but the
processor has not yet returned an acknowledge. The 82C37A may still be programmed until it
has received HLDA from the CPU. An acknowledge from the CPU will signal the DMA transfer
may begin. S1, S2, S3, and S4 are the working states of the DMA service. If more time is needed
to complete a transfer than is available with normal timing, wait states (SW) can be inserted
between S3 and S4 in normal transfers by the use of the Ready line on the 82C37A. For
compressed transfers, wait states can be inserted between S2 and S4. Note that the data is
transferred directly from the I/O device to memory (or vice versa) with IOR and MEMW (or
MEMR and IOW) being active at the same time. The data is not read into or driven out of the
82C37A in I/O-to-memory or memory-to-I/O DMA transfers. Memory-to-memory transfers
require a read-from and a write-to memory to complete each transfer. The states, which resemble
the normal working states, use two-digit numbers for identification. Eight states are required for
a single transfer. The first four states (S11, S12, S13, S14) are used for the read-from-memory
half and the last four states (S21, S22, S23, S24) for the write-to-memory half of the transfer.
16(IV) Conclusion

This lesson has given an overview of the DMA controller. Such controllers are normally used in high-performance embedded systems where large bulks of data need to be transferred from the input to the memory. One such system is the on-board Digital Signal Processor in a mobile telephone. Besides fast digital coding and decoding, at times this processor is required to process the voice signals to improve their quality. This has to take place in real time. While the voice message is streaming in through the A/D converter it needs to be transferred and windowed for filtering. DMA offers great help here. For simpler systems DMA is not normally used.

The signals and functional architecture of a very familiar DMA controller (the 8237) used in personal computers have been discussed. For more detailed discussions the readers are requested to visit www.intel.com or any other manufacturer's site and read the datasheet.
16(V) Questions and Answers

Q.1. Can you use the 82C37A in embedded systems? Justify your answer.
Ans: Only in high-performance systems where the power supply constraints are not stringent. The supply voltage is 5V and the current may reach up to 16 mA, resulting in 80 mW of power consumption.

Q.2 Highlight the different modes of DMA data transfer. Which mode consumes the least power and which mode is the fastest?

Q.3. Draw the architecture of the 8237 and explain the various parts.
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
Lesson
17
USB and IrDA
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to learn basics of
The Universal Serial Bus Signals
The IrDA standard
Pre-Requisite
Digital Electronics, Microprocessors
buses are not enough to carry the data as fast as is desired. So a group of leading computer and telecom firms including IBM, Intel, Microsoft, Compaq, Digital Equipment, NEC and Northern Telecom got together and developed USB.

The USB is a medium-speed serial data bus designed to carry relatively large amounts of data over relatively short cables: up to about five meters long. It can support data rates of up to 12Mb/s (megabits per second). The USB is an addressable bus system with a seven-bit address code, so it can support up to 127 different devices or nodes at once (the all-zeroes code is not a valid address). However it can have only one host. The host with its peripherals connected via the USB forms a star network. On the other hand any device connected to the USB can have a number of other nodes connected to it in daisy-chain fashion, so it can also form the hub for a mini-star sub-network. Similarly you can have a device which purely functions as a hub for other node devices, with no separate function of its own. This expansion via hubs is possible because the USB supports a tiered star topology, as shown in Fig.17.1. Each USB hub acts as a kind of traffic cop for its part of the network, routing data from the host to its correct address and preventing bus contention clashes between devices trying to send data at the same time. On a USB hub device, the single port used to connect to the host PC, either directly or via another hub, is known as the upstream port, while the ports used for connecting other devices to the USB are known as the downstream ports. This is illustrated in Fig.17.2. USB hubs work transparently as far as the host PC and its operating system are concerned. Most hubs provide either four or seven downstream ports, or fewer if they already include a USB device of their own. Another important feature of the USB is that it is designed to allow hot swapping, i.e. devices can be plugged into and unplugged from the bus without having to turn the power off and on again, re-boot the PC or even manually start a driver program. A new device can simply be connected to the USB, and the PC's operating system should recognize it and automatically set up the necessary driver to service it.
(Figure: a USB Host (PC) at the top of a tiered star, with USB hubs fanning out to peripheral devices.)

Fig. 17.1 The USB is a medium speed serial bus used to transfer data between a PC and its peripherals. It uses a tiered star configuration, with expansion via hubs (either separate, or in USB devices).
(Figure: a PC connects to the upstream port of a USB mini hub; Ports 1 to 4 are the downstream ports leading to more devices.)
Fig. 17.2 The port on a USB device or hub which connects to the PC host (either directly or
via another hub) is known as the upstream port, while hub ports which connect to
additional USB devices are downstream ports. Downstream ports use Type A sockets, while
upstream ports use Type B sockets.
A USB device can draw up to 500mA at 5V from its port, so if it requires less than this figure for operation it can be bus powered. If it needs more, it has to use its own power supply such as a plug-pack adaptor. Hubs should be able to supply up to 500mA at 5V from each downstream port, if they are not bus powered. Serial data is sent along the USB in differential or push-pull mode, with opposite polarities on the two signal lines. This improves the signal-to-noise ratio (SNR) by doubling the effective signal amplitude and also allowing the cancellation of any common-mode noise induced into the cable. The data is sent in non-return-to-zero (NRZ) format, with signal levels of 3.3V peak (i.e., 6V peak differential). USB cables use two different types of connectors: Type A plugs for the upstream end, and Type B plugs for the downstream end. Hence the USB ports of PCs are provided with matching Type A sockets, as are the downstream ports of hubs, while the upstream ports of USB devices (including hubs) have Type B sockets. Type A plugs and sockets are flat in shape and have the four connections in line, while Type B plugs and sockets are much squarer in shape and have two connections on either side of the centre spigot (Fig. 17.3). Both types of connector are polarized so they cannot be inserted the wrong way around. Fig. 17.3 shows the pin connections for both types of connector, with sockets shown as viewed from the front. Note that although USB cables having a Type A plug at each end are available, they should never be used to connect two PCs together via their USB ports. This is because a USB network can only have one host, and both would try to claim that role. In any case, the cable would also short their 5V power rails together, which could cause a damaging current to flow. USB is not designed for direct data transfer between PCs. All normal USB connections should be made using cables with a Type A plug at one end and a Type B plug at the other, although extension cables with a Type A plug at one end and a Type A socket at the other can also be used, provided the total extended length of a cable doesn't exceed 5m. By the way, USB cables are usually easy to identify, as the plugs have a distinctive symbol molded into them (Fig. 17.4).
Data formats (Fig. 17.5)
USB data transfer is essentially in the form of packets of data, sent back and forth between the host and peripheral devices. However, because USB is designed to handle many different types of data, it can use four different data formats as appropriate. One of the two main formats is bulk asynchronous mode, which is used for transferring data that is not time critical. The packets can be interleaved on the USB with others being sent to or from other devices. The other main format is isochronous mode, used to transfer data that is time critical, such as audio data to digital speakers, or to/from a modem. These packets must not be delayed by those from other devices. The two other data formats are interrupt format, used by devices to request servicing from the PC/host, and control format, used by the PC/host to send token packets to control bus operation, and by all devices to send handshake packets to indicate whether the data they have just received was OK (ACK) or had errors (NAK). Some of the data formats are illustrated in Fig. 17.5. Note that all data packets begin with a sync byte (01 hex), used to synchronize the PLL (phase-locked loop) in the receiving device's USB controller. This is followed by the packet identifier (PID), containing a four-bit nibble (sent in both normal and inverted form) which indicates the type of data and the direction it is going in (i.e., to or from the host). Token packets then have the 7-bit address of the destination device and a 4-bit endpoint field to indicate which of that device's registers it is to be sent to. On the other hand, data packets have a data field of up to 1023 bytes of data following the PID field, while Start of Frame (SOF) packets have an 11-bit frame identifier instead, and handshake packets have no other field. Most packets end with a cyclic redundancy check (CRC) field of either five or 16 bits for error checking, except handshake packets, which rely on the redundancy in the PID field. All USB data is sent serially, of course, and least-significant-bit (LSB) first. Luckily, all of the fine details of USB handshaking and data transfer are looked after by the driver software in the host and the firmware built into the USB controller inside each USB peripheral device and hub.
Pin connections (both connector types):

Pin No.  Signal
1        +5V Power
2        - Data
3        + Data
4        Ground

Fig. 17.3 Pin connections for the two different types of USB socket, as viewed from the front.
Fig. 17.4 Most USB plugs have this distinctive marking symbol.
SYNC (00000001) followed by the PID (xxxx,xxxx) is the complete structure of a handshake packet.

Packet identifier nibble codes:
OUTPUT = 0001 (token)
INPUT  = 1001 (token)
SET UP = 1101 (token)
DATA0  = 0011 (data)
DATA1  = 1011 (data)
ACK    = 0010 (handshake)
NAK    = 1010 (handshake)
STALL  = 1110 (handshake)

Fig. 17.5 Examples of the various kinds of USB signaling and data packets.
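Since every PID nibble travels together with its bitwise complement in one byte, a receiver can sanity-check a PID before any CRC is examined. A small illustrative sketch (plain Python, not any real USB stack) using the nibble codes listed above:

```python
PID_NIBBLES = {            # four-bit PID codes from Fig. 17.5
    "OUTPUT": 0b0001, "INPUT": 0b1001, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}

def pid_byte(nibble):
    """A PID is sent as the nibble plus its bitwise complement in the upper half."""
    return (nibble & 0xF) | ((~nibble & 0xF) << 4)

def pid_is_valid(byte):
    """The upper nibble must be the exact complement of the lower one."""
    return ((byte >> 4) ^ (byte & 0xF)) == 0xF
```

For example, pid_byte(PID_NIBBLES["ACK"]) gives 0xD2 and pid_byte(PID_NIBBLES["INPUT"]) gives 0x69; flipping any single bit of such a byte makes pid_is_valid() fail, which is how corrupted PIDs are rejected without a CRC.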
17(II) IrDA Standard
IrDA is the abbreviation for the Infrared Data Association, a nonprofit organization for setting standards in IR serial computer connections.
Transmission in an IrDA-compatible mode (sometimes called SIR, for serial IR) uses, in the simplest case, the RS232 port, a built-in standard of all compatible PCs. With a simple interface, shortening the bit length to a maximum of 3/16 of its original length for power-saving requirements, an infrared emitting diode is driven to transmit an optical signal to the receiver. This type of transmission covers the data range up to 115.2 kbit/s, which is the maximum data rate supported by standard UARTs (Fig. 17.7). The minimum demanded transmission speed for IrDA is only 9600 bit/s. All transmissions must be started at this rate to enable compatibility. Higher speeds are a matter of negotiation between the ports after the link is established.
Fig. 17.7 One end of the overall serial link: a pulse shaping stage drives the IR transmitter.

Please browse www.irda.org for details
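The 3/16 rule above translates directly into pulse widths. A short sketch (plain Python, purely illustrative) of the maximum IR pulse length at a given UART baud rate:

```python
def sir_pulse_width_us(baud):
    """IrDA SIR shortens each transmitted bit to at most 3/16 of the bit time."""
    bit_time_us = 1e6 / baud      # one UART bit period, in microseconds
    return bit_time_us * 3 / 16

# At the mandatory start-up rate of 9600 bit/s the pulse is ~19.5 us;
# at the 115.2 kbit/s UART maximum it shrinks to ~1.63 us.
```

This is why the transmitter in Fig. 17.7 needs a pulse shaping stage: the UART produces full-width bits, and the shaper narrows them before they reach the IR diode.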
Serial Port Infrared Receiver: an IR receiver module (TSOP1838) feeds the PC serial port through a MAX232 level converter, powered via a 78L05 regulator.

The 7805 is a voltage regulator which supplies 5V to the MAX232 level converter. The MAX232 converts the signal, which swings between 5V and ground, to the ±12V levels compatible with the RS232 standard.
(Internal block diagram of the IR receiver module: a PIN photodiode feeds an input circuit, AGC, band pass filter and demodulator, with a control circuit; pins: 1 OUT, 2 GND, 3 VS.)
The architecture of a typical microcontroller from Atmel with an on-chip USB controller
Q.2 Draw the circuit diagram for interfacing an IrDA receiver with a typical microcontroller
Ans:
(A typical application circuit: the TSOP18.. module's pin 3 goes to +5V through a 330 ohm resistor, pin 2 to GND, and pin 1 (OUT) to the microcontroller input; a pull-up resistor greater than 10K is recommended, along with a decoupling capacitor C.)
A typical application circuit The Receiver Interface to a Microcontroller
Further Reference

1. www.usb.org
2. www.irda.org
Module
3
Embedded Systems I/O
Version 2 EE IIT, Kharagpur 1
Lesson
18
AD and DA Converters
Instructional Objectives
After going through this lesson the student would be able to
Pre-Requisite

Digital Electronics, Microprocessors

18 Introduction
The real time embedded controller is expected to process real world signals within a specified time. Most real world signals are analog in nature. Take the example of your mobile phone. The overall architecture is shown in Fig. 18.1. The Digital Signal Processor (DSP) is fed with the analog data from the microphone. It also receives the digital signals after demodulation from the RF receiver, and generates the filtered and noise-free analog signal through the speaker. All the processing is done in real time. The processing of signals in real time is termed Real Time Signal Processing, a term which has been coined beautifully in the signal processing industry.
Fig. 18.1 The architecture of a mobile phone: the antenna feeds the RF receiver (Rx) and RF transmitter (Tx), which exchange data with the DSP; the DSP connects to the microphone and speaker, while a microcontroller handles the display and keyboard. The microphone/speaker and RF sides are analog processing; the DSP performs the digital processing.

Fig. 18.2 The DA conversion chain in the analog processor: the b-bit words yb(n) pass through a decoder (DAC) and a sample/hold stage, followed by a low-pass filter (LPF), to produce the analog signal y(n).
The DA Converter
In theory, the simplest method for digital-to-analog conversion is to pull the samples from
memory and convert them into an impulse train.
Fig. 18.4(a) The impulse train: the analog equivalent of the digital words
Fig. 18.4(b) The analog voltage after the zeroth-order hold
Fig. 18.4(c) The reconstructed analog signal after filtering
A digital word (8-bit or 16-bit) can be converted to its analog equivalent by weighted averaging. Fig. 18.5(a) shows the weighted averaging method for a 3-bit converter. A switch connects an input either to a common voltage V or to a common ground. Only the switches currently connected to the voltage source contribute current to the summing node at the amplifier's inverting input. The output voltage is given by the expression drawn below the circuit diagram; SX = 1 if switch X connects to V, SX = 0 if it connects to ground. There are eight possible combinations of connections for the three switches, and these are indicated in the columns of the table to the right of the diagram. Each combination is associated with a decimal integer as shown. The inputs are weighted in a 4:2:1 relationship, so that the sequence of values for 4S3 + 2S2 + S1 forms a binary number representation (decimal values 0 to 7). The magnitude of Vo varies in units (steps) of (Rf/4R)V from 0 to 7. This circuit provides a simplified Digital to Analog Converter (DAC). The digital input controls the switches, and the amplifier provides the analog output.
V0 = -Rf (S3·V/R + S2·V/(2R) + S1·V/(4R)) = -(Rf/4R)·V·(4S3 + 2S2 + S1)

S3 S2 S1 | value
0  0  0  | 0
0  0  1  | 1
0  1  0  | 2
0  1  1  | 3
1  0  0  | 4
1  0  1  | 5
1  1  0  | 6
1  1  1  | 7

Fig. 18.5(a) The binary weighted resistor method
I(S3) = (V/3R)·(1/2)·S3,  I(S2) = (V/3R)·(1/4)·S2,  I(S1) = (V/3R)·(1/8)·S1

V0 = -Rf (S3·(V/3R)·(1/2) + S2·(V/3R)·(1/4) + S1·(V/3R)·(1/8)) = -(Rf/24R)·V·(4S3 + 2S2 + S1)

Fig. 18.5(b) R-2R ladder D-A conversion circuit
Fig. 18.5(b) depicts the R-2R ladder network. The disadvantage of the binary weighted resistor method is the availability and manufacture of exact values of the resistances. Here also the output is proportional to the binary number. The outputs of the circuits in Fig. 18.5(a) and 18.5(b) are equivalent analog values, as shown in Fig. 18.4(a). However, to reconstruct the original signal this output is further passed through a zero order hold (ZOH) circuit followed by a filter (Fig. 18.2). The reconstructed waveforms are shown in Fig. 18.4(b) and 18.4(c).
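Both DAC expressions reduce to an output proportional to the binary value 4S3 + 2S2 + S1; only the scale factor differs. A quick numeric check (plain Python; the component values default to 1 purely for illustration):

```python
def vo_binary_weighted(s3, s2, s1, v=1.0, r=1.0, rf=1.0):
    """Binary weighted resistor DAC of Fig. 18.5(a): Vo = -(Rf/4R) V (4S3+2S2+S1)."""
    return -rf * (s3 * v / r + s2 * v / (2 * r) + s1 * v / (4 * r))

def vo_r2r(s3, s2, s1, v=1.0, r=1.0, rf=1.0):
    """R-2R ladder DAC of Fig. 18.5(b): Vo = -(Rf/24R) V (4S3+2S2+S1)."""
    return -rf * v / (3 * r) * (s3 / 2 + s2 / 4 + s1 / 8)

# All eight switch combinations of the 3-bit input:
codes = [(s3, s2, s1) for s3 in (0, 1) for s2 in (0, 1) for s1 in (0, 1)]
```

Stepping through `codes`, each circuit produces eight uniformly spaced output levels, and for equal component values the two outputs differ only by the constant factor 6 (i.e. (1/4) / (1/24)).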
The AD Converter
The ADC consists of a sampler, quantizer and a coder. Each of them is explained below.
Sampler
The sampler in the simplest form is a semiconductor switch, as shown below. It is followed by a hold circuit, which is a capacitor with a very low leakage path.
Fig. 18.6 The sample and hold circuit: a semiconductor switch, driven by the control signal, converts the analog signal to the sampled signal held on the capacitor.
Fig. 18.7 Sample and hold signals: the analog signal, and the sampled signal after the capacitor.
Quantizer
The hold circuit tries to maintain a constant voltage till the next switching. The quantizer is responsible for converting this voltage to a binary number. The number of bits in the binary number decides the approximation and accuracy. The sample and hold output can assume any real number in a given range. However, because of the finite word length, only the 2^N values 0 to 2^N - 1 are possible in the digital domain, and each held voltage corresponds to the nearest of these levels.
Fig. 18.8(a) Hold Circuit Output (the sampled analog signal, amplitude in volts)
Fig. 18.8(b) The Quantized Value (the digitized signal)
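The mapping from a held voltage to the nearest of the 2^N levels can be sketched as follows (plain Python; the 0-5V full-scale range is an assumed parameter, not something fixed by the text):

```python
def quantize(v, n_bits, v_min=0.0, v_max=5.0):
    """Map a held voltage to the nearest of the 2**n_bits quantization levels."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels           # 1 LSB
    code = int(round((v - v_min) / step))
    return max(0, min(levels - 1, code))      # clamp to 0 .. 2**n_bits - 1

# With 8 bits over 0-5V, 1 LSB is about 19.5mV, so the rounding error
# of each sample is at most about half an LSB (except at full scale).
```

Increasing n_bits shrinks the step size and hence the approximation error, which is exactly the accuracy trade-off described above.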
Coder

This is an optional device which is used after the conversion is complete. In microprocessor based systems the coder is responsible for packing several samples and transmitting them onwards in either a synchronous or an asynchronous manner. For example, in TI DSK kits you will find the AD converters with CODECs interfaced to McBSP ports (short for Multi-channel Buffered Serial Ports). Several 16-bit sampled values are packed into a frame and transmitted to the processor or to the memory by Direct Memory Access (DMA). The coder is responsible for controlling the ADC and transferring the data quickly for processing. Sometimes the codec is responsible for compressing several samples together before transmitting them. In your desktop computer you will find audio interfaces which can digitize and record your voice and store it in .wav format. This is basically AD conversion followed by coding. The .wav format is the Pulse-Code-Modulated (PCM) format of the original digital voice samples.
The Sampling Theorem

The definition of proper sampling is quite simple. Suppose you sample a continuous signal in some manner. If you can exactly reconstruct the analog signal from the samples, you must have done the sampling properly. Even if the sampled data appears confusing or incomplete, the key information has been captured if you can reverse the process. Fig. 18.9 shows several sinusoids before and after digitization. The continuous line represents the analog signal entering the ADC, while the square markers are the digital signal leaving the ADC. In (a), the analog signal is a constant DC value, a cosine wave of zero frequency. Since the analog signal is a series of straight lines between each of the samples, all of the information needed to reconstruct the analog signal is contained in the digital data. According to our definition, this is proper sampling. The sine wave shown in (b) has a frequency of 0.09 of the sampling rate. This might represent, for example, a 90 cycles/second sine wave being sampled at 1000 samples/second. Expressed another way, there are 11.1 samples taken over each complete cycle of the sinusoid. This situation is more complicated than the previous case, because the analog signal cannot be reconstructed by simply drawing straight lines between the data points. Do these samples properly represent the analog signal? The answer is yes, because no other sinusoid, or combination of sinusoids, will produce this pattern of samples (within the reasonable constraints listed below). These samples correspond to only one analog signal, and therefore the analog signal can be exactly reconstructed. Again, an instance of proper sampling. In (c), the situation is made more difficult by increasing the sine wave's frequency to 0.31 of the sampling rate. This results in only 3.2 samples per sine wave cycle. Here the samples are so sparse that they don't even appear to follow the general trend of the analog signal. Do these samples properly represent the analog waveform? Again, the answer is yes, and for exactly the same reason. The samples are a unique representation of the analog signal. All of the information needed to reconstruct the continuous waveform is contained in the digital data. Obviously, it must be more sophisticated than just drawing straight lines between the data points. As strange as it seems, this is proper sampling according to our definition. In (d), the analog frequency is pushed even higher, to 0.95 of the sampling rate, with a mere 1.05 samples per sine wave cycle. Do these samples properly represent the data? No, they don't! The samples represent a different sine wave from the one contained in the analog signal. In particular, the original sine wave of 0.95 frequency misrepresents itself as a sine wave of 0.05 frequency in the digital signal. This phenomenon of sinusoids changing frequency during sampling is called aliasing. Just as a criminal might take on an assumed name or identity (an alias), the sinusoid assumes another frequency that is not its own. Since the digital data is no longer uniquely related to a particular analog signal, an unambiguous reconstruction is impossible. There is nothing in the sampled data to suggest that the original analog signal had a frequency of 0.95 rather than 0.05. The sine wave has hidden its true identity completely; the perfect crime has been committed! According to our definition, this is an example of improper sampling. This line of reasoning leads to a milestone in DSP, the sampling theorem. Frequently this is called the Shannon sampling theorem, or the Nyquist sampling theorem, after the authors of 1940s papers on the topic. The sampling theorem indicates that a continuous signal can be properly sampled only if it does not contain frequency components above one-half of the sampling rate. For instance, a sampling rate of 2,000 samples/second requires the analog signal to be composed of frequencies below 1000 cycles/second. If frequencies above this limit are present in the signal, they will be aliased to frequencies between 0 and 1000 cycles/second, combining with whatever information was legitimately there.
Fig. 18.9 Sampling a sine wave at different frequencies: (a) analog frequency = 0.0 (i.e., DC); (b) analog frequency = 0.09 of sampling rate; (c) analog frequency = 0.31 of sampling rate; (d) analog frequency = 0.95 of sampling rate.
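The aliasing of panel (d) is easy to reproduce numerically. A short sketch (plain Python) showing that a tone at 0.95 of the sampling rate produces exactly the same samples as a 0.05 tone (with a phase flip):

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency of a tone f after sampling at rate fs."""
    f = f % fs
    return min(f, fs - f)

fs = 1000.0
n = range(50)
f_high = [math.sin(2 * math.pi * 950 * k / fs) for k in n]  # 0.95 of fs
f_low = [-math.sin(2 * math.pi * 50 * k / fs) for k in n]   # 0.05 of fs, inverted
# f_high and f_low are identical sample sequences: once sampled at 1000
# samples/second, a 950 Hz sine is indistinguishable from a 50 Hz one.
```

This is exactly why the text insists on band-limiting the input: `alias_frequency(950, 1000)` gives 50, while a legitimate in-band tone such as 90 Hz passes through unchanged.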
Methods of AD Conversion

The analog voltage samples are converted to their digital equivalents at the quantizer. There are various ways to convert the analog values to the nearest finite length digital word. Some of these methods are explained below.
Fig. 18.10 The counter converter: a DA converter (DAC) drives one input of a uA741 comparator, while the unknown voltage is applied to the other.
The AD conversion is indirectly carried out through DA conversion. The 3-bit input to the DA converter shown in Fig. 18.10 may be changed sequentially from 000 to 111 by a 3-bit counter. The unknown voltage (V8) is applied to one input of the comparator. When the DA output exceeds the unknown voltage, the comparator output, which was negative, becomes positive. This transition can be used to latch the counter value, which is approximately the equivalent digital value of the unknown voltage.

The drawback of sequential counting is that the time taken to reach the highest count is large. For instance, an 8-bit converter has to count up to 256 for converting the maximum input, and therefore consumes up to 256 clock cycles. Hence a different method, called successive approximation, is used instead, as shown in Fig. 18.11.
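The sequential counting scheme described above can be modelled in a few lines (plain Python, with an idealized DAC and comparator; names and values are illustrative):

```python
def counter_adc(v_in, n_bits, v_ref):
    """Count up until the ideal DAC output first reaches the unknown voltage."""
    lsb = v_ref / 2 ** n_bits
    for code in range(2 ** n_bits):
        if code * lsb >= v_in:     # comparator flips: latch this count
            return code
    return 2 ** n_bits - 1         # input at or above full scale

# Worst case takes all 2**n_bits comparisons - e.g. 256 for an 8-bit converter.
```

The linear search is what makes this converter slow: the number of clock cycles grows exponentially with the word length.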
Fig. 18.11 The successive approximation search tree (000, 010, 100, 110, ...).

With each successive comparison, half of the remaining DAC output states are eliminated. Instead of having to step through 2^N states for an N-bit conversion, only N comparisons are needed. The SAR ADC is perhaps the most common of the converters, providing a relatively rapid and relatively inexpensive conversion.
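The binary search can be sketched directly (plain Python, again with an idealized DAC), mirroring the tree of Fig. 18.11:

```python
def sar_adc(v_in, n_bits, v_ref):
    """Successive approximation: one comparison per bit instead of 2**n counts."""
    code = 0
    for bit in reversed(range(n_bits)):     # test the MSB first
        trial = code | (1 << bit)           # tentatively set this bit
        if trial * v_ref / 2 ** n_bits <= v_in:
            code = trial                    # DAC output still below input: keep it
    return code                             # exactly n_bits comparisons in total
```

An 8-bit conversion therefore needs 8 comparisons rather than up to 256, which is the speed advantage claimed above.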
Flash Converter
Making all the comparisons between the digital states and the analog signal concurrently makes
for a fast conversion cycle. A resistive voltage divider (see figure) can provide all the digital
reference states required. There are eight reference values (including zero) for the three-bit
converter illustrated. Note that the voltage reference states are offset so that they are midway
between reference step values. The analog signal is compared concurrently with each reference
state; therefore a separate comparator is required for each comparison. Digital logic then
combines the several comparator outputs to determine the appropriate binary code to present.
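The comparator bank produces a "thermometer" pattern that the digital logic collapses into a binary code. A sketch of that encoding step (plain Python; an idealized model of the circuit just described):

```python
def flash_encode(v_in, v_ref, n_bits=3):
    """Compare v_in with every mid-step reference level at once, then count."""
    levels = 2 ** n_bits
    # References offset to lie midway between step values: 0.5/8 .. 6.5/8 of v_ref.
    refs = [(i + 0.5) * v_ref / levels for i in range(levels - 1)]
    comparators = [v_in > r for r in refs]   # one comparator per reference
    return sum(comparators)                  # thermometer code -> binary count
```

All 2^N - 1 comparisons happen in parallel in hardware, which is why the flash converter completes a conversion in a single cycle at the cost of one comparator per reference level.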
Fig. 18.12 The 3-bit flash converter: a resistive divider across Vo generates the reference levels 0.5Vo/8, 1.5Vo/8, ... 6.5Vo/8 (codes 001 to 111), each feeding its own comparator against the analog input; encoding logic combines the comparator outputs into the 3-bit result (MSB 2^2 to LSB 2^0).
Sigma-Delta (ΣΔ) AD converters
The analog side of a sigma-delta converter (a 1-bit ADC) is very simple. The digital side, which is what makes the sigma-delta ADC inexpensive to produce, is more complex. It performs filtering and decimation. The concepts of over-sampling, noise shaping, digital filtering, and decimation are used to make a sigma-delta ADC.
Over-sampling
First, consider the frequency-domain transfer function of a traditional multi-bit ADC with a sine-
wave input signal. This input is sampled at a frequency Fs. According to Nyquist theory, Fs must
be at least twice the bandwidth of the input signal. When observing the result of an FFT analysis
on the digital output, we see a single tone and lots of random noise extending from DC to Fs/2
(Fig.18.13). Known as quantization noise, this effect results from the following consideration:
the ADC input is a continuous signal with an infinite number of possible states, but the digital
output is a discrete function, whose number of different states is determined by the converter's
resolution. So, the conversion from analog to digital loses some information and introduces some
distortion into the signal. The magnitude of this error is random, with values up to ±½ LSB.
Fig. 18.13 FFT diagram of a multi-bit ADC with a sampling frequency FS
If we divide the fundamental amplitude by the RMS sum of all the frequencies representing noise, we obtain the signal to noise ratio (SNR). For an N-bit ADC, SNR = 6.02N + 1.76 dB. To improve the SNR in a conventional ADC (and consequently the accuracy of signal reproduction) you must increase the number of bits. Consider again the above example, but with a sampling frequency increased by the oversampling ratio k, to kFs (Fig. 18.14). An FFT analysis shows that the noise floor has dropped. SNR is the same as before, but the noise energy has been spread over a wider frequency range. Sigma-delta converters exploit this effect by following the 1-bit ADC with a digital filter (Fig. 18.14). The in-band RMS noise is less, because most of the noise is removed by the digital filter. This action enables sigma-delta converters to achieve wide dynamic range from a low-resolution ADC.
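The SNR formula is easy to check numerically. A standard result (assumed here, not derived in the text) is that oversampling by a ratio k and filtering away the out-of-band noise adds 10·log10(k) dB to the in-band SNR:

```python
import math

def snr_db(n_bits, oversampling_ratio=1):
    """Ideal SNR of an N-bit ADC, plus the gain from oversampling and filtering."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10(oversampling_ratio)

# A 16-bit converter gives ~98.1 dB; oversampling by 4 adds ~6 dB,
# i.e. roughly one extra effective bit of resolution.
```

This is why a sigma-delta converter can trade a very high sampling rate for resolution: each quadrupling of the rate is worth about one bit, even before noise shaping improves matters further.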
Fig. 18.14 FFT diagram of a multi-bit ADC with a sampling frequency kFS and effect of Digital Filter on Noise Bandwidth
Noise Shaping

The sigma-delta modulator includes a difference amplifier, an integrator, and a comparator with a feedback loop that
contains a 1-bit DAC. (This DAC is simply a switch that connects the negative input of the
difference amplifier to a positive or a negative reference voltage.) The purpose of the feedback
DAC is to maintain the average output of the integrator near the comparator's reference level.
The density of "ones" at the modulator output is proportional to the input signal. For an
increasing input the comparator generates a greater number of "ones," and vice versa for a
decreasing input. By summing the error voltage, the integrator acts as a lowpass filter to the input
signal and a highpass filter to the quantization noise. Thus, most of the quantization noise is
pushed into higher frequencies. Oversampling has changed not the total noise power, but its
distribution. If we apply a digital filter to the noise-shaped delta-sigma modulator, it removes
more noise than does simple oversampling (Fig. 18.16).
(The modulator: the signal input X1 and the feedback X5 from the 1-bit DAC feed the difference amplifier (X2), the integrator (X3) and the comparator acting as a 1-bit ADC, whose output X4 goes to the digital filter.)

Fig. 18.16 The Effect of Integrator and Digital Filter on the Spectrum
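The first-order modulator loop described above is straightforward to simulate. A sketch (plain Python, with idealized components) showing that the density of "ones" at the output tracks the input, as the text claims:

```python
def sigma_delta_ones_density(x, n_samples=20000):
    """First-order sigma-delta modulator driven by a constant input x in (-1, 1)."""
    integrator, ones = 0.0, 0
    for _ in range(n_samples):
        bit = 1 if integrator >= 0 else 0   # comparator (the 1-bit ADC)
        dac = 1.0 if bit else -1.0          # 1-bit feedback DAC
        integrator += x - dac               # difference amp + integrator
        ones += bit
    return ones / n_samples

# For x = 0.5 the density of ones settles near (0.5 + 1) / 2 = 0.75;
# a larger input gives more ones, a smaller input fewer, as described above.
```

Averaging this bit stream (the job of the digital filter that follows) recovers the input value, which is the whole principle of the converter.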
Digital Filtering

The output of the sigma-delta modulator is a 1-bit data stream at the sampling rate, which can be in the megahertz range. The purpose of the digital-and-decimation filter (Fig. 18.17) is to extract information from this data stream and reduce the data rate to a more useful value. In a sigma-delta ADC, the digital filter averages the 1-bit data stream, improves the ADC resolution, and removes the quantization noise that lies outside the band of interest.
Conclusion

In this chapter you have learnt about the basics of Real Time Signal Processing and the DA and AD conversion methods. Some microcontrollers are already equipped with DA and AD converters on the same chip. Generally, real world signals are broad band. For instance, a triangular wave, though periodic, has frequency components extending to infinity. Therefore an anti-aliasing filter is always desirable before AD conversion. This limits the signal bandwidth and hence permits a finite sampling frequency. The question and answer session shall discuss the quantization error, the specifications of the AD and DA converters, and the errors at the various stages of real time signal processing. The details of interfacing shall be discussed in the next lesson.

The AD and DA converters fall under mixed-signal VLSI circuits: the digital and analog circuits coexist on the same chip. This poses design difficulties for VLSI engineers when embedding fast and high resolution AD converters along with the processors. Sigma-Delta ADCs are the most complex, and hence are rarely found embedded on microcontrollers.
Question Answers
Q1. What are the errors at different stages in a Real Time Signal Processing system? Elaborate
on the quantization error.
Ans: No. of bits (8-bits, 16-bits etc), Settling Time, Power Supply range, Power
Consumption, Various Temperature ratings, Packaging
Lesson
19
Analog Interfacing
Instructional Objectives
After going through this lesson the student would be able to
Pre-Requisite
Digital Electronics, Microprocessors
19(I) Introduction

Fig. 19.1 shows a typical sensor network. You will find a number of sensors and actuators connected to a common bus to share information and derive a collective decision. This is a complex embedded system; a digital camera falls under such a system. Only the analog signals are shown here. The last lesson discussed the AD and DA conversion methods in detail. This chapter shall discuss inbuilt AD-DA converters, standalone converters, and their interfacing.
Fig. 19.2 The Analog-Digital-Analog signal path with real time processing
19(II) Embedded AD Converters in Intel 80196

Fig. 19.3 shows the block diagram of the AD converter built into the 80196 embedded processor. The details of the subsystems are as follows:
Fig. 19.3 The 80196 on-chip A/D converter: the analog inputs, together with VREF and ANGND, feed an analog mux and a sample and hold stage into a successive approximation A/D converter; the control logic accepts EPA or PTS commands and provides status.
ANGND: The analog ground, connected separately to the circuit from which the analog voltage is brought into the processor.
Vref: The reference voltage, which decides the range of the input voltage. By making it negative, bipolar inputs can be used.
PTS: The Peripheral Transaction Server can complete specific tasks in less time than an equivalent interrupt service routine can. It can transfer bytes or words, either individually or in blocks, between any memory locations; manage multiple analog-to-digital (A/D) conversions; and transmit and receive serial data in either asynchronous or synchronous mode.

Analog Mux: The analog multiplexer. It selects a particular analog channel for conversion. Only after completing the conversion of one channel does it switch to subsequent channels.
AD_RESULT
For an A/D conversion, the high byte contains the eight MSBs from the conversion, while the
low byte contains the two LSBs from a 10-bit conversion (undefined for an 8-bit conversion),
indicates which A/D channel was used, and indicates whether the channel is idle. For a
threshold-detection, calculate the value for the successive approximation register and write that
value to the high byte of AD_RESULT. Clear the low byte or leave it in its default state.
During a conversion, the successive approximation circuitry compares a sequence of reference voltages with the analog input, performing a binary search for the reference voltage that most closely matches the input. The ½ full scale reference voltage is the first tested. This corresponds to a 10-bit result where the most-significant bit is zero and all other bits are ones (0111111111). If the analog input was less than the test voltage, bit 10 of the SAR is left at zero, and a new test voltage of ¼ full scale (0011111111) is tried. If the analog input was greater than the test voltage, bit 9 of
SAR is set. Bit 8 is then cleared for the next test (0101111111). This binary search continues
until 10 (or 8) tests have occurred, at which time the valid conversion result resides in the
AD_RESULT register where it can be read by software. The result is equal to the ratio of the
input voltage divided by the analog supply voltage. If the ratio is 1.00, the result will be all ones.
The following A/D converter parameters are programmable:

conversion input: input channel
zero-offset adjustment: no adjustment, plus 2.5 mV, minus 2.5 mV, or minus 5.0 mV
conversion times: sample window time and conversion time for each bit
operating mode: 8- or 10-bit conversion, or 8-bit high or low threshold detection
conversion trigger: immediate or EPA starts
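The statement above that the result equals the ratio of the input voltage to the reference can be sketched numerically (plain Python; an idealized 10-bit conversion for illustration, not the actual 80196 register behavior):

```python
def ad_result_code(v_in, v_ref, n_bits=10):
    """Ideal conversion: code / full-scale equals v_in / v_ref; all ones at ratio 1.0."""
    full_scale = 2 ** n_bits - 1
    ratio = min(max(v_in / v_ref, 0.0), 1.0)   # clamp out-of-range inputs
    return round(ratio * full_scale)

# A ratio of 1.00 yields 0b1111111111 (all ones) for a 10-bit conversion.
```

Reading back, v_in can be estimated as code / 1023 × v_ref, which is how software would interpret the contents of AD_RESULT.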
19(III) The External AD Converters (ADC0809)
(Block diagram of the 0809 AD converter: 8 analog inputs pass through analog switches, an 8-channel multiplexer selected via an address latch and decoder, to the comparator; the S.A.R., switch tree and 256R resistor ladder perform the 8-bit conversion, with timing and control logic (START, CLOCK, end-of-conversion interrupt) and a tri-state output latch buffer driving the 8-bit outputs.)
IN3            1 | 28  IN2
IN4            2 | 27  IN1
IN5            3 | 26  IN0
IN6            4 | 25  ADD A
IN7            5 | 24  ADD B
START          6 | 23  ADD C
EOC            7 | 22  ALE
2^-5           8 | 21  2^-1 MSB
OUTPUT ENABLE  9 | 20  2^-2
CLOCK         10 | 19  2^-3
VCC           11 | 18  2^-4
VREF(+)       12 | 17  2^-8 LSB
GND           13 | 16  VREF(-)
2^-7          14 | 15  2^-6

Fig. 19.5 The signals of the 0809 AD converter
Functional Description

Multiplexer

The device contains an 8-channel single-ended analog signal multiplexer. A particular input channel is selected by using the address decoder. Table 1 shows the states of the address lines for selecting any channel. The address is latched into the decoder on the low-to-high transition of the address latch enable signal.
TABLE 1

SELECTED ANALOG CHANNEL | ADDRESS LINE C B A
IN0                     | L L L
IN1                     | L L H
IN2                     | L H L
IN3                     | L H H
IN4                     | H L L
IN5                     | H L H
IN6                     | H H L
IN7                     | H H H
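The address decoding of Table 1 is just a 3-bit binary select. A tiny sketch (plain Python) of the mapping:

```python
def selected_channel(c, b, a):
    """Decode the ADD C/B/A lines (H = 1, L = 0) of Table 1 to a channel name."""
    return "IN%d" % (4 * c + 2 * b + a)
```

For example, driving ADD C high, ADD B low and ADD A high (H L H) selects IN5, matching the sixth row of the table.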
The Converter
This 8-bit converter is partitioned into 3 major sections: the 256R ladder network, the successive approximation register, and the comparator. The converter's digital outputs are positive true. The 256R ladder network approach (Fig. 19.6) was chosen over the conventional R/2R ladder because of its inherent monotonicity, which guarantees no missing digital codes. Monotonicity is particularly important in closed loop feedback control systems, where a non-monotonic relationship can cause oscillations that would be catastrophic for the system. Additionally, the 256R network does not cause load variations on the reference voltage.
Fig. 19.6 The 256R ladder network [figure: 256 resistors in series between VREF(+) and VREF(-), with taps routed through a switch tree to the comparator input]
The bottom resistor and the top resistor of the ladder network in Fig. 19.6 are not the same value as the remainder of the network. The difference in these resistors causes the output characteristic to be symmetrical with the zero and full-scale points of the transfer curve. The first output transition occurs when the analog signal has reached +1/2 LSB, and succeeding output transitions occur every 1 LSB later up to full-scale. The successive approximation register (SAR) performs 8 iterations to approximate the input voltage. For any SAR-type converter, n iterations are required for an n-bit converter. Fig. 19.7 shows a typical example of a 3-bit converter. The A/D converter's successive approximation register (SAR) is reset on the positive edge of the start conversion (SC) pulse. The conversion is begun on the falling edge of the start conversion pulse. A conversion in process will be interrupted by receipt of a new start conversion pulse. Continuous conversion may be accomplished by tying the end-of-conversion (EOC) output to the SC input. If used in this mode, an external start conversion pulse should be applied after power up. End-of-conversion will go low between 0 and 8 clock pulses after the rising edge of start conversion. The most important section of the A/D converter is the comparator. It is this section which is responsible for the ultimate accuracy of the entire converter.
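The successive approximation loop described above can be modelled in a few lines. This is an algorithmic sketch, not device firmware; the comparator is modelled as a simple comparison of the input against the ladder tap voltage:

```python
def sar_convert(vin, vref, bits=8):
    """Successive approximation: try each bit from MSB to LSB and
    keep it only if the ladder/DAC voltage stays at or below vin."""
    code = 0
    for i in range(bits - 1, -1, -1):          # n iterations for n bits
        trial = code | (1 << i)                # tentatively set this bit
        ladder = (trial / (1 << bits)) * vref  # ladder tap voltage
        if ladder <= vin:                      # comparator decision
            code = trial                       # keep the bit
    return code

print(sar_convert(2.5, 5.0))  # 128: half-scale sets only the MSB
print(sar_convert(5.0, 5.0))  # 255: full-scale code
```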
[Figure: typical interfacing of the ADC0808/ADC0809 to the processor — address lines AD0-AD2 drive the channel-select inputs A, B, C; VREF(+) = 5.000 V and VREF(-) = 0.000 V; EOC is decoded (with AD4-AD15) to generate an interrupt; the data outputs 2^-1 (MSB) through 2^-8 (LSB) drive the data bus lines DB7-DB0; IN0 receives VIN, with a 5 V supply.]
19(IV) The DA Converter (DAC0808)

[Figure: internal block diagram of the DAC0808 — input bits A1 (MSB) through A8 (LSB) drive range-controlled current switches into an R-2R ladder; an NPN reference current amplifier (VREF(+), VREF(-)), bias circuit and compensation complete the device, producing the output current I0.]
Fig. 19.9 The DAC0808 signals [16-pin package: NC pin 1, GND pin 2, VEE pin 3, I0 output current pin 4, digital inputs A1 (MSB, pin 5) through A8 (LSB, pin 12), VCC pin 13, VREF(+) pin 14, VREF(-) pin 15, COMPENSATION pin 16]
The pins are labeled A1 through A8, but note that A1 is the Most Significant Bit and A8 is the Least Significant Bit (the opposite of the normal convention). The D/A converter has an output current instead of an output voltage; an op-amp converts the current to a voltage. The output current from pin 4 ranges from 0 (when the inputs are all 0) to Imax*255/256 (when all the inputs are 1). The current Imax is determined by the current into pin 14 (which is at 0 volts). Since we are using 8 bits, the maximum value is Imax*255/256. The output of the D/A converter takes some time to settle, so there should be a small delay before sending the next data to the DA converter. However, this delay is very small compared to the conversion time of an AD converter and therefore does not matter in most real-time signal processing platforms. Fig. 19.10 shows a typical interface.
Fig. 19.10 Typical connection of DAC0808 [figure: digital inputs A1 (MSB, pin 5) through A8 (LSB, pin 12); VREF = 10.000 V through a 5.000 kOhm resistor into pin 14, with pin 15 grounded through 5 kOhm; the output current from pin 4 feeds an LF351 op-amp with a 5.000 kOhm feedback resistor to produce V0; VCC = 5 V, VEE = -15 V, 0.1 uF capacitor on the compensation pin]
The LF351 is an operational amplifier used as a current-to-proportional-voltage converter. The 8 digital inputs at A8-A1 are converted into a proportional current at pin 4 of the DAC. The reference voltage (10 V) is supplied at pin 14, and pin 15 is grounded through a resistance. A capacitor is connected across the compensation pin 16 and the negative supply to bypass high-frequency noise.
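The output relationship can be checked numerically. The sketch below uses the component values shown in Fig. 19.10 (10 V reference through 5 kOhm into pin 14, 5 kOhm op-amp feedback); it is an idealized model for illustration, not a device driver:

```python
def dac_vout(code, vref=10.0, r_ref=5000.0, r_fb=5000.0):
    """Ideal DAC0808 + op-amp model:
    Imax = Vref / Rref, Iout = Imax * code / 256, Vout = Iout * Rfb."""
    imax = vref / r_ref            # 2 mA with the values shown
    iout = imax * code / 256.0     # ranges 0 ... Imax*255/256
    return iout * r_fb

print(dac_vout(128))  # ~5.0 V at half scale
print(dac_vout(255))  # ~9.96 V full scale (Imax*255/256 * Rfb)
```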
Important Specifications

Relative accuracy: 0.19% error
Settling time: 150 ns
Slew rate: 8 mA/us
Power supply voltage range: 4.5 V to 18 V
Power consumption: 33 mW @ 5 V
19(V) Conclusion

In this lesson you learnt about the following:
The internal AD converters of the 80196 family of processors
The external microprocessor-compatible AD0809 converter
A typical 8-bit DA converter

Both the ADCs use the successive approximation technique. Flash ADCs are complex and therefore lead to VLSI circuits unsuitable for coexistence with a processor on the same chip; sigma-delta converters need a very high sampling rate.
Questions and Answers

Q.1 What are the possible errors in a system as shown in Fig. 19.2?

Ans:
Stage-1 Signal Amplification and Conditioning: this can also amplify the noise.
Stage-2 Anti-aliasing Filter: some useful information, such as transients in the real system, cannot be captured.
Stage-3 Sample and Hold: leakage and electromagnetic interference due to switching.
Stage-4 Analog to Digital Converter: quantization error due to finite bit length.
Stage-5 Digital Processing and Data Manipulation in a Processor: numerical round-off errors due to finite word length, and the delay caused by the algorithm.
Stage-6 Processed digital values are temporarily stored in a latch before D-A conversion: error in reconstruction due to zero-order approximation.
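The Stage-4 quantization error can be made concrete with a quick calculation: for an n-bit converter over a full-scale range V, one LSB is V/2^n and the worst-case quantization error is half of that. The numbers below assume an 8-bit converter over 5 V (an illustrative choice, matching the converters in this lesson):

```python
def quantization(v_fullscale, bits):
    """Return (LSB size, worst-case quantization error) in volts."""
    lsb = v_fullscale / (1 << bits)
    return lsb, lsb / 2.0

lsb, err = quantization(5.0, 8)
print(lsb)  # 0.01953125 V  (~19.5 mV per step)
print(err)  # 0.009765625 V (~9.8 mV worst case)
```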
Q.2 Why is it necessary to separate the digital ground from the analog ground in a typical ADC?

Ans: Digital circuit noise can get into the analog signal path if separate grounding systems are not used for the digital and analog parts. Digital grounds are invariably noisier than analog grounds because of the switching noise generated in digital chips when they change state. For large current transients, PCB trace inductance causes voltage drops between various ground points on the board (ground bounce). Ground bounce translates into varying voltage levels on signal lines. For digital lines this isn't a problem unless it crosses a logic threshold; for analog lines it is just plain noise added to the signals.
Module 4
Design of Embedded Processors

Lesson 20
Field Programmable Gate Arrays and Applications
Instructional Objectives
After going through this lesson the student will be able to
Introduction

An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry. When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system. FPGAs are truly parallel in nature, so different processing operations do not have to compete for the same resources. As a result, the performance of one part of the application is not affected when additional processing is added. Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by an operator. However, unlike hard-wired printed circuit board (PCB) designs, which have fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow reconfiguration after the control system is deployed to the field. FPGA devices deliver the performance and reliability of dedicated hardware circuitry.

A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.
Fig. 20.1 Internal structure of an FPGA [figure: an array of logic blocks with programmable interconnect, surrounded by I/O blocks]
In an FPGA, logic blocks are implemented using multiple-level low fan-in gates, which gives it a more compact design compared to an implementation with two-level AND-OR logic. An FPGA provides its user a way to configure:

1. The interconnection between the logic blocks, and
2. The function of each logic block.

A logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:

1. Transistor pairs
2. Combinational gates like basic NAND gates or XOR gates
3. n-input lookup tables
4. Multiplexers
5. Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. The density of logic blocks used in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a tradeoff between the density of logic blocks and the amount of area used up for routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 20.2.
Fig. 20.2 Simplified internal structure of an FPGA [figure: logic blocks and I/O blocks connected by routing wires]
There are two kinds of costs involved in the development of custom ICs:

1. Cost of development and design (increased design time)
2. Cost of manufacture

(A tradeoff usually exists between the two costs.)

Therefore the custom IC approach was only viable for products with very high volume, and which were not time-to-market sensitive. FPGAs were introduced as an alternative to custom ICs for implementing an entire system on one chip and to provide the flexibility of reprogrammability to the user. The introduction of FPGAs resulted in an improvement of density relative to discrete SSI/MSI components (within around 10x of custom ICs). Another advantage of FPGAs over custom ICs is that, with the help of computer-aided design (CAD) tools, circuits can be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing).
Evolution of FPGAs

In the world of digital electronic systems, there are three basic kinds of devices: memory, microprocessors, and logic. Memory devices store random information such as the contents of a spreadsheet or database.

The size of the AND matrix of a PLA is twice the number of inputs times the number of product-terms (each input appears in both true and complemented form). When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they were expensive to manufacture and offered somewhat poor speed-performance. Both disadvantages were due to the two levels of configurable logic, because programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were developed. PALs provide only a single level of programmability, consisting of a programmable wired AND plane that feeds fixed OR gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential circuits can be realized. These are often referred to as Simple Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of a PLA and a PAL.
e
u d
Inputs st PLA
ity Inputs PAL
.c
w
w
Outputs
w
Outputs
With the advancement of technology, it has become possible to produce devices with higher capacities than SPLDs. As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, but not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows designers to implement either more logic equations or a more complicated design.
Fig. 20.4 Internal structure of a CPLD [figure: four logic blocks connected through a central switch matrix]
Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than four logic blocks. These logic blocks are themselves comprised of macrocells and interconnect wiring, just like an ordinary PLD.

Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than PLDs, their potential uses are more varied. They are still sometimes used for simple applications like address decoding, but more often contain high-performance control logic or complex finite state machines. At the high end (in terms of numbers of gates), there is also a lot of overlap in potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required. Because of its less flexible internal architecture, the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter.
The development of the FPGA was distinct from the SPLD/CPLD evolution just described. This is apparent from the architecture of the FPGA shown in Fig. 20.1. FPGAs offer the highest amount of
logic density, the most features, and the highest performance. The largest FPGA now shipping,
part of the Xilinx Virtex line of devices, provides eight million "system gates" (the relative
density of logic). These advanced devices also offer features such as built-in hardwired
processors (such as the IBM Power PC), substantial amounts of memory, clock management
systems, and support for many of the latest, very fast device-to-device signaling technologies.
FPGAs are used in a wide variety of applications ranging from data processing and storage, to
instrumentation, telecommunications, and digital signal processing. The value of programmable
logic has always been its ability to shorten development cycles for electronic equipment
manufacturers and help them get their product to market faster. As PLD (Programmable Logic
Device) suppliers continue to integrate more functions inside their devices, reduce costs, and
increase the availability of time-saving IP cores, programmable logic is certain to expand its
popularity with digital designers.
Symmetrical arrays

This architecture consists of logic elements (called CLBs) arranged in the rows and columns of a matrix, with interconnect laid out between them, as shown in Fig. 20.2. This symmetrical matrix is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input lookup table and a pair of programmable flip-flops. I/O blocks also control functions such as tri-state control and output transition speed. Interconnects provide the routing paths; direct interconnects between adjacent logic elements have smaller delay compared to general-purpose interconnect.
Row-based architecture

The row-based architecture shown in Fig. 20.5 consists of alternating rows of logic modules and programmable interconnect tracks. Input/output blocks are located in the periphery of the rows. One row may be connected to adjacent rows via vertical interconnect. Logic modules can be implemented in various combinations: combinatorial modules contain only combinational elements, while sequential modules contain combinational elements along with flip-flops. A sequential module can implement complex combinatorial-sequential functions. Routing tracks are divided into smaller segments connected by anti-fuse elements between them.
Hierarchical PLDs

This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects. Each logic block contains a number of logic modules, and each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input/output blocks surround this scheme of logic blocks and interconnects. This type of architecture is shown in Fig. 20.6.
Fig. 20.5 Row-based architecture [figure: rows of logic blocks separated by routing channels, surrounded on all sides by I/O blocks]

Fig. 20.6 Hierarchical PLD [figure: logic blocks containing logic modules, connected by programmable interconnects and surrounded by I/O blocks]
FPGA classification by user-programmable switch technologies

FPGAs are based on an array of logic modules and a supply of uncommitted wires to route signals. In gate arrays these wires are connected by a mask design during manufacture. In FPGAs, however, these wires are connected by the user and therefore must use an electronic device to connect them. Three types of devices have been commonly used to do this: pass transistors controlled by an SRAM cell, a flash or EEPROM cell to pass the signal, or a direct connection using antifuses. Each of these interconnect devices has its own advantages and disadvantages, and this has a major effect on the design, architecture, and performance of the FPGA. The classification of FPGAs by user-programmable switch technology is given in Fig. 20.7.
[Fig. 20.7: classification tree of FPGAs by programmable switch technology — SRAM-based, antifuse-based, and EEPROM/flash-based]
SRAM Based

…a look-up table (LUT). The other disadvantages are that SRAM-based parts need to be reprogrammed each time power is applied, need an external memory to store the program, and require a large area. Fig. 20.8 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor switches and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.
Fig. 20.8 SRAM-controlled programmable switches [figure: SRAM cells driving the gates of pass transistors and the select line of a multiplexer between two logic cells]
Antifuse Based

The antifuse-based cell is the highest-density interconnect, being a true cross point. Thus the designer has a much larger number of interconnects, so logic modules can be smaller and more efficient, and place-and-route software also has a much easier time. These devices, however, are only one-time programmable and therefore have to be thrown out every time a change is made in the design. The antifuse has an inherently low capacitance and resistance, such that the fastest parts are all antifuse-based. The disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses into the IC process, which means the process will always lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built using modified CMOS technology. As an example, Actel's antifuse structure is depicted in Fig. 20.9. The figure shows that an antifuse is positioned between two interconnect wires and physically consists of three sandwiched layers: the top and bottom layers are conductors, and the middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom layers, but when programmed the insulator changes to become a low-resistance link. It uses poly-Si and n+ diffusion as conductors and ONO as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.
[Fig. 20.9: Actel antifuse structure — an antifuse (ONO dielectric) between a poly-Si wire and an n+ diffusion wire, separated by oxide, on a silicon substrate]
EEPROM Based

The EEPROM/flash cell in FPGAs can be used in two ways: as a control device, as in an SRAM cell, or as a directly programmable switch. When used as switches they can be very efficient as interconnect and are reprogrammable at the same time. They are also non-volatile, so they do not require an extra PROM for loading. They do, however, have their detractions: the EEPROM process is complicated and therefore also lags SRAM technology.
Logic Block and Routing Techniques

Crosspoint FPGA: consists of two types of logic blocks. One is the transistor-pair tile, in which transistor pairs run in parallel lines, as shown in Fig. 20.10.

Fig. 20.10 Transistor pair tiles in a cross-point FPGA [figure: rows of transistor pairs]

The second type of logic block is RAM logic, which can be used to implement random access memory.
Plessey FPGA: the basic building block here is a 2-input NAND gate, which is connected to others to implement the desired function.

[Figure: Plessey logic block — an 8-to-2 multiplexer selects from 8 interconnect lines and feeds a NAND gate and latch, clocked by CLK and configured by RAM]

Both Crosspoint and Plessey are fine-grain logic blocks. Fine-grain logic blocks have the advantage of a high percentage usage of logic blocks, but they require a large number of wire segments and programmable switches, which occupy a lot of area.
Actel Logic Block: if the inputs of a multiplexer are connected to constants or to signals, it can be used to implement different logic functions. For example, a 2-input multiplexer with inputs a and b and select line c will implement the function ac' + bc. If b = 0 it will implement ac', and if a = 0 it will implement bc.
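This use of a multiplexer as a universal logic element is easy to verify exhaustively. The sketch below models a 2-input mux and derives AND and OR from it by tying inputs to constants; it is an illustrative model of the idea, not vendor code:

```python
def mux2(a, b, sel):
    """2-input multiplexer: output = a when sel is 0, b when sel is 1
    (i.e. a*sel' + b*sel)."""
    return b if sel else a

def and2(x, y):
    return mux2(0, y, x)   # a=0: output is y only when sel (= x) is 1

def or2(x, y):
    return mux2(y, 1, x)   # b=1: output is 1 when x, otherwise y

# Exhaustive check over all input combinations.
for x in (0, 1):
    for y in (0, 1):
        assert and2(x, y) == (x & y)
        assert or2(x, y) == (x | y)
```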
Fig. 20.12 Actel logic block [figure: a tree of 2-input multiplexers (nodes n1-n4) with data inputs w, x, y, z and constant 0/1 select inputs]

Typically an Actel logic block consists of a number of multiplexers and logic gates.
Xilinx Logic Block

In the Xilinx logic block, a lookup table (LUT) is used to implement any number of different functionalities. The input lines go into the input and enable of the lookup table, and the output of the lookup table gives the result of the logic function that it implements. The lookup table is implemented using SRAM.
Fig. 20.13 A 5-input LUT-based logic block [figure: inputs A-E feed a look-up table; the LUT output and a direct data-in line pass through multiplexers into flip-flops (with set/reset, clock and enable) producing outputs X and Y; a global reset is also provided]
A k-input logic function is implemented using a 2^k x 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports the implementation of very many logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large. Fig. 20.13 shows a 5-input LUT-based implementation of a logic block. LUT-based design provides better logic block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with a tradeoff between performance and logic density.
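The two formulas above are easy to tabulate, which shows how fast the memory cost grows with k:

```python
def lut_cost(k):
    """For a k-input LUT, return (SRAM bits needed, number of distinct
    logic functions realizable) = (2^k, 2^(2^k))."""
    return 1 << k, 1 << (1 << k)

for k in (2, 3, 4, 5):
    bits, funcs = lut_cost(k)
    print(k, bits, funcs)
# A 4-input LUT needs 16 SRAM bits and covers 65,536 functions;
# a 5-input LUT needs 32 bits and covers 2^32 functions.
```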
[Figure: a 4-input LUT feeding a flip-flop, with a multiplexer selecting either the registered or the combinational output]

4-input lookup table
An n-LUT can be seen as a direct implementation of a function truth table: each latch holds the value of the function corresponding to one input combination. For example, the 2-LUT shown below implements the 2-input AND and OR functions.

Example: 2-LUT

Inputs  AND  OR
00       0    0
01       0    1
10       0    1
11       1    1
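The truth-table view of an LUT maps directly to code: the SRAM contents are just the output column, indexed by the input bits. A minimal illustrative model:

```python
def lut(config, inputs):
    """Evaluate an n-input LUT: 'config' is the truth-table output
    column (length 2^n) and 'inputs' is the bit tuple (MSB first)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

AND2 = (0, 0, 0, 1)   # rows 00, 01, 10, 11 of the AND column
OR2 = (0, 1, 1, 1)    # rows 00, 01, 10, 11 of the OR column

print(lut(AND2, (1, 1)))  # 1
print(lut(OR2, (1, 0)))   # 1
```

Reprogramming the FPGA amounts to loading a different `config` vector into the same cell.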
Altera Logic Block

Altera's logic block has evolved from earlier PLDs. It consists of wide fan-in (up to 100-input) AND gates feeding into an OR gate with 3-8 inputs. The advantage of a large fan-in AND-gate-based implementation is that a few logic blocks can implement the entire functionality, thereby reducing the amount of area required by interconnects. The disadvantage, on the other hand, is the low-density usage of logic blocks in a design that requires fewer-input logic. Another disadvantage is the use of pull-up devices (AND gates) that consume static power. To improve power, manufacturers provide low-power logic blocks at the expense of delay: such logic blocks have gates with a high threshold and as a result consume less power, and they can be used in non-critical paths.

Altera and Xilinx use coarse-grain architectures.

Example: Altera's FLEX 8000 series consists of a three-level hierarchy. However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that the FLEX 8000 is
a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a
four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more
than 15,000 for the 8000 series. The overall architecture of FLEX 8000 is illustrated in Fig.
20.14.
Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs [figure: LABs, each containing 8 logic elements and local interconnect, arranged in a grid and linked by FastTrack interconnect, with I/O around the periphery]
The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions. Details of the LE are illustrated in Fig. 20.15.
[Fig. 20.15: Altera FLEX 8000 Logic Element (LE) — data1-data4 feed a look-up table; cascade-in/cascade-out chain circuitry; a D flip-flop with set/clear, clock, and control inputs (cntrl1-cntrl4) produces the LE output]
In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB. Local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. All FastTrack horizontal wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many smaller-length segments, because there are fewer programmable switches in the longer path.
Fig. 20.16 Altera FLEX 8000 Logic Array Block (LAB) [figure: LEs joined by local interconnect; data and control signals arrive from the FastTrack interconnect, LE outputs return to FastTrack, and cascade/carry chains run to the adjacent LAB]
FPGA Design Flow

One of the most important advantages of FPGA-based design is that users can design with CAD tools provided by design automation companies. A generic FPGA design flow includes the following steps.
System Design

At this stage the designer has to decide what portion of the functionality has to be implemented on the FPGA and how to integrate that functionality with the rest of the system.

Design Description

The designer describes the design functionality either by using schematic editors or by using one of the various Hardware Description Languages (HDLs) like Verilog or VHDL.

Synthesis

Once the design has been defined, CAD tools are used to implement the design on a given FPGA. Synthesis includes generic optimizations, slack optimizations, and power optimizations, followed by placement and routing. Implementation includes partition, place, and route. The output of the design implementation phase is a bit-stream file.
Design Verification

The bit-stream file is fed to a simulator, which simulates the design functionality and reports deviations from the desired behavior of the design. Timing tools are used to determine the maximum clock frequency of the design. Finally the design is loaded onto the target FPGA device and testing is done in a real environment.
Hardware design and development

The process of creating digital logic is not unlike the embedded software development process. A description of the hardware's structure and behavior is written in a high-level hardware description language (usually VHDL or Verilog), and that code is then compiled and downloaded prior to execution. Of course, schematic capture is also an option for design entry, but it has become less popular as designs have become more complex and the language-based tools have improved. The overall process of hardware development for programmable logic is shown in Fig. 20.17 and described in the paragraphs that follow.

Perhaps the most striking difference between hardware and software design is the way a developer must think about the problem. Software developers tend to think sequentially, even when they are developing a multithreaded application. The lines of source code that they write are always executed in that order, at least within a given thread. If there is an operating system, it is used to create the appearance of parallelism, but there is still just one execution engine. During design entry, hardware designers must think, and program, in parallel. All of the input signals are processed in parallel as they travel through a set of execution engines (each one a series of macrocells and interconnections) toward their destination output signals. Therefore, the statements of a hardware description language create structures, all of which are "executed" at the very same time.
[Fig. 20.17: the programmable logic development flow — design entry, followed by synthesis under design constraints, verified by simulation]
Things to Ponder

Q.1 Define the following acronyms as they apply to digital logic circuits:
ASIC
PAL
PLA
PLD
CPLD
FPGA

Q.3 Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA, etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are there any applications where hard-wired logic would do a better job than a programmable device?

Q.4 Some programmable logic devices (and PROM memory devices as well) use tiny fuses which are intentionally "blown" in specific patterns to represent the desired program. Programming a device by blowing tiny fuses inside of it carries certain advantages and disadvantages; describe what some of these are.

Q.5 Use one 4 x 8 x 4 PLA to implement the function

F(w, x, y, z) = wx'y'z + wx'yz' + wxy'
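A PLA realizes such a function as an AND plane (the product terms) feeding an OR plane. The function above can be checked in software before committing it to hardware; this is a quick model, not a synthesis tool:

```python
def F(w, x, y, z):
    """F = w*x'*y'*z + w*x'*y*z' + w*x*y' as three product terms
    feeding one OR output (fits a 4-input x 8-term x 4-output PLA)."""
    products = (
        w and (not x) and (not y) and z,   # w x' y' z
        w and (not x) and y and (not z),   # w x' y z'
        w and x and (not y),               # w x y'
    )
    return int(any(products))              # OR plane

print(F(1, 0, 0, 1))  # 1 (first product term fires)
print(F(0, 1, 1, 0))  # 0 (no term fires)
```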
Module 4
Design of Embedded Processors

Lesson 21
Introduction to Hardware Description Languages - I
Instructional Objectives

Describe a digital IC design flow and explain its various abstraction levels.
Explain the need for a hardware description language in the IC design flow.
Model simple hardware devices at various levels of abstraction using Verilog (Gate/Switch/Behavioral).
Write Verilog code meeting the prescribed requirements at a specified level.
1.1 Introduction

1.1.1 What is an HDL and where does Verilog come in?

HDL is an abbreviation of Hardware Description Language. Any digital system can be represented at the REGISTER TRANSFER LEVEL (RTL), and HDLs are used to describe this RTL. Verilog is one such HDL, and it is a general-purpose language that is easy to learn and use. Its syntax is similar to C. The idea is to specify how the data flows between registers and how the design processes the data. To define the RTL of a digital design, hierarchical design concepts play a very significant role. Hierarchical design methodology facilitates the digital design flow with several levels of abstraction, and Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient representation of the RTL description of any digital design.

For example, an HDL might describe the layout of the wires, resistors, and transistors on an Integrated Circuit (IC) chip, i.e., the switch level; or it may describe the design at a more micro level in terms of logical gates and flip-flops in a digital system, i.e., the gate level. Verilog supports all of these levels.
1.1.2 Hierarchy of design methodologies

Bottom-Up Design
The traditional method of electronic design is bottom-up (designing from transistors and moving
to a higher level of gates and, finally, the system). But with the increase in design complexity
traditional bottom-up designs have to give way to new structural, hierarchical design methods.
Top-Down Design
For HDL representation it is convenient and efficient to adapt this design-style. A real top-down
design allows early testing, fabrication technology independence, a structured system design and
offers many other advantages. But it is very difficult to follow a pure top-down design. Due to this fact, most designs are a mix of both methods, implementing some key elements of both design styles.
Modules

A module is the basic building block in Verilog. It can be an element or a collection of lower-level design blocks. Typically, elements are grouped into modules to provide common functionality that is used in many places of the design. A module provides its functionality to higher-level blocks through its port interface, but hides the internal implementation.
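As an illustrative sketch (the module name and ports here are hypothetical), a simple 2-to-1 multiplexer module might look like this; only the port interface is visible to other modules:

```verilog
// Hypothetical 2-to-1 multiplexer: the ports (out, a, b, sel) form the
// public interface, while the implementation stays hidden inside.
module mux2to1 (output out, input a, b, sel);
  assign out = sel ? b : a; // select b when sel is 1, else a
endmodule
```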
Switch Level
This is the lowest level of abstraction. A module can be implemented in terms of switches,
storage nodes and interconnection between them.
However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. RTL is frequently used for a Verilog description that is a combination of the behavioral and dataflow styles while still being acceptable for synthesis.
Instances
A module provides a template from where one can create objects. When a module is invoked
Verilog creates a unique object from the template, each having its own name, variables,
parameters and I/O interfaces. These are known as instances.
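As a sketch (module and instance names are hypothetical), one module template can be instantiated several times, each instance getting its own name and connections:

```verilog
// Hypothetical two-input AND wrapper, instantiated twice with distinct names.
module and2 (output y, input a, b);
  assign y = a & b;
endmodule

module top (output p, q, input w, x, y, z);
  and2 u1 (p, w, x); // instance u1 of module and2
  and2 u2 (q, y, z); // instance u2 of module and2
endmodule
```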
1.1.5 The Design Flow

This block diagram describes a typical design flow for the description of the digital design for both ASIC and FPGA realizations.
RTL Coding

In RTL coding, the micro design is converted into Verilog/VHDL code using synthesizable constructs of the language. Normally the vim editor is used; conTEXT, Nedit and Emacs are other choices.
p o
Simulation
g s
lo
. b
Simulation is the process of verifying the functional characteristics of models at any level of
u
meets the functional requirements of the specification,
psee if all the RTL blocks are functionally
abstraction. We use simulators to simulate the the Hardware models. To test if the RTL code
r
correct. To achieve this we need to write testbench, owhich generates clk, reset and required test
vectors. A sample testbench for a counter is asgshown below. Normally, we spend 60-70% of
time in verification of design. ts
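The sample counter testbench referred to above did not survive extraction; a minimal sketch (the counter's port names are assumptions) might look like:

```verilog
// Minimal testbench sketch for a 4-bit counter DUT (port names assumed).
module counter_tb;
  reg clk, reset, enable;
  wire [3:0] count;

  counter dut (.clk(clk), .reset(reset), .enable(enable), .count(count));

  always #5 clk = ~clk;   // 10-time-unit clock period

  initial begin
    clk = 0; reset = 1; enable = 0;
    #20 reset = 0;        // release reset after two clock cycles
    #10 enable = 1;       // start counting
    #100 $finish;
  end
endmodule
```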
We use the waveform output from the simulator to see if the DUT (Device Under Test) is functionally correct. Most of the simulators come with a waveform viewer. As the design becomes complex, we write a self-checking testbench, where the testbench applies the test vectors and then compares the output of the DUT with the expected value.
There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place and Route). Here we include the gate delays and wire delays and see if the DUT works at the rated clock speed. This is also called SDF simulation or gate-level simulation.
Synthesis

Synthesis is the process in which a synthesis tool like Design Compiler takes in the RTL (in Verilog or VHDL), the target technology, and constraints as input, and maps the RTL to target technology primitives. After mapping the RTL to gates, the synthesis tool also does a minimal amount of timing analysis to see if the mapped design meets the timing requirements. (An important thing to note is that synthesis tools are not aware of wire delays; they know only gate delays.) After synthesis there are a couple of things that are normally done before passing the netlist to the backend (Place and Route):

Verification: Check if the RTL to gate mapping is correct.
Scan insertion: Insert the scan chain in the case of an ASIC.
The netlist is then passed to the foundry for fabricating the ASIC. Normally the P&R tools are used to output the SDF file, which is back-annotated along with the gate-level netlist from P&R into a static timing analysis tool like PrimeTime to do timing analysis.
1.2.3 Apart from these there are vector, integer, real and time register data types.
Some examples are as follows:

Integer
integer counter; // general purpose variable used as a counter

initial
  counter = -1; // a negative one is stored in the counter
Real

real delta; // define a real variable called delta

initial
begin
  delta = 4e10; // delta is assigned in scientific notation
  delta = 2.13; // delta is assigned the value 2.13
end

integer i; // define an integer i

initial
  i = delta; // i gets the value 2 (rounded value of 2.13)
Time
time save_sim_time; // define a time variable save_sim_time
initial
save_sim_time = $time; // save the current simulation time.
n.b. $time is invoked to get the current simulation time
Arrays
integer count [0:7]; // an array of 8 count variables
reg [4:0] port_id[0:7]; // array of 8 port_ids, each 5 bits wide
integer matrix[4:0] [0:255] ; // two dimensional array of integers.
Memories
Memories are modeled simply as a one-dimensional array of registers. Each element of the array is known as a word and is addressed by a single array index.

reg membit [0:1023]; // memory membit with 1K 1-bit words
reg [7:0] membyte [0:1023]; // memory membyte with 1K 8-bit words
membyte[511] // fetches the 1-byte word whose address is 511
Strings

A string is a sequence of characters enclosed by double quotes and all contained on a single line. Strings used as operands in expressions and assignments are treated as a sequence of eight-bit ASCII values, with one eight-bit ASCII value representing one character. To declare a variable to store a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required to hold a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators.

When a variable is larger than required to hold a value being assigned, Verilog pads the contents on the left with zeros after the assignment. This is consistent with the padding that occurs during assignment of non-string values. Certain characters can be used in strings only when preceded by an introductory character called an escape character. The following table lists these characters in the right-hand column, with the escape sequence that represents the character in the left-hand column.
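As a sketch (the variable name is hypothetical), storing the 13-character string below needs a register of 8 x 13 bits:

```verilog
// A register wide enough for 13 eight-bit ASCII characters.
module string_demo;
  reg [8*13:1] string_value;
  initial begin
    string_value = "Hello Verilog"; // 13 characters, stored as 8-bit ASCII
    $display("%s", string_value);
  end
endmodule
```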
Modules

Modules are the building blocks of Verilog designs.
You create a design hierarchy by instantiating modules in other modules.
An instance of a module can be called in another, higher-level module.
Ports

Ports allow communication between a module and its environment.
Ports can be associated by order or by name.
All but the top-level modules in a hierarchy have ports.
You declare ports to be input, output or inout. The port declaration syntax is:

input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;
Schematic
Width matching: it is legal to connect internal and external ports of different sizes; but beware, synthesis tools could report problems.
Unconnected ports: unconnected ports are allowed, by using a ",".
Net data types are used to connect structure. A net data type is required if a signal can be driven by a structural connection.
Example Implicit

dff u0 (q, , clk, d, rst, pre); // here the second port (q_bar) is not connected

Example Explicit

dff u0 (.q (q_out),
        .q_bar (),      // here the q_bar port is not connected
        .clk (clk_in),
        .d (d_in),
        .rst (rst_in),
        .pre (pre_in));
1.3 Gate Level Modeling
At this level of abstraction the system modeling is done at the gate level, i.e., the properties of the gates etc. to be used by the behavioral description of the system are defined. These definitions are known as primitives. Verilog has built-in primitives for gates, transmission gates, switches, buffers etc. These primitives are instantiated like modules, except that they are predefined in Verilog and do not need a module definition. Two basic families of gates are the and/or gates and the buf/not gates.
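As a sketch (module, instance and net names are hypothetical), built-in and/or primitives are instantiated like modules, with the output terminal listed first:

```verilog
// Gate-level AND-OR structure built from Verilog's built-in primitives.
module aoi_sketch (output out, input a, b, c);
  wire w1;             // internal net between the gates
  and a1 (w1, a, b);   // output first, then the inputs
  or  o1 (out, w1, c);
endmodule
```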
Buf/Not Gates: these gates have one scalar input and one or more scalar outputs.

// basic gate instantiations for bufif
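The instantiations that followed this comment were lost in extraction; a sketch of what bufif instantiations look like (instance and net names are assumptions):

```verilog
// bufif gates are tri-state buffers with a control terminal.
module bufif_sketch (output out1, out0, input in, ctrl);
  bufif1 b1 (out1, in, ctrl); // drives in onto out1 when ctrl is 1, else z
  bufif0 b0 (out0, in, ctrl); // drives in onto out0 when ctrl is 0, else z
endmodule
```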
Rise Delay

The rise delay is associated with a gate output transition to 1 from another value (0, x, z).

Fall Delay

The fall delay is associated with a gate output transition to 0 from another value (1, x, z).

Turn-off Delay

The turn-off delay is associated with a gate output transition to z from another value (0, 1, x).
Min Value

The min value is the minimum delay value that the gate is expected to have.

Typ Value

The typ value is the typical delay value that the gate is expected to have.

Max Value

The max value is the maximum delay value that the gate is expected to have.
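These values can be sketched in a gate instantiation as follows (module and instance names are hypothetical); each delay gets a min:typ:max triplet:

```verilog
// and gate with (min:typ:max) values for the rise and fall delays.
module delay_sketch (output out, input a, b);
  and #(1:2:3, 2:3:4) a1 (out, a, b); // rise = 1:2:3, fall = 2:3:4
endmodule
```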
initial : initial blocks execute only once at time zero (start execution at time zero).
always : always blocks loop to execute over and over again; in other words, as the name suggests, an always block executes always.
Example initial
module initial_example();
reg clk,reset,enable,data;
initial begin
clk = 0;
reset = 0;
enable = 0;
data = 0;
end
endmodule
In the above example, the initial block starts execution at time 0 and, without waiting, executes all the statements between begin and end. An always block, by contrast, waits for its triggering event (such as the positive edge of a clock), as the next example shows.
Example always

module always_example();
reg clk,reset,enable,q_in,data;
always @ (posedge clk)
  if (reset) begin
    data <= 0;
  end else if (enable) begin
    data <= q_in;
  end
endmodule
In an always block, when the trigger event occurs, the code between begin and end is executed; then the always block once again waits for the next posedge of the clock. This process of waiting and executing on the event is repeated till the simulation stops.
1.4.2 Procedural Assignment Statements

Procedural assignment statements assign values to reg, integer, real, or time variables and cannot assign values to nets (wire data types).
You can assign to a register (reg data type) the value of a net (wire), a constant, another register, or a specific value.
Example - "begin-end"
module initial_begin_end();
reg clk,reset,enable,data;
initial begin
#1 clk = 0;
#10 reset = 0;
#5 enable = 0;
#3 data = 0;
end
endmodule
begin-end: clk gets 0 after 1 time unit, reset after 11 time units, enable after 16, and data after 19 time units. All the statements are executed sequentially, so the delays accumulate.
Example - "fork-join"

module initial_fork_join();
reg clk,reset,enable,data;
initial fork
  #1 clk = 0;
  #10 reset = 0;
  #5 enable = 0;
  #3 data = 0;
join
endmodule

fork-join: all four statements start in parallel at time 0, so clk gets 0 at time 1, data at time 3, enable at time 5 and reset at time 10.
1.4.4 Sequential Statement Groups

The begin - end keywords:

Group several statements together.
Cause the statements to be evaluated sequentially (one at a time).
  o Any timing within the sequential group is relative to the previous statement.
  o Delays in the sequence accumulate (each delay is added to the previous delay).
  o The block finishes after the last statement in the block.
1.4.5 Parallel Statement Groups

The fork - join keywords:

Group several statements together.
Cause the statements to be evaluated in parallel (all at the same time).
  o Timing within the parallel group is absolute to the beginning of the group.
  o The block finishes after the last statement completes (the statement with the highest delay; it can be the first statement in the block).
Example Parallel

module parallel();
reg a;
initial
fork
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
  #14 $finish;
join
endmodule
Example

a <= b;
c = #12 0;
c = #13 1;
end
initial begin
d <= #10 0;
d <= #11 1;
d <= #12 0;
d <= #13 1;
end
initial begin
$monitor( " TIME = %t A = %b B = %b C = %b D = %b" ,$time, a, b, c, d );
#50 $finish(1);
end
endmodule
Example- if-else
module if_else();
reg dff;
wire clk,din,reset;
always @ (posedge clk)
if (reset) begin
dff <= 0;
end else begin
dff <= din;
end
endmodule
Example - nested if-else-if

module nested_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
  // If reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  // If the counter is enabled and up count mode is selected
  end else if (enable == 1'b1 && up_en == 1'b1) begin
    counter <= counter + 1'b1;
  // If the counter is enabled and down count mode is selected
  end else if (enable == 1'b1 && down_en == 1'b1) begin
    counter <= counter - 1'b1;
  // If counting is disabled
  end else begin
    counter <= counter; // Redundant code
  end
endmodule
Parallel if-else

In the above example, the condition (enable == 1'b1 && up_en == 1'b1) is given the highest priority and the condition (enable == 1'b1 && down_en == 1'b1) the lowest priority. We normally don't include reset checking in the priority, as reset does not fall in the combinational logic input to the flip-flop, as shown in the figure below.
So when we need priority logic, we use nested if-else statements. On the other hand, if we don't want to implement priority logic, knowing that only one input is active at a time (i.e., all inputs are mutually exclusive), then we can write the code as shown below.
It is a known fact that a priority implementation takes more logic to implement than a parallel implementation. So if you know the inputs are mutually exclusive, you can code the logic with parallel ifs.
module parallel_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
  // If reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  end else begin
    // If the counter is enabled and up count mode is selected
    if (enable == 1'b1 && up_en == 1'b1) begin
      counter <= counter + 1'b1;
    end
    // If the counter is enabled and down count mode is selected
    if (enable == 1'b1 && down_en == 1'b1) begin
      counter <= counter - 1'b1;
    end
  end
endmodule
1.4.9 The Case Statement

The case statement compares an expression with a series of cases and executes the statement or statement group associated with the first matching case. The case statement supports single or multiple statements.
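A sketch of the case statement in use (module and signal names are hypothetical), here selecting one of four inputs:

```verilog
// 4-to-1 multiplexer written with a case statement.
module mux4_case (output reg out, input a, b, c, d, input [1:0] sel);
  always @ (a or b or c or d or sel)
    case (sel)
      2'b00: out = a;
      2'b01: out = b;
      2'b10: out = c;
      2'b11: out = d;
      default: out = 1'bx; // unknown select value
    endcase
endmodule
```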
The for loop is the same as the for loop used in any other programming language:

Executes an < initial assignment > once at the start of the loop.
Executes the loop as long as an < expression > evaluates as true.
Executes a < step assignment > at the end of each pass through the loop.

Syntax : for (< initial assignment >; < expression >; < step assignment >) < statement >

Note : Verilog does not have the ++ operator as in the C language.
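A sketch of the for loop in use (module and variable names are hypothetical), initializing a small memory:

```verilog
// Initialize an 8-word memory with a for loop.
module for_sketch;
  reg [7:0] mem [0:7];
  integer i;
  initial
    for (i = 0; i < 8; i = i + 1) // note: i = i + 1, not i++
      mem[i] = 8'h00;
endmodule
```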
1.5 Switch Level Modeling

1.5.1 Verilog provides the ability to design at the MOS-transistor level. However, with the increase in the complexity of circuits, design at this level is growing tough. Verilog provides only digital design capability, with drive strengths associated with the switches; analog capability is still not in the picture. As a matter of fact, the transistors are only used as switches.
MOS switches

// MOS switch keywords
nmos
pmos

The keyword nmos is used to model an NMOS transistor, and pmos is used for PMOS transistors.
CMOS switches
Instantiation of a CMOS switch.
The ncontrol and pcontrol signals are normally complements of each other
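The instantiation referred to above is missing from the source; a sketch (instance and net names are assumptions):

```verilog
// CMOS switch: conducts when ncontrol = 1 and pcontrol = 0.
module cmos_sketch (output out, input data, ncontrol, pcontrol);
  cmos c1 (out, data, ncontrol, pcontrol); // instance name c1 is optional
endmodule
```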
Bidirectional switches

These switches allow signal flow in both directions and are defined by the keywords tran, tranif0 and tranif1.

Instantiation

tran t1 (inout1, inout2); // instance name t1 is optional
tranif0 (inout1, inout2, control); // instance name is not specified
tranif1 (inout1, inout2, control); // instance name is not specified
1.5.2 Delay specification of switches

pmos, nmos, rpmos, rnmos:

Zero (no delay): pmos p1 (out, data, control);
One (same delay for all transitions): pmos #(1) p1 (out, data, control);
Two (rise, fall): nmos #(1,2) n1 (out, data, control);
Three (rise, fall, turnoff): nmos #(1,3,2) n1 (out, data, control);
// internal wires
wire c;

// set up power and ground lines
supply1 pwr; // pwr is connected to Vdd
supply0 gnd; // gnd is connected to Vss (ground)
module stimulus;
reg A, B;
wire OUT;
//Apply stimulus
initial
begin
  //test all possible combinations
     A = 1'b0; B = 1'b0;
  #5 A = 1'b0; B = 1'b1;
  #5 A = 1'b1; B = 1'b0;
  #5 A = 1'b1; B = 1'b1;
end
//check results
initial
  $monitor($time, " OUT = %b A = %b B = %b", OUT, A, B);
endmodule
Write the Verilog description for the RS latch, including delays of 1 unit when instantiating the nor gates. Write the stimulus module for the RS latch using the following table and verify the outputs.
iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below.

The delay specifications for gates b1 and b2 are as follows:

          Min  Typ  Max
Rise       1    2    3
Fall       3    4    5
Turnoff    5    6    7
1.6.2 Behavioral modelling

i) Using a while loop, design a clk generator whose initial value is 0. The time period of the clk is 10.
ii) Using a forever statement, design a clk with time period = 10 and duty cycle = 40%. The initial value of clk is 0.
iii) Using the repeat loop, delay the statement a = a + 1 by 20 positive edges of clk.
iv) Design a negative edge triggered D-FF with synchronous clear, active high (the D-FF clears only at the negative edge of clk when clear is high). Use behavioral statements only. (Hint: the output q of the D-FF must be declared as reg.) Design a clock with a period of 10 units and test the D-FF.
v) Design a 4-to-1 multiplexer using if and else statements.
vi) Design an 8-bit counter using a forever loop, a named block, and disabling of the named block. The counter starts counting at count = 5 and finishes at count = 67. The count is incremented at the positive edge of the clock. The clock has a time period of 10. The counter runs through the loop only once and is then disabled. (Hint: use the disable statement.)
Lesson 22
Introduction to Hardware Description Languages - II
Instructional Objectives
At the end of the lesson the student should be able to
Call a task and a function in a Verilog code and distinguish between them
Plan and write test benches to a Verilog code such that it can be simulated to check the
desired results and also test the source code
Explain what User Defined Primitives are, classify them and use them in code
2.1.1 Task

Tasks are used in all programming languages, generally known as procedures or subroutines. Many lines of code are enclosed within task ... endtask brackets. Data is passed to the task, the processing is done, and the result is returned to the main program. Tasks have to be specifically called, with data in and out, rather than just wired in to the general netlist. Included in the main body of code, they can be called many times, reducing code repetition.
Tasks are defined in the module in which they are used. It is possible to define a task in a separate file and use the compile directive `include to include the task in the file which instantiates it.
Tasks can include timing delays, like posedge, negedge, # delay and wait.
Tasks can have any number of inputs and outputs.
The variables declared within the task are local to that task. The order of declaration within the task defines how the variables passed to the task by the caller are used.
A task can take, drive and source global variables when no local variables are used. When local variables are used, it assigns the output only at the end of task execution.
One task can call another task or function.
Tasks can be used for modeling both combinational and sequential logic.
A task must be specifically called with a statement; it cannot be used within an expression as a function can.
Syntax

A task begins with the keyword task and ends with the keyword endtask.
Inputs and outputs are declared after the keyword task.
Local variables are declared after the input and output declarations.

module simple_task();
task convert;
input [7:0] temp_in;
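The example above is truncated in the source; completed along these lines (the temperature-conversion body is an assumption), it would read:

```verilog
// A task that converts a Celsius temperature to Fahrenheit
// (the conversion body is an assumed example, not from the source).
module simple_task();
  task convert;
    input  [7:0] temp_in;
    output [7:0] temp_out;
    begin
      temp_out = (9 * temp_in) / 5 + 32;
    end
  endtask
endmodule
```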
Example

// Module that contains an automatic (re-entrant) task.
// There are two clocks; clk2 runs at twice the frequency of clk and is synchronous with it.
module top;
reg [15:0] cd_xor, ef_xor; // variables in module top
reg [15:0] c, d, e, f; // variables in module top
task automatic bitwise_xor;
output [15:0] ab_xor; // output from the task
input [15:0] a, b; // inputs to the task
begin
  #10 ab_xor = a ^ b; // delay of 10 time units (the original delay parameter is assumed)
end
endtask
// These two always blocks will call the bitwise_xor task concurrently at each
// positive edge of clk; the concurrent calls work correctly since the task is re-entrant.

2.1.2 Function

Functions are defined in the module in which they are used. It is possible to define a function in a separate file and use the compile directive `include to include the function in the file which instantiates it.
A function cannot include timing delays, like posedge, negedge or # delay. This means that a function should execute in "zero" time delay.
A function can have any number of inputs but only one output.
The variables declared within the function are local to that function. The order of declaration within the function defines how the variables are passed to it by the caller.
A function can take, drive and source global variables when no local variables are used. When local variables are used, it basically assigns the output only at the end of function execution.
Functions can be used for modeling combinational logic.
A function can call other functions, but cannot call a task.
Syntax
A function begins with the keyword function and ends with the keyword endfunction
Constant function

A constant function is a regular Verilog function, used to reference complex values; it can be used instead of constants.
Signed function
These functions allow the use of signed operation on function return values.
module top;
// signed function declaration
// returns a 64-bit signed value
function signed [63:0] compute_signed (input [63:0] vector);
--
--
endfunction

// call to the signed function from a higher module
if (compute_signed(vector) < -3)
begin
--
end
--
endmodule
2.1.3 System tasks and functions

Introduction

There are tasks and functions that are used to generate inputs and check the outputs during simulation. Their names begin with a dollar sign ($). Synthesis tools parse and ignore system tasks and functions, and hence they can be included even in synthesizable models.
$display, $strobe, $monitor

These commands have the same syntax and display text on the screen during simulation. They are much less convenient than waveform display tools like GTKWave or Undertow. $display and $strobe display once every time they are executed, whereas $monitor displays every time one of its parameters changes. The difference between $display and $strobe is that $strobe displays the parameters at the very end of the current simulation time unit, rather than exactly where the change took place. The format string is like that in C/C++ and may contain format characters: %d (decimal), %h (hexadecimal), %b (binary), %c (character), %s (string), %t (time) and %m (hierarchy level). A width can be specified, as in %5d or %5b. The letters b, o or h can be appended to the task names to change the default format to binary, octal or hexadecimal.
Syntax
$display ("format_string", par_1, par_2, ... );
$random

$random generates a random integer every time it is called. If the sequence is to be repeatable, the first time one invokes $random, give it a numerical argument (a seed). Otherwise the seed is derived from the computer clock.
$dumpfile, $dumpvars, $dumpon, $dumpoff, $dumpall

These can dump variable changes to a simulation viewer like Debussy. The dump files are capable of dumping all the variables in a simulation. This is convenient for debugging, but can be very slow.
Syntax

$dumpfile("filename.dmp")
$dumpvars dumps all variables in the design.
$dumpvars(1, top) dumps all the variables in module top, but not in the modules instantiated by top.
$dumpvars(2, top) dumps all the variables in module top and 1 level below.
$dumpvars(n, top) dumps all the variables in module top and n-1 levels below.
$dumpvars(0, top) dumps all the variables in module top and all levels below.
$dumpon initiates the dump.
$dumpoff stops dumping.
$fopen opens an output file and gives the open file a handle for use by the other commands.
$fclose closes the file and lets other programs access it.
$fdisplay and $fwrite write formatted data to a file whenever they are executed. They are the same, except that $fdisplay inserts a new line after every execution and $fwrite does not.
$fstrobe also writes to a file when executed, but it waits until all other operations in the time step are complete before writing. Thus

initial begin
  #1 a = 1; b = 0;
  $fstrobe(handle1, a, b);
  b = 1;
end

will write 1 1 for a and b. $fmonitor writes to a file whenever any one of its arguments changes.

Syntax

handle1 = $fopen("filenam1.suffix")
handle2 = $fopen("filenam2.suffix")
$fstrobe(handle1, format, variable list) // strobe data into filenam1.suffix
$fdisplay(handle2, format, variable list) // write data into filenam2.suffix
$fwrite(handle2, format, variable list) // write data into filenam2.suffix, all on one line; put \n in the format string where a new line is desired
2.2 Writing Testbenches

2.2.1 Testbenches

Testbenches are codes written in HDL to test the design blocks. A testbench is also known as a stimulus, because the coding is such that a stimulus is applied to the designed block and its functionality is tested by checking the results. For writing a testbench it is important to have the design specifications of the "design under test" (DUT). The specifications need to be understood clearly and a test plan made accordingly. The test plan, basically, documents the testbench architecture and the test scenarios (test cases) in detail.
Example Counter

Consider a simple 4-bit up counter, which increments its count whenever enable is high and resets to zero when reset is asserted high. Reset is synchronous with the clock.
The first way is to simply instantiate the design block (DUT) and write the code such that it directly drives the signals in the design block. In this case the stimulus block itself is the top-level block.
In the second style, a dummy module acts as the top-level module and both the design (DUT) and the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to the DUT are defined as reg and the outputs from the DUT are defined as wire. An important point is that there is no port list for the test bench.
An example of the stimulus block is given below.
Note that the initial block below is used to set the various inputs of the DUT to a predefined
logic state.
Another elaborated instance of the testbench is shown below. In this instance the usage of system
tasks has been explored.
module counter_tb;
reg clk, reset, enable;
wire [3:0] count;
counter U0 (
.clk (clk),
.reset (reset),
.enable (enable),
.count (count)
);
initial begin
clk = 0;
reset = 0;
enable = 0;
end
always
#5 clk = !clk;
initial begin
$dumpfile ( "counter.vcd" );
$dumpvars;
end
initial begin
$display( "\t\ttime,\tclk,\treset,\tenable,\tcount" );
$monitor( "%d,\t%b,\t%b,\t%b,\t%d" ,$time, clk,reset,enable,count);
end
initial
#100 $finish;
//Rest of testbench code after this line
endmodule
$dumpfile is used for specifying the file that the simulator will use to store the waveform, which can be viewed later with a waveform viewer. (Please refer to the tools section for freeware versions of viewers.) $dumpvars basically instructs the Verilog compiler to start dumping all the signals to "counter.vcd".
$display is used for printing text or variables to stdout (the screen); \t is for inserting a tab. The syntax is the same as printf. $monitor is a bit different: it keeps track of changes to the variables in its list (clk, reset, enable, count). Whenever any one of them changes, it prints their values, in the respective radix specified.
$finish is used for terminating the simulation after #100 time units (note: all the initial and always blocks start execution at time 0).
Adding the Reset Logic

Once we have the basic logic that lets us see what our testbench is doing, we can next add the reset logic. If we look at the test cases, we see that we had added a constraint that it should be possible to activate reset anytime during simulation. To achieve this we have many approaches, but the following one works quite well. There is something called 'events' in Verilog: events can be triggered, and also monitored to see if an event has occurred.
Let's code our reset logic in such a way that it waits for the trigger event "reset_trigger" to happen. When this event happens, the reset logic asserts reset at the negative edge of the clock and de-asserts it on the next negative edge, as shown in the code below. Also, after de-asserting the reset, the reset logic triggers another event called "reset_done_trigger". This trigger event can then be used somewhere else in the test bench to sync up.
event reset_trigger;
event reset_done_trigger;
initial begin
  forever begin
    @ (reset_trigger);
    @ (negedge clk);
    reset = 1;
    @ (negedge clk);
    reset = 0;
    -> reset_done_trigger;
  end
end
Moving forward, let's add the logic to generate the test cases. We have three test cases, as in the first part of this tutorial. Let's list them again.

Reset Test: Start with reset de-asserted, then assert reset for a few clock ticks and de-assert it. See if the counter sets its output to zero.
Enable Test: Assert/de-assert enable after reset is applied.
Random Test: Randomly assert/de-assert enable and reset.
Adding Compare Logic

To make any testbench self-checking/automated, a model that mimics the DUT in functionality needs to be designed. For the counter defined previously, the model looks similar to:

reg [3:0] count_compare;
always @ (posedge clk)
  if (reset == 1'b1)
    count_compare <= 0;
  else if (enable == 1'b1)
    count_compare <= count_compare + 1;
Once the logic to mimic the DUT functionality has been defined, the next step is to add the checker logic. The checker logic at any given point keeps checking the expected value against the actual value. Whenever there is an error, it prints out the expected and the actual values, and also terminates the simulation by triggering the event terminate_sim. This can be appended to the code above as follows:
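A sketch of such checker logic (the event name terminate_sim comes from the text above; the message format is an assumption):

```verilog
// Compare the DUT output against the reference model on every clock edge.
event terminate_sim;
always @ (posedge clk)
  if (count != count_compare) begin
    $display("DUT ERROR at time %t: expected %d, got %d",
             $time, count_compare, count);
    -> terminate_sim; // signal the testbench to end the simulation
  end
```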
Syntax

A UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be defined outside the main module definition.
This code shows how the input/output ports and the primitive are declared.

primitive udp_syntax (
a, // Port a
b, // Port b
c, // Port c
d // Port d
);
output a;
input b, c, d;
// UDP function code here
endprimitive
Note:
A UDP can contain only one output and up to 10 inputs.
The output port should be the first port, followed by one or more input ports.
All UDP ports are scalar, i.e. vector ports are not allowed.
UDPs cannot have bidirectional ports.

Body
The functionality of a primitive (both combinational and sequential) is described inside a table, which ends with the reserved word endtable (as shown in the code below). For sequential UDPs, an initial statement can be used to assign an initial value to the output.
Note: A UDP cannot use 'z' in the input table; 'x' is used instead.
In combinational UDPs, the output is determined as a function of the current inputs. Whenever an input changes value, the UDP is evaluated and one of the state table rows is matched. The output state is set to the value indicated by that row. Let us consider the previously mentioned UDP.
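The file udp_body.v itself is not reproduced in this excerpt. As an illustration only, a two-input combinational UDP matching the port list used by the testbench below (output a, inputs b and c) could be written as an AND primitive; the behavior chosen here is an assumption, not the original file:

```verilog
// Hypothetical udp_body.v: a 2-input AND user-defined primitive.
primitive udp_body (a, b, c);
output a;
input b, c;
table
// b c : a
   0 ? : 0 ;  // either input 0 forces the output to 0
   ? 0 : 0 ;
   1 1 : 1 ;  // both inputs 1 give 1; unmatched rows yield x
endtable
endprimitive
```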
TestBench to Check the above UDP
`include "udp_body.v"
module udp_body_tb();
reg b, c;
wire a;

udp_body udp (a, b, c);

initial begin
  $monitor(" B = %b C = %b A = %b", b, c, a);
  b = 0;
  c = 0;
  #1 b = 1;
  #1 c = 1;
  #1 b = 1'bx;
  #1 c = 0;
  #1 b = 1;
  #1 c = 1'bx;
  #1 b = 0;
  #10 $finish;
end
endmodule
Sequential UDPs
Sequential UDPs differ from combinational UDPs in the following manner: the output is declared as a reg to create internal storage, the current state appears as an extra column in the state table, and a single initial statement may assign the output its initial value. The fragment below (the declarations and table of a level-sensitive latch with clear) illustrates this:
// declarations
output q;
reg q; // q declared as reg to create internal storage
input d, clock, clear;

// sequential UDP initialization
// only one initial statement allowed
initial
  q = 0; // initialize output to value 0

// state table
table
// d  clock  clear : q : q+ ;   (q+ is the new output value)
   ?    ?      1   : ? : 0 ; // clear condition
   1    1      0   : ? : 1 ; // latch q = data = 1
   0    1      0   : ? : 0 ; // latch q = data = 0
   ?    0      0   : ? : - ; // retain original state if clock = 0
endtable
endprimitive
Edge-sensitive UDPs
// state table
table
// d  clock  clear : q : q+ ;
   ?    ?      1   : ? : 0 ; // output = 0 if clear = 1
endtable
endprimitive
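Only the clear row of the edge-sensitive table survives in this excerpt. For illustration, a complete negative-edge-triggered D flip-flop with active-high clear, showing the (10)/(0?) edge specifiers that distinguish edge-sensitive UDPs, might look like the following sketch:

```verilog
// Sketch: negative-edge-triggered D flip-flop UDP with active-high clear.
primitive edge_dff (q, d, clock, clear);
output q;
reg q;
input d, clock, clear;
initial q = 0;
table
//  d   clock  clear : q : q+ ;
    ?     ?      1   : ? : 0 ; // clear overrides everything
    1   (10)     0   : ? : 1 ; // latch d = 1 on falling clock edge
    0   (10)     0   : ? : 0 ; // latch d = 0 on falling clock edge
    ?   (0?)     0   : ? : - ; // ignore rising/other clock edges
  (??)    ?      0   : ? : - ; // ignore data changes while clock is steady
endtable
endprimitive
```

Only one edge specifier is allowed per table row, which is why the data-change and clock-edge cases appear as separate rows.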
Some Exercises
ii) Assume that a six-delay specification is to be given for all the path delays, and that all path delays are equal. In the specify block, define the parameters t_01=4, t_10=5, t_0z=7, t_z1=2, t_z0=8. Using the previous DFF, write the six delay specifications for all the paths.
3. UDP
i. Define a positive-edge-triggered D flip-flop with clear as a UDP. The clear signal is active low.
ii. Define a level-sensitive latch with a preset signal. Inputs are d, clock, and preset; the output is q. If clock = 0, then q = d. If clock = 1 or x, then q is unchanged. If preset = 1, then q = 1. If preset = 0, then q is decided by the clock and d signals. If preset = x, then q = x.
iii. Define a negative-edge-triggered JK FF, jk_ff, with asynchronous preset and clear as a UDP. q = 1 when preset = 1, and q = 0 when clear = 1.
The table for the JK FF is as follows:

J  K  qn+1
0  0  qn
0  1  0
1  0  1
1  1  q̄n (toggle)
Module
4
Design of Embedded
Processors
Version 2 EE IIT, Kharagpur 1
Downloaded from www.citystudentsgroup.blogspot.com
Lesson
23
Introduction to Hardware
Description Languages-III
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
3.1.1 Verilog
PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog code. The function invoked in Verilog code is called a system call. Examples of built-in system calls are $display, $stop and $random. PLI allows the user to create custom system calls, something that Verilog syntax alone does not allow. Some applications of PLI are:
Power analysis.
Code coverage tools.
Modifying the Verilog simulation data structure (e.g. for more accurate delays).
Custom output displays.
Co-simulation.
Design debug utilities.
Simulation analysis.
C-model interface to accelerate simulation.
Testbench modeling.
To achieve the above applications of PLI, the C code must have access to the internal data structure of the Verilog simulator. To facilitate this, Verilog PLI provides what are called acc routines, or access routines.
How it Works
Write the functions in C/C++ code.
Compile them to generate a shared library (*.dll on Windows, *.so on UNIX). Simulators like VCS also allow static linking.
Use these functions in the Verilog code (mostly in the Verilog testbench).
Depending on the simulator, pass the C/C++ function details to the simulator during compilation of the Verilog code (this is called linking; refer to the simulator user guide to see how it is done).
Once linked, just run the simulation like any other Verilog simulation.
During execution of the Verilog code, whenever the simulator encounters a user-defined system task (the ones starting with $), execution control is passed to the PLI routine (the C/C++ function).

Example - Hello World
Define a function hello( ) which, when called, will print "Hello World". This example does not use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the simulator manuals must be referred to; each simulator implements its own strategy for linking with the C/C++ functions.
C Code
#include <stdio.h>
void hello () {
    printf("\nHello World\n");
}
Verilog Code
module hello_pli ();
initial begin
$hello;
#10 $finish;
end
endmodule
Write the DUT reference model and checker in C and link them to the Verilog testbench. This requires:
A means of calling the C model whenever there is a change in the input signals (which could be of wire or reg type).
A means of getting the value of the changed signals, or of any other signals in the Verilog code, from inside the C code.
A means of driving a value onto any signal inside the Verilog code from the C code.
Verilog PLI provides a set of routines (functions) that satisfy the above requirements.
3.1.3 PLI Application Specification

This can be well understood in the context of the above counter logic. The objective is to design the PLI function $counter_monitor and check the response of the designed counter using it. This problem can be addressed in the following steps:
Implement the counter logic in C.
Use the acc_vcl_add routine, which monitors a list of signals and, whenever any of the monitored signals changes, calls a user-defined function (called the consumer C routine). The vcl routine has four arguments.
C Code Basics
The desired C function is counter_monitor, which is called from the Verilog testbench. As in any other C code, header files specific to the application are included; here the include file provides the acc routines.
The access routine acc_initialize initializes the environment for access routines and must be called from the C-language application program before any other access routine is invoked. Before exiting a C-language application program that calls access routines, the access routine environment must be closed by calling acc_close at the end of the program.
#include <stdio.h>
#include "acc_user.h"

typedef char *string;
handle clk;
handle reset;
handle enable;
handle dut_count;
int count;

void counter_monitor()
{
  acc_initialize();
  clk = acc_handle_tfarg(1);
  reset = acc_handle_tfarg(2);
  enable = acc_handle_tfarg(3);
  dut_count = acc_handle_tfarg(4);
  acc_vcl_add(clk, counter, null, vcl_verilog_logic);
  acc_close();
}

void counter ()
{
  printf("Clock changed state\n");
}
Handles are used for accessing the Verilog objects. A handle is a predefined data type that is a pointer to a specific object in the design hierarchy. Each handle conveys to the access routines information about a unique instance of an accessible object: the object type, and how and where the data pertaining to it can be obtained. The information about which object a handle refers to is passed from the Verilog code as a parameter to the function $counter_monitor. These parameters can be accessed in the C program with the acc_handle_tfarg( ) routine.
For instance, clk = acc_handle_tfarg(1) makes clk a handle to the first parameter passed. Similarly, all the other handles are assigned. clk can now be added to the list of signals to be monitored using the routine acc_vcl_add(clk, counter, null, vcl_verilog_logic); here clk is the handle and counter is the user function to execute when clk changes.
Verilog Code
Below is the code of a simple testbench for the counter example. If the object being passed is an
instance, then it should be passed inside double quotes. Since here all the objects are nets or
wires, there is no need to pass them inside the double quotes.
module counter_tb();
reg enable;
reg reset;
reg clk_reg;
wire clk;
wire [3:0] count;

initial begin
  clk_reg = 0;
  reset = 0;
  $display("Asserting reset");
  #10 reset = 1;
  #10 reset = 0;
  $display("Asserting Enable");
  #10 enable = 1;
  #20 enable = 0;
  $display("Terminating Simulator");
  #10 $finish;
end

always
  #5 clk_reg = !clk_reg;

assign clk = clk_reg;

initial begin
  $counter_monitor(top.clk, top.reset, top.enable, top.count);
end

counter U (
  .clk (clk),
  .reset (reset),
  .enable (enable),
  .count (count)
);
endmodule
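The counter module itself is not shown in this excerpt. A minimal sketch consistent with the testbench's port names (clk, reset, enable, count) and the compare model shown earlier might be:

```verilog
// Sketch of the 4-bit counter DUT assumed by the testbench above.
module counter (clk, reset, enable, count);
input clk, reset, enable;
output [3:0] count;
reg [3:0] count;

always @ (posedge clk)
  if (reset)            // synchronous, active-high reset
    count <= 4'b0000;
  else if (enable)
    count <= count + 1;
endmodule
```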
Access Routines
Access routines are C programming language routines that provide procedural access to information within Verilog. Access routines perform one of two operations: reading design information or modifying it.
acc_initialize( ): Sets up the environment for the access routines.
acc_close( ): Undoes the actions taken by the function acc_initialize( ).
Utility Routines

Interaction between the Verilog tool and the user's routines is handled by a set of programs supplied with the Verilog toolset. Library functions defined in PLI 1.0 perform a wide variety of operations on the parameters passed to the system call and are used for simulation synchronization or for implementing conditional program breakpoints.
3.2 Verilog and Synthesis
3.2.1 What is logic synthesis?
Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level netlist representation. Logic synthesis uses standard cell libraries, which consist of simple cells (basic logic gates like and, or and nor) and macro cells (such as adders, muxes, memories and flip-flops). The standard cells put together form the technology library. Normally, a technology library is known by its minimum feature size (0.18u, 90nm).
A circuit description is written in a hardware description language (HDL) such as Verilog. Design constraints such as timing, area, testability and power are considered during synthesis. A typical design flow with a large example is given in the last example of this lesson.
3.2.2 Impact of automation on logic synthesis

For large designs, manual conversion of the behavioral description to a gate-level representation is prone to error. Prior to the development of modern sophisticated synthesis tools, designers could never be sure, until after fabrication, whether the design constraints would be met. Moreover, a significant part of the design cycle was consumed in converting the high-level design into its gate-level representation. On account of this, if the gate-level design did not meet the requirements, the turnaround time for redesigning the blocks was also very high. Each designer implemented design blocks differently and there was very little consistency in design cycles; hence, although the individual blocks were optimized, the overall design still contained redundant logic. Moreover, timing, area and power dissipation were fabrication-process specific, and with a change of process the entire design methodology needed to be changed.

Automated logic synthesis has solved these problems. The high-level design is less prone to human error because designs are described at higher levels of abstraction. High-level design is done without much concentration on the constraints; the tool takes care of all the constraints and sees to it that they are met. The designer can go back, redesign and synthesize once again very easily if some aspect is found unaddressed. The turnaround time has also fallen considerably. Automated logic synthesis tools synthesize the design as a whole, and thus an overall design optimization is achieved. Logic synthesis allows technology-independent design: the tools convert the design into gates using cells from the standard cell library provided by the vendor.
Design reuse is possible for technology-independent designs; if the technology changes, the tool is capable of mapping the design accordingly.
force and release: force and release of data types are not supported.
assign and deassign: assign and deassign of reg data types are not supported, but assign on wire data types is supported.
module synthesis_initial (clk, q, d);
input clk, d;
output q;
reg q;

initial begin
  q <= 0;
end

always @ (posedge clk)
begin
  q <= d;
end
endmodule
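Since the initial block above is not synthesizable, a common workaround (shown here as a sketch, not taken from the original text) is to replace the power-on initialization with an explicit reset input:

```verilog
// Sketch: synthesizable alternative using a synchronous reset instead of initial.
module synthesis_reset (clk, reset, d, q);
input clk, reset, d;
output q;
reg q;

always @ (posedge clk)
  if (reset)
    q <= 0;   // takes the place of the initial block
  else
    q <= d;
endmodule
```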
Delays are also non-synthesizable, e.g. a = #10 b; such code is useful only for simulation purposes.
Synthesis tools normally ignore such constructs, assuming there is no #10 in the above statement and treating the code as just a = b.
3.2.3 Constructs and Their Description

Construct Type         Keyword                              Description
ports                  input, inout, output                 Use inout only at the IO level
parameters             parameter                            Makes the design more generic
module definition      module
signals and variables  wire, reg, tri                       Vectors are allowed
instantiation          module instances, primitive          E.g. nand (out, a, b); a bad idea
                       gate instances                       to code RTL this way
functions and tasks    function, task                       Timing constructs are ignored
operators              -                                    Unary minus
                       !                                    Logical negation
procedural             always, if, then, else, case,        initial is not supported
                       casex, casez
procedural blocks      begin, end, named blocks, disable    Disabling of named blocks allowed
data flow              assign                               Delay information is ignored
named blocks           disable                              Disabling of named block supported
Translation
The RTL description is converted by the logic synthesis tool to an optimized, intermediate,
internal representation. It understands the basic primitives and operators in the Verilog RTL
description but overlooks any of the constraints.
Logic optimization
The logic is optimized to remove the redundant logic. It generates the optimized internal
representation.
Technology library
The technology library contains standard library cells which are used during synthesis to replace the behavioral description with actual circuit components; these are the basic building blocks. The physical layout of these cells is done first and then the area is estimated. Finally, modeling techniques are used to estimate their power and timing characteristics.
The library includes the following:
Functionality of the cells
Area of the different cell layouts
Timing information of the various cells
Power information of the various cells
The synthesis tools use these cells to implement the design.
// Library cells for abc_100 technology
VNAND // 2-input nand gate
VAND  // 2-input and gate
VNOR  // 2-input nor gate
VOR   // 2-input or gate
VNOT  // not gate
VBUF  // buffer
Design constraints
Any circuit must satisfy at least three constraints: area, power and timing. Optimization demands a compromise among these three constraints. Apart from these, operating conditions such as temperature also contribute to synthesis complexity.
Logic synthesis
The logic synthesis tool takes in the RTL design and generates an optimized gate-level description with the help of the technology library, while keeping to the design constraints.
Functional verification
Identical stimulus is run with the original RTL and synthesized gate-level description of the
design. The output is compared for matches.
module stimulus;
reg [3:0] A, B;
wire A_GT_B, A_LT_B, A_EQ_B;

// instantiate the magnitude comparator
magnitude_comparator MC (A_GT_B, A_LT_B, A_EQ_B, A, B);

initial
  $monitor($time, " A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b",
           A, B, A_GT_B, A_LT_B, A_EQ_B);

// stimulate the magnitude comparator

endmodule
3.3 Verification
Traditional verification follows the following steps in general.
1. To verify, first a design specification must be set. This requires analysis of architectural
trade-offs and is usually done by simulating various architectural models of the design.
2. Based on this specification a functional test plan is created. This forms the framework for
verification. Based on this plan various test vectors are applied to the DUT (design under
test), written in verilog. Functional test environments are needed to apply these test
vectors.
3. The DUT is then simulated using traditional software simulators.
4. The output is then analyzed and checked against the expected results. This can be done
manually using waveform viewers and debugging tools or else can be done automatically
by verification tools. If the output matches expected results then verification is complete.
5. Optionally, additional steps can be taken to decrease the risk of future design respin.
These include Hardware Acceleration, Hardware Emulation and assertion based
Verification.
Functional verification
When the specifications for a design are ready, a functional test plan is created based on them. This is the fundamental framework of functional verification. Based on this test plan, test vectors are selected and given as input to the design under test (DUT). The DUT is simulated and its output is compared with the desired results. If the observed results match the expected values, the verification is complete.
3.3.3 Semi-formal verification

Semi-formal verification combines the traditional verification flow using test vectors with the power and thoroughness of formal verification.
Semi-formal methods supplement simulation with test vectors.
Embedded assertion checks define the properties targeted by the formal methods.
Embedded assertion checks also define the input constraints.
Semi-formal methods explore a limited space exhaustively from the states reached by simulation, thus maximizing the effect of simulation. The exploration is limited to a certain neighborhood around the states reached by simulation.
3.3.4 Equivalence checking

After logic synthesis and place-and-route tools create, respectively, a gate-level netlist and a physical implementation of the RTL design, it is necessary to check whether their functionalities match the original RTL design. This is where equivalence checking comes in. It is an application of formal verification: it ensures that the gate-level or physical netlist has the same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and gate-level representations is constructed, and it is mathematically proved that their functionalities are the same.
3.4.1 PLI
i) Write a user-defined system task, $count_and_gates, which counts the number of and gate primitives in a module instance. The hierarchical module instance name is the input to the task. Use this task to count the number of and gates in a 4-to-1 multiplexer.
ii) Design a 3-to-8 decoder using a Verilog RTL description. A 3-bit input a[2:0] is provided to the decoder. The output of the decoder is out[7:0]. The output bit indexed by a[2:0] gets the value 1; the other bits are 0. Synthesize the decoder using any technology library available to you. Optimize for the smallest area. Apply identical stimulus to the RTL and gate-level netlist and compare the outputs.
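One possible RTL sketch for such a decoder (one of many acceptable solutions to the exercise):

```verilog
// Sketch: 3-to-8 decoder; the bit selected by a[2:0] is driven to 1.
module decoder3to8 (a, out);
input  [2:0] a;
output [7:0] out;

assign out = 8'b0000_0001 << a;  // shift the single 1 to position a
endmodule
```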
iii) Write the Verilog RTL description for a 4-bit binary counter with a synchronous reset that is active high. (Hint: use an always block with the @(posedge clock) statement.) Synthesize the counter using any technology library available to you. Optimize for the smallest area. Apply identical stimulus to the RTL and gate-level netlist and compare the outputs.
Module
5
Embedded
Communications
Version 2 EE IIT, Kharagpur 1
Lesson
24
Parallel Data
Communication
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to
Explain why a parallel interface is needed in an embedded system
List the names of common parallel bus standards along with their important features
Distinguish between the GPIB and other parallel data communication standards
Describe how data communication takes place between the controller, talker and listener
devices connected via a GPIB interface
Questions

Question (Visual, if any)                                   A      B     C    D     Ans.
Parallel Data Communication is preferred when the
following conditions are satisfied:
  i) the distance between the devices is small              T      T     F    T
  ii) the volume of traffic is small                        F      T     F    T
  iii) the required data rate is high                       T      F     T    T     D
The IEEE 488 standard was originally developed by           Intel  IBM   HP   Sun   C
The devices connected in a GPIB system are classified
into the following number of categories                     1      2     3    4     C
Each device connected in a GPIB system has an n-bit
address, where n =                                          3      5     6    7     B
Parallel Data Communication
Data processed by an embedded processor needs to be conveyed to other components in the system, namely an instrument, a smart actuator, a hard disk or a communication network for onward transmission to a central data warehouse. Similarly, data may have to be fetched from a digital oscilloscope, a CD-ROM drive or a sensor in the field. Typically, when the physical distance between the processor and the other component is small, say within a few meters, and a high volume of data needs to be conveyed in a short time, parallel bus interfaces are used.
In this lesson, we first learn about one of the most popular parallel bus standards, namely the IEEE 488 standard, also known as the GPIB (formerly HPIB). Next we compare and contrast it with other similar standards. Finally, we discuss its future, particularly in view of the recently emerging high-speed serial bus standards such as USB.
Because of its success and proven reliability, in 1973 the HPIB bus became an American
Standard, adopted by the IEEE and renamed as GPIB, for General Purpose Interface Bus. The
standard's number is IEEE488.1.
In parallel, the International Electrotechnical Commission (IEC), responsible for international standardization outside the U.S., approved the standard and called it IEC625.1. Due to the introduction of a new naming scheme for all standards, it was later renamed IEC60625.1.
There was a slight difference between IEEE488.1 and IEC625.1: the IEC625.1 standard used a 25-pin D-SUB connector for the bus, while the IEEE488.1 standard favored a Centronics-like 24-pin connector. Today the 24-pin connector is always used, but adaptors are available in case older instruments are equipped with a 25-pin D-SUB connector.
The '.1' extension of IEEE488.1 / IEC60625.1 indicates that there are several layers of interface standards. In fact, there is a whole family of standards:
IEEE488.1 / IEC60625.1 defines the physical layer of the bus system.
IEEE488.2 / IEC60625.2 is not a revision of the '.1' standard; it extends its functionality. A command language (syntax) is defined and common properties of instruments are specified, so that the same command names result in similar actions. In contrast to the '.1' standard, which defines physical means like cables, timing and so on, the '.2' standard focuses on the instrument model.
An application of IEEE488.2 / IEC60625.2 is IEEE1174, which is currently being adopted. Briefly stated, it translates GPIB functionality to a serial RS232 line, albeit without networking capability. It is intended for low-cost instruments.
Thus GPIB has several versions and makes which reflect the same thing, courtesy of the various developments pertaining to its history.
The BUS actually comprises a 24 Wire Cable with both MALE and FEMALE Connectors at
each of the individual ends to facilitate the connectivity in a daisy-chain network topology.
Standard TTL level signals are assumed for the ACTIVE, INACTIVE and TRANSITION states
both for Control and Communication.
Specified Transfer Rate: 1 Mega Byte per second.
Cable length:
Twenty meters between Controller and one Device or
Two meters between two devices
CLASSIFICATION of Instruments or Devices (as are called in the Standard) connected through
this bus system:
TALKER: Designated to send data to other instruments eg., Tape Readers, Data Recorders,
Digital Voltmeters, Digital Oscilloscopes etc.
LISTENER:Designated to receive data from other instruments or Controllers, eg., Printers,
Display devices, Programmable Power Supplies, Programmable Signal Generators etc.
CONTROLLER: Decision maker for the designation of an instrument either as a TALKER or a LISTENER. Usually this role is carried out by a computer.
All the Talkers, Listeners and the Controller are connected to each other via the following three different SYSTEM BUSES (also see A TYPICAL SEQUENCE of DATA FLOW):
Bidirectional Data bus
Bus Management Lines
Handshake Lines
The eight BI-DIRECTIONAL DATA LINES have the following functionalities. They are used to transfer Data, Addresses, Commands and Status information in the form of bytes.
DATA: Transferred as BYTES, with the reception of each data byte being duly acknowledged.
ADDRESSES: Instruments intended for use on a GPIB usually have some switches which allow selection of the 5-bit address the instrument will assume on the BUS. Addresses are characterized as:
o TALK ADDRESSES
o LISTEN ADDRESSES
CONTROL and COMMAND: BYTES containing information for orienting the devices to perform functions like listen, talk etc. These commands can be referred to as the CONTROL WORDs necessary for establishing efficient communication between the Controller and the other classes of devices.
The various commands are (also see the COMMAND TABLE):
o UNIVERSAL Commands
o UNLISTEN Commands
o UNTALK Commands
o SECONDARY Commands
o Power On: The Controller takes control of the buses and sends out the IFC signal to set all instruments on the bus to a known state.
o The Controller starts performing the desired series of measurements or tests.
o The Controller asserts the ATN line low and starts sending the command address codes to the talkers and the listeners.
o The CONTROL WORD structure:
The Control Words are given in brief in the Command Table:

The Command Table
COMMAND              CONTROL WORD
Ignored              X1111111
Listen Command       X01 + 5 LSBs (actual address)
Talk Command         X10 + 5 LSBs (actual address)
Universal Command    X000 + 4 LSBs (16 commands)
Unlisten Command     X0111111
Untalk Command       X1011111
Secondary Commands   X11 + 5 LSBs (actual address)

Note:
All the Command Control Words are activated only if the ATN line is asserted low; otherwise, they are in a disabled state.
X here represents the don't-care condition.
+ here represents the NEXT indicated number of LSBs.
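As an illustration of the encodings in the Command Table (this decoder is a sketch for clarity, not part of the standard; it assumes ATN is already asserted low and ignores the don't-care X bit):

```verilog
// Sketch: combinational decoder for the GPIB command-byte formats above.
module gpib_cmd_decode (byte_in, listen, talk, universal,
                        unlisten, untalk, secondary);
input  [7:0] byte_in;
output listen, talk, universal, unlisten, untalk, secondary;

wire [6:0] b = byte_in[6:0];                       // bit 7 is the don't-care X

assign unlisten  = (b == 7'b0111111);              // X0111111
assign untalk    = (b == 7'b1011111);              // X1011111
assign listen    = (b[6:5] == 2'b01) && !unlisten; // X01 + 5-bit address
assign talk      = (b[6:5] == 2'b10) && !untalk;   // X10 + 5-bit address
assign universal = (b[6:4] == 3'b000);             // X000 + 4-bit command
assign secondary = (b[6:5] == 2'b11)
                   && (b != 7'b1111111);           // X11 + 5 LSBs, excluding "Ignored"
endmodule
```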
Some information about DAV, NRFD, and NDAC: DAV (Data Valid) is driven by the talker to indicate that valid data has been placed on the data lines, while NRFD (Not Ready For Data) and NDAC (Not Data Accepted) are driven by the listeners to pace the handshake.
o On completion of the data transfer, the talker pulls the EOI line of the management group of signals low to indicate transfer completion.
o Finally, the controller takes control of the data bus and sends Untalk and Unlisten commands to all the talkers and listeners, and continues executing its pre-specified internal instructions.
Other Parallel Bus Standards
The following are some other popular parallel bus standards. They have been designed mainly for a particular type of application, namely interfacing various peripherals within a processor motherboard.
1. ISA (Industry Standard Architecture) Bus. This was primarily designed for the IBM PC (8086/186/286 processor based) and uses a 16-bit data bus. It allows only up to 1024 port addresses. An extension, EISA (Extended ISA), allows up to 32-bit data and addresses.
2. PCI (Peripheral Component Interconnect), PCI-X and PCI Super buses. This is an advanced version of the IBM-PC bus designed for the Pentium range of processors. It has 32-bit/33 MHz and 64-bit/66 MHz versions (64-bit/100 MHz in PCI-X). A current standard, PCI Super, allows up to 800 Mbps on a 64-bit bus. It supports automatic detection of devices via a 64-byte configuration register, which makes it easy to interface plug-and-play devices in a system.
3. IEEE-796 (Multibus): Originally introduced by Intel as a means of connecting multiple processors on the system board, this bus is no longer very popular. It works with 16-bit data and 24-bit address buses.
4. VME Bus (Euro-standard): Introduced for the same purpose as the Intel Multibus, it works with a 24-bit address bus and 8/16/32-bit data buses.
5. SCSI Bus (Small Computer System Interface): This standard was originally designed for use with Apple Macintosh computers and was then popularized by workstation vendors. The main purpose is to interface peripherals like hard disks, CD-ROM drives and similar relatively slow peripherals which use a data rate of less than 100 Mbps. The following varieties of SCSI are currently implemented:
SCSI-1: Uses an 8-bit bus, and supports data rates of 4 Mbps.
SCSI-2: Same as SCSI-1, but uses a 50-pin connector instead of a 25-pin connector, and supports multiple devices. This is what most people mean when they refer to plain SCSI.
Wide SCSI: Uses a wider cable (168 cable lines to 68 pins) to support 16-bit transfers.
Fast SCSI: Uses an 8-bit bus, but doubles the clock rate to support data rates of 10 Mbps.
Fast Wide SCSI: Uses a 16-bit bus and supports data rates of 20 Mbps.
Ultra SCSI: Uses an 8-bit bus, and supports data rates of 20 Mbps.
SCSI-3: Uses a 16-bit bus and supports data rates of 40 Mbps. Also called Ultra Wide SCSI.
Ultra2 SCSI: Uses an 8-bit bus and supports data rates of 40 Mbps.
Wide Ultra2 SCSI: Uses a 16-bit bus and supports data rates of 80 Mbps.
However, for the kind of applications targeted by GPIB, it is now facing very strong competition from the recently introduced high-speed serial bus standards. Currently there are four major candidates for future bus systems in Test & Measurement:
The Universal Serial Bus (USB) is now very popular. The current implementation provides transfer rates of up to 12 Mbit/s. From that viewpoint there is no speed enhancement in comparison to GPIB; in fact, it is a drawback.
USB II is an enhanced USB bus capable of transferring up to 480 Mbit/s. It is backwards compatible with USB. The IEC SC65C Working Group 3 (which also developed the IEC625.1 and IEC625.2 standards) is planning to work on this.
Module 5
Embedded Communications
Version 2 EE IIT, Kharagpur
Downloaded from www.citystudentsgroup.blogspot.com
Lesson 25
Serial Data Communication
Instructional Objectives
After going through this lesson the student would be able to
Distinguish between serial and parallel data communication
Explain why a communication protocol is needed
Distinguish between the RS-232 and other serial communication standards
Describe how serial communication can be used to interconnect two remote computers
using the telephone line
(Review questions)
The number of lines used in two-way (full-duplex) serial data transmission is ____
The digital signals need to be converted to audio tones for transmission through
telephone lines because the bandwidth of these lines is low (True/False)
A DCE transmits its digital output data through the line: TXD / RXD / DTR / DSR
Differential signaling is used to reduce the effect of signal attenuation in the
transmission line (True/False)
DATA COMMUNICATION
SERIAL DATA COMMUNICATION: An overview
PC-PC COMMUNICATION (short) (detail)
ASYNCHRONOUS COMMUNICATION PROTOCOL
RS232: WHAT IS IT?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
  DISADVANTAGE
RS422 AND RS423: WHAT ARE THEY?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
RS485: WHAT IS IT?
  STANDARD
  SIGNALLING/COMMUNICATION TECHNIQUE
  ADVANTAGES/APPLICATIONS
CONNECTORS AND PIN DESCRIPTION
Differences between the various standards at a glance
(home..)
And when many such systems need to share the same information or different information
through the same medium, there arises a need for proper organization (rather, socialization) of
the whole network of systems, so that the whole system works in a cohesive fashion.
Therefore, for proper interaction between the data transmitter (the device needing to
commence data communication) and the data receiver (the system which has to receive the data
sent by a transmitter), there has to be some set of rules (or protocols) which all the interested
parties must obey.
The requirement above finally paves the way for some DATA COMMUNICATION
STANDARDS.
Depending on the requirement of the application, one has to choose the type of communication
strategy. There are basically two major classifications, namely SERIAL and PARALLEL, each
with its variants. The discussion about serial communication will be undertaken in this lesson.
Any data communication standard comprises:
The protocol.
Signal/data/port specifications for the devices or additional electronic circuitry
involved.
What is Serial Communication? (home..)
Serial data communication strategies and standards are used in situations having a limitation on
the number of lines that can be spared for communication. This is the primary mode of transfer
in long-distance communication. But it is also the situation in embedded systems, where various
subsystems share the communication channel and speed is not a very critical issue.
Standards incorporate both the software and hardware aspects of the system, while buses mainly
define the cable characteristics for the same communication type.
Serial data communication is the most common low-level protocol for communicating between
two or more devices. Normally, one device is a computer, while the other device can be a
modem, a printer, another computer, or a scientific instrument such as an oscilloscope or a
function generator.
As the name suggests, the serial port sends and receives bytes of information one bit at a
time, rather than a whole character at once as in the other (parallel) modes of communication.
These bytes are transmitted using either a binary (numerical) format or a text format.
All data communication systems follow some specific set of standards defined for their
communication capabilities, so that the systems are not vendor-specific and the user has the
advantage of selecting the device and interface according to his own choice of make
and range.
The most common serial communication system protocols can be studied under the following
categories: Asynchronous, Synchronous and Bit-Synchronous communication standards.
The Protocol
This protocol allows bits of information to be transmitted between two devices at an
arbitrary point of time.
The protocol defines that the data, more appropriately a character, is sent as frames,
which in turn are collections of bits.
The start of a frame is identified by START bit(s), and STOP bit(s) identify
the end of the data frame. Thus, the START and the STOP bits are part of the frame being
sent or received.
The protocol assumes that both the transmitter and the receiver are configured in the
same way, i.e., follow the same definitions for the start, stop and the actual data bits.
Both devices, namely the transmitter and the receiver, need to communicate at an agreed-
upon data rate (baud rate) such as 19,200 or 115,200 bits per second.
This protocol has been in use for 15 years and is used to connect PC peripherals such as
modems, and the applications include the classic Internet dial-up modem systems.
Asynchronous systems allow a number of variations, including the number of bits in a
character (5, 6, 7 or 8 bits), the number of stop bits used (1, 1.5 or 2) and an optional
parity bit. Today the most common standard has 8-bit characters, with 1 stop bit and no
parity, and this is frequently abbreviated as '8-N-1'. A single 8-bit character, therefore,
consists of 10 bits on the line, i.e., one Start bit, eight Data bits and one Stop bit (as
shown in the figure below).
The most important observation here is that the individual characters are framed (unlike
all the other standards of serial communication) and NO CLOCK data is communicated
between the two ends.
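The 10-bit framing just described can be made concrete with a short sketch (a hypothetical illustration, not part of the lesson; the function names are invented). The data bits of a character are placed on the line least-significant bit first, bracketed by a start bit (0) and a stop bit (1):

```python
def frame_8n1(byte):
    """Frame one character for 8-N-1 asynchronous transmission.

    Returns the 10 line bits: one start bit (0), eight data bits
    sent least-significant-bit first, and one stop bit (1).
    """
    assert 0 <= byte <= 0xFF
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]                     # start + data + stop

def deframe_8n1(bits):
    """Recover the character from a 10-bit 8-N-1 frame."""
    assert len(bits) == 10 and bits[0] == 0 and bits[9] == 1
    byte = 0
    for i, b in enumerate(bits[1:9]):
        byte |= b << i
    return byte

frame = frame_8n1(ord('A'))      # 'A' = 0x41
print(frame)                     # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chr(deframe_8n1(frame)))   # A
```

Because no clock is sent, the receiver relies on the agreed baud rate and on spotting the start-bit edge to know when to sample each of these 10 bits.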
RS-232, RS-422, RS-423 and RS-485 are each a recommended standard (RS-XXX) of the
Electronic Industries Association (EIA) for asynchronous serial communication, and have more
recently been rebranded as EIA-232, EIA-422, EIA-423 and EIA-485, each a
standard published by the Telecommunications Industry Association; both the physical and
electrical characteristics of the interfaces have been detailed in these publications.
It must be mentioned here that, although some of the more advanced standards for serial
communication, like USB and FireWire, are being popularized these days to fill the gap for
high-speed, relatively short-run, heavy-data-handling applications, the above four still
satisfy the needs of the longer-run applications found most often in
industrial settings for plant-wide security and equipment networking.
RS-232, 423, 422 and 485 specify the communication system characteristics of the hardware,
such as voltage levels, terminating resistances, cable lengths, etc. The standards, however, say
nothing about the software protocol or how data is framed, addressed, checked for errors or
interpreted.
THE RS-232 (home..)
This is the original serial port interface standard. It stands for Recommended Standard
Number 232 (or, more appropriately, EIA Recommended Standard 232) and is the oldest and most
popular serial communication standard. It was first introduced in 1962 to help ensure
connectivity and compatibility across manufacturers for simple serial data communications.
Applications (home..)
Peripheral connectivity for PCs (the PC COM port hardware), which can range beyond
modems and printers to many different handheld devices and modern scientific
instruments.
All the various characteristics and definitions pertaining to this standard can be summarized
according to:
The maximum bit transfer rate capability and cable length.
Communication technique: names, electrical characteristics and functions of signals.
The mechanical connections and pin assignments.
The Standard
Signals can be in either an active state or an inactive state. RS-232 is an active-LOW voltage-
driven interface where:
ACTIVE STATE: An active state corresponds to the binary value 1. An active signal state can
also be indicated as logic 1, on, true, or a mark.
INACTIVE STATE: An inactive signal state is stated as logic 0, off, false, or a space.
For data signals, the "true" state occurs when the received signal voltage is more
negative than -3 volts, while the "false" state occurs for voltages more positive than +3
volts.
For control signals, the "true" state occurs when the received signal voltage is more
positive than +3 volts, while the "false" state occurs for voltages more negative than -3
volts.
Transition or Dead Area
The signal voltage region greater than -3.0 V and less than +3.0 V is regarded as the 'dead area'
and allows for the absorption of noise. This same region is considered a transition region, and the
signal state there is undefined.
To bring the signal to the "true" state, the controlling device unasserts (or lowers) the value for
data pins and asserts (or raises) the value for control pins. Conversely, to bring the signal to the
"false" state, the controlling device asserts the value for data pins and unasserts the value for
control pins. The "true" and "false" states for a data signal and for a control signal are as shown
below.
(Figure: received signal voltage versus time, showing the transition/dead region between -3 V and +3 V)
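The voltage thresholds above can be captured in a small helper (a hypothetical sketch, not from the standard's text; it follows the data-signal convention just described, where "true"/mark lies below -3 V):

```python
def rs232_data_state(voltage):
    """Interpret a received RS-232 data-line voltage.

    For data signals the 'true' (mark, logic 1) state lies below -3 V
    and the 'false' (space, logic 0) state lies above +3 V; anything in
    between falls in the transition/dead region and is undefined.
    """
    if voltage < -3.0:
        return "true (mark, logic 1)"
    if voltage > 3.0:
        return "false (space, logic 0)"
    return "undefined (transition region)"

print(rs232_data_state(-9.0))  # true (mark, logic 1)
print(rs232_data_state(1.5))   # undefined (transition region)
```

The 6-volt-wide dead band is what gives the interface its noise absorption: a small disturbance around 0 V never flips the decoded logic state.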
The communication technique
RS-232 is designed for a unidirectional, half-duplex communications mode. That simply
means that a transmitter (driver) is feeding the data to a receiver over a copper line. The
data always follows the direction from driver to receiver over that line. If return
transmission is desired, another driver-receiver pair and separate wires are needed.
In other words, if bi-directional or full-duplex capabilities are needed, two separate
communications paths are required.
(Figure: a driver Tx feeding data to a receiver Rx over a single line)
RS-232 Single-Ended, Unidirectional, Half Duplex
Disadvantage (home..)
Being a single-ended system, it is more susceptible to induced noise, ground loops and ground
shifts (a ground at one end not at the same potential as at the other end of the cable), e.g. in
applications in the proximity of heavy electrical installations and machinery. These
vulnerabilities worsen at very high data rates, and for those applications a different standard,
like RS-422 etc., is required, as explained further below.
RS-422 and RS-423 (EIA Recommended Standard 422 and 423)
These were designed specifically to overcome the distance and speed limitations of RS-232.
Although they are similar to the more advanced RS-232C, they can accommodate higher
baud rates and longer cable lengths, and accommodate multiple receivers.
The Standard (home..)
Maximum Bit Transfer Rate, Signal Voltages and Cable Length
For both of these standards the data lines can be up to 4,000 feet long with a data rate
around 100 kbps.
The maximum data rate is around 10 Mbps for short runs, trading off distance for
speed.
The maximum signal voltage levels are ±6 volts.
The signaling technique of RS-422 and RS-423 is mainly responsible for their
superiority over RS-232 in terms of speed and length of transmission, as explained in the
next subsection.
Communication Technique
The strength of this standard lies in its capability of tolerating the ground voltage differences
between sender and receiver. Ground voltage differences can occur in electrically noisy
environments where heavy electrical machinery is operating.
The criterion here is the differential-data communication technique, also referred to as
balanced-differential signaling. In this, the driver uses two wires over which the signal
is transmitted. However, each wire is driven and floats separate from ground, meaning
neither is grounded, and in this respect this system is different from single-ended
systems. Correspondingly, the receiver has two inputs, each floating above ground and
electrically balanced with the other when no data is being transmitted. Data on the line
causes a desired electrical imbalance, which is recognized and amplified by the receiver.
The common-mode signals, such as electrical noise induced on the lines by
machinery or radio transmissions, are, for the most part, canceled by the receiver. That is
because the induced noise is identical on each wire, and the receiver inverts the signal on
one wire to place it out of phase with the other, causing a subtraction to occur which
results in a zero difference. Thus, noise picked up by the long data lines is eliminated at
the receiver and does not interfere with data transfer. Also, because the line is balanced
and separate from ground, there is no problem associated with ground shifts or ground
loops.
(Figure: a driver Tx feeding several receivers Rx over a balanced two-wire line)
RS-422 Differential Signaling, Unidirectional, Half Duplex, Multi-drop
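The cancellation argument above can be demonstrated numerically. The sketch below is purely illustrative (the ±2 V drive level and function names are assumptions, not taken from the standard): noise coupled identically onto both wires leaves the wire-to-wire difference, and hence the received bit, unchanged:

```python
import random

def transmit_differential(bit):
    """Drive the balanced pair with opposite-polarity levels."""
    level = 2.0 if bit else -2.0
    return level, -level          # (wire A, wire B)

def receive_differential(a, b):
    """The receiver decides only on the difference A - B."""
    return 1 if (a - b) > 0 else 0

# Identical (common-mode) noise induced on both wires cancels out.
for bit in (0, 1, 1, 0):
    a, b = transmit_differential(bit)
    noise = random.uniform(-5.0, 5.0)  # same spike coupled onto both wires
    assert receive_differential(a + noise, b + noise) == bit
```

A single-ended line has no second wire to subtract against, which is exactly why RS-232 is the more noise-prone of the two schemes.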
To avoid any ambiguity in understanding the RS-422 and RS-423 standards, it may be
mentioned that RS-423 is an advanced counterpart of RS-422 which
has been designed to tolerate the ground voltage differences between the sender and the
receiver for the more advanced version of RS-232, that is, RS-232C.
Unlike RS-232, an RS-422 driver can service up to 10 receivers on the same line (bus).
This is often referred to as a half-duplex single-source multi-drop network (not to be
confused with the multi-point networks associated with RS-485); this will be explained
further in conjunction with RS-485.
Like RS-232, however, RS-422 is still half-duplex one-way data communication over a
two-wire line. If bi-directional or full-duplex operation is desired, another set of driver,
receiver(s) and two-wire line is needed. In that case, RS-485 is worth considering.
Applications
This fits well in process control applications in which instructions are sent out to many actuators
or responders, often in electrically noisy environments where heavy electrical machinery is
operating.
RS-485
This is an improved RS-422 with the capability of connecting a number of devices (transceivers)
on one serial bus to form a network.
The Standard
Maximum Bit Transfer Rate, Signal Voltages and Cable Length
Such a network can have a "daisy chain" topology where each device is connected to two
other devices, except for the devices at the ends.
Only one device may drive data onto the bus at a time. The standard does not specify the
rules for deciding who transmits and when on such a network; that is solely left for
the system designer to define.
Variable data rates are available for this standard, but the standard maximum data rate is 10
Mbps. However, some manufacturers do offer up to double the standard rate, i.e. around
20 Mbps, but of course at the expense of cable length.
It can connect up to 32 drivers and receivers in fully differential mode, similar to RS-422.
Communication Technique (home)
EIA Recommended Standard 485 is designed to provide bi-directional half-duplex
multi-point data communications over a single two-wire bus.
Like RS-232 and RS-422, full-duplex operation is possible using a four-wire, two-bus
network, but the RS-485 transceiver ICs must have separate transmit and receive pins to
accomplish this.
RS-485 has the same distance and data rate specifications as RS-422 and uses
differential signaling but, unlike RS-422, allows multiple drivers on the same bus. As
depicted in the figure below, each node on the bus can include both a driver and a receiver,
forming a multi-point star network. The driver at each node remains in a disabled high-
impedance state until called upon to transmit. This is different from drivers made for RS-422,
where there is only one driver and it is always enabled and cannot be disabled.
With automatic repeaters and tri-state drivers the 32-node limit can be greatly exceeded.
In fact, the ANSI-based SCSI-2 and SCSI-3 bus specifications use RS-485 for the
physical (hardware) layer.
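The driver-enable discipline can be modeled in a few lines. This is a toy behavioral model (the class names are invented for illustration), not the electrical behavior of a real transceiver: only one enabled driver may drive the bus at a time, and all other drivers stay in high impedance:

```python
class RS485Node:
    """A transceiver node: its driver stays in high impedance until enabled."""
    def __init__(self, name):
        self.name = name
        self.driver_enabled = False
        self.tx_level = None      # differential level when driving

class RS485Bus:
    """A shared two-wire bus; only one enabled driver may put data on it."""
    def __init__(self):
        self.nodes = []

    def sample(self):
        drivers = [n for n in self.nodes if n.driver_enabled]
        if len(drivers) > 1:
            raise RuntimeError("bus contention: multiple drivers enabled")
        return drivers[0].tx_level if drivers else None  # None = bus idle

bus = RS485Bus()
a, b, c = RS485Node("A"), RS485Node("B"), RS485Node("C")
bus.nodes = [a, b, c]

a.driver_enabled, a.tx_level = True, 1
print(bus.sample())       # 1
a.driver_enabled = False  # back to high impedance before B transmits
b.driver_enabled, b.tx_level = True, 0
print(bus.sample())       # 0
```

The `RuntimeError` branch corresponds to bus contention, which the standard leaves to the system designer's arbitration rules to prevent.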
(Figure: several transceiver nodes, each with a driver D and receiver R with enable lines, sharing the two-wire bus)
RS-485 Differential Signaling, Bi-directional, Half Duplex, Multi-point
Advantages
Among all of the asynchronous standards mentioned above, this standard offers the
maximum data rate.
Apart from that, special hardware for avoiding bus contention, and
a higher receiver input impedance with lower driver load impedances, are its other
assets.
Differences between the various standards at a glance (home..)
Altogether, the important electrical and mechanical characteristics for application purposes may
be classified and summarized according to the table below.
                            RS-232           RS-422/423       RS-485
Signaling Technique         Single-Ended     Differential     Differential
                            (Unbalanced)     (Balanced)       (Balanced)
Drivers and Receivers       1 Driver         1 Driver         32 Drivers
on Bus                      1 Receiver       10 Receivers     32 Receivers
Maximum Cable Length        50 feet          4000 feet        4000 feet
Maximum Data Rate           20 kbps          10 Mbps down     10 Mbps down
                            (original std)   to 100 kbps      to 100 kbps
Minimum Loaded Driver       +/-5.0 V         +/-2.0 V         +/-1.5 V
Output Voltage Levels
Driver Load Impedance       3 to 7 kΩ        100 Ω            54 Ω
Receiver Input Impedance    3 to 7 kΩ        4 kΩ or greater  12 kΩ or greater
(home..)
Primary communication is accomplished using three pins: the Transmit Data (TD) pin, the
Receive Data (RD) pin, and the Ground pin (not shown). Other pins are available for data flow
control. The serial port pins and the signal assignments for a typical asynchronous serial
communication can be shown in the scheme for a 9-pin male connector (DB9) on the DTE as
under:
Serial Port Pin and Signal Assignments (the DB9 male connector)
Pin   Label   Signal Name           Signal Type
4     DTR     Data Terminal Ready   Control
(The RS-232 standard can be referred to for a description of the signals and pin assignments used
for a 25-pin connector.)
Because RS-232 mainly involves connecting a DTE to a DCE, the pin assignments are defined
such that straight-through cabling is used, where pin 1 is connected to pin 1, pin 2 is connected
to pin 2, and so on. A DTE to DCE serial connection using the Transmit Data (TD) pin and the
Receive Data (RD) pin is shown below.
DTE  TD (pin 3) ----------> RD (pin 3)  DCE
DTE  RD (pin 2) <---------- TD (pin 2)  DCE
Connecting two DTEs or two DCEs using a straight serial cable means that the TD pins on each
device are connected to each other, and the RD pins on each device are connected to each other.
Therefore, to connect two like devices, a null modem cable has to be used. As shown below, a
null modem cable crosses the transmit and receive lines in the cable.
DTE  TD (pin 3) ----------> RD (pin 2)  DTE
DTE  RD (pin 2) <---------- TD (pin 3)  DTE
Serial ports consist of two signal types: data signals and control signals. To support these signal
types, as well as the signal ground, the RS-232 standard defines a 25-pin connection. However,
most PCs and UNIX platforms use a 9-pin connection. In fact, only three pins are required for
serial port communications: one for receiving data, one for transmitting data, and one for the
signal ground.
Throughout this discussion the computer is considered a DTE, while peripheral devices such as
modems and printers are considered DCEs. Note that many scientific instruments function as
DTEs.
The term "data set" is synonymous with "modem" or "device," while the term "data terminal" is
synonymous with "computer."
(Detail PC-PC communication.) (home..)
The schematic for a connection between the PC UART port and the modem serial port is as
shown below:
(Figure: the PC UART COM port (DTE) connected to the modem serial port (DCE) through the TxD, RxD, CD, DSR, DTR, RTS and CTS lines)
Note: The serial port pin and signal assignments are with respect to the DTE. For example, data
is transmitted from the TD pin of the DTE to the RD pin of the DCE.
The control signals are used to:
Signal the presence of connected devices
Control the flow of data
The control pins include RTS and CTS, DTR and DSR, CD, and RI.
1. The DTE asserts the RTS pin to instruct the DCE that it is ready to receive data.
2. The DCE asserts the CTS pin indicating that it is clear to send data over the TD pin. If
data can no longer be sent, the CTS pin is unasserted.
3. The data is transmitted to the DTE over the TD pin. If data can no longer be accepted, the
RTS pin is unasserted by the DTE and the data transmission is stopped.
1. The DTE asserts the DTR pin to request that the DCE connect to the communication line.
2. The DCE asserts the DSR pin to indicate it's connected.
3. DCE unasserts the DSR pin when it's disconnected from the communication line.
The DTR and DSR pins were originally designed to provide an alternative method of hardware
handshaking. In practice, however, it is the RTS and CTS pins that are usually used in this way,
and not the DSR and DTR pins. You should refer to your device documentation to determine its
specific pin behavior.
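The three RTS/CTS steps above can be walked through in a small simulation (purely illustrative; the function and its parameters are invented for this sketch and belong to no real serial API):

```python
def rts_cts_transfer(dce_ready, data):
    """Walk through the RTS/CTS exchange described in the steps above.

    'dce_ready' is the number of bytes the DCE can accept before it
    drops CTS; the returned list holds the bytes that actually got through.
    """
    log, received = [], []
    log.append("DTE asserts RTS")            # step 1: request to send
    log.append("DCE asserts CTS")            # step 2: clear to send
    for byte in data:
        if len(received) >= dce_ready:
            log.append("DCE unasserts CTS")  # buffer full, halt the DTE
            break
        received.append(byte)                # step 3: data flows over TD
    log.append("DTE unasserts RTS")          # transfer finished
    return received, log

received, log = rts_cts_transfer(dce_ready=2, data=b"ABC")
print(received)  # [65, 66]
```

Dropping CTS mid-transfer is exactly the mechanism a real DCE uses to throttle a faster DTE.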
For very short distances, devices like the UART (Universal Asynchronous Receiver Transmitter,
e.g. the INS8250 from National Semiconductor Corporation) and the USART (Universal
Synchronous Asynchronous Receiver Transmitter, e.g. the Intel 8251A from Intel Corporation)
incorporate the essential circuitry for handling this serial communication with handshaking.
For long distances, telephone lines (switched lines) are more practically feasible because of
their pre-availability.
REMEDY: Convert the digital signal to audio tones. The device which is used to do this
conversion, and vice versa, is known as a MODEM.
(Figure: two terminal-modem pairs connected over a telephone line; each DTE-DCE pair exchanges the TXD, RXD, RTS, CTS, CD, DTR and DSR signals)
A TYPICAL DIGITAL TRANSMISSION SYSTEM
To start with, it should be mentioned that the signals alongside the arrowheads represent the
minimum number of necessary signals for the execution of a typical communication standard or
a protocol; being elaborated later. These signals occur when the main control terminal wants to
send some control signal to the end device or if the end device wants to send some data, say an
alarm or some process output, to the main controller.
Both the main microcomputer and the end-device or the time-shared device can be referred to as
terminals.
The modem then replies to the terminal by asserting the DSR (data-set ready) signal low. Here
the direction of the arrows is of prime importance and must be kept in mind to fully understand
the whole procedure.
If the terminal actually has some valuable data to convey to the end terminal, it will assert
the RTS (request-to-send) signal low back to the modem and, in turn, the modem will assert the
CD (carrier-detect) signal to the terminal, indicating that it has now established the connection
with the terminal computer.
It may be possible, though, that the modem is not fully ready to transmit the actual data onto the
telephone line; this may be because of buffer saturation or several other reasons. When the
modem is fully ready to send the data along the telephone line, it will assert the CTS (Clear-to-
send) signal back to the terminal.
The terminal then starts sending the serial data to the modem. When the terminal runs out of
data, it asserts the RTS signal low, indicating to the modem that it has no more data to be sent.
The modem in turn unasserts its CTS signal and stops transmitting.
In the same way, the initialization and handshaking processes are executed at the other end.
Therefore, it must be noted here that a very important aspect of data communication is the
definition of the handshaking signals for transferring serial data to and from the modem.
Current loops (home..)
Current loops are a standard which is used widely in process automation. 20 mA loops are
widely used for transmitting serial communication data to programmable process controlling
devices. Another widely used standard is the 4-20 mA current loop, which is used for
transmitting analogue measurement signals between the sensor and the measurement device.
In digital communications the 20 mA current loop is a standard. The transmitters will only
source 20 mA and the receivers will only sink 20 mA. Current loops often use opto-couplers.
Here it is the current which matters and not the voltages.
For measurement purposes a small resistance, say of value 1 kΩ, is connected in series with the
receiver/transmitter and the current meter. The current flowing into the receiver indicates the
scaled data which is actually going into it. The data transmitted through this kind of interface is
usually a standard RS-232 signal, just converted to current pulses. Current on and off the
transmission line depends on how the RS-232 circuit distinguishes between the values of current
and in what way it interprets the logic state thus obtained.
4-20 mA current loop interface is the standard for almost all the process control instruments.
This interface works as follows. The sensor is connected to a process controlling equipment,
which reads the sensor value and supplies a voltage to the loop where the sensor is connected
and reads the amount of current it takes. The typical supply voltage for this arrangement is
around 12-24 Volts through a resistor and the measured output is the voltage drop across that
resistor converted into its current counterpart.
The current loop is designed so that a sensor takes 4 mA current when it is at its minimum value
and 20 mA when it is at its maximum value.
Because the sensor will always pass at least 4 mA current, and there is usually a voltage drop of
many volts over the sensor, many sensor types can be made to be powered from only that loop
current.
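The 4 mA and 20 mA endpoints translate into a simple linear scaling. Here is a hedged sketch (the function names are illustrative, and the 3.8 mA fault threshold is a common industry convention, not something stated in the lesson):

```python
def current_to_value(current_ma, lo, hi):
    """Convert a 4-20 mA loop current to the engineering value it encodes.

    4 mA maps to the sensor minimum 'lo', 20 mA to the maximum 'hi'.
    A current well below 4 mA usually indicates a broken loop or sensor.
    """
    if current_ma < 3.8:           # small margin below the 4 mA live zero
        raise ValueError("loop fault: current below live zero")
    return lo + (current_ma - 4.0) * (hi - lo) / 16.0

def sense_current(loop_voltage, resistor_ohms):
    """Recover the loop current (mA) from the drop across a sense resistor."""
    return 1000.0 * loop_voltage / resistor_ohms

# A 0-100 degC transmitter reporting 12 mA is exactly mid-scale:
print(current_to_value(12.0, 0.0, 100.0))   # 50.0
```

The non-zero "live zero" at 4 mA is what makes both loop-powered sensors and broken-wire detection possible: 0 mA can never be a legitimate reading.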
Lesson 26
Network Communication
Instructional Objectives
After going through this lesson the student would be able to
Describe the need and importance of networking in an embedded system
List the commonly adopted network communication standards and explain their basic
features
Distinguish between the CAN Bus, Field Bus and other network communication
standards for embedded applications
Choose a particular network standard to suit an application
(Review questions)
Ethernet-type networks are not suitable in an embedded system because
(i) these are very slow (ii) these do not provide any guarantee on service times
(iii) these are expensive
Foundation Fieldbus implements the following layers of the OSI protocol:
(i) 2 (ii) 3 (iii) 4 (iv) 7
The I2C Bus has the following features:
(i) two-wire (ii) full-duplex (iii) master-slave
The CAN Bus standard was originally developed for chemical processes (True/False)
Network Communication
The role of networking in present-day data communication hardly needs any elaboration. The
situation is also similar in the case of embedded systems, particularly those which are distributed
over a larger geographical region, the so-called distributed embedded systems. Unfortunately,
the most common network standard, namely the Ethernet, is not suitable for such distributed
systems, especially when there are real-time constraints to be satisfied. This is due to the lack of
any service time guarantee in the Ethernet standard. On the other hand, alternatives like Token
Ring, which do provide a service-time guarantee, are not very suitable because of the
requirement of a ring-type topology, which is not very convenient to implement in an industrial
environment.
The industry therefore proposed a standard called Token-bus (and got it approved as the IEEE
802.4 specification) to cater to such requirements. However, the standard became too complex.
The I2C (Inter-Integrated Circuit) Bus Standard
This standard was introduced by Philips primarily to connect a number of integrated circuits
using a single serial communication link. It uses a two-wire serial protocol: one of the wires
carries the Data while the other carries the Clock. As shown in the figure below, one of the
integrated circuits (IC-1 in this case) is configured as the master while all the others are
configured as slaves. Usually a microprocessor or a microcontroller serves as the master. The
protocol does not limit the number of masters, but only master devices can initiate a data
transfer. Both master and slave devices can act as senders or receivers of data. Normally, all
slave devices go into a high-impedance state while the master maintains logic high.
(Figure: I2C frame format on the Clock and Data lines: Start condition, 7-bit slave address, Read/Write bit, 8-bit data, acknowledge bits asserted by the receiver to indicate successful reception, and a Stop bit asserted by the master)
The original specifications for this standard were quite modest, namely 100 kbps with 7-bit
addressing. The recent specifications have raised the data rate to 3.4 Mbps with 10-bit
addressing.
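The address phase of the frame shown above can be sketched as follows (an illustrative helper, not a real driver API): under 7-bit addressing, the slave address occupies the upper seven bits of the first byte and the Read/Write bit is the least-significant bit:

```python
def i2c_address_byte(slave_address, read):
    """Build the first byte of an I2C transfer: 7-bit address + R/W bit.

    The master sends the 7-bit slave address in the upper bits,
    followed by the Read (1) / Write (0) bit, then waits for the
    addressed slave to acknowledge.
    """
    assert 0 <= slave_address < 0x80, "7-bit addressing"
    return (slave_address << 1) | (1 if read else 0)

# Write to, then read from, a slave at address 0x50:
print(hex(i2c_address_byte(0x50, read=False)))  # 0xa0
print(hex(i2c_address_byte(0x50, read=True)))   # 0xa1
```

Because the address travels on the shared Data wire, every slave sees it; only the one whose address matches pulls the line low to acknowledge.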
The Field Bus
The Fieldbus comprises several versions, of which the PROFI (Process Field) BUS is the
standard for local area networks for integrated communications from the field level to the cell
level. It enables large numbers of field devices to be networked, and carries signals from the
distributed I/Os to the programmable controller, which might be several kilometers distant, in a
matter of milliseconds.
Initiatives such as the Interoperable Systems Project (ISP) from manufacturers under the
leadership of Siemens, Fisher-Rosemount and Yokogawa, and its counterpart, the WorldFIP,
mainly supported by Honeywell, wanted to establish a de-facto Fieldbus standard by introducing
their products onto the market. Both organisations merged into the Fieldbus Foundation (FF). This
foundation strives to get a single world standard worked out. Industrial applications range from
pulp and paper production and wastewater treatment right through to power station technology.
PROFIBUS operations are processed by standard telegrams passing between master and slave
using predefined channels called communication relations. Data is stored as objects which can
be addressed in the object directory via an index. PROFIBUS specifies an RS 485 interface with
a baud rate of 9.6 kbit/s over a cable length of 1200 m and up to 500 kbit/s over a cable length of
200 m. Telegrams consist of communication relations of the target device, the PROFIBUS
partner address as well as the indices of the object to be addressed along with any data. With the
exception of broadcasts, all telegrams are answered with a positive or negative
acknowledgement. This ensures rapid recognition of faulty or non-existent stations.
o m
Transmission technology (Physical Layer) of the PROFIBUS-PA can be c
t.characterized as
follows:
o Digital, synchronous bit data transmission.
o Data rate 31.25 kbit/s.
o Manchester coding.
o Signal transmission and remote power supply with transposed two-wire cabling (screened/unscreened).
o Remote power supply DC voltage 9 V...32 V.
o Signal AC voltage 0.75 Vpp...1 Vpp (send voltage).
o Line and tree topology.
o Up to 1.9 km total cabling.
o Up to 32 members per cable segment.
o Can be expanded with a maximum of four repeaters.
The FOUNDATION fieldbus model is based on the IEC Open Systems Interconnect (OSI) layered communication model.
w
The Physical layer w
w
The fieldbus physical layer is OSI layer 1. Layer 1 receives encoded messages from the upper
layers and converts the messages to physical signals on the fieldbus transmission medium.
Physical layer requirements are defined by the approved IEC 1158-2 and ISA S50.02-1992
Physical Layer Standards. Communications rates supported are 31.25 kbit/s, 1.0 Mbit/s and 2.5
Mbit/s.
The fieldbus physical layer operating at 31.25 kbit/s is intended to replace the 4-20 mA analog
standard currently used to connect field devices to control systems. Like the 4-20 mA standard,
the FOUNDATION fieldbus supports single wire pair operation, bus powered devices, and
intrinsic safety options.
Fieldbus has additional advantages over 4-20 mA because many devices can connect to a single
wire pair resulting in significant savings in wiring costs.
Communication stack
The communications stack comprises OSI Layers 2 and 7. The FOUNDATION fieldbus does
not use the OSI layers 3, 4, 5 and 6 because the functions of these layers are not needed. Instead
of these layers, the Fieldbus Access Sublayer (FAS) is used to map layer 7 directly to layer 2.
Layer 2, the Data Link Layer (DLL), controls transmission of messages onto the fieldbus. The DLL manages access to the fieldbus through a deterministic centralised bus scheduler called the Link Active Scheduler (LAS).
A fieldbus may have multiple Link Masters. If the current LAS fails, one of the Link Masters takes over as the LAS, and operation of the fieldbus continues.
The DLL is a subset of the emerging ISA/IEC DLL standards committee work.
The Fieldbus Message Specification (FMS) is modeled after the OSI layer 7 Application Layer. FMS provides the communications services needed by the User Layer for remote access of data across the fieldbus network.
User Layer

The User Layer is not defined by the OSI model. However, for the first time, the FOUNDATION fieldbus specification defines a complete user layer based on function blocks. Function blocks provide the elements necessary for manufacturers to construct interoperable instruments and controllers.
Device descriptions

Each fieldbus device is described by a device description (DD) written in a special programming language known as the Device Description Language (DDL). The DD can be thought of as a "driver" for the device.
The DD provides all of the information needed for a control system or host to interpret
communications coming from the device, including configuration, and diagnostic information.
Any control system or host can communicate with a device if it "knows" the DD for the device.
The host device uses an interpreter called Device Description Services (DDS) to read the DD for
the device.
New FOUNDATION fieldbus devices can be added to the fieldbus at any time by simply connecting the device to the fieldbus wire, provided the control system or host can read the identification of the fieldbus device, including the DD identifier, over the fieldbus. Once the DD identifier is known, the host reads the DD from a CD-ROM and supplies it to DDS for interpretation.
Version 2 EE IIT, Kharagpur 7
The completion of the technical specifications for an interoperable fieldbus system is a major
milestone in the history of automation. The FOUNDATION fieldbus specification was
developed by a consortium of instrument and control system manufacturers that represent over
90% of the instrumentation and control systems provided to end-users worldwide. The
specifications will allow many manufacturers to deliver a wide range of interoperable fieldbus
devices. These devices will usher in the next major technology transition in process and
manufacturing automation.
Controller Area Network (CAN) is a very reliable, message-oriented serial network that was originally designed for the automotive industry, but has become a sought-after bus in industrial automation as well as in other applications. The CAN bus is primarily used in embedded systems, and is actually a network established among microcontrollers. The main features are a two-wire, half-duplex, high-speed network system mainly suited for high-speed applications using short messages. Its robustness, reliability and compatibility with the design issues of the semiconductor industry are some of the remarkable aspects of CAN technology.
Main Features

CAN can link up to 2032 devices (assuming one node with one identifier) on a single network. But owing to the practical limitations of the hardware (transceivers), it may only link up to 110 nodes (with the 82C250, Philips) on a single network.
It offers a high-speed communication rate of up to 1 Mbit/s, thus facilitating real-time control.
It embodies unique error confinement and error detection features, making it more trustworthy and adaptable in noise-critical environments.
CAN Versions
Originally, Bosch provided the specifications. The modern counterpart is designated as Version 2.0 of this specification, which is divided into two parts:
Version 2.0A or Standard CAN, using 11-bit identifiers.
Version 2.0B or Extended CAN, using 29-bit identifiers.
The main aspect of these versions is the format of the MESSAGE FRAME, the main difference being the IDENTIFIER LENGTH.
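The identifier also determines bus access priority in CAN: contending nodes transmit their identifiers bit by bit, a dominant '0' overrides a recessive '1' on the wired-AND bus, and the frame with the numerically lowest identifier wins arbitration. A rough sketch of that rule (the function name and the simplified loop are this example's own, not part of the specification):

```python
# Sketch of CAN bitwise arbitration: contending nodes send their identifiers
# most significant bit first. A '0' is dominant and a '1' recessive; a node
# that sends a recessive bit but observes a dominant bit on the bus loses
# and withdraws, so the lowest (highest-priority) identifier wins.

def arbitrate(identifiers, bits=11):     # 11 bits for CAN 2.0A, 29 for 2.0B
    contenders = list(identifiers)
    for bit in range(bits - 1, -1, -1):
        # wired-AND bus: the bus level is dominant (0) if any node sends 0
        bus = min((ident >> bit) & 1 for ident in contenders)
        contenders = [i for i in contenders if (i >> bit) & 1 == bus]
    assert len(contenders) == 1          # unique identifiers -> unique winner
    return contenders[0]

print(hex(arbitrate([0x65A, 0x123, 0x7FF])))   # 0x123 -- lowest ID wins
```

Because losing nodes simply fall silent and retry later, arbitration is non-destructive: the winning frame is transmitted without delay.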
CAN Standards
There are two ISO standards for CAN. The two differ in their physical layer descriptions.
ISO 11898 handles high-speed applications up to 1 Mbit/s.
ISO 11519 can go up to an upper limit of 125 kbit/s.
Possesses sophisticated error detection and handling capability.
Has high immunity to electromagnetic interference.
Has a short latency time for high-priority messages.
The total number of nodes is not limited by the protocol itself.
Allows very easy adaptation and entails flexible extension and modification features.
BASIC CAN Controller

The basic topology of the CAN controller is shown in Figure 2 below. The basic controller involves FIFOs for message transfers; it has an enhanced counterpart in the Full-CAN controller, which uses message BUFFERS instead.
THE CAN 2.0B PROTOCOL / MESSAGE FRAME: Idle | Arbitration Field (SOF, 11-bit identifier, RTR) | Control Field (r1, r0, DLC) | Data Field (0-8 bytes) | CRC Field (15 bits, delimiter) | ACK (slot, delimiter) | EOF | Intr | Idle
FIGURE 2: BASIC CAN controller topology — a protocol controller with global status and control registers, a 10-byte transmit buffer, and an acceptance filter feeding a 10-byte receive buffer, connected through bus interfaces to the CAN bus on one side and the host CPU system on the other.
Module
5
Embedded
Communications
Lesson
27
Wireless Communication
Instructional Objectives
After going through this lesson the student would be able to
Describe the benefits and issues in wireless communication
Distinguish between WLAN, WPAN and their different implementations like Ricochet,
HiperLAN, HomeRF and Bluetooth
Choose a particular wireless communication standard to suit an application
Wireless Communication
Third generation wireless technologies are being developed to enable personal, high-speed interactive connectivity to wide area networks (WANs). The IEEE 802.11x wireless technologies find themselves with an increasing presence in corporate and academic office spaces, buildings, and campuses, and are making slow but steady inroads into public areas such as airports and coffee bars. WAN, LAN and PAN technologies enable device connectivity to infrastructure-based services, either through a campus or a corporate backbone intranet.

The other end of the coverage spectrum is occupied by the short-range embedded wireless connectivity technologies that allow devices to communicate with each other directly, without the need for an established infrastructure. At this end of the coverage spectrum, wireless technologies like Ricochet, Bluetooth, etc. offer the benefits of RF-based connectivity: omni-directionality and the elimination of the line-of-sight requirement. The embedded connectivity space resembles a communication bubble that follows people around and empowers them to connect their personal devices with other devices that enter the bubble. Connectivity in this bubble is spontaneous and ephemeral, and can involve several devices of diverse computing capabilities, unlike wireless LAN solutions, which are designed for communication between devices of sufficient computing power and battery capacity.

The table below shows a short comparison of various technologies in the wireless arena.
In this lesson we look at the most commonly adopted variants of the different wireless technologies mentioned above.
WLANs-IEEE 802.11X
This is the most prominent technology standard for WLANs (Wireless Local Area Networks). It comprises a PHY (Physical Layer) and a MAC (Medium Access Control) layer. It allows specific carrier frequencies in the 2.4 GHz range with data rates of 1 or 2 Mbps. Further enhancements to the same technology have led to the modern-day protocol known as 802.11b, which provides a basic data rate of 11 Mbps and a fall-back rate of 5.5 Mbps. All these technologies operate in the internationally available 2.4 GHz ISM band. Both the IEEE 802.11 and 802.11b standards are capable of providing communications between a number of terminals as an ad hoc network using peer-to-peer mode (see figures at the end), as a client/server wireless configuration (see figures at the end), or as a complicated distributed network (see figures at the end). All these networks require wireless cards (PCMCIA — Personal Computer Memory Card International Association — cards) and wireless LAN access points. There are two transmission types for these technologies: Frequency Hopping Spread Spectrum (FHSS) and Direct Sequence Spread Spectrum (DSSS). Whereas FHSS is primarily used for low-power, low-range applications, DSSS is popular with Ethernet-like data rates. In the ad-hoc network mode, as there is no central controller, the wireless access cards use the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) protocol to resolve shared access to the channel. In the client/server configuration, many PCs and laptops, physically close to each other (20 to 500 meters), can be linked to a central hub (known as the access point) that serves as a bridge between them and the wired network. The wireless access cards provide the interface between the PCs and the antenna, while the access point serves as the wireless LAN hub. The access point is typically mounted high up, near the ceiling, and can support 115-250 users for receiving, buffering and transmitting data between the WLAN and the wired network. Access points can be programmed to select one of the hopping sequences, and the PCMCIA cards tune in to the corresponding sequence. The WLAN bridge could also be implemented using line-of-sight directional antennas. Handover and roaming can also be supported across the various access points. Encryption is also supported using the optional shared-key RC4 (Ron's Code 4 or Rivest's Cipher) algorithm.
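The CSMA/CA contention described above can be sketched as a toy simulation. This is a simplified model with invented names (`send`, `channel_busy`) and is not the actual 802.11 DCF procedure, which involves slot times, inter-frame spaces and ACK timeouts:

```python
import random

# Simplified sketch of CSMA/CA: before transmitting, a station senses the
# channel; if it is busy, the station defers and draws a random backoff from
# a contention window that doubles after each deferral (binary exponential
# backoff). Toy model only; slot timing is not simulated.

def send(station, channel_busy, cw_min=16, cw_max=1024, max_attempts=6):
    cw = cw_min
    for attempt in range(max_attempts):
        if not channel_busy():                 # carrier sense: channel idle?
            return attempt                     # transmit; report deferrals made
        random.randrange(cw)                   # draw a random backoff count
        cw = min(2 * cw, cw_max)               # widen the contention window
    raise RuntimeError(f"{station}: channel never became free")

# Toy channel that reports busy for the first two sensing attempts.
state = {"busy_polls": 2}
def channel_busy():
    if state["busy_polls"] > 0:
        state["busy_polls"] -= 1
        return True
    return False

print(send("laptop-1", channel_busy))   # 2 -> transmitted after two deferrals
```

Randomising the backoff is what makes collision *avoidance* work: two stations that deferred to the same busy period are unlikely to retry in the same slot.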
Peer-to-peer wireless mode: handheld devices such as Palm Pilot PDAs communicating directly with one another. Client/server configuration: stations linked through a wireless LAN access point that bridges to the wired network.
Wired distributed network: stations attached to access points that are interconnected through a wired distribution system.
WPANs-802.15X

WPANs (Wireless Personal Area Networks) work as short-range wireless networks. The various WPAN protocols and their interfaces have been and are being standardized by the IEEE 802.15 WG (WPAN Working Group). There are four divisions of this standardization.

1. Under the IEEE 802.15 WPAN/Bluetooth Task Group
This group deals with support and development of applications requiring medium-rate WPANs (e.g. Bluetooth). These WPANs are supposed to handle the technicalities of PDA and cell-phone communications, and also possess the QoS needed for voice applications.

2. Under the IEEE 802.15 Coexistence Task Group
This division deals with developing specifications for the unlicensed ISM band. This standard, also called 802.15.2, is developing recommendations to facilitate the coexistence of WPANs (802.15) and WLANs (802.11), such that applications like Bluetooth and microwave devices can operate flawlessly in the ISM range.
Ricochet

This provides secure mobile access to the desktop from outside an office. The service is provided by Metricom, a commercial Internet Service Provider (ISP), and was primarily offered at airports and in some selected areas. The Ricochet Network is a wide area wireless network system using a spread spectrum packet switching technique and Metricom's patented frequency-hopping, checkerboard architecture. The network operates within the license-free (902-928 MHz) ISM band. A Ricochet wireless microcellular data network (MCDN) is shown in the figure below.
Ricochet wireless microcellular data network: computers with modem radios communicate with wireless access points, which connect through routers and a network interconnection facility to a gateway and a name server.
st
it y
It consists of shoebox sized radio transceivers, also called microcell radios, and are typically
.c
mounted to streetlights or utility poles. The microcells require only a small amount of power
from the streetlight itself with the help of a special adapter. Each micro cell radio employs 162
w
frequency-hopping channels and uses a randomly selected hopping sequence. This allows for a
w
very secure network to all subscribers. Within a 20-sq-mile radius containing about 100
w
microcell radios Richochet installs wired access points (WAPs) to collect and convert RF
packets into a format for transmission through a T1 connection. The Richochet Network has a
backbone called the name server, by checking the subscriber serial number. Data packets
between a Ricochet modem and a micro cell radio may take different routes during
transmissions. They can be routed to another Richochet modem or to one of the Internet
gateways, a telephone system, an X.25 network, and LANs or other corporate intranets, The
telephone system gateway provides telephone modem access (TMA), which can also be used to
connect to online internet services.
Services
Ricochet provides immediate, dependable, and secure connections without the cost and complexities of land-based phone lines, dial-up connections, or cellular modems. Ricochet modem features include 28,800 bps operation and 24-hour access. The Ricochet wireless network is based on frequency-hopping, spread-spectrum packet radio technology, with transmissions randomly hopping every two-fifths of a second over 162 channels.
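The hopping behaviour described above (random hops over 162 channels every two-fifths of a second) can be illustrated with a shared-seed hop-sequence sketch. Metricom's actual hopping sequences are proprietary; the seed-based scheme below is only an illustration of the principle:

```python
import random

# Sketch of frequency-hopping spread spectrum in the Ricochet style:
# transmitter and receiver derive the same pseudo-random hopping sequence
# over 162 channels from a shared seed, retuning every 0.4 s. A listener
# that does not know the seed cannot predict the next channel, which is
# what makes the scheme hard to intercept. (Illustrative model only.)

CHANNELS = 162
DWELL_TIME_S = 0.4      # two-fifths of a second per hop

def hop_sequence(seed, hops):
    rng = random.Random(seed)                  # shared secret seed
    return [rng.randrange(CHANNELS) for _ in range(hops)]

# Both ends derive the same sequence, so they stay in lock-step:
tx = hop_sequence(seed=0xC0FFEE, hops=10)
rx = hop_sequence(seed=0xC0FFEE, hops=10)
assert tx == rx
print(tx)                                      # the same 10 channels at both ends
print(f"one pass over this sequence takes {len(tx) * DWELL_TIME_S:.1f} s")
```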
HomeRF
This technology comes under ad-hoc networking, which spans an area such as an enclosed home, an office building, or a warehouse floor in a workshop. A specification for wireless communications in the home, called the Shared Wireless Access Protocol (SWAP), has been developed. Some common applications targeted are:
access to a public network telephone (isochronous multimedia) and the Internet (data),
entertainment networks (cable television, digital audio and video with IEEE 1394),
transfer and sharing of data and resources (printer, Internet connection, etc.), and
home control and automation.

Advantages of HomeRF

In HomeRF, the same connection can be shared for both voice and data among the devices at the same time. This technology provides a platform for a broad range of interoperable consumer devices for wireless digital communication between PCs and consumer electronic devices anywhere in and around the home.

The Working Group

The working group comprises Compaq Computer Corp., Ericsson Enterprise Networks, Hewlett-Packard Co., IBM, Intel Corp., Microsoft Corp., Motorola Corp. and several others. A typical HomeRF network is shown below.
Architecture-HomeRF: a main PC, with a cable modem and phone connection, wirelessly networked with other PCs, a cell phone, a wireless headset, a pager, a data pad, a television, a handheld communicator, and appliances such as a microwave oven and a fridge.
Typical characteristics

Uses the 2.4 GHz ISM band
Data rate: 2 Mbps and 1 Mbps
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized (infrastructure) and ad-hoc (infrastructure-less) communication
Support for simultaneous voice and data transmissions
Provides six audio connections at 32 kbps with 20 ms latency
Maximum data throughput: 1.2 Mbps
Supports a low-power paging mode
Provides QoS to voice-only devices and best effort for data-only devices.
HiperLAN

"HiperLAN" or "High-Performance LAN" has been designed specifically for an ad-hoc environment.
Topology: packet-oriented
Supports both centralized and ad-hoc communication.
Supports 25 audio connections at 32 kbps with 10 ms latency, a video connection of 2 Mbps with 100 ms latency, and a data rate of 13.4 Mbps. It supports MPEG and other state-of-the-art real-time digital audio and video standards.
HiperLANs are available in two types:
o TYPE 1: This has a distributed MAC with QoS provisions and is based on GMSK (Gaussian minimum shift keying).
o TYPE 2: This has a centralized, scheduled MAC and is based on OFDM.

Objectives of HiperLAN
Provide QoS to build multiservice networks
Provide strong security
Handoff when moving between local area and wide area
Increased throughput
Ease of use, deployment, and maintenance
Affordability and scalability
A typical HiperLAN system is shown in the figure below:
HiperLAN System: multiple access points (APs) connected to a fixed network.
A Bluetooth Connection

Bluetooth also provides a universal bridge to existing data networks and a mechanism to form small private ad hoc groups of connected devices away from fixed network architectures.
Bluetooth wireless communication technologies operate in the 2.4 GHz range. There are certain propositions related to RF communication in the 2.4 GHz spectrum which device developers must follow. This is important for an organized use of the spectrum, because it is globally unlicensed; as such, it is bound by specific regulations put forth by various countries in their respective territories. In the context of wireless communications, the RF spectrum has been divided into 79 channels, where bandwidth is limited to 1 MHz per channel. Frequency-hopping spread-spectrum communication must be incorporated, and proper mechanisms for interference anticipation and removal should also be present. This is essential because the 2.4 GHz spectrum is unlicensed and hence more vulnerable to signal congestion from the increasing number of new users trying to communicate within the bandwidth.
The two different communication topologies of Bluetooth PANs are the piconet and the scatternet. They are described in brief below.
The Piconet

A piconet within a proximity sphere.

The slave names starting with 'A' represent the active slaves, and these are linked to the master with continuous lines, meaning 'ACTIVE'. The slave names starting with 'P' represent the parked slaves; dashed lines connect them to the master, meaning that the connection is not continuous but the devices are still in the piconet, i.e., 'PARKED'. Other slaves, with names starting with 'S', are in STAND-BY; these are actually outside the piconet but inside the proximity sphere.
The Scatternet

A scatternet formed by two piconets, A and B, each with its own master and its active (AS) and parked (PS) slaves.
A scatternet is formed when two or more piconets fall into each other's proximity. More precisely, a scatternet is formed when two or more piconets at least partially overlap in time and space. Within a scatternet, a slave can participate in multiple piconets by establishing connections and synchronizing with different masters in its proximity. A single device may act as master in one piconet and at the same time as slave in another. A practical example of a scatternet is mobile communication, in which devices move frequently in and out of the proximity of other devices. The figure above shows a typical scatternet.
Bluetooth Specifications

Typical Bluetooth specifications have been characterized in the table below.

Bluetooth Core Protocols: the radio layer at the bottom, with the baseband layer above it.
A brief description is as follows.

The Service Discovery Protocol (SDP) provides the means for an application to discover which services are provided by, or available through, a Bluetooth device. It also allows applications to determine the characteristics of those available services. The Logical Link Control and Adaptation Layer Protocol (L2CAP) supports higher-level protocol multiplexing, packet segmentation and reassembly, and the conveying of QoS (Quality of Service) information. The link managers (on either side) use the Link Manager Protocol (LMP) for link setup and control. The baseband and link control layer enables the physical RF link between Bluetooth units forming a piconet. It provides two different packet types, SCO (synchronous connection-oriented) and ACL (asynchronous connectionless), which can be transmitted in a multiplexed manner on the same RF link. Different master/slave pairs of the same piconet can use different link types, and the link type may change arbitrarily during a session. Each link type supports up to sixteen different packet types. Four of these are control packets and are common to both SCO and ACL links. Both link types use a TDD scheme for full-duplex transmissions. The SCO link is symmetric and typically supports time-bounded voice traffic. SCO packets are transmitted over reserved intervals. Once the connection is established, both master and slave units may send SCO packet types, which allow both voice and data transmission, with only the data portion being retransmitted when corrupted.
Operational States

The operational state machine: from STAND-BY a device may enter INQUIRY (with an inquiry response) or PAGE (with master response and slave response sub-states), leading to the CONNECTION state.

State Description

STANDBY: This is the default state and the lowest power-consuming one too. Only the Bluetooth clock operates, in the low-power mode.

INQUIRY: In this state a device seeks out and gets familiar with the identities of other devices in its proximity. The other devices must have their Inquiry Scan state ENABLED if they want to entertain queries from other devices.

PAGE: In this state the master of a piconet invites other devices to join in. To entertain this request, the invitee must have its Page Scan state ENABLED.
A device may bypass the inquiry state if the identity of the device it wants to page is previously known (see the figure above). The figure above also indicates that any member of a piconet, not necessarily the master, may still perform INQUIRY and PAGE operations for additional devices, thus paving the way for a scatternet.
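The STANDBY/INQUIRY/PAGE/CONNECTION transitions described above can be summarized as a small state machine. This sketch is simplified (the real baseband also has scan and response sub-states), and the event names are invented for the example:

```python
# Minimal sketch of the Bluetooth operational state machine described above.
# Simplified: scan and master/slave response sub-states are omitted.

TRANSITIONS = {
    "STANDBY":    {"inquire": "INQUIRY",
                   "page": "PAGE"},            # page directly if identity known
    "INQUIRY":    {"found": "PAGE", "abort": "STANDBY"},
    "PAGE":       {"accepted": "CONNECTION", "abort": "STANDBY"},
    "CONNECTION": {"detach": "STANDBY",
                   "inquire": "INQUIRY",       # a connected member may still
                   "page": "PAGE"},            # inquire/page -> scatternet
}

class BluetoothDevice:
    def __init__(self):
        self.state = "STANDBY"                 # default, lowest-power state

    def event(self, name):
        try:
            self.state = TRANSITIONS[self.state][name]
        except KeyError:
            raise ValueError(f"event {name!r} not allowed in {self.state}")
        return self.state

dev = BluetoothDevice()
dev.event("inquire")          # discover identities of devices in proximity
dev.event("found")
dev.event("accepted")
print(dev.state)              # CONNECTION
```

Note that `CONNECTION` still allows `inquire` and `page` events, which is exactly the property that lets a piconet member grow the network into a scatternet.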
Module
6
Embedded System
Software
Lesson
28
Introduction to Real-Time
Systems
... was marked by expensive computers that were quite unaffordable by individuals, and each computer served a large number of users. The PC era saw the emergence ...

It can be easily inferred from the above discussion that in recent times real-time computers
have become ubiquitous and have permeated large number of application areas. At present, the
computers used in real-time applications vastly outnumber the computers that are being used in
conventional applications. According to an estimate [3], 70% of all processors manufactured
world-wide are deployed in real-time embedded applications. While it is already true that an
overwhelming majority of all processors being manufactured are getting deployed in real-time
applications, what is more remarkable is the unmistakable trend of steady rise in the fraction
of all processors manufactured world-wide finding their way to real-time applications.
Some of the reasons attributable to the phenomenal growth in the use of real-time
systems in the recent years are the manifold reductions in the size and the cost of the
computers, coupled with the magical improvements to their performance. The availability
of computers at rapidly falling prices, reduced weight, rapidly shrinking sizes, and their
increasing processing power have together contributed to the present scenario. Applications
which not too far back were considered prohibitively expensive to automate can now be
affordably automated. For instance, when microprocessors cost several tens of thousands of
rupees, they were considered to be too expensive to be put inside a washing machine; but when
they cost only a few hundred rupees, their use makes commercial sense.
The rapid growth of applications deploying real-time technologies has been matched by the evolutionary growth of the underlying technologies supporting the development of real-time systems. In this book, we discuss some of the core technologies used in developing real-time systems. However, we restrict ourselves to software issues only and keep hardware discussions to the bare minimum. The software issues that we address are quite expansive, in the sense that besides the operating system and program development issues, we discuss the networking and database issues.

In this chapter, we restrict ourselves to some introductory and fundamental issues. In the next three chapters, we discuss some core theories underlying the development of practical real-time and embedded systems. In the subsequent chapter, we discuss some important features of commercial real-time operating systems. After that, we shift our attention to real-time communication technologies and databases.
1.1. What is Real-Time?

Real-time is a quantitative notion of time. Real-time is measured using a physical (real) clock. Whenever we quantify time using a physical clock, we deal with real time. An example use of this quantitative notion of time can be observed in a description of an automated chemical plant. Consider this: when the temperature of the chemical reaction chamber attains a certain predetermined temperature, say 250°C, the system automatically switches off the heater within a predetermined time interval, say within 30 milliseconds. In this description of a part of the behavior of a chemical plant, the time value that was referred to denotes the readings of some physical clock present in the plant automation system.
In contrast to real time, logical time (also known as virtual time) deals with a qualitative
notion of time and is expressed using event ordering relations such as before, after, sometimes,
eventually, precedes, succeeds, etc. While dealing with logical time, time readings from a
physical clock are not necessary for ordering the events. As an example, consider the following
part of the behavior of library automation software used to automate the book-keeping activities
of a college library: after a "query book" command is given by the user, details of all matching books are displayed by the software. In this example, the events "issue of query book command" and "display of results" are logically ordered in terms of which event follows the other. But no quantitative expression of time was required. Clearly, this example behavior is
devoid of any real-time considerations. We are now in a position to define what a real-time
system is:
A system is called a real-time system, when we need quantitative expression of time (i.e.
real-time) to describe the behavior of the system.
Remember that in this definition of a real-time system, it is implicit that all quantitative time measurements are carried out using a physical clock. A chemical plant, whose part behavior description is — when the temperature of the reaction chamber attains a certain predetermined temperature value, say 250°C, the system automatically switches off the heater within, say, 30 milliseconds — is clearly a real-time system. Our examples so far were restricted to the description of partial behavior of systems. The complete behavior of a system can be described by listing its response to various external stimuli. It may be noted that all the clauses in the description of the behavior of a real-time system need not involve quantitative measures of time. That is, large parts of a description of the behavior of a system may not have any quantitative expressions of time at all, and it would still qualify as a real-time system. Any system whose behavior can completely be described without using any quantitative expression of time is of course not a real-time system.
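The chemical-plant clause above is exactly where the quantitative notion of time enters: the 30 ms bound is measured on a physical clock. A minimal sketch, with the temperature sensor and heater actuator as illustrative stand-ins:

```python
import time

# Sketch of the chemical-plant example above: the quantitative, physical-clock
# notion of time appears as an explicit deadline. When the chamber reaches
# 250 °C, the heater must be switched off within 30 ms, and that interval is
# measured on a real (monotonic) clock. The sensor/actuator callables are
# illustrative stand-ins, not a real plant interface.

DEADLINE_S = 0.030                     # 30 milliseconds

def control_step(read_temperature, switch_off_heater):
    if read_temperature() >= 250.0:    # threshold condition from the example
        start = time.monotonic()       # physical clock reading
        switch_off_heater()
        elapsed = time.monotonic() - start
        if elapsed > DEADLINE_S:
            raise RuntimeError(f"deadline miss: {elapsed * 1e3:.1f} ms > 30 ms")
        return elapsed
    return None                        # below threshold: nothing to do

elapsed = control_step(lambda: 251.7, lambda: None)
print(f"heater switched off in {elapsed * 1e3:.3f} ms")
```

Contrast this with the library example: the query/display events there are merely ordered, and no clock reading like `time.monotonic()` ever needs to appear.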
1.2. Applications of Real-Time Systems

Real-time systems have, of late, found applications in wide-ranging areas. In the following, we list some of the prominent areas of application of real-time systems and, in each identified case, discuss a few example applications in some detail. As we can imagine, the list would become very vast if we tried to exhaustively list all areas of application of real-time systems. We have therefore restricted our list to only a handful of areas, and out of these we have explained only a few selected applications, to conserve space. We have pointed out the quantitative notions of time used in the discussed applications. The examples we present are important to our subsequent discussions and will be referred to in later chapters whenever required.
1.2.1. Industrial Applications
Industrial applications constitute a major usage area of real-time systems. A few examples of
industrial applications of real-time systems are: process control systems, industrial automation systems, SCADA applications, test and measurement equipment, and robotic equipment.
Chemical plant control systems are essentially a type of process control application. In an
automated chemical plant, a real-time computer periodically monitors plant conditions. The
plant conditions are determined based on current readings of pressure, temperature, and
chemical concentration of the reaction chamber. These parameters are sampled
periodically. Based on the values sampled at any time, the automation system decides on the
corrective actions necessary at that instant to maintain the chemical reaction at a certain rate.
Each time the plant conditions are sampled, the automation system should decide on the
exact instantaneous corrective actions required such as changing the pressure,
temperature, or chemical concentration, and carry out these actions within certain predefined time bounds. Typically, the time bounds in such a chemical plant control application range from a few microseconds to several milliseconds.
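The decision taken at each sampling instant can be sketched in code. This is only an illustration: the function name, setpoint, and tolerance below are invented for the example and are not taken from any particular plant.

```c
/* Hypothetical sketch of one sampling instant in a plant-control loop:
   compare a sampled parameter (e.g. chamber temperature) against its
   setpoint and return the corrective adjustment to apply before the
   next sample is taken. The setpoint and tolerance are illustrative. */
static double corrective_action(double sampled, double setpoint, double tolerance)
{
    double error = sampled - setpoint;
    if (error > tolerance || error < -tolerance)
        return -error;   /* drive the parameter back toward the setpoint */
    return 0.0;          /* within bounds: no action needed this cycle */
}
```

In a real plant the value returned would be applied to the actuators, and the whole call would have to complete within the predefined time bound before the next sample arrives.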
(Figure: engine, paint, and chassis stations along a conveyor belt, leading to the finished car)
Fig. 28.1 Schematic Representation of an Automated Car Assembly Plant
Example 3: Supervisory Control And Data Acquisition (SCADA)
SCADA systems are a category of distributed control systems used in many industries. A
SCADA system helps monitor and control a large number of distributed events of interest. In
SCADA systems, sensors are scattered at various geographic locations to collect raw data
(called events of interest). These data are then processed and stored in a real-time database.
The database models (or reflects) the current state of the environment. The database is
updated frequently to make it a realistic model of the up-to-date state of the environment. An
example of a SCADA application is an Energy Management System (EMS). An EMS helps
to carry out load balancing in an electrical energy distribution network. The EMS senses the
energy consumption at the distribution points and computes the load across different phases
of power supply. It also helps dynamically balance the load. Another example of a SCADA
system is a system that monitors and controls traffic in a computer network. Depending on
the sensed load in different segments of the network, the SCADA system makes the
router change its traffic routing policy dynamically. The time constraint in such a SCADA
application is that the sensors must sense the system state at regular intervals (say every few
milliseconds) and the same must be processed before the next state is sensed.
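The timing constraint just stated can be expressed as a simple feasibility check; the function name and the millisecond figures in the test are invented for illustration.

```c
#include <stdbool.h>

/* Minimal sketch of the SCADA timing constraint: the processing of one
   sensed state must finish before the next state is sensed, i.e. the
   processing time must not exceed one sampling period. */
static bool sampling_feasible(double processing_ms, double period_ms)
{
    return processing_ms <= period_ms;
}
```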
1.2.2. Medical
A few examples of medical applications of real-time systems are: robots, MRI scanners, radiation therapy equipment, bedside monitors, and computerized axial tomography (CAT) scanners.
Robots have become very popular nowadays and are being used in a wide variety of medical applications. An application that we discuss here is a robot used in retrieving displaced radioactive materials. Radioactive materials such as Cobalt and Radium are used for the treatment of cancer. At times during treatment, the radioactive Cobalt (or Radium) gets dislocated and falls down. Since human beings cannot come near a radioactive material, a robot is used to restore it to its proper position. The robot walks into the room containing the radioactive material, picks it up, and restores it to its proper position. The robot has to sense its environment frequently and, based on this information, plan its path. The real-time constraint on the path-planning task of the robot is that unless it plans the path fast enough after an obstacle is detected, it may collide with the obstacle. The time constraints involved here are of the order of a few milliseconds.
Fig. 28.2 A Real-Time System Embedded in an MPFI Car
Cellular systems have become a very popular means of mobile communication. A cellular
system usually maps a city into cells. In each cell, a base station monitors the mobile
handsets present in the cell. Besides, the base station performs several tasks such as
locating a user, sending and receiving control messages to a handset, keeping track of
call details for billing purposes, and hand-off of calls as the mobile moves. Call
hand-off is required when a mobile moves away from a base station. As a mobile moves
away, its received signal strength (RSS) falls at the base station. The base station monitors
this and as soon as the RSS falls below a certain threshold value, it hands-off the
details of the on-going call of the mobile to the base station of the cell to which the mobile
has moved. The hand-off must be completed within a sufficiently small predefined time
interval so that the user does not feel any temporary disruption of service during the hand-off.
Typically call hand-off is required to be achieved within a few milliseconds.
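The hand-off trigger described above can be sketched as follows. The structure, field names, and the RSS threshold used in the example are invented for illustration; a real base station tracks far more state per call.

```c
#include <stdbool.h>

/* Hypothetical sketch of the hand-off decision: the base station
   monitors a mobile's received signal strength (RSS) and initiates
   hand-off once the RSS falls below a threshold. */
typedef struct {
    double rss_dbm;     /* last measured RSS for the mobile */
    bool   handed_off;  /* set once hand-off has been initiated */
} call_state;

static void monitor_rss(call_state *c, double measured_dbm, double threshold_dbm)
{
    c->rss_dbm = measured_dbm;
    if (!c->handed_off && measured_dbm < threshold_dbm)
        c->handed_off = true;  /* transfer call details to the next cell's base station */
}
```

The real-time constraint is on how quickly the hand-off completes after this flag is raised, so that the user perceives no disruption of service.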
1.2.6. Aerospace
A few important uses of real-time systems in aerospace applications are: avionics, flight simulation, airline cabin management systems, satellite tracking systems, and on-board flight computers.
sampled data, the on-board computer computes the X, Y, and Z co-ordinates of the current aircraft position and compares them with the pre-specified track data. Before the next sample values are obtained, it computes the deviation from the specified track values and takes any corrective actions that may be necessary. In this case, the sampling of the various parameters and their processing need to be completed within a few microseconds.
1.2.7. Internet and Multimedia Applications
Important uses of real-time systems in multimedia and Internet applications include: video conferencing and multimedia multicast, and Internet routers and switches.
Example 9: Video Conferencing
In a video conferencing application, video and audio signals are generated by cameras and
microphones respectively. The data are sampled at a certain pre-specified frame rate. These
are then compressed and sent as packets to the receiver over a network. At the receiver-end,
packets are ordered, decompressed, and then played. The time constraint at the receiver-end
is that the receiver must process and play the received frames at a predetermined constant
rate. Thus if thirty frames are to be shown every minute, once a frame play-out is complete,
the next frame must be played within two seconds.
Cell phones are possibly the fastest growing segment of consumer electronics. A cell phone at any point of time carries out a number of tasks simultaneously. These include: converting the electrical signals generated by the microphone into digital form using digital signal processing (DSP) techniques, converting received digital signals back into audible voice output, and sampling incoming base station signals in the control channel. A cell phone responds to the communications received from the base station within certain specified time bounds. For example, a base station might command a cell phone to switch the on-going communication to a specific frequency. The cell phone must comply with such commands from the base station within a few milliseconds.
In a railway reservation system, a central repository maintains the up-to-date data on booking
status of various trains. Ticket booking counters are distributed across different geographic
locations. Customers queue up at different booking counters and submit their reservation
requests. After a reservation request is made at a counter, it normally takes only a few
seconds for the system to confirm the reservation and print the ticket. A real-time constraint
in this application is that once a request is made to the computer, it must print the ticket or
display the seat unavailability message before the average human response time (about 20
seconds) expires, so that the customers do not notice any delay and get a feeling of having
obtained instant results. However, as we discuss a little later (in Section 1.6), this application
is an example of a category of applications that is in some aspects different from the other
discussed applications. For example, even if the results are produced just after 20 seconds,
nothing untoward is going to happen - this may not be the case with the other discussed
applications.
conditioning block, which in turn is connected to the input interface. The output interface, output conditioning unit, and the actuator are interfaced in a complementary manner. In the following, we briefly describe the roles of the different functional blocks of a real-time system.
Sensor: A sensor converts some physical characteristic of its environment into electrical signals. An example of a sensor is a photo-voltaic cell, which converts light energy into electrical energy. A wide variety of temperature and pressure sensors are also used. A temperature sensor typically operates based on the principle of a thermocouple. Temperature sensors based on many other physical principles also exist. For example, one type of temperature sensor employs the principle of variation of electrical resistance with temperature (called a thermistor). A pressure sensor typically operates based on the piezoelectricity principle. Pressure sensors based on other physical principles also exist.
(Figure: a Sensor feeds the Input Conditioning Unit and Input Interface of the Real-Time Computer; the Output Interface and Output Conditioning Unit drive an Actuator; a Human Computer Interface connects the computer to the Operators)
Signal Conditioning Units: The electrical signals produced by a computer can rarely
be used to directly drive an actuator. The computer signals usually need conditioning
before they can be used by the actuator. This is termed output conditioning. Similarly, input
conditioning is required to be carried out on sensor signals before they can be accepted
by the computer. For example, analog signals generated by a photo-voltaic cell are normally
in the millivolt range and need to be conditioned before they can be processed by a
computer. The following are some important types of conditioning carried out on raw
signals generated by sensors and digital signals generated by computers:
4. Signal Mode Conversion: A type of signal mode conversion that is frequently carried out during signal conditioning involves changing direct current into alternating current and vice-versa. Another type of signal mode conversion that is frequently used is conversion of analog signals to a constant-amplitude pulse train such that the pulse rate or pulse width is proportional to the voltage level. Conversion of analog signals to a pulse train is often necessary for input to systems such as transformer-coupled circuits that do not pass direct current.
(Figure: from the processor bus, a D/A register feeds a D/A converter, whose output goes to the output signal conditioning unit)
Fig. 28.4 An Output Interface
Interface Unit: Normally commands from the CPU are delivered to the actuator through an
output interface. An output interface converts the stored voltage into analog form and
then outputs this to the actuator circuitry. This of course would require the value
generated to be written on a register (see Fig. 28.4). In an output interface, in order to
produce an analog output, the CPU selects a data register of the output interface and writes
the necessary data to it. The two main functional blocks of an output interface are shown in
Fig. 28.4. The interface takes care of the buffering and the handshake control aspects. Analog
to digital conversion is frequently deployed in an input interface. Similarly, digital to analog
conversion is frequently used in an output interface.
In the following, we discuss the important steps of analog to digital signal conversion (ADC).
Analog to Digital Conversion: Digital computers cannot process analog signals. Therefore, analog signals need to be converted to digital form. Analog signals can be converted to digital form using circuitry whose block diagram is shown in Fig. 28.7. Using
the block diagram shown in Fig. 28.7, analog signals are normally converted to digital form
through the following two main steps:
Fig. 28.5 Continuous Analog Voltage
Sample the analog signal (shown in Fig. 28.5) at regular intervals. This sampling can be done by a capacitor circuit that stores the voltage levels. The stored voltage levels can be made discrete. After sampling the analog signal (shown in Fig. 28.5), a step waveform as shown in Fig. 28.6 is obtained.
Convert the stored value to a binary number by using an analog to digital converter (ADC) as shown in Fig. 28.7 and store the digital value in a register.
Fig. 28.6 Analog Voltage Converted to Discrete Form
Digital to analog conversion can be carried out through a complementary set of operations. We leave it as an exercise to the reader to figure out the details of the circuitry that can perform the digital to analog conversion (DAC).
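The quantization step of the conversion described above can be sketched in code. The 8-bit resolution and the 0-5 V range used in the example are assumptions chosen for illustration, not values from the text.

```c
/* Illustrative sketch of the quantization step of A/D conversion:
   map a sampled voltage in [v_min, v_max] onto an n-bit binary code.
   Out-of-range samples clamp to the lowest or highest code. */
static unsigned quantize(double sample, double v_min, double v_max, unsigned bits)
{
    unsigned max_code = (1u << bits) - 1u;     /* highest code value */
    if (sample <= v_min) return 0u;
    if (sample >= v_max) return max_code;
    double frac = (sample - v_min) / (v_max - v_min);
    return (unsigned)(frac * max_code + 0.5);  /* round to nearest code */
}
```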
1.4. Characteristics of Real-Time Systems
We now discuss a few key characteristics of real-time systems. These characteristics distinguish real-time systems from non-real-time systems. However, the reader may note that all the discussed characteristics may not be applicable to every real-time system. Real-time systems cover such an enormous range of applications and products that a generalization of the characteristics into a set that is applicable to each and every system is difficult. Different categories of real-time systems may exhibit the characteristics that we identify to different extents, or may not exhibit some of the characteristics at all.
1. Time constraints: Every real-time task is associated with some time constraints. One form of time constraint that is very common is a deadline associated with a task. A task deadline specifies the time before which the task must complete and produce its results. Other types of timing constraints are delay and duration (see Section 1.7). It is the responsibility of the real-time operating system (RTOS) to ensure that all tasks meet their respective time constraints. We shall examine in later chapters how an RTOS can ensure that tasks meet their respective timing constraints through appropriate task-scheduling strategies.
2. New Correctness Criterion: The notion of correctness in real-time systems differs from that used in the context of traditional systems. In real-time systems, correctness implies not only logical correctness of the results, but also that the results are produced on time. A logically correct result produced after the deadline is considered an incorrect result.
discussed in Section 1.2. An example of an embedded system that we shall often refer to is the Multi-Point Fuel Injection (MPFI) system discussed in Example 6 of Sec. 1.2.
4. Safety-Criticality: For traditional non-real-time systems, safety and reliability are independent issues. However, in many real-time systems these two issues are intricately bound together, making them safety-critical. Note that a safe system is one that does not cause any damage even when it fails. A reliable system, on the other hand, is one that can operate for long durations of time without exhibiting any failures. A safety-critical system is required to be highly reliable, since any failure of the system can cause extensive damage. We elaborate on this issue in Section 1.5.
5. Concurrency: A real-time system usually needs to respond to several independent events within very short and strict time bounds. For instance, consider a chemical plant automation system (see Example 1 of Sec. 1.2), which monitors the progress of a chemical reaction and controls the rate of reaction by changing different parameters of the reaction such as pressure, temperature, and chemical concentration. These parameters are sensed using sensors fixed in the chemical reaction chamber. These sensors may generate data asynchronously at different rates. Therefore, the real-time system must process data from all the sensors concurrently, otherwise signals may be lost and the system may malfunction. Such systems can be considered non-deterministic, since the behavior of the system depends on the exact timing of its inputs. A non-deterministic computation is one in which two runs using the same set of input data can produce two distinct sets of output data.
6. Distributed and Feedback Structure: In many real-time systems, the different components
of the system are naturally distributed across widely spread geographic locations. In such
systems, the different events of interest arise at geographically separate locations. These events may often have to be handled locally, with responses produced to them, to prevent overloading of the underlying communication network. Hence, the sensors and
the actuators may be located at the places where events are generated. An example of such a
system is a petroleum refinery plant distributed over a large geographic area. At each data
source, it makes good design sense to locally process the data before being passed on to a
central processor.
Many distributed as well as centralized real-time systems have a feedback structure as shown
in Fig. 28.9. In these systems, the sensors usually sense the environment periodically. The
sensed data about the environment is processed to determine the corrective actions necessary.
The results of the processing are used to carry out the necessary corrective actions on the
environment through the actuators, which in turn again cause a change to the required
characteristics of the controlled environment, and so on.
(Figure: sensors sense the Environment; sensor processing feeds the computation block, whose results drive actuator processing and the Actuator, closing the loop on the Environment)
Fig. 28.9 Feedback Structure of Real-Time Systems
7. Task Criticality: Task criticality is a measure of the cost of failure of a task. Task criticality is determined by examining how critical the results produced by the task are to the proper functioning of the system. A real-time system may have tasks of very different criticalities. It is therefore natural to expect that the criticalities of the different tasks must be taken into consideration while designing for fault-tolerance. The higher the criticality of a task, the more reliable it should be made. Further, in the event of a failure of a highly critical task, immediate failure detection and recovery are important. However, it should be realized that task priority is a different concept, and task criticality does not solely determine the task priority or the order in which various tasks are to be executed (these issues shall be elaborated in later chapters).
in these car engines do not deal with processing frills such as screen-savers or a dozen of
different applications running at the same time. All that the processor in an MPFI system
needs to do is to compute the required fuel injection rate that is most efficient for a given
speed and acceleration.
9. Reactive: Real-time systems are often reactive. A reactive system is one in which an on-going interaction between the computer and the environment is maintained. Ordinary systems compute functions on the input data to generate the output data (see Fig. 28.10(a)). In other words, traditional systems compute the output data as some function of the input data. That is, the output data can mathematically be expressed as: output data = f(input data). For example, if some data I1 is given as the input, the system computes O1 as the result: O1 = f(I1). To elaborate this concept, consider an example involving library automation software. In a library automation software, when the query book function is invoked and Real-Time Systems is entered as the input book name, the software displays Author name: R. Mall, Rack Number: 001, Number of Copies: 1.
(Figure: (a) a Traditional System maps input data to output data; (b) a Reactive System starts from initial parameters and interacts continually with its environment)
Fig. 28.10 Traditional versus Reactive Systems
In contrast to the traditional computation of the output as a simple function of the input data, real-time systems do not produce any output data but enter into an on-going interaction with their environment. In each interaction step, the results computed are used to carry out some actions on the environment. The reaction of the environment is sampled and fed back to the system. Therefore the computations in a real-time system can be considered to be non-terminating. This reactive nature of real-time systems is schematically shown in Fig. 28.10(b).
10. Stability: Under overload conditions, real-time systems need to continue to meet the deadlines of the most critical tasks, though the deadlines of non-critical tasks may not be met. This is in contrast to the requirement of fairness for traditional systems even under overload conditions.
11. Exception Handling: Many real-time systems work round-the-clock and often operate
without human operators. For example, consider a small automated chemical plant that is set
up to work non-stop. When there are no human operators, taking corrective actions on a
failure becomes difficult. Even if no corrective action can be taken immediately, it is desirable
that a failure does not result in catastrophic situations. A failure should be detected and the
system should continue to operate in a gracefully degraded mode rather than shutting off
abruptly.
In real-time systems, on the other hand, safety and reliability are coupled together. Before
A fail-safe state of a system is one which, if entered when the system fails, ensures that no damage results.
To give an example, the fail-safe state of a word processing program is one where the document being processed has been saved onto the disk. All traditional non-real-time systems have one or more fail-safe states, which help separate the issues of safety and reliability: even if a system is known to be unreliable, it can always be made to fail in a fail-safe state, and consequently it would still be considered a safe system.
If no damage can result when a system enters a fail-safe state just before it fails, then through careful transit to a fail-safe state upon a failure, it is possible to turn an extremely unreliable and unsafe system into a safe system. In many traditional systems this technique is in fact frequently adopted to turn an unreliable system into a safe system. For example, consider a traffic light controller that controls the flow of traffic at a road intersection. Suppose the traffic light controller fails frequently and is known to be highly unreliable. Though unreliable, it can still be considered safe if, whenever it fails, it enters a fail-safe state where all the traffic lights are orange and blinking. This is a fail-safe state, since motorists on seeing blinking orange traffic lights become aware that the traffic light controller is not working and proceed with caution. Of course, a fail-safe state would not be to make all lights green, in which case severe accidents could occur. Similarly, all lights turned red is also not a fail-safe state: it may not cause accidents, but it would bring all traffic to a standstill, leading to traffic jams. However, in many real-time systems there are no fail-safe states. Therefore, any failure of the system can cause severe damage. Such systems are said to be safety-critical systems.
can only be ensured through increased reliability. It should now be clear why safety-critical
systems need to be highly reliable.
Just to give an example of the level of reliability required of safety-critical systems, consider the following. For any fly-by-wire aircraft, most of its vital parts are controlled by a computer. Any failure of the controlling computer is clearly not acceptable. The standard reliability requirement for such aircraft is at most 1 failure per 10^9 flying hours (that is, over a hundred thousand years of continuous flying!). We examine how a highly reliable system can be developed in the next section.
Error Detection and Removal: In spite of using the best available error avoidance
techniques, many errors still manage to creep into the code. These errors need to be
detected and removed. This can be achieved to a large extent by conducting thorough
reviews and testing. Once errors are detected, they can be easily fixed.
Built In Self Test (BIST): In BIST, the system periodically performs self tests of
its components. Upon detection of a failure, the system automatically reconfigures itself
by switching out the faulty component and switching in one of the redundant good
components.
Triple Modular Redundancy (TMR): In TMR, as the name suggests, three redundant copies of all critical components are made to run concurrently (see Fig. 28.11). Observe that in Fig. 28.11, C1, C2, and C3 are the redundant copies of the same critical component. The system performs voting on the results produced by the redundant components to select the majority result. TMR can help tolerate the occurrence of only a single failure at any time. (Can you answer why a TMR scheme can effectively tolerate a single component failure only?) An assumption that is implicit in the TMR technique is that at any time only one of the three redundant components can produce erroneous results. The majority result after voting would be erroneous if two or more components were to fail simultaneously (more precisely, before a repair can be carried out). In situations where two or more components are likely to fail (or produce erroneous results), greater amounts of redundancy would be required. A little thinking shows that at least 2n+1 redundant components are required to tolerate simultaneous failures of n components.
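The voter at the heart of TMR can be sketched in a few lines. This is an illustrative sketch only; real voters compare richer results than single integers and must themselves be protected against failure.

```c
/* Illustrative sketch of TMR majority voting: three redundant copies
   compute the same result, and the voter returns the value produced by
   at least two of them. With a single faulty copy, the two good copies
   always form the majority, which is why TMR masks exactly one failure. */
static int tmr_vote(int c1, int c2, int c3)
{
    if (c1 == c2 || c1 == c3)
        return c1;   /* c1 agrees with at least one other copy */
    return c2;       /* otherwise c2 and c3 must agree, or there
                        is no majority at all                   */
}
```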
As compared to hardware, software fault-tolerance is much harder to achieve. To investigate the reason behind this, let us first discuss the techniques currently being used to achieve software fault-tolerance. We do this in the following subsection.
1.6. Software Fault-Tolerance Techniques
Two methods are now popularly used to achieve software fault-tolerance: N-version programming and recovery block techniques. These two techniques are simple adaptations of the basic techniques used to provide hardware fault-tolerance. We discuss these two techniques in the following.
to statistical correlation of failures. Statistical correlation of failures means that even though
individual teams worked in isolation to develop the different versions of a software component,
still the different versions fail for identical reasons. In other words, the different versions of a
component show similar failure patterns. Does this not mean that the different modules developed by independent programmers, after all, contain identical errors? The reason for this is not far to seek: programmers commit errors in those parts of a problem which they perceive to be
difficult - and what is difficult to one team is usually difficult to all teams. So, identical errors
remain in the most complex and least understood parts of a software component.
Recovery Blocks: In the recovery block scheme, the redundant components are called try blocks. Each try block computes the same end result as the others but is intentionally written using a different algorithm compared to the other try blocks. In N-version programming, the different versions of a component are written by different teams of programmers, whereas in the recovery block approach different algorithms are used in different try blocks. Also, in contrast to the N-version programming approach where the redundant copies are run concurrently, in the recovery block approach they are run one after another (as shown in Fig. 28.12). The results produced by a try block are subjected to an acceptance test (see Fig. 28.12). If the test fails, then the next try block is tried. This is repeated in sequence until the result produced by a try block successfully passes the acceptance test. Note that in Fig. 28.12 we have shown acceptance tests separately for the different try blocks to help convey that the tests are applied to the try blocks one after the other, though the same test may in fact be applied to each try block.
(Fig. 28.12: the input is fed to try block TB1; each try block's result is checked by an acceptance test; on failure the next try block, TB2, TB3, TB4, is tried in turn; success yields the result, and exhausting all try blocks raises an exception. Legend: TB = try block)
As was the case with N-version programming, the recovery block approach also does not achieve much success in providing effective fault-tolerance. The reason behind this is again statistical correlation of failures: different try blocks fail for identical reasons, as was explained in the case of the N-version programming approach. Besides, this approach suffers from a further limitation: it can only be used if the task deadlines are much larger than the task computation times (i.e. tasks have large laxity), since the different try blocks are put to execution one after the other when failures occur. The recovery block approach poses special difficulty when used with real-time tasks with very short slack times (i.e. short deadlines and considerable execution times); as the try blocks are tried out one after the other, deadlines may be missed. Therefore, in such cases the later try blocks usually contain only skeletal code.
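The scheme of running try blocks in sequence against an acceptance test can be sketched as follows. Everything here is invented for illustration: the acceptance test simply requires a non-negative result, and the two sample try blocks stand in for a faulty primary algorithm and a working alternate.

```c
#include <stddef.h>

/* Illustrative sketch of the recovery-block scheme: try blocks that
   compute the same result by different algorithms are run one after
   another until one of them passes the acceptance test. */
typedef int (*try_block)(int input);

/* Acceptance test: here, simply require a non-negative result. */
static int acceptable(int result) { return result >= 0; }

/* Returns the first acceptable result, or -1 if every try block fails
   (the point at which a real system would raise an exception). */
static int recovery_block(try_block blocks[], size_t n, int input)
{
    for (size_t i = 0; i < n; i++) {
        int r = blocks[i](input);
        if (acceptable(r))
            return r;   /* this try block passed the acceptance test */
    }
    return -1;
}

/* Two illustrative try blocks: the primary deliberately produces an
   unacceptable result; the alternate succeeds. */
static int primary_try(int x)   { (void)x; return -1; }
static int alternate_try(int x) { return 2 * x; }
```

Note how the sequential structure makes the short-slack-time limitation visible: each failed try block consumes execution time before the next one even starts.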
(Fig. 28.13: progress of computation over time, with periodic checkpoints and rollback recovery to the last checkpoint on failure)
system state is tested each time after some meaningful progress in the computation is made. Immediately after a state-check test succeeds, the state of the system is backed up on stable storage (see Fig. 28.13). In case the next test does not succeed, the system can be made to roll back to the last checkpointed state. After a rollback, a fresh computation can be initiated from the checkpointed state. This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds, such as through data corruption or processor failure.
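A minimal sketch of this checkpoint-and-rollback idea follows. All names are invented, and a single static variable stands in for the stable storage that a real system would use.

```c
/* Illustrative sketch of checkpointing and rollback: after a state
   check succeeds, the state is copied to a checkpoint; when a later
   check fails, the state is rolled back to the last checkpoint and
   computation restarts from there. */
typedef struct { int step; double value; } state_t;

static state_t checkpoint;  /* stands in for stable storage */

static void save_checkpoint(const state_t *s) { checkpoint = *s; }
static void rollback(state_t *s)              { *s = checkpoint; }
```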
1.7. Types of Real-Time Tasks
We have already seen that a real-time task is one for which quantitative expressions of time are needed to describe its behavior. This quantitative expression of time usually appears in the form of a constraint on the time at which the task produces results. The most frequently occurring timing constraint is a deadline constraint, which is used to express that a task is required to compute its results within some deadline. We therefore implicitly assume only deadline-type timing constraints on tasks in this section, though other types of constraints (as explained in Sec. .) may occur in practice. Real-time tasks can be classified into the following three broad categories:
A real-time task can be classified as either a hard, soft, or firm real-time task, depending on the consequences of the task missing its deadline.
It is not necessary that all tasks of a real-time application belong to the same category. It is
possible that different tasks of a real-time system can belong to different categories. We now
elaborate these three types of real-time tasks.
An example of a system having hard real-time tasks is a robot. The robot cyclically carries
out a number of activities including communication with the host system, logging all completed
activities, sensing the environment to detect any obstacles present, tracking the objects of
interest, path planning, effecting the next move, etc. Now consider that the robot suddenly encounters an obstacle. The robot must detect it and as soon as possible try to avoid colliding with it. If it fails to respond quickly (i.e. the concerned tasks are not completed before the required time bound), then it would collide with the obstacle and the robot would be considered to have failed. Therefore detecting an obstacle and reacting to it are hard real-time tasks.
1. Some computer games have hard real-time tasks; these are not safety-critical though. Whenever a timing constraint is not met, the game may fail, but the failure may at best be a mild irritant to the user.
(Figure: utility of the result is 100% up to the deadline and drops abruptly to 0 beyond it)
Fig. 28.14 Utility of Result of a Firm Real-Time Task with Time
Firm real-time tasks typically abound in multimedia applications. The following are two examples of firm real-time tasks:
Video conferencing: In a video conferencing application, video frames and the accompanying audio are converted into packets and transmitted to the receiver over a network. However, some frames may get delayed at different nodes during transit on a packet-switched network due to congestion at different nodes. This may result in varying queuing delays experienced by packets traveling along different routes. Even when packets traverse the same route, some packets can take much more time than the other packets due to the specific transmission strategy used at the nodes. When a certain frame is being played, if some preceding frame arrives at the receiver, then this frame is of no use and is discarded. Due to this reason, when a frame is delayed by more than, say, one second, it is simply discarded at the receiver end without carrying out any processing on it.
Satellite-based tracking of enemy movements: Consider a satellite that takes pictures of an enemy territory and beams them to a ground station computer frame by frame. The ground computer processes each frame to find the positional difference of different objects of interest with respect to their position in the previous frame, to determine the movements of the enemy. When the ground computer is overloaded, a new image may be received even before an older image is taken up for processing. In this case, the older image is not of much use. Hence the older images may be discarded and the recently received image processed.
For firm real-time tasks, the associated time bounds typically range from a few milliseconds to several hundreds of milliseconds.
Fig. 28.15 Utility of the Results Produced by a Soft Real-Time Task as a Function of Time (the utility is 100% up to the deadline and falls off gradually afterwards)
An example of a soft real-time task is web browsing. Normally, after a URL (Uniform Resource Locator) is clicked, the corresponding web page is fetched and displayed within a couple of seconds on the average. However, when it takes several minutes to display a requested page, we still do not consider the system to have failed, but merely say that the performance of the system has degraded.
Another example of a soft real-time task is a task handling a request for a seat reservation in a railway reservation application. Once a request for reservation is made, the response should occur within 20 seconds on the average. The response may either be in the form of a printed ticket or an apology message on account of unavailability of seats. Alternatively, we might state the constraint on the ticketing task as: in at least 95% of reservation requests, the ticket should be processed and printed in less than 20 seconds.
Let us now analyze the impact of the failure of a soft real-time task to meet its deadline, taking the example of the railway reservation task. If the ticket is printed in about 20 seconds, we feel that the system is working fine and get a feel of having obtained instant results. As already stated, missed deadlines of soft real-time tasks do not result in system failures. However, the utility of the results produced by a soft real-time task falls continuously with time after the expiry of the deadline, as shown in Fig. 28.15: the utility of the results is 100% if they are produced before the deadline, and after the deadline is passed the utility slowly falls off with time. For soft real-time tasks that typically occur in practical applications, the time bounds usually range from a fraction of a second to a few seconds.
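The qualitative difference between firm and soft tasks lies entirely in the shape of their utility functions (Figs. 28.14 and 28.15). A minimal Python sketch of the two shapes; the linear decay rate for the soft case is an illustrative assumption, since the text only says the utility falls off gradually:

```python
def firm_utility(response_time, deadline):
    # Fig. 28.14: full utility up to the deadline, zero immediately after.
    return 100.0 if response_time <= deadline else 0.0

def soft_utility(response_time, deadline, decay_per_sec=20.0):
    # Fig. 28.15: full utility up to the deadline, then a gradual fall-off.
    # The decay rate (utility % lost per second of lateness) is a made-up
    # illustrative value, not one taken from the text.
    if response_time <= deadline:
        return 100.0
    return max(0.0, 100.0 - decay_per_sec * (response_time - deadline))

# A result 0.5 s late is worthless to a firm task but still largely
# useful to a soft task (e.g. the railway reservation example).
print(firm_utility(1.5, 1.0))   # 0.0
print(soft_utility(1.5, 1.0))   # 90.0
```

A hard real-time task has no utility curve at all: a single miss is a system failure, so there is nothing to plot past the deadline.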
1.7.4. Non-Real-Time Tasks
A non-real-time task is not associated with any time bounds. Can you think of any example of a non-real-time task? Most of the interactive computations you perform nowadays are handled by soft real-time tasks. However, about two or three decades back, when computers
were not interactive, almost all tasks were non-real-time. A few examples of non-real-time tasks are: batch processing jobs, e-mail, and background tasks such as event loggers. You may
however argue that even these tasks, in the strict sense of the term, do have certain time bounds.
For example, an e-mail is expected to reach its destination at least within a couple of hours of
being sent. Similar is the case with a batch processing job such as pay-slip printing. What then
really is the difference between a non-real-time task and a soft real-time task? For non-real-time
tasks, the associated time bounds are typically of the order of a few minutes, hours or even days.
In contrast, the time bounds associated with soft real-time tasks are at most of the order of a few
seconds.
1.8. Exercises
1. State whether you consider the following statements to be TRUE or FALSE. Justify your
answer in each case.
a. A hard real-time application is made up of only hard real-time tasks.
b. Every safety-critical real-time system has a fail-safe state.
c. A deadline constraint between two stimuli can be considered to be a behavioral
constraint on the environment of the system.
d. Hardware fault-tolerance techniques can easily be adapted to provide software fault-
tolerance.
e. A good algorithm for scheduling hard real-time tasks must try to complete each task in
the shortest time possible.
f. All hard real-time systems are safety-critical in nature.
g. Performance constraints on a real-time system ensure that the environment of the
system is well-behaved.
h. Soft real-time tasks are those which do not have any time bounds associated with them.
i. Minimization of average task response times is the objective of any good hard real-time task-scheduling algorithm.
j. It should be the goal of any good real-time operating system to complete every hard real-time task as far ahead of its deadline as possible.
2. What do you understand by the term real-time? How is the concept of real-time different from the traditional notion of time? Explain your answer using a suitable example.
3. Using a block diagram, show the important hardware components of a real-time system and their interactions. Explain the roles of the different components.
4. In a real-time system, raw sensor signals need to be preprocessed before they can be used by a computer. Why is it necessary to preprocess the raw sensor signals before they can be used by a computer? Explain the different types of preprocessing that are normally carried out on sensor signals to make them suitable to be used directly by a computer.
5. Identify the key differences between hard real-time, soft real-time, and firm real-time systems. Give at least one example of real-time tasks corresponding to these three categories. Identify the timing constraints in your tasks and justify why the tasks should be categorized into the categories you have indicated.
6. Give an example of a soft real-time task and a non-real-time task. Explain the key difference between the characteristics of these two types of tasks.
7. Draw a schematic model showing the important components of a typical hard real-time
system. Explain the working of the input interface using a suitable schematic diagram.
Explain using a suitable circuit diagram how analog-to-digital (ADC) conversion is achieved in an input interface.
8. Explain the checkpointing and rollback recovery scheme to provide fault-tolerant real-time computing. Explain the types of faults it can help tolerate and the faults it cannot tolerate. Explain the situations in which this technique is useful.
9. Answer the following questions concerning fault-tolerance of real-time systems.
a. Explain why hardware fault-tolerance is easier to achieve compared to software fault-
tolerance.
b. Explain the main techniques available to achieve hardware fault-tolerance.
c. What are the main techniques available to achieve software fault-tolerance? What are
the shortcomings of these techniques?
10. What do you understand by the fail-safe state of a system? Safety-critical real-time
systems do not have a fail-safe state. What is the implication of this?
11. Is it possible to have an extremely safe but unreliable system? If your answer is
affirmative, then give an example of such a system. If you answer in the negative, then
justify why it is not possible for such a system to exist.
12. What is a safety-critical system? Give a few practical examples of safety-critical hard real-time systems. Are all hard real-time systems safety-critical? If not, give at least one example of a hard real-time system that is not safety-critical.
13. Explain with the help of a schematic diagram how the recovery block scheme can be used to achieve fault-tolerance of real-time tasks. What are the shortcomings of this scheme? Explain situations where it can be satisfactorily used and situations where it cannot be used.
14. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint into either a performance or a behavioral constraint.
15. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired after 0.1 seconds of its engagement but no later than 1 sec.
16. Represent a washing machine having the following specification by means of an extended state machine diagram. The washing machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water depending upon the setting of the Hot Wash switch. The water filling continues until the high level is sensed. The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the start Drying switch. After the user presses the start Drying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.
17. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities.
a. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 mSec of receipt of the signal.
b. The track record is transmitted to the data processor within 1 mSec after the track
record is determined.
c. A subtask on the data processor correlates the received track record with the track records of other nearby targets to detect any potential collision that might occur within the next 500 mSec.
d. If a collision is anticipated, then the corrective action is determined within 10 mSec by
another subtask running on the data processor.
e. The corrective action is transmitted to the track correction task within 25 mSec.
18. Consider the following (partial) specification of a real-time system:
The velocity of a space-craft must be sampled by a computer on-board the space-craft at
least once every second (the sampling event is denoted by S). After sampling the velocity,
the current position is computed (denoted by event C) within 100msec. Concurrently, the
expected position of the space-craft is retrieved from the database within 200msec
(denoted by event R). Using these data, the deviation from the normal course of the space-
craft must be determined within 100 msec (denoted by event D) and corrective velocity
adjustments must be carried out before a new velocity value is sampled in (the velocity
adjustment event is denoted by A). Calculated positions must be transmitted to the earth
station at least once every minute (position transmission event is denoted by the event T).
Identify the different timing constraints in the system. Classify these into either
performance or behavioral constraints. Construct an EFSM to model the system.
19. Construct the EFSM model of a telephone system whose (partial) behavior is described
below:
After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialed within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialing of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
Module
6
Embedded System
Software
Lesson
29
Real-Time Task
Scheduling Part 1
Task Instance: Each time an event occurs, it triggers the task that handles this event to run.
In other words, a task is generated when some specific event occurs. Real-time tasks therefore
normally recur a large number of times at different instants of time depending on the event
occurrence times. It is possible that real-time tasks recur at random instants. However, most
real-time tasks recur with certain fixed periods. For example, a temperature sensing task in a
chemical plant might recur indefinitely with a certain period because the temperature is sampled
periodically, whereas a task handling a device interrupt might recur at random instants. Each
time a task recurs, it is called an instance of the task. The first time a task occurs, it is
called the first instance of the task. The next occurrence of the task is called its second
instance, and so on. The jth instance of a task Ti would be denoted as Ti(j). Each instance of a real-time task is associated with a deadline by which it needs to complete and produce its results.
Fig. 29.1 Relative and Absolute Deadlines of a Task (Ti(1) arrives at time φ measured from 0; its relative deadline is d, its absolute deadline is φ + d, and Ti(2) arrives at φ + pi)
Relative Deadline versus Absolute Deadline: The absolute deadline of a task is the
absolute time value (counted from time 0) by which the results from the task are
expected. Thus, absolute deadline is equal to the interval of time between the time 0 and the
actual instant at which the deadline occurs as measured by some physical clock. Whereas,
relative deadline is the time interval between the start of the task and the instant at which
deadline occurs. In other words, relative deadline is the time interval between the arrival
of a task and the corresponding deadline. The difference between relative and absolute
deadlines is illustrated in Fig. 29.1. It can be observed from Fig. 29.1 that the relative deadline of the task Ti(1) is d, whereas its absolute deadline is φ + d, where φ is the arrival time of Ti(1) measured from time 0.
Response Time: The response time of a task is the time it takes (as measured from the task
arrival time) for the task to produce its results. As already remarked, task instances get generated
due to occurrence of events. These events may be internal to the system, such as clock interrupts,
or external to the system such as a robot encountering an obstacle.
The response time is the time duration from the occurrence of the event generating the task to
the time the task produces its results.
For hard real-time tasks, as long as all their deadlines are met, there is no special advantage
of completing the tasks early. However, for soft real-time tasks, average response time of tasks
is an important metric to measure the performance of a scheduler. A scheduler for soft real-
time tasks should try to execute the tasks in an order that minimizes the average response
time of tasks.
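The effect of execution order on the average response time metric can be seen in a toy calculation. The sketch below assumes a non-preemptive run of tasks that all arrive at time 0, with hypothetical execution times; it is only an illustration of the metric, not a scheduler from the text:

```python
def avg_response_time(exec_times):
    # Tasks arrive together at t = 0 and run back-to-back, so each task's
    # response time equals the sum of execution times up to and including it.
    finish, total = 0.0, 0.0
    for e in exec_times:
        finish += e
        total += finish
    return total / len(exec_times)

# Running shorter tasks first lowers the average response time,
# which is why a soft real-time scheduler cares about ordering.
assert avg_response_time([1, 3, 5]) < avg_response_time([5, 3, 1])
```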
Task Precedence: A task is said to precede another task, if the first task must complete
before the second task can start. When a task Ti precedes another task Tj, then each instance of
Ti precedes the corresponding instance of Tj. That is, if T1 precedes T2, then T1(1) precedes
T2(1), T1(2) precedes T2(2), and so on. A precedence order defines a partial order among tasks. Recollect from a first course on discrete mathematics that a partial order relation is reflexive, antisymmetric, and transitive. An example partial ordering among tasks is shown in Fig. 29.2. Here T1 precedes T2, but we cannot relate T1 with either T3 or T4. We shall later use the task precedence relation to develop appropriate task scheduling algorithms.
Fig. 29.2 Precedence Relation among Tasks (T1 precedes T2; T3 and T4 are not related to T1)
Data Sharing: Tasks often need to share their results among each other, i.e. one task needs to use the results produced by another task; clearly, the second task must precede the first task.
In fact, precedence relation between two tasks sometimes implies data sharing between the two
tasks (e.g. first task passing some results to the second task). However, this is not always true.
A task may be required to precede another even when there is no data sharing. For
example, in a chemical plant it may be required that the reaction chamber must be filled with
water before chemicals are introduced. In this case, the task handling filling up the reaction
chamber with water must complete, before the task handling introduction of the chemicals
is activated. It is therefore not appropriate to represent data sharing using precedence relation.
Further, data sharing may occur not only when one task precedes the other, but might occur
among truly concurrent tasks, and overlapping tasks. In other words, data sharing among tasks
does not necessarily impose any particular ordering among tasks. Therefore, data sharing relation
among tasks needs to be represented using a different symbol. We shall represent data sharing
among two tasks using a dashed arrow. In the example of data sharing among tasks represented
in Fig. 29.2, T2 uses the results of T3, but T2 and T3 may execute concurrently. T2 may even start executing first; after some time it may receive some data from T3 and continue its execution, and so on.
Periodic Task: A periodic task is one that repeats after a certain fixed time interval. The
precise time instants at which periodic tasks recur are usually demarcated by clock interrupts.
For this reason, periodic tasks are sometimes referred to as clock-driven tasks. The fixed time
interval after which a task repeats is called the period of the task. If Ti is a periodic task, then the time from 0 till the occurrence of the first instance of Ti (i.e. Ti(1)) is denoted by φi, and is called the phase of the task. The second instance (i.e. Ti(2)) occurs at φi + pi. The third instance (i.e. Ti(3)) occurs at φi + 2*pi, and so on. Formally, a periodic task Ti can be represented by a four-tuple (φi, pi, ei, di), where φi is the phase, pi is the period of the task, ei is the worst case execution time of the task, and di is the relative deadline of the task. We shall use this notation extensively in future discussions.
o g
. bl
u p
eir o
s g
= 2000 di ent
u d
st
0 i
t y + pi + 2*pi
.c
w
Fig. 29.3 Track Correction Task (2000mSec; pi; ei; di) of a Rocket
w
w
To illustrate the above notation to represent real-time periodic tasks, let us consider
the track correction task typically found in a rocket control software. Assume the following
characteristics of the track correction task. The track correction task starts 2000 milliseconds after the launch of the rocket, and recurs periodically every 50 milliseconds from then on. Each instance of the task requires a processing time of 8 milliseconds and its relative deadline is 50 milliseconds. Recall that the phase of a task is defined by the occurrence time of the first
instance of the task. Therefore, the phase of this task is 2000 milliseconds. This task can formally be represented as (2000 mSec, 50 mSec, 8 mSec, 50 mSec). This task is pictorially shown in Fig. 29.3. When the deadline of a task equals its period (i.e. pi = di), we can omit the fourth element of the tuple. In this case, we can represent the task as Ti = (2000 mSec, 50 mSec, 8 mSec); this would automatically mean pi = di = 50 mSec. Similarly, when φi = 0, it can be omitted when no confusion arises. So, Ti = (20 mSec, 100 mSec) would indicate a task with φi = 0, pi = 100 mSec, ei = 20 mSec, and di = 100 mSec. Whenever there is any scope for confusion, we shall explicitly write out the parameters, e.g. Ti = (pi = 50 mSec, ei = 8 mSec, di = 40 mSec).
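Under the four-tuple notation, the arrival time and absolute deadline of any instance follow directly from the phase, period, and relative deadline. A minimal sketch, using the rocket's track correction task from the text as the worked example:

```python
def arrival(phase, period, j):
    # The jth instance Ti(j) arrives at phase + (j - 1) * period.
    return phase + (j - 1) * period

def absolute_deadline(phase, period, rel_deadline, j):
    # ...and must complete within its relative deadline of arriving.
    return arrival(phase, period, j) + rel_deadline

# Track correction task: (phase=2000 mSec, p=50 mSec, e=8 mSec, d=50 mSec)
assert arrival(2000, 50, 1) == 2000            # first instance at the phase
assert arrival(2000, 50, 3) == 2100            # third instance at phase + 2*p
assert absolute_deadline(2000, 50, 50, 1) == 2050
```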
A vast majority of the tasks present in a typical real-time system are periodic. The reason for
this is that many activities carried out by real-time systems are periodic in nature, for example, monitoring certain conditions and polling information from sensors at regular intervals in order to carry out certain actions at regular intervals (such as driving some actuators). We shall consider
examples of such tasks found in a typical chemical plant. In a chemical plant several
temperature monitors, pressure monitors, and chemical concentration monitors periodically
sample the current temperature, pressure, and chemical concentration values which are then
communicated to the plant controller. The instances of the temperature, pressure, and chemical
concentration monitoring tasks normally get generated through the interrupts received from a
periodic timer. These inputs are used to compute corrective actions required to maintain the
chemical reaction at a certain rate. The corrective actions are then carried out through actuators.
Sporadic Task: A sporadic task is one that recurs at random instants. A sporadic task Ti can be represented by a three-tuple:
Ti = (ei, gi, di)
where ei is the worst case execution time of an instance of the task, gi denotes the minimum separation between two consecutive instances of the task, and di is the relative deadline. The minimum separation (gi) between two consecutive instances of the task implies that once an instance of a sporadic task occurs, the next instance cannot occur before gi time units have elapsed. That is, gi restricts the rate at which sporadic tasks can arise. As done for periodic tasks, we shall use the convention that the first instance of a sporadic task Ti is denoted by Ti(1) and the successive instances by Ti(2), Ti(3), etc.
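The minimum-separation constraint gi can be checked mechanically against a trace of observed arrival times. A small sketch; the arrival times below are made up for illustration:

```python
def respects_min_separation(arrival_times, g):
    # A sporadic arrival trace is legal only if every pair of consecutive
    # instances is at least g time units apart.
    return all(later - earlier >= g
               for earlier, later in zip(arrival_times, arrival_times[1:]))

assert respects_min_separation([0, 12, 25, 40], 10)   # legal sporadic trace
assert not respects_min_separation([0, 12, 15], 10)   # 3-unit gap violates gi
assert respects_min_separation([0, 0, 5], 0)          # gi = 0: the aperiodic case
```

Note how setting g to 0 makes every trace legal, which is exactly the aperiodic case discussed next.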
Many sporadic tasks, such as emergency message arrivals, are highly critical in nature. For example, in a robot, a task that gets generated to handle an obstacle that suddenly appears is a sporadic task. In a factory, the task that handles fire conditions is a sporadic task. The times of occurrence of these tasks cannot be predicted.
The criticality of sporadic tasks varies from highly critical to moderately critical. For example, an I/O device interrupt or a DMA interrupt is moderately critical. However, a task handling the reporting of fire conditions is highly critical.
Aperiodic Task: An aperiodic task is in many ways similar to a sporadic task. An aperiodic task can arise at random instants. However, in the case of an aperiodic task, the minimum separation gi between two consecutive instances can be 0. That is, two or more instances of an aperiodic task might occur at the same time instant. Also, the deadline for aperiodic tasks is expressed as either an average value or is expressed statistically. Aperiodic tasks are generally soft real-time tasks.
It is easy to realize why aperiodic tasks need to be soft real-time tasks. Aperiodic
tasks can recur in quick succession. It therefore becomes very difficult to meet the deadlines
of all instances of an aperiodic task. When several aperiodic tasks recur in quick succession, there is a bunching of the task instances, which might lead to a few deadline misses.
As already discussed, soft real-time tasks can tolerate a few deadline misses. An example of an
aperiodic task is a logging task in a distributed system. The logging task can be started by
different tasks running on different nodes. The logging requests from different tasks may arrive
at the logger almost at the same time, or the requests may be spaced out in time. Other examples
of aperiodic tasks include operator requests, keyboard presses, mouse movements, etc. In fact,
all interactive commands issued by users are handled by aperiodic tasks.
Scheduling Points: The scheduling points of a scheduler are the points on the time line at which the scheduler makes decisions regarding which task is to be run next. It is important to note that a task scheduler does not need to run continuously; it is activated by the operating system only at the scheduling points to make the scheduling decision as to which task is to be run next. In a
clock-driven scheduler, the scheduling points are defined at the time instants marked by
interrupts generated by a periodic timer. The scheduling points in an event-driven scheduler are
determined by occurrence of certain events.
Preemptive Scheduler: A preemptive scheduler is one which, when a higher priority task arrives, suspends any lower priority task that may be executing and takes up the higher priority task for execution. Thus, in a preemptive scheduler, it cannot be the case that a higher priority task is ready and waiting for execution while a lower priority task is executing. A preempted lower priority task can resume its execution only when no higher priority task is ready.
Utilization: The processor utilization (or simply utilization) of a task is the average time for
which it executes per unit time interval. In notation: for a periodic task Ti, the utilization is ui = ei/pi, where ei is the execution time and pi is the period of Ti. For a set of n periodic tasks {Ti}, the total utilization due to all tasks is U = Σi=1..n ei/pi. It is the objective of any good scheduling
algorithm to feasibly schedule even those task sets that have very high utilization, i.e. utilization
approaching 1. Of course, on a uniprocessor it is not possible to schedule task sets having
utilization more than 1.
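The utilization formula is a one-liner in code. A sketch, using a hypothetical task set (the (ei, pi) pairs are illustrative, not from the text):

```python
def utilization(tasks):
    # tasks: list of (e_i, p_i) pairs; U = sum over all tasks of e_i / p_i
    return sum(e / p for e, p in tasks)

tasks = [(8, 50), (10, 100), (25, 200)]   # hypothetical (ei, pi) pairs
u = utilization(tasks)                    # 0.16 + 0.10 + 0.125 = 0.385
assert abs(u - 0.385) < 1e-9
assert u <= 1.0  # U <= 1 is a necessary condition on a uniprocessor
```

Note that U <= 1 is only necessary, not sufficient: whether a task set with U close to 1 can actually be scheduled depends on the scheduling algorithm used.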
Jitter: Jitter is the deviation of a periodic task from its strict periodic behavior. The arrival time jitter is the deviation of the task from arriving at the precise periodic time of arrival. It may be caused by imprecise clocks, or other factors such as network congestion. Similarly, completion time jitter is the deviation of the completion of a task from precise periodic points. The completion time jitter may be caused by the specific scheduling algorithm employed, which takes up a task for scheduling as per convenience and the load at an instant, rather than scheduling at strict time instants. Jitter is undesirable for some applications.
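Arrival time jitter can be measured from an observed arrival trace as the worst-case deviation from the ideal periodic arrival instants. A minimal sketch with made-up numbers:

```python
def max_arrival_jitter(arrivals, period, phase=0.0):
    # Deviation of each observed arrival from its ideal instant
    # phase + k * period, taking the worst case over the trace.
    return max(abs(t - (phase + k * period))
               for k, t in enumerate(arrivals))

# Ideal arrivals at 0, 100, 200, 300 ms; the observed ones drift slightly.
assert max_arrival_jitter([0.0, 101.5, 199.0, 302.0], 100) == 2.0
```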
1.4. Classification of Real-Time Task Scheduling Algorithms
Several schemes of classification of real-time task scheduling algorithms exist. A popular scheme classifies the real-time task scheduling algorithms based on how the scheduling points are defined. The three main types of schedulers according to this classification scheme are: clock-driven, event-driven, and hybrid.
The clock-driven schedulers are those in which the scheduling points are determined by the interrupts received from a clock. In the event-driven ones, the scheduling points are defined by certain events, which precludes clock interrupts. The hybrid ones use both clock interrupts as well as event occurrences to define their scheduling points.
A few important members of each of these three broad classes of scheduling algorithms are
the following:
1. Clock Driven
Table-driven
Cyclic
2. Event Driven
Simple priority-based
Rate Monotonic Analysis (RMA)
Earliest Deadline First (EDF)
3. Hybrid
Round-robin
Important members of clock-driven schedulers that we discuss in this text are table-driven
and cyclic schedulers. Clock-driven schedulers are simple and efficient. Therefore, these are
frequently used in embedded applications. We investigate these two schedulers in some detail in
Sec. 2.5.
Important examples of event-driven schedulers are Earliest Deadline First (EDF) and Rate
Monotonic Analysis (RMA). Event-driven schedulers are more sophisticated than clock-driven
schedulers and usually are more proficient and flexible than clock-driven schedulers. These are
more proficient because they can feasibly schedule some task sets which clock-driven
schedulers cannot. These are more flexible because they can feasibly schedule sporadic and
aperiodic tasks in addition to periodic tasks, whereas clock-driven schedulers can satisfactorily
handle only periodic tasks. Event-driven scheduling of real-time tasks in a uniprocessor environment was a subject of intense research during the early 1970s, leading to publication of a
large number of research results. Out of the large number of research results that were
published, the following two popular algorithms are the essence of all those results: Earliest
Deadline First (EDF), and Rate Monotonic Analysis (RMA). If we understand these two
schedulers well, we would get a good grip on real-time task scheduling on uniprocessors. Several
variations to these two basic algorithms exist.
Another classification of real-time task scheduling algorithms can be made based upon the type of task acceptance test that a scheduler carries out before it takes up a task for scheduling. The acceptance test is used to decide whether a newly arrived task would at all be taken up for scheduling or be rejected. Based on the task acceptance test used, there are two broad categories of task schedulers:
Planning-based
Best effort
In planning-based schedulers, when a task arrives, the scheduler first determines whether the task can meet its deadlines if it is taken up for execution. If not, it is rejected. If the task can meet its deadline and does not cause other already scheduled tasks to miss their respective deadlines, then the task is accepted for scheduling. Otherwise, it is rejected. In best effort schedulers, no acceptance test is applied. All tasks that arrive are taken up for scheduling, and a best effort is made to meet their deadlines. However, no guarantee is given as to whether a task's deadline would be met.
A third type of classification of real-time task scheduling algorithms is based on the target platform on which the tasks are to be run. The different classes of scheduling algorithms according to this scheme are:
Uniprocessor
Multiprocessor
Distributed
Uniprocessor scheduling algorithms are possibly the simplest of the three classes of
algorithms. In contrast to uniprocessor algorithms, in multiprocessor and distributed scheduling
algorithms first a decision has to be made regarding which task needs to run on which processor
and then these tasks are scheduled. In contrast to multiprocessors, the processors in a distributed
system do not possess shared memory. Also in contrast to multiprocessors, there is no global up-
to-date state information available in distributed systems. This makes uniprocessor scheduling
algorithms that assume central state information of all tasks and processors to exist unsuitable for
use in distributed systems. Further in distributed systems, the communication among tasks is
through message passing. Communication through message passing is costly. This means that a
scheduling algorithm should not incur too much communication over- head. So carefully
designed distributed algorithms are normally considered suitable for use in a distributed system.
In the following sections, we study the different classes of schedulers in more detail.
In table-driven scheduling, the developer is
given the freedom to select his own schedule for the set of tasks in the application and store the
schedule in a table (called schedule table) to be used by the scheduler at run time.
An example of a schedule table is shown in Table 1. Table 1 shows that task T1 would be
taken up for execution at time instant 0, T2 would start execution 3 milliseconds afterwards, and
so on. An important question that needs to be addressed at this point is what would be the size of
the schedule table that would be required for some given set of periodic real-time tasks to be run
on a system? An answer to this question can be given as follows: if a set ST = {Ti} of n tasks is
to be scheduled, then the entries in the table would replicate themselves after
LCM({p1, p2, ..., pn}) time units. This duration is called the major cycle of the task set.
In the reasoning we presented above for the computation of the size of a schedule table, one
assumption that we implicitly made is that φi = 0. That is, all tasks are in phase.
Table 1: An Example Schedule Table

Task    Start time in millisecs
T1      0
T2      3
T3      10
T4      12
T5      17
However, tasks often do have non-zero phase. It would be interesting to determine what
would be the major cycle when tasks have non-zero phase. The result of an investigation into
this issue has been given as Theorem 1.

1.5.2. Theorem 1

The major cycle of a set of tasks ST = {T1, T2, ..., Tn} is LCM({p1, p2, ..., pn}) even when the
tasks have arbitrary phasing.
Proof: As per our definition of a major cycle, even when tasks have non-zero phasing, task
instances would repeat the same way in each major cycle. Let us consider an example in which
the occurrences of a task Ti in a major cycle are as shown in Fig. 29.4. As shown in the example
of Fig. 29.4, there are k-1 occurrences of the task Ti during a major cycle. The first occurrence
of Ti starts φ time units from the start of the major cycle. The major cycle ends x time units after
the last occurrence of Ti in it.

Fig. 29.4 Major Cycle When a Task Ti has Non-Zero Phasing
Assume that the size of each major cycle is M. Then, from an inspection of Fig. 29.4, for the
task to repeat identically in each major cycle:

M = (k-1)pi + φ + x        (2.1)

Now, for the task Ti to have identical occurrence times in each major cycle, φ + x must equal
pi (see Fig. 29.4).
Substituting this in Expr. 2.1, we get M = (k-1)pi + pi = k×pi        (2.2)
So, the major cycle M contains an integral multiple of pi. This argument holds for each task
in the task set irrespective of its phase. Therefore M = LCM({p1, p2, ..., pn}).
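Theorem 1 is easy to check numerically. A minimal sketch in Python (the helper name is ours; the periods are taken from the example task set used later in this lesson):

```python
from math import gcd
from functools import reduce

def major_cycle(periods):
    """Major cycle of a periodic task set: LCM of all task periods (Theorem 1)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

# Periods 4, 5, 20 as in Example 1 of this lesson:
print(major_cycle([4, 5, 20]))  # → 20
```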
Fig. 29.6 An Example Schedule Table for a Cyclic Scheduler

The size of the frame to be used by the scheduler is an important design parameter and needs
to be chosen very carefully. A selected frame size should satisfy the following three constraints.
2. Minimization of Table Size: This constraint requires that the number of entries in the
schedule table should be minimum, in order to minimize the storage requirement of the
schedule table. Remember that cyclic schedulers are used in small embedded applications
with a very small storage capacity. So, this constraint is important to the commercial
success of a product. The number of entries to be stored in the schedule table can be
minimized when the minor cycle squarely divides the major cycle. When the minor cycle
squarely divides the major cycle, the major cycle contains an integral number of minor
cycles (no fractional minor cycles). Unless the minor cycle squarely divides the major
cycle, storing the schedule for one major cycle would not be sufficient, as the schedules
in the major cycle would not repeat, and this would make the size of the schedule table
large. We can formulate this constraint as:

⌊M/F⌋ = M/F        (2.3)

In other words, if the floor of M/F equals M/F, then the major cycle would
contain an integral number of frames.
Fig. 29.7 Satisfaction of a Task Deadline
(timeline showing a task's arrival and its deadline against the frame boundaries 0, kF, (k+1)F, (k+2)F)
3. Satisfaction of Task Deadline: This third constraint on frame size is necessary to meet
the task deadlines. This constraint imposes that between the arrival of a task and its
deadline, at least one full frame must exist (see Fig. 29.8).

Fig. 29.8 A Full Frame Exists Between the Arrival and Deadline of a Task
(timeline with frame boundaries 0, kF, (k+1)F, (k+2)F)
More formally, this constraint can be formulated as follows: Suppose a task arrives Δt
time units after the start of a frame (see Fig. 29.8). Then, assuming that a
single frame is sufficient to complete the task, the task can complete before its deadline
iff (2F - Δt) ≤ di, or 2F ≤ (di + Δt).        (2.4)

Remember that the value of Δt might vary from one instance of the task to another. The
worst case scenario (where the task is likely to miss its deadline) occurs for the task
instance having the minimum value of Δt, such that Δt > 0. This is the worst case
scenario, since under this the task would have to wait the longest before its execution can
start.
It should be clear that if a task arrives just after a frame has started, then the task would
have to wait for the full duration of the current frame before it can be taken up for
execution. If a task at all misses its deadline, then certainly it would be under such
situations. In other words, the worst case scenario for a task to meet its deadline occurs
for its instance that has the minimum separation from the start of a frame. The
determination of the minimum separation value (i.e. min(Δt)) for a task among all
instances of the task would help in determining a feasible frame size. We show by
Theorem 2 that min(Δt) is equal to gcd(F, pi). Consequently, this constraint can be
written as:

for every Ti, 2F - gcd(F, pi) ≤ di        (2.5)

Note that this constraint defines an upper bound on frame size for a task Ti, i.e.,
if the frame size is any larger than the defined upper bound, then tasks might miss their
deadlines. Expr. 2.5 defines the frame size from the consideration of one task only.
Considering all tasks, the frame size must satisfy F ≤ min((gcd(F, pi) + di)/2), where the
minimum is taken over all tasks Ti.
1.5.4. Theorem 2

The minimum separation of the task arrival from the corresponding frame start time
(min(Δt)), considering all instances of a task Ti, is equal to gcd(F, pi).

Proof: Let g = gcd(F, pi), where gcd is the function determining the greatest common
divisor of its arguments. It follows from the definition of gcd that g must squarely divide each
of F and pi. Let Ti be a task with zero phasing. Now, assume that this theorem is violated for
certain integers m and n, such that Ti(n) occurs in the mth frame and the difference between
the start time of the mth frame and the nth task arrival time is less than g. That is,

0 < (m×F - n×pi) < g

Dividing this expression throughout by g, we get:

0 < (m×F/g - n×pi/g) < 1        (2.6)

However, F/g and pi/g are both integers because g is gcd(F, pi). Therefore, we can write
F/g = I1 and pi/g = I2 for some integral values I1 and I2. Substituting this in Expr. 2.6, we get
0 < m×I1 - n×I2 < 1. Since m×I1 and n×I2 are both integers, their difference cannot be a
fractional value lying between 0 and 1. Therefore, this expression can never be satisfied.
It can therefore be concluded that the minimum time between a frame boundary and
the arrival of the corresponding instance of Ti cannot be less than gcd(F, pi).
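Theorem 2 can also be verified by brute force: enumerate the arrivals of a zero-phase task over many periods and measure each arrival's offset from the start of the frame containing it. A small sketch; the function name and the values of F and pi below are illustrative:

```python
from math import gcd

def min_separation(F, p, n_arrivals=1000):
    """Minimum non-zero offset of an arrival of a zero-phase task with
    period p from the start of the enclosing frame of size F."""
    offsets = {(n * p) % F for n in range(1, n_arrivals)}
    nonzero = [d for d in offsets if d > 0]
    # If every arrival falls exactly on a frame boundary, gcd(F, p) = F.
    return min(nonzero) if nonzero else F

# The measured minimum matches gcd(F, p):
print(min_separation(4, 10), gcd(4, 10))  # → 2 2
```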
For a given task set it is possible that more than one frame size satisfies all the three
constraints. In such cases, it is better to choose the shortest frame size. This is because
the schedulability of a task set increases as more frames become available over a major
cycle.
It should however be remembered that the mere fact that a suitable frame size can be
determined does not mean that a feasible schedule would be found. It may so happen that there
are not enough frames available in a major cycle to be assigned to all the task instances.
We now illustrate how an appropriate frame size can be selected for cyclic schedulers
through a few examples.
1.5.5. Examples

Example 1: A cyclic scheduler is to be used to run the following set of periodic tasks on a
uniprocessor: T1 = (e1=1, p1=4), T2 = (e2=1.5, p2=5), T3 = (e3=1, p3=20), T4 = (e4=2,
p4=20). Select an appropriate frame size.

Solution: For the given task set, an appropriate frame size is the one that satisfies all the
three required constraints. In the following, we determine a suitable frame size F which
satisfies all the three required constraints.

Constraint 1: Let F be an appropriate frame size; then F ≥ max{ei}. From this constraint, we
get F ≥ 2.
Constraint 2: The major cycle M for the given task set is given by M = LCM(4, 5, 20) = 20.
M should be an integral multiple of the frame size F, i.e., M mod F = 0. This consideration
implies that F can take on the values 2, 4, 5, 10, 20. Frame size 1 has been ruled out since
it would violate constraint 1.

Constraint 3: To satisfy this constraint, we need to check whether a selected frame size F
satisfies the inequality 2F - gcd(F, pi) ≤ di for each pi.

Let us first try frame size 2.
For F = 2 and task T1:
2×2 - gcd(2, 4) ≤ 4, i.e. 4 - 2 ≤ 4
Therefore, for p1 the inequality is satisfied.
Let us try F = 2 and task T2:
2×2 - gcd(2, 5) ≤ 5, i.e. 4 - 1 ≤ 5
Therefore, for p2 the inequality is satisfied.
Let us try F = 2 and task T3:
Version 2 EE IIT, Kharagpur 16
Downloaded from www.citystudentsgroup.blogspot.com
2×2 - gcd(2, 20) ≤ 20, i.e. 4 - 2 ≤ 20
Therefore, for p3 the inequality is satisfied.
For F = 2 and task T4:
2×2 - gcd(2, 20) ≤ 20, i.e. 4 - 2 ≤ 20
For p4 the inequality is satisfied.
Thus, constraint 3 is satisfied by all tasks for frame size 2. So, frame size 2 satisfies all the
three constraints. Hence, 2 is a feasible frame size.

Let us try frame size 4.
For F = 4 and task T1:
2×4 - gcd(4, 4) ≤ 4, i.e. 8 - 4 ≤ 4
Therefore, for p1 the inequality is satisfied.
Let us try F = 4 and task T2:
2×4 - gcd(4, 5) ≤ 5, i.e. 8 - 1 > 5
For p2 the inequality is not satisfied. Therefore, we need not look any further. Clearly, F = 4
is not a suitable frame size.

Let us now try frame size 5, to check if that is also feasible.
For F = 5 and task T1, we have
2×5 - gcd(5, 4) ≤ 4, i.e. 10 - 1 > 4
The inequality is not satisfied for T1. We need not look any further. Clearly, F = 5 is not a
suitable frame size.

Let us now try frame size 10.
For F = 10 and task T1, we have
2×10 - gcd(10, 4) ≤ 4, i.e. 20 - 2 > 4
The inequality is not satisfied for T1. We need not look any further. Clearly, F = 10 is not a
suitable frame size.

Let us check whether 20 is a feasible frame size.
For F = 20 and task T1, we have
2×20 - gcd(20, 4) ≤ 4, i.e. 40 - 4 > 4
Therefore, F = 20 is also not suitable.

So, only the frame size 2 is suitable for scheduling.
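The three constraints applied in Example 1 can be mechanized. A sketch of such a checker (tasks are given as (ei, pi) pairs with di = pi, as assumed in the example; the function name is ours):

```python
from math import gcd
from functools import reduce

def feasible_frame_sizes(tasks):
    """Integer frame sizes satisfying the three cyclic-scheduler constraints.
    tasks: list of (e_i, p_i) pairs; deadlines are taken equal to periods."""
    periods = [p for _, p in tasks]
    M = reduce(lambda a, b: a * b // gcd(a, b), periods)  # major cycle
    e_max = max(e for e, _ in tasks)
    result = []
    for F in range(1, M + 1):
        ok = (F >= e_max                                        # constraint 1
              and M % F == 0                                    # constraint 2
              and all(2 * F - gcd(F, p) <= p for p in periods)) # constraint 3
        if ok:
            result.append(F)
    return result

# Example 1 task set: only F = 2 survives all three constraints.
print(feasible_frame_sizes([(1, 4), (1.5, 5), (1, 20), (2, 20)]))  # → [2]
```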
Even though for Example 1 we could successfully find a suitable frame size that satisfies all
the three constraints, it is quite probable that a suitable frame size may not exist for many
problems. In such cases, to find a feasible frame size we might have to split the task (or
a few tasks) that is (are) causing violation of the constraints into smaller sub-tasks
that can be scheduled in different frames.
Example 2: Consider the following set of periodic real-time tasks to be scheduled by a cyclic
scheduler: T1 = (e1=1, p1=4), T2 = (e2=2, p2=5), T3 = (e3=5, p3=20). Determine a
suitable frame size for the task set.

Solution:
Using the first constraint, we have F ≥ 5.
Using the second constraint, we have the major cycle M = LCM(4, 5, 20) = 20. So, the
permissible values of F are 5, 10 and 20.
Checking for a frame size that satisfies the third constraint, we find that no value of F is
suitable. To overcome this problem, we need to split the task that is making the task set not
schedulable. It is easy to observe that the task T3 has the largest execution time and,
consequently, due to constraint 1, makes the feasible frame sizes quite large.
We try splitting T3 into two or three tasks. After splitting T3 into three tasks, we have:
T3.1 = (e3.1=1, p3.1=20), T3.2 = (e3.2=2, p3.2=20), T3.3 = (e3.3=2, p3.3=20).
The possible values of F now are 2 and 4. Checking constraint 3 again, F = 2 is now a
feasible frame size (F = 4 still fails constraint 3 for T2, since 2×4 - gcd(4, 5) = 7 > 5).
It is very difficult to come up with a clear set of guidelines to identify the exact task that is to
be split, and the parts into which it needs to be split. Therefore, this needs to be done by trial
and error. Further, as the number of tasks to be scheduled increases, this method of trial and
error becomes impractical since each task needs to be checked separately. However, when
the task set consists of only a few tasks we can easily apply this technique to find a feasible
frame size for a set of tasks otherwise not schedulable by a cyclic scheduler.
Our discussion on cyclic schedulers has so far been restricted to
scheduling periodic real-time tasks. However, many practical applications
typically consist of a mixture of several periodic, aperiodic, and sporadic tasks. In this
section, we discuss how aperiodic and sporadic tasks can be accommodated by cyclic schedulers.
Recall that the arrival times of aperiodic and sporadic tasks are expressed
statistically. Therefore, there is no way to assign aperiodic and sporadic tasks to frames without
significantly lowering the overall achievable utilization of the system. In a generalized
scheduler, initially a schedule (assignment of tasks to frames) for only the periodic tasks is
prepared. The sporadic and aperiodic tasks are then scheduled in the slack times that may be
available in the frames. Slack time in a frame is the time left in the frame after the periodic task
allocated to the frame completes its execution. Non-zero slack time in a frame can exist only
when the execution time of the task allocated to it is smaller than the frame size.
A sporadic task is taken up for scheduling only if enough slack time is available for
the arriving sporadic task to complete before its deadline. Therefore, a sporadic task on
its arrival is subjected to an acceptance test. The acceptance test checks whether the task is
likely to be completed within its deadline when executed in the available slack times. If it is
not possible to meet the task's deadline, then the scheduler rejects it and the
corresponding recovery routines for the task are run. Since aperiodic tasks do not have strict
deadlines, they can be taken up for scheduling without any acceptance test, and best effort can
be made to schedule them in the slack times available. Though for aperiodic tasks no acceptance
test is done, no guarantee is given for a task's completion time, and best effort is made to
complete the task as early as possible.
An efficient implementation of this scheme is that the slack times are stored in a
table, and during the acceptance test this table is used to check the schedulability of the arriving
tasks.
Another popular alternative is that the aperiodic and sporadic tasks are accepted without any
acceptance test, and best effort is made to meet their respective deadlines.
The pseudocode below assumes that the schedule table has already been prepared and,
if required, the sporadic tasks have already been subjected to an acceptance test and only
those which have passed the test are available for scheduling.
cyclic-scheduler() {
    current-task T = Schedule-Table[k];
    k = k + 1;
    k = k mod N;                 // N is the total number of tasks in the schedule table
    dispatch-current-task(T);
    schedule-sporadic-tasks();   // Current task T completed early;
                                 // sporadic tasks can be taken up
    schedule-aperiodic-tasks();  // At the end of the frame, the running task
                                 // is pre-empted if not complete
    idle();                      // No task to run; idle
}
The cyclic scheduler routine cyclic-scheduler() is activated at the end of every frame by a
periodic timer. If the current task is not complete by the end of the frame, then it is
suspended and the task to be run in the next frame is dispatched by invoking the routine
cyclic-scheduler(). If the task scheduled in a frame completes early, then any existing sporadic
or aperiodic task is taken up for execution.
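The behaviour of this generalized scheduler can be sketched as a toy simulation. The frame-timer interrupt is modelled by one call per frame; the task names, the unit-length sporadic/aperiodic jobs, and the dispatch log are illustrative assumptions, not part of the original pseudocode:

```python
schedule_table = ["T1", "T2", "T1", "T3"]  # one periodic entry per frame
sporadic_queue, aperiodic_queue = [], []   # sporadic entries pre-accepted
k = 0                                      # next schedule-table index

def cyclic_scheduler(frame_size, periodic_exec_time):
    """Run one frame: dispatch the table entry, then fill the remaining
    slack with sporadic tasks first, then best-effort aperiodic tasks."""
    global k
    log = [("periodic", schedule_table[k])]
    k = (k + 1) % len(schedule_table)
    slack = frame_size - periodic_exec_time
    while slack >= 1 and sporadic_queue:   # accepted sporadic tasks first
        log.append(("sporadic", sporadic_queue.pop(0)))
        slack -= 1                         # each job assumed to take 1 time unit
    while slack >= 1 and aperiodic_queue:  # then best-effort aperiodic tasks
        log.append(("aperiodic", aperiodic_queue.pop(0)))
        slack -= 1
    return log                             # any remaining slack is idle time

sporadic_queue.append("S1")
print(cyclic_scheduler(frame_size=4, periodic_exec_time=2))
# → [('periodic', 'T1'), ('sporadic', 'S1')]
```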
1.5.7. Comparison of Cyclic with Table-Driven Scheduling

Both table-driven and cyclic schedulers are important clock-driven schedulers. A cyclic
scheduler needs to set a periodic timer only once, at the application initialization time. This
timer continues to give an interrupt exactly at every frame boundary. But in table-driven
scheduling, a timer has to be set every time a task starts to run. The execution time of a typical
real-time task is usually of the order of a few milliseconds. Therefore, a call to a timer is made
every few milliseconds. This represents a significant overhead and results in degraded system
performance, so in this respect a cyclic scheduler is superior. On the other hand, a table-driven
scheduler is more proficient than a cyclic scheduler in the following respect: in cyclic
scheduling, the size of the frame that needs to be chosen should be at least as long as the size
of the largest execution time of a task in the task set. This is a source of inefficiency, since this
results in processor time being wasted in case of those tasks whose execution times are smaller
than the chosen frame size.
1.6. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to
justify your choice in each case.
a. Average response time is an important performance metric for real-time operating
systems handling running of hard real-time tasks.
b. Unlike table-driven schedulers, cyclic schedulers do not need to store a pre-computed
schedule.
c. The minimum period for which a table-driven scheduler scheduling n periodic tasks
needs to pre-store the schedule is given by max{p1, p2, ..., pn}, where pi is the period
of the task Ti.
d. A cyclic scheduler is more proficient than a pure table-driven scheduler for
scheduling a set of hard real-time tasks.
e. A suitable figure of merit to compare the performance of different hard real-time task
scheduling algorithms can be the average task response times resulting from each
algorithm.
f. Cyclic schedulers are more proficient than table-driven schedulers.
g. While using a cyclic scheduler to schedule a set of real-time tasks on a uniprocessor,
when a suitable frame size satisfying all the three required constraints has been found,
it is guaranteed that the task set would be feasibly scheduled by the cyclic scheduler.
h. When more than one frame satisfies all the constraints on frame size while scheduling a
set of hard real-time periodic tasks using a cyclic scheduler, the largest of these frame
sizes should be chosen.
i. In table-driven scheduling of three periodic tasks T1, T2, T3, the scheduling table must
j. When a set of hard real-time periodic tasks are being scheduled using a cyclic
scheduler, if a certain frame size is found to be not suitable, then any frame size
smaller than this would not also be suitable for scheduling the tasks.
k. When a set of hard real-time periodic tasks are being scheduled using a cyclic
scheduler, if a candidate frame size exceeds the execution time of every task and
squarely divides the major cycle, then it would be a suitable frame size to schedule the
given set of tasks.
l. Finding an optimal schedule for a set of independent periodic hard real-time tasks
2. Real-time tasks are normally classified into periodic, aperiodic, and sporadic real-time
tasks.
a. What are the basic criteria based on which a real-time task can be determined to belong
to one of the three categories?
b. Identify some characteristics that are unique to each of the three categories of tasks.
c. Give examples of tasks in practical systems which belong to each of the three
categories.
3. What do you understand by an optimal scheduling algorithm? Is it true that the time
complexity of an optimal scheduling algorithm for scheduling a set of real-time tasks on a
uniprocessor is prohibitively expensive to be of any practical use? Explain your answer.
4. Suppose a set of three periodic tasks is to be scheduled using a cyclic scheduler on a
uniprocessor. Assume that the CPU utilization due to the three tasks is less than 1. Also,
assume that for each of the three tasks, the deadline equals the respective period.
Suppose that we are able to find an appropriate frame size (without having to split any of
the tasks) that satisfies the three constraints of minimization of context switches,
minimization of schedule table size, and satisfaction of deadlines. Does this imply that it is
possible to assert that we can feasibly schedule the three tasks using the cyclic scheduler?
If you answer affirmatively, then prove your answer. If you answer negatively, then show
an example involving three tasks that disproves the assertion.
5. Consider a real-time system which consists of three tasks T1, T2, and T3, which have been
characterized in the following table.
If the tasks are to be scheduled using a table-driven scheduler, what is the length of time
for which the schedules have to be stored in the pre-computed schedule table of the
scheduler?
6. A cyclic real-time scheduler is to be used to schedule three periodic tasks T1, T2,
and T3 with the following characteristics:
Module
6
Embedded System
Software

Lesson
30
Real-Time Task
Scheduling Part 2
ctB = eB / (1 - Σi=1..n ei/pi)        (3.1/2.7)

This expression is easy to interpret. When any foreground task is executing, the background
task waits. The average CPU utilization due to the foreground task Ti is ei/pi, since ei amount of
processing time is required over every period pi. It follows that all foreground tasks together
would result in a CPU utilization of Σi=1..n ei/pi. Therefore, the average time available for
execution of the background tasks in every unit of time is 1 - Σi=1..n ei/pi. Hence, Expr. 2.7
follows easily.
We now illustrate the applicability of Expr. 2.7 through the following three simple examples.

1.3. Examples
Example 1: Consider a real-time system in which tasks are scheduled using foreground-
background scheduling. There is only one periodic foreground task Tf : (φf=0, pf=100 msec,
ef=50 msec, df=100 msec), and the background task is TB = (eB=1000 msec). Compute the
completion time of the background task.

Solution: Using Expr. 2.7 to compute the background task completion time, we have
ctB = 1000 / (1 - 50/100) = 2000 msec
Solution: The total utilization due to the foreground tasks is Σi=1..2 ei/pi = 10/20 + 20/50 =
90/100.
This implies that the fraction of time remaining for the background task to execute is given
by:
1 - Σi=1..2 ei/pi = 10/100.
Therefore, the background task gets 1 millisecond every 10 milliseconds. Thus, the
background task would take 10 × (100/1) = 1000 milliseconds to complete.
Solution: Assume that every context switch incurs a
switching overhead of 1 msec. This has been shown as a shaded rectangle in Fig. 30.1.
Subsequently, each time the foreground task runs, it preempts the background task and incurs
one context switch. On completion of each instance of the foreground task, the background
task runs and incurs another context switch. With this observation, to simplify our
computation of the actual completion time of TB, we can imagine that the execution time of
every foreground task is increased by two context switch times (one due to itself and
the other due to the background task running after each time it completes). Thus, the
net effect of context switches can be imagined to be causing the execution time of
the foreground task to increase by 2 context switch times, i.e. to 52 milliseconds from
50 milliseconds. This has pictorially been shown in Fig. 30.1.
Now, using Expr. 2.7, we get the time required by the background task to complete:
1000/(1 - 52/100) ≈ 2083.3 milliseconds
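Expr. 2.7 and the context-switch adjustment used above take only a few lines to reproduce. A sketch; the helper name and the convention of charging two context switches per foreground instance follow the reasoning in the text:

```python
def background_completion_time(e_bg, foreground, cs=0.0):
    """Completion time of a background task under foreground-background
    scheduling (Expr. 2.7).  foreground: list of (e_i, p_i) pairs; every
    foreground instance is charged two context-switch overheads of cs each."""
    utilization = sum((e + 2 * cs) / p for e, p in foreground)
    return e_bg / (1 - utilization)

print(background_completion_time(1000, [(50, 100)]))                  # → 2000.0
print(round(background_completion_time(1000, [(50, 100)], cs=1), 1))  # → 2083.3
```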
In the following two sections, we examine two important event-driven schedulers: EDF
(Earliest Deadline First) and RMA (Rate Monotonic Algorithm). EDF is the optimal dynamic
priority real-time task scheduling algorithm and RMA is the optimal static priority real-time task
scheduling algorithm.
where ui is the average utilization due to the task Ti and n is the total number of tasks in the
task set. Expr. 3.2 is both a necessary and a sufficient condition for a set of tasks to be EDF
schedulable.
EDF has been proven to be an optimal uniprocessor scheduling algorithm. This means that, if
a set of tasks is not schedulable under EDF, then no other scheduling algorithm can feasibly
schedule this task set. In the simple schedulability test for EDF (Expr. 3.2), we assumed that the
period of each task is the same as its deadline. However, in practical problems the period of a
task may at times be different from its deadline. In such cases, the schedulability test needs to be
changed. If pi > di, then each task needs ei amount of computing time every min(pi, di)
duration of time. Therefore, we can rewrite Expr. 3.2 as:

Σi=1..n ei / min(pi, di) ≤ 1        (3.3/2.9)

However, if pi < di, it is possible that a set of tasks is EDF schedulable even when the task
set fails to meet Expr. 3.3. Therefore, Expr. 3.3 is conservative when pi < di: it is not a
necessary condition, but only a sufficient condition, for a given task set to be EDF schedulable.
Example 4: Consider the following three periodic real-time tasks to be scheduled using EDF
on a uniprocessor: T1 = (e1=10, p1=20), T2 = (e2=5, p2=50), T3 = (e3=10, p3=35). Determine
whether the task set is schedulable.

Solution: The total utilization due to the three tasks is given by:
Σi=1..3 ei/pi = 10/20 + 5/50 + 10/35 = 0.89
This is less than 1. Therefore, the task set is EDF schedulable.
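The schedulability tests of Exprs. 3.2 and 3.3 amount to a one-line utilization sum. A sketch (tasks as (ei, pi, di) triples; the function name is ours):

```python
def edf_schedulable(tasks):
    """EDF test: the sum of e_i / min(p_i, d_i) must not exceed 1.
    Necessary and sufficient when d_i = p_i (Expr. 3.2); only
    sufficient when some d_i > p_i (Expr. 3.3)."""
    return sum(e / min(p, d) for e, p, d in tasks) <= 1

# Example 4 task set (d_i = p_i): utilization ≈ 0.89, hence schedulable.
print(edf_schedulable([(10, 20, 20), (5, 50, 50), (10, 35, 35)]))  # → True
```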
Though EDF is a simple as well as an optimal algorithm, it has a few shortcomings
which render it almost unusable in practical applications. The main problems with EDF are
discussed in Sec. 3.4.3. Next, we discuss the concept of task priority in EDF and then discuss
how EDF can be practically implemented.

1.4.1. Is EDF Really a Dynamic Priority Scheduling Algorithm?

We stated in Sec. 3.3 that EDF is a dynamic priority scheduling algorithm. Was it after all
correct on our part to assert that EDF is a dynamic priority task scheduling algorithm? If EDF
were to be considered a dynamic priority algorithm, we should be able to determine the precise
priority value of a task at any point of time and also be able to show how it changes with time. If
we reflect on our discussions of EDF in this section, EDF scheduling does not require any
priority value to be computed for any task at any time. In fact, EDF has no notion of a priority
value for a task. Tasks are scheduled solely based on the proximity of their deadline.
However, the longer a task waits in a ready queue, the higher is the chance (probability) of
its being taken up for scheduling. So, we can imagine that a virtual priority value associated with
a task keeps increasing with time until the task is taken up for scheduling. However, it is
important to understand that in EDF the tasks neither have any priority value associated with
them, nor does the scheduler perform any priority computations to determine the schedulability
of a task at either run time or compile time.
A simple implementation of EDF would maintain all ready tasks in a queue; each record in the
queue would contain the absolute deadline of the task. At every preemption point, the entire
queue would be scanned from the beginning to determine the task having the shortest deadline.
However, this implementation would be very inefficient. Let us analyze the complexity of this
scheme. Each task insertion will be achieved in O(1) or constant time, but task selection (to run
next) and its deletion would require O(n) time, where n is the number of tasks in the queue.
A more efficient implementation of EDF would be as follows. EDF can be implemented by
maintaining all ready tasks in a sorted priority queue. A sorted priority queue can efficiently be
implemented by using a heap data structure. In the priority queue, the tasks are always kept
sorted according to the proximity of their deadline. When a task arrives, a record for it can be
inserted into the heap in O(log2 n) time, where n is the total number of tasks in the priority
queue. At every scheduling point, the next task to be run can be found at the top of the heap in
O(1) time. When a task is taken up for scheduling, it needs to be removed from the priority
queue. This removal can be achieved in O(log2 n) time.
A still more efficient implementation of EDF can be achieved as follows, under the
assumption that the number of distinct deadlines that tasks in an application can have is
restricted. In this approach, whenever a task arrives, its absolute deadline is computed from its
release time and its relative deadline. A separate FIFO queue is maintained for each distinct
relative deadline that tasks can have. The scheduler inserts a newly arrived task at the end of the
corresponding relative deadline queue. Clearly, tasks in each queue are ordered according to
their absolute deadlines.
To find the task with the earliest absolute deadline, the scheduler only needs to examine
the tasks at the heads of all the FIFO queues. If the number of FIFO queues maintained by the
scheduler is Q, then the order of searching would be O(Q), which is effectively constant since Q
is small and fixed. The time to insert a task would also be O(1).
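The heap-based ready queue described above can be sketched with Python's heapq module; the class name and its two methods are illustrative:

```python
import heapq

class EDFReadyQueue:
    """Ready tasks keyed by absolute deadline; the earliest-deadline task
    is always at the top of the binary heap."""
    def __init__(self):
        self._heap = []

    def arrive(self, name, release, relative_deadline):
        # O(log n): insert keyed on the absolute deadline
        heapq.heappush(self._heap, (release + relative_deadline, name))

    def take_next(self):
        # O(log n): remove and return the task with the earliest deadline
        deadline, name = heapq.heappop(self._heap)
        return name

q = EDFReadyQueue()
q.arrive("T1", release=0, relative_deadline=20)
q.arrive("T2", release=0, relative_deadline=5)
q.arrive("T3", release=3, relative_deadline=10)
print(q.take_next())  # → T2 (absolute deadline 5 is the earliest)
```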
1.4.3. Shortcomings of EDF

In this subsection, we highlight some of the important shortcomings of EDF when used for
scheduling real-time tasks in practical applications.

Transient Overload Problem: Transient overload denotes the overload of a system for a
very short time. Transient overload occurs when some task takes more time to
complete than what was originally planned during the design time. A task may take
longer to complete due to many reasons. For example, it might enter an infinite loop or
encounter an unusual condition and enter a rarely used branch due to some abnormal input
values. When EDF is used to schedule a set of periodic real-time tasks, a task overshooting
its completion time can cause some other task(s) to miss their deadlines. It is usually very
difficult to predict during program design which task might miss its deadline when a transient
overload occurs in the system due to a low priority task overshooting its deadline. The only
prediction that can be made is that the task (or tasks) that would run immediately after
the task causing the transient overload would get delayed and might miss its (their)
respective deadline(s). However, at different times a task might be followed by different
tasks in execution, so this observation does not help us find which task might miss its
deadline. Even the most critical task might miss its deadline due to a very low priority task
overshooting its planned completion time. So, it should be clear that under EDF any amount
of careful design will not guarantee that the most critical task would not miss its deadline
under transient overload. This is a serious drawback of the EDF scheduling algorithm.

Resource Sharing Problem: When EDF is used to schedule a set of real-time tasks,
unacceptably high overheads might have to be incurred to support resource sharing among
the tasks without making tasks miss their respective deadlines. We examine this issue in
some detail in the next lesson.
In RMA, the priority of a task is directly proportional to its rate (or, inversely proportional to its
period). That is, the priority of any task Ti is computed as: priority = k / pi, where pi is the
period of the task Ti and k is a constant. Using this simple expression, plots of priority values of
tasks under RMA for tasks of different periods can easily be obtained. These plots have been
shown in Fig. 30.2(a) and Fig. 30.2(b). It can be observed from these figures that the priority
of a task increases linearly with the arrival rate of the task and inversely with its period.

Fig. 30.2 Priority Assignment to Tasks in RMA
(priority vs. rate in (a); priority vs. period in (b))
worst-case execution times and periods of the tasks. A pertinent question at this point is how a system developer can determine the worst-case execution time of a task even before the system is developed. The worst-case execution times are usually determined experimentally or through
simulation studies.
The following are some important criteria that can be used to check the schedulability of a set of tasks under RMA.
Σ(i=1 to n) ui ≤ n(2^(1/n) − 1)   … (3.4/2.10)

where ui is the utilization due to task Ti. Let us now examine the implications of this result. If a set of tasks satisfies the sufficient condition, then it is guaranteed that the set of tasks would be RMA schedulable.

Consider the case where there is only one task in the system, i.e. n = 1. Substituting n = 1 in Expr. 3.4, we get: u1 ≤ 1 × (2^(1/1) − 1), or u1 ≤ 1.

For n = 2, we get: Σ(i=1 to 2) ui ≤ 2(2^(1/2) − 1), or Σ ui ≤ 0.828.

For n = 3, we get: Σ(i=1 to 3) ui ≤ 3(2^(1/3) − 1), or Σ ui ≤ 0.78.

For n → ∞, we get: Σ(i=1 to n) ui ≤ n(2^(1/n) − 1), or Σ ui ≤ loge 2 ≈ 0.692.
[Plot: achievable utilization falls from 1.0 toward 0.692 as the number of tasks grows]
Fig. 30.3 Achievable Utilization with the Number of Tasks under RMA
Evaluation of Expr. 3.4 as n → ∞ involves an indeterminate expression of the type ∞ × 0. By applying L'Hospital's rule, we can verify that the right hand side of the expression evaluates to loge 2 ≈ 0.692. From the above computations, it is clear that the maximum CPU utilization that
can be achieved under RMA is 1. This is achieved when there is only a single task in the system.
As the number of tasks increases, the achievable CPU utilization falls and, as n → ∞, the achievable utilization stabilizes at loge 2, which is approximately 0.692. This is pictorially shown in Fig. 30.3. We now illustrate the applicability of the RMA schedulability criteria through a few examples.

1.5.2. Examples
Example 5: Check whether the following set of periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=60, p3=200).

Solution: Let us first compute the total CPU utilization achieved due to the three given tasks:

Σ(i=1 to 3) ui = 20/100 + 30/150 + 60/200 = 0.7

This is less than 1; therefore the necessary condition for schedulability of the tasks is satisfied. Further, 0.7 is less than the Liu and Layland bound 3(2^(1/3) − 1) = 0.78, so the sufficient condition is also satisfied and the task set is RMA schedulable.
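The Liu and Layland check used above is easy to automate. The following Python sketch (the function name and task representation are illustrative, not part of the lesson) applies Expr. 3.4 to a list of (execution time, period) pairs:

```python
def liu_layland_schedulable(tasks):
    """Sufficient (not necessary) RMA test of Liu and Layland.

    tasks: list of (execution_time, period) pairs.
    Returns True when total utilization <= n(2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(e / p for e, p in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

# Task set of Example 5: utilization 0.7 <= 0.78, so the test passes.
print(liu_layland_schedulable([(20, 100), (30, 150), (60, 200)]))  # True
```

Remember that a False result here is inconclusive: the test is only sufficient, so a failing task set may still be RMA schedulable and must be checked further (for example, with Lehoczky's test).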
Example 6: Check whether the following set of three periodic real-time tasks is
schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3
= (e3=90, p3=200).
Solution: Let us first compute the total CPU utilization due to the given task set:
Σ(i=1 to 3) ui = 20/100 + 30/150 + 90/200 = 0.85

This exceeds the Liu and Layland bound 3(2^(1/3) − 1) = 0.78, so the task set fails the sufficient condition. If a task set passes the Liu and Layland test, then it is guaranteed to be RMA schedulable. On the other hand, even if a task set fails the Liu and Layland test, it may still be RMA schedulable.
It follows from this that even when a task set fails Liu and Layland's test, we should not conclude that it is not schedulable under RMA. We need to test further to check if the task set is RMA schedulable. A test that can be performed to check whether a task set is RMA schedulable when it fails the Liu and Layland test is Lehoczky's test, which has been expressed as Theorem 3.
1.5.3. Theorem 3

A set of periodic real-time tasks is RMA schedulable under any task phasing, iff all the tasks meet their respective first deadlines under zero phasing.
[Timeline diagrams, time in msec: (a) T1 in phase with T2; T2 completes at 90. (b) T1 phased by 20 msec; T2 completes at 80.]
Fig. 30.4 Worst Case Response Time for a Task Occurs When It is in Phase with Its Higher Priority Tasks
A formal proof of this theorem is beyond the scope of this discussion. However, we can understand the result intuitively from the following reasoning. First let us try to understand the following fact.
The worst case response time for a task occurs when it is in phase with its higher priority tasks.

To see why this statement must be true, consider the following. Under RMA, whenever a higher priority task is ready, the lower priority tasks cannot execute and have to wait. This implies that a lower priority task will have to wait for the entire duration of execution of each higher priority task instance that arises during its own execution. More instances of a higher priority task occur when a task is in phase with it than when it is out of phase with it. This has been illustrated through a simple example in Fig. 30.4. In Fig. 30.4(a), a higher priority task T1=(10,30) is in phase with a lower priority task T2=(60,120), and the response time of T2 is 90 msec. However, in Fig. 30.4(b), when T1 has a 20 msec phase, the response time of T2 becomes 80 msec. Therefore, if a task meets its first deadline under zero phasing, then it will meet all its deadlines.
Example 7: Check whether the task set of Example 6 is actually schedulable under RMA.

Solution: Though the result of the Liu and Layland test was negative in Example 6, we can apply the Lehoczky test and observe the following:

For the task T1: e1 < p1 holds, since 20 msec < 100 msec. Therefore, it would meet its first deadline (it does not have any tasks that have higher priority).
[Timeline diagrams under zero phasing: (a) T1 meets its first deadline (executes 0–20, deadline 100); (b) T2 meets its first deadline, completing at 50 (deadline 150); (c) T3 meets its first deadline (deadline 200), the execution sequence being T1 T2 T3 T1 T3 T2 T3]
For the task T2: T1 is its higher priority task and considering 0 phasing, it would occur once
before the deadline of T2. Therefore, (e1 + e2) < p2 holds, since 20 + 30 = 50 msec < 150
msec. Therefore, T2 meets its first deadline.
For the task T3: (2e1 + 2e2 + e3) < p3 holds, since 2×20 + 2×30 + 90 = 190 msec < 200 msec. We have considered 2e1 and 2e2 since T1 and T2 occur twice within the first deadline of T3. Therefore, T3 meets its first deadline. So, the given task set is schedulable under RMA. The schedulability test for T3 has been shown pictorially in Fig. 30.5. Since all the tasks meet their first deadlines under zero phasing, they are RMA schedulable according to Lehoczky's results.
[Timeline diagram: three instances T1(1), T1(2), T1(3) occur within a single instance Ti(1)]
Fig. 30.6 Instances of T1 over a single instance of Ti
Let us now try to derive a formal expression for this important result of Lehoczky. Let {T1, T2, …, Ti} be the set of tasks to be scheduled, and assume that the tasks have been ordered in decreasing order of their priorities and that all tasks start at time instant 0. Consider the example shown in Fig. 30.6, in which three instances of the task T1 occur within a single instance of Ti. Each time T1 occurs, Ti has to wait, since T1 has higher priority. The exact number of times that T1 occurs within a single instance of Ti is given by ⌈pi / p1⌉. Since T1's execution time is e1, the total execution time required due to task T1 before the deadline of Ti is ⌈pi / p1⌉ × e1. This expression can easily be generalized to consider the execution times of all tasks having higher priority than Ti (i.e. T1, T2, …, Ti−1). Therefore, the time for which Ti will have to wait due to all its higher priority tasks can be expressed as:

Σ(k=1 to i−1) ⌈pi / pk⌉ ek   … (3.5/2.11)
Expression 3.5 gives the total time required to execute Ti's higher priority tasks, for which Ti would have to wait. So, the task Ti would meet its first deadline iff

ei + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi   … (3.6/2.12)

That is, if the sum of the execution times of all higher priority task instances occurring before Ti's first deadline, together with the execution time of the task itself, is at most its period pi, then Ti would complete before its first deadline. Note that in Expr. 3.6 we have implicitly assumed that the task periods equal their respective deadlines, i.e. pi = di. If di < pi, then Expr. 3.6 would need to be modified as follows:

ei + Σ(k=1 to i−1) ⌈di / pk⌉ ek ≤ di   … (3.7/2.13)
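Exprs. 3.6 and 3.7 translate directly into a per-task first-deadline check. The Python sketch below (function names are illustrative) assumes the tasks are supplied as (ei, pi, di) triples already sorted in decreasing order of priority:

```python
import math

def meets_first_deadline(i, tasks):
    """Zero-phasing check of Expr. 3.7 for task i (0-based).

    tasks: (execution_time, period, deadline) triples in
    decreasing priority order; reduces to Expr. 3.6 when d == p."""
    e_i, _, d_i = tasks[i]
    # Interference from each higher priority task T_k: ceil(d_i/p_k) * e_k
    interference = sum(math.ceil(d_i / p_k) * e_k
                       for e_k, p_k, _ in tasks[:i])
    return e_i + interference <= d_i

def lehoczky_schedulable(tasks):
    return all(meets_first_deadline(i, tasks) for i in range(len(tasks)))

# Task set of Example 6 (deadlines equal periods): 190 <= 200 for T3.
print(lehoczky_schedulable([(20, 100, 100), (30, 150, 150), (90, 200, 200)]))  # True
```

Note that every task must be checked individually; as Example 9 later shows, the success of the lowest priority task alone proves nothing about the rest of the set.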
Note that even if Expr. 3.7 is not satisfied, there is some possibility that the task set
may still be schedulable. This might happen because in Expr. 3.7 we have considered zero
phasing among all the tasks, which is the worst case. In a given problem, some tasks may have
non-zero phasing. Therefore, even when a task set narrowly fails to meet Expr 3.7, there is some
chance that it may in fact be schedulable under RMA. To understand why this is so, consider a
task set where one particular task Ti fails Expr. 3.7, making the task set not schedulable. The task misses its deadline when it is in phase with all its higher priority tasks. However, when the task has non-zero phasing with at least some of its higher priority tasks, it might actually meet its first deadline, contrary to the negative result of Expr. 3.7.
Let us now consider two examples to illustrate the applicability of Lehoczky's results.
Example 8: Consider the following set of three periodic real-time tasks: T1=(10,20), T2=(15,60), T3=(20,120) to be run on a uniprocessor. Determine whether the task set is schedulable under RMA.

Solution: First let us try the sufficiency test for RMA schedulability. By Expr. 3.4 (Liu and Layland test), the task set is schedulable if Σ ui ≤ 0.78.

Σ ui = 10/20 + 15/60 + 20/120 ≈ 0.92

Since this exceeds 0.78, the sufficiency test fails and is inconclusive. Let us now try Lehoczky's test. The tasks T1, T2, T3 are already ordered in decreasing order of their priorities.

Testing for task T1: Since e1 (10 msec) is less than d1 (20 msec), T1 would meet its first deadline.

Testing for task T2: 15 + ⌈60/20⌉ × 10 ≤ 60, or 15 + 30 = 45 ≤ 60 msec. The condition is satisfied. Therefore, T2 would meet its first deadline.

Testing for task T3: 20 + ⌈120/20⌉ × 10 + ⌈120/60⌉ × 15 ≤ 120, or 20 + 60 + 30 = 110 ≤ 120 msec. The condition is satisfied. Therefore, T3 would meet its first deadline, and the task set is schedulable under RMA.
Example 9: RMA is used to schedule a set of periodic hard real-time tasks in a system. Is it
possible in this system that a higher priority task misses its deadline, whereas a lower
priority task meets its deadlines? If your answer is negative, prove your denial. If your
answer is affirmative, give an example involving two or three tasks scheduled using RMA
where the lower priority task meets all its deadlines whereas the higher priority task misses
its deadline.
Solution: Yes. It is possible that under RMA a higher priority task misses its deadline whereas a lower priority task meets its deadline. We show this by constructing an example.
Consider the following task set: T1 = (e1=15, p1=20), T2 = (e2=6, p2=35), T3 = (e3=3,
p3=100). For the given task set, it is easy to observe that pr(T1) > pr(T2) > pr(T3). That is, T1,
T2, T3 are ordered in decreasing order of their priorities.
Version 2 EE IIT, Kharagpur 14
For this task set, T3 meets its deadline according to Lehoczky's test, since

e3 + ⌈p3 / p2⌉ e2 + ⌈p3 / p1⌉ e1 = 3 + (⌈100/35⌉ × 6) + (⌈100/20⌉ × 15)
= 3 + (3 × 6) + (5 × 15) = 96 ≤ 100 msec.

But T2 does not meet its deadline, since

e2 + ⌈p2 / p1⌉ e1 = 6 + (⌈35/20⌉ × 15) = 6 + (2 × 15) = 36 msec.

This is greater than the deadline of T2 (35 msec).
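The per-task checks of Example 9 can be reproduced with a small zero-phasing test (an illustrative sketch, assuming each task's deadline equals its period):

```python
import math

def first_deadline_ok(i, tasks):
    """Zero-phasing first-deadline check (Expr. 3.6) for task i,
    with tasks given as (e, p) pairs in decreasing priority order."""
    e_i, p_i = tasks[i]
    # Waiting time due to all higher priority tasks before p_i.
    wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return e_i + wait <= p_i

tasks = [(15, 20), (6, 35), (3, 100)]   # T1, T2, T3 of Example 9
print([first_deadline_ok(i, tasks) for i in range(3)])  # [True, False, True]
```

The middle result confirms that T2 misses its deadline (36 > 35 msec) even though the lowest priority task T3 passes (96 ≤ 100 msec).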
As a consequence of the results of Example 9, by observing that the lowest priority task of a given task set meets its first deadline, we cannot conclude that the entire task set is RMA schedulable. On the contrary, it is necessary to check each task individually as to whether it meets its first deadline under zero phasing. Concluding that the entire task set can be feasibly scheduled under RMA merely because the lowest priority task meets its deadline is likely to be flawed.
periods, the maximum utilization below which a task set can feasibly be scheduled is on the average close to 88%.

For harmonic tasks, the maximum achievable utilization (for a task set to have a feasible schedule) can still be higher. Let us first understand when the periods of a task set are said to be harmonically related. The task periods in a task set are said to be harmonically related iff, for any two arbitrary tasks Ti and Tk in the task set, whenever pi > pk, pi is an integral multiple of pk. In fact, if all the task periods are harmonically related, then even a task set having 100% utilization can be feasibly scheduled. It is easy to prove this, as the following theorem shows.
1.5.5. Theorem 4

For a set of harmonically related tasks HS = {Ti}, the RMA schedulability criterion is given by Σ(i=1 to n) ui ≤ 1.

Proof: Let T1, T2, …, Tn be the tasks in the given task set, arranged in increasing order of their periods. That is, for any i and j, pi < pj whenever i < j. If this relationship is not satisfied, a simple renaming of the tasks can achieve it. Now, according to Expr. 3.6, a task Ti meets its first deadline if

ei + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi.

However, since the task set is harmonically related, pi can be written as m × pk for some integer m. Using this, ⌈pi / pk⌉ = pi / pk. Now, Expr. 3.6 can be written as:

ei + Σ(k=1 to i−1) (pi / pk) ek ≤ pi

For Ti = Tn, we can write: en + Σ(k=1 to n−1) (pn / pk) ek ≤ pn. Dividing both sides of this expression by pn, we get en/pn + Σ(k=1 to n−1) ek/pk ≤ 1. Hence, the task set would be schedulable iff Σ(k=1 to n) ek/pk ≤ 1, or Σ(i=1 to n) ui ≤ 1.
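Theorem 4 can be sketched in code (illustrative names; the example task set below is an assumption chosen to exercise the 100% utilization case):

```python
def harmonically_related(periods):
    """True iff, for every pair, the larger period is an integral
    multiple of the smaller (the definition given above)."""
    ps = sorted(periods)
    return all(ps[j] % ps[i] == 0
               for i in range(len(ps)) for j in range(i + 1, len(ps)))

def harmonic_rma_schedulable(tasks):
    """Theorem 4: for harmonically related tasks, RMA schedulability
    reduces to total utilization <= 1. tasks: (e, p) pairs."""
    assert harmonically_related([p for _, p in tasks])
    return sum(e / p for e, p in tasks) <= 1

# Periods 25, 50, 100 are harmonically related; utilization is exactly 1.
print(harmonic_rma_schedulable([(10, 25), (10, 50), (40, 100)]))  # True
```

Contrast this with the general bound of Expr. 3.4: for three arbitrary periods the test would only admit utilizations up to 0.78.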
Under RMA, a lower priority task overshooting its completion time cannot make a higher priority task miss its deadline.

[Fig. 30.7 Multi-Level Feedback Queue]

The disadvantages of RMA include the following: it is very difficult to support aperiodic and sporadic tasks under RMA. Further, RMA is not optimal when task periods and deadlines differ.

1.6. Deadline Monotonic Algorithm (DMA)

RMA no longer remains an optimal scheduling algorithm for periodic real-time tasks when task deadlines and periods differ (i.e. di ≠ pi for some tasks in the task set to be scheduled). For such task sets, the Deadline Monotonic Algorithm (DMA) is used: under DMA, tasks are assigned priorities in inverse order of their relative deadlines, so the task with the shortest relative deadline gets the highest priority.
Example 10: Is the following task set schedulable by DMA? Also check whether it is schedulable using RMA. T1 = (e1=10, p1=50, d1=35), T2 = (e2=15, p2=100, d2=20), T3 = (e3=20, p3=200, d3=200) [time in msec].
Solution: First, let us check RMA schedulability of the given set of tasks by checking Lehoczky's criterion. The tasks are already ordered in descending order of their priorities.
Checking for T1:
10 msec < 35 msec. Hence, T1 would meet its first deadline.
Checking for T2:
(10 + 15) > 20 (exceeds deadline)
Thus, T2 will miss its first deadline. Hence, the given task set can not be feasibly scheduled
under RMA.
Now let us check the schedulability using DMA:
Under DMA, the priority ordering of the tasks is as follows: pr(T2) > pr(T1) > pr(T3).
Checking for T2:
15 msec < 20 msec. Hence, T2 will meet its first deadline.
Checking for T1:
(15 + 10) < 35
Hence T1 will meet its first deadline.
Checking for T3:
(20 + 30 + 40) < 200
Therefore, T3 will meet its deadline.
Therefore, the given task set is schedulable under DMA but not under RMA.
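The DMA check of Example 10 can be automated by sorting on relative deadlines and then reusing the first-deadline test of Expr. 3.7 (an illustrative sketch, not part of the lesson):

```python
import math

def dma_schedulable(tasks):
    """Deadline Monotonic check: shortest relative deadline gets the
    highest priority, then each task must pass Expr. 3.7.
    tasks: (execution_time, period, deadline) triples."""
    ordered = sorted(tasks, key=lambda t: t[2])  # by relative deadline
    for i, (e_i, _, d_i) in enumerate(ordered):
        wait = sum(math.ceil(d_i / p_k) * e_k
                   for e_k, p_k, _ in ordered[:i])
        if e_i + wait > d_i:
            return False
    return True

# Task set of Example 10: fails under RMA but passes under DMA.
print(dma_schedulable([(10, 50, 35), (15, 100, 20), (20, 200, 200)]))  # True
```

The sort reproduces the priority ordering pr(T2) > pr(T1) > pr(T3) derived in the example, since d2 = 20 < d1 = 35 < d3 = 200.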
Solution: The net effect of context switches is to increase the execution time of each
task by two context switching times. Therefore, the utilization due to the task set is:
Σ(i=1 to 3) ui = 22/100 + 32/150 + 92/200 = 0.89

Since Σ(i=1 to 3) ui > 0.78, the task set is not RMA schedulable according to the Liu and Layland
test.
Let us try Lehoczky's test. The tasks are already ordered in descending order of their priorities.
Checking for task T1:
22 < 100
removes it from the ready queue, places it in the blocked queue, and takes up the next eligible task for scheduling. Thus, self-suspension introduces an additional scheduling point, which we did not consider earlier. In event-driven scheduling, the scheduling points are defined by task completion, task arrival, and self-suspension events.

Let us now determine the effect of self-suspension on the schedulability of a task set. Consider a set of periodic real-time tasks {T1, T2, …, Tn}, which have been arranged in decreasing order of their priorities (or increasing order of their periods). Let the worst case self-suspension time of a task Ti be bi, and let the delay that the task Ti might incur due to its own self-suspension and the self-suspension of all higher priority tasks be bti. Then bti can be expressed as:

bti = bi + Σ(k=1 to i−1) min(ek, bk)   … (3.8/2.15)

Self-suspension of a higher priority task Tk may affect the response time of a lower priority task Ti by as much as Tk's execution time ek, if ek < bk. This worst case delay might occur when the higher priority task, after self-suspension, starts its execution exactly at the time instant the lower priority task would otherwise have executed. That is, after self-suspension, the execution of the higher priority task overlaps with the lower priority task, with which it would otherwise not have overlapped. However, if ek > bk, then the self-suspension of a higher priority task can delay a lower priority task by at most bk, since the maximum overlap period of the execution of a higher priority task due to self-suspension is restricted to bk.
Note that in a system where some of the tasks are non-preemptable, the effect of self-suspension is much more severe than that computed by Expr. 3.8. The reason is that every time a task self-suspends, it loses the processor. It may then be blocked by a non-preemptable lower priority task after the completion of its self-suspension. Thus, in a non-preemptable scenario, a task incurs delays due to self-suspension of itself and its higher priority tasks, and also the delay caused by non-preemptable lower priority tasks. Obviously, a task cannot get delayed due to the self-suspension of a lower priority non-preemptable task.
The RMA task schedulability condition of Liu and Layland (Expr. 3.4) needs to change when
we consider the effect of self-suspension of tasks. To consider the effect of self-suspension in
Expr. 3.4, we need to substitute ei by (ei + bti). If we consider the effect of self-suspension on
task completion time, the Lehoczky criterion (Expr. 3.6) would also have to be generalized:
ei + bti + Σ(k=1 to i−1) ⌈pi / pk⌉ ek ≤ pi   … (3.9/2.16)
We have so far implicitly assumed that a task undergoes at most a single self-suspension. However, if a task undergoes multiple self-suspensions, then expression 3.9 derived above would need to be changed. We leave this as an exercise for the reader.
Example 14: Consider the following set of periodic real-time tasks: T1 = (e1=10, p1=50), T2 = (e2=25, p2=150), T3 = (e3=50, p3=200) [all in msec]. Assume that the self-suspension times of T1, T2, and T3 are 3 msec, 3 msec, and 5 msec, respectively. Determine whether the tasks would meet their respective deadlines if scheduled using RMA.

Solution: The tasks are already ordered in descending order of their priorities. By using the generalized Lehoczky condition given by Expr. 3.9, we get:

For T1 to be schedulable: (10 + 3) < 50. Therefore, T1 would meet its first deadline.

For T2 to be schedulable: (25 + 6 + 10×3) = 61 < 150. Therefore, T2 meets its first deadline.

For T3 to be schedulable: (50 + 11 + (10×4 + 25×2)) = 151 < 200. This inequality is also satisfied. Therefore, T3 would also meet its first deadline.

It can therefore be concluded that the given task set is schedulable under RMA even when self-suspension of tasks is considered.
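The computation of Example 14 follows Exprs. 3.8 and 3.9 mechanically, and can be sketched as follows (illustrative names; tasks are (ei, pi, bi) triples in decreasing priority order):

```python
import math

def self_suspension_schedulable(tasks):
    """Generalized Lehoczky test (Expr. 3.9) assuming at most one
    self-suspension per task. tasks: (e, p, b) triples, highest
    priority first."""
    for i, (e_i, p_i, b_i) in enumerate(tasks):
        # Expr. 3.8: bt_i = b_i + sum of min(e_k, b_k) over higher
        # priority tasks.
        bt_i = b_i + sum(min(e_k, b_k) for e_k, _, b_k in tasks[:i])
        wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k, _ in tasks[:i])
        if e_i + bt_i + wait > p_i:
            return False
    return True

# Task set of Example 14: 13 <= 50, 61 <= 150, 151 <= 200.
print(self_suspension_schedulable([(10, 50, 3), (25, 150, 3), (50, 200, 5)]))  # True
```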
1.9. Self Suspension with Context Switching Overhead

Let us examine the effect of context switches on the generalized Lehoczky test (Expr. 3.9) for schedulability of a task set, which takes self-suspension by tasks into account. In a fixed priority preemptable system, each task preempts at most one other task if there is no self-suspension. Therefore, each task suffers at most two context switches: one context switch when it starts and another when it completes. It is easy to realize that any time a task self-suspends, it causes at most two additional context switches. Using a similar reasoning, we can determine that when each task is allowed to self-suspend twice, four additional context switching overheads are incurred. Let us denote the maximum context switch time as c. The effect of a single self-suspension of a task is to effectively increase its execution time in the worst case from ei to (ei + 4c). Thus, context switching overhead in the presence of a single self-suspension of tasks can be taken care of by replacing the execution time of a task Ti by (ei + 4c) in Expr. 3.9. We can easily extend this argument to consider two, three, or more self-suspensions.
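Combining the two effects, the adjusted test simply inflates each execution time by 4c before applying Expr. 3.9. A sketch under these assumptions (one self-suspension per task; names illustrative):

```python
import math

def schedulable_with_switching(tasks, c):
    """Expr. 3.9 after replacing each e_i by e_i + 4c to account for
    context switches with a single self-suspension per task.
    tasks: (e, p, b) triples in decreasing priority order; c is the
    maximum context switch time."""
    inflated = [(e + 4 * c, p, b) for e, p, b in tasks]
    for i, (e_i, p_i, b_i) in enumerate(inflated):
        bt_i = b_i + sum(min(e_k, b_k) for e_k, _, b_k in inflated[:i])
        wait = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k, _ in inflated[:i])
        if e_i + bt_i + wait > p_i:
            return False
    return True

# Task set of Example 14 with a 1 msec context switch time.
print(schedulable_with_switching([(10, 50, 3), (25, 150, 3), (50, 200, 5)], 1))  # True
```

With c = 1 msec the task set of Example 14 remains schedulable (the T3 check becomes 54 + 11 + 114 = 179 ≤ 200), but a sufficiently large c would make even the highest priority task fail.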
1.10. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to
justify your choice in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. If a set of periodic real-time tasks fails Lehoczky's test, then it can safely be concluded that this task set cannot be feasibly scheduled under RMA.
c. A time-sliced round-robin scheduler uses preemptive scheduling.
d. RMA is an optimal static priority scheduling algorithm to schedule a set of periodic
real-time tasks on a non-preemptive operating system.
e. Self-suspension of tasks impacts the worst case response times of the individual tasks
much more adversely when preemption of tasks is supported by the operating system
compared to the case when preemption is not supported.
f. When a set of periodic real-time tasks is being scheduled using RMA, it can not be
the case that a lower priority task meets its deadline, whereas some higher priority
task does not.
g. EDF (Earliest Deadline First) algorithm possesses good transient overload handling
capability.
h. A time-sliced round robin scheduler is an example of a non-preemptive scheduler.
i. EDF algorithm is an optimal algorithm for scheduling hard real-time tasks on a uniprocessor when the task set is a mixture of periodic and aperiodic tasks.
j. In a non-preemptable operating system employing RMA scheduling for a set of real-time periodic tasks, self-suspension of a higher priority task (due to I/O etc.) may increase the response time of a lower priority task.
k. The worst-case response time for a task occurs when it is out of phase with its higher priority tasks.
l. Good real-time task scheduling algorithms ensure fairness to real-time tasks while scheduling.
2. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. The EDF algorithm is optimal for scheduling real-time tasks on a uniprocessor in a non-preemptive environment.
b. When RMA is used to schedule a set of hard real-time periodic tasks in a uniprocessor environment, if the processor becomes overloaded at any time during system execution due to overrun by the lowest priority task, it would be very difficult to predict which task would miss its deadline.
c. While scheduling a set of real-time periodic tasks whose task periods are harmonically related, the upper bound on the achievable CPU utilization is the same for both EDF and RMA algorithms.
d. In a non-preemptive event-driven task scheduler, scheduling decisions are made
only at the arrival and completion of tasks.
e. The following is the correct arrangement of the three major classes of real-time
scheduling algorithms in ascending order of their run-time overheads.
static priority preemptive scheduling algorithms
table-driven algorithms
dynamic priority algorithms
f. While scheduling a set of independent hard real-time periodic tasks on a
uniprocessor, RMA can be as proficient as EDF under some constraints on the task
set.
g. RMA should be preferred over the time-sliced round-robin algorithm for scheduling a
set of soft real-time tasks on a uniprocessor.
h. Under RMA, the achievable utilization of a set of hard real-time periodic tasks
would drop when task periods are multiples of each other compared to the case
when they are not.
i. RMA scheduling of a set of real-time periodic tasks using the Liu and Layland
criterion might produce infeasible schedules when the task periods are different from
the task deadlines.
3. What do you understand by scheduling point of a task scheduling algorithm? How are the
scheduling points determined in (i) clock-driven, (ii) event-driven, (iii) hybrid schedulers?
How will your definition of scheduling points for the three classes of schedulers change
when (a) self-suspension of tasks, and (b) context switching overheads of tasks are taken
into account.
4. What do you understand by jitter associated with a periodic task? How are these
jitters caused?
5. Is the EDF algorithm used for scheduling real-time tasks a dynamic priority scheduling algorithm? Does EDF compute any priority value of tasks at any time? If you answer affirmatively, then explain when the priority is computed and how it is computed. If you answer in the negative, then explain the concept of priority in EDF.
6. What is the sufficient condition for EDF schedulability of a set of periodic tasks whose periods and deadlines are different? Construct an example involving a set of three periodic tasks whose periods differ from their respective deadlines such that the task set fails the sufficient condition and yet is EDF schedulable. Verify your answer. Show all your intermediate steps.
7. A preemptive static priority real-time task scheduler is used to schedule two periodic tasks T1 and T2 with the following characteristics:
Task  Phase (mSec)  Execution Time (mSec)  Period (mSec)  Relative Deadline (mSec)
T1    0             10                      20             20
T2    0             20                      50             50
Assume that T1 has higher priority than T2. A background task arrives at time 0 and would require 1000 mSec to complete. Compute the completion time of the background task assuming that context switching takes no more than 0.5 mSec.
8. Assume that a preemptive priority-based system consists of three periodic foreground tasks
T1, T2, and T3 with the following characteristics:
T1 has higher priority than T2 and T2 has higher priority than T3. A background task Tb
arrives at time 0 and would require 2000mSec to complete. Compute the completion time
of the background task Tb assuming that context switching time takes no more than 1
mSec.
9. Assume that task T3 is more critical than task T2. Check whether the task set can be feasibly scheduled using RMA.
10. What is the worst case response time of the background task of a system in which the background task requires 1000 msec to complete? There are two foreground tasks. The higher priority foreground task executes once every 100 msec and each time requires 25 msec to complete. The lower priority foreground task executes once every 50 msec and requires 15 msec to complete. Context switching requires no more than 1 msec.
11. Construct an example involving more than one hard real-time periodic task whose aggregate processor utilization is 1, and yet is schedulable under RMA.
12. Determine whether the following set of periodic tasks is schedulable on a uniprocessor using DMA (Deadline Monotonic Algorithm). Show all intermediate steps in your computation.
Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            140
T2    60                 10                      60             40
T3    40                 20                      200            120
T4    25                 10                      80             25
13. Consider the following set of three independent real-time periodic tasks.

Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            100
T2    60                 10                      50             30
T3    40                 50                      200            150
Determine whether the task set is schedulable on a uniprocessor using EDF. Show
all intermediate steps in your computation.
14. Determine whether the following set of periodic real-time tasks is schedulable on a
uniprocessor using RMA. Show the intermediate steps in your computation. Is RMA
optimal when the task deadlines differ from the task periods?
15. Construct an example involving two periodic real-time tasks which can be feasibly
scheduled by both RMA and EDF, but the schedule generated by RMA differs from that
generated by EDF. Draw the two schedules on a time line and highlight how the two
schedules differ. Consider the two tasks such that for each task:
a. the period is the same as deadline
b. period is different from deadline
16. Can multiprocessor real-time task scheduling algorithms be used satisfactorily in distributed systems? Explain the basic difference between the characteristics of a real-time
17. Construct an example involving a set of hard real-time periodic tasks that are not schedulable under RMA but could be feasibly scheduled by DMA. Verify your answer, showing all intermediate steps.
18. Three hard real-time periodic tasks T1 = (50, 100, 100), T2 = (70, 200, 200), and T3 = (60, 400, 400) [time in msec] are to be scheduled on a uniprocessor using RMA. Can the task set be feasibly scheduled? Supposing a context switch overhead of 1 millisecond is to be taken into account, determine the schedulability.
19. Consider the following set of three real-time periodic tasks.
Task  Start Time (mSec)  Processing Time (mSec)  Period (mSec)  Deadline (mSec)
T1    20                 25                      150            100
T2    40                 10                      50             50
T3    60                 50                      200            200

a. Check whether the three given tasks are schedulable under RMA. Show all
intermediate steps in your computation.
b. Assuming that each context switch incurs an overhead of 1 msec, determine whether
the tasks are schedulable under RMA. Also, determine the average context switching
overhead per unit of task execution.
c. Assume that T1, T2, and T3 self-suspend for 10 msec, 20 msec, and 15 msec
respectively. Determine whether the task set remains schedulable under RMA. The
context switching overhead of 1 msec should be considered in your result. You can
assume that each task undergoes self-suspension only once during each of its
executions.
d. Assuming that T1 and T2 are assigned the same priority value, determine the
additional delay in response time that T2 would incur compared to the case when they
are assigned distinct priorities. Ignore the self-suspension times and the context
switch overhead for this part of the question.
Module 6
Embedded System Software
Lesson 31
Concepts in Real-Time Operating Systems
1. Introduction
In the last three lessons, we discussed the important real-time task scheduling techniques. We highlighted that timely production of results in accordance with a physical clock is vital to the satisfactory operation of a real-time system. We had also pointed out that real-time operating systems are primarily responsible for ensuring that every real-time task meets its timeliness requirements. A real-time operating system in turn achieves this by using appropriate task scheduling techniques. Normally, real-time operating systems provide flexibility to the programmers to select an appropriate scheduling policy among several supported policies. Deployment of an appropriate task scheduling technique out of the supported techniques is therefore an important concern for every real-time programmer. To be able to determine the suitability of a scheduling algorithm for a given problem, a thorough understanding of the characteristics of various real-time task scheduling algorithms is important. We therefore had a rather elaborate discussion on real-time task scheduling techniques and certain related issues, such as sharing of critical resources and handling task dependencies.

In this lesson, we examine the important features that a real-time operating system is expected to support. We start by discussing the time service supports provided by real-time operating systems, since accurate and high precision clocks are very important to the successful operation of any real-time application. Next, we point out the important features that a real-time operating system needs to support. Finally, we discuss the issues that would arise if we attempt to use a general purpose operating system such as UNIX or Windows in real-time applications.
The system clock should have a sufficiently fine resolution 1 to support the necessary time services.
However, designers of real-time operating systems find it very difficult to support very fine
resolution system clocks. In current technology, the resolution of hardware clocks is usually finer
than a nanosecond (contemporary processor speeds exceed 3GHz). But, the clock resolution
being made available by modern real-time operating systems to the programmers is of the order
of several milliseconds or worse. Let us first investigate why real-time operating system
designers find it difficult to maintain system clocks with sufficiently fine resolution. We then
examine various time services that are built based on the system clock, and made available to the
real-time programmers.
The hardware clock periodically generates interrupts (often called time service interrupts). After each clock interrupt, the kernel updates the software clock and also performs certain other work (explained in Sec. 4.1.1). A thread can get the current time reading of the system clock by invoking a system call supported by the operating system (such as the POSIX clock_gettime()). The finer the resolution of the clock, the more frequent the time service interrupts need to be, and the larger the amount of processor time the kernel spends in responding to these interrupts. This overhead places a limitation on how
fine a system clock resolution a computer can support. Another issue that caps the resolution of the system clock is that the response time of the clock_gettime() system call is not deterministic: every system call (or, for that matter, a function call) has some associated jitter. The jitter arises because interrupts have higher priority than system calls; when an interrupt occurs, the processing of a system call is stalled. Also, the preemption time of system calls can vary, because many operating systems disable interrupts while processing a system call. This variation in the response time (jitter) introduces an error in the accuracy of the time value that the calling thread gets from the kernel. Remember that jitter was defined as the difference between the worst-case response time and the best-case response time (see Sec. 2.3.1). In commercially available operating systems, the jitter associated with system calls can be several milliseconds. A software clock resolution finer than this error is therefore not meaningful.
We now examine the different activities that are carried out by a handler routine after a clock interrupt occurs. Subsequently, we discuss how sufficiently fine resolution can be provided in the presence of jitter in function calls.

1.1.1. Clock Interrupt Processing
Fig. 31.1 Structure of a Timer Queue (timers ordered by expiration time, each with an associated handler)

1 Clock resolution denotes the time granularity provided by the clock of a computer. It corresponds to the duration of time that elapses between two successive clock ticks.
Each time a clock interrupt occurs, besides incrementing the software clock, the handler
routine carries out the following activities:
Process timer events: Real-time operating systems maintain either per-process timer
queues or a single system-wide timer queue. The structure of such a timer queue has been
shown in Fig. 31.1. A timer queue contains all timers arranged in order of their expiration
times. Each timer is associated with a handler routine. The handler routine is the function that
should be invoked when the timer expires. At each clock interrupt, the kernel checks the
timer data structures in the timer queue to see if any timer event has occurred. If it finds that
a timer event has occurred, then it queues the corresponding handler routine in the ready
queue.
Update ready list: Since the occurrence of the last clock event, some tasks might have arrived or become ready due to the fulfillment of certain conditions they were waiting for. The tasks in the wait queue are checked, and the tasks which are found to have become ready are queued in the ready queue. If a task having higher priority than the currently running task is found to have become ready, then the currently running task is preempted and the scheduler is invoked.

Update execution budget: At each clock interrupt, the scheduler decrements the time slice (budget) remaining for the executing task. If the remaining budget becomes zero and the task is not complete, then the task is preempted and the scheduler is invoked to select another task to run.
1.1.2. Providing High Clock Resolution

We had pointed out in Sec. 4.1 that there are two main difficulties in providing a high resolution timer. First, the overhead associated with processing clock interrupts becomes excessive. Secondly, the jitter associated with the time lookup system call (clock_gettime()) is often of the order of several milliseconds. Therefore, it is not useful to provide a clock with a resolution any finer than this. However, some real-time applications need to deal with timing constraints of the order of a few nanoseconds. Is it at all possible to support time measurement with nanosecond resolution? A way to provide sufficiently fine clock resolution is to map the hardware clock directly into the address space of applications. An application can then read the hardware clock (through a normal memory read operation) without having to make a system call. On a Pentium processor, a user thread can be made to read the Pentium time stamp counter. This counter starts at 0 when the system is powered up and increments after each processor cycle. At today's processor speeds, this means that the counter increments several times during every nanosecond interval.
However, making the hardware clock readable by an application significantly reduces
the portability of the application. Processors other than Pentium may not have a high
resolution counter, and certainly the memory address map and resolution would differ.
1.1.3. Timers
We had pointed out that timer service is a vital service that is provided to applications
by all real-time operating systems. Real-time operating systems normally support two main
types of timers: periodic timers and aperiodic (or one shot) timers. We now discuss some basic
concepts about these two types of timers.
Periodic Timers: Periodic timers are used mainly for sampling events at regular intervals or for performing some activities periodically. Once a periodic timer is set, each time it expires the corresponding handler routine is invoked and the timer is reinserted into the timer queue. For example, a periodic timer may be set to 100 msec and its handler set to poll the temperature sensor after every 100 msec interval.
Aperiodic (or One Shot) Timers: These timers are set to expire only once. Watchdog
timers are popular examples of one shot timers.
f() {
    wd_start(t1, exception-handler);    /* start */
    ...
    wd_tickle();                        /* end */
}

Fig. 31.2 Use of a Watchdog Timer
Watchdog timers are used extensively in real-time programs to detect when a task misses its deadline, and then to initiate exception handling procedures upon a deadline miss. An example use of a watchdog timer has been illustrated in Fig. 31.2. In Fig. 31.2, a watchdog timer is set at the start of a certain critical function f() through a wd_start(t1) call. The wd_start(t1) call sets the watchdog timer to expire by the specified deadline (t1) of the starting of the task. If the function f() does not complete even after t1 time units have elapsed, then the watchdog timer fires, indicating that the task deadline must have been missed, and the exception handling procedure is initiated. In case the task completes before the watchdog timer expires (i.e. the task completes within its deadline), then the watchdog timer is reset using a wd_tickle() call.
1.2. Features of a Real-Time Operating System

Before discussing commercial real-time operating systems, we must clearly understand the features normally expected of a real-time operating system. This will also let us compare different real-time operating systems and understand the differences between a traditional operating system and a real-time operating system. In the following, we identify some important features required of a real-time operating system, especially those that are normally absent in traditional operating systems.
Clock and Timer Support: Clock and timer services with adequate resolution are one of the
most important issues in real-time programming. Hard real-time application development often
requires support of timer services with resolution of the order of a few microseconds. And even
finer resolution may be required in case of certain special applications. Clocks and timers are a
vital part of every real-time operating system. On the other hand, traditional operating systems
often do not provide time services with sufficiently high resolution.
Real-Time Priority Levels: A real-time operating system must support static priority levels.
A priority level supported by an operating system is called static, when once the
programmer assigns a priority value to a task, the operating system does not change it by
itself. Static priority levels are also called real-time priority levels. This is because, as we
discuss in section 4.3, all traditional operating systems dynamically change the priority levels of
tasks from programmer assigned values to maximize system throughput. Such priority levels that
are changed by the operating system dynamically are obviously not static priorities.
Predictable and Fast Interrupt Latency: Interrupt latency is defined as the time delay between the occurrence of an interrupt and the running of the corresponding ISR (Interrupt Service Routine). In real-time operating systems, the upper bound on interrupt latency must be bounded and is expected to be less than a few microseconds. Low interrupt latency is achieved by performing the bulk of the activities of the ISR in a deferred procedure call (DPC). A DPC is essentially a task that performs most of the ISR activity and is executed later at a certain priority value. Further, support for nested interrupts is usually desired. That is, a real-time operating system should not only be preemptive while executing kernel routines, but should be preemptive during interrupt servicing as well. This is especially important for hard real-time applications with sub-microsecond timing requirements.
Support for Resource Sharing Among Real-Time Tasks: If real-time tasks are allowed to share critical resources among themselves using the traditional resource sharing techniques, then the response times of tasks can become unbounded, leading to deadline misses. This is one compelling reason why every commercial real-time operating system should at the minimum provide the basic priority inheritance mechanism. Support of the priority ceiling protocol (PCP) is also desirable if large and moderate sized applications are to be supported.
applications with some means of controlling paging, such as memory locking. Memory locking
prevents a page from being swapped from memory to hard disk. In the absence of memory
locking feature, memory access times of even critical real-time tasks can show large jitter, as the
access time would greatly depend on whether the required page is in the physical memory or has
been swapped out.
Memory protection is another important issue that needs to be carefully considered. Lack of
support for memory protection among tasks leads to a single address space for the tasks.
Arguments for having only a single address space include simplicity, saving memory bits,
and light weight system calls. For small embedded applications, the overhead of a few Kilo
Bytes of memory per process can be unacceptable. However, when no memory protection is
provided by the operating system, the cost of developing and testing a program without memory
protection becomes very high when the complexity of the application increases. Also,
maintenance cost increases as any change in one module would require retesting the entire
system.
Embedded real-time operating systems usually do not support virtual memory; they create physically contiguous blocks of memory for an application upon request. However, memory fragmentation is a potential problem for a system that does not support virtual memory. Also, memory protection becomes difficult to support in a non-virtual memory management system. For this reason, in many embedded systems the kernel and the user processes execute in the same address space, i.e. there is no memory protection. Hence, a system call and a function call within an application are indistinguishable. This makes debugging applications difficult, since a runaway pointer can corrupt the operating system code, making the system freeze.
r
Additional Requirements for Embedded Real-Time Operating Systems: Embedded
g
s
applications usually have constraints on cost, size, and power consumption. Embedded real-time
ent
operating systems should be capable of diskless operation, since many times disks are either too
bulky to use, or increase the cost of deployment. Further, embedded operating systems should
u d
minimize total power consumption of the system. Embedded operating systems usually reside on
st
ROM. For certain applications which require faster response, it may be necessary to run the real-
t y
time operating system on a RAM. Since the access time of a RAM is lower than that of a ROM,
i
.c
this would result in faster execution. Irrespective of whether ROM or RAM is used, all ICs are
expensive. Therefore, for real-time operating systems for embedded applications it is desirable to
w
w
have as small a foot print (memory usage) as possible. Since embedded products are typically
w
manufactured large scale, every rupee saved on memory and other hardware requirements
impacts millions in profit.
The two most troublesome problems that a real-time programmer faces while using Unix for real-time applications are its non-preemptive kernel and the dynamically changing priorities of tasks.
When an application invokes an operating system service through a system call, a special instruction called a trap (or a software interrupt) is executed. As soon as the trap instruction is executed, the handler routine changes the processor state from user mode to kernel mode (or supervisor mode), and the execution of the required kernel routine starts. The change of mode during a system call has been depicted schematically in Fig. 31.3.
Fig. 31.3 Invocation of an Operating System Service through a System Call (the application program, running in user mode, executes a trap; the kernel checks the parameters and runs the requested OS service in kernel mode, after which control returns to the next statement)
At the risk of digressing from the focus of this discussion, let us understand an important operating systems concept. Certain operations such as handling devices, creating processes, file operations, etc., need to be done in the kernel mode only. That is, application programs are prevented from carrying out these operations directly, and need to request the operating system (through a system call) to carry out the required operation. This restriction enables the kernel to enforce discipline among different programs in accessing these objects. If such operations were not performed in the kernel mode, different application programs might interfere with each other's operation. An example of an operating system where all operations were performed in user mode is the once popular operating system DOS (though DOS is nearly obsolete now). In DOS, application programs are free to carry out any operation in user mode 2, including crashing the system by deleting the system files. The instability this can bring about is clearly unacceptable in a real-time environment, and is undesirable in general applications as well.
2 In fact, in DOS there is only one mode of operation; kernel mode and user mode are indistinguishable.
interrupts, but it would have resulted in increasing the average task preemption time. In Sec.
4.4.4 we investigate how modern real-time operating systems make the kernel preemptive
without unduly increasing the task preemption time.
1.3.2. Dynamic Priority Levels
In Unix systems, real-time tasks cannot be assigned static priority values. Soon after a programmer sets a priority value, the operating system alters it. This makes it very difficult to schedule real-time tasks using algorithms such as RMA or EDF, since both these schedulers assume that once task priorities are assigned, they should not be altered by any other part of the operating system. It is instructive to understand why Unix dynamically changes the priority values of tasks in the first place.
Unix uses round-robin scheduling with multilevel feedback. This scheduler arranges tasks in multilevel queues as shown in Fig. 31.4. At every preemption point, the scheduler scans the multilevel queue from the top (highest priority) and selects the task at the head of the first non-empty queue. Each task is allowed to run for a fixed time quantum (or time slice) at a time. Unix normally uses a one-second time slice. That is, if the running process neither blocks nor completes within one second of starting execution, it is preempted and the scheduler selects the next task for dispatching. Unix however allows the default one-second time slice to be configured during system generation. The kernel preempts a process that does not complete within its assigned time quantum, recomputes its priority, and inserts it back into one of the priority queues depending on the recomputed priority value of the task.
Fig. 31.4 Multi-Level Feedback Queues
Unix periodically computes the priority of a task based on the type of the task and its execution history. The priority of a task (Ti) is recomputed at the end of its j-th time slice using the following two expressions:

Pr(Ti, j) = Base(Ti) + CPU(Ti, j) + nice(Ti)    (4.1)

CPU(Ti, j) = U(Ti, j-1) / 2 + CPU(Ti, j-1) / 2    (4.2)

where Pr(Ti, j) is the priority of the task Ti at the end of its j-th time slice; U(Ti, j) is the utilization of the task Ti for its j-th time slice, and CPU(Ti, j) is the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Base(Ti) is the base priority of the task Ti, and nice(Ti) is the nice value set for the task (through which a programmer can voluntarily lower the priority of the task, i.e. be nice to the other processes).

Expr. 4.2 has been recursively defined. Unfolding the recursion, we get:

CPU(Ti, j) = U(Ti, j-1) / 2 + U(Ti, j-2) / 4 + ...    (4.3)
It can be easily seen from Expr. 4.3 that, in the computation of the weighted history of CPU
utilization of a task, the activity (i.e. processing or I/O) of the task in the immediately concluded
interval is given the maximum weightage. If the task used up CPU for the full duration of the
slice (i.e. 100% CPU utilization), then CPU(Ti, j) gets a higher value indicating a lower priority.
Observe that the activities of the task in the preceding intervals get progressively lower
weightage. It should be clear that CPU(Ti, j) captures the weighted history of CPU utilization of
the task Ti at the end of its j-th time slice.
Now, substituting Expr 4.3 in Expr. 4.1, we get:
Pr(Ti, j) = Base(Ti) + U(Ti, j-1) / 2 + U(Ti, j-2) / 4 + ... + nice(Ti)    (4.4)
The purpose of the base priority term in the priority computation expression (Expr.
4.4) is to divide all tasks into a set of fixed bands of priority levels. The values of
U(Ti , j) and nice components are restricted to be small enough to prevent a process from
migrating from its assigned band. The bands have been designed to optimize I/O, especially
block I/O. The different priority bands under Unix in decreasing order of priorities are:
swapper, block I/O, file manipulation, character I/O and device control, and user processes.
Tasks performing block I/O are assigned the highest priority band. As an example of block I/O, consider the I/O that occurs while handling a page fault in a virtual memory system. Such block I/O uses DMA-based transfer and hence makes efficient use of the I/O channel. Character I/O includes mouse and keyboard transfers. The priority bands were designed to provide the most effective use of the I/O channels.
Dynamic re-computation of priorities was motivated by the following consideration. The Unix designers observed that in any computer system, I/O is the bottleneck. Processors are extremely fast compared to the transfer rates of I/O devices. I/O devices such as keyboards are necessarily slow, to cope with human response times. Other devices such as printers and disks deploy mechanical components that are inherently slow and therefore cannot sustain very high rates of data transfer. Therefore, effective use of the I/O channels is very important for increasing the overall system throughput. The I/O channels should be kept as busy as possible to let the interactive tasks get good response times. To keep the I/O channels busy, any task performing I/O should not be kept waiting for the CPU. For this reason, as soon as a task blocks for I/O, its priority is increased by the priority re-computation rule given in Expr. 4.4. However, if a task makes full use of its last assigned time slice, it is determined to be computation-bound and its priority is reduced. Thus the basic philosophy of the Unix operating system is that interactive tasks are made to assume higher priority levels and are processed at the earliest; this gives the interactive users good response times. This technique has now become an accepted way of scheduling soft real-time tasks across almost all available general purpose operating systems.

From the above observations, we can state the overall effect of re-computation of priority values using Expr. 4.4 as follows:

In Unix, I/O intensive tasks migrate to higher and higher priorities, whereas CPU-intensive tasks seek lower priority levels.

No doubt the approach taken by Unix is very appropriate for maximizing average task throughput, and it does indeed provide good average response times to interactive (soft real-time) tasks. In fact, almost every modern operating system does a very similar dynamic re-computation of task priorities to maximize overall system throughput and to provide good average response times to the interactive tasks. However, for hard real-time tasks, such dynamic shifting of priority values is clearly unacceptable.
Insufficient Device Driver Support: In Unix (remember that we are talking of the original Unix System V), device drivers run in kernel mode. Therefore, if support for a new device is to be added, the driver module has to be linked to the kernel modules, necessitating a system generation step. As a result, providing support for a new device in an already deployed application is cumbersome.
Lack of Real-Time File Services: In Unix, file blocks are allocated as and when they are
requested by an application. As a consequence, while a task is writing to a file, it may encounter
an error when the disk runs out of space. In other words, no guarantee is given that disk space
would be available when a task writes a block to a file. Traditional file writing approaches also
result in slow writes since required space has to be allocated before writing a block. Another
problem with the traditional file systems is that blocks of the same file may not be contiguously
located on the disk. This would result in read operations taking unpredictable times, resulting in
jitter in data access. In real-time file systems significant performance improvement can be
achieved by storing files contiguously on the disk. Since the file system pre-allocates space, the
times for read and write operations are more predictable.
Fig. 31.5 Schematic Representation of a Host-Target System (the host system and the target board are connected by a serial link or TCP/IP)
The main idea behind this approach is that the real-time operating system running on the target board is kept as small and simple as possible. This implies that the operating system on the target board would lack virtual memory management support, nor would it support any program development utilities. The host system must have the program development environment, including compilers, editors, libraries, cross-compilers, debuggers, etc. These are memory-demanding applications that require virtual memory support. The host is usually connected to the target using a serial port or a TCP/IP connection (see Fig. 31.5). The real-time program is developed on the host. It is then cross-compiled to generate code for the target processor. Subsequently, the executable module is downloaded to the target board. Tasks are executed on the target board and the execution is controlled at the host side using a symbolic cross-debugger. Once the program works successfully, it is fused on a ROM or flash memory and becomes ready to be deployed in applications.
Commercial examples of host-target real-time operating systems include PSOS, VxWorks, and VRTX. We examine these commercial products in Lesson 5. We would point out that these operating systems, due to their small size, limited functionality, and optimal design, achieve much better performance figures than full-fledged operating systems. For example, the task preemption times of these systems are of the order of a few microseconds, compared to several hundreds of milliseconds for traditional Unix systems.
1.4.3. Preemption Point Approach
We have already pointed out that one of the major shortcomings of the traditional
Unix V code is that during a system call, all interrupts are masked (disabled) for the entire duration of execution of the system call. This leads to unacceptable worst-case task response times of the order of a second, making Unix-based systems unsuitable for most hard real-time applications.
An approach that has been taken by a few vendors to improve the real-time performance of
non-preemptive kernels is the introduction of preemption points in system routines. Preemption
points in the execution of a system routine are the instants at which the kernel data structures are
consistent. At these points, the kernel can safely be preempted to make way for any waiting
higher priority real-time tasks without corrupting any kernel data structures. In this approach,
when the execution of a system call reaches a preemption point, the kernel checks to
see if any higher priority tasks have become ready. If there is at least one, it preempts
the processing of the kernel routine and dispatches the waiting highest priority task
immediately. The worst-case preemption latency in this technique therefore becomes the longest
time between two consecutive preemption points. As a result, the worst-case response times of
tasks are now several folds lower than those for traditional operating systems without preemption
points. This makes preemption point-based operating systems suitable for use in many categories of hard real-time applications, though still not suitable for applications requiring preemption latencies of the order of a few microseconds or less. Another advantage of
this approach is that it involves only minor changes to be made to the kernel code.
Many operating systems have taken the preemption point approach in the past, a prominent
example being HP-UX.
Self-host systems take a different approach: instead of developing the application on a separate host system machine running traditional Unix, in self-host systems a real-time application is developed on the same system on which the real-time application would finally run. Of course, while deploying the application, the operating system modules that are not essential during task execution are excluded to minimize the size of the operating system in the embedded application. Remember that in the host-target approach, the target real-time operating system was a lean and efficient system that could only run the application but did not include program development facilities; program development was carried out on the host system. This made application development and debugging difficult and required cross-compiler and cross-debugger support. In the self-host approach, the real-time application is developed on a full-fledged operating system, and once the application runs satisfactorily it is fused on the target board in a ROM or flash memory along with a stripped-down version of the same operating system.
Most of the self-host operating systems that are available now are based on microkernel architecture. Use of microkernel architecture for a self-host operating system entails several advantages. In microkernel architecture, only the core functionalities such as interrupt handling and process management are implemented as kernel routines. All other functionalities such as memory management, file management, device management, etc. are implemented as add-on modules which operate in user mode. As a result, it becomes very easy to configure the operating system. Also, the microkernel is lean and therefore much more efficient. A monolithic operating system, by contrast, binds most drivers, file systems, and protocol stacks to the operating system kernel, and all kernel processes share the same address space; hence a single programming error in any of these components can cause a fatal kernel fault. In microkernel-based operating systems, these components run in separate memory-protected address spaces. So, system crashes on this count are very rare, and microkernel-based operating systems are very reliable.
We had discussed earlier that any Unix-based system has to overcome the following
two main shortcomings of the traditional Unix kernel in order to be useful in hard real-time
applications: non-preemptive kernel and dynamic priority values. We now examine how these
problems are overcome in self-host systems.
done from efficiency considerations and worked well for non-real-time and uniprocessor
applications.
Masking interrupts during kernel processing can cause even very small critical routines to have worst-case response times of the order of a second. Further, this approach does not work in multiprocessor environments: masking the interrupts for one processor does not help, as the tasks running on the other processors can still corrupt the kernel data structures.
It is now clear that in order to make the kernel preemptive, locks must be used at appropriate
places in the kernel code. In fully preemptive Unix systems, normally two types of locks are
used: kernel-level locks, and spin locks.
Fig. 31.6 Operation of a Spin Lock (task T1 holds the spin lock guarding a critical resource while task T2 busy-waits)
A kernel-level lock is similar to a traditional lock. When a task waits for a kernel-level lock to be released, it is blocked and undergoes a context switch. It becomes ready only after the required lock is released by the holding task and becomes available. This type of lock is inefficient when critical resources are required for short durations, of the order of a few milliseconds or less; in such situations the context switching overheads are not acceptable. Consider a task that requires the lock for carrying out some very small processing (possibly a single arithmetic operation) on some critical resource. If a kernel-level lock is used, another task requesting the lock at that time would be blocked and a context switch would be incurred; in addition, the cache contents, pages of the task, etc. may be swapped. Here the context switching time is comparable to, or even greater than, the time for which the task needs the resource. In such a situation, a spin lock is appropriate. Let us now understand the operation of a spin lock, shown schematically in Fig. 31.6. In Fig. 31.6, a critical resource is required by the tasks T1 and T2 for very short times (comparable to a context switching time). The resource is protected by a spin lock. The task T1 has acquired the spin lock guarding the resource. Meanwhile, the task T2 requests the resource. When task T2 cannot get access to the resource, it simply busy-waits (shown as a loop in the figure) rather than blocking and suffering a context switch. T2 gets the resource as soon as T1 relinquishes it.
Real-Time Priorities: Let us now examine how self-host systems address the problem of
dynamic priority levels of the traditional Unix systems. In Unix based real-time operating
systems, in addition to dynamic priorities, real-time and idle priorities are supported. Fig. 31.7
schematically shows the three available priority levels.
Real-time priorities (highest): up to 127
Dynamic priorities: up to 254
Idle (non-migrating) priority (lowest): 255

Fig. 31.7 The three priority classes in Unix-based real-time operating systems
Idle (Non-Migrating): This is the lowest priority. The task that runs when there are no other
tasks to run (the idle task) runs at this level. Idle priorities are static and are not recomputed
periodically.
Dynamic: Dynamic priorities are recomputed periodically to improve the average response
time of soft real-time tasks. Dynamic recomputation of priorities ensures that I/O-bound
tasks migrate to higher priorities and CPU-bound tasks operate at lower priority levels. As
shown in Fig. 31.7, dynamic priority levels are higher than the idle priority, but are lower than
the real-time priorities.
Real-Time: Real-time priorities are static priorities and are not recomputed. Hard real-time
tasks operate at these levels. Tasks having real-time priorities operate at higher priorities than the
tasks with dynamic priority levels.
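The periodic recomputation of dynamic priorities can be illustrated with a small sketch, loosely modeled on the classic 4.3BSD rule in which a numerically larger value means a lower priority. The constants, decay factor, and function names below are our simplifications, not those of any particular Unix kernel:

```c
/* Sketch of periodic dynamic-priority recomputation. Larger priority
   value = lower priority, as in classic Unix schedulers. */

#define PUSER 50   /* base user priority (assumed constant) */

/* Called each scheduling tick the task spends on the CPU. */
int charge_tick(int p_cpu) {
    return p_cpu + 1;             /* CPU-bound tasks accumulate usage */
}

/* Called periodically (say once per second): usage decays, so tasks that
   blocked for I/O and accumulated little p_cpu drift back upward. */
int decay(int p_cpu) {
    return p_cpu / 2;             /* simplified decay factor */
}

/* Recomputed priority: more accumulated CPU time gives a numerically
   larger, i.e. lower, priority. I/O-bound tasks therefore migrate to
   higher priorities; CPU-bound tasks sink to lower ones. */
int recompute_priority(int p_cpu, int nice) {
    return PUSER + p_cpu / 4 + 2 * nice;
}
```

A CPU-bound task that ran for 40 ticks ends up at a worse (larger) priority value than one that decayed to 20 ticks after blocking for I/O, which is exactly the migration behaviour described above.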
1.5. Windows As A Real-Time Operating System

Microsoft's Windows operating systems are extremely popular in desktop computers.
Windows operating systems have evolved over the last twenty-five years from the naive
DOS (Disk Operating System). Microsoft developed DOS in the early eighties. Microsoft kept
on announcing new versions of DOS almost every year and kept on adding new features to DOS
in the successive versions. DOS evolved to the Windows operating systems, whose main
distinguishing feature was a graphical front-end. As several new versions of Windows kept
appearing by way of upgrades, the Windows code was completely rewritten in the early nineties
to develop the Windows NT system. Since the code was completely rewritten, the Windows NT
system was much more stable (does not crash) than the earlier DOS-based systems. The later
versions of Microsoft's operating systems were descendants of Windows NT; the DOS-based
systems were scrapped. Fig. 31.8 shows the genealogy of the various operating systems from the
Microsoft stable. Because stability is a major requirement for hard real-time applications, we
consider only Windows NT and its descendants in our study and do not include the DOS line
of products.
Fig. 31.8 Genealogy of the operating systems from the Microsoft stable (the DOS line: DOS, Windows 3.1, Windows 95, Windows 98; the new code line: Windows NT, Windows 2000, Windows XP)

Windows NT is often used in real-time applications on account of either cost saving or
convenience. This is especially true in prototype application development and also when only a
limited number of deployments are required. In the following, we critically analyze the
suitability of Windows NT for real-time application development. First, we highlight some
features of Windows NT that are very relevant and useful to a real-time application developer.
In the subsequent subsection, we point out some of the lacunae of Windows NT when used in
real-time application development.
1.5.1. Features of Windows NT

Windows NT has several features which are very desirable for real-time applications, such as
support for multithreading, real-time priority levels, and timers. Moreover, the clock resolutions
are sufficiently fine for most real-time applications.
Windows NT supports 32 priority levels (see Fig. 31.9). Each process belongs to one
of the following priority classes: idle, normal, high, real-time. By default, the priority
class at which an application runs is normal. Both normal and high are variable-type classes,
where the priority is recomputed periodically. NT uses priority-driven preemptive scheduling,
and threads of real-time priorities have precedence over all other threads, including kernel
threads. Processes such as the screen saver use the idle priority class. NT lowers the priority of
a task (belonging to a variable type) if it used all of its last time slice. It raises the priority of a
task if it blocked for I/O and could not use its last time slice in full. However, the change of a
task from its base priority is restricted to ±2.
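This boost-and-decay rule for variable-type threads can be captured as a small pure function. The function name and the exact clamping behaviour are our simplification of what NT does:

```c
#include <stdbool.h>

/* Sketch of NT's adjustment rule for variable-type (normal/high) threads:
   boost after blocking for I/O, decay after consuming a full time slice,
   never drifting more than 2 levels away from the base priority. */
int adjust_priority(int current, int base, bool used_full_slice) {
    int next = used_full_slice ? current - 1   /* CPU-bound: lower it    */
                               : current + 1;  /* blocked for I/O: raise */
    if (next > base + 2) next = base + 2;      /* clamp to base + 2 ...  */
    if (next < base - 2) next = base - 2;      /* ... and to base - 2    */
    return next;
}
```

For example, a thread at its base priority 8 that keeps blocking for I/O is boosted to 9, then 10, and no further; one that keeps using its full slice sinks no lower than 6.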
Real-time critical: 31
Real-time idle: 16
Dynamic time-critical: 15
Dynamic idle: 1
Idle: 0

Fig. 31.9 Task Priorities in Windows NT
1.5.2. Shortcomings of Windows NT

In spite of the impressive support that Windows provides for real-time program development,
as discussed in Section 1.5.1, a programmer trying to use Windows in real-time system
development has to cope with several problems. Of these, the following two main problems
are the most troublesome.
1. Interrupt Processing: The priority level of interrupts is always higher than that of user-
level threads, including the threads of the real-time class. When an interrupt occurs, the
handler routine saves the machine's state and makes the system execute an Interrupt
Service Routine (ISR). Only critical processing is performed in the ISR, and the bulk of
the processing is done as a Deferred Procedure Call (DPC). DPCs for various interrupts
are queued in the DPC queue in a FIFO manner. While this separation of ISR and DPC has
the advantage of providing quick response to further interrupts, it has the disadvantage of
maintaining all DPCs at the same priority. A DPC cannot be preempted by another
DPC, but only by an interrupt. DPCs are executed in FIFO order at a priority lower than the
hardware interrupt priorities but higher than the priority of the scheduler/dispatcher.
Further, it is not possible for a user-level thread to execute at a priority higher
than that of ISRs or DPCs. Therefore, even ISRs and DPCs corresponding to very
low-priority tasks can preempt real-time processes, and the potential blocking
of real-time tasks due to DPCs can be large. For example, interrupts due to page faults
generated by low-priority tasks would get processed faster than real-time processes.
Also, ISRs and DPCs generated due to keyboard and mouse interactions would operate
at higher priority levels compared to real-time tasks. If there are processes doing network
or disk I/O, the effect of system-wide FIFO queues may lead to unbounded response
times for even real-time threads.
These problems have been avoided by the Windows CE operating system through a priority
inheritance mechanism.
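The troublesome property is easy to see in a model of the DPC queue: it is a plain FIFO ring with no notion of the priority of the task each deferred item serves. The names and sizes below are illustrative, not Windows internals:

```c
#include <stddef.h>

/* Minimal model of the system-wide DPC queue: a FIFO ring of deferred
   work items. Note there is no priority field at all: a DPC queued on
   behalf of, say, a mouse interrupt drains before a later DPC that a
   real-time task is waiting on. */
typedef void (*dpc_fn)(void);

#define DPC_QUEUE_LEN 64
static dpc_fn queue[DPC_QUEUE_LEN];
static size_t head = 0, tail = 0;

int dpc_enqueue(dpc_fn fn) {         /* called from the ISR */
    size_t next = (tail + 1) % DPC_QUEUE_LEN;
    if (next == head) return -1;     /* queue full */
    queue[tail] = fn;
    tail = next;
    return 0;
}

void dpc_drain(void) {               /* runs below interrupt level but
                                        above the scheduler/dispatcher */
    while (head != tail) {           /* strict FIFO: no reordering */
        dpc_fn fn = queue[head];
        head = (head + 1) % DPC_QUEUE_LEN;
        fn();
    }
}

/* Demo work items recording their execution order. */
static int run_order[2];
static int runs = 0;
static void mouse_dpc(void)    { run_order[runs++] = 1; }
static void realtime_dpc(void) { run_order[runs++] = 2; }
```

Enqueuing mouse_dpc() before realtime_dpc() guarantees the mouse work runs first, however unimportant the task it serves; this is the source of the unbounded blocking discussed above.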
2. Support for Resource Sharing Protocols: We had discussed in Chapter 3 that unless
appropriate resource sharing protocols are used, tasks may suffer unbounded priority
inversions while accessing shared resources, leading to deadline misses and even system
failure. Windows NT does not provide any support (such as priority inheritance) to
help real-time tasks share critical resources among themselves. This is a major
shortcoming of Windows NT when used in real-time applications.
Since most real-time applications do involve resource sharing among tasks, we outline
below the possible ways in which such functionality can be added at the user level on a
Windows NT system.
The simplest approach to let real-time tasks share critical resources without unbounded
priority inversions is as follows. As soon as a task is successful in locking a non-preemptable
resource, its priority can be raised to the highest priority (31). As soon as the
task releases the resource, its priority is restored. However, we know that this
arrangement would lead to large inheritance-related inversions.
Another possibility is to implement the priority ceiling protocol (PCP). To implement this
protocol, we need to restrict the real-time tasks to even priorities (i.e. 16, 18, ..., 30).
The reason for this restriction is that NT does not support FIFO scheduling among equal-priority
tasks. If the highest priority among all tasks needing a resource is 2n, then the
ceiling priority of the resource is 2n+1. In Unix, a FIFO option among equal-priority tasks
is available; therefore all available priority levels can be used.
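A user-level sketch of this scheme follows. The helper set_thread_priority() is a stand-in for the platform call (on NT this would be the Win32 SetThreadPriority() API) so that the sketch stays self-contained; the even-priority convention and the 2n+1 ceiling follow the text above:

```c
/* Sketch of user-level priority-ceiling emulation. Real-time tasks use
   only even priorities 16, 18, ..., 30; ceilings are the odd levels. */

static int current_priority = 16;           /* stand-in for thread state */
static void set_thread_priority(int p) { current_priority = p; }
static int  get_thread_priority(void)  { return current_priority; }

/* If the highest priority among the tasks using a resource is 2n, its
   ceiling is 2n + 1: odd, hence strictly above every even user of it. */
int ceiling_priority(int highest_even_user) {
    return highest_even_user + 1;
}

typedef struct { int ceiling; int saved; } ceiled_resource;

void resource_lock(ceiled_resource *r) {
    r->saved = get_thread_priority();
    set_thread_priority(r->ceiling);        /* boost to the ceiling */
    /* ... acquire the underlying mutex here ... */
}

void resource_unlock(ceiled_resource *r) {
    /* ... release the underlying mutex here ... */
    set_thread_priority(r->saved);          /* restore the base priority */
}
```

A task at priority 18 locking a resource whose highest-priority user runs at 24 is boosted to the ceiling 25 for the duration of the critical section and restored to 18 on release.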
1.6. Windows vs Unix

Table 31.1 Windows NT versus Unix V

Real-Time Feature          Windows NT    Unix V
DPCs                       Yes           No
Real-time priorities       Yes           No
Locking virtual memory     Yes           Yes
Timer precision            1 msec        10 msec
Asynchronous I/O           Yes           No
Though Windows NT has many of the features desired of a real-time operating system, its
implementation of DPCs together with its lack of protocol support for resource sharing among
equal-priority tasks makes it unsuitable for use in safety-critical real-time applications. A
comparison of the extent to which some of the basic features required for real-time programming
are provided by Windows NT and Unix V is indicated in Table 31.1. With careful programming,
Windows NT may be useful for applications that can tolerate occasional deadline misses and
have deadlines of the order of hundreds of milliseconds rather than microseconds. Of course, to
be used in such applications, the processor utilization must be kept sufficiently low and priority
inversion control must be provided at the user level.
1.7. Exercises
1. State whether the following assertions are True or False. Justify your answer in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper
bound on achievable utilization improves as the number of tasks in the system being
developed increases.
b. Under the Unix operating system, computation intensive tasks dynamically
gravitate towards higher priorities.
c. Normally, task switching time is larger than task preemption time.
d. Suppose a real-time operating system does not support memory protection, then a
procedure call and a system call are indistinguishable in that system.
e. Watchdog timers are typically used to start certain tasks at regular intervals.
f. For memory of the same size under segmented and virtual addressing schemes, the
segmented addressing scheme would in general incur lower memory access jitter
compared to the virtual addressing scheme.
2. Even though the clock frequency of modern processors is of the order of several GHz, why do
many modern real-time operating systems not support nanosecond or even microsecond
resolution clocks? Is it possible for an operating system to support nanosecond resolution
clocks at present? Explain how this can be achieved.
3. Give an example of a real-time application for which simple segmented memory
management support by the RTOS is preferred, and another example of an application for
which virtual memory management support is essential. Justify your choices.
4. Is it possible to meet the service requirements of hard real-time applications by writing
additional layers over the Unix System V kernel? If your answer is no, explain the
reason. If your answer is yes, explain what additional features you would implement in
the external layer of the Unix System V kernel for supporting hard real-time applications.
5. Briefly indicate how Unix dynamically recomputes task priority values. Why is such re-
computation of task priorities required? What are the implications of such priority re-
computations on real-time application development?
6. Why is Unix V non-preemptive in kernel mode? How do fully preemptive kernels based
on Unix (e.g. Linux) overcome this problem? Briefly describe an experimental set-up that
can be used to determine the preemptability of different operating systems by high-priority
real-time tasks when a low-priority task has made a system call.
7. Explain how interrupts are handled in Windows NT. Explain how the interrupt processing
scheme of Windows NT makes it unsuitable for hard real-time applications. How has this
problem been overcome in WinCE?
8. Would you recommend Unix System V to be used for a few real-time tasks for running a
data acquisition application? Assume that the computation time for these tasks is of the
order of few hundreds of milliseconds and the deadline of these tasks is of the order of
several tens of seconds. Justify your answer.
9. Explain the problems that you would encounter if you try to develop and run a hard real-
time system on the Windows NT operating system.
10. Briefly explain why the traditional Unix kernel is not suitable to be used in
multiprocessor environments. Define a spin lock and a kernel-level lock and explain their
use in realizing a preemptive kernel.
11. What do you understand by a microkernel-based operating system? Explain the advantages
of a microkernel-based real-time operating system over a monolithic operating system.
12. What is the difference between a self-host and a host-target based embedded operating
system? Give at least one example of a commercial operating system from each category.
What problems might a real-time application developer face while using RT-Linux
for developing hard real-time applications?
13. What are the important features required in a real-time operating system? Analyze to what
extent these features are provided by Windows NT and Unix V.
Module
6
Embedded System
Software
Version 2 EE IIT, Kharagpur 1
Lesson
32
Commercial Real-Time
Operating Systems
1. Introduction

Many real-time operating systems are at present available commercially. In this lesson, we
analyze some of the popular real-time operating systems and investigate why these popular
systems cannot be used across all applications. We also examine the POSIX standards for RTOS
and their implications.
1.1. POSIX

POSIX stands for Portable Operating System Interface. The X has been suffixed to the
abbreviation to make it sound Unix-like. Over the last decade, POSIX has become an important
standard in the operating systems area, including real-time operating systems. The importance of
POSIX can be gauged from the fact that nowadays it has become uncommon to come across a
commercial operating system that is not POSIX-compliant. POSIX started as an open software
initiative. Since POSIX has now become overwhelmingly popular, we discuss the POSIX
requirements on real-time operating systems. We start with a brief introduction to the open
software movement and then trace the historical events that have led to the emergence of
POSIX. Subsequently, we highlight the important requirements of real-time POSIX.
1.2. Open Software

An open system is a vendor-neutral environment, which allows users to intermix hardware,
software, and networking solutions from different vendors. Open systems are based on open
standards and are not copyrighted, saving users from expensive intellectual property right (IPR)
law suits. The most important characteristics of open systems are interoperability and
portability. Interoperability means systems from multiple vendors can exchange information
among each other. A system is portable if it can be moved from one environment to another
without modifications. As part of the open system initiative, the open software movement has
become popular.
Advantages of open software include the following: it reduces the cost of development and
the time to market a product; it helps increase the availability of add-on software packages; it
enhances the ease of programming; and it facilitates easy integration of separately developed
modules. POSIX is an off-shoot of the open software movement.
Open Software standards can be divided into three categories:
Open Source: Provides portability at the source code level. To run an application on a new
platform would require only compilation and linking. ANSI and POSIX are important open
source standards.
Open Object: This standard provides portability of unlinked object modules across different
platforms. To run an application in a new environment, relinking of the object modules
would be required.
Open Binary: This standard provides complete software portability across hardware
platforms based on a common binary language structure. An open binary product can be
portable at the executable code level. At the moment, however, no open binary standards exist.
The main goal of POSIX is application portability at the source code level. Before we discuss
RT-POSIX, let us explore the historical background under which POSIX was developed.
1.6.1. PSOS
PSOS is a popular real-time operating system that is being primarily used in embedded
applications. It is available from Wind River Systems, a large player in the real-time operating
system arena. It is a host-target type of real-time operating system. PSOS is being used in
several commercial embedded products. An example application of PSOS is in the base stations
of cellular systems.
Fig. 32.1 PSOS-based Development of Embedded Software (legend: XRAY+ is the source-level debugger and PROBE the target debugger; the host computer runs the editor, cross-compiler, XRAY+, and libraries, and communicates with the target over TCP/IP; the target runs the application over the pSOS+ kernel together with the pNA (networking), pHILE (file system), and pROBE components)

PSOS-based application development has been shown schematically in Fig. 32.1. The host
computer is typically a desktop; both Unix and Windows hosts are supported. The target board
contains the embedded processor, ROM, RAM, etc. The host computer runs the editor, cross-
compiler, source-level debugger, and libraries.
1.6.2. VRTX
VRTX is a POSIX-RT compliant operating system from Mentor Graphics. VRTX has been
certified by the US FAA (Federal Aviation Administration) for use in mission and life critical
applications such as avionics. VRTX has two multitasking kernels: VRTXsa and VRTXmc.
VRTXsa is used for large and medium applications. It supports virtual memory. It has a
POSIX-compliant library and supports priority inheritance. Its system calls are deterministic and
fully preemptable. VRTXmc is optimized for power consumption and for ROM and RAM sizes.
It therefore has a very small footprint. The kernel typically requires only 4 to 8 Kbytes of ROM
and 1 Kbyte of RAM. It does not support virtual memory. This version is targeted at cell
phones and other small hand-held devices.
1.6.3. VxWorks

VxWorks is a product from Wind River Systems. It is a host-target system. The host can be
either a Windows or a Unix machine. It supports most POSIX-RT functionalities. VxWorks
comes with an integrated development environment (IDE) called Tornado. In addition to the
standard support for program development tools such as an editor, cross-compiler, cross-
debugger, etc., Tornado contains VxSim and WindView. VxSim simulates a VxWorks target for
use as a prototyping and testing environment. WindView provides debugging tools for the
simulator environment. VxMP is the multiprocessor version of VxWorks.
VxWorks was deployed in the Mars Pathfinder, which was sent to Mars in 1997. Pathfinder
landed on Mars, responded to ground commands, and started to send science and engineering
data. However, there was a hitch: it repeatedly reset itself. Remotely using the trace generation,
logging, and debugging tools of VxWorks, it was found that the cause was unbounded priority
inversion. The unbounded priority inversion caused real-time tasks to miss their deadlines, and
as a result, the exception handler reset the system each time. Although VxWorks supports
priority inheritance, using the remote debugging tool it was found to have been disabled in the
configuration file. The problem was fixed by enabling it.
1.6.4. QNX

QNX is a product from QNX Software Systems Ltd. QNX Neutrino offers POSIX-compliant
APIs and is implemented using a microkernel architecture.
The microkernel architecture of QNX is shown in Fig. 32.2. Because of the fine-grained
scalability of the microkernel architecture, it can be configured to a very small size, a critical
advantage in high-volume devices, where even a 1% reduction in memory costs can return
millions of dollars in profit.

Fig. 32.2 The microkernel architecture of QNX (file system, device driver, application, and TCP/IP manager modules around the microkernel)
1.6.5. µC/OS-II

µC/OS-II is a free RTOS, easily available on the Internet. It is written in ANSI C and contains
a small portion of assembly code. The assembly language portion has been kept to a minimum to
make it easy to port to different processors. To date, µC/OS-II has been ported to over 100
different processor architectures ranging from 8-bit to 64-bit microprocessors, microcontrollers,
and DSPs. Some important features of µC/OS-II are highlighted in the following.
µC/OS-II was designed so that the programmer can use just a few of the offered services
or select the entire range of services. This allows the programmer to minimize the
amount of memory needed by µC/OS-II on a per-product basis.
µC/OS-II has a fully preemptive kernel. This means that µC/OS-II always ensures that
the highest priority task that is ready is taken up for execution.
µC/OS-II allows up to 64 tasks to be created. Each task operates at a unique priority
level; there are 64 priority levels. This means that round-robin scheduling is not
supported. The priority levels are also used as the PIDs (Process Identifiers) for the tasks.
µC/OS-II uses partitioned memory management. Each memory partition consists of
several fixed-size blocks. A task obtains memory blocks from a memory partition, and
the task must create a memory partition before it can be used. Allocation and
deallocation of fixed-size memory blocks is done in constant time and is deterministic.
A task can create and use multiple memory partitions, so that it can use memory blocks
of different sizes.
µC/OS-II has been certified by the Federal Aviation Administration (FAA) for use in
commercial aircraft by meeting the demanding requirements of its standard for software
used in avionics. To meet the requirements of this standard, it was demonstrated through
documentation and testing that it is robust and safe.
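The constant-time partitioned allocator described above can be sketched as a free list threaded through fixed-size blocks. The names below are ours (in µC/OS-II itself the corresponding calls are OSMemCreate(), OSMemGet(), and OSMemPut()):

```c
#include <stddef.h>

/* A memory partition: a region carved into fixed-size blocks threaded
   onto a singly linked free list, so both get and put are O(1). */
typedef struct mem_partition {
    void *free_list;              /* head of the free-block list */
    size_t blk_size;
    size_t nblks_free;
} mem_partition;

/* Carve `nblks` blocks of `blk_size` bytes out of `storage`.
   blk_size must be at least sizeof(void*) to hold the list link. */
void part_create(mem_partition *p, void *storage, size_t nblks, size_t blk_size) {
    char *blk = storage;
    p->free_list = NULL;
    p->blk_size = blk_size;
    p->nblks_free = nblks;
    for (size_t i = 0; i < nblks; i++) {      /* thread the free list */
        *(void **)blk = p->free_list;
        p->free_list = blk;
        blk += blk_size;
    }
}

void *part_get(mem_partition *p) {            /* O(1), deterministic */
    void *blk = p->free_list;
    if (blk) {
        p->free_list = *(void **)blk;
        p->nblks_free--;
    }
    return blk;                               /* NULL if exhausted */
}

void part_put(mem_partition *p, void *blk) {  /* O(1), deterministic */
    *(void **)blk = p->free_list;
    p->free_list = blk;
    p->nblks_free++;
}
```

Because every block in a partition has the same size, neither call ever searches or splits anything, which is what makes allocation time constant and deterministic; a task needing blocks of two sizes simply creates two partitions.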
1.6.6. RT Linux

Linux is by and large a free operating system. It is robust, feature-rich, and efficient. Several
real-time implementations of Linux (RT-Linux) are available. RT-Linux is a self-host operating
system (see Fig. 32.3). RT-Linux runs along with a Linux system: the real-time kernel sits
between the hardware and the Linux system and intercepts all interrupts generated by the
hardware. If an interrupt is to cause a real-time task to run, the real-time kernel preempts Linux,
if Linux is running at that time, and lets the real-time task run. Thus, in effect, Linux runs as a
task of RT-Linux.

Fig. 32.3 Structure of RT-Linux (the real-time kernel sits between the Linux kernel and the hardware)

The real-time applications are written as loadable kernel modules. In essence, real-time
applications run in the kernel space.
In the approach taken by RT-Linux, there are effectively two independent kernels: the real-time
kernel and the Linux kernel. Therefore, this approach is also known as the dual-kernel approach,
as the real-time kernel is implemented outside the Linux kernel. Any task that requires
deterministic scheduling is run as a real-time task. These tasks preempt Linux whenever they
need to execute and yield the CPU to Linux only when no real-time task is ready to run.
Compared to the microkernel approach, the following are the shortcomings of the dual-kernel
approach.
Duplicated Coding Efforts: Tasks running in the real-time kernel cannot make full use
of the Linux system services: file systems, networking, and so on. In fact, if a real-time
task invokes a Linux service, it will be subject to the same preemption problems that
prohibit Linux processes from behaving deterministically. As a result, new drivers and
system services must be created specifically for the real-time kernel even when
equivalent services already exist for Linux.
Fragile Execution Environment: Tasks running in the real-time kernel do not benefit
from the MMU-protected environment that Linux provides to the regular non-real-time
processes. Instead, they run unprotected in kernel space. Consequently, any real-time
task that contains a coding error, such as a corrupt C pointer, can easily cause a fatal
kernel fault. This is a serious problem, since many embedded applications are
safety-critical in nature.
Limited Portability: In the dual-kernel approach, the real-time tasks are not Linux
processes at all, but programs written using a small subset of POSIX APIs. To aggravate
the matter, different implementations of dual kernels use different APIs. As a result, real-
time programs written using one vendor's RT-Linux version may not run on another's.
Programming Difficulty: RT-Linux kernels support only a limited subset of POSIX
APIs. Therefore, application development takes more effort and time.
1.6.7. Lynx

Lynx is a self-host system. The currently available version of Lynx (Lynx 3.0) is a
microkernel-based real-time operating system, though the earlier versions were based on a
monolithic design. Lynx is fully compatible with Linux: with Lynx's binary compatibility, a
Linux program's binary image can be run directly on Lynx. On the other hand, for other Linux-
compatible operating systems such as QNX, Linux applications need to be recompiled in order
to run on them.
run on them. The Lynx microkernel is 28KBytes in size and provides the essential services in
scheduling, interrupt dispatch, and synchronization. The other services are provided as kernel
plug-ins (KPIs). By adding KPIs to the microkernel, the system can be configured to support I/O,
file systems, sockets, and so on. With full configuration, it can function as a multipurpose Unix
machine on which both hard and soft real-time tasks can run. Unlike many embedded real-time
operating systems, Lynx supports memory protection.
1.6.8. Windows CE

Windows CE is a stripped-down version of Windows, and has a minimum footprint of only
400 KBytes. It provides 256 priority levels. To optimize performance, all threads are run in
kernel mode. The timer accuracy is 1 msec for the sleep- and wait-related APIs. The different
functionalities of the kernel are broken down into small non-preemptive sections. So, during a
system call, preemption is turned off for only short periods of time. Also, interrupt servicing is
preemptable; that is, it supports nested interrupts. It uses the memory management unit (MMU)
for virtual memory management.
Windows CE uses a priority inheritance scheme to avoid the priority inversion problem present
in Windows NT. Normally, the kernel thread handling a page fault (i.e. the DPC) runs at a
priority level higher than NORMAL (refer Sec. 1.5.2). When a thread with priority level NORMAL
suffers a page fault, the priority of the corresponding kernel thread handling this page fault is
set to the priority of the thread causing the page fault. This ensures that a thread is not
blocked by another lower-priority thread even when it suffers a page fault.
1.6.9. Exercises

1. State whether the following statements are True or False. Justify your answer in each case.
a. In real-time Linux (RT-Linux), real-time processes are scheduled at priorities
higher than the kernel processes.
b. EDF scheduling of tasks is commonly supported in commercial real-time operating
systems such as PSOS and VRTX.
c. POSIX 1003.4 (real-time standard) requires that real-time processes be scheduled at
priorities higher than kernel processes.
d. POSIX is an attempt by ANSI/IEEE to enable executable files to be portable
across different Unix machines.
2. What is the difference between block I/O and character I/O? Give examples of each.
Which type of I/O is accorded higher priority by Unix? Why?
3. List four important features that a POSIX 1003.4 (Real-Time standard) compliant
operating system must support. Is preemptability of kernel processes required by POSIX
1003.4? Can a Unix-based operating system using the preemption-point technique claim to
be POSIX 1003.4 compliant? Explain your answers.
4. Suppose you are the manufacturer of small embedded components used mainly in
consumer electronics goods such as automobiles, MP3 players, and computer-based toys.
Would you prefer to use PSOS, WinCE, or RT-Linux in your embedded component?
Explain the reasons behind your answer.
5. What is the difference between a system call and a function call? What problems, if any,
might arise if the system calls are invoked as procedure calls?
6. Explain how a real-time operating system differs from a traditional operating system.
Name a few real-time operating systems that are commercially available.
7. What is open software? Does an open software mandate portability of the executable files
across different platforms? Name an open software standard for real-time operating
systems. What is the advantage of using an open software operating system for real-time
application development? What are the pros and cons of using an open software product in
program development compared to a proprietary product?
8. Identify at least four important advantages of using VxWorks as the operating system for
real-time applications compared to using Unix V.3.
9. What is an open source standard? How is it different from open object and open binary
standards? Give some examples of popular open source software products.
10. Can multithreading result in faster response times (compared to single-threaded tasks) even
in uniprocessor systems? Explain your answer and identify the reasons to support your
answer.
o g
References (Lessons 24 - 28) . bl
u p
1. o
C.M. Krishna and Shin K.G., Real-Time Systems, Tata McGraw-Hill, 1999.
r
2. Philip A. Laplante, Real-Time System Design
s g and Analysis, Prentice Hall of India, 1996.
3. n
Jane W.S. Liu, Real-Time Systems, Pearson t Press, 2000.
4. Alan C. Shaw, Real-Time Systems d eand Software, John Wiley and Sons, 2001.
t u
5.
s
C. SivaRam Murthy and G. Manimaran, Resource Management in Real-Time Systems and
Networks, MIT Press, 2001.
i ty
6. c
B. Dasarathy, Timing .Constraints of Real-Time Systems: Constructs for Expressing Them,
w80-86.
Methods for Validating Them, IEEE Transactions on Software Engineering, January 1985,
w
Vol. 11, No. 1, pages
7. w
Lui Sha, Ragunathan Rajkumar, John P. Lehoczky, Priority inheritance protocols: An
approach to real-time synchronization,, IEEE Transactions on Computers, 1990, Vol. 39,
pages 1175-1185.
Module 7
Software Engineering Issues
Version 2 EE IIT, Kharagpur 1
Lesson 33
Introduction to Software Engineering
1. Introduction

With the advancement of technology, computers have become more powerful and sophisticated. The more powerful a computer is, the more sophisticated the programs it can run. Thus, programmers have been tasked to solve larger and more complex problems. They have coped with this challenge by innovating and by building on their past programming experience.
All those past innovations and the experience of writing good quality programs in efficient and cost-effective ways have been systematically organized into a body of knowledge. This body of knowledge forms the basis of software engineering principles. Thus, we can view software engineering as a systematic collection of past experience. The experience is arranged in the form of methodologies and guidelines.
Suppose you have a friend who asked you to build a small wall as shown in fig. 33.1. You would be able to do that using your common sense. You will get building materials like bricks, cement, etc., and you will then build the wall.
But what would happen if the same friend asked you to build a large multistoried building as
shown in fig. 33.2?
Fig. 33.2 A Multistoried Building

You don't have a very good idea about building such a huge complex. It would be very difficult to extend your idea about a small wall construction into constructing a large building. Even if you tried to build a large building, it would collapse because you would not have the requisite knowledge about the strength of materials, testing, planning, architectural design, etc. Building a small wall and building a large building are entirely different ball games. You can use your intuition and still be successful in building a small wall, but building a large building requires knowledge of civil, architectural, and other engineering principles.
The principle of abstraction (in fig. 33.4) implies that a problem can be simplified by
omitting irrelevant details. Once the simpler problem is solved then the omitted details can be
taken into consideration to solve the next lower level abstraction, and so on.
Fig. 33.3 Increase in development time and effort with problem size

1.1.1. Abstraction and Decomposition

Fig. 33.4 Levels of abstraction (the full problem at the bottom, with the 1st, 2nd, and 3rd abstractions above it)
The principle of decomposition implies that a complex problem can be divided into smaller parts, whose solved components can be combined to get the full solution. A good decomposition of a problem as shown in fig. 33.5 should minimize interactions among the various components. If the different subcomponents are interrelated, then the different components cannot be solved separately and the desired reduction in complexity will not be realized.
Fig. 33.5 Decomposition of a large problem into a set of smaller problems
1.2. The Software Crisis

Software engineering appears to be among the few options available to tackle the present software crisis.
To explain the present software crisis in simple words, consider the following. The expenses that organizations all around the world are incurring on software purchases compared to those on hardware purchases have been showing a worrying trend over the years (as shown in fig. 33.6).
Fig. 33.6 Change in the relative cost of hardware and software over time
Organizations are spending larger and larger portions of their budget on software. Not only
are the software products turning out to be more expensive than hardware, but they also present a
host of other problems to the customers: software products are difficult to alter, debug, and
enhance; use resources non-optimally; often fail to meet the user requirements; are far from
being reliable; frequently crash; and are often delivered late. Among these, the trend of
increasing software costs is probably the most important symptom of the present software crisis.
Remember that the cost we are talking of here is not on account of increased features, but due to
ineffective development of the product characterized by inefficient resource usage, and time and
cost over-runs.
There are many factors that have contributed to the making of the present software crisis. These factors include larger problem sizes, lack of adequate training in software engineering, an increasing skill shortage, and low productivity improvements.
It is believed that the only satisfactory solution to the present software crisis can possibly come from a spread of software engineering practices among the engineers, coupled with further advancements to the software engineering discipline itself.
For a program, the user interface may not be very important, because the programmer who develops it is the sole user. On the other hand, for a software product, the user interface must be carefully designed and implemented, because the developers of that product and the users of that product are totally different. In the case of a program, very little documentation is expected, but a software product must be well documented. A program can be developed according to the programmer's individual style of development, but a software product must be developed using the accepted software engineering principles.
2. Evolution of Program Design Techniques
During the 1950s, most programs were being written in assembly language. These programs were limited to a few hundred lines of assembly code, i.e. they were very small in size. Every programmer developed programs in his own individual style, based on his intuition. This type of programming was called Exploratory Programming.
The next significant development, which occurred during the early 1960s in the area of computer programming, was high-level language programming. Use of high-level language programming reduced development efforts and development time significantly. Languages like FORTRAN, ALGOL, and COBOL were introduced at that time.
In 1968, Dijkstra published his famous article "Go To Statement Considered Harmful". Expectedly, many programmers were enraged to read this article. They published several counter articles highlighting the advantages and inevitable use of GOTO statements. But soon it was conclusively proved that only three programming constructs (sequence, selection, and iteration) were sufficient to express any programming logic. This formed the basis of the structured programming methodology.
2.1.1. Features of Structured Programming

A structured program uses three types of program constructs, i.e. selection, sequence, and iteration. Structured programs avoid unstructured control flows by restricting the use of GOTO statements. A structured program consists of a well partitioned set of modules. Structured programming uses single-entry, single-exit program constructs such as if-then-else, do-while, etc. Thus, the structured programming principle emphasizes designing neat control structures for programs.
2.1.2. Advantages of Structured Programming

Structured programs are easier to read and understand. Structured programs are easier to maintain. They require less effort and time for development. They are amenable to easier debugging, and usually fewer errors are made in the course of writing such programs.
A lot of attention is being paid to requirements specification. Significant effort is now being
devoted to develop a clear specification of the problem before any development activity is
started.
Now, there is a distinct design phase where standard design techniques are employed.
Periodic reviews are being carried out during all stages of the development process. The
main objective of carrying out reviews is phase containment of errors, i.e. detect and correct
errors as soon as possible. Defects are usually not detected as soon as they occur, rather they are
noticed much later in the life cycle. Once a defect is detected, we have to go back to the phase
where it was introduced and rework those phases - possibly change the design or change the code
and so on.
Today, software testing has become very systematic and standard testing techniques are
available. Testing activity has also become all encompassing in the sense that test cases are being
developed right from the requirements specification stage.
There is better visibility of design and code. By visibility we mean production of good
quality, consistent and standard documents during every phase. In the past, very little attention
was paid to producing good quality and consistent documents. In the exploratory style, the design and test activities, even if carried out (in whatever way), were not documented satisfactorily. Today, good quality documents are consciously being developed during product development. This has made fault diagnosis and maintenance smoother.
Now, projects are first thoroughly planned. Project planning normally includes preparation of various types of estimates, resource scheduling, and development of project tracking plans. Several techniques and tools for tasks such as configuration management, cost estimation, scheduling, etc. are used for effective software project management.
Several metrics are being used to help in software project management and software quality assurance.

3. Software Life Cycle Model
A software life cycle model (also called a process model) is a descriptive and diagrammatic representation of the software life cycle. A life cycle model represents all the activities required to make a software product transit through its life cycle phases. It also captures the order in which these activities are to be undertaken. In other words, a life cycle model maps the different activities performed on a software product from its inception to its retirement. Different life cycle models may map the basic development activities to phases in different ways. Thus, no matter which life cycle model is followed, the basic activities are included in all life cycle models, though the activities may be carried out in different orders in different life cycle models. During any life cycle phase, more than one activity may also be carried out. For example, the design phase might consist of the structured analysis activity followed by the structured design activity.
3.1. The Need for a Life Cycle Model
The development team must identify a suitable life cycle model for the particular project and
then adhere to it. Without using a particular life cycle model, the development of a software product would not be carried out in a systematic and disciplined manner. When a software product is being
developed by a team there must be a clear understanding among team members about when and
what to do. Otherwise it would lead to chaos and project failure. Let us try to illustrate this
problem using an example. Suppose a software development problem is divided into several
parts and the parts are assigned to the team members. From then on, suppose the team members
are allowed the freedom to develop the parts assigned to them in whatever way they like. It is
possible that one member might start writing the code for his part, another might decide to
prepare the test documents first, and some other engineer might begin with the design phase of
the parts assigned to him. This would be one of the perfect recipes for project failure.
A software life cycle model defines entry and exit criteria for every phase. A phase can start
only if its phase-entry criteria have been satisfied. So without a software life cycle model, the
entry and exit criteria for a phase cannot be recognized. Without models (such as classical
waterfall model, iterative waterfall model, prototyping model, evolutionary model, spiral model
etc.), it becomes difficult for software project managers to monitor the progress of the project.
Many life cycle models have been proposed so far. Each of them has some advantages as
well as some disadvantages. A few important and commonly used life cycle models are as
follows:
The classical waterfall model divides the life cycle into the following phases, as shown in fig. 33.7:
Feasibility study
Requirements analysis and specification
Design
Coding and unit testing
Integration and system testing
Maintenance
Fig. 33.7 Classical Waterfall Model (feasibility study → requirements analysis and specification → design → coding → testing → maintenance)
Case Study
A mining company named Galaxy Mining Company Ltd. (GMC) has mines located at various
places in India. It has about fifty different mine sites spread across eight states. The company
employs a large number of miners at each mine site. Mining being a risky profession, the
company intends to operate a special provident fund, which would exist in addition to the
standard provident fund that the miners already enjoy. The main objective of having the special
provident fund (SPF) would be to quickly distribute some compensation before the standard
provident amount is paid. According to this scheme, each mine site would deduct SPF
instalments from each miner every month and deposit the same with the CSPFC (Central Special
Provident Fund Commissioner). The CSPFC will maintain all details regarding the SPF
instalments collected from the miners. GMC employed a reputed software vendor Adventure
Software Inc. to undertake the task of developing the software for automating the maintenance of
SPF records of all employees. GMC realized that besides saving manpower on bookkeeping
work, the software would help in speedy settlement of claim cases. GMC indicated that the
amount it could afford for this software to be developed and installed was 1 million rupees.
Adventure Software Inc. deputed their project manager to carry out the feasibility study. The project manager discussed the matter with the top managers of GMC to get an overview of the project. He also discussed the issues involved with the field PF officers at the various mine sites to determine the exact details of the project. The project manager identified two broad approaches to solve the problem. One was to have a central database which could be accessed from every mine site; the other was to maintain local databases at each mine site, so that the mine sites could still operate even when the communication link to the central database temporarily failed.
The goal of the requirements gathering activity is to collect all relevant information from the
customer regarding the product to be developed with a view to clearly understand the customer
requirements and weed out the incompleteness and inconsistencies in these requirements.
The requirements analysis activity is begun by collecting all relevant data regarding the
product to be developed from the users of the product and from the customer through interviews
and discussions. For example, to perform the requirements analysis of a business accounting
software required by an organization, the analyst might interview all the accountants of the
organization to ascertain their requirements. The data collected from such a group of users
usually contain several contradictions and ambiguities, since each user typically has only a
partial and incomplete view of the system. Therefore it is necessary to identify all ambiguities
and contradictions in the requirements and resolve them through further discussions with the
customer. After all ambiguities, inconsistencies, and incompleteness have been resolved and all
the requirements properly understood, the requirements specification activity can start. During
this activity, the user requirements are systematically organized into a Software Requirements
Specification (SRS) document.
The customer requirements identified during the requirements gathering and analysis activity are organized into an SRS document. The important components of this document are the functional requirements, the non-functional requirements, and the goals of implementation.
3.2.3. Design

The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. In technical terms, during the design phase the software architecture is derived from the SRS document. Two distinctly different approaches are available: the traditional design approach and the object-oriented design approach.
Traditional design approach: Traditional design consists of two different activities; first a structured analysis of the requirements specification is carried out, where the detailed structure of the problem is examined. This is followed by a structured design activity. During structured design, the results of structured analysis are transformed into the software design.
Object-oriented design approach: In this technique, the various objects that occur in the problem domain and the solution domain are first identified, and the different relationships that exist among these objects are identified. The object structure is further refined to obtain the detailed design.
3.2.4. Coding and Unit Testing
The purpose of the coding and unit testing phase (sometimes called the implementation
phase) of software development is to translate the software design into source code. Each
component of the design is implemented as a program module. The end-product of this phase is a
set of program modules that have been individually tested.
During this phase, each module is unit tested to determine the correct working of all the
individual modules. It involves testing each module in isolation as this is the most efficient way
to debug the errors identified at this stage.
3.2.5. Integration and System Testing

The different modules making up a software product are almost never integrated in one shot.
Integration is normally carried out incrementally over a number of steps. During each integration
step, the partially integrated system is tested and a set of previously planned modules are added
to it. Finally, when all the modules have been successfully integrated and tested, system testing is
carried out. The goal of system testing is to ensure that the developed system conforms to the
requirements laid out in the SRS document. System testing usually consists of three different
kinds of testing activities:
Alpha testing: It is the system testing performed by the development team.
Beta testing: It is the system testing performed by a friendly set of customers.
Acceptance testing: It is the system testing performed by the customer himself after
product delivery to determine whether to accept or reject the delivered product.
System testing is normally carried out in a planned manner according to the system test plan document. The system test plan identifies all testing-related activities that must be performed, specifies the schedule of testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.
3.2.6. Maintenance

Maintenance of a typical software product requires much more effort than the effort necessary to develop the product itself. Many studies carried out in the past confirm this and indicate that the relative effort of development of a typical software product to its maintenance effort is roughly in the 40:60 ratio. Maintenance involves performing any one or more of the following three kinds of activities:
Correcting errors that were not discovered during the product development phase. This is called corrective maintenance.
Improving the implementation of the system, and enhancing the functionalities of the system according to the customer's requirements. This is called perfective maintenance.
Porting the software to work in a new environment. For example, porting may be required to get the software to work on a new computer platform or with a new operating system. This is called adaptive maintenance.
3.2.7. Shortcomings of the Classical Waterfall Model
The classical waterfall model is an idealistic one since it assumes that no development error
is ever committed by the engineers during any of the life cycle phases. However, in practical
development environments, the engineers do commit a large number of errors in almost every
phase of the life cycle. The source of the defects can be many: oversight, wrong assumptions, use
of inappropriate technology, communication gap among the project engineers, etc. These defects
usually get detected much later in the life cycle. For example, a design defect might go unnoticed
till we reach the coding or testing phase. Once a defect is detected, the engineers need to go back
to the phase where the defect had occurred and redo some of the work done during that phase
and the subsequent phases to correct the defect and its effect on the later phases. Therefore, in
any practical software development work, it is not possible to strictly follow the classical
waterfall model.
At the start of the feasibility study, project managers or team leaders try to understand what the actual problem is by visiting the client site. At the end of that phase, they pick the best solution and determine whether the solution is feasible financially and technically.
At the start of the requirements analysis and specification phase, the required data is collected. After that, requirements specification is carried out. Finally, the SRS document is produced.
At the start of the design phase, the context diagram and different levels of DFDs are produced according to the SRS document. At the end of this phase, the module structure (structure chart) is produced.
During the coding phase each module (an independently compilable unit) of the design is coded. Then each module is tested independently as a stand-alone unit and debugged separately. After this each module is documented individually. The end product of the implementation phase is a set of program modules that have been tested individually but not tested together.
After the implementation phase, the different modules which have been tested individually are integrated in a planned manner. After all the modules have been successfully integrated and tested, system testing is carried out.
Software maintenance denotes any changes made to a software product after it has been delivered to the customer. Maintenance is inevitable for almost any kind of product. However, most products need maintenance due to the wear and tear caused by use.
3.3. Prototyping Model

A prototype is a toy implementation of the system. A prototype usually exhibits limited functional capabilities, low reliability, and inefficient performance compared to the actual software. A prototype is usually built using several shortcuts. The shortcuts might involve using inefficient, inaccurate, or dummy functions. The shortcut implementation of a function, for example, may produce the desired results by using a table look-up instead of performing the actual computations. A prototype usually turns out to be a very crude version of the actual system.
3.3.1. The Need for a Prototype
There are several uses of a prototype. An important purpose is to illustrate the input data formats, messages, reports, and the interactive dialogues to the customer. This is a valuable mechanism for gaining a better understanding of the customer's needs:
how the screens might look
how the user interface would behave
how the system would produce outputs, etc.
This is something similar to what the architectural designers of a building do; they show a prototype of the building to their customer. The customer can evaluate whether he likes it, and identify the changes that he would need in the actual product. A similar thing happens in the case of a software product and its prototyping model.
The Spiral model of software development is shown in fig. 33.8. The diagrammatic representation of this model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the software process. For example, the innermost loop might be concerned with feasibility study; the next loop with requirements specification; the next one with design, and so on. Each phase in this model is split into four sectors (or quadrants) as shown in fig. 33.8. The following activities are carried out during each phase of a spiral model.
First quadrant (Objective Setting):
During the first quadrant, we need to identify the objectives of the phase.
Examine the risks associated with these objectives.
Second quadrant (Risk Assessment and Reduction):
A detailed analysis is carried out for each identified project risk.
Steps are taken to reduce the risks. For example, if there is a risk that the requirements are inappropriate, a prototype system may be developed.
Third quadrant (Development and Validation):
Develop and validate the next level of the product after resolving the identified risks.
Fourth quadrant (Review and Planning):
Review the results achieved so far with the customer and plan the next iteration around the spiral.
The prototyping model is suitable for projects for which either the user requirements or the underlying technical aspects are not well understood. This model is especially popular for development of the user-interface part of projects.
The evolutionary approach is suitable for large problems which can be decomposed into a set of modules for incremental development and delivery. This model is also widely used for object-oriented development projects. Of course, this model can only be used if the incremental delivery of the system is acceptable to the customer.
The spiral model is called a meta-model since it encompasses all other life cycle models. Risk handling is inherently built into this model. The spiral model is suitable for development of technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.
The different software life cycle models can be compared from the viewpoint of the customer. Initially, customer confidence in the development team is usually high irrespective of the development model followed. During the long development process, customer confidence normally drops, as no working product is immediately visible. Developers answer customer queries using technical slang, and delays are announced. This gives rise to customer resentment.
On the other hand, an evolutionary approach lets the customer experiment with a working product much earlier than the monolithic approaches. Another important advantage of the incremental model is that it reduces the customer's trauma of getting used to an entirely new system. The gradual introduction of the product via incremental phases provides time to the customer to adjust to the new product. Also, from the customer's financial viewpoint, incremental development does not require a large upfront capital outlay. The customer can order the incremental versions as and when he can afford them.
3.6. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. There are well defined steps through which a problem is solved using an exploratory
style.
c. Evolutionary life cycle model is ideally suited for development of very small software
products typically requiring a few months of development effort.
d. Prototyping life cycle model is the most suitable one for undertaking a software
development project susceptible to schedule slippage.
e. Spiral life cycle model is not suitable for products that are vulnerable to a large
number of risks.
2. For the following, mark all options which are true.
a. Which of the following problems can be considered to be contributing to the present
software crisis?
large problem size
lack of rapid progress of software engineering
lack of intelligent engineers
shortage of skilled manpower
b. Which of the following are essential program constructs (i.e. it would not be possible
to develop programs for any given problem without using the construct)?
Sequence
Selection
Jump
Iteration
c. In a classical waterfall model, which phase precedes the design phase?
Coding and unit testing
Maintenance
Requirements analysis and specification
Feasibility study
d. Among the development phases of the software life cycle, which phase typically consumes
the maximum effort?
Requirements analysis and specification
Design
Coding
Testing
e. Among all the phases of the software life cycle, which phase consumes the maximum
effort?
Design
Maintenance
Testing
Coding
f. In the classical waterfall model, during which phase is the Software Requirement
Specification (SRS) document produced?
Design
Maintenance
Requirements analysis and specification
Coding
g. Which phase is the last development phase in the classical waterfall software life
cycle?
Design
Maintenance
Testing
Coding
h. Which development phase in classical waterfall life cycle immediately follows coding
phase?
Design
Maintenance
Testing
Requirement analysis and specification
3. Identify the problem one would face, if he tries to develop a large software product without
using software engineering principles.
4. Identify the two important techniques that software engineering uses to tackle the problem
of exponential growth of problem complexity with its size.
5. State five symptoms of the present software crisis.
6. State four factors that have contributed to the making of the present software crisis.
7. Suggest at least two possible solutions to the present software crisis.
8. Identify at least four basic characteristics that differentiate a simple program from a
software product.
9. Identify two important features that a program must satisfy to be called a structured
program.
10. Explain the exploratory program development style.
11. Show at least three important drawbacks of the exploratory programming style.
12. Identify at least two advantages of using high-level languages over assembly languages.
13. State at least two basic differences between control flow-oriented and data flow-oriented
design techniques.
14. State at least five advantages of object-oriented design techniques.
15. State at least three differences between the exploratory style and modern styles of software
development.
16. Explain the problems that might be faced by an organization if it does not follow any
software life cycle model.
17. Differentiate between structured analysis and structured design.
18. Identify at least three activities undertaken in an object-oriented software design approach.
19. State why it is a good idea to test a module in isolation from other modules.
20. Identify why the different modules making up a software product are almost never integrated
in one shot.
21. Identify the necessity of integration and system testing.
22. Identify the six different phases of the classical waterfall model. Mention the reasons for
which the classical waterfall model can be considered impractical and cannot be used in real
projects.
23. Explain what a software prototype is. Identify three reasons for the necessity of developing
a prototype during software development.
24. Explain the situations under which it is beneficial to develop a prototype during software
development.
25. Identify the activities carried out during each phase of a spiral model. Discuss the
advantages of using the spiral model.
Module
7
Software Engineering
Issues
Version 2 EE IIT, Kharagpur 1
Lesson
34
Requirements Analysis
and Specification
What are the likely complexities that might arise while solving the problem?
If there are external software or hardware with which the developed software has to
interface, then what exactly would the data interchange formats with the external system
be?
After the analyst has understood the exact customer requirements, he proceeds to identify and
resolve the various requirements problems. The most important requirements problems that the
analyst has to identify and eliminate are the problems of anomalies, inconsistencies, and
incompleteness. When the analyst detects any inconsistencies, anomalies or incompleteness in
the gathered requirements, he resolves them by carrying out further discussions with the end-
users and the customers.
3. SRS Document
After the analyst has collected all the requirements information regarding the software to be
developed, and has removed all the incompleteness, inconsistencies, and anomalies from the
specification, he starts to systematically organize the requirements in the form of an SRS
document.
A high-level function, as shown in fig. 34.1, is considered as a transformation of a set of input
data to some corresponding output data. The user can get some meaningful piece of work done
using a high-level function.

Fig. 34.1 Function fi (input data transformed by fi into output data)
So, the function Search Book (F1) takes the author's name and transforms it into book details.
Functional requirements actually describe a set of high-level requirements, where each high-
level requirement takes some data from the user and provides some data to the user as an output.
Also each high-level requirement might consist of several other functions.
Each function is documented by identifying the data to be input to the system, its input data
domain, the output data domain, and the type of processing to be carried out on the input data
to obtain the output data. Let us first try to document
the withdraw-cash function of an ATM (Automated Teller Machine) system. The withdraw-cash
is a high-level requirement. It has several sub-requirements corresponding to the different user
interactions. These different interaction sequences capture the different scenarios.
1. Decision Trees
A decision tree gives a graphic view of the processing logic involved in decision making and
the corresponding actions taken. The edges of a decision tree represent conditions and the leaf
nodes represent the actions to be performed depending on the outcome of testing the condition.
Example
Consider Library Membership Automation Software (LMS) where it should support the
following three options:
New member
Renewal
Cancel membership
Renewal option
Decision: If the 'renewal' option is chosen, the LMS asks for the member's name and his
membership number to check whether he is a valid member or not.
Action: If the membership is valid, then the membership expiry date is updated and the annual
membership bill is printed; otherwise an error message is displayed.

Cancel membership option
Decision: If the 'cancel membership' option is selected, then the software asks for the member's
name and his membership number.
Action: The membership is cancelled, a cheque for the balance amount due to the member is
printed, and finally the membership record is deleted from the database.

Decision tree representation of the above example
The following tree (fig. 34.3) shows the graphical representation of the above example. After
getting information from the user, the system makes a decision and then performs the
corresponding actions.
Fig. 34.3 Decision tree for LMS. From the user's selection the tree branches as follows:
New member: get details, create records, print bills. Renewal: get details, update record,
print bills. Cancellation: get details, print cheque, delete record. Invalid option: print
error message.
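The branching logic of fig. 34.3 translates directly into ordinary conditional code. The following Python sketch is illustrative only; the `members` dictionary and the returned message strings are assumptions standing in for the real membership database and printer.

```python
# Sketch of the LMS decision logic of fig. 34.3 (illustrative only;
# the 'members' dict stands in for the real membership database).
members = {101: {"name": "A. Rao", "expiry": "2024-12-31", "balance": 50}}

def lms(option, member_no=None, name=None):
    if option == "new":
        # Get details, create record, print bill.
        new_no = max(members, default=100) + 1
        members[new_no] = {"name": name, "expiry": "2025-12-31", "balance": 0}
        return f"bill printed for member {new_no}"
    elif option == "renewal":
        if member_no in members:                        # valid member?
            members[member_no]["expiry"] = "2025-12-31"  # update record
            return "annual membership bill printed"
        return "error: invalid membership"
    elif option == "cancel":
        rec = members.pop(member_no)                    # delete record
        return f"cheque for {rec['balance']} printed"   # print cheque
    return "error: invalid option"
```

Note how each decision node of the tree becomes a conditional test and each leaf becomes the action code under that branch.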
u p
2. Decision Tables
A decision table is used to represent the complex processing logic in a tabular or a matrix
form. The upper rows of the table specify the variables or conditions to be evaluated. The lower
rows of the table specify the actions to be taken when the corresponding conditions are satisfied.
Example
Consider the previously discussed LMS example. The decision table shown in fig. 34.4
represents the LMS problem in a tabular form. Here the table is divided into two parts: the
upper part shows the conditions and the lower part shows what actions are taken.
Conditions
  Valid selection                        No    Yes   Yes   Yes
  New member                             -     Yes   No    No
  Renewal                                -     No    Yes   No
  Cancellation                           -     No    No    Yes
Actions
  Display error message                  x     -     -     -
  Ask member's details                   -     x     -     -
  Build customer record                  -     x     -     -
  Generate bill                          -     x     x     -
  Ask member's name & membership number  -     -     x     x
  Update expiry date                     -     -     x     -
  Print cheque                           -     -     -     x
  Delete record                          -     -     -     x

Fig. 34.4 Decision table for LMS
From the above table you can easily understand that, if the valid selection condition is false,
then the action taken for this condition is 'display error message' and so on.
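Because each column of a decision table pairs one combination of condition outcomes with a fixed list of actions, a decision table translates naturally into a table lookup. The sketch below follows the LMS decision logic described earlier; the tuple encoding and the exact action strings are assumptions made for this illustration.

```python
# A decision table as a lookup from condition tuples to action lists
# (an illustrative sketch; the encoding is an assumption of this example).
# Condition order: (valid_selection, new_member, renewal, cancellation)
DECISION_TABLE = {
    (False, False, False, False): ["print error message"],
    (True,  True,  False, False): ["get details", "create record", "print bill"],
    (True,  False, True,  False): ["get details", "update record", "print bill"],
    (True,  False, False, True ): ["get details", "print cheque", "delete record"],
}

def actions(valid, new=False, renew=False, cancel=False):
    # An invalid selection ignores the remaining conditions ('-' in the table).
    key = (valid, new, renew, cancel) if valid else (False, False, False, False)
    return DECISION_TABLE[key]
```

A lookup like this makes the completeness of the table easy to check: any condition combination missing from the dictionary raises a `KeyError`.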
4. Formal Requirements Specification
A formal technique is a mathematical method to specify a hardware and/or software system,
verify whether a specification is realizable, verify that an implementation satisfies its
specification, prove properties of a system without necessarily running the system, etc. The
mathematical basis of a formal method is provided by the specification language.
4.1. Formal Specification Language
A formal specification language consists of two sets, syn and sem, and a relation sat between
them. The set syn is called the syntactic domain, the set sem is called the semantic domain, and
the relation sat is called the satisfaction relation. For a given specification syn, and model of the
system sem, if sat(syn, sem) as shown in fig.34.5, then syn is said to be the specification of sem,
and sem is said to be the specificand of syn.
Fig. 34.5 The sat relation between the syntactic domain SYN and the semantic domain SEM
Formal methods are usually classified into two broad categories: model-oriented and
property-oriented approaches. In a model-oriented style, one defines a system's behaviour
directly by constructing a model of the system in terms of mathematical structures such as
tuples, relations, functions, sets, sequences, etc. In the property-oriented style, the system's
behaviour is defined indirectly by stating its properties, usually in the form of a set of axioms
that the system must satisfy.
Example
Let us consider a simple producer/consumer example. In a property-oriented style, we would
probably start by listing the properties of the system like: the consumer can start consuming
only after the producer has produced an item, the producer starts to produce an item only
after the consumer has consumed the last item, etc. Examples of property-oriented
specification styles are axiomatic specification and algebraic specification. In a model-
oriented approach, we start by defining the basic operations, p (produce) and c (consume).
Then we can state that: S1 + p → S, S + c → S1. Thus the model-oriented approaches
essentially specify a program by writing another, presumably simpler program. Examples of
popular model-oriented specification techniques are Z, CSP, CCS, etc.
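The two-state producer/consumer model above (S1 + p → S, S + c → S1) can be written down directly as a small program, which is exactly the point of the model-oriented style. The class name, state labels, and use of assertions below are assumptions of this sketch.

```python
# Model-oriented sketch of the producer/consumer: two states and two
# operations. S1 = "empty" (producer may act), S = "full" (consumer may act).
class ProducerConsumer:
    def __init__(self):
        self.state = "S1"          # start in the empty state

    def produce(self):             # operation p: S1 -> S
        assert self.state == "S1", "producer must wait for consumer"
        self.state = "S"

    def consume(self):             # operation c: S -> S1
        assert self.state == "S", "consumer must wait for producer"
        self.state = "S1"
```

The assertions enforce the two axioms stated in the property-oriented description: the consumer can only act after the producer, and vice versa.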
Model-oriented approaches are more suited to use in later phases of life cycle because here
even minor changes to a specification may lead to drastic changes to the entire specification.
They do not support logical conjunctions (AND) and disjunctions (OR).
Property-oriented approaches are suitable for requirements specification because they can be
easily changed. They specify a system as a conjunction of axioms and you can easily replace one
axiom with another one.
4.3.2. Branching Semantics
In this approach, the behaviour of a system is represented by a directed graph as shown in
fig. 34.6. The nodes of the graph represent the possible states in the evolution of a system. The
descendants of each node of the graph represent the states which can be generated by any of the
atomic actions enabled at that state. Although this semantic model distinguishes the branching
points in a computation, it still represents concurrency by interleaving.
Fig. 34.7 Partial order semantics (a directed graph over the nodes A, B, C, D, E, and F)
For example, fig. 34.7 shows that we can compare node B with node D, but we cannot
compare node D with node A.

4.4. Merits of Formal Requirements Specification
Formal methods possess several positive features, some of which are discussed below.
Formal specifications encourage rigour. Often, the very process of construction of a
rigorous specification is more important than the formal specification itself. The
construction of a rigorous specification clarifies several aspects of system behaviour
that are not obvious in an informal specification.
Formal methods usually have a well-founded mathematical basis. Thus, formal
specifications are not only more precise, but also mathematically sound and can be
used to reason about the properties of a specification and to rigorously prove that an
implementation satisfies its specifications.
Formal methods have well-defined semantics. Therefore, ambiguity in specifications
is automatically avoided when one formally specifies a system.
The mathematical basis of the formal methods facilitates automating the analysis of
specifications. For example, a tableau-based technique has been used to automatically
check the consistency of specifications. Also, automatic theorem proving techniques
can be used to verify that an implementation satisfies its specifications. The possibility
of automatic verification is one of the most important advantages of formal methods.
Formal specifications can be executed to obtain immediate feedback on the features of
the specified system. This concept of executable specifications is related to rapid
prototyping. Informally, a prototype is a toy working model of a system that can
provide immediate feedback on the behaviour of the specified system, and is
especially useful in checking the completeness of specifications.
It is clear that formal methods provide mathematically sound frameworks using which
systems can be specified, developed and verified in a systematic manner. However, formal
methods suffer from several shortcomings, some of which are the following:
Formal methods are difficult to learn and use.
The basic incompleteness results of first-order logic suggest that it is impossible to
check the absolute correctness of systems using theorem proving techniques.
Formal techniques are not able to handle complex problems. This shortcoming results
from the fact that even moderately complicated problems blow up the complexity of
the formal specification and its analysis. Also, a large unstructured set of mathematical
formulae is difficult to comprehend.
5. Axiomatic Specification
In axiomatic specification of a system, first-order logic is used to write the pre- and post-
conditions that specify the operations of the system in the form of axioms. The pre-conditions
basically capture the conditions that must be satisfied before an operation can successfully be
invoked. In essence, the pre-conditions capture the requirements on the input parameters of a
function. The post-conditions are the conditions that must be satisfied when a function completes
execution and the function is considered to have been executed successfully. Thus, the post-
conditions are essentially the constraints on the results produced for the function execution to be
considered successful.
Establish the changes made to the function's input parameters after execution of the
function. Pure mathematical functions do not change their input, and therefore this type of
assertion is not necessary for pure functions.
Combine all of the above into the pre- and post-conditions of the function.
5.2. Examples
Example 1
Specify the pre- and post-conditions of a function that takes a real number as argument and
returns half the input value if the input is less than or equal to 100, or else returns double the
value.
f (x : real) : real
pre : x ∈ R
post : {(x ≤ 100) ∧ (f(x) = x/2)} ∨ {(x > 100) ∧ (f(x) = 2x)}
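Pre- and post-conditions of this kind translate directly into runtime assertions. The Python sketch below is illustrative; the function name follows the specification, and using `assert` for the checks is an implementation choice.

```python
# Runtime-checked version of Example 1's axiomatic specification (a sketch).
def f(x: float) -> float:
    # pre: x is a real number
    assert isinstance(x, (int, float))
    result = x / 2 if x <= 100 else 2 * x
    # post: (x <= 100 and f(x) = x/2) or (x > 100 and f(x) = 2x)
    assert (x <= 100 and result == x / 2) or (x > 100 and result == 2 * x)
    return result
```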
Example 2
Axiomatically specify a function named search which takes an integer array and an integer
key value as its arguments and returns the index in the array where the key value is present.
search(X : IntArray, key : Integer) : Integer
pre : ∃ i ∈ [Xfirst … Xlast], X[i] = key
post : {(X′[search(X, key)] = key) ∧ (X = X′)}
Here, the convention that has been followed is that if a function changes any of its input
parameters, and if that parameter is named X, then it is referred to as X′ after the function
completes execution.
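Example 2 can be checked the same way. In this sketch the array is a Python list; since the function never mutates its argument, X = X′ holds trivially, and the use of `list.index` is an implementation choice rather than part of the specification.

```python
# Runtime-checked version of Example 2's specification (a sketch).
def search(X, key):
    # pre: there exists i in [first..last] with X[i] = key
    assert key in X, "pre-condition violated: key not in array"
    i = X.index(key)
    # post: X[search(X, key)] = key
    assert X[i] == key
    return i
```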
6. Algebraic Specification
In the algebraic specification technique, an object class or type is specified in terms of
relationships existing between the operations defined on that type. It was first brought into
prominence by Guttag [1980, 1985] in the specification of abstract data types. Various notations
of algebraic specifications have evolved, including those based on OBJ and Larch languages.

6.1. Representation of Algebraic Specification
Essentially, algebraic specifications define a system as a heterogeneous algebra. A
heterogeneous algebra is a collection of different sets on which several operations are defined.
Traditional algebras are homogeneous. A homogeneous algebra consists of a single set and
several operations, for example {I, +, -, *, /}. In contrast, the set of alphabetic strings together
with the operations of concatenation and length, {A, I, con, len}, is not a homogeneous algebra,
since the range of the length operation is the set of integers. To define a heterogeneous algebra,
we first need to specify its signature, the involved operations, and their domains and ranges.
Using algebraic specification, we define the meaning of a set of interface procedures by using
equations. An algebraic specification is usually presented in four sections.
1. Types section
In this section, the sorts (or the data types) being used are specified.
2. Exceptions section
This section gives the names of the exceptional conditions that might occur when different
operations are carried out. These exception conditions are used in the later sections of an
algebraic specification.
3. Syntax section
This section defines the signatures of the interface procedures. The collection of sets that
form the input domain of an operator, together with the sort where the output is produced, is
called the signature of the operator. For example, PUSH takes a stack and an element and
returns a new stack:
push : stack × element → stack
4. Equations section
This section gives a set of rewrite rules (or equations) defining the meaning of the interface
procedures in terms of each other. In general, this section is allowed to contain conditional
expressions.
6.2. Operators
By convention, each equation is implicitly universally quantified over all possible values of
the variables. Names not mentioned in the syntax section, such as r or e, are variables. The first
step in defining an algebraic specification is to identify the set of required operations. After
having identified the required operators, it is helpful to classify them as basic construction
operators, extra construction operators, basic inspection operators, or extra inspection operators.
The definition of these categories of operators is as follows:
1. Basic construction operators: These operators are used to create or modify entities of a
type. The basic construction operators are essential to generate all possible elements of the
type being specified. For example, create and append are basic construction operators.
2. Extra construction operators: These are the construction operators other than the basic
construction operators. For example, the operator remove is an extra construction operator,
because even without using remove it is possible to generate all values of the type being
specified.
3. Basic inspection operators: These operators evaluate attributes of a type without
modifying them, e.g., eval, get, etc. Let S be the set of operators whose range is not the data
type being specified. The set of basic operators S1 is a subset of S, such that each operator
from S-S1 can be expressed in terms of the operators from S1.
4. Extra inspection operators: These are the inspection operators that are not basic
inspectors.
Example
Let us specify a data type point supporting the operations create, xcoord, ycoord, and isequal,
where the operations have their usual meaning.
Types:
  defines point
  uses boolean, integer
Syntax:
  create : integer × integer → point
  xcoord : point → integer
  ycoord : point → integer
  isequal : point × point → boolean
Equations:
  xcoord(create(x, y)) = x
  ycoord(create(x, y)) = y
  isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2))
In this example, we have only one basic constructor (create) and three basic inspectors
(xcoord, ycoord, and isequal). Therefore, we have only three equations.
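A quick way to validate such a specification is to implement the interface procedures and check the equations on sample values. The sketch below models points as Python tuples, which is an assumption of this illustration, not part of the specification.

```python
# A direct model of the 'point' algebraic specification (a sketch).
# Each interface procedure is a function; the equations become checkable.
def create(x, y):            # create : integer x integer -> point
    return (x, y)

def xcoord(p):               # xcoord : point -> integer
    return p[0]

def ycoord(p):               # ycoord : point -> integer
    return p[1]

def isequal(p1, p2):         # isequal : point x point -> boolean
    return p1 == p2

def equations_hold(x1, y1, x2, y2):
    # The three equations from the specification, checked on given values.
    return (xcoord(create(x1, y1)) == x1
            and ycoord(create(x1, y1)) == y1
            and isequal(create(x1, y1), create(x2, y2))
                == ((x1 == x2) and (y1 == y2)))
```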
6.4. Properties of Algebraic Specifications
Three important properties that every good algebraic specification should possess are:
Completeness: This property ensures that using the equations, it should be possible to reduce
any arbitrary sequence of operations on the interface procedures. There is no simple
procedure to ensure that an algebraic specification is complete.
Finite termination property: This property essentially addresses the following question: Do
applications of the rewrite rules to arbitrary expressions involving the interface procedures
always terminate? For arbitrary algebraic equations, convergence (finite termination) is un-
decidable. But, if the right hand side of each rewrite rule has fewer terms than the left, then
the rewrite process must terminate.
Unique termination property: This property indicates whether application of the rewrite rules
in different orders always results in the same answer. Essentially, to determine this property,
the answer to the following question needs to be checked: can all possible sequences of
choices in application of the rewrite rules to an arbitrary expression involving the interface
procedures always give the same answer? Checking the unique termination property is a
very difficult problem.
8. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. Functional requirements address maintainability, portability, and usability issues.
c. The edges of decision tree represent corresponding actions to be performed according
to conditions.
d. The upper rows of the decision table specify the corresponding actions to be taken
when an evaluation test is satisfied.
e. A column in a decision table is called an attribute.
f. Pre-conditions of axiomatic specifications state the requirements on the parameters of
the function before the function can start executing.
g. Post-conditions of axiomatic specifications state the requirements on the parameters
of the function when the function is completed.
k. The structured specification technique that is used to reduce the effort in writing
specification is
Incremental specification
Specification instantiation
Both of the above
None of the above
l. Examples of executable specifications are
Third generation languages
Fourth generation languages
Second generation languages
First generation languages
3. Identify the roles of a system analyst.
4. Identify the important parts of an SRS document. Identify the problems an organization
might face without developing an SRS document.
5. Identify the non-functional requirement issues that are considered for a given problem
description.
6. Discuss the problems that an unstructured specification would create during software
development.
7. Identify the necessity of using formal techniques in the context of requirements
specification.
8. Identify the differences between model-oriented and property-oriented approaches in the
context of requirements specification.
9. Explain the use of operational semantics.
10. Explain the use of algebraic specifications in the context of requirements specification.
Identify the requirements of algebraic specifications to define a system.
11. Identify the essential sections of an algebraic specification to define a system. Explain the
steps for developing the algebraic specification of simple problems.
12. Identify the properties that every good algebraic specification should possess.
13. Identify the basic properties of a structured specification.
14. Discuss the advantages and disadvantages of algebraic specification.
15. Write down the important features of an executable specification language with examples.
Lesson
35
Modelling Timing
Constraints
An event may either be instantaneous or may have certain duration. For example, a
button press event is described by the duration for which the button was kept pressed.
Some authors argue that durational events are really not a basic type of event, but can be
expressed using other events. In fact, it is possible to consider a duration event as a combination
of two events: a start event and an end event. For example, the button press event can be
described by a combination of start button press and end button press events. However, it is
often convenient to retain the notion of a durational event. In this text, we consider durational
events as a special class of events. Using the preliminary notions about events discussed in this
subsection, we classify various types of timing constraints in subsection 1.7.1.
but it can also help us to quickly identify the different timing constraints that can exist
from a casual examination of a problem. That is, in addition to a better understanding of the
behavior of a system, it can also let us work out the specification of a real-time system
accurately.
Different timing constraints associated with a real-time system can broadly be classified into
performance and behavioral constraints.
Performance constraints are the constraints that are imposed on the response of the system.
Behavioral constraints are the constraints that are imposed on the stimuli generated by the
environment.
Behavioral constraints ensure that the environment of a system is well behaved, whereas
performance constraints ensure that the computer system performs satisfactorily.
Each of the performance and behavioral constraints can further be classified into the
following three types:
Delay Constraint
Deadline Constraint
Duration Constraint
These three classes of constraints are explained in the subsequent sections.

1.2.1. Delay Constraints
A delay constraint captures the minimum time (delay) that must elapse between the
occurrence of two arbitrary events e1 and e2. After e1 occurs, if e2 occurs earlier than the
minimum delay, then a delay violation is said to occur. A delay constraint on the event e2 can be
expressed more formally as follows:
t(e2) − t(e1) ≥ d
where t(e2) and t(e1) are the time stamps on the events e2 and e1 respectively, and d is the
minimum delay specified on e2. A delay constraint on the event e2 with respect to the event
e1 is shown pictorially in fig. 35.1, where Δ denotes the actual separation in time between the
occurrence of the two events e1 and e2, and d is the required minimum separation (delay)
between the two events. It is easy to see that e2 must occur only after at least d time units have
elapsed since the occurrence of e1; otherwise we shall have a delay violation.
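Checking a delay constraint from recorded time stamps is a one-line comparison, and the deadline constraint discussed next simply reverses the inequality. The function names below are assumptions of this sketch.

```python
# Checking timing constraints between two time-stamped events (a sketch).
def delay_ok(t_e1: float, t_e2: float, d: float) -> bool:
    """True iff e2 occurs at least d time units after e1 (delay constraint)."""
    return t_e2 - t_e1 >= d

def deadline_ok(t_e1: float, t_e2: float, d: float) -> bool:
    """True iff e2 occurs within d time units of e1 (deadline constraint)."""
    return t_e2 - t_e1 <= d
```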
Fig. 35.1 A delay constraint between events e1 and e2 (the actual separation between
t(e1) and t(e2) must be at least d)
Fig. 35.2 A deadline constraint between events e1 and e2 (the actual separation between
t(e1) and t(e2) must be at most d)
The deadline and delay constraints can e further be classified into two types each based on
whether the constraint is imposed on
t u 1.3.stimulus or on the response event. This has been
the
s
explained with some examples in section
ti y
1.2.3. Duration Constraints
A duration constraint on an event specifies the time period over which the event acts. A
duration constraint can either be of minimum type or maximum type. The minimum type
duration constraint requires that once the event starts, the event must not end before a certain
minimum duration; whereas a maximum type duration constraint requires that once the event
starts, the event must end before a certain maximum duration elapses.
Fig. 35.3 Schematic representation of a telephone system (a call initiator and a call
receiver in the environment, connected through the Public Switched Telephone Network)
1.3. Examples of Different Types of Timing Constraints
We illustrate the different classes of timing constraints by using examples from a telephone
system. A schematic diagram of a telephone system is given in fig. 35.3. Note that I have
intentionally drawn an old-styled telephone, because its operation is easier to understand! Here,
the telephone handset and the Public Switched Telephone Network (PSTN) are considered as
constituting the computer system, and the users as forming the environment. In the following,
we give a few simple examples of operations of the telephone system to illustrate the different
types of timing constraints.
Deadline constraints: In the following, we discuss four different types of deadline constraints
that may be identified in a real-time system, depending on whether the two events involved
in a deadline constraint are of stimulus type or response type.
Stimulus-Stimulus (SS): In this case, the deadline is defined between two stimuli. This is a
behavioral constraint, since the constraint is imposed on the second event which is a stimulus.
An example of an SS type of deadline constraint is the following:
Once a user completes dialling a digit, he must dial the next digit within the next 5 seconds;
otherwise an idle tone is produced.
In this example, the dialings of the two consecutive digits represent the two stimuli to the
telephone system.
Stimulus-Response (SR): In this case, the deadline is defined on the response event, measured
from the occurrence of the corresponding stimulus event. This is a performance constraint,
since the constraint is imposed on a response event. An example of an SR type of deadline
constraint is the following:
Once the receiver of the hand set is lifted, the dial tone must be produced by the
system within 2 seconds, otherwise a beeping sound is produced until the handset is
replaced. In this example, the lifting of the receiver hand set represents a stimulus to the
telephone system and production of the dial tone is the response.
Response-Stimulus (RS): Here the deadline is on the production of a stimulus, counted
from the occurrence of the corresponding response. This is a behavioral constraint, since the
constraint is imposed on the stimulus event. An example of an RS type of deadline constraint is
the following:
Once the dial tone appears, the first digit must be dialed within 30 seconds, otherwise
the system enters an idle state and an idle tone is produced.
Response-Response (RR): An RR type of deadline constraint is defined on two response
events. In this case, once the first response event occurs, the second response event must occur
before a certain deadline. This is a performance constraint, since the timing constraint has been
defined on a response event. An example of an RR type of deadline constraint is the following:
Once the ring tone is given to the callee, the corresponding ring-back tone must be
given to the caller within two seconds; otherwise the call is terminated.
Here the ring-back tone and the corresponding ring tone are the two response events.
Delay constraints: We can identify only one type of delay constraint (SS type) in the
telephone system.
We have already discussed that events can be considered to be of two types: stimulus events
and response events. We had also discussed different types of timing constraints in Section 1.3.
Now we explain how these constraints can be modelled by using EFSMs.
The EFSM model for this constraint is shown in Fig. 35.7. In Fig. 35.7, as soon as dial tone
appears, a timer is set to expire in 30 seconds and the system transits to the Await First Digit
state. If the timer expires before the first digit arrives, then the system transits to an idle state
where an idle tone is produced. Otherwise, if the digit appears first, then the system transits to
the Await Second Digit state.
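The EFSM of fig. 35.7 can be sketched as a small state machine. The state and event names below follow the figure, while the class name and method signatures are assumptions of this illustration; the timer is represented only symbolically, with its expiry delivered as an event.

```python
# Sketch of the EFSM for the RS deadline of fig. 35.7: once the dial tone
# appears, the first digit must arrive before the 30-second timer expires.
class DialEFSM:
    def __init__(self):
        self.state = "Idle"

    def dial_tone(self):
        self.state = "Await First Digit"
        self.timer = 30                    # timer set to expire in 30 s

    def event(self, name):
        if self.state == "Await First Digit":
            if name == "first_digit":
                self.state = "Await Second Digit"
            elif name == "timer_expired":
                self.state = "Idle"        # idle tone is produced
```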
Fig. 35.7 Model of an RS type of deadline (the dial tone sets a 30 s timer; the first digit
leads to the Await Second Digit state, while timer expiry leads to an idle state)
2.2.3. Stimulus-Response (SR)
In Sec. 1.3, we had considered the following example of an SR type of deadline constraint:
Once the receiver of the handset is lifted, the dial tone must be produced by the
system within 2 seconds, otherwise a beeping sound is produced until the handset is
replaced.
The EFSM model for this constraint is shown in fig. 35.8. As soon as the handset is lifted, a
timer is set to expire after 2 seconds, and the system transits to the Await Dial Tone state. If the
dial tone appears first, then the system transits to the Await First Digit state. Otherwise, it
transits to the Await Receiver On-hook state.
Fig. 35.8 Model of an SR type of deadline (lifting the handset sets a 2 s timer in the Await
Dial Tone state; the dial tone leads to Await First Digit, while the timer alarm produces
beeping and leads to Await Receiver On-hook)
u p Await
First
r o Digit
g
Ring-backtstone
e n
u d
st
it y Timer alarm/terminate call
.c
Await
Ring-back
w
Ring-tone/ Tone
w
set timer
w
(2 s)
Await
Receiver
On-hook
As soon as the ring tone is produced, a timer is set to expire in 2 seconds. If the ring-back tone
appears first, the system transits to the Await First Digit state; else it enters the Await Receiver
On-hook state, and the call is terminated.
Fig. 35.10 Model of an SS type of delay constraint (the first digit sets the timer; a next
digit arriving before the timer alarm is a delay violation and causes beeping in the Await
Caller On-hook state, while the timer alarm leads to the Await Next Digit state)
2.2.6. Durational Constraint
In case of a durational constraint, an event is required to occur for a specific duration. The
example of a durational constraint we had considered in Sec. 1.3 is the following: if you press
the button of the handset for less than 15 seconds, it connects to the local operator. If you press
the button for any duration lasting between 15 and 30 seconds, it connects to the international
operator. If you keep the button pressed for more than 30 seconds, then on releasing it the dial
tone is produced.
Fig. 35.11 A Model of a Durational Constraint
s
ent
The EFSM model for this example is shown in Fig. 35.11. Note that we have introduced two
u d
intermediate states Await Event 1 and Await Event 2 to model a durational constraint.
st
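The timer-and-event transitions of such an EFSM map naturally onto a switch-based implementation. Below is a minimal sketch in C that follows the state and event names of Fig. 35.11; the enum encoding and the `efsm_step()` function are our own illustrative assumptions, not part of the original text.

```c
#include <assert.h>

/* States of the durational-constraint EFSM of Fig. 35.11 (encoding assumed). */
typedef enum {
    IDLE,
    AWAIT_EVENT_1,          /* button held, first 15 s timer running   */
    AWAIT_EVENT_2,          /* 15 s elapsed, second 15 s timer running */
    AWAIT_BUTTON_RELEASE,   /* held for more than 30 s                 */
    LOCAL_OPERATOR,
    INTERNATIONAL_OPERATOR,
    DIAL_TONE
} state_t;

typedef enum { BUTTON_PRESS, BUTTON_RELEASE, TIMER_ALARM } event_t;

/* One transition of the EFSM; timers are assumed to be set as a side
   effect of entering AWAIT_EVENT_1 and AWAIT_EVENT_2. */
state_t efsm_step(state_t s, event_t e)
{
    switch (s) {
    case IDLE:
        if (e == BUTTON_PRESS)   return AWAIT_EVENT_1;          /* set 15 s alarm */
        break;
    case AWAIT_EVENT_1:
        if (e == BUTTON_RELEASE) return LOCAL_OPERATOR;         /* held < 15 s    */
        if (e == TIMER_ALARM)    return AWAIT_EVENT_2;          /* set 15 s alarm */
        break;
    case AWAIT_EVENT_2:
        if (e == BUTTON_RELEASE) return INTERNATIONAL_OPERATOR; /* 15 s to 30 s   */
        if (e == TIMER_ALARM)    return AWAIT_BUTTON_RELEASE;   /* held > 30 s    */
        break;
    case AWAIT_BUTTON_RELEASE:
        if (e == BUTTON_RELEASE) return DIAL_TONE;              /* produce dial tone */
        break;
    default:
        break;
    }
    return s;   /* events not shown in the figure are ignored */
}
```

Driving this function from a periodic timer interrupt and a debounced button input would realize the constraint on an embedded processor.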
3. Exercises

1. Mark the following as True or False. Justify your answer.
a. A deadline constraint between two stimuli can be considered to be a behavioural constraint on the environment of the system.
2. Identify and represent the timing constraints in the following air-defense system by means
of an extended state machine diagram. Classify each constraint into either performance or
behavioral constraint.
Every incoming missile must be detected within 0.2 seconds of its entering the radar
coverage area. The intercept missile should be engaged within 5 seconds of detection of
the target missile. The intercept missile should be fired after 0.1 seconds of its engagement, but no later than 1 second.
3. Represent a wash-machine having the following specification by means of an extended
state machine diagram.
The wash-machine waits for the start switch to be pressed. After the user presses the start
switch, the machine fills the wash tub with either hot or cold water depending upon the
setting of the HotWash switch. The water filling continues until the high level is sensed.
The machine starts the agitation motor and continues agitating the wash tub until either the
preset timer expires or the user presses the stop switch. After the agitation stops, the
machine waits for the user to press the startDrying switch. After the user presses the
startDrying switch, the machine starts the hot air blower and continues blowing hot air into
the drying chamber until either the user presses the Stop switch or the preset timer expires.
4. What is the difference between a performance constraint and a behavioral constraint? Give
practical examples of each type of constraint.
5. Represent the timing constraints in a collision avoidance task in an air surveillance system
as an extended finite state machine (EFSM) diagram. The collision avoidance task
consists of the following activities.
The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 msec of receipt of the signal.
The track record is transmitted to the data processor within 1 msec after the track record is determined.
A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 msec.
If a collision is anticipated, then the corrective action is determined within 10 msec by another subtask running on the data processor.
The corrective action is transmitted to the track correction task within 25 msec.
6. Consider the following (partial) specification of a real-time system:

The velocity of a space-craft must be sampled by a computer on-board the space-craft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 msec; in parallel, the expected position of the space-craft is retrieved from the database within 200 msec (denoted by event R). Using these data, the deviation from the normal course of the space-craft must be determined within 100 msec (denoted by event D), and corrective velocity adjustments must be carried out before a new velocity value is sampled (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by the event T).

Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.
7. Construct the EFSM model of a telephone system whose (partial) behavior is described below:

After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialled within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialling of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
8. What are the different types of timing constraints that can occur in a system? Give
examples of each.
Module
7
Software Engineering
Issues
Version 2 EE IIT, Kharagpur 1
Lesson
36
Software Design Part 1
Version 2 EE IIT, Kharagpur 2
Most researchers and engineers agree that a good software design implies clean decomposition of the problem into modules, and the neat arrangement of these modules in a hierarchy. The primary characteristics of neat module decomposition are high cohesion and low coupling.

1.2.1. Cohesion
Cohesion is a measure of the functional strength of a module. A module having high cohesion and low coupling is said to be functionally independent of other modules. By the term functional independence, we mean that a cohesive module performs a single task or function. The different classes of cohesion that a module may possess are depicted in fig. 36.1.
Coincidental (low) → Logical → Temporal → Procedural → Communicational → Sequential → Functional (high)

Fig. 36.1 Classification of Cohesion
Temporal cohesion: A module is said to exhibit temporal cohesion when all the functions of the module are executed within the same time span. The set of functions responsible for initialization, start-up, shutdown of some process, etc. exhibit temporal cohesion.
Procedural cohesion: A module is said to possess procedural cohesion, if the set of functions of
the module are all part of a procedure (algorithm) in which a certain sequence of steps have to be
carried out for achieving an objective, e.g. the algorithm for decoding a message.
Communicational cohesion: A module is said to have communicational cohesion, if all
functions of the module refer to or update the same data structure, e.g. the set of functions
defined on an array or a stack.
Sequential cohesion: A module is said to possess sequential cohesion, if the elements of the module form the parts of a sequence, where the output from one element of the sequence is input to the next.
Functional cohesion: Functional cohesion is said to exist, if the different elements of a module cooperate to achieve a single function. For example, a module containing all the functions required to manage employees' pay-roll displays functional cohesion. Suppose a module displays functional cohesion, and we are asked to describe what the module does; then we would be able to describe it using a single sentence.
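To make these classes concrete, here is a small illustrative C module (our own example, not from the text): every function refers to or updates the same stack data structure, so the module exhibits communicational cohesion, and because all its elements cooperate to provide a single stack abstraction, the module can indeed be described in one sentence.

```c
#include <assert.h>

#define STACK_MAX 100

/* The single data structure shared by every function of the module. */
static int stack_items[STACK_MAX];
static int stack_top = 0;     /* number of items currently stored */

void stack_push(int v)
{
    if (stack_top < STACK_MAX)
        stack_items[stack_top++] = v;
}

int stack_pop(void)
{
    return stack_top > 0 ? stack_items[--stack_top] : -1;
}

int stack_size(void)
{
    return stack_top;
}
```

Taking this module out and reusing it in a different program requires nothing beyond copying the file, which is exactly the reuse benefit of functional independence discussed below.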
1.2.2. Coupling

Coupling between two modules is a measure of the degree of interdependence or interaction between the two modules. A module having high cohesion and low coupling is said to be functionally independent of other modules. If two modules interchange large amounts of data, then they are highly interdependent. The degree of coupling between two modules depends on their interface complexity. The interface complexity is basically determined by the number of types of parameters that are interchanged while invoking the functions of the module. Even if no techniques to precisely and quantitatively estimate the coupling between two modules exist today, a classification of the different types of coupling will help to quantitatively estimate the degree of coupling between two modules. Five types of coupling can occur between any two modules, as shown in fig. 36.2.

Data (low) → Stamp → Control → Common → Content (high)

Fig. 36.2 Classification of Coupling
Data coupling: Two modules are data coupled, if they communicate through an elementary data item, such as an integer or a character, that is passed as a parameter between them.
Stamp coupling: Two modules are stamp coupled, if they communicate using a composite data item such as a record in PASCAL or a structure in C.
Control coupling: Control coupling exists between two modules, if data from one module is used to direct the order of instruction execution in another. An example of control coupling is a flag set in one module and tested in another module.
Common coupling: Two modules are common coupled, if they share some global data items.
Content coupling: Content coupling exists between two modules, if their code is shared, e.g. a
branch from one module into another module.
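The flag example of control coupling can be made concrete with a short C sketch (all names here are hypothetical): one module sets a flag, and the flag then directs which branch of code executes in the other module.

```c
#include <assert.h>

/* Flag set in one module ... */
static int sort_ascending = 1;

void set_sort_order(int ascending)
{
    sort_ascending = ascending;
}

/* ... and tested in another: the flag directs the order of instruction
   execution, so the two modules are control coupled. */
int compare(int a, int b)
{
    if (sort_ascending)
        return a - b;       /* ascending order  */
    return b - a;           /* descending order */
}
```

Passing the ordering as an explicit parameter of `compare()` would reduce this to data coupling, which is the lowest (most desirable) point on the scale of fig. 36.2.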
simple and minimal. Therefore, a cohesive module can be easily taken out and reused in a different program.
Understandability: Complexity of the design is reduced, because different modules can be understood in isolation, as modules are more or less independent of each other.
s
1.2.4. Function-Oriented Design Approachog
b l
p .
The following are the salient features of a typical function-oriented design approach:
1. A system is viewed as something that performs
o u a set of functions. Starting at this high-
g r
level view of the system, each function is successively refined into more detailed functions.
s membership number to him, and prints a bill
For example, consider a function create-new-library member which essentially creates the
t
towards his membership charge. Thise
n
record for a new member, assigns a unique
function may consist of the following sub-functions:
assign-membership-number d
create-member-record t u
print-bill y s
t
Each of these sub-functionsi may be split into more detailed sub-functions and so on.
.c
2. The system state is centralized and shared among different functions, e.g. data such as
w for reference
w
member-records is available
create-new-member
and updating to several functions such as:
w
delete-member
update-member-record
Unlike function-oriented design methods, in OOD the basic abstractions are not real-world functions such as sort, display, track, etc., but real-world entities such as employee, picture, machine, radar system, etc. For example, in OOD an employee pay-roll software is not developed by designing functions such as update-employee-record, get-employee-address, etc., but by designing objects such as employees, departments, etc.
In OOD, state information is not represented in a centralized shared memory but is distributed among the objects of the system. For example, while developing an employee pay-roll system, the employee data such as the names of the employees, their code numbers, basic salaries, etc. are usually implemented as global data in a traditional programming system; whereas in an object-oriented system these data are distributed among the different employee objects of the system. Objects communicate by passing messages. Therefore, one object may discover the state information of another object by interrogating it. Of course, somewhere or the other the real-world functions must be implemented.

Function-oriented techniques such as SA/SD group functions together if, as a group, they constitute a higher-level function. On the other hand, object-oriented techniques group functions together on the basis of the data they operate on.
Function-Oriented Approach:

BOOL detector_status[MAX_ROOMS];
int detector_locs[MAX_ROOMS];
BOOL alarm_status[MAX_ROOMS];      /* alarm activated when status is set */
int alarm_locs[MAX_ROOMS];         /* room number where alarm is located */
int neighbor_alarm[MAX_ROOMS][10]; /* each detector has at most 10 neighboring locations */
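For contrast, an object-oriented style would bundle the data of each detector together with the functions that operate on it, instead of spreading the state across parallel global arrays. A hedged sketch in C follows; the struct and function names are our own, modelled on the arrays above.

```c
#include <assert.h>

typedef int BOOL;
#define MAX_NEIGHBORS 10

/* One detector "object": the data the function-oriented version keeps
   in parallel global arrays indexed by room number. */
typedef struct {
    BOOL status;                        /* detector activated?           */
    int  location;                      /* room number of the detector   */
    int  neighbor_alarm[MAX_NEIGHBORS]; /* at most 10 neighboring alarms */
} Detector;

/* A "method": operates only on the detector it is handed, so the state
   is distributed among objects rather than held in shared globals. */
void detector_trigger(Detector *d)
{
    d->status = 1;
}
```

Grouping the state this way is exactly the OOD principle described earlier: functions are grouped on the basis of the data they operate on.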
Divide and conquer principle. Each function is decomposed independently.
Graphical representation of the analysis results using Data Flow Diagrams (DFDs).
2.2. Data Flow Diagrams

The DFD (also known as a bubble chart) is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system. A DFD model uses a very limited number of primitive symbols (as shown in fig. 36.3) to represent the functions performed by a system and the data flow among these functions.

Fig. 36.3 Symbols used for designing DFDs: Data Store, Process, External Entity, Data Flow, Output
The main reason why the DFD technique is so popular is probably that a DFD is a very simple formalism: it is simple to understand and use. Starting with a set of high-level functions that a system performs, a DFD model hierarchically represents various sub-functions. In fact, any hierarchical model is simple to understand. The human mind is such that it
can easily understand any hierarchical model of a system because in a hierarchical model,
starting with a very simple and abstract model of a system, different details of the system are
slowly introduced through different hierarchies. The data flow diagramming technique also
follows a very simple set of intuitive concepts and rules. DFD is an elegant modeling technique
that turns out to be useful not only to represent the results of structured analysis of a software
problem but also for several other applications such as showing the flow of documents or items
in an organization.
The data dictionary provides the analyst with a means to determine the definition of different data structures in terms of their component elements.
2.3. DFD: Levels and Model

The DFD model of a system typically consists of several DFDs, viz., the level 0 DFD, level 1 DFD, level 2 DFDs, etc. A single data dictionary should capture all the data appearing in all the DFDs constituting the DFD model of the system.
2.3.1. Balancing DFDs

The data that flow into or out of a bubble must match the data flow at the next level of the DFD. This is known as balancing a DFD. The concept of balancing a DFD is illustrated in fig. 36.4. In level 1 of the DFD, the data items d1 and d3 flow out of the bubble 0.1 and the data item d2 flows into the bubble 0.1 (labelled P1). In the next level, bubble 0.1 is decomposed. The decomposition is balanced, as d1 and d3 flow out of the level 2 diagram and d2 flows in.
Fig. 36.4 An example showing balanced decomposition: (a) Level 1 DFD, (b) Level 2 DFD
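The balancing rule can also be stated operationally: the set of data items crossing the parent bubble's boundary must reappear at the boundary of the child diagram. A small illustrative check in C follows; representing flows as arrays of names is our own encoding, not part of the DFD notation.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if every flow at the parent bubble's boundary also appears
   at the boundary of the child diagram (and the counts agree), which is
   the balancing condition illustrated in fig. 36.4. */
int flows_balanced(const char *parent[], int np, const char *child[], int nc)
{
    if (np != nc)
        return 0;                       /* different number of boundary flows */
    for (int i = 0; i < np; i++) {
        int found = 0;
        for (int j = 0; j < nc; j++) {
            if (strcmp(parent[i], child[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;                   /* a parent flow is missing below */
    }
    return 1;
}
```

For the decomposition of fig. 36.4, the boundary flows d1, d2 and d3 of bubble 0.1 reappear at the level 2 boundary, so the check succeeds.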
2.3.2. Context Diagram
The context diagram is the most abstract data flow representation of a system. It represents
the entire system as a single bubble. This bubble is labeled according to the main function of the
system. The various external entities with which the system interacts and the data flow occurring
between the system and the external entities are also represented. The data input to the system
and the data output from the system are represented as incoming and outgoing arrows. These
data flow arrows should be annotated with the corresponding data names. The name context
diagram is well justified because it represents the context in which the system is to exist, i.e. the
external entities who would interact with the system and the specific data items they would be supplying to the system and the data items they would be receiving from the system. The context
diagram is also called the level 0 DFD.
To develop the context diagram of the system, we have to analyse the SRS document to
identify the different types of users who would be using the system and the kinds of data they
would be inputting to the system and the data they would be receiving from the system. Here, the
term users of the system also includes the external systems which supply data to or receive
data from the system.
The bubble in the context diagram is annotated with the name of the software system being
developed (usually a noun). This is in contrast with the bubbles in all other levels which are
annotated with verbs. This is expected since the purpose of the context diagram is to capture the
context of the system rather than its functionality.
Example 1: RMS Calculating Software

A software system called RMS calculating software would read three integral numbers from the user in the range of -1000 to +1000, and then determine the root mean square (rms) of the three input numbers and display it. In this example, the context diagram (fig. 36.5) is simple to draw. The system accepts three integers from the user and returns the result to him.
Fig. 36.5 Context Diagram: the User supplies data-items to the single bubble 0 and receives the rms value.
Example 2: Tic-Tac-Toe Computer Game
Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 × 3 square. A move consists of marking a previously unmarked square. The player who is first to place three consecutive marks along a straight line (i.e. along a row, column, or diagonal) on the square wins. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line before all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game. The context diagram of this problem is shown in fig. 36.6.
Fig. 36.6 Context diagram for the tic-tac-toe software: the Human Player supplies moves and receives the display.
After developing the context diagram, the higher-level DFDs of the problem have to be worked out.

Level 1 DFD: To develop the level 1 DFD, examine the high-level functional requirements. If there are between 3 to 7 high-level functional requirements, then these can be directly represented as bubbles in the level 1 DFD. We can then examine the input data to these functions and the data output by these functions, and represent them appropriately in the diagram.

If a system has more than 7 high-level functional requirements, then some of the related requirements have to be combined and represented in the form of a bubble in the level 1 DFD. Such a bubble can be split in the lower DFD levels. If a system has less than three high-level functional requirements, then some of them need to be split into their sub-functions so that we have roughly about 5 to 7 bubbles on the diagram.
Decomposition: Each bubble in the DFD represents a function performed by the system. The
bubbles are decomposed into sub-functions at the successive levels of the DFD. Decomposition
of a bubble is also known as factoring or exploding a bubble. Each bubble at any level of DFD is
usually decomposed to anything between 3 to 7 bubbles. Too few bubbles at any level make that
level superfluous. For example, if a bubble is decomposed to just one bubble or two bubbles,
then this decomposition becomes redundant. Also, too many bubbles, i.e. more than 7 bubbles at
any level of a DFD makes the DFD model hard to understand. Decomposition of a bubble should
be carried on until a level is reached at which the function of the bubble can be described using a
simple algorithm.
Numbering the Bubbles: It is necessary to number the different bubbles occurring in the
DFD. These numbers help in uniquely identifying any bubble in the DFD from its bubble
number. The bubble at the context level is usually assigned the number 0 to indicate that it is the
0 level DFD. Bubbles at level 1 are numbered 0.1, 0.2, 0.3, etc. When a bubble numbered x is decomposed, its children bubbles are numbered x.1, x.2, x.3, etc. In this numbering scheme, by looking at the number of a bubble, we can unambiguously determine its level, its ancestors, and its successors.
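The claim that the level and ancestors of a bubble can be read directly off its number is easy to demonstrate in code. The sketch below, in C, treats bubble numbers as strings such as "0.3.2"; this string encoding is our own assumption.

```c
#include <assert.h>
#include <string.h>

/* The level of a bubble equals the number of dots in its number:
   "0" is the context (level 0) bubble, "0.3" is level 1, "0.3.2" level 2. */
int bubble_level(const char *num)
{
    int level = 0;
    for (; *num != '\0'; num++)
        if (*num == '.')
            level++;
    return level;
}

/* The parent of "0.3.2" is "0.3"; the context bubble "0" has no parent. */
void bubble_parent(const char *num, char *out)
{
    const char *dot = strrchr(num, '.');   /* last '.' separates the child index */
    if (dot == NULL) {
        out[0] = '\0';                     /* context bubble: no ancestor */
        return;
    }
    size_t n = (size_t)(dot - num);
    memcpy(out, num, n);
    out[n] = '\0';
}
```

Repeatedly applying `bubble_parent()` enumerates all ancestors of a bubble up to the context bubble 0.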
A supermarket needs to develop the following software to encourage regular customers. For this, the customer needs to supply his/her residence address, telephone number and the driving license number. Each customer who registers for this scheme is assigned a unique customer number (CN) by the computer. A customer can present his CN to the check-out staff when he makes any purchase. In this case, the value of his purchase is credited against his CN. At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchase over the year. Also, it intends to award a 22 carat gold coin to every customer whose purchase exceeds Rs. 10,000. The entries against the CN are reset on the last day of every year after the prize winners' lists are generated.
Fig. 36.7 Context diagram for the supermarket prize scheme software: the Sales-clerk supplies sales details, the Manager issues the gen-winner command and receives the winner-list, and the Customer supplies customer-details and receives the CN.
The context diagram for this problem is shown in fig. 36.7, the level 1 DFD in fig. 36.8, and the
level 2 DFD in fig. 36.9.
Fig. 36.8 Level 1 diagram for the supermarket problem: bubbles register-customer (0.1), generate-winner-list (0.2) and register-sales (0.3), with data stores customer-data and sales-info.
Fig. 36.9 Level 2 diagram for the supermarket problem: bubble 0.2 is decomposed into gen-surprise-gift-winner (0.2.1), gen-gold-coin-gift-winner (0.2.2) and find-total-sales (0.2.3).
Although DFDs are simple to understand and draw, students and practitioners alike encounter similar types of problems while modelling software problems using DFDs. While learning from experience is a powerful thing, it is an expensive pedagogical technique in the business world. It is therefore helpful to understand the different types of mistakes that users usually make while constructing the DFD model of systems.

Many beginners commit the mistake of drawing more than one bubble in the context diagram. A context diagram should depict the system as a single bubble.

Many beginners have external entities appearing at all levels of DFDs. All external entities interacting with the system should be represented only in the context diagram. The external entities should not appear at other levels of the DFD.

It is a common oversight to have either too few or too many bubbles in a DFD. Only 3 to 7 bubbles per diagram should be allowed, i.e. each bubble should be decomposed into between 3 and 7 bubbles.
Fig. 36.10 To show control information (such as an error-message flow from a search-book bubble) on a DFD is a mistake.
The method of carrying out decomposition to arrive at the successive levels and the
ultimate level to which decomposition is carried out are highly subjective and depend
on the choice and judgment of the analyst. Due to this reason, even for the same problem, several alternative DFD representations are possible. Further, many times it is not possible to say which DFD representation is superior or preferable to another.
The data flow diagramming technique does not provide any specific guidance as to
how exactly to decompose a given function into its sub-functions and we have to use
subjective judgment to carry out decomposition.
Structure chart for the RMS software: the root module main invokes get-good-data, compute-rms and write-result; get-good-data in turn invokes read-input and validate-input, with the data items data-items, valid-data and rms passed between the modules.
transaction a module. Every transaction carries a tag, which identifies its type. Transaction analysis uses this tag to divide the system into transaction modules and a transaction-centre module.

The structure chart for the supermarket prize scheme software is shown in fig. 36.13.
Fig. 36.13 Structure chart for the supermarket prize scheme software
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. Coupling between two modules is nothing but a measure of the degree of dependence
between them.
b. The primary characteristic of a good design is low cohesion and high coupling.
c. A module having high cohesion and low coupling is said to be functionally
independent of other modules.
d. The degree of coupling between two modules does not depend on their interface
complexity.
e. In the function-oriented design approach, the system state is decentralized and not
shared among different functions.
f. The essence of any good function-oriented design technique is to map the functions
performing similar activities into a module.
g. In object-oriented design, the basic abstraction is real-world functions.
h. An OOD (Object-Oriented Design) can be implemented using object-oriented languages only.
i. A DFD model of a system represents the functions performed by the system and the data flow taking place among these functions.
j. A data dictionary lists all data items appearing in the DFD model of a system but does not capture the composition relationship among the data.
k. The context diagram of a system represents it using more than one bubble.
l. A DFD captures the order in which the processes (bubbles) operate.
m. There should be at most one control relationship between any two modules in a properly designed structure chart.
2. For the following, mark all options which are true.
a. The desirable characteristics that every good software design needs are
Correctness
Understandability
Efficiency
Maintainability
All of the above
b. A module is said to have logical cohesion, if
it performs a set of tasks that relate to each other very loosely.
all the functions of the module are executed within the same time span.
all elements of the module perform similar operations, e.g. error handling, data input, data output, etc.
None of the above.
c. High coupling among modules makes it
difficult to understand and maintain the product
difficult to implement and debug
expensive to develop the product as the modules having high coupling cannot be
developed independently
all of the above
d. The desirable characteristics that every good software design needs are
error isolation
scope of reuse
understandability
all of the above
e. The purpose of structured analysis is
to capture the detailed structure of the system as perceived by the user
to define the structure of the solution that is suitable for implementation in some
programming language
all of the above
f. Structured analysis technique is based on
top-down decomposition approach
bottom-up approach
divide and conquer principle
none of the above
g. Data Flow Diagram (DFD) is also known as a:
structure chart
bubble chart
Gantt chart
PERT chart
h. The context diagram of a DFD is also known as
level 0 DFD
level 1 DFD
level 2 DFD
none of the above
i. Decomposition of a bubble is also known as
classification
factoring
exploding
aggregation
j. Decomposition of a bubble should be carried on
till the atomic program instructions are reached
up to two levels
until a level is reached at which the function of the bubble can be described using a simple algorithm
none of the above
k. The bubbles in a level-1 DFD represent
exactly one high-level functional requirement described in SRS document
more than one high-level functional requirement
part of a high-level functional requirement
any of the above depending on the problem
l. By looking at the structure chart, we can
say whether a module calls another module just once or many times
not say whether a module calls another module just once or many times
tell the order in which the different modules are invoked
not tell the order in which the different modules are invoked
m. In which of the following ways does a structure chart differ from a flow chart?
it is always difficult to identify the different modules of the software from its flow
chart representation
Lesson
37
Software Design Part 2
the system. Each object essentially consists of some data that are private to the object and a set of functions that operate on those data, as shown in fig. 37.1. In fact, the functions of an object have the sole authority to operate on the private data of that object. Therefore, an object cannot directly access the data internal to another object. However, an object can indirectly access the internal data of other objects by invoking the operations (i.e. methods) supported by those objects.

Fig. 37.1 A model of an object: private data surrounded by the methods m1 to m8 that operate on it.
The data internal to an object are called the attributes of the object, and the functions supported by an object are called its methods. Fig. 37.2 shows the LibraryMember class with eight attributes and five methods.

Fig. 37.2 The LibraryMember class, with attributes such as E-Mail, Membership Admission Date, Membership Expiry Date and Books Issued, and the methods issueBook(), findPendingBooks(), findOverdueBooks(), returnBook() and findMembershipDetails().
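The rule that the methods have the sole authority to operate on an object's private data can be sketched in C. This is a hedged illustration: only three of the five methods of Fig. 37.2 are shown, and the attribute layout is an assumption.

```c
#include <assert.h>

/* A cut-down LibraryMember "object": the Books Issued attribute is only
   ever touched through the methods below, never directly by other code. */
typedef struct {
    int books_issued;           /* Books Issued attribute of Fig. 37.2 */
} LibraryMember;

/* issueBook(): the only operation that increases the count. */
void issueBook(LibraryMember *m)
{
    m->books_issued++;
}

/* returnBook(): the only operation that decreases the count. */
void returnBook(LibraryMember *m)
{
    if (m->books_issued > 0)
        m->books_issued--;
}

/* findPendingBooks(): read access also goes through a method. */
int findPendingBooks(const LibraryMember *m)
{
    return m->books_issued;
}
```

Another object wishing to know how many books a member holds must interrogate it through `findPendingBooks()`, which is the message-passing style of access described earlier.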
1.3. Inheritance

The inheritance feature allows us to define a new class by extending or modifying an existing class. The original class is called the base class (or super class) and the new class obtained through inheritance is called the derived class (or sub class). A base class is a generalization of
its derived classes. This means that the base class contains only those properties that are common
to all the derived classes. Again each derived class is a specialization of its base class because it
modifies or extends the basic properties of the base class in certain ways. Thus, the inheritance
relationship can be viewed as a generalization-specialization relationship.
Using the inheritance relationship, different classes can be arranged in a class hierarchy (or
class tree). In addition to inheriting all properties of the base class, a derived class can define
new properties. That is, it can define new data and methods. It can even give new definitions to
methods which already exist in the base class. Redefinition of methods which existed in the base class is called method overriding. In fig. 37.3, LibraryMember is the base class for the
derived classes Faculty, Student, and Staff. Similarly, Student is the base class for the derived
classes Undergraduate, Postgraduate, and Research. Each derived class inherits all the data and
methods of the base class. It can also define additional data and methods or modify some of the
inherited data and methods. The different classes in a library automation system and the
inheritance relationship among them are shown in the fig. 37.3. The inheritance relationship has
been represented in fig. 37.3 using a directed arrow drawn from a derived class to its base class.
In fig. 37.3, the LibraryMember base class might define the data for name, address, and library
membership number for each member. Though the Faculty, Student, and Staff classes inherit these
data, they might have to redefine the respective issue-book methods, because the number of
books that can be borrowed and the duration of loan may be different for the different categories
of library members. Thus, the issue-book method is overridden by each of the derived classes,
and the derived classes might define additional data max-number-books and max-duration-of-issue,
which may vary for the different member categories.
[Fig. 37.3 Library Information System example: LibraryMember is the base class; Faculty,
Student, and Staff are its derived classes; Under Graduate, Post Graduate, and Research are
in turn derived from Student.]
An important advantage of inheritance is code reuse. Instead of defining the common methods and data in each of the derived
classes separately, these methods and data are defined only once in the base class and are
inherited by each of its subclasses. For example, in the Library Information System example of
fig. 37.3, each category of member objects Faculty, Student, and Staff need the data member-
name, member-address, and membership-number and therefore these data are defined in the base
class LibraryMember and inherited by its subclasses.
Another advantage of the inheritance mechanism is the conceptual simplification that comes
from reducing the number of independent features of the classes.
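The overriding of the issue-book method described above can be sketched in Python. The borrowing limits below are illustrative assumptions (the text says only that they differ per member category):

```python
class LibraryMember:
    max_number_books = 2                     # default limit (assumed value)

    def __init__(self, name):
        self.name = name
        self.books_issued = []

    def issue_book(self, title):
        if len(self.books_issued) >= self.max_number_books:
            raise ValueError("borrowing limit reached")
        self.books_issued.append(title)

class Student(LibraryMember):
    # The derived class redefines inherited data: max-number-books
    max_number_books = 5                     # assumed value

class Faculty(LibraryMember):
    # Method overriding: issue_book is given a new definition
    def issue_book(self, title):
        if len(self.books_issued) >= 10:     # faculty limit (assumed value)
            raise ValueError("faculty borrowing limit reached")
        self.books_issued.append(title)

student = Student("Rina")
for n in range(5):
    student.issue_book(f"Book {n}")
print(len(student.books_issued))             # 5
```

Student specializes the base class by redefining data only, while Faculty overrides the method itself; both inherit everything else from LibraryMember unchanged.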
In some situations, a class may need to inherit features from more than one base class. This is
called multiple inheritance and can be represented as in fig. 37.4. Multiple inheritance is
represented by arrows drawn from the subclass to each of the base classes. The class Research
inherits features from both the classes Student and Staff.
[Fig. 37.4 Library Information System example with multiple inheritance: LibraryMember is
the base class; Faculty, Student, and Staff are derived classes; Under Graduate, Post Graduate,
and Research are derived from Student, and Research additionally inherits from Staff
(multiple inheritance).]
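In a language that supports multiple inheritance, the Research class of fig. 37.4 can be sketched as below. The features given to Student and Staff here are illustrative assumptions:

```python
class Student:
    def register_courses(self):
        return "courses registered"          # assumed Student feature

class Staff:
    def draw_salary(self):
        return "salary drawn"                # assumed Staff feature

class Research(Student, Staff):
    """Inherits features from both Student and Staff (fig. 37.4)."""
    pass

r = Research()
print(r.register_courses())                  # inherited from Student
print(r.draw_salary())                       # inherited from Staff
```

A Research object can thus respond to messages defined in either base class; how a language resolves features defined in both bases (e.g. Python's method resolution order) is a language-specific detail.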
1.4. Encapsulation
The property of an object by which it interfaces with the outside world only through
messages is referred to as encapsulation. The data of an object are encapsulated within its
methods and are available only through message-based communication. This concept is
schematically represented in fig. 37.5.
[Fig. 37.5 Schematic representation of encapsulation: the data of the object sit at the centre
and are accessible only through the ring of methods m1 to m6 surrounding them.]
1.4.1. Advantages of Encapsulation

Encapsulation offers three important advantages:

It protects an object's internal data from corruption by other objects. This protection
includes protection from unauthorized access and protection from the different types of
problems that arise from concurrent access to data, such as deadlock and inconsistent
values.

Encapsulation hides the internal structure of an object, so that interaction with the
object is simple and standardized. This facilitates reuse of objects across different
projects. Furthermore, if the internal structure or procedures of an object are modified,
other objects are not affected. This results in easy maintenance.

Since objects communicate among each other using messages only, they are weakly
coupled. The fact that objects are inherently weakly coupled enhances understanding
of a design, since each object can be studied and understood almost in isolation from
the other objects.
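A minimal sketch of encapsulation in Python (which enforces privacy only by convention and name mangling; the account example is an illustrative assumption, not from the text):

```python
class Account:
    def __init__(self, opening_balance):
        # Double-underscore name mangling discourages direct external access
        self.__balance = opening_balance

    # The object interfaces with the outside world only through its methods
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def get_balance(self):
        return self.__balance

account = Account(100)
account.deposit(50)
print(account.get_balance())    # 150
# Direct access such as account.__balance raises AttributeError,
# so the data cannot be corrupted by bypassing the methods.
```

Because callers can reach the balance only through deposit() and get_balance(), the invariant checks in those methods cannot be bypassed, which is the protection-from-corruption advantage listed above.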
1.5. Polymorphism
Polymorphism literally means poly (many) morphs (forms). Broadly speaking,
polymorphism denotes the following:
The same message can result in different actions when received by different objects. This
is also referred to as static binding. This occurs when multiple methods with the same
operation name exist.
When we have an inheritance hierarchy, an object of a derived class can be assigned to a
reference of its ancestor class. When such an assignment occurs, a method call through the
ancestor reference would result in the invocation of the appropriate method of the object of the derived
class. The exact method to which a method call would be bound cannot be known at
compile time, and is dynamically decided at the runtime. This is also known as dynamic
binding.
[Fig. 37.6 Circle class with an overloaded create method]

1.5.2. Dynamic Binding
ys can send a generic message to a set of objects which
Using dynamic binding a programmer
may be of different types (i.e.,itbelonging to different classes) and leave the exact way in which
the message would be handled. c to the receiving objects. Suppose we have a class hierarchy of
win a drawing as shown in fig. 37.7. Now, suppose the display method
different geometric objects
w and is overridden in each derived class. If the different types of
is declared in the shape
w class
geometric objects making up a drawing are stored in an array of type shape, then a single call to
the display method for each object would take care to display the appropriate drawing element.
That is, the same draw call to a shape object would take care of drawing the appropriate shape.
This code segment is shown in fig. 37.8.
[Fig. 37.7 Class hierarchy of geometric objects: Shape is the base class, with derived classes
including Ellipse, Square, and Cube.]

[Fig. 37.8 Traditional code and object-oriented code using dynamic binding]
1.5.3. Advantages of Dynamic Binding
The main advantage of dynamic binding is that it leads to elegant programming and
facilitates code reuse and maintenance. With dynamic binding, new derived objects can be added
with minimal changes to existing objects. This advantage of polymorphism can be illustrated by
comparing the code segments of an object-oriented program and a traditional program for
drawing various graphic objects on the screen. It can be assumed that the shape is the base class,
and the classes Circle, Rectangle, and Ellipse are derived from it. Now, shape can be assigned
any objects of type Circle, Rectangle, Ellipse, etc. But, a draw method invocation of the shape
object would invoke the appropriate method. It can be easily seen in fig. 37.8 that, because of
dynamic binding, the object-oriented code is much more concise and intellectually appealing.
Also, suppose in the example program segment, it is later found necessary to handle a new
graphics drawing primitive, say Ellipse, then, the procedural code has to be changed by adding a
new if-then-else clause. However, in case of the object-oriented program, the code need not
change. Only a new class called Ellipse has to be defined.
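The comparison drawn above (the two code boxes of fig. 37.8) can be sketched in Python; the class and method names follow the Shape example, while the strings returned are illustrative assumptions:

```python
# Traditional code: a type tag and an if-then-else chain that must be
# edited every time a new kind of shape is added.
def draw_traditional(kind):
    if kind == "circle":
        return "drawing circle"
    elif kind == "rectangle":
        return "drawing rectangle"
    else:
        raise ValueError("unknown shape")

# Object-oriented code: dynamic binding selects the right method at run
# time; adding a new shape means adding a class, not editing existing code.
class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):
        return "drawing circle"

class Rectangle(Shape):
    def draw(self):
        return "drawing rectangle"

class Ellipse(Shape):      # new drawing primitive: no other code changes
    def draw(self):
        return "drawing ellipse"

drawing = [Circle(), Rectangle(), Ellipse()]
for shape in drawing:
    print(shape.draw())    # the appropriate draw method is bound at run time
```

Note that the loop over the drawing never names a concrete class, which is why introducing Ellipse required no change to it, whereas draw_traditional would need a new elif clause.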
2. Object Modelling using UML

2.1. Model and its uses

A model captures aspects important for some application while omitting (or abstracting) the
rest. A model in the context of software development can be graphical, textual, mathematical, or
program code-based. Models are very useful in documenting design and analysis results. Models
also facilitate the analysis and design procedures themselves. Graphical models are very popular
because they are easy to understand and construct. UML is primarily a graphical modelling tool.
However, it often requires text explanations to accompany the graphical models.

An important reason behind constructing a model is that it helps manage complexity. Once
models of a system have been constructed, they can be used for a variety of purposes during
software development, including the following:

Code reuse by the use of predefined class libraries
Analysis
Specification
Code generation
Design
Visualize and understand the problem and the working of a system
Testing, etc.
In all these applications, the UML models can not only be used to document the results but
also to arrive at the results themselves. Since a model can be used for a variety of purposes, it is
reasonable to expect that the model would vary depending on the purpose for which it is being
constructed. For example, a model developed for initial analysis and specification should be very
different from the one used for design. A model that is being used for analysis and specification
would not show any of the design decisions that would be made later on during the design stage.
On the other hand, a model used for design purposes should capture all the design decisions.
Therefore, it is a good idea to explicitly mention the purpose for which a model has been
developed, along with the model.
Fig. 37.9 shows the UML diagrams responsible for providing the different views.

Users' view: This view defines the functionalities (facilities) made available by the system to
its users. The users' view captures the external users' view of the system in terms of the
functionalities offered by the system. The users' view is a black-box view of the system,
where the internal structure, the dynamic behaviour of different system components, the
implementation, etc. are not visible. The users' view is very different from all the other views in
the sense that it is a functional model, compared to the object model of all the other views. The
users' view can be considered as the central view, and all the other views are expected to conform
to this view. This thinking is in fact the crux of any user-centric development style.
[Fig. 37.9 Different types of diagrams and views supported in UML:
Users' view - Use Case Diagram;
Structural view - Class Diagram, Object Diagram;
Behavioural view - Sequence Diagram, Collaboration Diagram, State-chart Diagram, Activity Diagram;
Implementation view - Component Diagram;
Environmental view - Deployment Diagram]
Structural view: The structural view defines the kinds of objects (classes) important to the
understanding of the working of a system and to its implementation. It also captures the
relationships among the classes (objects). The structural model is also called the static model,
since the structure of a system does not change with time.
Behavioral view: The behavioural view captures how objects interact with each other to
realize the system behaviour. The system behaviour captures the time-dependent (dynamic)
behaviour of the system.
Implementation view: This view captures the important components of the system and their
dependencies.
Environmental view: This view models how the different components are implemented on
different pieces of hardware.
2.3. Use Case Model

Use cases correspond to the high-level functional requirements. The use cases partition the
system behaviour into transactions, so that each transaction performs some useful action from the
user's point of view.
2.3.1. Purpose of Use Cases

The purpose of a use case is to define a piece of coherent behaviour without revealing the
internal structure of the system. The use cases do not mention any specific algorithm to be used
or the internal data representation, internal structure of the software, etc. A use case typically
represents a sequence of interactions between the user and the system. These interactions consist
of one mainline sequence. The mainline sequence represents the normal interaction between a
user and the system, and is the most frequently occurring sequence of interaction. For
example, the mainline sequence of the withdraw-cash use case supported by a bank ATM would be
to insert the card, enter the password, select the withdrawal option and the amount,
complete the transaction, and get the amount. Several variations to the mainline sequence may
also exist. Typically, a variation from the mainline sequence occurs when some specific
conditions hold. For the bank ATM example, variations or alternative scenarios may occur if the
password is invalid or the amount to be withdrawn exceeds the account balance. The variations
are also called alternative paths. A use case can be viewed as a set of related scenarios tied
together by a common goal. The mainline sequence and each of the variations are called
scenarios or instances of the use case. Each scenario is a single path of user events and system
activity through the use case.
An actor is connected to a use case by a line known as the communication relationship. It
indicates that the actor makes use of the functionality provided by the use case. Both the human
users and the external systems can be represented by stick person icons. When a stick person
icon represents an external system, it is annotated by the stereotype <<external system>>.
Example

The use case model for the Tic-Tac-Toe problem is shown in fig. 37.10. This software has
only one use case, "play move". Note that a use case named get-user-move is not used here;
the name get-user-move would be inappropriate, because use cases should be named from
the users' perspective.

[Fig. 37.10 Use case model for the tic-tac-toe game: the actor Player communicates with the
single use case "play move" inside the system boundary Tic-tac-toe game.]
Text Description

Each ellipse on the use case diagram should be accompanied by a text description. The text
description should define the details of the interaction between the user and the computer and
other aspects of the use case. It should include all the behaviour associated with the use case
in terms of the mainline sequence, the different variations to the normal behaviour, the system
responses associated with the use case, the exceptional conditions that may occur in the
behaviour, etc. The behaviour description is often written in a conversational style, describing
the interactions between the actor and the system. The text description may be informal, but
some structuring is recommended. The following are some of the information which may be
included in a use case text description in addition to the mainline sequence and the
alternative scenarios.
Contact persons: This section lists personnel of the client organization with whom the use
case was discussed, date and time of the meeting, etc.
Actors: In addition to identifying the actors, some information about actors using this use
case which may help the implementation of the use case may be recorded.
Pre-condition: The preconditions would describe the state of the system before the use case
execution starts.
Post-condition: This captures the state of the system after the use case has successfully
completed.
Non-functional requirements: This could contain the important constraints for the design
and implementation, such as platform and environment conditions, qualitative statements,
response time requirements, etc.
Exceptions, error situations: This contains only the domain-related errors, such as lack of
the user's access rights, invalid entry in the input fields, etc. Obviously, errors that are not
domain-related, such as software errors, need not be discussed here.
Sample dialogs: These serve as examples illustrating the use case.
Specific user interface requirements: These contain specific requirements for the user
interface of the use case. For example, it may contain forms to be used, screen shots,
interaction style, etc.
Document references: This part contains references to specific domain-related documents
which may be useful to understand the system operation.
2.3.4. Factoring of Commonality Among Use Cases

It is often desirable to factor use cases into component use cases. Actually, factoring of use
cases is required under two situations. First, complex use cases need to be factored into simpler
use cases. This would not only make the behaviour associated with the use case much more
comprehensible, but would also make the corresponding interaction diagrams more tractable. Without
decomposition, the interaction diagrams for complex use cases may become too large to be
accommodated on a single standard-sized (A4) sheet of paper. Secondly, use cases need to be factored
whenever there is common behaviour across different use cases. Factoring would make it possible to define
such behaviour only once and reuse it whenever required. It is desirable to factor out common
usage, such as error handling, from a set of use cases. This makes analysis of the class design
much simpler and more elegant. However, a word of caution here: factoring of use cases should not be
done except for achieving the above two objectives. From the design point of view, it is not
advantageous to break up a use case into many smaller parts just for the sake of it.

UML offers three mechanisms for factoring of use cases, as follows:
Generalization

Use case generalization can be used when one use case is similar to another, but does
Use case generalization
something slightly differently or something more. Generalization works the same way with
use cases as it does with classes. The child use case inherits the behaviour and meaning of the
parent use case. The notation is the same too (as shown in fig. 37.11). It is important to
remember that the base and the derived use cases are separate use cases and should have
separate text descriptions.
[Fig. 37.11 Representation of use case generalization]
[Fig. 37.12 Representation of use case inclusion: the base use case is connected to the
common use case by a dashed arrow labelled <<include>>.]

Includes
The includes relationship in the older versions of UML (prior to UML 1.1) was known as the
uses relationship. The includes relationship involves one use case including the behaviour of
another use case in its sequence of events and actions. The includes relationship occurs when
a chunk of behaviour is similar across a number of use cases. The factoring of such
behaviour will help in not repeating the specification and implementation across different use
cases. Thus, the includes relationship explores the issue of reuse by factoring out the
commonality across use cases. It can also be gainfully employed to decompose a large and
complex use case into more manageable parts. As shown in fig. 37.12, the includes
relationship is represented using the predefined stereotype <<include>>. In the includes
relationship, a base use case compulsorily and automatically includes the behaviour of the
common use cases. As shown in the example of fig. 37.13, issue-book and renew-book both include
the check-reservation use case. The base use case may include several use cases. In such cases, it
may interleave their associated common use cases together. The common use case becomes a
separate use case, and an independent text description should be provided for it.
[Fig. 37.14 Representation of use case extension]

Organization of Use Cases
When the use cases are factored, they are organized hierarchically. The high-level use cases
are refined into a set of smaller and more refined use cases as shown in fig. 37.15. Top-level
use cases are super-ordinate to the refined use cases. The refined use cases are sub-ordinate
to the top-level use cases. Note that only the complex use cases should be decomposed and
organized in a hierarchy. It is not necessary to decompose simple use cases. The functionality
of the super-ordinate use cases is traceable to their sub-ordinate use cases. Thus, the
functionality provided by the super-ordinate use cases is a composite of the functionality of the
sub-ordinate use cases. In the highest level of the use case model, only the fundamental use
cases are shown. The focus is on the application context. Therefore, this level is also referred
to as the context diagram. In the context diagram, the system limits are emphasized. The top-
level diagram contains only those use cases with which the external users of the system
interact. The subsystem-level use cases specify the services offered by the subsystems to the
other subsystems. Any number of levels involving the subsystems may be utilized. In the
lowest level of the use case hierarchy, the class-level use cases specify the functional
fragments or operations offered by the classes.
[Fig. 37.15 Hierarchical organization of use cases: top-level use cases (use case 1, use case 2,
use case 3) are refined into subsystem-level use cases (e.g. use case 3.2), which are in turn
refined into class-level use cases (methods).]

2.4. Class Diagrams
A class diagram describes the static structure of a system. It shows how a system is
structured, rather than how it behaves. The static structure of a system comprises a number of
class diagrams and their dependencies. The main constituents of a class diagram are classes and
their relationships: generalization, aggregation, association, and various kinds of dependencies.
The classes represent entities with common features, i.e. attributes and operations. Classes
are represented as solid outline rectangles with compartments. Classes have a mandatory name
compartment where the name is written centered in boldface. The class name is usually written
using the mixed-case convention and begins with an uppercase letter. The class names are usually chosen
to be singular nouns. An example of a class is shown in fig. 37.1.2. Classes have optional
attributes and operations compartments. A class may appear on several diagrams. Its attributes
and operations are suppressed on all but one diagram.
2.4.1. Association
Associations are needed to enable objects to communicate with each other. An association
describes a connection between classes. The association relation between two objects is called
object connection or link. Links are instances of associations. A link is a physical or conceptual
connection between object instances. For example, suppose Amit has borrowed the book Graph
Theory. Here, borrowed is the connection between the objects Amit and Graph Theory book.
Mathematically, a link can be considered to be a tuple, i.e. an ordered list of object instances. An
association describes a group of links with a common structure and common semantics. For
example, consider the statement that Library Member borrows Books. Here, borrows is the
association between the class LibraryMember and the class Book. Usually, an association is a
binary relation (between two classes). However, three or more different classes can be involved
in an association. A class can have an association relationship with itself (called recursive
association). In this case, it is usually assumed that two different objects of the class are linked
by the association relationship.
Association between two classes is represented by drawing a straight line between the
concerned classes. Fig. 37.16 illustrates the graphical representation of the association relation.
The name of the association is written alongside the association line. An arrowhead may be
placed on the association line to indicate the reading direction of the association. The arrowhead
should not be misunderstood to be indicating the direction of a pointer implementing an
association. On each side of the association relation, the multiplicity is noted as an individual
number or as a value range. The multiplicity indicates how many instances of one class are
associated with each other. Value ranges of multiplicity are noted by specifying the minimum
and maximum values, separated by two dots, e.g. 1..5. An asterisk is a wild card and means many
(zero or more). The association of fig. 37.16 should be read as "Many books may be borrowed
by a Library Member". Observe that associations (and links) appear as verbs in the problem
statement.
Library Member 1 ---- borrowed by ---- * Book

Fig. 37.16 Association between two classes

Associations are usually realized by assigning appropriate reference attributes to the classes
involved. Thus, associations can be implemented using pointers from one object class to another.
Links and associations can also be implemented by using a separate class that stores which
objects of a class are linked to which objects of another class. Some CASE tools use the role
names of the association relation for the corresponding automatically generated attribute.
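The reference-attribute implementation of an association mentioned above can be sketched in Python for the 1-to-many "borrowed by" relation of fig. 37.16. The attribute names and the borrow method are illustrative assumptions:

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.borrowed_by = None        # reference attribute: at most one member

class LibraryMember:
    def __init__(self, name):
        self.name = name
        self.borrowed_books = []       # reference attribute: many books

    def borrow(self, book):
        # Establishing a link means setting both ends of the association
        book.borrowed_by = self
        self.borrowed_books.append(book)

amit = LibraryMember("Amit")
graph_theory = Book("Graph Theory")
amit.borrow(graph_theory)
print(graph_theory.borrowed_by.name)   # Amit
```

Each (member, book) pair linked this way is one link, i.e. one instance of the borrows association; the lists and references play the role of the pointers described in the text.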
2.4.2. Aggregation
Aggregation is a special type of association where the involved classes represent a whole-part
relationship. The aggregate takes the responsibility of forwarding messages to the appropriate
parts. Thus, the aggregate takes the responsibility of delegation and leadership. When an instance
of one object contains instances of some other objects, then aggregation (or composition)
relationship exists between the composite object and the component object. Aggregation is
represented by the diamond symbol at the composite end of a relationship. The number of
instances of the component class aggregated can also be shown as in fig. 37.17 (a).
Document 1 ---- * Paragraph 1 ---- * Line

Fig. 37.17 (a) Representation of aggregation
The aggregation relationship cannot be reflexive (i.e. recursive). That is, an object cannot
contain objects of the same class as itself. Also, the aggregation relation is not symmetric. That
is, two classes A and B cannot contain instances of each other. However, the aggregation
relationship can be transitive. In this case, aggregation may consist of an arbitrary number of
levels.
2.4.3. Composition

Composition is a stricter form of aggregation, in which the parts are existence-dependent on
the whole. This means that the life of the parts is closely tied to the life of the whole. When the
whole is created, the parts are created, and when the whole is destroyed, the parts are destroyed.
A typical example of composition is an invoice object with invoice items. As soon as the invoice
object is created, all the invoice items in it are created, and as soon as the invoice object is
destroyed, all invoice items in it are also destroyed. The composition relationship is represented
as a filled diamond drawn at the composite end. An example of the composition relationship is
shown in fig. 37.17 (b).
Order 1 ---- * Item

Fig. 37.17 (b) Representation of composition
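The existence dependency that distinguishes composition can be sketched in Python using the Order/Item example of fig. 37.17 (b); the constructor arguments and the total method are illustrative assumptions:

```python
class Item:
    def __init__(self, description, price):
        self.description = description
        self.price = price

class Order:
    """Composite: it creates and exclusively owns its Item parts.

    The parts come into existence together with the whole and are not
    shared with other objects; when the Order is discarded, its Items
    become unreachable and go with it.
    """
    def __init__(self, lines):
        # Composition: the whole constructs its parts itself, rather than
        # being handed independently created objects (as aggregation allows)
        self.items = [Item(desc, price) for desc, price in lines]

    def total(self):
        return sum(item.price for item in self.items)

order = Order([("pen", 10), ("notebook", 40)])
print(order.total())       # 50
```

Constructing the Items inside Order's own constructor, instead of accepting pre-built Item objects, is what ties the parts' lifetime to the whole and makes this composition rather than plain aggregation.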
The objects participating in an interaction are shown in a sequence diagram at the top of the
chart as boxes attached to a vertical dashed line. Inside the box, the name of the object is written
with a colon separating it from the name of the class, and both the name of the object and class
are underlined. The objects appearing at the top signify that the object already existed when the
use case execution was initiated. However, if some object is created during the execution of the
use case and participates in the interaction (e.g. a method call), then the object should be shown
at the appropriate place on the diagram where it is created. The vertical dashed line is called the
object's lifeline. The lifeline indicates the existence of the object at any particular point of time.
The rectangle drawn on the lifeline is called the activation symbol and indicates that the object
is active as long as the rectangle exists. Each message is indicated as an arrow between the
lifelines of two objects. The messages are shown in chronological order from the top to the
bottom. That is, reading the diagram from the top to the bottom would show the sequence in
which the messages occur. Each message is labeled with the message name. Some control
information can also be included. Two types of control information are particularly valuable.
A condition (e.g. [invalid]) indicates that a message is sent only if the condition is true.

An iteration marker shows that the message is sent many times to multiple receiver objects, as
would happen when a collection or the elements of an array are being iterated. The basis
of the iteration can also be indicated, e.g. [for every book object].
[Fig. 37.18 Sequence diagram for the renew book use case. The participating objects are the
Library Boundary, the Library Book Renewal controller, the Library Book Register, Book, and
Library Member. The messages exchanged include renewBook, findMemberBorrowing,
displayBorrowing, selectBooks, bookSelected, *find, [reserved] apology, update, confirm, and
updateMemberBorrowing.]
The sequence diagram for the book renewal use case for the Library Automation Software is
shown in fig. 37.18. The development of the sequence diagram in the development methodology
would help us in determining the responsibilities of the different classes; i.e. what methods
should be supported by each class.
In a collaboration diagram, the messages are numbered, since numbering is the only
way to describe the relative sequencing of the messages in this diagram. The collaboration
diagram for the example of fig. 37.18 is shown in fig. 37.19. The use of the collaboration
diagram in the development process helps in determining which classes are associated with
which other classes.
[Fig. 37.19 Collaboration diagram for the renew book use case. The numbered messages among
the Library Boundary, Library Book Renewal controller, Library Book Register, Book, and
Library Member objects are: 1: renewBook; 2: findMemberBorrowing; 3: displayBorrowing;
4: selectBooks; 5: bookSelected; 6: *find; 7: apology [reserved]; 8: apology [reserved];
9: update; 10: confirm; 12: confirm.]
An activity is a state with an internal action and one or more outgoing transitions which automatically follow
the termination of the internal activity. If an activity has more than one outgoing transition, then
these must be identified through conditions. An interesting feature of the activity diagrams is the
swim lanes. Swim lanes enable you to group activities based on who is performing them, e.g.
academic department vs. hostel office. Thus swim lanes subdivide activities based on the
responsibilities of some components. The activities in a swim lane can be assigned to some
model elements, e.g. classes or some component, etc.
Activity diagrams are normally employed in business process modelling. This is carried out
during the initial stages of requirements analysis and specification. Activity diagrams can be very
useful to understand complex processing activities involving many components. Later these
diagrams can be used to develop interaction diagrams which help to allocate activities
(responsibilities) to classes.
[Fig. 37.20 Activity diagram for the student admission procedure at IIT: receive fees; then, in
parallel, register in courses, conduct medical examination, and allot room; after all of these
complete, issue identity card.]
The student admission process in IIT is shown as an activity diagram in fig. 37.20. This
shows the part played by different components of the Institute in the admission procedure. After
the fees are received at the account section, parallel activities start at the hostel office, hospital,
and the Department. After all these activities are completed (this synchronization is represented
as a horizontal line), the identity card can be issued to a student by the Academic section.
This problem is overcome in UML by using state charts. The state chart formalism was proposed by
David Harel [1990]. A state chart is a hierarchical model of a system and introduces the concept
of a composite state (also called a nested state).
Actions are associated with transitions and are considered to be processes that occur quickly
and are not interruptible. Activities are associated with states and can take a longer time. An
activity can be interrupted by an event.
[State chart diagram for an order object: a received order enters the Unprocessed order state;
it is then checked, with [reject] leading to Rejected order and [accept] leading to Accepted
order; when an accepted order is processed, [some items not available] leads to Pending order,
while [all items available] processed/deliver leads to Fulfilled order; on a new supply, a
Pending order moves to Fulfilled order once all items are available.]
Object-oriented design (OOD) advocates a very different design approach. The OOD paradigm
suggests that the natural objects (i.e. the entities) occurring in a problem should be identified
first and then implemented. Object-oriented design techniques not only identify objects, but also
identify the internal details of these identified objects. Also, the relationships existing among
the different objects are identified and represented in such a way that the objects can be easily
implemented using a programming language.
The term object-oriented analysis (OOA) refers to a method of developing an initial model of
the software from the requirements specification. The analysis model is refined into a design
model. The design model can be implemented using a programming language. The term object-oriented
programming refers to the implementation of programs using object-oriented concepts.
3.1. Design Patterns gr
Design patterns are reusable solutions to problems that recur in many applications. A pattern serves as a guide for creating a good design. Patterns are based on sound common sense and the application of fundamental design principles. They are created by people who spot repeating themes across designs. The pattern solutions are typically described in terms of class and interaction diagrams. Examples of design patterns are the expert pattern, creator pattern, controller pattern, etc.
In addition to providing the model of a good solution, design patterns include a clear specification of the problem, and also explain the circumstances in which the solution would and would not work. Thus, a design pattern has four important parts:
The problem
The context in which the problem occurs
The solution
The context within which the solution works
Expert Pattern
Problem: Which class should be responsible for carrying out an operation, e.g. computing the total sales?
Solution: Assign the responsibility to the information expert, that is, the class that has the information necessary to fulfill the required responsibility. The expert pattern expresses the common intuition that objects do things related to the information they have. The class diagram and collaboration diagram for this solution to the problem of which class should compute the total sales are shown in fig. 37.22.
[Figure: (a) classes SaleTransaction, SaleItem, and ItemSpecification; (b) collaboration messages 1: total, 2: subtotal, 3: price.]
Fig. 37.22 Expert pattern: (a) Class diagram (b) Collaboration diagram
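The expert pattern of fig. 37.22 can be sketched as follows. This is only an illustrative Python sketch: the class and message names follow the figure, but the price and quantity data are invented for the example.

```python
# Hypothetical sketch of the expert pattern: each class computes only
# what its own data allow, and delegates the rest.

class ItemSpecification:
    """Knows the unit price of a commodity."""
    def __init__(self, price):
        self._price = price

    def price(self):                     # message 3: price
        return self._price


class SaleItem:
    """Knows its quantity and its item specification."""
    def __init__(self, spec, quantity):
        self._spec = spec
        self._quantity = quantity

    def subtotal(self):                  # message 2: subtotal
        return self._quantity * self._spec.price()


class SaleTransaction:
    """Knows the items sold; it is the information expert for the total."""
    def __init__(self, items):
        self._items = items

    def total(self):                     # message 1: total
        return sum(item.subtotal() for item in self._items)


# Usage: the total is computed by the class that has the information.
items = [SaleItem(ItemSpecification(10.0), 3), SaleItem(ItemSpecification(2.5), 4)]
print(SaleTransaction(items).total())    # 40.0
```

Note how no object asks another for its raw data: each object is asked to do the thing its own information supports, which is exactly the intuition the expert pattern expresses.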
Creator Pattern
Problem: Which class should be responsible for creating a new instance of some class?
Solution: Assign a class C1 the responsibility to create an instance of class C2, if one or more
of the following are true:
C1 is an aggregation of objects of type C2
C1 contains objects of type C2
C1 closely uses objects of type C2
C1 has the data that would be required to initialize the objects of type C2, when they are created
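The rules above can be illustrated with a minimal Python sketch. The class names are assumed for the example: a SaleTransaction aggregates SaleItem objects and holds the data to initialize them, so by the creator pattern it is the class given the responsibility of creating them.

```python
# Hypothetical sketch of the creator pattern: SaleTransaction aggregates
# SaleItem objects, so it creates them rather than leaving that to clients.

class SaleItem:
    def __init__(self, name, quantity):
        self.name = name
        self.quantity = quantity


class SaleTransaction:
    def __init__(self):
        self._items = []

    def add_item(self, name, quantity):
        # The transaction, not the client code, instantiates SaleItem:
        # it contains the items and has the data needed to initialize them.
        item = SaleItem(name, quantity)
        self._items.append(item)
        return item

    def item_count(self):
        return len(self._items)


t = SaleTransaction()
t.add_item("pen", 2)
t.add_item("pad", 1)
print(t.item_count())   # 2
```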
Controller Pattern
Problem: Who should be responsible for handling the actor requests?
Solution: For every use case, there should be a separate controller object which would be
responsible for handling requests from the actor. Also, the same controller should be used for
all the actor requests pertaining to one use case so that it becomes possible to maintain the
necessary information about the state of the use case. The state information maintained by a
controller can be used to identify out-of-sequence actor requests, e.g. whether a voucher request is received before an arrange-payment request.
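The state-keeping role of a controller can be sketched as follows. This is a hypothetical Python sketch: the use case and request names (arrange payment, voucher request) follow the example in the text, but the class and method names are invented.

```python
# Hypothetical sketch of the controller pattern: one controller object per
# use case keeps state, so out-of-sequence actor requests can be detected.

class PaymentController:
    """Controller for a single 'make payment' use case."""
    def __init__(self):
        self._payment_arranged = False

    def arrange_payment(self):
        self._payment_arranged = True
        return "payment arranged"

    def request_voucher(self):
        # The state maintained by the controller identifies an
        # out-of-sequence actor request.
        if not self._payment_arranged:
            return "error: voucher requested before payment arranged"
        return "voucher issued"


c = PaymentController()
print(c.request_voucher())   # rejected: out of sequence
c.arrange_payment()
print(c.request_voucher())   # voucher issued
```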
Model View Separation Pattern
Problem: How should the non-GUI classes communicate with the GUI classes?
Context in which the problem occurs: This is a very commonly occurring pattern which is
found in almost every problem. Here, model is a synonym for the domain layer objects, view
is a synonym for the presentation layer objects such as the GUI objects.
Solution: The model view separation pattern states that model objects should not have direct knowledge of (or be directly coupled to) the view objects. This means that there should not be
any direct calls from other objects to the GUI objects. This results in a good solution, because
the GUI classes are related to a particular application whereas the other classes may be
reused.
There are actually two solutions to this problem which work in different circumstances.
These are as follows:
Solution 1: Polling or Pull from above
It is the responsibility of a GUI object to ask for the relevant information from the other
objects, i.e. the GUI objects pull the necessary information from the other objects whenever
required.
This model is frequently used. However, it is inefficient for certain applications. For example, in simulation applications which require visualization, the GUI objects would not know when the necessary information becomes available. Other examples are monitoring applications such as network monitoring, stock market quotes, and so on. In these situations, a push-from-below model of display update is required. Since push-from-below is not an acceptable solution, an indirect mode of communication from the other objects to the GUI objects is required.
Solution 2: Publish-subscribe pattern
An event manager class can be defined as one which keeps track of the subscribers and the types of events they are interested in. An event is published by the publisher by sending a message to the event manager object. The event manager notifies all registered subscribers, usually via a parameterized message (called a callback). Some languages specifically support event manager classes. For example, Java provides the EventListener interface for such purposes.
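The publish-subscribe mechanism described above can be sketched as follows. Python is used here rather than Java, and all names (EventManager, the "quote" event) are invented for illustration; only the event-manager idea itself comes from the text.

```python
# Hypothetical sketch of publish-subscribe: an event manager tracks
# subscribers by event type and notifies them via callbacks.

class EventManager:
    def __init__(self):
        self._subscribers = {}   # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, data):
        # Notify every registered subscriber via a parameterized
        # message (the callback).
        for callback in self._subscribers.get(event_type, []):
            callback(data)


received = []
manager = EventManager()
# A GUI object subscribes to the events it must display.
manager.subscribe("quote", lambda price: received.append(price))

# A model object publishes without any direct knowledge of the GUI object.
manager.publish("quote", 101.5)
manager.publish("volume", 9000)   # no subscriber; silently ignored
print(received)   # [101.5]
```

The model object only ever talks to the event manager, so model-view separation is preserved while still supporting push-from-below display updates.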
they normally do not include any processing logic. However, they may be responsible for validating inputs, formatting outputs, etc. The boundary objects were earlier called interface objects. However, the term interface class is used with different meanings in Java, COM/DCOM, and UML. A recommendation for the initial identification of the boundary classes is to define one boundary class per actor/use case pair.
Fig. 37.23 A typical realization of a use case through the collaboration of boundary,
controller, and entity objects
3.2.5. Identification of Entity Objects
One of the most important steps in any object-oriented design methodology is the identification of objects. In fact, the quality of the final design depends to a great extent on the appropriateness of the objects identified. However, to date no formal methodology exists for the identification of objects. Several semi-formal and informal approaches have been proposed for object identification. These can be classified into the following broad classes:
Grammatical analysis of the problem description
Derivation from data flows
Derivation from the entity relationship (E-R) diagram
A widely accepted object identification approach is the grammatical analysis approach. Grady Booch originated the grammatical analysis approach [1991]. In Booch's approach, the nouns occurring in the extended problem description statement (processing narrative) are mapped to objects and the verbs are mapped to methods.
3.3. Booch's Object Identification Method
Booch's object identification approach requires a processing narrative of the given problem
to be first developed. The processing narrative describes the problem and discusses how it can be
solved. The objects are identified by noting down the nouns in the processing narrative.
Synonyms of a noun must be eliminated. If an object is required to implement a solution, then it is
said to be part of the solution space. Otherwise, if an object is necessary only to describe the
problem, then it is said to be a part of the problem space. However, several of the nouns may not
be objects. An imperative procedure name, i.e., noun form of a verb actually represents an action
and should not be considered as an object. A potential object found after lexical analysis is
usually considered legitimate, only if it satisfies the following criteria:
Retained information: Some information about the object should be remembered for the
system to function. If an object does not contain any private data, it cannot be expected to
play any important role in the system.
Multiple attributes: Usually objects have multiple attributes and support multiple methods.
It is very rare to find useful objects which store only a single data element or support only a
single method, because an object having only a single data element or method is usually
implemented as a part of another object.
Common operations: A set of operations can be defined for potential objects. If these
operations apply to all occurrences of the object, then a class can be defined. An attribute or
operation defined for a class must apply to each instance of the class. If some of the attributes
or operations apply only to some specific instances of the class, then one or more subclasses
may be needed for these special objects.
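The first, mechanical step of the grammatical analysis described above can be illustrated with a toy sketch. This is purely hypothetical Python: a real analysis works on a full processing narrative and needs a part-of-speech tagger plus human judgment to prune non-objects, so a tiny hand-made noun list stands in for both here.

```python
# Hypothetical sketch of the lexical step of Booch's approach: list the
# nouns of a (tiny) processing narrative as candidate objects, merging
# the crudest of synonyms. Verbs would similarly suggest methods.

narrative = ("The player marks a square on the board. "
             "The board displays the marks and announces the result.")

NOUNS = {"player", "square", "board", "mark", "marks", "result"}

def candidate_objects(text):
    words = [w.strip(".,").lower() for w in text.split()]
    seen = []
    for w in words:
        base = "mark" if w in ("mark", "marks") else w   # crude synonym merge
        if w in NOUNS and base not in seen:
            seen.append(base)
    return seen

print(candidate_objects(narrative))
```

Each candidate would then be screened against the criteria above (retained information, multiple attributes, common operations) before being accepted as an object.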
Normally, the actors themselves and the interactions among themselves should be excluded
from the entity identification exercise. However, sometimes there is a need to maintain
information about an actor within the system. This is not the same as modeling the actor. These
classes are sometimes called surrogates. For example, in the Library Information System (LIS)
we would need to store information about each library member. This is independent of the fact
that the library member also plays the role of an actor of the system.
Although the grammatical approach is simple and intuitively appealing, through a naive use of the approach it is very difficult to achieve high-quality results. In particular, it is very difficult to come up with useful abstractions simply by doing a grammatical analysis of the problem description.
3.3.1. An Example: Tic-Tac-Toe
Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 x 3 square. A move consists of marking a previously unmarked square. A player who first places three consecutive marks along a straight line (i.e., along a row, column, or diagonal) on the square wins the game. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line, but all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game.
The class diagram is shown in fig. 37.25. The messages of the sequence diagram have been populated as methods of the corresponding classes.
Fig. 37.24 (a) Initial domain model (b) Refined domain model
[Figure: Board class (attribute int position[9]; methods checkMoveValidity, checkResult, announceResult, playMove), PlayMoveBoundary class (methods announceInvalidMove, displayBoard), and Controller class (methods announceResult, announceInvalidMove).]
Fig. 37.25 Class diagram
[Figure: messages exchanged among the :playMoveBoundary, :playMoveController, and :Board objects: acceptMove, checkMoveValidity, [invalid move] announceInvalidMove, playMove, checkWinner, [game over] announceResult, getBoardPositions, displayBoardPosition, [game not over] promptNextMove.]
Fig. 37.26 Sequence diagram for the play move use case
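The Board responsibilities named in the diagrams (move validation and checking for a winner) can be sketched as follows. This is a hypothetical Python sketch: the design in the text places these in a Board class with a nine-element position array, but the method bodies here are this author's illustration, not the course's implementation.

```python
# Hypothetical sketch of the Board logic: a nine-element position list,
# move validation, and a winner check over rows, columns and diagonals.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

class Board:
    def __init__(self):
        self.position = [" "] * 9

    def check_move_validity(self, square):
        # A move marks a previously unmarked square.
        return 0 <= square < 9 and self.position[square] == " "

    def play_move(self, square, mark):
        if not self.check_move_validity(square):
            raise ValueError("invalid move")
        self.position[square] = mark

    def check_winner(self):
        # Three consecutive identical marks along a straight line win.
        for a, b, c in LINES:
            if self.position[a] != " " and \
               self.position[a] == self.position[b] == self.position[c]:
                return self.position[a]
        if " " not in self.position:
            return "draw"
        return None        # game not over


b = Board()
for square, mark in [(0, "X"), (3, "O"), (1, "X"), (4, "O"), (2, "X")]:
    b.play_move(square, mark)
print(b.check_winner())   # X
```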
4. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical
proof.
b. Data abstraction helps in easy code maintenance and code reuse.
c. Classes can be considered equivalent to Abstract Data Types (ADTs).
d. The inheritance relationship describes a 'has a' relationship among classes.
e. Inheritance feature of the object oriented paradigm helps in code reuse.
f. An important advantage of polymorphism is facilitation of reuse.
g. Using dynamic binding a programmer can send a generic message to a set of objects
which may be of different types i.e. belonging to different classes.
h. In dynamic binding, the address of an invoked method is known only at compile time.
i. For any given problem, one should construct all the views using all the diagrams
provided by UML.
j. Use cases are explicitly dependent among themselves.
k. Each actor can participate in one and only one use case.
l. Class diagrams developed using UML can serve as the functional specification of a
system.
m. The terms method and operation are equivalent concepts and can be used
interchangeably.
n. The aggregation relationship can be recursively defined, i.e. an object can contain
instances of itself.
o. In a UML class diagram, the aggregation relationship defines an equivalence
relationship among objects.
p. The aggregation relationship can be considered to be a special type of association
relationship.
q. Normally, you use an interaction diagram to represent how the behaviour of an object
changes over its life time.
r. The interaction diagrams can be effectively used to describe how the behaviour of an
object changes across several use cases.
s. A state chart diagram is good at describing behaviour that involves multiple objects cooperating with each other to achieve some behaviour.
t. Facade pattern tells how non-GUI classes should communicate with the GUI classes.
u. The use cases should be tightly tied to the GUI.
v. The responsibilities assigned to a controller object are closely related to the realization of a specific use case.
x. A large number of message exchanges between objects indicates good delegation and is a sure sign of a design well-done.
y. Deep class hierarchies are the hallmark of any good OOD.
z. Cohesiveness of the data and methods within a class is a sign of good OOD.
2. For the following, mark all options which are true.
a. In the object-oriented approach, each object essentially consists of
some data that are private to the object
a set of functions (or operations) that operate on those data
the set of methods it provides to the other objects for accessing and manipulating the data
none of the above
b. Redefinition of methods in a derived class which existed in the base class is called
function overloading
operator overloading
method overriding
none of the above
c. The mechanism by which a subclass inherits attributes and methods from more than
one base class is called
single inheritance
multiple inheritance
multi-level inheritance
hierarchical inheritance
d. In the object-oriented approach, the same message can result in different actions
when received by different objects. This feature is referred to as
static binding
dynamic binding
genericity
overloading
e. UML is
a language to model syntax
an object-oriented development methodology
an automatic code generation tool
none of the above
f. In the context of use case diagram, the stick person icon is used to represent
human users
external systems
internal systems
none of the above
g. The design pattern solutions are typically described in terms of
class diagrams
object diagrams
interaction diagrams
both class and interaction diagrams
h. The class that should be responsible for doing certain things for which it has the necessary information is the solution proposed by
creator pattern
controller pattern
expert pattern
facade pattern
i. The class that should be responsible for creating a new instance of some class is the
solution proposed by
creator pattern
controller pattern
expert pattern
facade pattern
j. The objects identified during domain analysis can be classified into
boundary objects
controller objects
entity objects
all of the above
k. The most critical part of the domain modelling activity is to identify
controller objects
boundary objects
entity objects
none of the above
l. The objects which effectively decouple the boundary and entity objects from one
another making the system tolerant to changes of the user interface and processing
logic are
controller objects
boundary objects
entity objects
teachers.
b. A bill contains a number of items. Each item describes some commodity, the price per unit, and the total price.
c. An order consists of one or more order items. Each order item contains the name of the item, its quantity and the date by which it is required. Each order item is described by an item type specification object having details such as its vendor addresses, its unit price, and the manufacturer.
15. How should you identify the use cases of a system?
16. What is the difference between an operation and a method in the context of the OOD technique?
17. What does the association relationship among classes represent? Give examples of the association relationship.
18. What does the aggregation relationship between classes represent? Give examples of the aggregation relationship between classes.
19. Why are objects always passed by reference in all popular programming languages?
20. What are design patterns? What are the advantages of using design patterns? Write down some popular design patterns and their necessities.
21. Give an outline of object-oriented development process.
22. What is meant by domain modelling? Differentiate the different types of objects that are
identified during domain analysis.
Module 8
Testing of Embedded System

Lesson 38
Testing Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to
Verification vs. Testing:
Verification: verifies the correctness of the design; performed by simulation, hardware emulation, or formal methods.
Testing: verifies the correctness of the manufactured system; a two-part process: (1) test generation, a software process executed once during design, and (2) test application, electrical tests applied to the hardware.
station, routers and firewalls, telecommunication exchanges, robotics and industrial automation, smart cards, personal digital assistants (PDAs) and cellular phones are examples of embedded systems.
Real-Time System
Most, if not all, embedded systems are "real-time". The terms "real-time" and "embedded" are often used interchangeably. A real-time system is one in which the correctness of a computation not only depends on its logical correctness, but also on the time at which the result is produced.
In hard real-time systems, if the timing constraints of the system are not met, a system crash could be the consequence. For example, in mission-critical applications where failure is not an option, time deadlines must be followed.
In contrast, hardware testing is concerned mainly with functional verification and self-test after the chip is manufactured. Hardware developers use tools to simulate the correct behavior of circuit
models. Vendors design chips for self-test which mainly ensures proper operation of circuit
models after their implementation. Test engineers who are not the original hardware developers
test the integrated system.
This conventional, divided approach to software and hardware development does not address the
embedded system as a whole during the system design process. It instead focuses on these two
critical issues of testing separately. New problems arise when developers integrate the
components from these different domains.
In theory, unsatisfactory performance of the system under test should lead to a redesign. In
practice, a redesign is rarely feasible because of the cost and delay involved in another complete
design iteration. A common engineering practice is to compensate for problems within the
integrated system prototype by using software patches. These changes can unintentionally affect
the behavior of other parts in the computing system.
At a higher abstraction level, executable specification languages provide an excellent means to assess embedded-systems designs. Developers can then test system-level prototypes with either formal verification techniques or simulation. A current shortcoming of many approaches is, however, that the transition from testing at the system level to testing at the implementation level is largely ad hoc. To date, system testing at the implementation level has received attention in the research community only as coverification, which simulates both hardware and software components conjointly. Coverification runs simulations of specifications on powerful computer systems. Commercially available coverification tools link hardware simulators and software debuggers in the implementation phase of the design process.
Since embedded systems are frequently employed in mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Some embedded systems, such as those in automotive applications, are exposed to extremely harsh environments. Preparing embedded systems to meet new and more stringent requirements of safety and reliability is a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for on-line testing.
3. Faults in Embedded Systems
Incorrectness in hardware systems may be described in different terms: defect, error and fault. These three terms can be confusing. We will define them as follows [1]:
Defect: A defect in a hardware system is the unintended difference between the implemented
hardware and its intended design. These may be process defects, material defects, age defects or packaging defects.
Error: A wrong output signal produced by a defective system is called an error. An error is an
effect whose cause is some defect. Errors induce failures, that is, a deviation from
appropriate system behavior. If the failure can lead to an accident, it is a hazard.
Fault: A representation of a defect at the abstraction level is called a fault. Faults are physical
or logical defects in the design or implementation of a device.
temperature and vibration. Some design defects and manufacturing faults escape detection and
combine with wearout and environmental disturbances to cause problems in the field.
Hardware faults are classified as stuck-at faults, bridging faults, open faults, power disturbance faults, spurious current faults, memory faults, transistor faults, etc. The most commonly used fault model is the stuck-at fault model [1]. This is modeled by having a line segment stuck at logic 0 or 1 (stuck-at-1 or stuck-at-0).
Stuck-at Fault: These are due to flaws in the hardware, and they represent faults of the signal lines. A signal line is the input or output of a logic gate. Each connecting line can have two types of faults: stuck-at-0 (s-a-0) or stuck-at-1 (s-a-1). In general, several stuck-at faults can be simultaneously present in the circuit. A circuit with n lines can have 3^n - 1 possible stuck-line combinations, as each line can be in one of three states: s-a-0, s-a-1 or fault-free. Even a moderate value of n will give a large number of multiple stuck-at faults. It is a common practice, therefore, to model only single stuck-at faults. An n-line circuit can have at most 2n single stuck-at faults. This number can be further reduced by the fault collapsing technique.
A single stuck-at fault is characterized by the following properties:
1. The fault occurs in only one line.
2. The faulty line is permanently set to either 0 or 1.
3. The fault can be at an input or output of a gate.
4. Every fan-out branch is to be considered as a separate line.
Figure 38.1 gives an example of a single stuck-at fault. A stuck-at-1 fault as marked at the output
of the OR gate implies that the faulty signal remains 1 irrespective of the input state of the OR gate.
[Fig. 38.1: two AND gates feed an OR gate; a stuck-at-1 fault at the OR-gate output forces the faulty response to 1 while the true response is 0 (faulty values shown in parentheses).]
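The effect of the stuck-at-1 fault of fig. 38.1, and the fault counts quoted above, can be checked with a short sketch. This is hypothetical Python: the two-AND/one-OR circuit follows the figure, but the input pattern and the line count n = 7 are assumptions made for the example.

```python
# Hypothetical sketch: simulate the two-AND/one-OR circuit of fig. 38.1,
# with and without a stuck-at-1 fault on the OR-gate output.

def circuit(a, b, c, d, or_output_stuck_at=None):
    y = (a & b) | (c & d)                # fault-free (true) response
    if or_output_stuck_at is not None:
        y = or_output_stuck_at           # the faulty line is permanently set
    return y

print(circuit(1, 0, 0, 0))                        # true response: 0
print(circuit(1, 0, 0, 0, or_output_stuck_at=1))  # faulty response: 1

# Fault counts quoted in the text, for an assumed n = 7 lines
# (four primary inputs, two AND outputs, one OR output):
n = 7
print(3 ** n - 1)   # 2186 possible multiple stuck-line combinations
print(2 * n)        # 14 possible single stuck-at faults
```

A test pattern detects the fault exactly when the true and faulty responses differ, which is why the faulty value is shown in parentheses beside the true one in the figure.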
Bridging faults: These are due to a short between a group of signal lines. The logic value of the shorted net may be modeled as 1-dominant (OR bridge), 0-dominant (AND bridge), or intermediate, depending upon the technology in which the circuit is implemented.
Stuck-Open and Stuck-Short faults: The MOS transistor is considered as an ideal switch, and two types of faults are modeled. In a stuck-open fault, a single transistor is permanently stuck in the open state, and in a stuck-short fault, a single transistor is permanently shorted irrespective of its gate voltage. These are caused by bad connections of signal lines.
Power disturbance faults: These are caused by inconsistent power supplies and affect the whole system.
Spurious current faults: These are caused by exposure to heavy ions and affect the whole system.
Operational faults are usually classified according to their duration:
Permanent faults exist indefinitely if no corrective action is taken. These are mainly manufacturing faults and do not usually occur due to changes in system operation or environmental disturbances.
Intermittent faults appear, disappear, and reappear frequently. They are difficult to predict, but their effects are highly correlated. Most of these faults are due to marginal design or manufacturing steps. These faults occur under atypical environmental disturbances.
Transient faults appear for an instant and disappear quickly. These are not correlated with each other. They occur due to random environmental disturbances. Power disturbance faults and spurious current faults are transient faults.
currently applied to hardware-software designs have their origins in either the hardware [9] or
the software [10] domains.
The core internal test developed by a core provider needs to be adequately described, ported and ready for plug and play, i.e., for interoperability, with the system chip test. For an internal test to accompany its corresponding core and be interoperable, it needs to be described in a commonly accepted, i.e., standard, format. Such a standard format is currently being developed by IEEE P1500 and is referred to as the standardization of a core test description language [22].
In SOCs, cores are often embedded in several layers of user-defined or other core-based logic, and direct physical access to their peripheries is not available from the chip I/Os. Hence, an electronic access mechanism is needed. This access mechanism requires additional logic, such as a wrapper around the core, and wiring, such as a test access mechanism, to connect core peripheries to the test sources and sinks. The wrapper performs switching between the normal mode and the test mode(s), and the wiring connects the wrapper which surrounds the core to the test source and sink. The wrapper can also be utilized for core isolation. Typically, a core needs to be isolated from its surroundings in certain test modes. Core isolation is often required on the input side, the output side, or both.
[Figure: a test pattern source and sink connected through test access mechanisms to an embedded core surrounded by a wrapper.]
Fig. 38.2 Overview of the three elements in an embedded-core test approach: (1) test pattern source, (2) test access mechanism, and (3) core test wrapper [5].
A conceptual architecture for testing embedded-core-based SOCs is shown in Figure 38.2. It consists of three structural elements:
1. Test Pattern Source and Sink
The test pattern source generates the test stimuli for the embedded core, and the test pattern sink compares the response(s) to the expected response(s). The test pattern source as well as the sink can be implemented either off-chip by external Automatic Test Equipment (ATE), on-chip by Built-In Self-Test (or Embedded ATE), or as a combination of both. Source and sink do not need to be of the same type, e.g., the source of an embedded core can be implemented off-chip, while the sink of the same core is implemented on-chip. The choice for a certain type of source or sink is determined by (1) the type of circuitry in the core, (2) the type of pre-defined tests that come with the core, and (3) quality and cost considerations. The type of circuitry of a certain core and the type of predefined tests that come with the core determine which implementation options are left open for the test pattern source and sink. The actual choice for a particular source or sink is in general determined by quality and cost considerations. On-chip sources and sinks provide better accuracy and performance-related defect coverage, but at the same time increase the silicon area and hence might reduce manufacturing yield.
2. Test Access Mechanism
The test access mechanism takes care of on-chip test pattern transport. It can be used (1) to
transport test stimuli from the test pattern source to the core-under-test, and (2) to transport test
responses from the core-under-test to the test pattern sink. The test access mechanism is by
definition, implemented on-chip. Although for one core often the same type of test access
mechanism is used for both stimulus as well as response transportation, this is not required and
various combinations may co-exist. Designing a test access mechanism involves making a trade-
off between the transport capacity (bandwidth) of the mechanism and the test application cost it
induces. The bandwidth is limited by the bandwidth of source and sink and the amount of silicon
area one wants to spend on the test access mechanism itself.
Apart from these mandatory modes, a core test wrapper might have several optional modes, e.g., a detach mode to disconnect the core from its system chip environment and the test access mechanism, or a bypass mode for the test access mechanisms. Depending on the implementation of the test access mechanism, some of the above modes may coincide. For example, if the test access mechanism uses existing functionality, normal operation and core test mode may coincide.
Pre-designed cores have their own internal clock distribution systems. Different cores have different clock propagation delays, which might result in clock skew for inter-core communication. The system-IC designer should take care of this clock skew issue in the functional communication between cores. However, clock skew might also corrupt the data transfer over the test access mechanism, especially if this mechanism is shared by multiple cores. The core test wrapper is the best place to have provisions for clock skew prevention in the test access paths between the cores.
In addition to the test integration and interdependence issues, the system chip composite test requires adequate test scheduling. Effective test scheduling for SOCs is challenging because it must address several conflicting goals: (1) total SOC testing time minimization, (2) power dissipation, (3) precedence constraints among tests, and (4) area overhead constraints [2]. Also, test scheduling is necessary to run intra-core and inter-core tests in a certain order so as not to impact the initialization and final contents of individual cores.
5. On-Line Testing
On-line testing addresses the detection of operational faults, and is found in computers that
support critical or high-availability applications [23]. The goal of on-line testing is to detect fault
effects, that is, errors, and take appropriate corrective action. On-line testing can be performed by
external or internal monitoring, using either hardware or software; internal monitoring is referred
to as self-testing. Monitoring is internal if it takes place on the same substrate as the circuit under
test (CUT); nowadays, this usually means inside a single IC, a system-on-a-chip (SOC).
There are four primary parameters to consider in the design of an on-line testing scheme:
Error coverage (EC): This is defined as the fraction of all modeled errors that are detected,
usually expressed in percent. Critical and highly available systems require very good error
detection or error coverage to minimize the impact of errors that lead to system failure.
Error latency (EL): This is the difference between the first time the error is activated and the
first time it is detected. EL is affected by the time taken to perform a test and by how often tests
are executed. A related parameter is fault latency (FL), defined as the difference between the
onset of the fault and its detection. Clearly, FL ≥ EL, so when EL is difficult to determine, FL is
often used instead.
Space redundancy (SR): This is the extra hardware or firmware needed to perform on-line
testing.
Time redundancy (TR): This is the extra time needed to perform on-line testing.
An ideal on-line testing scheme would have 100% error coverage, error latency of 1
clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the
CUT, and impose no functional or structural restrictions on the CUT. To cover all of the fault
types described earlier, two different modes of on-line testing are employed: concurrent testing,
which takes place during normal system operation, and non-concurrent testing, which takes place
while normal operation is temporarily suspended. These operating modes must often be
overlapped to provide a comprehensive on-line testing strategy at acceptable cost.
5.1 Non-concurrent testing
This form of testing is either event-triggered (sporadic) or time-triggered (periodic), and is
characterized by low space and time redundancy. Event-triggered testing is initiated by key
events or state changes in the life of a system, such as start-up or shutdown, and its goal is to
detect permanent faults. It is usually advisable to detect and repair permanent faults as soon as
possible. Event-triggered tests resemble manufacturing tests.
A common method of providing hardware support for concurrent testing, especially for
detecting control errors, is a watchdog timer. This is a counter that must be reset by the system
on a repetitive basis to indicate that the system is functioning properly. A watchdog timer is
based on the assumption that the system is fault-free, or at least alive, if it is able to perform
the simple task of resetting the timer at appropriate intervals, which implies that control flow is
correctly traversing timer reset points.
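The watchdog principle can be sketched as a small simulation (not real embedded code; the timeout and reset schedule are arbitrary):

```python
# Simulated watchdog timer: the system must call kick() at least every
# TIMEOUT ticks, otherwise the watchdog declares a control-flow failure.
TIMEOUT = 5

class Watchdog:
    def __init__(self):
        self.count = 0
        self.expired = False

    def kick(self):          # reset point reached: control flow is alive
        self.count = 0

    def tick(self):          # called once per clock tick
        self.count += 1
        if self.count > TIMEOUT:
            self.expired = True   # corrective action would be taken here

wd = Watchdog()
for t in range(20):
    wd.tick()
    if t < 10 and t % 3 == 0:    # healthy phase: periodic timer resets
        wd.kick()
print(wd.expired)  # True: resets stop after t=9, so the timer overflows
```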
Version 2 EE IIT, Kharagpur 12
6.2 Test Programming
The test program comprises modules for the generation of the test vectors and the corresponding
expected responses from a circuit with normal behavior. CAD tools are used to automate the
generation of optimized test vectors for the purpose [1,24]. Figure 38.3 illustrates the basic steps
in the development of a test program.
[Figure content: chip specifications, the test plan, test types, and timing specs, together with test vectors from logic/physical design and simulators, feed a test program generator that produces the test program.]
Fig. 38.3 Test program generation
(with or without heuristics), or by pseudo-random methods. On the other hand, for (2), a test is
subsequently applied many times to each integrated circuit and thus must be efficient both in
space (storage requirements for the patterns) and in time. The main considerations in evaluating
a test set are: (i) the time to construct a minimal test set; (ii) the size of the test set; (iii) the time
involved to carry out the test; and (iv) the equipment required (if external). Most algorithmic test
pattern generators are based on the concept of sensitized paths.
The Sensitized Path Method is a heuristic approach to generating tests for general
combinational logic networks. The circuit is assumed to have only a single fault in it. The
sensitized path method consists of two parts:
1. The creation of a SENSITIZED PATH from the fault to the primary output. This involves
assigning logic values to the gate inputs in the path from the fault site to a primary output, such
that the fault effect is propagated to the output.
2. The JUSTIFICATION operation, where the assignments made to gate inputs on the sensitized
path are traced back to the primary inputs. This may require several backtracks and iterations.
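A minimal sketch of the idea, using an invented one-gate-deep circuit and brute-force search in place of real path tracing:

```python
from itertools import product

# Tiny illustration of test generation for z = (a AND b) OR c with line
# "a" stuck-at-0 (circuit invented). Sensitizing the path through the
# AND gate requires b=1 and c=0; the fault is excited by a=1. Brute
# force over all inputs confirms this is the only detecting vector.
def good(a, b, c):
    return (a & b) | c

def faulty(a, b, c):            # same circuit with line a stuck at 0
    return (0 & b) | c

tests = [v for v in product((0, 1), repeat=3) if good(*v) != faulty(*v)]
print(tests)  # [(1, 1, 0)]
```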
In the case of sequential circuits the same logic is applied, but before that the sequential elements
are explicitly driven to a required state using scan-based design-for-test (DFT) circuitry [1,24].
The best-known algorithms are the D-algorithm, PODEM and FAN [1,24]. Three steps can be
identified in most automatic test pattern generation (ATPG) programs: (a) listing the signals on
the inputs of a gate controlling the line on which a fault should be detected; (b) determining the
primary input conditions necessary to obtain these signals (back propagation) and sensitizing the
path to the primary outputs such that the signals and faults can be observed; (c) repeating this
procedure until all detectable faults in a given fault set have been covered.
6.4 ATPG for Hardware-Software Covalidation
Several automatic test generation (ATG) approaches have been developed which vary in the
class of search algorithm used, the fault model assumed, the search space technique used, and the
design abstraction level used. In order to perform test generation for the entire system, both
hardware and software component behaviors must be described in a uniform manner. Although
many behavioral formats are possible, ATG approaches have focused on CDFG and FSM
behavioral models.
Two classes of search algorithms have been explored, fault directed and coverage
directed. Fault directed techniques successively target a specific fault and construct a test
sequence to detect that fault. Each new test sequence is merged with the current test sequence
(typically through concatenation) and the resulting fault coverage is evaluated to determine if test
generation is complete. Fault directed algorithms have the advantage that they are complete in
the sense that a test sequence will be found for a fault if a test sequence exists, assuming that
sufficient CPU time is allowed. For test generation, each CDFG path can be associated with a set
of constraints which must be satisfied to traverse the path. Because the operations found in a
hardware-software description can be either boolean or arithmetic, the solution method chosen
must be able to handle both types of operations. Constraint logic programming (CLP) techniques
[27] are capable of handling a broad range of constraints, including non-linear constraints on both
boolean and arithmetic variables. State machine testing has been accomplished by defining a
transition tour, which is a path that traverses each state machine transition at least once [26].
Transition tours have been generated by iteratively improving an existing partial tour.
Coverage directed algorithms seek to improve coverage without targeting any specific
fault. These algorithms heuristically modify an existing test set to improve total coverage, and
then evaluate the fault coverage produced by the modified test set. If the modified test set
corresponds to an improvement in fault coverage then the modification is accepted. Otherwise
the modification is either rejected or another heuristic is used to determine the acceptability of
the modification. The modification method is typically either random or directed random. An
example of such a technique is presented in [25] which uses a genetic algorithm to successively
improve the population of test sequences.
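The accept-if-no-worse loop described above can be sketched as follows (the coverage function is a stand-in for a real fault simulator; everything here is invented for illustration):

```python
import random

# Coverage-directed improvement sketch: randomly modify one test in the
# set and keep the change only if total coverage does not decrease.
def coverage(tests):
    return len({t % 8 for t in tests})      # pretend: 8 fault classes

def improve(tests, steps=200, seed=1):
    rng = random.Random(seed)
    best = coverage(tests)
    for _ in range(steps):
        candidate = tests[:]
        candidate[rng.randrange(len(tests))] = rng.randrange(256)
        if coverage(candidate) >= best:     # accept improvements (and ties)
            tests, best = candidate, coverage(candidate)
    return tests, best

tests, cov = improve([0] * 8)
print(cov)   # coverage can only go up from the initial value of 1
```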
is a significant part of the overall system. This is considered as white-box testing. Therefore,
software validation testing is also the responsibility of the developer.
system execution. For this aspect, grey-box testing is the preferred testing method. In most cases,
only a knowledge of the interface to the module is required to implement and execute
7.5 System Integration Testing
The module to be tested starts from a set of components within a single node and eventually
encompasses all system nodes up to a set of distributed nodes. The Points of Control and
Observation (PCOs) are a mix of RTOS and network-related communication protocols, such as
RTOS events and network messages. In addition to a component, a Virtual Tester can also play
the role of a node. As for software integration, the focus is on validating the various interfaces.
Grey-box testing is the preferred testing method. System integration testing is typically the
responsibility of the system integration team.
7.6 System Validation Testing
The module to be tested is now a complete implementation subsystem or the complete embedded
system. The objectives of this final aspect are several:
Meet external-actor functional requirements. Note that an external-actor might either be a
device in a telecom network (say if our embedded system is an Internet Router), or a
person (if the system is a consumer device), or both (an Internet Router that can be
administered by an end user).
Perform final non-functional testing such as load and robustness testing. Virtual testers
can be duplicated to simulate load, and be programmed to generate failures in the system.
Ensure interoperability with other connected equipment. Check conformance to
applicable interconnection standards. Going into details for these objectives is not in the
scope of this article. Black-box testing is the preferred method: The tester typically
concentrates on both frequently used and potentially risky or dangerous use-case
instances.
The test data selection technique discussed in [21] first simulates the behavior of the embedded
system as a software program derived from the requirement specification. Then hardware faults,
after being converted to software faults, are injected into the simulated program. Finally,
effective test data are selected to detect faults caused by the interactions between hardware and
software.
9. Conclusion
Rapid advances in test development techniques are needed to reduce the test cost of million-gate
SOC devices. In this chapter a number of state-of-the-art techniques are discussed for testing of
embedded systems. Modular test techniques for digital, mixed-signal, and hierarchical SOCs
must develop further to keep pace with design complexity and integration density. The test data
bandwidth needs of analog cores are significantly different from those of digital cores; therefore,
unified top-level testing of mixed-signal SOCs remains a major challenge. This chapter also
described a granularity-based embedded software testing technique.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic
Publishers, Norwell, MA, 2000.
[2] E. A. Lee, What's Ahead for Embedded Software?, IEEE Computer, pp. 18-26,
September 2000.
[3] E. A. Lee, Computing for embedded systems, in Proceedings of the IEEE Instrumentation
and Measurement Technology Conference, Budapest, Hungary, May 2001.
[4] Semiconductor Industry Association, International Technology Roadmap for
Semiconductors, 2001 Edition, http://public.itrs.net/Files/2001ITRS/Home.html
[5] Y. Zorian, E. J. Marinissen, and S. Dey, Testing Embedded-Core Based System Chips,
IEEE Computer, vol. 32, pp. 52-60, 1999.
[6] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, Fault Injection Techniques and Tools, IEEE
Computer, pp. 75-82, April 1997.
[7] V. Encontre, Testing Embedded Systems: Do You Have The GuTs for It?, www-
128.ibm.com/developerworks/rational/library/content/03July/1000/1050/1050.pdf
[8] D. D. Gajski and F. Vahid, Specification and design of embedded hardware-software
systems, IEEE Design and Test of Computers, vol. 12, pp. 53-67, 1995.
[9] S. Dey, A. Raghunathan, and K. D. Wagner, Design for testability techniques at the
behavioral and register-transfer level, Journal of Electronic Testing: Theory and
Applications (JETTA), vol. 13, pp. 79-91, October 1998.
[10] B. Beizer, Software Testing Techniques, Second Edition, Van Nostrand Reinhold, 1990.
[18]
47, pp. 214, January 1998.
N. Malik, S. Roberts, A. Pita, and R. Dobson, Automaton: an autonomous coverage-
based multiprocessor system verification environment, in IEEE International Workshop
on Rapid System Prototyping, pp. 168-172, June 1997.
[19] K.-T. Cheng and A. S. Krishnakumar, Automatic functional test bench generation using
the extended finite state machine model, in Design Automation Conference, pp. 1-6,
1993.
[20] J. P. Bergmann and M. A. Horowitz, Improving coverage analysis and test generation
for large designs, in International Conference on Computer-Aided Design, pp. 580-583,
1999.
[21] A. Sung and B. Choi, An Interaction Testing Technique between Hardware and
Software in Embedded Systems, in Proceedings of the Ninth Asia-Pacific Software
Engineering Conference, 4-6 Dec. 2002, pp. 457-464.
[22] IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/.
[23] H. Al-Asaad, B. T. Murray, and J. P. Hayes, On-line BIST for embedded systems, IEEE
Design & Test of Computers, vol. 15, no. 4, Oct.-Dec. 1998, pp. 17-24.
[24] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and
Testable Design, IEEE Press, 1990.
[25] F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, and A. Pincetti, Automatic test
bench generation for validation of RT-level descriptions: an industrial experience, in
Design Automation and Test in Europe, pp. 385-389, 2000.
[26] R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, Architecture validation for
processors, in International Symposium on Computer Architecture, pp. 404-413, 1995.
[27] P. Van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, 1989.
Problems
1. How does testing differ from verification?
2. What is an embedded system? Define hard real-time and soft real-time systems
with examples.
3. Why is testing an embedded system difficult?
4. How does hardware testing differ from software testing?
5. What is co-testing?
6. Distinguish between defects, errors and faults with example.
7. Calculate the total number of single and multiple stuck-at faults for a logic circuit
with n lines.
8. For the circuit shown below (Fig. P1), which of the following tests detect the fault x1
s-a-0?
a) (0,1,1,1)
b) (1,0,1,1)
c) (1,1,0,1)
d) (1,0,1,0)
[Circuit with inputs x1, x2, x3, x4 and output z]
Fig. P1
9. Define the following fault models, using examples where possible:
a) Single and multiple stuck-at fault
b) Bridging fault
c) Stuck-open and stuck-short fault
d) Operational fault
10. What is meant by a co-validation fault model?
11. Describe different software fault models.
12. Describe the basic structure of the core-based testing approach for embedded systems.
13. What is concurrent or on-line testing? How does it differ from non-concurrent testing?
14. Define error coverage, error latency, space redundancy and time redundancy in view
of on-line testing.
15. What is a test vector? How are test vectors generated? Describe different techniques
for test pattern generation.
16. Define the following for software testing:
a) Software unit testing
b) Software integration testing
c) Software validation testing
d) System unit testing
e) System integration testing
f) System validation testing
Module
8
Testing of Embedded
System
Lesson
39
Design for Testability
Instructional Objectives
After going through this lesson the student would be able to
1. Introduction
The embedded system is an information processing system that consists of hardware and
software components. Nowadays, the number of embedded computing systems in areas such as
telecommunications, automotive electronics, office automation, and military applications is
steadily growing. This market expansion arises from greater memory densities as well as
improvements in embeddable processor cores, intellectual-property modules, and sensing
technologies. At the same time, these improvements have increased the amount of software
needed to manage the hardware components, leading to a higher level of system complexity.
Designers can no longer develop high-performance systems from scratch but must use
sophisticated system modeling tools.
The increased complexity of embedded systems and the reduced access to internal nodes have
made it not only more difficult to diagnose and locate faulty components, but also made the
functions of embedded components difficult to measure. Creating testable designs is key to
developing complex hardware and/or software systems that function reliably throughout their
operational life. Testability can be defined with respect to a fault. A fault is testable if there
exists a well-specified procedure (e.g., test pattern generation, evaluation, and application) to
expose it, and the procedure is implementable at a reasonable cost using current technologies.
Testability of the fault therefore represents the inverse of the cost of detecting the fault. A circuit
is testable with respect to a fault set when each and every fault in this set is testable.
Design-for-testability techniques improve the controllability and observability of internal nodes,
so that embedded functions can be tested. Two basic properties determine the testability of a
node: 1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0
or 1 by assigning values to primary inputs (PIs), and 2) observability, which is a measure of the
difficulty of propagating a node's value to a primary output (PO) [1-3]. A node is said to be
testable if it is easily controlled and observed. For sequential circuits, some have added
predictability, which represents the ability to obtain known output values in response to given
input stimuli. The factors affecting predictability include initializability, races, hazards,
oscillations, etc. DFT techniques include analog test busses and scan methods. Testability can
also be improved with BIST circuitry, where signal generators and analysis circuitry are
implemented on chip [1, 3-4]. Without testability, design flaws may escape detection until a
product is in the hands of users; equally, operational failures may prove difficult to detect and
diagnose.
Increased embedded system complexity makes thorough assessment of system integrity by
testing external black-box behavior almost impossible. System complexity also complicates test
equipment and procedures. Design for testability should increase a system's testability, resulting
in improved quality while reducing time to market and test costs.
Traditionally, hardware designers and test engineers have focused on proving the correct
manufacture of a design and on locating and repairing field failures. They have developed
several highly structured and effective solutions to this problem, including scan design and self
test. Design verification has been a less formal task, based on the designer's skills. However,
designers have found that structured design-for-test features aiding manufacture and repair can
significantly simplify design verification. These features reduce verification cycles from weeks
to days in some cases.
In contrast, software designers and test engineers have targeted design validation and
verification. Unlike hardware, software does not break during field use. Design errors, rather
than incorrect replication or wear-out, cause operational bugs. Efforts have focused on improving
specifications and programming styles rather than on adding explicit test facilities. For example,
modular design, structured programming, formal specification, and object orientation have all
proven effective in simplifying test.
Although these different approaches are effective when we can cleanly separate a design's
hardware and software parts, problems arise when boundaries blur. For example, in the early
design stages of a complex system, we must define system-level test strategies. Yet, we may not
have decided which parts to implement in hardware and which in software. In other cases,
software running on general-purpose hardware may initially deliver certain functions that we
subsequently move to firmware or hardware to improve performance. Designers must ensure a
testable, finished design regardless of implementation decisions. Supporting hardware-software
codesign requires cotesting techniques, which draw hardware and software test techniques
together into a cohesive whole.
2. Design for Testability Techniques
Design for testability (DFT) refers to those design techniques that make the task of subsequent
testing easier. There is definitely no single methodology that solves all embedded system-testing
problems. There is also no single DFT technique that is effective for all kinds of circuits. DFT
techniques can largely be divided into two categories, i.e., ad hoc techniques and structured
(systematic) techniques.
DFT methods for digital circuits:
Ad-hoc methods
Structured methods:
Scan
Partial Scan
Built-in self-test (discussed in Lesson 34)
Boundary scan (discussed in Lesson 34)
Things to be followed
Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of
the most important steps in designing a testable chip is to first partition the chip in an
appropriate way such that for each functional module there is an effective (DFT)
technique to test it. Partitioning must be done at every level of the design process, from
architecture to circuit, whether testing is considered or not. Partitioning can be functional
(according to functional module boundaries) or physical (based on circuit topology).
Partitioning can be done by using multiplexers and/or scan chains.
Test access points must be inserted to enhance controllability and observability of the
circuit. Test points include control points (CPs) and observation points (OPs). The CPs
are active test points, while the OPs are passive ones. There are also test points which are
both CPs and OPs. Before exercising tests through test points that are not PIs and POs,
one should investigate additional requirements on the test points raised by the use of test
equipment.
Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on
reset mechanism controllable from primary inputs is the most effective and widely used
approach.
Test control must be provided for difficult-to-control signals.
Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing
resolution, speed, memory depth, driving capability, analog/mixed-signal support,
internal/boundary scan support, etc., should be considered during the design process to
avoid delay of the project and unnecessary investment in equipment.
Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester
synchronization, internal oscillator and clock generator circuitry should be isolated
during the test of the functional circuitry. The internal oscillators and clocks should also
be tested separately.
Analog and digital circuits should be kept physically separate. Analog circuit testing is
very much different from digital circuit testing. Testing of analog circuits refers to real
measurement, since analog signals are continuous (as opposed to discrete or logic signals
in digital circuits). They require different test equipment and different test
methodologies. Therefore they should be tested separately.
Things to be avoided
Asynchronous (unclocked) logic feedback in the circuit must be avoided. A feedback in
the combinational logic can give rise to oscillation for certain inputs. Since no clocking is
employed, timing is continuous instead of discrete, which makes tester synchronization
virtually impossible, and therefore only functional test by application board can be used.
accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the
scan insertion operation.
Input/output of each scan shift register must be available on PI/PO.
Combinational ATPG is used to obtain tests for all testable faults in the combinational
logic.
Shift register tests are applied and ATPG tests are converted into scan sequences for use
in manufacturing test.
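The last step, converting ATPG patterns into scan sequences, can be sketched as follows (operation names are invented for illustration):

```python
# Sketch of converting one combinational ATPG pattern into a scan test:
# serially shift the state bits into the chain, apply the primary-input
# bits, pulse the clock to capture, then shift the response back out.
# (Real testers overlap shift-out with the next shift-in; omitted here.)
def scan_sequence(pi_bits, state_bits):
    ops = [("shift_in", b) for b in reversed(state_bits)]  # farthest FF first
    ops.append(("apply_pi", pi_bits))
    ops.append(("capture",))
    ops.extend(("shift_out",) for _ in state_bits)
    return ops

seq = scan_sequence([1, 0], [1, 1, 0])
print(len(seq))  # 8 operations: 3 in + apply + capture + 3 out
```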
[Figure content: combinational logic with primary inputs and outputs, and a chain of scan flip-flops (SFF) threaded from SCANIN to SCANOUT under TC and CLK control]
Fig. 39.1 Scan structure added to a design
Fig. 39.1 shows a scan structure connected to a design. The scan flip-flops (FFs) must be
interconnected in a particular way. This approach effectively turns the sequential testing problem
into a combinational one, and the circuit can be fully tested by compact ATPG patterns.
Unfortunately, there are two types of overheads associated with this technique that designers
care about very much. These are the hardware overhead (including three extra pins, multiplexers
for all FFs, and extra routing area) and performance overhead (including multiplexer delay and
FF delay due to extra load).
2.2.3 Scan Design Rules
Only clocked D-type master-slave flip-flops for all state variables should be used.
At least one PI pin must be available for test. It is better if more pins are available.
All clock inputs to flip-flops must be controlled from primary inputs (PIs). There will be
no gated clock. This is necessary for FFs to function as a scan register.
Clocks must not feed data inputs of flip-flops. A violation of this can lead to a race
condition in the normal mode.
2.3 Scan Variations
There have been many variations of scan, as listed below; a few of these are discussed here.
MUXed Scan
Scan path
Scan-Hold Flip-Flop
Serial scan
Level-Sensitive Scan Design (LSSD)
Scan set
Random access scan
2.3.1 MUX Scan
It was invented at Stanford in 1973 by M. Williams and Angell.
In this approach a MUX is inserted in front of each FF to be placed in the scan chain.
[Figure content: scan input SI feeds a chain of MUX-FF pairs through the combinational logic C/L to scan output SO, controlled by test pin T and clock C]
Fig. 39.2 The Shift-Register Modification approach
Fig. 39.2 shows that when the test mode pin T=0, the circuit is in normal operation mode,
and when T=1, it is in test mode (or shift-register mode).
The scan flip-flops (FFs) must be interconnected in a particular way. This approach
effectively turns the sequential testing problem into a combinational one and can be fully
tested by compact ATPG patterns.
There are two types of overheads associated with this method. The hardware overhead
[Figure content: latch with data input DI, scan input SI, clocks C1 and C2, latches L1 and L2, outputs DO and SO]
It uses two latches (one for normal operation and one for scan) and three clocks.
Furthermore, to enjoy the luxury of race-free and hazard-free system operation and test,
the designer has to follow a set of complicated design rules.
A logic circuit is level sensitive (LS) iff the steady-state response to any allowed input
change is independent of the delays within the circuit. Also, the response is independent
of the order in which the inputs change.
[Figure content: the level-sensitive polarity-hold latch (data D, clock C, output L) with its excitation table, and the shift-register latch built from it with scan input SI, clocks A and B, and latches L1 and L2]
Fig. 39.5 The polarity-hold shift-register latch (SRL)
.c
LSSD requires that the circuit be LS, so we need LS memory elements as defined above. Figure
w
39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) is not dependent
w
on the rise/fall time of C, but only on C being `1' for a period of time greater than or equal to data
w
propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL)
used in LSSD as the scan cell.
The scan cell is controlled in the following way:
Normal mode: A=B=0, C=0 1.
SR (test) mode: C=0, AB=10 01 to shift SI through L1 and L2.
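The two-phase shift just described can be modeled behaviorally (signal names follow the text; the model itself is an invented sketch):

```python
# Behavioral sketch of the LSSD polarity-hold SRL. Clock C loads system
# data D into L1; scan clock A loads SI into L1; shift clock B copies
# L1 into L2, whose output feeds the next SRL's scan input.
class SRL:
    def __init__(self):
        self.L1 = 0
        self.L2 = 0

    def pulse_C(self, D):       # normal mode: A = B = 0, C = 0 -> 1
        self.L1 = D

    def pulse_A(self, SI):      # test mode, AB = 10: sample scan input
        self.L1 = SI

    def pulse_B(self):          # test mode, AB = 01: move L1 into L2
        self.L2 = self.L1

cell = SRL()
cell.pulse_A(1)                 # AB = 10
cell.pulse_B()                  # AB = 01
print(cell.L2)  # 1: the scan bit has been shifted through L1 and L2
```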
Advantages of LSSD
1. Correct operation independent of AC characteristics is guaranteed.
2. FSM is reduced to combinational logic as far as testing is concerned.
3. Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
1. Complex design rules are imposed on designers. There is no freedom to vary from the
overall schemes. It increases the design complexity and hardware costs (4-20% more
hardware and 4 extra pins).
2. Asynchronous designs are not allowed in this approach.
3. Sequential routing of latches can introduce irregular structures.
4. Faults changing the combinational function to a sequential one may cause trouble, e.g., bridging
and CMOS stuck-open faults.
5. Test application becomes a slow process, and normal-speed testing of the entire test
sequence is impossible.
6. It is not good for memory intensive designs.
2.3.4 Random Access Scan
This approach was developed by Fujitsu and was used by Fujitsu, Amdahl, and TI.
It uses an address decoder. By using the address decoder we can select a particular FF and
either set it to any desired value or read out its value. Figure 39.6 shows a random access
structure and Figure 39.7 shows the RAM cell [1,6-7].
[Figure content: Fig. 39.6, a random access scan structure with combinational logic between PI and PO, an nff-bit RAM of scan flip-flops addressed through a log2(nff)-bit address decoder, and SCANIN/SCANOUT under CK and TC control; Fig. 39.7, the scan flip-flop (SFF) cell with D, SD, CK and TC connections]
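The set-or-read behavior of random access scan can be modeled in a few lines (structure and names invented for illustration):

```python
# Toy model of random access scan: an address decoder selects one scan
# flip-flop, which can then be set to any desired value or read out
# directly, without serial shifting through a chain.
class RandomAccessScan:
    def __init__(self, nff):
        self.ff = [0] * nff      # the nff addressable scan flip-flops

    def write(self, addr, bit):
        self.ff[addr] = bit      # decoder selects FF 'addr'; set its value

    def read(self, addr):
        return self.ff[addr]     # observe the selected FF

ras = RandomAccessScan(nff=8)
ras.write(5, 1)
print(ras.read(5), ras.read(0))  # 1 0
```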
The control input HOLD keeps the output steady at the previous state of the flip-flop.
For HOLD = 0, the latch holds its state, and for HOLD = 1, the hold latch becomes
transparent.
For normal mode operation, TC = HOLD = 1, and for scan mode, TC = 1 and HOLD = 0.
Hardware overhead increases by about 30% due to the extra hardware of the hold latch.
This approach reduces power dissipation and isolates the asynchronous part during scan.
It is suitable for delay testing [8].
[Figure content: scan-hold flip-flop built from an SFF plus a hold latch, with D, CK, TC and HOLD inputs and a connection to the SD input of the next SHFF]
Fig. 39.8 Scan-hold flip-flop (SHFF)
Partial Scan Design
In this approach only a subset of flip-flops is scanned. The main objectives of this
approach are to minimize the area overhead and scan sequence length while still
achieving the required fault coverage.
In this approach sequential ATPG is used to generate test patterns. Sequential ATPG has
a number of difficulties, such as poor initializability and poor controllability and
observability of the state variables. The number of gates, number of FFs and sequential
depth give little idea regarding testability, and the presence of cycles makes testing
difficult. Therefore the sequential circuit must be simplified in such a way that test
generation becomes easier.
Removal of selected flip-flops from scan improves performance and allows limited scan
design rule violations.
It also allows automation in scan flip-flop selection and test generation.
Figure 39.9 shows a design using partial scan architecture [1].
Sequential depth is calculated as the maximum number of FFs encountered from a PI line
to a PO line.
[Figure content: combinational circuit with PI and PO, non-scanned FFs clocked by CK1 and CK2, and scan flip-flops (SFF) on a SCANIN/SCANOUT chain controlled by TC]
Fig. 39.9 Design using partial scan structure
Things to be followed for a partial scan method
A minimum set of flip-flops must be selected, removal of which would eliminate all
cycles.
Break only the long cycles to keep overhead low.
All cycles other than self-loops should be removed.
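A greedy sketch of these guidelines, selecting scan flip-flops until only self-loops remain (an illustrative heuristic, not a published algorithm):

```python
# Greedy partial-scan selection: pick FFs whose removal breaks every
# cycle except self-loops in the FF dependency graph. The graph maps
# each FF to the set of FFs it feeds.
def has_cycle(graph, removed):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph[v]:
            if w == v or v in removed or w in removed:
                continue        # skip self-loops and already-scanned FFs
            if color[w] == GRAY or (color[w] == WHITE and dfs(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)

def select_scan_ffs(graph):
    scanned = set()
    while has_cycle(graph, scanned):
        # break ties by picking the FF that feeds the most other FFs
        v = max((v for v in graph if v not in scanned),
                key=lambda v: len(graph[v]))
        scanned.add(v)
    return scanned

g = {"f1": {"f2"}, "f2": {"f3"}, "f3": {"f1"}, "f4": {"f4"}}
scan_ffs = select_scan_ffs(g)
print(scan_ffs)  # one FF from the f1-f2-f3 cycle; the self-loop is kept
```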
3. Conclusions
Accessibility to internal nodes in complex circuitry is becoming a greater problem, and thus
it is essential that a designer consider how the IC will be tested and what extra structures will
be incorporated in the design. Scan design has been the backbone of design for testability in
the industry for a long time. Design automation tools are available for inserting scan into a
circuit and then generating test patterns. Overhead increases due to the scan insertion in a
circuit. In ASIC design, 10 to 15% scan overhead is generally accepted.
References
8. Consider the random-access scan architecture. How would you organize the test data to
minimize the total test time? Describe a simple heuristic for ordering these data.
9. Make a comparison of different scan variations in terms of scan overhead.
10. Consider the combinational circuit below, which has been partitioned into 3 cones (two
CONE Xs and one CONE Y) and one Exclusive-OR gate.
[Figure: circuit with primary inputs A–F; two CONE X blocks and one CONE Y produce internal signals G and H, which are combined through an XOR gate to drive outputs J and K]
For these two cones, we have the following information.
CONE X has a structure which can be tested 100% by using the following 4 vectors; its output is also specified.
A/G B/H C/F OUTPUT
0   0   1   0
0   1   1   0
1   1   0   1
1   0   0   1
CONE Y has a structure which can be tested 100% by using the following 4 vectors; its output is also specified.
C D E OUTPUT
0 0 1 0
0 1 0 1
1 0 1 1
1 1 1 0
Derive a smallest test set to test this circuit so that each partition is applied the required 4
test vectors. Also, the XOR gate should be exhaustively tested.
Fill in the blank entries below. (You may not add additional vectors).
A B C D E F G H J K
0 0 1 1 0
0 1 1 0
1 1 0 1 1
1 0 0 1
Module
8
Testing of Embedded
System
Version 2 EE IIT, Kharagpur 1
Lesson 40
Built-In-Self-Test (BIST) for Embedded Systems
Version 2 EE IIT, Kharagpur
Instructional Objectives
After going through this lesson the student would be able to
Explain the meaning of the term Built-in Self-Test (BIST)
Identify the main components of BIST functionality
Describe the various methods of test pattern generation for designing embedded systems with BIST
Define what a Signature Analysis Register is and describe some methods of designing such units
Explain what a Built-in Logic Block Observer (BILBO) is and describe how to use this block for designing BIST
[Figure: a test pattern generator and the primary inputs (PI) feed the CUT through a MUX; the CUT output drives the primary outputs (PO) and an output response compactor, whose signature is compared against a golden signature to give a Good/Faulty decision]
Fig. 40.1 A Typical BIST Architecture
As shown in Figure 40.1, the wires from primary inputs (PIs) to the MUX and the wires from the circuit output to primary outputs (POs) cannot be tested by BIST. In normal operation, the CUT receives its inputs from other modules and performs the function for which it was designed. During test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT, and the test responses are evaluated by an output response compactor. In the most common type of BIST, test responses are compacted in the output response compactor to form (fault) signatures. The response signatures are compared with reference golden signatures generated or stored on-chip, and the error signal indicates whether the chip is good or faulty.
Four primary parameters must be considered in developing a BIST methodology for embedded
systems; these correspond with the design parameters for on-line testing techniques discussed in
earlier chapter [2].
Fault coverage: This is the fraction of faults of interest that can be exposed by the test patterns produced by the pattern generator and detected by the output response monitor. In the presence of input bit stream errors there is a chance that the computed signature matches the golden signature and the circuit is reported as fault-free. This undesirable property is called masking or aliasing.
Test set size: This is the number of test patterns produced by the test generator, and is
closely linked to fault coverage: generally, large test sets imply high fault coverage.
Hardware overhead: The extra hardware required for BIST is considered to be overhead. In most embedded systems, high hardware overhead is not acceptable.
Performance overhead: This refers to the impact of BIST hardware on normal circuit performance, such as its worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
o g
Issues for BIST
Area overhead: Additional active area due to the test controller, pattern generator, response evaluator and testing of BIST hardware.
Pin overhead: At least 1 additional pin is needed to activate the BIST operation. The input MUX adds extra pin overhead.
Performance overhead: Extra path delays are added due to BIST.
Yield loss increases due to increased chip area.
Design effort and time increase due to BIST design.
The BIST hardware complexity increases when the BIST hardware is made testable.
Benefits of BIST
It reduces testing and maintenance cost, as it requires simpler and less expensive ATE.
BIST significantly reduces cost of automatic test pattern generation (ATPG).
It reduces storage and maintenance of test patterns.
It can test many units in parallel.
It takes shorter test application times.
It can test at functional system speed.
BIST can be used for non-concurrent, on-line testing of the logic and memory parts of a system
[2]. It can readily be configured for event-triggered testing, in which case, the BIST control can
be tied to the system reset so that testing occurs during system start-up or shutdown. BIST can
also be designed for periodic testing with low fault latency. This requires incorporating a testing
process into the CUT that guarantees the detection of all target faults within a fixed time.
On-line BIST is usually implemented with the twin goals of complete fault coverage and low
fault latency. Hence, the test generation (TG) and response monitor (RM) are generally designed
to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable set
size. These goals are met by different techniques in different parts of the system.
TG and RM are often implemented by simple, counter-like circuits, especially linear-feedback
shift registers (LFSRs) [3]. The LFSR is simply a shift register formed from standard flip-flops,
with the outputs of selected flip-flops being fed back (modulo-2) to the shift register's inputs.
When used as a TG, an LFSR is set to cycle rapidly through a large number of its states. These
states, whose choice and order depend on the design parameters of the LFSR, define the test
patterns. In this mode of operation, an LFSR is seen as a source of (pseudo) random tests that
are, in principle, applicable to any fault and circuit types. An LFSR can also serve as an RM by
counting (in a special sense) the responses produced by the tests. An LFSR RM's final contents
after applying a sequence of test responses forms a fault signature, which can be compared to a
known or generated good signature, to see if a fault is present. Ensuring that the fault coverage is
sufficiently high and the number of tests is sufficiently low are the main problems with random
BIST methods. Two general approaches have been proposed to preserve the cost advantages of LFSRs while making the generated test sequence much shorter. Test points can be inserted in the CUT to improve controllability and observability; however, they can also result in performance loss. Alternatively, some determinism can be introduced into the generated test sequence, for example, by inserting specific seed tests that are known to detect hard faults.
A typical BIST architecture using an LFSR is shown in Figure 40.2 [4]. Since the output patterns of the LFSR are time-shifted and repeated, they become correlated; this reduces the effectiveness of the fault detection. Therefore a phase shifter (a network of XOR gates) is often used to decorrelate the output patterns of the LFSR. The response of the CUT is usually compacted by a multiple-input signature register (MISR) to a small signature, which is compared with a known fault-free signature to determine whether the CUT is faulty.
[Figure: an LFSR drives a phase shifter, which feeds scan chains 1 through n (l bits each); the chain outputs are compacted by a MISR]
Fig. 40.2 A generic BIST architecture based on an LFSR, an MISR, and a phase shifter
[Figure: a binary counter with Clock and Reset inputs and outputs Q1, Q2, Q3 cycling through all input combinations]
Fig. 40.3 Exhaustive pattern generator
2.3 Pseudo-exhaustive patterns
In pseudo-exhaustive pattern generation, the circuit is partitioned into several smaller sub-circuits based on the output cones of influence, possibly overlapping blocks with fewer than n inputs. Then all possible test patterns are exhaustively applied to each sub-circuit. The main goal of pseudo-exhaustive test is to obtain the same fault coverage as exhaustive testing and, at the same time, minimize the testing time. Since close to 100% fault coverage is guaranteed, there is no need for fault simulation for exhaustive testing and pseudo-exhaustive testing. However, such a method requires extra design effort to partition the circuits into pseudo-exhaustive testable sub-circuits. Moreover, the delivery of test patterns and test responses is also a major consideration. The added hardware may also increase the overhead and decrease the performance.
[Figure 40.4: two five-bit binary counters drive inputs X1–X8 through 2-to-1 MUXes (select 0 for counter 1, 1 for counter 2); cone 1 produces output h from X1–X5 and cone 2 produces output f from X4–X8]
Circuit partitioning for pseudo-exhaustive pattern generation can be done by cone segmentation as shown in Figure 40.4. Here, a cone is defined as the fan-ins of an output pin. If the size of the largest cone is K, the patterns must guarantee that the patterns applied to any K inputs contain all possible combinations. In Figure 40.4, the total circuit is divided into two cones based on the cones of influence. For cone 1 the PO h is influenced by X1, X2, X3, X4 and X5, while PO f is influenced by inputs X4, X5, X6, X7 and X8. Therefore the total number of test patterns needed for exhaustive testing of cone 1 and cone 2 is (2^5 + 2^5) = 64, but the original circuit with 8 inputs requires 2^8 = 256 test patterns for an exhaustive test.
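The saving claimed above is easy to check numerically. A minimal sketch (cone input sets taken from the Figure 40.4 description):

```python
# Pseudo-exhaustive vs. exhaustive pattern counts for the circuit of Fig. 40.4.
cone1_inputs = {"X1", "X2", "X3", "X4", "X5"}   # cone 1 drives PO h
cone2_inputs = {"X4", "X5", "X6", "X7", "X8"}   # cone 2 drives PO f

# Each cone is tested exhaustively on its own inputs:
pseudo_exhaustive = 2 ** len(cone1_inputs) + 2 ** len(cone2_inputs)

# The whole circuit tested exhaustively on all 8 inputs:
exhaustive = 2 ** len(cone1_inputs | cone2_inputs)

print(pseudo_exhaustive, exhaustive)   # 64 256
```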
In general, this requires more patterns than deterministic ATPG, but fewer than the exhaustive test. In contrast with other methods, pseudo-random pattern BIST may require a long test time and necessitate evaluation of fault coverage by fault simulation. This pattern type, however, has the potential for lower hardware and performance overheads and less design effort than the preceding methods. In pseudorandom test patterns, each bit has an approximately equal probability of being a 0 or a 1. The number of patterns applied is typically of the order of 10^3 to 10^7 and is related to the circuit's testability and the fault coverage required.
Linear feedback shift register reseeding [5] is an example of a BIST technique that is based on controlling the LFSR state. LFSR reseeding may be static, that is, the LFSR stops generating patterns while loading seeds, or dynamic, that is, test generation and seed loading can proceed simultaneously. The length of the seed can be either equal to the size of the LFSR (full reseeding) or less than the LFSR (partial reseeding). In [5], a dynamic reseeding technique that allows partial reseeding is proposed to encode test vectors. A set of linear equations is solved to obtain the seeds, and test vectors are ordered to facilitate the solution of this set of linear equations.
[Figure: a chain of D flip-flops X_{n-1}, X_{n-2}, ..., X_1, X_0 with feedback coefficients h_{n-1}, h_{n-2}, ..., h_2, h_1 selecting which stage outputs are XORed back into the input]
Fig. 40.5 Standard Linear Feedback Shift Register
Figure 40.5 shows a standard, external exclusive-OR linear feedback shift register. There are n flip-flops (X_{n-1}, ..., X_0), and this is called an n-stage LFSR. It can be a near-exhaustive test pattern generator, as it cycles through 2^n - 1 states, excluding the all-0 state. This is known as a maximal-length LFSR. Figure 40.6 shows the implementation of an n-stage LFSR with an actual digital circuit [1].
[Figure: the n-stage LFSR of Fig. 40.5 realized with D flip-flops, XOR gates and feedback coefficients h_{n-1}, ..., h_1, driven by a common Clock]
Fig. 40.6 n-stage LFSR implementation with actual digital circuit
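The maximal-length property can be checked with a short simulation. A sketch of a 4-stage external-XOR LFSR; the tap positions are an illustrative choice corresponding to a primitive polynomial:

```python
def lfsr_patterns(taps, seed, n_bits, count):
    """Generate `count` states of an n_bits-stage external-XOR LFSR.
    `taps` lists the bit positions whose modulo-2 sum is shifted in."""
    state, patterns = seed, []
    for _ in range(count):
        patterns.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1            # modulo-2 feedback sum
        state = ((state << 1) | fb) & ((1 << n_bits) - 1)
    return patterns

# A 4-stage LFSR with a primitive polynomial cycles through all
# 2^4 - 1 = 15 non-zero states before repeating.
pats = lfsr_patterns(taps=[3, 0], seed=0b0001, n_bits=4, count=15)
print(len(set(pats)))   # 15 distinct non-zero states
```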
exploited to find the seeds needed to cover the given set of deterministic patterns. Width compression is combined with reseeding to reduce the hardware overhead. In a two-dimensional test data compression technique, an LFSR and a folding counter are combined for scan-based BIST. LFSR reseeding is used to reduce the number of bits to be stored for each pattern (horizontal compression) and folding counter reseeding is used to reduce the number of patterns (vertical compression).
2.6 Weighted Pseudo-random Pattern Generation
Bit-flipping [9], bit-fixing, and weighted random BIST [1,8] are examples of techniques that rely on altering the patterns generated by the LFSR to embed deterministic test cubes. A hybrid between pseudorandom and stored-pattern BIST, weighted pseudorandom pattern BIST is effective for dealing with hard-to-detect faults. In a pseudorandom test, each input bit has a probability of 1/2 of being either a 0 or a 1. In a weighted pseudorandom test, the probabilities, or input weights, can differ. The essence of weighted pseudorandom testing is to bias the probabilities of the input bits so that the tests needed for hard-to-detect faults are more likely to occur. One approach uses software that determines a single or multiple weight sets based on a probabilistic analysis of the hard-to-detect faults. Another approach uses a heuristic-based initial weight set followed by additional weight sets produced with the help of an ATPG system. The weights are either realized by logic or stored in on-chip ROM. With these techniques, researchers obtained fault coverage over 98% for 10 designs, which is the same as the coverage of deterministic test vectors.
In a hybrid BIST method based on weighted pseudorandom testing, a weight of 0, 1, or 1/2 (unbiased) is assigned to each scan chain in the CUT. The weight sets are compressed and stored on the tester. During test application, an on-chip lookup table is used to decompress the data from the tester and generate weight sets. In order to reduce the hardware overhead, scan cells are carefully reordered and a special ATPG approach is used to generate suitable test cubes.
[Figure: an 8-stage shift register (X7–X0) with gating and inversion logic on selected outputs to realize weighted pattern bits]
Fig. 40.7 Weighted pseudo-random pattern generator
[Figure: (a) an LFSR whose outputs are combined through AND/OR gates to produce weights such as 1/8, 3/4, 1/2 and 7/8; (b) a cellular automaton producing weights such as 0.8, 0.6, 0.5, 0.4 and 0.3]
Fig. 40.8 Weighted pseudorandom patterns
Figure 40.7 shows a weighted pseudo-random pattern generator implemented with programmable probabilities of generating zeros and ones at the PIs. As we know, an LFSR generates patterns with equal probability of 1s and 0s. As shown in Figure 40.8(a), if a 3-input AND gate is used, the probability of 1s becomes 0.125; if a 2-input OR gate is used, the probability becomes 0.75. Second, one can use cellular automata to produce patterns of desired weights, as shown in Figure 40.8(b).
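These weight values follow directly from the gate truth tables: ANDing k equiprobable bits gives weight 2^(-k) and ORing them gives 1 - 2^(-k). A quick enumeration confirms the two cases quoted above:

```python
from itertools import product

def weight(gate, k):
    """Probability that a k-input gate outputs 1 when each input bit
    is 0 or 1 with equal probability (enumerate all 2^k combinations)."""
    ones = sum(gate(bits) for bits in product((0, 1), repeat=k))
    return ones / 2 ** k

print(weight(lambda b: int(all(b)), 3))   # 3-input AND -> 0.125
print(weight(lambda b: int(any(b)), 2))   # 2-input OR  -> 0.75
```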
2.7 Cellular Automata for Pattern Generation
Cellular automata are excellent for pattern generation because they have a better randomness distribution than LFSRs; there is no shift-induced bit value correlation. A cellular automaton is a collection of cells with regular connections. Each pattern generator cell has a few logic gates and a flip-flop, and is connected only to its local neighbors. If C_i is the state of the current CA cell, C_{i+1} and C_{i-1} are the states of its neighboring cells. The next state of cell C_i is determined by (C_{i-1}, C_i, C_{i+1}). The cell is replicated to produce the cellular automaton. The two commonly used CA structures are shown in Figure 40.9.
[Figure 40.9: two common one-dimensional CA structures built from D flip-flops with XOR next-state logic; the boundary cells are tied to 0]
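The neighborhood update rule described above can be modeled in a few lines. A sketch of a one-dimensional CA with null (zero) boundary cells using the common rule-90 update, next C_i = C_{i-1} XOR C_{i+1}; the rule choice is an illustrative assumption (rule-90/150 hybrids are typical in practice):

```python
def ca_step(cells):
    """One rule-90 update with zero boundaries: each cell's next state
    is the XOR of its left and right neighbors."""
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ padded[i + 1] for i in range(1, len(cells) + 1)]

state = [0, 0, 0, 1, 0, 0, 0]        # seed with a single 1
for _ in range(3):
    state = ca_step(state)
print(state)   # [1, 0, 1, 0, 1, 0, 1]
```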
patterns, then the CUT response to RM will be 1 billion bits. This is not manageable in practice.
So it is necessary to compact this enormous amount of circuit responses to a manageable size
that can be stored on the chip. The response analyzer compresses a very long test response into a
single word. Such a word is called a signature. The signature is then compared with the prestored
golden signature obtained from the fault-free responses using the same compression mechanism.
If the signature matches the golden copy, the CUT is regarded fault-free. Otherwise, it is faulty.
There are different response analysis methods such as ones count, transition count, syndrome
count, and signature analysis.
Compression: A reversible process used to reduce the size of the response. It is difficult in hardware.
Compaction: An irreversible (lossy) process used to reduce the size of the response.
d) Cyclic Redundancy Check (CRC): It is also called a signature. It computes a CRC check word on the bit stream.
Signature analysis: Compacts the good-machine response into a good-machine signature. The actual signature is generated during testing and compared with the good-machine signature.
Aliasing: Compression is like a function that maps a large input space (the response) into a small output space (the signature). It is a many-to-one mapping. Errors may occur in the input bit stream; therefore, a faulty response may have a signature that matches the golden signature, and the circuit is reported as fault-free. Such a situation is referred to as aliasing or masking. The aliasing probability is the probability that a faulty response is treated as fault-free. It is defined as follows:
Let us assume that the possible input patterns are uniformly distributed over the possible mapped signature values. There are 2^m input patterns, 2^r signatures, and 2^(m-r) input patterns map into a given signature. Then the aliasing or masking probability is

P(M) = (number of erroneous inputs that map into the golden signature) / (number of faulty input responses)
     = (2^(m-r) - 1) / (2^m - 1)
     ≈ 2^(m-r) / 2^m   (for large m)
     = 1 / 2^r
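Plugging small numbers into the expression shows how quickly the exact value approaches 1/2^r. A sketch (the m, r values are illustrative):

```python
def aliasing_probability(m, r):
    """Exact aliasing probability when 2^m responses map uniformly
    onto 2^r signatures: (2^(m-r) - 1) / (2^m - 1)."""
    return (2 ** (m - r) - 1) / (2 ** m - 1)

for m, r in [(16, 8), (32, 16), (64, 16)]:
    print(m, r, aliasing_probability(m, r), 2 ** -r)
```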
The aliasing probability is one of the major considerations in response analysis. Due to the many-to-one mapping property of the compression, it is difficult to perform diagnosis after compression; the diagnosis resolution is very poor after compression. In addition to the aliasing probability, hardware overhead and hardware compatibility are also important issues. Here, hardware compatibility refers to how well the BIST hardware can be incorporated in the CUT or DFT.
[Figure: test patterns drive the CUT; the CUT output enables a counter, clocked with the patterns, that accumulates the number of 1s]
Fig. 40.10 Ones count compression circuit structure
For an N-bit test length with r ones, the masking probability is obtained as follows:

Number of masking sequences = C(N, r) - 1

There are 2^N possible output sequences, only one of which is fault-free.

Masking probability: P(M) = (C(N, r) - 1) / (2^N - 1)

where C(N, r) is the binomial coefficient "N choose r".
[Figure: transition count compression — the CUT output and a one-cycle delayed copy (D flip-flop) are compared, and the counter accumulates the number of transitions]
Number of sequences with r transition counts: 2·C(N-1, r). Again, only one of them is fault-free.

Masking probability: P(M) = (2·C(N-1, r) - 1) / (2^N - 1)
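Both counting compactors, and the two masking-sequence formulas above, can be checked by brute-force enumeration over all 2^N responses. A sketch with an illustrative 8-bit fault-free response:

```python
from itertools import product
from math import comb

def ones_count(resp):
    return sum(resp)

def transition_count(resp):
    return sum(a != b for a, b in zip(resp, resp[1:]))

N = 8
good = (1, 0, 1, 1, 0, 0, 1, 0)      # assumed fault-free response

# Faulty responses that alias under each compaction scheme:
alias_ones = sum(1 for r in product((0, 1), repeat=N)
                 if ones_count(r) == ones_count(good) and r != good)
alias_trans = sum(1 for r in product((0, 1), repeat=N)
                  if transition_count(r) == transition_count(good) and r != good)

print(alias_ones == comb(N, ones_count(good)) - 1)                 # True
print(alias_trans == 2 * comb(N - 1, transition_count(good)) - 1)  # True
```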
3.3 Syndrome Testing
Syndrome is defined as the probability of ones of the CUT output response. The syndrome is 1/8 for a 3-input AND gate and 7/8 for a 3-input OR gate if the inputs have equal probability of ones and zeros. Figure 40.12 shows a BIST circuit structure for the syndrome count. It is very similar to ones count and transition count; the difference is that the final count is divided by the number of patterns applied. The most distinguishing feature of syndrome testing is that the syndrome is independent of the implementation: it is solely determined by the function of the circuit.
[Figure: a random test pattern source drives the CUT; a counter accumulates the ones count, which is divided by the pattern count to give the syndrome]
Fig. 40.12 Syndrome testing circuit structure
The original design of syndrome test applies exhaustive patterns. Hence, the syndrome is S = K / 2^n, where n is the number of inputs and K is the number of minterms. A circuit is syndrome testable if all single stuck-at faults are syndrome detectable. The interesting part of syndrome testing is that any function can be designed to be syndrome testable.
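Because the syndrome depends only on the function, it can be computed by truth-table enumeration. A sketch, covering the AND/OR values quoted earlier and the function f = AB + BC used in problem 14:

```python
from itertools import product

def syndrome(f, n):
    """Syndrome S = K / 2^n: the fraction of input combinations for
    which the n-input Boolean function f is 1 (K = number of minterms)."""
    K = sum(f(*bits) for bits in product((0, 1), repeat=n))
    return K / 2 ** n

print(syndrome(lambda a, b, c: a & b & c, 3))          # 0.125 (= 1/8)
print(syndrome(lambda a, b, c: a | b | c, 3))          # 0.875 (= 7/8)
print(syndrome(lambda a, b, c: (a & b) | (b & c), 3))  # 0.375 (= 3/8)
```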
Let the LFSR output sequence be

{a_m} = {a_0, a_1, a_2, ...}

where a_i = 1 or 0 depending on the output state at time t_i. The initial states are a_{-n}, a_{-n+1}, ..., a_{-2}, a_{-1}. The recurrence relation defining {a_m} is

a_m = Σ_{i=1..n} c_i a_{m-i}   (modulo 2)

where c_i = 0 means the corresponding output is not fed back, and c_i = 1 otherwise. Define the generating function

G(x) = Σ_{m=0..∞} a_m x^m = Σ_{m=0..∞} Σ_{i=1..n} c_i a_{m-i} x^m
     = Σ_{i=1..n} c_i x^i Σ_{m=0..∞} a_{m-i} x^{m-i}
     = Σ_{i=1..n} c_i x^i [ (a_{-i} x^{-i} + ... + a_{-1} x^{-1}) + Σ_{m=0..∞} a_m x^m ]

Solving for G(x),

G(x) = [ Σ_{i=1..n} c_i x^i (a_{-i} x^{-i} + ... + a_{-1} x^{-1}) ] / [ 1 - Σ_{i=1..n} c_i x^i ]

G(x) has been expressed in terms of the initial state and the feedback coefficients. The denominator of G(x), f(x) = 1 - Σ_{i=1..n} c_i x^i, is called the characteristic polynomial of the LFSR.
3.5 LFSR for Response Compaction: Signature Analysis
It uses a cyclic redundancy check code (CRCC) generator (LFSR) as the response compacter.
In this method, data bits from the circuit POs to be compacted are represented as a decreasing-order coefficient polynomial.
The CRCC divides the PO polynomial by its characteristic polynomial, which leaves the remainder of the division in the LFSR. The LFSR must be initialized to a known seed value (usually 0) before testing.
After testing, the signature in the LFSR is compared to the good-machine signature.
For an output sequence of length N, there is a total of 2^N - 1 faulty sequences. Let the input sequence be represented as P(x), with P(x) = Q(x)G(x) + R(x), where G(x) is the characteristic polynomial, Q(x) is the quotient, and R(x) is the remainder, or signature. For an aliasing faulty sequence, the remainder R(x) is the same as the fault-free one. Since P(x) is of order N and G(x) is of order r, Q(x) has an order of N - r. Hence, there are 2^(N-r) possible Q(x) or P(x), one of which is fault-free. Therefore, the aliasing probability is:

P(M) = (2^(N-r) - 1) / (2^N - 1) ≈ 2^(-r) for large N.

The masking probability is independent of the input sequence.
[Figure: the serial stream 01010001 is shifted into a five-stage modular LFSR (X0–X4) whose feedback taps correspond to the polynomial terms 1, x, x^2, x^3, x^4]
Fig. 40.14 Modular LFSR as a response compactor
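The division P(x) = Q(x)G(x) + R(x) that the LFSR performs can be mimicked with polynomial arithmetic over GF(2). A sketch representing polynomials as coefficient lists, highest degree first; the 8-bit stream and the polynomial G(x) = x^4 + x + 1 are illustrative choices:

```python
def signature(bits, poly):
    """Remainder R(x) of the response polynomial P(x) divided by the
    characteristic polynomial G(x), all coefficients modulo 2.
    `poly` lists G(x)'s coefficients, e.g. x^4 + x + 1 -> [1, 0, 0, 1, 1]."""
    r = len(poly) - 1                 # degree of G(x) = signature width
    work = list(bits)
    for i in range(len(work) - r):
        if work[i]:                   # cancel the leading term:
            for j, c in enumerate(poly):
                work[i + j] ^= c      # subtract (XOR) a shifted copy of G(x)
    return work[-r:]                  # r-bit signature

# P(x) = x^6 + x^4 + 1 divided by G(x) = x^4 + x + 1
print(signature([0, 1, 0, 1, 0, 0, 0, 1], [1, 0, 0, 1, 1]))  # [1, 1, 1, 0]
```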
Any divisor polynomial G(x) with two or more non-zero coefficients will detect all
single-bit errors.
S_i(x) = S_{i,m-1} x^{m-1} + S_{i,m-2} x^{m-2} + ... + S_{i,1} x + S_{i,0}
S_{i+1}(x) = R_i(x) + x·S_i(x) mod G(x)
where G(x) is the characteristic polynomial. Assume the initial state of the MISR is 0. So,
S_0(x) = 0
S_1(x) = R_0(x) + x·S_0(x) mod G(x) = R_0(x)
S_2(x) = R_1(x) + x·S_1(x) mod G(x) = R_1(x) + x·R_0(x) mod G(x)
...
S_n(x) = x^{n-1} R_0(x) + x^{n-2} R_1(x) + ... + x·R_{n-2}(x) + R_{n-1}(x) mod G(x)
This is the signature left in the MISR after n patterns are applied. Let us consider an n-bit response compactor with an m-bit error polynomial. Then the error polynomial is of degree (m+n-2), which gives (2^(m+n-1) - 1) non-zero values. G(x) has 2^(n-1) - 1 nonzero multiples that result in polynomials of degree <= m+n-2.

Probability of masking: P(M) = (2^(n-1) - 1) / (2^(m+n-1) - 1) ≈ 1 / 2^m
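The recurrence S_{i+1}(x) = R_i(x) + x·S_i(x) mod G(x) maps directly onto integer bit operations. A sketch of an m-bit MISR; the response words and G(x) = x^4 + x + 1 are illustrative choices:

```python
def misr_signature(responses, poly, m):
    """Fold parallel m-bit response words into an m-bit MISR signature
    using S_{i+1} = R_i XOR (x * S_i mod G).  `poly` encodes G(x)
    including its x^m term, e.g. x^4 + x + 1 -> 0b10011."""
    s = 0
    for r in responses:
        s <<= 1                # multiply S_i(x) by x
        if s >> m:             # degree reached m: reduce modulo G(x)
            s ^= poly
        s ^= r                 # add the next response word R_i(x)
    return s

sig = misr_signature([0b1011, 0b0110, 0b1110, 0b0001], poly=0b10011, m=4)
print(bin(sig))   # 0b10
```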
[Figure: a BILBO register placed between two CUTs; control inputs B1 and B2 select among normal, scan (S1 to SO), pattern generation (LFSR) and response compaction (MISR) modes for the flip-flop chain D1–Dn]
Fig. 40.17 BILBO Example
[Figure: (a) example test configuration — an LFSR, CUT A, BILBO 1, CUT B, BILBO 2, CUT C and a MISR connected in a chain]
Fig. 40.18 Circuit configured with BILBO
Phase 1
In this mode of operation BILBO1 operates in MISR mode and BILBO2 operates in LFSR mode. CUT A and CUT C are tested in parallel.
Phase 2
In this mode of operation BILBO1 operates in LFSR mode and BILBO2 operates in MISR mode. Only CUT B is tested in this mode of operation.
responses, the test speed is much slower than the test-per-clock approach. The number of clocks required for a test cycle is the maximum of the scan stages of the input and output scan registers. Also falling in this category are CEBS, LOCST, and STUMPS.
[Figure: two test-per-scan configurations — an LFSR scan register (SRI) feeds the CUT through SI, and the CUT output SO is captured by a MISR scan register (SRO)]
[Figure: STUMPS architecture — a pseudo-random test pattern generator and input phase-shifting network drive scan chains SR1 ... SRn through the CUTs, and the chain outputs are compacted by a MISR]
[Figure: PLA inputs pass through input decoders and the AND/OR array to output buffers and PLA outputs]
Fig. 40.21 A general structure of a PLA
As mentioned earlier in the fault model section, PLAs have the following faults: stuck-at faults, bridging faults, and crosspoint faults. Test generation for PLAs is more difficult than that for conventional logic, because PLAs have more complicated fault models. Further, a typical PLA may have as many as 50 inputs, 67 outputs, and 190 product terms [10-11]. Functional testing of such PLAs can be a difficult task. PLAs often contain unintentional and unidentifiable redundancy which might cause fault masking. Furthermore, PLAs are often embedded in logic, which complicates test application and response observation. Therefore, many people have proposed the use of BIST to handle the testing of PLAs.
5. BIST Applications
Manufacturers are increasingly employing BIST in real products. Examples of such applications are given to illustrate the use of BIST in the semiconductor, communications, and computer industries.
meets the burn-in requirements with little additional logic.
5.4 ALU-Based Programmable MISR of MC68HC11 [15]
Broseghini and Lenhert implemented an ALU-based self-test system on an MC68HC11 Family microcontroller. A fully programmable pseudorandom pattern generator and MISR are used to reduce test length and aliasing probabilities. They added microcode to configure the ALU into an LFSR or MISR; the adder is transformed into an LFSR by forcing the carry input to 0. With this feature the hardware overhead is minimized: it is only 25% compared to an implementation with dedicated hardware.
References
[5] C. V. Krishna, A. Jas, and N. A. Touba, Test vector encoding using partial LFSR reseeding, in Proceedings of the International Test Conference, 2001, pp. 885-893.
[6] J. Rajski, J. Tyszer, and N. Zacharia, Test data decompression for multiple scan designs with boundary scan, IEEE Transactions on Computers, 47, pp. 1188-1200, 1998.
[7] N. A. Touba and E. J. McCluskey, Altering a pseudo-random bit sequence for scan-based BIST, in Proceedings of the International Test Conference, 1996, pp. 167-175.
[8] S. Wang, Low hardware overhead scan based 3-weight weighted random BIST, in Proceedings of the International Test Conference, 2001, pp. 868-877.
[9] H. J. Wunderlich and G. Kiefer, Bit-flipping BIST, in Proceedings of the International Conference on Computer-Aided Design, 1996, pp. 337-343.
[10] C. Y. Liu, K. K. Saluja, and J. S. Upadhyaya, BIST-PLA: A Built-in Self-Test Design of Large Programmable Logic Arrays, Proc. 24th Design Automation Conf., June 1987, pp. 385-391.
[11] C. Y. Liu and K. K. Saluja, Built-In Self-Test Techniques for Programmable Logic Arrays, in VLSI Fault Modeling and Testing Techniques, G. W. Zobrist, ed., Ablex Publishing, Norwood, N.J., 1993.
[12] P. Gelsinger, Design and Test of the 80386, IEEE Design & Test of Computers, Vol. 4, No. 3, June 1987, pp. 42-50.
[13] I. M. Ratiu and H. B. Bakoglu, Pseudorandom Built-In Self-Test Methodology and Implementation for the IBM RISC System/6000 Processor, IBM J. Research and Development, Vol. 34, 1990, pp. 78-84.
[14] A. L. Crouch, M. Pressly, and J. Circello, Testability Features of the MC68060 Microprocessor, Proc. Intl Test Conf., 1994, pp. 60-69.
[15] J. Broseghini and D. H. Lenhert, An ALU-Based Programmable MISR/Pseudorandom Generator for a MC68HC11 Family Self-Test, Proc. Intl Test Conf., 1993, pp. 349-358.
Problems
1. What is Built-In-Self-Test? Discuss the issues and benefits of BIST. Describe BIST
architecture and its operation.
it y
2. Excluding the circuit under test, what are the four basic components of BIST and what
.c
function does each component perform?
w
3. Which two BIST components are necessary for system-level testing and why?
w
4. What are the different techniques for test pattern generation?
w
5. Discuss exhaustive and pseudo-exhaustive pattern generation. Give an example to show that pseudo-exhaustive testing requires fewer test patterns than exhaustive testing.
6. What is pseudorandom pattern generation? What is an LFSR? Describe pattern
generation using LFSR.
7. Make a comparison of different test strategies based on fault coverage, hardware
overhead, test time overhead and design effort.
8. An LFSR based signature register compresses an n-bit input pattern into an m-bit
signature. Derive an expression for the probability of aliasing. Clearly state any
assumptions you make.
9. Design a weighted pseudo-random pattern generator with programmable weights 1/2, 1/4,
11/32 and 1/16.
10. Prove that the number of 1s in an m-sequence differs from the number of 0s by one.
11. Consider a LFSR based pattern generator where the feedback network is a single XOR
gate before the first stage. If the number of (feedback) inputs to the XOR is odd, is it
possible for the LFSR to generate maximal length sequence? Justify or contradict.
12. Show the schematic diagram of a 4-bit BILBO register.
13. A given data path has p number of n-bit registers. For having BIST capability, suppose
a% of the registers are converted to BILBO. Estimate the percentage overhead in the
registers in terms of extra hardware. All gates may be assumed to have unit cost in your
calculation.
14. It is said that by adding some extra hardware, a combinational circuit can be made
syndrome testable for single stuck-at faults. Illustrate the process for a circuit realizing
the Boolean function f = AB + BC.
15. Define the following:
a) Compression
b) Compaction
c) Signature analysis
d) Aliasing or masking
o m
16. Describe different response compaction techniques.
o t.c
17. What are different types of LFSR? What is modular LFSR? What is characteristic
polynomial?
s p
18. Implement a standard LFSR for the characteristic polynomial f(x) = x^8 + x^7 + x^2 + 1.
19. Given the polynomial P(x) = x^4 + x^2 + x + 1:
a) Design an external-feedback LFSR with characteristic polynomial P(x). Draw a logic diagram for the complete register.
b) Determine the resultant signature that would be obtained for the following serial sequence of output responses produced by a known good CUT, assuming the SAR is initialized to the all-0s state. Give the binary value of the resultant signature as it would be contained in the SAR in your logic diagram above.
101001010010 (in time order)
22. What is MISR? Give architecture of an m-stage MISR and derive its signature. What is
the masking probability of MISR?
23. Describe with example and diagram what are test-per-clock system and test-per-scan
system. What is the difference between them?
24. What is BILBO? Describe the BILBO architecture and its operation.
25. Describe how BILBO is implemented in digital circuits.
26. Describe STUMPS testing system and its test procedure.
27. Give some examples of practical BIST application in industry.
Lesson 41
Boundary Scan Methods and Standards
Instructional Objectives
After going through this lesson the student would be able to
circuits to analog or mixed-mode circuits. It is now widely accepted in industry and has been considered an industry standard in most large IC system designs. Boundary scan, as defined by the IEEE Std. 1149.1 standard [1-3], is an integrated method for testing interconnects on printed circuit boards that is implemented at the IC level. Earlier, most Printed Circuit Board (PCB) testing was done using bed-of-nails in-circuit test equipment. Recent advances in VLSI technology now enable microprocessors and Application Specific Integrated Circuits (ASICs) to be packaged into fine-pitch, high-count packages. The miniaturization of device packaging, the development of surface-mounted packaging, and double-sided and multi-layer boards to accommodate the extra interconnects between the increased density of devices on the board reduce the physical accessibility of test points for traditional bed-of-nails in-circuit testers and pose a great challenge to testing manufacturing defects in the future. The long-term solution to this reduction in physical probe access was to consider building the access inside the device, i.e., a boundary scan register. In 1985, a group of European companies formed the Joint European Test Action Group (JETAG), and by 1988 the Joint Test Action Group (JTAG) had been formed by several companies to tackle these challenges. The JTAG developed a specification for boundary-scan testing that was standardized in 1990 by the IEEE as IEEE Std. 1149.1-1990. In 1993 a new revision of the IEEE Std. 1149.1 standard was introduced (1149.1a); it contained many clarifications, corrections, and enhancements. In 1994, a supplement containing a description of the Boundary-Scan Description Language (BSDL) was added to the standard. Since that time, this standard has been adopted by major electronics companies all over the world. Applications are found in high-volume, high-end consumer products, telecommunication products, defense systems, computers, peripherals, and avionics. Now, due to its economic advantages, smaller companies that cannot afford expensive in-circuit testers are using boundary scan. Figure 41.1 gives an overview of the boundary scan family, now known as the IEEE 1149.x standards.
required to test advanced digital networks that are not fully covered by IEEE Std. 1149.1, such as networks that are AC-coupled, differential, or both.

The IEEE 1532 Standard is developed for In-System Configuration of Programmable Devices [5]. This extension of 1149.1 standardizes the programming access and methodology for programmable integrated circuit devices. Devices such as CPLDs and FPGAs, regardless of vendor, that implement this standard may be configured (written), read back, erased and verified, singly or concurrently, with a standardized set of resources based upon the algorithm description contained in the 1532 BSDL file. JTAG Technologies programming tools contain support for 1532-compliant devices and automatically generate the applications.

Clearly, the testing of mixed-mode circuits at the various levels of integration will be a critical test issue for system-on-chip design. Therefore there is a demand to combine all the boundary scan standards into an integrated one.
circuits on a board without using physical test probes. It adds a boundary-scan cell, which includes a multiplexer and latches, to each pin on the device. Figure 41.2 [1] illustrates the main elements of a universal boundary-scan device.
Figure 41.2 shows the following elements:

A Test Access Port (TAP) with a set of four dedicated test pins: Test Data In (TDI), Test Mode Select (TMS), Test Clock (TCK), Test Data Out (TDO), and one optional test pin, Test Reset (TRST*).
A boundary-scan cell on each device primary input and primary output pin, connected internally to form a serial boundary-scan register (Boundary Scan).
A TAP controller with inputs TCK, TMS, and TRST*.
An n-bit (n >= 2) instruction register holding the current instruction.
A 1-bit Bypass register (Bypass).
An optional 32-bit Identification register capable of being loaded with a permanent device identification code.
Test Clock (TCK): test instructions and data are loaded from system input pins on the rising edge of TCK and driven through system output pins on its falling edge. TCK is pulsed by the equipment controlling the test and not by the tested device. It can be pulsed at any frequency (up to a maximum of some MHz), and can even be pulsed at varying rates.
Test Data Input (TDI): an input line to allow the test instruction and test data to be loaded
into the instruction register and the various test data registers, respectively.
Test Data Output (TDO): an output line used to serially output the data from the JTAG
registers to the equipment controlling the test.
Test Mode Selector (TMS): the test control input to the TAP controller. It controls the
transitions of the test interface state machine. The test operations are controlled by the
sequence of 1s and 0s applied to this input. Usually this is the most important input that
has to be controlled by external testers or the on-board test controller.
Test Reset Input (TRST*): The optional TRST* pin is used to initialize the TAP controller; that is, if the TRST* pin is used, the TAP controller can be asynchronously reset to the Test-Logic-Reset state when a 0 is applied at TRST*. This pin can also be used to reset the circuit under test; however, it is not recommended for this application.

2.2 Boundary Scan Cell
The IEEE Std. 1149.1a specifies the design of four test data registers, as shown in Figure 41.2. Two mandatory test data registers, the Bypass and the Boundary-Scan registers, must be included in any boundary scan architecture. The boundary scan register, though its name may be a little confusing, refers to the collection of the boundary scan cells. The other registers, such as the device identification register and the design-specific test data registers, can be added optionally.
Basic Boundary Scan Cell (BC_1)

[Fig. 41.3: Data_In (PI) feeds a Capture Scan cell (clocked by ClockDR) and an Update Hold cell (clocked by UpdateDR); a multiplexer controlled by Mode selects between the functional path (Mode = 0) and the held test value (Mode = 1) at Data_Out (PO); Scan_In (SI) and Scan_Out (SO) chain cells together under ShiftDR control.]
Figure 41.3 [1] shows a basic universal boundary-scan cell, known as a BC_1. The cell has four
modes of operation: normal, update, capture, and serial shift. The memory elements are two D-
type flip-flops with front-end and back-end multiplexing of data. It is important to note that the
circuit shown in Figure 41.3 is only an example of how the requirement defined in the Standard
could be realized. The IEEE 1149.1 Standard does not mandate the design of the circuit, only its
functional specification. The four modes of operation are as follows:
1) During normal (functional) mode, Data_In is passed straight through to Data_Out.
2) During update mode, the content of the Update Hold cell is passed through to Data_Out. Signal values already present in the output scan cells are passed out through the device output pins, and signal values already present in the input scan cells are passed into the internal logic.
3) During capture mode, the Data_In signal is routed to the input Capture Scan cell and the value is captured by the next ClockDR. ClockDR is a derivative of TCK. Signal values on device input pins are loaded into input cells, and signal values passing from the internal logic to device output pins are loaded into output cells.
4) During shift mode, the Scan_Out of one Capture Scan cell is passed to the Scan_In of the next Capture Scan cell via a hard-wired path.
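These four modes can be summarized in a small behavioral model (a Python sketch with invented names, not RTL from the Standard):

```python
class BC1Cell:
    """Behavioral model of a basic boundary-scan cell (BC_1).

    capture_ff models the Capture Scan cell (clocked by ClockDR);
    update_ff models the Update Hold cell (clocked by UpdateDR);
    mode selects functional (0) or test (1) data at Data_Out.
    """

    def __init__(self):
        self.capture_ff = 0   # Capture Scan cell
        self.update_ff = 0    # Update Hold cell

    def capture(self, data_in):
        # Capture mode: sample the parallel input on ClockDR
        self.capture_ff = data_in

    def shift(self, scan_in):
        # Shift mode: shift one bit in; the old content becomes Scan_Out
        scan_out = self.capture_ff
        self.capture_ff = scan_in
        return scan_out

    def update(self):
        # Update mode: move the shifted-in value to the hold stage
        self.update_ff = self.capture_ff

    def data_out(self, data_in, mode):
        # mode = 0: functional pass-through; mode = 1: drive the held test value
        return self.update_ff if mode else data_in


cell = BC1Cell()
cell.capture(1)                        # sample a functional value
assert cell.shift(0) == 1              # captured value appears at Scan_Out
cell.update()                          # the shifted-in 0 moves to the hold stage
assert cell.data_out(1, mode=1) == 0   # test mode drives the held value
assert cell.data_out(1, mode=0) == 1   # normal mode passes Data_In through
```

Note how capture and shift touch only the Capture Scan stage, so the functional Data_In to Data_Out path is untouched until an explicit update, exactly the property the next paragraph relies on.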
The Test ClocK, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal. Note that both capture and shift operations do not interfere with the normal passing of data from the parallel-in terminal to the parallel-out terminal. This allows on-the-fly capture of operational values and the shifting out of these values for inspection without interference. This application of the boundary-scan register has tremendous potential for real-time monitoring of the operational status of a system (a sort of electronic camera taking snapshots) and is one reason why TCK is kept separate from any system clocks.
2.3 Boundary Scan Path

At the device level, the boundary-scan elements contribute nothing to the functionality of the internal logic. In fact, the boundary-scan path is independent of the function of the device. The value of the scan path is at the board level, as shown in Figure 41.4 [1].
The figure shows a board containing four boundary-scan devices. It is seen that there is an edge-
connector input called TDI connected to the TDI of the first device. TDO from the first device is
permanently connected to TDI of the second device, and so on, creating a global serial scan path
terminating at the edge connector output called TDO. TCK is connected in parallel to each
device TCK input. TMS is connected in parallel to each device TMS input. All boundary-scan cell data registers are serially loaded and read through this single chain.
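This single-chain behavior can be sketched as a toy model (the helper name is invented for this note, not part of the Standard): the boundary registers of all devices act as one long shift register between the board TDI and TDO.

```python
def shift_chain(chain, bits):
    """Shift `bits` into a concatenated scan chain.

    `chain` is a list of register bits with index 0 nearest TDO;
    returns the bits that fall out of TDO, one per TCK cycle.
    """
    out = []
    for b in bits:
        out.append(chain.pop(0))   # bit nearest TDO falls out
        chain.append(b)            # new bit enters at the TDI end
    return out


# Four chips, each with a 3-bit boundary register: one 12-bit global chain.
chain = [0] * 12
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
captured = shift_chain(chain, pattern)
assert captured == [0] * 12   # the old (all-zero) contents fall out of TDO
assert chain == pattern       # the new pattern now sits across all four chips
```

Loading any pattern therefore always costs as many TCK cycles as the total chain length, which is the cost the next paragraphs discuss.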
[Fig. 41.4 MCM with Serial Boundary Scan Chain: four chips (Chip 1 to Chip 4), each ringed by boundary-scan cells; serial data in at the board TDI, TDO of each chip wired to TDI of the next, serial data out at the board TDO; TCK and TMS bused in parallel to every chip; the serial test interconnect is shown alongside the system interconnect.]
The advantage of this configuration is that only two pins on the PCB/MCM are needed for boundary scan data register support. The disadvantage is very long shifting sequences to deliver test patterns to each component and to shift out test responses, which leads to expensive time on the external tester. As shown in Figure 41.5 [1], the single scan chain can be broken into two parallel boundary scan chains, which share a common test clock (TCK). The extra pin overhead is one more pin. As there are two boundary scan chains, the test patterns are half as long and test time is roughly halved. Here both chains share common TDI and TDO pins, so when the top two chips are being shifted, the bottom two chips must be disabled so that they do not drive their TDO lines. The opposite must hold true when the bottom two chips are being tested.
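The halving claim can be sanity-checked with back-of-envelope arithmetic (the chip, cell, and pattern counts below are invented for illustration only):

```python
# Shift-cycle comparison for the wiring styles of Figs. 41.4 and 41.5.
cells_per_chip = 100   # assumed boundary cells per chip
chips = 4              # assumed chip count
patterns = 50          # hypothetical number of test patterns

# Single chain: every pattern traverses all chips (one 400-bit chain).
single = patterns * chips * cells_per_chip

# Two parallel half-chains: each pattern traverses only a 200-bit chain.
dual = patterns * (chips // 2) * cells_per_chip

assert single == 20000
assert dual == 10000
assert dual * 2 == single   # test time roughly halved, as stated above
```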
[Fig. 41.5 MCM with two parallel boundary scan chains: shared TDI, TDO and TCK; separate TMS1 and TMS2 select which half-chain is actively shifting.]
2.4 TAP Controller

The operation of the test interface is controlled by the Test Access Port (TAP) controller. This is a 16-state finite state machine whose state transitions are controlled by the TMS signal; the state-transition diagram is shown in Figure 41.7. The TAP controller can change state only on the rising edge of TCK, and the next state is determined by the logic level of TMS. In other words, a state transition in Figure 41.7 follows the edge labeled 1 when the TMS line is set to 1; otherwise the edge labeled 0 is followed. The output signals of the TAP controller correspond to a subset of the labels associated with the various states. As shown in Figure 41.2, the TAP consists of four mandatory terminals plus one optional terminal. The main functions of the TAP controller are:

To reset the boundary scan architecture,
To select the output of instruction or test data to shift out to TDO,
To provide control signals to load instructions into the Instruction Register,
To provide signals to shift test data in from TDI and test responses out to TDO, and
To provide signals to perform test functions such as capture and application of test data.
[Fig. 41.6 Top level view of TAP Controller: inputs TMS, TCK and TRST* feed a 16-state FSM (Moore machine) that produces ClockDR, ShiftDR, UpdateDR, Reset*, Select, ClockIR, ShiftIR, UpdateIR and Enable.]
Figure 41.6 shows a top-level view of the TAP Controller. TMS and TCK (and the optional TRST*) go to a 16-state finite-state machine controller, which produces various control signals. These include dedicated signals to the Instruction register (ClockIR, ShiftIR, UpdateIR) and generic signals to all data registers (ClockDR, ShiftDR, UpdateDR). The particular data register that actually responds is the one enabled by the conditional control signals generated at the parallel outputs of the Instruction register, according to the instruction.

The other signals, Reset, Select and Enable, are distributed as follows:

Reset is distributed to the Instruction register and to the target Data Register,
Select is distributed to the output multiplexer,
Enable is distributed to the output driver amplifier.

It must be noted that the Standard uses the term Data Register to mean any target register except the Instruction register.
[Fig. 41.7 State transition diagram of the TAP controller: 16 states comprising Test-Logic-Reset and Run-Test/Idle plus two symmetric columns, one for data registers (Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR) and one for the corresponding -IR states; each arc is labeled with the TMS value (0 or 1) that selects it.]
Figure 41.7 shows the 16-state state diagram for the TAP controller. The value on each state-transition arc is the value of TMS. A state transition occurs on the positive edge of TCK, and the controller output values change on the negative edge of TCK. The 16 states can be divided into three parts: the first part contains the reset and idle states, while the second and third parts control the operations of the data and instruction registers, respectively. Since the only difference between the second and third parts is the registers they deal with, only the states in the first and second parts are described below; a similar description applies to the third part.
1. Test-Logic-Reset: In this state, the boundary scan circuitry is disabled and the system is in its normal function. Whenever a Reset* signal is applied to the BS circuit, it also goes back to this state. Note also that whatever state the TAP controller is in, it will go back to this state if 5 consecutive 1s are applied through TMS.
2. Run-Test/Idle: This is a state in which the boundary scan circuitry waits for some test operations, such as BIST operations, to complete. A typical example: if a BIST operation requires 2^16 cycles to complete, then after setting up the initial condition for the BIST operation, the TAP controller will go back to this state and wait for 2^16 cycles before it starts to shift out the test results.
3. Select-DR-Scan: This is a temporary state to allow the test data sequence for the selected
test-data register to be initiated.
4. Capture-DR: In this state, data can be loaded in parallel to the data registers selected by the
current instruction.
5. Shift-DR: In this state, test data are scanned in series through the data registers selected by
the current instruction. The TAP controller may stay at this state as long as TMS=0. For
each clock cycle, one data bit is shifted into (out of) the selected data register through TDI
(TDO).
6. Exit1-DR: All parallel-loaded (from the Capture-DR state) or shifted (from the Shift-DR state) data are held in the selected data register in this state.
7. Pause-DR: The boundary scan logic pauses here to wait for some external operation. For example, when long test data must be loaded into the chip(s) under test, the external tester may need to reload the data from time to time. Pause-DR is a state that allows the boundary scan architecture to wait for more data to shift in.
8. Exit2-DR: This state marks the end of the Pause-DR operation and allows the TAP controller to go back to the Shift-DR state for more data to shift in.
9. Update-DR: The test data stored in the first stage of the boundary scan cells is loaded into the second stage in this state.
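The state sequencing above can be captured in a small next-state table and checked against two well-known properties of the 1149.1 TAP: five consecutive 1s on TMS always reach Test-Logic-Reset, and the TMS sequence 0, 1, 0, 0 from reset enters Shift-DR. (A behavioral sketch only; state names follow Figure 41.7.)

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1)
NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms_bits):
    """Apply a sequence of TMS values, one per rising TCK edge."""
    for tms in tms_bits:
        state = NEXT[state][tms]
    return state


# Five consecutive 1s on TMS reach Test-Logic-Reset from *any* state.
assert all(step(s, [1] * 5) == "Test-Logic-Reset" for s in NEXT)

# From reset, TMS = 0, 1, 0, 0 walks to Shift-DR for a data-register scan.
assert step("Test-Logic-Reset", [0, 1, 0, 0]) == "Shift-DR"
```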
2.5 Bypass and Identification Registers

Figure 41.8 shows a typical design for a Bypass register. It is a 1-bit register, selected by the Bypass instruction, and provides a basic serial-shift function. There is no parallel output, which means that the Update_DR control has no effect on its operation.

2.6 Instruction Register

It is also possible to load (Capture) internal hard-wired values into the shift section of the Instruction register. The Instruction register must be at least two bits long to allow coding of the four mandatory instructions (Extest, Bypass, Sample, Preload), but the maximum length of the Instruction register is not defined. In capture mode, the two least-significant bits must capture a 01 pattern. (Note: by convention, the least-significant bit of any register connected between the device TDI and TDO pins is always the bit closest to TDO.) The values captured into the higher-order bits of the Instruction register are not defined in the Standard. One possible use of these higher-order bits is to capture an informal identification code if the optional 32-bit Identification register is not implemented. In practice, the only mandated bits for Instruction register capture are the 01 pattern in the two least-significant bits. We will return to the value of capturing this pattern later in the lesson.
[Fig. 41.9 Instruction register: a shift scan register between TDI and TDO whose two least-significant capture bits are hard-wired to 01; higher-order bits may capture the current instruction, status bits, an informal ident, or the results of a power-up self-test; a hold register keeps the current instruction, and decode logic routes DR select and control signals to the selected target register under TAP controller IR control.]
2.7 Instruction Set

The IEEE 1149.1 Standard describes four mandatory instructions: Extest, Bypass, Sample, and Preload, and six optional instructions: Intest, Idcode, Usercode, Runbist, Clamp and HighZ. Whenever a register is selected to become active between TDI and TDO, it is always possible to perform three operations on the register: parallel Capture, followed by serial Shift, followed by parallel Update. The order of these operations is fixed by the state-sequencing design of the TAP controller. For some target Data registers, some of these operations will be effectively null operations (no-ops).
Standard Instructions

Instruction   Selected Data Register
Mandatory:
Extest        Boundary scan (formerly all-0s code)
Bypass        Bypass (initialized state, all-1s code)
Sample        Boundary scan (device in functional mode)
Preload       Boundary scan (device in functional mode)
Optional:
Intest        Boundary scan
Idcode        Identification (initialized state, if present)
Usercode      Identification (for PLDs)
Runbist       Result register
Clamp         Bypass (output pins in safe state)
HighZ         Bypass (output pins in high-Z state)

NB. All unused instruction codes must default to Bypass.
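The table maps naturally onto a lookup with a Bypass default (a sketch of the decode behavior only, not a real instruction decoder):

```python
# Which data register each instruction places between TDI and TDO.
SELECTED_DR = {
    "EXTEST":   "boundary-scan",
    "SAMPLE":   "boundary-scan",
    "PRELOAD":  "boundary-scan",
    "INTEST":   "boundary-scan",
    "BYPASS":   "bypass",
    "CLAMP":    "bypass",
    "HIGHZ":    "bypass",
    "IDCODE":   "identification",
    "USERCODE": "identification",
    "RUNBIST":  "result",
}


def selected_register(instruction):
    # All unused or unknown instruction codes must default to Bypass.
    return SELECTED_DR.get(instruction, "bypass")


assert selected_register("EXTEST") == "boundary-scan"
assert selected_register("SOME_UNUSED_CODE") == "bypass"
```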
EXTEST: This instruction is used to test the interconnect between two chips. The code for Extest used to be defined as the all-0s code. The EXTEST instruction places an IEEE 1149.1 compliant device into an external boundary test mode and selects the boundary scan register to be connected between TDI and TDO. During this instruction, the boundary scan cells associated with outputs are preloaded with test patterns to test downstream devices. The input boundary cells are set up to capture the input data for later analysis.
BYPASS: A device's boundary scan chain can be skipped using the BYPASS instruction, allowing the data to pass through the 1-bit Bypass register. The Bypass instruction must be assigned the all-1s code and, when executed, causes the Bypass register to be placed between the TDI and TDO pins. This allows efficient testing of a selected device without incurring the overhead of traversing through other devices. The BYPASS instruction allows an IEEE 1149.1 compliant device to remain in functional mode while serial data is transferred through the device from the TDI pin to the TDO pin without affecting its operation.
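The effect of bypassing is easy to model: each bypassed device inserts exactly one TCK cycle of delay between its TDI and TDO. A toy Python sketch (the helper name is invented here):

```python
def through_bypass(bits, n_devices):
    """Shift `bits` through n_devices 1-bit bypass registers in series."""
    regs = [0] * n_devices       # one bypass flip-flop per bypassed device
    out = []
    for b in bits:
        out.append(regs[-1])     # the bit nearest TDO emerges
        regs = [b] + regs[:-1]   # everything shifts one stage toward TDO
    return out


stream = [1, 0, 1, 1]
# Through 2 bypassed devices, the stream appears 2 cycles late (zero-padded):
assert through_bypass(stream + [0, 0], 2) == [0, 0, 1, 0, 1, 1]
```

So bypassing N idle devices costs only N extra cycles per shift, instead of the full length of their boundary registers.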
SAMPLE/PRELOAD: The Sample and Preload instructions, and their predecessor the Sample/Preload instruction, select the Boundary-Scan register when executed. The instruction sets up the boundary-scan cells either to sample (capture) values or to preload known values into
the boundary-scan cells prior to some follow-on operation. During this instruction, the boundary
scan register can be accessed via a data scan operation, to take a sample of the functional data
entering and leaving the device. This instruction is also used to preload test data into the
boundary-scan register prior to loading an EXTEST instruction.
INTEST: With this command the boundary scan register (BSR) is connected between the TDI
and the TDO signals. The chip's internal core-logic signals are sampled and captured by the BSR
cells at the entry to the "Capture_DR" state as shown in TAP state transition diagram. The
contents of the BSR register are shifted out via the TDO line at exits from the "Shift_DR" state.
As the contents of the BSR (the captured data) are shifted out, new data are shifted in at the entries
to the "Shift_DR" state. The new contents of the BSR are applied to the chip's core-logic signals
during the "Update_DR" state.
IDCODE: This is used to select the Identification register between TDI and TDO, preparatory to
loading the internally-held 32-bit identification code and reading it out through TDO. The 32 bits
are used to identify the manufacturer of the device, its part number and its version number.
USERCODE: This instruction selects the same 32-bit register as IDCODE, but allows an
alternative 32 bits of identity data to be loaded and serially shifted out. This instruction is used
for dual-personality devices, such as Complex Programmable Logic Devices and Field
Programmable Gate Arrays.
RUNBIST: An important optional instruction is RunBist. Because of the growing importance of
internal self-test structures, the behavior of RunBist is defined in the Standard. The self-test
routine must be self-initializing (i.e., no external seed values are allowed), and the execution of
RunBist essentially targets a self-test result register between TDI and TDO. At the end of the
self-test cycle, the targeted data register holds the Pass/Fail result. With this instruction one can control the execution of a memory BIST from the TAP controller, thereby reducing the hardware overhead of the BIST controller.
CLAMP: Clamp is an instruction that uses boundary-scan cells to drive preset values, established initially with the Preload instruction, onto the outputs of devices, and then selects the Bypass register between TDI and TDO (unlike the Preload instruction, which leaves the device with the boundary-scan register still selected until a new instruction is executed or the device is returned to the Test-Logic-Reset state). Clamp can be used, for example, to set up safe guarding values on the outputs of certain devices in order to avoid bus contention problems.
HIGH-Z: This instruction is similar to the Clamp instruction, but it leaves the device output pins in a high-impedance state rather than driving fixed logic-1 or logic-0 values. HighZ also selects the Bypass register between TDI and TDO.
3. On Board Test Controller

So far the test architecture of boundary scan inside the chip under test has been discussed. A major problem remains: who is going to control the whole boundary scan test procedure? In general there are two solutions to this problem: using an external tester, or using a special on-board controller. The former is usually expensive because it involves an IC tester. The latter provides an economical way to complete the whole test procedure. As is clear from the above description, in addition to the test data, the most important signal that a test controller has to provide is the TMS signal. There exist two methods to provide this signal on a board: the star configuration and the ring configuration, as shown in Figure 41.10. In the star configuration the TMS is broadcast to all chips; hence all chips must execute the same operation at any time. In the ring structure, the test controller provides one independent TMS signal for each chip, so great flexibility of the test procedure is facilitated.
[Fig. 41.10 BUS master for chips with BS: (a) star structure, in which one TMS is broadcast to all N chips; (b) ring structure, with an independent TMS per chip; TDI, TCK and TDO are common to both arrangements.]
Figure 41.11, "Single Boundary Scan Chain on a Board," illustrates the onboard TAP controllers connected to an offboard TAP control device, such as a personal computer, through a TAP access connector. The offboard TAP control device can perform different tests during board manufacturing without the need for bed-of-nails equipment.
[Fig. 41.11 Single Boundary Scan Chain on a Board: three devices, each containing core logic with Bypass (BP), Instruction (IR) and Data (DR) registers and a TAP; TDI-to-TDO chained across the devices, TCK and TMS bused to every TAP, and a test connector linking the chain to a TAP control device (test software on a PC/workstation).]

5. Simple Board Level Test Sequence
One of the first tests that should be performed on a PCB is called the infrastructure test. This test is used to determine whether all the components are installed correctly. It relies on the fact that the last two bits of the instruction register (IR) are always "01". By shifting out the IR of each device in the chain, it can be determined whether the device is properly installed. This is accomplished by sequencing the TAP controller for an IR read.

After the infrastructure test is successful, the board-level interconnect test can begin. This is accomplished through the EXTEST command. This test can be used to check for "opens" and "shorts" on the PCB. The test patterns are preloaded into the output pins of the driving devices. They are then propagated to the receiving devices and captured in the input boundary scan cells. The result can then be shifted out through the TDO pin for analysis.

These patterns can be generated and analyzed automatically via software programs. This feature is normally offered through tools like Automatic Test Pattern Generation (ATPG) or Boundary Scan Test Pattern Generation (BTPG).
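The opens/shorts idea can be illustrated with a toy model (net indices and helper names are invented; a real flow would use ATPG-generated patterns): drive a pattern from the output cells, capture at the input cells, and compare against the netlist expectation.

```python
def interconnect_test(drive, wiring):
    """Model an EXTEST-style interconnect check.

    drive:  bits preloaded into the driver output cells, indexed by net.
    wiring: receiver_net -> driver_net map modeling the *actual* board;
            a short makes two nets see the same driver, an open sees None.
    Returns the bits captured at the receiving input cells.
    """
    captured = []
    for net in range(len(drive)):
        src = wiring.get(net)
        captured.append(drive[src] if src is not None else None)
    return captured


drive = [0, 1, 0, 1]                    # walking pattern on four nets
good = {0: 0, 1: 1, 2: 2, 3: 3}         # every net wired straight through
shorted = {0: 0, 1: 0, 2: 2, 3: 3}      # net 1 shorted onto net 0's driver
open_net = {0: 0, 1: 1, 2: None, 3: 3}  # net 2 open

assert interconnect_test(drive, good) == drive      # board passes
assert interconnect_test(drive, shorted)[1] == 0    # short detected (expected 1)
assert interconnect_test(drive, open_net)[2] is None  # open detected
```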
language can greatly reduce the effort to incorporate boundary scan into a chip, and hence is quite useful when a designer wishes to design boundary scan in his own style. Basically, for those parts that are mandatory in Std. 1149.1a, such as the TAP controller and the BYPASS register, the designer does not need to describe them; they can be automatically generated. The designer only has to describe the specifications related to his own design, such as the length of the boundary scan register, the user-defined boundary scan instructions, the decoder for those instructions, and the I/O pin assignment. In general these descriptions are quite easy to prepare. In fact, many CAD tools already implement the boundary scan generation procedure, and thus a designer may not even need to write the BSDL file: the tools can automatically generate the needed boundary scan circuitry for any circuit design as long as the I/O of the design is specified.
Any manufacturer of a JTAG-compliant device must provide a BSDL file for that device. The BSDL file contains information on the function of each of the pins on the device: which are used as I/Os, power or ground. BSDL files describe the boundary scan architecture of a JTAG-compliant device, and are written in VHDL. The BSDL file includes:

1. Entity Declaration: a VHDL construct that identifies the name of the device described by the BSDL file.
2. Generic Parameter: specifies which package is described by the BSDL file.
3. Logical Port Description: lists all of the pads on a device, and states whether each pin is an input (in bit;), an output (out bit;), bidirectional (inout bit;) or unavailable for boundary scan (linkage bit;).
[Fig. 41.12 Example to illustrate BSDL: (a) core logic, a clocked register block with inputs D1-D6, outputs Q1-Q6 and CLK; (b) the same device after BS insertion, with a TAP controller and the TDI, TCK, TMS and TDO pins added.]
7. Benefits and Penalties of Boundary Scan

The decision whether to use boundary scan usually involves economics. Designers often hesitate to use boundary scan due to the additional silicon involved. In many cases it may appear that the penalties outweigh the benefits for an ASIC. However, in an analysis spanning all assembly levels and all test phases during the system's life, the benefits will usually outweigh the penalties.
Benefits

The benefits provided by boundary scan include the following:

lower test generation costs
reduced test time
reduced time to market
simpler and less costly testers
compatibility with tester interfaces
accommodation of high-density device packaging

By providing access to the scan chain I/Os, the need for physical test points on the board is eliminated or greatly reduced, leading to significant savings as a result of simpler board layouts, less costly test fixtures, reduced time on in-circuit test systems, increased use of standard interfaces, and faster time-to-market. In addition to board testing, boundary scan allows programming almost all types of CPLDs and flash memories, regardless of size or package type, on the board, after PCB assembly. In-system programming saves money and improves throughput by reducing device handling, simplifying inventory management, and integrating the programming steps into the board production line.
Penalties
The penalties incurred in using boundary-scan include the following:
extra silicon due to boundary scan circuitry
added pins
additional design effort
degradation in performance due to gate delays through the additional circuitry
increased power consumption
Boundary Scan Example
Since boundary-scan design is new to many designers, an example of gate count for a circuit
with boundary scan is discussed here. This provides an estimate for the circuitry sizes required to
implement the IEEE 1149.1 standard, but without the extensions defined in the standard. The
example uses a library-based gate array design environment. The gate counts given are based on
commercial cells and relate to a 10000-gate design in a 40-pin package. Table 1 gives the gate requirement.

Table 1: Gate requirements for a Gate Array Boundary-scan Design
It must be noted that in Table 1 the boundary-scan implementation requires 868 gates, an estimated 8 percent overhead. It should also be noted that the cells used in this example were created prior to publication of the IEEE 1149.1 standard. If specific cell designs had been available to support the standard, or if the vendor had placed the boundary-scan circuitry in areas of the ASIC not available to the user, then the design would have required less.
9. Conclusion

Board level testing has become more complex with the increasing use of fine-pitch, high pin-count devices. With the use of boundary scan, however, board level testing can be implemented more efficiently and at lower cost. This standard provides a unique opportunity to simplify the design, debug, and test processes by enabling a simple and standard means of automatically creating and applying tests at the device, board, and system levels. Boundary scan is the only practical solution for MCMs and limited-access SMT/multi-layer boards. The standard supports external testing with an ATE. The IEEE 1532-2000 In-System Configuration (ISC) standard makes use of 1149.1 boundary-scan structures within CPLD and FPGA devices.
References

[1] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001 (Revision of IEEE Std 1149.1-1990), IEEE-SA Standards Board, 3 Park Avenue, New York, NY 10016-5997, USA. http://grouper.ieee.org/groups/1149/1 or http://standards.ieee.org/catalog/
[2] K. P. Parker, The Boundary-Scan Handbook: Analog and Digital, 2nd Edition, Kluwer Academic Publishers, 1998.
[3] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, Norwell, MA, 2000.
[4] IEEE 1149.4 Mixed-Signal Test Bus Standard web site: http://grouper.ieee.org/groups/1149/4
[5] IEEE 1532 In-System Configuration Standard web site: http://grouper.ieee.org/groups/1532/
[6] Agilent Technologies BSDL verification service: http://www.agilent.com/see/bsdl_service
Problems

1. What is boundary scan? What is the motivation for boundary scan?
2. How does the boundary-scan technique differ from so-called bed-of-nails techniques?
3. What are the different device packaging styles?
4. What is JTAG?
5. Give an overview of the boundary-scan family of standards, i.e., IEEE 1149.
6. Show the boundary-scan architecture and describe the functions of its elements.
7. Show the basic cell of a boundary-scan register. Describe the different modes of its
operation.
8. A board is composed of 100 chips with 100 pins each. The length of the total scan chain
is 10,000 bits. Find a possible testing strategy to reduce the scan chain length.
9. What is the TAP controller? What are the main functions of the TAP controller?
10. Describe a serial boundary-scan chain and its operation. What are its disadvantages, and
what strategy can overcome them?
11. Discuss the different instruction sets and their functions.
12. Considering a board populated by IEEE 1149.1-compliant devices (a "pure" boundary-
scan board), summarize a board-test strategy.
13. What is the goal of the infrastructure test? Is the infrastructure test mandatory or
optional? What are the main steps of an infrastructure test?
14. Consider the example depicted in the following figure.
[Figure: two devices, IC1 and IC2, in a chain, with primary inputs A and B, interconnect nets C
and D, primary outputs E and F, and TDI/TDO connections.]
This circuit has two primary inputs, two primary outputs, and two nets that connect the ICs one
to the other. There is only one TAP, which connects the TDI and TDO of both ICs. Prepare a
test plan for this circuit.
15. Consider a board composed of 100 40-pin boundary-scan devices, 2,000 interconnects,
an 8-bit instruction register per device, a 32-bit identification register per device, and a
10 MHz test application rate. Compute the test time to execute a test session.
16. What is BSDL? What are the different BSDL files?
Module
8
Testing of Embedded
System
Version 2 EE IIT, Kharagpur 1
Lesson
42
On-line Testing of
Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to
1. Introduction

EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to
perform application-specific functions. The product user is usually not even aware of the
existence of these systems. From toys to medical devices, from ovens to automobiles, the range
of products incorporating microprocessor-based, software-controlled systems has expanded
rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is
clear: they promise previously impossible functions that enhance the performance of people or
machines. As these systems gain sophistication, manufacturers are using them in increasingly
critical applications: products that can result in injury, economic loss, or unacceptable
inconvenience when they do not perform as required.

Embedded systems can contain a variety of computing devices, such as microcontrollers,
application-specific integrated circuits, and digital signal processors. A key requirement is that
these computing devices continuously respond to external events in real time. Makers of
embedded systems take many measures to ensure safety and reliability throughout the lifetime
of products incorporating the systems. Here, we consider techniques for identifying faults
during normal operation of the product, that is, online-testing techniques. We evaluate them on
the basis of error coverage, error latency, space redundancy, and time redundancy.

2. Embedded-system test issues
Cost constraints in consumer products typically translate into stringent constraints on product
components. Thus, embedded systems are particularly cost sensitive. In many applications, low
production and maintenance costs are as important as performance.

Moreover, as people become dependent on computer-based systems, their expectations of these
systems' availability increase dramatically. Nevertheless, most people still expect significant
downtime with computer systems, perhaps a few hours per month. People are much less patient
with computer downtime in other consumer products, since the items in question did not
demonstrate this type of failure before embedded systems were added. Thus, complex consumer
products with high availability requirements must be quickly and easily repaired. For this
reason, automobile manufacturers, among others, are increasingly providing online detection
and diagnosis, capabilities previously found only in very complex and expensive applications.
Software testing relies on two basic methods: acceptance testing and diversity [1]. Acceptance
testing checks for the presence or absence of well-defined events or conditions, usually
expressed as true-or-false conditions (predicates), related to the correctness or safety of
preceding computations. Diversity techniques compare replicated computations, either with
minor variations in data (data diversity) or with procedures written by separate, unrelated design
teams (design diversity). This chapter focuses on digital hardware testing, including techniques
by which hardware tests itself, built-in self-test (BIST). Nevertheless, we must consider the role
of software in detecting, diagnosing, and handling hardware faults. If we can use software to
test hardware, why should we add hardware to test hardware? There are two possible answers.
First, it may be cheaper or more practical to use hardware for some tasks and software for
others. In an embedded system, programs are stored online in hardware-implemented memories
such as ROMs (for this reason, embedded software is sometimes called firmware). This program
storage space is a finite resource whose cost is measured in exactly the same way as other
hardware. A function such as a test is soft only in the sense that it can easily be modified or
omitted in the final implementation.

The second answer involves the time that elapses between a fault's occurrence and a problem
arising from that fault. For instance, a fault may induce an erroneous system state that can
ultimately lead to an accident. If the elapsed time between the fault's occurrence and the
corresponding accident is short, the fault must be detected immediately. Acceptance tests can
detect many faults and errors in both software and hardware. However, their exact fault
coverage is hard to measure, and even when coverage is complete, acceptance tests may take a
long time to detect some faults. BIST typically targets relatively few hardware faults, but it
detects them quickly.

These two issues, cost and latency, are the main parameters in deciding whether to use
hardware or software for testing and which hardware or software technique to use. This
decision requires system-level analysis. We do not consider software methods here. Rather, we
emphasize the appropriate use of widely implemented BIST methods for online hardware
testing. These methods are components in the hardware-software trade-off.
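The acceptance tests described above are easy to illustrate. The sketch below is a minimal
Python model (function and parameter names are ours, chosen for illustration): a square-root
routine is guarded by a true-or-false predicate relating its result back to its input, so a fault in
the preceding computation trips the check.

```python
import math

def sqrt_with_acceptance_test(x, tol=1e-9):
    """Compute sqrt(x), then apply an acceptance test, a true-or-false
    predicate relating the result to the input, before releasing it."""
    y = math.sqrt(x)
    # Acceptance test: squaring the result must reproduce the input.
    # A violation indicates a fault in the preceding computation.
    if abs(y * y - x) > tol * max(1.0, abs(x)):
        raise RuntimeError("acceptance test failed for sqrt")
    return y

assert sqrt_with_acceptance_test(4.0) == 2.0
```

The same pattern applies to any computation whose output can be checked far more cheaply
than it can be produced.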
3. Online testing

Faults are physical or logical defects in the design or implementation of a digital device. Under
certain conditions, they lead to errors, that is, incorrect system states. Errors induce failures,
deviations from appropriate system behavior. If the failure can lead to an accident, it is a
hazard. Faults can be classified into three groups: design, fabrication, and operational. Design
faults are made by human designers or CAD software (simulators, translators, or layout
generators) during the design process. Fabrication defects result from an imperfect
manufacturing process. For example, shorts and opens are common manufacturing defects in
VLSI circuits. Operational faults result from wear or environmental disturbances during normal
system operation. Such disturbances include electromagnetic interference, operator mistakes,
and extremes of temperature and vibration. Some design defects and manufacturing faults
escape detection and combine with wear and environmental disturbances to cause problems in
the field.
Operational faults are usually classified by their duration:

- Permanent faults remain in existence indefinitely if no corrective action is taken. Many are
residual design or manufacturing faults. The rest usually occur during changes in system
operation, such as system start-up or shutdown, or as a result of a catastrophic environmental
disturbance such as a collision.
- Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict,
but their effects are highly correlated. When intermittent faults are present, the system works
well most of the time but fails under atypical environmental conditions.
- Transient faults appear and disappear quickly and are not correlated with each other. They
are most commonly induced by random environmental disturbances.
One generally uses online testing to detect operational faults in computers that support critical
or high-availability applications. The goal of online testing is to detect fault effects, or errors,
and take appropriate corrective action. For example, in some critical applications, the system
shuts down after an error is detected. In other applications, error detection triggers a
reconfiguration mechanism that allows the system to continue operating, perhaps with some
performance degradation. Online testing can take the form of external or internal monitoring,
using either hardware or software. Internal monitoring, also called self-testing, takes place on
the same substrate as the circuit under test (CUT). Today, this usually means inside a single IC:
a system on a chip. There are four primary parameters to consider in designing an online-
testing scheme:
- Error coverage: the fraction of modeled errors detected, usually expressed as a percentage.
Critical and highly available systems require very good error coverage to minimize the
probability of system failure.
- Error latency: the difference between the first time an error becomes active and the first time
it is detected. Error latency depends on the time taken to perform a test and how often tests are
executed. A related parameter is fault latency, the difference between the onset of the fault and
its detection. Clearly, fault latency is greater than or equal to error latency, so when error
latency is difficult to determine, test designers often consider fault latency instead.
- Space redundancy: the extra hardware or firmware needed for online testing.
- Time redundancy: the extra time needed for online testing.
The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock
cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT
and impose no functional or structural restrictions on it. Most BIST methods meet some of
these constraints without addressing others. Considering all four parameters in the design of an
online-testing scheme may create conflicting goals. High coverage requires high error latency,
space redundancy, and/or time redundancy. Schemes with immediate detection (error latency
equaling 1) minimize time redundancy but require more hardware. On the other hand, schemes
with delayed detection (error latency greater than 1) reduce time and space redundancy at the
expense of increased error latency. Several proposed delayed-detection techniques assume
equiprobability of input combinations and try to establish a probabilistic bound on error latency
[2]. As a result, certain faults remain undetected for a long time because tests for them rarely
appear at the CUT's inputs.

To cover all the operational fault types described earlier, test engineers use two different modes
of online testing: concurrent and non-concurrent. Concurrent testing takes place during normal
system operation, and non-concurrent testing takes place while normal operation is temporarily
suspended. One must often overlap these test modes to provide a comprehensive online-testing
strategy at acceptable cost.
4. Non-concurrent testing

This form of testing is either event-triggered (sporadic) or time-triggered (periodic) and is
characterized by low space and time redundancy. Event-triggered testing is initiated by key
events or state changes such as start-up or shutdown, and its goal is to detect permanent faults.
Detecting and repairing permanent faults as soon as possible is usually advisable. Event-
triggered tests resemble manufacturing tests. Any such test can be applied online, as long as the
required testing resources are available. Typically, the hardware is partitioned into components,
each exercised by specific tests. RAMs, for instance, are tested with manufacturing tests such
as March tests [3].

Time-triggered testing occurs at predetermined times in the operation of the system. It detects
permanent faults, often using the same types of tests applied by event-triggered testing. The
periodic approach is especially useful in systems that run for extended periods during which no
significant events occur to trigger testing. Periodic testing is also essential for detecting
intermittent faults. Such faults typically behave as permanent faults for short periods. Since
they usually represent conditions that must be corrected, diagnostic resolution is important.
Periodic testing can identify latent design or manufacturing flaws that appear only under certain
environmental conditions. Time-triggered tests are frequently partitioned and interleaved so
that only part of the test is applied during each test period.
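A March test sweeps the memory with fixed sequences of reads and writes. As a behavioral
illustration (a Python simulation, not production test code; the read/write callback interface is
our own), the sketch below runs the well-known March C- element sequence over a simulated
RAM and shows that it catches a cell stuck at 0:

```python
def march_c_minus(read, write, n):
    """Apply the March C- element sequence to an n-cell RAM through
    read/write callbacks; return True if no fault is detected."""
    up, down = range(n), range(n - 1, -1, -1)
    for i in up:                      # (w0): initialize all cells to 0
        write(i, 0)
    # (r0,w1) up; (r1,w0) up; (r0,w1) down; (r1,w0) down
    for order, (rv, wv) in [(up, (0, 1)), (up, (1, 0)),
                            (down, (0, 1)), (down, (1, 0))]:
        for i in order:
            if read(i) != rv:         # unexpected value: fault detected
                return False
            write(i, wv)
    for i in up:                      # final (r0)
        if read(i) != 0:
            return False
    return True

ram = [0] * 16                        # fault-free RAM passes
assert march_c_minus(lambda i: ram[i], lambda i, v: ram.__setitem__(i, v), 16)

bad = [0] * 16                        # a cell stuck at 0 is caught
def faulty_write(i, v):
    if i != 3:                        # cell 3 never takes a new value
        bad[i] = v
assert not march_c_minus(lambda i: bad[i], faulty_write, 16)
```

The transparent BIST of [3] adapts such tests so that the RAM contents are restored afterward,
which is what makes them usable online.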
5. Concurrent testing

Non-concurrent testing cannot detect transient or intermittent faults whose effects disappear
quickly. Concurrent testing, on the other hand, continuously checks for errors due to such
faults. However, concurrent testing is not particularly useful for diagnosing the source of
errors, so test designers often combine it with diagnostic software. They may also combine
concurrent and non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for
detecting control errors, is a watchdog timer [4]. This is a counter that the system resets
repeatedly to indicate that the system is functioning properly. The watchdog concept assumes
that the system is fault-free, or at least alive, if it can reset the timer at appropriate intervals.
The ability to perform this simple task implies that control flow is correctly traversing timer-
reset points. One can monitor system sequencing very precisely by guarding the watchdog-reset
operations with software-based acceptance tests that check signatures computed while control
flow traverses various checkpoints. To implement this last approach in hardware, one can
construct more complex hardware watchdogs.
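The watchdog behavior just described can be modeled in a few lines. The sketch below is a
Python simulation (class and method names are ours): the watchdog stays quiet while the task
keeps resetting ("kicking") it within the timeout, and fires once the kicks stop, as they would if
control flow hung.

```python
class WatchdogTimer:
    """Down-counter watchdog: the system must call kick() within
    'timeout' ticks or the watchdog fires."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.count = timeout
        self.fired = False
    def kick(self):                   # healthy control flow resets the counter
        self.count = self.timeout
    def tick(self):                   # called once per clock tick
        self.count -= 1
        if self.count <= 0:
            self.fired = True

wd = WatchdogTimer(timeout=3)
for _ in range(10):                   # a live task kicks every 2 ticks
    wd.tick(); wd.tick(); wd.kick()
assert not wd.fired
for _ in range(3):                    # a hung task stops kicking
    wd.tick()
assert wd.fired
```

In a real system the "fire" action would be a reset or a transition to a safe state rather than a
flag.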
A key element of concurrent testing for data errors is redundancy. For example, the
duplication-with-comparison (DWC) technique detects any single error at the expense of 100%
space redundancy. This technique requires two copies of the CUT, which operate in tandem
with identical inputs. Any discrepancy in their outputs indicates an error. In many applications,
DWC's high hardware overhead is unacceptable. Moreover, it is difficult to prevent minor
timing variations between duplicated modules from invalidating the comparison.
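DWC reduces to a single comparison. The sketch below (a Python model; the "modules" are
plain functions standing in for hardware copies) shows how a fault in one copy surfaces as a
mismatch:

```python
def dwc(copy_a, copy_b, x):
    """Duplication with comparison: run two copies of the module on
    identical inputs; any output discrepancy raises the error flag."""
    ya, yb = copy_a(x), copy_b(x)
    return ya, ya != yb               # (output, error flag)

square = lambda x: x * x              # fault-free module
broken = lambda x: 0 if x == 5 else x * x   # fault visible only for x == 5

assert dwc(square, square, 5) == (25, False)
_, error = dwc(square, broken, 5)
assert error                          # the discrepancy is detected
```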
A possible lower-cost alternative is time redundancy. A technique called double execution, or
retry, executes critical operations more than once at diverse time points and compares their
results. Transient faults are likely to affect only one instance of the operation and thus can be
detected. Another technique, recomputing with shifted operands (RESO) [5], achieves almost
the same error coverage as DWC with 100% time redundancy but very little space redundancy.
However, no one has demonstrated the practicality of double execution and RESO for online
testing of general logic circuits.
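For an arithmetic unit, RESO can be sketched as follows (a Python simulation; in hardware the
point is that a stuck-at fault corrupts different result bits in the shifted and unshifted runs, so
the comparison exposes it, whereas in this fault-free software model the two runs simply
agree):

```python
def reso_add(a, b, shift=2):
    """Recomputing with shifted operands (RESO) for an adder: compute
    a + b once normally and once with both operands shifted left by
    'shift' bits, then shift the second result back and compare."""
    first = a + b
    second = ((a << shift) + (b << shift)) >> shift
    return first, first != second     # (sum, error flag)

total, error = reso_add(1234, 5678)
assert total == 6912 and not error    # fault-free runs agree
```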
A third, widely used form of redundancy is information redundancy: the addition of redundant
coded information such as a parity-check bit [5]. Such codes are particularly effective for
detecting memory and data transmission errors, since memories and networks are susceptible
to transient errors. Coding methods can also detect errors in data computed during critical
operations.
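The simplest such code is a single parity bit. As a sketch (Python; the word-plus-parity framing
is illustrative), appending an even-parity bit to a stored word lets any single-bit flip be detected
on retrieval:

```python
def parity_bit(word):
    """Even-parity bit: XOR of all data bits of the word."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

def store(word):                      # append the parity bit before storing
    return (word << 1) | parity_bit(word)

def check(coded):                     # verify parity on retrieval
    word, p = coded >> 1, coded & 1
    return word, parity_bit(word) != p    # (data, error flag)

coded = store(0b1011)                 # three 1-bits, so parity is 1
word, error = check(coded)
assert word == 0b1011 and not error
_, error = check(coded ^ 0b100)       # a single-bit flip is detected
assert error
```

A single parity bit detects any odd number of bit errors but misses even-sized error patterns;
stronger codes (e.g., Hamming codes) trade more check bits for detection and correction.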
6. Built-in self-test

For critical or highly available systems, a comprehensive online-testing approach that covers
all expected permanent, intermittent, and transient faults is essential. In recent years, BIST has
emerged as an important method of testing manufacturing faults, and researchers increasingly
promote it for online testing as well.

BIST is a design-for-testability technique that places test functions physically on chip with the
CUT, as illustrated in Figure 42.1. In normal operating mode, the CUT receives its inputs from
other modules and performs the function for which it was designed. In test mode, a test pattern
generator circuit applies a sequence of test patterns to the CUT, and a response monitor
evaluates the test responses. In the most common type of BIST, the response monitor compacts
the test responses to form fault signatures. It compares the fault signatures with reference
signatures generated or stored on chip, and an error signal indicates any discrepancies detected.
We assume this type of BIST in the following discussion.
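Response compaction is typically done with an LFSR-based signature register. The sketch
below (a Python simulation; the 7-bit register and its tap positions are an illustrative choice)
shifts a response bit stream into such a register and shows that flipping a single response bit
changes the final signature; as noted below for real monitors, some multi-bit error patterns can
still alias to the reference signature.

```python
def signature(response_bits, nbits=7):
    """Serial signature register: shift each response bit into an LFSR
    whose feedback XORs the two top stages; the final register state
    is the fault signature."""
    state = 0
    for bit in response_bits:
        fb = bit ^ ((state >> 6) & 1) ^ ((state >> 5) & 1)
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
    return state

good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
reference = signature(good)           # fault-free reference signature
bad = list(good)
bad[4] ^= 1                           # single response-bit error
assert signature(bad) != reference    # the signature exposes the error
```

Because the compactor is linear, any single-bit response error always changes the signature;
aliasing can only arise from particular combinations of multiple errors.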
In developing a BIST methodology for embedded systems, we must consider four primary
parameters related to those listed earlier for online-testing techniques:

- Fault coverage: the fraction of faults of interest that the test patterns produced by the test
generator can expose and the response monitor can detect. Most monitors produce a fault-free
signature for some faulty response sequences, an undesirable property called aliasing.
- Test set size: the number of test patterns produced by the test generator. Test set size is
closely linked to fault coverage; generally, large test sets imply high fault coverage. However,
for online testing, test set size must be small to reduce fault and error latency.
- Hardware overhead: the extra hardware needed for BIST. In most embedded systems, high
hardware overhead is not acceptable.
Ensuring that fault coverage is sufficiently high and the number of tests is sufficiently low are
the main problems with random BIST methods. Researchers have proposed two general
approaches to preserve the cost advantages of LFSRs while greatly shortening the generated
test sequence. One approach is to insert test points in the CUT to improve controllability and
observability. However, this approach can result in performance loss. Alternatively, one can
introduce some determinism into the generated test sequence, for example, by inserting specific
seed tests known to detect hard faults.
Some CUTs, including data path circuits, contain hard-to-detect faults that are detectable by
only a few test patterns, denoted T_hard. An N-bit LFSR can generate a sequence that
eventually includes 2^N - 1 patterns (essentially all possibilities). However, the probability that
the tests in T_hard will appear early in the sequence is low. In such cases, one can use
deterministic testing, which tailors the generated test sequence to the CUT's functional
properties, instead of random testing. Deterministic testing is especially suited to RAMs,
ROMs, and other highly regular components. A deterministic technique called transparent
BIST [3] applies BIST to RAMs while preserving the RAM contents, a particularly desirable
feature for online testing. Keeping hardware overhead acceptably low is the main difficulty
with deterministic BIST.
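The 2^N - 1 property is easy to demonstrate. The sketch below (Python; a 4-bit Fibonacci
LFSR whose feedback polynomial x^4 + x^3 + 1 is primitive) enumerates the pattern sequence
and confirms that every nonzero 4-bit pattern appears exactly once per period:

```python
def lfsr_patterns(nbits=4, taps=(4, 3), seed=0b0001):
    """Fibonacci LFSR over GF(2): with a primitive feedback polynomial
    (here x^4 + x^3 + 1) it cycles through all 2^N - 1 nonzero states,
    giving a cheap source of pseudo-random test patterns."""
    mask = (1 << nbits) - 1
    state, patterns = seed, []
    for _ in range((1 << nbits) - 1):
        patterns.append(state)
        feedback = 0
        for t in taps:                # XOR of the tapped stages
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return patterns

patterns = lfsr_patterns()
assert len(set(patterns)) == 15       # all 15 nonzero 4-bit patterns
assert 0 not in patterns              # the all-zero state never occurs
```

The hardware cost is just N flip-flops and a few XOR gates, which is why LFSRs dominate
random BIST; the drawback, as the text notes, is that the few patterns in T_hard may appear
very late in the sequence.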
A straightforward way to generate a specific test set is to store it in a ROM and address each
stored test pattern with a counter. Unfortunately, ROMs tend to be much too expensive for
storing entire test sequences. An alternative method is to synthesize a finite-state machine that
directly generates the test set. However, the relatively large test set size and test vector width,
as well as the test set's irregular structure, are much more than current FSM synthesis programs
can handle.

Another group of test generator design methods, loosely called deterministic, attempts to
embed a complete test set in a specific generated sequence. Again, the generated tests must
meet the coverage, overhead, and test size constraints we've discussed. An earlier article [7]
presents a
representative BIST design method for data path circuits that meets these requirements.
[Figure: in normal mode, a multiplexer passes the functional inputs to the circuit under test
(CUT); in test mode, it passes the test pattern sequence from the test generator. A response
monitor observes the CUT outputs and raises an error signal; a control block coordinates the
test.]
Fig. 42.1 A General BIST Scheme
An Example

IEEE 1149.4-based Architecture for OLT of a Mixed-Signal SoC

Analog/mixed-signal blocks like DC-DC converters, PLLs, ADCs, etc. and digital modules

motion of the dual tandem hydraulic jack. The motion of the spool of the hydraulic servo valve
(master control valve) regulates the flow of oil to the tandem jacks, thereby determining the
ram position. The spool and ram positions are controlled by means of feedback loops. The
actuator system is controlled by the on-board flight electronics. A lot of work has been done on
on-line fault detection and diagnosis of the mechanical system; however, OLT of the electronic
systems has hardly been looked into. It is to be noted that since electro-hydraulic actuators are
mainly used in mission-critical systems like avionics, for reliable operation on-line fault
detection and diagnosis is required for both the mechanical and the electronic sub-systems.

The IEEE 1149.1 and 1149.4 circuitry is utilized to perform the BIST of the interconnecting
buses between the cores. It may be noted that on-line tests are carried out only for cores that are
more susceptible to failures. However, the interconnecting buses are tested during start-up and
at intervals when the cores connected by them are idle. The test scheduling logic can be
designed as suggested in [10].
The following three classes of tests are carried out in the SoC:

1. Interconnect test of the interconnecting buses (BIST)

Interconnect testing is to detect open circuits in the interconnect between the cores, and to
detect and diagnose bridging faults anywhere in the interconnect, regardless of whether the
lines normally carry digital or analog signals. This test is performed by the EXTEST
instruction, and digital test patterns are generated from the pre-programmed test controller.
2. Parametric test of the interconnecting buses (BIST)

Parametric test permits analog measurements using analog stimuli and responses. This test is
also performed by the EXTEST instruction. For this, only three values of analog voltage, viz.
VH = VDD, VL = VDD/3, and VG = VSS, are given as test inputs by the controller, and the
voltage at the output of the line under test is sampled after one-bit coarse digitization, as
mentioned in the IEEE 1149.4 standard.
3. Internal test of the cores (concurrent tests)

This test is performed by the INTEST instruction and enables the on-line monitors placed on
each of the cores present in the SoC. This test can be enabled concurrently with SoC operation
and need not be synchronized to the start-up of normal SoC operation. The asynchronous start-
up/shutdown of the on-line testers facilitates power saving and higher reliability of the test
circuitry compared to the functional circuit.
7. References

1) M.R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995.
2) K.K. Saluja, R. Sharma, and C.R. Kime, "A Concurrent Testing Technique for Digital
Circuits," IEEE Trans. Computer-Aided Design, Vol. 7, No. 12, Dec. 1988, pp. 1250-1259.
3) M. Nicolaidis, "Theory of Transparent BIST for RAMs," IEEE Trans. Computers, Vol. 45,
No. 10, Oct. 1996, pp. 1141-1156.
[Figure: SoC block diagram. An application-specific processor with a 16 kB data RAM, XTAL
timing, and a clock divider connects over the system bus to an ADC and a DAC, which
interface to the electro-hydraulic actuator system (simulated in LabVIEW on a PC). An on-
chip test controller (JTAG interface: TDI, TMS, TCK, TDO) drives the test voltages VH, VL,
and VG and the analog buses AB1 and AB2. A DC/DC converter with battery and charger
supplies power to the cores. The legend distinguishes data and control paths, the IEEE
1149.4/1149.1 boundary-scan bus, and the 1149.4 analog buses AB1 and AB2.]
Fig. 42.2 Block Diagram of the SoC Representing On-Line Test Capability