1.1 SYSTEM:
Definition:
o A System is a way of working, organizing or doing one or many tasks
according to a fixed plan, program, or set of rules.
o A system is also an arrangement in which all its units assemble and work
together according to the plan or program.
o Let us examine the following two examples.
Consider a watch.
It is a time-display system. Its hardware parts are the needles, the battery, the dial, the chassis and the strap. These parts organize to show the real time and to continuously update the displayed time every second. The system-program updates the display using three needles after each second.
It follows a set of rules.
(i) All needles move clockwise only.
(ii) A thin, long needle rotates every second such that it returns to the same
position after a minute.
(iii) A long needle rotates every minute such that it returns to the same position
after an hour.
(iv) A short needle rotates every hour such that it returns to the same position
after twelve hours.
(v) All three needles return to the same inclinations after every twelve hours.
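As a sketch of how these rules translate into computation (the function names are ours), each needle's inclination, in degrees clockwise from the 12 o'clock position, follows from the number of seconds elapsed since 12:00:00:

```c
#include <stdint.h>

/* Inclination of each needle, in degrees clockwise from 12 o'clock,
   after `s` seconds have elapsed since 12:00:00. Each needle returns
   to the same position after its full period, as the rules state. */
static double second_needle_deg(uint32_t s) { return (s % 60u)    * 360.0 / 60.0;    }
static double minute_needle_deg(uint32_t s) { return (s % 3600u)  * 360.0 / 3600.0;  }
static double hour_needle_deg(uint32_t s)   { return (s % 43200u) * 360.0 / 43200.0; }
```

After 43,200 s (twelve hours) all three functions return to the same values, matching rule (v).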
Consider a washing machine.
It is an automatic clothes-washing system. The important
hardware parts include its status display panel, the switches and dials for user-
defined programming, a motor to rotate or spin, its power supply and control
unit, an inner water-level sensor, a solenoid valve for letting water in and
another valve for letting water drain out.
These parts organize to wash clothes automatically according to a
program preset by a user. The system-program washes the dirty clothes placed
in a tank, which rotates or spins in pre-programmed steps and stages.
The hardware of a computer system, and similarly of an embedded system, includes the following units:
1. A microprocessor
2. A large memory comprising the following two kinds:
(a) Primary memory (semiconductor memories - RAM, ROM and fast
accessible caches)
(b) Secondary memory (magnetic memory located in hard disks, diskettes
and cartridge tapes and optical memory in CD-ROM)
3. Input units like keyboard, mouse, digitizer, scanner, etc.
4. Output units like video monitor, printer.
5. Networking units like Ethernet card, front-end processor-based drivers, etc.
6. I/O units like a modem, fax cum modem, etc.
An embedded system has three main components:
1. It has hardware.
2. It has main application software. The application software may concurrently
perform a series of tasks or multiple tasks.
3. It has a Real Time Operating System (RTOS) that supervises the
application software and provides a mechanism to let the processor run a
process as per its schedule and to context-switch between the various
processes (tasks). The RTOS defines the way the system works. It organizes
access to a resource in the sequence of the series of tasks of the system. It schedules
their working and execution by following a plan, to control the latencies and to
meet the deadlines. [Latency here refers to the waiting period between the
instance at which the need for a task arises and the start of the run of the
task's codes.] It sets the rules during the execution of the application software.
A small-scale embedded system may not need an RTOS.
An embedded system has software designed keeping in view three constraints: (i)
available system memory, (ii) available processor speed and (iii) the need to limit
power dissipation when continuously running the system in cycles of wait for events,
run, stop and wake-up.
Definition :
An Embedded System is a system that has software embedded
into computer-hardware, which makes a system dedicated for an
application(s) or specific part of an application or product or part of a
larger system.
An embedded system is one that has dedicated purpose software
embedded in computer hardware.
It is a dedicated computer based system for an application(s) or product.
It may be an independent system or a part of a larger system. Its software
usually embeds into a ROM (Read Only Memory) or flash.
"It is any device that includes a programmable computer but is not itself
intended to be a general-purpose computer." - Wayne Wolf, Ref: 61
1.4 CLASSIFICATION OF EMBEDDED SYSTEMS:
2. Medium Scale Embedded Systems : These systems are usually designed with a
single or few 16- or 32-bit microcontrollers or DSPs or Reduced Instruction Set
Computers (RISCs). These have both hardware and software complexities.
For complex software design, there are the following programming tools:
RTOS, source-code engineering tool, simulator, debugger and Integrated
Development Environment (IDE). Software tools also provide the solutions to the
hardware complexities. An assembler is of little use as a programming tool.
These systems may also employ the readily available ASSPs and
IPs (explained later) for the various functions - for example, for bus
interfacing, encrypting, deciphering, discrete cosine transformation and its
inverse, TCP/IP protocol stacking and network connecting functions.
[ASSPs and IPs may also have to be appropriately configured by the
system software before being integrated into the system bus.]
cost or may not be available at all. In some cases, a compiler or retargetable
compiler might have to be developed for these. [A retargetable compiler is one
that configures according to the given target configuration in a system.]
Real-time Response
Realtime systems have to respond to external interactions in a predetermined amount of
time. Successful completion of an operation depends upon the correct and timely operation
of the system. Design the hardware and the software in the system to meet the Realtime
requirements. For example, a telephone switching system must feed dial tone to thousands of
subscribers within a recommended limit of one second.
To meet these requirements, the off-hook detection mechanism and the software
message communication involved have to work within the limited time budget. The system
has to meet these requirements for all the calls being set up at any given time. The designers
have to focus very early on the Realtime response requirements. During the architecture
design phase, the hardware and software engineers work together to select the right system
architecture that will meet the requirements.
Recovering from Failures
Realtime systems must function reliably in the event of failures. These failures can be
internal as well as external. The following sections discuss the issues involved in handling
these failures.
Internal failures can be due to hardware and software failures in the system. The
different types of failures you would typically expect are:
Software Failures in a Task: Unlike desktop applications, Realtime applications do not
have the luxury of popping up a dialog box and exiting on detecting a failure. Design the
tasks to safeguard against error conditions.
This becomes even more important in a Realtime system because a sequence of events can
result in a large number of scenarios. It may not be possible to test all the cases in the
laboratory environment. Thus, apply defensive checks to recover from error conditions.
Also, some software error conditions might lead to a task hitting a processor exception. In
such cases, it might sometimes be possible simply to roll back the task to its previously
saved state.
Processor Restart: Most Realtime systems are made up of multiple nodes. It is not
possible to bring down the complete system on failure of a single node; thus, design the
software to handle independent failure of any of the nodes.
Board Failure: Realtime systems are expected to recover from hardware failures. The
system should be able to detect and recover from board failures. When a board fails, the
system notifies the operator about it. Also, the system should be able to switch in a spare
for the failed board (if the board has a spare).
Link Failure: Most of the communication in Realtime systems takes place over links
connecting the different processing nodes in the system. Again, the system isolates a link
failure and reroutes messages so that link failure does not disturb the message
communication.
External Failures: Realtime systems have to perform in the real world, so they should
recover from failures in the external environment. Different types of failures that can take
place in the environment are:
Invalid Behavior of External Entities: When a Realtime system interacts with external
entities, it should be able to handle all possible failure conditions from these entities. A
good example of this is the way a telephone switching system handles calls from
subscribers. In this case, the system is interacting with humans, so it should handle all kinds
of failures, like:
1. Subscriber goes off-hook but does not dial.
2. Toddler playing with the phone!
3. Subscriber hangs up before completing dialing.
Interconnectivity Failure: Many times a Realtime system is distributed across several
locations, and external links might connect these locations. Handling these conditions is
similar to handling internal link failures. The major difference is that such failures might
last for an extended duration, and many times it might not be possible to reroute the
messages.
Working with Distributed Architectures
Most Realtime systems involve processing on several different nodes. The system itself
distributes the processing load among several processors. This introduces several challenges
in design:
Maintaining Consistency: Maintaining data-structure consistency is a challenge when
multiple processors are involved in feature execution. Consistency is generally maintained
by running data-structure audits.
Initializing the System: Initializing a system with multiple processors is far more
complicated than bringing up a single machine. In most systems the software release is
resident on the OMC. The node that is directly connected to the OMC will initialize first.
When this node finishes initialization, it will initiate software downloads for the child nodes
directly connected to it. This process goes on in a hierarchical fashion till the complete
system is initialized.
Inter-Processor Interfaces: One of the biggest headaches in Realtime systems is defining
and maintaining message interfaces. Defining interfaces is complicated by the different byte-
ordering and padding rules of processors. Maintaining interfaces is complicated by
backward-compatibility issues. For example, if a cellular system changes the air-interface
protocol for a new breed of phones, it will still have to support interfaces with older phones.
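One common defence against the byte-ordering part of the problem is to pack message fields explicitly, byte by byte, instead of overlaying structs on receive buffers. The helpers below (names are ours, not from any real protocol stack) always produce big-endian, network byte order regardless of the host processor:

```c
#include <stdint.h>
#include <stddef.h>

/* Pack a 16-bit field into a message buffer in network (big-endian)
   byte order, independent of the host processor's endianness.
   Returns the offset just past the packed field. */
static size_t pack_u16_be(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off]     = (uint8_t)(v >> 8);    /* most significant byte first */
    buf[off + 1] = (uint8_t)(v & 0xFF);
    return off + 2;
}

/* The matching unpack: reassemble the value from big-endian bytes. */
static uint16_t unpack_u16_be(const uint8_t *buf, size_t off)
{
    return (uint16_t)((buf[off] << 8) | buf[off + 1]);
}
```

Because the wire layout is spelled out explicitly, padding rules of either processor never leak into the interface.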
Load Distribution: When multiple processors and links are involved in message
interactions, distributing the load evenly can be a daunting task. If the system has an evenly
balanced load, the capacity of the system can be increased by adding more processors. Such
systems are said to scale linearly with increasing processing power. But often designers find
themselves in a position where a single processor or link becomes a bottleneck. This leads
to costly redesign of the features to improve system scalability.
Centralized Resource Allocation: Distributed systems may be running on multiple
processors, but they have to allocate resources from a shared pool. Shared-pool allocation is
typically managed by a single processor allocating resources from the pool. If the
system is not designed carefully, the shared resource allocator can become a bottleneck in
achieving full system capacity.
Asynchronous Communication
Remote procedure calls (RPC) are used in computer systems to simplify software
design. RPC allows a programmer to call procedures on a remote machine with the same
semantics as local procedure calls. RPCs really simplify the design and development of
conventional systems, but they are of very limited use in Real time systems. The main reason
is that most communication in the real world is asynchronous in nature, i.e. very few
message interactions can be classified into the query response paradigm that works so well
using RPCs.
Thus most Real time systems support state machine based design where multiple
messages can be received in a single state. The next state is determined by the contents of the
received message. State machines provide a very flexible mechanism to handle asynchronous
message interactions. The flexibility comes with its own complexities. We will be covering
state machine design issues in future additions to the Real time Mantra.
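A minimal sketch of such a table-driven design, with invented states and messages for a subscriber line (illustrative, not from any real switching system): the next state depends only on the current state and the received message, so several different messages can be handled in any one state.

```c
#include <stdint.h>

/* Hypothetical states and messages for a subscriber line. */
typedef enum { IDLE, OFF_HOOK, DIALING, CONNECTED } state_t;
typedef enum { MSG_OFF_HOOK, MSG_DIGIT, MSG_ANSWER, MSG_ON_HOOK } msg_t;

/* Next state is a pure function of (current state, received message);
   unrecognized messages leave the state unchanged. */
static state_t next_state(state_t s, msg_t m)
{
    switch (s) {
    case IDLE:      return (m == MSG_OFF_HOOK) ? OFF_HOOK : IDLE;
    case OFF_HOOK:  if (m == MSG_DIGIT)   return DIALING;
                    if (m == MSG_ON_HOOK) return IDLE;
                    return OFF_HOOK;
    case DIALING:   if (m == MSG_ANSWER)  return CONNECTED;
                    if (m == MSG_ON_HOOK) return IDLE;
                    return DIALING;
    case CONNECTED: return (m == MSG_ON_HOOK) ? IDLE : CONNECTED;
    }
    return IDLE;
}
```

Keeping the transition logic in one place like this makes it easy to enumerate and review every (state, message) pair.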
Race Conditions and Timing
It is said that the three most important things in Real time system design are timing,
timing and timing. A brief look at any protocol will underscore the importance of timing. All
the steps in a protocol are described with exact timing specification for each stage. Most
protocols will also specify how the timing should vary with increasing load. Real time
systems deal with timing issues by using timers. Timers are started to monitor the progress of
events. If the expected event takes place, the timer is stopped. If the expected event does not
take place, the timer will timeout and recovery action will be triggered.
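The start/stop/timeout pattern described above can be sketched as a one-shot software timer driven by a tick counter; the struct and function names are ours, and the tick source and the recovery action are left to the caller:

```c
#include <stdint.h>
#include <stdbool.h>

/* A one-shot software timer driven by an external tick counter. */
typedef struct {
    bool     running;
    uint32_t expiry;   /* tick at which the timer fires */
} sw_timer_t;

static void timer_start(sw_timer_t *t, uint32_t now, uint32_t duration)
{
    t->running = true;
    t->expiry  = now + duration;
}

/* Stop the timer when the expected event takes place in time. */
static void timer_stop(sw_timer_t *t) { t->running = false; }

/* True exactly when a running timer has timed out at tick `now`;
   the signed difference keeps the test safe across counter wraparound. */
static bool timer_expired(const sw_timer_t *t, uint32_t now)
{
    return t->running && (int32_t)(now - t->expiry) >= 0;
}
```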
A race condition occurs when the state of a resource depends on timing factors that are
not predictable. This is best explained with an example. Telephone exchanges have two-way
trunks which can be used by either of the two exchanges connected by the trunk. The problem
is that both ends can allocate the trunk at more or less the same time, thus resulting in a race
condition: the same trunk has been allocated for an incoming and an outgoing call. This
race condition can be easily resolved by defining rules on who gets to keep the resource
when such a clash occurs. The race condition can be minimized by requiring the two
exchanges to work from different ends of the pool; thus there will be no clashes under low
load. Under high load, race conditions will be hit and will be resolved by the pre-defined rules.
A more conservative design would partition the two way trunk pool into two one way
pools. This would avoid the race condition but would fragment the resource pool. The main
issue here is identifying race conditions. Most race conditions are not as simple as this one.
Some of them are subtle and can only be identified by careful examination of the design.
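Both strategies for this trunk example can be sketched in code; the exchange ids and the trunk-pool layout here are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Pre-defined clash-resolution rule: when both exchanges seize the same
   two-way trunk at nearly the same time, the exchange with the lower
   (hypothetical) exchange id keeps it; the other backs off and retries.
   The rule is deterministic and both ends can evaluate it independently. */
static bool keeps_trunk_on_clash(uint16_t my_id, uint16_t peer_id)
{
    return my_id < peer_id;
}

/* Avoidance variant: the two ends hunt the shared pool of n trunks from
   opposite ends, so clashes arise only when the pool is nearly full. */
static int next_trunk_to_try(uint16_t my_id, uint16_t peer_id,
                             int attempt, int n)
{
    return (my_id < peer_id) ? attempt : n - 1 - attempt;
}
```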
How do we design for upgradability?
The hardware platform may be used over several product generations
or for several different versions of a product in the same generation,
with few or no changes. However, we want to be able to add features by
changing software. How can we design a machine that will provide the
required performance for software that we haven't yet written?
4. EMBEDDED PROCESSOR ARCHITECTURE: GENERAL CONCEPTS
(iii) Reset Circuit
Reset means that the processor starts the processing of instructions from a starting
address. That address is one that is set by default in the processor program
counter (or instruction pointer and code segment registers in x86 processors) on
a power-up. From that address in memory, the fetching of program-instructions
starts following the reset of the processor.
A reset can occur in one of three ways:
1. Reset on power-up.
2. Through an external or internal reset circuit.
3. Reset on timeout of a watchdog timer.
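Servicing ("kicking") a watchdog so that it does not reset a healthy system might look like the sketch below; the register layout and the 0x55/0xAA reload sequence are invented for illustration, and a real part's datasheet defines the actual protocol. If the program hangs and stops kicking, the watchdog times out and forces a processor reset:

```c
#include <stdint.h>

/* Service a (hypothetical) watchdog: write a two-byte key sequence to
   its reload register to restart the countdown. Called periodically
   from the main loop; a hung program stops calling it, the watchdog
   times out, and the processor is reset. */
static void watchdog_kick(volatile uint8_t *wdt_reload_reg)
{
    *wdt_reload_reg = 0x55;   /* first key of the reload sequence  */
    *wdt_reload_reg = 0xAA;   /* second key: the counter restarts  */
}
```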
(iv) Memory
a. Functions Assigned to the ROM or EPROM or Flash:
1. Storing the 'application' program from where the processor fetches the
instruction codes.
2. Storing codes for system booting and initializing, initial input data and strings.
3. Storing codes for the RTOS.
4. Storing pointers (addresses) of various service routines.
b. Functions Assigned to the Internal, External and Buffer RAM:
1. Storing the variables during program run,
2. Storing the stacks,
3. Storing input or output buffers, for example, for speech or image.
c. Functions Assigned to the EEPROM or Flash:
Storing non-volatile results of processing.
d. Functions Assigned to the Caches:
1. Storing copies of the instructions, data and branch-transfer instructions
fetched in advance from external memories,
2. Storing temporarily the results in write-back caches during fast processing.
(v) Input, Output and I/O Ports:
The system gets inputs from physical devices (for example, key-buttons,
sensors and transducer circuits) through the input ports. A
controller circuit in a system gets the inputs from the sensor and transducer
circuits. A receiver of signals or a network card gets the input from a
communication system. [A communication system could be a fax or modem, a
broadcasting service]. Signals from a network are also received at the ports.
Consider the system in a mobile phone. The user inputs the mobile
number through the buttons, directly or indirectly (through recall of the number
from its memory). A panel of buttons connects to the system through the input
port or ports. The processor identifies each input port by its memory-buffer address(es),
called port address(es). Just as a memory location holding a byte or word is
identified by an address, each input port is also identified by an address.
The system gets the inputs by the read operations at the port addresses.
The system has output ports through which it sends output bytes to the real
world. An output may be to an LED (Light Emitting diode) or LCD (Liquid
Crystal Display) panel.
For example, a calculator or mobile phone system sends the output-numbers or an
SMS message to the LCD display. A system may send output to a printer. An
output may be to a communication system or network. A control system sends the
outputs to alarms, actuators, furnaces or boilers. A robot is sent output for its
various motors. Each output port is identified by its memory-buffer address(es)
(called port address).
The system sends the output by a write operation to the port address.
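Since a port address behaves like a memory address, port input and output reduce to volatile reads and writes at that address; a minimal sketch (actual port addresses are system-specific, so they are passed in as pointers here):

```c
#include <stdint.h>

/* Memory-mapped port access: a port address is treated like a memory
   address. Reading it gets the input byte; writing it sends the output
   byte. `volatile` tells the compiler every access matters, because the
   hardware behind the address can change or consume the value. */
static uint8_t port_read(volatile uint8_t *port_addr)
{
    return *port_addr;
}

static void port_write(volatile uint8_t *port_addr, uint8_t value)
{
    *port_addr = value;
}
```

On a real system the pointer would be initialized to the port address from the memory map, e.g. a cast of a documented constant.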
There are also general-purpose ports for both input and output (I/O)
operations. There are two types of I/O ports: parallel and serial.
From a serial port, a system gets a serial stream of bits at an input or sends a
stream at an output. For example, through a serial port, the system gets and sends
the signals as bits through a modem. A serial port also facilitates long-distance
communication and interconnections. A serial port may be a serial UART port, a
serial synchronous port or some other serial interfacing port. [UART stands for
Universal Asynchronous Receiver and Transmitter].
A system port may get inputs from multiple channels or may have to send output
to multiple channels. A demultiplexer takes the inputs from various channels and
transfers the input from a select channel to the system. A multiplexer takes the
output from the system and sends it to another system.
A system might have to be connected to a number of other devices and systems.
For networking the systems, there are different types of buses: for example, I2C,
CAN, USB, ISA, EISA and PCI.
(viii) TIMERS:
A timer circuit, suitably configured, is the system clock, also called the real-
time clock (RTC). An RTC is used by the schedulers and for real-time
programming.
An RTC is designed as follows: assume a processor generates a clock output
every 0.5 µs. When a system timer is configured by a software instruction to
issue a timeout after 200 inputs from the processor clock outputs, there are
10,000 interrupts (ticks) each second. The RTC ticking rate is then 10 kHz and it
interrupts every 100 µs. The RTC is also used to obtain software-controlled
delays and time-outs.
More than one timer using the system clock (RTC) may be needed for the
various timing and counting needs in a system.
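The tick-rate arithmetic of the RTC example can be captured in a small helper; the function name is ours, and periods are taken in nanoseconds to stay in integer arithmetic:

```c
#include <stdint.h>

/* Ticks per second for a timer that divides a processor clock output:
   a 0.5 µs (500 ns) clock divided by a programmed count of 200 gives a
   100 µs tick period, i.e. 10,000 ticks per second (10 kHz). */
static uint32_t ticks_per_second(uint32_t clock_period_ns,
                                 uint32_t divide_count)
{
    uint32_t tick_period_ns = clock_period_ns * divide_count;
    return 1000000000u / tick_period_ns;
}
```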
The software is the most important aspect, the brain of the embedded system.
An embedded system processor and the system need software that is specific to
a given application of that system. The processor of the system processes the
instruction codes and data. In the final stage, these are placed in the memory
(ROM) for all the tasks that have to be executed. The final stage software is also
called ROM image. Why? Just as an image is a unique sequence and
arrangement of pixels, embedded software is also a unique placement and
arrangement of bytes for instructions and data.
Each code or datum is available only in bit and byte format. The system
requires specific bytes at each ROM address, according to the tasks being executed. A
machine-implementable software file is therefore like a table of addresses with the
bytes stored at each address of the system memory.
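Such a file can therefore be pictured as a table of (address, byte) pairs; the entries below are arbitrary placeholders rather than a real program image:

```c
#include <stdint.h>
#include <stddef.h>

/* A ROM image viewed as a table of (address, byte) pairs. The values
   here are placeholders, not bytes of any real program. */
typedef struct {
    uint32_t addr;   /* ROM address            */
    uint8_t  byte;   /* byte stored at address */
} rom_entry_t;

static const rom_entry_t rom_image[] = {
    { 0x0000u, 0x02u },   /* e.g. first byte fetched after reset */
    { 0x0001u, 0x80u },
    { 0x0002u, 0x00u },
};

static const size_t rom_image_len =
    sizeof(rom_image) / sizeof(rom_image[0]);
```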
A design team's lack of knowledge of writing device-driver codes, or codes
invoking the processor-specific features, can cost a lot. Vendors may not only
charge for the API but also charge intellectual-property fees for each system
shipped out of the company.
consuming. Full coding in assembly may be done only for a few simple, small-scale
systems, such as toys, automatic chocolate-vending machines, robots or data-
acquisition systems.
2. In the next step, called linking, a linker links these codes with the other required
assembled codes. Linking is necessary because of the number of codes to be
linked for the final binary file. For example, there are standard codes to
program a delay task, for which there is a reference in the assembly language
program. The codes for the delay must link with the assembled codes. The delay
code is sequential from a certain beginning address. The assembly software code
is also sequential from a certain beginning address. Both codes have to be at
distinct and available addresses in the system. The linker links these.
The linked file in binary, for running on a computer, is commonly known as an
executable file or simply a '.exe' file. After linking, there has to be a re-allocation
of the sequences of placing the codes before the actual placement of the codes in
the memory.
3. In the next step, the loader program performs the task of reallocating the codes
after finding the physical RAM addresses available at a given instant. The
loader is a part of the operating system and places the codes into the memory after
reading the '.exe' file. This step is necessary because the available memory
addresses may not start from 0x0000, and the binary codes have to be loaded at
different addresses during the run. The loader finds the appropriate start
address. In a computer, the loader loads into a section of RAM the program that
is ready to run.
4. The final step of the system design process is locating the codes as a ROM image
and permanently placing them at the actually available addresses in the ROM. In
embedded systems, there is no separate program to keep track of the available
addresses at different times during the running, as in a computer. The designer
has to define the available addresses to load and create files for permanently
locating the codes. A program called locator reallocates the linked file and creates
a file for permanent location of codes in a standard format. This format may be
Intel Hex file format or Motorola S-record format.
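As an aside on the Intel Hex format mentioned above: each record carries a checksum, defined as the two's complement of the 8-bit sum of the byte count, the two address bytes, the record type and the data bytes. A minimal sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Checksum of one Intel Hex record: two's complement of the 8-bit sum
   of the byte count, the two address bytes, the record type and the
   data bytes. A record is valid when all its bytes, including the
   checksum, sum to zero modulo 256. */
static uint8_t ihex_checksum(uint8_t count, uint16_t addr, uint8_t type,
                             const uint8_t *data, size_t n)
{
    uint8_t sum = (uint8_t)(count + (uint8_t)(addr >> 8)
                                  + (uint8_t)addr + type);
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + data[i]);
    return (uint8_t)(0x100 - sum);   /* two's complement of the sum */
}
```

For the well-known record `:0300300002337A1E` (3 data bytes 02 33 7A at address 0x0030, type 00), the function yields the recorded checksum 0x1E.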
5. Lastly, either
(i) a laboratory system, called a device programmer, takes as input the
ROM-image file and finally burns the image into the PROM or EPROM, or
(ii) at a foundry, a mask is created for the ROM of the embedded system
from the image file. [The process of placing the codes in PROM or EPROM is also
called burning.] The mask created from the image gives the ROM in IC-chip form.
the devices that are present. It also includes software for allocating and registering
port or device codes and data at memory addresses for the various devices at
distinctly different addresses, including codes for detecting any collision between
the allocated addresses.
Real Time Operating System (RTOS)
Embedded software is most often designed for deterministic performance
and for defined task and ISR latencies, in addition to the OS functions.
It performs multiple actions and controls multiple devices and their
ISRs with defined real-time constraints and deadlines. This requires task and ISR
priority allocation and preemptive scheduling; the OS provides deterministic
performance during concurrent processing and execution with hard (stringent) or
soft timing requirements, with priority allocation and preemption. An RTOS is needed
when the tasks of the system have real-time constraints and deadlines for finishing
the tasks.
Application Software Development Tools
o Source Code Engineering Tools
o Stethoscope (tracks the switching from one task to another as a
function of time, stores beats)
o Trace Scope (traces changes in a parameter(s) as a function of time)
Simulator
A simulator is used to simulate the target processor and hardware elements on a
host PC, and to run and test the executable module.
Project Manager
A project manager is used to manage the files associated with a design-stage project
and to keep several versions of the source file(s) in an orderly fashion.
1.10 APPLICATIONS
o Microcontroller-based single or multi-display digital panel
meter for voltage, current, resistance and frequency
o Keyboard controller
o Serial port cards
o CD drive or hard disk drive controller
o Peripheral controllers - a CRT display controller, a keyboard
controller, a DRAM controller, a DMA controller, a printer
controller, a laser printer controller, a LAN controller, a disk drive
controller
o Fax, photocopy, printer or scanner machine
o Remote (controller) of TV
o Telephone with memory, display and other sophisticated features
o Motor control systems - for example, accurate control of speed and
position of a d.c. motor, robot, and CNC machine; automotive
applications such as close-loop engine control, dynamic ride
control, and an anti-lock braking system monitor
o Electronic data acquisition and supervisory control system
o Spectrum analyzer
o Biomedical systems - for example, an ECG LCD display-cum-
recorder, a blood-cell recorder-cum-analyzer and a patient
monitoring system
o Electronic instruments, such as an industrial process controller
o Electronic smart weight-display system, and an industrial moisture
recorder-cum-controller
o Digital storage system for a signal waveform, or electric or water
meter reading
o Computer networking systems - for example, router, front-end
processor in a server, switch, bridge, hub, and gateway
o For Internet appliances, there are numerous application systems:
(i) intelligent operation, administration and maintenance router
(IOAMR) in a distributed network, and
(ii) mail client card to store e-mail and personal addresses and to
smartly connect to a modem or server
o Banking systems - for example, bank ATM and credit card transactions
o Signal-tracking systems - for example, an automatic signal
tracker and a target tracker
o Communication systems - for example, for mobile communication:
a SIM card, a numeric pager, a cellular phone, a cable TV terminal,
and a FAX transceiver with or without a graphic accelerator
o Image filtering, image processing, pattern recognition, speech
processing and video processing
o Entertainment systems - such as video games and music systems
o A system that connects a pocket PC to the automobile driver's mobile
phone and a wireless receiver. The system then connects to a remote
server for Internet or e-mail, or to a remote computer at an ASP
(Application Service Provider). A personal information manager using
frame buffers in hand-held devices
o Thin client to provide diskless nodes with remote boot
capability. [An application of thin clients is access to a data center from
a number of nodes, or, in an Internet laboratory, access to the
Internet leased line through a remote server.]
o Embedded firewall/router using an ARM7 multiprocessor with two
Ethernet interfaces and interface support for PPP, TCP/IP and UDP
protocols
Sophisticated Applications
process, team members can more easily understand what they
are supposed to do, what they should receive from other team
members at certain times, and what they are to hand off
when they complete their assigned steps. Since most embedded
systems are designed by teams, coordination is perhaps the most
important role of a well-defined design methodology.
in the figure as dashed-line arrows. We need bottom–up design because we
do not have perfect insight into how later stages of the design process will
turn out. Decisions at one stage of design are based upon estimates of what
will happen later: How fast can we make a particular function run? How
much memory will we need? How much system bus capacity do we
need? If our estimates are inadequate, we may have to backtrack and
amend our original decisions to take the new facts into account. In
general, the less experience we have with the design of similar systems,
the more we will have to rely on bottom-up design information to help us
refine the system.
But the steps in the design process are only one axis along which we can
view embedded system design. We also need to consider the major goals of the
design:
■ manufacturing cost;
■ performance (both overall speed and deadlines); and
■ power consumption.
We must also consider the tasks we need to perform at every step in the
design process. At each step in the design, we add detail:
■ We must analyze the design at each step to determine how we
can meet the specifications.
■ We must then refine the design to add detail.
■ And we must verify the design to ensure that it still meets all system
goals, such as cost, speed, and so on.
Requirements
Clearly, before we design a system, we must know what we are
designing. The initial stages of the design process capture this information for
use in creating the architecture and components.
We generally proceed in two phases:
First, we gather an informal description from the customers, known as
requirements, and
second, we refine the requirements into a specification that contains enough
information to begin designing the system architecture.
Separating out requirements analysis and specification is often
necessary because of the large gap between what the customers can
describe about the system they want and what the architects need to design
the system. Consumers of embedded systems are usually not themselves
embedded system designers or even product designers. Their understanding
of the system is based on how they envision users' interactions with the
system. They may have unrealistic expectations as to what can be done
within their budgets, and they may also express their desires in a
language very different from system architects' jargon. Capturing a
consistent set of requirements from the customer and then massaging those
requirements into a more formal specification is a structured way to manage
the process of translating from the consumer's language to the designer's.
Name
Purpose
Inputs
Outputs
Functions
Performance
Manufacturing cost
Power
Physical size and weight
■ Name: This is simple but helpful. Giving a name to the project not
only simplifies talking about it to other people but can also crystallize
the purpose of the machine.
■ Inputs and outputs: These two entries are more complex than they
seem. The inputs and outputs to the system encompass a wealth of detail.
■ Performance: It is essential that performance requirements be identified
early, since they must be carefully measured during implementation to
ensure that the system works properly.
■ Power: Similarly, you may have only a rough idea of how much
power the system can consume, but a little information can go a long
way. Typically, the most important decision is whether the machine will
be battery powered or plugged into the wall. Battery-powered machines
must be much more careful about how they spend energy.
■ Physical size and weight: You should give some indication of the
physical size of the system to help guide certain architectural decisions.
A desktop machine has much more flexibility in the components used
than, for example, a lapel-mounted voice recorder.
A more thorough requirements analysis for a large system might use a form
similar to Figure as a summary of the longer requirements document. After an
introductory section containing this form, a longer requirements document could
include details on each of the items mentioned in the introduction. For example, each
individual feature described in the introduction in a single sentence may be
described in detail in a section of the specification.
After writing the requirements, you should check them for internal
consistency: Did you forget to assign a function to an input or output? Did you
consider all the modes in which you want the system to operate? Did you place
an unrealistic number of features into a battery-powered, low-cost machine?
Example 1.1
Requirements analysis of a GPS moving map
The moving map is a handheld device that displays for the user a map of the
terrain around the user's current position; the map display changes as the
user and the map device change position. The moving map obtains its
position from the GPS, a satellite-based navigation system. The moving
map display might look something like the following figure.
What requirements might we have for our GPS moving map? Here is an initial list:
■ Functionality: This system is designed for highway driving and similar uses,
not nautical or aviation uses that require more specialized databases and
functions. The system should show major roads and other landmarks available in
standard topographic databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The
device should be controlled by no more than three buttons. A menu system
should pop up on the screen when buttons are pressed to allow the user to make
selections to control the system.
■ Performance: The map should scroll smoothly. Upon power-up, a display should
take no more than one second to appear, and the system should be able to verify
its position and display the current map within 15 s.
■ Cost: The selling cost (street price) of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the hand.
■ Power consumption: The device should run for at least eight hours
on four AA batteries.
Specification
The specification is more precise—it serves as the contract between
the customer and the architects. As such, the specification must be carefully
written so that it accurately reflects the customer's requirements and does
so in a way that can be clearly followed during design.
Specification is probably the least familiar phase of this methodology
for neophyte designers, but it is essential to creating working systems with a
minimum of designer effort. Designers who lack a clear idea of what they want
to build when they begin typically make faulty assumptions early in the
process that aren't obvious until they have a working system. At that point,
the only solution is to take the machine apart, throw away some of it, and
start again. Not only does this take a lot of extra time, the resulting system is
also very likely to be inelegant, kludgey, and bug-ridden.
The specification should be understandable enough so that
someone can verify that it meets system requirements and overall expectations
of the customer. It should also be unambiguous enough that designers know
what they need to build. Designers can run into several different types of
problems caused by unclear specification. If the behavior of some feature in a
particular situation is unclear from the specification, the designer may
implement the wrong functionality. If global characteristics of the
specification are wrong or incomplete, the overall system architecture derived
from the specification may be inadequate to meet the needs of
implementation.
■ Background actions required to keep the system running, such as
operating the GPS receiver.
Architecture Design
The specification does not say how the system does things, only
what the system does. Describing how the system implements those
functions is the purpose of the architecture. The architecture is a plan for
the overall structure of the system that will be used later to design the
components that make up the architecture. The creation of the
architecture is the first phase of what many designers think of as design.
To understand what an architectural description is, let's look at a
sample architecture for the moving map of Example 1.1.
Figure shows a sample system architecture in the form of a block
diagram that shows major operations and data flows among them.
This block diagram is still quite abstract—we have not yet specified
which operations will be performed by software running on a CPU, what
will be done by special-purpose hardware, and so on. The diagram does,
however, go a long way toward describing how to implement the
functions described in the specification. We clearly see, for example, that
we need to search the topographic database and to render (i.e., draw) the
results for the display.
Architectural descriptions must be designed to satisfy both
functional and non-functional requirements. Not only must all the
required functions be present, but we must meet cost, speed, power, and
other nonfunctional constraints.
Starting out with a system architecture and refining that to hardware and
software architectures is one good way to ensure that we meet all
specifications: we can concentrate on the functional elements in the
system block diagram, and then consider the nonfunctional constraints
when creating the hardware and software architectures.
Designing Hardware and Software Components
You will have to design some components yourself. Even if you are using
only standard integrated circuits, you may have to design the printed circuit
board that connects them. You will probably have to do a lot of custom
programming as well. When creating these embedded software modules,
you must of course make use of your expertise to ensure that the system
runs properly in real time and that it does not take up more memory
space than is allowed. The power consumption of the moving map
software example is particularly important. You may need to be very
careful about how you read and write memory to minimize power—for
example, since memory accesses are a major source of power consumption,
memory transactions must be carefully planned to avoid reading the same
data several times.
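The point about avoiding repeated reads of the same data can be sketched in a few lines of C. This is an illustrative example of ours, not code from the moving map system: the scale factor is read from memory once into a local variable and reused, rather than being re-read on every loop iteration.

```c
#include <stddef.h>

/* Energy-conscious coding sketch: the shared value is read from memory
 * once and kept in a local, instead of being re-read each iteration.
 * Function and variable names are illustrative only. */
long sum_scaled(const int *data, size_t n, const int *scale_ptr) {
    long sum = 0;
    int scale = *scale_ptr;          /* one memory read, reused n times */
    for (size_t i = 0; i < n; i++)
        sum += (long)data[i] * scale;
    return sum;
}
```

A good compiler may perform this hoisting automatically, but embedded programmers often cannot rely on that and write the load explicitly.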
System Integration
and test functions relatively independently.
System integration is difficult because it usually uncovers problems.
It is often hard to observe the system in sufficient detail to determine
exactly what is wrong— the debugging facilities for embedded systems
are usually much more limited than what you would find on desktop
systems. As a result, determining why things do not work correctly
and how they can be fixed is a challenge in itself. Careful attention to
inserting appropriate debugging facilities during design can help ease system
integration problems, but the nature of embedded computing means that this
phase will always be a challenge.
4.3 RISC ARCHITECTURE
RISC architecture was developed as a result of the 801 project, which started in
1975 at the IBM T.J. Watson Research Center and was completed by the early 1980s.
This project was not widely known outside of IBM, and two other projects
with similar objectives started in the early 1980s at the University of California,
Berkeley and Stanford University. The term RISC (Reduced Instruction Set
Computer), used for the Berkeley research project, is the term under which this
architecture became widely known and is recognized today.
RISC Performance
There are several ways to achieve performance: technology advances, better machine
organization, better architecture, and optimization improvements in compiler
technology. Through technology alone, machine performance can be improved only in
proportion to the amount of technology improvement, and this is, more or less, available to everyone.
RISC deals with the machine organization and architecture levels - more precisely, with their interaction and trade-offs.
The work that each instruction of the RISC machine performs is simple and straight
forward. Thus, the time required to execute each instruction can be shortened and the
number of cycles reduced.
Typically, the instruction execution time is divided into five stages (machine cycles);
as soon as processing in one stage is finished, the instruction proceeds to the next
stage. As a stage becomes free, it is used to perform the same operation for the next
instruction. The instructions are thus executed in a pipelined fashion, similar to an
assembly line in a factory.
Typically, those five pipeline stages are:
IF - Instruction Fetch
ID - Instruction Decode
EX - Execute
MA - Memory Access
WB - Write Back
The goal of RISC is to achieve an execution rate of one cycle per instruction (CPI = 1.0),
which would be the case when no interruptions in the pipeline occur.
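The pipelining arithmetic behind CPI = 1.0 can be made concrete with a small sketch (the function is ours, not from the text): in an ideal k-stage pipeline with no stalls, n instructions complete in k + (n - 1) cycles, so the average cycles per instruction approaches 1 as n grows.

```c
/* Cycle count for n instructions in an ideal k-stage pipeline with no
 * stalls: the first instruction needs k cycles to travel through all
 * stages, and every later one completes one cycle after its predecessor. */
unsigned pipeline_cycles(unsigned k, unsigned n) {
    if (n == 0) return 0;
    return k + (n - 1);
}
```

With k = 5, for example, 100 instructions take 104 cycles, giving CPI = 1.04; as n grows, CPI approaches the ideal 1.0.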
The simplicity of the RISC instruction set is traded for more parallelism in execution. On
average a code written for RISC will consist of more instructions than the one written
for CISC. The typical trade-off that exists between RISC and CISC can be expressed in
the total time required to execute a certain task:
Time (task) = I x C x P x T0
I = No. of Instructions / Task
C = No. of Cycles / Instruction
P = No. of Clock Periods / Cycle (usually P=1)
T0 = Clock Period (ns)
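The trade-off expressed by this formula can be computed directly. The numbers in the usage note below are made-up illustrations, not measurements:

```c
/* Time(task) = I x C x P x T0, as defined above.
 * I  = instructions per task, C = cycles per instruction,
 * P  = clock periods per cycle (usually 1), T0 = clock period in ns. */
double task_time_ns(double I, double C, double P, double T0_ns) {
    return I * C * P * T0_ns;
}
```

For example, a hypothetical CISC running 100 instructions at 3 cycles each with a 10 ns clock needs 100 x 3 x 1 x 10 = 3000 ns, while a RISC needing 150 simpler instructions at 1 cycle each needs 150 x 1 x 1 x 10 = 1500 ns; which side wins depends entirely on the actual values of I, C, and T0.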
a. CISC achieves its performance advantage through a denser program consisting of
fewer, more powerful instructions.
b. RISC achieves its performance advantage by having simpler instructions.
4.4 PIPELINING
Finally, the single most important feature of RISC is pipelining. The degree of
parallelism in a RISC machine is determined by the depth of the pipeline. It could be
stated that all the features of RISC listed in this article could easily be
derived from the requirements for pipelining and maintaining an efficient execution
model. The sole purpose of many of those features is to support an efficient execution of
the RISC pipeline. It is clear that without pipelining the goal of CPI = 1 is not possible.
We may be led to think that by increasing the number of pipeline stages (the
pipeline depth), thus introducing more parallelism, we can increase the RISC
machine's performance further. However, this idea does not lead to a simple and
straightforward realization.
The increase in the number of pipeline stages introduces an overhead not only in hardware
(needed to implement the additional pipeline registers), but also the overhead in time due to
the delay of the latches used to implement the pipeline stages as well as the cycle time lost
due to the clock skews and clock jitter. This could very soon bring us to the point of
diminishing returns where further increase in the pipeline depth would result in less
performance.
An additional side effect of deeply pipelined systems is hardware complexity
necessary to resolve all the possible conflicts that can occur between the increased number
of instructions residing in the pipeline at one time. The number of the pipeline stages is
mainly determined by the type of the instruction core (the most frequent
instructions) and the operations required by those instructions. The pipeline depth
depends, as well, on the technology used. If the machine is implemented in a very
high-speed technology with a small number of gate levels (such as GaAs or ECL) and
very good control of clock skew, it makes sense to pipeline the machine deeper. The
RISC machines that achieve performance through the
use of many pipeline stages are known as super-pipelined machines.
Today the most common number of the pipeline stages encountered is five (as in the
examples given in this text). However, twelve or more pipeline stages are encountered in
some machine implementations.
4.6 INSTRUCTION FORMAT
ADDRESSING MODE
classes together with simple examples in the following paragraphs. It should be
noted that in presenting these examples, we use the convention operation, source,
destination to express any instruction. In that convention, operation represents the
operation to be performed, for example, add, subtract, write, or read. The source field
represents the source operand(s). The source operand can be a constant, a value stored in
a register, or a value stored in memory. The destination field represents the place
where the result of the operation is to be stored, for example, a register or a memory
location.
A three-address instruction takes the form operation add-1, add-2, add-3. In this form,
each of add-1, add-2, and add-3 refers to a register or to a memory location. Consider, for
example, the instruction
ADD R1, R2, R3
address instruction. Consider, for example, the instruction ADD B, R1. In this
case, the instruction adds the contents of register R1 to the contents of memory location
B and stores the result in register R1. Because the instruction uses two
types of addressing, that is, a register and a memory location, it is called a one-and-
half-address instruction. This is because register addressing needs a smaller number of
bits than memory addressing.
It is interesting to indicate that there exist zero-address instructions. These are the
instructions that use stack operation. A stack is a data organization mechanism in which
the last data item stored is the first data item retrieved. Two specific operations can be
performed on a stack. These are the push and the pop operations. Figure below illustrates
these two operations.
As can be seen, a specific register, called the stack pointer (SP), is used to
indicate the stack location that can be addressed. In the stack push operation, the SP
value is used to indicate the location (called the top of the stack) in which the value
(5A) is to be stored (in this case, location 1023). After storing (pushing) this value, the
SP is incremented to point to location 1024. In the stack pop operation, the SP is first
decremented to become 1021. The value stored at this location (DD in this case) is
retrieved (popped out) and stored in the shown register.
Different operations can be performed using the stack structure. Consider, for
example, an instruction such as ADD (SP)+, (SP). The instruction adds the contents of the
stack location pointed to by the SP to those pointed to by SP + 1 and stores the result on
the stack in the location pointed to by the current value of the SP.
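The push and pop behavior described above can be sketched in a few lines of C. This is a minimal illustration of ours: push stores the value where SP points and then increments SP; pop first decrements SP and then retrieves the value. The stack here starts at index 0 rather than at the addresses (1021-1024) shown in the figure.

```c
/* Minimal stack sketch mirroring the described push/pop semantics. */
enum { STACK_SIZE = 16 };
unsigned char stack_mem[STACK_SIZE];
int sp = 0;                          /* the stack pointer */

void push(unsigned char v) { stack_mem[sp] = v; sp++; }  /* store, then increment */
unsigned char pop(void)    { sp--; return stack_mem[sp]; } /* decrement, then load */
```

Note that push and pop are exact inverses: a pop immediately after a push returns the pushed value and restores SP.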
The different ways in which operands can be addressed are called the
addressing modes. Addressing modes differ in the way the address information of
operands is specified. The simplest addressing mode is to include the operand itself in
the instruction, that is, no address information is needed. This is called immediate
addressing. A more involved addressing mode is to compute the address of the operand by
adding a constant value to the content of a register. This is called indexed addressing.
Between these two addressing modes there exist a number of other addressing modes
including absolute addressing, direct addressing, and indirect addressing. A number of
different addressing modes are explained below.
1. Immediate Mode
According to this addressing mode, the value of the operand is (immediately)
available in the instruction itself. Consider, for example, the case of loading the
decimal value 1000 into a register Ri. This operation can be performed using an
instruction such as the following: LOAD #1000, Ri. In this instruction, the operation to
be performed is to load a value into a register. The source operand is (immediately)
given as 1000, and the destination is the register Ri. It should be noted that in order to
indicate that the value 1000 mentioned in the instruction is the operand itself and not its
address (immediate mode), it is customary to prefix the operand with the special character
(#). As can be seen, the use of the immediate addressing mode is simple. However,
immediate addressing leads to poor programming practice, because a change in the
value of an operand requires a change in every instruction that uses the immediate value
of that operand. A more flexible addressing mode is explained below.
2. Direct (Absolute) Mode
In this addressing mode, the address of the memory location that holds the operand is
included in the instruction. For example, if the content of the memory location whose
address is 1000 was (2345) at the time when the instruction LOAD 1000, Ri is executed,
then the result of executing this instruction is to load the value (2345) into register Ri.
3. Indirect Mode
In the indirect mode, what is included in the instruction is not the address of the
operand, but rather a name of a register or a memory location that holds the (effective)
address of the operand. In order to indicate the use of indirection in the instruction, it is
customary to include the name of the register or the memory location in parentheses.
Consider, for example, the instruction LOAD (1000), Ri. This instruction has the
memory location 1000 enclosed in parentheses, thus indicating indirection. The
meaning of this instruction is to load register Ri with the contents of the memory location
whose address is stored at memory address 1000. Because indirection can be made through
either a register or a memory location, we can identify two types of indirect
addressing: register indirect addressing, if a register is used to hold the address
of the operand, and memory indirect addressing, if a memory location is used to
hold the address of the operand. The two types are illustrated in the figure below.
4. Indexed Mode
In this addressing mode, the address of the operand is obtained by adding a constant
to the content of a register, called the index register. Consider, for example, the
instruction LOAD X(Rind), Ri. This instruction loads register Ri with the contents of the
memory location whose address is the sum of the contents of register Rind and the
value X. Indexed addressing is indicated in the instruction by including the name of the
index register in parentheses and using the symbol X to indicate the constant to be
added. The figure illustrates indexed addressing. As can be seen, indexing requires an
additional level of complexity over register indirect addressing.
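The four modes presented so far can be summarized with a small simulation. This sketch of ours uses a C array as memory; the register and memory contents in the test values are illustrative only.

```c
/* Effective-address sketch for the four modes, on a simulated memory. */
enum { MEMSIZE = 2048 };
int mem_sim[MEMSIZE];

int load_immediate(int value)  { return value; }                  /* LOAD #1000, Ri  */
int load_direct(int addr)      { return mem_sim[addr]; }          /* LOAD 1000, Ri   */
int load_indirect(int addr)    { return mem_sim[mem_sim[addr]]; } /* LOAD (1000), Ri */
int load_indexed(int x, int r) { return mem_sim[x + r]; }         /* LOAD X(Rind), Ri */
```

Reading the four bodies side by side shows the progression: immediate needs no memory access, direct needs one, indirect needs two, and indexed needs one access plus an addition.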
5. Other Modes
The addressing modes presented above represent the most commonly used
modes in most processors. They provide the programmer with sufficient means to
handle most general programming tasks. However, a number of other addressing modes
have been used in a number of processors to facilitate execution of specific programming
tasks. These additional addressing modes are more involved as compared to those
presented above. Among these addressing modes the relative, autoincrement, and the
autodecrement modes represent the most well-known ones. These are explained below.
Relative Mode
This mode is similar to indexed addressing, except that the program counter replaces
the index register. For example, the instruction LOAD X(PC), Ri loads register Ri with
the contents of the memory location whose address is the sum of the contents of the
program counter (PC) and the value X. The figure illustrates the relative addressing mode.
Autoincrement Mode
This mode is similar to register indirect addressing: a register holds the address of the
operand, and the register's content is automatically incremented after accessing
the operand. As before, indirection is indicated by including the autoincrement
register in parentheses. The automatic increment of the register's content after
accessing the operand is indicated by including a (+) after the parentheses.
Consider, for example, the instruction LOAD (Rauto)+, Ri. This instruction loads
register Ri with the operand whose address is the content of register Rauto. After
loading the operand into register Ri, the content of register Rauto is incremented,
pointing, for example, to the next item in a list of items. The figure below illustrates
the autoincrement addressing mode.
Autodecrement Mode
Similar to the autoincrement mode, the autodecrement mode uses a register to
hold the address of the operand. However, in this case the content of
the autodecrement register is first decremented, and the new content is
used as the effective address of the operand. In order to reflect the fact that the
content of the autodecrement register is decremented before accessing the
operand, a (-) is included before the indirection parentheses. Consider,
for example, the instruction LOAD -(Rauto), Ri. This instruction decrements
the content of the register Rauto and then uses the new content as the
effective address of the operand that is to be loaded into register Ri.
The figure below illustrates the autodecrement addressing mode.
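The autoincrement and autodecrement modes correspond closely to two familiar C pointer idioms, which may help fix the semantics; this parallel is our illustration, not from the text. `*p++` reads through the pointer and then advances it (autoincrement); `*--p` retreats the pointer first and then reads (autodecrement).

```c
/* LOAD (Rauto)+, Ri: dereference, then increment the address register. */
int load_autoinc(int **r) { return *(*r)++; }

/* LOAD -(Rauto), Ri: decrement the address register, then dereference. */
int load_autodec(int **r) { return *--(*r); }
```

Together the two modes make it easy to walk a list forward with one register and backward with another, which is exactly how they are used for stack and array traversal.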
INSTRUCTION TYPES
1. Data Movement Instructions
MOVE Ri, Rj
This instruction moves the content of register Ri to register Rj. The effect of the instruction is
to overwrite the contents of the (destination) register Rj without changing the contents of the
(source) register Ri. Data movement instructions include those used to move data to (from)
registers from (to) memory. These instructions are usually referred to as the load and store
instructions, respectively. Examples of the two instructions are
LOAD 25838, Rj
STORE Ri, 1024
The first instruction loads the content of the memory location whose address is 25838
into the destination register Rj. The content of the memory location is unchanged by
executing the LOAD instruction. The STORE instruction stores the content of the source
register Ri into the memory location 1024. The content of the source register is
unchanged by executing the STORE instruction. Table 2.3 shows some common data
transfer operations and their meanings.
2. Arithmetic and Logical Instructions
Arithmetic and logical instructions are those used to perform arithmetic and logical
manipulation of registers and memory contents. Examples of arithmetic instructions
include the ADD and SUBTRACT instructions. These are
ADD R1, R2, R0
SUBTRACT R1, R2, R0
The first instruction adds the contents of source registers R1 and R2 and stores the
result in destination register R0. The second instruction subtracts the contents of the
source registers R1 and R2 and stores the result in the destination register R0. The
contents of the source registers are unchanged by the ADD and the SUBTRACT
instructions. In addition to the ADD and SUBTRACT instructions, some machines have
MULTIPLY and DIVIDE instructions. These two instructions are expensive to implement
and can be substituted by the use of repeated addition or repeated subtraction. Therefore,
some architectures do not include MULTIPLY or DIVIDE instructions in their instruction
sets. The table shows some common arithmetic operations and their meanings.
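The substitution of MULTIPLY by repeated addition mentioned above can be shown in a few lines. This sketch of ours handles non-negative multipliers only, to keep it short:

```c
/* MULTIPLY implemented by repeated addition: add the multiplicand a
 * to the result b times. Only non-negative multipliers are handled. */
int multiply(int a, unsigned b) {
    int result = 0;
    while (b-- > 0)
        result += a;   /* one addition per unit of the multiplier */
    return result;
}
```

The cost is clear from the loop: the running time grows linearly with the multiplier, which is why real software libraries for such machines use faster shift-and-add algorithms instead.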
Logical instructions are used to perform logical operations such as AND, OR,
SHIFT, COMPARE, and ROTATE. As the names indicate, these instructions perform,
respectively, and, or, shift, compare, and rotate operations on register or memory
contents. The table presents a number of logical operations.
3. Sequencing Instructions
Control (sequencing) instructions are used to change the sequence in which
instructions are executed. They take the form of CONDITIONAL BRANCHING
(CONDITIONAL JUMP), UNCONDITIONAL BRANCHING (JUMP), or CALL
instructions. A common characteristic among these instructions is that their execution
changes the program counter (PC) value. The change made in the PC value can be
unconditional, for example, in the unconditional branching or jump instructions.
In this case, the earlier value of the PC is lost and execution of the program starts at a
new value specified by the instruction. Consider, for example, the instruction JUMP
NEW-ADDRESS. Execution of this instruction will cause the PC to be loaded with the
memory location represented by NEW-ADDRESS whereby the instruction stored at this
new address is executed. On the other hand,
the change made in the PC by a branching instruction can be conditional, based on the
value of a specific flag. Examples of these flags include Negative (N), Zero (Z),
Overflow (V), and Carry (C). These flags represent the individual bits of a
specific register, called the CONDITION CODE (CC) REGISTER. The values of the
flags are set based on the results of executing different instructions. The meaning of
each of these flags is shown in the table.
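How an instruction might set these flags can be sketched for an 8-bit SUBTRACT. The struct layout and names below are our illustration, not a specific processor's; the overflow flag V is omitted to keep the sketch short.

```c
#include <stdint.h>

/* Illustrative condition-code computation for an 8-bit a - b. */
typedef struct { int n, z, c; } flags_t;

flags_t sub_flags(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    flags_t f;
    f.n = (r & 0x80u) != 0;   /* Negative: sign bit of the result */
    f.z = (r == 0);           /* Zero: result is all zeros */
    f.c = (a < b);            /* Carry: a borrow was needed */
    return f;
}
```

A conditional branch such as BRANCH-IF-ZERO then simply tests the corresponding bit of the condition code register and loads the PC with the target address if it is set.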
Consider, for example, the following group of instructions.
4. Input/Output Instructions
Input and output instructions (I/O instructions) are used to transfer data between the
computer and peripheral devices. The two basic I/O instructions used are the INPUT and
OUTPUT instructions. The INPUT instruction is used to transfer data from an input
device to the processor. Examples of input devices include a keyboard or a mouse. Input
devices are interfaced with a computer through dedicated input ports. Computers can
use dedicated addresses to address these ports. Suppose that the input port through
which a keyboard is connected to a computer carries the unique address 1000.
Therefore, execution of the instruction INPUT 1000 will cause the data stored in
a specific register in the interface between the keyboard and the computer, call it the
input data register, to be moved into a specific register (called the accumulator) in the
computer. Similarly, the execution of the instruction OUTPUT 2000 causes the
data stored in the accumulator to be moved to the data output register in the output
device whose address is 2000. Alternatively, the computer can address these ports in
the usual way of addressing memory locations. In this case, the computer can input data
from an input device by executing an instruction such as MOVE Rin, R0. This
instruction moves the content of the register Rin into the register R0. Similarly, the
instruction MOVE R0, Rin moves the contents of register R0 into the register Rin, that
is, performs an output operation. This latter scheme is called memory-mapped
input/output. Among the advantages of
memory-mapped I/O is the ability to execute a number of memory-dedicated
instructions on the registers in the I/O devices in addition to the elimination of
the need for dedicated I/O instructions. Its main disadvantage is the need to dedicate part
of the memory address space for I/O devices.
Different processor targets may also have different word alignment restrictions.
Typically, a two- or four-byte integer should start on a suitably aligned address in order
to be fetched correctly over a bus. An access of a misaligned memory element may cause
a bus error. Intel/AMD processors (e.g., in a PC) are typically more tolerant of alignment
problems than others. The SUN SPARC that kattis runs on is quite picky about alignment!
Therefore, the software may need to adjust the contents of a message, or copy it into a
new location, before interpreting it.
Due to alignment and byte order problems, it is necessary to translate protocol headers
between the network representation (a sequence of bytes) and a structured data type (e.g.,
a C struct). The process of encoding a structured data type into a sequence of bytes in a
message is sometimes called marshaling. The reverse process, unpacking data from its
network representation into one interpretable by a host, is called unmarshaling.
In the lab, it is recommended to create C structures for the Ethernet and IP headers. Then
there are essentially two approaches to handle the translation between the raw packet data and
the structured information:
Use the unstructured raw data, and typecast it into structured information. The fields are
then accessed via byte-swapping functions. This is easiest but has limitations for handling
misaligned data. However, the x86 processors are forgiving when it comes to accessing
misaligned data; other architectures are not.
Example:
char *buf;                 /* This is the raw data */
iphdr *ip;                 /* This is the structured type */
ip = (iphdr *)buf;         /* Typecast */
ip->len = ntohl(ip->len);  /* Fields may be accessed via the structured type after byte swapping */
Copying and translating the raw data into separate structures. The fields can then be accessed
without type conversions. This is more complex but is a more general method.
Examples of such C types are uint8_t, uint16_t, and uint32_t, which represent 1-, 2-, and
4-byte unsigned integers, respectively. It is recommended that you use these types when
defining fields in protocol headers, for example.
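The second, copying approach to marshaling can be sketched as follows. The header layout here is hypothetical (it is not the real Ethernet or IP header); the point is the pattern: convert each multi-byte field with the byte-order functions and move it with memcpy, so the raw buffer never needs to be aligned.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons/htonl, ntohs/ntohl */

/* Hypothetical 7-byte header used only for illustration. */
struct myhdr {
    uint8_t  version;
    uint16_t len;
    uint32_t src;
};

/* Marshaling: encode the struct into network byte order, field by field. */
void marshal(const struct myhdr *h, unsigned char *buf) {
    uint16_t len = htons(h->len);
    uint32_t src = htonl(h->src);
    buf[0] = h->version;
    memcpy(buf + 1, &len, sizeof len);   /* memcpy tolerates misalignment */
    memcpy(buf + 3, &src, sizeof src);
}

/* Unmarshaling: the reverse process. */
void unmarshal(const unsigned char *buf, struct myhdr *h) {
    uint16_t len;
    uint32_t src;
    h->version = buf[0];
    memcpy(&len, buf + 1, sizeof len);
    memcpy(&src, buf + 3, sizeof src);
    h->len = ntohs(len);
    h->src = ntohl(src);
}
```

Because each field is copied at an explicit byte offset, this code also works for packed, unpadded wire formats that a plain struct overlay might not match.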
Byte Ordering
The optional E-bit in the PSW controls whether loads and stores use big endian or
little endian byte ordering. When the E-bit is 0, all larger-than-byte loads and stores are big
endian — the lower addressed bytes in memory correspond to the higher-order bytes in the
register. When the E-bit is 1, all larger-than-byte loads and stores are little endian — the
lower-addressed bytes in memory correspond to the lower-order bytes in the register. Load
byte and store byte instructions are not affected by the E-bit. The E-bit also affects
instruction fetch. Processors which implement the PSW E-bit must also provide an
implementation-dependent, software-writable default endian bit. The default endian bit
controls whether the
PSW E-bit is set to 0 or 1 on interruptions and also controls whether data in the page table is
interpreted in big endian or little endian format by the hardware TLB miss handler (if
implemented).
Figure shows various loads in little endian format.
Stores are not shown but behave similarly. The E-bit also affects instruction fetch. When
the E-bit is 0, instruction fetch is big endian — the lower-addressed bytes in memory correspond
to the higher-order bytes in the instruction. When the E-bit is 1, instruction fetch is little endian
— the lower-addressed bytes in memory correspond to the lower-order bytes in the
instruction. Architecturally, the instruction byte swapping can occur either when a cache line is
moved into the instruction cache or as instructions are fetched from the I-cache into the pipeline.
Because processors are allowed to swap instructions as they are moved into the I-cache, software
is required to keep track of which pages might have been brought into the I-cache in big endian
form and in little endian form, given the cache move-in rules, and, before executing the code,
flush all lines on any page that might have been moved in with the wrong form. Note that the
move-in rules allow all lines on a page, plus the next sequential page, to be moved in, so guard
pages (that will never be executed) must be used between code pages which will execute with
opposite endian forms.
5. INTRODUCTION TO VLIW AND DSP PROCESSOR
DSP
Digital signal processors (DSPs) are specialized microprocessors optimized to execute the
computationally-intensive operations commonly found in the inner loops of digital signal
processing algorithms. DSPs are typically designed to achieve high performance and low cost,
and are therefore used extensively in embedded systems. Some of the architectural features
commonly found in DSPs include fast multiply-accumulate hardware, separate address units,
multiple data-memory banks, specialized addressing modes, and support for low-overhead
looping. Another common feature of DSP architectures is the use of tightly-encoded instruction
sets.
For example, using 16- to 48-bit instruction words, a common DSP instruction can specify
up to five parallel operations: multiply-accumulate, two pointer updates, and two data moves.
Tightly-encoded instructions that could specify five operations in a single instruction were
initially used to improve code density with the belief that this also reduced instruction memory
requirements, and hence cost. Tightly-encoded instructions were also used to reduce instruction
memory bandwidth, which was of particular concern when packaging was not very advanced.
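The five parallel operations mentioned above map directly onto the inner loop of a dot product, the core of an FIR filter. In the C sketch below (our illustration), the single loop-body statement corresponds to what one tightly-encoded DSP instruction can do: a multiply-accumulate plus two pointer updates (the two data moves are implicit in the dereferences).

```c
/* Dot-product inner loop: one multiply-accumulate and two address
 * updates per iteration, mirroring a typical DSP MAC instruction. */
long fir_dot(const int *x, const int *h, int n) {
    long acc = 0;                      /* the accumulator */
    while (n--)
        acc += (long)(*x++) * (*h++);  /* MAC + two pointer updates */
    return acc;
}
```

On a DSP with single-cycle MAC hardware and zero-overhead looping, each iteration of this loop can retire in one cycle.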
As a result of this encoding style, DSP instruction-set architectures (ISAs) are not very
well suited for automatic code generation by modern, high-level language (HLL) compilers. For
example, most operations are accumulator based, and very few registers can be specified in a
DSP instruction. Although this reduces the number of bits required for encoding instructions, it
also makes it more difficult for the compiler to generate efficient code. Furthermore, DSP
instructions can only specify limited instances of parallelism that are commonly used in the inner
loops of DSP algorithms. As a result, DSPs cannot exploit parallelism beyond what is supported
by the ISA, and this can degrade performance significantly. For the early generation of
programmable DSPs, tightly-encoded instruction sets were an acceptable solution. At that time,
DSPs were mainly used to implement simple algorithms that were relatively easy to code and
optimize in assembly language. Furthermore, the compiler technology of that time was not
advanced enough to generate code with the same efficiency and compactness as that of a human
programmer.
However, as DSP and multimedia applications become larger and more sophisticated, and
as design and development cycles are required to be shorter, it is no longer feasible to program
DSPs in assembly language. Furthermore, current compiler technology has advanced to the point
where it can generate very efficient and compact code. For these reasons, DSP manufacturers are
currently seeking alternative instruction set architectures. One such alternative, used in new
media processors and DSPs, such as the Texas Instruments TMS320C6x, is the VLIW
architecture.
VLIW
Since DSP applications typically exhibit static control flow, compilers can easily be used to
schedule instructions statically and exploit the available parallelism. This makes VLIW
architectures, with their simpler control units, a more cost-effective choice than superscalar
architectures in embedded systems. VLIW architectures are also better suited for exploiting
parallelism in DSP applications than SIMD architectures. Although SIMD architectures are
suitable for applications that exhibit homogeneous parallelism, they are not well suited for
applications that exhibit more heterogeneous parallelism, such as that found in the inner loops of
DSP kernels, or the control code of DSP applications. The greater flexibility in VLIW
architectures also makes them easier targets for HLL compilers. However, in their most basic
form, they are also more expensive to implement.
For example, while SIMD architectures use a single ALU to perform four parallel
operations, a VLIW architecture requires four, full-precision ALUs to perform the same
operations. Moreover, while the four operations can be encoded as a single SIMD instruction,
they have to be encoded as four, separate operations in a long instruction. One way to reduce the
cost of a VLIW architecture is to customize it to the target application. For example, smaller-
precision ALUs and datapaths can be used to match the requirements of the application.
Instruction storage and bandwidth requirements can also be reduced by using efficient long-
instruction encoding techniques. Even specialized SIMD operations can be used to supplement a
VLIW architecture and reduce its cost. This, however, requires more sophisticated compiler
technology to exploit available sub-word parallelism.
Finally, VLIW architectures are better suited for exploiting the fine-grained parallelism
found in DSP applications than multiprocessor architectures. Even though some applications can
benefit from the exploitation of coarse-grained parallelism, most applications cannot justify the
additional cost of a multiprocessor architecture. Multiprocessor DSP architectures are also
difficult targets for HLL compilers, making them very cumbersome to program.
Since VLIW architectures have the flexibility to exploit different levels of parallelism,
have simple control structures, and are easy targets for current HLL compilers, they are the most
suitable architecture for embedded DSP systems. However, a major impediment to using VLIW
architectures in cost-sensitive embedded systems has been their high instruction bandwidth
requirements. Nonetheless, several approaches have been employed in research and commercial
VLIW architectures to overcome this problem. The remainder of this section shows the
performance and cost benefits of using VLIW architectures compared to typical DSP-style
architectures.
1.1 IO PORT TYPES − SERIAL AND PARALLEL IO PORTS
Types of parallel ports:
• Parallel port, single-bit input
• Parallel port, single-bit output
• Parallel port, multi-bit input
• Parallel port, multi-bit output
Synchronous Serial Input
• The sender, along with the serial data bits, also sends the clock pulses SCLK (serial clock) to the receiver port pin. The port synchronizes the serial data input bits with the clock bits. Each bit in each byte, as well as each byte, is received in synchronization.
• Synchronization means separation by a constant interval or phase difference. If the clock period = T, then each byte at the port is received at the input in a period of 8T.
• The bytes are received at constant rates. Each byte at the input port is separated by 8T, and the data transfer rate on the serial line is (1/T) bps. [1 bps = 1 bit per second]
Serial data and clock pulse-inputs
• On the same input line − the clock pulses suitably encode or modulate the serial data input bits. The receiver detects the clock pulses and recovers the data bits after decoding or demodulating.
• On a separate input line − when a separate SCLK input is sent, the receiver detects at the middle, +ve edge or −ve edge of each clock pulse whether the data input is 1 or 0, and saves the bits in an 8-bit shift register. The processing element at the port (peripheral) saves the byte in a port register, from where the microprocessor reads the byte.
Master output slave input (MOSI) and master input slave output (MISO)
• MOSI − the SCLK is sent from the sender (master) to the receiver (slave), and the slave is forced to synchronize reception of the inputs from the master as per the master clock.
• MISO − the SCLK is sent to the sender (slave) from the receiver (master), and the slave is forced to synchronize sending of the inputs to the master as per the master clock outputs.
Synchronous serial input is used for inter-processor transfers, audio inputs and streaming data inputs.
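The shift-register mechanism described above can be sketched in C: on each clock edge the port samples the data line, shifts the bit into an 8-bit register, and after eight clocks the assembled byte is ready for the processor. This is an illustrative model of the port logic, not any real device's register interface; sampling MSB first is our assumption.

```c
#include <stdint.h>

/* Model of a synchronous serial input port: each element of bits[] is
 * the level sampled at one clock edge. After 8 edges the byte is ready
 * in the (modeled) port register. MSB-first order assumed. */
uint8_t shift_in_byte(const int bits[8])
{
    uint8_t shift_reg = 0;
    for (int i = 0; i < 8; i++) {
        /* sample at the clock edge and shift the new bit in */
        shift_reg = (uint8_t)((shift_reg << 1) | (bits[i] & 1));
    }
    return shift_reg;   /* byte now ready for the microprocessor */
}
```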
Examples of synchronous serial input/output
Inter-processor data transfer, multiprocessor communication, writing to CD or hard disk, audio input/output, video input/output, dialer output, network device output, remote TV control, transceiver output, and serial I/O bus output or writing to flash memory using SDIO.
Synchronous Serial Output
• Each bit in each byte is sent in synchronization with a clock.
• Bytes are sent at constant rates. If the clock period = T, then the data transfer rate is (1/T) bps.
• The sender either sends the clock pulses at the SCLK pin, or sends the serial data output and the clock through the same output line, with the clock pulses suitably modulating or encoding the serial output bits.
Synchronous serial output using a shift register
• The processing element at the port (peripheral) sends the byte through a shift register at the port, to which the microprocessor writes the byte.
• Synchronous serial output is used for inter-processor transfers, audio outputs and streaming data outputs.
Synchronous Serial Input/output
1.4 ASYNCHRONOUS SERIAL INPUT PORT LINE RxD (RECEIVE DATA)
• Does not receive the clock pulses or clock information along with the bits.
• Each bit within each byte is received at fixed intervals, but the received bytes are not in synchronization.
• Bytes are separated by variable intervals or phase differences.
• Asynchronous serial input is also called UART input if the serial input is according to the UART protocol.
1.6 UART PROTOCOL SERIAL LINE FORMAT
• The starting point of receiving the bits of each byte is indicated by a line transition from 1 to 0 for a period = T. [T−1 is called the baud rate.]
• If the sender's shift-clock period = T, then a byte at the port is received at the input in a period of 10T or 11T, due to the additional bits used at the start and end of each byte.
• The receiver detects the n bits at intervals of T from the middle of the start-indicating bit, where n = 0, 1, …, 10 or 11, finds whether each data input is 1 or 0, and saves the bits in an 8-bit shift register.
• The processing element at the port (peripheral) saves the byte in a port register, from where the microprocessor reads the byte.
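The 10T frame described above (a start bit, eight data bits and a stop bit) can be sketched as follows. The function name and the array representation of line levels are illustrative only; sending the data bits LSB first follows standard UART practice.

```c
#include <stdint.h>

/* Build the 10 line levels of one UART frame: start bit '0', eight data
 * bits LSB first, stop bit '1'. Each element occupies time T on the
 * line, so the whole byte takes 10T. Illustrative sketch. */
void uart_frame(uint8_t byte, int out[10])
{
    out[0] = 0;                           /* start bit: 1-to-0 transition */
    for (int i = 0; i < 8; i++)
        out[1 + i] = (byte >> i) & 1;     /* data bits, LSB first */
    out[9] = 1;                           /* stop bit */
}
```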
Asynchronous Serial Output
• Asynchronous output serial port line TxD (transmit data).
• Each bit in each byte is transmitted at fixed intervals, but the output bytes are not in synchronization (they are separated by a variable interval or phase difference). The minimum separation is one stop-bit interval on TxD.
• Does not send the clock pulses along with the bits.
• The sender transmits the bytes at minimum intervals of nT. Bit reception at the receiver starts from the middle of the start bit.
Half Duplex
An example of the half-duplex mode is telephone communication: on one telephone line, the talk can go only one way at a time (half-duplex mode).
Full Duplex
Full duplex means that, at an instant, the communication can be both ways. An example of the full-duplex asynchronous mode of communication is the communication between the modem and the computer through the TxD and RxD lines, or communication using the serial interface (SI) in modes 1, 2 and 3 in the 8051.
1.7 PARALLEL PORT SINGLE-BIT INPUT
Example uses:
• Completion of a revolution of a wheel
• Achieving preset pressure in a boiler
• Exceeding the upper limit of permitted weight over the pan of an electronic balance
• Presence of a magnetic piece in the vicinity of, or within reach of, a robot arm at its end point
• Filling of a liquid up to a fixed level
Parallel Port Output − single bit
• PWM output for a DAC, which controls liquid level, temperature, pressure, speed, the angular position of a rotating shaft, the linear displacement of an object, or a d.c. motor
• Pulses to an external circuit
• Control signal to an external circuit
Parallel Port Input − multi-bit
• ADC input from a liquid-level sensor, temperature sensor, pressure sensor, speed sensor or d.c. motor rpm sensor
• Encoder inputs giving bits for the angular position of a rotating shaft or the linear displacement of an object
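A single-bit parallel input of the kind listed above is typically read from a memory-mapped port register. The register address and bit position below are hypothetical, chosen purely for illustration; the `PORT_IN` macro is shown but never dereferenced in this sketch.

```c
#include <stdint.h>

/* Hypothetical memory-mapped input port and sensor bit; on real
 * hardware these come from the microcontroller's datasheet.
 * Not dereferenced in this sketch. */
#define PORT_IN   (*(volatile uint8_t *)0x40001000u)
#define LEVEL_BIT 3

/* Extract a single sensor bit (e.g. "liquid at fixed level") from a
 * port-register value. On hardware: liquid_at_level(PORT_IN, LEVEL_BIT). */
int liquid_at_level(uint8_t port_value, int bit)
{
    return (port_value >> bit) & 1;   /* 1 when the sensor line is high */
}
```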
1. Synchronous and Iso-synchronous Communication in Serial Ports or Devices
Synchronous Communication
• Sync code bits (separated by an interval ΔT) are sent to enable frame synchronization or frame-start signaling. The code bits precede the data bits. There may be inversion of the code bits after each frame in certain protocols. Flag bits at the start and end are also used in certain protocols.
• Synchronous device port data bits are always present. The reciprocal of ΔT is the bits per second (bps).
• Data bits − m frame bits (or 8 bits) transmit such that each bit is on the line for time ΔT, or each frame is on the line for time m·ΔT. m may be 8 or a large number; it depends on the protocol.
• Synchronous device clock bits − either on a separate clock line, or on the data line such that the clock information is embedded with the data bits by an appropriate encoding or modulation. The clock is generally not optional; it is not left implicit to the synchronous data receiver, and the transmitter generally transmits the clock rate information.
Asynchronous Communication from Serial Ports or Devices
In asynchronous communication, the clocks of the receiver and transmitter are independent and unsynchronized, but of the same frequency; there are variable phase differences between the bytes or bits of two data frames, which may not be sent within any prefixed time interval.
Examples of asynchronous communication
• UART serial, telephone or modem communication
• RS232C communication between UART devices
• Each successive byte can have a variable time gap, but with a minimum in-between interval and no maximum limit, for a full frame of many bytes
Two characteristics of asynchronous communication
1. Bytes (or frames) need not maintain a constant phase difference and are asynchronous, i.e., not in synchronization. There is permission to send either bytes or frames at variable time intervals; this facilitates in-between handshaking between the serial transmitter port and the serial receiver port.
2. Though a clock ticking at a certain rate always has to be there to transmit the bits of a single byte (or frame) serially, it is always implicit to the asynchronous data receiver and is independent of the transmitter.
Clock Features
• The transmitter does not transmit any clock rate information along with the serial stream of bits (neither separately nor by encoding using modulation) in asynchronous communication; the receiver clock is thus not able to maintain an identical frequency and a constant phase difference with the transmitter clock.
Example: An IBM personal computer has two COM ports (communication ports)
• COM1 at IO addresses 0x3F8−0x3FF and COM2 at 0x2F8−0x2FF
• Handshaking signals − RI, DCD, DSR, DTR, RTS, CTS
• Data bits − RxD and TxD
Example: COM port and modem handshaking signals
• When a modem connects, the modem sends the data carrier detect (DCD) signal at an instant t0.
• It communicates the data set ready (DSR) signal at an instant t1 when it receives the bytes on the line.
• The receiving computer (terminal) responds at an instant t2 with the data terminal ready (DTR) signal. After DTR, the request to send (RTS) signal is sent at an instant t3.
• The receiving end responds with the clear to send (CTS) signal at an instant t4. After the CTS response, the data bits are transmitted by the modem from an instant t5 to the receiver terminal.
• Between two sets of bytes sent in asynchronous mode, the handshaking signals RTS and CTS can again be exchanged. This explains why the bytes do not remain synchronized during asynchronous transmission.
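The t0…t5 sequence above can be modeled as a simple ordered check. This is a toy illustration of the signal order only; real RS232C handshaking happens on dedicated control lines, and all names here are ours.

```c
/* Toy model of the modem handshake order: DCD, DSR, DTR, RTS, CTS must
 * be observed in sequence before data transfer begins. */
typedef enum { DCD, DSR, DTR, RTS, CTS, DATA } hs_step;

/* Returns 1 if the observed sequence of signals follows the protocol
 * order described in the text, 0 otherwise. */
int handshake_ok(const hs_step seq[], int n)
{
    static const hs_step expected[] = { DCD, DSR, DTR, RTS, CTS, DATA };
    if (n != 6) return 0;
    for (int i = 0; i < 6; i++)
        if (seq[i] != expected[i]) return 0;
    return 1;
}
```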
3. Communication Protocols
1. Protocol
A protocol is a standard adopted which tells the way in which the bits of a frame must be sent from a device (or controller or port or processor) to another device or system. [Even in personal communication we follow a protocol − we say hello, then talk, and then say goodbye!]
A protocol defines how the frame bits are:
1) sent − synchronously, iso-synchronously or asynchronously, and at what rate(s)?
2) preceded by the header bits − how is the receiving device address communicated, so that only the destined device activates and receives the bits? [Needed when several devices are addressed through a common line (bus).]
3) How is the transmitting device address defined, so that the receiving device comes to know the source when receiving data from several sources?
4) How is the frame length defined, so that the receiving device knows the frame size in advance?
5) Frame-content specifications − do the sent frame bits specify control information or data? The frame bits are sent after first being formatted according to a specified protocol, which is to be followed when communicating with another device through an IO port or channel.
Protocols
• HDLC and Frame Relay − for synchronous communication
• For asynchronous transmission from a device port − RS232C, UART, X.25, ATM, DSL and ADSL
• For networking the physical devices in telecommunication and computer networks − Ethernet and token-ring protocols used in LANs
Protocols in embedded network devices
• For bridges and routers
• Internet appliance application protocols and Web protocols − HTTP (Hypertext Transfer Protocol), HTTPS (HTTP over Secure Socket Layer), SMTP (Simple Mail Transfer Protocol), POP3 (Post Office Protocol version 3), ESMTP (Extended SMTP)
File transfer and boot protocols in embedded network devices
• TELNET (terminal network)
• FTP (File Transfer Protocol)
• DNS (Domain Name System)
• IMAP4 (Internet Message Access Protocol)
• Bootp (Bootstrap Protocol)
Wireless protocols in embedded device networks
• Embedded wireless appliances use wireless protocols − WLAN 802.11, 802.16, Bluetooth, ZigBee, WiFi, WiMax
1.8 TIMING AND COUNTING DEVICES
Timer
• A timer is a device which counts at regular intervals (δT), using the clock pulses at its input.
• The counts increment on each pulse and are stored in a register called the count register.
• Output bits (in the count register or at the output pins) give the present counts.
Evaluation of time
• The counts multiplied by the interval δT give the time.
• The interval (present counts − initial counts) × δT gives the time interval between the two instants when the present count bits are read and when the initial counts were read or set.
Timer
• Has an input pin (or a control bit in the control register) for resetting it so that all count bits = 0.
• Has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
Counter
• A device which counts the inputs due to events at irregular or regular intervals.
• The counts give the number of input events or pulses since it was last read.
• Has a register to enable reading of the present counts.
• Functions as a timer when counting regular-interval clock pulses.
• Has an input pin (or a control bit in the control register) for resetting it so that all count bits = 0.
• Has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
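The (present counts − initial counts) × δT rule above can be sketched in C for a free-running 16-bit up-counter; unsigned modulo arithmetic makes the subtraction correct even across a single counter wrap. The counter width and tick period are assumptions for illustration.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 16-bit
 * up-counter. The modulo-2^16 subtraction handles one wrap past the
 * maximum count (overflow) correctly. */
uint32_t elapsed_ticks(uint16_t initial, uint16_t present)
{
    return (uint16_t)(present - initial);   /* modulo-2^16 difference */
}

/* Elapsed time = elapsed ticks x dT, with dT given in microseconds. */
uint32_t elapsed_us(uint16_t initial, uint16_t present, uint32_t dt_us)
{
    return elapsed_ticks(initial, present) * dt_us;
}
```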
Free-running Counter (Blind-counting Counter)
• A counting device may be a free-running (blind-counting) device giving overflow interrupts at fixed intervals.
• A pre-scaler for the clock input pulses fixes the intervals.
• A free-running counter is useful for initiating an action or a chain of actions, for processor interrupts at preset instants, for noting the instants of occurrence of events, for processor interrupts that request the processor to capture the counts at an input instance, and for comparing counts on events for future actions.
[An additional input pin senses an event, saving the counts at the instant of the event and taking an action.]
Free-running (blind counts) output-compare enable pin (or a control bit in the control register)
• Enables an output when all count bits of the free-running count = the preloaded counts in the compare register.
• At that instant a status bit or output pin also sets, and a processor interrupt 'OCINT' can occur for the compare-equality event.
• Generates an alarm or processor interrupts at preset times, or after a preset interval from another event.
Free-running (blind counts) input-capture enable pin (or a control bit in the control register) for capturing the instant of an event
• A register captures the counts at an instance of an input transition (0 to 1, 1 to 0, or toggling).
• A status bit can also set, and a processor interrupt can occur, for the capture event.
Free-running (blind counts) pre-scaling
• The prescaler can be programmed as p = 1, 2, 4, 8, 16, 32, … by programming a prescaler register.
• The prescaler divides the input pulses by the programmed value of p.
• Count interval = p × δT; δT = clock pulse period; clock frequency = δT−1.
Free-running (blind counts) overflow
• It has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
• A free-running n-bit counter overflows after a p × 2^n × δT interval.
Uses of a timer device
• Real-time clock ticks (system heartbeats). [A real-time clock is a clock which, once the system starts, does not stop and cannot be reset, and whose count value cannot be reloaded. Real time endlessly flows and never returns!] The real-time clock is set for ticks using pre-scaling bits (or rate-set bits) in appropriate control registers.
• Initiating an event after a preset delay time. The delay is as per the count value loaded.
• Initiating an event (or a pair or chain of events) after a comparison between the preset time(s) and the counted value(s). [This is similar to a preset alarm.] A preset time is loaded in a compare register.
• Capturing the count value of the timer on an event. The information of time (the instant of the event) is thus stored in the capture register.
• Finding the time interval between two events. Counts are captured at each event in capture register(s) and read; the intervals are thus found.
• Waiting for a message from a queue, mailbox or semaphore for a preset time when using an RTOS. A predefined waiting period elapses before the RTOS lets the task run again.
• Watchdog timer. It resets the system after a defined time.
• Baud or bit-rate control for serial communication on a line or network. Timer timeout interrupts define the time of each baud.
• Input pulse counting, when using a timer which is ticked by giving non-periodic inputs instead of clock inputs. The timer acts as a counter if, in place of clock inputs, the inputs are given to the timer for each instance to be counted.
• Scheduling of various tasks. A chain of software timers interrupt, and the RTOS uses these interrupts to schedule the tasks.
• Time slicing of various tasks. A multitasking or multi-programmed operating system presents the illusion that multiple tasks or programs are running simultaneously by switching between programs very rapidly, for example after every 16.6 ms.
• This process is known as a context switch. [The RTOS switches after a preset time delay from one running task to the next. Each task can therefore run in predefined slots of time.]
Software Timer (SWT)
• Software that executes and increments or decrements a count variable (count value) on an interrupt from a system timer output or from a real-time clock interrupt.
• The software timer also generates an interrupt on overflow of the count value, or on the count variable reaching its final value.
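A minimal sketch of the software timer described above, assuming a hardware tick interrupt calls swt_tick() on each system clock tick; the structure and names are ours, not from any particular RTOS.

```c
/* Minimal software timer: a count variable decremented on each
 * system-timer tick interrupt; when it reaches zero the timer "fires"
 * (in a real system this would raise a software interrupt or invoke a
 * handler). */
typedef struct {
    unsigned count;    /* remaining ticks                */
    int      expired;  /* set when the count reaches zero */
} sw_timer;

void swt_start(sw_timer *t, unsigned ticks)
{
    t->count = ticks;
    t->expired = 0;
}

/* To be called from the hardware timer's tick ISR. */
void swt_tick(sw_timer *t)
{
    if (!t->expired && t->count > 0 && --t->count == 0)
        t->expired = 1;   /* finish/overflow event */
}
```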
System clock
• In a system, a hardware timing device is programmed to tick at constant intervals.
• At each tick there is an interrupt.
• A chain of interrupts thus occurs at periodic intervals. The interval is as per a preset count value.
• The interrupts are called system clock interrupts when used to control the schedules and timings of the system.
• The actions are analogous to those of a hardware timer. While there is a physical limit (1, 2, 3 or 4) on the number of hardware timers in a system, the number of SWTs is limited only by the number of interrupt vectors provided.
• Certain processors (microcontrollers) also define the interrupt vector addresses of 2 or 4 SWTs.
1.9 SERIAL BUS COMMUNICATION PROTOCOLS − I2C
Consider interconnecting a number of device circuits: assume flash memory, a touch screen, ICs for measuring temperatures and ICs for measuring pressures at a number of processes in a plant.
• The ICs mutually network through a common synchronous serial bus, I2C. The 'Inter-Integrated Circuit' (I2C) bus is a popular bus for these circuits.
• Synchronous serial bus communication for networking: each specific synchronous serial I/O device may be connected to the others using specific interfaces, for example with an I/O device using an I2C controller.
• I2C bus communication simplifies the number of connections and provides a common way (protocol) of connecting different or same types of I/O devices using synchronous serial communication.
IO I2C Bus
• Any device that is compatible with an I2C bus can be added to the system (assuming an appropriate device driver program is available), and an I2C device can be integrated into any system that uses the I2C bus.
• Originally developed at Philips Semiconductors.
• Synchronous serial communication at 400 kbps up to 2 m, and 100 kbps for longer distances.
Three I2C standards:
1. Industrial 100 kbps I2C
2. 100 kbps SM I2C
3. 400 kbps I2C
I2C Bus
• The bus has two lines that carry its signals − one line is for the clock and one is for bidirectional data.
• There is a standard protocol for the I2C bus.
Device addresses and master in the I2C bus
• Each device has a 7-bit address using which the data transfers take place.
• A master can address 127 other slaves at an instance.
• The master has a processing element functioning as a bus controller, or a microcontroller with an I2C (Inter-Integrated Circuit) bus interface circuit.
Slaves and masters in the I2C bus
• Each slave can also optionally have an I2C (Inter-Integrated Circuit) bus controller and processing element.
• A number of masters can be connected on the bus.
Disadvantages of the I2C bus
• Time is taken by the algorithm in hardware that analyzes the bits through I2C, in case the slave hardware does not provide hardware support for it.
• Certain ICs support the protocol and certain do not.
• Open-collector drivers at the master need a pull-up resistance of 2.2 kΩ on each line.
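As a small illustration of the 7-bit addressing described above, the first byte a master places on the I2C bus combines the slave address with a read/write bit (a standard part of the I2C protocol); the helper name below is ours.

```c
#include <stdint.h>

/* Build the I2C address byte a master sends after the START condition:
 * the 7-bit slave address in bits 7..1 and the R/W bit in bit 0
 * (1 = read, 0 = write). */
uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}
```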
1.11 SERIAL BUS COMMUNICATION PROTOCOLS − CAN
A distributed Controller Area Network (CAN) example is a network of embedded systems in an automobile.
• The CAN-bus line usually interconnects to a CAN controller between the line and the host at the node. It gives the input and gets the output between the physical and data-link layers at the host node.
• The CAN controller has a BIU (bus interface unit, consisting of buffer and driver), a protocol controller, status-cum-control registers, a receiver buffer and message objects. These units connect the host node through the host interface circuit.
Three standards:
1. 33 kbps CAN
2. 110 kbps Fault-Tolerant CAN
3. 1 Mbps High-Speed CAN
CAN protocol
There is a CAN controller between the CAN line and the host node.
• CAN controller − has a BIU (bus interface unit) consisting of a buffer and driver.
• Method for arbitration − CSMA/AMP (Carrier Sense Multiple Access with Arbitration on Message Priority basis).
• The line pulls to logic 1 through a resistor between the line and +4.5 V to +12 V.
• The line idle state is logic 1 (recessive state).
• A buffer gate is used between an input pin and the CAN line.
• Input presence is detected at the CAN line when it is pulled down to the dominant (active) state, logic 0 (ground, ~0 V), by a sender on the CAN line.
• A current driver between the output pin and the CAN line pulls the line down to the dominant (active) state, logic 0 (ground, ~0 V), when sending to the CAN line.
Protocol-defined frame bits
• A start bit is followed by six fields of frame bits. A data frame starts after first detecting that the dominant state is not present at the CAN line, with a logic 1 (R state) to 0 (D state) transition for one serial bit interval.
• After the start bit, the six fields start with the arbitration field and end with the end-of-frame field.
• A 3-bit minimum inter-frame gap occurs before the next start bit (R → D transition).
Protocol-defined first field in the frame bits
• The first field of 12 bits is the arbitration field: an 11-bit destination address and the RTR (Remote Transmission Request) bit.
• The destination device address is specified in the 11-bit sub-field; whether the byte being sent is data for the device or a request to the device is specified in the 1-bit sub-field.
• A maximum of 2^11 devices can connect to a CAN controller in the case of the 11-bit address field (standard 11-bit address CAN).
• The address identifies the device to which data is being sent or of which a request is being made.
• When the RTR bit is dominant ('0'), the packet carries data for the device at the destination address; when it is recessive ('1'), the packet is a request for data from the device.
Protocol-defined frame bits, remaining fields
• The second field of 6 bits is the control field. The first bit is for the identifier extension. The second bit is always '1'. The last 4 bits specify the data length code.
• The third field of 0 to 64 bits is the data field; its length depends on the data length code in the control field.
• The fourth field (third, if the data field has no bits present) of 16 bits holds the CRC (Cyclic Redundancy Check) bits. The receiver node uses it to detect errors, if any, during the transmission.
• The fifth field of 2 bits starts with the 'ACK slot'. The sender transmits this slot as recessive ('1'); a receiver that detects no error in the reception sends back dominant ('0') in this slot. If the sender does not sense '0' in the ACK slot, it generally retransmits the data frame. The second bit is the 'ACK delimiter'; it signals the end of the ACK field.
81
• If the transmitting node does not receive any acknowledgement of data
frame within a Specified time slot, it should retransmit.
Sixth field of 7-bits ─ end- of- the frame specification and has seven '0's
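The CSMA/AMP arbitration named above can be illustrated by simulating the wired-AND bus: a dominant '0' overwrites a recessive '1', so the node transmitting the lower 11-bit identifier wins arbitration without any data being corrupted. This is a two-node toy model, not a real CAN controller implementation.

```c
#include <stdint.h>

/* Simulate CAN bitwise arbitration between two nodes transmitting
 * their 11-bit identifiers MSB first. The bus behaves as a wired-AND:
 * a node that sends recessive '1' but reads dominant '0' loses and
 * backs off. Returns 0 if node A wins, 1 if node B wins. */
int can_arbitrate(uint16_t id_a, uint16_t id_b)
{
    for (int bit = 10; bit >= 0; bit--) {
        int a = (id_a >> bit) & 1;
        int b = (id_b >> bit) & 1;
        int line = a & b;            /* wired-AND bus level */
        if (a != line) return 1;     /* A sent '1', read '0': B wins */
        if (b != line) return 0;     /* B sent '1', read '0': A wins */
    }
    return 0;                        /* identical identifiers */
}
```

A lower identifier therefore means higher message priority, which is exactly the "arbitration on message priority" in CSMA/AMP.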