1.1 SYSTEM:
Definition:
o A System is a way of working, organizing or doing one or many tasks
according to a fixed plan, program, or set of rules.
o A system is also an arrangement in which all its units assemble and work
together according to the plan or program.
o Let us examine the following two examples.
Consider a watch.
It is a time-display system. Its hardware parts are the needles, the battery, the dial, the chassis and the strap. These parts organize to show the real time and to continuously update the displayed time every second. The system-program updates the display using three needles after each second.
It follows a set of rules.
(i) All needles move clockwise only.
(ii) A thin, long needle rotates every second such that it returns to the same
position after a minute.
(iii) A long needle rotates every minute such that it returns to the same position
after an hour.
(iv) A short needle rotates every hour such that it returns to the same position
after twelve hours.
(v) All three needles return to the same inclinations after every twelve hours.
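As a sketch of how these rules translate into computation (the function names are ours), each needle's inclination, in degrees clockwise from the 12 o'clock position, follows from the number of seconds elapsed since 12:00:00:

```c
#include <stdint.h>

/* Inclination of each needle, in degrees clockwise from 12 o'clock,
   after `s` seconds have elapsed since 12:00:00. Each needle returns
   to the same position after its full period, as the rules state. */
static double second_needle_deg(uint32_t s) { return (s % 60u)    * 360.0 / 60.0;    }
static double minute_needle_deg(uint32_t s) { return (s % 3600u)  * 360.0 / 3600.0;  }
static double hour_needle_deg(uint32_t s)   { return (s % 43200u) * 360.0 / 43200.0; }
```

After 43,200 s (twelve hours) all three functions return to the same values, matching rule (v).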
Consider a washing machine.
It is an automatic clothes-washing system. The important
hardware parts include its status display panel, the switches and dials for user-
defined programming, a motor to rotate or spin, its power supply and control
unit, an inner water-level sensor, a solenoid valve for letting water in and
another valve for letting water drain out.
These parts organize to wash clothes automatically according to a
program preset by a user. The system-program washes the dirty clothes placed
in a tank, which rotates or spins in pre-programmed steps and stages.
The hardware of a computer system, and similarly of an embedded system, includes the following units:
1. A microprocessor
2. A large memory comprising the following two kinds:
(a) Primary memory (semiconductor memories - RAM, ROM and fast
accessible caches)
(b) Secondary memory (magnetic memory located in hard disks, diskettes
and cartridge tapes and optical memory in CD-ROM)
3. Input units like keyboard, mouse, digitizer, scanner, etc.
4. Output units like video monitor, printer.
5. Networking units like Ethernet card, front-end processor-based drivers, etc.
6. I/O units like a modem, fax cum modem, etc.
An embedded system has three main components:
1. It has hardware.
2. It has main application software. The application software may concurrently
perform a series of tasks or multiple tasks.
3. It has a Real Time Operating System (RTOS) that supervises the
application software and provides a mechanism to let the processor run a
process as per its schedule and to context-switch between the various
processes (tasks). The RTOS defines the way the system works. It organizes
access to a resource in the sequence of the series of tasks of the system. It schedules
their working and execution by following a plan, to control the latencies and to
meet the deadlines. [Latency here refers to the waiting period between the
instance at which the need for a task arises and the start of the run of the
task's codes.] It sets the rules during the execution of the application software.
A small-scale embedded system may not need an RTOS.
An embedded system has software designed keeping in view three constraints: (i)
available system memory, (ii) available processor speed and (iii) the need to limit
power dissipation when continuously running the system in cycles of wait for events,
run, stop and wake-up.
Definition :
An Embedded System is a system that has software embedded
into computer-hardware, which makes a system dedicated for an
application(s) or specific part of an application or product or part of a
larger system.
An embedded system is one that has dedicated purpose software
embedded in computer hardware.
It is a dedicated computer based system for an application(s) or product.
It may be an independent system or a part of a larger system. Its software
usually embeds into a ROM (Read Only Memory) or flash.
"It is any device that includes a programmable computer but is not itself
intended to be a general-purpose computer." - Wayne Wolf, Ref: 61
1.4 CLASSIFICATION OF EMBEDDED SYSTEMS:
2. Medium Scale Embedded Systems : These systems are usually designed with a
single or few 16- or 32-bit microcontrollers or DSPs or Reduced Instruction Set
Computers (RISCs). These have both hardware and software complexities.
For complex software design, there are the following programming tools:
RTOS, source-code engineering tool, simulator, debugger and Integrated
Development Environment (IDE). Software tools also provide the solutions to the
hardware complexities. An assembler is of little use as a programming tool.
These systems may also employ the readily available ASSPs and
IPs (explained later) for the various functions - for example, for bus
interfacing, encrypting, deciphering, discrete cosine transformation and its
inverse, TCP/IP protocol stacking and network connecting functions.
[ASSPs and IPs may also have to be appropriately configured by the
system software before being integrated into the system bus.]
cost or may not be available at all. In some cases, a compiler or retargetable
compiler might have to be developed for these. [A retargetable compiler is one
that configures according to the given target configuration in a system.]
Real-time Response
Realtime systems have to respond to external interactions in a predetermined amount of
time. Successful completion of an operation depends upon the correct and timely operation
of the system. Design the hardware and the software in the system to meet the Realtime
requirements. For example, a telephone switching system must feed dial tone to thousands of
subscribers within a recommended limit of one second.
To meet these requirements, the off-hook detection mechanism and the software
message communication involved have to work within the limited time budget. The system
has to meet these requirements for all the calls being set up at any given time. The designers
have to focus very early on the Realtime response requirements. During the architecture
design phase, the hardware and software engineers work together to select the right system
architecture that will meet the requirements.
Recovering from Failures
Realtime systems must function reliably in the event of failures. These failures can be
internal as well as external. The following sections discuss the issues involved in handling
these failures.
Internal failures can be due to hardware and software failures in the system. The
different types of failures you would typically expect are:
Software Failures in a Task: Unlike desktop applications, Realtime applications do not
have the luxury of popping up a dialog box and exiting on detecting a failure. Design the
tasks to safeguard against error conditions.
This becomes even more important in a Realtime system because a sequence of events can
result in a large number of scenarios. It may not be possible to test all the cases in the
laboratory environment. Thus, apply defensive checks to recover from error conditions.
Also, some software error conditions might lead to a task hitting a processor exception. In
such cases, it might sometimes be possible simply to roll back the task to its previously
saved state.
Processor Restart: Most Realtime systems are made up of multiple nodes. It is not
possible to bring down the complete system on failure of a single node; thus, design the
software to handle independent failure of any of the nodes.
Board Failure: Realtime systems are expected to recover from hardware failures. The
system should be able to detect and recover from board failures. When a board fails, the
system notifies the operator about it. Also, the system should be able to switch in a spare
for the failed board (if the board has a spare).
Link Failure: Most of the communication in Realtime systems takes place over links
connecting the different processing nodes in the system. Again, the system isolates a link
failure and reroutes messages so that link failure does not disturb the message
communication.
External Failures: Realtime systems have to perform in the real world, so they should
recover from failures in the external environment. Different types of failures that can take
place in the environment are:
Invalid Behavior of External Entities: When a Realtime system interacts with external
entities, it should be able to handle all possible failure conditions from these entities. A
good example of this is the way a telephone switching system handles calls from
subscribers. In this case, the system is interacting with humans, so it should handle all kinds
of failures, like:
1. Subscriber goes off-hook but does not dial.
2. Toddler playing with the phone!
3. Subscriber hangs up before completing dialing.
Interconnectivity Failure: Many times a Realtime system is distributed across several
locations, and external links might connect these locations. Handling these conditions is
similar to handling internal link failures. The major difference is that such failures might
last for an extended duration, and many times it might not be possible to reroute the
messages.
Working with Distributed Architectures
Most Realtime systems involve processing on several different nodes. The system itself
distributes the processing load among several processors. This introduces several challenges
in design:
Maintaining Consistency: Maintaining data-structure consistency is a challenge when
multiple processors are involved in feature execution. Consistency is generally maintained
by running data-structure audits.
Initializing the System: Initializing a system with multiple processors is far more
complicated than bringing up a single machine. In most systems the software release is
resident on the OMC. The node that is directly connected to the OMC will initialize first.
When this node finishes initialization, it will initiate software downloads for the child nodes
directly connected to it. This process goes on in a hierarchical fashion till the complete
system is initialized.
Inter-Processor Interfaces: One of the biggest headaches in Realtime systems is defining
and maintaining message interfaces. Defining interfaces is complicated by the different byte-
ordering and padding rules of processors. Maintaining interfaces is complicated by
backward-compatibility issues. For example, if a cellular system changes the air-interface
protocol for a new breed of phones, it will still have to support interfaces with older phones.
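One common defence against the byte-ordering part of the problem is to pack message fields explicitly, byte by byte, instead of overlaying structs on receive buffers. The helpers below (names are ours, not from any real protocol stack) always produce big-endian, network byte order regardless of the host processor:

```c
#include <stdint.h>
#include <stddef.h>

/* Pack a 16-bit field into a message buffer in network (big-endian)
   byte order, independent of the host processor's endianness.
   Returns the offset just past the packed field. */
static size_t pack_u16_be(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off]     = (uint8_t)(v >> 8);    /* most significant byte first */
    buf[off + 1] = (uint8_t)(v & 0xFF);
    return off + 2;
}

/* The matching unpack: reassemble the value from big-endian bytes. */
static uint16_t unpack_u16_be(const uint8_t *buf, size_t off)
{
    return (uint16_t)((buf[off] << 8) | buf[off + 1]);
}
```

Because the wire layout is spelled out explicitly, padding rules of either processor never leak into the interface.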
Load Distribution: When multiple processors and links are involved in message
interactions, distributing the load evenly can be a daunting task. If the system has an evenly
balanced load, the capacity of the system can be increased by adding more processors. Such
systems are said to scale linearly with increasing processing power. But often designers find
themselves in a position where a single processor or link becomes a bottleneck. This leads
to costly redesign of the features to improve system scalability.
Centralized Resource Allocation: Distributed systems may be running on multiple
processors, but they have to allocate resources from a shared pool. Shared-pool allocation is
typically managed by a single processor allocating resources from the pool. If the
system is not designed carefully, the shared resource allocator can become a bottleneck in
achieving full system capacity.
Asynchronous Communication
Remote procedure calls (RPC) are used in computer systems to simplify software
design. RPC allows a programmer to call procedures on a remote machine with the same
semantics as local procedure calls. RPCs really simplify the design and development of
conventional systems, but they are of very limited use in Real time systems. The main reason
is that most communication in the real world is asynchronous in nature, i.e. very few
message interactions can be classified into the query response paradigm that works so well
using RPCs.
Thus most Real time systems support state machine based design where multiple
messages can be received in a single state. The next state is determined by the contents of the
received message. State machines provide a very flexible mechanism to handle asynchronous
message interactions. The flexibility comes with its own complexities. We will be covering
state machine design issues in future additions to the Real time Mantra.
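A minimal sketch of such a table-driven design, with invented states and messages for a subscriber line (illustrative, not from any real switching system): the next state depends only on the current state and the received message, so several different messages can be handled in any one state.

```c
#include <stdint.h>

/* Hypothetical states and messages for a subscriber line. */
typedef enum { IDLE, OFF_HOOK, DIALING, CONNECTED } state_t;
typedef enum { MSG_OFF_HOOK, MSG_DIGIT, MSG_ANSWER, MSG_ON_HOOK } msg_t;

/* Next state is a pure function of (current state, received message);
   unrecognized messages leave the state unchanged. */
static state_t next_state(state_t s, msg_t m)
{
    switch (s) {
    case IDLE:      return (m == MSG_OFF_HOOK) ? OFF_HOOK : IDLE;
    case OFF_HOOK:  if (m == MSG_DIGIT)   return DIALING;
                    if (m == MSG_ON_HOOK) return IDLE;
                    return OFF_HOOK;
    case DIALING:   if (m == MSG_ANSWER)  return CONNECTED;
                    if (m == MSG_ON_HOOK) return IDLE;
                    return DIALING;
    case CONNECTED: return (m == MSG_ON_HOOK) ? IDLE : CONNECTED;
    }
    return IDLE;
}
```

Keeping the transition logic in one place like this makes it easy to enumerate and review every (state, message) pair.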
Race Conditions and Timing
It is said that the three most important things in Real time system design are timing,
timing and timing. A brief look at any protocol will underscore the importance of timing. All
the steps in a protocol are described with exact timing specification for each stage. Most
protocols will also specify how the timing should vary with increasing load. Real time
systems deal with timing issues by using timers. Timers are started to monitor the progress of
events. If the expected event takes place, the timer is stopped. If the expected event does not
take place, the timer will timeout and recovery action will be triggered.
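The start/stop/timeout pattern described above can be sketched as a one-shot software timer driven by a tick counter; the struct and function names are ours, and the tick source and the recovery action are left to the caller:

```c
#include <stdint.h>
#include <stdbool.h>

/* A one-shot software timer driven by an external tick counter. */
typedef struct {
    bool     running;
    uint32_t expiry;   /* tick at which the timer fires */
} sw_timer_t;

static void timer_start(sw_timer_t *t, uint32_t now, uint32_t duration)
{
    t->running = true;
    t->expiry  = now + duration;
}

/* Stop the timer when the expected event takes place in time. */
static void timer_stop(sw_timer_t *t) { t->running = false; }

/* True exactly when a running timer has timed out at tick `now`;
   the signed difference keeps the test safe across counter wraparound. */
static bool timer_expired(const sw_timer_t *t, uint32_t now)
{
    return t->running && (int32_t)(now - t->expiry) >= 0;
}
```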
A race condition occurs when the state of a resource depends on timing factors that are
not predictable. This is best explained with an example. Telephone exchanges have two-way
trunks which can be used by either of the two exchanges connected by the trunk. The problem
is that both ends can allocate the trunk at more or less the same time, thus resulting in a race
condition: the same trunk has been allocated for an incoming and an outgoing call. This
race condition can be easily resolved by defining rules on who gets to keep the resource
when such a clash occurs. The race condition can be minimized by requiring the two
exchanges to work from different ends of the pool; thus there will be no clashes under low
load. Under high load, race conditions will be hit and will be resolved by the pre-defined rules.
A more conservative design would partition the two way trunk pool into two one way
pools. This would avoid the race condition but would fragment the resource pool. The main
issue here is identifying race conditions. Most race conditions are not as simple as this one.
Some of them are subtle and can only be identified by careful examination of the design.
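Both strategies for this trunk example can be sketched in code; the exchange ids and the trunk-pool layout here are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Pre-defined clash-resolution rule: when both exchanges seize the same
   two-way trunk at nearly the same time, the exchange with the lower
   (hypothetical) exchange id keeps it; the other backs off and retries.
   The rule is deterministic and both ends can evaluate it independently. */
static bool keeps_trunk_on_clash(uint16_t my_id, uint16_t peer_id)
{
    return my_id < peer_id;
}

/* Avoidance variant: the two ends hunt the shared pool of n trunks from
   opposite ends, so clashes arise only when the pool is nearly full. */
static int next_trunk_to_try(uint16_t my_id, uint16_t peer_id,
                             int attempt, int n)
{
    return (my_id < peer_id) ? attempt : n - 1 - attempt;
}
```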
How do we design for upgradability?
The hardware platform may be used over several product generations
or for several different versions of a product in the same generation,
with few or no changes. However, we want to be able to add features by
changing software. How can we design a machine that will provide the
required performance for software that we haven't yet written?
4. EMBEDDED PROCESSOR ARCHITECTURE: GENERAL CONCEPTS
(iii) Reset Circuit
Reset means that the processor starts the processing of instructions from a starting
address. That address is one that is set by default in the processor program
counter (or instruction pointer and code segment registers in x86 processors) on
a power-up. From that address in memory, the fetching of program-instructions
starts following the reset of the processor.
A reset can occur in one of three ways:
1. Reset on power-up.
2. Through an external or internal reset circuit.
3. Reset on timeout of a watchdog timer.
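Servicing ("kicking") a watchdog so that it does not reset a healthy system might look like the sketch below; the register layout and the 0x55/0xAA reload sequence are invented for illustration, and a real part's datasheet defines the actual protocol. If the program hangs and stops kicking, the watchdog times out and forces a processor reset:

```c
#include <stdint.h>

/* Service a (hypothetical) watchdog: write a two-byte key sequence to
   its reload register to restart the countdown. Called periodically
   from the main loop; a hung program stops calling it, the watchdog
   times out, and the processor is reset. */
static void watchdog_kick(volatile uint8_t *wdt_reload_reg)
{
    *wdt_reload_reg = 0x55;   /* first key of the reload sequence  */
    *wdt_reload_reg = 0xAA;   /* second key: the counter restarts  */
}
```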
(iv) Memory
a. Functions Assigned to the ROM or EPROM or Flash:
1. Storing the 'application' program from where the processor fetches the
instruction codes.
2. Storing codes for system booting and initializing, initial input data and strings.
3. Storing codes for the RTOS.
4. Storing pointers (addresses) of various service routines.
b. Functions Assigned to the Internal, External and Buffer RAM:
1. Storing the variables during program run,
2. Storing the stacks,
3. Storing input or output buffers, for example, for speech or image.
c. Functions Assigned to the EEPROM or Flash:
Storing non-volatile results of processing.
d. Functions Assigned to the Caches:
1. Storing copies of the instructions, data and branch-transfer instructions
fetched in advance from external memories,
2. Storing temporarily the results in write-back caches during fast processing.
(v) Input, Output and I/O Ports:
The system gets inputs from physical devices (for example, key-buttons,
sensors and transducer circuits) through the input ports. A
controller circuit in a system gets the inputs from the sensor and transducer
circuits. A receiver of signals or a network card gets the input from a
communication system. [A communication system could be a fax or modem, a
broadcasting service]. Signals from a network are also received at the ports.
Consider the system in a mobile phone. The user inputs the mobile
number through the buttons, directly or indirectly (through recall of the number
from its memory). A panel of buttons connects to the system through the input
port or ports. The processor identifies each input port by its memory-buffer address(es),
called port address(es). Just as a memory location holding a byte or word is
identified by an address, each input port is also identified by an address.
The system gets the inputs by the read operations at the port addresses.
The system has output ports through which it sends output bytes to the real
world. An output may be to an LED (Light Emitting diode) or LCD (Liquid
Crystal Display) panel.
For example, a calculator or mobile phone system sends the output-numbers or an
SMS message to the LCD display. A system may send output to a printer. An
output may be to a communication system or network. A control system sends the
outputs to alarms, actuators, furnaces or boilers. A robot is sent output for its
various motors. Each output port is identified by its memory-buffer address(es)
(called port address).
The system sends the output by a write operation to the port address.
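Since a port address behaves like a memory address, port input and output reduce to volatile reads and writes at that address; a minimal sketch (actual port addresses are system-specific, so they are passed in as pointers here):

```c
#include <stdint.h>

/* Memory-mapped port access: a port address is treated like a memory
   address. Reading it gets the input byte; writing it sends the output
   byte. `volatile` tells the compiler every access matters, because the
   hardware behind the address can change or consume the value. */
static uint8_t port_read(volatile uint8_t *port_addr)
{
    return *port_addr;
}

static void port_write(volatile uint8_t *port_addr, uint8_t value)
{
    *port_addr = value;
}
```

On a real system the pointer would be initialized to the port address from the memory map, e.g. a cast of a documented constant.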
There are also general-purpose ports for both input and output (I/O)
operations. There are two types of I/O ports: parallel and serial.
From a serial port, a system gets a serial stream of bits at an input or sends a
stream at an output. For example, through a serial port, the system gets and sends
the signals as bits through a modem. A serial port also facilitates long-distance
communication and interconnections. A serial port may be a serial UART port, a
serial synchronous port or some other serial interfacing port. [UART stands for
Universal Asynchronous Receiver and Transmitter].
A system port may get inputs from multiple channels or may have to send output
to multiple channels. A demultiplexer takes the inputs from various channels and
transfers the input from a select channel to the system. A multiplexer takes the
output from the system and sends it to another system.
A system might have to be connected to a number of other devices and systems.
For networking the systems, there are different types of buses: for example, I2C,
CAN, USB, ISA, EISA and PCI.
(viii) TIMERS:
A timer circuit, suitably configured, is the system clock, also called the real-
time clock (RTC). An RTC is used by the schedulers and for real-time
programming.
An RTC is designed as follows: assume a processor generates a clock output
every 0.5 µs. When a system timer is configured by a software instruction to
issue a timeout after 200 inputs from the processor clock outputs, there are
10,000 interrupts (ticks) each second. The RTC ticking rate is then 10 kHz and it
interrupts every 100 µs. The RTC is also used to obtain software-controlled
delays and time-outs.
More than one timer using the system clock (RTC) may be needed for the
various timing and counting needs in a system.
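The tick-rate arithmetic of the RTC example can be captured in a small helper; the function name is ours, and periods are taken in nanoseconds to stay in integer arithmetic:

```c
#include <stdint.h>

/* Ticks per second for a timer that divides a processor clock output:
   a 0.5 µs (500 ns) clock divided by a programmed count of 200 gives a
   100 µs tick period, i.e. 10,000 ticks per second (10 kHz). */
static uint32_t ticks_per_second(uint32_t clock_period_ns,
                                 uint32_t divide_count)
{
    uint32_t tick_period_ns = clock_period_ns * divide_count;
    return 1000000000u / tick_period_ns;
}
```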
The software is the most important aspect, the brain of the embedded system.
An embedded system processor and the system need software that is specific to
a given application of that system. The processor of the system processes the
instruction codes and data. In the final stage, these are placed in the memory
(ROM) for all the tasks that have to be executed. The final stage software is also
called ROM image. Why? Just as an image is a unique sequence and
arrangement of pixels, embedded software is also a unique placement and
arrangement of bytes for instructions and data.
Each code or datum is available only in bit and byte format. The system
requires specific bytes at each ROM address, according to the tasks being executed. A
machine-implementable software file is therefore like a table of addresses with the
bytes stored at each address of the system memory.
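Such a file can therefore be pictured as a table of (address, byte) pairs; the entries below are arbitrary placeholders rather than a real program image:

```c
#include <stdint.h>
#include <stddef.h>

/* A ROM image viewed as a table of (address, byte) pairs. The values
   here are placeholders, not bytes of any real program. */
typedef struct {
    uint32_t addr;   /* ROM address            */
    uint8_t  byte;   /* byte stored at address */
} rom_entry_t;

static const rom_entry_t rom_image[] = {
    { 0x0000u, 0x02u },   /* e.g. first byte fetched after reset */
    { 0x0001u, 0x80u },
    { 0x0002u, 0x00u },
};

static const size_t rom_image_len =
    sizeof(rom_image) / sizeof(rom_image[0]);
```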
A design team's lack of knowledge of writing device-driver codes, or codes
invoking the processor-specific features, can cost a lot. Vendors may not only
charge for the API but also charge intellectual-property fees for each system
shipped out of the company.
consuming. Full coding in assembly may be done only for a few simple, small-scale
systems, such as toys, automatic chocolate-vending machines, robots or data-
acquisition systems.
2. In the next step, called linking, a linker links these codes with the other required
assembled codes. Linking is necessary because of the number of codes to be
linked for the final binary file. For example, there are standard codes to
program a delay task, for which there is a reference in the assembly language
program. The codes for the delay must link with the assembled codes. The delay
code is sequential from a certain beginning address. The assembly software code
is also sequential from a certain beginning address. Both codes have to be at
distinct and available addresses in the system. The linker links these.
The linked file in binary, for running on a computer, is commonly known as an
executable file or simply a '.exe' file. After linking, there has to be a re-allocation
of the sequences of placing the codes before the actual placement of the codes in
the memory.
3. In the next step, the loader program performs the task of reallocating the codes
after finding the physical RAM addresses available at a given instant. The
loader is a part of the operating system and places the codes into the memory after
reading the '.exe' file. This step is necessary because the available memory
addresses may not start from 0x0000, and the binary codes have to be loaded at
different addresses during the run. The loader finds the appropriate start
address. In a computer, the loader loads into a section of RAM the program that
is ready to run.
4. The final step of the system design process is locating the codes as a ROM image
and permanently placing them at the actually available addresses in the ROM. In
embedded systems, there is no separate program to keep track of the available
addresses at different times during the running, as in a computer. The designer
has to define the available addresses to load and create files for permanently
locating the codes. A program called locator reallocates the linked file and creates
a file for permanent location of codes in a standard format. This format may be
Intel Hex file format or Motorola S-record format.
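As an aside on the Intel Hex format mentioned above: each record carries a checksum, defined as the two's complement of the 8-bit sum of the byte count, the two address bytes, the record type and the data bytes. A minimal sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Checksum of one Intel Hex record: two's complement of the 8-bit sum
   of the byte count, the two address bytes, the record type and the
   data bytes. A record is valid when all its bytes, including the
   checksum, sum to zero modulo 256. */
static uint8_t ihex_checksum(uint8_t count, uint16_t addr, uint8_t type,
                             const uint8_t *data, size_t n)
{
    uint8_t sum = (uint8_t)(count + (uint8_t)(addr >> 8)
                                  + (uint8_t)addr + type);
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + data[i]);
    return (uint8_t)(0x100 - sum);   /* two's complement of the sum */
}
```

For the well-known record `:0300300002337A1E` (3 data bytes 02 33 7A at address 0x0030, type 00), the function yields the recorded checksum 0x1E.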
5. Lastly, either
(i) a laboratory system, called a device programmer, takes as input the
ROM-image file and finally burns the image into the PROM or EPROM, or
(ii) at a foundry, a mask is created for the ROM of the embedded system
from the image file. [The process of placing the codes in PROM or EPROM is also
called burning.] The mask created from the image gives the ROM in IC-chip form.
the devices that are present. It also includes software for allocating and registering
port or device codes and data at memory addresses for the various devices at
distinctly different addresses, including codes for detecting any collision between
the allocated addresses.
Real Time Operating System (RTOS)
Embedded software is most often designed for deterministic performance
and for defined task and ISR latencies, in addition to the OS functions.
It performs multiple actions and controls multiple devices and their
ISRs with defined real-time constraints and deadlines. This requires task and ISR
priority allocation and preemptive scheduling; the OS provides deterministic
performance during concurrent processing and execution with hard (stringent) or
soft timing requirements, with priority allocation and preemption. An RTOS is needed
when the tasks of the system have real-time constraints and deadlines for finishing
the tasks.
Application Software Development Tools
o Source Code Engineering Tools
o Stethoscope (tracks the switching from one task to another as a
function of time, stores beats)
o Trace Scope (traces changes in a parameter(s) as a function of time)
Simulator
A simulator is used to simulate the target processor and hardware elements on a
host PC, and to run and test the executable module.
Project Manager
A project manager is used to manage the files associated with a design-stage project
and to keep several versions of the source file(s) in an orderly fashion.
1.10 APPLICATIONS
o Microcontroller-based single or multi-display digital panel
meter for voltage, current, resistance and frequency
o Keyboard controller
o Serial port cards
o CD drive or hard disk drive controller
o Peripheral controllers - a CRT display controller, a keyboard
controller, a DRAM controller, a DMA controller, a printer
controller, a laser printer controller, a LAN controller, a disk drive
controller
o Fax, photocopy, printer or scanner machine
o Remote (controller) of TV
o Telephone with memory, display and other sophisticated features
o Motor control systems - for example, accurate control of speed and
position of a d.c. motor, robot, and CNC machine; automotive
applications such as close-loop engine control, dynamic ride
control, and an anti-lock braking system monitor
o Electronic data acquisition and supervisory control system
o Spectrum analyzer
o Biomedical systems - for example, an ECG LCD display-cum-
recorder, a blood-cell recorder-cum-analyzer and a patient
monitoring system
o Electronic instruments, such as an industrial process controller
o Electronic smart weight-display system, and an industrial moisture
recorder-cum-controller
o Digital storage system for a signal waveform, or electric or water
meter reading
o Computer networking systems - for example, router, front-end
processor in a server, switch, bridge, hub, and gateway
o For Internet appliances, there are numerous application systems:
(i) intelligent operation, administration and maintenance router
(IOAMR) in a distributed network, and
(ii) mail client card to store e-mail and personal addresses and to
smartly connect to a modem or server
o Banking systems - for example, bank ATM and credit card transactions
o Signal-tracking systems - for example, an automatic signal
tracker and a target tracker
o Communication systems - for example, for mobile communication:
a SIM card, a numeric pager, a cellular phone, a cable TV terminal,
and a FAX transceiver with or without a graphic accelerator
o Image filtering, image processing, pattern recognition, speech
processing and video processing
o Entertainment systems - such as video games and music systems
o A system that connects a pocket PC to the automobile driver's mobile
phone and a wireless receiver. The system then connects to a remote
server for Internet or e-mail, or to a remote computer at an ASP
(Application Service Provider). A personal information manager using
frame buffers in hand-held devices
o Thin client to provide diskless nodes with remote boot
capability. [An application of thin clients is access to a data center from
a number of nodes, or, in an Internet laboratory, access to the
Internet leased line through a remote server.]
o Embedded firewall/router using an ARM7 multiprocessor with two
Ethernet interfaces and interface support for PPP, TCP/IP and UDP
protocols
Sophisticated Applications
process, team members can more easily understand what they
are supposed to do, what they should receive from other team
members at certain times, and what they are to hand off
when they complete their assigned steps. Since most embedded
systems are designed by teams, coordination is perhaps the most
important role of a well-defined design methodology.
in the figure as dashed-line arrows. We need bottom–up design because we
do not have perfect insight into how later stages of the design process will
turn out. Decisions at one stage of design are based upon estimates of what
will happen later: How fast can we make a particular function run? How
much memory will we need? How much system bus capacity do we
need? If our estimates are inadequate, we may have to backtrack and
amend our original decisions to take the new facts into account. In
general, the less experience we have with the design of similar systems,
the more we will have to rely on bottom-up design information to help us
refine the system.
But the steps in the design process are only one axis along which we can
view embedded system design. We also need to consider the major goals of the
design:
■ manufacturing cost;
■ performance (both overall speed and deadlines); and
■ power consumption.
We must also consider the tasks we need to perform at every step in the
design process. At each step in the design, we add detail:
■ We must analyze the design at each step to determine how we
can meet the specifications.
■ We must then refine the design to add detail.
■ And we must verify the design to ensure that it still meets all system
goals, such as cost, speed, and so on.
Requirements
Clearly, before we design a system, we must know what we are
designing. The initial stages of the design process capture this information for
use in creating the architecture and components.
We generally proceed in two phases:
First, we gather an informal description from the customers, known as
requirements, and
second, we refine the requirements into a specification that contains enough
information to begin designing the system architecture.
Separating out requirements analysis and specification is often
necessary because of the large gap between what the customers can
describe about the system they want and what the architects need to design
the system. Consumers of embedded systems are usually not themselves
embedded system designers or even product designers. Their understanding
of the system is based on how they envision users' interactions with the
system. They may have unrealistic expectations as to what can be done
within their budgets, and they may also express their desires in a
language very different from system architects' jargon. Capturing a
consistent set of requirements from the customer and then massaging those
requirements into a more formal specification is a structured way to manage
the process of translating from the consumer's language to the designer's.
Name
Purpose
Inputs
Outputs
Functions
Performance
Manufacturing cost
Power
Physical size and weight
■ Name: This is simple but helpful. Giving a name to the project not
only simplifies talking about it to other people but can also crystallize
the purpose of the machine.
■ Inputs and outputs: These two entries are more complex than they
seem. The inputs and outputs to the system encompass a wealth of detail.
■ Performance: It is essential that performance requirements be identified
early, since they must be carefully measured during implementation to
ensure that the system works properly.
■ Power: Similarly, you may have only a rough idea of how much
power the system can consume, but a little information can go a long
way. Typically, the most important decision is whether the machine will
be battery powered or plugged into the wall. Battery-powered machines
must be much more careful about how they spend energy.
■ Physical size and weight: You should give some indication of the
physical size of the system to help guide certain architectural decisions.
A desktop machine has much more flexibility in the components used
than, for example, a lapel-mounted voice recorder.
A more thorough requirements analysis for a large system might use a form
similar to Figure as a summary of the longer requirements document. After an
introductory section containing this form, a longer requirements document could
include details on each of the items mentioned in the introduction. For example, each
individual feature described in the introduction in a single sentence may be
described in detail in a section of the specification.
After writing the requirements, you should check them for internal
consistency: Did you forget to assign a function to an input or output? Did you
consider all the modes in which you want the system to operate? Did you place
an unrealistic number of features into a battery-powered, low-cost machine?
Example 1.1
Requirements analysis of a GPS moving map
The moving map is a handheld device that displays for the user a map of the
terrain around the user's current position; the map display changes as the
user and the map device change position. The moving map obtains its
position from the GPS, a satellite-based navigation system. The moving
map display might look something like the following figure.
What requirements might we have for our GPS moving map? Here is an initial list:
■ Functionality: This system is designed for highway driving and similar uses,
not nautical or aviation uses that require more specialized databases and
functions. The system should show major roads and other landmarks available in
standard topographic databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The
device should be controlled by no more than three buttons. A menu system
should pop up on the screen when buttons are pressed to allow the user to make
selections to control the system.
■ Performance: The map should scroll smoothly. Upon power-up, a display should
take no more than one second to appear, and the system should be able to verify
its position and display the current map within 15 s.
■ Cost: The selling cost (street price) of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the hand.
■ Power consumption: The device should run for at least eight hours
on four AA batteries.
Specification
The specification is more precise—it serves as the contract between
the customer and the architects. As such, the specification must be carefully
written so that it accurately reflects the customer's requirements and does
so in a way that can be clearly followed during design.
Specification is probably the least familiar phase of this methodology
for neophyte designers, but it is essential to creating working systems with a
minimum of designer effort. Designers who lack a clear idea of what they want
to build when they begin typically make faulty assumptions early in the
process that aren't obvious until they have a working system. At that point,
the only solution is to take the machine apart, throw away some of it, and
start again. Not only does this take a lot of extra time, the resulting system is
also very likely to be inelegant, kludgey, and bug-ridden.
The specification should be understandable enough so that
someone can verify that it meets system requirements and overall expectations
of the customer. It should also be unambiguous enough that designers know
what they need to build. Designers can run into several different types of
problems caused by unclear specification. If the behavior of some feature in a
particular situation is unclear from the specification, the designer may
implement the wrong functionality. If global characteristics of the
specification are wrong or incomplete, the overall system architecture derived
from the specification may be inadequate to meet the needs of
implementation.
■ Background actions required to keep the system running, such as
operating the GPS receiver.
Architecture Design
The specification does not say how the system does things, only
what the system does. Describing how the system implements those
functions is the purpose of the architecture. The architecture is a plan for
the overall structure of the system that will be used later to design the
components that make up the architecture. The creation of the
architecture is the first phase of what many designers think of as design.
To understand what an architectural description is, let's look at a
sample architecture for the moving map of Example 1.1.
Figure shows a sample system architecture in the form of a block
diagram that shows major operations and data flows among them.
This block diagram is still quite abstract—we have not yet specified
which operations will be performed by software running on a CPU, what
will be done by special-purpose hardware, and so on. The diagram does,
however, go a long way toward describing how to implement the
functions described in the specification. We clearly see, for example, that
we need to search the topographic database and to render (i.e., draw) the
results for the display.
Architectural descriptions must be designed to satisfy both
functional and non-functional requirements. Not only must all the
required functions be present, but we must meet cost, speed, power, and
other nonfunctional constraints.
Starting out with a system architecture and refining that to hardware and
software architectures is one good way to ensure that we meet all
specifications: we can concentrate on the functional elements in the
system block diagram, and then consider the nonfunctional constraints
when creating the hardware and software architectures.
Designing Hardware and Software Components
You will have to design some components yourself. Even if you are using
only standard integrated circuits, you may have to design the printed circuit
board that connects them. You will probably have to do a lot of custom
programming as well. When creating these embedded software modules,
you must of course make use of your expertise to ensure that the system
runs properly in real time and that it does not take up more memory
space than is allowed. The power consumption of the moving map
software example is particularly important. You may need to be very
careful about how you read and write memory to minimize power—for
example, since memory accesses are a major source of power consumption,
memory transactions must be carefully planned to avoid reading the same
data several times.
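The point about avoiding repeated reads of the same data can be sketched in a few lines of C. This is an illustrative example of ours, not code from the moving map system: the scale factor is read from memory once into a local variable and reused, rather than being re-read on every loop iteration.

```c
#include <stddef.h>

/* Energy-conscious coding sketch: the shared value is read from memory
 * once and kept in a local, instead of being re-read each iteration.
 * Function and variable names are illustrative only. */
long sum_scaled(const int *data, size_t n, const int *scale_ptr) {
    long sum = 0;
    int scale = *scale_ptr;          /* one memory read, reused n times */
    for (size_t i = 0; i < n; i++)
        sum += (long)data[i] * scale;
    return sum;
}
```

A good compiler may perform this hoisting automatically, but embedded programmers often cannot rely on that and write the load explicitly.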
System Integration
and test functions relatively independently.
System integration is difficult because it usually uncovers problems.
It is often hard to observe the system in sufficient detail to determine
exactly what is wrong— the debugging facilities for embedded systems
are usually much more limited than what you would find on desktop
systems. As a result, determining why things do not work correctly
and how they can be fixed is a challenge in itself. Careful attention to
inserting appropriate debugging facilities during design can help ease system
integration problems, but the nature of embedded computing means that this
phase will always be a challenge.
4.3 RISC ARCHITECTURE
RISC architecture was developed as a result of the 801 project, which started in
1975 at the IBM T.J. Watson Research Center and was completed by the early 1980s.
This project was not widely known outside of IBM, and two other projects
with similar objectives started in the early 1980s at the University of California,
Berkeley and Stanford University. The term RISC (Reduced Instruction Set
Computer), used for the Berkeley research project, is the term under which this
architecture became widely known and is recognized today.
RISC Performance
There are several ways to achieve performance: technology advances, better machine
organization, better architecture, and optimization improvements in compiler
technology. Through technology alone, machine performance can be improved only in
proportion to the amount of technology improvement, and this is, more or less, available to everyone.
RISC deals with the machine organization and architecture levels - more precisely, with their interaction and trade-offs.
The work that each instruction of the RISC machine performs is simple and straight
forward. Thus, the time required to execute each instruction can be shortened and the
number of cycles reduced.
Typically, the instruction execution time is divided into five stages (machine cycles);
as soon as processing in one stage is finished, the instruction proceeds to the next
stage. As a stage becomes free, it is used to perform the same operation for the next
instruction. The instructions are thus executed in a pipelined fashion, similar to an
assembly line in a factory.
Typically, those five pipeline stages are:
IF - Instruction Fetch
ID - Instruction Decode
EX - Execute
MA - Memory Access
WB - Write Back
The goal of RISC is to achieve an execution rate of one cycle per instruction (CPI = 1.0),
which would be the case when no interruptions in the pipeline occur.
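The pipelining arithmetic behind CPI = 1.0 can be made concrete with a small sketch (the function is ours, not from the text): in an ideal k-stage pipeline with no stalls, n instructions complete in k + (n - 1) cycles, so the average cycles per instruction approaches 1 as n grows.

```c
/* Cycle count for n instructions in an ideal k-stage pipeline with no
 * stalls: the first instruction needs k cycles to travel through all
 * stages, and every later one completes one cycle after its predecessor. */
unsigned pipeline_cycles(unsigned k, unsigned n) {
    if (n == 0) return 0;
    return k + (n - 1);
}
```

With k = 5, for example, 100 instructions take 104 cycles, giving CPI = 1.04; as n grows, CPI approaches the ideal 1.0.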
The simplicity of the RISC instruction set is traded for more parallelism in execution. On
average a code written for RISC will consist of more instructions than the one written
for CISC. The typical trade-off that exists between RISC and CISC can be expressed in
the total time required to execute a certain task:
Time (task) = I x C x P x T0
I = No. of Instructions / Task
C = No. of Cycles / Instruction
P = No. of Clock Periods / Cycle (usually P=1)
T0 = Clock Period (ns)
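The trade-off expressed by this formula can be computed directly. The numbers in the usage note below are made-up illustrations, not measurements:

```c
/* Time(task) = I x C x P x T0, as defined above.
 * I  = instructions per task, C = cycles per instruction,
 * P  = clock periods per cycle (usually 1), T0 = clock period in ns. */
double task_time_ns(double I, double C, double P, double T0_ns) {
    return I * C * P * T0_ns;
}
```

For example, a hypothetical CISC running 100 instructions at 3 cycles each with a 10 ns clock needs 100 x 3 x 1 x 10 = 3000 ns, while a RISC needing 150 simpler instructions at 1 cycle each needs 150 x 1 x 1 x 10 = 1500 ns; which side wins depends entirely on the actual values of I, C, and T0.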
a. CISC achieves its performance advantage through a denser program consisting of
fewer, more powerful instructions.
b. RISC achieves its performance advantage by having simpler instructions.
4.4 PIPELINING
Finally, the single most important feature of RISC is pipelining. The degree of
parallelism in a RISC machine is determined by the depth of the pipeline. It could be
stated that all the features of RISC listed in this article could easily be
derived from the requirements for pipelining and maintaining an efficient execution
model. The sole purpose of many of those features is to support an efficient execution of
the RISC pipeline. It is clear that without pipelining the goal of CPI = 1 is not possible.
We may be led to think that by increasing the number of pipeline stages (the
pipeline depth), thus introducing more parallelism, we can increase the RISC
machine's performance further. However, this idea does not lead to a simple and
straightforward realization.
The increase in the number of pipeline stages introduces an overhead not only in hardware
(needed to implement the additional pipeline registers), but also the overhead in time due to
the delay of the latches used to implement the pipeline stages as well as the cycle time lost
due to the clock skews and clock jitter. This could very soon bring us to the point of
diminishing returns where further increase in the pipeline depth would result in less
performance.
An additional side effect of deeply pipelined systems is hardware complexity
necessary to resolve all the possible conflicts that can occur between the increased number
of instructions residing in the pipeline at one time. The number of the pipeline stages is
mainly determined by the type of the instruction core (the most frequent
instructions) and the operations required by those instructions. The pipeline depth
depends, as well, on the technology used. If the machine is implemented in a very
high-speed technology with a small number of gate levels (such as GaAs or ECL) and
very good control of clock skew, it makes sense to pipeline the machine deeper. The
RISC machines that achieve performance through the
use of many pipeline stages are known as super-pipelined machines.
Today the most common number of the pipeline stages encountered is five (as in the
examples given in this text). However, twelve or more pipeline stages are encountered in
some machine implementations.
4.6 INSTRUCTION FORMAT
ADDRESSING MODE
classes together with simple examples in the following paragraphs. It should be
noted that in presenting these examples, we use the convention operation, source,
destination to express any instruction. In that convention, operation represents the
operation to be performed, for example, add, subtract, write, or read. The source field
represents the source operand(s). The source operand can be a constant, a value stored in
a register, or a value stored in memory. The destination field represents the place
where the result of the operation is to be stored, for example, a register or a memory
location.
A three-address instruction takes the form operation add-1, add-2, add-3. In this form,
each of add-1, add-2, and add-3 refers to a register or to a memory location. Consider, for
example, the instruction
ADD R1, R2, R3
address instruction. Consider, for example, the instruction ADD B, R1. In this
case, the instruction adds the contents of register R1 to the contents of memory location
B and stores the result in register R1. Because the instruction uses two
types of addressing, that is, a register and a memory location, it is called a one-and-
half-address instruction. This is because register addressing needs a smaller number of
bits than memory addressing.
It is interesting to indicate that there exist zero-address instructions. These are the
instructions that use stack operation. A stack is a data organization mechanism in which
the last data item stored is the first data item retrieved. Two specific operations can be
performed on a stack. These are the push and the pop operations. Figure below illustrates
these two operations.
As can be seen, a specific register, called the stack pointer (SP), is used to
indicate the stack location that can be addressed. In the stack push operation, the SP
value is used to indicate the location (called the top of the stack) in which the value
(5A) is to be stored (in this case, location 1023). After storing (pushing) this value, the
SP is incremented to point to location 1024. In the stack pop operation, the SP is first
decremented to become 1021. The value stored at this location (DD in this case) is
retrieved (popped out) and stored in the shown register.
Different operations can be performed using the stack structure. Consider, for
example, an instruction such as ADD (SP)+, (SP). The instruction adds the contents of the
stack location pointed to by the SP to those pointed to by SP + 1 and stores the result on
the stack in the location pointed to by the current value of the SP.
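The push and pop behavior described above can be sketched in a few lines of C. This is a minimal illustration of ours: push stores the value where SP points and then increments SP; pop first decrements SP and then retrieves the value. The stack here starts at index 0 rather than at the addresses (1021-1024) shown in the figure.

```c
/* Minimal stack sketch mirroring the described push/pop semantics. */
enum { STACK_SIZE = 16 };
unsigned char stack_mem[STACK_SIZE];
int sp = 0;                          /* the stack pointer */

void push(unsigned char v) { stack_mem[sp] = v; sp++; }  /* store, then increment */
unsigned char pop(void)    { sp--; return stack_mem[sp]; } /* decrement, then load */
```

Note that push and pop are exact inverses: a pop immediately after a push returns the pushed value and restores SP.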
The different ways in which operands can be addressed are called the
addressing modes. Addressing modes differ in the way the address information of
operands is specified. The simplest addressing mode is to include the operand itself in
the instruction, that is, no address information is needed. This is called immediate
addressing. A more involved addressing mode is to compute the address of the operand by
adding a constant value to the content of a register. This is called indexed addressing.
Between these two addressing modes there exist a number of other addressing modes
including absolute addressing, direct addressing, and indirect addressing. A number of
different addressing modes are explained below.
1. Immediate Mode
According to this addressing mode, the value of the operand is (immediately)
available in the instruction itself. Consider, for example, the case of loading the
decimal value 1000 into a register Ri. This operation can be performed using an
instruction such as the following: LOAD #1000, Ri. In this instruction, the operation to
be performed is to load a value into a register. The source operand is (immediately)
given as 1000, and the destination is the register Ri. It should be noted that in order to
indicate that the value 1000 mentioned in the instruction is the operand itself and not its
address (immediate mode), it is customary to prefix the operand with the special character
(#). As can be seen, the use of the immediate addressing mode is simple. However,
immediate addressing leads to poor programming practice, because a change in the
value of an operand requires a change in every instruction that uses the immediate value
of that operand. A more flexible addressing mode is explained below.
2. Direct (Absolute) Mode
In this addressing mode, the address of the memory location that holds the operand is
included in the instruction. For example, if the content of the memory location whose
address is 1000 was (2345) at the time when the instruction LOAD 1000, Ri is executed,
then the result of executing this instruction is to load the value (2345) into register Ri.
3. Indirect Mode
In the indirect mode, what is included in the instruction is not the address of the
operand, but rather a name of a register or a memory location that holds the (effective)
address of the operand. In order to indicate the use of indirection in the instruction, it is
customary to include the name of the register or the memory location in parentheses.
Consider, for example, the instruction LOAD (1000), Ri. This instruction has the
memory location 1000 enclosed in parentheses, thus indicating indirection. The
meaning of this instruction is to load register Ri with the contents of the memory location
whose address is stored at memory address 1000. Because indirection can be made through
either a register or a memory location, we can identify two types of indirect
addressing: register indirect addressing, if a register is used to hold the address
of the operand, and memory indirect addressing, if a memory location is used to
hold the address of the operand. The two types are illustrated in the figure below.
4. Indexed Mode
In this addressing mode, the address of the operand is obtained by adding a constant
to the content of a register, called the index register. Consider, for example, the
instruction LOAD X(Rind), Ri. This instruction loads register Ri with the contents of the
memory location whose address is the sum of the contents of register Rind and the
value X. Indexed addressing is indicated in the instruction by including the name of the
index register in parentheses and using the symbol X to indicate the constant to be
added. The figure illustrates indexed addressing. As can be seen, indexing requires an
additional level of complexity over register indirect addressing.
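The four modes presented so far can be summarized with a small simulation. This sketch of ours uses a C array as memory; the register and memory contents in the test values are illustrative only.

```c
/* Effective-address sketch for the four modes, on a simulated memory. */
enum { MEMSIZE = 2048 };
int mem_sim[MEMSIZE];

int load_immediate(int value)  { return value; }                  /* LOAD #1000, Ri  */
int load_direct(int addr)      { return mem_sim[addr]; }          /* LOAD 1000, Ri   */
int load_indirect(int addr)    { return mem_sim[mem_sim[addr]]; } /* LOAD (1000), Ri */
int load_indexed(int x, int r) { return mem_sim[x + r]; }         /* LOAD X(Rind), Ri */
```

Reading the four bodies side by side shows the progression: immediate needs no memory access, direct needs one, indirect needs two, and indexed needs one access plus an addition.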
5. Other Modes
The addressing modes presented above represent the most commonly used
modes in most processors. They provide the programmer with sufficient means to
handle most general programming tasks. However, a number of other addressing modes
have been used in a number of processors to facilitate execution of specific programming
tasks. These additional addressing modes are more involved as compared to those
presented above. Among these addressing modes the relative, autoincrement, and the
autodecrement modes represent the most well-known ones. These are explained below.
Relative Mode
This mode is similar to indexed addressing, except that the program counter replaces
the index register. For example, the instruction LOAD X(PC), Ri loads register Ri with
the contents of the memory location whose address is the sum of the contents of the
program counter (PC) and the value X. The figure illustrates the relative addressing mode.
Autoincrement Mode
This mode is similar to register indirect addressing: a register holds the address of the
operand, and the register's content is automatically incremented after accessing
the operand. As before, indirection is indicated by including the autoincrement
register in parentheses. The automatic increment of the register's content after
accessing the operand is indicated by including a (+) after the parentheses.
Consider, for example, the instruction LOAD (Rauto)+, Ri. This instruction loads
register Ri with the operand whose address is the content of register Rauto. After
loading the operand into register Ri, the content of register Rauto is incremented,
pointing, for example, to the next item in a list of items. The figure below illustrates
the autoincrement addressing mode.
Autodecrement Mode
Similar to the autoincrement mode, the autodecrement mode uses a register to
hold the address of the operand. However, in this case the content of
the autodecrement register is first decremented, and the new content is
used as the effective address of the operand. In order to reflect the fact that the
content of the autodecrement register is decremented before accessing the
operand, a (-) is included before the indirection parentheses. Consider,
for example, the instruction LOAD -(Rauto), Ri. This instruction decrements
the content of the register Rauto and then uses the new content as the
effective address of the operand that is to be loaded into register Ri.
The figure below illustrates the autodecrement addressing mode.
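The autoincrement and autodecrement modes correspond closely to two familiar C pointer idioms, which may help fix the semantics; this parallel is our illustration, not from the text. `*p++` reads through the pointer and then advances it (autoincrement); `*--p` retreats the pointer first and then reads (autodecrement).

```c
/* LOAD (Rauto)+, Ri: dereference, then increment the address register. */
int load_autoinc(int **r) { return *(*r)++; }

/* LOAD -(Rauto), Ri: decrement the address register, then dereference. */
int load_autodec(int **r) { return *--(*r); }
```

Together the two modes make it easy to walk a list forward with one register and backward with another, which is exactly how they are used for stack and array traversal.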
INSTRUCTION TYPES
1. Data Movement Instructions
MOVE Ri, Rj
This instruction moves the content of register Ri to register Rj. The effect of the instruction is
to overwrite the contents of the (destination) register Rj without changing the contents of the
(source) register Ri. Data movement instructions include those used to move data to (from)
registers from (to) memory. These instructions are usually referred to as the load and store
instructions, respectively. Examples of the two instructions are
LOAD 25838, Rj
STORE Ri, 1024
The first instruction loads the content of the memory location whose address is 25838
into the destination register Rj. The content of the memory location is unchanged by
executing the LOAD instruction. The STORE instruction stores the content of the source
register Ri into the memory location 1024. The content of the source register is
unchanged by executing the STORE instruction. Table 2.3 shows some common data
transfer operations and their meanings.
2. Arithmetic and Logical Instructions
Arithmetic and logical instructions are those used to perform arithmetic and logical
manipulation of registers and memory contents. Examples of arithmetic instructions
include the ADD and SUBTRACT instructions. These are
ADD R1, R2, R0
SUBTRACT R1, R2, R0
The first instruction adds the contents of source registers R1 and R2 and stores the
result in destination register R0. The second instruction subtracts the contents of the
source registers R1 and R2 and stores the result in the destination register R0. The
contents of the source registers are unchanged by the ADD and the SUBTRACT
instructions. In addition to the ADD and SUBTRACT instructions, some machines have
MULTIPLY and DIVIDE instructions. These two instructions are expensive to implement
and can be substituted by the use of repeated addition or repeated subtraction. Therefore,
some architectures do not include MULTIPLY or DIVIDE instructions in their instruction
sets. The table shows some common arithmetic operations and their meanings.
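The substitution of MULTIPLY by repeated addition mentioned above can be shown in a few lines. This sketch of ours handles non-negative multipliers only, to keep it short:

```c
/* MULTIPLY implemented by repeated addition: add the multiplicand a
 * to the result b times. Only non-negative multipliers are handled. */
int multiply(int a, unsigned b) {
    int result = 0;
    while (b-- > 0)
        result += a;   /* one addition per unit of the multiplier */
    return result;
}
```

The cost is clear from the loop: the running time grows linearly with the multiplier, which is why real software libraries for such machines use faster shift-and-add algorithms instead.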
Logical instructions are used to perform logical operations such as AND, OR,
SHIFT, COMPARE, and ROTATE. As the names indicate, these instructions perform,
respectively, and, or, shift, compare, and rotate operations on register or memory
contents. The table presents a number of logical operations.
3. Sequencing Instructions
Control (sequencing) instructions are used to change the sequence in which
instructions are executed. They take the form of CONDITIONAL BRANCHING
(CONDITIONAL JUMP), UNCONDITIONAL BRANCHING (JUMP), or CALL
instructions. A common characteristic among these instructions is that their execution
changes the program counter (PC) value. The change made in the PC value can be
unconditional, for example, in the unconditional branching or jump instructions.
In this case, the earlier value of the PC is lost and execution of the program starts at a
new value specified by the instruction. Consider, for example, the instruction JUMP
NEW-ADDRESS. Execution of this instruction will cause the PC to be loaded with the
memory location represented by NEW-ADDRESS whereby the instruction stored at this
new address is executed. On the other hand,
the change made in the PC by a branching instruction can be conditional, based on the
value of a specific flag. Examples of these flags include Negative (N), Zero (Z),
Overflow (V), and Carry (C). These flags represent the individual bits of a
specific register, called the CONDITION CODE (CC) REGISTER. The values of the
flags are set based on the results of executing different instructions. The meaning of
each of these flags is shown in the table.
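How an instruction might set these flags can be sketched for an 8-bit SUBTRACT. The struct layout and names below are our illustration, not a specific processor's; the overflow flag V is omitted to keep the sketch short.

```c
#include <stdint.h>

/* Illustrative condition-code computation for an 8-bit a - b. */
typedef struct { int n, z, c; } flags_t;

flags_t sub_flags(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    flags_t f;
    f.n = (r & 0x80u) != 0;   /* Negative: sign bit of the result */
    f.z = (r == 0);           /* Zero: result is all zeros */
    f.c = (a < b);            /* Carry: a borrow was needed */
    return f;
}
```

A conditional branch such as BRANCH-IF-ZERO then simply tests the corresponding bit of the condition code register and loads the PC with the target address if it is set.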
Consider, for example, the following group of instructions.
4. Input/Output Instructions
Input and output instructions (I/O instructions) are used to transfer data between the
computer and peripheral devices. The two basic I/O instructions used are the INPUT and
OUTPUT instructions. The INPUT instruction is used to transfer data from an input
device to the processor. Examples of input devices include a keyboard or a mouse. Input
devices are interfaced with a computer through dedicated input ports. Computers can
use dedicated addresses to address these ports. Suppose that the input port through
which a keyboard is connected to a computer carries the unique address 1000.
Therefore, execution of the instruction INPUT 1000 will cause the data stored in
a specific register in the interface between the keyboard and the computer, call it the
input data register, to be moved into a specific register (called the accumulator) in the
computer. Similarly, the execution of the instruction OUTPUT 2000 causes the
data stored in the accumulator to be moved to the data output register in the output
device whose address is 2000. Alternatively, the computer can address these ports in
the usual way of addressing memory locations. In this case, the computer can input data
from an input device by executing an instruction such as MOVE Rin, R0. This
instruction moves the content of the register Rin into the register R0. Similarly, the
instruction MOVE R0, Rin moves the contents of register R0 into the register Rin, that
is, performs an output operation. This latter scheme is called memory-mapped
input/output. Among the advantages of
memory-mapped I/O is the ability to execute a number of memory-dedicated
instructions on the registers in the I/O devices in addition to the elimination of
the need for dedicated I/O instructions. Its main disadvantage is the need to dedicate part
of the memory address space for I/O devices.
Different processor targets may also have different word alignment restrictions.
Typically, a two- or four-byte integer should start on a suitably aligned address in order
to be fetched correctly over a bus. An access of a misaligned memory element may cause
a bus error. Intel/AMD processors (e.g., in a PC) are typically more tolerant of alignment
problems than others. The SUN SPARC that kattis runs on is quite picky about alignment!
Therefore, the software may need to adjust the contents of a message, or copy it into a
new location, before interpreting it.
Due to alignment and byte order problems, it is necessary to translate protocol headers
between the network representation (a sequence of bytes) and a structured data type (e.g.,
a C struct). The process of encoding a structured data type into a sequence of bytes in a
message is sometimes called marshaling. The reverse process, unpacking data from its
network representation into one interpretable by a host, is called unmarshaling.
In the lab, it is recommended to create C structures for the Ethernet and IP headers. Then
there are essentially two approaches to handle the translation between the raw packet data and
the structured information:
Use the unstructured raw data, and typecast it into structured information. The fields are
then accessed via byte-swapping functions. This is easiest but has limitations for handling
misaligned data. However, the x86 processors are forgiving when it comes to accessing
misaligned data; other architectures are not.
Example:
char *buf;                 /* This is the raw data */
iphdr *ip;                 /* This is the structured type */
ip = (iphdr *)buf;         /* Typecast */
ip->len = ntohl(ip->len);  /* Fields may be accessed via the structured type after byte swapping */
Copying and translating the raw data into separate structures. The fields can then be accessed
without type conversions. This is more complex but is a more general method.
Examples of such C types are uint8_t, uint16_t, and uint32_t, which represent 1-, 2-, and
4-byte unsigned integers, respectively. It is recommended that you use these types when
defining fields in protocol headers, for example.
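The second, copying approach to marshaling can be sketched as follows. The header layout here is hypothetical (it is not the real Ethernet or IP header); the point is the pattern: convert each multi-byte field with the byte-order functions and move it with memcpy, so the raw buffer never needs to be aligned.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons/htonl, ntohs/ntohl */

/* Hypothetical 7-byte header used only for illustration. */
struct myhdr {
    uint8_t  version;
    uint16_t len;
    uint32_t src;
};

/* Marshaling: encode the struct into network byte order, field by field. */
void marshal(const struct myhdr *h, unsigned char *buf) {
    uint16_t len = htons(h->len);
    uint32_t src = htonl(h->src);
    buf[0] = h->version;
    memcpy(buf + 1, &len, sizeof len);   /* memcpy tolerates misalignment */
    memcpy(buf + 3, &src, sizeof src);
}

/* Unmarshaling: the reverse process. */
void unmarshal(const unsigned char *buf, struct myhdr *h) {
    uint16_t len;
    uint32_t src;
    h->version = buf[0];
    memcpy(&len, buf + 1, sizeof len);
    memcpy(&src, buf + 3, sizeof src);
    h->len = ntohs(len);
    h->src = ntohl(src);
}
```

Because each field is copied at an explicit byte offset, this code also works for packed, unpadded wire formats that a plain struct overlay might not match.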
Byte Ordering
The optional E-bit in the PSW controls whether loads and stores use big endian or
little endian byte ordering. When the E-bit is 0, all larger-than-byte loads and stores are big
endian — the lower addressed bytes in memory correspond to the higher-order bytes in the
register. When the E-bit is 1, all larger-than-byte loads and stores are little endian — the
lower-addressed bytes in memory correspond to the lower-order bytes in the register. Load
byte and store byte instructions are not affected by the E-bit. The E-bit also affects
instruction fetch. Processors which implement the PSW E-bit must also provide an
implementation-dependent, software-writable default endian bit. The default endian bit
controls whether the
PSW E-bit is set to 0 or 1 on interruptions and also controls whether data in the page table is
interpreted in big endian or little endian format by the hardware TLB miss handler (if
implemented).
Figure shows various loads in little endian format.
Stores are not shown but behave similarly. The E-bit also affects instruction fetch. When
the E-bit is 0, instruction fetch is big endian — the lower-addressed bytes in memory correspond
to the higher-order bytes in the instruction. When the E-bit is 1, instruction fetch is little endian
— the lower-addressed bytes in memory correspond to the lower-order bytes in the
instruction. Architecturally, the instruction byte swapping can occur either when a cache line is
moved into the instruction cache or as instructions are fetched from the I-cache into the pipeline.
Because processors are allowed to swap instructions as they are moved into the I-cache, software
is required to keep track of which pages might have been brought into the I-cache in big endian
form and in little endian form, given the cache move-in rules, and, before executing the code,
flush all lines on any page that might have been moved in with the wrong form. Note that the
move-in rules allow all lines on a page, plus the next sequential page, to be moved in, so guard
pages (that will never be executed) must be used between code pages which will execute with
opposite endian forms.
5. INTRODUCTION TO VLIW AND DSP PROCESSOR
DSP
Digital signal processors (DSPs) are specialized microprocessors optimized to execute the
computationally-intensive operations commonly found in the inner loops of digital signal
processing algorithms. DSPs are typically designed to achieve high performance and low cost,
and are therefore used extensively in embedded systems. Some of the architectural features
commonly found in DSPs include fast multiply-accumulate hardware, separate address units,
multiple data-memory banks, specialized addressing modes, and support for low-overhead
looping. Another common feature of DSP architectures is the use of tightly-encoded instruction
sets.
For example, using 16- to 48-bit instruction words, a common DSP instruction can specify
up to five parallel operations: multiply-accumulate, two pointer updates, and two data moves.
Tightly-encoded instructions that could specify five operations in a single instruction were
initially used to improve code density with the belief that this also reduced instruction memory
requirements, and hence cost. Tightly-encoded instructions were also used to reduce instruction
memory bandwidth, which was of particular concern when packaging was not very advanced.
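The five parallel operations mentioned above map directly onto the inner loop of a dot product, the core of an FIR filter. In the C sketch below (our illustration), the single loop-body statement corresponds to what one tightly-encoded DSP instruction can do: a multiply-accumulate plus two pointer updates (the two data moves are implicit in the dereferences).

```c
/* Dot-product inner loop: one multiply-accumulate and two address
 * updates per iteration, mirroring a typical DSP MAC instruction. */
long fir_dot(const int *x, const int *h, int n) {
    long acc = 0;                      /* the accumulator */
    while (n--)
        acc += (long)(*x++) * (*h++);  /* MAC + two pointer updates */
    return acc;
}
```

On a DSP with single-cycle MAC hardware and zero-overhead looping, each iteration of this loop can retire in one cycle.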
As a result of this encoding style, DSP instruction-set architectures (ISAs) are not very
well suited for automatic code generation by modern, high-level language (HLL) compilers. For
example, most operations are accumulator based, and very few registers can be specified in a
DSP instruction. Although this reduces the number of bits required for encoding instructions, it
also makes it more difficult for the compiler to generate efficient code. Furthermore, DSP
instructions can only specify limited instances of parallelism that are commonly used in the inner
loops of DSP algorithms. As a result, DSPs cannot exploit parallelism beyond what is supported
by the ISA, and this can degrade performance significantly. For the early generation of
programmable DSPs, tightly-encoded instruction sets were an acceptable solution. At that time,
DSPs were mainly used to implement simple algorithms that were relatively easy to code and
optimize in assembly language. Furthermore, the compiler technology of that time was not
advanced enough to generate code with the same efficiency and compactness as that of a human
programmer.
However, as DSP and multimedia applications become larger and more sophisticated, and
as design and development cycles are required to be shorter, it is no longer feasible to program
DSPs in assembly language. Furthermore, current compiler technology has advanced to the point
where it can generate very efficient and compact code. For these reasons, DSP manufacturers are
currently seeking alternative instruction set architectures. One such alternative, used in new
media processors and DSPs, such as the Texas Instruments TMS320C6x, is the VLIW
architecture.
VLIW
Since DSP applications typically exhibit static control flow, compilers can easily be used to
schedule instructions statically and exploit the available parallelism. This makes VLIW
architectures, with their simpler control units, a more cost-effective choice than superscalar
architectures in embedded systems. VLIW architectures are also better suited for exploiting
parallelism in DSP applications than SIMD architectures. Although SIMD architectures are
suitable for applications that exhibit homogeneous parallelism, they are not well suited for
applications that exhibit more heterogeneous parallelism, such as that found in the inner loops of
DSP kernels, or the control code of DSP applications. The greater flexibility in VLIW
architectures also makes them easier targets for HLL compilers. However, in their most basic
form, they are also more expensive to implement.
For example, while SIMD architectures use a single ALU to perform four parallel
operations, a VLIW architecture requires four, full-precision ALUs to perform the same
operations. Moreover, while the four operations can be encoded as a single SIMD instruction,
they have to be encoded as four, separate operations in a long instruction. One way to reduce the
cost of a VLIW architecture is to customize it to the target application. For example, smaller-
precision ALUs and datapaths can be used to match the requirements of the application.
Instruction storage and bandwidth requirements can also be reduced by using efficient long-
instruction encoding techniques. Even specialized SIMD operations can be used to supplement a
VLIW architecture and reduce its cost. This, however, requires more sophisticated compiler
technology to exploit available sub-word parallelism.
Finally, VLIW architectures are better suited for exploiting the fine-grained parallelism
found in DSP applications than multiprocessor architectures. Even though some applications can
benefit from the exploitation of coarse-grained parallelism, most applications cannot justify the
additional cost of a multiprocessor architecture. Multiprocessor DSP architectures are also
difficult targets for HLL compilers, making them very cumbersome to program.
Since VLIW architectures have the flexibility to exploit different levels of parallelism,
have simple control structures, and are easy targets for current HLL compilers, they are the most
suitable architecture for embedded DSP systems. However, a major impediment to using VLIW
architectures in cost-sensitive embedded systems has been their high instruction bandwidth
requirements. Nonetheless, several approaches have been employed in research and commercial
VLIW architectures to overcome this problem. The remainder of this section shows the
performance and cost benefits of using VLIW architectures compared to typical DSP-style
architectures.
1.1 IO PORT TYPES − SERIAL AND PARALLEL IO PORTS
Types of parallel ports:
• Parallel port, single-bit input
• Parallel port, single-bit output
• Parallel port, multi-bit input
• Parallel port, multi-bit output
Synchronous Serial Input
• The sender, along with the serial data bits, also sends the clock pulses SCLK (serial clock) to the receiver port pin. The port synchronizes the serial data input bits with the clock bits. Each bit in each byte, as well as each byte, is received in synchronization.
• Synchronization means separation by a constant interval or phase difference. If the clock period = T, then each byte at the port is received at the input in a period of 8T.
• The bytes are received at constant rates. Each byte at the input port is separated by 8T, and the data transfer rate on the serial line is (1/T) bps. [1 bps = 1 bit per second]
Serial data and clock pulse-inputs
• On the same input line − the clock pulses suitably encode or modulate the serial data input bits. The receiver detects the clock pulses and recovers the data bits after decoding or demodulating.
• On a separate input line − when a separate SCLK input is sent, the receiver detects at the middle, +ve edge or −ve edge of each clock pulse whether the data input is 1 or 0, and saves the bits in an 8-bit shift register. The processing element at the port (peripheral) saves the byte in a port register, from where the microprocessor reads the byte.
Master output slave input (MOSI) and master input slave output (MISO)
• MOSI − the SCLK is sent from the sender (master) to the receiver (slave), and the slave is forced to synchronize reception of the inputs from the master as per the master clock.
• MISO − the SCLK is sent to the sender (slave) from the receiver (master), and the slave is forced to synchronize sending of the inputs to the master as per the master clock outputs.
Synchronous serial input is used for inter-processor transfers, audio inputs and streaming data inputs.
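The shift-register mechanism described above can be sketched in C: on each clock edge the port samples the data line, shifts the bit into an 8-bit register, and after eight clocks the assembled byte is ready for the processor. This is an illustrative model of the port logic, not any real device's register interface; sampling MSB first is our assumption.

```c
#include <stdint.h>

/* Model of a synchronous serial input port: each element of bits[] is
 * the level sampled at one clock edge. After 8 edges the byte is ready
 * in the (modeled) port register. MSB-first order assumed. */
uint8_t shift_in_byte(const int bits[8])
{
    uint8_t shift_reg = 0;
    for (int i = 0; i < 8; i++) {
        /* sample at the clock edge and shift the new bit in */
        shift_reg = (uint8_t)((shift_reg << 1) | (bits[i] & 1));
    }
    return shift_reg;   /* byte now ready for the microprocessor */
}
```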
Examples of synchronous serial input/output
Inter-processor data transfer, multiprocessor communication, writing to CD or hard disk, audio input/output, video input/output, dialer output, network device output, remote TV control, transceiver output, and serial I/O bus output or writing to flash memory using SDIO.
Synchronous Serial Output
• Each bit in each byte is sent in synchronization with a clock.
• Bytes are sent at constant rates. If the clock period = T, then the data transfer rate is (1/T) bps.
• The sender either sends the clock pulses at the SCLK pin, or sends the serial data output and the clock through the same output line, with the clock pulses suitably modulating or encoding the serial output bits.
Synchronous serial output using a shift register
• The processing element at the port (peripheral) sends the byte through a shift register at the port, to which the microprocessor writes the byte.
• Synchronous serial output is used for inter-processor transfers, audio outputs and streaming data outputs.
Synchronous Serial Input/output
1.4 ASYNCHRONOUS SERIAL INPUT PORT LINE RxD (RECEIVE DATA)
• Does not receive the clock pulses or clock information along with the bits.
• Each bit within each byte is received at fixed intervals, but the received bytes are not in synchronization.
• Bytes are separated by variable intervals or phase differences.
• Asynchronous serial input is also called UART input if the serial input is according to the UART protocol.
1.6 UART PROTOCOL SERIAL LINE FORMAT
• The starting point of receiving the bits of each byte is indicated by a line transition from 1 to 0 for a period = T. [T−1 is called the baud rate.]
• If the sender's shift-clock period = T, then a byte at the port is received at the input in a period of 10T or 11T, due to the additional bits used at the start and end of each byte.
• The receiver detects the n bits at intervals of T from the middle of the start-indicating bit, where n = 0, 1, …, 10 or 11, finds whether each data input is 1 or 0, and saves the bits in an 8-bit shift register.
• The processing element at the port (peripheral) saves the byte in a port register, from where the microprocessor reads the byte.
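The 10T frame described above (a start bit, eight data bits and a stop bit) can be sketched as follows. The function name and the array representation of line levels are illustrative only; sending the data bits LSB first follows standard UART practice.

```c
#include <stdint.h>

/* Build the 10 line levels of one UART frame: start bit '0', eight data
 * bits LSB first, stop bit '1'. Each element occupies time T on the
 * line, so the whole byte takes 10T. Illustrative sketch. */
void uart_frame(uint8_t byte, int out[10])
{
    out[0] = 0;                           /* start bit: 1-to-0 transition */
    for (int i = 0; i < 8; i++)
        out[1 + i] = (byte >> i) & 1;     /* data bits, LSB first */
    out[9] = 1;                           /* stop bit */
}
```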
Asynchronous Serial Output
• Asynchronous output serial port line TxD (transmit data).
• Each bit in each byte is transmitted at fixed intervals, but the output bytes are not in synchronization (they are separated by a variable interval or phase difference). The minimum separation is one stop-bit interval on TxD.
• Does not send the clock pulses along with the bits.
• The sender transmits the bytes at minimum intervals of nT. Bit reception at the receiver starts from the middle of the start bit.
Half Duplex
An example of the half-duplex mode is telephone communication: on one telephone line, the talk can go only one way at a time (half-duplex mode).
Full Duplex
Full duplex means that, at an instant, the communication can be both ways. An example of the full-duplex asynchronous mode of communication is the communication between the modem and the computer through the TxD and RxD lines, or communication using the serial interface (SI) in modes 1, 2 and 3 in the 8051.
1.7 PARALLEL PORT SINGLE-BIT INPUT
Example uses:
• Completion of a revolution of a wheel
• Achieving preset pressure in a boiler
• Exceeding the upper limit of permitted weight over the pan of an electronic balance
• Presence of a magnetic piece in the vicinity of, or within reach of, a robot arm at its end point
• Filling of a liquid up to a fixed level
Parallel Port Output − single bit
• PWM output for a DAC, which controls liquid level, temperature, pressure, speed, the angular position of a rotating shaft, the linear displacement of an object, or a d.c. motor
• Pulses to an external circuit
• Control signal to an external circuit
Parallel Port Input − multi-bit
• ADC input from a liquid-level sensor, temperature sensor, pressure sensor, speed sensor or d.c. motor rpm sensor
• Encoder inputs giving bits for the angular position of a rotating shaft or the linear displacement of an object
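A single-bit parallel input of the kind listed above is typically read from a memory-mapped port register. The register address and bit position below are hypothetical, chosen purely for illustration; the `PORT_IN` macro is shown but never dereferenced in this sketch.

```c
#include <stdint.h>

/* Hypothetical memory-mapped input port and sensor bit; on real
 * hardware these come from the microcontroller's datasheet.
 * Not dereferenced in this sketch. */
#define PORT_IN   (*(volatile uint8_t *)0x40001000u)
#define LEVEL_BIT 3

/* Extract a single sensor bit (e.g. "liquid at fixed level") from a
 * port-register value. On hardware: liquid_at_level(PORT_IN, LEVEL_BIT). */
int liquid_at_level(uint8_t port_value, int bit)
{
    return (port_value >> bit) & 1;   /* 1 when the sensor line is high */
}
```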
1. Synchronous and Iso-synchronous Communication in Serial Ports or Devices
Synchronous Communication
• Sync code bits (separated by an interval ΔT) are sent to enable frame synchronization or frame-start signaling. The code bits precede the data bits. There may be inversion of the code bits after each frame in certain protocols. Flag bits at the start and end are also used in certain protocols.
• Synchronous device port data bits are always present. The reciprocal of ΔT is the bits per second (bps).
• Data bits − m frame bits (or 8 bits) transmit such that each bit is on the line for time ΔT, or each frame is on the line for time m·ΔT. m may be 8 or a large number; it depends on the protocol.
• Synchronous device clock bits − either on a separate clock line, or on the data line such that the clock information is embedded with the data bits by an appropriate encoding or modulation. The clock is generally not optional; it is not left implicit to the synchronous data receiver, and the transmitter generally transmits the clock rate information.
Asynchronous Communication from Serial Ports or Devices
In asynchronous communication, the clocks of the receiver and transmitter are independent and unsynchronized, but of the same frequency; there are variable phase differences between the bytes or bits of two data frames, which may not be sent within any prefixed time interval.
Examples of asynchronous communication
• UART serial, telephone or modem communication
• RS232C communication between UART devices
• Each successive byte can have a variable time gap, but with a minimum in-between interval and no maximum limit, for a full frame of many bytes
Two characteristics of asynchronous communication
1. Bytes (or frames) need not maintain a constant phase difference and are asynchronous, i.e., not in synchronization. There is permission to send either bytes or frames at variable time intervals; this facilitates in-between handshaking between the serial transmitter port and the serial receiver port.
2. Though a clock ticking at a certain rate always has to be there to transmit the bits of a single byte (or frame) serially, it is always implicit to the asynchronous data receiver and is independent of the transmitter.
Clock Features
• The transmitter does not transmit any clock rate information along with the serial stream of bits (neither separately nor by encoding using modulation) in asynchronous communication; the receiver clock is thus not able to maintain an identical frequency and a constant phase difference with the transmitter clock.
Example: An IBM personal computer has two COM ports (communication ports)
• COM1 at IO addresses 0x3F8−0x3FF and COM2 at 0x2F8−0x2FF
• Handshaking signals − RI, DCD, DSR, DTR, RTS, CTS
• Data bits − RxD and TxD
Example: COM port and modem handshaking signals
• When a modem connects, the modem sends the data carrier detect (DCD) signal at an instant t0.
• It communicates the data set ready (DSR) signal at an instant t1 when it receives the bytes on the line.
• The receiving computer (terminal) responds at an instant t2 with the data terminal ready (DTR) signal. After DTR, the request to send (RTS) signal is sent at an instant t3.
• The receiving end responds with the clear to send (CTS) signal at an instant t4. After the CTS response, the data bits are transmitted by the modem from an instant t5 to the receiver terminal.
• Between two sets of bytes sent in asynchronous mode, the handshaking signals RTS and CTS can again be exchanged. This explains why the bytes do not remain synchronized during asynchronous transmission.
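The t0…t5 sequence above can be modeled as a simple ordered check. This is a toy illustration of the signal order only; real RS232C handshaking happens on dedicated control lines, and all names here are ours.

```c
/* Toy model of the modem handshake order: DCD, DSR, DTR, RTS, CTS must
 * be observed in sequence before data transfer begins. */
typedef enum { DCD, DSR, DTR, RTS, CTS, DATA } hs_step;

/* Returns 1 if the observed sequence of signals follows the protocol
 * order described in the text, 0 otherwise. */
int handshake_ok(const hs_step seq[], int n)
{
    static const hs_step expected[] = { DCD, DSR, DTR, RTS, CTS, DATA };
    if (n != 6) return 0;
    for (int i = 0; i < 6; i++)
        if (seq[i] != expected[i]) return 0;
    return 1;
}
```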
3. Communication Protocols
1. Protocol
A protocol is a standard adopted which tells the way in which the bits of a frame must be sent from a device (or controller or port or processor) to another device or system. [Even in personal communication we follow a protocol − we say hello, then talk, and then say goodbye!]
A protocol defines how the frame bits are:
1) sent − synchronously, iso-synchronously or asynchronously, and at what rate(s)?
2) preceded by the header bits − how is the receiving device address communicated, so that only the destined device activates and receives the bits? [Needed when several devices are addressed through a common line (bus).]
3) How is the transmitting device address defined, so that the receiving device comes to know the source when receiving data from several sources?
4) How is the frame length defined, so that the receiving device knows the frame size in advance?
5) Frame-content specifications − do the sent frame bits specify control information or data? The frame bits are sent after first being formatted according to a specified protocol, which is to be followed when communicating with another device through an IO port or channel.
Protocols
• HDLC and Frame Relay − for synchronous communication
• For asynchronous transmission from a device port − RS232C, UART, X.25, ATM, DSL and ADSL
• For networking the physical devices in telecommunication and computer networks − Ethernet and token-ring protocols used in LANs
Protocols in embedded network devices
• For bridges and routers
• Internet appliance application protocols and Web protocols − HTTP (Hypertext Transfer Protocol), HTTPS (HTTP over Secure Socket Layer), SMTP (Simple Mail Transfer Protocol), POP3 (Post Office Protocol version 3), ESMTP (Extended SMTP)
File transfer and boot protocols in embedded network devices
• TELNET (terminal network)
• FTP (File Transfer Protocol)
• DNS (Domain Name System)
• IMAP4 (Internet Message Access Protocol)
• Bootp (Bootstrap Protocol)
Wireless protocols in embedded device networks
• Embedded wireless appliances use wireless protocols − WLAN 802.11, 802.16, Bluetooth, ZigBee, WiFi, WiMax
1.8 TIMING AND COUNTING DEVICES
Timer
• A timer is a device which counts at regular intervals (δT), using the clock pulses at its input.
• The counts increment on each pulse and are stored in a register called the count register.
• Output bits (in the count register or at the output pins) give the present counts.
Evaluation of time
• The counts multiplied by the interval δT give the time.
• The interval (present counts − initial counts) × δT gives the time interval between the two instants when the present count bits are read and when the initial counts were read or set.
Timer
• Has an input pin (or a control bit in the control register) for resetting it so that all count bits = 0.
• Has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
Counter
• A device which counts the inputs due to events at irregular or regular intervals.
• The counts give the number of input events or pulses since it was last read.
• Has a register to enable reading of the present counts.
• Functions as a timer when counting regular-interval clock pulses.
• Has an input pin (or a control bit in the control register) for resetting it so that all count bits = 0.
• Has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
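The (present counts − initial counts) × δT rule above can be sketched in C for a free-running 16-bit up-counter; unsigned modulo arithmetic makes the subtraction correct even across a single counter wrap. The counter width and tick period are assumptions for illustration.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 16-bit
 * up-counter. The modulo-2^16 subtraction handles one wrap past the
 * maximum count (overflow) correctly. */
uint32_t elapsed_ticks(uint16_t initial, uint16_t present)
{
    return (uint16_t)(present - initial);   /* modulo-2^16 difference */
}

/* Elapsed time = elapsed ticks x dT, with dT given in microseconds. */
uint32_t elapsed_us(uint16_t initial, uint16_t present, uint32_t dt_us)
{
    return elapsed_ticks(initial, present) * dt_us;
}
```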
Free-running Counter (Blind-counting Counter)
• A counting device may be a free-running (blind-counting) device giving overflow interrupts at fixed intervals.
• A pre-scaler for the clock input pulses fixes the intervals.
• A free-running counter is useful for initiating an action or a chain of actions, for processor interrupts at preset instants, for noting the instants of occurrence of events, for processor interrupts that request the processor to capture the counts at an input instance, and for comparing counts on events for future actions.
[An additional input pin senses an event, saving the counts at the instant of the event and taking an action.]
Free-running (blind counts) output-compare enable pin (or a control bit in the control register)
• Enables an output when all count bits of the free-running count = the preloaded counts in the compare register.
• At that instant a status bit or output pin also sets, and a processor interrupt 'OCINT' can occur for the compare-equality event.
• Generates an alarm or processor interrupts at preset times, or after a preset interval from another event.
Free-running (blind counts) input-capture enable pin (or a control bit in the control register) for capturing the instant of an event
• A register captures the counts at an instance of an input transition (0 to 1, 1 to 0, or toggling).
• A status bit can also set, and a processor interrupt can occur, for the capture event.
Free-running (blind counts) pre-scaling
• The prescaler can be programmed as p = 1, 2, 4, 8, 16, 32, … by programming a prescaler register.
• The prescaler divides the input pulses by the programmed value of p.
• Count interval = p × δT; δT = clock pulse period; clock frequency = δT−1.
Free-running (blind counts) overflow
• It has an output pin (or a status bit in the status register) for output when all count bits = 0 after reaching the maximum value, which also means after timeout or overflow.
• A free-running n-bit counter overflows after a p × 2^n × δT interval.
Uses of a timer device
• Real-time clock ticks (system heartbeats). [A real-time clock is a clock which, once the system starts, does not stop and cannot be reset, and whose count value cannot be reloaded. Real time endlessly flows and never returns!] The real-time clock is set for ticks using pre-scaling bits (or rate-set bits) in appropriate control registers.
• Initiating an event after a preset delay time. The delay is as per the count value loaded.
• Initiating an event (or a pair or chain of events) after a comparison between the preset time(s) and the counted value(s). [This is similar to a preset alarm.] A preset time is loaded in a compare register.
• Capturing the count value of the timer on an event. The information of time (the instant of the event) is thus stored in the capture register.
• Finding the time interval between two events. Counts are captured at each event in capture register(s) and read; the intervals are thus found.
• Waiting for a message from a queue, mailbox or semaphore for a preset time when using an RTOS. A predefined waiting period elapses before the RTOS lets the task run again.
• Watchdog timer. It resets the system after a defined time.
• Baud or bit-rate control for serial communication on a line or network. Timer timeout interrupts define the time of each baud.
• Input pulse counting, when using a timer which is ticked by giving non-periodic inputs instead of clock inputs. The timer acts as a counter if, in place of clock inputs, the inputs are given to the timer for each instance to be counted.
• Scheduling of various tasks. A chain of software timers interrupt, and the RTOS uses these interrupts to schedule the tasks.
• Time slicing of various tasks. A multitasking or multi-programmed operating system presents the illusion that multiple tasks or programs are running simultaneously by switching between programs very rapidly, for example after every 16.6 ms.
• This process is known as a context switch. [The RTOS switches after a preset time delay from one running task to the next. Each task can therefore run in predefined slots of time.]
Software Timer (SWT)
• Software that executes and increments or decrements a count variable (count value) on an interrupt from a system timer output or from a real-time clock interrupt.
• The software timer also generates an interrupt on overflow of the count value, or on the count variable reaching its final value.
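A minimal sketch of the software timer described above, assuming a hardware tick interrupt calls swt_tick() on each system clock tick; the structure and names are ours, not from any particular RTOS.

```c
/* Minimal software timer: a count variable decremented on each
 * system-timer tick interrupt; when it reaches zero the timer "fires"
 * (in a real system this would raise a software interrupt or invoke a
 * handler). */
typedef struct {
    unsigned count;    /* remaining ticks                */
    int      expired;  /* set when the count reaches zero */
} sw_timer;

void swt_start(sw_timer *t, unsigned ticks)
{
    t->count = ticks;
    t->expired = 0;
}

/* To be called from the hardware timer's tick ISR. */
void swt_tick(sw_timer *t)
{
    if (!t->expired && t->count > 0 && --t->count == 0)
        t->expired = 1;   /* finish/overflow event */
}
```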
System clock
• In a system, a hardware timing device is programmed to tick at constant intervals.
• At each tick there is an interrupt.
• A chain of interrupts thus occurs at periodic intervals. The interval is as per a preset count value.
• The interrupts are called system clock interrupts when used to control the schedules and timings of the system.
• The actions are analogous to those of a hardware timer. While there is a physical limit (1, 2, 3 or 4) on the number of hardware timers in a system, the number of SWTs is limited only by the number of interrupt vectors provided.
• Certain processors (microcontrollers) also define the interrupt vector addresses of 2 or 4 SWTs.
1.9 SERIAL BUS COMMUNICATION PROTOCOLS − I2C
Consider interconnecting a number of device circuits: assume flash memory, a touch screen, ICs for measuring temperatures and ICs for measuring pressures at a number of processes in a plant.
• The ICs mutually network through a common synchronous serial bus, I2C. The 'Inter-Integrated Circuit' (I2C) bus is a popular bus for these circuits.
• Synchronous serial bus communication for networking: each specific synchronous serial I/O device may be connected to the others using specific interfaces, for example with an I/O device using an I2C controller.
• I2C bus communication simplifies the number of connections and provides a common way (protocol) of connecting different or same types of I/O devices using synchronous serial communication.
IO I2C Bus
• Any device that is compatible with an I2C bus can be added to the system (assuming an appropriate device driver program is available), and an I2C device can be integrated into any system that uses the I2C bus.
• Originally developed at Philips Semiconductors.
• Synchronous serial communication at 400 kbps up to 2 m, and 100 kbps for longer distances.
Three I2C standards:
1. Industrial 100 kbps I2C
2. 100 kbps SM I2C
3. 400 kbps I2C
I2C Bus
• The bus has two lines that carry its signals − one line is for the clock and one is for bidirectional data.
• There is a standard protocol for the I2C bus.
Device addresses and master in the I2C bus
• Each device has a 7-bit address using which the data transfers take place.
• A master can address 127 other slaves at an instance.
• The master has a processing element functioning as a bus controller, or a microcontroller with an I2C (Inter-Integrated Circuit) bus interface circuit.
Slaves and masters in the I2C bus
• Each slave can also optionally have an I2C (Inter-Integrated Circuit) bus controller and processing element.
• A number of masters can be connected on the bus.
Disadvantages of the I2C bus
• Time is taken by the algorithm in hardware that analyzes the bits through I2C, in case the slave hardware does not provide hardware support for it.
• Certain ICs support the protocol and certain do not.
• Open-collector drivers at the master need a pull-up resistance of 2.2 kΩ on each line.
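As a small illustration of the 7-bit addressing described above, the first byte a master places on the I2C bus combines the slave address with a read/write bit (a standard part of the I2C protocol); the helper name below is ours.

```c
#include <stdint.h>

/* Build the I2C address byte a master sends after the START condition:
 * the 7-bit slave address in bits 7..1 and the R/W bit in bit 0
 * (1 = read, 0 = write). */
uint8_t i2c_addr_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}
```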
1.11 SERIAL BUS COMMUNICATION PROTOCOLS − CAN
A distributed Controller Area Network (CAN) example is a network of embedded systems in an automobile.
• The CAN-bus line usually interconnects to a CAN controller between the line and the host at the node. It gives the input and gets the output between the physical and data-link layers at the host node.
• The CAN controller has a BIU (bus interface unit, consisting of buffer and driver), a protocol controller, status-cum-control registers, a receiver buffer and message objects. These units connect the host node through the host interface circuit.
Three standards:
1. 33 kbps CAN
2. 110 kbps Fault-Tolerant CAN
3. 1 Mbps High-Speed CAN
CAN protocol
There is a CAN controller between the CAN line and the host node.
• CAN controller − has a BIU (bus interface unit) consisting of a buffer and driver.
• Method for arbitration − CSMA/AMP (Carrier Sense Multiple Access with Arbitration on Message Priority basis).
• The line pulls to logic 1 through a resistor between the line and +4.5 V to +12 V.
• The line idle state is logic 1 (recessive state).
• A buffer gate is used between an input pin and the CAN line.
• Input presence is detected at the CAN line when it is pulled down to the dominant (active) state, logic 0 (ground, ~0 V), by a sender on the CAN line.
• A current driver between the output pin and the CAN line pulls the line down to the dominant (active) state, logic 0 (ground, ~0 V), when sending to the CAN line.
Protocol-defined frame bits
• A start bit is followed by six fields of frame bits. A data frame starts after first detecting that the dominant state is not present at the CAN line, with a logic 1 (R state) to 0 (D state) transition for one serial bit interval.
• After the start bit, the six fields start with the arbitration field and end with the end-of-frame field.
• A 3-bit minimum inter-frame gap occurs before the next start bit (R → D transition).
Protocol-defined first field in the frame bits
• The first field of 12 bits is the arbitration field: an 11-bit destination address and the RTR (Remote Transmission Request) bit.
• The destination device address is specified in the 11-bit sub-field; whether the byte being sent is data for the device or a request to the device is specified in the 1-bit sub-field.
• A maximum of 2^11 devices can connect to a CAN controller in the case of the 11-bit address field (standard 11-bit address CAN).
• The address identifies the device to which data is being sent or of which a request is being made.
• When the RTR bit is dominant ('0'), the packet carries data for the device at the destination address; when it is recessive ('1'), the packet is a request for data from the device.
Protocol-defined frame bits, remaining fields
• The second field of 6 bits is the control field. The first bit is for the identifier extension. The second bit is always '1'. The last 4 bits specify the data length code.
• The third field of 0 to 64 bits is the data field; its length depends on the data length code in the control field.
• The fourth field (third, if the data field has no bits present) of 16 bits holds the CRC (Cyclic Redundancy Check) bits. The receiver node uses it to detect errors, if any, during the transmission.
• The fifth field of 2 bits starts with the 'ACK slot'. The sender transmits this slot as recessive ('1'); a receiver that detects no error in the reception sends back dominant ('0') in this slot. If the sender does not sense '0' in the ACK slot, it generally retransmits the data frame. The second bit is the 'ACK delimiter'; it signals the end of the ACK field.
81
• If the transmitting node does not receive any acknowledgement of data
frame within a Specified time slot, it should retransmit.
Sixth field of 7-bits ─ end- of- the frame specification and has seven '0's
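The CSMA/AMP arbitration named above can be illustrated by simulating the wired-AND bus: a dominant '0' overwrites a recessive '1', so the node transmitting the lower 11-bit identifier wins arbitration without any data being corrupted. This is a two-node toy model, not a real CAN controller implementation.

```c
#include <stdint.h>

/* Simulate CAN bitwise arbitration between two nodes transmitting
 * their 11-bit identifiers MSB first. The bus behaves as a wired-AND:
 * a node that sends recessive '1' but reads dominant '0' loses and
 * backs off. Returns 0 if node A wins, 1 if node B wins. */
int can_arbitrate(uint16_t id_a, uint16_t id_b)
{
    for (int bit = 10; bit >= 0; bit--) {
        int a = (id_a >> bit) & 1;
        int b = (id_b >> bit) & 1;
        int line = a & b;            /* wired-AND bus level */
        if (a != line) return 1;     /* A sent '1', read '0': B wins */
        if (b != line) return 0;     /* B sent '1', read '0': A wins */
    }
    return 0;                        /* identical identifiers */
}
```

A lower identifier therefore means higher message priority, which is exactly the "arbitration on message priority" in CSMA/AMP.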