
Embedded Systems – 2011

Chapter 1
1. What is an embedded system?
An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it
controls. Unlike a general-purpose computer, such as a personal computer, an embedded system performs pre-defined
tasks, usually with very specific requirements. Since the system is dedicated to a specific task, design engineers can
optimise it, reducing the size and cost of the product. Embedded systems are often mass-produced, so the cost savings
may be multiplied by millions of items.
The first recognizably modern embedded system was the Apollo Guidance Computer, developed by Charles Stark Draper
at the MIT Instrumentation Laboratory. Each flight to the moon had two. At the project's inception, the Apollo guidance
computer was considered the riskiest item in the Apollo project. The use of the then new monolithic integrated circuits, to
reduce the size and weight, increased this risk.
The first mass-produced embedded system was the Autonetics D-17 guidance computer for the Minuteman missile,
released in 1961. It was built from discrete transistor logic and had a hard disk for main memory. When the Minuteman II
went into production in 1966, the D-17 was replaced with a new computer that was the first high-volume use of integrated
circuits. This program alone reduced prices on quad NAND gate ICs from $1000/each to $3/each, permitting their use in
commercial products.
Since these early applications in the 1960s, where cost was no object, embedded systems have come down in price. There
has also been an enormous rise in processing power and functionality. For example, the first microprocessor was the Intel
4004, which found its way into calculators and other small systems, but required external memory and support chips. By
the mid-1980s, most of the previously external system components had been integrated into the same chip as the
processor, resulting in integrated circuits called microcontrollers, and widespread use of embedded systems became
feasible.
As the cost of a microcontroller fell below $1, it became feasible to replace expensive analog components such as
potentiometers and variable capacitors with digital electronics controlled by a small microcontroller. By the end of the
1980s, embedded systems were the norm rather than the exception for almost all electronic devices, a trend which has
continued since.

1.1. Examples of embedded systems


 Automatic teller machines (ATMs)
 Avionics, such as inertial guidance systems, flight control hardware/software and other integrated systems in
aircraft and missiles
 Cellular telephones and telephone switches
 Computer equipment such as routers and printers
 Engine Management Systems and antilock brake controllers for automobiles
 Home automation products, like thermostats, air conditioners, sprinklers, and security monitoring systems
 Handheld calculators
 Household appliances, including microwave ovens, washing machines, television sets, DVD players/recorders
 Medical equipment

1.2. Characteristics
Embedded systems are designed to do some specific task, rather than be a general-purpose computer for multiple tasks.
Some also have real-time performance constraints that must be met, for reasons such as safety and usability; others may
have low or no performance requirements, allowing the system hardware to be simplified to reduce costs.
For high volume systems such as portable music players or mobile phones, minimising production cost is usually the
primary design consideration. Engineers typically select hardware that is just "good enough" to implement the necessary
functions. For example, a digital set-top box for satellite TV has to process large amounts of data every second, but most
of the processing is done by custom integrated circuits. The embedded CPU "sets up" this process, and displays menu
graphics, etc. for the unit's look and feel.

For low-volume or prototype embedded systems, prebuilt computer hardware can be used by limiting the programs or by
replacing the operating system with a real-time operating system. In such systems, minimising the design and
development cost is usually the goal.
The software written for embedded systems is often called firmware, and is stored in ROM or Flash memory chips rather
than a disk drive. It often runs with limited hardware resources: a small or absent keyboard and screen, and little RAM.
Embedded systems reside in machines that are expected to run continuously for years without errors and in some cases must recover by themselves if an error occurs. Therefore the software is usually developed and tested more carefully than that for PCs, and unreliable mechanical moving parts such as disk drives, switches or buttons are avoided. Recovery from
errors may be achieved with techniques such as a watchdog timer that resets the computer unless the software periodically
notifies the watchdog.
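As an illustration of the watchdog technique, the sketch below shows a main loop that periodically "kicks" a watchdog timer. The register address, key value, and the two work routines are hypothetical placeholders; the real details are device-specific and come from the microcontroller's datasheet.

#include <stdint.h>

/* Hypothetical watchdog reload register and key value (device-specific). */
#define WDT_RELOAD   (*(volatile uint32_t *)0x40002000u)
#define WDT_KICK_KEY 0xA5A5A5A5u

static void read_inputs(void)    { /* application work, stubbed here */ }
static void update_outputs(void) { /* application work, stubbed here */ }

int main(void)
{
    for (;;) {
        read_inputs();
        update_outputs();
        /* Notify ("kick") the watchdog; if the loop ever hangs, the timer
           expires and the hardware resets the processor. */
        WDT_RELOAD = WDT_KICK_KEY;
    }
}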

1.3. User interfaces


Embedded systems range from no user interface at all (dedicated to a single task) to full user interfaces similar to desktop
operating systems in devices such as PDAs. In between are devices with small character or digit only displays and a few
buttons. Therefore usability considerations vary widely.
Where little user interaction is required, the interface may consist of a few buttons and lights. More complex interfaces
may be implemented using larger, multi-character displays and keypads or even touch sensitive, full colour displays.
These allow for extremely complex interaction as any user of a modern 'smart' phone can see.
The rise of the World Wide Web has given embedded designers another quite different option, by providing a web page
interface over a network connection. This avoids the cost of a sophisticated display, yet provides complex input and
display capabilities when needed, on another computer.

1.4. Self-Test
Most embedded systems have some degree of built-in self-test. In safety-critical systems, these tests may be run periodically
or even continuously. There are several basic types:
1. Computer Tests: CPU, RAM, and program memory. These often run once at power-up.
2. Peripheral Tests: These simulate inputs and read-back or measure outputs.
3. Power Supply Tests: including batteries or other backup.
4. Consumables Tests: These measure what a system uses up, and warn when the quantities are low, for example a
fuel gauge in a car, or chemical levels in a medical system.
5. Safety Tests: These run within a 'safety interval', and assure that the system is still reliable. The safety interval is
usually less than the minimum time that can cause harm.
Some tests may require interaction with a technician:
1. Cabling Tests, where a loop is made to allow the unit to receive what it transmits
2. Rigging Tests: allow a system to be adjusted when it is installed
3. Operational Tests: These measure things that a user would care about to operate the system. Notably, these
have to run when the system is operating. This includes navigational instruments on aircraft, a car's speedometer,
and disk-drive lights.
After self-test passes, it is common to indicate this by some visible means like LEDs, providing simple diagnostics to
technicians and users.
CPU Platform
There are many different CPU architectures used in embedded designs such as ARM, MIPS, Coldfire/68k, PowerPC,
X86, PIC, 8051, Atmel AVR, Renesas H8, SH, V850, FR-V, M32R, etc. This is in contrast to the desktop computer
market, which is currently limited to just a few competing architectures.
PC/104 is a typical base for small, low-volume embedded and ruggedized system design. These often use DOS, Linux,
NetBSD, or an embedded real-time operating system such as QNX or Inferno.
For example, one such board is a full PC-type processor module using a 1.1 GHz Intel Atom processor with multiple RS232, LAN and USB ports and up to 2 GB of RAM, all in a 146 mm x 104 mm module.
A common configuration for very-high-volume embedded systems is the system on a
chip (SoC)(2), an application-specific integrated circuit (ASIC(3)), for which the CPU
was purchased as intellectual property to add to the IC's design. A related scheme is to use a field-programmable gate
array, and program it with all the logic, including the CPU.

1.5. Tools
As for other software, embedded system designers use compilers, assemblers, and debuggers to develop embedded system
software. However, they may also use some more specific tools:
 An in-circuit emulator(4) (ICE): a hardware device that replaces or plugs into the microprocessor and provides facilities to quickly load and debug experimental code in the system.
 Utilities to add a checksum or CRC to a program, so the embedded system can check its program is valid
 For systems using digital signal processing, developers may use a math workbench such as MATLAB or
Mathematica to simulate the mathematics.
 Custom compilers and linkers may be used to improve optimisation for the particular hardware.
 An embedded system may have its own special language or design tool, or add enhancements to an existing
language.
Software tools can come from several sources:
 Software companies that specialize in the embedded market
 Ported from the GNU software development tools
 Sometimes, development tools for a personal computer can be used if the embedded processor is a close relative
to a common PC processor

1.6. Debugging
Embedded Debugging may be performed at different levels, depending on the facilities available, ranging from assembly
or source-level debugging with an in-circuit emulator to output from serial debug ports, to an emulated environment
running on a personal computer.
As the complexity of embedded systems grows, higher level tools and operating systems are migrating into machinery
where it makes sense. For example, cellphones, personal digital assistants and other consumer computers often need
significant software that is purchased or provided by a person other than the manufacturer of the electronics. In these
systems, an open programming environment such as Linux, NetBSD, OSGi or Embedded Java is required so that the
third-party software provider can sell to a large market.
Most such open environments have a reference design that runs on a PC. Much of the software for such systems can be
developed on a conventional PC. However, the porting of the open environment to the specialized electronics, and the
development of the device drivers for the electronics are usually still the responsibility of a classic embedded software
engineer. In some cases, the engineer works for the integrated circuit manufacturer, but there is still such a person
somewhere.

1.7. Start-up
All embedded systems have start-up code. Usually it sets up the electronics, runs a self-test, and then starts the application
code. The startup process is commonly designed to be short, such as less than a tenth of a second, though this may depend
on the application.

1.8. Reliability regimes


Reliability has different definitions depending on why people want it.
1. The system is too unsafe or too inaccessible to repair. Generally, the embedded system tests its subsystems and switches redundant spares on line, or incorporates "limp modes" that provide partial function. Examples include space systems, undersea cables, navigational beacons, bore-hole systems, and automobiles. Mass-produced consumer equipment often falls into this category because, compared to the initial cost of the unit, repairs are expensive and repair technicians are far away.
2. The system must be kept running for safety reasons. Like the above, but "limp modes" are less tolerable. Often
backups are selected by an operator. Examples include aircraft navigation, reactor control systems, safety-critical
chemical factory controls, train signals, engines on single-engine aircraft.

3. The system will lose large amounts of money when shut down. (Telephone switches, factory controls, bridge and
elevator controls, funds transfer and market making, automated sales and service) These usually have a few
go/no-go tests, with on-line spares or limp-modes using alternative equipment and manual procedures.
4. The system cannot be operated when it is unsafe. Similarly, perhaps a system cannot be operated when it would
lose too much money. (Medical equipment, aircraft equipment with hot spares, such as engines, chemical factory
controls, automated stock exchanges, gaming systems) The testing can be quite exotic, but the only action is to
shut down the whole unit and indicate a failure.

2. Embedded Software Architectures
There are several different types of software architecture in common use.

2.1. Simple Control Loop


In this design, the software simply has a loop. The loop calls subroutines, each of which manages a part of the hardware
or software. A common model for this kind of design is a state machine, which identifies a set of states that the system
can be in and how it changes between them, with the goal of providing tightly defined system behaviour.
This system's strength is its simplicity, and on small pieces of software, the loop is usually very quick. It is common on
small devices with a stand-alone microcontroller dedicated to a simple task.
Weaknesses of a simple control loop are that it does not guarantee a time to respond to any particular hardware event
(although careful design may work around this), and that it can become difficult to maintain or add new features.
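A minimal sketch of this architecture is shown below, assuming an invented heater-control task; the states and the three stubbed hardware routines are illustrative only.

#include <stdbool.h>

typedef enum { ST_IDLE, ST_HEATING, ST_DONE } state_t;

static bool start_pressed(void) { return false; }  /* stub: read a button   */
static bool temp_reached(void)  { return false; }  /* stub: read a sensor   */
static void heater(bool on)     { (void)on; }      /* stub: drive an output */

int main(void)
{
    state_t state = ST_IDLE;

    for (;;) {                          /* the simple control loop */
        switch (state) {
        case ST_IDLE:
            if (start_pressed()) { heater(true);  state = ST_HEATING; }
            break;
        case ST_HEATING:
            if (temp_reached())  { heater(false); state = ST_DONE; }
            break;
        case ST_DONE:
        default:
            state = ST_IDLE;
            break;
        }
    }
}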

2.2. Non Pre-emptive Multitasking


A non-pre-emptive multitasking system is very similar to the above, except that the loop is hidden in an API. The
programmer defines a series of tasks, and each task gets its own environment to "run" in. Then, when a task is idle, it calls
an idle routine (usually called "pause", "wait", "yield", or similar).
An architecture with similar properties is to have an event queue, and have a loop that processes the events one at a time.
The advantages and disadvantages are very similar to the control loop, except that adding new software is easier, by
simply writing a new task, or adding to the queue-interpreter.
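The following sketch illustrates the event-queue variant; the event codes, queue size, and handlers are invented for illustration, and a real system would post events from interrupt handlers and loop forever rather than for a fixed count.

#include <stdio.h>

enum { EV_NONE, EV_BUTTON, EV_TIMER };

#define QUEUE_SIZE 8
static int queue[QUEUE_SIZE];
static int head, tail;

static void post_event(int ev)      /* called by ISRs or other tasks; no overflow check here */
{
    queue[tail] = ev;
    tail = (tail + 1) % QUEUE_SIZE;
}

static int get_event(void)          /* returns EV_NONE when the queue is empty */
{
    if (head == tail) return EV_NONE;
    int ev = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    return ev;
}

int main(void)
{
    post_event(EV_BUTTON);
    post_event(EV_TIMER);

    for (int i = 0; i < 4; i++) {   /* bounded here; normally for (;;) */
        switch (get_event()) {
        case EV_BUTTON: puts("handle button"); break;
        case EV_TIMER:  puts("handle timer");  break;
        default:        puts("idle");          break;  /* the "yield" point */
        }
    }
    return 0;
}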

2.3. Pre-emptive Multitasking


In this type of system, a low-level kernel switches between tasks based on a timer or events. This is the level at which the
system is generally considered to have an "operating system", and introduces all the complexities of managing multiple
tasks running seemingly at the same time.
Any piece of task code can damage the data of another task; they must be precisely separated. Access to shared data must
be controlled by some synchronization strategy, such as message queues, semaphores or a non-blocking synchronization
scheme.
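As a sketch of one such synchronization strategy, the example below uses POSIX threads and a mutex as a stand-in for an RTOS's tasks and semaphores: two "tasks" increment a shared counter, and the lock prevents them from corrupting each other's updates.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;               /* data shared by the two tasks */

static void *task(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);        /* enter critical section */
        shared_counter++;
        pthread_mutex_unlock(&lock);      /* leave critical section */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task, NULL);
    pthread_create(&t2, NULL, task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", shared_counter);   /* always 200000 with the lock */
    return 0;
}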
Because of these complexities, it is common for organisations to buy a real-time operating system, allowing the
application programmers to concentrate on device functionality rather than operating system services. Such operating
systems may be divided into the following categories.

2.3.1. Monolithic Kernels


In this case, a full kernel with sophisticated capabilities is adapted to suit an embedded environment. This gives the
programmers a full environment similar to a desktop operating system like Linux or Microsoft Windows, and is therefore
very productive for development; on the downside, it requires considerably more hardware resources, is often more
expensive, and because of the complexity of these kernels can have low predictability / reliability.
Common examples of embedded monolithic kernels are Embedded Linux and Windows CE.
Despite the increased cost in hardware, this type of embedded system is increasing in popularity, especially on the more
powerful embedded devices such as Wireless Routers and GPS Navigation Systems. Here are some of the reasons:
 Ports to common embedded chip sets are available.
 They permit re-use of publicly available code for Device Drivers, Web Servers, Firewalls, and other code.
 Development systems can start out with broad feature-sets, and then the distribution can be configured to exclude
unneeded functionality, and save the expense of the memory that it would consume.
 Many engineers believe that running application code in user mode is more reliable, easier to debug and that
therefore the development process is easier and the code more portable.
 Although such kernels lack tight real-time capabilities, a system such as Embedded Linux often has fast enough response for many applications.
 Features requiring faster response than can be guaranteed can often be placed in hardware.

2.3.2. Micro-kernels
A microkernel is an alternative realisation of a real-time OS in which the kernel handles only memory and task management, with user-mode processes left to implement major subsystems such as file systems, network interfaces, etc. In general, microkernels are of value when task switching and inter-task communication are fast.

2.4. Custom Operating Systems


Some systems require safe, timely, reliable or efficient behaviour unobtainable with the above architectures. In
this case companies will build a system to suit. However, this is increasingly unusual due to the enormous cost of building
an operating system from scratch, and it is more common to adapt or add features to one of the above approaches.

Footnotes:
System-on-a-chip
System-on-a-chip (SoC or SOC) is the idea of integrating all components of a computer or other electronic system into a single chip. It may contain
digital, analogue, mixed-signal, and often radio-frequency functions – all on one chip. A typical application is in the area of embedded systems. If it is
not feasible to construct an SoC for a particular application, an alternative is a system in package (SiP) comprising a number of chips in a single
package.
A typical SoC consists of:
 one or more microcontroller, microprocessor or DSP core(s)
 memory blocks including a selection of ROM, RAM, EEPROM and Flash
 Timing sources including oscillators and phase-locked loops
 Peripherals including counter-timers, real-time timers and power-on reset generators
 External interfaces including industry standards such as USB, FireWire, Ethernet, USART, SPI
 Analog interfaces including ADCs and DACs
 Voltage regulators and power management circuits
These blocks are connected by either a proprietary or industry-standard bus such as the AMBA bus from ARM. DMA controllers route data directly
between external interfaces and memory, by-passing the processor core and thereby increasing the data throughput of the SoC.

Application Specific Integrated Circuit (ASIC)


An application-specific integrated circuit (ASIC) is an IC customised for a particular use, rather than intended for general-purpose use. For example, a
chip designed solely to run a cell phone is an ASIC. In contrast, the 7400 series and 4000 series integrated circuits are logic building blocks that can be
wired together to perform many different applications. Intermediate between ASICs and standard products are application specific standard products
(ASSPs).

In-circuit emulator
An in-circuit emulator (ICE) also called on-circuit debugger (OCD) or background debug module (BDM) is a hardware device used to debug the
software of an embedded system. Embedded systems present special problems for a programmer, because they usually lack keyboards, screens, disk-
drives and other helpful user interfaces and storage devices that are present on business computers.
The basic idea of an "in-circuit emulator" is that it provides a window into the embedded system. The programmer uses the emulator to load programs
into the embedded system, run them, step through them slowly, and see and change the data used by the system's software.
In-circuit emulation can also refer to the use of hardware emulation, when the emulator is plugged into a system (not always embedded) in place of a
yet-to-be-built chip (not always a processor). These in-circuit emulators provide a way to run the system with "live" data while still allowing relatively
good debugging capabilities.

3. Processor Design Metrics
3.1. Introduction
Design is the task of defining a system's functionality and converting that functionality into a physical implementation, while satisfying certain constrained design metrics and optimising other design metrics.
The functionality of modern microelectronics and embedded systems is becoming more and more complex. Getting such complex functionality right is a difficult task because of the millions of possible environment scenarios that must be responded to properly. Not only is getting the functionality correct difficult, but creating an implementation that satisfies physical constraints may also be difficult due to competing, tightly constrained metrics.
The designer, while converting requirements into a workable implementation, passes through many stages each with its
own constraints. Some constraints are inflexible and known to the designer before design begins; these are the generic
rules for design of a particular kind of object. For example, a logic designer may begin with the constraint of using strictly
AND, OR, and NOT gates to build a circuit. More complex objects must all reduce to these at some point. Other prior
knowledge is more understandable and is sometimes taken for granted, such as the constraint of having to do logic design
with only two-valued logic (true or false) rather than multivalued logic or continuous (analog) logic. All of these are
guiding factors to a designer.
Other design constraints are specific to the individual product being designed. Examples of these constraints are the
particular components selected for the design, the location of power and ground paths, and the timing specifications. As
design proceeds, the number of constraints increases. A good design leaves sufficient options available in the final stages
so that corrections and improvements can be made at that point. A poor design can leave the designer "painted into a
corner" such that the final stages become more and more difficult, even impossible to accomplish.
The hardest design constraints to satisfy are those that are continuous in their trade-off and, because they compete with
others, have no optimum solutions. An example of such constraints is the simultaneous desire to keep size low, power
consumption low, and performance high. In general, these three criteria cannot all be minimised because they are at cross-
purposes to each other. Improving one often leads to worsening of another. For example, if we reduce an
implementation's size, the performance may suffer. Performance and power dissipation compete with one another. The
designer must choose a scheme that meets specified needs in some areas and is not too wasteful in the others. The rules
for this are complex, intertwined, and imprecise. It is possible to say that design is a matter of constraint optimisation that must not trade away the system's functionality (the system requirements and specifications).
This chapter deals with this type of constraint: constraints that compete with one another. The metrics are divided into groups covering performance, design economics, power dissipation, and system effectiveness.

3.2. Optimising Design Metrics


The main target of any computer or embedded-system designer is to construct an implementation that fulfils the desired functionality. At each phase of the design and implementation process, the designer has to select between many possible options: which top-down model to use, which design tools (the design technology) to use to speed up the design process, which processor technology (software, hardware, or a combination of the two), which IC technology (VLSI, PLD, FPGA, etc.), and so on. It is expected, then, that more than one design will fulfil the required functionality. The challenge is to construct an implementation that simultaneously optimises numerous design metrics.

3.3. Common Design Metrics


For our purposes, an implementation consists of a microprocessor or microcontroller with accompanying programs, special-purpose hardware, or some combination thereof. A design metric is a measurable feature of a system's implementation.
Commonly used metrics include:
 NRE cost (nonrecurring engineering cost): The one-time monetary cost of designing the system. Once the system is designed,
any number of units can be manufactured without incurring any additional design cost; hence the term nonrecurring.
 Unit cost: The monetary cost of manufacturing each copy of the system, excluding NRE cost.
 Size: The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.

 Performance: The execution time of the system or the processing power. It is usually taken to mean the time required to complete
a task (latency or response time), or as the number of tasks that can be processed per unit time (throughput). Factors that influence
throughput and latency include the clock speed, the word length, the number of general purpose registers, the instruction variety,
memory speed, programming language used, and the availability of suitable peripherals.
 Power: The amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements
of the IC, since more power means more heat.
o Heat generation is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping
them cool can be a major concern.
o Reducing power usage will be the primary objective when designing a product whose components must be crammed into a small space; such applications are very sensitive to heat problems.
o Energy conservation is becoming an increasingly significant global concern to be addressed.
 Flexibility: The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically
considered very flexible. Many digital systems are created to provide a device that may be used in a variety of applications to
achieve a reasonable solution.
 Time-to-prototype: The time needed to build a working version of the system, which may be bigger or more expensive than the
final system implementation, but can be used to verify and enhance the system's functionality.
 Time-to-market: The time required to develop a system to the point that it can be released and sold to customers. The main
contributors are design time, manufacturing time, and testing time.
 Reliability: Reliability is the probability that a machine or product can perform continuously, without failure, for a specified
interval of time when operating under standard conditions. Increased reliability implies less failure of the machinery and
consequently less downtime and loss of production.
 Availability: Availability refers to the probability that a system will be operative (up).
 Serviceability: Refers to how easily the system is repaired.
 Maintainability: The ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment.
 Range of complementary hardware: For some applications the existence of a good range of compatible ICs to support the
microcontroller/microprocessor may be important.
 Special environmental constraints: The existence of special requirements, such as military specifications or minimum physical
size and weight, may well be overriding factors for certain tasks. In such cases, the decision is often an easy one.
 Ease of use: This will affect the time required to develop, implement, test and start using the system.
 Correctness: Our confidence that we have implemented the system's functionality correctly. We can check the functionality
throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
 Safety: The probability that the system will not cause harm.
Metrics typically compete with one another: Improving one often leads to worsening of another. For example, if we
reduce an implementation's size, the performance may suffer. Some observers have compared this to a wheel with
numerous pins, as illustrated below. If you push one pin in, such as size, then the other pins pop out. To best meet this
optimisation challenge, the designer must be comfortable with a variety of hardware and software implementation
technologies, and must be able to migrate from one technology to another, in order to find the best implementation for a
given application and constraints. Thus, a designer cannot simply be a hardware expert or a software expert, as is
commonly the case today; the designer must have expertise in both areas.

Figure 1.1: Design metric competition - improving one may worsen others. (The figure shows a wheel with pins labelled Power, Performance, Size, and NRE cost: pushing one pin in makes the others pop out.)

The above design metrics can be divided into five major groups based on the design phenomena that they measure. The five
proposed groups are:
1. Performance Metrics: How fast is the system? How quickly can it execute the desired application.
2. Cost Metrics: The metrics of this group measure the product cost, the unit cost and the price of the product.
3. Power Consumption Metrics: Critical in battery operated systems.
4. System Effectiveness Metrics: In many applications, e.g. military applications, how adequate and effective the system is in achieving its purpose is more important than cost. Reliability, maintainability, serviceability, design adequacy, and flexibility are related to the metrics of this group.
5. Metrics that guide the designer in selecting one out of many off-the-shelf components that can do the required job. Ease of use, software support, motherboard support, safety, and availability of a second-source supplier are some of the metrics of this group.

3.4. Performance Design Metrics


Performance of a system is a measure of how long the system takes to execute our desired application. For example, in
terms of performance, the computer user cares about how long a digital computer takes to execute a program. For the user
the computer is faster when his program runs in less time (less execution time). For the computer centre manager, a
computer is faster when it completes more jobs in an hour. For the user as well as for the manager, the computer's clock
frequency or instructions per second are not the key issues – the architecture of one computer may result in executing the
program faster but has a lower clock frequency than another computer. We note here that, the computer user is interested
in reducing response time (or latency, or execution time). The manager of a large data processing centre may be interested
in increasing throughput.
With that said, there are several measures of performance. For simplicity, suppose we have a single task that will be
repeated over and over, such as executing a user's program or producing a car on an automobile assembly line.
The two main measures of performance are latency (or response time) and throughput.
Definitions:
Latency or response time: The time between the start of the task's execution and the end.
For example, producing one car takes 4 hours.

Throughput: The number of tasks that can be processed per unit time. For example, an
assembly line may be able to produce 6 cars per day.

However, note that throughput is not always simply the reciprocal of the latency. A system may be able to do better than this by using parallelism, either by starting the next task before finishing the current one (pipelining) or by processing several tasks concurrently. In the case of an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. Thus, our assembly line may have a latency of 4 hours but a throughput of 120 cars per day: throughput here is defined as the number of cars per day and is determined by how often a completed car exits the assembly line.
Whether we are interested in throughput or response time, the key measurement is time: The computer that performs the
same amount of work in the least time is the fastest. The difference is whether we measure one task (response time) or many tasks (throughput).
We can expect many metrics measuring throughput, depending on how the task is defined: the task can be an instruction, as with MIPS, a floating-point operation, as with MFLOPS, or any other unit of work. Besides execution-time and rate metrics, a third group of performance metrics (discussed below) is widely used. A wide variety of performance metrics has been proposed and used in the computer field to measure performance. Unfortunately, as we are going to see later, many of these metrics are often used and interpreted incorrectly.

3.4.1. Characteristics of a Good Performance Metric


There are many different metrics that have been used to describe a computer system's performance. Some of these metrics
are commonly used throughout the field, such as MIPS and MFLOPS (which are defined later in this chapter), whereas
others are introduced by manufacturers and/or designers for new situations as they are needed. Experience has shown that
not all of these metrics are 'good' in the sense that sometimes using a particular metric out of context can lead to
erroneous or misleading conclusions. Consequently, it is useful to understand the characteristics of a 'good' performance
metric. This understanding will help when deciding which of the existing performance metrics to use for a particular
situation, and when developing a new performance metric.
A performance metric that satisfies all of the following criteria is generally useful to a performance analyst in allowing
accurate and detailed comparisons of different measurements. These criteria have been developed by observing the results
of numerous performance analyses over many years. While they should not be considered absolute requirements of a
performance metric, it has been observed that using a metric that does not satisfy these requirements can often lead the
analyst to make erroneous conclusions.
1. Linearity: The metric is linear if its value is proportional to the actual performance of the machine. That is, if the
value of the metric changes by a certain ratio, the actual performance of the machine should change by the same
ratio. For example, suppose that you are upgrading your system to a system whose speed metric (i.e. execution-
rate metric) is twice the same metric on your current system. You then would expect the new system to be able to
run your application programs in half the time taken by your old system. Not all types of metrics satisfy this
proportionality requirement.
2. Reliability: A performance metric is considered to be reliable if system A always outperforms system B when
the corresponding values of the metric for both systems indicate that system A should outperform system B. For
example, suppose that we have developed a new performance metric called WPAM that we have designed to
compare the performance of computer systems when running the class of word-processing application programs.
We measure system A and find that it has a WPAM rating of 128, while system B has a WPAM rating of 97. We
then can say that WPAM is a reliable performance metric for word-processing application programs if system A
always outperforms system B when executing these types of applications. While this requirement would seem to
be so obvious as to be unnecessary to state explicitly, several commonly used performance metrics do not in fact
satisfy this requirement. The MIPS metric, for instance, is notoriously unreliable. Specifically, it is not unusual for one processor to have a higher MIPS rating than another processor while the lower rated processor actually executes a specific program in less time than the higher rated one does. Such a metric is essentially useless for summarizing performance, and we say that it is unreliable.
3. Repeatability: A performance metric is repeatable if the same value of the metric is measured each time the
same experiment is performed. Note that this also implies that a good metric is deterministic.
4. Ease of measurement: If a metric is not easy to measure, it is unlikely that anyone will actually use it.
Furthermore, the more difficult a metric is to measure directly, or to derive from other measured values, the more
likely it is that the metric will be determined incorrectly.
5. Consistency: A consistent performance metric is one for which the units of the metric and its precise definition
are the same across different systems and different configurations of the same system. If the units of a metric are
not consistent, it is impossible to use the metric to compare the performances of the different systems. While the
necessity for this characteristic would also seem obvious, it is not satisfied by many popular metrics, such as
MIPS and MFLOPS.
6. Independence: Many purchasers of computer systems decide which system to buy by comparing the values of
some commonly used performance metric. As a result, there is a great deal of pressure on manufacturers to
design their machines to optimise the value obtained for that particular metric, and to influence the composition
of the metric to their benefit. To prevent corruption of its meaning, a good metric should be independent of such
outside influences.
Many metrics have been proposed to measure performance. The following subsections describe some of the most
commonly used performance metrics and evaluate them against the above characteristics of a good performance metric.

3.4.2. Some Popular Performance Metrics


A number of popular measures have been devised in attempts to create a standard and easy-to-use measure of computer
performance. One result has been that simple metrics, valid in a limited context, have been heavily misused. All proposed
alternatives to the use of time as the performance metric have eventually led to misleading claims, distorted results, or incorrect interpretations. Before introducing the main quantitative principles of measuring computer performance (execution time), we first introduce some of the most popular performance measures: clock rate, MIPS, and FLOPS.
These three measure performance by calculating the rate of occurrence of an event.

3.4.2.1. The Clock Rate


In many advertisements for computer systems, the most prominent indication of performance is often the frequency of the
processor's central clock. The implication is that a 250 MHz system must always be faster at solving the user's problem
than a 200 MHz system. However, this performance metric completely ignores how much computation is actually accomplished in each clock cycle, ignores the complex interactions of the processor with the memory and input/output subsystems, and ignores the very real possibility that the processor may not be the performance bottleneck.
Some of the characteristics and advantages of using the clock rate metric are:
 it is very repeatable since it is a constant for a given system,
 it is easy to measure, generally being set by a crystal oscillator,

 it is consistent since the value of MHz is precisely defined across all systems
 it is independent of any sort of manufacturers' games.

The above characteristics appear as advantages to using the clock rate as a measure of performance, but in fact it is a nonlinear measure (doubling the clock rate seldom doubles the resulting performance) and an unreliable metric. As
many owners of personal computer systems can attest, buying a system with a faster clock in no way assures that their
programs will run correspondingly faster. This point will be clarified later when considering the execution time equation.
Thus, we conclude that the processor's clock rate is not a good metric of performance.

3.4.2.2. IPS (Instructions Per Second)


The IPS metric (usually MIPS for Millions of IPS) is an attempt to develop a rate metric for computer systems that allows a
direct comparison of their speeds. It is a throughput or execution-rate performance metric and is a measure of the amount
of computation performed per unit time.
MIPS = Instruction count / (Execution time x 10^6)
     = Instruction count / (CPU clocks x Cycle time x 10^6)
     = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
     = Clock rate / (CPI x 10^6)        (1.1)
Execution time in terms of MIPS:
Execution time = (Instruction count x CPI) / Clock rate
               = Instruction count / { (Clock rate / (CPI x 10^6)) x 10^6 }
               = Instruction count / (MIPS x 10^6)        (1.2)
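A small sketch of equations 1.1 and 1.2 in code form is given below; the clock rate, CPI, and instruction count are invented numbers used purely to exercise the formulas.

#include <stdio.h>

int main(void)
{
    /* Invented figures for illustration only */
    double clock_rate = 50e6;   /* 50 MHz clock                   */
    double cpi        = 2.0;    /* average cycles per instruction */
    double ic         = 5e6;    /* instructions executed          */

    double mips      = clock_rate / (cpi * 1e6);   /* equation 1.1 */
    double exec_time = ic / (mips * 1e6);          /* equation 1.2 */

    printf("MIPS = %.1f, execution time = %.2f s\n", mips, exec_time);
    /* prints: MIPS = 25.0, execution time = 0.20 s */
    return 0;
}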
a) Problems that may arise from using MIPS as a measure of performance
Taking the instruction as the unit of counting makes MIPS easy to measure, repeatable, and independent, however MIPS
suffers from many problems that make it a poor measure of performance. The problem with MIPS as a performance
metric is that different processors can do substantially different amounts of computation with a single instruction. For
instance, one processor may have a branch instruction that branches after checking the state of a specified condition code
bit. Another processor, on the other hand, may have a branch instruction that first decrements a specified count register,
and then branches after comparing the resulting value in the register with zero. In the first case, a single instruction does
one simple operation, whereas in the second case, one instruction actually performs two operations. The failing of the
MIPS metric is that each instruction is counted as one unit even though in this example the second instruction actually
performs more real computation. These differences in the amount of computation performed per instruction are at the
heart of the differences between RISC and CISC processors and render MIPS essentially useless as a performance metric.
As a matter of fact, MIPS can sometimes vary inversely with performance. A very clear example is when we calculate MIPS for two machines with the same instruction set, where one has special hardware to execute floating-point operations and the other uses software routines for them. The first machine (with the optional hardware) takes more clock cycles per floating-point instruction than per integer instruction. Floating-point programs using the optional hardware instead of software floating-point routines therefore have a lower MIPS rating (fewer instructions, each taking more cycles) while actually taking less time. Software floating point executes simpler instructions, resulting in a higher MIPS rating, but it executes so many of them compared with hardwired execution that the overall execution time is longer. Example 1.4 shows this drawback of MIPS.

3.4.2.3. FLOPS (Floating-Point Operations per Second)


Most performance measures look at the processor's ability to process integer data: whole numbers, text and the like. This is perfectly valid, because most processing is done on this sort of data. However, floating-point-intensive applications, such as spreadsheets and some graphics programs and games, make heavy use of the processor's floating-point unit. In fact, processors that have similar performance on integer tasks can have very different performance on floating-point operations. FLOPS is a measure of performance (speed) that takes this type of processing into consideration by defining one FLOP as the execution of a single floating-point operation.
MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:
MFLOPS = Number of floating-point operations / (Execution time x 10^6)        (1.3)
A floating-point operation is an addition, subtraction, multiplication, or division operation applied to numbers represented
by a single or double precision floating-point representation. Such data items are heavily used in scientific calculations
and are specified in programming languages using key words like float, real, double, or double precision.

MFLOPS performance depends heavily on the program. Different programs require the execution of different proportions
of floating point operations. Since MFLOPS was intended to measure floating-point performance, it is not applicable
outside that range. Compilers for example have a MFLOPS rating near 0 no matter how fast the machine is, because
compilers rarely use floating-point arithmetic.
Because it is based on operations in the program rather than on instructions, MFLOPS has a stronger claim than MIPS to being a fair comparison between different machines. The reason is that the same program running on different computers may execute a different number of instructions but will always execute the same number of floating-point operations.
MFLOPS is not dependable. It depends on the type of floating-point operations present in the program (availability of
floating point instructions). For example some computers have no sine instruction while others do. In the first group of
computers, the calculation of a sine function needs to call the sine routine, which would require performing several
floating-point operations, while in the second group this would require only one operation. Another potential problem is
that the MFLOPS rating changes according not only to mixture of integer and floating-point operations but to the mixture
of fast and slow floating-point operations. For example, a program with 100% floating-point adds will have a higher
rating than a program with 100% floating-point divides. The solution to both these problems is to define a method of
counting the number of floating-point operations in a high-level language program. This counting process can also weight
the operations, giving more complex operations larger weights, allowing a machine to achieve a higher MFLOPS rating
even if the program contains many floating-point divides. This type of MFLOPS is called normalized or weighted
MFLOPS.
In essence, both MIPS and MFLOPS are quite misleading metrics and, although often used, cannot give much information about the real performance of the system.

3.4.2.4. SPEC (System Performance Evaluation Cooperative)


To standardize the definition of the actual result produced by a computer system in 'typical' usage, several computer
manufacturers banded together to form the System Performance Evaluation Cooperative (SPEC). This group identified a
set of integer and floating-point benchmark programs that were intended to reflect the way most workstation-class
computer systems were actually used. Additionally, and, perhaps, most importantly, they also standardized the
methodology for measuring and reporting the performance obtained when executing these programs.
The methodology defined consists of the following key steps.
1. Measure the time required to execute each program in the set on the system being tested.
2. Divide the time measured for each program in the first step by the time required to execute each program on a
standard basis machine to normalize the execution times.
3. Average together all of these normalized values using the geometric mean (see Section 3.3.4) to produce a single-
number performance metric.
While the SPEC methodology is certainly more rigorous than is using MIPS or MFLOPS as a measure of performance, it
still produces a problematic performance metric. One shortcoming is that averaging together the individual normalized
results with the geometric mean produces a metric that is not linearly related to a program's actual execution time. Thus,
the SPEC metric is not intuitive. Furthermore, and more importantly, it has been shown to be an unreliable metric in that
it is possible that some given program may execute faster on a system that has a lower SPEC rating than it does on a
competing system with a higher rating.
Finally, although the defined methodology appears to make the metric independent of outside influences, it is actually
subject to a wide range of tinkering. For example, many compiler developers have used these benchmarks as practice
programs, thereby tuning their optimisations to the characteristics of this collection of applications. As a result, the
execution times of the collection of programs in the SPEC suite can be quite sensitive to the particular selection of
optimisation flags chosen when the program is compiled. Also, the selection of specific programs that comprise the SPEC
suite is determined by a committee of representatives from the manufacturers within the cooperative. This committee is
subject to numerous outside pressures since each manufacturer has a strong interest in advocating application programs
that will perform well on their machines. Thus, while SPEC is a significant step in the right direction towards defining a
good performance metric, it is still far from ideal.

3.4.3. The Processor Performance Equation


The performance of any processor is measured, as mentioned above, by the time required to implement a given task, in
our case a given program. The CPU time for a program can be expressed as follows:
CPU time = CPU clock cycles for a program * Clock cycle time
For any processor the clock cycle time (or clock rate) is known, and in addition we can count the number of instructions executed – the instruction count (IC). If we know the number of clock cycles and the instruction count, we can calculate the average number of clock cycles per instruction (CPI):
CPI = (CPU clock cycles for a program)/ IC
This allows us to express CPU time using the following equation:
CPU time = IC x CPI x Clock cycle time (1.5)
which can be written as:
CPU time = (Instruction count x CPI) / Clock rate        (1.6)

In the general case, executing the program means the use of different instruction types each of which has its own
frequency of occurrence and its own CPI.

3.4.3.1. Instruction Frequency & CPI


Given a program with n types or classes of instructions with the following characteristics:
Ci = Count of instructions of type i
CPIi = Average cycles per instruction of type i
Fi = Frequency of instruction type i
= Ci / total instruction count
Then for a given program, the total CPU clock cycles can be expressed as:
CPU clock cycles = Σ (i = 1 to n) ICi x CPIi        (1.7)
where ICi represents the number of times instruction class i is executed in the program and CPIi represents the average number of clock cycles for instruction class i. Equation 1.7 can be used to express CPU time as:
CPU time = [ Σ (i = 1 to n) ICi x CPIi ] x Clock cycle time        (1.8)
EXAMPLE 1.1: A RISC Machine has the following instruction types and frequencies of use.

Operation    Freq, Fi    CPIi    CPIi x Fi    % Time
ALU          50%         1       0.5          23%
Load         20%         5       1.0          45%
Store        10%         3       0.3          14%
Branch       20%         2       0.4          18%

Calculate the value of CPI


SOLUTION: In this case, we use the following equation to calculate the weighted average of CPI
CPI = Σ (i = 1 to n) CPIi x Fi        (1.9)

CPI = 0.5 x 1 + 0.2 x 5 + 0.1 x 3 + 0.2 x 2 = 2.2

EXAMPLE 1.2: CPU Execution Time


A Program runs on a specific machine with the following parameters:
– Total instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
What is the execution time for this program?

SOLUTION:

CPU time = (Seconds/ Program) = (Instructions/ Program) x (Cycles/ Instruction) x (Seconds/Cycle)


CPU time = Instruction count x CPI x Clock cycle time
         = 10,000,000 x 2.5 x (1 / clock rate)
         = 10,000,000 x 2.5 x 5 x 10^-9
         = 0.125 seconds
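The short program below reproduces both worked examples: it computes the weighted CPI of Example 1.1 using equation 1.9 and the CPU time of Example 1.2 using equation 1.5.

#include <stdio.h>

int main(void)
{
    /* Example 1.1: instruction mix of the RISC machine */
    double freq[]  = { 0.50, 0.20, 0.10, 0.20 };  /* ALU, Load, Store, Branch */
    double cpi_i[] = { 1, 5, 3, 2 };
    double cpi = 0.0;
    for (int i = 0; i < 4; i++)
        cpi += freq[i] * cpi_i[i];                /* equation 1.9 */
    printf("weighted CPI = %.1f\n", cpi);         /* prints 2.2   */

    /* Example 1.2: 10,000,000 instructions, CPI 2.5, 200 MHz clock */
    double ic = 10e6, prog_cpi = 2.5, clock_rate = 200e6;
    double cpu_time = ic * prog_cpi / clock_rate; /* equation 1.5 */
    printf("CPU time = %.3f s\n", cpu_time);      /* prints 0.125 s */
    return 0;
}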

EXAMPLE 1.3: Compiler Variations, MIPS, Performance:


• For the machine with instruction classes:

Instruction class CPI


A 1
B 2
C 3
• For a given program two compilers produced the following instruction counts:
Instruction counts (in millions) for each instruction class

Code from: A B C
Compiler 1 5 1 1
Compiler 2 10 1 1
• The machine is assumed to run at a clock rate of 100 MHz

MIPS = Clock rate / (CPI x 10^6) = 100 MHz / (CPI x 10^6)


CPI = CPU execution cycles / Instruction count

CPU time = Instruction count x CPI / Clock rate


• For compiler 1:
– CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
– MIPS1 = 100 / 1.43 = 70.0
– CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
• For compiler 2:
– CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS2 = 100 / 1.25 = 80.0
– CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds
This example clearly indicates the discrepancy between different metrics: the code from the first compiler has a lower MIPS rating, yet its execution time is also lower (i.e. it is actually faster).

EXAMPLE 1.4: Effect of Compiler Variations on MIPS Rating and Performance:


Assume we optimised the compiler of the load-store machine for which the measurements given in the table have been
made. The compiler discards 50% of the arithmetic logic unit (ALU) instructions, although it cannot reduce loads, stores, or
branches. Ignoring systems issues and assuming a 500 MHz clock rate and 1.57 unoptimised CPI, what is the MIPS rating
for optimised code versus unoptimised code? Does the rating of MIPS agree with the ranking of execution time?

Instruction type Frequency Clock Cycle Count

ALU 43% 1

Loads 21% 2

Stores 12% 2

Branches 24% 2

SOLUTION: Given the value CPIunoptimised = 1.57, then from equation (1.1) we get:
MIPSunoptimised = 500 MHz / (1.57 x 10^6) = 318.5
Using equation 1.5 to calculate the performance of the unoptimised code, we get:
CPU timeunoptimised = ICunoptimised x 1.57 x (2 x 10^-9)
                    = 3.14 x 10^-9 x ICunoptimised
For the optimised case:
Since the optimised compiler will discard 50% of the ALU operations:
ICoptimised = (1 – 0.43/2) ICunoptimised = (1 – 0.215) ICunoptimised = 0.785 ICunoptimised
The frequencies given in the table above must now be expressed as fractions of ICoptimised. The table then takes the form:

Instruction type Frequency Clock Cycle Count

ALU 27.4% 1

Loads 26.8% 2

Stores 15.3% 2

Branches 30.5% 2

From this table we get:


CPIoptimised = 0.274 x 1 + 0.268 x 2 + 0.153 x 2 + 0.305 x 2 = 1.73
Using equation 1.1, we get:
MIPSoptimised = 500 MHz / (1.73 x 10^6) = 289.0
The performance of the optimised compiler is:
CPU timeoptimised = ICoptimised x 1.73 x (2 x 10^-9)
                  = 0.785 ICunoptimised x 1.73 x 2 x 10^-9
                  = 2.72 x 10^-9 x ICunoptimised
The optimised code is 3.14/2.72 = 1.15 times faster, but its MIPS rating is lower: 289 versus 318.

3.4.4. Speedup Ratio


In embedded systems, there are usually many design alternatives. In such cases it is important to compare the performance of these alternatives to select the best. Speedup is a common method of comparing the
performance of two systems. The speedup of system A over system B is determined simply as:
speedup of A over B = performance of A / performance of B.
Performance could be measured either as latency or as throughput, depending on what is of interest.

Another technique for comparing performance is to express the performance of a system as a percent change relative to
the performance of another system. Such a measure is called relative change. If, for example, the throughput of system A
is R1, and that of system B is R2, the relative change of system B with respect to A, denoted Δ2,1 (that is, using system A as the base), is then defined to be:
Relative change of system B w.r.t. system A:   Δ2,1 = (R2 – R1) / R1
Typically, the value of Δ2,1 is multiplied by 100 to express the relative change as a percentage with respect to a given
basis system. This definition of relative change will produce a positive value if system B is faster than system A, whereas
a negative value indicates that the basis system is faster.
EXAMPLE 1.5:
As an example of how to apply these two normalization techniques, the speedup and relative change of the systems
shown in Table-1.1 are found using system 1 as the basis. From the raw execution times, we can see that system 4 is the
fastest, followed by systems 2, 1, and 3, respectively. However, the speedup values give us a more precise indication of
exactly how much faster one system is than the other. For instance, system 2 has a speedup of 1.33 compared with system
1 or, equivalently, it is 33% faster. System 4 has a speedup ratio of 2.29 compared with system 1 (or it is 129% faster).
We also see that system 3 is actually 11% slower than system 1, giving it a slowdown factor of 0.89.
Normally, we use the speedup ratio and the relative change to compare the overall performance of two systems. In many
cases it is required to measure how much the overall performance of any system can be improved due to changes in only a
single component of the system. Amdahl's law can be used, in such cases, to assess the impact of improving a certain feature
on the performance of the system.
Table-1.1: An example of calculating speedup and relative change using system 1 as the basis.

System X    Execution time Tx (s)    Speedup Sx,1    Relative change Δx,1 (%)
1           480                      1               0
2           360                      1.33            +33
3           540                      0.89            -11
4           210                      2.29            +129
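The speedup and relative-change columns of Table-1.1 follow directly from the execution times. A minimal C sketch of the two normalization techniques, using the execution times of Table-1.1 and system 1 as the basis:

/* Sketch: speedup and relative change of Table-1.1, using system 1 as basis. */
#include <stdio.h>

int main(void)
{
    double exec_time[4] = { 480.0, 360.0, 540.0, 210.0 };  /* seconds  */
    double base = exec_time[0];                            /* system 1 */

    for (int x = 0; x < 4; x++) {
        double speedup = base / exec_time[x];              /* Sx,1              */
        double rel_chg = (speedup - 1.0) * 100.0;          /* Δx,1 in percent   */
        printf("system %d: speedup %.2f  relative change %+.0f%%\n",
               x + 1, speedup, rel_chg);
    }
    return 0;
}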

3.4.5. Rates versus Execution Time metrics


One of the most important characteristics of a performance metric is reliability. A problem with many of the
metrics discussed above, and one that makes them unreliable, is that they measure what was done whether or not it was useful.
A reliable performance metric, in contrast, accurately and consistently measures progress towards a
goal. Metrics that measure what was done, useful or not, are called means-based metrics, whereas ends-based
metrics measure what is actually accomplished.
To obtain a feel for the difference between these two types of metrics, consider the vector dot-product routine shown in
Figure 1.2. This program executes N floating-point additions and N multiplications, for a total of 2N floating-point
operations. If one addition requires t+ cycles and one multiplication requires t* cycles, the total time
required to execute this program is t1 = N(t+ + t*) cycles. The resulting execution rate is then
R1 = 2N / [N(t+ + t*)] = 2 / (t+ + t*)  FLOPS/cycle

s = 0;
for (i = 0; i < N; i++)
    s = s + x[ i ] * y[ i ];

Figure 1.2: A vector dot-product example program.
Since there is no need to perform the addition or multiplication operations for elements whose value is zero, it may be
possible to reduce the total execution time if many elements of the two vectors are zero. Figure 1.3 shows the example
from Figure 1.2 modified to perform the floating-point operations only for those nonzero elements. If the conditional if
statement requires tif cycles to execute, the total time required to execute this program is

t2 = N [tif + f (t+ + t*)] cycles,

where f is the fraction of N for which both x [i] and y [i] are nonzero. Since the total number of additions and
multiplications executed in this case is 2Nf, the execution rate for this program is

R2 = 2Nf / (N[tif + f(t+ + t*)]) = 2f / (tif + f(t+ + t*))  FLOPS/cycle
s = 0;
for (i = 0; i < N; i++)
    if (x[ i ] != 0 && y[ i ] != 0)
        s = s + x[ i ] * y[ i ];

Figure 1.3: The vector dot-product example program of Figure 1.2, modified to operate only on nonzero
elements.
If tif is four cycles, t+ is five cycles, t* is ten cycles, f is 10%, and the processor's clock rate is 250 MHz (i.e. one cycle is 4
ns), then:
t1 = 60N ns and
t2 = N [4 + 0.1(5 + 10)] * 4 ns = 22N ns.
The speedup of program 2 relative to program 1 then is found to be:
S2,1= 60N/22N = 2.73.

Calculating the execution rates realized by each program with these assumptions produces
R1 = 2/(60 ns) = 33 MFLOPS and
R2 = 2(0.1)/(22 ns) = 9.09 MFLOPS.
Thus, even though we have reduced the total execution time from t1 = 60N ns to t2 = 22N ns, the means-based metric
(MFLOPS) shows that program 2 is 72% slower than program 1. The ends-based metric (execution time), however,
shows that program 2 is actually 173% faster than program 1.
We reach completely different conclusions when using these two different types of metrics because the means-based
metric unfairly gives program 1 credit for all of the useless operations of multiplying and adding zero. This example
highlights the danger of using the wrong metric to reach a conclusion about computer-system performance.
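The numbers in this example are easy to reproduce. The following C sketch uses the cycle counts assumed above (tif = 4, t+ = 5, t* = 10, f = 10%, 250 MHz clock) and prints both the ends-based and the means-based results:

/* Sketch: means-based (MFLOPS) vs ends-based (execution time) comparison
 * for the two dot-product versions. */
#include <stdio.h>

int main(void)
{
    const double cycle_ns = 4.0;     /* 250 MHz clock                       */
    const double t_add = 5.0, t_mul = 10.0, t_if = 4.0;
    const double f = 0.10;           /* fraction of non-zero element pairs  */

    double t1 = (t_add + t_mul) * cycle_ns;              /* ns per element  */
    double t2 = (t_if + f * (t_add + t_mul)) * cycle_ns; /* ns per element  */

    double r1 = 2.0     / t1 * 1e3;  /* MFLOPS: 2 useful ops per element    */
    double r2 = 2.0 * f / t2 * 1e3;  /* MFLOPS: 2f useful ops per element   */

    printf("t1 = %.0fN ns, t2 = %.0fN ns, speedup = %.2f\n", t1, t2, t1 / t2);
    printf("R1 = %.1f MFLOPS, R2 = %.2f MFLOPS\n", r1, r2);
    return 0;
}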

3.4.6. Processor versus System Performance


It is important to remember that the processor is not the only component in the system that determines overall system
performance. It is an important one, but it isn't the only one. Many hardware companies like to overstate the value of the
processor's performance (sometimes focusing just on the clock speed). For example, you'll see claims that a system
running with a Pentium 150 is "50% faster" than one with a Pentium 100. These claims are, in almost every case, totally
untrue. In fact, they are usually nowhere close to being true.
The reason is that speeding up the processor only improves system performance for those aspects of system use that
depend on the processor. In most systems the processor is already fast enough; it is other parts of the system (the
memory, system buses, hard disk and video card) that are the bottlenecks to system performance. Since most processors
are already much faster than the devices that support them, they spend a great deal of time waiting for data that
they can use. If this is the case, replacing the current processor with a still faster one will not yield a large performance
increase, because the faster processor will just spend more time waiting.
One of the most important factors that influences overall system performance is memory bus speed. Since modern
processors are so fast, they end up waiting a great deal for data from the memory bus.
3.4.7. Amdahl’s Law
The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's
Law. Amdahl's Law states:

“The performance improvement to be gained from using some faster mode of execution is
limited by the fraction of the time the faster mode can be used”.

In particular, consider the execution time lines shown in Fig.1.4. The top line shows the time (Told) required to execute
a given program on the system before any changes are made. Now assume that some change which reduces the
execution time of some particular feature of the processor by a factor of q is made to the system. The program now runs in
time Tnew, where Tnew < Told, as shown in the bottom line.

[Figure: two horizontal time lines, Told for the original system and the shorter Tnew for the enhanced system.]
Figure 1.4: Execution times of different modes.


Since the change to the system improves the performance of only some operations in the program, many other
operations in the program are unaffected by the change. Let α be the fraction of all operations that are unaffected
by the enhancement. Amdahl's Law defines the speedup that can be gained by the improvement as:

Speedup = Performance for entire task using the enhancement when possible
          / Performance for entire task without using the enhancement
Alternatively,

Speedup = Execution time for entire task without using the enhancement (Told)
          / Execution time for entire task using the enhancement when possible (Tnew)
Speedup tells us how much faster a task will run using the machine with the enhancement as opposed to the original
machine.
The speedup from some enhancement depends on two factors:
1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement.
   For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the
   fraction is 20/60. This value, which we call Fraction_enhanced, is always less than or equal to 1. In Fig.1.4,
   Fraction_enhanced = (1 - α).
2. The improvement gained by the enhanced execution mode, that is, how much faster the task would run if the enhanced
   mode were used for the entire program. This value is the time of the original mode divided by the time of the enhanced mode: if
   the enhanced mode takes 2 seconds for some portion of the program that can completely use the mode, while the original
   mode took 5 seconds for the same portion, the improvement is 5/2. We call this value, which is always greater than 1,
   Speedup_enhanced. In Fig.1.4, Speedup_enhanced = q.
The execution time using the original machine with the enhanced mode is the time spent using the unenhanced
portion of the machine plus the time spent using the enhancement:

Execution time_new = Execution time_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
The overall speedup is the ratio of the execution times:

Speedup_overall = Execution time_old / Execution time_new
                = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Or:

Speedup_overall = 1 / [α + (1 - α)/q] = 1 / [1/q + α(1 - 1/q)]        (1.21)
EXAMPLE 1.19:
Suppose that we are considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of
the time. What is the overall speedup gained by incorporating the enhancement?
SOLUTION:
Fraction_enhanced = (1 - α) = 0.4, so α = 0.6
Speedup_enhanced = q = 10
Speedup_overall = 1 / (0.6 + 0.4/10) = 1/0.64 = 1.56
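Equation (1.21) is straightforward to code. The sketch below (illustrative only) wraps it in a small helper and applies it to Example 1.19; it also shows the limiting behaviour discussed later in this section, where the speedup is capped by the unaffected fraction:

/* Sketch: Amdahl's Law (equation 1.21) applied to Example 1.19. */
#include <stdio.h>

/* fraction_enhanced = (1 - alpha), speedup_enhanced = q */
static double amdahl(double fraction_enhanced, double speedup_enhanced)
{
    return 1.0 / ((1.0 - fraction_enhanced) +
                  fraction_enhanced / speedup_enhanced);
}

int main(void)
{
    printf("overall speedup = %.2f\n", amdahl(0.4, 10.0));  /* 1.56 */
    /* limiting case: even with q -> infinity the speedup is capped at
     * 1/alpha = 1/0.6 = 1.67 for this workload                        */
    printf("upper bound     = %.2f\n", amdahl(0.4, 1e12));
    return 0;
}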
Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources
to improve cost/performance. The goal, clearly, is to spend resources in proportion to where time is spent. We can also use
Amdahl's Law to compare two design alternatives, as the following examples show.
EXAMPLE 1.20:
Implementations of floating-point (FP) square root vary significantly in performance. Suppose FP square root (FPSQR) is
responsible for 20% of the execution time of a critical benchmark on a machine. One proposal is to add FPSQR hardware that
will speed up this operation by a factor of 10. The alternative is simply to try to make all FP instructions run faster; FP
instructions are responsible for a total of 50% of the execution time. The design team believes that it can make all FP
instructions run two times faster with the same effort as required for the fast square root. Compare these two design
alternatives.
SOLUTION:
We can compare the two alternatives by comparing their speedups:

Speedup_FPSQR = 1 / [(1 - 0.2) + 0.2/10] = 1/0.82 = 1.22

Speedup_FP = 1 / [(1 - 0.5) + 0.5/2.0] = 1/0.75 = 1.33

From the results it is clear that improving the performance of the FP operations overall is better, because of their higher frequency.
EXAMPLE 1.21:
For the RISC machine with the instruction mix given earlier:

Operation    Freq    Cycles    CPI(i)    % Time
ALU          50%     1         0.5       23%
Load         20%     5         1.0       45%
Store        10%     3         0.3       14%
Branch       20%     2         0.4       18%

If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance
improvement from this enhancement?
SOLUTION:
Fraction_enhanced = (1 - α) = F = 45% = 0.45
Unaffected fraction = α = 100% - 45% = 55% = 0.55
Factor of enhancement = 5/2 = 2.5

Using Amdahl's Law we get:

Speedup = 1 / [0.55 + (0.45/2.5)] = 1.37

An Alternative Solution Using the CPU Time Equation

It is possible to solve the above example using the CPU time equation (1.1):
Speedup ratio = [Original Execution Time] / [New Execution Time]
              = [Instruction count x old CPI x clock cycle] / [Instruction count x new CPI x clock cycle]
              = Old CPI / New CPI
From the given table: Old CPI = 0.5 + 1.0 + 0.3 + 0.4 = 2.2
New CPI = 0.5 x 1 + 0.2 x 2 + 0.1 x 3 + 0.2 x 2 = 1.6
Accordingly:
Speedup ratio = 2.2/1.6 = 1.37
which is the same speedup obtained from Amdahl's Law in the first solution.
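The CPU-equation route can be expressed directly as a few lines of C (a sketch using the instruction mix of the table above). Note that it yields 2.2/1.6 = 1.375, which matches the 1.37 obtained from Amdahl's Law to within the rounding of the 45% figure:

/* Sketch: the CPU-equation solution of Example 1.21 -- speedup from the
 * change in average CPI when the load CPI drops from 5 to 2. */
#include <stdio.h>

int main(void)
{
    double freq[4]       = { 0.50, 0.20, 0.10, 0.20 }; /* ALU, load, store, branch */
    double cycles_old[4] = { 1, 5, 3, 2 };
    double cycles_new[4] = { 1, 2, 3, 2 };             /* load CPI improved to 2   */

    double cpi_old = 0, cpi_new = 0;
    for (int i = 0; i < 4; i++) {
        cpi_old += freq[i] * cycles_old[i];
        cpi_new += freq[i] * cycles_new[i];
    }
    /* IC and clock cycle cancel, so speedup = CPI_old / CPI_new */
    printf("CPI old = %.1f, CPI new = %.1f, speedup = %.3f (agrees with ~1.37)\n",
           cpi_old, cpi_new, cpi_old / cpi_new);
    return 0;
}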

EXAMPLE 1.24:
A Program is running on a specific machine with the following parameters:
– Total instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
Using the same program with these changes:
– A new compiler used: New instruction count 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHZ
What is the speedup with the changes?

SOLUTION:
Speedup = Old Execution Time / New Execution Time
        = (IC_old x CPI_old x Clock cycle_old) / (IC_new x CPI_new x Clock cycle_new)
        = (10,000,000 x 2.5 x 5 x 10^-9) / (9,500,000 x 3 x 3.33 x 10^-9)
        = 0.125 / 0.095 = 1.32
i.e. the program runs 32% faster after the changes.

Equation (1.21), as mentioned before, can be used to calculate the overall speedup obtained from some improvement to the system.
It can also be used to tell us what happens as the impact of an improvement becomes very large, that is, as q → ∞.
It is easy to show that, in the limit as q → ∞, (1 - α)/q → 0. Thus:

lim (q → ∞) Speedup = lim (q → ∞) 1 / [1/q + α(1 - 1/q)] = 1/α        (1.22)
This result says that, no matter how much one type of operation in a system is improved, the overall performance is inherently limited
by the operations that still must be performed but are unaffected by the improvement. For example, the best (ideal) speedup that could
be obtained in a parallel computing system with p processors is p. However, if 10% of a program cannot be executed in parallel, the
overall speedup when using the parallel machine is at most 1/α = 1/0.1 = 10, even if an infinite number of processors were available.
The constraint that 10% of the total program must be executed sequentially limits the overall performance improvement that can be
obtained.

3.4.7.1. Extending Amdahl's Law to Multiple Enhancements


Suppose that enhancement Ei accelerates a fraction Fi of the execution time by a factor Si, and the remainder of the time is
unaffected. Then:

Speedup = Original Execution Time / {[(1 - Σi Fi) + Σi (Fi / Si)] x Original Execution Time}

Speedup = 1 / [(1 - Σi Fi) + Σi (Fi / Si)]        (1.23)

Note: all fractions refer to the original execution time.

EXAMPLE 1.25:
Three CPU performance enhancements are proposed, with the following speedups and percentages of the
code execution time affected:
Speedup1 = S1 = 10    Percentage1 = F1 = 20%
Speedup2 = S2 = 15    Percentage2 = F2 = 15%
Speedup3 = S3 = 30    Percentage3 = F3 = 10%
While all three enhancements are in place in the new design, each enhancement affects a different
portion of the code and only one enhancement can be used at a time.
What is the resulting overall speedup?
SOLUTION:
Using equation (1.23):
Speedup = 1 / [(1 - ΣFi) + Σ(Fi/Si)]
        = 1 / [(1 - 0.2 - 0.15 - 0.1) + 0.2/10 + 0.15/15 + 0.1/30]
        = 1 / [0.55 + 0.0333]
        = 1 / 0.5833 = 1.71
Pictorial Depiction of the example is shown in Fig.1.5

[Figure: before the enhancements the normalized execution time of 1 is split into an unaffected fraction of 0.55 and the
fractions F1 = 0.2, F2 = 0.15 and F3 = 0.1; after the enhancements the unaffected 0.55 is unchanged while the three
fractions shrink to F1/10, F2/15 and F3/30.]
Figure 1.5: Pictorial depiction of Example 1.25.
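Equation (1.23) generalises easily to any number of independent enhancements. A small C sketch applying it to the three enhancements of Example 1.25:

/* Sketch: Amdahl's Law extended to several independent enhancements
 * (equation 1.23), applied to Example 1.25. */
#include <stdio.h>

static double amdahl_multi(const double F[], const double S[], int n)
{
    double covered = 0.0, scaled = 0.0;
    for (int i = 0; i < n; i++) {
        covered += F[i];          /* total fraction touched by an enhancement */
        scaled  += F[i] / S[i];   /* that fraction after its speedup          */
    }
    return 1.0 / ((1.0 - covered) + scaled);
}

int main(void)
{
    double F[] = { 0.20, 0.15, 0.10 };
    double S[] = { 10.0, 15.0, 30.0 };
    printf("overall speedup = %.2f\n", amdahl_multi(F, S, 3));  /* 1.71 */
    return 0;
}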

3.4.8. Concluding Remarks


Execution time is the only valid and unimpeachable measure of performance. Many other metrics have been proposed and
found deficient, either because they do not reflect execution time or because they are valid only in a limited context; when
such a metric is used beyond its context, it can fail to give a true and correct picture. Fig.1.7 summarises the performance
metrics discussed in this section together with the purpose (the context) of each.

[Figure: a stack of design abstraction levels (application, programming language, ISA, datapath and control, function
units, transistor) paired with the metric typically used at each level: application execution time on a target workload
(e.g. SPEC95), millions of instructions per second (MIPS), millions of floating-point operations per second (MFLOP/s),
megabytes per second, and cycles per second (clock rate). Each metric has a purpose, and each can be misused.]

Figure 1.7: Metrics of Computer Performance.

Although we have focused on performance and how to evaluate it in this section, designing only for performance without
considering other factors such as cost, power, etc., is unrealistic. In the field of computer design, for example, the designers
must balance performance and cost; in mobile computing, power is the priority; in military applications, reliability,
design adequacy and system effectiveness are more important than cost; and so on.

3.5. Time-to-Market Metric


3.5.1. Time-to-Market and Product Lifetime
Product life cycles have been shrinking progressively as technology advances, and this has placed huge pressure on the
design phase to go from concept to product in ever shorter times. The globalisation of the technology market compensates
to some extent by increasing the market volume, so that total revenues may be as good as or better than in past product cycles,
but the compressed time frame does present several challenges. One of these is the short time to market and the
problem of missing the market window if product development is delayed.
Missing this short market window, which means that the product begins being sold further to the right on the time scale,
can mean significant loss in sales. In some cases, each day that a product is delayed from introduction to market can
translate to a one-million-dollar loss. One way for the industry to respond to these trends is to move faster towards
programmable technologies.

Figure 1.9: Dramatic Change in Product life cycle (market window) [9]

Let's investigate the loss of revenue that can occur due to delayed entry of a product into the market. We'll use the simplified
revenue model shown in Figure 1.10(b). This model assumes the peak of the market occurs at the halfway point,
denoted W, of the product life, and that the peak is the same even for a delayed entry. The revenue for an on-time
market entry is the area of the triangle labeled On-time, and the revenue for a delayed-entry product is the area of the
triangle labeled Delayed. The revenue loss for a delayed entry is just the difference of these two triangles' areas. Let's
derive an equation for percentage revenue loss, which equals ((On-time - Delayed) / On-time) * 100%. For simplicity, we'll
assume the market rise angle is 45 degrees, meaning the height of the triangle is W, and we leave as an exercise the
derivation of the same equation for an arbitrary angle. The area of the On-time triangle, computed as 1/2 * base * height, is thus
1/2 * 2W * W, or W^2. The area of the Delayed triangle is 1/2 * (W - D + W) * (W - D). After algebraic simplification, we
obtain the following equation for percentage revenue loss:
Percentage revenue loss = (D(3W - D) / 2W^2) * 100%
Consider a product whose lifetime is 52 weeks, so W = 26. According to the preceding equation, a delay of just D = 4
weeks results in a revenue loss of 22%, and a delay of D = 10 weeks results in a loss of 50%. Some studies claim that
reaching market late has a larger negative effect on revenues than development cost overruns or even a product price that
is too high.
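A small C sketch of the revenue-loss formula (triangular model with the 45-degree rise assumed above) reproduces these figures:

/* Sketch: percentage revenue loss for a delayed market entry under the
 * simplified triangular revenue model. */
#include <stdio.h>

/* loss(%) = D(3W - D) / (2 W^2) * 100 */
static double revenue_loss_pct(double W, double D)
{
    return D * (3.0 * W - D) / (2.0 * W * W) * 100.0;
}

int main(void)
{
    double W = 26.0;                       /* 52-week lifetime -> W = 26 */
    printf("delay  4 weeks: %.0f%% loss\n", revenue_loss_pct(W, 4.0));   /* ~22% */
    printf("delay 10 weeks: %.0f%% loss\n", revenue_loss_pct(W, 10.0));  /* ~50% */
    return 0;
}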

[Figure: (a) revenue versus time (months), showing the market window and the market rise to peak revenue; (b) the
simplified model: an On-time triangle rising from time 0 to its peak at W and falling to zero at 2W, and a Delayed triangle
rising from time D to a lower peak, whose smaller area represents the revenue lost to the delay.]
Figure 1.10: Time-to-market: (a) market window, (b) simplified revenue model for computing revenue loss from
delayed entry.

3.6. Design Economics


It is important for the system designer to be able to predict the cost and the time to design the system. This can guide the
choice of an implementation strategy. This section will summarize a simplified approach to estimate the cost and time
values. The section differentiates between cost and price and shows the relationship between them: price is what you sell
a finished product for, and cost is the amount spent to produce it, including overhead. We are going to introduce the major
factors that affect cost of a processor design and discuss how these factors are changing over time.
In section 1.2 we defined the NRE cost and the unit cost as part of the design metrics. The NRE and unit costs
represent two of the factors that determine the final total cost of the product. The total product cost, in general, consists of two
parts:
 Fixed costs and
 Variable costs (proportional mainly to the number of products sold i.e. the sales volume).
The product cost is related to the fixed and variable costs by the equation:
Total product cost = fixed product cost + variable product cost * products sold (1.24)
In a product made from parts (units) the total cost for any part is
Total part cost = fixed part cost + variable cost per part * volume of parts (1.25)
The selling price Stotal of a single part, an integrated circuit in our case, may be given by
Stotal = Ctotal / (1- m) (1.26)
Where
Ctotal is the manufacturing cost of a single IC to the vendor
m is the desired profit margin
The profit margin has to be selected to ensure a profit after overhead and the cost of sales (marketing and sales costs) have
been considered. Normally, a profit model that represents the profit flow during the product lifetime is used to calculate the
profit margin m.
Each term in the cost equations (equations 1.24 and 1.25) has an effect on the final unit price (Ctotal and Stotal) and also
plays an important role in selecting the best technology that can be used to implement the product to get the best
performance with minimum unit price. We start by examining the meaning of fixed and variable costs.

3.6.1. Non-recurring (Fixed) Engineering Costs
Fixed costs represent the one-time costs that the manufacturer must spend to guarantee a successful development cycle of
the product and which are not directly related to the production strategy or the volume of production (or product sold).
Once the system is developed (designed), any number of units can be manufactured without any additional fixed costs.
Fixed costs include:
Non-recurring Engineering Costs (NREs):
 engineering design cost Etotal
 prototype manufacturing cost Ptotal
 Fixed costs to support the product Stotal
These costs are amortized over the total number of products sold. The total non-recurring cost, Ftotal, is given by
Ftotal = Etotal + Ptotal + Stotal        (1.27)
The NRE costs can be amortized over the lifetime volume of the product. Alternatively, the non-recurring costs can be
viewed as an investment for which there is a required rate of return. For instance, if $1M is invested in NRE for a product,
then $10M has to be generated for a rate of return of 10.

3.6.1.1. Engineering Costs


The cost of designing the product, Etotal, is hopefully incurred only once, during the product design process. The costs
include:
 personnel cost
 support costs
The personnel costs might include the labor for
 architectural design
 logic capture
 simulation for functionality
 layout of modules and chips
 timing verification
 DRC and tapeout procedures
 test generation: Includes:
o production test and design for test
o test vectors and test-program development cost

3.6.1.2. Prototype IC Manufacturing Costs


These costs (Ptotal) are the fixed costs to get the first ICs from the vendor. They include
 the mask cost
 test fixture costs (hardware costs)
 package tooling
EXAMPLE 1.26: ASIC fixed costs.
Figure 1.11 shows typical fixed costs of producing a 10,000-gate ASIC in three implementation styles (FPGA, MGA and CBIC).

                      FPGA       MGA        CBIC

Training              $800       $2,000     $2,000
  Days                2          5          5
  Cost/Day            $400       $400       $400
Hardware              $10,000    $10,000    $10,000
Software              $1,000     $20,000    $40,000
Design                $8,000     $20,000    $20,000
  Size (gates)        10,000     10,000     10,000
  Gates/day           500        200        200
  Days                20         50         50
  Cost/Day            $400       $400       $400
Design for Test       -          $2,000     $2,000
  Days                -          5          5
  Cost/Day            -          $400       $400
NRE                   -          $30,000    $70,000
  Masks               -          $10,000    $50,000
  Simulation          -          $10,000    $10,000
  Test program        -          $10,000    $10,000
Second source         $2,000     $2,000     $2,000
  Days                5          5          5
  Cost/Day            $400       $400       $400

Total Fixed Costs     $21,800    $86,000    $146,000

Figure 1.11: ASIC fixed costs.

EXAMPLE 1.27:
You are starting a company to commercialize your brilliant research idea. Estimate the cost to prototype a mixed-signal
chip. Assume you have seven digital designers (each with a salary of $70K and an overhead of $30K), three analog
designers (each with a salary of $100K and an overhead of $30K), and five support personnel (each with a salary of $40K and an
overhead of $20K), and that the prototype takes two fabrication runs and two years.

SOLUTION:
Total cost of one digital Engineer per year = Salary + Overhead + Computer used
+ Digital front end CAD tool
= $70K + $30K + $10K + $10K = $120K
Total cost of one analog Engineer per year = Salary + Overhead + Computer used
+ Analog front end CAD tool
= $100K + $30K + $10K + $100K = $240K
Total cost of one support staff per year = Salary + Overhead + Computer used
= $40K + $20K + $10K = $70K
Total cost per year:
  Cost of 7 digital engineers = 7 x $120K = $840K
  Cost of 3 analog engineers  = 3 x $240K = $720K
  Cost of 5 support staff     = 5 x $70K  = $350K
  Two fabrication runs        = 2 x $1M   = $2M
  Total cost per year         = $3.91M
Over the two-year prototyping effort, the total predicted cost is nearly $8M.
Figure 1.12 shows the breakdown of the overall cost.
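The cost estimate can be tabulated with a few lines of C. The sketch below simply mirrors the arithmetic of the worked solution (the per-head overheads, the CAD-tool and computer costs, and the $1M-per-run fabrication charge are the assumptions made in that solution):

/* Sketch: prototype cost estimate for the mixed-signal example above.
 * All figures are in $K per year and come from the worked solution. */
#include <stdio.h>

int main(void)
{
    double digital = 70 + 30 + 10 + 10;    /* salary + overhead + computer + digital CAD */
    double analog  = 100 + 30 + 10 + 100;  /* analog front-end CAD is far costlier       */
    double support = 40 + 20 + 10;

    double per_year = 7 * digital + 3 * analog + 5 * support  /* personnel  */
                    + 2 * 1000;                                /* 2 fab runs */
    printf("cost per year : $%.2fM\n", per_year / 1000.0);        /* $3.91M  */
    printf("two-year total: about $%.0fM\n", 2.0 * per_year / 1000.0);
    return 0;
}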

It is important for the project manager to find ways to reduce fabrication costs. Clearly, the manager can reduce the
number of people and hence the labor cost. He might reduce the CAD tool cost and the fabrication cost by doing multiproject
chips. However, the latter approach will not yield a pre-production version, because issues such as yield and
behavior across process variations will not be proved.

[Pie chart: Salary 26%, Fab 25%, Back-End Tools 25%, Overhead 11%, Entry Tools 9%, Computer 4%.]
Figure 1.12: Pie chart showing prototyping costs for a mixed-signal IC.

3.6.1.3. Fixed Costs to support the product


Once a chip has been designed and put into manufacture, the cost to support that chip from an engineering viewpoint may
have a few sources. Data sheets describing the characteristics of the IC have to be written, even for application-specific
ICs that are not sold outside the company that developed them. From time to time, application notes describing how to use
the IC may be needed. In addition, specific application support may have to be provided to help particular users. This is
especially true for ASICs, where the designer usually becomes the walking, talking data sheet and application note.
Another ongoing task may be failure or yield analysis if the part is in high volume and you want to increase the yield.
As a matter of fact, every chip or test chip designed should have accompanying documentation that explains what it is and
how to use it. This even applies to chips designed in the academic environment because the time between design
submission and fabricated chip can be quite large and can tax even the best memory.

3.6.2. Variable and Recurring Costs:


Variable Costs:
These represent all the costs incurred after the development (design) stage. They are directly related to the production phase (including the
production strategy) and the production volume. In the case of integrated circuits they mainly represent the die cost, including packaging and test.
Some of the factors affecting the variable costs are:
 wafer size
 wafer cost
 Moore‘s Law (Gordon Moore of Intel)
 gate density
 gate utilization
 die size
 die per wafer
 defect density
 die cost
 profit margin (depends on fab or fabless )
 price per gate
 part cost

Special components of variable costs can be added. A few large companies such as Intel, TI, STMicroelectronics,
Toshiba, and IBM have in-house manufacturing divisions. Many other semiconductor companies outsource their
manufacturing to a silicon foundry such as TSMC, Hitachi/UMC, IBM, LSI Logic, or ST. This is a recurring cost; it
recurs every time an IC is sold. Another component of the recurring cost is the continuing cost of supporting the part from a
technical viewpoint. Finally, there is what is called "the cost of sales", which is the marketing, sales-force, and overhead
costs associated with selling each IC. In a captive situation, such as the IBM microelectronics division selling CPUs to the
mainframe division, this might be zero.
Example 1.28: ASIC Variable Costs
Figure 1.13 shows typical variable costs of a certain ASIC.

                   FPGA      MGA       CBIC      Units

Wafer Size         6         6         6         inches
Wafer Cost         1,400     1,300     1,500     $
Design             10,000    10,000    10,000    gates
Density            10,000    20,000    25,000    gates/sq.cm
Utilization        60        85        100       %
Die Size           1.67      0.59      0.4       sq.cm
Die/Wafer          88        248       365
Defect density     1         1         1         defects/sq.cm
Yield              65        72        80        %
Die cost           25        7         5         $
Profit Margin      60        45        50        %
Price/gate         0.39      0.10      0.08      cents
Part cost          $39       $10       $8

Figure 1.13: ASIC variable costs.

An expression for the cost to fabricate an IC is as follows:


Rtotal = Rprocess + Rpackage + Rtest        (1.28)
where
Rpackage = package cost
Rtest = test cost (the cost to test an IC is usually proportional to the number of test vectors and the test time)
Rprocess = process cost; this includes all the variable costs, except test and packaging, required to produce the IC.
It is given by:

Rprocess = W / (N . Yw . Ypa)        (1.29)


where
W   = wafer cost ($500-$3000, depending on process and wafer size)
N   = gross die per wafer (the number of complete die on a wafer)
Yw  = die yield per wafer (should be 70%-90% for moderately sized dice in a mature process)
Ypa = packaging yield (or wafer yield) (should be 95%-99%)

Example 1.29
Suppose your startup seeks a return on investment of 5. The wafers cost $2000 and hold 400 gross die with a yield of 70%. If
packaging, test, and fixed costs are negligible, how much do you need to charge per chip to have a 60% profit margin? How
many chips do you need to sell to obtain a 5-fold return on your $8M investment?
Solution:
Rtotal = Rprocess = $2000/(400 x 0.7) = $7.14. For a 60% margin, the chips are sold at $7.14/(1 - 0.6) = $17.86, giving a profit of
$10.72 per unit. The desired ROI implies a profit of $8M x 5 = $40M. Thus, $40M / $10.72 = 3.73M chips must be sold.
Clearly, a large market is necessary to justify the investment in custom chip design.
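The same arithmetic can be expressed as a short C sketch using equations (1.26) and (1.29) with the values of Example 1.29 (packaging and test costs neglected, packaging yield taken as 1):

/* Sketch: fabrication-cost and return-on-investment arithmetic of Example 1.29. */
#include <stdio.h>

int main(void)
{
    double wafer_cost = 2000.0;   /* W                                      */
    double gross_die  = 400.0;    /* N                                      */
    double die_yield  = 0.70;     /* Yw (packaging yield Ypa taken as 1)    */
    double margin     = 0.60;     /* m                                      */
    double nre        = 8e6;      /* prior investment                       */
    double roi        = 5.0;      /* required return on that investment     */

    double r_process = wafer_cost / (gross_die * die_yield);   /* eq. 1.29  */
    double price     = r_process / (1.0 - margin);             /* eq. 1.26  */
    double profit    = price - r_process;
    double volume    = nre * roi / profit;

    printf("cost per die : $%.2f\n", r_process);    /* ~$7.14  */
    printf("selling price: $%.2f\n", price);        /* ~$17.86 */
    printf("chips to sell: %.2fM\n", volume / 1e6); /* ~3.73M  */
    return 0;
}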

3.7. Power Metric


In the early days of computers, there wasn't very much concern about how much power a processor used. There weren't
as many of them, and we weren't doing nearly as much with them. We were just thrilled they existed at all! As time has
gone on our demands on these machines have continued to increase and new uses have put power consumption in the
spotlight. The power usage and the voltage support of a processor are important for the following reasons:

 Power consumption equates largely with heat generation, which is a primary enemy in achieving increased
performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
 In the embedded domain, applications cannot use a heat sink or a fan. A cellular phone with a fan would probably
not be a top seller.
 With millions of PCs in use, and sometimes thousands located in the same company, the desire to conserve
energy has grown from a non-issue to a real issue in the last five years.
 Reducing power usage is a primary objective for the designers of notebook computers and embedded systems,
since they run on batteries with a limited life and may even rely on "supercapacitors" to stay operational during
power outages.
 Newer processors strive to add additional features, integrate more and more peripherals and run at faster
speeds, all of which tend to increase power consumption. This trend also makes ICs more sensitive to heat problems,
since their components are crammed into such a small space.

3.7.1. Reducing the Power Consumption


There are three conditions that have to be considered when planning for a microcontroller's power consumption in an
application:
1. The "intrinsic power", which is the power required just to run the microcontroller.
2. The "I/O drive power", which takes into account the power consumed when the microcontroller is sinking/sourcing
   current to external I/O devices.
3. The power consumed when the microcontroller is in "sleep" or "standby" mode and is waiting, with
   clocks on or off, for a specific external event.
Taking these three conditions into account when planning an application can change an application that would only run for a
few hours into one that runs for literally several months.

3.7.1.1. Reducing the “Intrinsic Power”


Most embedded microprocessors have three different modes: fully operational; standby or power-down; and clock-off (or
sleep). (Some vendors use different names, but they mean basically the same thing.) Fully operational means that the
clock signal is propagated to the entire processor and all functional units are available to execute instructions. In standby
mode, the processor is not actually executing an instruction, but all its stored information is still available; for example,
DRAM is still refreshed, register contents are valid, and so forth. On an external interrupt, the processor returns
(in a couple of cycles) to fully operational mode without losing information. In clock-off (sleep) mode, the system has to be
restarted, which takes nearly as long as the initial start-up. The "intrinsic power" consumption reaches its maximum value
when the processor is in fully operational mode.
a) Effect of Clock Speed on Power Consumption
Reducing the power consumption starts with using a low-power semiconductor technology. All microcontrollers built today
use "complementary metal oxide semiconductor" (CMOS) logic technology to provide the computing functions and
electronic interfaces. CMOS-based devices require significantly less power than older bipolar or NMOS-based devices.
CMOS is a "push-pull" technology in which a PMOS and an NMOS transistor are paired together. During a state
transition, a very small amount of current flows through the transistors. As the frequency of operation increases, current
flows more often in a given period of time; in other words, the average current goes up. This increased current
flow results in increased power consumption by the device. The power consumed by the microcontroller
when it is running and nothing is connected to the I/O pins is called "intrinsic power". As mentioned before, this
consumption of power is largely a function of the CMOS switching current, which in itself is a function of the speed of the
microcontroller.
By decreasing the system clock frequency, the intrinsic power can be reduced significantly.
For example, the table below shows the current requirements (Idd) for a PICmicro 16C73A running at different frequencies with a 5
volt power input.

FREQUENCY CURRENT
1.0 MHz 550 uA
2.0 MHz 750 uA
3.0 MHz 1 mA
4.0 MHz 1.25 mA

By lowering the clock speed used in an application, the power required (which is simply the product of input voltage and
current) will be reduced. This may mean that the application software has to be written "tighter", but the gain in
product life for a given set of batteries may be an important advantage for the application. Therefore, a CMOS device
should be driven at the slowest possible speed to minimize power consumption. ("Sleep" mode can dramatically reduce a
microcontroller's power consumption during inactive periods because, if no gates are switching, there is no current flow in
the device.)
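Since intrinsic power is simply the supply voltage times the supply current, the table above translates directly into power figures. A trivial C sketch (5 V supply and the Idd values quoted above):

/* Sketch: intrinsic power of the PICmicro 16C73A at the currents listed
 * above; power is simply V times I. */
#include <stdio.h>

int main(void)
{
    double vdd = 5.0;                                /* volts          */
    double freq_mhz[]   = { 1.0, 2.0, 3.0, 4.0 };
    double current_ma[] = { 0.55, 0.75, 1.0, 1.25 }; /* Idd from table */

    for (int i = 0; i < 4; i++)
        printf("%.1f MHz: %.2f mW\n", freq_mhz[i], vdd * current_ma[i]);
    return 0;
}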
In some applications reducing the speed is not acceptable. This is why most new processors focus on
reducing power consumption in the fully operational and standby modes. They do so by stopping transistor activity when a
particular block is not in use. To achieve this, such designs connect every register, flip-flop, or latch to the processor's
clock tree. The implementation of the clock therefore becomes crucial, and it often must be completely redesigned. (In
traditional microprocessor design, the clock signal is propagated from a single point throughout the chip.)
b) Effect of Reducing the Voltage on Power Consumption
Obviously, the intrinsic power consumption can be further reduced by supplying a lower voltage to the microcontroller
(which may or may not be possible, depending on the circuitry attached to the microcontroller and the microcontroller
itself). A CPU core voltage of 1.8 V or even less is the state of the art in processor technology. The problem here is that the
CPU core is no longer the most power-consuming part of the system, as it was in the past. The increasing
integration of power-consuming peripherals alongside embedded cores forces us to measure the power consumption of
the entire system. Overall power consumption differs a lot, depending on the system design and the degree of integration.
Increasingly, the processor core is only a small part of the entire system.
c) External and Internal Voltage Levels (Standard Voltage Levels and Motherboard Voltage Support)
Early processors had a single voltage level that was used by the motherboard and the processor, typically 5 volts. As
processors have increased in speed and size, the desire to reduce power has led designers to use lower
voltage levels. The first step was to reduce the voltage level to 3.3 volts. Newer processors reduce voltage levels
even further by using what is called a dual-voltage, or split-rail, design.
A split-rail processor uses two different voltages. The external or I/O voltage is higher, typically 3.3 V for compatibility with
the other chips on the motherboard. The internal or core voltage is lower: usually 2.5 to 2.9 volts. This design allows these
lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets, etc. The voltage regulator
on the motherboard is what must be changed to supply the correct voltages to the processor socket.
There are several "industry standard" voltages in use in processors today. The phrase "industry standard" is put in quotes
because the number of different voltages being used continues to increase, and the market presence of
AMD and Cyrix makes this even more confusing than when it was just Intel we had to worry about. Table 1.3 shows the
current standard voltages with their names and the typical range of voltages that is considered acceptable for a
processor that uses that nominal voltage level.

Table-1.3: Typical Standard Voltages

Name                              Nominal Voltage         Acceptable Range
System +5V                        5.000 V                 4.750 V to 5.250 V
STD (Standard)                    3.300 V                 3.135 V to 3.465 V (or 3.135 V to 3.600 V)
VR (Voltage Reduced)              3.380 V                 3.300 V to 3.465 V
VRE (Voltage Reduced Extended)    3.520 V (or 3.500 V)    3.450 V to 3.600 V (or 3.400 V to 3.600 V)
System 2.8V                       2.800 V                 2.700 V to 2.900 V

Note a couple of things about this table. First, the term "voltage reduced" refers to being reduced from the original +5 V
used in older processors. Second, note that both VR and VRE have two slightly different definitions, as they were revised
by Intel. The differences are pretty subtle and generally aren't anything to worry about. Most motherboards will have
enough voltage settings to support a wide range of processors. Many will have additional voltages besides the ones in the
table.

3.7.1.2. Reducing the “I/O drive Power”


The I/O drive power is a measurement of how much power is sourced/sunk by the microcontroller to external devices and
is unique to the application. In many applications, the microcontroller is the only active device in the circuit (i.e., it is getting
input from switches and outputting information via LEDs). If the microcontroller is driving devices continually at times when
they are not required, more current (which means more power) than is absolutely necessary is being consumed by the
application.

3.7.1.3. Reducing the "Sleep/Standby" Power

The last aspect of power consumption to consider when developing an application is the consumption during
"sleep"/"standby" mode. The processor usually enters this mode by executing a special instruction, after which
the microcontroller shuts down its oscillator and waits for some event to happen (such as a
watchdog timer counting down or an input changing state).

Using this mode can reduce the power consumption of a microcontroller from milliwatts to microwatts. An excellent
example of what this means is taken from the Parallax BASIC Stamp manual, in a question-and-answer section:
"Q. How long can the BASIC Stamp run on a 9-volt battery?
A. This depends on what you're doing with the BASIC Stamp. If your program never uses sleep mode and
has several LEDs connected to I/O lines, then the BASIC Stamp may only run for several hours. If, however,
sleep mode is used and I/O current draw is minimal, then the BASIC Stamp can run for weeks."
Using the sleep mode in a microcontroller allows the use of a virtual "on/off" switch that is connected directly to the
microcontroller. This provides several advantages.
The first is cost and reliability: a simple momentary on/off switch is much cheaper and much less prone to failure than a
slide or toggle switch. The second is operational: while sleep mode is active, the contents of the variable RAM will not be lost or
changed.

There is one potential disadvantage of sleep mode for some applications: the time required for the
microcontroller to "wake up" and restart its oscillator. As mentioned before, this can be as long as the initial start-up time,
up to about ten milliseconds, which is too long for many applications and in particular for
interfacing with other computer equipment. If the main thing the microcontroller is interfacing to is a human, this wake-up
time will not be an issue at all.
One thing to remember with sleep mode is to make sure there is no current draw while it is active. A microcontroller sinking
current from an LED connected to the power rail while in sleep mode will result in extra power being consumed.

3.7.1.4. Use of Power Management Circuitry


Spurred on primarily by the goal of putting faster and more powerful processors in laptop computers, Intel has created
power management circuitry to enable processors to conserve energy and lengthen battery life. These features were
introduced initially in the Intel 486SL processor, which is an enhanced version of the 486DX processor. Subsequently, the
power management features were made universal and incorporated into all Pentium and later processors. This feature
set is called SMM, which stands for "System Management Mode".
SMM circuitry is integrated into the physical chip but operates independently to control the processor's power use based on
its activity level. It allows the user to specify time intervals after which the CPU will be powered down partially or fully, and
also supports the suspend/resume feature that allows for instant power-on and power-off, used mostly with laptop PCs.
a) Power and Voltage for Specific Processors
This section contains a summary table showing the voltage and power characteristics of the different processors.
Note: A regular processor with a single entry for external and internal voltages is a single-voltage CPU, and one with
separate entries is a dual-plane CPU. OverDrive CPUs with a different external and internal voltage represent OverDrives
which include integrated voltage regulators.

Table 1.4: Voltage/power characteristics of different processors

Processor Family             Processor Version        External or I/O Voltage (V)   Internal or Core Voltage (V)   Power Management
8088                         All                      5                                                            None
8086                         All                      5                                                            None
80286                        All                      5                                                            None
80386DX                      All                      5                                                            None
80386SX                      All                      5                                                            SMM on 80386SL only
80486DX                      All                      5                                                            SMM in SL-enhanced versions
80486SX                      All                      5                                                            SMM in SL-enhanced versions
80486DX2                     Intel                    5                                                            SMM in SL-enhanced versions
80486DX2                     AMD, Cyrix               3.3 (5V tolerant)                                            SMM
80486DX4                     Intel                    3.3                                                          SMM
80486DX4                     AMD, Cyrix               3.3 (5V tolerant)                                            SMM
AMD 5x86                     --                       3.45                                                         SMM
Cyrix 5x86                   All                      3.45                                                         SMM
Pentium                      60, 66                   5                                                            SMM
Pentium                      75 to 200                3.3 (STD) / 3.52 (VRE)                                       SMM
Pentium OverDrive            63, 83                   5                                                            SMM
Pentium OverDrive            120/133, 125, 150, 166   3.3 (STD) / 3.52 (VRE)                                       SMM
Pentium with MMX             All                      3.3                           2.8                            SMM
Pentium with MMX OverDrive   All                      3.3                           2.8 (3.3 from motherboard)     SMM
6x86                         Regular                  3.3                                                          SMM
6x86                         6x86L                    3.3                           2.8                            SMM
K5                           All                      3.52                                                         SMM
Pentium Pro                  150                      3.1                                                          SMM
Pentium Pro                  166, 180, all 200s       3.3                                                          SMM
Pentium II                   All                      3.3                           2.8                            SMM
K6                           166, 200                 3.3                           2.9                            SMM
K6                           233                      3.3                           3.2                            SMM
6x86MX                       All                      3.3                           2.9                            SMM

3.8. Circuit Complexity


In the last three sections we discussed three important metrics: performance, cost (directly related to circuit complexity)
and power consumption. Many methods can be used to evaluate these three metrics. The methods given so far, in many cases,
evaluate the metrics at a high design level by using characterization information of the high-level components, e.g. the
performance of a processor or the performance of a system. One important aspect of the design development cycle (see
Chapter 2) is to estimate the values of these metrics (performance, die size or die area, and power consumption) at an early
stage of the design cycle, i.e. prior to the time-consuming logic synthesis and physical layout phases. Estimating the
metrics at an early stage gives the designer the opportunity to modify the design or the circuit layout to improve the metrics or
to achieve the targeted values. At gate level, however, estimation is more difficult and less accurate, because circuit size
and performance strongly depend on the gate-level synthesis results and on the physical cell arrangement and routing.

In this section we give guidelines for the designer who wants to use modeling to find initial estimates of the performance,
the area, and the power consumption of the unit he is designing (e.g. an adder) at an early stage. The discussion assumes
that the designer is using cell-based VLSI design techniques to design combinational circuits of any degree of complexity.

3.8.1. Area modelling


Silicon area on a VLSI chip is taken up by the active circuit elements and their interconnections. In cell-based design
techniques, the following criteria for area modeling can be formulated:

 Total circuit complexity (GE_total) can be measured by the number of gate equivalents (1 GE corresponds to one
   2-input NAND gate, i.e. 4 MOSFETs).
 Circuit area (A_circuit) is occupied by logic cells and inter-cell wiring. In technologies with three or more metal
   layers, over-the-cell routing allows cell and wiring areas to overlap, as opposed to 2-metal
   technologies. This means that most of the cell area can also be used for wiring, resulting in very low routing area
   factors. (A_circuit = A_cells + A_wiring)
 Total cell area (A_cells) is roughly proportional to the number of transistors or gate equivalents (GE_total) contained in
   a circuit. This number is influenced by technology mapping, but not by physical layout. Thus, cell area can be
   roughly estimated from a generic circuit description (e.g. logic equations or a netlist of simple gates) and can be
   precisely determined from a synthesized netlist. (A_cells ∝ GE_total)
 Wiring area (A_wiring) is proportional to the total wire length. The exact wire lengths, however, are not known prior to
   physical layout. (A_wiring ∝ L_total)
 Total wire length (L_total) can be estimated from the number of nodes and the average wire length of a node
   [Feu82, KP89] or, more accurately, from the sum of cell fan-outs and the average wire length of cell-to-cell
   connections (which accounts for the longer wire length of nodes with higher fan-out). The wire lengths also depend
   on circuit size, circuit connectivity (i.e. locality of connections), and layout topology, which are not known prior to
   circuit partitioning and physical layout [RK92]. (L_total ∝ FO_total)
 Cell fan-out (FO) is the number of cell inputs a cell output is driving. Fan-in is the number of inputs to a cell
   [WE93], which for many combinational gates is proportional to the size of the cell. Since the sum of cell fan-outs
   (FO_total) of a circuit is equivalent to the sum of cell fan-ins, it is also proportional to circuit size. (FO_total ∝ GE_total)
 Therefore, to a first approximation, cell area as well as wiring area are proportional to the number of gate
   equivalents. More accurate area estimation before performing actual technology mapping and circuit partitioning
   is hardly possible. For circuit comparison purposes, the proportionality factor is of no concern.
   (A_circuit ∝ GE_total ∝ FO_total)

The designer is normally interested in an area estimation model that is simple to compute while being as accurate as
possible, and that can be applied to logic equations or generic netlists (i.e. netlists composed of simple logic gates)
alone. Considering the above observations, possible candidates are:

Unit-gate area model: This is the simplest and most abstract circuit area model, which is often used in the literature
[Tya93]. A unit gate is a basic, monotonic 2-input gate (or logic operation, if logic equations are concerned), such as
AND, OR, NAND, and NOR. Basic, non-monotonic 2-input gates like XOR and XNOR are counted as two unit gates,
reflecting their higher circuit complexities. Complex gates as well as multi-input basic gates are built from 2-input basic
gates and their gate count equals the sum of gate counts of the composing cells.

Fan-in area model: In the fan-in model, the size of 2- and multi-input basic cells is measured by counting the
number of inputs (i.e., fan-in). Complex cells are again composed of basic cells with their fan-in numbers summed up,
while the XOR/XNOR-gates are treated individually. The obtained numbers basically differ from the unit-gate numbers
only by an offset of 1 (e.g., the AND-gate counts as one unit gate but has a fan-in of two).

Other area models: The two previous models do not account for transistor level optimisation possibilities in complex gates,
e.g., in multiplexers and full-adders. More accurate area models need individual gate count numbers for such complex
gates. However, some degree of abstraction is sacrificed and application on arbitrary logic equations is not possible
anymore. The same holds true for models which take wiring aspects into consideration. One example of a more
accurate area model is the gate-equivalents model (GE) mentioned above, which is based on gate transistor counts and
therefore is only applicable after synthesis and technology mapping.

Inverters and buffers are not accounted for in the above area models, which makes sense for pre-synthesis circuit
descriptions. Note that the biggest differences in buffering costs are found between low fan-out and high fan-out circuits.
With respect to area occupation, however, these effects are partly compensated because high fan-out circuits need
additional buffering while low fan-out circuits usually have more wiring.

Investigations have shown that the unit-gate model approach for the area estimation of complex gates, such as
multiplexers and full-adders, does not introduce more inaccuracy than, for example, neglecting circuit connectivity in wiring-
area estimation. With the XOR/XNOR treated separately, the unit-gate model yields acceptable accuracy at the
given abstraction level.
It also perfectly reflects the structure of logic equations by modeling the basic logic operators individually and by regarding
complex logic functions as composed of basic ones. Investigations showed comparable performance for the fan-in and
the unit-gate models due to their similarity. Moreover, the unit-gate model is very commonly used in the literature.

3.8.2. Delay modelling


Propagation delay in a circuit is determined by the cell and interconnection delays on the critical path (i.e. the longest signal
propagation path in a combinational circuit). As opposed to area estimation, it is not average and total numbers that are of
interest; rather, individual cell and node values are relevant for path delays.
Critical path evaluation is done by static timing analysis, which involves graph-based search algorithms. Of course, timings
also depend on temperature, voltage, and process parameters which, however, are not of concern for our
comparison purposes.

 Maximum delay (t_critical path) of a circuit is the sum of the cell inertial delays, cell output ramp delays, and wire
   delays on the critical path.
   (t_critical path = Σ_critical path (t_cell + t_ramp) + Σ_critical path t_wire)
 Cell delay (t_cell) depends on the transistor-level circuit implementation and the complexity of a cell. All simple gates
   have comparable delays. Complex gates usually contain tree-like circuit and transistor arrangements, resulting in
   logarithmic delay-to-area dependencies. (t_cell ∝ log(A_cell))
 Ramp delay (t_ramp) is the time it takes for a cell output to drive the attached capacitive load, which is made up of
   interconnect and cell input loads. The ramp delay depends linearly on the capacitive load attached, which in turn
   depends linearly on the fan-out of the cell. (t_ramp ∝ FO_cell)
 Wire delay or interconnection delay (t_wire) is the RC delay of a wire, which depends on the wire length. RC delays,
   however, are negligible compared to cell and ramp delays for small circuits such as the adders investigated in this
   work. (t_wire ≈ 0)
 Thus, a rough delay estimation is possible by considering the sizes and, with a smaller weighting factor, the fan-outs
   of the cells on the critical path.
   (t_critical path ≈ Σ_critical path (log(A_cell) + k·FO_cell))
Possible delay estimation models are:

Unit-gate delay model: The unit-gate delay model is similar to the unit-gate area model. Again, the basic 2-input gates
(AND, OR, NAND, NOR) count as one gate delay with the exception of the XOR/XNOR-gates which count as two
gate delays [Tya93]. Complex cells are composed of basic cells using the fastest possible arrangement (i.e., tree
structures wherever possible) with the total gate delay determined accordingly.

Fan-in delay model: As for area modeling, fan-in numbers can be used instead of unit-gate numbers. Again, no
   advantage over the unit-gate model is observed.

Fan-out delay model: The fan-out delay model is based on the unit-gate model but incorporates fan-out numbers, thus
   accounting for gate fan-out and interconnection delays [WT90]. Individual fan-out numbers can be
   obtained from a generic circuit description. A proportionality factor has to be determined for appropriate weighting
   of fan-out with respect to unit-gate delay numbers.

Other delay models: Various delay models exist at other abstraction levels. At the transistor level, each transistor can be
   modeled as contributing one unit delay (τ-model [CSTO91]). At a higher level, complex gates like full-adders
   and multiplexers can again be modeled separately for higher accuracy [Kan91, CSTO91].

The impact of large fan-out on circuit delay is higher than on area requirements.
This is because high fan-out nodes lead to long wires and high capacitive loads and require additional buffering, resulting
in larger delays. Therefore, the fan-out delay model is more accurate than the unit-gate model. However, due to the much
simpler calculation of the unit-gate delay model and its widespread use, as well as for compatibility reasons with the
chosen unit-gate area model, this model will be used for the circuit comparisons in this work.
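As an illustration of how the unit-gate area and delay models are used in practice, the following C sketch estimates both metrics, and their AT-product, for an n-bit ripple-carry adder. The composition of a full adder from 2 XOR, 2 AND and 1 OR gate and the critical-path counting are assumptions made for this example (they are not taken from the text), but they follow the counting rules given above: XOR/XNOR count as 2 units, other 2-input gates as 1 unit.

/* Sketch (illustrative assumption): unit-gate bookkeeping for an n-bit
 * ripple-carry adder.  AND/OR/NAND/NOR = 1 unit of area and delay,
 * XOR/XNOR = 2 units; a full adder is assumed to be 2 XOR + 2 AND + 1 OR. */
#include <stdio.h>

enum { UG_SIMPLE = 1, UG_XOR = 2 };      /* unit-gate weights */

static int fa_area(void)                 /* one full adder    */
{
    return 2 * UG_XOR + 2 * UG_SIMPLE + 1 * UG_SIMPLE;   /* = 7 */
}

static int rca_area(int n)               /* n-bit ripple-carry adder */
{
    return n * fa_area();
}

static int rca_delay(int n)
{
    /* critical path: a XOR b in stage 0 (2 units), then the carry ripples
     * through an AND and an OR per stage (2 units each), and the last sum
     * bit needs one more XOR (2 units).                                   */
    return UG_XOR + 2 * UG_SIMPLE * (n - 1) + UG_XOR;
}

int main(void)
{
    for (int n = 8; n <= 64; n *= 2)
        printf("%2d-bit RCA: area %3d GE-units, delay %3d gate delays, AT %5d\n",
               n, rca_area(n), rca_delay(n), rca_area(n) * rca_delay(n));
    return 0;
}

The printed AT-product is the combined area-delay measure discussed in section 3.8.4.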

3.8.3. Power measures and modeling


An increasingly important performance parameter for VLSI circuits is power dissipation. Peak power is a problem with
respect to circuit reliability (e.g. voltage drop on power buses, ground bounce) which, however, can be dealt with by careful
design. On the other hand, average power dissipation is becoming a crucial design constraint in many modern
applications, such as high-performance microprocessors and portable applications, due to heat removal problems and
power budget limitations.

The following principles hold for average power dissipation in synchronous CMOS circuits [ZF97]:

 Total power (P_total) in CMOS circuits is dominated by the dynamic switching of circuit elements (i.e. the charging and
   discharging of capacitances), whereas dynamic short-circuit (or overlap) currents and static leakage are of less
   importance. Thus, power dissipation can be assumed proportional to the total capacitance to be switched, the
   square of the supply voltage, the clock frequency, and the switching activity α in the circuit [CB95].
   (P_total = 1/2 · α · C_total · Vdd^2 · f_clk)
 Total capacitance (C_total) in a CMOS circuit is the sum of the capacitances of transistor gates, sources, and
   drains and of the wiring. Thus, total capacitance is proportional to the number of transistors and the amount of
   wiring, both of which are roughly proportional to circuit size. (C_total ∝ GE_total)
 Supply voltage (Vdd) and clock frequency (f_clk) can be regarded as constant within a circuit and therefore are not
   relevant to our circuit comparisons. (Vdd, f_clk = constant)
 The switching activity factor (α) gives a measure of the number of transient nodes per clock cycle and depends
   on the input patterns and circuit characteristics. In many cases, input patterns to data paths and arithmetic units are
   assumed to be random, which results in a constant average transition activity of 50% on all inputs (i.e. each input
   toggles every second clock cycle). Signal propagation through several levels of combinational logic may decrease
   or increase transition activities, depending on the circuit structure. Such effects, however, are of minor relevance
   in adder circuits and will be discussed later. (α = constant)
 Therefore, for arithmetic units having constant input switching activities, power dissipation is approximately
   proportional to circuit size. (P_total ∝ GE_total)

If average power dissipation of a circuit can be regarded as proportional to its size, the presented area models can
also be used for power estimation. Thus, the unit-gate model is chosen for the power comparisons of generic circuit
descriptions.
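Under these assumptions, a relative power estimate needs nothing more than the switching-power relation and a capacitance-per-gate-equivalent figure. The C sketch below uses an illustrative value of 10 fF per GE (an assumption, not from the text) and shows the quadratic benefit of lowering Vdd:

/* Sketch: relative dynamic-power estimate, P = 1/2 * alpha * C_total * Vdd^2 * f_clk,
 * with total capacitance taken as proportional to circuit size in GE. */
#include <stdio.h>

static double dyn_power_mw(double ge_total, double vdd, double f_mhz, double alpha)
{
    const double c_per_ge = 10e-15;                 /* F per GE (assumed value) */
    double c_total = ge_total * c_per_ge;
    double p_watts = 0.5 * alpha * c_total * vdd * vdd * f_mhz * 1e6;
    return p_watts * 1e3;                           /* mW */
}

int main(void)
{
    /* the same 10,000-GE circuit at two supply voltages */
    printf("10k GE, 3.3 V, 100 MHz: %.2f mW\n", dyn_power_mw(10000, 3.3, 100, 0.5));
    printf("10k GE, 1.8 V, 100 MHz: %.2f mW\n", dyn_power_mw(10000, 1.8, 100, 0.5));
    return 0;
}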

3.8.4. Combined circuit performance measures


Depending on the constraints imposed by the design specifications, the performance of combinational circuits is measured
by means of either circuit size, propagation delay, or power dissipation, or by a combination of those. Frequently used
combined performance measures are the area-time or area-delay product (AT-product) and the power-time or power-delay
product (PT-product). The PT-product can also be regarded as the amount of energy used per computation. The unit-gate
models presented above for area, delay, and power estimation can also be used for AT- and PT-product comparisons.
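As a small illustration of how such combined measures are used, the following C sketch compares two hypothetical adder variants by their AT-products. The gate counts and delays are invented numbers chosen only for the example; under the unit-gate power model (power proportional to gate count), the PT-product ranking coincides with the AT-product ranking up to a constant factor.

```c
#include <stdio.h>

/* AT- and PT-product comparison of two hypothetical adder variants.
 * Gate counts (GE) and delays (unit-gate delays) are invented numbers,
 * used only to show how the combined measures are formed. */
int main(void)
{
    double area_A = 100.0, delay_A = 12.0;   /* small but slow  */
    double area_B = 160.0, delay_B =  7.0;   /* larger but fast */

    double at_A = area_A * delay_A;          /* area-delay product */
    double at_B = area_B * delay_B;

    printf("AT product: A = %.0f, B = %.0f -> %s wins\n",
           at_A, at_B, (at_A < at_B) ? "A" : "B");

    /* With the unit-gate power model P ~ GE, the PT-product comparison
     * gives the same ranking as the AT-product comparison. */
    return 0;
}
```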

3.9. System Effectiveness Metrics


Attributes of System Effectiveness
It is very important for a design group to measure the performance and capabilities of the product they have just finished and to determine whether it is capable of accomplishing its mission. There are a number of metrics that indicate the overall performance of a system. The most popular are reliability, availability, and system effectiveness. Reliability (as will be explained later) is the probability of successful operation, whereas availability (see later) is the probability that the system is operational and available when it is needed. System effectiveness is the overall capability of a system to accomplish its mission and is determined by calculating the product of reliability and availability.
System effectiveness was introduced in the late 1950s and early 1960s to describe the overall capability of a system for accomplishing its intended mission. The mission (to perform some intended function) was often referred to as the ultimate output of any system. Various system effectiveness definitions have been presented. The definition in ARINC [4] is: “the probability that the system can successfully meet an operational demand within a given time when operated under specified conditions.” To define system effectiveness for a one-shot device such as a missile, the definition was modified to: “the probability that the system [missile] will operate successfully [destroy the target] when called upon to do so under specified conditions.” Effectiveness is obviously influenced by equipment design and construction (mainly reliability). However, just as critical are the equipment’s use and maintenance (maintainability). Another famous and widely used definition of system effectiveness is from MIL-STD-721B […]: “a measure of the degree to which an item can be expected to achieve a set of specific mission requirements and which may be expressed as a function of availability, dependability and capability.”
From the above simple discussion, it is clear that any model for system effectiveness must include many different
attributes. Many of the design metrics given in section 1.2 represent some of these attributes. Besides reliability,
maintainability, serviceability, availability and design adequacy mentioned in section 1.2, repairability (the probability that
a failed system will be restored to operable condition in a given active repair time), capability (a measure of the ability of
an item to achieve mission objectives given the conditions during the mission), dependability (a measure of the item
operating condition at one or more points during the mission, including the effects of reliability, maintainability and
survivability, given the item conditions at the start of the mission), human performance, and environmental effect are
attributes used in some system effectiveness models [..].

In the following we are going to discuss some of the attributes (also design metrics) used in many of the system
effectiveness models.

3.9.1. Reliability, Maintainability and Availability metrics

Digital systems, like all other sophisticated equipment, undergo the cycle of repair, check-out, operational readiness, failure, and back to repair. When the cost of a machine’s not being in operation is high, methods must be applied to reduce these out-of-service, or downtime, periods. The cost of downtime is not simply the lost revenue when the system is not used, but also the cost of having to rerun programs that were interrupted by the ailing system (especially if the system is a computer), of retransmitting or asking other terminals to resend (if possible) real-time data that were lost, the loss of control of external processes, opportunity costs, and costs related to user inconvenience, dissatisfaction, and reduced confidence in the system. Other costs are related directly to the diagnosis and corrective repair actions, and to the associated logistics and bookkeeping.
Due to the complexity of the digital systems, many users often decide not to maintain the system (processors, memory,
system software, peripherals) themselves, but rather to have a maintenance contract with the system manufacturer. The
cost of a maintenance contract over the useful life of the system, in relation to its capital cost, is quite high. Some sources in the literature suggest that roughly 38% of the life cycle cost is directed toward maintainability issues. Such high costs due to unreliability and maintenance are a strong argument for designing for reliability, maintainability, and serviceability.
To better understand the factors that must be considered by the system designer to ensure minimum downtime and minimum costs of maintenance, in the following we define and explain the related terms: reliability, availability, maintainability and serviceability. Reliability, maintainability, availability and serviceability have a direct impact on both operational capability and life cycle costs. From a life cycle cost perspective, it has been widely recognized that the qualities of reliability, maintainability and availability result in reduced life cycle costs.

3.9.1.1. Reliability
Reliability is an attribute of any computer-related component (software, hardware, or a network, for example) that consistently performs according to its specifications. It has long been considered one of three related attributes that must be considered when making, buying, or using a computer product or component. Reliability, availability, and maintainability (RAM, for short) are considered to be important aspects to design into any system. (Note: sometimes, together with reliability and availability, serviceability is used instead of maintainability; in this case RAS is used instead of RAM.)
Quantitatively, reliability can be defined as “the probability that the system will perform its intended function over the stated duration of time in the specified environment for its usage”. Therefore, the probability that a system successfully performs as designed is called “system reliability” or the “probability of survival”. In theory, a reliable product is totally free of technical errors; in practice, however, vendors frequently express a product’s reliability quotient as a percentage.
Evolutionary products (those that have evolved through numerous versions over a significant period of time) are usually
considered to become increasingly reliable, since it is assumed that bugs have been eliminated in earlier releases.
Software bugs, instruction sensitivity, and problems that may arise due to the limited durability of EEPROM and flash memories (the nature of the EEPROM architecture limits the number of updates that may be reliably performed on a single location; this is called the durability of the memory. At least 10,000 updates are typically possible for EEPROM and 100 updates for flash memory) are some of the possible reasons for the failure of embedded systems.

Reliability of a system depends on the number of devices used to build the system. As the number of units used to build
the system increases, the chance of system unreliability becomes greater, since the reliability of any system (or equipment)
depends on the reliability of its components. The relationship between parts reliability and the system reliability depends
mainly on the system configurations and the reliability function can be formulated mathematically to varying degrees of
precision, depending on the scale of the modeling effort. To understand how the system reliability depends on the system configuration, and how to calculate the system reliability given the reliability of each component, we consider here two simple configurations. Both examples consider a system consisting of n components. These components can be hardware, software, or even human. Let Pr(Ai), 1 ≤ i ≤ n, denote the probability of the event Ai that component i operates successfully during the intended period of time. Then the reliability of component i is ri = Pr(Ai). Similarly, let Pr(Āi) denote the probability of the event Āi that component i fails during the intended period. In the following calculations, we assume that the failure of any component is independent of that of the other components.
Case 1: Serial Configuration:
The series configuration is the simplest and perhaps one of the most common structures. The block diagram in
Fig.1.19 represents a series system consisting of n components.


Figure 1.19: A Serial Configuration

In this configuration, all n components must be operating to ensure system operations. In other words, the system fails
when any one of the n components fails.
Thus, the reliability of a series system Rs is:

Rs = Pr(all components operate successfully)

= Pr( A1  A2  ......  An )
n
=  Pr( A )
i 1
i

The last equality holds since all components operate independently. Therefore, the reliability of a series system is:
n
Rs = r
i 1
i (1.33)
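A minimal numeric sketch of equation (1.33) is given below; the component reliabilities are assumed values chosen only for illustration.

```c
#include <stdio.h>

/* Series system reliability, eq. (1.33): Rs = r1 * r2 * ... * rn.
 * The component reliabilities below are assumed example values. */
static double series_reliability(const double r[], int n)
{
    double rs = 1.0;
    for (int i = 0; i < n; i++)
        rs *= r[i];
    return rs;
}

int main(void)
{
    double r[] = { 0.99, 0.95, 0.98 };   /* three hypothetical components */
    printf("Rs(series) = %.4f\n", series_reliability(r, 3));  /* ~0.9217 */
    return 0;
}
```

Note that the series reliability is lower than that of the weakest component, which is why adding components in series always reduces system reliability.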

Case 2: Parallel Configuration


In many systems, several paths perform the same operation simultaneously forming a parallel configuration. A
block diagram for this system is shown in Fig.1.20. There are n paths connecting input to output, and the system fails if all
the n components fail. This is sometimes called a redundant configuration. The word “redundant” is used only when the
system configuration is deliberately changed to produce additional parallel paths in order to improve the system reliability.
Thus, a parallel system may occur as a result of the basic system structure or may be produced by using redundancy in a
reliability design or redesign of the system.

Figure 1.20: Parallel Configuration

In a parallel configuration consisting of n components, the system is successful if any one of the n components is successful. Thus, the reliability of a parallel system is the probability of the union of the n events A1, A2, ..., An, which can be written as:

Rs = Pr(A1 ∪ A2 ∪ ... ∪ An)
   = 1 − Pr(Ā1 ∩ Ā2 ∩ ... ∩ Ān)
   = 1 − Pr(Ā1) · Pr(Ā2) · ... · Pr(Ān)
   = 1 − (1 − Pr(A1)) · (1 − Pr(A2)) · ... · (1 − Pr(An))

The third equality holds since the component failures are independent. Therefore, the reliability of a parallel system is:

   Rs = 1 − ∏(i=1..n) (1 − ri)        (1.34)

Equations (1.33) and (1.34) show how the configuration of the components affects the system reliability. In addition, it is
possible to recognize two distinct and viable approaches to enhance system reliability; one on the level of the components
and the second on the level of the overall system organization.
i. Component technology: The first approach is based on component technology; i.e., manufacturing capability of
producing the component with the highest possible reliability, followed by parts screening, quality control,
pretesting to remove early failures (infant mortality effects), etc.
ii. System organization: The second approach is based on the organization of the system itself (e.g., fault-tolerant
architectures that make use of protective redundancy to mask or remove the effects of failure, and thereby
provide greater overall system reliability than would be possible by the use of the same components in a
simplex or nonredundant configuration).

Fault-tolerant and quasi fault-tolerant architectures: Fault tolerance is the capability of the system to perform its
functions in accordance with design specifications, even in the presence of hardware failures. If, in the event of faults,
the system functions can be performed, but do not meet the design specifications with respect to the time required to
complete the job or the storage capacity required for the job, then the system is said to be partially or quasi fault-
tolerant. Since the number of possible hardware failures can be very large, in practice it is necessary to restrict fault
tolerance to prespecified classes of faults from which the system is designed to recover.
Fault-tolerance is provided by application of protective redundancy, or the use of more resources so as to upgrade
system reliability. These resources may consist of more hardware, software, or time, or a combination of the three.
Extra time is required to retransmit messages or to execute programs, extra software is required to perform diagnosis
on the hardware, and extra hardware is required to provide replication of units.

When designing for reliability, the primary goal of the designer is to find the best way to increase system reliability.
Accepted principles for doing this include:
1. to keep the system as simple as is compatible with performance requirements;
2. to increase the reliability of the components in the system;
3. to use parallel redundancy for the less reliable components;
4. to use standby redundancy (hot standby), where spare components can be switched in to replace active components when failures occur;
5. to use repair maintenance where failed components are replaced but not automatically switched in;
6. to use preventive maintenance such that components are replaced by new ones whenever they fail, or at
some fixed time interval, whichever comes first;
7. to use a better arrangement of exchangeable components; and
8. to use large safety factors or management programs for product improvement.

Note: The Institute of Electrical and Electronics Engineers (IEEE) sponsors an organization devoted to reliability in
engineering, the IEEE Reliability Society (IEEE RS). The Reliability Society promotes industry-wide acceptance of a
systematic approach to design that will help to ensure reliable products. To that end, they promote reliability not just in
engineering, but in maintenance and analysis as well. The Society encourages collaborative effort and information sharing

among its membership, which encompasses organizations and individuals involved in all areas of engineering, including
aerospace, transportation systems, medical electronics, computers, and communications.

3.9.1.2. Maintainability
A qualitative definition of maintainability M is given by Goldman and Slattery (1979) as:
“…. The characteristics (both qualitative and quantitative) of material design and installation which make it possible to meet operational objectives with a minimum expenditure of maintenance effort (manpower, personnel skill, test equipment, technical data, and maintenance support facilities) under operational environmental conditions in which scheduled and unscheduled maintenance will be performed”
More recently, maintainability has been described in MIL-HDBK-470A, dated 4 August 1997, “Designing and Developing Maintainable Products and Systems”, as:

“The relative ease and economy of time and resources with which an item can be retained in, or restored to, a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance and repair. In this context, it is a function of design.”

Based on the qualitative definitions, maintainability can also be expressed quantitatively by means of probability theory. Thus, quantitatively, according to Goldman and Slattery,
“… maintainability is a characteristic of design and installation which is expressed as the probability that an item will be restored to specified conditions within a given period of time when maintenance action is performed in accordance with prescribed procedures and resources.”
Mathematically, assuming exponentially distributed repair times, this can be expressed as:

   M(t) = 1 − e^(−t / MTTR)

where t is the specified time to repair and MTTR is the mean time to repair.
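A short numeric sketch of this relation is given below; the MTTR and repair-time values are assumed purely for illustration.

```c
#include <stdio.h>
#include <math.h>

/* Maintainability M(t) = 1 - exp(-t / MTTR), assuming exponentially
 * distributed repair times.  The MTTR and t values are hypothetical. */
static double maintainability(double t_hours, double mttr_hours)
{
    return 1.0 - exp(-t_hours / mttr_hours);
}

int main(void)
{
    double mttr = 2.0;                       /* assumed mean time to repair: 2 h */
    printf("M(1 h) = %.3f\n", maintainability(1.0, mttr));  /* ~0.393 */
    printf("M(4 h) = %.3f\n", maintainability(4.0, mttr));  /* ~0.865 */
    return 0;
}
```

As the allowed repair time t grows relative to the MTTR, the probability of completing the repair within t approaches one.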

The importance of focusing on maintainability is underlined by articles suggesting that roughly 38% of the life cycle cost is directed toward maintainability issues.

Design for maintainability requires a product that is serviceable (easily repaired) and supportable (cost-effectively kept in, or restored to, a usable condition); better yet, if the design also includes reliability (absence of failures), then you can have the best of all worlds.

Supportability has a design subset involving testability: a design characteristic that allows the status of an item to be verified and faults within the item to be isolated in a timely and effective manner, for example with built-in test equipment (BIT), so that the item can demonstrate its status (operable, inoperable, or degraded), and that supports routine troubleshooting and verification that the equipment has been restored to a useful condition following maintenance.

Maintainability is primarily a design parameter: the design for maintainability defines how long equipment will be down and unavailable. Downtime can be reduced by a highly trained workforce and a responsive supply system, which pace the speed of maintenance to achieve minimum downtime. Unavailability occurs when the equipment is down for periodic maintenance and for repairs. Unreliability is associated with failures of the system; the failures can be associated with planned or unplanned outages.

Maintainability is a true design characteristic. Attempts to improve the inherent maintainability of a product or item after the design is frozen are usually expensive, inefficient, and ineffective, as demonstrated so often in manufacturing plants when the first maintenance effort requires the use of a cutting torch to access the item requiring replacement.

Poor maintainability results in equipment that is often unavailable, expensive because of the cost of unreliability, and a source of irritation for all parties who touch the equipment or have responsibility for it.

3.9.1.3. Availability
Availability refers to the probability that a system will be operative (up), and is expressed as:

Ai = MTBF/(MTBF+MTTR)

The above equation is sometimes called the inherent availability equation. Inherent availability looks at availability from a design perspective. In this equation:
MTBF = mean time between failures
MTTR = mean time to repair
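As a quick numeric illustration of the inherent availability equation, the following sketch uses assumed MTBF and MTTR figures; the numbers are hypothetical.

```c
#include <stdio.h>

/* Inherent availability: Ai = MTBF / (MTBF + MTTR).
 * MTBF and MTTR below are assumed example values in hours. */
int main(void)
{
    double mtbf = 1000.0;   /* assumed mean time between failures */
    double mttr = 5.0;      /* assumed mean time to repair        */

    double ai = mtbf / (mtbf + mttr);
    printf("Ai = %.4f (%.2f%%)\n", ai, 100.0 * ai);   /* 0.9950 (99.50%) */
    return 0;
}
```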

Reliability and maintainability are considered complementary disciplines from the inherent availability equation. If mean
time between failure or mean time to failure (MTTF) is very large compared to the mean time to repair or mean time to
replace, then you will see high availability. Likewise if mean time to repair or replace is miniscule, then availability will be
high. As reliability decreases (i.e., MTTF becomes smaller), better maintainability (i.e., shorter MTTR) is needed to
achieve the same availability. Of course as reliability increases then maintainability is not so important to achieve the
same availability. Thus tradeoffs can be made between reliability and maintainability to achieve the same availability, and the two disciplines must work hand-in-hand to achieve the objectives. Ai is the largest availability value that can be observed, assuming the system never suffers any operational abuses.
The above quantitative definition of availability assumes a system model where all faults are immediately detected at the
time of their occurrence, and fault location and repair action are initiated immediately.
In the operational world we speak of the operational availability equation. Operational availability looks at availability by accounting for all of the abuses encountered in a practical system:

   Ao = MTBM/(MTBM+MDT)

where MDT = mean down time and MTBM = mean time between maintenance.

The mean time between maintenance includes all corrective and preventive actions (compared to MTBF, which only accounts for failures). The mean down time includes all time associated with the system being down for corrective maintenance (CM), including delays (compared to MTTR, which only addresses repair time), as well as self-imposed downtime for preventive maintenance (PM), although it is preferred to perform most PM actions while the equipment is operating. Ao is a smaller availability figure than Ai because of naturally occurring abuses and self-inflicted problems. The uptime and downtime concepts are illustrated in Figure 1.21 for constant values of availability. Figure 1.21 shows the difficulty of increasing availability from 99% to 99.9% (increase MTBM by one order of magnitude or decrease MDT by one order of magnitude) compared to improving availability from 85% to 90% (which requires improving MTBM by less than ~½ order of magnitude or decreasing MDT by ~¾ order of magnitude).
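The following sketch illustrates this point numerically with hypothetical MTBM and MDT values: starting from 99% availability, either dividing MDT by ten or multiplying MTBM by ten is needed to reach roughly 99.9%.

```c
#include <stdio.h>

/* Operational availability Ao = MTBM / (MTBM + MDT).
 * The MTBM/MDT figures are hypothetical and only illustrate why moving
 * from 99% to 99.9% availability requires roughly an order-of-magnitude
 * change in MTBM or MDT. */
static double avail(double mtbm, double mdt)
{
    return mtbm / (mtbm + mdt);
}

int main(void)
{
    double mtbm = 990.0, mdt = 10.0;                  /* baseline: Ao = 0.990 */
    printf("baseline:   Ao = %.4f\n", avail(mtbm, mdt));
    printf("MDT / 10:   Ao = %.4f\n", avail(mtbm, mdt / 10.0));   /* ~0.999 */
    printf("MTBM x 10:  Ao = %.4f\n", avail(mtbm * 10.0, mdt));   /* ~0.999 */
    return 0;
}
```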

Figure 1.21: Availability Relationships
Operational availability includes issues associated with the inherent design, availability of maintenance personnel, availability of spare parts, maintenance policy, and a host of other non-design issues (whereas inherent availability addresses only the inherent design); in short, all the abuses. Testability, the subset of maintainability/supportability, enters strongly into the MDT portion of the equation: it is used to clearly identify the status of an item, to know whether a fault exists, and to determine whether the item is dead, alive, or deteriorated; these issues always affect affordability. Operational availability depends upon operational maintainability, which includes factors entirely outside the design environment, such as an insufficient number of spare parts, slow procurement of equipment, poorly trained maintenance personnel, and a lack of proper tools and procedures to perform the maintenance actions. Achieving excellent operational maintainability requires sound planning, engineering, design, test, excellent manufacturing conformance, and an adequate support system (logistics) for spare parts, people, training, etc., incorporating lessons learned from previous or similar equipment.
