
This course introduces embedded system design using a modern approach.


Modern design requires a designer to have a
unified view of software and hardware, seeing
them not as completely different domains, but
rather as two implementation options based on
their design metrics (cost, performance, power,
flexibility, etc.).

2/21/2022 ESD UNIT 1 1


Three important trends have made such a unified view
possible.
➢ First, integrated circuit (IC) capacities have increased
to the point that both software processors and custom
hardware processors now commonly coexist on a
single IC.
➢ Second, quality compiler availability and average
program sizes have increased to the point that C
compilers (and even C++ or in some cases Java) have
become commonplace in embedded systems.

➢ Third, synthesis technology has advanced to the point that
synthesis tools have become commonplace in the design of digital
hardware. They allow the designer to describe desired processing in
a high-level programming language, and they then automatically
generate an efficient (in this case custom-hardware) processor
implementation.
➢ The second and third trends enable the unified design of software
and hardware, by turning embedded system design, at its highest
level, into the problem of selecting (for software), designing (for
hardware), and integrating processors.



Chapter 1: Introduction



Outline

• Embedded systems overview
– What are they?
• Design challenge – optimizing design metrics
• Technologies
– Processor technologies
– IC technologies
– Design technologies
Embedded systems overview

• Computing systems are everywhere


• Most of us think of “desktop” computers
– PC’s
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...

Embedded systems overview

• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
[Figure callouts: “Computers are in here... and here... and even here...”; “Lots more of these, though they cost a lot less each.”]
A “short list” of embedded systems
Anti-lock brakes
Auto-focus cameras
Automatic teller machines
Automatic toll systems
Automatic transmission
Avionic systems
Battery chargers
Camcorders
Cell phones
Cell-phone base stations
Cordless phones
Cruise control
Curbside check-in systems
Digital cameras
Disk drives
Electronic card readers
Electronic instruments
Electronic toys/games
Factory control
Fax machines
Fingerprint identifiers
Home security systems
Life-support systems
Medical testing systems
Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCR’s, DVD players
Video game consoles
Video phones
Washers and dryers

And the list goes on and on


Some common characteristics of embedded
systems
• Single-functioned
– Executes a single program, repeatedly
• Tightly-constrained design metrics
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real-time without delay

An embedded system example -- a digital camera
[Figure: digital camera chip block diagram with lens, CCD, A2D, CCD preprocessor, Pixel coprocessor, D2A, JPEG codec, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent
➢ The A2D and D2A circuits convert analog images to digital and
digital to analog, respectively.
➢ The CCD pre-processor is a charge-coupled device pre-processor.
➢ The JPEG codec compresses and decompresses an image using the
JPEG compression standard, enabling compact storage in the
limited memory of the camera.
➢ The Pixel coprocessor aids in rapidly displaying images.
➢ The Memory controller controls access to a memory chip also
found in the camera, while the DMA controller enables direct
memory access without requiring the use of the microcontroller.
➢ The UART enables communication with a PC’s serial port for
uploading video frames, while the ISA bus interface enables a
faster connection with a PC’s ISA bus.
➢ The LCD ctrl and Display ctrl circuits control the display of
images on the camera’s liquid-crystal display device.
➢ A Multiplier/Accum circuit assists with certain digital signal
processing.
➢ At the heart of the system is a microcontroller, which is a
processor that controls the activities of all the other circuits.

This example illustrates some of the embedded system
characteristics described above.
➢ First, it performs a single function repeatedly. The system
always acts as a digital camera, wherein it captures,
compresses and stores frames, decompresses and displays
frames, and uploads frames.
➢ Second, it is tightly constrained. The system must be low
cost. It must be small so that it fits within a standard-sized
camera. It must be fast so that it can process numerous
images in milliseconds. It must consume little power so that
the camera’s battery will last a long time.
➢ However, this particular system does not possess a high
degree of the characteristic of being reactive and real-time, as
it only needs to respond to the pressing of buttons by a user.
Design challenge – optimizing design metrics

• Obvious design goal:


– Construct an implementation with desired
functionality
• Key design challenge:
– Simultaneously optimize numerous design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge

Design challenge – optimizing design metrics
1. Unit cost: The cost of manufacturing each copy of the
system, excluding NRE cost.
2. NRE cost (Non-Recurring Engineering cost): The cost of
designing the system. Once the system is designed, any
number of units can be manufactured without incurring any
additional design cost (hence the term “non-recurring”).
3. Size: The physical space required by the system, often
measured in bytes for software, and gates or transistors for
hardware.
4. Performance: The execution time or throughput of the
system.

5. Power: The amount of power consumed by the system,
which determines the lifetime of a battery, or the cooling
requirements of the IC.
6. Flexibility: The ability to change the functionality of the
system without incurring heavy NRE cost. Software is typically
considered very flexible.
7. Time-to-market: The amount of time required to design and
manufacture the system to the point the system can be sold to
customers.
8. Time-to-prototype: The amount of time to build a working
version of the system, which may be bigger or more expensive
than the final system implementation, but can be used to verify
the system’s usefulness and correctness and to refine the
system's functionality.
9. Correctness: Confidence that the system’s functionality has been
implemented correctly. We can check the functionality throughout
the process of designing the system, and we can insert test
circuitry to check that manufacturing was correct.
10. Safety: The probability that the system will not cause
harm.

Design metric competition -- improving one
may worsen others
• Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: the competing design metrics Power, Performance, Size and NRE cost, shown alongside the digital camera chip block diagram partitioned into Hardware and Software]
Time-to-market: a demanding design metric

• Time required to develop a product to the point it can be sold to customers
• Market window
– Period during which the product would have highest sales
• Average time-to-market constraint is about 8 months
• Delays can be costly
[Figure: market window curve, Revenues ($) versus Time (months)]
Losses due to delayed market entry

• Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
• Loss
– The difference between the on-time and delayed triangle areas
[Figure: on-time and delayed revenue triangles, Revenues ($) versus Time; the market rises to peak revenue at W and falls to zero at 2W; on-time entry at time 0, delayed entry at time D with a lower peak revenue]
Losses due to delayed market entry (cont.)

[Figure: on-time and delayed revenue triangles, Revenues ($) versus Time]
• Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W) * (W-D)
• Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
• Try some examples
– Lifetime 2W=52 wks, delay D=4 wks
– (4*(3*26-4)) / (2*26^2) = 22%
– Lifetime 2W=52 wks, delay D=10 wks
– (10*(3*26-10)) / (2*26^2) = 50%
– Delays are costly!
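The triangle model can be checked numerically. The following is a minimal sketch in Python, assuming the simplified revenue model from the slide (both triangles have 45-degree slopes, so the peak height equals the rise time); the function name `percentage_revenue_loss` is illustrative and not part of the course material.

```python
def percentage_revenue_loss(w: float, d: float) -> float:
    """Percentage of revenue lost when market entry is delayed by d.

    w is half the product lifetime (the market peaks at W); d uses the same
    time unit as w. Assumes the simplified triangle model from the slide.
    """
    on_time = 0.5 * (2 * w) * w            # base 2W, height W
    delayed = 0.5 * (w - d + w) * (w - d)  # base 2W - D, height W - D
    return (on_time - delayed) / on_time * 100.0


# The two examples from the slide: lifetime 2W = 52 weeks.
for delay_weeks in (4, 10):
    loss = percentage_revenue_loss(26, delay_weeks)
    print(f"D = {delay_weeks} wks: about {loss:.0f}% revenue loss")
```

Subtracting the two areas and dividing by the on-time area gives the same value as the closed form D(3W-D) / (2W^2), so the sketch reproduces the 22% and 50% figures.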

NRE and unit cost metrics
• Costs:
– Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): The one-time monetary
cost of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units
= (NRE cost / # of units) + unit cost
• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300

Amortizing NRE cost over the units results in an additional $200 per unit
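The two cost formulas can be written out directly. This is a small Python sketch of the arithmetic in the example; the function names are illustrative choices, not terms from the slides.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """Per-product cost = total cost / number of units."""
    return total_cost(nre, unit_cost, units) / units


# Example from the slide: NRE = $2000, unit cost = $100, 10 units.
print(total_cost(2000, 100, 10))        # 3000
print(per_product_cost(2000, 100, 10))  # 300.0
```

Raising the number of units shrinks the amortized NRE share, which is why high-NRE technologies only pay off at high volume.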

NRE and unit cost metrics
• Compare technologies by costs -- best depends on quantity
– Technology A: NRE=$2,000, unit=$100
– Technology B: NRE=$30,000, unit=$30
– Technology C: NRE=$100,000, unit=$2

• But, must also consider time-to-market
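To see why the best technology depends on quantity, the comparison can be sketched in Python using the NRE and unit costs listed for Technologies A, B and C. The helper names `cheapest` and `break_even_units` are hypothetical, and the sketch considers cost only, not time-to-market.

```python
# (NRE cost, unit cost) in dollars, taken from the slide.
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}


def total_cost(name: str, units: int) -> int:
    nre, unit_cost = TECHNOLOGIES[name]
    return nre + unit_cost * units


def cheapest(units: int) -> str:
    """Return the technology with the lowest total cost for this quantity."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(name, units))


def break_even_units(low_nre: str, high_nre: str) -> float:
    """Quantity at which the two technologies have equal total cost."""
    nre1, unit1 = TECHNOLOGIES[low_nre]
    nre2, unit2 = TECHNOLOGIES[high_nre]
    return (nre2 - nre1) / (unit1 - unit2)


for quantity in (100, 1_000, 10_000):
    print(quantity, "units ->", cheapest(quantity))
print("A/B break-even:", break_even_units("A", "B"))
print("B/C break-even:", break_even_units("B", "C"))
```

Under these numbers, A is cheapest below 400 units, B between 400 and 2,500 units, and C above 2,500 units.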


The performance design metric
• Widely-used measure of system, widely-abused
– Clock frequency, instructions per second – not good measures
– Digital camera example – a user cares about how fast it processes
images, not clock speed or instructions per second
• Latency (response time)
– Time between task start and end
– e.g., Cameras A and B process images in 0.25 seconds
• Throughput
– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to
concurrency, e.g. Camera B may process 8 images per second (by
capturing a new image while previous image is being stored).
• Speedup of B over A = B’s performance / A’s performance
– Throughput speedup = 8/4 = 2
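The relationship between latency, concurrency and throughput can be sketched as follows. This Python example assumes, for illustration only, that Camera B keeps two images in flight at once (capturing one while storing another); the parameter `images_in_flight` is a modelling assumption, not something stated on the slide.

```python
def throughput(latency_s: float, images_in_flight: int = 1) -> float:
    """Images completed per second when each image takes latency_s seconds
    and images_in_flight images are processed concurrently."""
    return images_in_flight / latency_s


def speedup(performance_b: float, performance_a: float) -> float:
    """Speedup of B over A = B's performance / A's performance."""
    return performance_b / performance_a


camera_a = throughput(0.25)                      # one image at a time
camera_b = throughput(0.25, images_in_flight=2)  # overlaps capture and store
print(camera_a, camera_b, speedup(camera_b, camera_a))
```

Both cameras have the same 0.25 second latency, yet B's throughput is twice A's, which is why latency alone does not determine throughput.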

Three key embedded system technologies

• Technology
– A manner of accomplishing a task, especially using technical
processes, methods, or knowledge
• Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology

EMBEDDED PROCESSOR TECHNOLOGY
Processor technology involves the architecture of the computation
engine used to implement a system’s desired functionality. While
the term “processor” is usually associated with programmable
software processors, nonprogrammable digital systems can also be
regarded as processors.
The application requires a specific embedded functionality, such as
the summing of the items in an array (represented as a cross in the
figure). Several types of processors can implement this functionality.
• Processors vary in their customization for the problem at hand
Desired functionality:
total = 0
for i = 1 to N loop
    total += M[i]
end loop
[Figure: the desired functionality implemented on a general-purpose processor, an application-specific processor, and a single-purpose processor]
General-purpose processors
• Programmable device used in a variety of applications
– Also known as “microprocessor”
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– High flexibility
• “Pentium” the most well-known, but there are hundreds of others
[Figure: general-purpose processor with a controller (control logic and state register, IR, PC) and a datapath (register file, general ALU), plus a program memory holding assembly code for “total = 0, for i = 1 to …” and a data memory]
Design-metric benefits in using a general-purpose processor in an
embedded system
➢ Design time and NRE cost are low, because the designer must
only write a program, but need not do any digital design.
➢ Flexibility is high, because changing functionality requires only
changing the program.
➢ Unit cost may be relatively low for small quantities, since the
processor manufacturer sells large quantities to other customers and
hence distributes the NRE cost over many units.
➢ Performance may be fast for computation-intensive applications,
if using a fast processor, due to advanced architecture features and
leading edge IC technology.
Design-metric drawbacks in using a general-purpose processor
➢ Unit cost may be too high for large quantities.
➢ Performance may be slow for certain applications.
➢ Size and power may be large due to unnecessary processor
hardware.
Single-purpose processors

• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– Contains only the components needed to execute a single program
– No program memory
• Benefits
– Fast
– Low power
– Small size
[Figure: single-purpose processor with a controller (control logic, state register) and a datapath (index and total registers, adder), plus a data memory]
Design metric benefits
➢ Performance may be fast.
➢ Size and power may be small.
➢ Unit cost may be low for large quantities.

Design metric drawbacks
➢ Design time and NRE costs may be high.
➢ Flexibility is low.
➢ Unit cost may be high for small quantities.
➢ Performance may not match general-purpose
processors for some applications.
Application-specific processors

• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: application-specific processor with a controller (control logic and state register, IR, PC) and a datapath (registers, custom ALU), plus a program memory holding assembly code for “total = 0, for i = 1 to …” and a data memory]
Design metric benefits
➢ Using an application-specific instruction-set processor (ASIP)
in an embedded system can provide the benefit of flexibility
while still achieving good performance, power and size.

Design metric drawbacks
➢ However, such processors can require large NRE
cost to build the processor itself, and to build a
compiler, if these items don’t already exist.
➢ Due to the lack of retargetable compilers that can
exploit the unique features of a particular ASIP,
designers using ASIPs often write much of the
software in assembly language.
IC technology
• The manner in which a digital (gate-level) implementation
is mapped onto an IC
– IC: Integrated circuit, or “chip” is a semiconductor device
consisting of a set of connected transistors and other devices.
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
➢ The bottom layers form the transistors.
➢ The middle layers form logic gates.
➢ The top layers connect these gates with wires.
– IC technologies differ with respect to who builds each layer
and when
One way to create these layers is by depositing photo-sensitive
chemicals on the chip surface and then shining light through
masks to change regions of the chemicals. The narrowest line
that we can create on a chip is called the feature size, which is
well below one micrometer (sub-micron).

IC technology
• Three types of IC technologies
– Full-custom/VLSI
– Semi-custom ASIC (gate array and standard cell)
– PLD (Programmable Logic Device)
Full-custom/VLSI
All layers are optimized for an embedded system’s
particular digital implementation, which includes:
➢ Placing the transistors to minimize interconnection
lengths.
➢ Sizing the transistors to optimize signal transmissions.
➢ Routing wires among the transistors.
➢ After completing all the masks, the designer sends the mask
specifications to a fabrication plant that builds the actual IC.
➢ Full-custom IC design is often referred to as VLSI (Very
Large Scale Integration) design.
➢ It is usually used only in high-volume or extremely
performance-critical applications.
Benefits:
Excellent performance with small size and power.
Drawbacks:
High NRE cost and long turnaround times (typically
months) before the IC becomes available.
Semi-custom ASIC (gate array and standard cell)
• In an ASIC technology, the lower layers are fully or partially
built
– Designers are left with routing of wires and maybe placing
some blocks. In a gate array technology, the masks for the
transistor and gate levels are already built (i.e., the IC
already consists of arrays of gates). The remaining task is
to connect these gates to achieve our particular
implementation.
• Benefits
– Good performance, good size, less NRE cost than a full-
custom implementation (perhaps $10k to $100k)
• Drawbacks
– Still requires weeks to months to develop
PLD (Programmable Logic Device)
• All layers already exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to
implement desired functionality
– Field-Programmable Gate Array (FPGA) very popular
• PLDs are divided into two types, simple and complex.
• One type of simple PLD is a PLA (Programmable Logic Array),
which consists of a programmable array of AND gates and a
programmable array of OR gates.
• Another type is a PAL (Programmable Array Logic), which
uses just one programmable array to reduce the number of
expensive programmable components.
• One type of complex PLD is the FPGA (Field Programmable
Gate Array), which offers more general connectivity among
blocks of logic, rather than just arrays of logic as with PLAs
and PALs, and is thus able to implement far more complex
designs.
• Benefits
– Low NRE costs, almost instant IC availability
• Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry,
slower.
➢ In general a company marketing a commercial general-purpose
processor might first market a semi-custom implementation to
reach the market early, and then later introduce a full-custom
implementation.
➢ They might also first map the processor to an older but more
reliable technology, like 0.2 micron, and then later map it to a
newer technology, like 0.08 micron.
➢ These two evolutions of mappings explain why a processor’s
clock speed improves on the market over time.
➢ Furthermore, we often implement multiple processors of different
types on the same IC. The IC in the digital camera example
included a microcontroller (general-purpose processor) plus
numerous single-purpose processors on the same IC. This is
known as a system on chip (SoC).
Moore’s law

• The most important trend in embedded systems
– Predicted in 1965 by Intel co-founder Gordon Moore
– IC transistor capacity has doubled roughly every 18 months
for the past several decades
[Figure: logic transistors per chip (in millions) over time, from 0.001 to 10,000 on a logarithmic scale]
Moore’s law

• Wow
– This growth rate is hard to imagine, most people
underestimate
– How many ancestors do you have from 20 generations ago?
• i.e., roughly how many people alive in the 1500’s did it take to make
you?
• 2^20 = more than 1 million people
– (This underestimation is the key to pyramid schemes!)
Graphical illustration of Moore’s law

[Figure: timeline from 1981 to 2002 in three-year steps; the leading edge chip in 1981 has 10,000 transistors, the leading edge chip in 2002 has 150,000,000 transistors]
• Something that doubles frequently grows more quickly than most people realize!
– A 2002 chip can hold about 15,000 1981 chips inside itself
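The doubling arithmetic behind these slides can be checked with a short Python sketch. It assumes an exact 18-month doubling period, which is only the rough rate quoted above, and the function name `projected_transistors` is illustrative.

```python
def projected_transistors(start_count: int, start_year: int, end_year: int,
                          doubling_months: float = 18) -> float:
    """Project transistor capacity assuming it doubles every doubling_months."""
    doublings = (end_year - start_year) * 12 / doubling_months
    return start_count * 2 ** doublings


# 1981 to 2002 is 21 years, i.e. 14 doublings at 18 months each.
projection_2002 = projected_transistors(10_000, 1981, 2002)
print(f"{projection_2002:,.0f} transistors projected for 2002")

# Ancestors 20 generations back: two parents per generation.
ancestors = 2 ** 20
print(f"{ancestors:,} ancestors")

# How many 1981 chips fit in the 150,000,000-transistor 2002 chip.
print(150_000_000 // 10_000)
```

The projection of roughly 164 million transistors is in the same range as the 150 million quoted for the 2002 leading edge chip, so the 18-month figure is consistent with the two data points on the slide.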

Design Technology
• The manner in which we convert our concept of desired system
functionality into an implementation

The designer refines the system through several abstraction levels.
➢ At the system level, the designer describes the desired
functionality in some language, preferably an executable
language like C; and this is the system specification.
➢ The designer refines this specification by distributing portions of
it among chosen processors (general or single purpose), yielding
behavioural specifications for each processor.
➢ The designer refines these specifications into register-transfer
(RT) specifications by converting behaviour on general-purpose
processors to assembly code (software), and by converting
behaviour on single-purpose processors to a connection of
register-transfer components and state machines.
➢ The designer then refines the register-transfer-level
specification of a single-purpose processor into a logic
specification consisting of Boolean equations.
➢ Finally, the designer refines the remaining specifications
into an implementation, consisting of machine code for
general-purpose processors, and a gate-level net list for
single-purpose processors.
The following three approaches are used to improve the
design process for increased productivity.
Compilation/Synthesis
Libraries/IP
Test/Verification
Compilation/Synthesis
Compilation/Synthesis allows the designer to specify desired
functionality in an abstract manner, and automatically generates
lower-level implementation details.
➢ A system synthesis tool converts an abstract system specification
into a set of sequential programs on general and single-purpose
processors.
➢ A behavioural synthesis tool converts a sequential program into
finite-state machines and register transfers.
➢ A register-transfer (RT) synthesis tool converts finite-state
machines and register-transfers into a datapath of RT
components and a controller of Boolean equations.
➢ A logic synthesis tool converts Boolean expressions into a
connection of logic gates (called a netlist).
➢ The recent maturation of RT and behavioural synthesis tools has
enabled a unified view of the design process for single-purpose
processors (“hardware design”) and general-purpose processors
(“software design”).
➢ In the past, the design processes were radically different:
software designers wrote sequential programs, while
hardware designers connected components.
➢ But today, synthesis tools have converted the hardware
design process essentially into one of writing sequential
programs with some knowledge of how the hardware will be
synthesized.
Libraries/IP
Libraries involve reuse of pre-existing implementations. Using
libraries of existing implementations can improve productivity
if the time it takes to find, acquire, integrate and test a library
item is less than that of designing the item oneself.
➢ A logic-level library may consist of layouts for gates and
cells.
➢ An RT-level library may consist of layouts for RT
components, like registers, multiplexors, decoders, and
functional units.
➢ A behavioural-level library may consist of commonly used
components, such as compression components, bus
interfaces, display controllers, and even general-purpose
processors.
➢ Rather than being separate ICs, these components are now
available in a “soft” form, called cores or Intellectual Property (IP),
that we can implement on just one portion of an IC.
➢ Finally, a system-level library might consist of complete systems
solving particular problems, such as an interconnection of
processors with accompanying operating systems and programs to
implement an interface to the Internet over an Ethernet network.
Test/Verification
Test/Verification involves ensuring that functionality is correct.
Simulation is the most common method of testing for correct
functionality.
➢ At the logic level, gate-level simulators provide output signal
timing waveforms for given input signal waveforms.
➢ Likewise, general-purpose processor simulators execute machine
code.
➢ At the RT level, HDL simulators execute RT-level descriptions
and provide output waveforms for given input waveforms.
➢ At the behavioural level, HDL simulators simulate sequential
programs, and co-simulators connect HDL and general-purpose
processor simulators to enable hardware/software co-verification.
➢ At the system level, a model simulator simulates the initial
system specification using an abstract computation model,
independent of any processor technology, to verify correctness
and completeness of the specification.
Design productivity exponential increase
[Figure: productivity in (K) Trans./Staff-Mo. on a logarithmic scale from 0.01 to 100,000, plotted for the years 1983 to 2009]
• Exponential increase over the past few decades
The co-design ladder

• In the past:
– Hardware and software design technologies were very different
– Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software “codesign”
[Figure: the co-design ladder. Both sides start from sequential program code (e.g., C, VHDL). Software side: compilers (1960's, 1970's) produce assembly instructions; assemblers and linkers (1950's, 1960's) produce machine instructions; the implementation is a microprocessor plus program bits: “software”. Hardware side: behavioral synthesis (1990's) produces register transfers; RT synthesis (1980's, 1990's) produces logic equations / FSM's; logic synthesis (1970's, 1980's) produces logic gates; the implementation is a VLSI, ASIC, or PLD implementation: “hardware”.]

The choice of hardware versus software for a particular function is simply a tradeoff among various
design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no
fundamental difference between what hardware or software can implement.

Independence of processor and IC
technologies
• Basic tradeoff
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent

[Figure: two parallel spectra running from general to customized. Processor technology: general-purpose processor, ASIP, single-purpose processor. IC technology: PLD, semi-custom, full-custom.]
– General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
– Customized, providing improved: power efficiency, performance, size, cost (high volume)
Design productivity gap

• While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
[Figure: IC capacity in logic transistors per chip (in millions, 0.001 to 10,000) and productivity in (K) Trans./Staff-Mo. (0.01 to 100,000) on logarithmic scales, with a widening gap between the two curves]
Design productivity gap

• 1981 leading edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
• 2002 leading edge chip requires 30,000 designer months
– 150,000,000 / 5000 transistors/month
• Designer cost increase from $1M to $300M
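The designer-month figures follow from dividing chip size by productivity. The Python sketch below reproduces them; the cost of $10,000 per designer month is not stated on the slide but is the rate implied by "$1M for 100 designer months", so treat it as an assumption.

```python
def designer_months(transistors: int, transistors_per_month: int) -> float:
    """Effort needed to design a chip at a given per-designer productivity."""
    return transistors / transistors_per_month


COST_PER_DESIGNER_MONTH = 10_000  # assumed: implied by $1M / 100 designer months

effort_1981 = designer_months(10_000, 100)
effort_2002 = designer_months(150_000_000, 5_000)

print(effort_1981, effort_1981 * COST_PER_DESIGNER_MONTH)
print(effort_2002, effort_2002 * COST_PER_DESIGNER_MONTH)
```

Productivity rose 50-fold between the two chips, but capacity rose 15,000-fold, so the required effort still grew by a factor of 300.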
[Figure: IC capacity in logic transistors per chip (in millions) versus productivity in (K) Trans./Staff-Mo., showing the gap between the two curves]
The mythical man-month

• The situation is even worse than the productivity gap indicates
• In theory, adding designers to a team reduces project completion time
• In reality, productivity per designer decreases due to complexities of team management
and communication
• In the software community, known as “the mythical man-month” (Brooks 1975)
• At some point, can actually lengthen project completion time! (“Too many cooks”)
• 1M transistors, 1 designer = 5000 trans/month
• Each additional designer reduces productivity by 100 trans/month per designer
• So 2 designers produce 4900 trans/month each
[Figure: individual and team productivity (0 to 60,000 trans/month) and months until completion, plotted against number of designers (0 to 40); the labeled completion times are 15, 16, 16, 18, 19, 23, 24 and 43 months]
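The team-size model on this slide can be sketched in a few lines of Python. It assumes the stated numbers (a 1M-transistor design, 5000 transistors/month for a lone designer, and a loss of 100 transistors/month per designer for each additional team member); the function names are illustrative.

```python
def per_designer_rate(designers: int, base_rate: int = 5000,
                      penalty: int = 100) -> int:
    """Transistors per month each designer produces in a team of this size."""
    return base_rate - penalty * (designers - 1)


def months_to_complete(designers: int, transistors: int = 1_000_000) -> float:
    """Project duration for a team of the given size under the slide's model."""
    team_rate = designers * per_designer_rate(designers)
    if team_rate <= 0:
        raise ValueError("model breaks down: team productivity is not positive")
    return transistors / team_rate


for team_size in (1, 5, 10, 15, 20, 25, 30, 35, 40):
    print(f"{team_size:2d} designers: {months_to_complete(team_size):6.1f} months")

# Team size in 1..40 with the shortest completion time under this model.
best_size = min(range(1, 41), key=months_to_complete)
print("shortest schedule at", best_size, "designers")
```

Under this model the schedule shortens up to about 25 designers and lengthens beyond that, which is the "too many cooks" effect; the values at 5 to 40 designers appear to correspond to the completion times labeled in the figure.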

Summary

• Embedded systems are everywhere


• Key challenge: optimization of design metrics
– Design metrics compete with one another
• A unified view of hardware and software is necessary
to improve productivity
• Three key technologies
– Processor: general-purpose, application-specific, single-
purpose
– IC: Full-custom, semi-custom, PLD
– Design: Compilation/synthesis, libraries/IP, test/verification