You are on page 1of 58

Embedded Computer Systems

Digital System v. Embedded System


• Digital System: may provide service
• as a self-contained unit (e.g., desktop PC)
• as part of a larger system (e.g., digital control
system for manufacturing plant)
• Embedded System:
• part of a larger unit
• provides dedicated service to that unit

Lect-01.2
Embedded Systems Overview
• Computing systems are everywhere
• Most of us think of “desktop” computers
• PC’s
• Laptops
• Mainframes
• Servers
• But there’s another type of computing system
• Far more common...

Lect-01.3
Embedded Systems Overview (cont.)
• Embedded computing systems
• Computing systems embedded
within electronic devices Computers are in here...

• Hard to define. Nearly any computing and here...

system other than a desktop


and even here...
computer
• Billions of units produced yearly,
versus millions of desktop units
• Perhaps 100s per household and per
automobile
Lots more of these,
though they cost a lot
less each.

Lect-01.4
A “Short List” of Embedded Systems

Anti-lock brakes Modems


Auto-focus cameras MPEG decoders
Automatic teller machines Network cards
Automatic toll systems Network switches/routers
Automatic transmission On-board navigation
Avionic systems Pagers
Battery chargers Photocopiers
Camcorders Point-of-sale systems
Cell phones Portable video games
Cell-phone base stations Printers
Cordless phones Satellite phones
Cruise control Scanners
Curbside check-in systems Smart ovens/dishwashers
Digital cameras Speech recognizers
Disk drives Stereo systems
Electronic card readers Teleconferencing systems
Electronic instruments Televisions
Electronic toys/games Temperature controllers
Factory control Theft tracking systems
Fax machines TV set-top boxes
Fingerprint identifiers VCR’s, DVD players
Home security systems Video game consoles
Life-support systems Video phones
Medical testing systems Washers and dryers

And the list goes on and on


Lect-01.5
Examples of Embedded Systems
• PC having dedicated software programs and
appropriate interfaces to a manufacturing
assembly line
• Microprocessor dedicated to a control function
in a computer, e.g., keyboard/mouse input
control

Lect-01.6
Outline
• Embedded systems overview
• Design challenge – optimizing design metrics
• Technologies
• Processor technologies
• Design technologies
• Generic codesign methodology

Lect-01.7
Some Application Domains
• CONSUMER PRODUCTS • Local
• Appliances, Games, A/V,
• e.g., appliance
Intelligent home devices
• TRANSPORTATION • Locally distributed
• Autos, Trains, Ships, • e.g., aircraft
Aircrafts control over a
• PLANT CONTROL LAN
• Manufacturing, Chemical, • Geographically
Power Generation
distributed
• NETWORKS
• e.g., telephone
• Telecommunication,
Defense
network

Lect-01.8
Parts of an Embedded System

USER EMBEDDED SYSTEM

I/O

MEMORY PROCESSOR

HARDWIRED UNIT
• Application-specific logic
• Timers
• A/D and D/A conversion

ENVIRONMENT
Lect-01.9
Parts of an Embedded System (cont.)
• Actuators - mechanical components (e.g.,
valve)
• Sensors - input data (e.g., accelerometer for
airbag control)
• Data conversion, storage, processing
• Decision-making

• Range of implementation options


• Single-chip implementation: system on a chip

Lect-01.10
Functions and Design Criteria
• Monitoring and control functions for the overall
system (e.g., vehicle control)
• Information-processing functions (e.g.,
telecommunication system -- data
compression, routing, etc.)

• Criteria: performance, reliability, availability,


safety, usability, etc.

Lect-01.11
Some Common Characteristics
• Single-functioned
• Executes a single program, repeatedly
• Tightly-constrained
• Low cost, low power, small, fast, etc.
• Reactive and real-time
• Continually reacts to changes in the system’s
environment
• Must compute certain results in real-time
without delay

Lect-01.12
An Embedded System Example
• Digital Camera
Digital camera chip
CCD

CCD preprocessor Pixel coprocessor D2A


A2D

lens

JPEG codec Microcontroller Multiplier/Accum

DMA controller Display ctrl

Memory controller ISA bus interface UART LCD ctrl

• Single-functioned -- always a digital camera


• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent

Lect-01.13
Design Challenge – Optimization
• Obvious design goal:
• Construct an implementation with desired
functionality
• Key design challenge:
• Simultaneously optimize numerous design
metrics
• Design metric
• A measurable feature of a system’s
implementation
• Optimizing design metrics is a key challenge

Lect-01.14
Design Challenge – Optimization (cont.)

• Common metrics
• Unit cost: the monetary cost of manufacturing each
copy of the system, excluding NRE cost
• NRE cost (Non-Recurring Engineering
cost): The one-time monetary cost of designing the
system
• Size: the physical space required by the system
• Performance: the execution time or throughput of
the system
• Power: the amount of power consumed by the
system
• Flexibility: the ability to change the functionality of
the system without incurring heavy NRE cost
Lect-01.15
Design Challenge – Optimization (cont.)

• Common metrics (continued)


• Time-to-prototype: the time needed to build a
working version of the system
• Time-to-market: the time required to develop a
system to the point that it can be released and sold to
customers
• Maintainability: the ability to modify the system
after its initial release
• Correctness, safety, many more

Lect-01.16
Design Metric Competition
• Expertise with both
Power software and hardware
is needed to optimize
design metrics
Performance Size
• Not just a hardware or
software expert, as is
NRE cost
common
CCD
Digital camera chip • A designer must be
A2D
CCD preprocessor Pixel coprocessor D2A
comfortable with
lens
JPEG codec Microcontroller Multiplier/Accum various technologies
DMA controller Display ctrl Hardware

Memory controller ISA bus interface UART LCD ctrl


Software

Lect-01.17
Time-to Market
• Time required to develop
a product to the point it
can be sold to
customers
• Market window
Revenues ($)

• Period during which the


product would have
highest sales
• Average time-to-market
Time (months)
constraint is about 8
months
• Delays can be costly

Lect-01.18
Delayed Market Entry
• Simplified revenue
model
• Product life = 2W, peak
Peak revenue at W
Peak revenue from • Time of market entry
delayed entry
Revenues ($)

On-time defines a triangle,


Market
rise
Market representing market
fall penetration
Delayed
• Triangle area equals
D W 2W revenue
On-time Delayed
entry entry
Time • Loss
• The difference between
the on-time and
delayed triangle areas
Lect-01.19
NRE and Unit Cost Metrics
• Costs:
• Unit cost: the monetary cost of manufacturing each copy of the
system, excluding NRE cost
• NRE cost (Non-Recurring Engineering cost): the one-time
monetary cost of designing the system
• total cost = NRE cost + unit cost * # of units
• per-product cost = total cost / # of units
= (NRE cost / # of units) + unit cost

• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an
additional $200 per unit

Lect-01.20
NRE and unit cost metrics
• Compare technologies by costs -- best depends on
quantity
• Technology A: NRE=$2,000, unit=$100
• Technology B: NRE=$30,000, unit=$30
• Technology C: NRE=$100,000, unit=$2
$200,000 $200
A A
B B
$160,000 $160
C C
tota l c ost (x1000)

p er p rod uc t c ost
$120,000 $120

$80,000 $80

$40,000 $40

$0 $0
0 800 1600 2400 0 800 1600 2400
Numb er of units (volume) Numb er of units (volume)

• But, must also consider time-to-market


Lect-01.21
The Performance Design Metric
• Widely-used measure of system, widely-abused
• Clock frequency, instructions per second – not good measures
• Digital camera example – a user cares about how fast it processes
images, not clock speed or instructions per second
• Latency (response time)
• Time between task start and end
• e.g., Camera’s A and B process images in 0.25 seconds
• Throughput
• Tasks per second, e.g. Camera A processes 4 images per second
• Throughput can be more than latency seems to imply due to
concurrency, e.g. Camera B may process 8 images per second (by
capturing a new image while previous image is being stored).
• Speedup of B over S = B’s performance / A’s
performance
• Throughput speedup = 8/4 = 2

Lect-01.22
Three Key Technologies
• Technology
• A manner of accomplishing a task, especially
using technical processes, methods, or
knowledge
• Three key technologies for embedded systems
• Processor technology (CprE 581, 583, 681)
• IC technology (EE 501, 507, 511)
• Design technology (CprE 588)

Lect-01.23
Processor Technology
• The architecture of the computation engine used to
implement a system’s desired functionality
• Processor does not have to be programmable
• “Processor” not equal to general-purpose processor

Controller Datapath Controller Datapath Controller Datapath


Control index
Control Register Control logic Registers
logic
logic and file and State total
State register register
Custom State
+
ALU register
General
IR PC ALU IR PC
Data Data
memory memory
Program Data Program memory
memory memory
Assembly code Assembly code
for: for:

total = 0 total = 0
for i =1 to … for i =1 to …
General-purpose (“software”) Application-specific Single-purpose (“hardware”)

Lect-01.24
Processor Technology (cont.)

• Processors vary in their customization for the


problem at hand
total = 0
for i = 1 to N loop
total += M[i]
end loop
Desired
functionality

General-purpose Application-specific Single-purpose


processor processor processor

Lect-01.25
General-Purpose Processors
• Programmable device used in a
variety of applications
Controller Datapath
• Also known as “microprocessor”
Control Register
• Features logic and file
State register
• Program memory
General
• General datapath with large IR PC ALU
register file and general ALU
• User benefits Program
memory
Data
memory
• Low time-to-market and NRE Assembly code
costs for:

• High flexibility total = 0


for i =1 to …

• “Intel/AMD” the most well-known,


but there are hundreds of others
Lect-01.26
Application-Specific Processors
• Programmable processor optimized
for a particular class of applications Controller Datapath

having common characteristics Control Registers


logic and
• Compromise between general- State register
Custom
purpose and single-purpose ALU
processors IR PC

Data
• Features Program memory
memory
• Program memory
Assembly code
• Optimized datapath for:

• Special functional units total = 0


for i =1 to …

• Benefits
• Some flexibility, good performance,
size and power
Lect-01.27
Independence of Processor Technologies

• Basic tradeoff
• General vs. custom
• With respect to processor technology or IC technology
• The two technologies are independent

General- Single-
purpose ASIP purpose
General, processor processor Customized,
providing improved: providing improved:

Flexibility Power efficiency


Maintainability Performance
NRE cost Size
Time- to-prototype Cost (high volume)
Time-to-market
Cost (low volume)

PLD Semi-custom Full-custom

Lect-01.28
Design Technology
• The manner in which we convert our concept of
desired system functionality into an
implementation Compilation/ Libraries/ Test/
Synthesis IP Verification

System System Hw/Sw/ Model simulat./


Compilation/Synthesis: specification synthesis OS checkers
Automates exploration and
insertion of implementation
details for lower level.
Behavioral Behavior Cores Hw-Sw
specification synthesis cosimulators
Libraries/IP: Incorporates pre-
designed implementation from
lower abstraction level into
higher level. RT RT RT HDL simulators
specification synthesis components

Test/Verification: Ensures correct


functionality at each level, thus
reducing costly iterations Logic Logic Gates/ Gate
between levels. specification synthesis Cells simulators

To final implementation

Lect-01.29
Design Productivity Exponential Increase

100,000

10,000

(K) Trans./Staff – Mo.


1,000

Productivity
100

10

0.1

0.01

2005
1993

2001

2003
1983

1987
1985

1991
1989

1999
1997
1995

2007

2009
• Exponential increase over the past few
decades
Lect-01.30
Design Productivity Gap
• While designer productivity has grown at an
impressive rate over the past decades, the rate of
improvement has not kept pace with chip capacity

10,000 100,000
1,000 10,000

Logic transistors 100 1000


per chip 10 Gap 100 Productivity
IC capacity (K) Trans./Staff-Mo.
(in millions) 1 10
0.1 1
productivity
0.01 0.1
0.001 0.01

Lect-01.31
Design Productivity Gap (cont.)
• 1981 leading edge chip required 100 designer months
• 10,000 transistors / 100 transistors/month
• 2002 leading edge chip requires 30,000 designer
months
• 150,000,000 / 5000 transistors/month
• Designer cost increase from $1M to $300M
10,000 100,000
1,000 10,000
Logic transistors 100 1000
10 Gap 100 Productivity
per chip IC capacity
(in millions) 1 10 (K) Trans./Staff-Mo.
0.1 1
productivity
0.01 0.1
0.001 0.01

Lect-01.32
The Mythical Man-Month

• The situation is even worse than the productivity gap indicates


• In theory, adding designers to team reduces project completion time
• In reality, productivity per designer decreases due to complexities of team
management and communication
• In the software community, known as “the mythical man-month” (Brooks
1975)
• At some point, can actually lengthen project completion time! (“Too many
cooks”)
Team
• 1M transistors, 1 60000 16 15 16
50000 19 18
designer=5000
trans/month 40000 24 23
30000
• Each additional designer Months until completion
20000 43
reduces for 100 Individual
10000
trans/month
• So 2 designers produce 0 10 20 30 40
4900 trans/month each Number of designers

Lect-01.33
Co-Design Methodology
• Co-design
• Design of systems involving both hardware and
software components
• Starts with formal, abstract specification; series
of refinements maps to target architecture:
allocation, partitioning, scheduling,
communication synthesis
• Means to manage large-scale, complex
systems
R. Domer, D. Gajski, J. Zhu, “Specification and Design of
Embedded Systems,” it+ti magazine, Oldenbourg Verlag
(Germany), No. 3, June 1998.
Lect-01.34
Complex Systems
• SOC (System-On-a-Chip)
• Millions of gates on a chip
• Decreasing processing technologies (deep sub-
micron, 0.25 µm and below): decreasing geometry
size, increasing chip density
• Problems
• Electronic design automation (EDA) tools
• Time-to-market

Lect-01.35
Complex Systems (cont.)
• Abstraction
• Reduce the number of objects managed by a
design task, e.g., by grouping objects using
hierarchy
• Computer-aided design (CAD) example
• Logic level: transistors grouped into gates
• Register transfer level (RTL): gates grouped into
registers, ALUs, and other RTL components

Lect-01.36
Complex Systems (cont.)
• Abstraction
• Co-design example
• System level: processors (off-the-shelf or application-
specific), memories, application-specific integrated
circuits (ASICs), I/O interfaces, etc.
• Integration of intellectual property (IP) -
representations of products of the mind
• Reuse of formerly designed circuits as core cells

Lect-01.37
Generic Co-Design Methodology
Synthesis model
• Specification Analysis
• Allocation task &
• Partitioning Validation
• Scheduling
Note: design
models may
• Communication be captured
synthesis in the same
language
Implementation
• Software synthesis
• Hardware synthesis
• Interface synthesis

Lect-01.38
System Specification
• Describes the functionality of the system
without specifying the implementation
• Describes non-functional properties such as
performance, power, cost, and other quality
metrics or design constraints
• May be executable to allow dynamic
verification

Lect-01.39
System Specification Example

B0: top
shared sync
behavior
• integer write read child
variable behavior
• boolean
variable
sync
Graphical
representation:
• Hierarchy Behaviors
• Concurrency • Sequential: B1, B2, B3
• Transitions • Concurrent: B4, B5
between • Atomic: B1
behaviors • Composite: B2
Lect-01.40
System Specification Example (cont.)

Producer- shared sync


consumer
read
functionality write

• B6 computes a
value
• B4 consumes the
value sync

• Synchronization
is needed: B4
waits until B6
produces the value

Lect-01.41
System Specification Example
• Atomic behaviors
B1( ) B3( ) B7( )
{ { {
stmt; stmt; stmt;
... ... ...
} } }

B6( ) B4( )
{ {
int local; int local;
… wait(sync);
shared = local + 1; local = shared - 1;
signal(sync); ...
} }

Lect-01.42
Allocation
• Selects the type and number of components
from a library and determines their
interconnection
• Implements functionality so as to
• Satisfy constraints
• Minimize objective cost function
• Result may be customization of a generic
target architecture

Lect-01.43
Allocation Example

PE1 PE2

Proc1 LMem1 GMem1 ASIC1 LMem2

bus1 bus2 bus3

IF1 IF2 IF3

system bus

PE: Processing Element Arbiter1

LMem: Local Memory Target Architecture Model


GMem: Global Memory
IF: Interface
Lect-01.44
Partitioning
• Defines the mapping between the set of
behaviors in the specification and the set of
allocated components in the architecture
• Satisfy constraints
• Minimize costs
• Not yet near implementation
• Multiple behaviors in a single PE (scheduling)
• Interactions between PEs (communication)
• Design model
• additional level of hierarchy
• functional equivalence with specification

Lect-01.45
Partitioning Example
synchronization variables
Top shared sync B1_start B1_done B4_start B4_done

PE0 PE1
controlling Child (B1)
B1_start
behavior B1_ctrl assigned to
B1_done different PE
than
B4_start parent (B0)
B4_ctrl

B4_done

System model after partitioning


Lect-01.46
Partitioning Example (cont.)
• Atomic behaviors
B1( ) B1_ctrl( ) B3( )
{ { {
stmt; B7( )
wait(B1_start); signal(B1_start);
... {
… wait(B1_done);
} stmt;
signal(B1_done); }
...
}
}

B4( ) B4_ctrl( ) B6( )


{ { {
int local; signal(B4_start); int local;
wait(B4_start); wait(B4_done); …
wait(sync); } shared = local +
local = shared - 1;
1;

signal(sync);
signal(B4_done);
}
}
Lect-01.47
Scheduling
• Given a set of behaviors and optionally a set of
performance constraints, determines a total
order in time for invoking behaviors running on
the same PE
• Maintains the partial order imposed by
dependencies in the functionality
• Minimizes synchronization overhead between
PEs and context-switching overhead within
each PE

Lect-01.48
Scheduling
• Ordering information
• Known at compile time
• Static scheduling
• Higher inter-PE synchronization overhead if
inaccurate performance estimation, i.e., longer wait
times and lower CPU utilization
• Unknown until runtime (e.g., data-, event-
dependent)
• Dynamic scheduling
• Higher context-switching overhead (running task
blocked, new task scheduled)

Lect-01.49
Scheduling Example
shared sync B6_start B3_start

Scheduling
decision:
• Sequential
B6_start
ordering of
behaviors
sync
on PE0, PE1
• Synchronization
B3_start
to maintain
partial order
across Pes
• Optimization - no
control behaviors
System model after static scheduling

Lect-01.50
Scheduling Example (cont.)
• Atomic behaviors
B1( ) B3( ) B7( )
{ { {
… wait(B3_start); stmt;
signal(B6_start); ... ...
} } }

B6( ) B4( )
{ {
int local; int local;
wait(B6_start); wait(sync);
… local = shared - 1;
shared = local + 1; …
signal(sync); signal(B3_start);
} }
Lect-01.51
Communication Synthesis
• Implements the shared-variable accesses
between concurrent behaviors using an inter-
PE communication scheme
• Shared memory: read or write to a shared-
memory address
• Local PE memory: send or receive message-
passing calls
• Inserts interfaces to communication channels
(local or system buses)

Lect-01.52
Communication example

lbus0 lbus1 lbus2 sbus

Synthesis IF0 IF1 IF2

decision:
PE0 PE1
• Put all
global

Shared_mem
B6 B1
variables

Arbiter
into B7
Shared_mem
B4
• New global B3

variables in Top

System model after communication synthesis


Lect-01.53
Communication Example (cont.)
• Atomic behaviors
B1( ) B3( ) B7( )
{ { {
… wait(*B3_start_addr); stmt;
signal ... ...
(*B6_start_addr); }
}
}

B6( ) B4( )
{ {
int local; int local;
wait (*B6_start_addr); wait (*sync_addr);
… local = *shared_addr - 1;
*shared_addr = local + 1; …
signal(*sync_addr); signal (*B3_start_addr);
} }

Lect-01.54
Communication Example (cont.)
• Atomic behaviors
IF0( ) IF1( ) IF2( ) Arbiter( )
{ { { {
stmt; stmt; stmt; stmt;
... ... ... ...
} } } }

Shared_mem( )
{
int shared;
bool sync;
bool B3_start;
bool B6_start;
}

Lect-01.55
Analysis and Validation
• Functional validation of design models at each
step using simulation or formal verification
• Analysis to estimate quality metrics and make
design decisions
• Tools
• Static analyzer - program, ASIC metrics
• Simulator - functional, cycle-based, discrete-
event
• Debugger - access to state of behaviors
• Profiler - dynamic execution information
• Visualizer - graphical displays of state, data

Lect-01.56
Backend
• Implementations
• Processor: compiler translates model into
machine code
• ASIC: high-level synthesis tool translates model
into netlist of RTL components
• Interface
• Special type of ASIC that links a PE with other
components
• Implements the behavior of a communication
channel

Lect-01.57
Summary
• Embedded systems are everywhere
• Key challenge: optimization of design metrics
• Design metrics compete with one another
• A unified view of hardware and software is necessary
to improve productivity
• Key technologies
• Processor: general-purpose, application-specific, single-
purpose
• Design: compilation/synthesis, libraries/IP,
test/verification

Lect-01.58

You might also like