
Design challenge – optimizing design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge

Common metrics
Unit cost: the monetary cost of manufacturing each copy of the
system, excluding NRE cost
NRE cost (Non-Recurring Engineering cost): The one-time
monetary cost of designing the system
Size: the physical space required by the system
Performance: the execution time or throughput of the system
Power: the amount of power consumed by the system
Flexibility: the ability to change the functionality of the system
without incurring heavy NRE cost

Common metrics (continued)
Time-to-prototype: the time needed to build a working version of
the system
Time-to-market: the time required to develop a system to the point
that it can be released and sold to customers
Maintainability: the ability to modify the system after its initial
release
Correctness, safety, many more

Design metric competition – improving one may worsen others
• Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: competing design metrics: power, performance, size, NRE cost]

Time-to-market: a demanding design metric
• Time required to develop a product to the point it can be sold to customers
• Market window
– Period during which the product would have highest sales
• Average time-to-market constraint is about 8 months
• Delays can be costly
[Figure: revenues ($) versus time (months)]

Losses due to delayed market entry
• Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
• Loss
– The difference between the on-time and delayed triangle areas
[Figure: revenues ($) versus time, showing the on-time and delayed entry triangles with market rise and market fall; the time axis marks D, W and 2W; the revenue axis marks peak revenue and peak revenue from delayed entry]

Losses due to delayed market entry (cont.)
• Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W) * (W-D)
• Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
• Try some examples
– Lifetime 2W=52 wks, delay D=4 wks
– (4*(3*26 – 4) / (2*26^2)) * 100% = 22%
– Lifetime 2W=52 wks, delay D=10 wks
– (10*(3*26 – 10) / (2*26^2)) * 100% = 50%
– Delays are costly!
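
The triangle model above can be checked numerically. The following is a minimal Python sketch, assuming the simplified revenue model from the slides (market rise and fall of equal slope, peak at W); the function name and parameter names are illustrative, not from the original.

```python
def percentage_revenue_loss(lifetime_weeks: float, delay_weeks: float) -> float:
    """Percentage revenue loss under the simplified triangle revenue model."""
    w = lifetime_weeks / 2          # product life is 2W, peak at W
    d = delay_weeks
    if not 0 <= d <= w:
        raise ValueError("model assumes a delay between 0 and W")
    on_time = 0.5 * (2 * w) * w               # 1/2 * base * height
    delayed = 0.5 * (w - d + w) * (w - d)     # smaller triangle for late entry
    return (on_time - delayed) / on_time * 100


# The two examples from the slide: 2W = 52 weeks, D = 4 and D = 10 weeks.
for delay in (4, 10):
    print(f"D = {delay:2d} wks: {percentage_revenue_loss(52, delay):.0f}% loss")
```

Computing the loss from the two areas, rather than from the closed form D(3W-D)/(2W^2), makes it easy to confirm that both give the same result.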
NRE and unit cost metrics
• Costs:
– Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): The one-time monetary cost
of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units
= (NRE cost / # of units) + unit cost

• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an
additional $200 per unit
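
The cost formulas above are simple enough to tabulate. This is a small Python sketch using the slide's numbers (NRE = $2000, unit cost = $100); the function names are illustrative, and the volumes other than 10 units are assumed values added to show how amortization changes with volume.

```python
def total_cost(nre_cost: float, unit_cost: float, units: int) -> float:
    """total cost = NRE cost + unit cost * # of units"""
    return nre_cost + unit_cost * units


def per_product_cost(nre_cost: float, unit_cost: float, units: int) -> float:
    """per-product cost = (NRE cost / # of units) + unit cost"""
    return total_cost(nre_cost, unit_cost, units) / units


# 10 units is the slide's example; 100 and 1000 are assumed volumes.
for units in (10, 100, 1000):
    print(units, total_cost(2000, 100, units), per_product_cost(2000, 100, units))
```

As the number of units grows, the amortized NRE share shrinks and the per-product cost approaches the unit cost.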

The performance design metric
• Widely used measure of a system, and widely abused
– Clock frequency, instructions per second – not good measures
– Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
• Latency (response time)
– Time between task start and end
– e.g., Cameras A and B each process an image in 0.25 seconds
• Throughput
– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while the previous image is being stored)
• Speedup of B over A = B’s performance / A’s performance
– Throughput speedup = 8/4 = 2
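
The camera comparison can be written out as a short Python sketch. The throughput and latency figures come from the slide; treating latency-based performance as 1/latency (lower latency is better) is an assumption made here so that the same speedup ratio applies to both measures.

```python
def speedup(performance_b: float, performance_a: float) -> float:
    """Speedup of B over A = B's performance / A's performance."""
    return performance_b / performance_a


latency_a = latency_b = 0.25            # seconds per image, both cameras
throughput_a, throughput_b = 4.0, 8.0   # images per second

throughput_speedup = speedup(throughput_b, throughput_a)
# Assumption: for latency, performance is taken as 1 / latency.
latency_speedup = speedup(1 / latency_b, 1 / latency_a)

print(throughput_speedup)  # 2.0
print(latency_speedup)     # 1.0
```

The two results differ because Camera B overlaps capture and storage: each image still takes 0.25 seconds, but twice as many complete per second.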

Three key embedded system technologies
Technology
A manner of accomplishing a task, especially using
technical processes, methods, or knowledge
Three key technologies for embedded systems
Processor technology
IC technology
Design technology

Processor technology
The architecture of the computation engine used to
implement a system’s desired functionality
Processor does not have to be programmable
– “Processor” not equal to general-purpose processor
[Figure: controller and datapath of three processor types, each implementing "total = 0; for i = 1 to …": general-purpose (“software”) with register file, general ALU, IR, PC, program memory and data memory; application-specific with registers, custom ALU, IR, PC, program memory and data memory; single-purpose (“hardware”) with control logic, state register, index and total registers, an adder and data memory]

Processor technology (continued)
• Processors vary in their customization for the problem at hand

Desired functionality:
total = 0
for i = 1 to N loop
    total += M[i]
end loop

[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor and a single-purpose processor]

General-purpose processors
• Programmable device used in a variety of applications
– Also known as “microprocessor”
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– High flexibility
• “Pentium” the most well-known, but there are hundreds of others
[Figure: general-purpose processor: controller (control logic and state register, IR, PC) and datapath (register file, general ALU), with program memory holding the assembly code for "total = 0; for i = 1 to …" and data memory]

Single-purpose processors
• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– Contains only the components needed to execute a single program
– No program memory
• Benefits
– Fast
– Low power
– Small size
[Figure: single-purpose processor: controller (control logic, state register) and datapath (index and total registers, adder), with data memory]

Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: application-specific processor: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), with program memory holding the assembly code for "total = 0; for i = 1 to …" and data memory]

Architectures
• The architecture used to design an embedded system (ES) must be chosen explicitly
• There is a wide variety of choices; the architecture is selected according to the given application
• The choices are as follows
– Application-specific architecture:
- Controller architecture
- Datapath architecture
- Finite state machine with datapath
– General-purpose architecture:
- CISC
- RISC
- Vector machine
- VLIW (Very Long Instruction Word) computer

Basic Architecture
• Control unit and datapath
– Note similarity to single-purpose processor
• Key differences
– Datapath is general
– Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor with a control unit (controller, PC, IR) and a datapath (ALU, registers) exchanging control/status signals, connected to I/O and memory]

Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: processor block diagram; the ALU adds 1 to a register value of 10 to produce 11, and memory holds the values 10 and 11]

Control Unit
• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
– Fetch: Get next instruction into IR
– Decode: Determine what the instruction means
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
– Store results: Write data from register to memory
[Figure: processor block diagram with PC, IR and registers R0 and R1. Program in memory: 100 load R0, M[500]; 101 inc R1, R0; 102 store M[501], R1. Data in memory: M[500] = 10, M[501] empty]

Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
– In the figure: PC = 100, IR = load R0, M[500]
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
– In the figure: R0 = 10, read from M[500]
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure, repeated for each sub-operation: processor block diagram with PC, IR, R0 and R1. Program in memory: 100 load R0, M[500]; 101 inc R1, R0; 102 store M[501], R1. Data in memory: M[500] = 10, M[501] empty]

Instruction Cycles
• Each instruction passes through five sub-operations, one clock cycle each: Fetch, Decode, Fetch ops, Exec., Store result
• PC=100: load R0, M[500]
– R0 = 10
• PC=101: inc R1, R0
– ALU adds 1; R1 = 11
• PC=102: store M[501], R1
– M[501] = 11
[Figure: clk timing diagram for the three instruction cycles, with the processor block diagram showing PC, IR, R0, R1 and memory after each instruction]
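
The three-instruction walk-through above can be reproduced with a toy simulator. This is a minimal Python sketch, not a model of any real processor: the tuple encoding of instructions, the `Processor` class and the initial value 0 for M[501] are all assumptions made for illustration. The program, addresses and the value in M[500] are taken from the slides.

```python
# Memory image: program at 100..102, data at 500..501.
# The tuple encoding (op, destination, source) is a hypothetical choice.
MEMORY_IMAGE = {
    100: ("load", "R0", 500),     # load R0, M[500]
    101: ("inc", "R1", "R0"),     # inc R1, R0
    102: ("store", 501, "R1"),    # store M[501], R1
    500: 10,
    501: 0,                       # assumed initial value; blank in the slides
}


class Processor:
    def __init__(self, memory):
        self.memory = dict(memory)
        self.pc = 100
        self.ir = None
        self.regs = {"R0": 0, "R1": 0}
        self.clocks = 0

    def step(self):
        """Run one instruction cycle of five one-clock sub-operations."""
        # Fetch: get the next instruction into IR, advance PC.
        self.ir = self.memory[self.pc]
        self.pc += 1
        # Decode: determine what the instruction means.
        op, dst, src = self.ir
        # Fetch operands: move data from memory to a datapath register.
        if op == "load":
            self.regs[dst] = self.memory[src]
        # Execute: move data through the ALU.
        elif op == "inc":
            self.regs[dst] = self.regs[src] + 1
        # Store results: write data from a register to memory.
        elif op == "store":
            self.memory[dst] = self.regs[src]
        else:
            raise ValueError(f"unknown instruction: {op}")
        self.clocks += 5  # every sub-operation takes one clock, used or not


cpu = Processor(MEMORY_IMAGE)
for _ in range(3):
    cpu.step()
print(cpu.regs, cpu.memory[501], cpu.clocks)
```

Each instruction does useful work in only one of the later sub-operations, which matches the slides' note that the load does nothing during Execute and Store results.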

Architectural Considerations
• N-bit processor
– N-bit ALU, registers, buses, memory data interface
– Embedded: 8-bit, 16-bit, 32-bit common
– Desktop/servers: 32-bit, even 64
• PC size determines address space
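
The link between PC size and address space is a power of two: a PC with n bits can hold 2^n distinct addresses. A minimal Python sketch follows; the function name is illustrative, and the PC widths listed are assumed examples rather than values from the slides.

```python
def address_space(pc_bits: int) -> int:
    """Number of distinct addresses a PC of the given width can hold."""
    if pc_bits <= 0:
        raise ValueError("PC width must be positive")
    return 2 ** pc_bits


# Assumed example widths.
for bits in (8, 16, 32):
    print(f"{bits}-bit PC -> {address_space(bits)} addressable locations")
```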

Architectural Considerations (continued)
• Clock frequency
– Inverse of clock period
– Clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is often the longest
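
The relationship between delay and clock frequency can be sketched in a few lines of Python. The path names and delay values below are hypothetical, chosen only to illustrate the calculation; the rule itself (the period is bounded by the longest register-to-register delay, and frequency is its inverse) comes from the slide.

```python
def clock_period_bound_ns(delays_ns: dict) -> float:
    """The clock period must exceed the longest register-to-register delay."""
    return max(delays_ns.values())


def clock_frequency_bound_mhz(delays_ns: dict) -> float:
    """Upper bound on clock frequency: the inverse of the period bound."""
    return 1000.0 / clock_period_bound_ns(delays_ns)


# Hypothetical delays in nanoseconds; memory access is the longest here.
delays = {"alu_path": 4.0, "register_file_path": 2.5, "memory_access": 10.0}

print(clock_period_bound_ns(delays))      # 10.0
print(clock_frequency_bound_mhz(delays))  # 100.0
```

With these assumed numbers the memory access path sets the limit, so the clock must run below 100 MHz even though the ALU path alone would allow 250 MHz.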

Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined versus pipelined dish cleaning, with wash and dry stages for dishes 1 to 8 plotted against time]
[Figure: pipelined instruction execution, with stages fetch instruction, decode, fetch operands, execute and store result for instructions 1 to 8 plotted against time]
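
The throughput gain from pipelining can be estimated with a simple cycle count. This Python sketch assumes an ideal pipeline (one clock per stage, no stalls or hazards), which is a simplification; the five stages and eight instructions match the figure, and the function names are illustrative.

```python
def non_pipelined_cycles(instructions: int, stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then one instruction completes per clock."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


n, s = 8, 5  # 8 instructions, 5 stages, as in the figure
print(non_pipelined_cycles(n, s))  # 40
print(pipelined_cycles(n, s))      # 12
```

The latency of a single instruction is still five clocks in both cases; only the number of instructions completed per unit time improves.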

Summary
What is an embedded system?
Characteristics of ES
Classification of ES
Design challenges and Metrics
Architecture of ES

The coming lecture will cover
Design methodology
System on chip
