Lect3 - Design Metrics

Design challenge – optimizing
design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge
4
design metrics
Common metrics
Unit cost: the monetary cost of manufacturing each copy of the
system, excluding NRE cost
NRE cost (Non-Recurring Engineering cost): The one-time
monetary cost of designing the system
Size: the physical space required by the system
Performance: the execution time or throughput of the system
Power: the amount of power consumed by the system
Flexibility: the ability to change the functionality of the system
without incurring heavy NRE cost
5
design metrics
Common metrics (continued)
Time-to-prototype: the time needed to build a working version of
the system
Time-to-market: the time required to develop a system to the point
that it can be released and sold to customers
Maintainability: the ability to modify the system after its initial
release
Correctness, safety, many more
6
Design metric competition
improving one may worsen others
Expertise with both
software and hardware is
Power
needed to optimize design
metrics
 Not just a hardware or
Performance Size
software expert, as is
common
 A designer must be
NRE cost
comfortable with various
technologies in order to
choose the best for a given
application and constraints
7
Time-to-market: a demanding
design metric Time required to develop
a product to the point it
can be sold to customers
Market window
 Period during which the
product would have highest
Revenues ($)
sales
Average time-to-market
Time (months)
constraint is about 8
months
Delays can be costly
8
Losses due to delayed market
entry
Simplified revenue model
Peak revenue
 Product life = 2W, peak at W
Peak revenue from  Time of market entry defines
delayed entry
a triangle, representing
Revenues ($)
On-time
Market rise Market fall market penetration

Delayed  Triangle area equals revenue
Loss
D W 2W  The difference between the
On-time Delayed Time on-time and delayed triangle
entry entry
areas
9
Losses due to delayed market
entry (cont.)  Area = 1/2 * base * height
Peak revenue
 On-time = 1/2 * 2W * W
 Delayed = 1/2 * (W-D+W)*(W-D)
Peak revenue from
delayed entry
Percentage revenue loss =
Revenues ($)
On-time
Market rise Market fall (D(3W-D)/2W2)*100%

Delayed Try some examples
– Lifetime 2W=52 wks, delay D=4
D W 2W wks
On-time Delayed Time – (4*(3*26 –4)/2*26^2) = 22%
entry entry – Lifetime 2W=52 wks, delay D=10
wks
– (10*(3*26 –10)/2*26^2) = 50%
– Delays are costly!
10
NRE
Costs: and unit cost metrics
•
– Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): The one-time monetary cost
of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units
= (NRE cost / # of units) + unit cost
• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an
additional $200 per unit
11
The performance design metric
Widely-used measure of system, widely-abused
 Clock frequency, instructions per second – not good measures
 Digital camera example – a user cares about how fast it processes images,
not clock speed or instructions per second
Latency (response time)
 Time between task start and end
 e.g., Camera’s A and B process images in 0.25 seconds
Throughput
 Tasks per second, e.g. Camera A processes 4 images per second
 Throughput can be more than latency seems to imply due to concurrency,
e.g. Camera B may process 8 images per second (by capturing a new image
while previous image is being stored).
Speedup of B over S = B’s performance / A’s performance
 Throughput speedup = 8/4 = 2
12
Three key embedded system
technologies
Technology
A manner of accomplishing a task, especially using
technical processes, methods, or knowledge
Three key technologies for embedded systems
Processor technology
IC technology
Design technology
13
Processor technology
The architecture of the computation engine used to
implement a system’s desired functionality
Processor does not have to be programmable
 “Processor” not equal to general-purpose processor
Controller Datapath Controller Datapath Controller Datapath
Control index
Control Register Control logic Registers
logic
logic and file and State total
State register State
Custom +
register register
ALU
General
IR PC ALU IR PC
Data Data
memory memory
Program Data Program
memory memory memory
Assembly code Assembly code
for: for:
total = 0 total = 0
for i =1 to … for i =1 to …
General-purpose (“software”) Application-specific Single-purpose (“hardware”)
14
Processor technology
 Processors vary in their customization for the problem at hand
total = 0
for i = 1 to N loop
total += M[i]
Desired end loop
functionality
General- Application-specific Single-

purpose processor purpose
processor processor
15
General-purpose processors
Programmable device used in a variety
Controller Datapath
of applications
 Also known as “microprocessor” Control Register
logic and file
Features State
register
 Program memory General
 General datapath with large register file IR PC ALU
and general ALU

User benefits Program Data
memory memory
 Low time-to-market and NRE costs
Assembly code
 High flexibility for:
“Pentium” the most well-known, but total = 0

for i =1 to …
there are hundreds of others
16
Single-purpose processors
Digital circuit designed to execute
Controller Datapath
exactly one program Control index
 a.k.a. coprocessor, accelerator or peripheral logic
total
Features State
+
register
 Contains only the components needed to
execute a single program Data
 No program memory memory
Benefits
 Fast
 Low power
 Small size
17
Application-specific processors
• Programmable processor optimized for Controller Datapath
a particular class of applications having Control Registers
common characteristics logic and
State
– Compromise between general-purpose and register
Custom
single-purpose processors ALU
IR PC
• Features
Data
– Program memory
Program memory
– Optimized datapath memory
– Special functional units Assembly code
for:
• Benefits
total = 0
– Some flexibility, good performance, size and for i =1 to …
power
18
Architectures
 We must be clear about the architecture that we are going to use for design of
ES
 It has also got a wide variety of choices, to be chosen according to the given
application.
 The choices are as follows
 Application-specific Architecture :-
- Controller Architecture
- Datapath Architecture
- Finite state machine with datapath
 General Purpose Architecture :-

- CISC
- RISC
- Vector machine
- VLIW ( Very Long Instruction Word Computer )
Basic Architecture
Control unit and Processor
datapath Control unit Datapath
 Note similarity to ALU

Controller Control
single-purpose /Status
processor
Registers
Key differences
 Datapath is general
 Control unit doesn’t PC IR
store the algorithm –
the algorithm is
I/O
“programmed” into the
Memory
memory
20
Datapath Operations
Load
Processor
 Read memory location Control unit Datapath
into register ALU
• ALU operation Controller Control +1
/Status
– Input certain registers
through ALU, store Registers
back in register
• Store
10 11
– Write register to PC IR
memory location
I/O
...
Memory
10
11
...
21
Control Unit
 Control unit: configures the
datapath operations Processor
 Sequence of desired operations Control unit Datapath
(“instructions”) stored in memory –
“program” ALU
Controller Control
 Instruction cycle – broken into /Status
several sub-operations, each one
clock cycle, e.g.: Registers
 Fetch: Get next instruction into IR
 Decode: Determine what the
instruction means
 Fetch operands: Move data from PC IR R0 R1
memory to datapath register
 Execute: Move data through the
ALU I/O
 Store results: Write data from ...
100 load R0, M[500] Memory
500 10
register to memory 101 inc R1, R0 501
102 store M[501], R1 ...
22
Control Unit Sub-Operations
• Fetch Processor
– Get next Control unit Datapath
ALU
instruction into IR Controller Control
– PC: program /Status
counter, always Registers
points to next
instruction
PC 100 IR R0 R1
load R0, M[500]
– IR: holds the
fetched instruction I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
23
Decode Processor
Determine what Control unit Datapath
ALU
the instruction Controller Control
means /Status
Registers
PC 100 IR R0 R1
load R0, M[500]
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
24
Fetch operands Processor
Move data from Control unit Datapath
ALU
memory to Controller Control
datapath register /Status
Registers
10
PC 100 IR R0 R1
load R0, M[500]
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
25
Execute Processor
Move data through Control unit Datapath
ALU
the ALU Controller Control
/Status
This particular
instruction does Registers
nothing during
this sub-operation 10
PC 100 IR R0 R1
load R0, M[500]
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
26
Store results Processor
Write data from Control unit Datapath
ALU
register to memory Controller Control
/Status
This particular
instruction does Registers
nothing during
this sub-operation 10
PC 100 IR R0 R1
load R0, M[500]
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
27
Instruction
PC=
100
Cycles Processor
Fetch DecodeFetch Exec. Store Control unit Datapath
ops result ALU

clk
s Controller Control
/Status
Registers
10
PC 100 IR R0 R1
load R0, M[500]
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
28
Instruction
PC=100
Cycles Processor
ops result ALU

clk
s Controller Control +1
/Status
PC=101
Registers
Fetch DecodeFetch Exec. Store
ops result
clk
s 10 11
PC 101 IR R0 R1
inc R1, R0
I/O
...
500 10
101 inc R1, R0 501
102 store M[501], R1 ...
29
Instruction
PC=100
Cycles Processor
ops result ALU

clk
s Controller Control
/Status
PC=101
Registers
Fetch DecodeFetch Exec. Store
ops result
clk
s 10 11
PC 102 IR R0 R1
store M[501], R1
PC=102
Fetch DecodeFetch Exec. Store I/O
ops result ...
clk 500 10
s 101 inc R1, R0 501 11
102 store M[501], R1 ...
30
Architectural Considerations
• N-bit processor Processor
– N-bit ALU, Control unit Datapath
registers, buses, ALU

Controller
memory data Control
/Status
interface
– Embedded: 8-bit, 16- Registers
bit, 32-bit common

– Desktop/servers: 32- PC IR
bit, even 64
• PC size determines I/O
address space Memory
31
Architectural Considerations
• Clock frequency Processor
– Inverse of clock Control unit Datapath
ALU
period Controller Control
– Must be longer than /Status
longest register to Registers
register delay in
entire processor
PC IR
– Memory access is
often the longest I/O
Memory
32
Pipelining: Increasing Instruction
Throughput
Wash 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
Non-pipelined Pipelined
Dry 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
non-pipelined dish cleaning Time pipelined dish cleaning Time
Fetch-instr. 1 2 3 4 5 6 7 8
Decode 1 2 3 4 5 6 7 8
Fetch ops. 1 2 3 4 5 6 7 8 Pipelined
Execute 1 2 3 4 5 6 7 8
Instruction 1
Store res. 1 2 3 4 5 6 7 8
Time
pipelined instruction execution
33
Summary
What is an embedded system?
Characteristics of ES
Classification of ES
Design challenges and Metrics
Architecture of ES
Coming Lecture will cover

Design methodology
System on chip

Lect3 - Design Metrics

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lect3 - Design Metrics

Uploaded by

Copyright:

Available Formats

Design challenge – optimizing

Market rise Market fall market penetration

Market rise Market fall (D(3W-D)/2W2)*100%

General- Application-specific Single-

and general ALU

“Pentium” the most well-known, but total = 0

 General Purpose Architecture :-

 Note similarity to ALU

– Get next Control unit Datapath

counter, always Registers

Determine what Control unit Datapath

Move data from Control unit Datapath

Move data through Control unit Datapath

Write data from Control unit Datapath

Fetch DecodeFetch Exec. Store Control unit Datapath

ops result ALU

Fetch DecodeFetch Exec. Store Control unit Datapath

ops result ALU

Fetch DecodeFetch Exec. Store Control unit Datapath

ops result ALU

registers, buses, ALU

bit, 32-bit common

address space Memory

– Inverse of clock Control unit Datapath

longest register to Registers

non-pipelined dish cleaning Time pipelined dish cleaning Time

Fetch ops. 1 2 3 4 5 6 7 8 Pipelined

Coming Lecture will cover

You might also like