Professional Documents
Culture Documents
Cpre 588 Embedded Computer Systems
Cpre 588 Embedded Computer Systems
I/O
MEMORY PROCESSOR
ACTUATORS
SENSORS
HARDWIRED UNIT
• Application-specific logic
• Timers
• A/D and D/A conversion
ENVIRONMENT
Jan 13-15, 2009 CprE 588 – Embedded Computer Systems Lect-01.10
Parts of an Embedded System (cont.)
• Actuators - mechanical components (e.g.,
valve)
• Sensors - input data (e.g., accelerometer for
airbag control)
• Data conversion, storage, processing
• Decision-making
lens
• Common metrics
• Unit cost: the monetary cost of manufacturing each
copy of the system, excluding NRE cost
• NRE cost (Non-Recurring Engineering
cost): The one-time monetary cost of designing the
system
• Size: the physical space required by the system
• Performance: the execution time or throughput of
the system
• Power: the amount of power consumed by the
system
• Flexibility: the ability to change the functionality of
the system without incurring heavy NRE cost
Jan 13-15, 2009 CprE 588 – Embedded Computer Systems Lect-01.16
Design Challenge – Optimization (cont.)
Market
On-time penetration
Market
rise
fall • Triangle area equals
Delayed
revenue
• Loss
D W 2W
• The difference between
On-time Delayed Time
entry entry the on-time and delayed
triangle areas
• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
$200,000 $200
A A
B B
$160,000 $160
C C
to ta l c o st (x1000)
p e r p ro d uc t c o st
$120,000 $120
$80,000 $80
$40,000 $40
$0 $0
0 800 1600 2400 0 800 1600 2400
Num b e r o f units (vo lu m e ) Num b e r o f units (vo lu m e )
total = 0 total = 0
for i =1 to … for i =1 to …
General-purpose (“software”) Application-specific Single-purpose (“hardware”)
total = 0
for i = 1 to N loop
total += M[i]
end loop
Desired
functionality
• User benefits
Program Data
• Low time-to-market and NRE costs memory memory
• High flexibility Assembly code
for:
• “Intel/AMD” the most well-known, but
total = 0
there are hundreds of others for i =1 to …
• Features Data
memory
Program
• Program memory memory
Assembly code
• Optimized datapath for:
• Basic tradeoff
• General vs. custom
• With respect to processor technology or IC technology
• The two technologies are independent
General- Single-
purpose ASIP purpose
General, processor processor Customized,
providing improved: providing improved:
To final implementation
100,000
10,000
1,000
100
(K) Trans./Staff – Mo.
Productivity
10
0.1
0.01
2009 2007 2005 2003 2001 1999 1997 1995 1993 1991 1989 1987 1985 1983 1981
10,000 100,000
1,000 10,000
Logic transistors
(K) Trans./Staff-Mo.
(in millions)
100 1000
per chip
Productivity
10 Gap 100
IC capacity
1 10
0.1 1
productivity
0.01 0.1
0.001 0.01
2009 2007 2005 2003 2001 1999 1997 1995 1993 1991 1989 1987 1985 1983 1981
10,000 100,000
Logic transistors
(K) Trans./Staff-Mo.
1,000 10,000
(in millions)
100 1000
per chip
Productivity
10 Gap 100
IC capacity
1 10
0.1 1
productivity
0.01 0.1
0.001 0.01
2009 2007 2005 2003 2001 1999 1997 1995 1993 1991 1989 1987 1985 1983 1981
Team
• 1M transistors, 1 60000 16 15 16
designer=5000 trans/month 50000 19 18
• Each additional designer 40000 24 23
reduces for 100 trans/month 30000
Months until completion
• So 2 designers produce 20000 43
10000 Individual
4900 trans/month each
0 10 20 30 40
Number of designers
B0: top
shared sync
behavior
• integer read child
write
variable behavior
• boolean
variable
sync
Graphical
representation:
• Hierarchy Behaviors
• Concurrency • Sequential: B1, B2, B3
• Transitions • Concurrent: B4, B5
between • Atomic: B1
behaviors • Composite: B2
Jan 13-15, 2009 CprE 588 – Embedded Computer Systems Lect-01.41
System Specification Example (cont.)
• B6 computes a
value
• B4 consumes the
value sync
• Synchronization
is needed: B4
waits until B6
produces the value
B6( ) B4( )
{ {
int local; int local;
… wait(sync);
shared = local + 1; local = shared - 1;
signal(sync); ...
} }
PE1 PE2
system bus
PE0 PE1
B4_done
Scheduling
decision:
• Sequential
B6_start
ordering of
behaviors
sync
on PE0, PE1
• Synchronization
B3_start
to maintain
partial order
across Pes
• Optimization - no
control behaviors
System model after static scheduling
B6( ) B4( )
{ {
int local; int local;
wait(B6_start); wait(sync);
… local = shared - 1;
shared = local + 1; …
signal(sync); signal(B3_start);
} }
Jan 13-15, 2009 CprE 588 – Embedded Computer Systems Lect-01.52
Communication Synthesis
• Implements the shared-variable accesses
between concurrent behaviors using an inter-
PE communication scheme
• Shared memory: read or write to a shared-
memory address
• Local PE memory: send or receive message-
passing calls
• Inserts interfaces to communication channels
(local or system buses)
decision:
PE0 PE1
• Put all
global B6
Shared_mem
B1
variables
Arbiter
into B7
Shared_mem
B4
• New global B3
variables in Top
B6( ) B4( )
{ {
int local; int local;
wait (*B6_start_addr); wait (*sync_addr);
… local = *shared_addr - 1;
*shared_addr = local + 1; …
signal(*sync_addr); signal (*B3_start_addr);
} }
Shared_mem( )
{
int shared;
bool sync;
bool B3_start;
bool B6_start;
}