
FPGA-based Embedded Systems
Jan Madsen

Informatics and Mathematical Modeling


Technical University of Denmark
Richard Petersens Plads, Building 321
DK2800 Lyngby, Denmark

This slideset is based on slides produced by Hans Holten-Lund

Motivation

8051 microcontroller

Technology: 0.25 um
6371 cells
320 uW @ 1 MHz
6800 lines of SystemC
Logic synthesis: 30 min
Place & Route: 15 min

02131 Embedded Systems 2



Software/Hardware architecture

[Figure: layered software/hardware architecture of processing element pe1, with memory (mem) and ce1.
 Software layers, top to bottom: application (shared and private), RTOS-APIs, RTOS, HW/SW drivers.
 Hardware platform: CPU, Cache, Timers, I/O, Int, Bus CTRL, Periphery, Bus.]

Courtesy of Rolf Ernst, TU Braunschweig


Software
- Concept: separate function from detailed architecture
- Uniform, mature development tools
- Same binary can run on a variety of architectures
- Instruction-stream-based
- New architectures can be developed and introduced for existing applications
- New applications can execute on existing architectures: highly flexible
- Trend towards dynamic translation and optimization of function in mapping to architecture
  - Dynamic SW optimizations (1.3x, maybe 2-3x?)

[Figure: SW -> Standard Compiler (Profiling) -> Binary -> Processor 1 / Processor 2 / Processor 3]



Hardware

- Design flow similar to software
- Exploits parallelism
- Data-stream-based
- "Binary" targeted to a specific technology
- Architecture is fixed, but highly optimized for the application
- High speed
- Low power

[Figure: HDL -> Hardware Compiler (Profiling) -> Layout -> ASIC]


Software-Hardware Codesign
- Improvements eclipse those of dynamic software methods
  - Speedups of 10x to 1000x
  - Far more potential than dynamic SW optimizations (1.3x, maybe 2-3x?)
  - Energy reductions of 90% or more
- Decisions made at compile/design time
- Architecture partly flexible
- Gap between procedural and structural mind set!

[Figure: a Profiler identifies the critical regions of the SW; the application is split into SW (Processor) and HW (ASIC), commonly on one chip today. Bar chart: Time and Energy, SW only vs. HW/SW.]



Hardware technology
- Hardware is an interconnection of transistors following one of several possible styles: fabrics
- The fabric defines how and when transistors are composed
- Hardware fabrics differ in terms of customizability and generality

[Figure: three fabrics: Custom, Semicustom, Programmable]


Soft Hardware!
[Figure: three parallel design flows, with Morphware between Software and Hardware.
 Software:  SW -> Standard Compiler (Profiling) -> Binary -> Processor
 Morphware: CW (Configware) -> Hardware Compiler (Profiling) -> Binary -> FPGA
 Hardware:  HW (HDL) -> Hardware Compiler (Profiling) -> Layout -> ASIC]




Software-Configware-Morphware
[Figure: an SW application partitioned into SW, CW and HW parts; targets shown: Processor, Morphware, ASIC]


Example of Morphware



What is an FPGA?

- Field Programmable Gate Array
- A practical alternative to Application Specific Integrated Circuits (ASICs)
- Any digital circuit can be implemented in an FPGA.
  - This includes the CPU.
- Complexity:
  - A modern high-end FPGA can hold up to about twenty 32-bit RISC cores, each running at 100 MHz.


Xilinx FPGA board


[Photo of the board, with labels: FPGA, Reset button, LEDs, LED display, LCD display]


Why are FPGAs so interesting?
- They allow anybody to build ASIC style systems!
  - Specialized/parallel/pipelined data processor? No problem!
  - Tough learning curve…
- (Re)configuration:
  - In a few seconds the entire chip can be reconfigured, by loading a new configuration bit-pattern
  - Partial self-reconfiguration of a running FPGA is also possible
- Other interesting uses:
  - An FPGA is used in the European Mars Express Lander "British Beagle 2"
    - To save weight and power, the FPGA can only control a single instrument at a time, but reconfigures itself on demand when another instrument is used
  - Emulation of old hardware: Original Pac-Man game on an FPGA
  - Building specialized graphics processors for photo-realistic computer graphics (like in "Finding Nemo")


Xilinx Spartan II



How does an FPGA work?

- Xilinx Virtex II FPGA
- Logic is implemented with look-up tables (LUTs)
  - A LUT is a 1-bit wide, 16-entry table addressed by 4 bits (2^4 = 16)
- The 16 configuration bits select one of the 2^16 = 65,536 possible logic functions of 4 input bits
  - AND, NOR, MUX, full-adder, etc.
  - Table lookup (ROM)
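
The LUT idea can be illustrated in software. The following is a minimal sketch, not Xilinx code: it models a 4-input LUT as a 16-bit word in which bit i holds the output for input address i. The names `lut4_eval`, `lut4_program`, `and4` and `mux2` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT modeled as a 16x1-bit ROM: bit i of `config` is the
 * output for input address i (address = d c b a as a 4-bit number). */
static unsigned lut4_eval(uint16_t config, unsigned addr)
{
    assert(addr < 16u);               /* only 2^4 = 16 entries exist */
    return (config >> addr) & 1u;
}

/* Derive the 16 configuration bits from any C function of 4 input bits
 * by evaluating it once for every possible input combination. */
static uint16_t lut4_program(unsigned (*f)(unsigned a, unsigned b,
                                           unsigned c, unsigned d))
{
    uint16_t config = 0;
    for (unsigned addr = 0; addr < 16u; addr++) {
        unsigned a = addr & 1u,        b = (addr >> 1) & 1u;
        unsigned c = (addr >> 2) & 1u, d = (addr >> 3) & 1u;
        if (f(a, b, c, d) & 1u)
            config |= (uint16_t)(1u << addr);
    }
    return config;
}

/* Two example logic functions (illustrative only). */
static unsigned and4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a & b & c & d;
}

static unsigned mux2(unsigned a, unsigned b, unsigned sel, unsigned unused)
{
    (void)unused;                     /* fourth LUT input is a don't-care */
    return sel ? b : a;
}
```

Calling `lut4_program(and4)` yields a configuration word with only bit 15 set, since AND is true only when all four inputs are 1. Changing the function changes only the 16 stored bits, never the lookup mechanism, which is what makes the logic reconfigurable.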


A Xilinx LUT

- 4-input logic function generator

[Figure: 16x1 bit ROM with example contents 0, 1, 1, 0, ...; input: Address (4 bit); output: Data (1 bit)]



Extra LUT functions (Xilinx only)

- Dual port 16x1 bit RAM
- 16 bit shift-register

[Figure: 16x1 bit memory with example contents 0, 1, 1, 0, ...; ports: Read address (4 bit), Read data (1 bit), Write address (4 bit), Write data (1 bit)]
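
As a rough illustration of these two extra modes, the sketch below treats the 16 LUT storage bits as writable at run time. It is a behavioral model under simplifying assumptions (no clock enables, no timing), and the names `lut_cells`, `ram16_write`, `ram16_read` and `srl16_clock` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The same 16 storage bits as in a LUT, now writable at run time. */
typedef struct { uint16_t bits; } lut_cells;

/* RAM mode: write one bit at a 4-bit write address. */
static void ram16_write(lut_cells *m, unsigned waddr, unsigned data)
{
    assert(waddr < 16u);
    m->bits = (uint16_t)((m->bits & ~(1u << waddr)) | ((data & 1u) << waddr));
}

/* RAM mode: read one bit at a 4-bit read address, which is
 * independent of the write address (hence "dual port"). */
static unsigned ram16_read(const lut_cells *m, unsigned raddr)
{
    assert(raddr < 16u);
    return (m->bits >> raddr) & 1u;
}

/* Shift-register mode: one clock tick moves every bit up by one
 * position and loads `data` into position 0; the oldest bit is lost. */
static void srl16_clock(lut_cells *m, unsigned data)
{
    m->bits = (uint16_t)((m->bits << 1) | (data & 1u));
}
```

In shift-register mode the read address acts as a tap selector: in this model, reading address n returns the bit that was shifted in n clock ticks before the most recent one.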


Configurable Logic Block (CLB)

- The lookup tables (LUTs) are organized in slices
  - Each slice contains two LUTs and two flip-flops (single-bit registers)
  - Each CLB contains 4 slices
  - So each CLB can contain an 8-bit register plus an 8-bit logic function.
- To support fast addition, the CLBs can be chained together with dedicated fast carry-chains.
- Two on-chip tri-state bus drivers.
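
The slice and CLB figures above allow a quick back-of-the-envelope resource estimate. The sketch below assumes that a registered adder uses one LUT (plus carry-chain logic) and one flip-flop per result bit; that mapping is an assumption made for illustration, and the function names are hypothetical.

```c
#include <assert.h>

enum { LUTS_PER_SLICE = 2, SLICES_PER_CLB = 4 };  /* figures from the slide */

/* Round-up integer division. */
static unsigned ceil_div(unsigned n, unsigned d)
{
    assert(d > 0u);
    return (n + d - 1u) / d;
}

/* Slices needed for a `width`-bit registered adder, ASSUMING one LUT
 * and one flip-flop per result bit (carry handled by the carry chain). */
static unsigned slices_for_adder(unsigned width)
{
    return ceil_div(width, LUTS_PER_SLICE);
}

/* CLBs needed for the same adder, at 4 slices per CLB. */
static unsigned clbs_for_adder(unsigned width)
{
    return ceil_div(slices_for_adder(width), SLICES_PER_CLB);
}
```

Under these assumptions a 32-bit adder occupies 16 slices, i.e. 4 CLBs, which matches the "8 bits per CLB" rule of thumb stated above.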



A Xilinx Slice (2 LUTs and 2 FFs)


A Xilinx Virtex II CLB element



Routing network resources


Size of the FPGA



Routing resources


On-chip BlockRAM and Multipliers



FPGA resources summary:

- CLB (Configurable Logic Block)
  - Contains 8 LUTs (function generator/RAM/shifter)
  - and 8 registers (8 flip-flops)
- IOB (Input/Output Block)
  - Configurable voltages/currents
- DCM (Digital Clock Manager)
  - Clock mirroring/phase shifting/multiplication/division etc.
- BlockRAM (18 kbit on-chip dual-ported RAM)
- Multiplier (18x18 bit dedicated multiplier)
- Routing network
  - Connects everything!


Virtex II Pro:
Embedded hard-core PowerPC CPU



Practical issues with FPGAs:

- Cost issues:
  - At low volume, very low cost compared to ASICs.
  - At high volume, expensive!
- Speed issues:
  - FPGAs are rapidly catching up with standard-cell based ASICs; the clock-speed gap is only about a factor of 5 today, and it shrinks with each generation (mass production).
- Area issues:
  - A major concern: most of the FPGA's chip area is used for the reconfiguration network, so silicon area utilization is low.
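
The cost trade-off can be made concrete with a simple break-even calculation. The sketch below assumes a linear cost model (one-time NRE cost plus a fixed per-unit cost); the model, the function names, and any numbers plugged into it are illustrative assumptions, not figures from the slides.

```c
#include <assert.h>

/* Total cost of producing `units` chips: one-time (NRE) cost plus
 * per-unit cost. All amounts are in the same arbitrary currency unit. */
static unsigned long total_cost(unsigned long nre, unsigned long unit_cost,
                                unsigned long units)
{
    return nre + unit_cost * units;
}

/* Smallest volume at which the ASIC becomes strictly cheaper than the
 * FPGA. Assumes the ASIC has the higher NRE and the lower unit cost. */
static unsigned long break_even_units(unsigned long nre_fpga,
                                      unsigned long unit_fpga,
                                      unsigned long nre_asic,
                                      unsigned long unit_asic)
{
    assert(nre_asic >= nre_fpga && unit_fpga > unit_asic);
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic) + 1ul;
}
```

With made-up inputs of zero NRE and 50 per unit for the FPGA, against 1,000,000 NRE and 5 per unit for the ASIC, `break_even_units(0, 50, 1000000, 5)` gives 22223 units. Below that volume the FPGA is the cheaper choice in this model.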


How to program the FPGA?


Design flow: C/C++/Java -> FSMD (GEZEL): controller + datapath -> Netlist (VHDL) -> place & route -> bit-pattern

Example C input:

    void UnitControl()
    {
        up = down = 0; open = 1;
        while (1) {
            while (req == floor);
            open = 0;
            if (req > floor) { up = 1; }
            else { down = 1; }
            while (req != floor);
            open = 1;
            delay(10);
        }
    }

[Figure: the resulting configuration bit-pattern, shown as rows of 0s and 1s]
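
To show what the FSMD step means for the UnitControl() example, here is a hedged sketch of the same behavior rewritten as an explicit state machine with a datapath register, still in C rather than GEZEL. The state names, the `unit_ctrl` struct, and the cycle-counting replacement for delay(10) are assumptions made for this illustration. The sketch also clears up/down on arrival, which the slide's code does not do.

```c
#include <assert.h>

typedef enum { IDLE, MOVING, DOOR_OPEN } unit_state;  /* illustrative names */

typedef struct {
    unit_state state;     /* controller state register */
    int up, down, open;   /* outputs, as in UnitControl() */
    int timer;            /* datapath register standing in for delay(10) */
} unit_ctrl;

static void unit_reset(unit_ctrl *u)
{
    u->state = IDLE;
    u->up = 0; u->down = 0; u->open = 1;
    u->timer = 0;
}

/* One clock cycle: the switch is the controller (FSM), the
 * assignments to the struct fields are the datapath. */
static void unit_step(unit_ctrl *u, int req, int floor)
{
    switch (u->state) {
    case IDLE:                        /* was: while (req == floor); */
        if (req != floor) {
            u->open = 0;
            if (req > floor) u->up = 1; else u->down = 1;
            u->state = MOVING;
        }
        break;
    case MOVING:                      /* was: while (req != floor); */
        if (req == floor) {
            u->up = u->down = 0;      /* assumption: stop on arrival */
            u->open = 1;
            u->timer = 10;
            u->state = DOOR_OPEN;
        }
        break;
    case DOOR_OPEN:                   /* was: delay(10); one tick per cycle */
        assert(u->timer > 0);
        if (--u->timer == 0)
            u->state = IDLE;
        break;
    }
}
```

Each busy-wait loop of the sequential program becomes a state that is left when its condition changes, which is the form a hardware description needs: one well-defined action per clock cycle.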



Tools
[Figure: the design flow C/C++/Java -> FSMD (GEZEL): controller + datapath -> Netlist (VHDL) -> place & route -> bit-pattern, shown for the UnitControl() example and annotated with the tools used: Manual or Zebra, GEZEL, Xilinx ISE]


Future?

- Having an FPGA-based system allows:
  - Addition of new hardware features
  - Fixing hardware bugs "in the field"
  - Flexible architecture, can be tuned to match the application
- FPGAs have revolutionized the way we think about hardware, by introducing "soft" hardware
  - Like with software, there is also an open-source hardware movement (www.opencores.org)
  - The border between software and hardware has been blurred
  - What is the difference between hardware and software on an FPGA?



Dynamic partitioning

- Can we dynamically move software kernels to FPGA?
- Enabler: binary-level partitioning and synthesis
  - Partition and synthesize starting from SW binary
- Advantages
  - Any compiler, any language, multiple sources, assembly/object support, legacy code support
- Disadvantage
  - Loses high-level information
  - Quality loss?

[Figure: SW -> Standard Compiler (Profiling) -> Binary -> Binary Partitioner -> Modified Binary (Processor) + Netlist (Morphware). Annotation at the source level: "Traditional partitioning done here".]


Dynamic partitioning

- Dynamic HW/SW Partitioning
  - Embed partitioning CAD tools on-chip
  - Feasible in era of billion-transistor chips
- Advantages
  - No special desktop tools
  - Completely transparent
  - Avoid complexities of supporting different FPGA types
- Complements other approaches
  - Desktop CAD best from purely technical perspective
  - Dynamic opens additional market segments (i.e., all software developers) that otherwise might not use desktop CAD

[Figure: SW -> Standard Compiler (Profiling) -> Binary -> on-chip CAD -> Proc. + Morph.]



Warp Processors
1. Initially execute application in software only
2. Profile application to determine critical regions
3. Partition critical regions to hardware
4. Program configurable logic & update software binary
5. Partitioned application executes faster with lower energy consumption (speed has been "warped")

[Figure: warp processor with µP, I$, D$, Profiler, Warp Config. Logic Architecture, and Dynamic Part. Module (DPM). Bar chart: Time and Energy, SW only vs. HW/SW.]
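
The profiling step of the warp flow can be sketched as a table of execution counters, one per code region, from which the most frequently executed region is picked as the candidate for hardware. This is a simplified illustration, not the actual warp profiler; the region count and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGIONS 8   /* hypothetical number of tracked code regions */

typedef struct { unsigned long count[NUM_REGIONS]; } profiler;

/* Profiling: record one execution of a region (e.g. one loop iteration). */
static void profile_hit(profiler *p, size_t region)
{
    assert(region < NUM_REGIONS);
    p->count[region]++;
}

/* Input to partitioning: the most frequently executed region is the
 * candidate to move into configurable logic. Ties keep the lowest index. */
static size_t critical_region(const profiler *p)
{
    size_t best = 0;
    for (size_t r = 1; r < NUM_REGIONS; r++)
        if (p->count[r] > p->count[best])
            best = r;
    return best;
}
```

A real system would gather these counts non-intrusively in hardware while the application runs in software; the selection logic, however, stays this simple in principle.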

Does it work?

[Figure: warp processor prototype (ARM7 with I$ and D$, Profiler, Config. Logic Arch., DPM) compared with an ARM plus Xilinx Virtex-E FPGA. Two bar charts per benchmark, Warp Proc. vs. Xilinx Virtex-E: Speedup (axis 0 to 5) and Energy Reduction (axis 0% to 100%). The benchmark labels on the x-axis are not recoverable from the extraction.]
