You are on page 1of 56

VHDL Coding

Exercise 4: FIR Filter


Where to start?
Designspace
Feedback
Exploration

Algorithm Architecture

Optimization

RTL-
VHDL-Code
Block diagram
Algorithm
• High-Level System Diagram
 Context of the design
 Inputs and Outputs
 Throughput/rates
 Algorithmic requirements

• Algorithm Description
y ( k ) = ∑ bi x( k − i )
N
 Mathematical Description
 Performance Criteria
i =0
x( k ) y( k )
 Accuracy
 Optimization constraints
FIR
 Implementation constraints
 Area
 Speed
Architecture (1)
• Isomorphic Architecture:
 Straight forward implementation of the algorithm

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
Architecture (2)
• Pipelining/Retiming:
 Improve timing

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Insert register(s) at the inputs or outputs
 Increases Latency
Architecture (2)
• Pipelining/Retiming:
 Improve timing

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Insert register(s) at the inputs or outputs
 Increases Latency
 Perform Retiming: Backwards:
 Move registers through the logic
without changing functionality
Forward:
Architecture (2)
• Pipelining/Retiming:
 Improve timing

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Insert register(s) at the inputs or outputs
 Increases Latency
 Perform Retiming: Backwards:
 Move registers through the logic
without changing functionality
Forward:
Architecture (2)
• Pipelining/Retiming:
 Improve timing

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Insert register(s) at the inputs or outputs
 Increases Latency
 Perform Retiming: Backwards:
 Move registers through the logic
without changing functionality
Forward:
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (3)
• Retiming and simple transformation:
 Optimization

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Reverse the adder chain
 Perform Retiming
Architecture (4)
• More pipelining:
 Add one pipelining stage to the retimed circuit

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 The longest path is given by the multiplier
 Unbalanced: The delay from input to the first pipeline
stage is
much longer than the delay from the first to the second
stage
Architecture (5)
• More pipelining:
 Add one pipelining stage to the retimed circuit

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Move the pipeline registers into the multiplier:
 Paths between pipeline stages are balanced
 Improved timing
 Tclock = (Tadd + Tmult)/2 + Treg
Architecture (6)
• Iterative Decomposition:
 Reuse Hardware

x( k )

b0 b1 b2 bN −2 bN −1 bN

y( k )
 Identify regularity and reusable hardware components
 Add control
x( k )
 multiplexers
 storage elements
 Control
0
 Increases Cycles/Sample
b0 y( k )
bN
RTL-Design
• Choose an architecture under the following constraints:
 It meets ALL timing specifications/constraints:
 Throughput
 Latency Iterative
 It consumes the smallest possible area Decomposition
 It requires the least possible amount of power

• Decide which additional functions are needed and


how they can be implemented efficiently:
 Storage of samples x(k) => MEMORY
 Storage of coefficients bi => LUT
x( k )
 Address generators for MEMORY and LUT
=> COUNTERS
 Control => FSM 0
b0 y( k )
bN
RTL-Design
• RTL Block-diagram:
 Datapathy ( k ) = ∑ bi x( k − i )
N

i =0
x( k )

0
b0 y( k )
bN

• FSM:
 Interface protocols
datapath control:
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
y ( k ) = ∑ bi x( k − i )
N

i =0
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
( ) ( )
N
 y k = ∑ bi x k − i
i =0
 Store result to output register
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
( ) ( )
N
 y k = ∑ bi x k − i
i =0
 Store result to output register
 DATA OUT:
 Output result
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
( ) ( )
N

 y k = ∑ bi x k − i
i =0
 Store result to output register
 DATA OUT:
 Output result / Wait for ACK
RTL-Design
• How it works:y ( k ) = ∑ bi x( k − i )
N

i =0
 IDLE
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
( ) ( )
N

 y k = ∑ bi x k − i
i =0
 Store result to output register
 DATA OUT:
 Output result / Wait for ACK
 IDLE: …
Translation into VHDL
• Some basic VHDL building blocks:
 Signal Assignments:
 Outside a process:
AxD YxD

• This is NOT allowed !!!


AxD YxD
BxD

 Within a process (sequential execution):


AxD • Sequential execution
YxD • The last assignment is
BxD
kept when the process
terminates
Translation into VHDL
• Some basic VHDL building blocks:
 Multiplexer:
AxD
BxD YxD
CxD Default
SELxS Assignment
 Conditional Statements:
AxD

BxD

SelAxS OUTxD

CxD

DxD

SelBxS

STATExDP
Translation into VHDL
• Common mistakes with conditional statements:
 Example:
AxD

??
• NO default assignment
SelAxS OUTxD

BxD

?? • NO else statement

SelBxS

STATExDP

• ASSIGNING NOTHING TO A SIGNAL IS NOT A


WAY TO KEEP ITS VALUE !!!!! => Use FlipFlops !!!
Translation into VHDL
• Some basic VHDL building blocks:
 Register:
DataREGxDN DataREGxDP

 Register with ENABLE:


DataREGxDN DataREGxDP

DataREGxDN DataREGxDP
Translation into VHDL
• Common mistakes with sequential processes:
DataREGxDN DataREGxDP

CLKxCI

DataRegENxS
• Can not be translated
into hardware and is
NOT allowed

DataREGxDN DataREGxDP

0
1
• Clocks are NEVER
generated within
any logic

DataREGxDN DataREGxDP

CLKxCI
• Gated clocks are more
complicated then this
• Avoid them !!!
DataRegENxS
Translation into VHDL
• Some basic rules:
 Sequential processes (FlipFlops)
 Only CLOCK and RESET in the sensitivity list
 Logic signals are NEVER used as clock signals
 Combinatorial processes
 Multiple assignments to the same signal are ONLY possible within the
same process => ONLY the last assignment is valid
 Something must be assigned to each signal in any case OR
There MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
 Use separate signals for the PRESENT state and the
NEXT state of every FlipFlop in your design.
 Use variables ONLY to store intermediate results or even
avoid them whenever possible in an RTL design.
Translation into VHDL
• Write the ENTITY definition of your design to
specify:
 Inputs, Outputs and Generics
Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:
Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:
Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:

Register with ENABLE

Register with ENABLE


Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:

Register with CLEAR


Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:

Counter

Counter
Translation into VHDL
• Describe the functional units in your block
diagram
one after another in the architecture section:
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process
MEALY
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process

MEALY
Translation into VHDL
• The FSM is described with one sequential
process
and one combinatorial process

MEALY
Translation into VHDL
• Complete and check the code:
 Declare the signals and components

 Check and complete the sensitivity lists of ALL


combinatorial processes with ALL signals that are:
 used as condition in any IF or CASE statement
 being assigned to any other signal
 used in any operation with any other signal

 Check the sensitivity lists of ALL sequential processes


that they
 contain ONLY one global clock and one global async. reset
signal
 no other signals
Other Good Ideas
• Keep things simple
• Partition the design (Divide et Impera):
 Example:
Start processing the next sample, while the previous
result is waiting in the output register:
 Just add a FIFO to at the output of you filter
• Do NOT try to optimize each Gate or FlipFlop
• Do not try to save cycles if not necessary
• VHDL code
 Is usually long and that is good !!
 Is just a representation of your block diagram
 Does not mind hierarchy