# VHDL Coding

Exercise 4: FIR Filter
Where to start?
[Figure: design flow with the stages Algorithm, Architecture, RTL block diagram, and VHDL code, annotated with design-space exploration, feedback, and optimization]
Algorithm
• High-Level System Diagram
  Context of the design
    - Inputs and outputs
    - Throughput/rates
    - Algorithmic requirements

• Algorithm Description
  Mathematical description
  Performance criteria
    - Accuracy
    - Optimization constraints
  Implementation constraints
    - Area
    - Speed

$$y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$$

[Figure: block labeled FIR with input x(k) and output y(k)]
Architecture (1)
• Isomorphic Architecture:
  Straightforward implementation of the algorithm

[Figure: FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]
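A minimal VHDL sketch of such an isomorphic (direct) implementation may help to make the structure concrete. It assumes N = 3, 8-bit signed samples, and made-up coefficient values; the entity name `fir_direct` and all port and signal names are illustrative and not taken from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_direct is
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;              -- async reset, active low (assumed)
    XxDI    : in  signed(7 downto 0);     -- x(k)
    YxDO    : out signed(17 downto 0)     -- y(k)
  );
end entity fir_direct;

architecture rtl of fir_direct is
  constant N : natural := 3;              -- filter order (assumed)
  type coeff_t is array (0 to N) of signed(7 downto 0);
  type delay_t is array (1 to N) of signed(7 downto 0);
  -- Hypothetical coefficients b_0 .. b_N
  constant B : coeff_t := (to_signed(1, 8), to_signed(3, 8),
                           to_signed(3, 8), to_signed(1, 8));
  signal DelayxDP : delay_t;              -- x(k-1) .. x(k-N)
begin
  -- Delay line: one register per delayed tap
  p_delay : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DelayxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      DelayxDP(1) <= XxDI;
      for i in 2 to N loop
        DelayxDP(i) <= DelayxDP(i-1);
      end loop;
    end if;
  end process p_delay;

  -- One multiplier per tap, followed by a chain of adders
  p_sum : process (XxDI, DelayxDP)
    variable acc : signed(17 downto 0);   -- intermediate result only
  begin
    acc := resize(B(0) * XxDI, 18);
    for i in 1 to N loop
      acc := acc + resize(B(i) * DelayxDP(i), 18);
    end loop;
    YxDO <= acc;
  end process p_sum;
end architecture rtl;
```

In this sketch the combinational path from x(k) to y(k) contains one multiplier and N adders, which is the path that pipelining and retiming try to shorten.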
Architecture (2)
• Pipelining/Retiming:
  Improve timing

[Figure: FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]

Insert register(s) at the inputs or outputs
  - Increases latency
Perform retiming:
  - Move registers through the logic without changing functionality

[Figures: forward retiming; backward retiming]
Architecture (3)
• Retiming and simple transformation:
  Optimization

[Figure: FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]

Perform retiming
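One structure that this kind of retiming commonly leads to is the transposed form, in which every register holds a partial sum. The sketch below assumes that this is the intended result; the slides do not name it. N = 3, the bit widths, the coefficient values, and all identifiers are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_transposed is
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;              -- async reset, active low (assumed)
    XxDI    : in  signed(7 downto 0);     -- x(k)
    YxDO    : out signed(17 downto 0)     -- y(k)
  );
end entity fir_transposed;

architecture rtl of fir_transposed is
  constant N : natural := 3;              -- filter order (assumed)
  type coeff_t is array (0 to N) of signed(7 downto 0);
  type sum_t   is array (1 to N) of signed(17 downto 0);
  -- Hypothetical coefficients b_0 .. b_N
  constant B : coeff_t := (to_signed(1, 8), to_signed(3, 8),
                           to_signed(3, 8), to_signed(1, 8));
  signal SumxDN, SumxDP : sum_t;          -- next / present partial sums
begin
  -- x(k) is multiplied by every coefficient in the same cycle
  p_comb : process (XxDI, SumxDP)
  begin
    -- The last stage has no predecessor: it only sees b_N * x(k)
    SumxDN(N) <= resize(B(N) * XxDI, 18);
    for i in 1 to N-1 loop
      SumxDN(i) <= SumxDP(i+1) + resize(B(i) * XxDI, 18);
    end loop;
    YxDO <= SumxDP(1) + resize(B(0) * XxDI, 18);
  end process p_comb;

  -- Registers between the adders
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      SumxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      SumxDP <= SumxDN;
    end if;
  end process p_reg;
end architecture rtl;
```

Under these assumptions every register-to-register path contains at most one multiplier and one adder, independent of N, while the output still equals b_0 x(k) + b_1 x(k-1) + ... + b_N x(k-N).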
Architecture (4)
• More pipelining:
  Add one pipelining stage to the retimed circuit

[Figure: FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]

The longest path is given by the multiplier
  - Unbalanced: the delay from the input to the first pipeline stage is much longer than the delay from the first to the second stage
Architecture (5)
• More pipelining:
  Add one pipelining stage to the retimed circuit

[Figure: FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k)]

Move the pipeline registers into the multiplier:
  - Paths between pipeline stages are balanced
  - Improved timing
    $T_{clock} = (T_{add} + T_{mult})/2 + T_{reg}$
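As a quick worked example of the clock-period formula, the calculation below uses purely hypothetical delays (2 ns adder, 6 ns multiplier, 1 ns register overhead). The unbalanced value for Architecture (4) assumes that the clock period there is simply the multiplier delay plus the register overhead.

```latex
\begin{align*}
T_{add} &= 2\,\text{ns}, \quad T_{mult} = 6\,\text{ns}, \quad T_{reg} = 1\,\text{ns} \\
\text{Architecture (4), unbalanced:} \quad T_{clock} &\approx T_{mult} + T_{reg} = 6 + 1 = 7\,\text{ns} \\
\text{Architecture (5), balanced:} \quad T_{clock} &= (T_{add} + T_{mult})/2 + T_{reg} = (2 + 6)/2 + 1 = 5\,\text{ns}
\end{align*}
```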
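Architecture (6)
• Iterative Decomposition:
  Reuse hardware
  Identify regularity and reusable hardware components
    - Multiplexers
    - Storage elements
    - Control
  Increases cycles/sample

[Figure: parallel FIR structure with input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, and output y(k), next to its iterative decomposition with input x(k), coefficients b_0 ... b_N, a constant 0, and output y(k)]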
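The reused hardware in such a decomposition is typically a single multiply-accumulate unit. The following is a minimal sketch under that assumption: one multiplier, one adder, an accumulator register, and a multiplexer that selects the constant 0 to start a new sum. The entity name `mac`, the control inputs, and the bit widths are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac is
  port (
    CLKxCI    : in  std_logic;
    RSTxRBI   : in  std_logic;            -- async reset, active low (assumed)
    ClearxSI  : in  std_logic;            -- '1': start a new sum from 0
    ENxSI     : in  std_logic;            -- '1': accumulate in this cycle
    SamplexDI : in  signed(7 downto 0);   -- x(k-i), read from the sample memory
    CoeffxDI  : in  signed(7 downto 0);   -- b_i, read from the coefficient LUT
    AccxDO    : out signed(17 downto 0)   -- running sum
  );
end entity mac;

architecture rtl of mac is
  signal AccxDN, AccxDP : signed(17 downto 0);
  signal FeedbackxD     : signed(17 downto 0);
begin
  -- Multiplexer: feed back either 0 or the value accumulated so far
  FeedbackxD <= (others => '0') when ClearxSI = '1' else AccxDP;

  -- One multiplier and one adder, reused for every tap
  AccxDN <= FeedbackxD + resize(SamplexDI * CoeffxDI, 18);

  -- Accumulator register with enable
  p_acc : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      AccxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      if ENxSI = '1' then
        AccxDP <= AccxDN;
      end if;
    end if;
  end process p_acc;

  AccxDO <= AccxDP;
end architecture rtl;
```

With this unit one output sample takes N + 1 accumulation cycles instead of one, which is the increase in cycles/sample noted above.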
RTL-Design
• Choose an architecture under the following constraints:
  It meets ALL timing specifications/constraints:
    - Throughput
    - Latency
  It consumes the smallest possible area
  It requires the least possible amount of power

• Decide which additional functions are needed and how they can be implemented efficiently:
  Storage of samples x(k) => MEMORY
  Storage of coefficients b_i => LUT
  Address generators for MEMORY and LUT => COUNTERS
  Control => FSM

[Figure: iterative decomposition with input x(k), coefficients b_0 ... b_N, a constant 0, and output y(k)]
RTL-Design
• RTL Block-diagram:
  Datapath: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
• FSM:
  Interface protocols
  Datapath control

[Figure: RTL block diagram with input x(k), coefficients b_0 ... b_N, a constant 0, and output y(k)]
RTL-Design
• How it works:
  - IDLE:
      Wait for new sample
      Store to input register
  - NEW DATA:
      Store new sample to memory
  - RUN:
      $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
      Store result to output register
  - DATA OUT:
      Output result / Wait for ACK
  - IDLE: …
Translation into VHDL
• Some basic VHDL building blocks:
  Signal assignments:
    - Outside a process:
      [Figure with signals AxD and YxD]
    - Within a process (sequential execution):
      [Figure with signals AxD, BxD, and YxD]
        • Sequential execution
        • The last assignment is kept when the process terminates
    - [Figure with signals AxD, BxD, and YxD]
        • This is NOT allowed!
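Since the code from these slides is not preserved, the following sketch illustrates the two kinds of signal assignment. It assumes that the case marked as not allowed is a second concurrent driver for the same signal, which is consistent with the rule that multiple assignments to one signal are only possible within the same process. The entity name and port names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    AxDI  : in  std_logic;
    BxDI  : in  std_logic;
    Y1xDO : out std_logic;
    Y2xDO : out std_logic
  );
end entity assign_demo;

architecture rtl of assign_demo is
begin
  -- Outside a process: concurrent assignment, one driver per signal
  Y1xDO <= AxDI;
  -- Y1xDO <= BxDI;   -- a second concurrent driver for Y1xDO:
  --                     two sources shorted together, do not do this

  -- Within a process: statements are evaluated in order
  p_seq : process (AxDI, BxDI)
  begin
    Y2xDO <= AxDI;
    Y2xDO <= BxDI;    -- the last assignment is kept, so Y2xDO follows BxDI
  end process p_seq;
end architecture rtl;
```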
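Translation into VHDL
• Some basic VHDL building blocks:
  Multiplexer:
    [Figure: multiplexer with inputs AxD, BxD, CxD, select SELxS, and output YxD]
  Conditional statements:
    [Figure: default assignment; inputs AxD, BxD, CxD, DxD; selects SelAxS, SelBxS; state STATExDP; output OUTxD]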
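A minimal sketch of a multiplexer written as a conditional statement with a default assignment is shown below. The signal names follow the figure labels where possible; the 8-bit width, the priority of SelAxSI over SelBxSI, and the entity name are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
  port (
    AxDI    : in  std_logic_vector(7 downto 0);
    BxDI    : in  std_logic_vector(7 downto 0);
    CxDI    : in  std_logic_vector(7 downto 0);
    SelAxSI : in  std_logic;
    SelBxSI : in  std_logic;
    OutxDO  : out std_logic_vector(7 downto 0)
  );
end entity mux_demo;

architecture rtl of mux_demo is
begin
  -- Every signal that is read appears in the sensitivity list
  p_mux : process (AxDI, BxDI, CxDI, SelAxSI, SelBxSI)
  begin
    OutxDO <= CxDI;            -- default assignment
    if SelAxSI = '1' then
      OutxDO <= AxDI;          -- overrides the default
    elsif SelBxSI = '1' then
      OutxDO <= BxDI;
    end if;
  end process p_mux;
end architecture rtl;
```

Because of the default assignment, OutxDO receives a value on every path through the process, so the description stays purely combinational.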
Translation into VHDL
• Common mistakes with conditional statements:
  Example:
    [Figure: inputs AxD, BxD; selects SelAxS, SelBxS; state STATExDP; output OUTxD; two inputs marked "??"]
    • NO default assignment
    • NO else statement
    • ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE! => Use FlipFlops!
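The sketch below contrasts the mistake with one possible fix. The faulty process is kept as a comment; the fix holds the value in a FlipFlop and makes "keep the old value" an explicit default assignment. Names, widths, and the active-low reset are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hold_demo is
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;   -- async reset, active low (assumed)
    SelAxSI : in  std_logic;
    AxDI    : in  std_logic_vector(7 downto 0);
    OutxDO  : out std_logic_vector(7 downto 0)
  );
end entity hold_demo;

architecture rtl of hold_demo is
  signal OutxDN, OutxDP : std_logic_vector(7 downto 0);
begin
  -- Mistake (kept as a comment): no default and no else, so a
  -- synthesis tool has to infer a latch to hold the old value.
  --   process (SelAxSI, AxDI)
  --   begin
  --     if SelAxSI = '1' then
  --       OutxDO <= AxDI;
  --     end if;
  --   end process;

  -- Fix: the next state defaults to the present state
  p_comb : process (SelAxSI, AxDI, OutxDP)
  begin
    OutxDN <= OutxDP;          -- default: keep the stored value
    if SelAxSI = '1' then
      OutxDN <= AxDI;
    end if;
  end process p_comb;

  -- The FlipFlops actually store the value
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      OutxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      OutxDP <= OutxDN;
    end if;
  end process p_reg;

  OutxDO <= OutxDP;
end architecture rtl;
```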
Translation into VHDL
• Some basic VHDL building blocks:
  Register:
    [Figure: register with input DataREGxDN and output DataREGxDP]
  Register with ENABLE:
    [Figures: register with input DataREGxDN and output DataREGxDP, in two variants]
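A minimal sketch of both building blocks in one sequential process follows. It assumes an asynchronous, active-low reset and 8-bit data; the entity name and port names are illustrative, with DataRegENxSI borrowed from the enable signal name used in the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_demo is
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;   -- async reset, active low (assumed)
    DataRegENxSI : in  std_logic;
    DataxDI      : in  std_logic_vector(7 downto 0);
    PlainxDO     : out std_logic_vector(7 downto 0);
    EnabledxDO   : out std_logic_vector(7 downto 0)
  );
end entity reg_demo;

architecture rtl of reg_demo is
  signal PlainxDP, EnabledxDP : std_logic_vector(7 downto 0);
begin
  -- Only clock and reset appear in the sensitivity list
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      PlainxDP   <= (others => '0');
      EnabledxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      -- Plain register: takes a new value on every rising clock edge
      PlainxDP <= DataxDI;
      -- Register with ENABLE: takes a new value only when enabled
      if DataRegENxSI = '1' then
        EnabledxDP <= DataxDI;
      end if;
    end if;
  end process p_reg;

  PlainxDO   <= PlainxDP;
  EnabledxDO <= EnabledxDP;
end architecture rtl;
```

The enable is evaluated inside the clocked branch, so the clock itself is never gated by logic.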
Translation into VHDL
• Common mistakes with sequential processes:
  [Figures: three register variants with DataREGxDN, DataREGxDP, clock CLKxCI, and enable DataRegENxS; one variant includes a 0/1 multiplexer]
    • Cannot be translated into hardware and is NOT allowed
    • Clocks are NEVER generated within any logic
    • Gated clocks are more complicated than this
    • Avoid them!
Translation into VHDL
• Some basic rules:
  Sequential processes (FlipFlops)
    - Only CLOCK and RESET in the sensitivity list
    - Logic signals are NEVER used as clock signals
  Combinatorial processes
    - Multiple assignments to the same signal are ONLY possible within the same process => ONLY the last assignment is valid
    - Something must be assigned to each signal in any case OR there MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
  Use separate signals for the PRESENT state and the NEXT state of every FlipFlop in your design.
  Use variables ONLY to store intermediate results, or avoid them altogether whenever possible in an RTL design.

Translation into VHDL
• Write the ENTITY definition of your design to specify:
Inputs, Outputs and Generics
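As an illustration, an entity for the iterative FIR filter might look like the sketch below. The generics, their default values, and the request/acknowledge handshake ports are assumptions derived from the "wait for new sample" and "wait for ACK" steps; none of the names are taken from the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir is
  generic (
    DATA_WIDTH : natural := 8;    -- width of samples x(k) (assumed)
    OUT_WIDTH  : natural := 18;   -- width of results y(k) (assumed)
    ORDER      : natural := 3     -- filter order N (assumed)
  );
  port (
    CLKxCI    : in  std_logic;
    RSTxRBI   : in  std_logic;    -- async reset, active low (assumed)
    -- Input interface
    InxDI     : in  signed(DATA_WIDTH-1 downto 0);
    InReqxSI  : in  std_logic;    -- a new sample is available
    -- Output interface
    OutxDO    : out signed(OUT_WIDTH-1 downto 0);
    OutReqxSO : out std_logic;    -- a result is available
    OutAckxSI : in  std_logic     -- the result has been taken
  );
end entity fir;
```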
Translation into VHDL
• Describe the functional units in your block diagram one after another in the architecture section:
    - Register with ENABLE
    - Register with CLEAR
    - Counter
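The code for these units is not preserved, so here is a minimal sketch of one of them: a counter with synchronous clear and enable, as it could serve as an address generator. It uses separate next-state and present-state signals; the generic, the priority of clear over enable, and all names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter_demo is
  generic (
    WIDTH : natural := 4          -- counter width (assumed)
  );
  port (
    CLKxCI   : in  std_logic;
    RSTxRBI  : in  std_logic;     -- async reset, active low (assumed)
    ClearxSI : in  std_logic;     -- synchronous clear
    ENxSI    : in  std_logic;     -- count enable
    CountxDO : out unsigned(WIDTH-1 downto 0)
  );
end entity counter_demo;

architecture rtl of counter_demo is
  signal CountxDN, CountxDP : unsigned(WIDTH-1 downto 0);
begin
  -- Next-state logic: clear has priority over counting
  p_comb : process (ClearxSI, ENxSI, CountxDP)
  begin
    CountxDN <= CountxDP;                 -- default: hold
    if ClearxSI = '1' then
      CountxDN <= (others => '0');
    elsif ENxSI = '1' then
      CountxDN <= CountxDP + 1;           -- wraps around at 2**WIDTH
    end if;
  end process p_comb;

  -- State register
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      CountxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      CountxDP <= CountxDN;
    end if;
  end process p_reg;

  CountxDO <= CountxDP;
end architecture rtl;
```

A register with CLEAR follows the same pattern, with the increment branch replaced by loading a data input.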
Translation into VHDL
• The FSM is described with one sequential process and one combinatorial process
  [Code slides not preserved; several of them carry the label MEALY]
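A minimal sketch of such a two-process FSM for the IDLE / NEW DATA / RUN / DATA OUT sequence is given below. The state names, the status input LastTapxSI, and all control outputs are hypothetical; the exact cycle in which each enable must be active depends on the datapath. Outputs assigned inside an input-dependent branch are Mealy outputs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_fsm is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;   -- async reset, active low (assumed)
    InReqxSI    : in  std_logic;   -- new sample available
    LastTapxSI  : in  std_logic;   -- address counter is at the last tap
    OutAckxSI   : in  std_logic;   -- result has been taken
    InRegENxSO  : out std_logic;   -- enable of the input register
    MemWExSO    : out std_logic;   -- write enable of the sample memory
    AccENxSO    : out std_logic;   -- enable of the accumulator
    OutRegENxSO : out std_logic;   -- enable of the output register
    OutReqxSO   : out std_logic    -- result available
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_t is (ST_IDLE, ST_NEW_DATA, ST_RUN, ST_DATA_OUT);
  signal StatexDN, StatexDP : state_t;
begin
  -- Combinatorial process: next state and outputs
  p_comb : process (StatexDP, InReqxSI, LastTapxSI, OutAckxSI)
  begin
    -- Default assignments for every signal driven here
    StatexDN    <= StatexDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    AccENxSO    <= '0';
    OutRegENxSO <= '0';
    OutReqxSO   <= '0';

    case StatexDP is
      when ST_IDLE =>
        if InReqxSI = '1' then
          InRegENxSO <= '1';       -- Mealy output: depends on an input
          StatexDN   <= ST_NEW_DATA;
        end if;
      when ST_NEW_DATA =>
        MemWExSO <= '1';           -- Moore output: depends on the state only
        StatexDN <= ST_RUN;
      when ST_RUN =>
        AccENxSO <= '1';
        if LastTapxSI = '1' then
          OutRegENxSO <= '1';      -- Mealy output
          StatexDN    <= ST_DATA_OUT;
        end if;
      when ST_DATA_OUT =>
        OutReqxSO <= '1';
        if OutAckxSI = '1' then
          StatexDN <= ST_IDLE;
        end if;
    end case;
  end process p_comb;

  -- Sequential process: state register only
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      StatexDP <= ST_IDLE;
    elsif rising_edge(CLKxCI) then
      StatexDP <= StatexDN;
    end if;
  end process p_reg;
end architecture rtl;
```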
Translation into VHDL
• Complete and check the code:
  Declare the signals and components

  Check and complete the sensitivity lists of ALL combinatorial processes with ALL signals that are:
    - used as a condition in any IF or CASE statement
    - being assigned to any other signal
    - used in any operation with any other signal

  Check that the sensitivity lists of ALL sequential processes contain:
    - ONLY one global clock and one global async. reset signal
    - no other signals
Other Good Ideas
• Keep things simple
• Partition the design (Divide et Impera):
  Example: Start processing the next sample while the previous result is waiting in the output register:
    - Just add a FIFO at the output of your filter
• Do NOT try to optimize each gate or FlipFlop
• Do not try to save cycles if not necessary
• VHDL code
  Is usually long, and that is good!
  Is just a representation of your block diagram
  Does not mind hierarchy