You are on page 1of 7

Home Improving timing for FIFO by adding registers

CV
Every now and then we find that the design's FIFO signals in the critical paths. There are several
CV reasons why a FIFO may be involved:
(hebrew)
FPGA The FIFO's rd_en signal is used in combinatorical logic within the FIFO, which calculates the
next read address, which may be time-consuming.
Editor
The driver of the FIFO's rd_en is commonly a not-so-trivial combinatoric function in our own
Free design. Many times this function consists of state registers and several flags, coming from
software different (physical) parts of our design, resulting in a difficult timing problem.

Lectures There is an inherent delay between the rd_clk and the actual appearance of data at the
FIFO's output, which (usually) results from the RAM's delay from clock to output. This creates
Perlilog an unfavorable start for the timing budget, which may turn into the critical path, if the FIFO's
output is used further in combinatorical logic (such as determining the next state of some
Eobj state machine).
frandom Physical distance between the FIFO and its data sink: On large devices, the data source and
data sink may be physically far away, causing a placement problem.
cdepend
hitec Problems may also occur in the write-related logic, but the solution is pretty trivial: Delay both data
(typically "din") and the write enable signal (typically "wr_en") by the same number of write clocks
(LaTeX) ("wr_clk"?) with registers, and you're done. Ah, of course, you may want to modify your "FIFO is
easyspec soon full" watermark ("almost_full" or "prog_full") to enlarge the safety margin, since you're going
to keep on writing a few extra clock cylces after it goes high.
peakdet
So, let's stick to the real problem: We want the rd_en signal to be sampled by a flip-flop in the
FIFO tricks FIFO as soon as possible, and we want a low clock-to-output on the data lines. It's clear that the
answer is adding registers, but there's some logic to add as well...
CDCE906
The Verilog code for a FIFO wrapper is given below, and is released to the public domain. This
Optical
means that I don't claim any copyrights for it.
simulator
How to use it: This is a functional drop-in wrapper for a typical FIFO (such as the one generated
HTML by Xilinx' Fifo Generator). This means that if you wrap your existing FIFO with this module, your
highlighting design will functionally "do the same", with possibly better static timing. But you can not expect
identical simulations, because the data appears slightly later on the FIFO's output. A non-buggy
Hobbies and fairly sensible design should not care about such a difference.
Tech blog
Assuming that you have a non-FWFT FIFO to begin with, with a pretty standard read interface, you
should instantiate it in the module below as "orig_fifo", and set the "width" parameter to the
number of data bits you have. If you need some other control signal, pass them though with care.
For example, "almost empty" may ring the bell slightly too soon. Signals regarding the write path
can be passed on as is, since the write path is not touched by this wrapper.

The term "FWFT" is explained below, in case you're not familiar with it.

So here's the module. It's explained in detail after the listing.

module reg_fifo(rst,
rd_clk, rd_en, dout, empty,
wr_clk, wr_en, din, full, prog_full);

// Eli Billauer, 12.4.06


// This module is released to the public domain; Any use is allowed.

parameter width = 8;

input rst;
input rd_clk;
input rd_en;
input wr_clk;
input wr_en;
input [(width-1):0] din;
output empty;
output full;
output prog_full;
output [(width-1):0] dout;

reg fifo_valid, middle_valid;


reg [(width-1):0] dout, middle_dout;

wire [(width-1):0] fifo_dout;


wire fifo_empty, fifo_rd_en;
wire will_update_middle, will_update_dout;

// orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO


fifo orig_fifo
(
.rst(rst),
.rd_clk(rd_clk),
.rd_en(fifo_rd_en),
.dout(fifo_dout),
.empty(fifo_empty),
.wr_clk(wr_clk),
.wr_en(wr_en),
.din(din),
.full(full),
.prog_full(prog_full)
);

assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);


assign will_update_dout = (middle_valid || fifo_valid) && rd_en;
assign fifo_rd_en = (!fifo_empty) && !(middle_valid && fifo_valid);
assign empty = !(fifo_valid || middle_valid);

always @(posedge rd_clk)


if (rst)
begin
fifo_valid <= 0;
middle_valid <= 0;
dout <= 0;
middle_dout <= 0;
end
else
begin
if (will_update_middle)
middle_dout <= fifo_dout;

if (will_update_dout)
dout <= middle_valid ? middle_dout : fifo_dout;

if (fifo_rd_en)
fifo_valid <= 1;
else if (will_update_middle || will_update_dout)
fifo_valid <= 0;

if (will_update_middle)
middle_valid <= 1;
else if (will_update_dout)
middle_valid <= 0;
end
endmodule

OK, let's see how this thing works.

The first thing to realize, is that if we want that the FIFO's rd_en will be detached from the original
rd_en, two registers are necessary. Consider the following drawing:
The FIFO's output is marked as fifo_dout. This modules output is dout. Now suppose that we didn't
have the middle register. We could activate the original FIFO's rd_en so that the first data entry
appears on fifo_dout, and hold our "empty" signal low, telling the application to come and get data.
But whenever the application came to get some, it would raise the rd_en. The wrapper would then
sample fifo_dout into dout, and all is fine so far.

But now what? Since we don't want the FIFO's fifo_rd_en to combinatorically depend on the
application's rd_en, there is no possibility to update fifo_dout during the same clock cycle, so we've
run out of data. A direct response to rd_en is therefore to raise the "empty" flag, raise the FIFO's
fifo_rd_en, and wait a clock. Only then can we clear the "empty" flag again.

This means that the wrapped FIFO works in half rate. And if we're ready for a FIFO to work in half
rate, there are simpler solutions.

So the solution is to have another register, middle_dout. The FIFO wrapper always tries to keep
both fifo_dout and middle_dout with data. Just to get an idea of how this works, let's consider the
case of a well-filled FIFO and that we're after a long period with no reads. That means that
fifo_dout and middle_dout have valid data. At this point the application's rd_en is raised. dout
samples middle_dout. The "empty" flag is kept low, since data can still be sampled from fifo_dout.
If rd_en is kept high, data will continously be sampled from fifo_dout to dout.

But all this requires the FIFO's fifo_rd_en to be correctly driven. We shall now see how it's all done.

Let's get acquainted with a few internal signals in the Verilog code above. We'll start with
will_update_middle and will_update_dout:

if (will_update_middle)
middle_dout <= fifo_dout;

if (will_update_dout)
dout <= middle_valid ? middle_dout : fifo_dout;

Evidently from the Verilog code, the middle_dout and dout registers (respectively) sample
something when these signals are high. middle_dout samples fifo_dout (it doesn't have much
choice), but dout will sample middle_dout if middle_valid is high, otherwise it goes for fifo_dout.

The two *_valid flags tell us whether the respective register containts valid data.

if (fifo_rd_en)
fifo_valid <= 1;
else if (will_update_middle || will_update_dout)
fifo_valid <= 0;

if (will_update_middle)
middle_valid <= 1;
else if (will_update_dout)
middle_valid <= 0;

When fifo_rd_en is high during a clock rise, fifo_valid goes high as well: If we just pulled data from
the FIFO, its current output is considered valid. But if fifo_rd_en was low, and data was sampled by
one of middle_dout or dout, we don't consider fifo_dout valid anymore, since its data has been
consumed.

middle_valid goes by the same logic: When will_update_middle is high, data is sampled into
middle_dout, so middle_valid goes high as well. If not, it will go low when dout samples data from
it.

Now let's turn to how the FIFO's fifo_rd_en signal is generated:


assign fifo_rd_en = (!fifo_empty) && !(middle_valid && fifo_valid);

We will read from the FIFO if it has data (i.e. not fifo_empty) and if any of fifo_dout and/or
middle_dout is not valid.

Note that fifo_rd_en driven by the state machine's urge to prepare data for reading, and not the
application's reading. As a result, fifo_rd_en is derived from a three-input function, which depends
on two flip-flops (fifo_valid and middle_valid) and the FIFO's own fifo_empty signal. On a Xilinx
FPGA, this is done with a single LUT, which is as fast as you can get for any flow-controlled read
from a FIFO.

The "empty" flag is generated from a function of these two flip-flops as well, so it's very fast as
well.

assign empty = !(fifo_valid || middle_valid);

We are now left with the generation of the will_update_* signals:

assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);


assign will_update_dout = (middle_valid || fifo_valid) && rd_en;

will_update_dout essentially says (!empty && rd_en). Since we mimic a FIFO, it's pretty natural
that the output is updated when it's not empty, and data is requested by rd_en.

will_update_middle is trickier: It requires fifo_valid (otherwise there is nothing to sample) and then
we have this (middle_valid == will_update_dout). This is actually a shortcut for saying the
following:

If middle_valid is low, then we need to update. But we'll do that only if will_update_dout is
low as well, because if both dout and middle_dout "want" to update, dout has precedence.
If middle_valid is high, we don't need to update. But if will_update_dout is high and
middle_valid is high, it means that dout is going to sample from middle_dout. Not updating
now means becoming invalid, and since fifo_dout is waiting with data, why hold it?

The combinatoric functions which generate will_update_middle, will_update_dout and the next
values of fifo_valid and middle_valid may seem complicated, but they eventually depend only
fifo_empty, rd_en and two registers: fifo_valid and middle_valid. A fairly clever synthesizer should
detect this, and create a single logic level on these important paths on a Xilinx FPGA architecture.

A FWFT (First Word Fall Through) FIFO conversion


In a "classic" FIFO, a deasserted "empty" flag means that valid data will be presented on the FIFO's
outputs after rd_en is sampled high on clock. A FWFT FIFO presents the data on the outputs as
soon as it's available, so a deasserted "empty" flag means that the data on the outputs is valid.

The meaning of rd_en could therefore be interpreted somewhat differently: For a "classic" FIFO it
means "bring me data". On a FWFT it's something like "I just grabbed the data, bring me the next
one if you have it".

We shall now see a module, which uses a "classic" FIFO to implement a FWFT. Note that it does not
improve timing: The FIFO's dout is passed through, and the FIFO's rd_en depends on the one
coming from the application, so its timing is slightly worse than a the bare FIFO it uses.

As a matter of fact, this wrapper merely fiddles with the rd_en and empty signals.

If timing improvement is necessary, see the end of this page.

module basic_fwft_fifo(rst,
rd_clk, rd_en, dout, empty,
wr_clk, wr_en, din, full, prog_full);

// Eli Billauer, 12.4.06


// This module is released to the public domain; Any use is allowed.

parameter width = 8;

input rst;
input rd_clk;
input rd_en;
input wr_clk;
input wr_en;
input [(width-1):0] din;
output empty;
output full;
output prog_full;
output [(width-1):0] dout;

reg dout_valid;
wire fifo_rd_en, fifo_empty;

// orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO


fifo orig_fifo
(
.rst(rst),
.rd_clk(rd_clk),
.rd_en(fifo_rd_en),
.dout(dout),
.empty(fifo_empty),
.wr_clk(wr_clk),
.wr_en(wr_en),
.din(din),
.full(full),
.prog_full(prog_full)
);

assign fifo_rd_en = !fifo_empty && (!dout_valid || rd_en);


assign empty = !dout_valid;

always @(posedge rd_clk)


if (rst)
dout_valid <= 0;
else
begin
if (fifo_rd_en)
dout_valid <= 1;
else if (rd_en)
dout_valid <= 0;
end
endmodule

Using a FWFT for an improved non-FWFT FIFO


Given a FWFT FIFO, it's quite easy to turn it into a non-FWFT FIFO. A FWFT FIFO can be created as
shown above, or requested as such from Xilinx' tools.

As before, the bonus is the extra register at the output improves the timing. Besides, Xilinx FIFO
Generator has an "Use embedded registers" option, which can be enabled with no difference in the
FWFT FIFO's behavior (note however that when embedded registers are used on the built-in FIFO
created by Xilinx' FIFO Generator, it's not a real FWFT. When any non-built-in FIFO option is used,
it's a true FWFT). Embedded registers improve timing even further.

So it boils down to simply this (example given for 32 bit width): Since a deasserted "empty" means
that data is present on the output port, create a register which samples this data when rd_en is
asserted. That mimics a standard FIFO's behavior:

module regular_fifo
(
input rst,
input wr_clk,
input rd_clk,
input [31:0] din,
input wr_en,
input rd_en,
output reg [31:0] dout,
output full,
output empty,
output prog_full
);

wire [31:0] dout_w;

always @(posedge rd_clk)


if (rd_en && !empty)
dout <= dout_w;

fwft_fifo wrapper
(
.wr_clk(wr_clk),
.rd_clk(rd_clk),
.rst(rst),
.din(din),
.wr_en(wr_en),
.rd_en(rd_en),
.dout(dout_w),
.full(full),
.prog_full(prog_full),
.empty(empty)
);
endmodule

A FWFT with improved timing


We shall finish with a combination of the two: A FWFT module with extra registers for improved
timing. It is based on the first module of this page, with some changes that makes it a FWFT FIFO.
Only few changes are required: To make dout sample data not only on rd_en, but also when it's
invalid. Also, the "empty" flag (which is a misnomer for a FWFT) should be active when dout is not
valid, rather than depending on the validity of other registers.

The module for a conversion from a non-FWFT FIFO to a FWFT FIFO, including the timing
advantages, is given below.

module fwft_fifo(rst,
rd_clk, rd_en, dout, empty,
wr_clk, wr_en, din, full, prog_full);

parameter width = 8;

input rst;
input rd_clk;
input rd_en;
input wr_clk;
input wr_en;
input [(width-1):0] din;
output empty;
output full;
output prog_full;
output [(width-1):0] dout;

reg fifo_valid, middle_valid, dout_valid;


reg [(width-1):0] dout, middle_dout;

wire [(width-1):0] fifo_dout;


wire fifo_empty, fifo_rd_en;
wire will_update_middle, will_update_dout;

// orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO


fifo orig_fifo
(
.rst(rst),
.rd_clk(rd_clk),
.rd_en(fifo_rd_en),
.dout(fifo_dout),
.empty(fifo_empty),
.wr_clk(wr_clk),
.wr_en(wr_en),
.din(din),
.full(full),
.prog_full(prog_full)
);

assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);


assign will_update_dout = (middle_valid || fifo_valid) && (rd_en || !dout_valid);
assign fifo_rd_en = (!fifo_empty) && !(middle_valid && dout_valid && fifo_valid);
assign empty = !dout_valid;

always @(posedge rd_clk)


if (rst)
begin
fifo_valid <= 0;
middle_valid <= 0;
dout_valid <= 0;
dout <= 0;
middle_dout <= 0;
end
else
begin
if (will_update_middle)
middle_dout <= fifo_dout;
if (will_update_dout)
dout <= middle_valid ? middle_dout : fifo_dout;

if (fifo_rd_en)
fifo_valid <= 1;
else if (will_update_middle || will_update_dout)
fifo_valid <= 0;

if (will_update_middle)
middle_valid <= 1;
else if (will_update_dout)
middle_valid <= 0;

if (will_update_dout)
dout_valid <= 1;
else if (rd_en)
dout_valid <= 0;
end
endmodule

Last modified on Mon Jun 8 11:59:22 2015. E-mail: lets_talk@billauer.co.il

You might also like