VERILOG HDL based FPGA design

Gary Gannot and Michiel Ligthart


Exemplar Logic, Inc.
2550 Ninth Street, Suite 101
Berkeley, Ca. 94710
(510) 849 0937

Abstract

This paper presents a logic synthesis system for Field Programmable Gate Arrays (FPGAs) based on the Verilog HDL. It describes aspects of synthesis and optimization specific to FPGAs, in contrast to CMOS gate-arrays. Particular attention is paid to architecture-specific optimization, both on Register Transfer and Logic Level. The concept of the design methodology is proven by a real-world implementation of an actual design.

Introduction

At the end of the previous decade, Verilog HDL was rapidly becoming the design language of choice for many CMOS gate array starts. This development was driven by an excellent simulator and a wide availability of vendor-certified simulation libraries. The adoption of an RTL subset of Verilog HDL by Synopsys as its original high-level synthesis language accelerated the usefulness of the language.

Unfortunately, the HDL paradigm that proved to be so useful for CMOS gate array design was lacking in support for the design of FPGAs. Not only were no simulation libraries available from the different vendors, but synthesis tools based on gate-array optimization performed poorly when optimizing for different FPGA architectures.

[...] language entry could easily cost more than $100k for simulation, high-level synthesis, and logic synthesis tools. Such a cost may seem reasonable for CMOS gate-arrays, where designers are already facing high NRE costs, and can amortize tools over several projects, but is prohibitive for average FPGA design.

This paper discusses a design methodology for FPGAs based on the Verilog HDL, including behavioral simulation, synthesis, and back annotation for gate-level simulation. Within the thread of the flow, we will focus on the specifics of FPGA synthesis, in contrast with traditional gate-array synthesis.

FPGA architectures

Field Programmable Gate Arrays (FPGAs), contrary to traditional gate arrays, comprise a wide variety of building blocks, or cells. Xilinx' Configurable Logic Block (CLB, see figure 1), for instance, contains programmable Look-Up Tables and storage registers. The 4000 architecture implements any function of up to five inputs, and some functions of up to nine inputs. In this sense, a 5-input XOR function is as expensive in area and delay as a 2-input NAND. This is in contrast with ASIC transistor counts when implementing these same functions.
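The remark that a 5-input XOR costs the same as a 2-input NAND follows from the look-up table principle: the inputs only address a stored truth table, so the function itself does not change the hardware. The sketch below models this with a single 32-entry table. This is a simplifying assumption, since the actual CLB builds its five-input function from smaller function generators, and the names lut5, INIT and lut_cost_demo are hypothetical rather than Xilinx primitives.

```verilog
// Illustrative 5-input look-up table; not a vendor primitive.
module lut5 (in, out);
  parameter INIT = 32'h00000000;   // truth table, one bit per input pattern
  input  [4:0] in;
  output       out;
  wire  [31:0] truth = INIT;
  assign out = truth[in];          // the inputs simply address the table
endmodule

// Self-checking testbench: both functions use the same lut5 hardware.
module lut_cost_demo;
  reg  [4:0] x;
  wire       y_xor, y_nand;
  integer    n, errors;

  // 5-input XOR: bit n of INIT is the parity of n.
  lut5 #(32'h96696996) u_xor  (.in(x), .out(y_xor));
  // 2-input NAND of x[1:0]; the other three inputs are ignored.
  lut5 #(32'h77777777) u_nand (.in(x), .out(y_nand));

  initial begin
    errors = 0;
    for (n = 0; n < 32; n = n + 1) begin
      x = n;
      #1;
      if (y_xor  !== (^x))           errors = errors + 1;
      if (y_nand !== ~(x[0] & x[1])) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

Only the INIT value differs between the two instances, which is the sense in which area and delay are independent of the function in a table-based cell.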

0-8186-5655-7/94 $03.00 © 1994 IEEE
Figure 1. Simplified Block Diagram of XC4000 Configurable Logic Block.

[...] completely different basic cell (see figure 2). Based on a multiplexor architecture, the Act2 c-module implements gates as diverse as 2-input AND gates and AND-OR gates with five inputs, all with the same area and delay penalty.

Figure 2. Combinational and sequential configurations of the Act2 cell.

As a final example, the Altera MAX family (figure 3) is representative of the so-called complex PLDs. The basic structure is the Logic Array Block (LAB), which consists of a macrocell array, an expander product term array, and an I/O control block. The LAB implements 16 functions of up to 30 product terms.

Figure 3. Simplified Block Diagram of Altera Max7000 Macrocell.

The differences in cell granularity among different FPGAs make it a challenge both for human designers and for automated synthesis tools to design circuits in an optimal fashion for every architecture. For random logic, synthesis can use architecture-specific optimization algorithms. To stick with the three technologies discussed before, one can use fanin-limited optimization for Xilinx, mux-based optimization for Actel, and cube-limited optimization for Altera [1].

As an example, consider the differences between regular and-or decomposition, and fanin-limited decomposition for LCA design. And-or decomposition processes a complex equation, and represents it as a tree of AND gates and OR gates only. The tree representation is then mapped into an LCA by merging gates with fewer than four fanins, and splitting gates with more than four fanins.

For instance,

    X = (A*(B+C))+(B*D)+(E*F*G*H*I)

after and-or decomposition would be as follows:

    X  = T1+T2+T3
    T1 = A*T4
    T2 = B*D
    T3 = E*F*G*H*I
    T4 = B+C

Next, as T3 has more than four inputs, it is split into two functions:

    T3 = E*F*G*T5
    T5 = H*I

Now that the design is in AND/OR format, the Xilinx physical place and route software can be used to place the design into physical CLBs. This process is referred to as "partitioning" by Xilinx, although "packing" is the more favorable terminology. In this example, T5 and T3 cannot be merged because of fan-in limitations. T1 may be combined with X, but it would be better to combine T1 with T4 and X with T2. This gives the following partitioning into four CLBs:

    X  = T1+(B*D)+T3
    T1 = A*(B+C)
    T3 = E*F*G*T5
    T5 = H*I

The best partitioning for this example, however, is not the best solution overall. The initial logic decomposition given to the place and route software strongly affects the final solution. A fanin-limited decomposition, which is based on the knowledge that only the number of inputs limits the logic that can be implemented, yields a much better partitioning into three CLBs:

    X  = T1+(T2*E)
    T1 = A*(B+C)+(B*D)
    T2 = F*G*H*I

Since the three expressions share no common inputs, an optimal solution has been achieved.

The Actel logic module consists of three 2-to-1 multiplexors and one 2-input OR gate. As such, the Actel technology is often classified as a fine-grain FPGA, and can be optimized and mapped using traditional gate-array techniques [2]. However, since the basic logic module is created from multiplexors, mux-based optimization and boolean mapping techniques are much more effective.

Consider the function:

    f = a*b + !b*c + d

where ! symbolizes the boolean negation. Using the property:

    d = (b + !b)*d

and substituting this into f, we obtain:

    f = b*(a+d) + !b*(c+d)              (3)

In a similar fashion:

    a+d = a+!a*d = a*1+!a*d             (4)
    c+d = c+!c*d = c*1+!c*d             (5)

Substituting (4) and (5) into (3) gives:

    f = b*(a*1+!a*d) + !b*(c*1+!c*d)

Figure 4. Decomposition into Actel technology.

[...] pre-defined, technology-specific hard and soft macros that can be utilized by advanced synthesis systems. (Hard macros differ from soft macros in the fact that they are pre laid out and routed, sometimes in different aspect ratios.) Hard and soft macros are especially popular for implementing arithmetic and relational logic, including addition, subtraction, incrementation, and comparisons. The optimization techniques described previously, however, operate only on the Boolean gate level, and are not capable of resynthesizing larger structures, like n-bit adders, as a structure in itself. In fact, once the n-bit adder is implemented in a certain way, no synthesis tool will change its basic structure, say from carry ripple into carry look ahead.
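The two worked decompositions in this section, the CLB partitioning of X and the multiplexor form of f, can be checked by exhaustive simulation. The testbench below is a sketch under stated assumptions: the module name decomposition_check and all internal signal names are made up for illustration, and the nested conditional for f is one reading of the three-multiplexor structure that results from substituting (4) and (5) into (3), not a netlist produced by the synthesis system described here.

```verilog
// Sketch only: decomposition_check and every signal name here are
// illustrative assumptions, not part of the paper's tool flow.
module decomposition_check;
  reg [8:0] v;                       // one bit per input, swept exhaustively
  integer   n, errors;

  // Inputs of the CLB example.
  wire A = v[8], B = v[7], C = v[6], D = v[5], E = v[4];
  wire F = v[3], G = v[2], H = v[1], I = v[0];

  // Original equation.
  wire x_ref = (A & (B | C)) | (B & D) | (E & F & G & H & I);

  // Four-CLB partitioning: every function has at most four inputs.
  wire p_t5 = H & I;
  wire p_t3 = E & F & G & p_t5;
  wire p_t1 = A & (B | C);
  wire x_4  = p_t1 | (B & D) | p_t3;

  // Three-CLB fanin-limited decomposition.
  wire q_t1 = (A & (B | C)) | (B & D);
  wire q_t2 = F & G & H & I;
  wire x_3  = q_t1 | (q_t2 & E);

  // Inputs of the Actel example (reusing the low bits of v).
  wire a = v[3], b = v[2], c = v[1], d = v[0];
  wire f_ref = (a & b) | (~b & c) | d;
  // Three 2-to-1 multiplexors: b selects between (a ? 1 : d) and (c ? 1 : d).
  wire f_mux = b ? (a ? 1'b1 : d) : (c ? 1'b1 : d);

  initial begin
    errors = 0;
    for (n = 0; n < 512; n = n + 1) begin
      v = n;                         // low nine bits of the loop counter
      #1;
      if (x_4 !== x_ref || x_3 !== x_ref || f_mux !== f_ref)
        errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

The check confirms functional equivalence only. Whether a given partitioning actually fits the stated number of CLBs is decided by the place and route software.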

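The observation that gate-level optimization does not turn one adder structure into another can be made concrete with a small sketch. Both descriptions below compute the same 4-bit sum with carry out, but the first leaves the structure open while the second commits to a ripple-carry chain. The module names add4_behavioral, full_adder and add4_ripple are hypothetical and chosen only for this illustration.

```verilog
// Sketch only: all module names below are hypothetical.

// Behavioral form: the '+' operator leaves the adder structure open.
module add4_behavioral (a, b, sum);
  input  [3:0] a, b;
  output [4:0] sum;
  assign sum = a + b;
endmodule

// One-bit full adder used as the building block below.
module full_adder (a, b, cin, s, cout);
  input  a, b, cin;
  output s, cout;
  assign s    = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Structural form: the carry ripples through four full adders, so the
// ripple-carry structure is already fixed in the description itself.
module add4_ripple (a, b, sum);
  input  [3:0] a, b;
  output [4:0] sum;
  wire   [3:1] c;
  full_adder fa0 (a[0], b[0], 1'b0, sum[0], c[1]);
  full_adder fa1 (a[1], b[1], c[1], sum[1], c[2]);
  full_adder fa2 (a[2], b[2], c[2], sum[2], c[3]);
  full_adder fa3 (a[3], b[3], c[3], sum[3], sum[4]);
endmodule
```

A tool that works only on Boolean gates sees the second description as individual XOR, AND and OR functions, which is why the choice between ripple and look-ahead has to be made before that level is reached.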
From the discussion above, it is obvious that implementing structured logic functionality in an FPGA differs drastically from technology to technology. As an example, 16-bit addition in the Xilinx 4000 family is best implemented using hard macros exploiting the architecture's fast carry chain, but these hard macros do not exist in the 3000/3100 family. Actel, on the other hand, has a vast variety of full-adder cells that can be configured in carry-look ahead or ripple carry adders. Not only target architecture, but area versus delay trade-offs can also play an important role in how to implement structured logic. To stay with the adder example, a 16-bit implementation in Act 1 can be done in 24 modules as a carry-ripple, or in 78 modules as a carry-select adder for approximately half the delay.

In a schematic entry methodology, the designer can always utilize the foundry-provided hard and soft macros by interconnecting them in the schematic drawing. He or she can even design proprietary macros and use those where appropriate. The drawback, besides using the schematic entry, is that the designer has to be familiar with the target technology and should know in detail when to use which macro. From a Verilog HDL perspective, these macros can be utilized using module instantiation with hardwired connectivity. For instance, when addition is required in the Xilinx 4000 family, the designer can instantiate the ADSU8H for an 8-bit implementation or the ADSU16H for a 16-bit implementation:

    adsu8h g0 (.a(busa), .b(busb), .add(pwr),
               .s(sum), .ofl(ofl));

where the actual module is described without a body:

    module adsu8h (a, b, add, s, ofl);
      input  [7:0] a, b;
      input        add;
      output [7:0] s;
      output       ofl;
    endmodule

However, this removes the higher level of abstraction that the Verilog HDL was supposed to provide and essentially reduces Verilog design to textual schematic capture. But at the same time, it is the only way to assure the utilization of the fast-carry chain in this particular architecture. Assuming that the overflow signal (ofl) is not required in this implementation, a typical Verilog statement to achieve addition would have been

    sum = busa + busb;

For the 4000 architecture, the designer wants to see the '+' operator implemented by the ADSU8H hard macro, but cannot enforce this as Verilog does not allow overloading of any operator or function.

The solution is one where designers can refer to libraries where certain operators are defined in terms of a target technology. Such a library could contain n-bit adders, subtractors, accumulators, and other datapath operators. The synthesis tool chooses from this library the preferred implementation of an operator, depending on size and well-defined attributes. Such a library, which is referred to as a module generation library, can be defined in many different ways.

Figure 5 shows the general flow of data in a module generation environment. After the Verilog code is successfully parsed, it is passed on to an inference engine that matches supported operators like addition with preferred implementations in the module generation library. Matching is performed on three levels:
- name of the module generator
  each operator has an identifying name, e.g. 'modgen-add' for '+'.
- generic value 'size'
  module generators can be defined for any size in the natural range.
- number of ports
  the number of ports is defined by the module generator and its size.

Other generics, for instance for area and delay trade-offs, can be defined as well, but are not mandatory for a match [3].

Figure 5. Module Generation dataflow. (Diagram blocks: Verilog HDL, parser, random, modgen, library, optimization, netlist.)

Pilot design

The concept of Verilog HDL based FPGA design as described in this paper has been tested and implemented with a real-world design. The Mancala Game is a traditional African board game that has been emulated in hardware.

The Verilog description is approximately 700 lines, and has been synthesized into different FPGAs. The printed circuit board implementation, as shown in figure 6, contains three identical implementations of the game in an Actel 1280, Altera Flex, and a Xilinx 4008 device.

Figure 6. The Mancala Game implemented in FPGAs and mounted on an Aptix Field Programmable Circuit Board.

The pilot design illustrates another interesting aspect of device-specific FPGA synthesis. The Mancala game requires 24 seven-segment displays, which at four i/o pins each consume 96 device pins. In order to save i/o pins on the FPGAs, the data for the seven-segment displays were multiplexed on a high-impedance output bus using the Verilog assignment to 'z':

    assign bincount_bus = bindselect[1] ?
                          bincount1 : 8'bz;

The high-impedance assignment synthesized fine in the Xilinx technology, where it resulted in a total of 96 three-state buffers. Neither Actel's anti-fuse technology nor Altera's FLEX 8000, however, supports internal three-state buffers. Hence, to accommodate the assign statement, an automatic conversion into multiplexed logic was performed.

The finalized design occupied 175 CLBs plus 96 3-state buffers (Xilinx 4000), 893 modules (Act2), and 680 LCs (FLEX 8000). Figure 6 shows how the devices are mounted on an Aptix Field Programmable Circuit Board.

Conclusions

This paper has discussed aspects of a Verilog HDL based FPGA design flow. It has shown that the Verilog HDL language itself is applicable to FPGA design, but that dedicated tools, especially for synthesis, are required to obtain feasible results. The approach taken has been validated with the synthesis of a Verilog HDL description of the Mancala game in actual hardware.

References

[1] 'ELSS: A Logic Synthesis Tool for FPGAs', R.P. Ranauro and M.M. Ligthart, proceedings of the 4th annual IEEE ASIC Conference and Exhibit, 1991, pp. 13.2.1-13.2.4.

[2] 'Technology Mapping in MIS', E. Detjens, G. Gannot et al., proceedings of the ICCAD, 1987, pp. 116-119.

[3] 'Module Generation for VHDL synthesis', R.W. Dekker and M.M. Ligthart, proceedings of the VIUF spring conference, 1993.
