You are on page 1of 8

RTL Design RTL Overview

• Gate-level design is now rare!


– design automation is necessary to manage the complexity of modern
circuits
Controller
– only library designers use gates
– automated RTL synthesis is now almost universal
load
• RTL = Register Transfer Level fn
– The design is perceived as a number of registers with transfer functions
Register
which transform data as it passes from one register to another
– this is a synchronous methodology sel Transfer
Function
– chosen as the methodology for input to synthesis

Data Path

ELEC3017 - RTL 1 ELEC3017 - RTL 2

RTL Design Steps Example


• There are typically 8 steps in the RTL design process • Scalar-product calculator
1. Create a Dependency Graph for the Data Path • Not a realistic design, but
2. Determine the widths of the data paths shows the elements of RTL
design
3. Decide what resources to provide
• The example will work on 8
4. Allocate operations to resources and schedule them
element vectors.
5. Allocate registers to intermediate results
6. Share registers
7. Design the controller
8. Design the reset/initialisation mechanism
• The order of these steps may vary and may be iterated

ELEC3017 - RTL 3 ELEC3017 - RTL 4


Step 1: Create Dependency Graph for Data
Path
Step 1b: 8-way Addition?

• So, the data operations are:


– 8 multiplications
– one 8-way addition

Balanced binary tree Skewed binary tree

ELEC3017 - RTL 5 ELEC3017 - RTL 6

Step 1c: Identify and Make Unique Data


Paths
Step 2: Determine Data Path Widths
• Create what is known as the Single Assignment Form • In most cases, this is defined for I/O as
part of the specification.
• The designer typically then has to decide
p0 := a0 * b0; • Each variable assigned only once on internal data path widths.
p1 := a1 * b1;
• Each statement uses just one operator • In this case we’ll assume the
p2 := a2 * b2; specification:
... • Create new intermediate variables to achieve
– Input is 8-bit 2’s-complement
z1 := p0 + p1; this – Output is 16-bit 2’s-complement
z2 := z1 + p2;
• To achieve this output precision, internal
z3 := z2 + p3;
paths must be 16-bit 2’s-complement
...
z := z6 + p7; • Precision of Operations:
– all multiplications are 8-bit input, 16-bit output
– the addition is 16-bit input, 16-bit output

ELEC3017 - RTL 7 ELEC3017 - RTL 8


Step 3: Choose Resources to Provide Step 4: Allocate/Schedule Operations
• As a first solution, target a minimum-area implementation • This stage of the design determines which operations are to be
– one multiplier, 8-bit inputs, 16-bit output performed by which resources at which time
– one adder, 16-bit inputs, 16-bit output – Allocation – which resource
• These resources will be shared – remember that the original – Scheduling – what time (i.e. which clock cycle)
algorithm requires the following operations: • Allocation for this example is trivial
– eight multiplications – all additions are allocated to the adder
– seven additions – all multiplications are allocated to the multiplier
• Scheduling has many permutations
– it doesn’t matter what order multiplications are performed
– it doesn’t matter what order additions are performed
– addition is associative and commutative

ELEC3017 - RTL 9 ELEC3017 - RTL 10

Step 4b: Allocate/Schedule Operations Step 4c: Simplify Schedule

Cycle mult1 add1 Cycle mult1 add1

1 a0 * b0 ҧ p0 - 1 a0 * b0 ҧ p0 0 + p0 ҧ z0

2 a1 * b1 ҧ p1 p0 + p1 ҧ z1 2 a1 * b1 ҧ p1 z0 + p1 ҧ z1

3 a2 * b2 ҧ p2 z1 + p2 ҧ z2 3 a2 * b2 ҧ p2 z1 + p2 ҧ z2

4 a3 * b3 ҧ p3 z2 + p3 ҧ z3 4 a3 * b3 ҧ p3 z2 + p3 ҧ z3

5 a4 * b4 ҧ p4 z3 + p4 ҧ z4 5 a4 * b4 ҧ p4 z3 + p4 ҧ z4

6 a5 * b5 ҧ p5 z4 + p5 ҧ z5 6 a5 * b5 ҧ p5 z4 + p5 ҧ z5

7 a6 * b6 ҧ p6 z5 + p6 ҧ z6 7 a6 * b6 ҧ p6 z5 + p6 ҧ z6

8 a7 * b7 ҧ p7 z6 + p7 ҧ z 8 a7 * b7 ҧ p7 z6 + p7 ҧ z

ELEC3017 - RTL 11 ELEC3017 - RTL 12


Step 5: Allocate registers to intermediate
results
Step 6: Share Registers
• Every variable in the scheduling that is generated in one cycle and • A register can be shared between variables if their lifetimes do not
used in another must be registered. intersect.
• The schedule proposed only requires outputs z0…z to be registered,
1 z0
not the products:
2 z1

1 a0 * b0 ҧ p0 0 + p0 ҧ z0 3 z2

2 a1 * b1 ҧ p1 z0 + p1 ҧ z1 4 z3

5 z4
• However, the schedule does assume that a multiply-accumulate can 6 z5
be done in one clock cycle
7 z6

8 z

ELEC3017 - RTL 13 ELEC3017 - RTL 14

Step 6b: Simplify Registers Design so Far

Cycle mult1 add1 • The data path design is now complete:

1 a0 * b0 ҧ p0 0 + p0 ҧ z

2 a1 * b1 ҧ p1 z + p1 ҧ z

3 a2 * b2 ҧ p2 z + p2 ҧ z
mult1
4 a3 * b3 ҧ p3 z + p3 ҧ z add1
z

5 a4 * b4 ҧ p4 z + p4 ҧ z

6 a5 * b5 ҧ p5 z + p5 ҧ z

7 a6 * b6 ҧ p6 z + p6 ҧ z

8 a7 * b7 ҧ p7 z + p7 ҧ z

ELEC3017 - RTL 15 ELEC3017 - RTL 16


Step 7: Design the controller Step 7b: Design the Controller
• The controller is responsible for: • In this case the controller becomes a simple 3-bit counter to control
– routing operation inputs to resource inputs at the scheduled cycle the muxes
– enabling registers to store intermediate results • The register is always enabled

Cycle a mux b mux zero mux register


1 select a0 select b0 select 0 load add1
2 select a1 select b1 select z load add1 =0

3 select a2 select b2 select z load add1


4 select a3 select b3 select z load add1
5 select a4 select b4 select z load add1
6 select a5 select b5 select z load add1
7 select a6 select b6 select z load add1
8 select a7 select b7 select z load add1

ELEC3017 - RTL 17 ELEC3017 - RTL 18

Step 8: Design Reset Mechanism VHDL Description


library ieee; architecture RTL of cross_product is
• Most systems require a reset mechanism use ieee.std_logic_1164.all; signal i : unsigned(2 downto 0);
signal ai, bi : sig8;
• This may be synchronous or asynchronous – this will be in the use ieee.numeric_std.all;
package cross_product_types is
signal product, add_in, sum, accumulator : sig16;
begin
specification subtype sig8 is signed (7 downto 0);
subtype sig16 is signed (15 downto 0); control: process (ck)
• Most RTL designs are reset by resetting the controller type sig8_vector is array (natural range <>) of
begin
if ck'event and ck = '1' then
sig8;
• The controller then resets the data path (as in this example) end;
if reset = '1' then
i <= "000";
else
• In this example, the reset simply puts the counter in its start state library ieee; i <= i + 1;

(count = 0) use ieee.std_logic_1164.all;


end if;
end if;
use ieee.numeric_std.all; end process;
use work.cross_product_types.all;
entity cross_product is a_mux: ai <= a(i);
port (a, b : in sig8_vector(7 downto 0); b_mux: bi <= b(i);
z_mux: add_in <= X"0000" when i = 0 else accumulator;
ck, reset: in std_logic;
multiply: product <= ai * bi;
z : out sig16); add: sum <= product + add_in;
end;
process (ck)
clear begin
if ck'event and ck = '1' then
accumulator <= sum;
end if;
reset end process;

output: z <= accumulator;


end;

ELEC3017 - RTL 19 ELEC3017 - RTL 20


More Complex Schedules Single Assignment Form
• e.g. Zwolinski p197 (simplified)
input_sum := input;
input_sum := input; input_sum := input_sum + delay(0) * ceoffb(0);
for j in 0 to order-1 loop output_sum := input_sum * coeffa(1);
input_sum := input_sum + delay(j) * ceoffb(j); output_sum := output_sum + delay(0) * coeffa(0);
end loop; output_sum := output_sum + delay(1) * coeffa(1);
output_sum := input_sum * coeffa(order); output := output_sum;
for k in 0 to order loop
output_sum := output_sum + delay(k) * coeffa(k);
end loop; p0 := delay(0) * ceoffb(0);
output := output_sum; s0 := input + p0;
s1 := s0 * coeffa(1);
• for order = 1 p1 := delay(0) * coeffa(0);
s2 := s1 + p1;
input_sum := input; p2 := delay(1) * coeffa(1);
input_sum := input_sum + delay(0) * ceoffb(0); output := s2 + p2;
output_sum := input_sum * coeffa(1); • 3 adds
output_sum := output_sum + delay(0) * coeffa(0);
output_sum := output_sum + delay(1) * coeffa(1); • 4 multiplies
output := output_sum;

ELEC3017 - RTL 21 ELEC3017 - RTL 22

Data Dependency Graph ASAP Schedule

delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)


delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)

Cycle

1
* * * * * *
p0 • 1 adder
p1 p2

+
2 + • 3 multipliers
s0
3
* *
s1
4 +
+
s2
5 +
+
output
output

ELEC3017 - RTL 23 ELEC3017 - RTL 24


ALAP Schedule Resource Constrained Schedule
delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)

delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)


Cycle
Cycle
1
* 1
*
2 + • 1 adder • 1 adder
• 2 multipliers 2 + * • 1 multiplier
3
* * 3
*
4 + * 4 + *
5 + 5 +
output output

ELEC3017 - RTL 25 ELEC3017 - RTL 26

Allocation Allocate Registers


delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1) delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)

Cycle Cycle

1
* * mpy2 1
* *
R1 R2
mpy1 • 1 adder
2 + * • 2 multipliers 2 + *
R3 R4 R5

3
* 3
*
R6 R7 R8
4 + add1
4 +
R9 R10
5 +
5 +
output R11
output

ELEC3017 - RTL 27 ELEC3017 - RTL 28


Share Registers Data Path
delay(0) coeffb(0) input coeffa(1) coeffa(0) delay(1)

Cycle
delay(0) R3 coeffb(0) coeffa(1) delay(0) coeffa(0) coeffa(1) delay(1) R1 R6 R9 input R7 R10

1
* * mux1 mux2 mux3 mux4 mux5 mux6
R1 R2
• 5 Registers
2 + * mpy1 mpy2 add1
R3 R4 R5

R3/R9/
3
* R1/R6 R2/R5
R11
R6 R7 R8
R4/R8
4 +
R9 R10
R7/R10

5 +
R11
output

ELEC3017 - RTL 29 ELEC3017 - RTL 30

Controller State Machine

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5


mux1 = 0 mux5 = 0 mux1 = 1 mux5 = 1 mux5 = 2
mux2 = 0 mux6 = 0 mux2 = 1 mux6 = 1 mux6 = 2
mux3 = 0 mux3 = 1 enable R6 enable R9 enable R11
mux4 = 0 mux4 = 1 enable R7 enable R10
enable R1 enable R3 enable R8
enable R2 enable R4
enable R5

ELEC3017 - RTL 31

You might also like