
Datapath Optimizations in Oasys

A general rule in Oasys is that we merge datapaths into one partition only when they can have
carry-save (CS) edges between them. These datapaths are generally non-lossy additions, or lossy
additions truncated to the same precision. Multipliers feeding adders are also marked CS if they
follow the non-lossy/lossy rule mentioned above.

Lossy to same precision example:


input [3:0] a,b,c;
output [3:0] out;
wire [3:0] temp = a+b;
assign out = temp+c;
=> out = a+b+c (a, b, c in one partition)

Here “out” is a lossy adder, but “out” and “temp” both have size 4, and “temp” is also a
lossy addition of size 4. Hence we merge all the inputs into one partition.
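
The merge is safe because truncating the intermediate sum to 4 bits does not change the final 4-bit result. A minimal Python sketch of that check follows; the function names are illustrative and not part of Oasys.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1  # 4-bit truncation, as in the example above


def two_step(a, b, c):
    """out = temp + c, where temp = a + b is truncated to 4 bits."""
    temp = (a + b) & MASK
    return (temp + c) & MASK


def merged(a, b, c):
    """out = a + b + c computed in one partition and truncated once."""
    return (a + b + c) & MASK


def all_match():
    """Compare both forms over every combination of 4-bit inputs."""
    return all(two_step(a, b, c) == merged(a, b, c)
               for a, b, c in product(range(1 << WIDTH), repeat=3))
```

If “temp” were truncated to fewer bits than “out”, the two forms would no longer agree in general, which is why the same-precision condition matters.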

=> This will have to be taken care of when we plan to introduce hierarchy/partition for
datapaths.

Optimizations:
CSE opt:
RTL:
assign out1 = a+b+c
assign out2 = a+b+d

Synthesis:
assign temp = a+b
assign out1 = temp+c
assign out2 = temp+d

Signal “temp” now has 2 fanouts, so the synthesis netlist will not have any carry-save
additions. Right after synthesis there will be 3 datapath partitions, one for each line.

If out1 or out2 becomes timing critical, we will undo the CSE and try to infer carry-save
additions.

Optimized netlist if, say, out1 is critical:

assign out1 = a+b+c // implemented using carry-save additions, in one partition
=> This is already done by rtlc-FP. Other, more complex variations still have to be checked.
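
As a reminder of what a carry-save addition buys for a+b+c, here is a sketch of a textbook 3:2 compressor followed by a single carry-propagate add. This is the generic technique, not the Oasys implementation, and the function names are illustrative.

```python
from itertools import product


def carry_save(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    s = a ^ b ^ c                               # per-bit sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, moved up one position
    return s, carry


def add3(a, b, c, width=4):
    """a + b + c using one carry-save stage and one carry-propagate add."""
    s, carry = carry_save(a, b, c)
    return (s + carry) & ((1 << width) - 1)


def check(width=4):
    """Compare add3 with plain addition over all inputs of the given width."""
    mask = (1 << width) - 1
    return all(add3(a, b, c, width) == ((a + b + c) & mask)
               for a, b, c in product(range(1 << width), repeat=3))
```

Only the final add propagates carries, which is why undoing the CSE can help a critical output even though it duplicates the a+b logic.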
==============================================================================

Conditional Pruning opt:


Flavour 1: Constant Propagation to eliminate addition
RTL:
assign out = (sel == 4'b1010) ? (sel + 1'b1) : (a+b)

Netlist:
assign out = (sel == 4'b1010) ? (4'b1011) : (a+b)

=> Done by rtlc-FP

Flavour 2: Range propagation to reduce size of adder


RTL:
input [2:0] a;
input [15:0] sel;
wire [15:0] temp = sel+a;
assign out = (sel < 2'b11) ? temp : a;

Netlist:
wire [3:0] temp = sel+a; // sel is at most 2'b10 whenever temp is selected, so sel+a <= 9
assign out = (sel < 2'b11) ? temp : a;
The “temp” adder is reduced from size 16 to size 4.

=> Done by rtlc-precision but not by rtlc-FP.
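
The width of 4 follows from the range of the operands on the branch where “temp” is selected. A small Python sketch of that range argument follows; the function names and default parameters are illustrative, taken from the widths in the example.

```python
def max_sum_when_active(sel_limit=3, a_width=3):
    """Largest sel + a on the branch where the mux selects temp (sel < sel_limit)."""
    return (sel_limit - 1) + ((1 << a_width) - 1)


def required_bits():
    """Number of bits needed to hold the largest active sum without loss."""
    return max_sum_when_active().bit_length()


def narrowed_adder_matches():
    """Compare the 16-bit adder with a 4-bit adder on every active input."""
    for sel in range(3):      # sel < 2'b11
        for a in range(8):    # 3-bit a
            wide = (sel + a) & 0xFFFF
            narrow = ((sel & 0xF) + a) & 0xF
            if wide != narrow:
                return False
    return True
```

Inputs with sel >= 3 are not checked because the mux discards “temp” on that branch.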


In rtlc-precision this optimization is done at Nomlevel (nomSweepAfterFlatten).
Done here: BreakCombinationalLoops
Next here: After_MuxDataPathOptForModgens

==============================================================================
Collapse adder Vector writes opt:
RTL:
module test(d,idx,q);
input [7:0] d; input [3:0] idx; output reg [31:0] q;
wire [4:0] t; assign t = idx+1;
always @(*) begin
q = 0;
{q[t+7], q[t+6], q[t+5], q[t+4], q[t+3], q[t+2], q[t+1], q[t]} = d;
end
endmodule

Netlist:
input [7:0] d; input [3:0] idx; output reg [31:0] q; wire [4:0] t;
assign t = idx+1;
always @(*) begin
q = 0;
q[t +: 8] = d;
end
=> Not performed in rtlc-FP or rtlc-precision.
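
The eight single-bit writes and the indexed part-select write the same bits. A Python sketch that models q as an integer and compares the two forms follows; the function names are illustrative.

```python
def write_bitwise(d, idx):
    """RTL form: eight separate single-bit writes, q[t+i] = d[i]."""
    t = idx + 1
    q = 0
    for i in range(8):
        bit = (d >> i) & 1
        q |= bit << (t + i)
    return q


def write_part_select(d, idx):
    """Collapsed form: q[t +: 8] = d."""
    t = idx + 1
    return (d & 0xFF) << t


def forms_agree():
    """Compare both forms for every 8-bit d and 4-bit idx."""
    return all(write_bitwise(d, idx) == write_part_select(d, idx)
               for d in range(256) for idx in range(16))
```

With a 4-bit idx the highest bit written is q[23], so the 32-bit q never overflows in this testcase.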

Add Shift Opt:


RTL:
assign out = c << (a+1)

Netlist:
assign out = {c,1'b0} << a

iff (a+1) is non-lossy

=> Not performed in rtlc-FP or rtlc-precision.
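
The non-lossy condition matters because a wrapped (a+1) changes the shift amount. The Python sketch below compares both forms with and without wrap-around; the widths and function names are assumptions chosen for illustration.

```python
def shift_by_sum(c, a, a_width, lossy):
    """out = c << (a+1); when lossy, a+1 wraps to a_width bits."""
    amount = a + 1
    if lossy:
        amount &= (1 << a_width) - 1
    return c << amount


def shift_rewritten(c, a):
    """out = {c,1'b0} << a."""
    return (c << 1) << a


def equivalent(lossy, a_width=2, c_width=4):
    """Compare both forms over all a and c of the given widths."""
    return all(shift_by_sum(c, a, a_width, lossy) == shift_rewritten(c, a)
               for a in range(1 << a_width) for c in range(1 << c_width))
```

With a 2-bit lossy sum, a = 3 wraps to a shift amount of 0, while the rewritten form still shifts by 4 in total.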

=============================================================
Cancellations opt:
RTL:
assign out = (a+b+c) == (a+d+e)

Netlist:
assign out = (b+c) == (d+e)

=> Disabled in rtlc-FP; enabled in rtlc-precision.
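
The common term can be dropped when both sides are evaluated at the same width, since addition modulo 2^width is invertible. A Python sketch of an exhaustive check under that same-width assumption follows; the notes do not state the widths, so the width used here is illustrative.

```python
from itertools import product


def cancellation_holds(width=3):
    """Check (a+b+c) == (a+d+e) against (b+c) == (d+e), all truncated to `width` bits."""
    mask = (1 << width) - 1
    for a, b, c, d, e in product(range(1 << width), repeat=5):
        before = ((a + b + c) & mask) == ((a + d + e) & mask)
        after = ((b + c) & mask) == ((d + e) & mask)
        if before != after:
            return False
    return True
```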

=================================================================
Carry-Save Additions opt:
RTL: Simple additions
input [3:0] a,b,c;
output [3:0] out;
wire [3:0] temp;
assign temp= a+b+4;
assign out = temp+c+5;

Netlist:
assign out = a+b+c+9; // 4+5 = 9: separate constant terms are folded into one equivalent term

We create one datapath partition, which is implemented as a ripple adder after synthesis
and as a carry-save addition after timing optimization.

=> Not performed in rtlc-precision or rtlc-FP.
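
Folding the two constants is valid here because “temp” and “out” are both 4 bits wide. A Python sketch of the check follows; the function names are illustrative.

```python
from itertools import product

MASK = 0xF  # temp and out are both 4 bits wide


def rtl(a, b, c):
    """temp = a+b+4; out = temp+c+5, each truncated to 4 bits."""
    temp = (a + b + 4) & MASK
    return (temp + c + 5) & MASK


def netlist(a, b, c):
    """out = a+b+c+9 with the constants folded, truncated once."""
    return (a + b + c + 9) & MASK


def folded_matches():
    """Compare both forms over all 4-bit inputs."""
    return all(rtl(a, b, c) == netlist(a, b, c)
               for a, b, c in product(range(16), repeat=3))
```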


-----------------------------------------------------------------------------------------------------------

carry save marking across wire nodes


RTL:
input signed [3:0] c;
input signed [2:0] b;
input signed [7:0] a;
output signed [7:0] out;
wire signed [6:0] mult; wire signed [7:0] temp;
assign mult = b*c;
assign temp = {mult[6],mult[6:0]};
assign out = a + temp;
Netlist:
assign out = $signed(b)*$signed(c) + $signed(a); // the multiplier and the adder are merged into one partition

=> Datapath partition problem.

For this testcase you will see just one datapath partition, with inputs a, b and c.
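
The wire node only sign-extends a non-lossy product, so merging across it does not change the result. A Python sketch that models the two's complement arithmetic and compares the RTL with the merged form follows; the helper names are illustrative.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def rtl_out(a, b, c):
    """mult = b*c (7 bits), temp = {mult[6], mult[6:0]}, out = a + temp (8 bits)."""
    mult = to_signed(b * c, 7)
    sign = (mult >> 6) & 1                    # mult[6]
    temp_bits = (sign << 7) | (mult & 0x7F)   # {mult[6], mult[6:0]}
    temp = to_signed(temp_bits, 8)
    return to_signed(a + temp, 8)


def merged_out(a, b, c):
    """out = b*c + a evaluated in one partition and truncated to 8 bits."""
    return to_signed(b * c + a, 8)


def forms_agree():
    """Compare over all signed 8-bit a, 3-bit b and 4-bit c."""
    return all(rtl_out(a, b, c) == merged_out(a, b, c)
               for a in range(-128, 128)
               for b in range(-4, 4)
               for c in range(-8, 8))
```

The product of a signed 3-bit and a signed 4-bit value lies between -28 and 32, so the 7-bit “mult” is non-lossy.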

------------------------------------------------------------------------------------------------------------------------
RTL:
input [4:0] a,b,c;
output [6:0] out;
assign out = {(a+b),3'b000}+c

Netlist:
assign out = {a,3'b000}+{b,3'b000}+c

But if (a+b) already has 2 fanouts, we won't do this optimization in the synthesize step,
because it would blow up area. However, if (a+b) is de-cloned during timing optimization and
has one fanout, we will do this CS optimization in the timing flow.

=> Does not seem to happen in Oasys for one case. Does not happen in RTLC either.
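
Distributing the concatenation over (a+b) is safe for these widths even though (a+b) is only 5 bits inside the concatenation. The Python sketch below checks this, assuming standard Verilog sizing where a concatenation operand is self-determined; the out_width parameter is an illustrative addition to show where the rewrite stops holding.

```python
from itertools import product


def rtl_out(a, b, c, out_width=7):
    """out = {(a+b), 3'b000} + c, with (a+b) truncated to 5 bits inside the concatenation."""
    s = (a + b) & 0x1F
    return ((s << 3) + c) & ((1 << out_width) - 1)


def netlist_out(a, b, c, out_width=7):
    """out = {a, 3'b000} + {b, 3'b000} + c."""
    return ((a << 3) + (b << 3) + c) & ((1 << out_width) - 1)


def equivalent(out_width=7):
    """Compare both forms over all 5-bit a, b and c."""
    return all(rtl_out(a, b, c, out_width) == netlist_out(a, b, c, out_width)
               for a, b, c in product(range(32), repeat=3))
```

With these widths the check passes for an output of up to 8 bits and fails at 9 bits, where the carry out of the 5-bit (a+b) becomes visible.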
------------------------------------------------------------------------------------------------------------------------

carry save marking across Inequalities


RTL:
assign out = ((d1+d2) > (d3+d4)) ? a : b

Netlist:
assign temp = (d1+d2) - (d3+d4); // non-lossy signed difference
assign out = (temp > 0) ? a : b

=> Not done in rtlc-FP or rtlc-precision. We always create a LESS_THAN macro.
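
One way to read this rewrite is that the comparison is decided by the sign of a non-lossy difference of the two sums. The Python sketch below assumes unsigned operands of equal width, which the notes do not state, and uses illustrative function names.

```python
from itertools import product


def compare_direct(d1, d2, d3, d4):
    """Reference: (d1+d2) > (d3+d4) with non-lossy sums."""
    return (d1 + d2) > (d3 + d4)


def compare_by_sign(d1, d2, d3, d4, width=3):
    """Greater-than read from the sign bit of (d3+d4) - (d1+d2).

    The difference is held in width+2 bits of two's complement, which is
    enough to keep it non-lossy for unsigned `width`-bit operands.
    """
    bits = width + 2
    diff = ((d3 + d4) - (d1 + d2)) & ((1 << bits) - 1)
    return bool((diff >> (bits - 1)) & 1)


def forms_agree(width=3):
    """Compare both forms over all unsigned inputs of the given width."""
    return all(compare_direct(*d) == compare_by_sign(*d, width=width)
               for d in product(range(1 << width), repeat=4))
```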


=====================================================================================

Resource Sharing opt:


RTL:
assign out = sel ? (a+b) : (a+c)

Netlist:
assign out = a + (sel ? b : c)

=> Not done in rtlc-FP. Performed in rtlc-precision.
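
Resource sharing trades two adders and a wide mux for one adder and a mux on the operand. A Python sketch comparing the two forms follows; the 4-bit width and function names are assumptions for illustration.

```python
from itertools import product


def two_adders(sel, a, b, c, width=4):
    """out = sel ? (a+b) : (a+c), with one adder per branch."""
    mask = (1 << width) - 1
    return ((a + b) & mask) if sel else ((a + c) & mask)


def shared_adder(sel, a, b, c, width=4):
    """out = a + (sel ? b : c), with a single adder after the mux."""
    mask = (1 << width) - 1
    operand = b if sel else c
    return (a + operand) & mask


def forms_agree(width=4):
    """Compare both forms over sel and all inputs of the given width."""
    return all(two_adders(sel, a, b, c, width) == shared_adder(sel, a, b, c, width)
               for sel in (0, 1)
               for a, b, c in product(range(1 << width), repeat=3))
```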
