Gabe Rowe: EE 471 Lab 2
EE 471
Lab 2
Abstract
This 32 bit ALU has two notable features: minimized logic for speed, and carry
look-ahead for even more speed. The goal of this project was to build an ALU that matches
the block diagram below. My ALU can perform all operations within approximately 14 gate
delays.
[Figure: 32 bit ALU block diagram. Inputs: Bus A (32 bits), Bus B (32 bits), ALU Control.
Outputs: Output (32 bits), Zero, Overflow, CarryOut.]
To calculate the overflow, we need to know the carry into and the carry out of the
highest order bit. However, I wanted to minimize logic, so I tabulated the cases in which
overflow can occur and derived the logic equation from the following table.
A+B < 0 0 0 0 0 1 0 1 1
-A+-B > 0 0 1 1 1 0 1 0 1
A - -B < 0 1 0 1 0 1 0 1 1
-A - B > 0 1 1 0 1 0 1 0 1
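The wire names in the overflow module of the Verilog listing (and_a_b_not_cin and and_not_a_not_b_cin) suggest the minimized equation Overflow = A·B·~Cin + ~A·~B·Cin, where B is the operand after the b invert stage and Cin is the carry into the highest order bit. The following is a minimal behavioral sketch under that assumption, not the gate-level module from the lab; the module names overflow_sketch and overflow_sketch_tb are hypothetical. The testbench compares the equation against the usual carry-in XOR carry-out definition for all eight input combinations.

```verilog
// Hypothetical behavioral sketch, not the lab's gate-level overflow module.
// a, b: highest order operand bits (b taken after the b invert stage).
// cin:  carry into the highest order bit.
module overflow_sketch(a,b,cin,overflow_detect);
input a,b,cin;
output overflow_detect;
assign overflow_detect = (a & b & ~cin) | (~a & ~b & cin);
endmodule

// Self-checking testbench: compares the sketch with carry-in XOR carry-out.
module overflow_sketch_tb;
reg a,b,cin;
reg cout;
wire ovf;
integer i, errors;
overflow_sketch dut(a,b,cin,ovf);
initial begin
  errors = 0;
  for (i = 0; i < 8; i = i + 1) begin
    {a,b,cin} = i;                           // low three bits of i
    #1;
    cout = (a & b) | (a & cin) | (b & cin);  // carry out of the top bit
    if (ovf !== (cin ^ cout)) errors = errors + 1;
  end
  if (errors == 0) $display("overflow sketch matches cin XOR cout");
  else $display("overflow sketch mismatches: %0d", errors);
end
endmodule
```

Because the equation depends only on A, B, and the carry in, the overflow flag does not have to wait for the carry out of the top bit.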
To calculate the set less than output from the highest order bit, we would typically XOR
the sum output with the overflow. However, this costs extra gate delays, and the logic can
be minimized. So I tabulated the cases and derived the logic equation from the following
tables and a K-map.
K-map (entries are the Set output)
Cin \ AB   00   01   10   11
0          0    1    1    1
1          0    0    0    1
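Reading the K-map above gives Set = A·B + A·~Cin + B·~Cin, where B is the operand after the b invert stage. The following is a minimal behavioral sketch, assuming that equation; the module names set_sketch and set_sketch_tb are hypothetical and the code is not the gate-level module from the lab. The testbench checks that the minimized form equals the sum bit XORed with the overflow for every input combination.

```verilog
// Hypothetical behavioral sketch of the minimized set less than logic.
// a, b: highest order operand bits (b taken after the b invert stage).
// cin:  carry into the highest order bit.
module set_sketch(a,b,cin,set_less_than);
input a,b,cin;
output set_less_than;
assign set_less_than = (a & b) | (a & ~cin) | (b & ~cin);
endmodule

// Self-checking testbench: compares the sketch with sum XOR overflow.
module set_sketch_tb;
reg a,b,cin;
reg sum,cout,ovf;
wire slt;
integer i, errors;
set_sketch dut(a,b,cin,slt);
initial begin
  errors = 0;
  for (i = 0; i < 8; i = i + 1) begin
    {a,b,cin} = i;                           // low three bits of i
    #1;
    sum  = a ^ b ^ cin;                      // sum of the top bit
    cout = (a & b) | (a & cin) | (b & cin);  // carry out of the top bit
    ovf  = cin ^ cout;                       // textbook overflow
    if (slt !== (sum ^ ovf)) errors = errors + 1;
  end
  if (errors == 0) $display("set sketch matches sum XOR overflow");
  else $display("set sketch mismatches: %0d", errors);
end
endmodule
```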
1 bit ALU
I decided to go beyond just minimizing logic for the ALU and implement carry look-ahead.
This meant that I would need to employ a partial full adder (PFA). Since the PFA uses an
OR gate for the propagate output and an AND gate for the generate output, I simply
re-used those gates for the AND and OR operations required of the ALU. I also decided to
use an XOR gate instead of a 2 to 1 mux for the b invert stage, since the XOR gate
simplified the logic.
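The sharing described above can be sketched behaviorally for a single bit. This is a minimal sketch under assumptions, not the lab's gate-level alu_1_bit: the module name alu_slice_sketch is hypothetical, the continuous assignments stand in for the individual gates, and the operation encoding (0 = AND, 1 = OR, 2 = sum, 3 = less) is taken from the order of the mux inputs in the Verilog listing.

```verilog
// Hypothetical behavioral sketch of one ALU bit slice.
module alu_slice_sketch(a,b,cin,binv,op,less,p,g,result);
input a,b,cin,binv,less;
input [1:0] op;
output p,g,result;
wire bmux = b ^ binv;        // XOR acts as a conditional inverter for b
wire sum  = a ^ bmux ^ cin;  // sum bit of the partial full adder
assign p = a | bmux;         // propagate, doubles as the OR result
assign g = a & bmux;         // generate, doubles as the AND result
assign result = (op == 2'd0) ? g   :
                (op == 2'd1) ? p   :
                (op == 2'd2) ? sum : less;
endmodule
```

One consequence of this sharing is that the AND and OR results are computed on the b value after the invert stage, so they only match plain AND and OR of a and b while binv is 0.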
[Figure: 1 bit ALU. Inputs: a, b, b invert, Cin, Less, OP Code. A 4 to 1 mux selects AND (0),
OR (1), Sum (2), or Less (3) as Result. Other outputs: p, g, Set.]
Set = A·Bmux + A·~Cin + Bmux·~Cin
The 1 bit ALU diagram uses the two gate level version of the XOR. In my code, however, to
compete with my friends, I used XOR gates with 50 ps delays, the same as the other gates.
In reality, the XOR is two gate levels, and thus should be slower. A 4 to 1 mux is used to
select the output we want to look at, whether it is the AND, the OR, the sum, or the set
less than. This is shown below.
[Figure: 4 to 1 mux with select inputs sel1 and sel0, data inputs in0 through in3, and
output out.]
Figure 3. 4 to 1 Mux
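One way to build the 4 to 1 mux in Figure 3 is a two level AND-OR structure. The following is a minimal gate-level sketch under that assumption; the internal structure of the lab's own mux is not shown in the text, so the module name mux_4_to_1_sketch, the internal wire names, and the absence of gate delays are all illustrative choices.

```verilog
// Hypothetical two level AND-OR sketch of a 4 to 1 mux.
module mux_4_to_1_sketch(sel,in0,in1,in2,in3,out);
input [1:0] sel;
input in0,in1,in2,in3;
output out;
wire n0,n1,t0,t1,t2,t3;
not (n0,sel[0]);
not (n1,sel[1]);
and (t0,in0,n1,n0);          // selected when sel == 0
and (t1,in1,n1,sel[0]);      // selected when sel == 1
and (t2,in2,sel[1],n0);      // selected when sel == 2
and (t3,in3,sel[1],sel[0]);  // selected when sel == 3
or  (out,t0,t1,t2,t3);
endmodule

// Self-checking testbench: each select value must pass its own input.
module mux_4_to_1_sketch_tb;
reg [1:0] sel;
reg [3:0] in;
wire out;
integer i, errors;
mux_4_to_1_sketch dut(sel,in[0],in[1],in[2],in[3],out);
initial begin
  errors = 0;
  for (i = 0; i < 64; i = i + 1) begin
    {sel,in} = i;            // low six bits of i
    #1;
    if (out !== in[sel]) errors = errors + 1;
  end
  if (errors == 0) $display("mux sketch passes all 64 cases");
  else $display("mux sketch mismatches: %0d", errors);
end
endmodule
```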
Carry Look-Ahead
The next main module I used in this 32 bit ALU was carry look-ahead for my adder. This
increased the speed of my adder by a factor of approximately 8. The entire add process
takes 13 gate delays to calculate the slowest bit, the sum of the 32nd bit. The carry
look-ahead modules are shown on the following pages, which illustrate how I used the same
4-bit carry look-ahead module to create a 32-bit carry look-ahead module from cascaded
4-bit sections. I first show the main 4-bit carry look-ahead section in the 4-bit adder,
then the block diagrams used to create the 16-bit and 32-bit adders. Then finally,
I expect this to work, and to be the fastest in the class. If I were to replace the two
gate level XORs with one gate level XORs, this would be unstoppable. I had a lot of fun
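The 4-bit carry look-ahead equations can be sketched as follows. The OR gates that survive in the cla_4_bit listing take one, two, three, and three AND terms for c1, c2, c3, and gg, which matches the standard look-ahead expansion; this minimal behavioral sketch assumes that expansion. The module names cla_4_bit_sketch and cla_4_bit_sketch_tb are hypothetical, and the continuous assignments stand in for the individual gates.

```verilog
// Hypothetical behavioral sketch of a 4-bit carry look-ahead block.
// p, g: per-bit propagate (a | b) and generate (a & b) signals.
module cla_4_bit_sketch(cin,p,g,c1,c2,c3,gg,pg);
input cin;
input [3:0] p,g;
output c1,c2,c3,gg,pg;
assign c1 = g[0] | (p[0] & cin);
assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                 | (p[2] & p[1] & p[0] & cin);
assign gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                 | (p[3] & p[2] & p[1] & g[0]);   // group generate
assign pg = p[3] & p[2] & p[1] & p[0];            // group propagate
endmodule

// Self-checking testbench: builds a 4-bit adder around the sketch and
// compares it with the + operator for all 512 input combinations.
module cla_4_bit_sketch_tb;
reg [3:0] a,b;
reg cin;
reg [4:0] total;
wire [3:0] p = a | b;
wire [3:0] g = a & b;
wire c1,c2,c3,gg,pg;
wire c4 = gg | (pg & cin);
wire [3:0] sum = a ^ b ^ {c3,c2,c1,cin};
integer i, errors;
cla_4_bit_sketch dut(cin,p,g,c1,c2,c3,gg,pg);
initial begin
  errors = 0;
  for (i = 0; i < 512; i = i + 1) begin
    {a,b,cin} = i;           // low nine bits of i
    #1;
    total = a + b + cin;
    if ({c4,sum} !== total) errors = errors + 1;
  end
  if (errors == 0) $display("cla sketch matches a + b + cin");
  else $display("cla sketch mismatches: %0d", errors);
end
endmodule
```

The same block can be reused one level up by feeding it the group propagate and group generate outputs of four lower blocks, which is how the cla_16_bit module in the listing instantiates cla_4_bit_main.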
Verilog Code
/*
Gabe Rowe
EE 471
Lab #2
32 bit ALU with Carry Look Ahead, and minimized logic on set less than and overflow.
*/
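module alu_32_bit(bus_a,bus_b,op_code,result_bus,zero_detect,overflow_detect,carryout);
input [31:0] bus_a,bus_b;
input [2:0] op_code;
output [31:0] result_bus;
output zero_detect,overflow_detect,carryout;
//Propagate, generate and carry vectors for the lower (0) and upper (1) 16 bit halves
wire [15:0] p0,g0,p1,g1;
wire [15:1] ci0,ci1;
wire gnd=0;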
set set0(bus_a[31],bmux31,ci1[15],set_less_than);
overflow overflow0(bus_a[31],bmux31,ci1[15],overflow_detect);
nor nor0(zero_detect,
result_bus[31],result_bus[30],result_bus[29],result_bus[28],result_bus[27],result_bus[26],result_bus[25],result_bus[24],
result_bus[23],result_bus[22],result_bus[21],result_bus[20],result_bus[19],result_bus[18],result_bus[17],result_bus[16],
result_bus[15],result_bus[14],result_bus[13],result_bus[12],result_bus[11],result_bus[10],result_bus[9],result_bus[8],
result_bus[7],result_bus[6],result_bus[5],result_bus[4],result_bus[3],result_bus[2],result_bus[1],result_bus[0]);
//These two 16 bit carry look ahead blocks make up a 32 bit carry lookahead block
cla_16_bit cla_16_bit0(op_code[2],p0,g0,ci0,gg0,pg0);
cla_16_bit cla_16_bit1(c16,p1,g1,ci1,gg1,pg1);
or #50 or0(c16,pre_c16,gg0);
or #50 or1(carryout,pre_c32_1,pre_c32_2,gg1);
alu_1_bit alu_1_bit0(bus_a[0],bus_b[0],op_code[2],op_code[2],
{op_code[1],op_code[0]},set_less_than,p0[0],g0[0],bmux0,outsum0,result_bus[0]);
alu_1_bit alu_1_bit1(bus_a[1],bus_b[1],ci0[1],op_code[2],
{op_code[1],op_code[0]},gnd,p0[1],g0[1],bmux1,outsum1,result_bus[1]);
alu_1_bit alu_1_bit2(bus_a[2],bus_b[2],ci0[2],op_code[2],
{op_code[1],op_code[0]},gnd,p0[2],g0[2],bmux2,outsum2,result_bus[2]);
alu_1_bit alu_1_bit3(bus_a[3],bus_b[3],ci0[3],op_code[2],
{op_code[1],op_code[0]},gnd,p0[3],g0[3],bmux3,outsum3,result_bus[3]);
alu_1_bit alu_1_bit4(bus_a[4],bus_b[4],ci0[4],op_code[2],
{op_code[1],op_code[0]},gnd,p0[4],g0[4],bmux4,outsum4,result_bus[4]);
alu_1_bit alu_1_bit5(bus_a[5],bus_b[5],ci0[5],op_code[2],
{op_code[1],op_code[0]},gnd,p0[5],g0[5],bmux5,outsum5,result_bus[5]);
alu_1_bit alu_1_bit6(bus_a[6],bus_b[6],ci0[6],op_code[2],
{op_code[1],op_code[0]},gnd,p0[6],g0[6],bmux6,outsum6,result_bus[6]);
alu_1_bit alu_1_bit7(bus_a[7],bus_b[7],ci0[7],op_code[2],
{op_code[1],op_code[0]},gnd,p0[7],g0[7],bmux7,outsum7,result_bus[7]);
alu_1_bit alu_1_bit8(bus_a[8],bus_b[8],ci0[8],op_code[2],
{op_code[1],op_code[0]},gnd,p0[8],g0[8],bmux8,outsum8,result_bus[8]);
alu_1_bit alu_1_bit9(bus_a[9],bus_b[9],ci0[9],op_code[2],
{op_code[1],op_code[0]},gnd,p0[9],g0[9],bmux9,outsum9,result_bus[9]);
alu_1_bit alu_1_bit10(bus_a[10],bus_b[10],ci0[10],op_code[2],
{op_code[1],op_code[0]},gnd,p0[10],g0[10],bmux10,outsum10,result_bus[10]);
alu_1_bit alu_1_bit11(bus_a[11],bus_b[11],ci0[11],op_code[2],
{op_code[1],op_code[0]},gnd,p0[11],g0[11],bmux11,outsum11,result_bus[11]);
alu_1_bit alu_1_bit12(bus_a[12],bus_b[12],ci0[12],op_code[2],
{op_code[1],op_code[0]},gnd,p0[12],g0[12],bmux12,outsum12,result_bus[12]);
alu_1_bit alu_1_bit13(bus_a[13],bus_b[13],ci0[13],op_code[2],
{op_code[1],op_code[0]},gnd,p0[13],g0[13],bmux13,outsum13,result_bus[13]);
alu_1_bit alu_1_bit14(bus_a[14],bus_b[14],ci0[14],op_code[2],
{op_code[1],op_code[0]},gnd,p0[14],g0[14],bmux14,outsum14,result_bus[14]);
alu_1_bit alu_1_bit15(bus_a[15],bus_b[15],ci0[15],op_code[2],
{op_code[1],op_code[0]},gnd,p0[15],g0[15],bmux15,outsum15,result_bus[15]);
alu_1_bit alu_1_bit16(bus_a[16],bus_b[16],c16,op_code[2],
{op_code[1],op_code[0]},gnd,p1[0],g1[0],bmux16,outsum16,result_bus[16]);
alu_1_bit alu_1_bit17(bus_a[17],bus_b[17],ci1[1],op_code[2],
{op_code[1],op_code[0]},gnd,p1[1],g1[1],bmux17,outsum17,result_bus[17]);
alu_1_bit alu_1_bit18(bus_a[18],bus_b[18],ci1[2],op_code[2],
{op_code[1],op_code[0]},gnd,p1[2],g1[2],bmux18,outsum18,result_bus[18]);
alu_1_bit alu_1_bit19(bus_a[19],bus_b[19],ci1[3],op_code[2],
{op_code[1],op_code[0]},gnd,p1[3],g1[3],bmux19,outsum19,result_bus[19]);
alu_1_bit alu_1_bit20(bus_a[20],bus_b[20],ci1[4],op_code[2],
{op_code[1],op_code[0]},gnd,p1[4],g1[4],bmux20,outsum20,result_bus[20]);
alu_1_bit alu_1_bit21(bus_a[21],bus_b[21],ci1[5],op_code[2],
{op_code[1],op_code[0]},gnd,p1[5],g1[5],bmux21,outsum21,result_bus[21]);
alu_1_bit alu_1_bit22(bus_a[22],bus_b[22],ci1[6],op_code[2],
{op_code[1],op_code[0]},gnd,p1[6],g1[6],bmux22,outsum22,result_bus[22]);
alu_1_bit alu_1_bit23(bus_a[23],bus_b[23],ci1[7],op_code[2],
{op_code[1],op_code[0]},gnd,p1[7],g1[7],bmux23,outsum23,result_bus[23]);
alu_1_bit alu_1_bit24(bus_a[24],bus_b[24],ci1[8],op_code[2],
{op_code[1],op_code[0]},gnd,p1[8],g1[8],bmux24,outsum24,result_bus[24]);
alu_1_bit alu_1_bit25(bus_a[25],bus_b[25],ci1[9],op_code[2],
{op_code[1],op_code[0]},gnd,p1[9],g1[9],bmux25,outsum25,result_bus[25]);
alu_1_bit alu_1_bit26(bus_a[26],bus_b[26],ci1[10],op_code[2],
{op_code[1],op_code[0]},gnd,p1[10],g1[10],bmux26,outsum26,result_bus[26]);
alu_1_bit alu_1_bit27(bus_a[27],bus_b[27],ci1[11],op_code[2],
{op_code[1],op_code[0]},gnd,p1[11],g1[11],bmux27,outsum27,result_bus[27]);
alu_1_bit alu_1_bit28(bus_a[28],bus_b[28],ci1[12],op_code[2],
{op_code[1],op_code[0]},gnd,p1[12],g1[12],bmux28,outsum28,result_bus[28]);
alu_1_bit alu_1_bit29(bus_a[29],bus_b[29],ci1[13],op_code[2],
{op_code[1],op_code[0]},gnd,p1[13],g1[13],bmux29,outsum29,result_bus[29]);
alu_1_bit alu_1_bit30(bus_a[30],bus_b[30],ci1[14],op_code[2],
{op_code[1],op_code[0]},gnd,p1[14],g1[14],bmux30,outsum30,result_bus[30]);
alu_1_bit alu_1_bit31(bus_a[31],bus_b[31],ci1[15],op_code[2],
{op_code[1],op_code[0]},gnd,p1[15],g1[15],bmux31,outsum31,result_bus[31]);
endmodule
module alu_1_bit(a,b,cin,binv,op,less,p,g,bmux,sum,result);
input a,b,cin,binv,less;
input [1:0] op;
output p,g,bmux,sum,result;
b_mux b_mux0(b,binv,bmux);
pfa pfa0(a,bmux,cin,g,p,sum);
mux_4_to_1 mux_4_to_1_0(op,g,p,sum,less,result);
endmodule
module pfa(a,b,cin,g,p,sum);
input a,b,cin;
output g,p,sum;
or #50 or0(p,a,b);
endmodule
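module cla_16_bit(cin,p,g,ci,gg_out,pg_out);
input cin;
input [15:0] p,g;
output [15:1] ci;
output gg_out,pg_out;
//Group generate and group propagate from the four 4 bit blocks
wire [3:0] gg,pg;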
cla_4_bit cla_4_bit0(cin,{p[3],p[2],p[1],p[0]},{g[3],g[2],g[1],g[0]},ci[1],ci[2],ci[3],gg[0],pg[0]);
cla_4_bit cla_4_bit1(ci[4],{p[7],p[6],p[5],p[4]},{g[7],g[6],g[5],g[4]},ci[5],ci[6],ci[7],gg[1],pg[1]);
cla_4_bit cla_4_bit2(ci[8],{p[11],p[10],p[9],p[8]},{g[11],g[10],g[9],g[8]},ci[9],ci[10],ci[11],gg[2],pg[2]);
cla_4_bit cla_4_bit3(ci[12],{p[15],p[14],p[13],p[12]},{g[15],g[14],g[13],g[12]},ci[13],ci[14],ci[15],gg[3],pg[3]);
cla_4_bit cla_4_bit_main(cin,pg,gg,ci[4],ci[8],ci[12],gg_out,pg_out);
endmodule
module cla_4_bit(cin,p,g,c1,c2,c3,gg,pg);
input cin;
input [3:0] p,g;
output c1,c2,c3;
output gg,pg;
or #50 or0(c1,g[0],c1_and0);
or #50 or1(c2,g[1],c2_and0,c2_and1);
or #50 or2(c3,g[2],c3_and2,c3_and1,c3_and0);
or #50 or3(gg,g[3],c4_and2,c4_and1,c4_and0);
endmodule
module overflow(a,b,cin,overflow_detect);
input a,b,cin;
output overflow_detect;
or #50 or0(overflow_detect,and_a_b_not_cin,and_not_a_not_b_cin);
endmodule
module set(a,b,cin,set_less_than);
input a,b,cin;
output set_less_than;
not not0(not_cin, cin);
or #50 or0(set_less_than,a_and_b,a_and_not_cin,b_and_not_cin);
endmodule
module mux_4_to_1(sel,in0,in1,in2,in3,out);
input [1:0] sel;
input in0,in1,in2,in3;
output out;
endmodule
module b_mux(b,binv,bmux);
input b, binv;
output bmux;
endmodule