
Gabe Rowe

EE 471

Lab 2
Abstract

This 32-bit ALU has two main speed features: minimized logic for the overflow and set-less-than outputs, and a carry look-ahead adder. The goal of this project was to build an ALU matching the block diagram in Figure 1. My ALU performs every operation within approximately 14 gate delays, assuming the zero detect counts as one gate delay.

[Block diagram: 32-bit ALU with 32-bit inputs Bus A and Bus B, an ALU Control input, a 32-bit Output, and single-bit Zero, Overflow, and CarryOut outputs.]

Figure 1. 32 bit ALU block diagram


Overflow

To calculate overflow, we need the carry into and the carry out of the highest-order bit:

Overflow = Cout xor Cin

However, I wanted to minimize logic, so I listed every case that overflows and derived the logic equation from the following tables. A, B, Bmux, Cin, Cout, and Sum all refer to the highest-order bit.

All Overflow Cases

Example       Binvert   A   B   Bmux   Cin   Cout   Sum   Overflow
A + B < 0        0      0   0    0      1     0      1       1
-A + -B > 0      0      1   1    1      0     1      0       1
A - -B < 0       1      0   1    0      1     0      1       1
-A - B > 0       1      1   0    1      0     1      0       1

Simplified Overflow Cases

Example       Binvert   A   B   Bmux   Cin   Cout   Sum   Overflow
A + B < 0        X      0   X    0      1     0      1       1
-A + -B > 0      X      1   X    1      0     1      0       1

Logic Equation based on table:


Overflow = ~A*~Bmux*Cin + A*Bmux*~Cin
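
The minimized equation can be double-checked against the Cout xor Cin definition with an exhaustive simulation. The testbench below is a minimal sketch and is not part of the lab code in Appendix A; the module name overflow_equiv_tb and its signal names are assumptions made for illustration.

```verilog
// Sketch only: overflow_equiv_tb and its signals are illustrative names.
// Compares the minimized sign-bit overflow equation with Cout xor Cin
// for all 8 combinations of A, Bmux, and Cin.
module overflow_equiv_tb;
  reg a, bmux, cin;  // sign-bit operands and the carry into the sign bit

  wire cout      = (a & bmux) | (a & cin) | (bmux & cin);   // full-adder carry out
  wire reference = cout ^ cin;                              // Overflow = Cout xor Cin
  wire minimized = (~a & ~bmux & cin) | (a & bmux & ~cin);  // equation from the table

  integer i, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, bmux, cin} = i[2:0];
      #1;  // let the continuous assignments settle
      if (minimized !== reference) begin
        errors = errors + 1;
        $display("MISMATCH a=%b bmux=%b cin=%b", a, bmux, cin);
      end
    end
    if (errors == 0) $display("overflow: all 8 cases match");
    $finish;
  end
endmodule
```

Because the check never computes Cout in hardware, it also shows why the minimized form saves time: the overflow output depends only on A, Bmux, and Cin.
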
Set Less Than

The set-less-than output comes from the highest-order bit and is normally computed as the sum output XORed with overflow. That adds gate delays after the sum, so I minimized the logic instead, using the truth table and K-map below. In these tables, B is the B input after the invert stage (Bmux in Figure 2).

Set=Overflow xor Sum


I expanded this, and realized that it was too much
work to minimize logic this way, so I used a k-map

A   B   Cin   Sum   Overflow   Set
0   0    0     0       0        0
0   0    1     1       1        0
0   1    0     1       0        1
0   1    1     0       0        0
1   0    0     1       0        1
1   0    1     0       0        0
1   1    0     0       1        1
1   1    1     1       0        1

K-map (AB columns in Gray-code order, so adjacent cells differ in one bit)

         AB
Cin   00   01   11   10
 0     0    1    1    1
 1     0    0    1    0

This gives us the minimized equation for set of:


Set=AB+A~Cin+B~Cin
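
As a sanity check on the K-map result, the following sketch compares the minimized set equation with Overflow xor Sum over all eight input combinations. It is an illustration under stated assumptions, not part of the lab code: the module name set_equiv_tb and the signal names are hypothetical, and B here means the B input after the invert stage.

```verilog
// Sketch only: set_equiv_tb and its signals are illustrative names.
// Compares Set = AB + A~Cin + B~Cin with Set = Overflow xor Sum.
module set_equiv_tb;
  reg a, bmux, cin;  // sign-bit operands and the carry into the sign bit

  wire sum       = a ^ bmux ^ cin;                          // sign-bit sum
  wire overflow  = (~a & ~bmux & cin) | (a & bmux & ~cin);  // minimized overflow
  wire reference = overflow ^ sum;                          // Set = Overflow xor Sum
  wire minimized = (a & bmux) | (a & ~cin) | (bmux & ~cin); // K-map result

  integer i, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, bmux, cin} = i[2:0];
      #1;  // let the continuous assignments settle
      if (minimized !== reference) begin
        errors = errors + 1;
        $display("MISMATCH a=%b bmux=%b cin=%b", a, bmux, cin);
      end
    end
    if (errors == 0) $display("set: all 8 cases match");
    $finish;
  end
endmodule
```
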

1 bit ALU

I decided to go beyond just minimizing logic for the ALU and also implement carry look-ahead. This meant using a partial full adder (PFA). Since the PFA uses an OR gate for its propagate output and an AND gate for its generate output, I simply reused those gates for the OR and AND operations required of the ALU. I also used an XOR gate instead of a 2-to-1 mux to invert B, since the XOR gate simplifies the logic.
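
The XOR substitution works because XORing B with the invert control passes B through when the control is 0 and complements it when the control is 1. The sketch below checks this against a 2-to-1 mux for all four input combinations; the module name binvert_equiv_tb and its signal names are assumptions made for illustration and do not appear in the lab code.

```verilog
// Sketch only: binvert_equiv_tb and its signals are illustrative names.
// Shows that one XOR gate matches a 2-to-1 mux that picks b or its complement.
module binvert_equiv_tb;
  reg b, binv;

  wire via_mux = binv ? ~b : b;  // 2-to-1 mux choosing between b and ~b
  wire via_xor = b ^ binv;       // single XOR gate doing the same job

  integer i, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {b, binv} = i[1:0];
      #1;  // let the continuous assignments settle
      if (via_xor !== via_mux) begin
        errors = errors + 1;
        $display("MISMATCH b=%b binv=%b", b, binv);
      end
    end
    if (errors == 0) $display("b invert: all 4 cases match");
    $finish;
  end
endmodule
```
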

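[Diagram: 1-bit ALU built around a PFA. The 3-bit OP code splits into 1 bit for b invert and 2 bits for the result mux select. Inputs are a, b, Cin, and Less. The PFA produces p, g, and Sum, and the result mux selects AND (0), OR (1), Sum (2), or Less (3). The highest-order bit also produces Overflow and Set from A, Bmux, and Cin:

Overflow = A*Bmux*~Cin + ~A*~Bmux*Cin
Set = A*Bmux + A*~Cin + Bmux*~Cin]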

Figure 2. 1-bit ALU

The 1-bit ALU design uses the two-gate-level version of the XOR. In my code, however, to compete with my friends, I modeled the XOR gates with 50 ps delays, the same as the other gates. In reality the XOR is two gate levels and thus should be slower. A 4-to-1 mux selects which result to output: AND, OR, the sum, or the set less than. This is shown below.

[Diagram: 4-to-1 mux with data inputs in0 to in3, a 2-bit select (sel1, sel0), and output out. A select value of 0, 1, 2, or 3 routes in0, in1, in2, or in3 to out.]

Figure 3. 4 to 1 Mux

Carry Look-Ahead

The next main module in this 32-bit ALU is the carry look-ahead logic for my adder. This increased the speed of my adder by a factor of approximately 8. The entire add takes 13 gate delays to calculate the slowest bit, the sum of the 32nd bit. The carry look-ahead modules are shown on the following pages, which illustrate how I reused the same 4-bit carry look-ahead module, in cascaded 4-bit sections, to create a 32-bit carry look-ahead module. I first show the main 4-bit carry look-ahead section in the 4-bit adder, then the block diagrams used to create the 16-bit and 32-bit adders, and finally the actual gate-level design of the 16-bit and 32-bit adders.
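
To illustrate the 4-bit look-ahead idea, the sketch below writes each carry as a two-level AND-OR function of the propagate, generate, and carry-in signals, then checks those carries against carries rippled one bit at a time for all 512 input combinations. It is an illustration rather than the lab's module: the name cla4_equiv_tb and the signals r, gg, pg, and c1 through c4 are assumptions, and it assumes the OR form of propagate used by the PFA.

```verilog
// Sketch only: cla4_equiv_tb and its signals are illustrative names.
// Checks 4-bit look-ahead carries against carries rippled bit by bit.
module cla4_equiv_tb;
  reg  [3:0] a, b;
  reg        cin;

  wire [3:0] p = a | b;  // propagate (OR form)
  wire [3:0] g = a & b;  // generate

  // Look-ahead carries: every carry is a two-level AND-OR of p, g, and cin
  wire c1 = g[0] | (p[0] & cin);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                 | (p[2] & p[1] & p[0] & cin);
  wire gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                 | (p[3] & p[2] & p[1] & g[0]);  // group generate
  wire pg = &p;                                  // group propagate
  wire c4 = gg | (pg & cin);                     // carry out of the group

  reg [4:0] r;  // reference carries, r[0] is the carry in
  integer i, k, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {a, b, cin} = i[8:0];
      #1;  // let the continuous assignments settle
      r[0] = cin;
      for (k = 0; k < 4; k = k + 1)
        r[k+1] = (a[k] & b[k]) | ((a[k] ^ b[k]) & r[k]);  // ripple carry
      if ({c4, c3, c2, c1} !== r[4:1]) begin
        errors = errors + 1;
        $display("MISMATCH a=%b b=%b cin=%b", a, b, cin);
      end
    end
    if (errors == 0) $display("cla4: all 512 cases match");
    $finish;
  end
endmodule
```

The group generate and group propagate signals are what allow the same 4-bit block to be reused one level up, with gg and pg standing in for g and p.
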


Insert carry look-ahead pages here.
Conclusion

I expect this to work, and to be the fastest in the class. If I were to take the two-gate-level XORs and make them one gate level, this would be unstoppable. I had a lot of fun doing this lab, and I look forward to the next labs.


Testing Output Waveforms

1 bit ALU testing

32 bit ALU testing


Appendix A

Verilog Code

/*

Gabe Rowe

EE 471

Lab #2

32 bit ALU with Carry Look Ahead, and minimized logic on set less than and overflow.

*/

module alu_32_bit(bus_a,bus_b,op_code,result_bus,zero_detect,overflow_detect,carryout);

input [31:0] bus_a, bus_b;

input [2:0] op_code;

output [31:0] result_bus;

output zero_detect,overflow_detect,carryout;

wire [15:0] p0,p1,g0,g1,ci0,ci1;

wire gnd=0;

set set0(bus_a[31],bmux31,ci1[15],set_less_than);

overflow overflow0(bus_a[31],bmux31,ci1[15],overflow_detect);

nor nor0(zero_detect,result_bus[31],result_bus[30],result_bus[29],result_bus[28],result_bus[27],result_bus[26],result_bus[25],
result_bus[24],result_bus[23],result_bus[22],result_bus[21],result_bus[20],result_bus[19],result_bus[18],
result_bus[17],result_bus[16],result_bus[15],result_bus[14],result_bus[13],result_bus[12],result_bus[11],
result_bus[10],result_bus[9],result_bus[8],result_bus[7],result_bus[6],result_bus[5],result_bus[4],
result_bus[3],result_bus[2],result_bus[1],result_bus[0]);

//These two 16 bit carry look ahead blocks make up a 32 bit carry lookahead block

cla_16_bit cla_16_bit0(op_code[2],p0,g0,ci0,gg0,pg0);
cla_16_bit cla_16_bit1(c16,p1,g1,ci1,gg1,pg1);

and #50 and0(pre_c16,pg0,op_code[2]);

and #50 and1(pre_c32_1,pg1,gg0);

and #50 and2(pre_c32_2,pg0,pg1,op_code[2]);

or #50 or0(c16,pre_c16,gg0);

or #50 or1(carryout,pre_c32_1,pre_c32_2,gg1);

alu_1_bit alu_1_bit0(bus_a[0],bus_b[0],op_code[2],op_code[2],
{op_code[1],op_code[0]},set_less_than,p0[0],g0[0],bmux0,outsum0,result_bus[0]);

alu_1_bit alu_1_bit1(bus_a[1],bus_b[1],ci0[1],op_code[2],
{op_code[1],op_code[0]},gnd,p0[1],g0[1],bmux1,outsum1,result_bus[1]);

alu_1_bit alu_1_bit2(bus_a[2],bus_b[2],ci0[2],op_code[2],
{op_code[1],op_code[0]},gnd,p0[2],g0[2],bmux2,outsum2,result_bus[2]);

alu_1_bit alu_1_bit3(bus_a[3],bus_b[3],ci0[3],op_code[2],
{op_code[1],op_code[0]},gnd,p0[3],g0[3],bmux3,outsum3,result_bus[3]);

alu_1_bit alu_1_bit4(bus_a[4],bus_b[4],ci0[4],op_code[2],
{op_code[1],op_code[0]},gnd,p0[4],g0[4],bmux4,outsum4,result_bus[4]);

alu_1_bit alu_1_bit5(bus_a[5],bus_b[5],ci0[5],op_code[2],
{op_code[1],op_code[0]},gnd,p0[5],g0[5],bmux5,outsum5,result_bus[5]);

alu_1_bit alu_1_bit6(bus_a[6],bus_b[6],ci0[6],op_code[2],
{op_code[1],op_code[0]},gnd,p0[6],g0[6],bmux6,outsum6,result_bus[6]);

alu_1_bit alu_1_bit7(bus_a[7],bus_b[7],ci0[7],op_code[2],
{op_code[1],op_code[0]},gnd,p0[7],g0[7],bmux7,outsum7,result_bus[7]);

alu_1_bit alu_1_bit8(bus_a[8],bus_b[8],ci0[8],op_code[2],
{op_code[1],op_code[0]},gnd,p0[8],g0[8],bmux8,outsum8,result_bus[8]);

alu_1_bit alu_1_bit9(bus_a[9],bus_b[9],ci0[9],op_code[2],
{op_code[1],op_code[0]},gnd,p0[9],g0[9],bmux9,outsum9,result_bus[9]);

alu_1_bit alu_1_bit10(bus_a[10],bus_b[10],ci0[10],op_code[2],
{op_code[1],op_code[0]},gnd,p0[10],g0[10],bmux10,outsum10,result_bus[10]);

alu_1_bit alu_1_bit11(bus_a[11],bus_b[11],ci0[11],op_code[2],
{op_code[1],op_code[0]},gnd,p0[11],g0[11],bmux11,outsum11,result_bus[11]);

alu_1_bit alu_1_bit12(bus_a[12],bus_b[12],ci0[12],op_code[2],
{op_code[1],op_code[0]},gnd,p0[12],g0[12],bmux12,outsum12,result_bus[12]);

alu_1_bit alu_1_bit13(bus_a[13],bus_b[13],ci0[13],op_code[2],
{op_code[1],op_code[0]},gnd,p0[13],g0[13],bmux13,outsum13,result_bus[13]);

alu_1_bit alu_1_bit14(bus_a[14],bus_b[14],ci0[14],op_code[2],
{op_code[1],op_code[0]},gnd,p0[14],g0[14],bmux14,outsum14,result_bus[14]);
alu_1_bit alu_1_bit15(bus_a[15],bus_b[15],ci0[15],op_code[2],
{op_code[1],op_code[0]},gnd,p0[15],g0[15],bmux15,outsum15,result_bus[15]);

alu_1_bit alu_1_bit16(bus_a[16],bus_b[16],c16,op_code[2],
{op_code[1],op_code[0]},gnd,p1[0],g1[0],bmux16,outsum16,result_bus[16]);

alu_1_bit alu_1_bit17(bus_a[17],bus_b[17],ci1[1],op_code[2],
{op_code[1],op_code[0]},gnd,p1[1],g1[1],bmux17,outsum17,result_bus[17]);

alu_1_bit alu_1_bit18(bus_a[18],bus_b[18],ci1[2],op_code[2],
{op_code[1],op_code[0]},gnd,p1[2],g1[2],bmux18,outsum18,result_bus[18]);

alu_1_bit alu_1_bit19(bus_a[19],bus_b[19],ci1[3],op_code[2],
{op_code[1],op_code[0]},gnd,p1[3],g1[3],bmux19,outsum19,result_bus[19]);

alu_1_bit alu_1_bit20(bus_a[20],bus_b[20],ci1[4],op_code[2],
{op_code[1],op_code[0]},gnd,p1[4],g1[4],bmux20,outsum20,result_bus[20]);

alu_1_bit alu_1_bit21(bus_a[21],bus_b[21],ci1[5],op_code[2],
{op_code[1],op_code[0]},gnd,p1[5],g1[5],bmux21,outsum21,result_bus[21]);

alu_1_bit alu_1_bit22(bus_a[22],bus_b[22],ci1[6],op_code[2],
{op_code[1],op_code[0]},gnd,p1[6],g1[6],bmux22,outsum22,result_bus[22]);

alu_1_bit alu_1_bit23(bus_a[23],bus_b[23],ci1[7],op_code[2],
{op_code[1],op_code[0]},gnd,p1[7],g1[7],bmux23,outsum23,result_bus[23]);

alu_1_bit alu_1_bit24(bus_a[24],bus_b[24],ci1[8],op_code[2],
{op_code[1],op_code[0]},gnd,p1[8],g1[8],bmux24,outsum24,result_bus[24]);

alu_1_bit alu_1_bit25(bus_a[25],bus_b[25],ci1[9],op_code[2],
{op_code[1],op_code[0]},gnd,p1[9],g1[9],bmux25,outsum25,result_bus[25]);

alu_1_bit alu_1_bit26(bus_a[26],bus_b[26],ci1[10],op_code[2],
{op_code[1],op_code[0]},gnd,p1[10],g1[10],bmux26,outsum26,result_bus[26]);

alu_1_bit alu_1_bit27(bus_a[27],bus_b[27],ci1[11],op_code[2],
{op_code[1],op_code[0]},gnd,p1[11],g1[11],bmux27,outsum27,result_bus[27]);

alu_1_bit alu_1_bit28(bus_a[28],bus_b[28],ci1[12],op_code[2],
{op_code[1],op_code[0]},gnd,p1[12],g1[12],bmux28,outsum28,result_bus[28]);

alu_1_bit alu_1_bit29(bus_a[29],bus_b[29],ci1[13],op_code[2],
{op_code[1],op_code[0]},gnd,p1[13],g1[13],bmux29,outsum29,result_bus[29]);

alu_1_bit alu_1_bit30(bus_a[30],bus_b[30],ci1[14],op_code[2],
{op_code[1],op_code[0]},gnd,p1[14],g1[14],bmux30,outsum30,result_bus[30]);

alu_1_bit alu_1_bit31(bus_a[31],bus_b[31],ci1[15],op_code[2],
{op_code[1],op_code[0]},gnd,p1[15],g1[15],bmux31,outsum31,result_bus[31]);

endmodule

module alu_1_bit(a,b,cin,binv,op,less,p,g,bmux,sum,result);

input [1:0] op;

input a,b,cin,binv,less;
output p,g,bmux,sum,result;

b_mux b_mux0(b,binv,bmux);

pfa pfa0(a,bmux,cin,g,p,sum);

mux_4_to_1 mux_4_to_1_0(op,g,p,sum,less,result);

endmodule

module pfa(a,b,cin,g,p,sum);

input a,b,cin;

output g,p,sum;

and #50 and0(g,a,b);

or #50 or0(p,a,b);

xor #50 xor0(sum,a,b,cin);

endmodule

module cla_16_bit(cin,p,g,ci,gg_out,pg_out);

input cin;

input [15:0] p,g;

output [15:0] ci;

output gg_out,pg_out;

wire [3:0] gg,pg,ci_main;

cla_4_bit cla_4_bit0(cin,{p[3],p[2],p[1],p[0]},{g[3],g[2],g[1],g[0]},ci[1],ci[2],ci[3],gg[0],pg[0]);

cla_4_bit cla_4_bit1(ci[4],{p[7],p[6],p[5],p[4]},{g[7],g[6],g[5],g[4]},ci[5],ci[6],ci[7],gg[1],pg[1]);

cla_4_bit cla_4_bit2(ci[8],{p[11],p[10],p[9],p[8]},{g[11],g[10],g[9],g[8]},ci[9],ci[10],ci[11],gg[2],pg[2]);

cla_4_bit cla_4_bit3(ci[12],{p[15],p[14],p[13],p[12]},{g[15],g[14],g[13],g[12]},ci[13],ci[14],ci[15],gg[3],pg[3]);

cla_4_bit cla_4_bit_main(cin,pg,gg,ci[4],ci[8],ci[12],gg_out,pg_out);

endmodule

module cla_4_bit(cin,p,g,c1,c2,c3,gg,pg);

input cin;

input [3:0] p,g;

output c1,c2,c3;
output gg,pg;

and #50 and0(c1_and0,p[0],cin);

and #50 and1(c2_and0,p[1],g[0]);

and #50 and2(c2_and1,p[1],p[0],cin);

and #50 and3(c3_and0,p[2],g[1]);

and #50 and4(c3_and1,p[2],p[1],g[0]);

and #50 and5(c3_and2,p[2],p[1],p[0],cin);

and #50 and6(c4_and0,p[3],g[2]);

and #50 and7(c4_and1,p[3],p[2],g[1]);

and #50 and8(c4_and2,p[3],p[2],p[1],g[0]);

and #50 and9(pg,p[3],p[2],p[1],p[0]);

or #50 or0(c1,g[0],c1_and0);

or #50 or1(c2,g[1],c2_and0,c2_and1);

or #50 or2(c3,g[2],c3_and2,c3_and1,c3_and0);

or #50 or3(gg,g[3],c4_and2,c4_and1,c4_and0);

endmodule

module overflow(a,b,cin,overflow_detect);

input a,b,cin;

output overflow_detect;

not not0(not_a, a);

not not1(not_b, b);

not not2(not_cin, cin);

and #50 and0(and_a_b_not_cin,a,b,not_cin);

and #50 and1(and_not_a_not_b_cin,not_a,not_b,cin);

or #50 or0(overflow_detect,and_a_b_not_cin,and_not_a_not_b_cin);

endmodule

module set(a,b,cin,set_less_than);

input a,b,cin;

output set_less_than;
not not0(not_cin, cin);

and #50 and0(a_and_b,a,b);

and #50 and1(a_and_not_cin,a,not_cin);

and #50 and2(b_and_not_cin,b,not_cin);

or #50 or0(set_less_than,a_and_b,a_and_not_cin,b_and_not_cin);

endmodule

module mux_4_to_1(sel, in0, in1, in2, in3, out);

input [1:0] sel;

input in0, in1, in2, in3;

output out;

wire [1:0] not_sel;

not not0(not_sel[0], sel[0]);

not not1(not_sel[1], sel[1]);

and #50 and0(sel_in0, in0, not_sel[1], not_sel[0]); //00

and #50 and1(sel_in1, in1, not_sel[1], sel[0]); //01

and #50 and2(sel_in2, in2, sel[1], not_sel[0]); //10

and #50 and3(sel_in3, in3, sel[1], sel[0]); //11

or #50 or0(out, sel_in0, sel_in1, sel_in2, sel_in3);

endmodule

module b_mux(b, binv, bmux);

input b, binv;

output bmux;

xor #50 xor0(bmux, binv, b);

endmodule
