2011 - 12

ABSTRACT

Dept of VLSI and Embedded Systems

Table of Contents

INTRODUCTION
    Digital Pipelining
Partitioning of a Design
    Partition of Data Width
    Partition of Functionality
Pipelined Serial Adder Design
Parallel Signed Adder Design
RTL View
Simulation Results of Parallel Signed Adder
REFERENCES

INTRODUCTION
Digital Pipelining

Fig. 1.1 (a) Traditional approach. (b) Pipelined approach

Time(ns)  Input    Reg. 1      Reg. 2      ……   Reg. 10
0         Data1
10        Data2    Proc.1_1
20        Data3    Proc.1_2    Proc.2_1
……        ……       ……          ……          ……
100       Data11   Proc.1_10   Proc.2_9    ……   Proc.10_1
110       Data12   Proc.1_11   Proc.2_10   ……   Proc.10_2
……        ……       ……          ……          ……   ……
190       Data20   Proc.1_19   Proc.2_18   ……   Proc.10_10

Latency: 100ns.

Fig. 1.2 Processing order of pipelining

The effect of pipelining may be summarized as follows:
• Throughput increases considerably
• Latency comes into effect
• Chip area increases marginally
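
The gain suggested by the processing order can be put into numbers with a short worked calculation. It assumes, as the time column and the 100 ns latency suggest, a pipeline of 10 registered stages clocked every 10 ns. It further assumes (this is not stated in the original) that the traditional, non-pipelined circuit needs the full 100 ns for each data item before it can accept the next one.

```latex
\begin{align*}
T_{\text{latency}}         &= 10 \text{ stages} \times 10\,\text{ns} = 100\,\text{ns} \\
T_{\text{pipelined}}(N)    &= T_{\text{latency}} + (N - 1) \times 10\,\text{ns} \\
T_{\text{pipelined}}(20)   &= 100\,\text{ns} + 19 \times 10\,\text{ns} = 290\,\text{ns} \\
T_{\text{traditional}}(20) &= 20 \times 100\,\text{ns} = 2000\,\text{ns}
\end{align*}
```

Under these assumptions, the first result still takes 100 ns to appear, but every later result follows 10 ns after the previous one, which is where the throughput gain comes from.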

Partitioning of a Design
In order to incorporate pipelining in the design, we need to break a sequence of operations or a complex algorithm into convenient small steps in terms of the following:
• Partition of data width
• Partition of functionality

The following sub-sections discuss the methodology of partitioning.

Partition of Data Width

Partition of Functionality
Functionality is any process such as addition, subtraction, multiplication, or division. We need to group similar functions, such as multiplications, together. Also, a functional block is divided into smaller sub-blocks where this is feasible. In this type of partitioning, each sub-block generally performs a different function. This can be understood clearly by considering an example: computing the sum of products a1*b1 + a2*b2 + a3*b3 + a4*b4, where a1, b1, etc., are each 16 bits wide. We can group the multiplications a1*b1, a2*b2, a3*b3, and a4*b4 together, perform all of these computations simultaneously, and register the partial products.

Similarly, in the subsequent pipeline stage, we can perform the additions A = (a1*b1 + a2*b2) and B = (a3*b3 + a4*b4) concurrently. In the next pipeline stage, the final addition, result = A + B, which is the desired sum of products, is performed. It may be noted that products such as a1*b1 can themselves be broken down into smaller sub-blocks, namely, shift operations and additions. In the signed adder example cited earlier, the LSBs (7 bits) of the eight numbers are added concurrently, followed by the addition of the MSBs (5 bits, along with the carry from the LSB addition) in subsequent pipeline stages.
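
The three-stage sum of products described above might be sketched in Verilog as follows. This is only an illustrative sketch, not code from the original design: the module name sop_pipe and all signal names are hypothetical, and the operands are assumed to be signed twos complement values, which the text does not state.

```verilog
// Hypothetical sketch of the functionality partition described above.
// Module and signal names are illustrative; signed operands are an assumption.
module sop_pipe (
    input  wire               clk,
    input  wire signed [15:0] a1, b1, a2, b2, a3, b3, a4, b4,
    output reg  signed [33:0] result      // 16x16 products summed four ways
);
    reg signed [31:0] p1, p2, p3, p4;     // Stage 1: registered products
    reg signed [32:0] A, B;               // Stage 2: registered partial sums

    always @(posedge clk) begin
        // Stage 1: the four multiplications are grouped and done concurrently.
        p1 <= a1 * b1;
        p2 <= a2 * b2;
        p3 <= a3 * b3;
        p4 <= a4 * b4;

        // Stage 2: the two additions are done concurrently on the
        // products registered at the previous clock edge.
        A <= p1 + p2;
        B <= p3 + p4;

        // Stage 3: the final addition gives the sum of products.
        result <= A + B;
    end
endmodule
```

Because every stage is registered, a new set of operands can be applied on each clock edge, and the corresponding result appears three clock edges later.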

Pipelined Serial Adder Design
The code for the addition of eight 12-bit, twos complement numbers is shown in Verilog Code_1.1. The inputs are fed serially at the pins marked “n”. The design module is declared as “serial_adder12s”, listing all the inputs and outputs. The inputs are the system clock (clk), enable, and n. The outputs are sum, sum_valid, and result. The signal sum_valid goes high when the accumulated sum is valid. The “result” is the same as the “sum”, except that it is held at the “result” output until it is overwritten by a new result. A 3-bit counter, cnt [2:0], keeps track of the number of inputs accumulated.

The first assign statement computes the “sum” in advance (sum_next [14:0]) if “enable” is high; otherwise, it is cleared. Note that the input is sign extended by 3 bits, since the result is 3 bits wider than the input numbers. Also, note carefully the number of curly braces used; otherwise, the compiler will report an error. The counter, cnt, is pre-advanced if enabled. The sum is valid after the eighth number has been input. An advance valid signal, sum_val, is switched on only when “cnt” equals 7.

The first “always” block registers the advance sum computed earlier at the positive edge of the clock. It also increments “cnt” every time an input is accumulated and sets “sum_valid” high once all eight input numbers have been accumulated. The last “always” block copies “sum” into “result” if “sum_valid” is active; otherwise, the result is not disturbed.

Verilog Code_1.1

// Place the design in a file named "serial_adder12s.v".
module serial_adder12s (
    clk,
    enable,
    n,
    sum,
    sum_valid,
    result
);
input         clk ;
input         enable ;
input  [11:0] n ;
output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;    // Prolong the result till it is overwritten by a new result.

// Declare nets in the design.
wire [14:0] sum_next ;
wire [2:0]  cnt_next ;
wire        sum_val ;
reg  [14:0] sum ;
reg  [2:0]  cnt ;
reg         sum_valid ;
reg  [14:0] result ;

assign sum_next [14:0] = enable ? ({{3{n[11]}}, n[11:0]} + sum[14:0]) : 0 ;  // Sign extend & accumulate.
assign cnt_next [2:0]  = enable ? (cnt + 1) : 0 ;   // Pre-advance the counter.
assign sum_val         = (cnt == 7) ? 1 : 0 ;       // Pre-determine the validity of the sum.

always @ (posedge clk)    // Pipeline: register the sum.
begin
    sum [14:0] <= sum_next [14:0] ;   // Register the sum.
    cnt [2:0]  <= cnt_next [2:0] ;    // Advance the count.
    sum_valid  <= sum_val ;           // Register the signal.
end

always @ (posedge clk)    // Prolong the result till it is overwritten by the new result.
    result[14:0] <= sum_valid ? sum[14:0] : result[14:0] ;   // Register the sum.

endmodule

Parallel Signed Adder Design
In the serial adder design, we added eight numbers, n0 to n7, and obtained a sum that is 3 bits wider than the input. The last (most significant) bit is the sign bit. The design was pipelined and partitioned for data width as well as functionality. The same is true of the parallel adder design. The block diagram for this design is shown in Figure 1.2. We have three stages of addition and five pipeline registers in this design. Before we consider the design, let us see how to evaluate the twos complement quickly. It can be done in just two steps, as follows.

Fig. 1.2 Parallel signed adder design

Twos Complement Evaluation (Shortcut)
Let us say that we have 8-bit data, 11110000, whose twos complement is required. We may have to sign extend the number by 1 bit, i.e., duplicate the MSB, if we wish to add another number to it: when we add two numbers, the result is 1 bit wider than each operand, so the sign bit of each number must be extended by one. In the first step (after the sign extension), we scan the number from the LSB until we encounter the first “1” and retain all the bits from the LSB up to and including that “1”. In this example, we retain 10000. In the second and final step, we invert all the remaining bits (1111) to get the desired result, 000010000. Once you get used to this, you will be able to compute the twos complement in one shot.

Step   [8]......[0]
       111110000      Sign extended data.
1          10000      Retain first 1 from LSB, followed by 0s.
2      000010000      Invert other bits.

• Sign can be extended by any number of bits without affecting the actual value.
• Sign extend means duplicate the MSB ([8] <= [7]).
• Without the sign extension, bit [7] of a large positive value such as +254 (11111110) would be mistaken for the sign bit, and the value would be read as a negative number.
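
The two-step shortcut can be checked with a small simulation-only sketch. The module twos_comp_demo and the function twos_comp are hypothetical names introduced purely for illustration; they are not part of the adder designs in this report.

```verilog
// Hypothetical demo (not part of the original design): twos complement
// of a 9-bit, sign-extended value using the scan-and-invert shortcut.
module twos_comp_demo;

    // Returns the twos complement of d using the two-step shortcut.
    function [8:0] twos_comp;
        input [8:0] d;
        integer i;
        reg seen_one;     // goes high once the first 1 from the LSB is passed
        begin
            seen_one = 1'b0;
            for (i = 0; i < 9; i = i + 1) begin
                // Step 1: copy bits up to and including the first 1.
                // Step 2: invert every bit above that first 1.
                twos_comp[i] = seen_one ? ~d[i] : d[i];
                if (d[i]) seen_one = 1'b1;
            end
        end
    endfunction

    reg [7:0] data;
    reg [8:0] ext;

    initial begin
        data = 8'b11110000;
        ext  = {data[7], data};              // sign extend by 1 bit: [8] <= [7]
        $display("%b", twos_comp(ext));      // shortcut method
        $display("%b", (~ext + 9'd1));       // conventional invert and add one
    end
endmodule
```

For the example value used in the text, both lines print 000010000, so the shortcut agrees with the conventional invert-and-add-one method.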

Pipelined Design of Parallel Twos Complement Adder

Fig. 1.3 Pipelined design partition of parallel adder

Verilog Code for the Parallel Signed Adder Design
Now, let us consider the Verilog code for this parallel, signed adder design. We will see how to add eight 12-bit, twos complement numbers, n0 to n7, with five pipeline stages registered at the positive edge of the clock. The result, “sum”, is 15 bits wide, in twos complement form, and the output is not registered. We first have to declare the module with an appropriate module name and declare the input clk, the input numbers n0 to n7, and the output sum. In the course of the actual arithmetic operations, we will encounter many intermediate signals. Those driven by assign statements are declared as wire, along with their widths. We also have some signals that are not used in the computation at a particular stage but are merely propagated. For example, the msbs are not added at the beginning, so they have to be registered and propagated for use later, when they are required. The msb and lsb results for the next stage are also declared as registers. This completes the “reg” and “wire” declarations.

In the first stage, we add two numbers at a time, say, n0 and n1, and we add only the lsbs of the two numbers. In parallel, we add the other pairs: n2 and n3, n4 and n5, and n6 and n7. This is the same as using four adders concurrently, and the results are computed using assign statements. We add only the lsbs and register the results when the first clock edge arrives. Since we are not adding the msbs at this stage, we need to register them separately and propagate them for use when the next clock edge arrives. Before the next clock edge arrives, we also preserve the lsb sums. We have four sum results at this stage. Before we add the msbs, the sign should be extended. Bit 11, the msb, is the sign bit. The sign bit is first copied to another signal and then

Verilog Code_1.2

// Verilog Code for Signed Adder Design
// Adds eight numbers, n0 to n7, each of size 12 bits in 2's complement.
// Has five pipeline stages registered at positive edge of clock.
// Result, sum, is in 15 bits, 2's complement form (not registered).
module adder12s (
    clk,
    n0, n1, n2, n3, n4, n5, n6, n7,
    sum
);
input         clk ;
input  [11:0] n0 ;
input  [11:0] n1 ;
input  [11:0] n2 ;
input  [11:0] n3 ;
input  [11:0] n4 ;
input  [11:0] n5 ;
input  [11:0] n6 ;
input  [11:0] n7 ;
output [14:0] sum ;

wire [7:0] s00_lsb ;
wire [7:0] s01_lsb ;
wire [7:0] s02_lsb ;
wire [7:0] s03_lsb ;
wire [5:0] s00_msb ;
wire [5:0] s01_msb ;
wire [5:0] s02_msb ;
wire [5:0] s03_msb ;
wire [7:0] s10_lsb ;
wire [7:0] s11_lsb ;
wire [6:0] s10_msb ;
wire [6:0] s11_msb ;
wire [7:0] s20_lsb ;

reg [11:7] n0_reg1 ;
reg [11:7] n1_reg1 ;
reg [11:7] n2_reg1 ;
reg [11:7] n3_reg1 ;
reg [11:7] n4_reg1 ;
reg [11:7] n5_reg1 ;
reg [11:7] n6_reg1 ;
reg [11:7] n7_reg1 ;
reg [7:0]  s00_lsbreg1 ;
reg [7:0]  s01_lsbreg1 ;
reg [7:0]  s02_lsbreg1 ;
reg [7:0]  s03_lsbreg1 ;
reg [5:0]  s00_msbreg2 ;
reg [5:0]  s01_msbreg2 ;
reg [5:0]  s02_msbreg2 ;
reg [5:0]  s03_msbreg2 ;
reg [6:0]  s00_lsbreg2 ;
reg [6:0]  s01_lsbreg2 ;
reg [6:0]  s02_lsbreg2 ;
reg [6:0]  s03_lsbreg2 ;
reg [7:0]  s10_lsbreg3 ;
reg [7:0]  s11_lsbreg3 ;
reg [5:0]  s00_msbreg3 ;
reg [5:0]  s01_msbreg3 ;
reg [5:0]  s02_msbreg3 ;
reg [5:0]  s03_msbreg3 ;
reg [6:0]  s10_lsbreg4 ;
reg [6:0]  s11_lsbreg4 ;
reg [6:0]  s10_msbreg4 ;
reg [6:0]  s11_msbreg4 ;
reg [6:0]  s10_msbreg5 ;
reg [6:0]  s11_msbreg5 ;
reg        s20_lsbreg5cy ;
reg [6:0]  s20_lsbreg5 ;

// First Stage Addition
// n0-n7 lsb need not be registered since addition is already carried out here.
assign s00_lsb[7:0] = n0[6:0]+n1[6:0] ;   // Add lsb first - s00_lsb[7] is the carry
assign s01_lsb[7:0] = n2[6:0]+n3[6:0] ;
assign s02_lsb[7:0] = n4[6:0]+n5[6:0] ;
assign s03_lsb[7:0] = n6[6:0]+n7[6:0] ;

always @ (posedge clk)   // Pipeline 1: clk (1). Register msb to continue addition of msb.
begin
    n0_reg1[11:7] <= n0[11:7] ;   // Preserve all inputs for msb addition during the clk(2).
    n1_reg1[11:7] <= n1[11:7] ;
    n2_reg1[11:7] <= n2[11:7] ;
    n3_reg1[11:7] <= n3[11:7] ;
    n4_reg1[11:7] <= n4[11:7] ;
    n5_reg1[11:7] <= n5[11:7] ;
    n6_reg1[11:7] <= n6[11:7] ;
    n7_reg1[11:7] <= n7[11:7] ;
    s00_lsbreg1[7:0] <= s00_lsb[7:0] ;   // Preserve all lsb sum. s00_lsbreg1[7] is the registered carry from lsb addition.
    s01_lsbreg1[7:0] <= s01_lsb[7:0] ;
    s02_lsbreg1[7:0] <= s02_lsb[7:0] ;
    s03_lsbreg1[7:0] <= s03_lsb[7:0] ;
end

// Sign extended & msb added with carry. s00_msb[6] is ignored.
assign s00_msb[5:0] = {n0_reg1[11], n0_reg1[11:7]} + {n1_reg1[11], n1_reg1[11:7]} + s00_lsbreg1[7] ;
assign s01_msb[5:0] = {n2_reg1[11], n2_reg1[11:7]} + {n3_reg1[11], n3_reg1[11:7]} + s01_lsbreg1[7] ;
assign s02_msb[5:0] = {n4_reg1[11], n4_reg1[11:7]} + {n5_reg1[11], n5_reg1[11:7]} + s02_lsbreg1[7] ;
assign s03_msb[5:0] = {n6_reg1[11], n6_reg1[11:7]} + {n7_reg1[11], n7_reg1[11:7]} + s03_lsbreg1[7] ;

always @ (posedge clk)   // Pipeline 2: clk (2). Register msb to continue addition of msb.
begin
    s00_msbreg2[5:0] <= s00_msb[5:0] ;       // Preserve all msb sum.
    s01_msbreg2[5:0] <= s01_msb[5:0] ;
    s02_msbreg2[5:0] <= s02_msb[5:0] ;
    s03_msbreg2[5:0] <= s03_msb[5:0] ;
    s00_lsbreg2[6:0] <= s00_lsbreg1[6:0] ;   // Preserve all lsb sum.
    s01_lsbreg2[6:0] <= s01_lsbreg1[6:0] ;
    s02_lsbreg2[6:0] <= s02_lsbreg1[6:0] ;
    s03_lsbreg2[6:0] <= s03_lsbreg1[6:0] ;
end

// Second Stage Addition
// s00, s01 lsbs need not be registered since addition is already carried out here.
assign s10_lsb[7:0] = s00_lsbreg2[6:0]+s01_lsbreg2[6:0] ;   // Add lsb first : s10_lsb[7] is the carry.
assign s11_lsb[7:0] = s02_lsbreg2[6:0]+s03_lsbreg2[6:0] ;

always @ (posedge clk)   // Pipeline 3: clk (3). Register msb to continue addition of msb.
begin
    s10_lsbreg3[7:0] <= s10_lsb[7:0] ;       // Preserve all lsb sum.
    s11_lsbreg3[7:0] <= s11_lsb[7:0] ;
    s00_msbreg3[5:0] <= s00_msbreg2[5:0] ;   // Preserve all msb sum.
    s01_msbreg3[5:0] <= s01_msbreg2[5:0] ;
    s02_msbreg3[5:0] <= s02_msbreg2[5:0] ;
    s03_msbreg3[5:0] <= s03_msbreg2[5:0] ;
end

// Add MSB of second stage with sign extension and carry in from LSB.
// s10_msb[7] is ignored.
assign s10_msb[6:0] = {s00_msbreg3[5], s00_msbreg3[5:0]} + {s01_msbreg3[5], s01_msbreg3[5:0]} + s10_lsbreg3[7] ;
assign s11_msb[6:0] = {s02_msbreg3[5], s02_msbreg3[5:0]} + {s03_msbreg3[5], s03_msbreg3[5:0]} + s11_lsbreg3[7] ;

always @ (posedge clk)   // Pipeline 4: clk (4). Register msb to continue addition of msb.
begin
    s10_lsbreg4[6:0] <= s10_lsbreg3[6:0] ;   // Preserve all lsb sum.
    s11_lsbreg4[6:0] <= s11_lsbreg3[6:0] ;
    s10_msbreg4[6:0] <= s10_msb[6:0] ;       // Preserve all msb sum.
    s11_msbreg4[6:0] <= s11_msb[6:0] ;
end

// Third Stage Addition
assign s20_lsb[7:0] = s10_lsbreg4[6:0]+s11_lsbreg4[6:0] ;   // Add lsb first : s20_lsb[7] is the carry.

always @ (posedge clk)   // Pipeline 5: clk (5). Register msb to continue addition of msb.
begin
    s10_msbreg5[6:0] <= s10_msbreg4[6:0] ;   // Preserve all msb sum.
    s11_msbreg5[6:0] <= s11_msbreg4[6:0] ;
    s20_lsbreg5cy    <= s20_lsb[7] ;         // Preserve all lsb sum.
    s20_lsbreg5[6:0] <= s20_lsb[6:0] ;
end

// Add third stage MSB results and concatenate
// with LSB result to get the final result.
assign sum[14:0] = {({s10_msbreg5[6], s10_msbreg5[6:0]} +
                     {s11_msbreg5[6], s11_msbreg5[6:0]} +
                     s20_lsbreg5cy), s20_lsbreg5[6:0]} ;

endmodule
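
The data-width partition that each addition stage of Verilog Code_1.2 relies on can be seen in isolation in a reduced, two-operand sketch. The module split_add12s and its signal names are hypothetical and introduced only for illustration. It uses a single pipeline register and leaves the final msb addition unregistered, so it mirrors the technique rather than the exact register arrangement of the eight-input design.

```verilog
// Hypothetical two-operand sketch of the 7-bit lsb / 5-bit msb split.
// Names are illustrative and not taken from the original design.
module split_add12s (
    input  wire        clk,
    input  wire [11:0] a, b,     // 12-bit twos complement operands
    output wire [12:0] sum       // 13-bit twos complement sum (not registered)
);
    // Stage 1: add only the 7 lsbs; bit 7 of the result is the carry.
    wire [7:0] lsb = a[6:0] + b[6:0];

    reg [7:0]  lsb_reg;          // registered lsb sum, with carry in bit 7
    reg [11:7] a_msb_reg;        // msbs held back for the next stage
    reg [11:7] b_msb_reg;

    always @(posedge clk) begin
        lsb_reg   <= lsb;
        a_msb_reg <= a[11:7];
        b_msb_reg <= b[11:7];
    end

    // Stage 2: sign extend each 5-bit msb field by one bit, then add the
    // two fields together with the registered carry from the lsb addition.
    wire [5:0] msb = {a_msb_reg[11], a_msb_reg} +
                     {b_msb_reg[11], b_msb_reg} +
                     lsb_reg[7];

    // Concatenate the msb and lsb parts to form the 13-bit result.
    assign sum = {msb, lsb_reg[6:0]};
endmodule
```

Splitting the addition this way keeps each adder short (7 or 6 bits instead of 12 or 13), which shortens the carry chain between registers, at the cost of one clock of latency.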

RTL View

Simulation Results of Parallel Signed Adder

CONCLUSION
Comparison of Serial and Parallel Adders with Eight Numbers of Inputs

REFERENCES
1. S. Ramachandran, S. Srinivasan and R. Chen, EPLD-based Architecture of Real Time 2D-Discrete Cosine Transform and Quantization for Image Compression, IEEE International Symposium on Circuits and Systems (ISCAS ’99), Orlando, Florida, May–June 1999.
2. Tian-Sheuan Chang, Chin-Sheng Kung and Chein-Wei Jen, A simple processor core design for DCT/IDCT, IEEE Trans. Circuits Syst. Video Technol., 10
3. D.E. Thomas and P.R. Moorby, The Verilog Hardware Description Language, Kluwer Academic Publishers, Boston, 1998.
4. J. Bhaskar, A Verilog HDL Primer, Star Galaxy Publishing, PA, 1998.
5. J. Bhaskar, Verilog HDL Synthesis, Star Galaxy Publishing, PA,
6. M. Morris Mano and C.R. Kime, Logic and Computer Design Fundamentals, Prentice Hall, NJ, 2000.
