Parallel Adder Design

2011 - 12

ABSTRACT
In the serial adder design, we add eight numbers, n0 to n7, and obtain a sum that is 3 bits wider than the inputs. The most significant bit is the sign bit. The design is pipelined and partitioned by data width as well as by functionality. The same holds for the parallel adder design considered here. Pipelining basically divides an entire process into small sub-processes of roughly equal duration, such that the total processing time of the sub-processes equals that of the entire process. The effect of pipelining may be summarized as follows: throughput increases considerably, latency comes into effect, and chip area increases marginally. The parallel adder is nine times faster than the serial adder and can be used when processing speed is the top concern, as in real-time applications such as the Discrete Cosine Transform and Quantization (DCTQ). However, if chip area is critical and the processing speed is adequate for the application, the serial adder is the better choice. The serial adder needs about one-sixth of the chip area of the parallel adder, and its Verilog code is shorter.

Dept of VLSI and Embedded Systems


Table of Contents
INTRODUCTION
Digital Pipelining
Partitioning of a Design
Partition of Data Width
Partition of Functionality
Pipelined Serial Adder Design
Parallel Signed Adder Design
RTL View
Simulation Results of Parallel Signed Adder
REFERENCES


INTRODUCTION

Digital Pipelining
Consider a pipe carrying oil or water from one place to another. To bring this about, a motor is needed to pump the liquid. This process naturally has some delay before the liquid is available for use at the end of the pipe. This delay may be referred to as latency. Once the pipeline is full, the liquid is available to the consumer continuously, like a perennial river.

This analogy may be applied effectively to the flow of data in a digital system. In digital pipelining, we have data, control signals, etc., flowing through registers that may be regarded as pipes, with the system clock as the driving motor. Thus, the data are carried from one part of a circuit to another via a series of clocked registers. Data flows from one register into the next whenever the clock strikes. En route, the data may undergo any type of process such as add, subtract, multiply, compare, etc. By this means, any complex algorithm can be solved, often with a spectacular speed-up of processing time.

Pipelining basically divides an entire process into small sub-processes of roughly equal duration, such that the total processing time of these sub-processes equals the total processing time of the entire process. For example, Figure 1.1a shows the traditional approach of processing an operation such as a multiplication in about 100ns. In the pipelined approach, we divide this process into ten sub-processes, each of approximately 10ns processing time. After each sub-process, we add a register with a clock signal. As shown in Figure 1.1b, the input data is applied to Proc1, which completes in, say, 10ns. The result of this sub-process is registered in Reg1 at the positive edge of the clock. This result is then processed by Proc2 and registered in Reg2. This is repeated up to Proc10, which registers the desired final result in Reg10.

Thus, the data flows in a digital pipeline from the input to the final output, traveling from Reg1 to Reg10 successively and undergoing various processes on the way. Since the data has to travel through ten registers, we have to wait for ten clock pulses for the output to appear at Reg10. This delay is referred to as the latency. If the clock period is 10ns, the output is available after 100ns, which is the same as in the traditional method. Once the pipeline is full, however, we get a stream of processed results, one every 10ns. Thus, the advantage of pipelining is that we can have a throughput (and also a clock) of 100 MHz instead of 10 MHz in the traditional approach. That means a ten-fold processing speed when compared to the traditional method. However, we need to apply the inputs every 10ns, the same as the output rate. The foregoing treatment of pipelining is shown in Figure 1.2. It may be noted that Proc.10_1, Proc.10_2, up to Proc.10_10 are the results corresponding to the inputs Data1, Data2, up to Data10.
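The register chain described above can be sketched in Verilog as follows. This is only an illustrative sketch and not part of the adder designs in this report: the module name pipe_demo, the parameters WIDTH and STAGES, and the placeholder sub-process (adding 1 in every stage) are all assumptions made for the example.

```verilog
// Hypothetical illustration of a STAGES-deep pipeline.
// Each stage applies a placeholder sub-process (add 1) and registers the result.
module pipe_demo #(
    parameter WIDTH  = 8,
    parameter STAGES = 10
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout
);
    reg [WIDTH-1:0] stage [0:STAGES-1];   // stage[0] plays the role of Reg1, etc.
    integer i;

    always @(posedge clk) begin
        stage[0] <= din + 1'b1;                 // Proc1 registered in Reg1
        for (i = 1; i < STAGES; i = i + 1)
            stage[i] <= stage[i-1] + 1'b1;      // Proc(i+1) registered in Reg(i+1)
    end

    // The result for a given din appears STAGES clock edges later (the latency),
    // after which one new result is available on every clock edge.
    assign dout = stage[STAGES-1];
endmodule
```

With STAGES = 10 and a 10ns clock, a value applied at din would reach dout after ten clock edges, i.e., 100ns, while a new input can still be accepted every 10ns.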


Fig. 1.1 (a) Traditional approach. (b) Pipelined approach

Time(ns)   0      10        20        ……  100        110        ……  190
Input      Data1  Data2     Data3     ……  Data11     Data12     ……  Data20
Reg. 1            Proc.1_1  Proc.1_2  ……  Proc.1_10  Proc.1_11  ……  Proc.1_19
Reg. 2                      Proc.2_1  ……  Proc.2_9   Proc.2_10  ……  Proc.2_18
……                                    ……  ……         ……         ……  ……
Reg. 10                               ……  Proc.10_1  Proc.10_2  ……  Proc.10_10

Latency: 100ns.
Fig. 1.2 Processing order of pipelining

The effect of pipelining may be summarized as follows:
• Throughput increases considerably
• Latency comes into effect
• Chip area increases marginally


Partitioning of a Design
In order to incorporate pipelining in the design, we need to break a sequence of operations or a complex algorithm into convenient small steps in terms of the following:
• Partition of data width
• Partition of functionality

The following sub-sections discuss the methodology of partitioning.

Partition of Data Width
Let us consider the addition of two 16-bit numbers. This is time consuming if it is carried out on all 16 bits at once, since the carry generated at each bit position has to propagate through all the 16 bits. A better way is to split each number into two 8-bit halves and add only 8 bits at a time, which is faster than adding 16 bits in one go. This can be done effectively by introducing pipelining. The lower 8 bits (LSBs) of the two numbers are added first and stored in a pipeline register, along with the generated carry, at the rising edge of the system clock. At the next rising edge of the clock, the upper 8 bits (MSBs) of the two numbers are added along with the carry generated while adding the LSBs. In this fashion, we can divide and conquer the entire data width, no matter how wide it is. There are no hard and fast rules for this division of width. One has to experiment and choose the best split for a particular application.

We will illustrate the partitioning of data width by an example, a signed adder with the following specifications:
1. Eight signed input numbers, each of width 12 bits
2. The sum of these numbers is required

The conventional approach to addition/subtraction uses all the 12 bits together. Since full adders are used for the implementation, the result is delayed by the carry rippling through all the 12 bits. Even a ‘carry look ahead’ circuit does not help in speeding up the computation, since a large number of gates and inputs are required in this case. The answer to this problem is to divide the data width into smaller and equal chunks and to introduce pipelining. In the data width partitioning approach, all sub-blocks perform the same function, namely addition.
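The 16-bit example above may be sketched as a two-stage pipelined adder. This is a minimal sketch under stated assumptions: the operands are taken to be unsigned, and the module name add16_pipe and all signal names are hypothetical and do not appear in the designs that follow.

```verilog
// Hypothetical sketch: 16-bit unsigned addition split into two 8-bit halves.
module add16_pipe (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [16:0] sum     // 16-bit sum plus carry out
);
    reg  [8:0] lo_reg;         // {carry, sum of the lower bytes}
    reg  [7:0] a_hi_reg;       // upper bytes held back for the second stage
    reg  [7:0] b_hi_reg;
    wire [8:0] hi_sum;

    // Upper bytes plus the registered carry; 9 bits wide so the carry out is kept.
    assign hi_sum = a_hi_reg + b_hi_reg + lo_reg[8];

    always @(posedge clk) begin
        // Pipeline stage 1: add the lower bytes, preserve the upper bytes.
        lo_reg   <= a[7:0] + b[7:0];
        a_hi_reg <= a[15:8];
        b_hi_reg <= b[15:8];
        // Pipeline stage 2: join the upper result with the lower result.
        sum      <= {hi_sum, lo_reg[7:0]};
    end
endmodule
```

The sum for a given pair of inputs is available two clock edges after the inputs are applied, but each clock period only has to accommodate an 8-bit carry chain instead of a 16-bit one.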

Partition of Functionality
Functionality is any process such as addition, subtraction, multiplication, or division. We need to group similar functions, such as multiplications, together. Also, a functional block is divided into smaller sub-blocks if this is feasible. In this type of partitioning, each sub-block performs a different function, in general. This can be understood clearly by considering an example: computing the sum of products a1*b1 + a2*b2 + a3*b3 + a4*b4, where a1, b1, etc., are each of size 16 bits. We can group the multiplication functions a1*b1, a2*b2, a3*b3, and a4*b4 together, do all these computations simultaneously, and register the partial products.

Similarly, in the subsequent pipeline stage, we can perform the additions A = (a1*b1 + a2*b2) and B = (a3*b3 + a4*b4) concurrently. In the next pipeline stage, the final addition, result = A + B, which is the desired sum of products, is performed. It may be noted that products such as a1*b1 can themselves be broken down into smaller sub-blocks, namely shift operations and additions. In the signed adder example cited earlier, the LSBs (7 bits) of the eight numbers are added concurrently, followed by the addition of the MSBs (5 bits, along with the carry from the LSB addition) in subsequent pipeline stages.
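The sum-of-products example can be sketched with three pipeline stages, one per group of functions. This is only an illustration: unsigned operands are assumed, the multiplications are written with the * operator instead of being broken into shifts and additions, and the module name sop4_pipe and the register names are hypothetical.

```verilog
// Hypothetical sketch: a1*b1 + a2*b2 + a3*b3 + a4*b4 in three pipeline stages.
module sop4_pipe (
    input  wire        clk,
    input  wire [15:0] a1, b1, a2, b2, a3, b3, a4, b4,
    output reg  [33:0] result
);
    reg [31:0] p1, p2, p3, p4;   // registered products
    reg [32:0] sA, sB;           // registered partial sums A and B

    always @(posedge clk) begin
        // Stage 1: all four multiplications are done concurrently.
        p1 <= a1 * b1;
        p2 <= a2 * b2;
        p3 <= a3 * b3;
        p4 <= a4 * b4;
        // Stage 2: two additions are done concurrently.
        sA <= p1 + p2;
        sB <= p3 + p4;
        // Stage 3: final addition gives the sum of products.
        result <= sA + sB;
    end
endmodule
```

Each register is one bit wider than the operands it adds, so no carry is lost; the result appears three clock edges after the inputs are applied.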

Pipelined Serial Adder Design
The code for the addition of eight 12-bit twos complement numbers is shown in Verilog Code_1.1. The inputs are fed serially at the pins marked “n”. The design module is declared as “serial_adder12s”, listing all the inputs and outputs. The inputs are the system clock, enable, and n. The outputs are sum, sum_valid, and result. The signal sum_valid goes high when the added sum is valid. The “result” is the same as the “sum”, except that the added result is held at the “result” output until it is overwritten by a new result. A 3-bit counter, cnt [2:0], keeps track of the number of inputs accumulated.

The first assign statement computes the “sum” in advance (sum_next [14:0]) if “enable” is high. Otherwise, it is cleared. Note that the input is sign extended by 3 bits, since the result is 3 bits wider than the input numbers. Also, note carefully the number of curly braces used. Otherwise, the compiler will complain. The counter, cnt, is pre-advanced if enabled. The sum is valid after the eighth number has been input. An advance valid signal, sum_val, is switched on only when “cnt” equals 7.

The first “always” block registers the advance sum computed earlier when the clock strikes. Also, “cnt” is incremented every time an input is accumulated. The “sum_valid” signal is set high once all the eight input numbers have been accumulated. The last “always” block registers in “result” whatever was in “sum” if “sum_valid” is active. Otherwise, the result is not disturbed.


Verilog Code_1.1

// Place the design in a file named "serial_adder12s.v".
module serial_adder12s (
    clk,
    enable,
    n,
    sum,
    sum_valid,
    result
);
input         clk ;
input         enable ;
input  [11:0] n ;
output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;      // Prolong the result till it is overwritten by a new result.

wire   [14:0] sum_next ;    // Declare nets in the design.
wire   [2:0]  cnt_next ;
wire          sum_val ;
reg    [14:0] sum ;
reg    [2:0]  cnt ;
reg           sum_valid ;
reg    [14:0] result ;

assign sum_next [14:0] = enable ? ({{3{n[11]}}, n[11:0]} + sum[14:0]) : 0 ;
                                            // Sign extend & accumulate.
assign cnt_next [2:0]  = enable ? (cnt + 1) : 0 ;
                                            // Pre-advance the counter.
assign sum_val         = (cnt == 7) ? 1 : 0 ;
                                            // Pre-determine the validity of the sum.

always @ (posedge clk)                      // Pipeline: register the sum.
begin
    sum [14:0] <= sum_next [14:0] ;         // Register the sum.
    cnt [2:0]  <= cnt_next [2:0] ;          // Advance the count.
    sum_valid  <= sum_val ;                 // Register the signal.
end

always @ (posedge clk)                      // Prolong the result till it is
                                            // overwritten by the new result.
    result[14:0] <= sum_valid ? sum[14:0] : result[14:0] ;
                                            // Register the sum.
endmodule
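To exercise the serial adder, a small testbench along the following lines could be used. It is a hypothetical sketch and not part of the original design: the testbench name, the 10ns clock period, and the input values (four times -100 and four times +3, whose arithmetic sum is -388) are all assumptions chosen for illustration. It has to be compiled together with serial_adder12s.v.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench sketch for serial_adder12s.
module tb_serial_adder12s;
    reg         clk    = 1'b0;
    reg         enable = 1'b0;
    reg  [11:0] n      = 12'd0;
    wire [14:0] sum, result;
    wire        sum_valid;
    integer     i;

    serial_adder12s dut (
        .clk(clk), .enable(enable), .n(n),
        .sum(sum), .sum_valid(sum_valid), .result(result)
    );

    always #5 clk = ~clk;                 // 10ns clock period (arbitrary choice)

    initial begin
        repeat (2) @(posedge clk);        // enable low: clears sum and cnt
        @(negedge clk);                   // change inputs away from the rising edge
        enable = 1'b1;
        for (i = 0; i < 8; i = i + 1) begin
            if (i % 2) n = 12'd3;         // odd inputs:  +3
            else       n = -12'sd100;     // even inputs: -100
            @(negedge clk);               // one rising edge accumulates this input
        end
        enable = 1'b0;
        // Eight inputs have been accumulated; 4*(-100) + 4*3 = -388 is expected.
        $display("sum = %0d, sum_valid = %b", $signed(sum), sum_valid);
        if (sum_valid && ($signed(sum) == -388)) $display("PASS");
        else                                     $display("FAIL");
        @(negedge clk);                   // one more clock copies sum into result
        $display("result = %0d", $signed(result));
        $finish;
    end
endmodule
```

Note that the testbench drops “enable” after the eighth input, which clears the accumulator before a new group of eight numbers is applied.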


Parallel Signed Adder Design
In the serial adder design, we added eight numbers, n0 to n7, and we got a sum that is 3 bits wider than the inputs. The most significant bit is the sign bit. The design was pipelined and partitioned by data width as well as by functionality. The same holds for the parallel adder design. The block diagram for this design is shown in Figure 1.2. We have three stages of addition and five pipeline registers in this design. Before we consider the design, let us see how to evaluate the twos complement quickly. It can be done in just two steps as follows.

Fig. 1.2 Parallel signed adder design

Twos Complement Evaluation (Shortcut)
Let us say that we have an eight-bit data word, 11110000, whose twos complement is required. When we add two numbers, the result is 1 bit wider than the precision of each number. Hence, if we wish to add another number, we first have to sign extend the number by 1 bit, i.e., duplicate the MSB, as shown below. In the first step (after the sign extension), we scan the number from the LSB until we encounter the first “1” and retain all the bits from the LSB up to and including that “1”. In this example, we retain 10000. In the second and final step, we invert all the other bits (1111) to get the desired result, 000010000. Once you get used to this, you will be able to compute the twos complement in one shot.

          [8]……[0]
          111110000    Sign extended data.
Step 1        10000    Retain first 1 from LSB, followed by 0s.
Step 2    000010000    Invert other bits.

• The sign can be extended by any number of bits without affecting the actual value.
• Sign extend means duplicate the MSB ([8] <= [7]).
• Without the sign extension, bit [7] of a large positive value such as +254 would be mistaken for the sign bit, and the value would be read as a negative number.
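The two-step shortcut can also be expressed in Verilog and compared against the usual "invert and add 1" method. This is an illustrative sketch only; the module names twos_shortcut and tb_twos_shortcut and the parameter W are hypothetical and are not used elsewhere in this report.

```verilog
// Hypothetical sketch of the shortcut: keep the bits up to and including the
// first 1 seen from the LSB, and invert all the bits above it.
module twos_shortcut #(parameter W = 9) (
    input  wire [W-1:0] x,
    output reg  [W-1:0] y
);
    integer i;
    reg     seen_one;

    always @* begin
        seen_one = 1'b0;
        for (i = 0; i < W; i = i + 1) begin
            y[i] = seen_one ? ~x[i] : x[i];   // step 2 above the first 1, step 1 below
            if (x[i]) seen_one = 1'b1;
        end
    end
endmodule

// Compares the shortcut with (~x + 1) for every 9-bit value.
module tb_twos_shortcut;
    reg  [8:0] x;
    wire [8:0] y;
    integer    k, errors;

    twos_shortcut #(.W(9)) dut (.x(x), .y(y));

    initial begin
        errors = 0;
        for (k = 0; k < 512; k = k + 1) begin
            x = k;                            // truncated to 9 bits
            #1;
            if (y !== (~x + 9'd1)) errors = errors + 1;
        end
        $display("mismatches = %0d", errors);
        $finish;
    end
endmodule
```

For the sign extended example 111110000, the loop keeps bits [4:0] (10000) and inverts bits [8:5], giving 000010000 as in the worked example.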

Pipelined Design of Parallel Twos Complement Adder
The parallel signed adder shown in Figure 1.2 has a simple algorithm. It was evolved for use in the Discrete Cosine Transform and Quantization (DCTQ) application, where speed of processing has the highest priority, and the method is shown in Figure 1.3. The signed addition can be realized with seven two-input adders and five pipeline stages.

In the first stage, we have four 12-bit twos complement adders to add all the eight numbers. They work concurrently, thereby speeding up the process. They have pipeline registers internally. The clock inputs are marked as (1), (2), etc., and correspond to the internal pipeline registers. We add the LSBs at the first clock pulse (1) and the MSBs at the next clock pulse (2), along with the carry generated by the LSB addition.

In the second stage, we add the four outputs, each of size 13 bits, generated at the first stage. Two two-input adders are used at this stage. The LSBs and MSBs are added with the arrival of clock pulse (3) and clock pulse (4) respectively.

In the third stage, with the arrival of clock pulse (5), we add the LSBs of the two inputs, each of size 14 bits. Subsequently, the MSBs are added along with the carry generated while adding the LSBs to produce the 15-bit final result.

Fig. 1.3 Pipelined design partition of parallel adder

Verilog Code for the Parallel Signed Adder Design
Now, let us consider the Verilog code for this parallel signed adder design. We will see how to add eight 12-bit twos complement numbers, n0 to n7, with 5 pipeline stages registered at the positive edge of the clock. The result, “sum”, is 15 bits wide in twos complement form, and the output is not registered. We first have to declare the module with the appropriate module name and declare the input clk, the input numbers n0 to n7, and the output sum. During the course of the actual arithmetic operations, we will encounter many intermediate signals. Some of them are used in assign statements, and they are declared as wire along with their width. We also have some numbers that are not used in the computation at a particular stage but are merely propagated. For example, the msbs are not added at the beginning, so they have to be registered and propagated for use later on when they are required. The msb and lsb for the next stage are also declared as registers. This completes the “reg” and “wire” declarations.

In the first stage, we add two numbers at a time, say, n0 and n1, and we add only the lsbs of the two numbers. In parallel, we add the other numbers: n2 and n3, n4 and n5, and n6 and n7. This is the same as using four adders concurrently, and the results are formed using the assign statements. We add only the lsbs and register them when the first clock arrives. Since we are not adding the msbs at this stage, we need to register them separately, propagate them, and use them when the next clock arrives. Before the next clock arrives, we also preserve the sum. We have four sum results at this stage. Before we add the msbs, the sign should be extended. Bit 11 is the sign bit. The sign bit is first copied and then concatenated with the original value. This is done for both n0 and n1, and the two are then added. We should also add the carry resulting from the lsb addition. Since this is a time consuming operation, we preserve the results when the next clock pulse arrives. At that clock, we preserve the entire msb sum in registers for use in the subsequent stages. We should also continue to preserve the lsb sum, as we need it for the final result. This completes the first stage of computation.

In the second stage, we add the four lsb sums we got in the first stage in two pairs: s00 with s01, and s02 with s03. The carry resulting here will be added to the msbs later on. At the third clock pulse, the msbs are registered to continue the addition later on. So we preserve the msbs and the lsb sums found at this stage. At the rising edge of clock (4), the added msbs of the second stage, which include the carry generated in the lsb addition, are stored. At this stage, we have two msb sums and two lsb sums. At the rising edge of clk (5), the msbs and the lsb sum are registered to continue the addition of the msbs. At the third stage, the two msbs are added and concatenated with the lsb result to get the final result, the 15-bit sum. This completes the design of the parallel signed adder.


Verilog Code_1.2

// Verilog Code for Signed Adder Design
// Adds eight numbers, n0 to n7, each of size 12 bits in 2's complement.
// Has five pipeline stages registered at positive edge of clock.
// Result, sum, is in 15 bits, 2's complement form (not registered).
module adder12s (
    clk,
    n0, n1, n2, n3, n4, n5, n6, n7,
    sum
);
input         clk ;
input  [11:0] n0 ;
input  [11:0] n1 ;
input  [11:0] n2 ;
input  [11:0] n3 ;
input  [11:0] n4 ;
input  [11:0] n5 ;
input  [11:0] n6 ;
input  [11:0] n7 ;
output [14:0] sum ;

wire   [7:0]  s00_lsb, s01_lsb, s02_lsb, s03_lsb ;
wire   [5:0]  s00_msb, s01_msb, s02_msb, s03_msb ;
wire   [7:0]  s10_lsb, s11_lsb ;
wire   [6:0]  s10_msb, s11_msb ;
wire   [7:0]  s20_lsb ;

reg    [11:7] n0_reg1, n1_reg1, n2_reg1, n3_reg1 ;
reg    [11:7] n4_reg1, n5_reg1, n6_reg1, n7_reg1 ;
reg    [7:0]  s00_lsbreg1, s01_lsbreg1, s02_lsbreg1, s03_lsbreg1 ;
reg    [5:0]  s00_msbreg2, s01_msbreg2, s02_msbreg2, s03_msbreg2 ;
reg    [6:0]  s00_lsbreg2, s01_lsbreg2, s02_lsbreg2, s03_lsbreg2 ;
reg    [7:0]  s10_lsbreg3, s11_lsbreg3 ;
reg    [5:0]  s00_msbreg3, s01_msbreg3, s02_msbreg3, s03_msbreg3 ;
reg    [6:0]  s10_lsbreg4, s11_lsbreg4 ;
reg    [6:0]  s10_msbreg4, s11_msbreg4 ;
reg    [6:0]  s10_msbreg5, s11_msbreg5 ;
reg           s20_lsbreg5cy ;
reg    [6:0]  s20_lsbreg5 ;

// First Stage Addition
assign s00_lsb[7:0] = n0[6:0]+n1[6:0] ;  // Add lsb first - s00_lsb[7] is the carry.
assign s01_lsb[7:0] = n2[6:0]+n3[6:0] ;  // n0-n7 lsb need not be registered since
                                         // addition is already carried out here.
assign s02_lsb[7:0] = n4[6:0]+n5[6:0] ;
assign s03_lsb[7:0] = n6[6:0]+n7[6:0] ;

always @ (posedge clk)  // Pipeline 1: clk (1). Register msb to continue
                        // addition of msb.
begin
    n0_reg1[11:7] <= n0[11:7] ;  // Preserve all inputs for msb addition
                                 // during the clk (2).
    n1_reg1[11:7] <= n1[11:7] ;
    n2_reg1[11:7] <= n2[11:7] ;
    n3_reg1[11:7] <= n3[11:7] ;
    n4_reg1[11:7] <= n4[11:7] ;
    n5_reg1[11:7] <= n5[11:7] ;
    n6_reg1[11:7] <= n6[11:7] ;
    n7_reg1[11:7] <= n7[11:7] ;
    s00_lsbreg1[7:0] <= s00_lsb[7:0] ;  // Preserve all lsb sum. s00_lsbreg1[7] is
                                        // the registered carry from lsb addition.
    s01_lsbreg1[7:0] <= s01_lsb[7:0] ;
    s02_lsbreg1[7:0] <= s02_lsb[7:0] ;
    s03_lsbreg1[7:0] <= s03_lsb[7:0] ;
end

// Sign extended & msb added with carry. s00_msb[6] is ignored.
assign s00_msb[5:0] = {n0_reg1[11], n0_reg1[11:7]}+
                      {n1_reg1[11], n1_reg1[11:7]}+s00_lsbreg1[7] ;
assign s01_msb[5:0] = {n2_reg1[11], n2_reg1[11:7]}+
                      {n3_reg1[11], n3_reg1[11:7]}+s01_lsbreg1[7] ;
assign s02_msb[5:0] = {n4_reg1[11], n4_reg1[11:7]}+
                      {n5_reg1[11], n5_reg1[11:7]}+s02_lsbreg1[7] ;
assign s03_msb[5:0] = {n6_reg1[11], n6_reg1[11:7]}+
                      {n7_reg1[11], n7_reg1[11:7]}+s03_lsbreg1[7] ;

always @ (posedge clk)  // Pipeline 2: clk (2). Register msb to continue
                        // addition of msb.
begin
    s00_msbreg2[5:0] <= s00_msb[5:0] ;      // Preserve all msb sum.
    s01_msbreg2[5:0] <= s01_msb[5:0] ;
    s02_msbreg2[5:0] <= s02_msb[5:0] ;
    s03_msbreg2[5:0] <= s03_msb[5:0] ;
    s00_lsbreg2[6:0] <= s00_lsbreg1[6:0] ;  // Preserve all lsb sum.
    s01_lsbreg2[6:0] <= s01_lsbreg1[6:0] ;
    s02_lsbreg2[6:0] <= s02_lsbreg1[6:0] ;
    s03_lsbreg2[6:0] <= s03_lsbreg1[6:0] ;
end

// Second Stage Addition
assign s10_lsb[7:0] = s00_lsbreg2[6:0]+s01_lsbreg2[6:0] ;  // Add lsb first :
                                                           // s10_lsb[7] is the carry.
assign s11_lsb[7:0] = s02_lsbreg2[6:0]+s03_lsbreg2[6:0] ;  // s00, s01 lsbs need not be
                                                           // registered since addition is
                                                           // already carried out here.

always @ (posedge clk)  // Pipeline 3: clk (3). Register msb to continue
                        // addition of msb.
begin
    s10_lsbreg3[7:0] <= s10_lsb[7:0] ;      // Preserve all lsb sum.
    s11_lsbreg3[7:0] <= s11_lsb[7:0] ;
    s00_msbreg3[5:0] <= s00_msbreg2[5:0] ;  // Preserve all msb sum.
    s01_msbreg3[5:0] <= s01_msbreg2[5:0] ;
    s02_msbreg3[5:0] <= s02_msbreg2[5:0] ;
    s03_msbreg3[5:0] <= s03_msbreg2[5:0] ;
end

// Add MSB of second stage with sign extension and carry in from LSB.
// s10_msb[7] is ignored.
assign s10_msb[6:0] = {s00_msbreg3[5], s00_msbreg3[5:0]}+
                      {s01_msbreg3[5], s01_msbreg3[5:0]}+s10_lsbreg3[7] ;
assign s11_msb[6:0] = {s02_msbreg3[5], s02_msbreg3[5:0]}+
                      {s03_msbreg3[5], s03_msbreg3[5:0]}+s11_lsbreg3[7] ;

always @ (posedge clk)  // Pipeline 4: clk (4). Register msb to continue
                        // addition of msb.
begin
    s10_lsbreg4[6:0] <= s10_lsbreg3[6:0] ;  // Preserve all lsb sum.
    s11_lsbreg4[6:0] <= s11_lsbreg3[6:0] ;
    s10_msbreg4[6:0] <= s10_msb[6:0] ;      // Preserve all msb sum.
    s11_msbreg4[6:0] <= s11_msb[6:0] ;
end

// Third Stage Addition
assign s20_lsb[7:0] = s10_lsbreg4[6:0]+s11_lsbreg4[6:0] ;  // Add lsb first :
                                                           // s20_lsb[7] is the carry.

always @ (posedge clk)  // Pipeline 5: clk (5). Register msb to continue
                        // addition of msb.
begin
    s10_msbreg5[6:0] <= s10_msbreg4[6:0] ;  // Preserve all msb sum.
    s11_msbreg5[6:0] <= s11_msbreg4[6:0] ;
    s20_lsbreg5cy    <= s20_lsb[7] ;        // Preserve all lsb sum.
    s20_lsbreg5[6:0] <= s20_lsb[6:0] ;
end

// Add third stage MSB results and concatenate
// with LSB result to get the final result.
assign sum[14:0] = {({s10_msbreg5[6], s10_msbreg5[6:0]}+
                     {s11_msbreg5[6], s11_msbreg5[6:0]}+
                     s20_lsbreg5cy), s20_lsbreg5[6:0]} ;
endmodule
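A simple way to check the five-clock latency of the parallel adder is a testbench of the following kind. It is a hypothetical sketch, not part of the original design: the testbench name, the 10ns clock period, and the input values (whose arithmetic sum is -423) are assumptions made for illustration, and named port connections are used so that the port order of adder12s does not matter. It has to be compiled together with the adder12s module.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench sketch for adder12s.
module tb_adder12s;
    reg         clk = 1'b0;
    reg  [11:0] n0, n1, n2, n3, n4, n5, n6, n7;
    wire [14:0] sum;

    adder12s dut (
        .clk(clk),
        .n0(n0), .n1(n1), .n2(n2), .n3(n3),
        .n4(n4), .n5(n5), .n6(n6), .n7(n7),
        .sum(sum)
    );

    always #5 clk = ~clk;               // 10ns clock period (arbitrary choice)

    initial begin
        // Inputs are held constant; their arithmetic sum is -423.
        n0 = 12'd100;   n1 = -12'sd3;
        n2 = 12'd2047;  n3 = 12'h800;   // 12'h800 is -2048 in twos complement
        n4 = 12'd5;     n5 = -12'sd600;
        n6 = 12'd77;    n7 = -12'sd1;
        repeat (5) @(posedge clk);      // five pipeline registers to fill
        #1;                             // let the unregistered output settle
        $display("sum = %0d", $signed(sum));
        if ($signed(sum) == -423) $display("PASS");
        else                      $display("FAIL");
        $finish;
    end
endmodule
```

Since the pipeline accepts a new set of eight numbers on every clock, the same testbench could be extended to change the inputs on each clock and compare each output with the inputs applied five clocks earlier.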


RTL View

Simulation Results of Parallel Signed Adder


CONCLUSION

Comparison of Serial and Parallel Adders with Eight Inputs
The serial and parallel adders designed earlier add eight input numbers, each 12 bits wide, where the MSB is the sign bit. They are in effect adder-cum-subtractors, since they perform signed addition. The output width is 15 bits. The performance of these two designs, which serve the same purpose of adding eight signed numbers, is presented in Table 1.1. In terms of clock cycles per result, the parallel adder is nine times faster than the serial adder and may be used when processing speed is the top concern, as in real-time applications such as the DCTQ. However, if chip area is critical and the processing speed is adequate for the application, the serial adder is the better choice. The serial adder needs about one-sixth of the chip area of the parallel adder, and its Verilog code is shorter.

Table 1.1 Comparison of performance of eight inputs serial and parallel adders

Type of Adder   No. of i/p clk cycles   No. of o/p clk cycles   Gate count   JTAG gate   Maximum frequency of operation in MHz
Serial          8                       9                       464          2,160       174
Parallel        1                       1                       2,810        5,376       152


REFERENCES
1. S. Ramachandran, S. Srinivasan and R. Chen, EPLD-based Architecture of Real Time 2D-Discrete Cosine Transform and Quantization for Image Compression, IEEE International Symposium on Circuits and Systems (ISCAS ‘99), Orlando, Florida, May–June 1999.
2. Tian-Sheuan Chang, Chin-Sheng Kung and Chein-Wei Jen, A simple processor core design for DCT/IDCT, IEEE Trans. Circuits Syst. Video Technol., 10
3. D.E. Thomas and P.R. Moorby, The Verilog Hardware Description Language, Kluwer Academic Publishers, Boston, 1998.
4. J. Bhaskar, A Verilog HDL Primer, Star Galaxy Publishing, PA, 1998.
5. J. Bhaskar, Verilog HDL Synthesis, Star Galaxy Publishing, PA,
6. M. Morris Mano and C.R. Kime, Logic and Computer Design Fundamentals, Prentice Hall, NJ, 2000.
