
SELF-CHECKING CARRY SELECT ADDER

WITH MULTIPLE FAULT DETECTION


UTILISING OPTIMISED ADD-ONE AND
MULTIPLEXING LOGIC
A Project Report
Submitted in partial fulfilment of the requirements
for the award of the degree of
BACHELOR OF TECHNOLOGY
In
ELECTRONICS AND COMMUNICATION ENGINEERING
By
K. RAVI KIRAN (18B91A0485)
K. DEEPIKA SOWMYA (18B91A04B4)
K. KARUNAKAR (18B91A0487)
K. HARI KRISHNA (18B91A0478)

Under the esteemed guidance of


Sri T. V. SYAMALA RAJU, M. Tech
Assistant Professor, Department of ECE

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

S.R.K.R. ENGINEERING COLLEGE (AUTONOMOUS)


(Affiliated to JNTU, KAKINADA)
(Recognized by A.I.C.T.E., Accredited by N.A.A.C. with ‘A’ Grade, NEW DELHI)
CHINNA AMIRAM, BHIMAVARAM- 534204

(2018-2022)
S.R.K.R. ENGINEERING COLLEGE (AUTONOMOUS)
(Affiliated to JNTU, KAKINADA)
(Recognized by A.I.C.T.E., Accredited by N.A.A.C. with ‘A’ Grade, NEW DELHI)
CHINNA AMIRAM, BHIMAVARAM- 534204

(2018-2022)
DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE
This is to certify that the project work entitled

“SELF-CHECKING CARRY SELECT ADDER WITH


MULTIPLE FAULT DETECTION UTILISING
OPTIMISED ADD-ONE AND MULTIPLEXING LOGIC”
is the bona fide work submitted by

Mr/
Miss……………………………………………………………………………………………
………………………………………………………………………………………………...
Regd. No…………………………………... of final year B.Tech., along with his/her batch
mates, submitted in partial fulfilment of the requirement for the award of the degree of
BACHELOR OF TECHNOLOGY in ELECTRONICS AND COMMUNICATION
ENGINEERING during the academic year 2021 – 2022.

Guide:
Sri T. V. Syamala Raju, M.Tech
Assistant Professor, Department of ECE

Head of the Department:
Dr. N. Uday Kumar, M.E., Ph.D., M.I.S.T.E., M.I.E.E.E.
Department of ECE
CERTIFICATE OF EXAMINATION

This is to certify that we have examined the thesis and hereby accord our approval of
it as a study carried out and presented in a manner required for its acceptance in
partial fulfilment for the award of BACHELOR OF TECHNOLOGY in
ELECTRONICS AND COMMUNICATION ENGINEERING, for which it has
been submitted. This approval does not endorse or accept every statement made,
opinion expressed or conclusion drawn in the report. It signifies acceptance of the
report for the purpose for which it is submitted.

External Examiner Internal Examiner


DECLARATION

This is to certify that the project entitled “SELF-CHECKING CARRY SELECT


ADDER WITH MULTIPLE FAULT DETECTION UTILISING OPTIMISED
ADD-ONE AND MULTIPLEXING LOGIC” is submitted by K. RAVI KIRAN
(18B91A0485), K. DEEPIKA SOWMYA (18B91A04B4), K. KARUNAKAR
(18B91A0487) and K. HARI KRISHNA (18B91A0478) in partial fulfilment of the
requirement for the award of the degree of B.Tech in Electronics and Communication
Engineering to S.R.K.R. Engineering College, affiliated to JNTU KAKINADA. It
comprises only our original work, and acknowledgment has been made in the text
for all other material used.

Date:
ACKNOWLEDGEMENT

The successful completion of any task would be incomplete without thanking those
who made it possible and whose guidance and encouragement made the effort a
success.

This dissertation is the work accomplished under the enviable and scholarly
guidance of Sri T. V. SYAMALA RAJU, Assistant Professor, Department of Electronics and
Communication Engineering. We are profoundly grateful for the unmatched services
rendered by him. We express heartfelt and sincere appreciation for his advice and
encouragement during the preparation and progress of this project.

It gives us immense pleasure to avail this opportunity to thank Dr. N. UDAY


KUMAR, Head of the Department of Electronics and Communication Engineering
for his constant and timely support and encouragement.

We express our thanks to Dr. M. JAGAPATHI RAJU, Principal, S.R.K.R.


Engineering College, Bhimavaram, for giving us this opportunity for the successful
completion of our project.

We also thank other teaching and non-teaching staff for their assistance and help
extended. We thank one and all that have contributed directly or indirectly to this
project.

Project associates

K. RAVI KIRAN

K. DEEPIKA SOWMYA

K. KARUNAKAR

K. HARI KRISHNA
CONTENTS
ABSTRACT

LIST OF FIGURES

LIST OF TABLES

1. INTRODUCTION
2. BASIC PARALLEL ADDERS
2.1 HALF ADDER
2.2 FULL ADDER
2.3 RIPPLE CARRY ADDER (RCA)
2.4 CARRY LOOK-AHEAD ADDER (CLA)
2.5 CARRY SAVE ADDER (CSA)
2.6 CARRY SKIP ADDER (CSKA)
2.7 CARRY INCREMENT ADDER (CIA)
2.8 CARRY SELECT ADDER (CSLA)
2.9 PARAMETERS
3. ABOUT VERILOG HDL

4. KEY REQUIREMENTS

5. PRINCIPLE OF SELF-CHECKING FULL ADDER


6. SELF-CHECKING CSLA ARCHITECTURES
6.1 BEC BASED SELF-CHECKING CSLA
6.2 CGS BASED SELF-CHECKING CSLA
7. FAST ADD-ONE AND MULTIPLEXING LOGIC BASED SELF-
CHECKING CSLA ARCHITECTURES
7.1 SELF-CHECKING CSLA WITH FAST ADD-ONE AND MULTIPLEXING
(FAM) LOGIC
7.2 FAM LOGIC BASED SELF-CHECKING CSLA WITH MUX-BASED RCA
7.3 FAM LOGIC BASED SELF-CHECKING CSLA WITH SKIP
LOGIC-BASED RCA
8. DELAY OPTIMIZED SELF-CHECKING MULTI-STAGE CSLA

8.1 SQRT GROUPING

9. RESULTS
9.1 RESULTS OF 16-BIT BEC BASED SELF-CHECKING CSLA
9.2 RESULTS OF 32-BIT BEC BASED SELF-CHECKING CSLA
9.3 RESULTS OF 64-BIT BEC BASED SELF-CHECKING CSLA
9.4 RESULTS OF 16-BIT CGS BASED SELF-CHECKING CSLA
9.5 RESULTS OF 32-BIT CGS BASED SELF-CHECKING CSLA
9.6 RESULTS OF 64-BIT CGS BASED SELF-CHECKING CSLA
9.7 RESULTS OF 16-BIT FAM BASED SELF-CHECKING CSLA
USING MUX-BASED RCA
9.8 RESULTS OF 32-BIT FAM BASED SELF-CHECKING CSLA
USING MUX-BASED RCA
9.9 RESULTS OF 64-BIT FAM BASED SELF-CHECKING CSLA
USING MUX-BASED RCA
9.10 RESULTS OF 16-BIT FAM BASED SELF-CHECKING CSLA
USING SKIP LOGIC-BASED RCA
9.11 RESULTS OF 32-BIT FAM BASED SELF-CHECKING CSLA
USING SKIP LOGIC-BASED RCA
9.12 RESULTS OF 64-BIT FAM BASED SELF-CHECKING CSLA
USING SKIP LOGIC-BASED RCA
9.13 COMPARISON OF PARAMETERS FOR DIFFERENT
SELF-CHECKING ARCHITECTURES
10. CONCLUSION

11. FUTURE SCOPE

12. REFERENCES
ABSTRACT
In this project, the square-root (SQRT) carry select adder (CSLA), one of the fastest
adder architectures, is implemented as a high-speed, multiple-fault-detecting design
that requires less area than previous-generation self-checking CSLAs. Utilizing
low-cost, high-speed, reliable computational units is a major task in the design of
current digital integrated circuits. Starting from the concept of the self-checking full
adder, an n-bit logic-enhanced fault-tolerant single-stage BEC-based CSLA, a Carry
Generation and Selection (CGS) based CSLA, and a new Fast Add-one and
Multiplexing (FAM) based CSLA are implemented with two different fast RCAs,
each able to detect a maximum of n simultaneous faults. To further reduce the delay
of the multi-stage self-checking CSLA, the conventional SQRT grouping structure is
utilized for different adder sizes. The conventional BEC and CGS based self-checking
CSLA designs are compared with the new FAM-based CSLA designs in terms of
LUT count, power consumption, delay, area-delay product (ADP), and power-delay
product (PDP). These self-checking CSLA designs are implemented for 16-bit,
32-bit, and 64-bit adder sizes, coded in Verilog HDL, and simulated and synthesized
using XILINX VIVADO.
1. INTRODUCTION
Nowadays, VLSI systems encounter both transient and permanent faults because of
factors such as shrinking power supply voltages and diminishing transistor feature
sizes. The vulnerability of such systems to different environmental effects is
increasing, especially in extreme environments such as outer space. Therefore,
VLSI-based circuits, and especially processors, should be reconsidered and reinforced
against different types and numbers of faults in the form of fault masking or fault
detection.

Regarding the fault detection property, the concept of self-checking is used, which
includes the fault-secure and self-testing characteristics; such a system is called
self-checking. A circuit is said to be fault-secure if it either remains unaffected by a
fault or indicates the fault as soon as it occurs. In addition, a circuit is said to be
self-testing if it is guaranteed that for each modeled fault there is at least one input
vector, occurring during the normal operation of the circuit, that detects it. To attain
self-checking processing systems, the concepts of self-checking and error detection
have been applied to a variety of adders and some multipliers, although these
concepts are also used in other applications such as self-healing networks.

The main self-checking methods include parity prediction schemes, arithmetic residue
codes, duplication with comparison or double modular redundancy, and time
redundancy-based methods. All of these methods are intrinsically used for single fault
detection and are naturally not utilized for multiple-fault scenarios. Among these
methods, duplication with comparison requires significantly more area and power
because it doubles the basic circuit besides using a proper comparator. Furthermore,
time redundancy-based methods highly reduce the primary speed in addition to
requiring some area and power overheads.

The carry select adder (CSLA), one of the fastest adders, is considered in this
project. This adder can be revisited for the self-checking property and multiple-fault
detection capability while still performing high-speed add operations. Regarding area
and power optimizations, several improved designs have been proposed. In most of
these designs, the square-root (SQRT) CSLA architecture is investigated because of
its lower delay. In the basic SQRT scheme, to optimize the worst-case delay,
different-size groups, normally including two Ripple-Carry Adder (RCA) sections in
parallel, are utilized.

The simplest adder is the Ripple Carry Adder (RCA), which includes n Full Adders
(FAs) if the operand bit width is n. The RCA has the lowest cost but the worst
critical path delay. Therefore, several structures have been suggested to address the
delay problem, such as the carry look-ahead adder, the Carry Select Adder (CSLA),
parallel prefix adders and the carry skip adder. The CSLA is one of the faster adders
and requires lower area and power consumption compared to many adder structures. However, due to the
use of duplicated RCAs in the basic CSLA, its area and power are still high compared
to RCA. Two alternative designs are investigated to reduce the area and power
consumption. The first category includes the methods that replace the second RCA in
each group of CSLA (the RCA with the input carry equal to one) with a simpler logic
by utilizing the outputs of the first RCA (the RCA with the input carry equal to zero).
This modification leads to lower area and power consumption but increases the delay
because the results corresponding to input carry equal to one are produced in each
group after the preparation of the results of the first RCA.

In this project, reduced-area SQRT CSLA designs are implemented for 16-bit,
32-bit, and 64-bit addition. In addition, a Binary to Excess-1 Converter (BEC) is used
instead of the second RCA in each group, which leads to lower area and power
consumption at the cost of increased delay. A CSLA based on common Boolean
logic (CBL), which greatly reduces the required area and power but also considerably
decreases the speed, is implemented as well. Enhancing a design with self-checking
characteristics requires area, power, and delay overheads.

Thus, the main goal of this project is the design of a new self-checking CSLA which
is based on the add-one and multiplexing design with multiple-fault detection
capability while improving delay and area compared to the previous self-checking
CSLA designs. To attain multiple fault detection capabilities, the fault detection is
independently performed in each bit of the CSLA based on the concept of a self-
checking full adder (FA). In addition, to reduce area and delay, the multiplexer
(MUX) along with BEC is combined and reduced to a fast add-one and a multiplexing
(FAM) circuit is used for the selection of proper sum and carry in each group. Finally,
for more delay enhancement, a new grouping structure is utilized for different-size

2
groups compatible with fast multiplexing. This new grouping outperforms the basic
SQRT grouping in both delay and power for different adder sizes.

2. BASIC PARALLEL ADDERS


2.1 HALF ADDER:

This adds two binary operands and generates two binary results: the sum and the
carry. XOR logic is applied to the two inputs to generate the sum, which means the
sum will be 1 only when the two inputs are different; otherwise, it will be 0. AND
logic is applied to the two inputs to generate the carry, which means the carry will
be 1 only when both inputs are 1; otherwise, it will be 0.

sum = a XOR b
carry = a AND b

Operands      Results
A    B        Sum    Carry
0    0        0      0
0    1        1      0
1    0        1      0
1    1        0      1

Table 2.1: Truth table of half adder
Fig. 2.1: Half adder
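As a minimal sketch, the half adder above can be written directly in Verilog; the module and port names here are illustrative rather than taken from the project code.

module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign sum   = a ^ b;   // sum is 1 only when the two inputs differ
    assign carry = a & b;   // carry is 1 only when both inputs are 1
endmodule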
2.2 FULL ADDER:

This adds three binary operands and produces two binary results. Here the three
operands are considered to be a, b and c, and the two results are the sum and the carry.
Full adder logic is designed in such a manner that it can take all eight input combinations
and can be cascaded by propagating the carry from one full adder to another to form a
multi-bit adder. The full adder is implemented with logic gates as:
Fig. 2.2: Full adder

sum (S) = a XOR b XOR c
carry (Cout) = ab + bc + ca

Operands           Results
A    B    C        Sum    Carry
0    0    0        0      0
0    0    1        1      0
0    1    0        1      0
0    1    1        0      1
1    0    0        1      0
1    0    1        0      1
1    1    0        0      1
1    1    1        1      1

Table 2.2: Truth table of full adder
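A gate-level Verilog sketch of this full adder, following the two equations above (module and port names are illustrative):

module full_adder (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire s,
    output wire cout
);
    assign s    = a ^ b ^ c;                     // sum = a XOR b XOR c
    assign cout = (a & b) | (b & c) | (c & a);   // carry = ab + bc + ca
endmodule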

2.3 RIPPLE CARRY ADDER (RCA):

The RCA is designed based on how many bits the inputs contain. If both inputs contain
N bits, then N full adders are connected in series, with the carry being propagated
from one full adder to the next. Its circuit simplicity is an asset, but for N-bit addition
N full adders are required, which invariably increases power consumption. In addition,
each full adder must wait until it obtains the carry from the previous full adder, which
further increases the delay. Therefore the RCA is not preferred as the number of bits
increases. An RCA designed for the addition of two 4-bit numbers is given in Fig. 2.3.

Fig. 2.3: RCA


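A parameterized ripple carry adder can be sketched in Verilog by cascading the full_adder module shown above; the generate loop wires the carry out of each stage into the next (module name and default width are illustrative).

module rca #(parameter N = 4) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    wire [N:0] c;              // internal carry chain
    assign c[0] = cin;
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : fa_chain
            full_adder fa (.a(a[i]), .b(b[i]), .c(c[i]), .s(sum[i]), .cout(c[i+1]));
        end
    endgenerate
    assign cout = c[N];        // carry out of the most significant stage
endmodule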
2.4 CARRY LOOK-AHEAD ADDER (CLA):

Every full adder in the RCA has to wait for the carry from the previous stage. This
excessive waiting is avoided by the CLA. This adder produces results very fast
because it does not rely on rippling carry propagation; it uses simple logical
operations to calculate the carry at each stage from the operand bits and the incoming
carry. This method employs logic gates such as XOR and AND gates for generating
the propagate and generate bits, which are then used for carry calculation. The carry
and sum at each stage can be generated directly, without waiting, by the following formulae.

C[i+1] = G[i] + P[i].C[i]

S[i] = P[i] XOR C[i]

where G[i] = A[i].B[i]

P[i] = A[i] XOR B[i]

Here i takes values from 0 to n-1, where n is the number of bits in the inputs. A 4-bit CLA is
shown in Fig. 2.4.

Fig. 2.4 CLA
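A 4-bit carry look-ahead adder sketched from the generate/propagate formulae above; the carry equations are fully unrolled so no carry ripples through the stages (module and signal names are illustrative).

module cla4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire [3:0] g = a & b;   // generate bits  G[i] = A[i].B[i]
    wire [3:0] p = a ^ b;   // propagate bits P[i] = A[i] XOR B[i]
    wire [4:0] c;
    assign c[0] = cin;
    assign c[1] = g[0] | (p[0] & c[0]);
    assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
                       | (p[3] & p[2] & p[1] & p[0] & c[0]);
    assign sum  = p ^ c[3:0];   // S[i] = P[i] XOR C[i]
    assign cout = c[4];
endmodule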


2.5 CARRY SAVE ADDER (CSA):

This module is mainly adopted because it can add three multi-bit inputs at a time. It
employs full adders in two different stages. In the first stage, the full adders convert the
three inputs into two outputs: a sum and a carry. In this stage, unlike the RCA, the carry
is not propagated from one full adder to another; instead, the generated sums and
carries are passed to the next stage. The full adders in the second stage add the
operands propagated from the first stage and thus produce the final results. Only in
this second stage does the carry propagate from one full adder to another. A 4-bit CSA
is shown in Fig. 2.5.

Fig. 2.5: 4-bit CSA

2.6 CARRY SKIP ADDER (CSKA):

This block follows skip logic in carry propagation. Just like the RCA, the CSKA
cascades full adders. Because it uses carry skip logic, the carry propagation delay is
less than that of the RCA. The propagate bit at each stage is calculated by the XOR of
the inputs, and these propagate bits are ANDed together and given as the selection bit
to a 2:1 MUX that produces Cout. Small carry skip blocks are further used in
implementing higher-bit CSKAs, so the structure is preferable for higher-bit addition.
A 4-bit carry skip block is given in Fig. 2.6.

Fig. 2.6 4-bit CSKA
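A 4-bit carry skip block can be sketched by wrapping the rca module from above with the skip logic just described: the propagate bits are ANDed together and drive a 2:1 MUX that either takes the ripple carry-out or skips the block carry-in straight to Cout (names are illustrative).

module cska4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire [3:0] p = a ^ b;          // propagate bit at each stage
    wire       ripple_cout;
    wire       skip = &p;          // AND of all propagate bits (selection bit)

    rca #(.N(4)) core (.a(a), .b(b), .cin(cin), .sum(sum), .cout(ripple_cout));

    assign cout = skip ? cin : ripple_cout;   // 2:1 MUX producing Cout
endmodule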

2.7 CARRY INCREMENT ADDER (CIA):

This adder block uses RCAs for carrying out the addition along with an increment
circuit. To perform an 8-bit addition, the CIA involves two 4-bit RCAs along with an
increment block. The first 4-bit RCA calculates its sums and produces a carry-out
which is propagated to the increment block. The second RCA generates transient
sums which are also passed to the increment circuit, where they are added to the carry
propagated from the first RCA to obtain the final outcome. In the increment block,
half adders are used to add these transient sums with the carry-out, so carry
generation takes less time in the increment circuit. Hence the propagation delay is
invariably less than that of the RCA.

Fig. 2.7: CIA
2.8 CARRY SELECT ADDER (CSLA):

The CSLA uses multiple narrow adders to create a fast wide adder by breaking the
addition problem into smaller groups. It is one of the fast adder types. Each group
consists of two independent units that perform the addition in parallel. One way to
speed up the addition is to split it into several smaller groups, each of N bits (say
8-bit groups); for each group two additions are then performed in parallel, one
assuming the carry-in is "0" (Cin = 0) and the other assuming the carry-in is "1"
(Cin = 1). When the carry-in is eventually known, the correct sum is simply selected
through an N-bit 2-to-1 MUX. The adder based on this approach is known as a carry
select adder (CSLA). The CSLA is used in data-processing processors to perform
fast arithmetic functions.

Fig. 2.8 n-bit single-stage CSLA

The carry-select adder generally consists of two ripple carry adders and a multiplexer.
Adding two n-bit numbers with a carry-select adder is done with two adders
(therefore two ripple-carry adders) in order to perform the calculation twice, once
with the assumption of the carry being zero and once assuming it is one. After the two
results are calculated, the correct sum, as well as the correct carry, is then selected
with the multiplexer once the correct carry is known. The number of bits in each carry
select block can be uniform or variable. When variable, the block size should have a
delay, from the addition inputs A and B to the carry-out, equal to that of the multiplexer
chain leading into it, so that the carry-out is calculated just in time. The delay is
derived from uniform sizing, where the ideal number of full-adder elements per block
is equal to the square root of the number of bits being added, since that will yield an
equal number of MUX delays.
The single-stage CSLAs should be chained together to construct a multi-stage CSLA
in order to fully benefit from the parallel operations and reach a low-delay addition. In
fact, a multi-stage CSLA is composed of groups, or single-stage CSLAs, which can be
either of the same size or of different sizes. If same-size groups are used, the best size
for lowering the delay is the square root of the adder size. However, if different-size
groups are allowed, the increasing size of the single-stage CSLAs in the SQRT grouping
exploits the maximum concurrency in the carry propagation path, which leads to lower
delay compared to same-size groups. In the SQRT grouping, the adder delay
approximately increases with the square root of the adder size. In the basic 16-bit
CSLA with the SQRT grouping, except for the first 2-bit RCA, which is not grouped
because the input carry Cin is ready along with the input operands A and B, the
other RCAs are used in groups with sizes of 2, 3, 4 and 5 bits, respectively.
However, the size of the multiplexer in each group is one bit bigger, since an output carry
must be selected as well as the output sum bits in each group.
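The n-bit single-stage CSLA described above can be sketched as two rca instances computing the Cin = 0 and Cin = 1 results in parallel, followed by an (N+1)-bit-wide 2-to-1 selection on the carry and sum pair; module and port names are illustrative.

module csla_stage #(parameter N = 4) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    wire [N-1:0] sum0, sum1;
    wire         c0, c1;

    rca #(.N(N)) rca_cin0 (.a(a), .b(b), .cin(1'b0), .sum(sum0), .cout(c0));  // assumes carry-in = 0
    rca #(.N(N)) rca_cin1 (.a(a), .b(b), .cin(1'b1), .sum(sum1), .cout(c1));  // assumes carry-in = 1

    // once the real carry-in is known, select the correct {carry, sum} pair
    assign {cout, sum} = cin ? {c1, sum1} : {c0, sum0};
endmodule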
3. ABOUT VERILOG HDL
VHDL: Multiple design-units (entity/architecture pairs), that reside in the
same system file, may be separately compiled if so desired. However, it is good
design practice to keep each design unit in its own system file in which case separate
compilation should not be an issue.

A multitude of language or user-defined data types can be used. This may


mean dedicated conversion functions are needed to convert objects from one type to
another. The choice of which data types to use should be considered wisely,
especially enumerated (abstract) data types. This will make models easier to write,
clearer to read and avoid unnecessary conversion functions that can clutter the code.
VHDL may be preferred because it allows a multitude of language or user-defined
data types to be used.

Verilog: The Verilog language is still rooted in its native interpretative mode.
Compilation is a means of speeding up simulation, but has not changed the original
nature of the language. As a result, care must be taken with both the compilation order
of code written in a single file and the compilation order of multiple files. Simulation
results can change by simply changing the order of compilation. Verilog data types
are very simple, easy to use and very much geared towards modeling hardware
structure as opposed to abstract hardware modelling. Unlike VHDL, all data types
used in a Verilog model are defined by the Verilog language and not by the user.
There are net data types, for example wire, and a register data type called reg. A model
with a signal whose type is one of the net data types has a corresponding electrical
wire in the implied modelled circuit. Objects that are signals of type reg hold their
value over simulation delta cycles and should not be confused with the modelling of a
hardware register. Verilog may be preferred because of its simplicity.

DESIGN REUSABILITY:

VHDL. Procedures and functions may be placed in a package so that they are
available to any design-unit that wishes to use them.
Verilog. There is no concept of packages in Verilog. Functions and procedures used
within a model must be defined in the module. To make functions and procedures
generally accessible from different module statements the functions and procedures
must be placed in a separate system file and included using the include compiler
directive. Starting with zero knowledge of either language, Verilog is probably the
easiest to grasp and understand. This assumes the Verilog compiler directive language
for simulation and the PLI language is not included. If these languages are included,
they can be looked upon as two additional languages that need to be learned. VHDL
may seem less intuitive at first for two primary reasons. First, it is very strongly
typed: a feature that makes it robust and powerful for the advanced user after a longer
learning phase. Second, there are many ways to model the same circuit, especially
those with large hierarchical structures.

FORWARD AND BACK ANNOTATION:

A spin-off from Verilog is the Standard Delay Format (SDF). This is a general-
purpose format used to define the timing delays in a circuit. The format provides a
bidirectional link between chip layout tools and either synthesis or simulation tools, in
order to provide more accurate timing representations. The SDF format is now an
industry standard in its own right.

HIGH-LEVEL CONSTRUCTS:

VHDL. There are more constructs and features for high-level modeling in VHDL than
there are in Verilog. Abstract data types can be used along with the following
statements.

 Package statements for model reuse,


 Configuration statements for configuring design structure,
 Generate statements for replicating structure,
 Generic statements for generic models that can be individually characterized, for example, bit width.
All these language statements are useful in synthesizable models.
Verilog. Except for being able to parameterize models by overloading parameter
constants, there is no equivalent to the high-level VHDL modelling statements in
Verilog.
LANGUAGE EXTENSIONS: The use of language extensions will make a model
non-standard and most likely not portable across other design tools. However,
sometimes they are necessary in order to achieve the desired results.

VHDL has an attribute called foreign that allows architectures and sub-programs to be
modelled in another language.

Verilog. The Programming Language Interface (PLI) is an interface between Verilog


models and Verilog software tools. For example, a designer, or more likely a Verilog
tool vendor, can specify user-defined tasks or functions in the C programming
language and then call them from the Verilog source description. The use of such
tasks or functions makes a Verilog model non-standard and so it may not be usable by
other Verilog tools. Their use is not recommended.

LIBRARIES:

VHDL. A library is a store for compiled entities, architectures, packages, and


configurations. Useful for managing multiple design projects.

Verilog. There is no concept of a library in Verilog. This is due to its origin as an


interpretive language.

LOW-LEVEL CONSTRUCTS:

VHDL. Simple two-input logical operators are built into the language; they are NOT,
AND, OR, NAND, NOR, XOR and XNOR. Any timing must be separately
specified using the after clause.

Verilog. The Verilog language was originally developed with gate level modelling in
mind, and so has very good constructs for modelling at this level and for modelling
the cell primitives of ASIC and FPGA libraries. Examples include User Defined
Primitives (UDP), truth tables and the specify block for specifying timing delays
across a module.

MANAGING LARGE DESIGNS:

VHDL. Configuration, generate, generic and package statements all help manage
large design structures.

Verilog. There are no statements in Verilog that help manage large designs.
OPERATORS:

The majority of operators are the same between the two languages. Verilog does have
very useful unary reduction operators that are not in VHDL. A loop statement can be
used in VHDL to perform the same operation as a Verilog unary reduction operator.

VHDL has the mod operator that is not found in Verilog.


PARAMETERIZABLE MODELS:

VHDL. A specific bit width model can be instantiated from a generic n-bit model
using the generic statement. The generic model will not synthesize until it is
instantiated and the values of the generic given.

Verilog. A specific width model can be instantiated from a generic n-bit model
using overloaded parameter values. The generic model must have a default parameter
value defined.

This means two things. In the absence of an overloaded value being specified, the
model will still synthesize but will use the default parameter value. Also, it does not
need to be instantiated with an overloaded parameter value specified before it will
synthesize.

PROCEDURES AND TASKS:

VHDL allows concurrent procedure calls; Verilog does not allow concurrent task
calls.

READABILITY:

This is more a matter of coding style and experience than of language features. VHDL is
a verbose language; its roots are based on Ada. Verilog is more like C
because its constructs are based approximately 50% on C and 50% on Ada. For this
reason, an existing C programmer may prefer Verilog over VHDL, although an
existing programmer of both C and Ada may find the mix of constructs somewhat
confusing at first. Whatever HDL is used, when writing or reading an HDL model to
be synthesized it is important to think about hardware intent.

STRUCTURAL REPLICATION:
VHDL. The generate statement replicates a number of instances of the same
design-unit or some sub-part of a design, and connects them appropriately.

Verilog. There is no equivalent to the generate statement in VHDL.

TEST HARNESSES:

Designers typically spend about 50% of their time writing synthesizable models and
the other 50% writing a test harness to verify the synthesizable models. Test harnesses
are not restricted to the synthesizable subset and so are free to use the full potential of
the language. VHDL has generic and configuration statements that are useful in test
harnesses and that are not found in Verilog.

VERBOSENESS:

VHDL. Because VHDL is a very strongly typed language, models must be coded
precisely with defined and matching data types. This may be considered an advantage
or a disadvantage.

However, it does mean models are often more verbose, and the code often longer than
its Verilog equivalent.

Verilog. Signals representing objects of different bit widths may be assigned to each
other. The signal representing the smaller number of bits is automatically padded out
to that of the larger number of bits, independent of whether it is the assigned
signal or not. Unused bits will be automatically optimized away during the synthesis
process. This has the advantage of not needing to model quite as explicitly as in
VHDL, but it does mean unintended modelling errors will not be identified by an
analyser.
4. KEY REQUIREMENTS

VERILOG is:

 A hardware design language (HDL).


 Tool for specifying hardware circuits.
 Syntactically, a lot like C or Java.
 An alternative to VHDL (and more widely used).
MODULE:

Since Verilog is an HDL (hardware description language, one used for the conceptual
design of integrated circuits), it also needs to have these things. In Verilog, we call
our "black boxes" modules. module is a reserved word within the language used to refer to
things with inputs, outputs, and internal logic workings; modules are the rough
equivalents of functions with return values in other programming languages.

DRIVERS:

A driver is a data type that can drive a load. Basically, in a physical circuit, a driver
would be anything that electrons can move through or into. Drivers are of two types:

 A driver that can store a value (for example, a flip-flop).

 A driver that cannot store a value but connects two points (a wire).

The first type of driver is called a reg in Verilog (short for "register"). The second
data type is called a wire (for, well, "wire"). You can refer to the tidbits section to
understand this better.

OPERATORS:
Operators, thankfully, are the same things here as they are in other programming
languages. They take two values and compare (or otherwise operate on) them to yield
a third result; common examples are addition, equality and logical-and. To make life
easier for us, nearly all operators (at least the ones in the list below) are exactly the
same as their counterparts in the C programming language.

Operator type     Operator symbol    Operation performed

Arithmetic        *                  Multiply
                  /                  Division
                  +                  Add
                  -                  Subtract
                  %                  Modulus
                  +                  Unary plus
                  -                  Unary minus
Logical           !                  Logical negation
                  &&                 Logical and
                  ||                 Logical or
                  >                  Greater than
                  <                  Less than
                  >=                 Greater than or equal
                  <=                 Less than or equal
Equality          ==                 Equality
                  !=                 Inequality
Reduction         ~                  Bitwise negation
                  ~&                 NAND
                  |                  OR
                  ~|                 NOR
                  ^                  XOR
                  ^~                 XNOR
                  ~^                 XNOR
Shift             >>                 Right shift
                  <<                 Left shift
Concatenation     {}                 Concatenation
Conditional       ?                  Conditional

Table 4.1: Types of operators used in Verilog

CONTROL STATEMENTS:

IF-ELSE:

If-else statements check a condition to decide whether or not to execute a portion of
code. If the condition is satisfied, the if portion of the code is executed; otherwise, the
else portion is executed.

CASE:

Case statements are used where one variable needs to be checked for multiple values,
as in an address decoder, where the input is an address that needs to be checked
against all the values it can take. Instead of using multiple nested if-else statements,
one for each value we are looking for, we use a single case statement; this is similar
to switch statements in languages such as C++. Case statements begin with the
reserved word case and end with the reserved word endcase (Verilog does not use
brackets to delimit blocks of code).

The cases, each followed by a colon and the statements you wish executed, are listed
within these two delimiters. It is also a good idea to have a default case. Just as with
a finite state machine (FSM), if the Verilog machine enters a non-covered state, the
machine hangs; a default case with a return to idle keeps us safe.
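A small illustrative example: a 2-to-4 decoder written with a case statement and a default branch (module and signal names are illustrative).

module decoder2to4 (
    input  wire [1:0] addr,
    output reg  [3:0] y
);
    always @(*) begin
        case (addr)
            2'b00:   y = 4'b0001;
            2'b01:   y = 4'b0010;
            2'b10:   y = 4'b0100;
            2'b11:   y = 4'b1000;
            default: y = 4'b0000;   // safe default for non-covered values
        endcase
    end
endmodule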

WHILE:

A while statement executes the code within it repeatedly as long as the condition it
checks returns true. While loops are not normally used for real hardware models, but
they are used in test benches. As with other statement blocks, they are delimited by
begin and end.

FOR LOOP:

For loops in Verilog are almost exactly like for loops in C or C++. The only difference
is that the ++ and -- operators are not supported in Verilog. Instead of writing i++ as
you would in C, we need to write out its full operational equivalent, i = i + 1.

REPEAT:
Repeat is similar to the for loop we just covered. Instead of explicitly declaring a
variable and incrementing it as in a for loop, we tell the program how many times to
run through the code, and no variables need to be incremented (unless we want them
to be).

ASSIGN STATEMENTS:

An assign statement is used for modelling combinational logic only, and it is
executed continuously. So the assign statement is called a 'continuous assignment
statement' as there is no sensitivity list.

assign out = (enable) ? data : 1'bz;


The above example is a tri-state buffer. When enable is 1, data is driven to out;
otherwise out is pulled to high impedance. We can use nested conditional operators to
construct muxes, decoders and encoders.

assign out = data;

This example is a simple buffer.
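For illustration, a 4-to-1 multiplexer built from nested conditional operators in a single continuous assignment (module and signal names are illustrative):

module mux4to1 (
    input  wire [3:0] d,
    input  wire [1:0] sel,
    output wire       y
);
    // nested ?: operators select one of the four data bits
    assign y = (sel == 2'b00) ? d[0] :
               (sel == 2'b01) ? d[1] :
               (sel == 2'b10) ? d[2] : d[3];
endmodule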

TASK AND FUNCTIONS:

When the same code is repeated again and again, Verilog, like any other
programming language, provides a means to address repeatedly used code; these are
called tasks and functions. Functions and tasks have largely the same syntax; one
difference is that a task can contain delays, whereas a function cannot contain any
delay, which means a function can be used for modelling combinational logic. A
second difference is that a function can return a value, whereas a task cannot.

TEST BENCH:

A test bench needs to drive inputs and monitor outputs. It supplies the signals and
dumps the outputs to simulate the Verilog design modules. It instantiates the design
under test, generates the simulation input vectors and invokes system tasks to view or
format the results of the simulation. It is never synthesized, so it can use all Verilog
constructs.
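A minimal test bench sketch for the half_adder module from Chapter 2: it drives all four input combinations, monitors the outputs, and is intended for simulation only (names and delays are illustrative).

module tb_half_adder;
    reg  a, b;
    wire sum, carry;

    half_adder dut (.a(a), .b(b), .sum(sum), .carry(carry));   // design under test

    initial begin
        $monitor("time=%0t a=%b b=%b sum=%b carry=%b", $time, a, b, sum, carry);
        a = 0; b = 0;
        #10 a = 0; b = 1;
        #10 a = 1; b = 0;
        #10 a = 1; b = 1;
        #10 $finish;
    end
endmodule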
5. PRINCIPLE OF SELF-CHECKING FULL ADDER

We utilize the following relations for the full adder:

1. The Sum and Carry bits will be equal to each other when all three inputs are equal.

2. The Sum and Carry bits will be complements of each other when the three inputs
are not all equal.

By using the above two relations, a full adder can be self-checked with only the
expense of an equivalence tester (Eqt), as shown in Fig. 5.2. The purpose of the
equivalence tester is to check the equivalence of all inputs. Hence, apart from the Sum
and Carry bit calculation, we have to compute the logic for the equivalence tester, and
the expression for that purpose is given in Eq. (2.3). The functional block Eqt has been
designed to produce an active-low output so that we can use the fault-secure
exclusive-NOR (XNOR) gate for comparison.

Sum = A XOR B XOR Cin -------------- (2.1)

Cout = A.B + Cin.(A + B) -------------- (2.2)

Equivalence tester (Eqt) = [(A’.B’.Cin’) + (A.B.Cin)]’ -------------- (2.3)

Error (Err) = Sum XNOR Cout XNOR Eqt ------------- (2.4)


Fig. 5.1: Full adder with no logic sharing

The final error indication is computed by using two XNOR gates. The purpose of the
first XNOR gate (G1) is to check whether the Sum and Cout bits are equal or
complementary. We need a second XNOR gate (G2) because of the previously
mentioned observation that Sum and Cout will always complement each other except
when all inputs are equal. Thus, the output of G1 indicates the equality or difference
of Sum and Cout, and G2 verifies the output of G1 by comparing it with the
equivalence tester, and thus generates the final error indication. When Eqt is zero, the
outputs of G1 and G2 should be logic 1 and 0, respectively. On the other hand, if Eqt
indicates logic 1, then both XNOR gates should produce logic 0, and in any other case
a fault is indicated.

Fig. 5.2: Self-checking full adder
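A Verilog sketch of the self-checking full adder following Eqs. (2.1)-(2.4): the sum and carry share no logic, Eqt is the active-low equivalence test, and the two XNOR gates G1 and G2 produce the error flag, which stays at 0 during fault-free operation (module and signal names are illustrative).

module sc_full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout,
    output wire err
);
    assign sum  = a ^ b ^ cin;                          // Eq. (2.1)
    assign cout = (a & b) | (cin & (a | b));            // Eq. (2.2), no logic shared with sum

    wire eqt = ~((~a & ~b & ~cin) | (a & b & cin));     // Eq. (2.3), active-low equivalence tester
    wire g1  = ~(sum ^ cout);                           // G1: are Sum and Cout equal?
    assign err = ~(g1 ^ eqt);                           // G2 / Eq. (2.4): 0 when fault-free, 1 on a fault
endmodule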


6. SELF-CHECKING CSLA ARCHITECTURES

6.1 BEC BASED SELF-CHECKING CSLA


The basic m-bit single-stage CSLA includes two m-bit RCAs and an (m+1)-bit
multiplexer to select the proper m-bit sum and 1-bit output carry. Naturally, the
self-checking property is attained at the cost of some area, power, and delay overheads.
Thus, to obtain a low-overhead self-checking CSLA, the self-checking FA
described in Chapter 5 is applied to the area- and power-efficient BEC-based
CSLA. In this CSLA, the second RCA is replaced by the BEC logic, which includes
only one XOR gate and one AND gate in all bit positions except the first one, which
requires only a NOT gate. This way, the obtained self-checking single-stage CSLA
will have lower area and power compared to the designs based on two-pair two-rail
checkers, even after replacing all multiplexers and XOR gates with their self-checking
versions.

Except for the least significant bit (LSB), S0(j) and S1(j) have the following relation:
the Sum bit computed when the carry-in equals 0 will be the complement of the
corresponding Sum bit with carry-in equal to 1 only when all the lower Sum bits are
equal to logic 1.
Fig. 6.1.1 BEC based self-checking 2-bit single-stage CSLA

In general, we can say that:

If (S0(0).S0(1).S0(2) ... S0(j-1) = 1)

Then S1(j) = S0(j)';

Else S1(j) = S0(j);

where j indicates the bit position. Thus, in order to design an n-bit CSLA, the
generalized Boolean equations are given in Eqs. (6.1.1), (6.1.2), and (6.1.3). The
BEC-based 2-bit single-stage CSLA shown in Fig. 6.1.1 can be extended to an n-bit
design: all the intermediate Sum bits between the LSB and the final Cout are
generated by repeating the intermediate-bit module, while the LSB logic and the
module for the final Cout (MOFC) complete the n-bit self-checking CSLA.

S1(0) = S0(0)' ---------------(6.1.1)

S1(i) = S0(i) XOR (S0(0).S0(1) ... S0(i-1)), for 1 <= i <= n-1 ---------------(6.1.2)

C1out = C0out XOR (S0(0).S0(1) ... S0(n-1)) ---------------(6.1.3)
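A sketch of the BEC (add-one) logic that replaces the second RCA, following Eqs. (6.1.1)-(6.1.3): the Cin = 1 sum and carry are derived from the Cin = 0 RCA outputs through a chain of AND gates and one XOR per bit (module and signal names are illustrative).

module bec_add_one #(parameter N = 4) (
    input  wire [N-1:0] s0,      // sum bits of the RCA with Cin = 0
    input  wire         c0out,   // carry-out of the RCA with Cin = 0
    output wire [N-1:0] s1,      // sum bits assuming Cin = 1
    output wire         c1out    // carry-out assuming Cin = 1
);
    wire [N:0] k;                // k[i] = AND of the lower Cin = 0 sum bits
    assign k[0] = 1'b1;
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : bec
            assign s1[i]  = s0[i] ^ k[i];   // bit 0 is simply inverted (Eq. 6.1.1); higher bits follow Eq. 6.1.2
            assign k[i+1] = k[i] & s0[i];
        end
    endgenerate
    assign c1out = c0out ^ k[N];            // Eq. (6.1.3)
endmodule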

6.2 CGS BASED SELF-CHECKING CSLA

The logic operations of the RCA can be broken into four distinct operations, as shown
in Fig. 6.2.1 (this figure is redrawn with some modifications in the location of the two
lower blocks and the connecting lines between them). The four split operations are
half-sum generation (HSG), half-carry generation (HCG), full-sum generation (FSG),
and full-carry generation (FCG). These split operations can be applied to a single FA
as depicted in Fig. 6.2.2, in which logic sharing is used to reduce the overall area. The
shared logic between Sum and Cout is an XOR gate that produces the half-sum. Based
on Fig. 6.2.2, Eqs. (2.1) and (2.2) can be rewritten as Eqs. (6.2.1) and (6.2.2),
respectively:

Sum = A⨁B⨁Cin = HS⨁Cin ------------------ (6.2.1)

Cout = A.B + Cin.(A⨁B) = HC + Cin. HS -------------------(6.2.2)

in which HS and HC stand for half-sum and half-carry, respectively.


Fig. 6.2.1 Split logic operation of RCA


Fig 6.2.2 Different parts of shared-logic FA based on the split logic operations

Considering the two m-bit RCAs in the basic single-stage CSLA with respect to
Fig. 6.2.1 reveals that the HSG and HCG blocks in the two RCAs are identical
irrespective of the value of the input carry. Thus, one HSG and one HCG block can be removed.
Furthermore, due to the fact that the FCG blocks of two RCAs have no dependency
on the FSG blocks, the FCG blocks can be scheduled before the FSG blocks. Thus,
due to the nature of CSLA, one of two FCG blocks can be selected based on the real
input carry to later construct the final sum in an FSG block as well as the last Cout.
This way, one of two FSG blocks can be removed, as well.
Here, based on the discussions above, the formulations required to obtain the
CGS-based self-checking single-stage CSLA are presented with the aim of lowering
area, power and delay compared to the best existing design. As stated in Chapter 5,
the self-checking property necessitates that the outputs of the FA do not have any
shared logic. Therefore, the FA shown in Fig. 5.1 is used to construct the combined
RCAs of this CSLA. The comparison of the FAs in Fig. 5.1 and Fig. 6.2.2 reveals that
the FA without logic sharing (Fig. 5.1) includes an extra OR gate whose output can be
used as the Propagate signal P, since its inputs are the input operands A and B. In fact,
this OR gate is used instead of the XOR gate of the HSG in Fig. 6.2.2 to produce the
Cout in the FCG. In addition, the output of the AND gate whose inputs are the
operands A and B can be used as the Generate signal G; this signal is the output of the
HCG as well. Thus, the well-known Propagate and Generate signals P and G are used
in the new equations because the Cout of an FA can simply be stated in terms of P and
G based on Eq. (2.2). As a result, the following equations describe an m-bit
self-checking single-stage CSLA for 0 ≤ i ≤ m − 1 as the bit index:

S0(i) = A(i)⨁B(i), G(i) = A(i).B(i), P(i) = A(i) + B(i) ---------(6.2.3)

C0(i) = G(i) + P(i).C0(i-1), C0(0) = G(0) for C0(-1) = 0 ----------(6.2.4)

C1(i) = G(i) + P(i).C1(i-1), C1(0) = P(0) for C1(-1) = 1 ----------(6.2.5)

C(i) = C0(i)+Cin.C1(i), Cout = C(m-1) ----------(6.2.6)

S(i) = S0(i)⨁C(i-1), S(0) = S0(0)⨁Cin ---------(6.2.7)

Teq(i) = {A(i).B(i).C(i-1) + A(i)’.B(i)’.C(i-1)’}’, C(-1) = Cin ---------(6.2.8)

Err(i) = S(i)⨀C(i)⨀Teq(i) ---------(6.2.9)

In Eq. (6.2.3), S0(i) is the half-sum, G(i) is the half-carry, and P(i) is the redundant
logic needed for the ith FA without logic sharing. These logic parts are computed in
parallel and are commonly used in the two combined RCAs of the new CSLA. Eqs. (6.2.4)
and (6.2.5) show the parallel computation of the full-carry in the RCAs assuming that the
input carry is zero or one, respectively. In Eqs. (6.2.4) and (6.2.5), C0(-1) and C1(-1)
are the input carries that cause the full carry of the first FA to be equal to G(0) or P(0),
respectively. However, the real internal carries and the output carry Cout are selected
by the multiplexing logic of Eq. (6.2.6) once the input carry Cin arrives as the
selector of the multiplexer. Eq. (6.2.6) shows a selection logic simpler than a 2-to-1
MUX. This logic produces a proper result due to the fact that if C0(i) equals one, then
C1(i) equals one as well, and this leads to a correct result even if Cin equals one. Eq.
(6.2.7) shows the computation of the full-sum in all bit locations using the half-sum and
the input carry from the previous bit location computed by Eq. (6.2.6). However, the
first incoming carry C(-1) is the Cin for the first full-sum bit S(0).

Eqs. (6.2.8) and (6.2.9) are used to attain the self-checking property in all bit locations
of the proposed CSLA. Eq. (6.2.8) performs the test of equivalence (Teq) in all bit
locations for the input operands A and B along with their corresponding input carry.
Teq(i) is computed in parallel with S(i) from Eq. (6.2.7). Eq. (6.2.9) sets the error
signal Err(i) to one for each bit location i if the output bits of sum and carry in each
bit location are not compatible with the test of equivalence for that bit location. Due to
the fact that the error signals are independently computed for all bit locations,
multiple-fault detection (maximum m faults) is possible in the proposed self-checking
single-stage CSLA.
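A behavioural Verilog sketch of the m-bit CGS-based self-checking single-stage CSLA that follows Eqs. (6.2.3)-(6.2.9) directly; the error detection is kept per bit, so up to M simultaneous faults raise their own err bits (module, parameter and signal names are illustrative).

module cgs_sc_csla #(parameter M = 4) (
    input  wire [M-1:0] a,
    input  wire [M-1:0] b,
    input  wire         cin,
    output wire [M-1:0] s,
    output wire         cout,
    output wire [M-1:0] err      // one independent error flag per bit position
);
    wire [M-1:0] s0 = a ^ b;     // half-sum                  (Eq. 6.2.3)
    wire [M-1:0] g  = a & b;     // Generate / half-carry
    wire [M-1:0] p  = a | b;     // Propagate
    wire [M-1:0] c0, c1, c;

    genvar i;
    generate
        for (i = 0; i < M; i = i + 1) begin : bit_slice
            wire carry_in_i;
            if (i == 0) begin : first_bit
                assign c0[0] = g[0];                       // C0(0) = G(0)    (Eq. 6.2.4)
                assign c1[0] = p[0];                       // C1(0) = P(0)    (Eq. 6.2.5)
                assign carry_in_i = cin;                   // C(-1) = Cin
            end else begin : other_bits
                assign c0[i] = g[i] | (p[i] & c0[i-1]);    // Eq. (6.2.4)
                assign c1[i] = g[i] | (p[i] & c1[i-1]);    // Eq. (6.2.5)
                assign carry_in_i = c[i-1];
            end
            assign c[i] = c0[i] | (cin & c1[i]);           // reduced multiplexing (Eq. 6.2.6)
            assign s[i] = s0[i] ^ carry_in_i;              // full-sum             (Eq. 6.2.7)

            // test of equivalence and per-bit error detection (Eqs. 6.2.8, 6.2.9)
            wire teq = ~((a[i] & b[i] & carry_in_i) | (~a[i] & ~b[i] & ~carry_in_i));
            assign err[i] = ~(~(s[i] ^ c[i]) ^ teq);       // 0 when fault-free
        end
    endgenerate

    assign cout = c[M-1];
endmodule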

Fig. 6.2.3 illustrates the proposed self-checking single-stage CSLA based on the
described equations. In this figure, the upper block generates the half-sum in addition
to the Propagate and Generate signals (Eq. 6.2.3), where the latter is used as the
half-carry as well. Two FCG blocks, as the full-carry generation blocks, produce two
carry vectors in parallel based on Eqs. (6.2.4) and (6.2.5). One of these carry vectors will be
selected by the multiplexer which includes m 1-bit 2-to-1 MUXs. The other blocks
implement the remaining equations described before. For more illustration, Fig. 6.2.4
demonstrates the gate level implementation of different parts of the proposed CSLA
shown in Fig. 6.2.3.

Fig. 6.2.3 CGS based m-bit self-checking single-stage CSLA

Fig. 6.2.4: Gate level implementation of (a) half-sum, P and G generators, (b)
full-sum generator, (c) full-carry generator with the input carry equal to '0' (1st
FCG), (d) full-carry generator with the input carry equal to '1' (2nd FCG), (e)
test of equivalence block, and (f) error detection block

To clarify the self-checking property of the proposed single-stage CSLA, some
theorems can be proved based on Figs. 6.2.3 and 6.2.4 and Eqs. (6.2.3) to (6.2.9).
Beforehand, it should be noted that the fault model used in this project includes all
faults that can solely affect the output of an internal block, provided that all produced
erroneous values are at valid logic levels.

Theorem 1. The incorporated error detection scheme detects the error produced by a
logic fault affecting the half-sum generator or the full-sum generator or the
multiplexer or test of equivalence block or error detection block.

Proof.

(1) A fault affecting the half-sum or full-sum generators definitely changes one of the
bits of the output sum e.g. S(i) which is simply detected after computing Teq(i) and
passing S(i) through an error detection block that causes to error detection by setting
Err(i) to '1'.

(2) A fault affecting the multiplexer changes one of the bits of the C vector, e.g. C(i),
which is simply detected after computing Teq(i) and passing C(i) through an error
detection block that causes Err(i) to be equal to '1'. However, the erroneous C(i) alters
S(i+1) based on Eq. (6.2.7) or Fig. 6.2.4b, and may change Teq(i+1) based on
Fig. 6.2.4e, which possibly leads to error detection in the (i+1)th bit as well. In any
case, the erroneous C(i) will definitely be detected, since one raised error signal
(Err(i)) is enough for error detection.

(3) A fault affecting the test of equivalence block or the error detection block is simply
detected because if Teq(i) changes, or one of the XNOR gates producing Err(i)
(Fig. 6.2.4f) toggles its output, Err(i) will be set to '1'.

Theorem 2. The incorporated error detection scheme detects the error produced by a
logic fault affecting the Propagate or Generate signals or one of the first or second
full-carry generators.

Proof.

(1) A fault affecting P(0) or G(0) or one of the first or second full-carry generators
definitely changes one of the bits of the output carries, e.g. C0(i) or C1(i) according to
Figs. 6.2.4c and 6.2.4d, and can also corrupt the subsequent internal carries if the
error is propagated along the chain. Since only one of the output carry vectors C0 and
C1 is erroneous, if the multiplexer selects the fault-free output carry vector the error is
masked and the CSLA remains unaffected by the fault. However, if the multiplexer
selects the faulty output carry vector, the error will definitely be detected by Err(i).

(2) A fault affecting one of the P(i) or G(i) signals with i > 0 may produce zero (if
masked), one, or more erroneous internal carries (if propagated along the chains shown in
Figs. 6.2.4c and 6.2.4d) inside one or both of the full-carry generators. In the worst case,
the output carry vector C (Fig. 6.2.3) with one or more erroneous bits passes through the
multiplexer, and the error will definitely be detected in the first erroneous location by Err(i).

7. FAST ADD-ONE AND MULTIPLEXING BASED
SELF-CHECKING CSLA ARCHITECTURES

7.1. SELF-CHECKING CSLA WITH FAST ADD-ONE AND MULTIPLEXING


CIRCUIT
As stated in Chapter 1, the first category of CSLAs includes the designs utilizing an
add-one circuit instead of the RCA with Cin = 1. This method is the first effective
idea to decrease the area and power consumption of the basic CSLA. The BEC CSLA
(Fig. 6.1.1), as the best design of this category with a low delay overhead, requires
multiplexers to select between two pairs of results (two sums and two output carries). In
this section, the FAM CSLA (which stands for Fast Add-one and Multiplexing
CSLA) is implemented, in which the add-one operation is performed combined with
the MUX operation. This combination leads to an efficient CSLA architecture with
respect to delay, area, and power consumption. Moreover, to achieve more speed, two
fast FA structures are utilized inside the single RCA of the FAM CSLA.
Fig. 7.1.1 depicts the m-bit single-stage FAM CSLA. Different from the BEC
CSLA, there is no separate multiplexer in this architecture, which makes it
more compact and simpler. To prove the correctness of this CSLA's operation, some
logical expressions can be used. We show that the sum and output carry of the FAM
CSLA are the same as those of the BEC CSLA in an m-bit single-stage structure. In
fact, in the BEC CSLA, S1(i) is dependent on S0(i), and both enter a MUX; this
dependency helps to simplify the selection operation of the multiplexers. Starting
from the BEC relations of Section 6.1 and the multiplexer selection equations, we can
replace S1(i) in the selection equation by its BEC expression, and similarly C1out by
its BEC expression, to obtain the following equations for 0 ≤ i ≤ m-1:

S(i) = Cin’.S0(i), + Cin. (S0(i)⊕ C1 (i−1)) ------------(7.1.1)

Cout = Cin’.Cout0 + Cin. (Cout0⊕ C1 (m−1)) ------------(7.1.2)

Eq. (7.1.1) can be expanded as follows by replacing the first term with two terms
equivalent to it, and changing the XOR operation to AND-OR operations in order to
obtain a simpler equation:

S(i) = [Cin'.S0(i) + Cin'.S0(i).C1(i−1)'] + [Cin.S0(i).C1(i−1)' + Cin.S0(i)'.C1(i−1)]

     = Cin'.S0(i) + S0(i).C1(i−1)' + Cin.S0(i)'.C1(i−1)

     = S0(i).(Cin' + C1(i−1)') + S0(i)'.Cin.C1(i−1)

     = S0(i).(Cin.C1(i−1))' + S0(i)'.Cin.C1(i−1)

Fig. 7.1.1 m-bit FAM based single stage Self-Checking CSLA


which results in the following equation:
S(i) = S0(i)⊕ (Cin.C1(i−1)) -------------------(7.1.3)

where C1(–1) = 1. Similarly, the output carry can be stated by the following equation:
Cout = Cout0 ⊕ (Cin.C1(m − 1)) -----------------(7.1.4)

which is simplified to Eq. (7.1.5) using the fact that Cout0 and C1(m−1) are never
both equal to one.

Cout = Cout0 + Cin.C1(m−1) ------------------(7.1.5)

Eqs. (7.1.3) and (7.1.5) show the simplified equations of the FAM CSLA, and
Fig. 7.1.1 depicts their implementation. These equations lead to significant reductions
in delay, area, and power consumption compared to the BEC CSLA.
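A sketch of the m-bit FAM single-stage CSLA implementing Eqs. (7.1.3) and (7.1.5): the single Cin = 0 RCA reuses the rca module sketched in Chapter 2, c1 is the add-one carry chain (the AND of the lower Cin = 0 sum bits, with C1(-1) = 1), and the per-bit Teq/Err logic of Chapter 6 is omitted here for brevity (module and signal names are illustrative).

module fam_csla #(parameter M = 4) (
    input  wire [M-1:0] a,
    input  wire [M-1:0] b,
    input  wire         cin,
    output wire [M-1:0] s,
    output wire         cout
);
    wire [M-1:0] s0;
    wire         c0out;

    rca #(.N(M)) rca_cin0 (.a(a), .b(b), .cin(1'b0), .sum(s0), .cout(c0out));  // single RCA with Cin = 0

    wire [M:0] c1;                 // add-one carry chain
    assign c1[0] = 1'b1;           // C1(-1) = 1
    genvar i;
    generate
        for (i = 0; i < M; i = i + 1) begin : fam
            assign s[i]    = s0[i] ^ (cin & c1[i]);   // Eq. (7.1.3): merged add-one and multiplexing
            assign c1[i+1] = c1[i] & s0[i];
        end
    endgenerate
    assign cout = c0out | (cin & c1[M]);              // Eq. (7.1.5)
endmodule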
The same test of equivalence and error detection equations, Eqs. (6.2.8) and (6.2.9),
are used to attain the self-checking property in all bit locations of the FAM-based
CSLA. Teq(i) is computed in parallel with S(i), and Err(i) is set to one for bit location
i if the output sum and carry bits in that location are not compatible with the test of
equivalence for that bit location. Because the error signals are computed independently
for all bit locations, multiple-fault detection (a maximum of m faults) is possible in
this self-checking single-stage CSLA.

Fig. 7.1.1 illustrates the proposed self-checking single-stage CSLA based on the
described equations. In this figure, the upper block computes the sum with Cin taken
as '0', while the add-one and multiplexing circuit below it adds '1' to that sum when
required. Two carry vectors are generated, one by the fast RCA block and the other by
the add-one and multiplexing circuit, and one of these carry vectors is selected by the
reduced multiplexing logic inside the add-one and multiplexing circuit. The other
blocks implement the remaining equations described before. For more illustration,
Fig. 7.1.2 demonstrates the gate level implementation of different parts of the
proposed CSLA shown in Fig. 7.1.1.

Fig. 7.1.2: Gate level implementation of (a) add-one and multiplexing block,
(b) test of equivalence block, and (c) error detection block
7.2. FAM BASED SELF-CHECKING CSLA WITH MUX BASED RCA

To achieve more speed in the FAM-based self-checking CSLA discussed in
Section 7.1, the RCA in each group is constructed using fast FA structures. Among
the gate-level FA designs investigated, the basic FA is area-optimized but its sum and
output carry share common logic. Two faster gate-level FA designs are therefore
utilized in the FAM CSLA; based on synthesis results, they provide a lower delay for
the carry logic than the basic FA. The first of these is a low-area MUX-based FA.
However, to achieve fault-detection capability, the carry and sum generation must not
share any common logic, so an additional XOR gate is introduced in each FA of the
MUX-based RCA.

Fig. 7.2.1 Fast MUX-based m-bit RCA with Cin=0



7.3. FAM BASED SELF-CHECKING CSLA WITH SKIP LOGIC BASED


RCA

To achieve more speed and lower delay in the FAM CSLA, the RCA in each group is
constructed using fast FA structures. The second fast FA is a high-speed design that
we call the skip-logic FA. This logic, which includes one OR and one AND gate, is
similar to the skip logic in the carry skip adder, and results in more speed when it is
written as a separate module so that it is mapped onto a complex gate during
synthesis. As this FA design shares common logic between the output carry and sum,
an additional XOR gate is introduced in each FA to make it capable of fault detection,
so that it is well suited for the implementation of a high-speed self-checking CSLA.

Fig. 7.3.1 Fast Skip logic-based m-bit RCA with Cin=0


8. DELAY OPTIMIZED SELF-CHECKING MULTI-STAGE CSLA
In order to fully benefit from the concurrent operations of a CSLA and reach a
low-delay addition, a multi-stage CSLA should be constructed by chaining single-stage
CSLAs, which can be either of the same size or of different sizes. In a multi-stage
CSLA, the maximum delay is not set by the delays of the sum preparation in the
groups; rather, it depends on the delays of the outgoing carries of the groups. The
SQRT grouping used for the CSLA is illustrated in the following.

8.1. SQRT grouping

In the SQRT grouping structure, the increasing size of the groups is based on the fact
that the delay of an m-bit group, independent of the other groups, equals the sum of
the delays of an m-bit RCA and a 1-bit 2-to-1 MUX. Thus, to fully exploit the
concurrency, the delay of the RCAs in the next group should equal this sum, which
means the size of the next group should be larger than the size of the current group.
For example, in the 16-bit SQRT CSLA, the output carry of the first 2-bit RCA will
be ready for the first multiplexer along with the output carries of the 2-bit RCAs of
the first group. In addition, the outgoing carry of the first group should be ready as
the selector of the second multiplexer along with the output carries of the 3-bit RCAs
of the second group. This is true for the next groups as well. This way, ideally, all of
the inputs of the last 1-bit 2-to-1 MUX (for selecting the outgoing carry) in a group
should be ready at the same time.
For the 8-bit multi-stage CSLA, the best SQRT grouping includes a 2-bit RCA and
two groups with sizes of 2 and 4 bits, respectively, although groups with sizes of 2
and 3 bits followed by a 1-bit group can be utilized as well; however, the latter
grouping requires more area. As stated before, the 16-bit SQRT CSLA consists of a
2-bit RCA and groups with sizes of 2, 3, 4, and 5 bits, respectively. For the 32-bit
SQRT CSLA, after the first 2-bit RCA, groups with sizes of 2, 3, 4, 6, 7, and 8 bits
lead to the minimum delay, and for the 64-bit SQRT CSLA, after the first 2-bit RCA,
groups with sizes of 2, 3, 4, 6, 7, 8, 9, 11 and 12 bits are used.
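As an illustration of the 16-bit SQRT grouping described above, the groups of 2, 2, 3, 4 and 5 bits can be chained as follows, reusing the rca and csla_stage modules sketched in Chapter 2; the self-checking logic is omitted and all names are illustrative.

module sqrt_csla16 (
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        cin,
    output wire [15:0] sum,
    output wire        cout
);
    wire c2, c4, c7, c11;   // carries leaving each group

    rca        #(.N(2)) g0 (.a(a[1:0]),   .b(b[1:0]),   .cin(cin), .sum(sum[1:0]),   .cout(c2));   // ungrouped 2-bit RCA
    csla_stage #(.N(2)) g1 (.a(a[3:2]),   .b(b[3:2]),   .cin(c2),  .sum(sum[3:2]),   .cout(c4));
    csla_stage #(.N(3)) g2 (.a(a[6:4]),   .b(b[6:4]),   .cin(c4),  .sum(sum[6:4]),   .cout(c7));
    csla_stage #(.N(4)) g3 (.a(a[10:7]),  .b(b[10:7]),  .cin(c7),  .sum(sum[10:7]),  .cout(c11));
    csla_stage #(.N(5)) g4 (.a(a[15:11]), .b(b[15:11]), .cin(c11), .sum(sum[15:11]), .cout(cout));
endmodule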
9. RESULTS

9.1. RESULTS OF 16-BIT BEC BASED SELF-CHECKING CSLA

9.2. RESULTS OF 32-BIT BEC BASED SELF-CHECKING CSLA

9.3. RESULTS OF 64-BIT BEC BASED SELF-CHECKING CSLA

9.4. RESULTS OF 16-BIT CGS BASED SELF-CHECKING CSLA

9.5. RESULTS OF 32-BIT CGS BASED SELF-CHECKING CSLA

9.6. RESULTS OF 64-BIT CGS BASED SELF-CHECKING CSLA

9.7. RESULTS OF 16-BIT FAM BASED SELF-CHECKING CSLA USING MUX-BASED RCA

9.8. RESULTS OF 32-BIT FAM BASED SELF-CHECKING CSLA USING MUX-BASED RCA

9.9. RESULTS OF 64-BIT FAM BASED SELF-CHECKING CSLA USING MUX-BASED RCA
