21EC503 VLSI Design Unit IV

Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of RMK Group of Educational Institutions. If you have received this document through email in error, please notify the system manager. This document contains proprietary information and is intended only for the respective group / learning community as intended. If you are not the addressee, you should not disseminate, distribute or copy it through e-mail. Please notify the sender immediately by e-mail if you have received this document by mistake and delete this document from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
R.M.D ENGINEERING COLLEGE
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
21EC503 - VLSI DESIGN (Lab Integrated)
Department : Electronics and Communication Engineering
Batch/Year : 2021-2025/III
Created by : Ms. P. Santhoshini
             Ms. S. Gayathri Priya
Date : 23.09.2023
TABLE OF CONTENTS
S.No Contents
1 Course Objectives
2 Pre Requisites
3 Syllabus
4 Course Outcomes
5 CO-PO/PSO Mapping
6 Unit IV - DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM
6.1 Lecture Plan
6.2 Activity Based Learning
6.3 Lecture Notes
    Data Path Circuits
    Adders
    Ripple Carry Adder
    The Mirror Adder
    Transmission Gate Based Adder
    Carry Look Ahead Adder
    High Speed Adders
    Multiplier
    Barrel Shifter
    ALU
    Designing Memory and Array Structures
6.4 Assignments
6.5 Part A Q & A
6.6 Part B Questions
6.7 Supportive Online Certification Courses
6.8 Real Time Applications in Day to Day Life and to Industry
6.9 Contents Beyond the Syllabus
7 Assessment Schedule
8 Prescribed Text Books & Reference Books
9 Mini Project Suggestions
1. COURSE OBJECTIVE
OBJECTIVES:
❖ To study the fundamentals of CMOS circuits and their characteristics.
❖ To learn the design and realization of combinational and sequential digital circuits.
❖ To study the architectural choices and performance tradeoffs involved in designing and realizing circuits in CMOS technology.
❖ To learn the different FPGA architectures and the testability of VLSI circuits.
❖ To learn a Hardware Description Language (Verilog / VHDL) and to become familiar with fusing logic modules onto FPGAs.
2. PRE REQUISITES
1. 21EC303 - DIGITAL ELECTRONICS
By learning this course, the student will have a thorough knowledge of designing combinational and sequential circuits.
2. 21EC404 - LINEAR INTEGRATED CIRCUITS
By learning this course, the student will have deep insight into the fabrication and design of ICs.
3. SYLLABUS
Subject Code | Subject Name | L | T | P | C
21EC503 | VLSI Design (Lab Integrated) | 3 | 0 | 2 | 4
UNIT I INTRODUCTION TO MOS TRANSISTOR 15

MOS Transistor, CMOS logic, Inverter, Layout Design Rules, Gate Layouts, Stick
Diagrams, Long-Channel I-V Characteristics, C-V Characteristics, Non ideal I-V
Effects, DC Transfer characteristics, RC Delay Model, Elmore Delay, Linear Delay
Model, Logical effort, Parasitic Delay, Delay in Logic Gate, Scaling.
LIST OF EXPERIMENTS
1. Design of inverter using LT-SPICE
2. Layout verification of CMOS inverter, NOR and NAND gates
UNIT II COMBINATIONAL MOS LOGIC CIRCUITS 15

Circuit Families: Static CMOS, Ratioed Circuits, Cascode Voltage Switch Logic,
Dynamic Circuits, Pass Transistor Logic, Transmission Gates, Domino, Dual Rail
Domino, CPL, DCVSPG, DPL, CMOS Power Dissipation. Design of combinational
circuits using Verilog.
LIST OF EXPERIMENTS
1. Design of adder and subtractor
2. Design of multiplexer and demultiplexer
UNIT III SEQUENTIAL CIRCUIT DESIGN 15

Static latches and Registers, Dynamic latches and Registers, Pulse Registers,
Pipelining, Schmitt Trigger, Monostable Sequential Circuits, Astable Sequential
Circuits. Timing Issues: Timing Classification of Digital System, Synchronous Design,
Design of sequential circuits using Verilog.
LIST OF EXPERIMENTS
1. Design of Flipflops
2. Design of counter
3. Design of universal shift register
4. Design of Mealy and Moore State Machines
5. Design of Random Access Memory
UNIT IV DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM 15
Arithmetic Building Blocks: Data Paths, Adders, Multipliers, Shifters, ALUs, power
and speed tradeoffs, Designing Memory and Array structures: Memory Architectures
and Building Blocks, Memory Core, Memory Peripheral Circuitry.
LIST OF EXPERIMENTS
1. Design of Arithmetic Logic Unit
2. Design of Ripple Carry Adder
3. Design of Carry Select Adder
4. Design of Multiplier
UNIT V IMPLEMENTATION STRATEGIES AND TESTING 15
FPGA Building Block Architectures, FPGA Interconnect Routing Procedures. Design
for Testability: Ad Hoc Testing, Scan Design, BIST, IDDQ Testing, Boundary Scan.
4. COURSE OUTCOMES
After successful completion of the course, the students should be able to:
CO | Course Outcomes | Highest Cognitive Level
CO1 | Understand the fundamental principles of VLSI circuit design in the digital domain | K2
CO2 | Realize combinational circuits using different logic families | K3
CO3 | Understand memory design in sequential logic circuits | K3
CO4 | Analyze the architectural choices and performance tradeoffs involved in data path unit design | K3
CO5 | Understand the different FPGA architectures and their testing | K2
CO6 | Design and simulate logic modules to verify their functionality using EDA tools, and become familiar with fusing logic modules onto FPGAs | K2
Program Outcomes (PO)
Engineering Graduates will be able to:
PO1 Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2 Problem Analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3 Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6 The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7 Environment and Sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9 Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 Project Management and Finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12 Lifelong Learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Program Specific Outcomes (PSO)
Electronics and Communication Engineering Graduates will be able to:
PSO1 To analyze, design and develop solutions by applying foundational concepts of Electronics and Communication Engineering.
PSO2 To apply design principles and best practices for developing quality products for scientific and business applications.
PSO3 To adapt to emerging information and communication technologies (ICT) to innovate ideas and solutions to existing/novel problems.
5. CO-PO/PSO Mapping
Course Outcome | Level of CO | PO-1 | PO-2 | PO-3 | PO-4 | PO-5 | PO-6 | PO-7 | PO-8 | PO-9 | PO-10 | PO-11 | PO-12 | PSO-1 | PSO-2 | PSO-3
Level of PO/PSO | | K3 | K4 | K4 | K5 | K3,K5,K6 | A3 | A2 | A3 | A3 | A3 | A3 | A2 | K6 | K5 | K3
CO1 | K2 | 2 | 1 | 1 | - | - | - | - | - | - | - | - | - | - | 1 | 1
CO2 | K3 | 1 | 2 | - | - | - | - | - | - | - | - | - | - | - | 2 | 1
CO3 | K3 | 2 | 1 | 2 | - | - | - | - | - | - | - | - | - | - | 1 | 1
CO4 | K3 | 1 | 2 | 1 | - | - | - | - | - | - | - | - | - | - | 1 | 2
CO5 | K3 | 1 | 2 | - | - | - | - | - | - | - | - | - | - | - | 1 | 2
CO6 | K2 | 2 | 1 | 1 | - | - | - | - | - | - | - | - | - | - | 1 | 1
UNIT IV DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM
Arithmetic Building Blocks: Data Paths, Adders, Multipliers, Shifters, ALUs, power
and speed tradeoffs, Designing Memory and Array structures: Memory Architectures
and Building Blocks, Memory Core, Memory Peripheral Circuitry.
LECTURE PLAN
UNIT IV - DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM
S.No | Topic | No. of Periods | Proposed Date | Actual Lecture Date | Pertaining CO | Taxonomy Level | Mode of Delivery | Reason for Deviation
1 | Arithmetic Building Blocks: Data Paths | 1 | | | CO3 | K3 Apply | Chalk and talk | -
2 | Adders, Multipliers | 1 | | | CO3 | K3 Apply | Chalk and talk | -
3 | Shifters, ALUs | 1 | | | CO3 | K3 Apply | Chalk and talk | -
4 | Power and speed tradeoffs | 1 | | | CO3 | K3 Apply | Chalk and talk | -
5 | Designing Memory and Array structures | 1 | | | CO3 | K3 Apply | Chalk and talk | -
6 | Memory Architectures and | 1 | | | CO3 | K3 Apply | Chalk and talk | -
7 | Building Blocks | 1 | | | CO3 | K3 Apply | Chalk and talk | -
8 | Memory Core | 1 | | | CO3 | K2 Understand | Chalk and talk | -
9 | Memory Peripheral Circuitry | 1 | | | CO3 | K1 Remember | Chalk and talk | -
Total No. of Periods: 9

6.2 Activity Based Learning
1. Crossword Puzzles:
The students were helped to understand the basic concepts of memories, multipliers and adders through a short discussion using the crossword puzzles mentioned below.
2. Role Play:
A group of 10 students was given the following topic and instructed to demonstrate a role play: "Comparing the ALUs, power and speed tradeoff in designing the circuits".
6.3 LECTURE NOTES
UNIT IV DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM
4.1 DATA PATH CIRCUITS:

A digital processor architecture consists of the data path, memory, control and input/output blocks. The data path is the core of the processor, where all computations are performed.
Fig 4.1 Digital Processor

A typical data path consists of an interconnection of basic computational functions, such as arithmetic operators (addition, multiplication, comparison and shift) or logic operators (AND, OR and XOR).
Bit sliced Data Path Circuits:

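Data paths are often arranged in a bit-sliced organization. Instead of operating on single-bit digital signals, the data in a processor are arranged in a word-based fashion: since the same operation is usually applied to every bit of a data word, the data path consists of identical bit slices, each operating on a single bit.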
Fig 4.2 Bit sliced data paths
Adders:
Addition forms the basis for many processing operations, from ALUs to address
generation to multiplication to filtering. As a result, adder circuits that add two binary
numbers are of great interest to digital system designers.
Single-Bit Addition
The half adder of Figure 4.3 (a) adds two single-bit inputs, A and B. The result is 0, 1, or 2, so two bits are required to represent the value; they are called the sum S and the carry-out Cout. The carry-out is equivalent to a carry-in to the next more significant column of a multibit adder, so it can be described as having double the weight of the other bits. If multiple adders are to be cascaded, each must be able to receive the carry-in. Such a full adder, shown in Figure 4.3 (b), has a third input called C or Cin.
Figure 4.3 (a) Half adder

Full Adder:
Fig 4.3 (b) Truth table of Full adder

A and B are the adder inputs, Ci is the carry input, S is the sum output and Co is the carry output. The Boolean expressions for the sum and carry outputs are:
$$S = A \oplus B \oplus C_i$$
$$C_o = AB + BC_i + AC_i$$
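Express Sum and Carry as functions of P, G and D
Define three new variables which depend ONLY on A and B:
$$G = AB \;\text{(Generate)}, \qquad P = A \oplus B \;\text{(Propagate)}, \qquad D = \overline{A}\,\overline{B} \;\text{(Delete)}$$
We can rewrite S and Co as functions of P and G (or D):
$$C_o(G, P) = G + P C_i$$
$$S(G, P) = P \oplus C_i$$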
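The G/P/D formulation can be checked exhaustively with a small behavioral model. The sketch below is plain Python rather than a circuit netlist; it assumes bits are the integers 0 and 1, and the function names `gpd` and `full_adder` are illustrative only. It also confirms that for every input pair exactly one of G, P and D is true.

```python
from itertools import product

def gpd(a, b):
    """Generate, propagate and delete signals: they depend only on A and B."""
    g = a & b              # G = AB: a carry is produced whatever Ci is
    p = a ^ b              # P = A xor B: Ci is passed on to Co
    d = (1 - a) & (1 - b)  # D = A'B': the carry is killed whatever Ci is
    return g, p, d

def full_adder(a, b, ci):
    """Sum and carry-out written through G and P."""
    g, p, _ = gpd(a, b)
    co = g | (p & ci)      # Co = G + P*Ci
    s = p ^ ci             # S = P xor Ci
    return s, co

# Print the full-adder truth table and check it against A + B + Ci = 2*Co + S
print(" A B Ci | S Co")
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert a + b + ci == 2 * co + s
    print(f" {a} {b} {ci}  | {s} {co}")
```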
4.2.2 Ripple Carry Adder:

Multiple full adder circuits can be cascaded to add two N-bit numbers; an N-bit parallel adder of this kind needs N full adder circuits. A ripple carry adder is a logic circuit in which the carry-out of each full adder is the carry-in of the next more significant full adder. It is called a ripple carry adder because each carry bit ripples into the next stage. In a ripple carry adder, the sum and carry-out bits of any full adder stage are not valid until the carry-in of that stage arrives. The reason is the propagation delay inside the logic circuitry; propagation delay is the time elapsed between the application of an input and the occurrence of the corresponding output.
Figure 4.4 Ripple carry adder
The propagation delay of such a structure (also called the critical path) is defined as the worst-case delay over all possible input patterns. In a ripple carry adder, the worst-case delay occurs when a carry generated at the LSB position propagates all the way to the MSB. This carry is finally consumed in the last stage to produce the sum. The delay is proportional to the number of bits N in the input words and is approximated by
$$t_{adder} = (N-1)\,t_{carry} + t_{sum}$$
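The ripple behaviour and the delay formula above can be sketched as a bit-serial Python model. This is a behavioral illustration only; the names `ripple_carry_add` and `ripple_delay` are hypothetical, and the unit gate delays in the usage line are assumed values, not process data.

```python
def ripple_carry_add(a: int, b: int, n: int, ci: int = 0):
    """Add two n-bit unsigned numbers; the carry ripples from the LSB to the MSB."""
    carry = ci
    s = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        g = ai & bi
        s |= (p ^ carry) << i     # S_i = P_i xor C_i
        carry = g | (p & carry)   # C_(i+1) = G_i + P_i C_i, fed to the next stage
    return s, carry

def ripple_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay t_adder = (N - 1) t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

# 0111 + 0001: the carry generated at the LSB ripples through every stage
print(ripple_carry_add(0b0111, 0b0001, 4))   # (8, 0)
print(ripple_delay(16, 1.0, 1.0))            # 16.0 with assumed unit delays
```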
Static CMOS full adder design:
Figure 4.5 Static CMOS Full adder design
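Inversion Property
This property is used in full adder design: inverting all inputs to a full adder results in inverted values for all outputs.
[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co, next to the same FA with all inputs and outputs inverted]
Applied to the Boolean expressions of the full adder, the inversion property gives:
$$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$$
$$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$$
In a ripple carry adder, this property minimizes the critical path by removing the inverting stages from the carry chain: alternate stages operate on inverted inputs, so the carry does not pass through an extra inverter in every stage.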
Figure 4.6 Exploit inversion property
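A minimal Python sketch of the idea, assuming an idealized adder cell whose outputs are inverted (as in a mirror adder without output inverters): the property is first checked exhaustively, then used to build a ripple adder in which even stages take true inputs and odd stages take inverted A and B. The names `inverting_cell` and `ripple_with_inversion` are hypothetical; this models logic polarity only, not timing.

```python
from itertools import product

def fa(a, b, ci):
    """Reference full adder: returns (S, Co)."""
    s = a ^ b ^ ci
    co = (a & b) | (b & ci) | (a & ci)   # majority of the three inputs
    return s, co

def inv(x):
    return 1 - x

# Exhaustive check of the inversion property
for a, b, ci in product((0, 1), repeat=3):
    s, co = fa(a, b, ci)
    s_n, co_n = fa(inv(a), inv(b), inv(ci))
    assert s_n == inv(s) and co_n == inv(co)

def inverting_cell(a, b, ci):
    """Assumed cell with inverted outputs: returns (S', Co')."""
    s, co = fa(a, b, ci)
    return inv(s), inv(co)

def ripple_with_inversion(a, b, n):
    """n-bit ripple adder whose carry chain contains no inverters.

    Even stages take true inputs and produce an inverted carry; odd stages take
    inverted A, B plus that inverted carry and, by the inversion property,
    produce true-polarity outputs.
    """
    s_word, c = 0, 0                       # c: carry into stage 0, true polarity
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            s_bar, c = inverting_cell(ai, bi, c)          # c now holds C(i+1)'
            si = inv(s_bar)                               # inverter only on the sum
        else:
            si, c = inverting_cell(inv(ai), inv(bi), c)   # outputs are true polarity
        s_word |= si << i
    if n % 2 == 1:
        c = inv(c)                         # odd n: the final carry is still inverted
    return s_word, c
```

Only the sum outputs of the even stages need an inverter; the carry moves from stage to stage without one, which is the point of Figure 4.6.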
The Mirror Adder:
❖ The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry. When laying out the cell, the most critical issue is the minimization of the capacitance at node Co; the reduction of the diffusion capacitances is particularly important.
❖ The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
❖ The transistors connected to Ci are placed closest to the output. Only the transistors in the carry stage have to be optimized for speed; all transistors in the sum stage can be of minimal size.
Figure 4.7 The Mirror Adder
4.2.3 TRANSMISSION GATE BASED ADDER:
A full adder can be designed using multiplexers and XORs implemented with transmission gates. The propagate signal, which is the XOR of inputs A and B, is used to select the true or the complementary value of the input carry as the new sum output. This design requires only 16 transistors.
Figure 4.8 Transmission gate based full adder
A rather different full adder design uses transmission gates to form multiplexers and XORs. Figure (4.5) shows the transistor-level schematic using 24 transistors and providing buffered outputs of the proper polarity with equal delay. The design can be understood by parsing the transmission gate structures into multiplexers and an “invertible inverter” XOR structure. Note that the multiplexer choosing S is configured to compute P ⊕ C.
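The multiplexer view of the transmission gate adder can be sketched in Python. The sum multiplexer follows the text (P selects Ci' or Ci). The carry multiplexer, Co = P ? Ci : A, is an assumption here: it is the usual choice in multiplexer-based transmission gate adders and works because A equals B whenever P is 0. The names `mux` and `tg_full_adder` are illustrative.

```python
from itertools import product

def mux(sel, in1, in0):
    """2:1 multiplexer (a pair of transmission gates): in1 when sel = 1, else in0."""
    return in1 if sel else in0

def tg_full_adder(a, b, ci):
    p = a ^ b                  # propagate, formed by the transmission-gate XOR
    s = mux(p, 1 - ci, ci)     # S = P ? Ci' : Ci, i.e. P xor Ci
    co = mux(p, ci, a)         # assumed carry mux: Co = P ? Ci : A (A = B when P = 0)
    return s, co

# Exhaustive check against the arithmetic definition A + B + Ci = 2*Co + S
for a, b, ci in product((0, 1), repeat=3):
    s, co = tg_full_adder(a, b, ci)
    assert a + b + ci == 2 * co + s
```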
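4.2.4 Carry look ahead adder:
A carry look-ahead adder reduces the propagation delay by introducing more complex hardware. In this design, the ripple carry design is transformed so that the carry logic over fixed groups of bits of the adder is reduced to two-level logic.
The sum output and carry output of bit i can be expressed in terms of the carry generate and carry propagate signals as
$$S_i = P_i \oplus C_i, \qquad C_{i+1} = G_i + P_i C_i$$
Carry look ahead principle:
Substituting C1 into C2, then C2 into C3, and C3 into C4 gives
$$C_1 = G_0 + P_0 C_0$$
$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$$
$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$$
$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$$
Every carry now depends only on the G and P signals and on C0, so all carries can be computed in parallel.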
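To determine whether a bit pair will generate a carry, the following logic works:
$$G_i = A_i B_i$$
To determine whether a bit pair will propagate a carry, either of the following logic statements works:
$$P_i = A_i \oplus B_i \qquad \text{or} \qquad P_i = A_i + B_i$$
The OR form is valid in the carry equations, but the sum S_i = P_i ⊕ C_i requires the XOR form.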
Figure 4.9 Carry Look ahead adder
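The four expanded carry equations can be transcribed directly into Python and compared with ordinary addition. This is a behavioral sketch of the 4-bit look-ahead logic only (the name `cla4` is hypothetical); it shows that each carry is a two-level AND-OR function of G, P and C0.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry look-ahead: every carry is a two-level function of G, P and C0."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # G_i = A_i B_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # P_i = A_i xor B_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (p[i] ^ carries[i]) << i   # S_i = P_i xor C_i
    return s, c4

print(cla4(0b1011, 0b0110))   # 11 + 6 = 17 -> (1, 1)
```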
The carry look ahead equations can be implemented using a mirror structure.
Figure 4.10 Mirror implementation of 4 bit look ahead adder
Design Example: Implementing a Look ahead adder in Dynamic logic:
Generate Block: $$G_i = A_i B_i$$
Propagate Block: $$P_i = A_i \oplus B_i$$
Fig 4.11 Propagate and generate with dynamic gates
One way of implementing the sum in domino logic is through sum selection. Two candidate sums are computed, one for each possible value of the incoming carry: $S_i^0 = A_i \oplus B_i$ (for C_i = 0) and $S_i^1 = \overline{A_i \oplus B_i}$ (for C_i = 1). A dynamic gate is then used to select one of these possibilities based on the incoming carry. The implementation of the multiplexer gate requires three logic levels, because no complementary carry is available in domino logic. Keepers should be placed at all dynamic nodes.
Fig 4.12 sum select in dynamic logic
4.5 HIGH SPEED ADDERS:
4.5.1 MANCHESTER CARRY CHAIN ADDER:
The carry propagation circuit can be simplified by adding Generate and Delete signals. The propagate path is unchanged: it passes Ci to the Co output if the propagate signal (Ai ⊕ Bi) is true. If the propagate condition is not satisfied, the output is either pulled low by the Di signal or pulled high by Gi'.
Fig 4.13 Static Manchester carry gates using Propagate/generate/kill

The dynamic implementation makes further simplification possible. Since the transitions in a dynamic circuit are monotonic, the transmission gates can be replaced by NMOS-only pass transistors. Precharging the output eliminates the need for the kill signal.
Fig 4.14 Dynamic Manchester carry gates using Propagate/generate
A Manchester carry chain adder uses a cascade of pass transistors to implement the carry chain. During the precharge phase (Φ = 0), all intermediate nodes of the pass-transistor carry chain are precharged to VDD. During evaluation, a node is discharged when there is an incoming carry and the propagate signal Pk is high, or when the generate signal for stage k (Gk) is high.
Fig 4.15 Manchester carry chain adder in dynamic logic
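The precharge/evaluate behaviour of the chain can be sketched as a one-cycle Python model. It is a logic-level illustration under simplifying assumptions (ideal pass transistors, G and P never both high, as holds for P = A ⊕ B); the name `manchester_dynamic` is hypothetical. Each node starts high after precharge and can only be discharged, so a low node means a carry of 1: the chain carries inverted carries.

```python
def manchester_dynamic(p_bits, g_bits, c_in):
    """One evaluation cycle of a dynamic Manchester carry chain (behavioral sketch)."""
    nodes = [1] * len(p_bits)      # precharge phase (phi = 0): every node at VDD
    carry = c_in
    for k, (p, g) in enumerate(zip(p_bits, g_bits)):
        # evaluation phase (phi = 1): discharge if G_k, or if P_k and an incoming carry
        if g or (p and carry):
            nodes[k] = 0           # monotonic: a node is only ever pulled down
        carry = 1 - nodes[k]       # carry into stage k+1 is the complement of node k
    return nodes, carry

# All propagate signals high: the incoming carry discharges the whole chain
print(manchester_dynamic([1, 1, 1, 1], [0, 0, 0, 0], 1))   # ([0, 0, 0, 0], 1)
```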
4.5.2 CARRY BYPASS ADDER:
Consider the 4-bit adder shown in Fig 4.16. Suppose that the values of Ak and Bk (k = 0, 1, 2, 3) are such that all propagate signals P0, P1, P2 and P3 are high. An incoming carry Ci,0 = 1 then propagates through the complete adder chain and causes an outgoing carry Co,3 = 1.
In other words, if P0P1P2P3 = 1, then Co,3 = Ci,0; otherwise the carry is generated or deleted within the block. When BP = P0P1P2P3 = 1, the incoming carry is immediately forwarded to the next block. Hence the name carry bypass adder or carry skip adder.
[Figure: a chain of four full adders (FA) with propagate/generate inputs P0-P3, G0-G3 and carry-in Ci,0 producing Co,0 ... Co,3; below it, the same chain with BP = P0P1P2P3 driving a multiplexer that outputs Co,3]
Idea: if P0·P1·P2·P3 = 1, then Co,3 = Ci,0; otherwise the block itself kills or generates the carry.
Fig 4.16 Carry bypass structure
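Let us now compute the delay of an N-bit adder. We assume that the total adder is divided into (N/M) equal-length bypass stages, each of which contains M bits. In the worst case, the carry ripples through the first block, skips the following N/M - 1 blocks through their bypass multiplexers, and ripples through the last block to produce the final sum:
$$t_{adder} = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M - 1)\,t_{carry} + t_{sum}$$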
Fig 4.16 N = 16 carry bypass adder
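A behavioral Python sketch of the carry bypass adder and its delay formula follows. The block structure (N/M blocks of M ripple bits, with BP selecting the bypass path) follows the text; the function names and the unit delays in the usage line are assumptions for illustration. Note that the bypass multiplexer never changes the logical result, because when BP = 1 the rippled carry equals the block's carry-in anyway; it only shortens the worst-case timing path.

```python
def bypass_adder(a: int, b: int, n: int, m: int, ci: int = 0):
    """n-bit carry-bypass adder built from n/m blocks of m ripple-carry bits."""
    assert n % m == 0
    s, carry = 0, ci
    for blk in range(n // m):
        c_block_in = carry
        bp = 1
        for k in range(blk * m, (blk + 1) * m):
            ak, bk = (a >> k) & 1, (b >> k) & 1
            p, g = ak ^ bk, ak & bk
            bp &= p                    # BP = P0 P1 ... P(M-1) of this block
            s |= (p ^ carry) << k
            carry = g | (p & carry)    # ripple inside the block
        if bp:
            carry = c_block_in         # bypass multiplexer forwards the block carry-in
    return s, carry

def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """t_adder = t_setup + M t_carry + (N/M - 1) t_bypass + (M - 1) t_carry + t_sum."""
    return t_setup + m * t_carry + (n // m - 1) * t_bypass + (m - 1) * t_carry + t_sum

# With assumed unit delays, a 16-bit adder in 4-bit blocks needs 12 units
# versus (N - 1) t_carry + t_sum = 16 units for the plain ripple adder.
print(bypass_delay(16, 4, 1, 1, 1, 1))   # 12
```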
4.5.3 Carry-Select Adder
Consider the block of adders shown in Fig 4.17, which adds bits k to k+3. Instead of waiting for the arrival of the output carry of bit k-1, both the 0 and the 1 possibilities are analyzed: two carry paths are implemented in the block. When Co,k-1 finally settles, either the result of the 0 path or that of the 1 path is selected by the multiplexer, which can be performed with minimal delay. The delay of this adder can be derived as
$$t_{add} = t_{setup} + M\,t_{carry} + \frac{N}{M}\,t_{mux} + t_{sum}$$
Fig 4.17 4 bit carry select module
Fig 4.18 16 bit carry select adder
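The carry-select principle can be sketched in Python: every M-bit block computes both candidate results in advance, and the real carry only drives a multiplexer. This is a behavioral model under stated assumptions (the function names and the unit delays in the usage line are hypothetical); for simplicity the first block also computes both paths.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    out, c = [], cin
    for ai, bi in zip(a_bits, b_bits):
        out.append(ai ^ bi ^ c)
        c = (ai & bi) | (c & (ai ^ bi))
    return out, c

def carry_select_add(a: int, b: int, n: int, m: int, ci: int = 0):
    assert n % m == 0
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]
    result, carry = [], ci
    for k in range(0, n, m):
        blk_a, blk_b = bits_a[k:k + m], bits_b[k:k + m]
        sum0, cout0 = ripple_block(blk_a, blk_b, 0)   # path assuming carry-in = 0
        sum1, cout1 = ripple_block(blk_a, blk_b, 1)   # path assuming carry-in = 1
        result += sum1 if carry else sum0             # multiplexer selects the sum
        carry = cout1 if carry else cout0             # ... and the block carry-out
    s = sum(bit << i for i, bit in enumerate(result))
    return s, carry

def select_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """t_add = t_setup + M t_carry + (N/M) t_mux + t_sum."""
    return t_setup + m * t_carry + (n // m) * t_mux + t_sum

print(select_delay(16, 4, 1, 1, 1, 1))   # 10 with assumed unit delays
```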
4.6 Multiplier:
Multiplication is a very important operation; often the speed of multiplication limits the performance of the digital processor. Multiplications are used in many digital signal processing applications: correlation, convolution, filtering, and frequency analysis, as well as vector products and matrix multiplication.
Weighted sums are required in many DSP applications, such as neural networks and filtering.
Multipliers are in fact complex adder arrays. The analysis of the multiplier gives further insight into how to optimize the performance (or the area) of complex circuit topologies.
Example: 12 × 5
The multiplication process may be viewed as consisting of the following two steps:
1. Evaluation of partial products.
2. Accumulation of the shifted partial products.
It should be noted that single-bit binary multiplication is equivalent to a logical AND operation. Thus, evaluation of partial products consists of the logical ANDing of the multiplicand and the relevant multiplier bit. Each column of partial products must then be added and, if necessary, any carry values passed to the next column.
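The two steps can be traced for the 12 × 5 example with a short Python sketch, assuming 4-bit unsigned operands (1100 × 0101). Each partial product bit is an AND of one multiplicand bit and one multiplier bit; the function names are illustrative.

```python
def partial_products(x: int, y: int, n: int):
    """Step 1: row j holds (x_i AND y_j) for every bit i, shifted left by j."""
    rows = []
    for j in range(n):
        yj = (y >> j) & 1
        row = 0
        for i in range(n):
            xi = (x >> i) & 1
            row |= (xi & yj) << i      # one AND gate per summand x_i * y_j
        rows.append(row << j)
    return rows

def multiply(x: int, y: int, n: int) -> int:
    """Step 2: accumulate the shifted partial products."""
    return sum(partial_products(x, y, n))

for j, pp in enumerate(partial_products(12, 5, 4)):
    print(f"PP{j}: {pp:08b}")
print(multiply(12, 5, 4))   # 60
```

The printed rows are 00001100, 00000000, 00110000 and 00000000; their sum is 00111100, i.e. 60.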
There are a number of techniques that may be used to perform multiplication. In general, the choice is based on factors such as speed, throughput, numerical accuracy, and area. As a rule, multipliers may be classified by the format in which data words are accessed, namely serial form, serial/parallel form and parallel form.
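4.6.1 ARRAY MULTIPLIER:
The multiplicand X (M bits) and the multiplier Y (N bits) are expressed as
$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$
so that the product is
$$Z = X \times Y = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j \, 2^{i+j}$$
The partial product terms X_i Y_j are called summands. There are M×N summands, which are generated in parallel by a set of M×N AND gates.
An n×n multiplier requires n(n-2) full adders, n half adders, and n^2 AND gates. The worst-case delay is (2n+1)t_g, where t_g is the worst-case adder delay.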
For 4-bit numbers, the expression above may be expanded as in the table
below.
Array Multiplier:
Figure 4.19 4×4 Array Multiplier
4.6.2 Wallace-Tree Based Multiplier:
Principle:
❖ Sum the N shifted partial products by performing the N-input addition efficiently.
❖ Reduce the N-input addition in steps, using counters such as the carry-save adder (CSA), which performs a 3-to-2 reduction.
❖ A CSA is simple: it is just a full adder in each bit position, with no carry propagation between positions.
❖ At the end of the array, the two remaining rows must be added together. This takes a fast carry-propagate adder (CPA), but only one is needed at the end, not one for each partial product.
The delay through the array addition (not including the CPA) is proportional to $\log_{1.5}(n)$, where n is the width of the Wallace tree (the number of operands it reduces).
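The 3-to-2 reduction can be sketched in Python on whole words: a CSA level applies a full adder to every bit position of three operands and returns a sum word and a shifted carry word, with no carry propagation. The grouping used below (three rows in, two out, leftovers passed through) is one simple Wallace-style schedule chosen for illustration; the function names are hypothetical.

```python
def csa(x: int, y: int, z: int):
    """3:2 carry-save adder on whole words: one full adder per bit, no carry chain."""
    s = x ^ y ^ z                                # sum bits
    c = ((x & y) | (y & z) | (x & z)) << 1       # carry bits move to the next column
    return s, c

def wallace_sum(operands):
    """Reduce operands 3 -> 2 per level until two rows remain, then one CPA."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            s, c = csa(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]
        nxt += rows[len(rows) - len(rows) % 3:]  # one or two leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                     # final carry-propagate addition

print(wallace_sum([1] * 8))    # (8, 4): 8 -> 6 -> 4 -> 3 -> 2 rows
print(wallace_sum([1] * 16))   # (16, 6)
```

Doubling the number of operands from 8 to 16 adds only two CSA levels, which illustrates the logarithmic growth of the tree delay.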
4.6.3 BRAUN MULTIPLIER:
The simplest parallel multiplier is the Braun array. All the partial products are computed in parallel and then collected through a cascade of carry-save adders. The completion time is limited by the depth of the carry-save array and by the carry propagation in the final adder. Note that this multiplier is suited only for positive (unsigned) operands. The structure of the Braun algorithm for unsigned binary multiplication is shown in Figure 4.20.
Fig 4.20 Braun Multiplier
4.6.4 BAUGH-WOOLEY MULTIPLIER:
In signed multiplication, the length and the number of the partial products can become very high. Therefore an algorithm for signed multiplication, called the Baugh-Wooley algorithm, was introduced. Baugh-Wooley multiplication is one of the cost-effective ways to handle the sign bits. The method was developed to design regular multipliers suited to 2's complement numbers.
Let A and B be two n-bit two's complement numbers. They can be written as
$$A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i, \qquad B = -b_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j$$
where a_i and b_j are the bits of A and B, respectively, and a_{n-1} and b_{n-1} are the sign bits. The full-precision product, P = A × B, is given by:
$$P = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - 2^{n-1} \sum_{i=0}^{n-2} a_i b_{n-1} 2^{i} - 2^{n-1} \sum_{j=0}^{n-2} a_{n-1} b_j 2^{j}$$
The first two terms of the above equation are positive and the last two terms are negative. To calculate the product, instead of subtracting the last two terms, it is possible to add their opposite (two's complement) values. The equation rearranged in this way is the Baugh-Wooley algorithm for multiplication in two's complement form.
The Baugh-Wooley multiplier provides a high-speed, signed multiplication algorithm. It uses parallel products to carry out two's complement multiplication and adjusts the partial products to maximize the regularity of the multiplication array. When numbers are represented in two's complement form, the sign of the number is embedded in the Baugh-Wooley multiplier. This algorithm has the advantage that the signs of the partial product bits are always kept positive, so that array addition techniques can be employed directly. In two's complement multiplication, each partial product bit is the AND of a multiplier bit and a multiplicand bit, and the signs of the partial product bits are positive.
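The rearranged form can be checked numerically. Replacing each negative bit term -x by x' - 1 turns the two negative sums into complemented partial product bits plus a constant; working through the constant gives the common "modified Baugh-Wooley" form, in which a 1 is added in column n and another in column 2n-1 (interpreted modulo 2^(2n)). The Python sketch below uses that derivation; the function name is hypothetical and only non-negative partial product bits are ever added.

```python
def baugh_wooley(a: int, b: int, n: int) -> int:
    """Signed n x n product from all-positive partial product bits (modified Baugh-Wooley)."""
    mask = (1 << n) - 1
    ab = [((a & mask) >> i) & 1 for i in range(n)]   # two's complement bits of A
    bb = [((b & mask) >> j) & 1 for j in range(n)]   # two's complement bits of B
    total = 0
    for i in range(n - 1):                            # a_i b_j, both below the sign bit
        for j in range(n - 1):
            total += (ab[i] & bb[j]) << (i + j)
    total += (ab[n - 1] & bb[n - 1]) << (2 * n - 2)   # sign bit times sign bit
    for i in range(n - 1):                            # complemented a_i b_(n-1) terms
        total += (1 - (ab[i] & bb[n - 1])) << (i + n - 1)
    for j in range(n - 1):                            # complemented a_(n-1) b_j terms
        total += (1 - (ab[n - 1] & bb[j])) << (j + n - 1)
    total += (1 << n) + (1 << (2 * n - 1))            # correction constants
    total &= (1 << (2 * n)) - 1                       # keep 2n product bits
    if total >> (2 * n - 1):                          # interpret as signed 2n-bit value
        total -= 1 << (2 * n)
    return total

print(baugh_wooley(-3, 5, 4))   # -15
```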
Fig 4.21 Baugh wooley multiplication
The above algorithm is implemented as the following architecture:
Fig 4.21 Baugh wooley multiplication
4.6.5 Booth Multiplier:
Booth's algorithm is a smart approach to multiplying signed numbers. It starts from the observation that, given the ability to both add and subtract, there are multiple ways to compute a product. Booth's algorithm is a multiplication algorithm that uses the two's complement notation of signed binary numbers.
When multiplying by 9:
❖ Multiply by 10 (easy, just shift the digits left).
❖ Subtract once.
E.g. 123454 × 9 = 123454 × (10 - 1) = 1234540 - 123454 = 1111086
This replaces repeated additions of the multiplicand with one shift and one subtraction. Booth's algorithm applies the same principle, except that there is no '9' in binary, just '1' and '0'. So, it is actually easier!
BOOTH ENCODER: A Booth multiplier reduces the number of iteration steps needed to perform a multiplication compared to the conventional method. Booth's algorithm scans the multiplier operand and skips chains of 1s, which reduces the number of additions required to produce the result. The modified Booth algorithm further reduces the number of partial products generated in the multiplication process: radix-4 recoding produces about n/2 partial products for an n-bit multiplier. Based on the multiplier bits, the encoding of the multiplicand is performed by a radix-4 Booth encoder. This recoding algorithm is used to generate efficient partial products.
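Radix-4 Booth recoding can be sketched in Python. Each digit is formed from three overlapping multiplier bits as d_k = -2·y(2k+1) + y(2k) + y(2k-1), with y(-1) = 0, so every digit lies in {-2, -1, 0, 1, 2} and each partial product is 0, ±X or ±2X (a shift and/or a negation). This is a behavioral sketch assuming an even operand width n; the function names are illustrative.

```python
def booth_radix4_digits(y: int, n: int):
    """Recode an n-bit two's complement multiplier (n even) into radix-4 Booth digits."""
    assert n % 2 == 0
    bits = y & ((1 << n) - 1)
    digits, prev = [], 0                   # prev holds y(2k-1), starting with y(-1) = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev) # digit in {-2, -1, 0, 1, 2}
        prev = b1
    return digits

def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum n/2 partial products d_k * x * 4^k (each is 0, +-x or +-2x, shifted)."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))

print(booth_radix4_digits(7, 4))   # [-1, 2]: 7 = -1 + 2*4
print(booth_multiply(-3, 7, 4))    # -21
```

A 4-bit multiplier yields only two partial products instead of four.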
Booth encoding Algorithm:
Tree Multiplier with Booth Encoding :
Fig 4.22 Tree Multiplier with Booth Encoding
4.7 Barrel Shifter:
A barrel shifter performs a right rotate operation. It handles left rotations using the complementary shift amount. Barrel shifters can also perform shifts when suitable masking hardware is included.
Barrel shifters come in array and logarithmic forms; we focus on logarithmic barrel shifters because they are better suited for large shifts. Figure 4.23 shows a simple 4-bit barrel shifter that performs right rotations.
Notice how, unlike funnel shifters, barrel shifters contain long wrap-around wires. In a large shifter, it is beneficial to upsize or buffer the drivers for these wires.
Performing logical or arithmetic shifts on a barrel shifter requires a way to mask out the bits that are rotated off the end of the shifter.
Fig 4.23 Barrel Shifter
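A logarithmic barrel shifter can be modelled in Python as log2(N) stages, where stage k rotates right by 2^k when bit k of the shift amount is set. The sketch assumes N is a power of two and 0 ≤ s < N; the left rotation uses the complementary amount (N - s) mod N and the logical shift uses a mask, as described above. Function names are hypothetical.

```python
def log_barrel_rotate_right(x: int, s: int, n: int) -> int:
    """Logarithmic barrel shifter: stage k rotates right by 2**k if shift bit k is 1."""
    mask = (1 << n) - 1
    x &= mask
    k = 0
    while (1 << k) < n:
        amt = 1 << k
        if (s >> k) & 1:
            # wrap-around wires: bits leaving the LSB end re-enter at the MSB end
            x = ((x >> amt) | (x << (n - amt))) & mask
        k += 1
    return x

def rotate_left(x: int, s: int, n: int) -> int:
    """Left rotation through the complementary shift amount (n - s) mod n."""
    return log_barrel_rotate_right(x, (n - s) % n, n)

def logical_shift_right(x: int, s: int, n: int) -> int:
    """Logical shift = rotate, then mask off the s bits that wrapped around to the top."""
    keep = ((1 << n) - 1) >> s
    return log_barrel_rotate_right(x, s, n) & keep

print(f"{log_barrel_rotate_right(0b0001, 1, 4):04b}")   # 1000
print(f"{logical_shift_right(0b1011, 1, 4):04b}")        # 0101
```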
4.8 ALU:
An ALU (Arithmetic Logic Unit) performs arithmetic operations and Boolean operations. The basic arithmetic operations are addition and subtraction. One may either multiplex between an adder and a Boolean unit or merge the Boolean unit into the adder, as in classic transistor-transistor logic (TTL) ALUs.
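The first option, an adder multiplexed with a Boolean unit, can be sketched as a 4-bit behavioral model in Python. The operation names ("ADD", "SUB", "AND", "OR", "XOR", "NOT") are hypothetical opcodes chosen for illustration. Subtraction reuses the same adder by inverting B and setting the carry-in, A - B = A + B' + 1.

```python
def alu4(a: int, b: int, op: str):
    """4-bit ALU sketch: one adder (ADD/SUB) multiplexed with a Boolean unit."""
    mask = 0xF
    a &= mask
    b &= mask
    if op in ("ADD", "SUB"):
        sub = op == "SUB"
        b_in = (~b & mask) if sub else b     # SUB: A + B' + 1 (two's complement)
        total = a + b_in + (1 if sub else 0)
        return total & mask, total >> 4      # (4-bit result, carry-out)
    boolean_unit = {
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "NOT": ~a & mask,
    }
    return boolean_unit[op], 0               # output multiplexer selects the Boolean result

print(alu4(9, 5, "ADD"))   # (14, 0)
print(alu4(9, 5, "SUB"))   # (4, 1): carry-out 1 means no borrow
```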
The heart of the ALU is a 4-bit adder circuit. The 4-bit adder must take the sum of two 4-bit numbers; it is assumed that all 4-bit quantities are presented in parallel form and that the shifter circuit is designed to accept and shift a 4-bit parallel sum from the ALU. The sum is stored in parallel at the output of the adder, from where it is fed through the shifter and back to the register array. Therefore, a single 4-bit data bus is needed from the adder to the shifter, and another 4-bit bus is required from the shifter output back to the register array.
Figure 4.25 4-bit ALU data path for a processor
4.9 DESIGNING MEMORY AND ARRAY STRUCTURES
INTRODUCTION:
Memory is classified into two categories:
1. Background Memory
2. Foreground Memory
1. Background Memory:
A large, centralized memory core is referred to as background memory. Ex: semiconductor memories such as SRAM and DRAM.
2. Foreground Memory:
Memory that is embedded into the logic itself is called foreground memory. Ex: latches, registers and flip-flops.
Semiconductor Memory Classification
Fig 4.26 Semiconductor memory classification
Memories come in many different formats and styles. The type of memory unit that is preferable for a particular application depends on the size, the time to access stored data, the access pattern and the system requirements.
4.9.1 Memory Architecture and Building Blocks:
In an N × M memory, N is the number of words and each word has M bits. This is shown below:
Fig. 4.27 Intuitive Architecture for N×M Memory
Fig. 4.28 A decoder reduces the number of address bits
In order to reduce the complexity of accessing the stored data, a column decoder is used to select one particular cell out of the M bits. This is shown in Fig 4.27.
One word is selected from the N words by using select lines S0, S1, …, SN-1. For example, suppose a memory holds 1 million (N = 10^6) 8-bit (M = 8) words. The figure of 1 million is a simplification of the actual memory size, because memory sizes always come in powers of 2. Thereby the actual number of words equals 2^20 = 1,048,576 ≈ 10^6, and such a memory can be expressed as a 1 Mword unit.
To choose any one memory location directly, one select line is required per word, so 1 million select lines would be needed. This is very complex, so decoders are used to reduce the number of select lines. If N is the number of words, the number of address (decoder input) lines k required is given by
$$k = \log_2 N$$
so the 1 Mword memory needs only k = 20 address lines. Again, the column decoder is used to select one particular cell from the M bits. This is shown in Fig. 4.28.
Memories are organized as arrays in order to keep the delay for accessing each and every cell low. This architecture works well for memory sizes of up to 64 Kbit to 256 Kbit.
Fig. 4.29 Array- Structured Memory Organisation
The horizontal select line that enables a single row of cells is called the word line, while the wire that connects the cells in a single column to the input/output circuitry is called the bit line. The area of a large memory module is dominated by the size of the memory core, so it is crucial to keep the size of the basic storage cell as small as possible.
Semiconductor memory cell area can be reduced by sacrificing some desirable properties of digital circuits, such as noise margin, logic swing, input/output isolation, fan-out or speed. For example, it is common to reduce the voltage swing on the bit lines to a value substantially below the supply voltage; this reduces both the propagation delay and the power consumption. This is possible because cross-talk and other disturbances can be carefully controlled within the memory array, ensuring that sufficient noise margin is obtained even for the small signal swings. On the other hand, interfacing with the external world requires large signals, so sense amplifiers are used to amplify the internal swing to full rail-to-rail amplitude.
If the memory size increases, the speed of reading and writing data is reduced because of the 1. length, 2. capacitance and 3. resistance of the word and bit lines. So the memory is partitioned into smaller blocks (P). An extra address is then required to select the block first; a cell is then selected by using the row and column addresses. This extra address is called the block address, as shown in Fig. 4.30. The advantages of this architecture are: 1. shorter wires within blocks, and 2. the block address activates only one block, which saves power.
Fig. 4.30 Hierarchical Memory Architecture
Problem:
A 4 Mbit SRAM can be designed as a composition of 32 blocks, each of which contains 128 Kbits. Each block is structured as an array with 1024 rows and 128 columns. Find the number of row address bits (x), column address bits (y) and block address bits (z).
Solution:
Number of rows = 1024
$2^x = 1024 = 2^{10}$, so row address x = 10
Number of columns = 128
$2^y = 128 = 2^7$, so column address y = 7
Number of blocks = 32
$2^z = 32 = 2^5$, so block address z = 5
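The calculation above can be checked with a short Python sketch. It assumes that "4 Mbit" means $4 \times 2^{20}$ bits (binary prefixes). It also assumes that every address field selects one of a power-of-two number of items. The function name `address_bits` is hypothetical.

```python
def address_bits(total_bits, num_blocks, rows, cols):
    """Return (x, y, z) = (row, column, block) address widths for a
    hierarchical memory made of num_blocks arrays of rows x cols cells."""
    for n in (num_blocks, rows, cols):
        # each address field selects one of 2**width items, so counts must be powers of two
        if n <= 0 or n & (n - 1):
            raise ValueError("blocks, rows and columns must be powers of two")
    if num_blocks * rows * cols != total_bits:
        raise ValueError("blocks x rows x columns must equal the total capacity")
    x = rows.bit_length() - 1        # 2**x = rows
    y = cols.bit_length() - 1        # 2**y = columns
    z = num_blocks.bit_length() - 1  # 2**z = blocks
    return x, y, z


# 4 Mbit taken as 4 * 2**20 bits (binary prefixes assumed)
x, y, z = address_bits(4 * 2**20, num_blocks=32, rows=1024, cols=128)
print(x, y, z, x + y + z)  # 10 7 5 22
```

The three fields together give 22 address bits, and $2^{22}$ bits is exactly 4 Mbit. This confirms that the block, row and column fields cover the whole memory.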
4.10 Memory Core
When designing large memories, the memory cell must be as small as possible without compromising design qualities such as speed and reliability. The various types of memory core are read-only, non-volatile read-write, read-write and content-addressable memory cores.
1. Read only memory:

It is a non-volatile memory, that is, the data stored in the cell is retained even when the power is off. It is used to store the program for processors with a fixed application, such as washing machines, calculators and game machines.
ROM Cells:
Consider the simplest ROM cell, the diode-based cell. Here, the presence or absence of a diode between WL and BL distinguishes a cell storing a 1 from a cell storing a 0, respectively.
Fig. 4.31 Different approaches for implementing 1 and 0 in ROM cell

The disadvantage of the diode cell is that it does not isolate the bit line from the word line. All the current required to charge the bit line capacitance has to be provided through the word line and its driver, and this capacitance can be quite high for large memories. The diode cell is therefore suitable only for small memories. To overcome this drawback, the diode is replaced by the gate-source connection of an NMOS transistor whose drain is connected to the supply voltage.
All the output driving current is now provided by the MOS transistor in the cell. The word-line driver is only responsible for charging and discharging the word-line capacitance. An example of a 4 x 4 OR ROM cell array is shown in Fig. 4.32.
Fig. 4.32 A 4 x 4 OR ROM Cell array
Programming ROM Memory:

According to the NOR concept, an output is 1 only if all its inputs are 0. Thus, when WL(0) to WL(3) are all 0, BL(0) to BL(3) are all 1. If WL(0) is 1, then BL(1) becomes 0.
Fig. 4.33 A 4 x 4 MOS NOR ROM
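The read behaviour of the NOR ROM can be modelled at the logic level. In this sketch, each bit line has a pull-up load, and any transistor whose word line is high pulls that bit line low. The 4 x 4 programming pattern is hypothetical. Only the transistor at WL(0)/BL(1) follows the example in the text.

```python
# Hypothetical 4 x 4 programming pattern: 1 = NMOS transistor present at
# (word line, bit line). Only the WL(0)/BL(1) transistor follows the text.
PATTERN = [
    [0, 1, 0, 0],  # WL(0)
    [0, 0, 1, 0],  # WL(1)
    [1, 0, 0, 1],  # WL(2)
    [0, 1, 1, 0],  # WL(3)
]


def nor_rom_bitlines(word_lines, pattern=PATTERN):
    """Each bit line has a pull-up load, so it is the NOR of the word lines
    that drive a transistor in its column."""
    bit_lines = []
    for col in range(len(pattern[0])):
        pulled_down = any(wl and row[col] for wl, row in zip(word_lines, pattern))
        bit_lines.append(0 if pulled_down else 1)
    return bit_lines


print(nor_rom_bitlines([0, 0, 0, 0]))  # [1, 1, 1, 1]
print(nor_rom_bitlines([1, 0, 0, 0]))  # [1, 0, 1, 1]
```

Selecting one word line reads out the complement of that row's transistor pattern. The programming therefore amounts to choosing where transistors are placed.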
A 4 x 4 MOS NAND ROM is given in Fig. 4.34. According to the NAND concept, all the outputs BL(0), BL(1), BL(2) and BL(3) are 1 when the inputs WL(0), WL(1), WL(2) and WL(3) are 0. If WL(1) is 1, then BL(0) is 0.
Fig. 4.34 A 4 x 4 MOS NAND ROM
2. Non-Volatile Read-Write Memory:

The architecture of NVRW memories is identical to the ROM architecture. In a ROM, programming is accomplished by mask-level alteration. In an NVRWM, programming is accomplished by electrically altering the threshold voltage of the transistors.
i. EPROM:
It is erased by shining ultraviolet light on the cells through a transparent window in the IC package. The main disadvantage of an EPROM is that it must be removed from the board before it can be erased.
ii. EEPROM:
It allows charge to be injected into or removed from the floating gate by tunnelling. This mechanism is based on Fowler-Nordheim tunnelling. The main advantage is that erasing is achieved simply by reversing the voltage applied to the floating gate during the write process. However, repeated programming causes a drift in the threshold voltage Vt, which can lead to malfunction or an inability to reprogram the device.
iii. Flash EEPROM (Flash):
It is the most popular non-volatile memory and combines features of EPROM and EEPROM. The main difference is that erasing is performed in bulk for the complete chip. It is used in applications such as memory cards and USB drives.
The Flash memory cell uses a floating gate transistor. The floating gate is surrounded by SiO2, which is an excellent insulator. As a result, the charge trapped on the floating gate can be stored for many years, even when the supply voltage is removed, which creates a non-volatile storage mechanism. The main concern is that the floating gate approach requires a high voltage (12 V) for programming. The device used in flash memory is shown in Fig. 4.34.
Fig. 4.34 The Floating Gate Transistor (FAMOS)
The three modes of operation are: 1. Write, 2. Erase and 3. Read.
Write Operation: When a high voltage (above 10 V) is applied to the gate with respect to the source, electrons from the p-substrate acquire high energy and are injected into the floating gate. The trapped electrons lower the potential of the floating gate, which raises the threshold voltage of the transistor. The process is self-limiting, because the accumulated negative charge reduces the field that accelerates further electrons. The charge injected into the floating gate thus shifts the I-V characteristic of the transistor, as shown in Fig. 3.11.
Fig. 3.11 I-V curve shifting by hot electron programming

Erase Operation: When 0 V is applied to the gate, the electrons are ejected from the floating gate.
Read Operation: If 5 V is applied to the gate when there are no electrons on the floating gate, current flows from drain to source, which represents the 0 state.
If 5 V is applied when electrons are present on the floating gate, no current flows, which represents the 1 state.
Fig. 3.12 Programming the Floating Gate Transistor
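The read rule above reduces to a threshold comparison. This sketch treats the transistor as a switch that conducts only when the gate voltage exceeds its threshold. The two threshold voltages are illustrative assumptions and are not taken from the text. Only the 5 V read voltage comes from the notes.

```python
VT_ERASED = 2.0      # assumed threshold with no electrons on the floating gate
VT_PROGRAMMED = 7.0  # assumed (raised) threshold with electrons trapped on it
V_READ = 5.0         # read voltage on the control gate, as in the text


def read_flash_cell(programmed, v_gate=V_READ):
    """Switch-level read: the cell conducts only if the gate voltage exceeds its threshold."""
    vt = VT_PROGRAMMED if programmed else VT_ERASED
    conducts = v_gate > vt
    return 0 if conducts else 1  # current flows -> 0, no current -> 1


print(read_flash_cell(programmed=False), read_flash_cell(programmed=True))  # 0 1
```

Any read voltage between the two assumed thresholds would distinguish the states in the same way. Such a voltage separates an erased cell from a programmed one.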
3. Read-Write Memory:
RWM (RAM) memories are classified into two categories, depending on whether they store data using positive feedback or capacitive charge.
i. SRAM (uses positive feedback)
It is similar to an SR latch and requires six transistors per bit. The key points are:
1. It preserves data as long as the supply is applied.
2. It is fast.
3. It requires a large area.
4. It produces a differential output, Q and Qbar.
The word line WL is used to read and write data through the bit lines BL and BLbar. When WL is 1, M5 and M6 are on, so the data stored on Q and Qbar appears on BL and BLbar respectively (read operation). Similarly, when WL is 1 and data is driven onto BL and BLbar, it is written into Q and Qbar respectively (write operation).
Fig.4.35 6T CMOS SRAM Cell
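The read/write behaviour of the 6T cell in Fig. 4.35 can be captured in a small logic-level model. The class name `SRAM6T` is hypothetical. The model ignores all electrical effects. It simply assumes that complementary data driven on the bit lines always overpowers the cell, and that undriven bit lines simply receive Q and Qbar.

```python
class SRAM6T:
    """Logic-level model of the 6T cell of Fig. 4.35 (no electrical effects)."""

    def __init__(self, q=0):
        self.q = q  # the cross-coupled inverters always hold Qbar = not Q

    @property
    def qbar(self):
        return 1 - self.q

    def access(self, wl, bl=None, blbar=None):
        """WL = 0: access transistors M5/M6 are off, the bit lines see nothing.
        WL = 1 with complementary data driven on BL/BLbar: write.
        WL = 1 with undriven (precharged) bit lines: read Q/Qbar onto BL/BLbar."""
        if not wl:
            return None
        if bl is not None and blbar is not None:
            if bl == blbar:
                raise ValueError("write data on BL and BLbar must be complementary")
            self.q = bl  # assumed: the bit-line drivers overpower the cell
        return self.q, self.qbar


cell = SRAM6T()
cell.access(wl=1, bl=1, blbar=0)  # write a 1
print(cell.access(wl=1))          # (1, 0): read Q and Qbar
```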
ii. DRAM (uses capacitive charge):
The 3-transistor dynamic memory cell is given in Fig. 4.36. The key points are:
1. Periodic refreshing is required to preserve data.
2. It requires less area.
3. It is slower.
4. It produces a single-ended output.
Fig. 4.36 3T Dynamic Memory Cell
Fig. 4.37 1-T Dynamic Memory Cell
The operation of the 3T dynamic memory cell of Fig. 4.36 is as follows:
Write operation: When WWL (write word line) is 1, M1 turns on, and the data on BL1 is stored on node X.
Read operation: BL2 is first precharged to 1. When the read word line (RWL) is 1, M3 turns on. Depending on the value stored on node X, M2 is either on or off. If X is 1, M2 is on, and since M3 is also on, BL2 is discharged to 0. If X is 0, M2 is off and BL2 keeps its value of 1.
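A logic-level sketch of the 3T cell makes one consequence of this read scheme explicit: the value appears inverted on BL2. The class name `DRAM3T` is hypothetical. The sketch ignores leakage on node X, which in a real cell is why refresh is needed.

```python
class DRAM3T:
    """Logic-level sketch of the 3T cell of Fig. 4.36; leakage of node X is ignored."""

    def __init__(self):
        self.x = 0  # charge on storage node X (gate of M2)

    def write(self, wwl, bl1):
        if wwl:          # M1 on: the value on BL1 is copied onto node X
            self.x = bl1

    def read(self, rwl):
        bl2 = 1          # BL2 is precharged to 1 before the read
        if rwl and self.x:
            bl2 = 0      # M3 and M2 both on: BL2 discharges
        return bl2


cell = DRAM3T()
cell.write(wwl=1, bl1=1)
print(cell.read(rwl=1))  # 0: a stored 1 is read out inverted on BL2
```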
The complexity can be reduced by using the 1-T dynamic memory cell of Fig. 4.37, whose operation is as follows:
Write operation: When WL is 1, M1 turns on, and the data on BL is stored on the capacitor.
Read operation: When WL is 1, M1 turns on. The charge on the capacitor is shared with the bit line, so the stored data appears on BL as a small voltage change. This read disturbs the stored charge. The cell must therefore be sensed with a sense amplifier, and the data must be restored after every read.
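The size of the 1T read signal follows from charge sharing between the cell capacitance $C_S$ and the bit-line capacitance $C_{BL}$. The bit line moves away from its precharge level by $(V_{cell} - V_{pre})\,C_S/(C_S + C_{BL})$. This is a standard textbook relation rather than a formula from these notes. The sketch uses the VDD/2 precharge mentioned under Voltage References. The supply and capacitance values are assumptions, and a stored 1 is assumed to be a full VDD.

```python
def bitline_swing(v_cell, v_precharge, c_s, c_bl):
    """Charge sharing between the cell capacitor C_S and the bit line C_BL:
    the bit line moves from v_precharge by (v_cell - v_precharge) * C_S / (C_S + C_BL)."""
    return (v_cell - v_precharge) * c_s / (c_s + c_bl)


VDD = 2.5            # assumed supply voltage
C_S = 30e-15         # assumed cell capacitance (30 fF)
C_BL = 300e-15       # assumed bit-line capacitance (300 fF)
V_PRE = VDD / 2      # bit lines precharged to VDD/2

for bit, v_cell in ((1, VDD), (0, 0.0)):
    dv = bitline_swing(v_cell, V_PRE, C_S, C_BL)
    print(f"stored {bit}: bit line changes by {dv * 1e3:+.0f} mV")
```

With a bit line ten times larger than the cell, the swing is only about 114 mV in either direction. This is why the 1T DRAM depends on a sense amplifier.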
4. Content-Addressable or Associative Memory (CAM)
Fig. 4.38 9-Transistor CAM Cell
Fig. 4.39 Application of CAM cell: High-performance on-chip cache memory
A CAM is a special type of memory device that stores data. It can also compare all the stored data in parallel with incoming data in an efficient manner. The cell combines a 6T RAM storage cell (M4-M9) with a 1-bit digital comparator (M1-M3). When the cell is to be written, complementary data is forced onto the bit lines while the word line is enabled, as in a standard SRAM cell.
In the compare mode, the stored data S and Sbar are compared with the incoming data, which is provided on the complementary bit lines Bit and Bit bar. The match line is tied to all the CAM cells in a given row and is initially precharged to VDD. If S and Bit match, the internal node int is discharged and M1 is turned off, keeping the match line high. However, if the stored and incoming bits are different, int is charged to VDD-Vt, causing the match line to discharge. The application of the CAM cell to a fully associative cache memory is shown in Fig. 4.39.
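The compare mode can be sketched as a search over rows. Each row's match line starts precharged high, and any mismatching cell discharges it. The hardware evaluates all rows at once, whereas this Python sketch simply loops over them. The function name `cam_search` and the string encoding of words are illustrative assumptions.

```python
def cam_search(stored_words, key):
    """Return the rows whose match line stays high for the search key.
    Hardware compares every row in parallel; this loop just evaluates each row."""
    matches = []
    for row, word in enumerate(stored_words):
        if len(word) != len(key):
            raise ValueError("search key and stored words must have the same width")
        match_line = 1                # precharged to VDD
        for s, bit in zip(word, key):
            if s != bit:              # mismatching cell discharges the shared match line
                match_line = 0
        if match_line:
            matches.append(row)
    return matches


print(cam_search(["1010", "0110", "1010", "1111"], "1010"))  # [0, 2]
```

In a fully associative cache, the stored words would be tags. A surviving match line would then select the corresponding data row.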
3.3. Memory Peripheral Circuitry
In the memory core, performance and reliability are always traded off to reduce area. The circuit designer therefore concentrates on the peripheral circuitry to recover both speed and electrical integrity. The main memory peripheral circuits are:
1. Address Decoder
2. Sense amplifier
3. Voltage Reference
4. I/O drivers/buffers
5. Memory timing and control
Address Decoder
Whenever a memory allows random address-based access, an address decoder must be present. The two types of decoders are:
1. Row decoder: its task is to select one memory row out of $2^M$.
2. Column and block decoder: its task is to select one memory column (or block) out of $2^K$.
The decoder cells must match the cell dimensions (pitch) of the memory core; this is called pitch matching. If pitch matching fails, the result is increased delay and power dissipation.
Row Decoder
Consider an 8-bit address decoder. Each of the outputs WLi is a logic function of the 8 input address signals (A0 to A7). For example, taking A0 as the most significant bit, the rows with addresses 0 and 127 are enabled by the following logic functions:
$WL_0 = \bar{A}_0 \bar{A}_1 \bar{A}_2 \bar{A}_3 \bar{A}_4 \bar{A}_5 \bar{A}_6 \bar{A}_7$
$WL_{127} = \bar{A}_0 A_1 A_2 A_3 A_4 A_5 A_6 A_7$
This function can be implemented in two stages, using a single 8-input NAND gate followed by an inverter. For a single-stage implementation, it is converted into a wide NOR using De Morgan's rule:
$WL_0 = \overline{A_0 + A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7}$
$WL_{127} = \overline{A_0 + \bar{A}_1 + \bar{A}_2 + \bar{A}_3 + \bar{A}_4 + \bar{A}_5 + \bar{A}_6 + \bar{A}_7}$
To implement this function, an 8-input NOR gate is needed per row. The propagation delay of the decoder adds directly to both the read and write access times. The decoder can be designed in the following ways:
1. Static Decoder design
2. Dynamic Decoder design
Static Decoder design:

Implementing the decoder with wide NOR gates in complementary CMOS is impractical. One possible solution is to use the pseudo-NMOS design style, which allows an efficient implementation of wide NORs. However, its power dissipation makes this approach unattractive. Splitting a complex gate into two or more logic layers most often produces an implementation that is both faster and cheaper. Segments of the address are therefore decoded in a first logic layer, called the predecoder. A second layer of logic gates then produces the final word line signals.
Consider the 8-input NAND decoder. The expression for WL0 can be regrouped in the following way:
$WL_0 = \bar{A}_0 \bar{A}_1 \bar{A}_2 \bar{A}_3 \bar{A}_4 \bar{A}_5 \bar{A}_6 \bar{A}_7 = \overline{(A_0 + A_1)}\;\overline{(A_2 + A_3)}\;\overline{(A_4 + A_5)}\;\overline{(A_6 + A_7)}$
In this case the address is partitioned into sections of two bits that are decoded in advance. The resulting signals are then combined using 4-input NAND gates to produce the fully decoded array of word line signals. The resulting structure is given in Fig. 4.40.
Fig. 4.40 A NAND decoder using 2-input predecoders
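The two-level structure of Fig. 4.40 can be checked functionally. Four 2-to-4 predecoders feed one 4-input gate per word line, and the sketch confirms that exactly one of the 256 word lines is selected. It assumes A0 is the most significant address bit and treats word lines as active high. The function names are hypothetical.

```python
def predecode(a_hi, a_lo):
    """2-to-4 predecoder: output k is 1 only when the bit pair (a_hi, a_lo) encodes k."""
    value = (a_hi << 1) | a_lo
    return [1 if k == value else 0 for k in range(4)]


def decode_8bit(address):
    """address = [A0, ..., A7], with A0 assumed to be the most significant bit.
    Four 2-bit predecoders feed one 4-input gate per word line (256 rows)."""
    groups = [predecode(address[i], address[i + 1]) for i in range(0, 8, 2)]
    word_lines = []
    for row in range(256):
        # row index split into four 2-bit fields, most significant field first
        fields = [(row >> shift) & 0b11 for shift in (6, 4, 2, 0)]
        word_lines.append(int(all(g[f] for g, f in zip(groups, fields))))
    return word_lines


bits = [(127 >> (7 - i)) & 1 for i in range(8)]  # address 127 -> [0, 1, 1, 1, 1, 1, 1, 1]
wl = decode_8bit(bits)
print(wl.index(1), sum(wl))  # 127 1
```

The predecoder output for the bit pair (0, 0) plays the role of each $\overline{(A_i + A_{i+1})}$ term in the regrouped expression for WL0.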
The use of a predecoder has the following advantages:
1. It reduces the number of transistors required. Assume that the predecoder is implemented in complementary static CMOS. The number of active devices in the 8-input decoder then equals (256 x 8) + (4 x 4 x 4) = 2,112. This is 52% of a single-stage decoder, which would require 4,096 transistors.
2. As the number of inputs to each gate is halved, the propagation delay is reduced by approximately a factor of 4.
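The transistor counts above can be reproduced by counting static CMOS NAND gates. An n-input NAND uses 2n transistors. Inverters are ignored here, as they are in the text. The quadratic dependence of delay on fan-in used for the second advantage is an assumed rule of thumb.

```python
def nand_transistors(fan_in):
    """Static complementary CMOS NAND: fan_in series NMOS plus fan_in parallel PMOS."""
    return 2 * fan_in


ROWS = 256
single_stage = ROWS * nand_transistors(8)        # one 8-input NAND per word line
two_stage = (ROWS * nand_transistors(4)          # one 4-input NAND per word line
             + 4 * 4 * nand_transistors(2))      # 4 predecoders x 4 two-input NANDs
print(single_stage, two_stage, f"{100 * two_stage / single_stage:.0f}%")  # 4096 2112 52%

# If gate delay grows roughly with the square of the fan-in (assumed model),
# halving the fan-in from 8 to 4 gives about a 4x speed-up.
print((8 / 4) ** 2)  # 4.0
```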
Dynamic Decoder design:

Dynamic logic offers a better alternative for designing decoders. A first solution is presented in Fig. 4.41, which depicts the transistor diagram and the conceptual layout of a 2-to-4 decoder. Note that this structure is geometrically identical to the normal NOR ROM array, differing only in the data pattern.
NOR decoders are substantially faster, but they consume more area than their NAND counterparts and drastically more power. The reason is clear from the following observation. After the precharge, only a single word line is pulled down in a NAND decoder, while in the NOR decoder all wires except one are discharged.
Fig. 4.41. 2 input NOR Decoder
Fig. 4.42. 2 input NAND Decoder
Column and Block Decoder
The functionality of a column and block decoder is best described as a $2^K$-input multiplexer, where K stands for the size of the address word. Two implementations of this multiplexing function are in general use. Which one to choose depends on area, performance and architectural considerations. One implementation is based on a CMOS pass-transistor multiplexer, shown in Fig. 4.43.
Fig. 4.43. Four-input pass-transistor-based column decoder using a predecoder
The main advantage of this approach is its speed. Only a single pass transistor is inserted in the signal path, which introduces only minimal extra resistance. Column decoding is one of the last actions to be performed in the read sequence. The predecoding can therefore be executed in parallel with other operations, such as the memory access and sensing, and can be performed as soon as the column address is available. Its propagation delay thus does not add to the overall memory access time, and slower implementations such as NAND decoders might even be acceptable. The disadvantage of the structure is its large transistor count: $(K + 1)2^K + 2^K$ devices are needed for a $2^K$-input decoder. For example, a 1024-to-1 column decoder requires 12,288 transistors.
A more efficient way of implementing the decoder is a tree decoder, which uses a binary reduction scheme. No predecoder is required, and the transistor count is drastically reduced:
$N_{tree} = 2^K + 2^{K-1} + \dots + 4 + 2 = 2(2^K - 1)$
Fig. 4.44 4 to 1 tree based column decoder
This means that a 1024-to-1 decoder requires only 2,046 transistors, a reduction by a factor of 6.
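Both device-count formulas can be evaluated directly. The pass-transistor count uses the text's expression $(K + 1)2^K + 2^K$. The tree count sums the geometric series $2^K + \dots + 2$ level by level, which equals $2(2^K - 1)$. The function names are hypothetical.

```python
def pass_transistor_mux_devices(k):
    """Pass-transistor column decoder with a predecoder: (K + 1) * 2**K + 2**K devices."""
    return (k + 1) * 2**k + 2**k


def tree_decoder_devices(k):
    """Binary tree decoder: 2**K + 2**(K-1) + ... + 4 + 2 pass transistors."""
    return sum(2**level for level in range(1, k + 1))


k = 10  # 1024-to-1 column decoder
ratio = pass_transistor_mux_devices(k) / tree_decoder_devices(k)
print(pass_transistor_mux_devices(k), tree_decoder_devices(k), round(ratio, 1))  # 12288 2046 6.0
```

The tree saves devices, but the signal now passes through K transistors in series instead of one. This is the speed-for-area trade-off between the two structures.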
Sense amplifier
Sense amplifiers play a major role in the functionality, performance and reliability of memory circuits. They perform the following functions:
1. Amplification: In certain memory structures, such as the 1T DRAM, amplification is required for proper functionality, since the typical signal swing is limited to about 200 mV.
2. Delay reduction: The amplifier compensates for the restricted fan-out driving capability of the memory cell. It does this by accelerating the bit line transition, or by detecting small transitions on the bit line and amplifying them into large output signal swings.
3. Power reduction: Reducing the signal swing on the bit lines can eliminate a substantial part of the power dissipation related to charging and discharging the bit lines.
4. Signal restoration: Because the read and refresh functions are intrinsically linked in the 1T DRAM, it is necessary to drive the bit lines to the full signal range after sensing.
The differential amplifier takes small-signal differential inputs (the bit line voltages) and amplifies them to a large-signal single-ended output. A differential approach presents numerous advantages. Such an amplifier rejects noise that is equally injected into both inputs. This is especially attractive in memories, where the exact value of the bit line signal varies from die to die and even between different locations on a single die. The effectiveness of a differential amplifier is characterized by its ability to reject the common noise and amplify the true difference between the signals. Signals common to both inputs are suppressed at the output of the amplifier by a ratio called the common-mode rejection ratio (CMRR). Similarly, spikes on the power supply are suppressed by a ratio called the power supply rejection ratio (PSRR).
Fig. 4.45 Basic Differential Sense Amplifier Circuit
The input signals bit and bitbar are heavily loaded and driven by the SRAM memory cell. The swing on those lines is small, because a small memory cell drives a large capacitive load. The inputs are fed to the differential input devices M1 and M2, and transistors M3 and M4 act as an active current-mirror load. The amplifier is conditioned by the sense amplifier enable signal SE. Initially, the inputs are precharged and equalized to a common value while SE is low, which disables the sensing circuit. Once a read operation is initiated, one of the bit lines drops. SE is enabled when a sufficient differential signal has been established, and the amplifier then evaluates.
The gain of such a differential-to-single-ended amplifier is given by
$A_{sense} = -g_{m1}\,(r_{o2} \parallel r_{o4})$
where $g_{m1}$ is the transconductance of the input transistor M1, and $r_{o2}$ and $r_{o4}$ are the small-signal output resistances of transistors M2 and M4.
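A quick numerical evaluation of the gain expression is shown below. The transconductance and output resistances are assumed example values, not values from the notes. The sketch only shows how the parallel combination of $r_{o2}$ and $r_{o4}$ sets the gain.

```python
def parallel(r1, r2):
    """Resistance of r1 and r2 in parallel."""
    return r1 * r2 / (r1 + r2)


def sense_amp_gain(gm1, ro2, ro4):
    """A_sense = -gm1 * (ro2 || ro4) for the current-mirror-loaded differential pair."""
    return -gm1 * parallel(ro2, ro4)


gm1 = 1e-3          # assumed transconductance of M1 (1 mA/V)
ro2 = ro4 = 50e3    # assumed output resistances of M2 and M4 (50 kOhm)
gain = sense_amp_gain(gm1, ro2, ro4)
print(f"A_sense = {gain:.1f}")  # A_sense = -25.0
```

With equal output resistances, the load halves the effective resistance. Improving $r_o$ on either side therefore raises the gain only partially.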
Voltage References:
The operation of a sophisticated memory requires a number of voltage references and supply levels, including the following:
▪ Boosted word line voltage: In a conventional 1T DRAM cell using an NMOS pass transistor, the maximum voltage level that can be written onto the cell equals VDD - VT, which negatively impacts the reliability of the memory. A charge pump can be used to raise the word line voltage to VDD + VT, which allows the full VDD level to be written.
▪ Half VDD: DRAM bit lines are precharged to VDD/2. This voltage must be generated on chip.
▪ Reduced internal supply: Most memory circuits operate at a lower supply voltage than the external power supply. DRAMs use internal voltage regulators to generate the required voltages while remaining compliant with the standard interface voltages.
▪ Negative substrate bias: An effective means to control the threshold voltage of the transistors in the memory is to apply a negative substrate bias, augmented with a control loop. This approach has been used in all recent generations of DRAM memories.
The design of voltage references falls under the category of analog circuit design. Some of the reference circuits are given below:
1. Voltage Down Converter:
Voltage down converters are used to create low internal supplies while allowing the interface circuits to operate at higher voltages. Fig. 4.46 shows the basic structure of a voltage down converter, which is also called a linear regulator. It is based on the operational amplifier described in the previous section. The circuit uses a large PMOS output driver transistor to drive the load of the memory circuits (an NMOS output device may also be used). Negative feedback sets the output voltage VDL to the reference voltage. The converter must provide a voltage that is immune to variations in operating conditions. Slow variations, such as temperature changes, can be compensated by the feedback loop.
Fig.4.46 A voltage regulator and its equivalent representation
2. Charge Pump:
A charge pump is an ideal generator for supplies that do not draw much current, such as word line boosting. The concept is best explained with the simple circuit of Fig. 4.47. Transistors M1 and M2 are connected in diode style. Assume initially that the clock is high. During this phase, node A is at ground and node B is at VDD - VT. The charge stored in the capacitor is given by
$Q = C_{pump}(V_{DD} - V_T)$
During the second phase, the clock goes low, raising node A to VDD. Node B rises in concert, effectively shutting off M1. When B is one threshold above Vload, M2 starts to conduct and charge is transferred to Cload.
Fig. 4.47 Simple Charge Pump and its signal waveform
During consecutive clock cycles, the pump continues to deliver charge to the load until the maximum voltage of $2(V_{DD} - V_T)$ is reached at the output. The amount of current that can be drawn from the generator is primarily determined by the capacitor size and the clock frequency.
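The approach of the output towards $2(V_{DD} - V_T)$ can be reproduced with a cycle-by-cycle charge balance. In phase 2, node B is boosted to $2V_{DD} - V_T$. M2 is treated as an ideal diode that shares charge with $C_{load}$ until $V_B = V_{load} + V_T$. The supply, threshold and capacitor values are assumptions. The load is assumed to start at the DC level $V_{DD} - 2V_T$ reached through M1 and M2 in series. No current is drawn from the output in this sketch.

```python
def charge_pump(vdd, vt, c_pump, c_load, cycles):
    """Cycle-by-cycle charge balance for the diode-connected pump of Fig. 4.47.
    Phase 1: node B charges to VDD - VT through M1.
    Phase 2: node A rises to VDD, so B is boosted to 2*VDD - VT; M2 conducts
    and shares charge with C_load until V_B = V_load + VT."""
    v_load = vdd - 2 * vt       # assumed start: DC level through M1 and M2 in series
    history = []
    for _ in range(cycles):
        v_b = (vdd - vt) + vdd  # boosted node B in phase 2
        if v_b - vt > v_load:   # M2 (diode) forward biased
            # charge conservation with V_B,final = V_load,final + VT
            v_load = (c_pump * (v_b - vt) + c_load * v_load) / (c_pump + c_load)
        history.append(v_load)
    return history


# Assumed values: VDD = 2.5 V, VT = 0.5 V, C_load ten times C_pump.
trace = charge_pump(2.5, 0.5, c_pump=1.0, c_load=10.0, cycles=200)
print(round(trace[0], 3), round(trace[-1], 3))  # 1.727 4.0 ; limit 2*(VDD - VT) = 4 V
```

A larger $C_{pump}$ relative to $C_{load}$ moves more charge per cycle, and a higher clock frequency runs more cycles per second. This is consistent with the statement that both set the available current.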
3. Voltage Reference Generator:
An accurate and stable voltage reference is an important component of the voltage down converter. The reference voltage is assumed to be relatively constant over power supply and temperature variations. Fig. 4.48 shows an example of a VT reference generator.
The bottom devices M3 and M4 act as a current mirror that forces the same current through the drain of M1 and the resistor R1. If device M1 is made large and the current is kept small enough, the source-to-gate voltage of M1 becomes approximately equal to |VTP|. This can be derived from the equation
$V_{SG,M1} = |V_{TP}| + \sqrt{\frac{2 I_{M1}}{k_{M1}}}$
where $k_{M1}$ is the gain factor of M1.
Fig. 4.48 A simple VT reference generator
The current through the resistor R1 and the drain current of M1 are therefore both approximately equal to |VTP|/R1. Note that M2 acts as a biasing transistor. Devices M1 and M5 experience the same gate-to-source voltage, so the drain current of M1 is mirrored to M5 and flows through R2. The reference voltage is given by
$V_{REF} = \frac{R_2}{R_1} |V_{TP}|$
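The two relations for the VT reference can be evaluated together. The square-law form of $V_{SG}$ and the mirrored current flowing through R2 are assumptions made for this sketch. The values of |VTP|, R1, R2 and the gain factor k are also assumed. The current is taken as |VTP|/R1, which is itself the approximation the circuit relies on.

```python
import math


def vsg_m1(vtp_mag, i_d, k_m1):
    """Square-law estimate of the source-gate voltage of M1 (|V_TP| + overdrive)."""
    return vtp_mag + math.sqrt(2 * i_d / k_m1)


def v_ref(vtp_mag, r1, r2):
    """Current |V_TP| / R1 mirrored into R2 gives V_REF = (R2 / R1) * |V_TP|."""
    return (r2 / r1) * vtp_mag


VTP = 0.5                # assumed |V_TP| (V)
R1, R2 = 100e3, 240e3    # assumed resistor values (ohms)
I_M1 = VTP / R1          # about 5 uA through M1 and R1

for k in (1e-4, 1e-3, 1e-2):  # larger k means a wider M1
    print(f"k = {k:g} A/V^2: V_SG = {vsg_m1(VTP, I_M1, k):.3f} V")
print(f"V_REF = {v_ref(VTP, R1, R2):.2f} V")  # V_REF = 1.20 V
```

As M1 is made wider (larger k), $V_{SG}$ approaches |VTP|. This is why the text asks for a large M1 and a small current.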
Buffers/Drivers:
The length of the word and bit lines increases with memory size. Partitioning the memory array can alleviate some of the associated performance degradation. Even so, a large portion of the read and write access time can be attributed to wire delays. A major part of the memory periphery area is therefore allocated to the drivers, in particular the address buffers and I/O drivers.
Timing and Control:
Careful timing of the different events is necessary if maximum performance is to be achieved. These events include address latching, word line decoding, bit line precharging and equalization, sense amplifier enabling and output driving. The timing and control circuitry occupies only a minimal amount of area, but its design is an integral and defining part of the memory design process. It requires careful optimization and the execution of extensive and repetitive SPICE simulations over a range of operating conditions. Memory timing approaches can be classified as clocked and self-timed.
DRAM-A Clocked Approach:

Early DRAMs opted for a multiplexed addressing scheme, in which the row address and column address are presented in sequence on the same address bus to save package pins.
Fig. 4.49 Read cycle Timing diagram of 4Mx1 DRAM Memory
In this scheme, the user must provide two main control signals, RAS (row-address strobe) and CAS (column-address strobe). These indicate the presence of the row and column addresses respectively. Another control signal (W) indicates whether the intended operation is a read or a write. These signals can be interpreted as external clock signals and are used to time the internal memory events. As in a two-phase clocking approach, the RAS and CAS signals must be sufficiently separated so that all the ensuing operations have come to completion. Fig. 4.49 shows a simplified timing diagram of a 4M x 1 DRAM memory.
To make the memory more robust while preserving performance, a dummy word line is included in the timing-generation circuitry. DRAMs thus combine a synchronous approach at the global level with self-timed techniques for the generation of some of the local signals.
Synchronous DRAM Memories:

One of the main challenges of DRAM memory is its large access time and low throughput. With the ever-increasing speed of microprocessors and their SRAM caches, the performance gap between the processor core and the DRAM main memory is getting larger. This gap is becoming one of the main challenges in high-performance system design. To address this problem, high-speed synchronous DRAM memories such as SDRAM (synchronous DRAM) and RDRAM have been introduced.
These memories differ substantially from the RAS/CAS timing model of traditional memories. They exploit the highly parallel nature of the memory core: in a given read or write cycle, a large number of bits can be read or written at the same time. This comes at the penalty of extra latches and buffers at the interface to the memory core. High-speed circuitry is also needed to support the high-data-rate I/O interface.
Consider RDRAM as an example. Input/output data is transferred serially on a narrow bus over several clock cycles. However, the bus operates at a very high speed and uses efficient packet protocols, so a large amount of data can be transferred in a short period. Multiple memory chips can be connected to this bus, which is called the Rambus channel. The schematic diagram of the input/output circuitry of an RDRAM is given in Fig. 4.51.
Fig. 4.51 RDRAM Architecture
SRAM-A self-timed Approach:

In a self-timed SRAM, the address transition detection (ATD) circuit acts as the source of most timing signals and is an integral part of the critical timing path. Its speed is therefore of utmost importance. A possible implementation of an ATD is shown in Fig. 4.52. It consists of a number of transition-triggered one-shots. A transition on any of the input signals causes the ATD output to go low for a period td. The resulting pulse acts as the main timing reference for the rest of the memory, which results in a huge fan-out.
Fig. 4.52 Address Transition Detection circuitry
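The ATD behaviour can be sketched in discrete time. The output stays at 1 and drops to 0 for td time steps after any address bit changes. A further transition during the pulse re-triggers it, which approximates several one-shot outputs being combined into one signal. The time-step model and the function name `atd` are assumptions.

```python
def atd(address_samples, td):
    """Discrete-time sketch of an ATD output: normally 1, pulled to 0 for td
    time steps after any address bit changes (each change re-triggers the pulse)."""
    out = []
    remaining = 0
    previous = address_samples[0]
    for current in address_samples:
        if current != previous:   # a transition on any address input
            remaining = td        # fire (or re-fire) the one-shot
        previous = current
        out.append(0 if remaining > 0 else 1)
        if remaining > 0:
            remaining -= 1
    return out


print(atd(["00", "00", "01", "01", "01", "01", "11", "11"], td=2))
# [1, 1, 0, 0, 1, 1, 0, 0]
```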
6.4 Assignments
Assignments (For higher level learning and Evaluation - Examples: Case study, Comprehensive design, etc.)
UNIT IV
Q.No | Questions | CO Level | K Level
1 | Generate a Braun multiplier and implement it using Verilog coding. | CO4 | K3
2 | Construct a Wallace tree multiplier and write Verilog code for it. | CO4 | K3
3 | Compare the parameters of various types of adders and multipliers. | CO4 | K3
4 | Explain in detail the speed trade-offs of data path circuits. | CO4 | K3
5 | Explain the logarithmic shifter. | CO4 | K3
6.5 Part A Q & A Unit-IV
1. How can a data path be implemented in a VLSI system? [CO4, K1]
A data path is best implemented in a bit-sliced fashion: a single layout is used repetitively for every bit in the data word. This regular approach eases the design effort and results in fast and dense layouts.
2. Write a short note on the performance of a ripple carry adder. [CO4, K1]
The delay of a ripple carry adder is linearly proportional to the number of bits. Circuit optimizations therefore concentrate on reducing the delay of the carry path. A number of circuit topologies exist, which show that careful optimization of the circuit topology and the transistor sizes helps to reduce the capacitance on the carry bit.
3. What are the advantages of a ripple carry adder? [CO4, K1]
The circuit realization is very simple, it consumes less power, and its compact layout gives a smaller chip area.
4. What is a carry skip adder? [CO4, K1]
A carry skip adder consists of a simple ripple carry adder with a special speed-up carry chain called a skip chain. The carry skip circuitry consists of two logic gates. The AND gate accepts the carry-in bit and compares it with the group propagate signal. The OR gate then combines this result with the ripple carry-out of the group.
5. What is a mirror adder? [CO4, K2]
In this circuit realization, the PMOS network is identical to the NMOS network rather than being its conduction complement, so the topology is called a mirror adder.
6. What are the advantages of a carry skip adder? [CO4, K1]
Its propagation delay is smaller than that of a ripple carry adder when optimal stage sizes are used. A carry skip adder with optimized (variable-width) stages has been shown to be superior to one built from constant-width carry skip modules, and the advantage is greater at high precisions.
7. What logic is used in adders to increase their performance? [CO4, K2]
Other adder structures use logic optimizations to increase performance (carry bypass, carry select, carry look-ahead). The performance increase comes at the cost of area.
8. Define input ordering. [CO4, K1]
In series-connected PMOS and NMOS stacks, the inner transistors experience the body effect, which raises their threshold voltage, so they need a higher gate voltage to turn on. With input ordering, the rarely changing inputs are moved to the inner transistors. This provides significant power saving.
9. Write down the expression for the worst-case delay of an RCA. [CO4, K2]
$t = (n-1)\,t_c + t_s$, where n is the number of bits, $t_c$ is the carry propagation delay of one bit and $t_s$ is the sum generation delay.
10. Write down the expression for the delay of an N-bit carry bypass adder. [CO4, K2]
$t_{adder} = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M-1)\,t_{carry} + t_{sum}$, where M is the number of bits per bypass stage.
11. Define the Braun multiplier. [CO4, K1]
The simplest multiplier is the Braun multiplier. All the partial products are computed in parallel and then collected through a cascade of carry save adders. The completion time is limited by the depth of the carry save array and by the carry propagation in the final adder. This multiplier is suitable only for positive operands.
12. Why do we go for Booth's algorithm? [CO4, K1]
Booth's algorithm is a method that reduces the number of multiplicand multiples. For a given range of numbers to be represented, a higher representation radix leads to fewer digits.
13. List the different types of shifters. [CO4, K1]
Array shifter, barrel shifter and logarithmic shifter.
14. What are the various shift operations available? [CO4, K1]
Logical left shift, logical right shift, arithmetic left shift and arithmetic right shift.
15. How many storage locations are available when a memory device has twelve address lines? [CO4, K2]
The number of storage locations available for a memory device with 12 address lines is $2^{12} = 4096$.
16. A 4 Mbit SRAM can be designed as a composition of 32 blocks, each of which contains 128 Kbits. Each block is structured as an array with 1024 rows and 128 columns. Find the number of row address bits (x), column address bits (y) and block address bits (z). [CO4, K2]
Number of rows = 1024: $2^x = 1024 = 2^{10}$, so row address x = 10
Number of columns = 128: $2^y = 128 = 2^7$, so column address y = 7
Number of blocks = 32: $2^z = 32 = 2^5$, so block address z = 5
17. Why is the diode-based ROM cell not suitable for large memories? [CO4, K1]
The disadvantage of the diode cell is that it does not isolate the bit line from the word line. All the current required to charge the bit line capacitance has to be provided through the word line and its driver, and this capacitance can be quite high for large memories. Therefore, the diode cell is suitable only for small memories.
18. Draw the schematic symbol of FAMOS. [CO4, K2]
19. Draw the 6T CMOS SRAM cell. [CO4, K2]
20. Draw the 3T CMOS DRAM cell. [CO4, K2]
21. Draw the 1T CMOS DRAM cell. [CO4, K2]
22. What is CAM? [CO4, K2]
A CAM is a special type of memory device that stores data. It can also compare all the stored data in parallel with incoming data in an efficient manner. The cell combines a 6T RAM storage cell with a 1-bit digital comparator.
23. Draw a basic differential sense amplifier circuit. [CO4, K2]
24. Why are NOR decoders faster than NAND decoders? [CO4, K2]
In a NOR decoder, each output is discharged through transistors connected in parallel, so each discharge path contains a single transistor. A NAND decoder discharges its output through a series stack. NOR decoders are therefore substantially faster. However, they consume more area than their NAND counterparts and drastically more power. The reason is clear from the following observation. After the precharge, only a single word line is pulled down in a NAND decoder, while in the NOR decoder all wires except one are discharged.
25. Why are sense amplifiers used in semiconductor memories? [CO4, K2]
The area of a semiconductor memory cell can be reduced by sacrificing some desired properties of digital circuits, such as noise margin, logic swing, input/output isolation, fan-out or speed. For example, it is common to reduce the voltage swing on the bit lines to a value substantially below the supply voltage. This reduces both the propagation delay and the power consumption. On the other hand, interfacing with the external world requires large signals, so sense amplifiers are used to amplify the internal swing to full rail-to-rail amplitude.
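The delay expressions in Part A questions 9 and 10 can be compared numerically. In this sketch all unit delays ($t_c$, $t_s$, $t_{setup}$, $t_{carry}$, $t_{bypass}$, $t_{sum}$) are assumed equal to 1 purely for illustration. The stage size M is swept to show that the bypass adder has an optimum.

```python
def rca_delay(n, t_c, t_s):
    """Worst-case ripple-carry delay (Part A, Q9): t = (n - 1) * t_c + t_s."""
    return (n - 1) * t_c + t_s


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Carry-bypass delay (Part A, Q10) for N bits split into N/M stages of M bits."""
    if n % m:
        raise ValueError("N must be a multiple of the stage size M")
    return t_setup + m * t_carry + (n // m - 1) * t_bypass + (m - 1) * t_carry + t_sum


# All unit delays are assumed equal to 1 purely to compare the two formulas.
print(rca_delay(32, 1, 1))              # 32
print(bypass_delay(32, 4, 1, 1, 1, 1))  # 16
best = min((bypass_delay(32, m, 1, 1, 1, 1), m) for m in (2, 4, 8, 16))
print(best)                             # (16, 4): fastest stage size under these assumptions
```

Under these assumptions a 32-bit bypass adder with 4-bit stages halves the ripple-carry delay. Stages that are too short or too long both lose the advantage.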
6.6 Part B Questions
S.No | Questions | CO | K Level
1 | Describe in detail Booth's multiplication algorithm and its hardware implementation. | CO4 | K2
2 | Discuss the different high speed adders. | CO4 | K2
3 | Discuss in detail low power memory circuits. | CO4 | K2
4 | Draw the diagram of a carry look-ahead adder and explain its principle. | CO4 | K2
5 | Explain multipliers with the necessary diagrams. | CO4 | K2
6 | Explain the speed trade-offs in arithmetic building blocks. | CO4 | K2
7 | Explain briefly: i) Barrel shifter ii) Ripple carry adder | CO4 | K2
8 | Discuss the following high speed adders: i) Carry bypass adder ii) Carry propagation adder | CO4 | K2
9 | Design a Booth multiplier with a carry save adder implementation to add the partial products. | CO4 | K2
10 | Explain the mirror implementation of the adder in detail. | CO4 | K2
11 | Explain the array multiplier in detail. | CO4 | K2
12 | Explain the construction and working of SRAM and DRAM. | CO4 | K2
13 | Explain the building blocks of memory architecture. | CO4 | K2
6.7 Supportive online Certification courses (NPTEL, Swayam, Coursera, Udemy, etc.,) for EC8095 VLSI DESIGN
S.No | Name of the Course | Name of the online platform | Duration | Link
1 | VLSI CAD Part-1 | Coursera | 23 hours | https://www.coursera.org/learn/vlsi-cad-logic
2 | MOS Transistor | Coursera | 18 hours | https://www.coursera.org/learn/mosfet
3 | CMOS Digital VLSI Design | NPTEL | 8 weeks | https://onlinecourses.nptel.ac.in/noc21_ee09/preview
4 | VSD Custom Layout | Udemy | 4.5 hours | https://www.udemy.com/course/vlsi-academy-custom-layout/
5 | VSD Physical Design | NPTEL | 5 hours | https://nptel.ac.in/courses/106105161
Supportive Links to Videos: Unit IV
S.No | Topic | Link
1 | 4-Bit Carry Look-Ahead Adder: transistor-level implementation using static CMOS logic | https://www.youtube.com/watch?v=WItAXzrfPrE
2 | High-speed adders | https://www.youtube.com/watch?v=b9Upbz5jvC8
3 | Array Multiplier | https://www.youtube.com/watch?v=5-PI4T25OXI
4 | SRAM Operation - Memory and Storage Circuits | https://www.youtube.com/watch?v=0BM97a7p6Zo
5 | VLSI Design Technology | https://youtu.be/PUxrn6NJIlM
6 | VLSI Physical Design | https://youtu.be/lRpt1fCHd8Y
7 | Design of Adders | https://youtu.be/6E_cAf0Keng
6.8 Real-time Applications in Day-to-Day Life and in Industry
1. Phased-array technologies with beam-steering capability continue to be a major
application for millimeter-wave CMOS, driving the development of highly
integrated millimeter-wave transistors. These applications may lead to reduced
costs and greater adoption of CMOS MMICs for radar, backhaul, 5G and
gigabit Wi-Fi.
2. CMOS technology is used in a wide range of analog circuits, including data
converters, image sensors and highly integrated transceivers for many kinds of
communication.
3. It is used in designing computer memories and CPUs.
4. It is used in implementing microprocessor designs.
5. It is used in designing flash memory chips.
6. It is used in designing application-specific integrated circuits (ASICs).
7. It is used in everyday products such as pen drives, external hard disks, smart
watches, air conditioners, washing machines, microwave ovens, refrigerators,
calculators, toasters, dishwashers, digital alarm clocks and thermostats.
6.9 Contents beyond the syllabus
Unit IV: 8-BIT KOGGE-STONE ADDER
The operation of the Kogge-Stone adder (KSA) is easiest to understand by analyzing it in three distinct parts:
1. Pre-processing: This step computes the generate and propagate signals for each pair of bits in A and B. These signals are given by the logic equations below:
$p_i = A_i \oplus B_i$
$g_i = A_i \cdot B_i$
2. Carry look-ahead network: This block differentiates the KSA from other adders and is the main reason for its high performance. This step computes the carry for each bit, using group propagate and generate signals as intermediate signals. They are given by the logic equations below (for $i > k \ge j$):
$P_{i:j} = P_{i:k+1} \cdot P_{k:j}$
$G_{i:j} = G_{i:k+1} + P_{i:k+1} \cdot G_{k:j}$
With a carry-in of 0, the carry out of bit $i$ is $C_i = G_{i:0}$.
3. Post-processing: This is the final step and is common to all adders of this (carry look-ahead) family. It computes the sum bits using the logic below:
$S_i = p_i \oplus C_{i-1}$
4. Illustration: The working of the KSA can be understood from Fig. 1, which corresponds to a 4-bit KSA. The 4-bit case is shown for simplicity.
5. Implementation: The schematic of the KSA is implemented using the following building blocks:
1. Bit propagate and generate: This block implements the
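The three parts of the KSA described above can be checked with a short bit-level behavioral model. It does pre-processing, then the carry look-ahead prefix network, then post-processing. The sketch below is written in Python under stated assumptions. It is not a transistor-level or HDL implementation of the schematic. The function name `kogge_stone_add`, the default width of 8 bits and the optional carry-in folded into bit 0 are illustrative choices. At the stage with span d, each bit i combines its (G, P) pair with the pair d positions below it using the group equations above. The span doubles every stage, so an n-bit adder needs log2(n) stages (3 stages for 8 bits).

```python
from typing import Tuple


def kogge_stone_add(a: int, b: int, n: int = 8, cin: int = 0) -> Tuple[int, int]:
    """Behavioral model of an n-bit Kogge-Stone adder (hypothetical helper).

    Returns (sum modulo 2**n, carry out of the most significant bit).
    """
    # 1. Pre-processing: p_i = A_i xor B_i, g_i = A_i and B_i for every bit.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]

    # Fold the carry-in into bit 0 so that G[i] ends up as the carry out of bit i.
    G = g[:]
    P = p[:]
    G[0] = g[0] | (p[0] & cin)

    # 2. Carry look-ahead (prefix) network: every bit updates in parallel each stage.
    d = 1
    while d < n:
        new_G, new_P = G[:], P[:]
        for i in range(d, n):
            # G_{i:j} = G_{i:k+1} + P_{i:k+1} * G_{k:j};  P_{i:j} = P_{i:k+1} * P_{k:j}
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2

    # After the last stage G[i] = G_{i:0}, i.e. the carry out of bit i.
    # 3. Post-processing: S_i = p_i xor C_{i-1}, where C_{-1} is the carry-in.
    carries = [cin] + G[:-1]
    s = 0
    for i in range(n):
        s |= (p[i] ^ carries[i]) << i
    return s, G[n - 1]
```

For example, `kogge_stone_add(0xFF, 0x01)` returns `(0, 1)`. The carry generated in bit 0 propagates through every higher bit, so all sum bits are 0 and the carry out is 1. Copying the lists at each stage models the fact that all bits in a KSA stage update simultaneously from the previous stage's outputs.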
7. ASSESSMENT SCHEDULE
Assessment | Proposed Date | Actual Date
Unit 1 Assignment Assessment | |
Unit Test 1 | |
Unit 2 Assignment Assessment | |
Internal Assessment 1 | 27.02.2023 |
Retest for IA 1 | |
Unit 3 Assignment Assessment | |
Unit Test 2 | |
Unit 4 Assignment Assessment | |
Internal Assessment 2 | 18.04.2023 |
Retest for IA 2 | |
Unit 5 Assignment Assessment | |
Revision Test 1 | |
Revision Test 2 | |
Model Exam | 11.05.2023 |
Remodel Exam | |
University Exam | |
8. Prescribed Text Books & Reference Books
TEXT BOOKS:
1. Neil H.E. Weste, David Money Harris, "CMOS VLSI Design: A Circuits
and Systems Perspective", 4th Edition, Pearson, 2017 (UNIT I, II, V)
2. Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, "Digital
Integrated Circuits: A Design Perspective", Second Edition, Pearson,
2016 (UNIT III, IV)
REFERENCES:
3. M.J. Smith, "Application Specific Integrated Circuits", Addison-Wesley,
1997
4. Sung-Mo Kang, Yusuf Leblebici, Chulwoo Kim, "CMOS Digital
Integrated Circuits: Analysis & Design", 4th Edition, McGraw Hill
Education, 2013
5. Wayne Wolf, "Modern VLSI Design: System On Chip",
Pearson Education, 2007
6. R. Jacob Baker, Harry W. Li, David E. Boyce, "CMOS Circuit
Design, Layout and Simulation", Prentice Hall of India, 2005
9. MINI PROJECT
S.No | Name of the Mini Project | K Level
1 | Design and implementation of a carry look-ahead adder using Xilinx | K3
2 | Design of a 64-bit ALU using Xilinx | K3
3 | Design of an SRAM cell using Xilinx | K3
4 | Design and implementation of high-speed adders using DSCH2 | K3
5 | Design and implementation of a carry-save adder using Xilinx and DSCH2 | K2
Thank you
