VLSI Subsystem Design: Data Path and Array Multipliers

VLSI
UNIT IV
SUBSYSTEM DESIGN
Dr. D SUDHA (ASSISTANT PROFESSOR)
CONTENTS
DATA PATH SUBSYSTEMS: Subsystem Design, Shifters, Adders, ALUs, Multipliers, Parity generators, Comparators, Zero/One Detectors, Counters.
ARRAY SUBSYSTEMS: SRAM, DRAM, ROM, Serial Access Memories, Content Addressable Memory.

Outline
UNIT IV: DATA PATH SUBSYSTEMS
– Shifters, Adders
– ALUs
– Multipliers
– Parity generators
– Comparators
– Zero/One Detectors
– Counters
3 Department of Electronics and Communication Engineering, VIDYA SAGAR P

Multiplication
– Example:
        1100 : 12 (decimal)  multiplicand
      × 0101 :  5 (decimal)  multiplier
        ----
        1100
       0000      partial
      1100       products
     0000
    --------
    00111100 : 60 (decimal)  product
– M × N-bit multiplication
– Produce N M-bit partial products
– Sum these to produce the (M+N)-bit product
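The shift-and-add procedure above can be sketched in a few lines of Python. This is only an illustration; the function name `multiply_shift_add` and its parameters are assumptions, not part of the slides.

```python
def multiply_shift_add(multiplicand, multiplier, n_bits):
    """Form one partial product per multiplier bit, then sum them."""
    partials = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        # Each partial product is the multiplicand (or 0) shifted left by i.
        partials.append((multiplicand << i) if bit else 0)
    return partials, sum(partials)


partials, product = multiply_shift_add(0b1100, 0b0101, 4)
print([format(p, "08b") for p in partials])
print(format(product, "08b"))
```

For the example operands 1100 and 0101, the two nonzero partial products are 1100 and 1100 shifted left by two places.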

General Form
– Multiplicand: Y = (y_{M-1}, y_{M-2}, …, y_1, y_0)
– Multiplier: X = (x_{N-1}, x_{N-2}, …, x_1, x_0)
– Product:
  $$P = \left(\sum_{j=0}^{M-1} y_j 2^j\right)\left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} x_i y_j 2^{i+j}$$
– Partial products for M = N = 6:
                              y5   y4   y3   y2   y1   y0     multiplicand
                              x5   x4   x3   x2   x1   x0     multiplier
                              x0y5 x0y4 x0y3 x0y2 x0y1 x0y0
                         x1y5 x1y4 x1y3 x1y2 x1y1 x1y0
                    x2y5 x2y4 x2y3 x2y2 x2y1 x2y0             partial
               x3y5 x3y4 x3y3 x3y2 x3y1 x3y0                  products
          x4y5 x4y4 x4y3 x4y2 x4y1 x4y0
     x5y5 x5y4 x5y3 x5y2 x5y1 x5y0
p11  p10  p9   p8   p7   p6   p5   p4   p3   p2   p1   p0     product

Dot Diagram
– Each dot represents a bit
[Figure: dot diagram of the partial products, one row of dots per multiplier bit x0 … x15]

A 4 × 4 Unsigned Array Multiplier
                        X3    X2    X1    X0
                    ×   Y3    Y2    Y1    Y0
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
P7    P6    P5    P4    P3    P2    P1    P0
(skew array for rectangular layout)

Array Multiplier
[Figure: array multiplier with multiplicand bits y3 … y0 across the top and multiplier bits x0 … x3 down the side. A CSA array feeds a final CPA that produces p7 … p0. Each cell is a full adder with inputs A, B, Sin, Cin and outputs Sout, Cout; the critical path is marked.]

Rectangular Array
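– Squash array to fit rectangular floorplan
[Figure: rectangular array with y3 … y0 across the top and x0 … x3 down the side; p0 … p3 leave on the right, one per row, and p7 … p4 leave along the bottom]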

Wallace Tree
– Reduces the number of partial products

– Built from carry-save adders:
  – Three inputs: a, b, c
  – Two outputs: y, z such that y + z = a + b + c
– Carry-save equations:
  – y_i = a_i ⊕ b_i ⊕ c_i
  – z_{i+1} = a_i b_i + b_i c_i + c_i a_i
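The carry-save equations can be checked with a word-level sketch. The function name `carry_save_add` is an assumption made for illustration; it applies the two equations to every bit position at once using bitwise operators.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (y, z) with y + z == a + b + c."""
    y = a ^ b ^ c                             # per-bit sum, no carry propagation
    z = ((a & b) | (b & c) | (c & a)) << 1    # per-bit carry, weighted one place up
    return y, z


y, z = carry_save_add(0b1011, 0b0110, 0b0111)
print(format(y, "b"), format(z, "b"), y + z)
```

No carry ripples between bit positions, which is why a level of carry-save adders has the delay of a single full adder regardless of word width.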

Wallace Tree Structure
[Figure: a carry-ripple adder built from three full adders (outputs s2, s1, s0) compared with a carry-save adder built from three full adders on inputs a_i, b_i, c_i (outputs y2 … y0 and z3 … z1)]

Wallace Tree Operation
– n operands are reduced to about 2n/3 operands after each level
– Sum of inputs = sum of outputs
– Can apply the reduction hierarchically
– A more efficient design uses 4-2 adders to reduce n operands to n/2 operands after each level
– Need a final adder to add the last two numbers

Signed Multiplication
– Signed number representation
  $$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$$
– Signed n×n multiplication
  – (1110)_2 × (0011)_2 = (1010)_2, i.e. (-2) × 3 = (-6)
  – No difference from unsigned multiplication if the result has the same bit-width as the input
  – But what if we want the result to be 2n bits?
    – Use sign-bit extension
    – Needs a 2n × 2n array multiplier

Baugh-Wooley Multiplier: Principle
– Expand the two's-complement product and separate the sign-bit terms:
  $$XY = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - \sum_{i=0}^{n-2} (x_{n-1} y_i + y_{n-1} x_i) 2^{i+n-1}$$
– Replace the subtracted terms with complemented bits (an overbar denotes the bit complement):
  $$XY = (x_{n-1} y_{n-1} - x_{n-1} - y_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^{i+n-1}$$
– Remove the remaining negative bit terms:
  $$XY = -2^{2n-1} + (\bar{x}_{n-1} + \bar{y}_{n-1} + x_{n-1} y_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^{i+n-1}$$
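As a sanity check on the Baugh-Wooley rearrangement, the sketch below evaluates the final form term by term: a constant −2^(2n−1), the sign-bit terms at weights 2^(2n−2) and 2^(n−1), the unsigned core products, and the complemented cross terms. The helper names `bits`, `to_signed` and `baugh_wooley` are assumptions for illustration, and the code is a numerical check rather than a hardware description.

```python
def bits(v, n):
    """Return the n low bits of v, least significant first."""
    return [(v >> i) & 1 for i in range(n)]


def to_signed(v, n):
    """Interpret an n-bit pattern as a two's-complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def baugh_wooley(x, y, n):
    """Evaluate the Baugh-Wooley expansion for n-bit patterns x and y."""
    xb, yb = bits(x, n), bits(y, n)
    xs, ys = xb[n - 1], yb[n - 1]                      # sign bits
    total = -(1 << (2 * n - 1))                        # constant term
    total += ((1 - xs) + (1 - ys) + xs * ys) << (2 * n - 2)
    total += (xs + ys) << (n - 1)
    for i in range(n - 1):
        for j in range(n - 1):
            total += (xb[i] * yb[j]) << (i + j)        # unsigned core
        # cross terms use complemented magnitude bits
        total += (xs * (1 - yb[i]) + ys * (1 - xb[i])) << (i + n - 1)
    return total


print(baugh_wooley(0b1110, 0b0011, 4))
```

Every term except the leading constant is non-negative, which is what lets the array be built from ordinary adder cells.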
Two’s Complement Array Multiplication
Modified Baugh-Wooley two’s complement multiplier

Baugh-Wooley Multiplier: Structure
[Figure: 4 × 4 Baugh-Wooley array of adder cells (inputs a, b, Cin; outputs Sum, Cout). Rows of partial-product bits x_i y_0, x_i y_1, x_i y_2, x_i y_3 are added in turn; P0 … P2 leave from the right edge of the first three rows and P3 … P7 from the final row.]

Fewer Partial Products
– Array multiplier requires N partial products

– If we looked at groups of r bits, we could form N/r partial products.
– Faster and smaller?
– Called radix-2^r encoding
– Ex: r = 2: look at pairs of bits
– Form partial products of 0, Y, 2Y, 3Y
– First three are easy, but 3Y requires an adder

Booth Multiplier
– Utilize the Booth encoding scheme
– Booth encoding scheme
  – Handles signed multiplication
  – Reduces the number of partial products by half
  – Small area and fast
  – Encoding scheme cannot be applied hierarchically
– Often used as the first stage of partial-product reduction

Booth Encoding: Principle
– Two's-complement form of multiplier Y:
  $$Y = -y_{n-1} 2^{n-1} + y_{n-2} 2^{n-2} + y_{n-3} 2^{n-3} + \dots$$
– Regroup the terms two bit positions at a time:
  $$Y = (-2y_{n-1} + y_{n-2} + y_{n-3}) 2^{n-2} + (-2y_{n-3} + y_{n-4} + y_{n-5}) 2^{n-4} + \dots$$
– Consider the first two terms of the product:
  $$XY = (-2y_{n-1} + y_{n-2} + y_{n-3}) X 2^{n-2} + (-2y_{n-3} + y_{n-4} + y_{n-5}) X 2^{n-4} + \dots$$
– By looking at three bits of Y, we can determine whether to add 0, ±X, or ±2X to the partial product.
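The grouping of multiplier bits into overlapping triples can be sketched as follows. The function name `booth_radix4_digits` is an assumption, as is the convention that the bit below the least significant bit is 0; each digit is computed as −2·y_{2k+1} + y_{2k} + y_{2k−1}.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement pattern y (n even) into
    radix-4 digits in {-2, -1, 0, 1, 2}, least significant first."""
    digits = []
    prev = 0                        # assumed bit below the LSB
    for k in range(0, n, 2):
        b0 = (y >> k) & 1
        b1 = (y >> (k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                   # top bit overlaps into the next triple
    return digits


digits = booth_radix4_digits(0b111011, 6)   # -5 as a 6-bit pattern
print(digits, sum(d * 4 ** k for k, d in enumerate(digits)))
```

Each digit selects one partial product, so an n-bit multiplier needs n/2 partial products instead of n.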

Booth Encoding
– Instead of 3Y, try –Y, then increment next partial product to add 4Y
– Similarly, for 2Y, try –2Y + 4Y in next partial product

Booth Hardware
– Booth encoder generates control lines for each PP

– Booth selectors choose PP bits

Sign Extension
– Partial products can be negative

– Require sign extension, which is cumbersome
– High fanout on most significant bit
[Figure: dot diagram of Booth partial products PP0 … PP8 for multiplier bits x0 … x15, padded with x-1 = 0 below and x16 = x17 = 0 above; each partial product is extended with sign bits (s) up to the full product width]

To begin
– When using Booth's Algorithm:
  – You will need twice as many bits in your product as you have in your original two operands.
  – Decide which operand will be the multiplier and which will be the multiplicand
  – Convert both operands to two's complement representation using X bits
    – X must be at least one more bit than is required for the binary representation of the numerically larger operand
  – Begin with a product that consists of the multiplier with an additional X leading zero bits

Example
– In the week by week, there is an example of multiplying 2 x (-5)

– For our example, let's reverse the operation, and multiply (-5) x 2
– The numerically larger operand (5) would require 3 bits to represent in binary (101).
So we must use AT LEAST 4 bits to represent the operands, to allow for the sign bit.
– Let's use 5-bit 2's complement:
– -5 is 11011 (multiplier)
– 2 is 00010 (multiplicand)

Beginning Product
– The multiplier is:
11011
– Add 5 leading zeros to the multiplier to get the beginning product:
00000 11011

Step 1 for each pass
– Use the LSB (least significant bit) and the previous LSB to determine the arithmetic action.
  – If it is the FIRST pass, use 0 as the previous LSB.
– Possible arithmetic actions:
  – 00  no arithmetic operation
  – 01  add multiplicand to left half of product
  – 10  subtract multiplicand from left half of product
  – 11  no arithmetic operation

Step 2 for each pass
– Perform an arithmetic right shift (ASR) on the entire product.
– NOTE: For X-bit operands, Booth's algorithm requires X passes.

Example
– Let's continue with our example of multiplying (-5) x 2

– Remember:
– -5 is 11011 (multiplier)
– 2 is 00010 (multiplicand)
– And we added 5 leading zeros to the multiplier to get the beginning product:
00000 11011

Example continued
– Initial Product and previous LSB
  00000 11011 0
  (Note: Since this is the first pass, we use 0 for the previous LSB)
– Pass 1, Step 1: Examine the last 2 bits
  00000 11011 0
  The last two bits are 10, so we need to:
  subtract the multiplicand from left half of product

Example: Pass 1 continued
– Pass 1, Step 1: Arithmetic action
  (1) 00000 (left half of product)
     -00010 (multiplicand)
      11110 (uses a phantom borrow)
– Place result into left half of product
  11110 11011 0
– Pass 1, Step 2: ASR (arithmetic shift right)
  – Before ASR
    11110 11011 0
  – After ASR
    11111 01101 1
    (left-most bit was 1, so a 1 was shifted in on the left)
– Pass 1 is complete.

Example: Pass 2
– Current Product and previous LSB
  11111 01101 1
– Pass 2, Step 1: Examine the last 2 bits
  11111 01101 1
  The last two bits are 11, so we do NOT need to perform an arithmetic action; just proceed to step 2.
– Pass 2, Step 2: ASR (arithmetic shift right)
  – Before ASR
    11111 01101 1
  – After ASR
    11111 10110 1
    (left-most bit was 1, so a 1 was shifted in on the left)

Example: Pass 3
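– Current Product and previous LSB
  11111 10110 1
– Pass 3, Step 1: Examine the last 2 bits
  11111 10110 1
  The last two bits are 01, so we need to:
  add the multiplicand to the left half of the product
– Pass 3, Step 1: Arithmetic action
      11111 (left half of product)
     +00010 (multiplicand)
      00001 (drop the leftmost carry)
– Place result into left half of product
  00001 10110 1
– Pass 3, Step 2: ASR (arithmetic shift right)
  – Before ASR
    00001 10110 1
  – After ASR
    00000 11011 0
    (left-most bit was 0, so a 0 was shifted in on the left)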

Example: Pass 4
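– Current Product and previous LSB
  00000 11011 0
– Pass 4, Step 1: Examine the last 2 bits
  00000 11011 0
  The last two bits are 10, so we need to:
  subtract the multiplicand from the left half of the product
– Pass 4, Step 1: Arithmetic action
  (1) 00000 (left half of product)
     -00010 (multiplicand)
      11110 (uses a phantom borrow)
– Place result into left half of product
  11110 11011 0
– Pass 4, Step 2: ASR (arithmetic shift right)
  – Before ASR
    11110 11011 0
  – After ASR
    11111 01101 1
    (left-most bit was 1, so a 1 was shifted in on the left)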

Example: Pass 5
– Current Product and previous LSB
  11111 01101 1
– Pass 5, Step 1: Examine the last 2 bits
  11111 01101 1
  The last two bits are 11, so we do NOT need to perform an arithmetic action; just proceed to step 2.
– Pass 5, Step 2: ASR (arithmetic shift right)
  – Before ASR
    11111 01101 1
  – After ASR
    11111 10110 1
    (left-most bit was 1, so a 1 was shifted in on the left)

Final Product
– We have completed 5 passes on the 5-bit operands, so we are done.
– Dropping the previous LSB, the resulting final product is:
  11111 10110

Verification
– To confirm we have the correct answer, convert the 2's complement final product back to decimal.
– Final product: 11111 10110
– Decimal value: -10, which is the CORRECT product of (-5) x 2
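The pass-by-pass procedure can be reproduced with a short simulation. This is a sketch only: the function name `booth_multiply` and the register names `left`, `right` and `prev` are assumptions, and it assumes the multiplicand is not the most negative X-bit value (whose negation does not fit in X bits).

```python
def booth_multiply(multiplier, multiplicand, x_bits):
    """Booth's algorithm on x_bits-wide two's-complement operands."""
    mask = (1 << x_bits) - 1
    left = 0                         # left half of the product
    right = multiplier & mask        # right half starts as the multiplier
    prev = 0                         # previous LSB, 0 on the first pass
    m = multiplicand & mask
    for _ in range(x_bits):
        pair = ((right & 1) << 1) | prev
        if pair == 0b01:             # add multiplicand to left half
            left = (left + m) & mask
        elif pair == 0b10:           # subtract multiplicand from left half
            left = (left - m) & mask
        # Arithmetic shift right across left : right : prev
        prev = right & 1
        right = (right >> 1) | ((left & 1) << (x_bits - 1))
        sign = left >> (x_bits - 1)
        left = (left >> 1) | (sign << (x_bits - 1))
    product = (left << x_bits) | right
    if product >> (2 * x_bits - 1):  # reinterpret as a signed 2X-bit value
        product -= 1 << (2 * x_bits)
    return product


print(booth_multiply(-5, 2, 5))
```

With multiplier -5, multiplicand 2 and 5-bit operands, the intermediate register contents follow the five passes of the worked example.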

Comparators
– 0's detector: A = 00…000
– 1's detector: A = 11…111
– Equality comparator: A = B
– Magnitude comparator: A < B

1's & 0's Detectors
– 1's detector: N-input AND gate
– 0's detector: NOTs + 1's detector (N-input NOR)
[Figure: allones detector built from AND gates on A7 … A0, and allzeros detector on A3 … A0]

Equality Comparator
– Check if each bit is equal (XNOR, aka equality gate)
– 1's detect on bitwise equality
[Figure: 4-bit equality comparator; XNOR gates on A[3], B[3] … A[0], B[0] feed a 1's detector that outputs A = B]

Magnitude Comparator
– Compute B – A and look at sign
– B – A = B + ~A + 1
– For unsigned numbers, carry out is sign bit
[Figure: 4-bit magnitude comparator; an adder computes B + ~A + 1 from A3 … A0 and B3 … B0 and produces the flags C, N and Z (A = B)]
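For unsigned operands, the B + ~A + 1 computation can be sketched as below. The function name `compare_unsigned` and the returned dictionary keys are assumptions for illustration; the carry out of the n-bit addition is 1 exactly when B ≥ A, and the zero flag marks A = B.

```python
def compare_unsigned(a, b, n=4):
    """Compare n-bit unsigned a and b by computing B - A = B + ~A + 1."""
    mask = (1 << n) - 1
    total = (b & mask) + (~a & mask) + 1
    diff = total & mask
    carry = (total >> n) & 1          # 1 when B >= A
    zero = int(diff == 0)             # 1 when A == B
    return {"C": carry, "Z": zero, "A<=B": bool(carry), "A==B": bool(zero)}


print(compare_unsigned(0b0101, 0b1001))
print(compare_unsigned(0b1001, 0b0101))
```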
Signed vs. Unsigned
– For signed numbers, comparison is harder
  – C: carry out
  – Z: zero (all bits of B – A are 0)
  – N: negative (MSB of result)
  – V: overflow (inputs had different signs, output sign ≠ sign of B)
  – S: N xor V (sign of result)

Shifters
– Logical Shift:
  – Shifts number left or right and fills with 0's
  – 1011 LSR 1 = 0101    1011 LSL 1 = 0110
– Arithmetic Shift:
  – Shifts number left or right. Right shift sign extends
  – 1011 ASR 1 = 1101    1011 ASL 1 = 0110
– Rotate:
  – Shifts number left or right and fills with lost bits
  – 1011 ROR 1 = 1101    1011 ROL 1 = 0111
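The six shift types can be written as small functions on n-bit words. The function names below simply reuse the mnemonics from the slide; the signatures are assumptions for illustration.

```python
def lsr(x, k, n):
    """Logical shift right: fill with 0s."""
    return (x & ((1 << n) - 1)) >> k


def lsl(x, k, n):
    """Logical shift left: fill with 0s, drop bits above n."""
    return (x << k) & ((1 << n) - 1)


def asr(x, k, n):
    """Arithmetic shift right: replicate the sign bit."""
    mask = (1 << n) - 1
    y = (x & mask) >> k
    if (x >> (n - 1)) & 1:
        y |= mask & ~(mask >> k)      # set the k vacated top bits
    return y


def asl(x, k, n):
    """Arithmetic shift left: same bits as a logical shift left."""
    return lsl(x, k, n)


def ror(x, k, n):
    """Rotate right: bits shifted out re-enter on the left."""
    k %= n
    x &= (1 << n) - 1
    return ((x >> k) | (x << (n - k))) & ((1 << n) - 1)


def rol(x, k, n):
    """Rotate left: bits shifted out re-enter on the right."""
    k %= n
    x &= (1 << n) - 1
    return ((x << k) | (x >> (n - k))) & ((1 << n) - 1)


for f in (lsr, lsl, asr, asl, ror, rol):
    print(f.__name__, format(f(0b1011, 1, 4), "04b"))
```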

Funnel Shifter
– A funnel shifter can do all six types of shifts
– Selects N-bit field Y from 2N–1-bit input
– Shift by k bits (0 ≤ k < N)
– Logically involves N N:1 multiplexers
– Is the most general kind of shifter
  – Can do all the other shifts.
  – Concatenates two n-bit words together and then selects any contiguous n-bit subfield.
  – If A = B, get a barrel shifter
  – If A = sign bit, get arithmetic shifts
  – And it does byte inserts too.
– Can implement this shifter using a cross-bar switch, where the inputs are vertical and the outputs are horizontal
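A behavioural sketch of the funnel shifter follows. The function name `funnel_shift` and the argument names `hi` and `lo` (the two concatenated n-bit words) are assumptions; the choice of what to feed into `hi` and `lo` for each shift type is one common arrangement, not necessarily the one drawn in the slides.

```python
def funnel_shift(hi, lo, k, n):
    """Concatenate hi:lo (2n bits) and select the n-bit field starting at bit k."""
    mask = (1 << n) - 1
    z = ((hi & mask) << n) | (lo & mask)
    return (z >> k) & mask


n, a, k = 4, 0b1011, 1
sign_fill = (1 << n) - 1 if (a >> (n - 1)) & 1 else 0

print(format(funnel_shift(0, a, k, n), "04b"))           # logical right
print(format(funnel_shift(sign_fill, a, k, n), "04b"))   # arithmetic right
print(format(funnel_shift(a, a, k, n), "04b"))           # rotate right
print(format(funnel_shift(a, 0, n - k, n), "04b"))       # logical left
print(format(funnel_shift(a, a, n - k, n), "04b"))       # rotate left
```

Left shifts reuse the same selector with offset n − k, which is why a left-capable funnel shifter needs that offset computed.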

Funnel Shifter Operation
– Computing N-k requires an adder

Simplified Funnel Shifter
– Optimize down to 2N-1 bit input

Logarithmic Funnel Shifter
 Log N stages of 2-input muxes

 No select decoding needed

Barrel Shifter
– Barrel shifters perform right rotations using wrap-around wires.
– Left rotations are right rotations by N – k = ~k + 1 bits (modulo N).
– Shifts are rotations with the end bits masked off.

4-Bit Barrel Shifter
• A rotate is a shift in which the bits shifted out are inserted into the positions vacated
• The circuit rotates its contents left from 0 to 3 positions depending on Selector S.
• Note that a left rotation by three (3) positions is the same as a right rotation by one position in this 4-bit barrel shifter

Logarithmic Barrel Shifter
[Figure panels: Right shift only; Right/Left shift; Right/Left Shift & Rotate]

ADDERS
– Single-bit Addition
– Carry-Ripple Adder
– Carry-Skip Adder
– Carry-Lookahead Adder
– Carry-Select Adder
– Carry Save Adder

Single-Bit Addition
Half Adder
  S = A ⊕ B
  Cout = A • B

  A B | Cout S
  0 0 |  0   0
  0 1 |  0   1
  1 0 |  0   1
  1 1 |  1   0

Full Adder
  S = A ⊕ B ⊕ C
  Cout = MAJ(A, B, C)

  A B C | Cout S
  0 0 0 |  0   0
  0 0 1 |  0   1
  0 1 0 |  0   1
  0 1 1 |  1   0
  1 0 0 |  0   1
  1 0 1 |  1   0
  1 1 0 |  1   0
  1 1 1 |  1   1

PGK
– For a full adder, define what happens to carries (in terms of A and B)
  – Generate: Cout = 1 independent of C
    – G = A • B
  – Propagate: Cout = C
    – P = A ⊕ B
  – Kill: Cout = 0 independent of C
    – K = ~A • ~B

Full Adder Design
– Brute force implementation from eqns
  S = A ⊕ B ⊕ C
  Cout = MAJ(A, B, C)
[Figure: transistor-level full adder implementing S and Cout (MAJ) directly from the equations]

Carry Propagate Adders
– N-bit adder called CPA
– Each sum bit depends on all previous carries
– How do we compute all these carries quickly?
[Figure: N-bit CPA with inputs A_{N...1}, B_{N...1}, Cin and outputs S_{N...1}, Cout]
– Example (4-bit):
     Cin = 0        Cin = 1
     00000          11111     carries
      1111           1111     A4...1
     +0000          +0000     B4...1
      1111           0000     S4...1

Carry-Ripple Adder
– Simplest design: cascade full adders
– Critical path goes from Cin to Cout
– Design full adder to have fast carry delay
[Figure: four cascaded full adders with inputs A4 B4 … A1 B1 and Cin, internal carries C1, C2, C3, and outputs S4 … S1 and Cout]
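A bit-level sketch of the cascade is given below. The names `full_adder` and `ripple_carry_add` are assumptions for illustration; bit 0 of each integer plays the role of the least significant adder stage.

```python
def full_adder(a, b, c):
    """One-bit full adder: S = A xor B xor C, Cout = MAJ(A, B, C)."""
    s = a ^ b ^ c
    cout = (a & b) | (b & c) | (a & c)
    return s, cout


def ripple_carry_add(a, b, cin=0, n=4):
    """Cascade n full adders; the carry ripples from stage to stage."""
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b1111, 0b0000, cin=1))
```

The loop makes the critical path visible: each iteration needs the carry from the previous one, so delay grows linearly with n.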

Generate / Propagate
– Equations often factored into G and P

– Generate and propagate for groups spanning
  c_{i+1} = G_i + P_i • c_i
  s_i = P_i ⊕ c_i
  where G_i = a_i • b_i and P_i = a_i ⊕ b_i
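The generate/propagate formulation can be sketched directly from these equations. The function name `pg_add` is an assumption; the carries are computed here with the simple recurrence, whereas a lookahead or prefix adder would compute the same carries with group G and P signals in fewer logic levels.

```python
def pg_add(a, b, cin=0, n=4):
    """Add two n-bit numbers using bitwise generate and propagate signals."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # G_i = a_i . b_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]   # P_i = a_i xor b_i
    c = [cin]
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))                        # c_{i+1} = G_i + P_i . c_i
    s = 0
    for i in range(n):
        s |= (p[i] ^ c[i]) << i                               # s_i = P_i xor c_i
    return s, c[n]


print(pg_add(0b1011, 0b0110))
```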

P G Logic
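[Figure: PG adder organization in three stages. 1: Bitwise PG logic turns A4 B4 … A1 B1 and Cin into G4 P4 … G1 P1 and G0 P0. 2: Group PG logic forms G3:0, G2:0, G1:0, G0:0, which are the carries C3, C2, C1, C0. 3: Sum logic produces S4 … S1, with C4 as Cout.]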

Carry-Skip Adder
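– Carry-ripple is slow through all N stages
– Carry-skip allows carry to skip over groups of n bits
  – Decision based on n-bit propagate signal
[Figure: 16-bit carry-skip adder in four 4-bit groups (A16:13 B16:13 … A4:1 B4:1). Each group's propagate signal (P16:13, P12:9, P8:5, P4:1) drives a multiplexer that passes either the group's ripple carry or the incoming carry, giving C4, C8, C12 and Cout; sums are S16:13 … S4:1.]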

Carry-Select Adder
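– Trick for critical paths dependent on late input X
  – Precompute two possible outputs for X = 0, 1
  – Select proper output when X arrives
– Carry-select adder precomputes n-bit sums
  – For both possible carries into n-bit group
[Figure: 16-bit carry-select adder in four 4-bit groups (A16:13 B16:13 … A4:1 B4:1). The upper three groups each contain two adders, one with carry-in 0 and one with carry-in 1; multiplexers controlled by C4, C8 and C12 select the sums S16:13 … S8:5 and the next carry up to Cout. The lowest group adds with Cin directly.]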
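A behavioural sketch of carry-select addition is shown below. The function name `carry_select_add` and the default 16-bit width with 4-bit groups are assumptions chosen to match the figure; the width is assumed to be a multiple of the group size.

```python
def carry_select_add(a, b, cin=0, width=16, group=4):
    """Add by precomputing each group's sum for carry-in 0 and 1,
    then selecting with the carry that actually arrives."""
    mask = (1 << group) - 1
    result = 0
    carry = cin
    for shift in range(0, width, group):
        ga = (a >> shift) & mask
        gb = (b >> shift) & mask
        sum0 = ga + gb            # precomputed assuming carry-in 0
        sum1 = ga + gb + 1        # precomputed assuming carry-in 1
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << shift
        carry = chosen >> group   # carry out of this group
    return result, carry


print(carry_select_add(0x1234, 0x0FED))
print(carry_select_add(0xFFFF, 0x0001))
```

In hardware both candidate sums are computed in parallel, so only the multiplexer delay per group sits on the carry path.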

Carry Save Addition
– The carry-save adder block is the same circuit as the full adder
– The name “carry-save” arises from the fact that we save the carry-out word instead of using it immediately to calculate a final sum.
[Figure: four full adders with inputs X4 Y4 Z4 … X1 Y1 Z1 and outputs C4 S4 … C1 S1, drawn as an n-bit CSA block with inputs X_{N...1}, Y_{N...1}, Z_{N...1} and outputs C_{N...1}, S_{N...1}]

Counters
Counters can be implemented using adder/subtractor circuits and registers (or equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops because the toggle feature is naturally suited for the implementation of the counting operation. Counters are available in two categories.
1. Asynchronous (ripple) counters: Asynchronous counters, also known as ripple counters, are not clocked by a common pulse, and hence every flip-flop in the counter changes at a different time.
Ex: Binary ripple counters, BCD ripple counters
2. Synchronous counters: A synchronous counter, however, has an internal clock, and the external event is used to produce a pulse which is synchronized with this internal clock.
Ex: Binary counter, Up-down Binary counter, BCD Binary counter, Ring counter, Johnson Counter.

[Figure: A 3-bit up-counter]
[Figure: A 3-bit down-counter]

[Figure: A 4-bit synchronous up counter]
[Figure: Synchronous counter using adders and registers]

Linear-Feedback Shift Registers
A linear-feedback shift register (LFSR) consists of N registers configured as a shift register.

The input to the shift register comes from the XOR of particular bits of the register, as
shown in Figure for a 3-bit LFSR. On reset, the registers must be initialized to a
nonzero value (e.g., all 1s). The pattern of outputs for the LFSR is shown in Table
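A 3-bit LFSR can be simulated as below. The figure and table referred to above did not survive extraction, so the tap positions used here (bits 2 and 1 XORed into bit 0) are an assumption chosen because they give a maximal-length sequence; the function name `lfsr_states` is also an assumption.

```python
def lfsr_states(seed=0b111, taps=(2, 1), nbits=3):
    """Return the sequence of register states until the seed repeats.
    taps are the (assumed) bit positions XORed to form the shift-in bit."""
    mask = (1 << nbits) - 1
    state = seed & mask
    states = []
    while True:
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == (seed & mask):
            break
    return states


print([format(s, "03b") for s in lfsr_states()])
```

Under these assumptions the register cycles through all seven nonzero states; the all-zero state would lock up, which is why the reset value must be nonzero.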

Array Subsystems
– SRAM
– DRAM
– ROM
– Serial Access Memories
– Content Addressable Memory

Memory Arrays
– Random Access Memory
  – Read/Write Memory (RAM) (Volatile)
    – Static RAM (SRAM)
    – Dynamic RAM (DRAM)
  – Read Only Memory (ROM) (Nonvolatile)
    – Mask ROM
    – Programmable ROM (PROM)
    – Erasable Programmable ROM (EPROM)
    – Electrically Erasable Programmable ROM (EEPROM)
    – Flash ROM
– Serial Access Memory
  – Shift Registers
    – Serial In Parallel Out (SIPO)
    – Parallel In Serial Out (PISO)
  – Queues
    – First In First Out (FIFO)
    – Last In First Out (LIFO)
– Content Addressable Memory (CAM)
Read Only Memory CLASSIFICATION
– Mask Programmed ROMs: Data is written during chip fabrication using a photo mask
– Fused ROMs: Data is written by blowing the fuse electrically, hence cannot be modified later
– Programmable Read Only Memories (PROMs): Data is written after chip fabrication
– Erasable PROMs: Complete block is erased using UV light which penetrates through a glass window
– Electrically Erasable PROMs: 8-bit data is erased at a time, hence slower
– Flash: Programmed using high electrical voltage. Erases data in blocks, hence faster

Memory Architecture
– Stores large number of bits
  – m x n: m words of n bits each
  – k = Log2(m) address input signals
  – or m = 2^k words
  – e.g., 4,096 x 8 memory:
    – 32,768 bits
    – 12 address input signals
    – 8 input/output data signals
– Memory access
  – r/w: selects read or write
  – enable: read or write only when asserted
  – multiport: multiple accesses to different locations simultaneously
[Figure: m × n memory drawn as m words of n bits per word; external view of a 2^k × n read and write memory with inputs r/w, enable, A0 … Ak-1 and data signals Q0 … Qn-1]

Semiconductor Memory Types (Cont.)
– RAM: the stored data is volatile
  – DRAM
    – A capacitor to store data, and a transistor to access the capacitor
    – Needs refresh operation
    – Low cost and high density, so it is used for main memory
  – SRAM
    – Consists of a latch
    – Doesn't need the refresh operation
    – High speed and low power consumption, so it is mainly used for cache memory and memory in hand-held devices

ROM: “Read-Only” Memory
– Nonvolatile
– Can be read from but not written to, by a processor in a microcomputer system
– Traditionally written to, “programmed”, before inserting to microcomputer system
– Uses
  – Store software program for general-purpose processor
  – Store constant data (parameters) needed by system
  – Implement combinational circuits (e.g., decoders)
[Figure: external view of a 2^k × n ROM with inputs enable, A0 … Ak-1 and outputs Q0 … Qn-1]

Example: 8 x 4 ROM
– Horizontal lines = words
– Vertical lines = data
– Lines connected only at circles
– Decoder sets word 2's line to 1 if address input is 010
– Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2's line
– Word 2 is not connected with data lines Q2 and Q0
– Output is 1010
[Figure: internal view of an 8 × 4 ROM. A 3×8 decoder with enable and address inputs A0, A1, A2 drives the word lines (word 0, word 1, word 2, …); programmable connections join word lines to the data lines Q3 Q2 Q1 Q0.]

Memory – ROM
– ROM Arrays
  – There are two basic types of ROM arrays
    1) NOR-based ROM
    2) NAND-based ROM
– NOR-based ROM
  – All Column Lines are pulled up using a PMOS transistor (or resistor)
  – The Row Lines are connected to the gates of NMOS transistors at the intersection of Row and Column Lines
  – The presence or absence of the NMOS transistors dictates whether a 1 or a 0 is stored
  – If the NMOS transistor is present, it will pull down the Column Line when its gate is driven high by the Row Line.
  – If the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain pulled up by the PMOS's.

Memory – ROM
– NOR-based ROM
  – In order to Read from the array, the Row line is asserted and the desired Column line is observed
  – a NOR-based ROM is similar to a Hex Keypad

Memory – ROM
– NAND-based ROM
  – NAND-based ROM is a different array architecture
  – it uses a depletion-load NMOS as the pull-up transistor
  – the Column NMOS's are connected in series with the column lines (i.e. a NAND configuration)
  – If an NMOS exists in the Column line and the Row line is asserted, the NMOS will pull the Column Line down and represent a stored '0'
  – If an NMOS is absent on the Column line and the Row line is asserted, the Column Line will remain pulled high by the depletion NMOS and represent a stored '1'
  – since all the NMOS's are in series, in order to Read from a Row, all other Rows must be turned ON
  – this means in order to distinguish the Row we are asserting, we write a '0' to it

Memory – ROM
– NAND-based ROM
  – In this configuration, if an NMOS is present, it will represent a “stored 1” since in order to address its location, the Row line is driven to a '0' and the NMOS is not turned on. This leaves the Column line pulled HIGH.
  – if an NMOS is absent, it will represent a “stored 0” since all of the other Row NMOS's are turned on and will pull the Column Line LOW
  – this gives the opposite behavior as in a NOR-based ROM

                       NOR   NAND
    NMOS present        0     1
    NMOS absent         1     0

  – it also gives a complementary addressing scheme

                                      NOR   NAND
    Address Row Line by driving:       1     0
    All other Row Lines driven to:     0     1

Mask-programmed ROM
 Connections “programmed” at fabrication

 set of masks
 Lowest write ability
 only once
 Highest storage permanence
 bits never change unless damaged
 Typically used for final design of high-volume systems
 spread out NRE (non-recurrent engineering) cost for
a low unit cost

EPROM: Erasable programmable ROM
– Programmable component is a MOS transistor
  – Transistor has “floating” gate surrounded by an insulator
  – (a) Negative charges form a channel between source and drain, storing a logic 1
  – (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate, storing a logic 0
  – (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate, restoring the logic 1
  – (d) An EPROM package showing quartz window through which UV light can pass
– Better write ability
  – can be erased and reprogrammed thousands of times
– Reduced storage permanence
  – program lasts about 10 years but is susceptible to radiation and electric noise
– Typically used during design development
[Figure: (a) floating-gate transistor with 0V on the gate; (b) programming with +15V on the gate; (c) erasing under UV light for 5-30 min; (d) EPROM package. Source and drain are labeled in (a) to (c).]
Sample EPROM components

Sample EPROM programmers

EEPROM: Electrically erasable programmable ROM
 Programmed and erased electronically
 typically by using higher than normal voltage
 can program and erase individual words
 Better write ability
 can be in-system programmable with built-in circuit to provide higher than normal voltage
 built-in memory controller commonly used to hide details from memory user
 writes very slow due to erasing and programming
 “busy” pin indicates to processor EEPROM still writing
 can be erased and programmed tens of thousands of times
 Similar storage permanence to EPROM (about 10 years)
 Far more convenient than EPROMs, but more expensive

FLASH
 Extension of EEPROM
 Same floating gate principle
 Same write ability and storage permanence
 Fast erase
 Large blocks of memory erased at once, rather than one word at a time
 Blocks typically several thousand bytes large
 Writes to single words may be slower
 Entire block must be read, word updated, then entire block written back
 Used with embedded microcomputer systems storing large data items in nonvolatile memory
 e.g., digital cameras, MP3, cell phones

Serial Access Memories
 Serial access memories do not use an address

 Shift Registers
 Serial In Parallel Out (SIPO)
 Parallel In Serial Out (PISO)
 Queues (FIFO, LIFO)

Shift Register
– Shift registers store and delay data
– Simple design: cascade of registers
  – Watch your hold times!
[Figure: 8-stage shift register clocked by clk, with input Din and output Dout]

Serial In Parallel Out
– 1-bit shift register reads in serial data
– After N steps, presents N-bit parallel output
[Figure: 4-stage shift register clocked by clk with serial input Sin and parallel outputs P0 P1 P2 P3]

Parallel In Serial Out
– Load all N bits in parallel when shift = 0
  – Then shift one bit out per cycle
[Figure: 4-stage register with parallel inputs P0 P1 P2 P3, control shift/load, clock clk and serial output Sout]

FIFO, LIFO Queues
– First In First Out (FIFO)
  – Initialize read and write pointers to first element
  – Queue is EMPTY
  – On write, increment write pointer
  – If write almost catches read, Queue is FULL
  – On read, increment read pointer
– Last In First Out (LIFO)
  – Also called a stack
  – Use a single stack pointer for read and write
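The FIFO pointer rules can be sketched as a circular buffer. The class name `Fifo` and its method names are assumptions for illustration; one slot is deliberately left unused so that "write almost catches read" can be told apart from the empty condition.

```python
class Fifo:
    """Circular-buffer FIFO with separate read and write pointers."""

    def __init__(self, size):
        self.buf = [None] * size
        self.rd = 0                  # read pointer
        self.wr = 0                  # write pointer

    def empty(self):
        return self.rd == self.wr

    def full(self):
        # Write pointer is one step behind the read pointer.
        return (self.wr + 1) % len(self.buf) == self.rd

    def write(self, value):
        if self.full():
            raise OverflowError("FIFO is full")
        self.buf[self.wr] = value
        self.wr = (self.wr + 1) % len(self.buf)

    def read(self):
        if self.empty():
            raise IndexError("FIFO is empty")
        value = self.buf[self.rd]
        self.rd = (self.rd + 1) % len(self.buf)
        return value


q = Fifo(4)
for item in ("a", "b", "c"):
    q.write(item)
print(q.full(), q.read(), q.read(), q.read(), q.empty())
```

A LIFO would replace the two pointers with a single stack pointer that is incremented on write and decremented on read.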

SRAM
[Figure: SRAM memory cell with ports bit, write, write_b, read and read_b]

6T SRAM Cell
– Cell size accounts for most of array size
  – Reduce cell size at expense of complexity
– 6T SRAM Cell
  – Used in most commercial chips
  – Data stored in cross-coupled inverters
– Read:
  – Precharge bit, bit_b
  – Raise wordline
– Write:
  – Drive data onto bit, bit_b
  – Raise wordline
[Figure: 6T SRAM cell between bitlines bit and bit_b, accessed by the wordline word]

SRAM Read
– Precharge both bitlines high
– Then turn on wordline
– One of the two bitlines will be pulled down by the cell
– Ex: A = 0, A_b = 1
  – bit discharges, bit_b stays high
  – But A bumps up slightly
– Read stability
  – A must not flip
  – N1 >> N2
[Figure: 6T cell schematic (P1, P2, N1 to N4, storage nodes A and A_b, bitlines bit and bit_b, wordline word) and read waveforms of word, bit, bit_b, A and A_b; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 600]

SRAM Write
– Drive one bitline high, the other low
– Then turn on wordline
– Bitlines overpower cell with new value
– Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  – Force A_b low, then A rises high
– Writability
  – Must overpower feedback inverter
  – N2 >> P1
[Figure: 6T cell schematic (P1, P2, N1 to N4, storage nodes A and A_b, bitlines bit and bit_b, wordline word) and write waveforms of word, bit_b, A and A_b; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 700]

DRAM
DRAMs store their contents as charge on a capacitor rather than in a feedback loop.
The cell must be periodically read and refreshed so that its contents do not leak away. Like SRAM, the cell is accessed by asserting the wordline to connect the capacitor to the bitline.

DRAM READ
– On read, the bitline is precharged to Vdd/2.
– When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage change that can be sensed.
– Some DRAMs drive the wordline to Vddp = Vdd + Vt to avoid a degraded level when writing a '1'.
– The DRAM capacitor must be as physically small as possible to achieve good density.
 According to charge-sharing equation the voltage swing on bitline during readout is

Content Addressable Memories

CAMs
– Extension of ordinary memory (e.g. SRAM)
– Read and write memory as usual
– Also match to see which words contain a key
[Figure: CAM block with inputs adr, data/key, read and write, and output match]

What is CAM?
– Content Addressable Memory is a special kind of memory!
– Read operation in traditional memory:
  – Input is address location of the content that we are interested in.
  – Output is the content of that address.
– In CAM it is the reverse:
  – Input is associated with something stored in the memory.
  – Output is location where the associated content is stored.

  Stored table (address : content):
    00 : 1 0 1 X X
    01 : 0 1 1 0 X
    10 : 0 1 1 X X
    11 : 1 0 0 1 1

  Traditional Memory: input address 01, output content 0 1 1 0 X
  Content Addressable Memory: input content 0 1 1 0 1, output address 01
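The CAM lookup on the four-entry table above can be sketched as follows. The function name `cam_search` and the string encoding of stored words (with 'X' as a don't-care) are assumptions for illustration; picking the lowest matching address stands in for a priority encoder that selects the first match.

```python
def cam_search(table, key):
    """Compare key against every stored word; 'X' matches either bit.
    Returns (hit, first matching address or None, all matching addresses)."""
    matches = [
        addr
        for addr, word in enumerate(table)
        if all(w in ("X", k) for w, k in zip(word, key))
    ]
    hit = bool(matches)
    return hit, (matches[0] if hit else None), matches


table = ["101XX", "0110X", "011XX", "10011"]   # addresses 00, 01, 10, 11
hit, addr, all_matches = cam_search(table, "01101")
print(hit, format(addr, "02b"), all_matches)
```

In hardware every stored word is compared in parallel; the list comprehension here only models the result, not the timing.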

Simplified CAM Block Diagram
– The input to the system is the search word.
– The search word is broadcast on the search lines.
– Match line indicates if there was a match between the search word and the stored word.
– Encoder specifies the match location.
– If there are multiple matches, a priority encoder selects the first match.
– Hit signal specifies if there is no match.
– The search word is long, ranging from 36 to 144 bits.
– Table size ranges from a few hundred to 32K entries.
– Address space: 7 to 15 bits.

Types of CAMs
– Binary CAM (BCAM) only stores 0s and 1s
  – Applications: MAC table consultation. Layer 2 security related VPN segregation.
– Ternary CAM (TCAM) stores 0s, 1s and don't cares.
  – Application: when we need wildcards, such as layer 3 and 4 classification for QoS and CoS purposes. IP routing (longest prefix matching).
– Available sizes: 1Mb, 2Mb, 4.7Mb, 9.4Mb, and 18.8Mb.
– CAM entries are structured as multiples of 36 bits rather than 32 bits.

CAM Advantages
– They associate the input (comparand) with their memory contents in one clock cycle.
– They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.
– CAMs can be cascaded to increase the size of the lookup tables that they can store.
– We can add new entries into their table so they learn what they did not know before.
– They are one of the appropriate solutions for higher speeds.

CAM Disadvantages
– They cost several hundred dollars per CAM, even in large quantities.
– They occupy a relatively large footprint on a card.
– They consume excessive power.
– Generic system engineering problems:
  – Interface with network processor.
  – Simultaneous table update and lookup requests.

Thank you
