You are on page 1of 98

Digital Integrated Circuits

(83-313)

Lecture 8:
SRAM
Prof. Adam Teman
2 May 2021

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
Lecture Content

2
© Adam Teman,
May 2, 2021
First Look at Memory

3
Why Memory?

Source: Intel
Intel Pentium-M (2001) – 2MB L3 Cache

Source: wccftech.com

Cerebras Wafer Scale Engine 2 (2021) –


Source: Intel 16GB On-Chip Memory
Intel 10th Gen “Comet Lake” (2020) – 20MB L3 Cache © Adam Teman,
May 2, 2021
Memory Hierarchy of a Personal Computer

5 Source: Pavlov, Sachdev, 2008


© Adam Teman,
May 2, 2021
Semiconductor Memory Classification
• Size:
Memory Arrays
• Bits, Bytes, Words

Random Access Memory Serial Access Memory Content Addressable Memory


• Timing Parameters:
(CAM) • Read access, write access, cycle time
Read/Write Memory Read Only Memory
Shift Registers Queues
• Function:
(RAM) (ROM)
(Volatile) (Nonvolatile) • Read Only (ROM) – non-volatile
• Read-Write (RWM) – volatile
Serial In Parallel In First In Last In
Static RAM Dynamic RAM Parallel Out Serial Out First Out First Out • NVRWM – Non-volatile Read Write
• Access Pattern:
(SRAM) (DRAM) (SIPO) (PISO) (FIFO) (LIFO)

• Random Access, FIFO, LIFO,


Electrically
Shift Register, CAM
Mask ROM Programmable Erasable Flash ROM
ROM
(PROM)
Programmable
ROM
Erasable
Programmable
• I/O Architecture:
(EPROM) ROM • Single Port, Multi-port
(EEPROM)
• Application:
• Embedded, External, Secondary
6
© Adam Teman,
May 2, 2021
Random Access Chip Architecture
• Conceptual: linear array
• Each box holds some data
• But this leads to a long and skinny shape

• Let’s say we want to make a 1MB memory:


• 1MB=220 words X 8 bits=223 bits, each word in a separate row
• A decoder would reduce the number of access pins from
220 access pins to 20 address lines.
• We’d fit the pitch of the decoder to the word cells,
so we’d have Word Lines with no area overhead.
• The output lines (=bit lines) would be extremely long,
as would the delay of the huge decoder.
• The array’s height is about 128,000 times larger than its width (220/23).
7
© Adam Teman,
May 2, 2021
Square Ratio
• Instead, let’s make the array square:
• 1MB=223 bits=212 rows X 211 columns.
• There are 4000 rows, so we need a
12-bit row address decoder
(to select a single row)
• There are 2000 columns,
representing 256 8-bit words.
• We need to select only one of the
256 words through a
column address decoder (or multiplexer).
• We call the row lines “Word Lines”
and the column lines “Bit Lines”.

8
© Adam Teman,
May 2, 2021
Special Considerations
• The “core” of the memory array is huge.
It can sometimes take up most of the chip area.
• For this reason, we will try to make the “bitcell” as small as possible.
• A standard Flip Flop uses at least 10 transistors per bit
(usually more than 20). This is very area consuming.
• We will trade-off area for other circuit properties:
• Noise Margins
• Logic Swing
• Speed
• Design Rules
• This requires special peripheral circuitry.
9
© Adam Teman,
May 2, 2021
Memory Architecture
Storage Cell
Bit Line Memory Size: W Words of C bits
=W x C bits
Address bus: A bits
ADDA-1 : ADDM

→W=2A
Row Decoder

Word Line

Number of Words in a Row: 2M


Multiplexing Factor: M

Number of Rows: 2A-M


Number of Columns: C x 2M
C×2M
Sense Amplifiers /Drivers
Row Decoder: A-M → 2A-M
ADDM-1 :
Column Decoder Column Decoder: M → 2M
ADD0
Input/Output
10 (C bits) © Adam Teman,
May 2, 2021
The 6T SRAM Bitcell

11
Basic Static Memory Element

Q Q

12
© Adam Teman,
May 2, 2021
Positive Feedback: Bi-Stability

© Adam Teman,
May 2, 2021
Writing into a Cross-Coupled Pair
• The write operation is ratioed
• The access transistor must overcome the feedback.

En

D Q Q

15
© Adam Teman,
May 2, 2021
How should we write a ‘1’
Option 1: nMOS Access Transistor Option 2: pMOS Access Transistor

Passes a “weak ‘1’”, bad at pulling Passes a “weak ‘0’”, bad at pulling
up against the feedback down against the feedback

Option 3: Transmission Gate Solution: Differential nMOS Write

Writes well, but how do we read?


16
© Adam Teman,
May 2, 2021
Reading from a 6T SRAM Cell

© Adam Teman,
May 2, 2021
6-transistor CMOS SRAM Cell
BL BLB

WL WL
M3 M6
M2 M5
Q QB
M1 M4

18
© Adam Teman,
May 2, 2021
The Computer Hall of Fame
• The machine that introduced the GUI, the mouse,
and Steve Jobs to the mainstream.

• The personal computer that “was designed Source: macworld.co.uk

so easy to use that people could actually use it”.


• Developed based on a 3-day tour of Apple
at Xerox PARC, where they saw the Xerox Alto.
• Introduced in 1984, sold for $2500 with an
8MHz Motorola 6800 processor, 128kB RAM.
• Included MacPaint and MacWrite
• Despite initial success, sales declined and
Steve Jobs was fired from Apple in 1985.
Source: macworld.co.uk
© Adam Teman,
May 2, 2021
6T SRAM Operation

21
SRAM Operation: HOLD
BL BLB

WL WL
M3 M6
M2 M5
Q QB
M1 M4

22
© Adam Teman,
May 2, 2021
SRAM Operation: READ
BL BLB

WL 0 VDD WL
M3 M6
M2 VDD M5
0 BLB
Q QB
QB M1
0 VDD
M4
WL
M5
M3
QB=ΔV
Q=VDD
Q
M4
WL M2
Left Side: Right Side:
BL “nMOS” inverter –
Nothing Changes…
QB voltage rises
23
© Adam Teman,
May 2, 2021
SRAM Operation - Read

BLB
BL
BLB Cell Ratio:
WL M3 M6 WL WL
M5 W4
L4
VDD M2 M5 VDD QB=ΔV CR 
W5
Q=‘1’ M1 Q
M4 L5
CBLB

M4 QB=‘0’

CBL
 VDSat,n 
2
 V 2 
kM5 (VDD − V − VT,n )VDSat,n −  = kM4 (VDD − VT,n ) V − 
 2   2 

VDSat,n + CR (VDD − VT,n ) − V (1 + CR ) + CR (VDD − VT,n )


2 2 2
DSat,n
V =
CR
24
© Adam Teman,
May 2, 2021
Cell Ratio (Read Constraint)
BLB

W4 WL
M5
L4
CR  QB=ΔV
W5
L5 M4
Q

So we need the pull


down transistor to be
much stronger than the
access transistor…

25
© Adam Teman,
May 2, 2021
SRAM Operation: WRITE
BL BLB

WL VDD 0 WL
M3 M6
VDD M2 0 VDD M5 0
Q QB
BL M1 M4
VDD 0
Q
WL
M2 M6
Q=ΔV
QB=VOLmin
M1 WL M5
QB Left Side: Right Side:
BLB
Same as during read – Pseudo nMOS
designed so ΔV<VM inverter!
26
© Adam Teman,
May 2, 2021
SRAM Operation - Write Pull-Up Ratio
W6

BLB
BL L6
PR 
Q W5
M6 L5
WL M3 M6 WL
M2 M5
QB=VOLmin
‘0’
Q=‘0’ M1 WL M5
M4 QB=‘1’
VDD BLB

 2
   2

( )
V
 = kM5 (VDD − VT,n )VQB − 2 
V QB
kM6  VDD − VT,p VDSat,p − DSat,p

 2   

p  2

( )
VDSat,p
(V − VT,n )
2
VQB = VDD − VT,n − − 2 PR  VDD − VT,p VDSat,p − 
DD
n  2 
27
© Adam Teman,
May 2, 2021
Pull Up Ratio – Write Constraint Q
M6

QB=VOLmin

WL M5
BLB

So we need the access


transistor to be much stronger
than the pull up transistor…
W6
L6
PR 
W5
L5

28
© Adam Teman,
May 2, 2021
Summary – SRAM Sizing Constraints
W1 W4
Read Constraint L1 L4 PDN
CR  = =
W2 W5 access
L2 L5

KPDN  Kaccess

KPDN  Kaccess  KPUN


Write Constraint
Kaccess  KPUN
W3 W6
L3 L6 PUN
PR  = =
W2 W5 access
L2 L5
29
© Adam Teman,
May 2, 2021
4T Memory Cell
• Achieve density by removing the PMOS pull-up.
• However, this results in static power dissipation.

30
© Adam Teman,
May 2, 2021
Multi-Port SRAM

Dual Port SRAM Two Port SRAM

31
© Adam Teman,
May 2, 2021
6T SRAM Layout
SRAM Layout - Traditional
• Share Horizontal Routing (WL).
• Share Vertical Routing (BL, BLB).
• Share Power and Ground.

BL BLB

WL WL
M3 M6
M2 M5
Q QB
M1 M4

© Adam Teman,
May 2, 2021
SRAM Layout – Thin Cell BL BLB

• Avoid Bends in Polysilicon and Diffusion WL WL


M3 M6
• Orient all transistors in one direction.
M2 M5
• Minimize Bitline Capacitance. Q QB
M1 M4

34
© Adam Teman,
May 2, 2021
65nm SRAM
• Industrial example from ST/Phillips

35
© Adam Teman,
May 2, 2021
Commercial SRAMs

Intel Design Forum 2009

36
© Adam Teman,
May 2, 2021
And very recent SRAMs Samsung 3nm
TSMC 7nm GAA SRAM Test Chip
SRAM

Source: TSMC

Source: ISSCC 2021

TSMC 5nm SRAM Test Chip


Source: ISSCC 2020 © Adam Teman,
May 2, 2021
SRAM Stability
“Static Noise Margin”

38
Static Noise Margin - Hold

+ +
VN - - VN

39
© Adam Teman,
May 2, 2021
Static Noise Margin - Hold
M3 M6
Q QB Q QB

M1 M4

1. Plot both VTCs on the same graph


2. Find the maximum square that fits
in the VTC.
3. The SNM is defined as the side of
the maximum square.

40
© Adam Teman,
May 2, 2021
Static Noise Margin - Read
• What happens during Read?
• We can’t ignore the access transistors anymore…

BLB
Vout
BL

WL M3 M6 WL
VDD M2 M5 VDD
Q M1 M4 QB
CBL

CBLB
M3 M2 M6 M5
QB Q
Q QB
M1 M4
Vin
41
© Adam Teman,
May 2, 2021
Static Noise Margin - Read
QB

M3 M2

QB Q
M1

SNM

M6 M5
Q
QB
M4

Q
42
© Adam Teman,
May 2, 2021
Static Noise Margin - Write Q

• What happens during Write? M3 M2


• The two sides are now different. QB Q
M1

BLB
BL

QB
WL M3 M6 WL QB

M2 M5 ‘0’
M6
Q=‘0’ M1 M4 QB=‘1’ QB
VDD Q

M4 M5

43
© Adam Teman,
May 2, 2021
Static Noise Margin - Write
Q
M3 M2

QB Q
M1

If there is a stable
point here, the
wrong data is
M6 WSNM written!
QB
Q

M4 M5

QB
44
© Adam Teman,
May 2, 2021
Alternative Write SNM Definition
• Write SNM depends on the cell’s separatrix,
therefore alternative definitions have been proposed.
• For example, add a DC Voltage (VBL) to the 0 bitline Q

and see how high it can be and still flip the cell.

M3 M2 M6 VBL=
V DD
QB
QB Q
Q

VB
L =0
M1 M4 M5

QB
VBL
45
© Adam Teman,
May 2, 2021
Dynamic Stability

46
© Adam Teman,
May 2, 2021
SNM Calculation

47
Simulating SNM
• Problem:
• How can we calculate SNM with SPICE?

• Some options:
• Insert DC sources at Q and QB
• But where exactly do we connect them?
• Draw Butterfly Curves
• But how do we find the largest squares?

• To run Monte Carlo Simulations we should have an easy way of calculation.

49
© Adam Teman,
May 2, 2021
Simulating SNM
• First let’s define the graphical solution:
• The diagonals of all the squares are on lines parallel to Q=QB.
• We need to find the distance
between the points where these
intersect the butterfly plot.
• The largest of these distances
is the diagonal of the maximum
square in each lobe.
• Multiply this by cos45°
and we get the SNM.

• Easy, right?
50
© Adam Teman,
May 2, 2021
Changing Coordinates
• What if we were to turn the graph?

51
© Adam Teman,
May 2, 2021
Changing Coordinates
• If we were to use new axes, we could just subtract the graphs.
• This gives us the distances between the intersections with the Q=QB parallels.
• Now all we have to do is
find the maximum of the
subtraction.
• (Don’t forget to multiply by cos45)

52
© Adam Teman,
May 2, 2021
Changing Coordinates
• The required transformation is: x=
1
u+
1
v
2 2
1 1
y=− u+ v
2 2
• Now let’s define some function as F1

• Substituting y=F1(x) gives:


v = u + 2y =
 1 1 
= u + 2 F1  u+ v
 2 2 

53
© Adam Teman,
May 2, 2021
Changing Coordinates
• What we did is turn some function (F1) 45 degrees
counter clockwise.
• This can easily be implemented with the following circuit:

• What is F1?
• It could be the VTC of
Vin=Q, Vout=QB…

54
© Adam Teman,
May 2, 2021
Changing Coordinates
• But what about the “mirrored” VTC?
• This needs to first be mirrored with respect to the v axis and then transformed
to the (u,v) system.
• If we call the second VTC F2 with x=F2(y) then the operation we need is:

v = −u + 2 x
 v u 
= −u + 2 F2  − 
 2 2

55
© Adam Teman,
May 2, 2021
Final SNM Calculation
VDD GND VDD
• Now we need to:
BL WL BLB
• Make a schematic of our SRAM
cell with two pins: Q and QB. 6T Cell
Q2 Q QB QB2

• Create a coordinate changing circuit VDD GND VDD

for each of the transformations. DC Sweep u v1 v1


BL WL BLB
Transformation 1
Q1 F1(in)
6T CellF1(out) QB1
Q2 Q QB QB2

DC Sweep u v2 v2
Transformation 2
QB2 F2(in) F2(out) Q2

56
© Adam Teman,
May 2, 2021
Final SNM Calculation
• Now, connect F1 to Q→QB, and F2 to QB→Q. VDD GND VDD

DC Sweep u v1 v1 BL WL BLB

Transformation 1 6T Cell
Q1 F1(in) F1(out) QB1 Q1 Q QB QB1

VDD GND VDD

DC Sweep u v2 v2
BL WL BLB
Transformation 2
QB2 F2(in) F2(out) Q2 6T Cell
Q2 Q QB QB2

• Run a DC Sweep on u from –VDD/√2 to VDD/√2


• This will present the butterfly curves
turned 45 degrees.
57
© Adam Teman,
May 2, 2021
Final SNM Calculation
• Now just:
• Subtract the bottom graph from the top one.
• Find the local maxima for each lobe.
• The smaller of the local maxima
is the diagonal of the largest square.
• Multiply this by cos45° for the SNM

 min max ( v1 − v2 ) , max ( v1 − v2 )


1 
SNM =
2  − 2 u  0 0 u  2

58
© Adam Teman,
May 2, 2021
Read/Write SNM
• How about Read SNM:
• Use the exact same setup.
• Connect BL and BLB to VDD.
• Connect WL to VDD.
• Run the same calculation.

• And Write SNM.


• Now connect one BL to GND.
• This is trickier, so you’ll have to play around with the calculation.
• There are other options for WM calculation.

59
© Adam Teman,
May 2, 2021
Testbench Setup – Read/Write
Read Testbench: Write Testbench:

DC Sweep u v1 v1 DC Sweep u v1 v1
Transformation 1 Transformation 1
Q1 F1(in) F1(out) QB1 Q1 F1(in) F1(out) QB1

DC Sweep u v2 v2
DC Sweep u v2 v2 Transformation 2
Transformation 2 QB2 F2(in) F2(out) Q2
QB2 F2(in) F2(out) Q2
GND VDD VDD
VDD VDD VDD
BL WL BLB
BL WL BLB
6T Cell
6T Cell Q1 Q QB QB1
Q1 Q QB QB1
GND VDD VDD
VDD VDD VDD

BL WL BLB
BL WL BLB

6T Cell 6T Cell
Q2 Q QB QB2 Q2 Q QB QB2

60
© Adam Teman,
May 2, 2021
SRAM Stability under process variations

61
© Adam Teman,
May 2, 2021
Metastability Convergence in Spectre
• Node Sets
• What solution does Virtuoso find with a standard OP?

• To fix this, make sure you use the “Node Set” option.

© Adam Teman,
May 2, 2021
Node Sets vs. Initial Conditions
• SPICE supports two types of conversion aids :
• Node Sets:
• Help SPICE converge by providing it
with an initial guess.
• Used only for DC convergence!
Disregarded for Transient Analysis.

• Initial Conditions:
• Enforce a node voltage at time t=0.
• Used only for Transient analysis!
Disregarded for DC convergence.

63
© Adam Teman,
May 2, 2021
Additional simulation tips
• Work with Design Hierarchy
• Create transformation functions
and DUTs as symbols.
• Create multiple tests in
single ADE-XL view.
• Use variables/parameters to
define initial conditions/node sets.
• Create supply voltages in
separate symbol.
• Use buffers to smooth transitions
and reduce cross cap.
64
© Adam Teman,
May 2, 2021
Further Reading
• Rabaey, et al. “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley ee141 (online)
• Weste, Harris, “CMOS VLSI Design (4th Edition)”
• Seevinck, List, Lostroh, “Static Noise Margin Analysis of SRAM Cells”
IEEE Journal of Solid State Circuits, 1987
• Teman and Visotsky. "A fast modular method for true variation-aware separatrix
tracing in nanoscaled SRAMs." IEEE TVLSI, 2014.

65
© Adam Teman,
May 2, 2021
Digital Integrated Circuits
(83-313)

Lecture 9:
Memory Peripherals
Prof. Adam Teman
25 May 2021

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
Lecture Content

2
© Adam May
Teman,
25, 2021
Memory Peripherals Overview

3
Memory Architecture
Storage Cell
Bit Line Memory Size: W Words of C bits
=W x C bits
Address bus: A bits
ADDA-1 : ADDM

→W=2A
Row Decoder

Word Line

Number of Words in a Row: 2M


Multiplexing Factor: M

Number of Rows: 2A-M


Number of Columns: C x 2M
C×2M
Sense Amplifiers /Drivers
Row Decoder: A-M → 2A-M
ADDM-1 :
Column Decoder Column Decoder: M → 2M
ADD0
Input/Output
4 (C bits) © Adam May
Teman,
25, 2021
Synchronous SRAM Interface 2mxn SRAM
A[m-1:0]

• A typical on-chip synchronous SRAM features: D[n-1:0] Q[n-1:0]

• Single-cycle write/read latency WEN[p-1:0]

• Byte write mask CEN


• Active low Write Enable (i.e., WEN=1 → Read Enable) CLK
• The timing diagram can be viewed, as follows:
(1) Rising edge of the clock results
CLK in WRITE, when WE is low.

A A0 A1 A2 A3 (2) Rising edge of the clock results


in READ, when WE is high.
D D0 D1 Valid data appears on the
output after a delay.
WE
Q D2 D3
5 © Adam May
Teman,
25, 2021
Memory Timing: Definitions

Real Datasheet
Example

Simple Definitions

Source: CMU, ECE548

6 Write Cycle © Adam May


Teman,
Read25, 2021
Cycle
Major Peripheral Circuits
Storage Cell
Bit Line

• Row Decoder
• Column Multiplexer

Row Decoder
Word Line

AW-1 : AM
• Sense Amplifier
• Write Driver
• Precharge Circuit

C×2M
Sense Amplifiers /Drivers

AM-1 : A0 Column Decoder

Input/Output
(C bits)
7
© Adam May
Teman,
25, 2021
Row Decoder Design

8
Row Decoders
• A Decoder reduces the number of select signals by log2.
• Number of Rows: W
• Number of Row Address Bits: A=log2W
Word 0
Word 1
ADDA-1 : ADD0

Word 2
Row Decoder

Word W-2
Word W-1

9 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Row Decoders
• Standard Decoder Design:
• Each output row is driven by an AND gate with k=log2N inputs.
• Each gate has a unique combination of address inputs
(or their inverted values).
• For example, an 8-bit row address has 256 8-input AND gates, such as:

WL0 = A7 A6 A5 A4 A3 A2 A1 A0 WL255 = A7 A6 A5 A4 A3 A2 A1 A0
• NOR Decoder:
• DeMorgan will provide us with a NOR Decoder.
• In the previous example, we’ll get 256 8-input NOR gates:
WL0 = A7 + A6 + A5 + A4 + A3 + A2 + A1 + A0
WL255 = A7 + A6 + A5 + A4 + A3 + A2 + A1 + A 0
10 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
How should we build it? WL0

• Let’s build a row decoder for a 256x256 SRAM Array.


• We need 256 8-input AND Gates.
WL1
• Each gate drives 256 bitcells
• We have various options:

WL255

• Which one is best?


11 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Reminder: Logical Effort
t pd ,i = t pINV ( pi + EFi )
bi  Cin,i +1
PE = F   LE  B =   LEi  bi
CL
EFi LEi  fi = LEi 
Cin,i Cin ,1
EFopt = PE = N F   LEi  bi
N

Nopt = log EFopt PE = log EFopt F  LE  B

(
t pd = t pINV  ( pi + EFi ) = t pINV   pi + N  N PE )
12 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Problem Setup
• For LE calculation we need to start with:
• Output Load (CL)
• Input Capacitance (Cin)
• Branching (B)
• What is the Load Capacitance?
• 256 bitcells on each Word Line

CWL = 256  CCell + CWire


• Let’s ignore the wire for now…
• What is the Input Capacitance?
• Let’s assume our address drivers
can drive a bit more than a bitcell, so: Cin ,addr _ driver = 4  CCell
13 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Problem Setup
• What is the Branching Effort?
• Lets take another look WL0 = A7 A6 A5 A4 A3 A2 A1 A0
at the Boolean expressions:
WL255 = A7 A6 A5 A4 A3 A2 A1 A0
• We see that half of the signals use Ai and half use Ai!
• So each address driver drives 128 8-input AND gates,
but only one is on the selected WL path.

Con path = Cnand ; Coff path = 127  Cnand


Con path + Coff path Cnand + 127  Cnand
Badd _ driver = = = 128
Con path Cnand
14 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Number of Stages
CWL 256CCell
• Altogether the path effort is: PE = LE  B  F = LE  bi = LE  128 
Caddress 4CCell
= LE  8k = 213  LE

• The best case logical effort is


LE = 1
• So the minimum number of
stages for optimal delay is: PE = 213
N opt = log 3.6 2 = 7
13

• That’s a lot of stages!

15 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
So which implementation should we use?
• The one with the minimum Logical Effort:

LE = (10 3) 1 LE = 2  ( 5 3) LE = ( 4 3)  ( 5 3)  ( 4 3) 1 LE = ( 4 3)


3

= 10 3; = 10 3 = 80 27;
p = 2 + 2 + 2 +1 = 7
= 2.37;
p = 8 +1 = 9 p = 4+2 = 6
p = 2  3 + 1 3 = 9

16 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
New optimal number of Stages
• So now we can calculate the actual path effort:

PE = F  bi  LEi =
= 2.37  213 = 19.418k
N opt = log 3.6 PE = 7.7

• We could add another inverter or two


to get closer to the optimal number of stages…

17 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Implementation Problems
• Address Line Capacitance:
• Our assumption was that Cin,addr_driver=4Ccell.
• But each address drives 128 gates
• That’s a really long wire with high capacitance.
• This means that we will need to buffer the address lines
• This will probably ruin our whole analysis...

• Bit-cell Pitch:
• Each signal drives one row of bitcells.
• How will we fit 8 address signals into this pitch?

18 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Predecoding - Concept
• Solution:
• Let’s look at two decoder paths: WL254, WL255
A0 A0 A0
A1 A0
A1 A1
A1
A2
A2 A2
A2
A3
WL255 A3 A3 A
A4 WL254 3
A4
A5 A4
A5 A4
A6 A5 A5
A6
A7
A6 A7
A6
A7 A7

• We see that there are many “shared” gates.


• So why not share them?
• For instance, we can use the purple output for both gates…
19 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Predecoding - Method

4 →16
A0
• How do we do this? A1 D
A2
• If we look at the final Boolean expression, A3
it has combinations of groups of inputs.
• By grouping together a few inputs,
we actually create a small decoder.

4 →16
• Then we just AND the outputs of all the A4
“pre” decoders.
A5 E
A6
• For example: Two 4:16 predecoders A7
D = dec ( A0 , A1 , A2 , A3 ) ; E = dec ( A4 , A5 , A6 , A7 ) ;
WL0 = D0  E0 ; WL255 = D15  E15 ; WL254 = D14  E15 ;
20 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Predecoding - Example
• Let’s look at our example: WL0 = D0  E0
D = dec ( A0 , A1 , A2 , A3 ) WL255 = D15  E15
E = dec ( A4 , A5 , A6 , A7 ) WL254 = D15  E14
• What is our new branching effort?
• As before, each address drives half the lines of the small decoder.
• Each predecoder output drives 256/16 post-decoder gates.
• Altogether, the branching effort is:
B = baddr _ driver  bpredecoder = 16  256 = 128
2 16
• Same as before!
21 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Predecoding - Solution
• Why is this a better solution?
• Each Address driver is only driving eight gates
• less capacitance.
• We saved a ton of area by “sharing” gates.
• We can “Pitch Fit” 2-input NAND gates.

22 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Another Predecoding Example
• We can try using four 2-input predecoders:
• This will require us to use 256 4-input NAND gates.

23 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
How do we choose a configuration?
• Pitch Fitting: 2-input NANDs vs. 4-input NAND.
• Switching Capacitance: How many wires switch at each transition?
• Stages Before the large cap: Distribution of the load along the delay.
• Conclusion: Usually do as much predecoding as possible!
WL0 WL0

WL1 WL1
4 4 4 4 16 16

WL127 WL127

2→4 2→4 2→4 2→4 4 →16 4 →16

A0A1 A2A3 A4A5 A6A7 A0A1A2A3 A4A5A6A7


24 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Alternative Solution: Dynamic Decoders

GND

GND
VDD
PC

WL0 WL3

WL1 WL2

WL1
WL2
WL0
WL3
A0

A0

A1

A1

A0

A0

A1

A1
2-input NOR decoder 2-input NAND decoder
25 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
Column Multiplexer

26
Column Multiplexer
• First option – PTL Mux with decoder
• Fast – only 1 transistor in signal path.
• Large transistor Count A1 A0

B0 B1 B2 B3

Y
27 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
4 to 1 tree decoder
• Second option – Tree Decoder
• For 2k:1 Mux, it uses k series transistors.
• Delay increases quadratically
• No external decode logic → big area reduction.

28 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Combining the Two

29 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Precharge and Sense Amp

30
Precharge Circuitry
• Precharge bitlines high before reads


bit bit_b
• Equalize bitlines to minimize voltage difference when using sense amplifiers

bit bit_b

31 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Sense Amplifiers
make D V as small
C  DV as possible
tp = ----------------
Iav

large small

Idea: Use Sense Amplifier

small
transition s.a.

input output

32 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
Differential Sense Amplifier
• Non-clocked Sense Amp has high static power.
• Clocked sense amp saves power
• Requires sense_clk after enough bitline swing
• Isolation transistors cut off large bitline capacitance

33 Row Decoder Column Mux Precharge Sense Amp © Adam May


Teman,
25, 2021
The Computer Hall of Fame
• The machine that many of us got to know
during our military service:

Source: pcworld.com
• 32-bit, CISC architecture, introduced in 1977
• The VAX-11/780 was TTL-based, 5MHz, 2kB cache, reaching 1 MIPS
• Known as a “minicomputer”, even though it took up a whole room.
• VAX means “Virtual Address Extension”,
since the VAX was one of the first minicomputers to use virtual memory.
• Ran the VMS operating system.
• Many systems that were developed during the cold war
(e.g., F-15, F-18, Hawk missiles, nuclear programs) still use VAX today!
Further Reading
• Rabaey, et al. “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley ee141 (online)
• Weste, Harris, “CMOS VLSI Design (4th Edition)”

36
© Adam May
Teman,
25, 2021

You might also like