SRAM Adi Teman

Digital Integrated Circuits
(83-313)
Lecture 8:
SRAM
Prof. Adam Teman
2 May 2021
Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
Lecture Content
2
© Adam Teman,
May 2, 2021
First Look at Memory
3
Why Memory?
Source: Intel
Intel Pentium-M (2001) – 2MB L3 Cache
Source: wccftech.com
Cerebras Wafer Scale Engine 2 (2021) –

Source: Intel 16GB On-Chip Memory
Intel 10th Gen “Comet Lake” (2020) – 20MB L3 Cache © Adam Teman,
May 2, 2021
Memory Hierarchy of a Personal Computer
5 Source: Pavlov, Sachdev, 2008

© Adam Teman,
May 2, 2021
Semiconductor Memory Classification
• Size:
Memory Arrays
• Bits, Bytes, Words
Random Access Memory Serial Access Memory Content Addressable Memory

• Timing Parameters:
(CAM) • Read access, write access, cycle time
Read/Write Memory Read Only Memory
Shift Registers Queues
• Function:
(RAM) (ROM)
(Volatile) (Nonvolatile) • Read Only (ROM) – non-volatile
• Read-Write (RWM) – volatile
Serial In Parallel In First In Last In
Static RAM Dynamic RAM Parallel Out Serial Out First Out First Out • NVRWM – Non-volatile Read Write
• Access Pattern:
(SRAM) (DRAM) (SIPO) (PISO) (FIFO) (LIFO)
• Random Access, FIFO, LIFO,

Electrically
Shift Register, CAM
Mask ROM Programmable Erasable Flash ROM
ROM
(PROM)
Programmable
ROM
Erasable
Programmable
• I/O Architecture:
(EPROM) ROM • Single Port, Multi-port
(EEPROM)
• Application:
• Embedded, External, Secondary
6
© Adam Teman,
May 2, 2021
Random Access Chip Architecture
• Conceptual: linear array
• Each box holds some data
• But this leads to a long and skinny shape
• Let’s say we want to make a 1MB memory:

• 1MB=220 words X 8 bits=223 bits, each word in a separate row
• A decoder would reduce the number of access pins from
220 access pins to 20 address lines.
• We’d fit the pitch of the decoder to the word cells,
so we’d have Word Lines with no area overhead.
• The output lines (=bit lines) would be extremely long,
as would the delay of the huge decoder.
• The array’s height is about 128,000 times larger than its width (220/23).
7
© Adam Teman,
May 2, 2021
Square Ratio
• Instead, let’s make the array square:
• 1MB=223 bits=212 rows X 211 columns.
• There are 4000 rows, so we need a
12-bit row address decoder
(to select a single row)
• There are 2000 columns,
representing 256 8-bit words.
• We need to select only one of the
256 words through a
column address decoder (or multiplexer).
• We call the row lines “Word Lines”
and the column lines “Bit Lines”.
8
© Adam Teman,
May 2, 2021
Special Considerations
• The “core” of the memory array is huge.
It can sometimes take up most of the chip area.
• For this reason, we will try to make the “bitcell” as small as possible.
• A standard Flip Flop uses at least 10 transistors per bit
(usually more than 20). This is very area consuming.
• We will trade-off area for other circuit properties:
• Noise Margins
• Logic Swing
• Speed
• Design Rules
• This requires special peripheral circuitry.
9
© Adam Teman,
May 2, 2021
Memory Architecture
Storage Cell
Bit Line Memory Size: W Words of C bits
=W x C bits
Address bus: A bits
ADDA-1 : ADDM
→W=2A
Row Decoder
Word Line
Number of Words in a Row: 2M

Multiplexing Factor: M
Number of Rows: 2A-M

Number of Columns: C x 2M
C×2M
Sense Amplifiers /Drivers
Row Decoder: A-M → 2A-M
ADDM-1 :
Column Decoder Column Decoder: M → 2M
ADD0
Input/Output
10 (C bits) © Adam Teman,
May 2, 2021
The 6T SRAM Bitcell
11
Basic Static Memory Element
Q Q
12
© Adam Teman,
May 2, 2021
Positive Feedback: Bi-Stability
© Adam Teman,
May 2, 2021
Writing into a Cross-Coupled Pair
• The write operation is ratioed
• The access transistor must overcome the feedback.
En
D Q Q
15
© Adam Teman,
May 2, 2021
How should we write a ‘1’
Option 1: nMOS Access Transistor Option 2: pMOS Access Transistor
Passes a “weak ‘1’”, bad at pulling Passes a “weak ‘0’”, bad at pulling
up against the feedback down against the feedback
Option 3: Transmission Gate Solution: Differential nMOS Write
Writes well, but how do we read?

16
© Adam Teman,
May 2, 2021
Reading from a 6T SRAM Cell
© Adam Teman,
May 2, 2021
6-transistor CMOS SRAM Cell
BL BLB
WL WL
M3 M6
M2 M5
Q QB
M1 M4
18
© Adam Teman,
May 2, 2021
The Computer Hall of Fame
• The machine that introduced the GUI, the mouse,
and Steve Jobs to the mainstream.
• The personal computer that “was designed Source: macworld.co.uk
so easy to use that people could actually use it”.

• Developed based on a 3-day tour of Apple
at Xerox PARC, where they saw the Xerox Alto.
• Introduced in 1984, sold for $2500 with an
8MHz Motorola 6800 processor, 128kB RAM.
• Included MacPaint and MacWrite
• Despite initial success, sales declined and
Steve Jobs was fired from Apple in 1985.
Source: macworld.co.uk
© Adam Teman,
May 2, 2021
6T SRAM Operation
21
SRAM Operation: HOLD
BL BLB
WL WL
M3 M6
M2 M5
Q QB
M1 M4
22
© Adam Teman,
May 2, 2021
SRAM Operation: READ
BL BLB
WL 0 VDD WL
M3 M6
M2 VDD M5
0 BLB
Q QB
QB M1
0 VDD
M4
WL
M5
M3
QB=ΔV
Q=VDD
Q
M4
WL M2
Left Side: Right Side:
BL “nMOS” inverter –
Nothing Changes…
QB voltage rises
23
© Adam Teman,
May 2, 2021
SRAM Operation - Read
BLB
BL
BLB Cell Ratio:
WL M3 M6 WL WL
M5 W4
L4
VDD M2 M5 VDD QB=ΔV CR 
W5
Q=‘1’ M1 Q
M4 L5
CBLB
M4 QB=‘0’
CBL
 VDSat,n 
2
 V 2 
kM5 (VDD − V − VT,n )VDSat,n −  = kM4 (VDD − VT,n ) V − 
 2   2 
VDSat,n + CR (VDD − VT,n ) − V (1 + CR ) + CR (VDD − VT,n )

2 2 2
DSat,n
V =
CR
24
© Adam Teman,
May 2, 2021
Cell Ratio (Read Constraint)
BLB
W4 WL
M5
L4
CR  QB=ΔV
W5
L5 M4
Q
So we need the pull

down transistor to be
much stronger than the
access transistor…
25
© Adam Teman,
May 2, 2021
SRAM Operation: WRITE
BL BLB
WL VDD 0 WL
M3 M6
VDD M2 0 VDD M5 0
Q QB
BL M1 M4
VDD 0
Q
WL
M2 M6
Q=ΔV
QB=VOLmin
M1 WL M5
QB Left Side: Right Side:
BLB
Same as during read – Pseudo nMOS
designed so ΔV<VM inverter!
26
© Adam Teman,
May 2, 2021
SRAM Operation - Write Pull-Up Ratio
W6
BLB
BL L6
PR 
Q W5
M6 L5
WL M3 M6 WL
M2 M5
QB=VOLmin
‘0’
Q=‘0’ M1 WL M5
M4 QB=‘1’
VDD BLB
 2
   2

( )
V
 = kM5 (VDD − VT,n )VQB − 2 
V QB
kM6  VDD − VT,p VDSat,p − DSat,p
 2   
p  2

( )
VDSat,p
(V − VT,n )
2
VQB = VDD − VT,n − − 2 PR  VDD − VT,p VDSat,p − 
DD
n  2 
27
© Adam Teman,
May 2, 2021
Pull Up Ratio – Write Constraint Q
M6
QB=VOLmin
WL M5
BLB
So we need the access

transistor to be much stronger
than the pull up transistor…
W6
L6
PR 
W5
L5
28
© Adam Teman,
May 2, 2021
Summary – SRAM Sizing Constraints
W1 W4
Read Constraint L1 L4 PDN
CR  = =
W2 W5 access
L2 L5
KPDN  Kaccess
KPDN  Kaccess  KPUN

Write Constraint
Kaccess  KPUN
W3 W6
L3 L6 PUN
PR  = =
W2 W5 access
L2 L5
29
© Adam Teman,
May 2, 2021
4T Memory Cell
• Achieve density by removing the PMOS pull-up.
• However, this results in static power dissipation.
30
© Adam Teman,
May 2, 2021
Multi-Port SRAM
Dual Port SRAM Two Port SRAM
31
© Adam Teman,
May 2, 2021
6T SRAM Layout
SRAM Layout - Traditional
• Share Horizontal Routing (WL).
• Share Vertical Routing (BL, BLB).
• Share Power and Ground.
BL BLB
WL WL
M3 M6
M2 M5
Q QB
M1 M4
© Adam Teman,
May 2, 2021
SRAM Layout – Thin Cell BL BLB
• Avoid Bends in Polysilicon and Diffusion WL WL

M3 M6
• Orient all transistors in one direction.
M2 M5
• Minimize Bitline Capacitance. Q QB
M1 M4
34
© Adam Teman,
May 2, 2021
65nm SRAM
• Industrial example from ST/Phillips
35
© Adam Teman,
May 2, 2021
Commercial SRAMs
Intel Design Forum 2009
36
© Adam Teman,
May 2, 2021
And very recent SRAMs Samsung 3nm
TSMC 7nm GAA SRAM Test Chip
SRAM
Source: TSMC
Source: ISSCC 2021
TSMC 5nm SRAM Test Chip

Source: ISSCC 2020 © Adam Teman,
May 2, 2021
SRAM Stability
“Static Noise Margin”
38
Static Noise Margin - Hold
+ +
VN - - VN
39
© Adam Teman,
May 2, 2021
Static Noise Margin - Hold
M3 M6
Q QB Q QB
M1 M4
1. Plot both VTCs on the same graph

2. Find the maximum square that fits
in the VTC.
3. The SNM is defined as the side of
the maximum square.
40
© Adam Teman,
May 2, 2021
Static Noise Margin - Read
• What happens during Read?
• We can’t ignore the access transistors anymore…
BLB
Vout
BL
WL M3 M6 WL
VDD M2 M5 VDD
Q M1 M4 QB
CBL
CBLB
M3 M2 M6 M5
QB Q
Q QB
M1 M4
Vin
41
© Adam Teman,
May 2, 2021
Static Noise Margin - Read
QB
M3 M2
QB Q
M1
SNM
M6 M5
Q
QB
M4
Q
42
© Adam Teman,
May 2, 2021
Static Noise Margin - Write Q
• What happens during Write? M3 M2

• The two sides are now different. QB Q
M1
BLB
BL
QB
WL M3 M6 WL QB
M2 M5 ‘0’
M6
Q=‘0’ M1 M4 QB=‘1’ QB
VDD Q
M4 M5
43
© Adam Teman,
May 2, 2021
Static Noise Margin - Write
Q
M3 M2
QB Q
M1
If there is a stable
point here, the
wrong data is
M6 WSNM written!
QB
Q
M4 M5
QB
44
© Adam Teman,
May 2, 2021
Alternative Write SNM Definition
• Write SNM depends on the cell’s separatrix,
therefore alternative definitions have been proposed.
• For example, add a DC Voltage (VBL) to the 0 bitline Q
and see how high it can be and still flip the cell.
M3 M2 M6 VBL=
V DD
QB
QB Q
Q
VB
L =0
M1 M4 M5
QB
VBL
45
© Adam Teman,
May 2, 2021
Dynamic Stability
46
© Adam Teman,
May 2, 2021
SNM Calculation
47
Simulating SNM
• Problem:
• How can we calculate SNM with SPICE?
• Some options:
• Insert DC sources at Q and QB
• But where exactly do we connect them?
• Draw Butterfly Curves
• But how do we find the largest squares?
• To run Monte Carlo Simulations we should have an easy way of calculation.
49
© Adam Teman,
May 2, 2021
Simulating SNM
• First let’s define the graphical solution:
• The diagonals of all the squares are on lines parallel to Q=QB.
• We need to find the distance
between the points where these
intersect the butterfly plot.
• The largest of these distances
is the diagonal of the maximum
square in each lobe.
• Multiply this by cos45°
and we get the SNM.
• Easy, right?
50
© Adam Teman,
May 2, 2021
Changing Coordinates
• What if we were to turn the graph?
51
© Adam Teman,
May 2, 2021
• If we were to use new axes, we could just subtract the graphs.
• This gives us the distances between the intersections with the Q=QB parallels.
• Now all we have to do is
find the maximum of the
subtraction.
• (Don’t forget to multiply by cos45)
52
© Adam Teman,
May 2, 2021
• The required transformation is: x=
1
u+
1
v
2 2
1 1
y=− u+ v
2 2
• Now let’s define some function as F1
• Substituting y=F1(x) gives:

v = u + 2y =
 1 1 
= u + 2 F1  u+ v
 2 2 
53
© Adam Teman,
May 2, 2021
• What we did is turn some function (F1) 45 degrees
counter clockwise.
• This can easily be implemented with the following circuit:
• What is F1?
• It could be the VTC of
Vin=Q, Vout=QB…
54
© Adam Teman,
May 2, 2021
• But what about the “mirrored” VTC?
• This needs to first be mirrored with respect to the v axis and then transformed
to the (u,v) system.
• If we call the second VTC F2 with x=F2(y) then the operation we need is:
v = −u + 2 x
 v u 
= −u + 2 F2  − 
 2 2
55
© Adam Teman,
May 2, 2021
Final SNM Calculation
VDD GND VDD
• Now we need to:
BL WL BLB
• Make a schematic of our SRAM
cell with two pins: Q and QB. 6T Cell
Q2 Q QB QB2
• Create a coordinate changing circuit VDD GND VDD
for each of the transformations. DC Sweep u v1 v1

BL WL BLB
Transformation 1
Q1 F1(in)
6T CellF1(out) QB1
Q2 Q QB QB2
DC Sweep u v2 v2
Transformation 2
QB2 F2(in) F2(out) Q2
56
© Adam Teman,
May 2, 2021
• Now, connect F1 to Q→QB, and F2 to QB→Q. VDD GND VDD
DC Sweep u v1 v1 BL WL BLB
Transformation 1 6T Cell
Q1 F1(in) F1(out) QB1 Q1 Q QB QB1
VDD GND VDD
DC Sweep u v2 v2
BL WL BLB
Transformation 2
QB2 F2(in) F2(out) Q2 6T Cell
Q2 Q QB QB2
• Run a DC Sweep on u from –VDD/√2 to VDD/√2

• This will present the butterfly curves
turned 45 degrees.
57
© Adam Teman,
May 2, 2021
• Now just:
• Subtract the bottom graph from the top one.
• Find the local maxima for each lobe.
• The smaller of the local maxima
is the diagonal of the largest square.
• Multiply this by cos45° for the SNM
 min max ( v1 − v2 ) , max ( v1 − v2 )

1 
SNM =
2  − 2 u  0 0 u  2
58
© Adam Teman,
May 2, 2021
Read/Write SNM
• How about Read SNM:
• Use the exact same setup.
• Connect BL and BLB to VDD.
• Connect WL to VDD.
• Run the same calculation.
• And Write SNM.

• Now connect one BL to GND.
• This is trickier, so you’ll have to play around with the calculation.
• There are other options for WM calculation.
59
© Adam Teman,
May 2, 2021
Testbench Setup – Read/Write
Read Testbench: Write Testbench:
DC Sweep u v1 v1 DC Sweep u v1 v1
Transformation 1 Transformation 1
Q1 F1(in) F1(out) QB1 Q1 F1(in) F1(out) QB1
DC Sweep u v2 v2
DC Sweep u v2 v2 Transformation 2
Transformation 2 QB2 F2(in) F2(out) Q2
QB2 F2(in) F2(out) Q2
GND VDD VDD
VDD VDD VDD
BL WL BLB
BL WL BLB
6T Cell
6T Cell Q1 Q QB QB1
Q1 Q QB QB1
GND VDD VDD
VDD VDD VDD
BL WL BLB
BL WL BLB
6T Cell 6T Cell
Q2 Q QB QB2 Q2 Q QB QB2
60
© Adam Teman,
May 2, 2021
SRAM Stability under process variations
61
© Adam Teman,
May 2, 2021
Metastability Convergence in Spectre
• Node Sets
• What solution does Virtuoso find with a standard OP?
• To fix this, make sure you use the “Node Set” option.
© Adam Teman,
May 2, 2021
Node Sets vs. Initial Conditions
• SPICE supports two types of conversion aids :
• Node Sets:
• Help SPICE converge by providing it
with an initial guess.
• Used only for DC convergence!
Disregarded for Transient Analysis.
• Initial Conditions:
• Enforce a node voltage at time t=0.
• Used only for Transient analysis!
Disregarded for DC convergence.
63
© Adam Teman,
May 2, 2021
Additional simulation tips
• Work with Design Hierarchy
• Create transformation functions
and DUTs as symbols.
• Create multiple tests in
single ADE-XL view.
• Use variables/parameters to
define initial conditions/node sets.
• Create supply voltages in
separate symbol.
• Use buffers to smooth transitions
and reduce cross cap.
64
© Adam Teman,
May 2, 2021
Further Reading
• Rabaey, et al. “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley ee141 (online)
• Weste, Harris, “CMOS VLSI Design (4th Edition)”
• Seevinck, List, Lostroh, “Static Noise Margin Analysis of SRAM Cells”
IEEE Journal of Solid State Circuits, 1987
• Teman and Visotsky. "A fast modular method for true variation-aware separatrix
tracing in nanoscaled SRAMs." IEEE TVLSI, 2014.
65
© Adam Teman,
May 2, 2021
Digital Integrated Circuits
(83-313)
Lecture 9:
Memory Peripherals
Prof. Adam Teman
25 May 2021
Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email adam.teman@biu.ac.il and I will address this as soon as possible.
Lecture Content
2
© Adam May
Teman,
25, 2021
Memory Peripherals Overview
3
Memory Architecture
Storage Cell
Bit Line Memory Size: W Words of C bits
=W x C bits
Address bus: A bits
ADDA-1 : ADDM
→W=2A
Row Decoder
Word Line
Number of Words in a Row: 2M

Multiplexing Factor: M
Number of Rows: 2A-M

Number of Columns: C x 2M
C×2M
Row Decoder: A-M → 2A-M
ADDM-1 :
Column Decoder Column Decoder: M → 2M
ADD0
Input/Output
4 (C bits) © Adam May
Teman,
25, 2021
Synchronous SRAM Interface 2mxn SRAM
A[m-1:0]
• A typical on-chip synchronous SRAM features: D[n-1:0] Q[n-1:0]
• Single-cycle write/read latency WEN[p-1:0]
• Byte write mask CEN

• Active low Write Enable (i.e., WEN=1 → Read Enable) CLK
• The timing diagram can be viewed, as follows:
(1) Rising edge of the clock results
CLK in WRITE, when WE is low.
A A0 A1 A2 A3 (2) Rising edge of the clock results

in READ, when WE is high.
D D0 D1 Valid data appears on the
output after a delay.
WE
Q D2 D3
5 © Adam May
Teman,
25, 2021
Memory Timing: Definitions
Real Datasheet
Example
Simple Definitions
Source: CMU, ECE548
6 Write Cycle © Adam May

Teman,
Read25, 2021
Cycle
Major Peripheral Circuits
Storage Cell
Bit Line
• Row Decoder
• Column Multiplexer
Row Decoder
Word Line
AW-1 : AM
• Sense Amplifier
• Write Driver
• Precharge Circuit
C×2M
AM-1 : A0 Column Decoder
Input/Output
(C bits)
7
© Adam May
Teman,
25, 2021
Row Decoder Design
8
Row Decoders
• A Decoder reduces the number of select signals by log2.
• Number of Rows: W
• Number of Row Address Bits: A=log2W
Word 0
Word 1
ADDA-1 : ADD0
Word 2
Row Decoder
Word W-2
Word W-1
9 Row Decoder Column Mux Precharge Sense Amp © Adam May

Teman,
25, 2021
Row Decoders
• Standard Decoder Design:
• Each output row is driven by an AND gate with k=log2N inputs.
• Each gate has a unique combination of address inputs
(or their inverted values).
• For example, an 8-bit row address has 256 8-input AND gates, such as:
WL0 = A7 A6 A5 A4 A3 A2 A1 A0 WL255 = A7 A6 A5 A4 A3 A2 A1 A0
• NOR Decoder:
• DeMorgan will provide us with a NOR Decoder.
• In the previous example, we’ll get 256 8-input NOR gates:
WL0 = A7 + A6 + A5 + A4 + A3 + A2 + A1 + A0
WL255 = A7 + A6 + A5 + A4 + A3 + A2 + A1 + A 0
Teman,
25, 2021
How should we build it? WL0
• Let’s build a row decoder for a 256x256 SRAM Array.

• We need 256 8-input AND Gates.
WL1
• Each gate drives 256 bitcells
• We have various options:
WL255
• Which one is best?

Teman,
25, 2021
Reminder: Logical Effort
t pd ,i = t pINV ( pi + EFi )
bi  Cin,i +1
PE = F   LE  B =   LEi  bi
CL
EFi LEi  fi = LEi 
Cin,i Cin ,1
EFopt = PE = N F   LEi  bi
N
Nopt = log EFopt PE = log EFopt F  LE  B
(
t pd = t pINV  ( pi + EFi ) = t pINV   pi + N  N PE )
Teman,
25, 2021
Problem Setup
• For LE calculation we need to start with:
• Output Load (CL)
• Input Capacitance (Cin)
• Branching (B)
• What is the Load Capacitance?
• 256 bitcells on each Word Line
CWL = 256  CCell + CWire

• Let’s ignore the wire for now…
• What is the Input Capacitance?
• Let’s assume our address drivers
can drive a bit more than a bitcell, so: Cin ,addr _ driver = 4  CCell
Teman,
25, 2021
Problem Setup
• What is the Branching Effort?
• Lets take another look WL0 = A7 A6 A5 A4 A3 A2 A1 A0
at the Boolean expressions:
WL255 = A7 A6 A5 A4 A3 A2 A1 A0
• We see that half of the signals use Ai and half use Ai!
• So each address driver drives 128 8-input AND gates,
but only one is on the selected WL path.
Con path = Cnand ; Coff path = 127  Cnand

Con path + Coff path Cnand + 127  Cnand
Badd _ driver = = = 128
Con path Cnand
Teman,
25, 2021
Number of Stages
CWL 256CCell
• Altogether the path effort is: PE = LE  B  F = LE  bi = LE  128 
Caddress 4CCell
= LE  8k = 213  LE
• The best case logical effort is

LE = 1
• So the minimum number of
stages for optimal delay is: PE = 213
N opt = log 3.6 2 = 7
13
• That’s a lot of stages!

Teman,
25, 2021
So which implementation should we use?
• The one with the minimum Logical Effort:
LE = (10 3) 1 LE = 2  ( 5 3) LE = ( 4 3)  ( 5 3)  ( 4 3) 1 LE = ( 4 3)

3
= 10 3; = 10 3 = 80 27;
p = 2 + 2 + 2 +1 = 7
= 2.37;
p = 8 +1 = 9 p = 4+2 = 6
p = 2  3 + 1 3 = 9

Teman,
25, 2021
New optimal number of Stages
• So now we can calculate the actual path effort:
PE = F  bi  LEi =
= 2.37  213 = 19.418k
N opt = log 3.6 PE = 7.7
• We could add another inverter or two

to get closer to the optimal number of stages…

Teman,
25, 2021
Implementation Problems
• Address Line Capacitance:
• Our assumption was that Cin,addr_driver=4Ccell.
• But each address drives 128 gates
• That’s a really long wire with high capacitance.
• This means that we will need to buffer the address lines
• This will probably ruin our whole analysis...
• Bit-cell Pitch:
• Each signal drives one row of bitcells.
• How will we fit 8 address signals into this pitch?

Teman,
25, 2021
Predecoding - Concept
• Solution:
• Let’s look at two decoder paths: WL254, WL255
A0 A0 A0
A1 A0
A1 A1
A1
A2
A2 A2
A2
A3
WL255 A3 A3 A
A4 WL254 3
A4
A5 A4
A5 A4
A6 A5 A5
A6
A7
A6 A7
A6
A7 A7
• We see that there are many “shared” gates.

• So why not share them?
• For instance, we can use the purple output for both gates…
Teman,
25, 2021
Predecoding - Method
4 →16
A0
• How do we do this? A1 D
A2
• If we look at the final Boolean expression, A3
it has combinations of groups of inputs.
• By grouping together a few inputs,
we actually create a small decoder.
4 →16
• Then we just AND the outputs of all the A4
“pre” decoders.
A5 E
A6
• For example: Two 4:16 predecoders A7
D = dec ( A0 , A1 , A2 , A3 ) ; E = dec ( A4 , A5 , A6 , A7 ) ;
WL0 = D0  E0 ; WL255 = D15  E15 ; WL254 = D14  E15 ;
Teman,
25, 2021
Predecoding - Example
• Let’s look at our example: WL0 = D0  E0
D = dec ( A0 , A1 , A2 , A3 ) WL255 = D15  E15
E = dec ( A4 , A5 , A6 , A7 ) WL254 = D15  E14
• What is our new branching effort?
• As before, each address drives half the lines of the small decoder.
• Each predecoder output drives 256/16 post-decoder gates.
• Altogether, the branching effort is:
B = baddr _ driver  bpredecoder = 16  256 = 128
2 16
• Same as before!
Teman,
25, 2021
Predecoding - Solution
• Why is this a better solution?
• Each Address driver is only driving eight gates
• less capacitance.
• We saved a ton of area by “sharing” gates.
• We can “Pitch Fit” 2-input NAND gates.

Teman,
25, 2021
Another Predecoding Example
• We can try using four 2-input predecoders:
• This will require us to use 256 4-input NAND gates.

Teman,
25, 2021
How do we choose a configuration?
• Pitch Fitting: 2-input NANDs vs. 4-input NAND.
• Switching Capacitance: How many wires switch at each transition?
• Stages Before the large cap: Distribution of the load along the delay.
• Conclusion: Usually do as much predecoding as possible!
WL0 WL0
WL1 WL1
4 4 4 4 16 16
WL127 WL127
2→4 2→4 2→4 2→4 4 →16 4 →16
A0A1 A2A3 A4A5 A6A7 A0A1A2A3 A4A5A6A7

Teman,
25, 2021
Alternative Solution: Dynamic Decoders
GND
GND
VDD
PC
WL0 WL3
WL1 WL2
WL1
WL2
WL0
WL3
A0
A0
A1
A1
A0
A0
A1
A1
2-input NOR decoder 2-input NAND decoder
Teman,
25, 2021
Column Multiplexer
26
Column Multiplexer
• First option – PTL Mux with decoder
• Fast – only 1 transistor in signal path.
• Large transistor Count A1 A0
B0 B1 B2 B3
Y
Teman,
25, 2021
4 to 1 tree decoder
• Second option – Tree Decoder
• For 2k:1 Mux, it uses k series transistors.
• Delay increases quadratically
• No external decode logic → big area reduction.

Teman,
25, 2021
Combining the Two

Teman,
25, 2021
Precharge and Sense Amp
30
Precharge Circuitry
• Precharge bitlines high before reads

bit bit_b
• Equalize bitlines to minimize voltage difference when using sense amplifiers
bit bit_b

Teman,
25, 2021
Sense Amplifiers
make D V as small
C  DV as possible
tp = ----------------
Iav
large small
Idea: Use Sense Amplifier
small
transition s.a.
input output

Teman,
25, 2021
Differential Sense Amplifier
• Non-clocked Sense Amp has high static power.
• Clocked sense amp saves power
• Requires sense_clk after enough bitline swing
• Isolation transistors cut off large bitline capacitance

Teman,
25, 2021
The Computer Hall of Fame
• The machine that many of us got to know
during our military service:
Source: pcworld.com
• 32-bit, CISC architecture, introduced in 1977
• The VAX-11/780 was TTL-based, 5MHz, 2kB cache, reaching 1 MIPS
• Known as a “minicomputer”, even though it took up a whole room.
• VAX means “Virtual Address Extension”,
since the VAX was one of the first minicomputers to use virtual memory.
• Ran the VMS operating system.
• Many systems that were developed during the cold war
(e.g., F-15, F-18, Hawk missiles, nuclear programs) still use VAX today!
Further Reading
• Rabaey, et al. “Digital Integrated Circuits” (2nd Edition)
• Elad Alon, Berkeley ee141 (online)
• Weste, Harris, “CMOS VLSI Design (4th Edition)”
36
© Adam May
Teman,
25, 2021

SRAM Adi Teman

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SRAM Adi Teman

Uploaded by

Copyright:

Available Formats

Digital Integrated Circuits

Cerebras Wafer Scale Engine 2 (2021) –

5 Source: Pavlov, Sachdev, 2008

Random Access Memory Serial Access Memory Content Addressable Memory

• Random Access, FIFO, LIFO,

• Let’s say we want to make a 1MB memory:

Number of Words in a Row: 2M

Number of Rows: 2A-M

Option 3: Transmission Gate Solution: Differential nMOS Write

Writes well, but how do we read?

• The personal computer that “was designed Source: macworld.co.uk

so easy to use that people could actually use it”.

VDSat,n + CR (VDD − VT,n ) − V (1 + CR ) + CR (VDD − VT,n )

So we need the pull

So we need the access

KPDN  Kaccess  KPUN

Dual Port SRAM Two Port SRAM

• Avoid Bends in Polysilicon and Diffusion WL WL

Intel Design Forum 2009

Source: ISSCC 2021

TSMC 5nm SRAM Test Chip

1. Plot both VTCs on the same graph

• What happens during Write? M3 M2

• To run Monte Carlo Simulations we should have an easy way of calculation.

• Substituting y=F1(x) gives:

• Create a coordinate changing circuit VDD GND VDD

for each of the transformations. DC Sweep u v1 v1

VDD GND VDD

• Run a DC Sweep on u from –VDD/√2 to VDD/√2

 min max ( v1 − v2 ) , max ( v1 − v2 )

• And Write SNM.

Number of Words in a Row: 2M

Number of Rows: 2A-M

• A typical on-chip synchronous SRAM features: D[n-1:0] Q[n-1:0]

• Single-cycle write/read latency WEN[p-1:0]

• Byte write mask CEN

A A0 A1 A2 A3 (2) Rising edge of the clock results

Source: CMU, ECE548

6 Write Cycle © Adam May

AM-1 : A0 Column Decoder

9 Row Decoder Column Mux Precharge Sense Amp © Adam May

• Let’s build a row decoder for a 256x256 SRAM Array.

• Which one is best?

Nopt = log EFopt PE = log EFopt F  LE  B

CWL = 256  CCell + CWire

Con path = Cnand ; Coff path = 127  Cnand

• The best case logical effort is

• That’s a lot of stages!

15 Row Decoder Column Mux Precharge Sense Amp © Adam May

LE = (10 3) 1 LE = 2  ( 5 3) LE = ( 4 3)  ( 5 3)  ( 4 3) 1 LE = ( 4 3)

16 Row Decoder Column Mux Precharge Sense Amp © Adam May

• We could add another inverter or two

17 Row Decoder Column Mux Precharge Sense Amp © Adam May

18 Row Decoder Column Mux Precharge Sense Amp © Adam May

• We see that there are many “shared” gates.

22 Row Decoder Column Mux Precharge Sense Amp © Adam May

23 Row Decoder Column Mux Precharge Sense Amp © Adam May

2→4 2→4 2→4 2→4 4 →16 4 →16

A0A1 A2A3 A4A5 A6A7 A0A1A2A3 A4A5A6A7

28 Row Decoder Column Mux Precharge Sense Amp © Adam May

29 Row Decoder Column Mux Precharge Sense Amp © Adam May

31 Row Decoder Column Mux Precharge Sense Amp © Adam May

Idea: Use Sense Amplifier

32 Row Decoder Column Mux Precharge Sense Amp © Adam May