Array Structured Memories
STMicro/Intel
UCSD CAD LAB
Weste Text
STMicro/Intel/UCSD/THNU
EE141 Memory
Memory Arrays
Feature Comparison Between Memory Types
Array Architecture
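2^n words of 2^m bits each
If n >> m, fold by 2^k into fewer rows of more columns
[Figure: array architecture. The n-bit address splits into n-k bits for the row decoder, which drives the wordlines, and k bits for the column decoder. Memory cells: 2^(n-k) rows x 2^(m+k) columns, with bitline conditioning on the bitlines and column circuitry delivering 2^m bits.]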
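The folding rule can be checked numerically. The following is a minimal sketch; the function name `fold_array` and the dictionary keys are illustrative assumptions, not names from the slides.

```python
def fold_array(n: int, m: int, k: int) -> dict:
    """Fold 2^n words of 2^m bits by 2^k into fewer rows of more columns."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return {
        "rows": 2 ** (n - k),          # one wordline per row
        "columns": 2 ** (m + k),       # physical columns in the array
        "row_address_bits": n - k,     # bits fed to the row decoder
        "column_address_bits": k,      # bits fed to the column decoder
    }

# Example values (assumed): 2^11 words of 2^4 bits, folded by 2^3.
org = fold_array(n=11, m=4, k=3)
print(org)
```

The total number of cells is unchanged by folding: rows x columns is always 2^(n+m).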
Hierarchical Memory Architecture
[Figure: memory divided into blocks, with row address, column address, and block address inputs and a common I/O]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
Array Organization Design Issues
aspect ratio should be relatively square
Row / Column organisation (matrix)
R = log2(N_rows); C = log2(N_columns / M), where M is the output word width
R + C = N (N_address_bits)
number of rows should be power of 2
number of bits in a row need not be…
sense amplifiers to speed voltage swing
1 -> 2^R row decoder
1 -> 2^C column decoder
M column decoders (M bits, one per bit)
– M = output word width
Simple 4x4 SRAM Memory
2 bit width: M = 2
R = 2 => N_rows = 2^R = 4
C = 1
N_columns = 2^C x M = 4
N = R + C = 3
Array size = N_rows x N_columns = 16
[Figure: 4x4 SRAM array. Row decoder (A1, A2) drives WL[0] to WL[3]; bit line precharge with read precharge enable on the BL/!BL pairs; column decoder (A0, !A0); clocking and control; sense amplifiers.]
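The 4x4 example can be traced address by address. This is a minimal sketch assuming the high R bits of the address select the row and the low C bits select the column, which matches A1, A2 feeding the row decoder and A0 feeding the column decoder; `split_address` is a hypothetical helper name.

```python
def split_address(addr: int, r_bits: int, c_bits: int) -> tuple:
    """Split an N = R + C bit address into (row index, column select)."""
    if not 0 <= addr < (1 << (r_bits + c_bits)):
        raise ValueError("address out of range")
    row = addr >> c_bits               # high R bits pick the wordline
    col = addr & ((1 << c_bits) - 1)   # low C bits pick the column group
    return row, col

R, C, M = 2, 1, 2            # values from the slide
n_rows = 2 ** R              # 4 wordlines
n_columns = (2 ** C) * M     # 4 physical columns
array_size = n_rows * n_columns

for addr in range(2 ** (R + C)):
    row, col = split_address(addr, R, C)
    print(f"addr={addr:03b} -> WL[{row}], column select {col}")
```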
SRAM Read Timing (typical)
tAA (access time for address): time for stable
output after a change in address.
tACS (access time for chip select): time for stable
output after CS is asserted.
tOE (output enable time): time until the outputs are driven (low impedance)
after OE and CS are both asserted.
tOZ (output-disable time): time until the outputs reach the high-impedance
state after OE or CS is negated.
tOH (output-hold time): time data remains valid
after a change to the address inputs.
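These parameters combine as follows: the output is valid only once the address, chip-select, and output-enable paths have all completed. A minimal sketch, with hypothetical timing values in nanoseconds and an assumed helper name `data_valid_time`:

```python
def data_valid_time(t_addr, t_cs, t_oe, tAA, tACS, tOE):
    """Earliest time the read data is valid.

    t_addr, t_cs, t_oe are the times at which the address becomes stable,
    CS is asserted, and OE is asserted. Each path adds its own access time,
    and the slowest path decides when the output is valid.
    """
    return max(t_addr + tAA, t_cs + tACS, t_oe + tOE)

# Hypothetical datasheet values (ns), chosen only for illustration.
timing = {"tAA": 10.0, "tACS": 10.0, "tOE": 5.0}
t_valid = data_valid_time(t_addr=0.0, t_cs=2.0, t_oe=4.0, **timing)
print(f"data valid at t = {t_valid} ns")
```

In this example the chip-select path is the limiting one, because CS is asserted 2 ns after the address is stable.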
SRAM Read Timing (typical)
[Figure: read timing waveforms for ADDR (stable intervals), CS_L, and OE_L with WE_L = HIGH, annotated with tACS and tOH]
SRAM Architecture and Read Timings
[Figure: SRAM architecture annotated with the read timings tAA, tACS, tOE, tOZ, and tOH]
SRAM write cycle timing
~WE controlled
~CS controlled
SRAM Architecture and Write Timings
Setup time = tDW
[Figure: SRAM architecture with write driver, annotated with tDH and tWP-tDW]
SRAM Cell Design
Memory arrays are large
Need to optimize cell design for area and
performance
Peripheral circuits can be complex
– 60-80% area in array, 20-40% in periphery
Classical Memory cell design
6T cell full CMOS
4T cell with high resistance poly load
TFT load cell
Anatomy of the SRAM Cell
Write:
•set bit lines to new data value
•b’ = ~b
•raise word line to “high”
•sets cell to new state
•Low impedance bit-lines
Read:
•set bit lines high
•set word line high
•see which bit line goes low
•High impedance bit lines
SRAM Cell Operating Principle
•Inverter Amplifies
•Negative gain
•Slope < –1 in middle
•Saturates at ends
• Inverter Pair Amplifies
•Positive gain
•Slope > 1 in middle
•Saturates at ends
Bistable Element
Stability
Require Vin = V2
Stable at endpoints
recover from perturbation
Metastable in middle
Fall out when perturbed
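The stable endpoints and the metastable midpoint can be illustrated by sending a voltage around the inverter loop repeatedly. This is a minimal sketch: the tanh-shaped transfer curve, the supply `VDD`, and the mid-point gain `GAIN` are all assumptions chosen for illustration, not a device model.

```python
import math

VDD = 1.0    # assumed supply voltage
GAIN = 4.0   # assumed magnitude of the inverter slope at the switching point

def inverter(v):
    """Idealized inverter transfer curve with slope -GAIN at VDD/2."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (2.0 * v / VDD - 1.0)))

def settle(v, passes=50):
    """Pass v through the inverter pair repeatedly and return the result."""
    for _ in range(passes):
        v = inverter(inverter(v))
    return v

mid = VDD / 2
print(settle(mid))           # metastable point: stays in the middle
print(settle(mid + 1e-3))    # small push up: falls out toward VDD
print(settle(mid - 1e-3))    # small push down: falls out toward 0
```

Because the pair has a slope greater than 1 in the middle, any small perturbation from VDD/2 grows on each pass until the curve saturates at one of the endpoints.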
SNM: Butterfly Curves
[Figure: butterfly curves of inverters 1 and 2, with the SNM marked in each of the two lobes]
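The SNM is commonly read from the butterfly plot as the side of the largest square that fits inside a lobe. The sketch below estimates it for two identical inverters, assuming an idealized tanh-shaped transfer curve; `make_inverter`, `bisect`, and `snm` are hypothetical helper names and the gains are illustrative.

```python
import math

VDD = 1.0  # assumed supply voltage

def make_inverter(gain):
    """Return an idealized inverter curve with slope -gain at VDD/2."""
    def vtc(v):
        return 0.5 * VDD * (1.0 - math.tanh(gain * (2.0 * v / VDD - 1.0)))
    return vtc

def bisect(fn, lo, hi, iters=60):
    """Root of a monotonically decreasing function on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def snm(vtc, steps=400):
    """Side of the largest square inside the upper-left butterfly lobe.

    Each 45-degree line y = x + c crosses curve 1 (y = vtc(x)) and the
    mirrored curve 2 (x = vtc(y)) exactly once. The x-distance between the
    two crossings is the side of the square whose diagonal lies on that line.
    """
    best = 0.0
    for i in range(1, steps):
        c = VDD * i / steps
        x1 = bisect(lambda x: vtc(x) - x - c, -VDD, 2.0 * VDD)
        x2 = bisect(lambda x: vtc(x + c) - x, -VDD, 2.0 * VDD)
        best = max(best, abs(x1 - x2))
    return best

low_gain_snm = snm(make_inverter(2.0))
high_gain_snm = snm(make_inverter(8.0))
print(low_gain_snm, high_gain_snm)
```

With identical inverters the two lobes are mirror images, so one lobe is enough. For mismatched inverters each lobe would need its own search and the SNM would be the smaller of the two.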
SNM for Poly Load Cell
12T SRAM Cell
12-transistor (12T) SRAM cell
Latch with transmission-gate write
Separately buffered read
[Figure: 12T cell schematic with bit and write signals]
6T SRAM Cell
Cell size accounts for most of array size
Reduce cell size at cost of complexity/margins
6T SRAM Cell
Read:
Precharge bit, bit_b
Raise wordline
Write:
Drive data onto bit, bit_b
Raise wordline
[Figure: 6T cell schematic with bit, bit_b, and word]
SRAM Design
Vertical 6T Cell Layout
[Figure: vertical 6T cell layout showing bitlines B- and B+, N-well connection, VDD, PMOS pull-up, nodes Q/ and Q, NMOS pull-down, GND, SEL, SEL MOSFET, and substrate connection]
SRAM Bitcell Design
[Figure: bitcell layout with WL, VSS, VDD, and BL labeled]
Detailed SRAM Bitcell Layout
SRAM Read
Precharge both bitlines high
Then turn on wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1
A must not flip
N1 >> N2
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b, bit, bit_b, word) and simulated read waveforms of A, A_b, and bit_b versus time (ps), 0 to 600 ps]
SRAM Write
Drive one bitline high, other low
Then turn on wordline
Bitlines overpower cell
Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
Force A_b low, then A rises high
Writability
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b, bit, bit_b, word) and simulated write waveforms including A_b versus time (ps), 0 to 700 ps]
SRAM Sizing
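High bitlines must not overpower inverters during reads
But low bitlines must write new value into cell
[Figure: 6T cell (bit, bit_b, word, nodes A and A_b) with relative device strengths: weak pull-ups, medium access transistors, strong pull-downs]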
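The two sizing constraints reduce to an ordering of relative drive strengths. The sketch below is only a first-order ordering check, not a margin calculation; the function name `check_6t_sizing` and the strength numbers are illustrative assumptions.

```python
def check_6t_sizing(pullup: float, access: float, pulldown: float) -> dict:
    """Check the weak / medium / strong ordering of a 6T cell.

    Arguments are relative drive strengths in arbitrary units.
    """
    return {
        # Read: the pull-down must hold the low node against the access
        # transistor connected to a precharged (high) bitline.
        "read_stable": pulldown > access,
        # Write: the access transistor connected to a low bitline must
        # overpower the pull-up holding the high node.
        "writable": access > pullup,
    }

print(check_6t_sizing(pullup=1.0, access=2.0, pulldown=4.0))
print(check_6t_sizing(pullup=3.0, access=2.0, pulldown=4.0))
```

A real design would verify both conditions with read and write margin simulations across process corners rather than a simple inequality.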
SRAM Column Example
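[Figure: SRAM column for read and write: bitline conditioning, SRAM cell on word_q1 with more cells on the same bitlines (bit_v1f, bit_b_v1f), and a write driver controlled by write_q1 with input data_s1]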
Decoders
n:2^n decoder consists of 2^n n-input AND gates
One needed for each row of memory
Build AND from NAND or NOR gate
choose minimum size to reduce load on the address lines
[Figure: two 2:4 decoders with inputs A1, A0 and outputs word0 to word3, one static and one pseudo-nMOS, annotated with transistor sizes]
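The decoder structure, one n-input AND gate per row, can be modeled behaviorally. A minimal sketch; the function name `decode` is an assumption, and the model ignores gate sizing and delay.

```python
def decode(address: int, n: int) -> list:
    """n:2^n decoder: wordline i is the AND of the n address literals for i."""
    bits = [(address >> b) & 1 for b in range(n)]
    wordlines = []
    for row in range(2 ** n):
        # Each gate input is A_b or its complement, depending on the row index.
        literals = [bits[b] if (row >> b) & 1 else 1 - bits[b] for b in range(n)]
        wordlines.append(int(all(literals)))
    return wordlines

print(decode(0b10, 2))
```

Exactly one wordline is high for any address, which is the property the row decoder must guarantee.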
Single Pass-Gate Mux
bitlines propagate through 1 transistor
[Figure: 4:1 pass-gate mux selecting among B0, B1, B2, B3 under control of A1, A0]
Decoder Layout
Decoders must be pitch-matched to
SRAM cell
Requires very skinny gates
[Figure: decoder layout with true and complement address lines A3 to A0 crossing the VDD, word, and GND rails]
Large Decoders
For n > 4, NAND gates become slow
Break large gates into multiple smaller gates
[Figure: 4:16 decoder with inputs A3, A2, A1, A0 and outputs word0 to word15, built from multiple smaller gates]
Predecoding
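Many of these gates are redundant
Factor out the common gates
Saves area
[Figure: predecoded 4:16 decoder. Address inputs including A3, A1, A0 feed predecoders that produce 1 of 4 hot predecoded lines, which drive word0 to word15.]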
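Predecoding can be sketched for a 4-bit address: each pair of address bits is decoded once into a 1-of-4 hot group, and every wordline then combines one line from each group. The function names and the choice of 2-bit groups are assumptions for illustration; the sketch requires n to be a multiple of the group size.

```python
def predecode(address: int, n: int = 4, group: int = 2) -> list:
    """Split the address into `group`-bit fields and one-hot encode each."""
    lines = []
    for g in range(n // group):
        field = (address >> (g * group)) & ((1 << group) - 1)
        lines.append([int(field == v) for v in range(1 << group)])
    return lines

def wordlines(address: int, n: int = 4, group: int = 2) -> list:
    """Each wordline ANDs one predecoded line from every group."""
    pre = predecode(address, n, group)
    mask = (1 << group) - 1
    out = []
    for row in range(1 << n):
        selected = [pre[g][(row >> (g * group)) & mask] for g in range(len(pre))]
        out.append(int(all(selected)))
    return out

print(predecode(0b0101))
print(wordlines(0b0101))
```

For n = 4 this structure uses two 2:4 predecoders plus sixteen 2-input gates in place of sixteen 4-input gates.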
Column Circuitry
Some circuitry is required for each
column
Bitline conditioning
Sense amplifiers
Column multiplexing
Each column must have write drivers
and read sensing circuits
Column Multiplexing
Recall that array may be folded for good
aspect ratio
Ex: 2k word x 16 folded into 256 rows x
128 columns
Must select 16 output bits from the 128
columns
Requires 16 8:1 column multiplexers
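The multiplexer count in this example follows directly from the folded dimensions. The sketch below also shows a word being selected, assuming that the 8 candidate columns for each output bit sit next to each other; that interleaving, and the names `column_mux_plan` and `select_word`, are assumptions not stated in the slides.

```python
def column_mux_plan(words: int, bits: int, rows: int) -> dict:
    """Number and width of column multiplexers for a folded array."""
    total = words * bits
    if total % rows:
        raise ValueError("rows must divide the total bit count")
    columns = total // rows
    return {"columns": columns, "num_muxes": bits, "mux_ways": columns // bits}

def select_word(row_bits: list, col_addr: int, ways: int) -> list:
    """Pick one bit per mux from a full row read.

    Assumes the `ways` candidates for each output bit are adjacent columns.
    """
    return [row_bits[i * ways + col_addr] for i in range(len(row_bits) // ways)]

plan = column_mux_plan(words=2048, bits=16, rows=256)
print(plan)

row = list(range(plan["columns"]))   # stand-in data: column index as value
word = select_word(row, col_addr=3, ways=plan["mux_ways"])
print(word)
```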
Typical Column Access
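Pass Transistor Based Column Decoder
[Figure: 2 input NOR decoder (A1, A0) generating select lines S0 to S3, which gate bitline pairs BL0/!BL0 to BL3/!BL3 through pass transistors onto Data/!Data, to sense amps and write circuits. A second diagram shows address inputs A1, A2 and outputs Y.]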
Ex: 2-way Muxed SRAM
[Figure: two SRAM columns (more cells on each) sharing word_q1, combined by a 2-to-1 mux controlled by A0 and its complement]
Bitline Conditioning
Precharge bitlines high before reads
[Figure: bitline conditioning circuits on bit and bit_b]
Sense Amplifier: Why?
Bit line capacitance is significant for a large array
If each cell contributes 2fF,
[Figure: bitline discharged through the cell pull-down transistor resistance]
Sense Amplifiers
Bitlines have many cells attached
Ex: 32-kbit SRAM has 128 rows x 256 cols
128 cells on each bitline
tpd ∝ (C/I) ΔV
Even with shared diffusion contacts, 64C of
diffusion capacitance (big C)
Discharged slowly through small transistors (small I)
Sense amplifiers are triggered on small voltage
swing (reduce ΔV)
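The delay relation is easy to evaluate with numbers. In the sketch below the per-contact capacitance, the cell current, and the two voltage swings are hypothetical values chosen only to show the scaling; `bitline_delay` is an assumed helper name.

```python
def bitline_delay(c_bitline: float, i_cell: float, delta_v: float) -> float:
    """First-order delay t = C * dV / I for a constant discharge current."""
    return c_bitline * delta_v / i_cell

C_DIFF = 1e-15          # assumed diffusion capacitance per contact (F)
I_CELL = 50e-6          # assumed cell read current (A)
c_bl = 64 * C_DIFF      # 64C of diffusion capacitance on the bitline

full_swing = bitline_delay(c_bl, I_CELL, delta_v=1.0)    # wait for a 1.0 V swing
small_swing = bitline_delay(c_bl, I_CELL, delta_v=0.1)   # sense at 100 mV

print(f"full swing:  {full_swing * 1e12:.0f} ps")
print(f"small swing: {small_swing * 1e12:.0f} ps")
```

Since C and I are fixed by the array, reducing the required swing is the lever the sense amplifier provides: the delay scales linearly with ΔV.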
Differential Pair Amp
Differential pair requires no clock
But always dissipates static power
[Figure: differential pair with loads P1 and P2, inputs bit (N1) and bit_b (N2), transistor N3, and outputs sense_b and sense]
Clocked Sense Amp
Clocked sense amp saves power
Requires sense_clk after enough bitline swing
Isolation transistors cut off large bitline capacitance
[Figure: clocked sense amp. Isolation transistors connect bit and bit_b to a regenerative feedback stage enabled by sense_clk, with outputs sense and sense_b.]
Sense Amp Waveforms
[Figure: sense amp waveforms, 1 ns/div: bit and bit’ (200 mV scale), wordline, the point where the bit lines begin precharging, and BIT and BIT’ (2.5 V scale)]
Dual-Ported SRAM
Simple dual-ported SRAM
Two independent single-ended reads
Or one differential write
wordA reads bit_b (complementary)
wordB reads bit (true)
[Figure: dual-ported cell with bit, bit_b, wordA, and wordB]
Multiple Ports
We have considered single-ported SRAM
One read or one write on each cycle
Multiported SRAMs are needed for register files
Examples:
Multicycle MIPS must read two sources or write a
result on some cycles
Pipelined MIPS must read two sources and write a
third result each cycle
Superscalar MIPS must read and write many
sources and results each cycle
Multi-Ported SRAM
Adding more access transistors hurts read stability
Multiported SRAM isolates reads from state node
Single-ended design minimizes number of bitlines
[Figure: multiported cell with single-ended bitlines bA to bG and wordlines wordA to wordG, with separate write circuits and read circuits]
Logical effort of RAMs
Twisted Bitlines
Sense amplifiers also amplify noise
Coupling noise is severe in modern processes
Try to couple equally onto bit and bit_b
Done by twisting bitlines
[Figure: twisted bitline pairs b0/b0_b, b1/b1_b, b2/b2_b, b3/b3_b]
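The benefit of twisting can be illustrated with a simple length-weighted coupling model. This is a sketch under strong assumptions: coupling noise is taken as proportional to the length over which a victim wire runs next to the aggressor, with a hypothetical coupling factor `k`.

```python
def differential_noise(segments, k: float = 1.0) -> float:
    """Differential noise seen by a sense amplifier on one bitline pair.

    segments: list of (length, wire) tuples, where wire is "bit" or "bit_b",
    naming which wire of the victim pair is adjacent to the aggressor over
    that fraction of the bitline length.
    """
    noise = {"bit": 0.0, "bit_b": 0.0}
    for length, wire in segments:
        noise[wire] += k * length
    return noise["bit"] - noise["bit_b"]

# Untwisted: the aggressor runs next to bit for the whole length.
untwisted = differential_noise([(1.0, "bit")])

# Twisted at the midpoint: half the length next to bit, half next to bit_b.
twisted = differential_noise([(0.5, "bit"), (0.5, "bit_b")])

print(untwisted, twisted)
```

With the twist, both wires pick up the same noise, so it appears as common mode and the differential sense amplifier rejects it.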
Alternative SRAM Cells
Low Voltage/High Leakage/Process Variations crowd
the operating margins of conventional SRAM
Alternative Sense Amplifiers, column and row
arrangements, adaptive timing, smaller hierarchy,
redundant and spare rows/columns have all been
addressed in the literature with some success.
Some problems come from the cell design itself:
modifying the cell can decouple conflicting
optimization demands
10T
Features
BL Leakage reduction
Approaches
Separated Read port
Stack effect by M10
Performance
400 mV at 475 kHz, 3.28 uW
320 mV without read error at 27 °C
380 mV without write error at 27 °C
Vmin = 300 mV at 1% bit errors
256 bits/BL
Stable SRAM Cell Design for the 32nm Node and Beyond
Leland Chang et al.,
Symp. on VLSI, 2005
8T
Features
No read disturb
Low VDD (350 mV)
Low subthreshold (sub-Vt) leakage
Approaches
Separate read and write WL
Separated read port
Foot-drivers reduce the sub-Vt leakage
Performance
65 nm process, 128 cells/row
Operating at 25 kHz
2.2 uW leakage power
Butterfly and N-Curves
• Measurement method
– Increase VR and measure VL
– Increase VL and measure VR
– Plot both voltage transfer curves on VR and VL axes -> Butterfly
– Measure Iin -> N-curve
Iread, Ileakage and VDDHOLD
Iread
Measure bitline current when WL switches to high
ILEAKAGE
Measure VDD (or VSS) current when WL=0
VDDHOLD
Decrease VDD voltage while WL=0
Measure minimum VDD voltage at which | V(nl) - V(nr) | = ‘sensing margin’ (100mV is assumed)
[Figure: 6T cell test setup with WL, VDD, devices A1, A2, C1, C2, B1, B2, storage nodes nl and nr holding ‘1’ and ‘0’, and bitlines BLb and BL at ‘1’]
32nm proposed / REFERENCE (for 30x12 and 25x12)
Iread      41.2 uA   66.7 uA
Ileakage   85.4 nA   142.7 nA
VDDHOLD    110 mV    118 mV
Corner Simulation: Butterfly and N-Curve
• Three candidates: 25x10, 25x12, 30x12
[Figure: three panels of butterfly curves (voltage axis up to 1.2 V) and N-curves (current axis from -1.5E-04 to 5.0E-05, sweep points 1 to 101), each comparing 25x10, 25x12, and 30x12]
[Figure: VDDHOLD (V), axis from 0.112 to 0.13, for 25x10, 25x12, and 30x12 in three groups]