Semiconductor Memory Design (Sram & Dram) : Kaushik Saha

Semiconductor Memory Design (SRAM & DRAM)
Kaushik Saha Contact: kaushik.saha@st.com, mobile-98110-64398
STMicroelectronics
Understanding the Memory Trade

The memory market is the most
- Volatile
- Cost competitive
- Innovative
in the IC trade
Demand Memory market
Supply
Technical Change
2
Classification of Memories
RWM (Read-Write Memory)
- Random Access: SRAM (Static), DRAM (Dynamic)
- Non-Random Access: FIFO (Queue), LIFO (Stack), SR (Shift Register), CAM (Content Addressable)
NVRWM (Non-Volatile Read-Write Memory)
- EPROM, EEPROM, FLASH
ROM
- Mask Programmed, PROM (Fuse Programmed)
Feature Comparison Between Memory Types
Memory selection : cost and performance

DRAM, EPROM
- Merit : cheap, high density
- Demerit : low speed, high power
SRAM
- Merit : high speed or low power
- Demerit : expensive, low density
Large memory with cost pressure :
- DRAM
Large memory with very fast speed :
- SRAM, or
- DRAM main + SRAM cache
Back-up main memory for no data loss on power failure :
- SRAM with battery back-up
- EEPROM
5
Trends in Storage Technology

Generation
Increasing die size: factor 1.5 per generation
Combined with reducing cell size: factor 2.6 per generation
*MB=Mbytes
The Need for Innovation in Memory Industry

The learning rate (viz. the constant b) is the highest for the memory industry
- Because prices drop most steeply among all ICs
Due to the nature of demand + supply
- Yet margins must be maintained
Techniques must be applied to reduce production cost Often, memories are the launch vehicles for a technology node
- Leads to volatile nature of prices
7
Memory Hierarchy of a Modern Computer System

By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
Processor (Control, Datapath, Registers, On-Chip Cache) -> Second Level Cache (SRAM) -> Main Memory (DRAM) -> Secondary Storage (Disk) -> Tertiary Storage (Tape)

Level                        Speed (ns)                   Size (bytes)
Registers                    1s                           100s
Cache (SRAM)                 10s                          Ks
Main Memory (DRAM)           100s                         Ms
Secondary Storage (Disk)     10,000,000s (10s ms)         Gs
Tertiary Storage (Tape)      10,000,000,000s (10s sec)    Ts

8
How is the hierarchy managed?

Registers <-> Memory
- by compiler (programmer?)
cache <-> memory
- by the hardware
memory <-> disks
- by the hardware and operating system (virtual memory)
- by the programmer (files)
Memory Hierarchy Technology

Random Access:
- Random is good: access time is the same for all locations
- DRAM: Dynamic Random Access Memory
High density, low power, cheap, slow
Dynamic: needs to be refreshed regularly
- SRAM: Static Random Access Memory
Low density, high power, expensive, fast
Static: content lasts as long as power is applied
Not-so-random Access Technology:

- Access time varies from location to location and from time to time
- Examples: Disk, CDROM
10
Main Memory Background

Performance of Main Memory:
- Latency: Cache Miss Penalty
Access Time: time between the request and the arrival of the word
Cycle Time: time between requests
- Bandwidth: I/O & Large Block Miss Penalty (L2)
Main Memory is DRAM : Dynamic Random Access Memory

- Dynamic since it needs to be refreshed periodically
- Addresses divided into 2 halves (memory as a 2D matrix):
RAS or Row Access Strobe
CAS or Column Access Strobe
Cache uses SRAM : Static Random Access Memory

- No refresh (6 transistors/bit vs. 1 transistor/bit)
- Size: DRAM/SRAM 4-8
- Cost/Cycle time: SRAM/DRAM 8-16
11
Memory Interfaces
Address i/ps
- Maybe latched with strobe signals
Write Enable (/WE)

- To choose between read / write - To control writing of new data to memory
Chip Select (/CS)

- To choose between memory chips / banks on system
Output Enable (/OE)

- To control o/p buffer in read circuitry
Data i/os
- For large memories data i/p and o/p muxed on same pins,
selected with /WE
Refresh signals
12
Memory - Basic Organization

S0
Word 0
N words
S1
S2
Word 1
Word 2
Single Storage Cell
M bits per word N select lines
1:N decoder
very inefficient design
SN-2
Word N-2
difficult to place and route
SN-1
Word N-1
M bit output word
13
Memory - Real Array of N x K words Organization

------------- columns ------------ KxM
S0
C of M bit words C of M bit words
row 0 ------------- rows R-----------row 1 row 2
Log2R Address Lines
Row Decoder
C of M bit words
SR-1
C of M bit words
- - - - KxM bits - - - -
row N-2 row N-1
Log2C Address Lines
C of M bit words
Column Select
N=R*C
M bit data word

14
Array-Structured Memory Architecture

Problem: ASPECT RATIO or HEIGHT >> WIDTH
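[Figure: array-structured memory. The row decoder (address bits A_K ... A_(L-1)) drives 2^(L-K) word lines; each storage cell sits at a word line / bit line crossing; M.2^K bit lines feed the sense amplifiers / drivers, which amplify the swing to rail-to-rail amplitude; the column decoder (address bits A_0 ... A_(K-1)) selects the appropriate word for the M-bit input-output]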
Hierarchical Memory Architecture

Row Address Column Address Block Address
Global Data Bus, Control Circuitry, Block Selector, Global Amplifier/Driver, I/O
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
16
Memory - Organization and Cell Design Issues

Aspect ratio (height : width) should be relatively square
- Row / Column organisation (matrix)
- R = log2(N_rows); C = log2(N_columns)
- R + C = N (N_address_bits)
Number of rows should be a power of 2
- number of bits in a row
Sense amplifiers to amplify the voltage from each memory cell
1 -> 2^R row decoder
1 -> 2^C column decoder
- implement M of the column decoders (M bits, one per bit)
- M = output word width
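The row / column split can be sketched in a few lines of Python. This is a minimal illustration, assuming the upper R address bits select the row and the lower C bits select the column (the slide does not say which bits go where); `split_address` and `array_shape` are hypothetical helper names.

```python
def split_address(addr, r_bits, c_bits):
    """Split an (r_bits + c_bits)-bit address into (row, column) indices."""
    if not 0 <= addr < (1 << (r_bits + c_bits)):
        raise ValueError("address out of range")
    row = addr >> c_bits               # upper R bits feed the 1 -> 2^R row decoder
    col = addr & ((1 << c_bits) - 1)   # lower C bits feed the 1 -> 2^C column decoder
    return row, col


def array_shape(r_bits, c_bits, m):
    """Physical array size: 2^R rows by (2^C words per row * M bits per word) columns."""
    return 1 << r_bits, (1 << c_bits) * m


print(split_address(0b101, r_bits=2, c_bits=1))
print(array_shape(r_bits=2, c_bits=1, m=2))
```

With R = 2, C = 1 and M = 2 the array comes out as 4 x 4 cells, the same parameters used in the "Simple 4x4 SRAM Memory" slide later in the deck.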
17
Semiconductor Manufacturing Process
18
Basic Micro Technology
19
Semiconductor Manufacturing Process

Fundamental Processing Steps 1.Silicon Manufacturing
a) Czochralski method. b) Wafer Manufacturing c) Crystal structure
2.Photolithography
a) Photoresists b) Photomask and Reticles c) Patterning
20
Lithography Requirements
21
Excimer Laser DUV & EUV lithography
Power o/p
Pulse Rate
NovaLine Laser Lambda Physik

22
Dry or Plasma Etching
23
24

Combination of chemical and physical etching Reactive Ion Etching (RIE)
Directional etching due to ion assistance. In RIE processes the wafers sit on the powered electrode. This placement sets up a negative bias on the wafer which accelerates positively charged ions toward the surface. These ions enhance the chemical etching mechanisms and allow anisotropic etching. Wet etches are simpler, but dry etches provide better line width control since they are anisotropic.
25
Dry Etching Reactive Ion Etching- RIE
26
CMOS fabrication sequence

4.2 Local oxidation of silicon (LOCOS)
The photoresist mask is removed
The SiO2/SiN layers will now act as masks
The thick field oxide is then grown by exposing the surface of the wafer to a flow of oxygen-rich gas
The oxide grows in both the vertical and lateral directions
This results in an active area smaller than patterned
patterned active area
Field oxide (FOX)
n-well
active area after LOCOS
p-type
Paulo Moreira
Technology
27
LOCOS: Local Oxidation
28
Advanced CMOS processes

Shallow trench isolation
n+ and p+-doped polysilicon gates (low threshold)
Source-drain extensions, LDD (hot-electron effects)
Self-aligned silicide (spacers)
Non-uniform channel doping (short-channel effects)
Silicide n+ poly Oxide spacer p+ poly
n+
p-doping
n+
p+
n-doping
p+ n-well
Shallow-trench isolation p-type substrate Source-drain extension
Paulo Moreira
Technology
29
Process enhancements
Up to eight metal levels in modern processes
Copper for metal levels 2 and higher
Stacked contacts and vias
Chemical Mechanical Polishing (CMP) for technologies with several metal levels
For analog applications some processes offer:
- capacitors - resistors - bipolar transistors (BiCMOS)
Paulo Moreira
Technology
30
Metalisation
Metal deposited first, followed by photoresist Then metal etched away to leave pattern, gaps filled with SiO2
31
Electroplating Based Damascene Process Sequence
Pre-clean
25 nm
IMP barrier + Copper

10-20 nm
Electroplating
+ 100-200 nm
CMP
Simple, Low-cost, Hybrid, Robust Fill Solution

32
33
34
Example CMOS SRAM Process

0.7u n-channel min gate length, 0.6u Leff
1.0u FOX isolation using SiN/SiO2 masking
0.25u N+ to P+ spacing
Thin epi material to suppress latchup
Twin well to suppress parasitic channel through field transistors
LDD structure for n & p transistors to suppress hot carrier effects
Buried contacts to overlying metal or underlying gates
Metal salicide to reduce poly resistivity
2 metals to reduce die area
Planarisation after all major process steps
- To reduce step coverage problems on contact cut fills, large oxide depositions
35
SRAM Application Areas

Main memory in high performance small system
Main memory in low power consumption system
Simpler and less expensive system if without a cache
Battery back-up
Battery operated system
36
SRAM Performance vs Application Families
37
Typical Application Scenarios
SRAM MMU BIU ALU CORE L1 16KB L2 256KB
DRAM 64MB I/O PCI ISA
FPU
i586 based PC Hand phone and Cache
38
Market View by Application
39
Overview of SRAM Types

SRAMs
Asynchronous Low Speed Medium Speed High Speed
Synchronous Flow Through / Pipelined Zero Bus Turnaround Double Data Rate Dual Port Interleaved / Linear Burst
Special CAM / Cache Tag FIFO Multiport
40
SRAM Array
SL0
Array Organization
common bit precharge lines
need sense amplifier
SL1
SL2
41
Logic Diagram of a Typical SRAM

A0-AN
CS! WE_L OE_L 2^N words x M bit SRAM
M
Write Enable is usually active low (WE_L)
Din and Dout are combined to save pins:
- A new control signal, output enable (OE_L), is needed
- WE_L = 0, OE_L = 1
D serves as the data input pin
- WE_L = 1, OE_L = 0
D is the data output pin
- Both asserted: WE_L = 0, OE_L = 0

Result is unknown. Don't do that!!!
42
Simple 4x4 SRAM Memory

2 bit width: M = 2
R = 2 (A1, A2): N_rows = 2^R = 4
C = 1 (A0): N_columns = 2^C x M = 4
N = R + C = 3
Array size = N_rows x N_columns = 16
[Figure: 4x4 SRAM array with read precharge enable, bit line precharge on BL / !BL, word lines WL[0]-WL[3], a column decoder driven by A0 / A0!, sense amplifiers, write circuitry, and clocking and control from WE!, OE!]
43
Basic Memory Read Cycle

System selects memory with /CS=L
System presents correct address (A0-AN)
System turns o/p buffers on with /OE=L
System tri-states previous data sources within a permissible time limit (tOLZ or tCLZ)
System must wait minimum time of tAA, tAC or tOE to get correct data
44
Basic Memory Write Cycle

System presents correct address (A0-AN)
System selects memory with /CS=L
System waits a minimum time equal to internal setup time of new addresses (tAS)
System enables writing with /WE=L
System waits for minimum time to disable o/p driver (tWZ)
System inputs data and waits minimum time (tDW) for data to be written in core, then turns off write (/WE=H)
45
Memory Timing: Definitions

Read Cycle READ Read Access WRITE Write Access Data Valid DATA Data Written Read Access Write Cycle
46
Memory Timing: Approaches

MSB LSB
Address Bus RAS
Row Address Column Address Address Bus
Address Address transition initiates memory operation
CAS
RAS-CAS timing
DRAM Timing Multiplexed Adressing
SRAM Timing Self-timed

47
The system level view of Async SRAMs
48
The system level view of synch SRAMs
49
Typical Async SRAM Timing

A
N
WE_L OE_L Write Timing:
2 N words x M bit SRAM

M
Read Timing: High Z Junk Data Out Read Address Data Out Read Address
D A OE_L WE_L
Data In Write Address
Write Hold Time Write Setup Time
Read Access Time
Read Access Time
50
SRAM Read Timing (typical)

tAA (access time for address): how long it takes to get stable output after a change in address.
tACS (access time for chip select): how long it takes to get stable output after CS is asserted.
tOE (output-enable time): how long it takes for the three-state output buffers to leave the high-impedance state when OE and CS are both asserted.
tOZ (output-disable time): how long it takes for the three-state output buffers to enter the high-impedance state after OE or CS is negated.
tOH (output-hold time): how long the output data remains valid after a change to the address inputs.
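One way to read these definitions together: the output is valid only once every relevant access time has elapsed from its own triggering event, so the latest of the three deadlines wins. The sketch below is a hedged illustration of that rule; `data_valid_time` is a hypothetical helper, and the 70 ns / 35 ns figures are made-up example values, not taken from any datasheet.

```python
def data_valid_time(t_addr, t_cs, t_oe, tAA, tACS, tOE):
    """Earliest time (ns) at which read data is valid.

    t_addr, t_cs, t_oe: times at which the address becomes stable,
    CS is asserted, and OE is asserted.
    tAA, tACS, tOE: the corresponding access times of the part.
    """
    return max(t_addr + tAA, t_cs + tACS, t_oe + tOE)


# All three events at t = 0: the slowest path (tAA or tACS) dominates.
print(data_valid_time(0, 0, 0, tAA=70, tACS=70, tOE=35))
# OE asserted late, at t = 50: now OE is the limiting path.
print(data_valid_time(0, 10, 50, tAA=70, tACS=70, tOE=35))
```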
51
SRAM Read Timing (typical)

ADDR
stable
stable tAA
stable Max(tAA, tACS)
CS_L
tACS
OE_L
tOH
tAA
DOUT
tOZ
tOE
tOZ
tOE
valid
WE_L = HIGH
valid
valid
52
SRAM Architecture and Read Timings

tOH
tAA
tACS tOZ tOE

53
SRAM write cycle timing
/WE controlled
/CS controlled
54
SRAM Architecture and Write Timings
Setup time = tDW
tDH
Write driver
tWP-tDW
55
SRAM Architecture
56
SRAM Cell Design

Memory array typically needs to store lots of bits
- Need to optimize cell design for area and performance - Peripheral circuits can be complex
Smaller compared to the array (60-70% area in array, 30-40% in periphery)
Memory cell design

- 6T cell full CMOS - 4T cell with high resistance poly load - TFT load cell
57
Anatomy of the SRAM Cell
Write:
- set bit lines to new data value (complementary bit line = opposite of b)
- raise word line to high
- sets cell to new state (may need to flip old state)
Read:
- set bit lines high
- set word line high
- see which bit line goes low
58
SRAM Cell Operating Principle
Inverter
- Amplifies
- Negative gain
- Slope < -1 in middle
- Saturates at ends
Inverter Pair
- Amplifies
- Positive gain
- Slope > 1 in middle
- Saturates at ends
59
Bistable Element
Stability
- Requires Vin = V2
- Stable at endpoints: recovers from perturbation
- Metastable in middle: falls out when perturbed
Ball on Ramp Analogy

60
SRAM Cell technologies

Bipolar ECL : NPN with dual emitter NMOS load A) Enhancement : additional load gate bias B) Depletion : no additional load gate bias
High Load Resistance (4T)
Full CMOS (6T)
Thin Film Transistors
61
6T & 4T cell Implementation
6T Bistable Latch
High resistance poly
4T Bistable Latch
62
Reading a Cell
Icell
$\Delta V = \frac{I_{cell}\, t}{C_b}$
Sense Amplifier
63
Writing a Cell
0 -> 1
1 -> 0
64
65
Cell Static Noise Margin

Cell state may be disturbed by
DC
Layout pattern offset Process mismatches
non-uniformity of implantation gate pattern size errors
AC
Alpha particles Crosstalk Voltage supply ripple Thermal noise
SNM = Maximum Value of Vn Without flipping cell state
66
SNM: Butterfly Curves

1
SNM
2
SNM
1
67
SNM for Poly Load Cell
68
6T Cell LayoutN Well

BB+
Connection
VDD
PMOS Pull Up
Q/
Q
NMOS Pull Down
GND SEL
SEL MOSFET Substrate Connection
69
6T SRAM Array Layout
70
Another 6T Cell Layout Stick Diagram

bit T bit T VDD
T T Gnd GND and contact shared with cell to left T
T word
These four contacts shared with (mirrored) cell below
2 Metal Layer Process

71
6T Array Layout (2x2) Stick Diagram

Gnd VDD bit bit bit Gnd bit VDD
word word
VDD
72
6T Cell Full Layout

Transistor sizing
- M2 (pMOS) 4:3 - M1 (nMOS) 6:2 - M3 (nMOS) 4:2
M2
All boundaries shared 38l H x 28l W Reduced cap on bit lines

M1 M3
73
6T Cell Example Layout & Abutment

Vdd T3 T4 Vdd T4 T2 T1 T5 T5 T6 T6 T3
Vdd Vss Vdd T3

T4 T2 T5 T5
T1 T1 Vss Vss
Vss
Vdd T6 Vss
T2 Vss Vss T6
T2
T5
T6 B
B B
4 x 4 array 2 abutment 2x
74
Vdd T4
T3
T1 T1
T3
Vss
T2 Vdd T4
Vss Vdd
6T and 4T Cell Layouts

R1
GND
T4
T5
Vdd
VDD
BIT
Q
T6
R2 BIT!
T3
Q
T1
T4
T2 T3
Word Line
T2
GND WL
T1 BL
BL
75
6T - 4T Cell Comparison
6T cell
- Merits
Faster
Better noise immunity
Low standby current
- Demerits
Large size due to 6 transistors
4T cell
- Merits
Smaller cell, only 4 transistors
HR poly stacked above transistors
- Demerits
Additional process step due to HR poly
Poor noise immunity
Large standby current
Thermal instability
76
Transistor Level View of Core
Precharge
Column Decode
Sense Amp
77
Row Decode
SRAM, Putting it all together
2^n rows, 2^m * k columns
n + m address lines, k bits data width

78
Hierarchical Array Architecture

Subblocks 1 / output bit
Row Address Column Address Block Address
Select 1 column / subblock
Global Data Bus, Control Circuitry, Block Selector, Global Amplifier/Driver, I/O
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
1 sense amp / subblock
79
Standalone SRAM Floorplan Example
80
Divided bit-line structure
81
SRAM Partitioning Partitioned Bitline
82
SRAM Partitioning Divided Wordline Arch
83
Partitioning summary
Partitioning involves a trade-off between area, power and speed
For high speed designs, use short blocks (e.g. 64 rows x 128 columns)
- Keep local bitline heights small
For low power designs, use tall narrow blocks (e.g. 256 rows x 64 columns)
- Keep the number of columns the same as the access width to minimize wasted power
84
Redundancy
Redundant rows Fuse : Bank Redundant columns Memory Array Row Address
Row Decoder
Column Address
85
Column Decoder
Periphery
Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry

86
Asynchronous & Synchronous SRAMs
87
Address Transition Detection Provides Clock for Asynch RAMs

VDD
A0
DELAY td
ATD
ATD
A1
DELAY td
... AN-1 DELAY td
88
Row Decoders
Collection of 2^R complex logic gates organized in a regular, dense fashion

(N)AND decoder 9->512
WL(0) = !A8!A7!A6!A5!A4!A3!A2!A1!A0
WL(511) = A8A7A6A5A4A3A2A1A0

NOR decoder 9->512
WL(0) = !(A8+A7+A6+A5+A4+A3+A2+A1+A0)
WL(511) = !(!A8+!A7+!A6+!A5+!A4+!A3+!A2+!A1+!A0)
89
A NAND decoder using 2-input pre-decoders
[Figure: NAND decoder built from 2-input predecoders on (A0, A1) and (A2, A3); the predecoded lines drive the word lines WL0, WL1, ...]
Splitting decoder into two or more logic layers produces a faster and cheaper implementation
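The two-layer idea can be checked with a small behavioural model. The sketch below is only a logic-level illustration in Python, not a circuit: it assumes a 4-bit address split into two 2-bit groups, and the names `predecode` and `decode_4_to_16` are hypothetical. Each group is predecoded into four one-hot lines, and every word line is then a 2-input AND of one line from each group instead of a single 4-input gate.

```python
from itertools import product


def predecode(a_lo, a_hi):
    """2-input predecoder: four one-hot lines for the bit pair (a_hi, a_lo)."""
    index = (a_hi << 1) | a_lo
    return [int(i == index) for i in range(4)]


def decode_4_to_16(a3, a2, a1, a0):
    """4-to-16 decoder built from two predecoders plus 2-input ANDs."""
    low = predecode(a0, a1)    # one-hot lines for the pair A1 A0
    high = predecode(a2, a3)   # one-hot lines for the pair A3 A2
    # Word line w combines one predecoded line from each group.
    return [high[w >> 2] & low[w & 3] for w in range(16)]


# Exactly one word line is active for every address.
for a3, a2, a1, a0 in product((0, 1), repeat=4):
    lines = decode_4_to_16(a3, a2, a1, a0)
    assert sum(lines) == 1
```

In this model the 16 word-line gates each have 2 inputs rather than 4, and the 8 predecoded lines are shared by all of them, which is where the area and speed benefit comes from.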
90
Row Decoders (contd)

A0/ A1/ A0 A1/ A0/ A1 A0 A1 A2/ A3/ A2 A3/ A2/ A3 A2 A3
R0/ R1/ R2/
and so forth
A0
A1
A2
A3
91
Dynamic Decoders
Precharge devices GND GND VDD WL 3 VDD WL 2
WL 3
WL 2 WL 1 WL 0 VD D
VDD
VDD
WL 1
WL 0 A0 A0 A1 A1
A0
A0
A1
A1
Dynamic 2-to-4 NOR decoder
2-to-4 MOS dynamic NAND Decoder
Propagation delay is primary concern

92
Dynamic NOR Row Decoder

Vdd
WL0 WL1 WL2 WL3 A0 Precharge/ !A0 A1 !A1
93
Dynamic NAND Row Decoder

WL0 WL1 WL2 WL3
!A0
A0
!A1
A1
Precharge/
Back
94
Decoders
n:2^n decoder consists of 2^n n-input AND gates
- One needed for each row of memory
- Build AND from NAND or NOR gates
Make devices on address line minimal size
Scale devices on decoder O/P to drive word lines
[Figure: 2:4 decoders in static CMOS and pseudo-nMOS, inputs A1 and A0, outputs word0-word3, annotated with relative transistor widths]
95
Decoder Layout
Decoders must be pitch-matched to SRAM cell
- Requires very skinny gates
A3 VDD A3 A2 A2 A1 A1 A0 A0
word
GND NAND gate buffer inverter
96
Large Decoders
For n > 4, NAND gates become slow
- Break large gates into multiple smaller gates
A3 A2 A1 A0
word0
word1
word2
word3
word15
97
Predecoding
- Group address bits in predecoder - Saves area - Same path effort
A3 A2 A1
A0
predecoders 1 of 4 hot predecoded lines word0 word1
word2 word3
word15
98
Column Circuitry
Some circuitry is required for each column
- Bitline conditioning - Sense amplifiers - Column multiplexing
Need hazard-free reading & writing of RAM cell
Column decoder drives a MUX; the two are often merged
99
Typical Column Access
100
Pass Transistor Based Column Decoder

BL3 !BL3 BL2 !BL2 BL1 !BL1 BL0 !BL0
A1
2 input NOR decoder
S3 S2 S1
A0
S0
Data !Data
Advantage: speed, since there is only one extra transistor in the signal path
Disadvantage: large transistor count
101
Tree Decoder Mux

Column MUX can use pass transistors
- Use nMOS only, precharge outputs
One design is to use k series transistors for a 2^k:1 mux

- No external decoder logic needed
B0 B1 A0 A0 A1 A1 A2 A2 Y Y
102
B2 B3
B4 B5
B6 B7
B0 B1
B2 B3
B4 B5
B6 B7
to sense amps and write circuits
Bitline Conditioning
Precharge bitlines high before reads
bit bit_b
Equalize bitlines to minimize voltage difference when using sense amplifiers

bit bit_b
103
Bit Line Precharging

Static Pull-up Precharge Clocked Precharge
clock
BL
!BL
BL
!BL
equalization transistor - speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line
104
Sense Amplifier: Why?

Bit line cap significant for large array
- If each cell contributes 2fF, for 256 cells, 512fF plus wire cap
Cell pull-down transistor resistance
- Pull-down resistance is about 15K
- RC = 7.5ns! (assuming ΔV = Vdd)
$t = \frac{R\,C\,\Delta V}{V_{dd}}$
Cannot easily change R, C, or Vdd, but can change ΔV, i.e. the smallest sensed voltage
- Can reliably sense ΔV as small as <50mV
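The numbers on this slide can be reproduced with a short calculation. The sketch below uses the slide's values (2 fF per cell, 256 cells, 15K pull-down, wire capacitance ignored) and the linear estimate t = R * C * ΔV / Vdd; the supply of 2.5 V and the helper names `bitline_rc` and `sense_time` are assumptions made for illustration.

```python
def bitline_rc(n_cells, c_cell_farads, r_pulldown_ohms):
    """RC time constant of a bit line discharged through one cell (wire cap ignored)."""
    return r_pulldown_ohms * n_cells * c_cell_farads


def sense_time(rc_seconds, delta_v, vdd):
    """Linear estimate t = RC * dV / Vdd for the bit line to move by delta_v."""
    return rc_seconds * delta_v / vdd


rc = bitline_rc(256, 2e-15, 15e3)      # about 7.7 ns; the slide rounds this to 7.5 ns
t_small = sense_time(rc, 0.05, 2.5)    # assumed Vdd = 2.5 V, sensed swing of 50 mV
print(f"RC = {rc * 1e9:.2f} ns, t(50 mV) = {t_small * 1e12:.1f} ps")
```

Under these assumptions, sensing a 50 mV swing instead of waiting for a full-rail swing shortens the wait by a factor of Vdd / ΔV = 50, which is the motivation for the sense amplifier.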
105
Sense Amplifiers
$t_p = \frac{C_b\,\Delta V}{I_{cell}}$
Cb is large and Icell is small, so make ΔV as small as possible
Idea: Use Sense Amplifier
[Figure: sense amplifier (s.a.) converting a small input transition into a full-swing output]
106
Differential Sensing - SRAM

V DD V DD M4 M2 M5 x x SE y x
V DD
PC
VDD
x
y M3 M1 SE BL
BL
EQ
WL i (b) Doubled-ended Current Mirror Amplifier SRAM cell i Diff. Sense x x Amp y y D D (a) SRAM sensing scheme. y x SE (c) Cross-Coupled Amplifier
107
V DD y x
Latch-Based Sense Amplifier

EQ BL VDD SE BL
SE
Initialized in its meta-stable point with EQ
Once an adequate voltage gap is created, sense amp enabled with SE
Positive feedback quickly forces output to a stable operating point.
108
Sense Amplifier
bit word bit
sense clk
isolation transistor
regenerative amplifier
109
Sense Amp Waveforms

1ns / div
bit
200mV
bit
wordline
wordline
begin precharging bit lines

BIT BIT
2.5V
sense clk
sense clk
110
Write Driver Circuits
111
Twisted Bitlines
Sense amplifiers also amplify noise
- Coupling noise is severe in modern processes
- Try to couple equally onto bit and bit_b
- Done by twisting bitlines
b0 b0_b b1 b1_b b2 b2_b b3 b3_b
112
Transposed-Bitline Architecture
BL BL BL BL" (a) Straightforward bitline routing. BL BL BL BL" (b) Transposed bitline architecture.
113
Ccross SA
Ccross SA
114
DRAM in a nutshell
Based on capacitive (non-regenerative) storage
Highest density (Gb/cm2)
Large external memory (Gb) or embedded DRAM for image, graphics, multimedia
Needs periodic refresh -> overhead, slower
115
116
Classical DRAM Organization (square) bit (data) lines

r o w d e c o d e r RAM Cell Array
Each intersection represents a 1-T DRAM Cell
word (row) select
row address
Column Selector & I/O Circuits
Column Address
data
Row and Column Address together:

- Select 1 bit a time
117
DRAM logical organization (4 Mbit)
118
DRAM physical organization (4 Mbit,x16)
119
Memory Systems
address
n
DRAM Controller n/2 Memory Timing Controller DRAM 2^n x 1 chip w Bus Drivers
Tc = Tcycle + Tcontroller + Tdriver
120
Logic Diagram of a Typical DRAM

RAS_L CAS_L WE_L OE_L
A
9
256K x 8 DRAM
Control Signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
Din and Dout are combined (D):
- WE_L is asserted (Low), OE_L is deasserted (High)
D serves as the data input pin
- WE_L is deasserted (High), OE_L is asserted (Low)
D is the data output pin
Row and column addresses share the same pins (A)
- RAS_L goes low: Pins A are latched in as row address
- CAS_L goes low: Pins A are latched in as column address
- RAS/CAS edge-sensitive
121
DRAM Operations
Write
- Charge bitline HIGH or LOW and set wordline HIGH
Read
- Bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH.
- Depending on the charge in the cap, the precharged bitline is pulled slightly higher or lower.
- Sense Amp detects change
Word Line
. . .
Bit Line
Explains why Cap can't shrink

- Need to sufficiently drive bitline - Increase density => increase parasitic capacitance
Sense Amp
122
DRAM Access
1M DRAM = 1024 x 1024 array of bits
10 row address bits arrive first Row Access Strobe (RAS)
1024 bits are read out 10 column address bits arrive next Column Access Strobe (CAS) Column decoder
Subset of bits returned to CPU
123
DRAM Read Timing

Every DRAM access begins at:
- The assertion of the RAS_L - 2 ways to read: early or late v. CAS
DRAM Read Cycle Time
RAS_L CAS_L A
9
RAS_L
CAS_L
WE_L
OE_L
256K x 8 DRAM
Row Address
Col Address
Junk
Row Address
Col Address
Junk
WE_L OE_L
High Z
Junk
Read Access Time
Data Out
High Z
Output Enable Delay
Data Out
Early Read Cycle: OE_L asserted before CAS_L
Late Read Cycle: OE_L asserted after CAS_L

124
DRAM Write Timing

Every DRAM access begins at:
- The assertion of the RAS_L - 2 ways to write: early or late v. CAS
DRAM WR Cycle Time
RAS_L CAS_L
RAS_L
CAS_L
WE_L
OE_L
A
9
256K x 8 DRAM
Row Address
Col Address
Junk
Row Address
Col Address
Junk
OE_L WE_L
Junk
Data In
WR Access Time
Junk
Data In
WR Access Time
Junk
Early Wr Cycle: WE_L asserted before CAS_L
Late Wr Cycle: WE_L asserted after CAS_L

125
DRAM Performance
A 60 ns (tRAC) DRAM can
- perform a row access only every 110 ns (tRC) - perform column access (tCAC) in 15 ns, but time between column accesses is at least 35 ns (tPC).
In practice, external address delays and turning around buses make it 40 to 50 ns
These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead.
- Drive parallel DRAMs, external memory controller, bus to turn around, SIMM module, pins - 180 ns to 250 ns latency from processor to memory is good for a 60 ns (tRAC) DRAM
126
1-Transistor Memory Cell (DRAM)

Write:
- 1. Drive bit line
- 2. Select row
Read:
- 1. Precharge bit line
- 2. Select row
- 3. Cell and bit line share charges
Very small voltage changes on the bit line
- 4. Sense (fancy sense amp)
Can detect changes of ~1 million electrons
- 5. Write: restore the value
Refresh
- 1. Just do a dummy read to every cell.
127
DRAM architecture
128
Cell read: refresh is the art
$\Delta V = V'_{BL} - V_{BL} = (V_{SN} - V_{BL})\,\frac{C_s}{C_s + C_b}$
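The read signal follows from charge sharing between the storage capacitor Cs (at the storage-node voltage V_SN) and the bit line capacitance Cb (precharged to V_BL): ΔV = (V_SN - V_BL) * Cs / (Cs + Cb). The sketch below evaluates it for illustrative values. Cs = 30 fF lies in the 10..30 fF range quoted elsewhere in the deck, while Cb = 300 fF and Vdd = 2.5 V are assumptions; the function names are hypothetical.

```python
def bitline_swing(v_sn, v_bl, c_s, c_b):
    """Charge-sharing read signal: dV = (V_SN - V_BL) * Cs / (Cs + Cb)."""
    return (v_sn - v_bl) * c_s / (c_s + c_b)


def shared_voltage(v_sn, v_bl, c_s, c_b):
    """Final bit line voltage from charge conservation on Cs and Cb."""
    return (c_s * v_sn + c_b * v_bl) / (c_s + c_b)


VDD = 2.5                    # assumed supply voltage
CS, CB = 30e-15, 300e-15     # Cs from the deck's range, Cb assumed

dv_one = bitline_swing(VDD, VDD / 2, CS, CB)    # cell stores a "1"
dv_zero = bitline_swing(0.0, VDD / 2, CS, CB)   # cell stores a "0"
print(f"dV(1) = {dv_one * 1e3:.1f} mV, dV(0) = {dv_zero * 1e3:.1f} mV")
```

With a Vdd/2 precharge the two swings are equal and opposite, roughly a tenth of a volt here, and the read leaves the cell at the shared voltage, which is why every read must be followed by a restore.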
129
Sense Amplifier
130
131
DRAM technological requirements

Unlike SRAM: large Cb must be charged by small sense FF. This is slow.
- Make Cb small: backbias junction cap., limit blocksize
- Backbias generator required. Triple well.
Prevent threshold loss in WL pass: VG > Vccs + VTn
- Requires another voltage generator on chip
Requires VTn(WL) > VTn(logic) and thus thicker oxide than logic
- Better dynamic data retention as there is less subthreshold loss.
- DRAM process unlike logic process!
Must create large Cs (10..30fF) in smallest possible area
- (-> 2 poly -> trench cap -> stacked cap)
132
Refreshing Overhead
Leakage:
- junction leakage exponential with temp!
- 25 msec @ 80 °C
- Decreases noise margin, destroys info
All columns in a selected row are refreshed when read
- Count through all row addresses once per 3 msec. (no write possible then)
Overhead @ 10nsec read time for 8192*8192=64Mb:
- 8192*1e-8/3e-3 = 2.7%
Requires additional refresh counter and I/O control
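The overhead figure is simply the fraction of each refresh period spent reading rows. A minimal sketch of that arithmetic, using the slide's numbers (8192 rows, 10 ns per row read, one full pass every 3 ms); `refresh_overhead` is a hypothetical helper name.

```python
def refresh_overhead(n_rows, t_row_read_s, t_refresh_period_s):
    """Fraction of time the array is busy refreshing instead of serving accesses."""
    return n_rows * t_row_read_s / t_refresh_period_s


overhead = refresh_overhead(8192, 10e-9, 3e-3)
print(f"refresh overhead = {overhead * 100:.1f}%")
```

The same function shows how the overhead scales: doubling the number of rows or halving the refresh period doubles the time lost to refresh.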
133
Dummy cells
Bitline
Bitline
Vdd/2
Vdd/2 precharge precharge
Wordline
134
Alternative Sensing Strategy Decreasing Cdummy

Convert to differential sense
- Create a reference in an identical structure Data Col BL
Needs
- A method of generating signal swing of bit line
Operation:
- Dummy cell is C - active wordline and dummy wordline on opposite sides of sense amp. - Amplify difference
Dummy Col BL BL
BL Dummy Col BL
$\Delta V_{\text{"1"/"0"}} = \pm \frac{V_{dd}}{2} \cdot \frac{1}{1 + C_b / C_s}$
So small Cs -> small swing, large Cb -> small swing
Data Col BL
135
Overhead of fabricating C/2
Alternative Sensing Strategy Increasing Cbitline on Dummy side
Double Bitline
Data
SA outputs D and !D pre-charged to VDD through Q1, Q2 (Pr=1)
Dummy
Reference capacitor, Cdummy, connected to a pair of matched bit lines and is at 0V (Pr=0)
Parasitic cap Cp2 on one bit line is ~ 2 Cp1 on the other; this sets up a differential voltage LHS vs. RHS due to rise time difference
SA outputs (D, !D) become charged, with a small difference LHS vs. RHS
Regenerative action of latch
136
DRAM Memory Systems

address
n
DRAM Controller n/2 Memory Timing Controller DRAM 2^n x 1 chip w Bus Drivers
Tc = Tcycle + Tcontroller + Tdriver
137
DRAM Performance
Cycle Time Access Time Time
DRAM (Read/Write) Cycle Time >> DRAM (Read/Write) Access Time

- 2:1; why?
DRAM (Read/Write) Cycle Time :

- How frequent can you initiate an access?
DRAM (Read/Write) Access Time:

- How quickly will you get what you want once you initiate an access?
DRAM Bandwidth Limitation:

- Limited by Cycle Time
138
Fast Page Mode Operation

Fast Page Mode DRAM
- N x M SRAM to save a row
After a row is read into the register
- Only CAS is needed to access other M-bit blocks on that row
- RAS_L remains asserted while CAS_L is toggled
[Figure: DRAM array (N rows x N cols) with row address input, N x M SRAM row register, column address and M-bit output; in the timing diagram RAS_L stays asserted with the row address while CAS_L toggles with a new column address for the 1st, 2nd, 3rd and 4th M-bit accesses]
139
Page Mode DRAM Bandwidth Example

Page Mode DRAM Example:
- 16 bits x 1M DRAM chips (4 nos) in 64-bit module (8 MB module)
- 60 ns RAS+CAS access time; 25 ns CAS access time
- Latency to first access = 60 ns; latency to subsequent accesses = 25 ns
- 110 ns read/write cycle time; 40 ns page mode access time; 256 words (64 bits each) per page
Bandwidth takes into account 110 ns first cycle, 40 ns for CAS cycles
- Bandwidth for one word = 8 bytes / 110 ns = 69.35 MB/sec
- Bandwidth for two words = 16 bytes / (110+40 ns) = 101.73 MB/sec
- Peak bandwidth = 8 bytes / 40 ns = 190.73 MB/sec
- Maximum sustained bandwidth = (256 words * 8 bytes) / (110 ns + 256*40 ns) = 188.71 MB/sec
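These figures can be re-derived with a few lines of Python. The sketch mirrors the slide's own expressions term by term; it takes 1 MB = 2^20 bytes, which is the convention that reproduces the quoted values, and `bandwidth_mb_per_s` is a hypothetical helper name.

```python
MB = 2 ** 20  # bytes per MB, the convention that matches the quoted figures


def bandwidth_mb_per_s(n_bytes, t_ns):
    """Bandwidth in MB/s for n_bytes transferred in t_ns nanoseconds."""
    return n_bytes / (t_ns * 1e-9) / MB


one_word = bandwidth_mb_per_s(8, 110)                     # single random access
two_words = bandwidth_mb_per_s(16, 110 + 40)              # first cycle + one page-mode cycle
peak = bandwidth_mb_per_s(8, 40)                          # page-mode cycles only
sustained = bandwidth_mb_per_s(256 * 8, 110 + 256 * 40)   # whole page, as written on the slide

for name, value in [("one word", one_word), ("two words", two_words),
                    ("peak", peak), ("sustained", sustained)]:
    print(f"{name}: {value:.2f} MB/s")
```

The sustained figure approaches the peak because the 110 ns row-opening cost is amortised over all 256 words of the page.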
140
4 Transistor Dynamic Memory

Remove the PMOS/resistors from the SRAM memory cell
Value stored on the drain of M1 and M2
But it is held there only by the capacitance on those nodes
Leakage and soft-errors may destroy value
141
142
First 1T DRAM (4K Density)

Texas Instruments TMS4030, introduced 1973
NMOS, 1M1P, TTL I/O
1T Cell, Open Bit Line, Differential Sense Amp
Vdd=12v, Vcc=5v, Vbb=-3/-5v (Vss=0v)
143
16k DRAM (Double Poly Cell)

Mostek MK4116, introduced 1977
Address multiplex
Page mode
NMOS, 2P1M
Vdd=12v, Vcc=5v, Vbb=-5v (Vss=0v)
Vdd-Vt precharge, dynamic sensing
144
64K DRAM
Internal Vbb generator
Boosted Wordline and Active Restore
- eliminate Vt loss for 1
x4 pinout
145
256K DRAM
Folded bitline architecture
- Common mode noise to coupling to B/Ls - Easy Y-access
NMOS 2P1M
- poly 1 plate - poly 2 (polycide) -gate, W/L - metal -B/L
redundancy
146
1M DRAM
Triple poly Planar cell, 3P1M
- poly1 - gate, W/L
- poly2 - plate
- poly3 (polycide) - B/L
- metal - W/L strap
Vdd/2 bitline reference, Vdd/2 cell plate
147
On-chip Voltage Generators

Power supplies
- for logic and memory
precharge voltage
- e.g VDD/2 for DRAM Bitline .
backgate bias
- reduce leakage
WL select overdrive (DRAM)
148
Charge Pump Operating Principle

Vin ~ +Vin
+Vin
Charge Phase
Vin
dV +Vin dV Vo
Discharge Phase
Vin = dV Vin + dV +Vo Vo = 2*Vin + 2*dV ~ 2*Vin
149
Voltage Booster for WL

Cf CL
d Vhi Vhi dV Vcf(0) ~ Vhi VGG=Vhi + VGG ~ Vhi + Vhi CL Cf Vcf ~ Vhi
150
Backgate bias generation
Use charge pump
Backgate bias:
- Increases Vt -> reduces leakage
- Reduces Cj of nMOST when applied to p-well (triple well process!)
- Smaller Cj -> smaller Cb -> larger readout voltage
Vdd / 2 Generation
2v
1v 1.5v 0.5v ~1v 0.5v 1 v
0.5v
1v
Vtn = |Vtp|~0.5v uN = 2 uP
152
4M DRAM
3D stacked or trench cell
CMOS 4P1M
x16 introduced
Self Refresh
Build cell in vertical dimension - shrink area while maintaining 30fF cell capacitance
153
154
Stacked-Capacitor Cells
Poly plate
Hitachi 64Mbit DRAM Cross Section Samsung 64Mbit DRAM Cross Section
155
Evolution of DRAM cell structures
156
Buried Strap Trench Cell
157
Process Flow of BEST Cell DRAM

Array Buried N-Well
Storage Trench Formation
Node Dielectric (6nm TEQ.)
Buried Strap Formation
Shallow Trench Isolation Formation
N- and P-Well Implants
Gate Oxidation (8nm)
Gate Conductor (N+ poly / WSi)
Junction Implants
Insulator Deposition and Planarization
Contact formation
Bitline (Metal 0) formation
Via 1 / Metal 1 formation
Via 2 / Metal 2 formation
Shallow Trench Isolation -> Replaces LOCOS isolation -> saves area by eliminating Bird's Beak
158
BEST cell Dimensions
Deep Trench etch with very high aspect ratio

159
160
161
162
Cell Array and Circuits

1 Transistor 1 Capacitor Cell
- Array Example
Major Circuits
- Sense amplifier - Dynamic Row Decoder - Wordline Driver
Other interesting circuits

Data bus amplifier Voltage Regulator Reference generator Redundancy technique High speed I/O circuits
164
Standard DRAM Array Design Example
165
WL direction (row)
Column predecode
64K cells (256x256) 1M cells = 64Kx16
Global WL decode + drivers
Local WL Decode
166
BL direction (col)
DRAM Array Example (contd)

2048
256x256
64
256
512K Array Nmat=16 ( 256 WL x 2048 SA)

Interleaved SA & Hierarchical Row Decoder/Driver (shared bit lines are not shown)
167
168
169
170
Standard DRAM Design Feature

Heavy dependence on technology
The row circuits are fully different from SRAM.
Almost always analogue circuit design
CAD:
- Spice-like circuits simulator - Fully handcrafted layout
171
