
Semiconductor Memory Design (SRAM & DRAM)

Kaushik Saha Contact: kaushik.saha@st.com, mobile-98110-64398

STMicroelectronics

Understanding the Memory Trade


The memory market is the most volatile, the most cost-competitive, and the most innovative segment of the IC trade.

(Diagram: the memory market is shaped by the interplay of demand, supply, and technical change.)
2

Classification of Memories
RW Memory
- Random Access: SRAM (Static), DRAM (Dynamic)
- Non-Random Access: FIFO (Queue), LIFO (Stack), SR (Shift Register), CAM (Content Addressable)
NVRWM: EPROM, EEPROM, FLASH
ROM: Mask Programmed, PROM (Fuse Programmed)

Feature Comparison Between Memory Types

Memory selection : cost and performance


DRAM, EPROM
- Merit: cheap, high density
- Demerit: low speed, high power

SRAM
- Merit: high speed or low power
- Demerit: expensive, low density

Large memory with cost pressure :


- DRAM

Large memory with very fast speed :


- SRAM or - DRAM main + SRAM cache

Backing up main memory so that no data is lost on power failure:


- SRAM with battery back-up - EEPROM
5

Trends in Storage Technology


Generation

Die size increases by a factor of ~1.5 per generation, combined with a cell-size reduction of ~2.6x per generation

*MB=Mbytes

The Need for Innovation in Memory Industry


The learning rate (viz. the constant b) is highest for the memory industry
- Because prices drop most steeply among all ICs, due to the nature of demand + supply
- Yet margins must be maintained

Techniques must be applied to reduce production cost. Often, memories are the launch vehicles for a technology node
- This leads to the volatile nature of prices
7

Memory Hierarchy of a Modern Computer System


By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology. - Provide access at the speed offered by the fastest technology.
(Diagram: processor (registers, datapath, control) -> on-chip cache -> second-level cache (SRAM) -> main memory (DRAM) -> secondary storage (disk) -> tertiary storage (tape). Speed ranges from ~1 ns at the registers to tens of seconds at tape; size ranges from hundreds of bytes to terabytes.)
8

How is the hierarchy managed?


Registers <-> Memory
- by compiler (programmer?)

cache <-> memory


- by the hardware

memory <-> disks


- by the hardware and operating system (virtual memory) - by the programmer (files)

Memory Hierarchy Technology


Random Access:
- "Random" is good: access time is the same for all locations
- DRAM (Dynamic Random Access Memory): high density, low power, cheap, slow; dynamic, so it needs to be refreshed regularly
- SRAM (Static Random Access Memory): low density, high power, expensive, fast; static, so the content lasts until power is lost

Not-so-random Access Technology:
- Access time varies from location to location and from time to time
- Examples: disk, CD-ROM

10

Main Memory Background


Performance of Main Memory:
- Latency: cache miss penalty
  - Access time: time between the request and the word arriving
  - Cycle time: minimum time between requests
- Bandwidth: I/O & large-block miss penalty (L2)

Main memory is DRAM: Dynamic Random Access Memory
- Dynamic, since it needs to be refreshed periodically
- Addresses are divided into 2 halves (memory as a 2-D matrix): RAS (Row Access Strobe) and CAS (Column Access Strobe)

Cache uses SRAM: Static Random Access Memory
- No refresh (6 transistors/bit vs. 1 transistor)
- Size: DRAM/SRAM 4-8; cost / cycle time: SRAM/DRAM 8-16
11

Memory Interfaces
Address i/ps
- May be latched with strobe signals

Write Enable (/WE)


- To choose between read / write - To control writing of new data to memory

Chip Select (/CS)


- To choose between memory chips / banks on system

Output Enable (/OE)


- To control o/p buffer in read circuitry

Data i/os
- For large memories data i/p and o/p muxed on same pins,
selected with /WE

Refresh signals
12

Memory - Basic Organization


(Diagram: a flat memory of N words, M bits per word. A 1-of-N decoder drives N select lines S0 .. SN-1, one per word; each word is a row of single storage cells driving the M-bit output word.)
- N select lines make this a very inefficient design
- Difficult to place and route

13

Memory - Real Array of N x K words Organization


(Diagram: the N words are arranged as R rows x C columns of M-bit words, i.e. K x M bits per row. log2(R) address lines drive the row decoder, which selects one row; log2(C) address lines drive the column select, which picks one M-bit data word from that row. N = R * C.)


14

Array-Structured Memory Architecture


Problem: ASPECT RATIO - for a flat array, HEIGHT >> WIDTH

(Diagram: address bits AK .. AL-1 drive the row decoder, which asserts one of 2^(L-K) word lines; each row holds M·2^K storage cells on the bit lines. Sense amplifiers/drivers amplify the bit-line swing to rail-to-rail amplitude, and the column decoder, driven by A0 .. AK-1, selects the appropriate word for the M-bit input/output.)

15

Hierarchical Memory Architecture


(Diagram: row, column and block addresses; a global data bus, control circuitry, block selector and global amplifier/driver at the I/O.)
Advantages: 1. shorter wires within blocks; 2. the block address activates only 1 block => power savings
16

Memory - Organization and Cell Design Issues


- Aspect ratio (height : width) should be roughly square -> row/column (matrix) organisation
- R = log2(N_rows), C = log2(N_columns), R + C = N (N_address_bits)
- The number of rows should be a power of 2
- Number of bits in a row: sense amplifiers amplify the small voltage from each memory cell
- 1-of-2^R row decoder and 1-of-2^C column decoder
- Implement M copies of the column decoder (one per bit), where M = output word width
A sketch of this row/column split is given after this slide.
17
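A minimal, hypothetical sketch (Python; the function name and the 64K x 8 example are assumptions, not from the slides) of splitting the address bits so the physical array stays roughly square:

```python
# Hypothetical sketch: split N address bits into R row bits and C column bits
# so that the array (2**R rows x 2**C * M columns of cells) is roughly square.
import math

def organise(n_words: int, m_bits: int):
    n_addr = int(math.log2(n_words))      # N = R + C address bits in total
    best = None
    for r in range(1, n_addr):
        c = n_addr - r
        rows, cols = 2 ** r, (2 ** c) * m_bits
        aspect = max(rows, cols) / min(rows, cols)
        if best is None or aspect < best[0]:
            best = (aspect, r, c, rows, cols)
    return best

aspect, R, C, rows, cols = organise(n_words=64 * 1024, m_bits=8)
print(f"R={R} row bits, C={C} column bits -> {rows} rows x {cols} columns")
```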

Semiconductor Manufacturing Process

18

Basic Micro Technology

19

Semiconductor Manufacturing Process


Fundamental Processing Steps 1.Silicon Manufacturing
a) Czochralski method. b) Wafer Manufacturing c) Crystal structure

2.Photolithography
a) Photoresists b) Photomask and Reticles c) Patterning

20

Lithography Requirements

21

Excimer Laser DUV & EUV lithography

(Figure: power output and pulse rate of the NovaLine excimer laser, Lambda Physik.)


22

Dry or Plasma Etching

23

Dry or Plasma Etching

24

Dry or Plasma Etching


Combination of chemical and physical etching: Reactive Ion Etching (RIE)
Directional etching due to ion assistance. In RIE processes the wafers sit on the powered electrode. This placement sets up a negative bias on the wafer, which accelerates positively charged ions toward the surface. These ions enhance the chemical etching mechanisms and allow anisotropic etching. Wet etches are simpler, but dry etches provide better line-width control since they are anisotropic.
25

Dry Etching Reactive Ion Etching- RIE

26

CMOS fabrication sequence


4.2 Local oxidation of silicon (LOCOS)
- The photoresist mask is removed; the SiO2/SiN layers now act as masks
- The thick field oxide is then grown by exposing the surface of the wafer to a flow of oxygen-rich gas
- The oxide grows in both the vertical and lateral directions
- This results in an active area smaller than the patterned one

(Figure: patterned active area, field oxide (FOX), n-well, and active area after LOCOS on a p-type substrate. Paulo Moreira, Technology.)

27

LOCOS: Local Oxidation

28

Advanced CMOS processes


- Shallow trench isolation
- n+ and p+ doped polysilicon gates (low threshold)
- Source-drain extensions, LDD (hot-electron effects)
- Self-aligned silicide (spacers)
- Non-uniform channel doping (short-channel effects)

(Figure: cross-section showing silicide, n+/p+ poly gates, oxide spacers, source-drain extensions, non-uniform channel doping, shallow-trench isolation, n-well and p-type substrate. Paulo Moreira, Technology.)

29

Process enhancements
Up to eight metal levels in modern processes
Copper for metal levels 2 and higher
Stacked contacts and vias
Chemical-mechanical polishing (CMP) for technologies with several metal levels
For analog applications some processes offer:
- capacitors - resistors - bipolar transistors (BiCMOS)

Paulo Moreira

Technology

30

Metalisation

Metal is deposited first, followed by photoresist. The metal is then etched away to leave the pattern, and the gaps are filled with SiO2.
31

Electroplating Based Damascene Process Sequence

Pre-clean -> IMP barrier + copper seed (10-20 nm) -> electroplating (+100-200 nm) -> CMP

A simple, low-cost, hybrid, robust fill solution


32

33

34

Example CMOS SRAM Process


- 0.7 µm n-channel minimum gate length, 0.6 µm Leff
- 1.0 µm FOX isolation using SiN/SiO2 masking
- 0.25 µm N+ to P+ spacing
- Thin epi material to suppress latch-up
- Twin well to suppress parasitic channels through field transistors
- LDD structure for n & p transistors to suppress hot-carrier effects
- Buried contacts to overlying metal or underlying gates
- Metal salicide to reduce poly resistivity
- 2 metals to reduce die area
- Planarisation after all major process steps, to reduce step-coverage problems on contact-cut fills and large oxide depositions

35

SRAM Application Areas


- Main memory in high-performance small systems
- Main memory in low-power-consumption systems
- Simpler and less expensive systems when used without a cache
- Battery back-up
- Battery-operated systems

36

SRAM Performance vs Application Families

37

Typical Application Scenarios

(Diagrams: an i586-based PC - CPU core with ALU, FPU, MMU and BIU, 16 KB L1 and 256 KB L2 SRAM caches, 64 MB DRAM main memory, PCI/ISA I/O - and a hand phone using SRAM as cache.)

38

Market View by Application

39

Overview of SRAM Types


SRAMs
- Asynchronous: low speed, medium speed, high speed
- Synchronous: flow-through / pipelined, zero bus turnaround, double data rate, dual port, interleaved / linear burst
- Special: CAM / cache tag, FIFO, multiport

40

SRAM Array

(Diagram: array organization with select lines SL0-SL2; the cells in a column share common bit lines, which are precharged, and each column needs a sense amplifier.)

41

Logic Diagram of a Typical SRAM


(Symbol: a 2^N word x M bit SRAM with address inputs A0-AN and control signals CS_L, WE_L, OE_L.)
- Write Enable is usually active low (WE_L)
- Din and Dout are combined on D to save pins; a control signal, output enable (OE_L), is therefore needed
- WE_L = 0, OE_L = 1: D serves as the data input pin
- WE_L = 1, OE_L = 0: D is the data output pin
- Both WE_L and OE_L asserted (low): the result is unknown. Don't do that!
42

Simple 4x4 SRAM Memory


(Diagram: a simple 4 x 4 SRAM. Word width M = 2; row address bits A1, A2 (R = 2) give N_rows = 2^R = 4; column address bit A0 (C = 1) gives N_columns = 2^C x M = 4; N = R + C = 3 address bits, array size = N_rows x N_columns = 16 bits. The array has word lines WL[0]-WL[3], bit-line pairs BL / !BL with a read precharge enable, a column decoder feeding the sense amplifiers, write circuitry, and clocking and control driven by WE! and OE!.)

43

Basic Memory Read Cycle


- System selects the memory with /CS = L
- System presents the correct address (A0-AN)
- System turns the output buffers on with /OE = L
- Previous data sources tri-state within a permissible time limit (tOLZ or tCLZ)
- System must wait a minimum time of tAA, tAC or tOE to get correct data

44

Basic Memory Write Cycle


- System presents the correct address (A0-AN)
- System selects the memory with /CS = L
- System waits a minimum time equal to the internal setup time of the new address (tAS)
- System enables writing with /WE = L
- System waits the minimum time for the output driver to be disabled (tWZ)
- System inputs data and waits a minimum time (tDW) for the data to be written into the core, then turns off the write (/WE = H)
A behavioural sketch of these two sequences follows.

45
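A minimal behavioural sketch of the two bus sequences above (Python; the class name is an assumption and the model is purely illustrative, not a device model):

```python
# Illustrative model of an asynchronous SRAM seen from the system bus.
class AsyncSramBus:
    def __init__(self, n_words: int):
        self.mem = [0] * n_words

    def read_cycle(self, addr: int) -> int:
        # 1. /CS = L selects the memory      2. address A0-AN is presented
        # 3. /OE = L turns the output buffers on
        # 4. previous drivers tri-state within tOLZ / tCLZ
        # 5. after max(tAA, tAC, tOE) the data pins carry valid data
        return self.mem[addr]

    def write_cycle(self, addr: int, data: int) -> None:
        # 1. address presented, /CS = L      2. wait address setup tAS
        # 3. /WE = L enables writing         4. output driver off after tWZ
        # 5. drive data, wait tDW, then /WE = H latches the data into the core
        self.mem[addr] = data

sram = AsyncSramBus(16)
sram.write_cycle(0x5, 0xA5)
assert sram.read_cycle(0x5) == 0xA5
```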

Memory Timing: Definitions


(Waveforms: a read cycle spans from READ assertion through the read-access time until data is valid; a write cycle spans from WRITE assertion through the write-access time until the data is written.)

46

Memory Timing: Approaches


(Waveforms: DRAM timing uses multiplexed addressing - the row address and then the column address share the address bus, strobed by RAS and CAS with a defined RAS-CAS timing. SRAM timing is self-timed - an address transition on the address bus initiates the memory operation.)


47

The system level view of Async SRAMs

48

The system level view of synch SRAMs

49

Typical Async SRAM Timing


(Waveforms for a 2^N word x M bit SRAM. Read timing: after each read address, the data bus goes from high-Z / junk to valid data one read-access time later. Write timing: with OE_L deasserted, data in must meet the write setup time before, and the write hold time after, the WE_L pulse at the write address.)

50

SRAM Read Timing (typical)


- tAA (access time for address): how long it takes to get stable output after a change in address.
- tACS (access time for chip select): how long it takes to get stable output after CS is asserted.
- tOE (output-enable time): how long it takes for the three-state output buffers to leave the high-impedance state when OE and CS are both asserted.
- tOZ (output-disable time): how long it takes for the three-state output buffers to enter the high-impedance state after OE or CS is negated.
- tOH (output-hold time): how long the output data remains valid after a change to the address inputs.
A small calculation with these parameters follows.
51
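A small illustrative calculation using these parameters (Python; all numbers are assumed, not taken from any datasheet): the output is valid only when the slowest of the three paths has resolved.

```python
# When does DOUT become valid for a read starting at t = 0? (assumed numbers)
t_aa, t_acs, t_oe = 12.0, 10.0, 5.0                # ns, access/enable times
t_addr_stable, t_cs_low, t_oe_low = 0.0, 2.0, 4.0  # ns, when each input settles

data_valid = max(t_addr_stable + t_aa,   # address-limited path (tAA)
                 t_cs_low + t_acs,       # chip-select-limited path (tACS)
                 t_oe_low + t_oe)        # output-enable-limited path (tOE)
print(f"DOUT valid at t = {data_valid} ns")        # -> 12.0 ns in this example
```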

SRAM Read Timing (typical)


(Waveform, WE_L = HIGH: DOUT becomes valid max(tAA, tACS) after the address is stable and CS_L is asserted, and tOE after OE_L is asserted; it goes high-Z tOZ after OE_L or CS_L is negated, and holds its previous value for tOH after an address change.)

52

SRAM Architecture and Read Timings


(Figure: the read-timing parameters tAA, tACS, tOE, tOZ and tOH mapped onto the paths through the SRAM architecture.)


53

SRAM write cycle timing

/WE controlled

/CS controlled

54

SRAM Architecture and Write Timings

(Figure: write-timing parameters mapped onto the SRAM architecture - data setup time tDW and hold time tDH at the write driver, with tWP - tDW of margin within the write pulse.)

55

SRAM Architecture

56

SRAM Cell Design


Memory arrays typically need to store a lot of bits
- Need to optimize the cell design for area and performance
- Peripheral circuits can be complex, but are smaller than the array (60-70% of the area is in the array, 30-40% in the periphery)

Memory cell design


- 6T cell full CMOS - 4T cell with high resistance poly load - TFT load cell

57

Anatomy of the SRAM Cell

Write:
- set the bit lines to the new data value (b_bar = opposite of b)
- raise the word line high; this sets the cell to the new state (may need to flip the old state)

Read:
- set (precharge) both bit lines high
- set the word line high
- see which bit line goes low
58

SRAM Cell Operating Principle

An inverter amplifies: negative gain, slope < -1 in the middle, saturates at the ends. An inverter pair amplifies: positive gain, slope > 1 in the middle, saturates at the ends.
59

Bistable Element
Stability requires Vin = V2
- Stable at the endpoints: the cell recovers from a perturbation
- Metastable in the middle: the cell falls out of this state when perturbed
(Ball-on-ramp analogy.)


60

SRAM Cell technologies


- Bipolar ECL: NPN with dual emitter
- NMOS load: (a) enhancement - additional load gate bias; (b) depletion - no additional load gate bias
- High load resistance (4T)
- Full CMOS (6T)
- Thin-film transistor (TFT) load

61

6T & 4T cell Implementation

6T Bistable Latch
High resistance poly

4T Bistable Latch

62

Reading a Cell

The cell current I_cell discharges the bit-line capacitance C_b, developing a swing ΔV = I_cell · t / C_b that is detected by the sense amplifier.
63

Writing a Cell

0 -> 1

1 -> 0

64

Bistable Element
Stability requires Vin = V2
- Stable at the endpoints: the cell recovers from a perturbation
- Metastable in the middle: the cell falls out of this state when perturbed
(Ball-on-ramp analogy.)


65

Cell Static Noise Margin


Cell state may be disturbed by
- DC: layout pattern offset; process mismatches (non-uniformity of implantation, gate pattern size errors)
- AC: alpha particles, crosstalk, voltage-supply ripple, thermal noise
SNM = the maximum value of the noise voltage Vn that does not flip the cell state

66

SNM: Butterfly Curves


(Figure: butterfly curves - the transfer characteristics of the two cross-coupled inverters; the SNM is the side of the largest square that fits inside each lobe.)

67

SNM for Poly Load Cell

68

6T Cell LayoutN Well


(Layout: 6T cell in an n-well process - VDD and GND rails, PMOS pull-ups, NMOS pull-downs, select (SEL) MOSFETs on the word line, storage nodes Q and Q/, and well/substrate connections.)

69

6T SRAM Array Layout

70

Another 6T Cell Layout Stick Diagram


(Stick diagram, 2-metal-layer process: bit / bit_bar and VDD run vertically, the word line horizontally; the GND contact is shared with the cell to the left, and four contacts are shared with the mirrored cell below.)


71

6T Array Layout (2x2) Stick Diagram


(Layout: a 2 x 2 array in which mirrored cells share Gnd, VDD and bit / bit_bar lines, with the word lines running horizontally.)
72

6T Cell Full Layout


Transistor sizing (W : L)
- M2 (pMOS) 4:3
- M1 (nMOS) 6:2
- M3 (nMOS) 4:2
All cell boundaries are shared; the cell is 38λ high x 28λ wide, which reduces the capacitance on the bit lines.
73

6T Cell Example Layout & Abutment


(Layout: a 6T cell (T1-T6) with Vdd and Vss contacts, abutted and mirrored (2x abutment) so that neighbouring cells share bit-line (B) and supply contacts, building up a 4 x 4 array.)
74


6T and 4T Cell Layouts


(Layouts: a 4T cell with poly load resistors R1, R2 over transistors T1-T4, and a 6T cell with transistors T1-T6; both show the BIT / BIT! lines, word line (WL), VDD and GND.)
75

6T - 4T Cell Comparison
6T cell
- Merits: faster; better noise immunity; low standby current
- Demerits: large size due to 6 transistors

4T cell
- Merits: smaller cell, only 4 transistors; HR poly load stacked above the transistors
- Demerits: additional process step for the HR poly; poor noise immunity; large standby current; thermal instability
76

Transistor Level View of Core

(Schematic: the core with precharge devices, row decoder, column decoder and sense amplifier around the cell array.)

77

SRAM, Putting it all together

2^n rows, 2^m x k columns; n + m address lines, k bits of data width


78

Hierarchical Array Architecture


(Diagram: sub-blocks, one per output bit; one column is selected per sub-block and each sub-block has one sense amp. Row, column and block addresses; global data bus, control circuitry, block selector and global amplifier/driver at the I/O.)
Advantages: 1. shorter wires within blocks; 2. the block address activates only 1 block => power savings

79

Standalone SRAM Floorplan Example

80

Divided bit-line structure

81

SRAM Partitioning Partitioned Bitline

82

SRAM Partitioning Divided Wordline Arch

83

Partitioning summary
Partitioning involves a trade-off between area, power and speed.
For high-speed designs, use short blocks (e.g. 64 rows x 128 columns)
- Keep local bit-line heights small

For low-power designs, use tall narrow blocks (e.g. 256 rows x 64 columns)
- Keep the number of columns the same as the access width to minimize wasted power
84

Redundancy
(Diagram: redundant rows and redundant columns around the memory array; fuse banks compare the incoming row/column address and substitute a redundant row or column via the row and column decoders.)

85

Periphery

- Decoders
- Sense amplifiers
- Input/output buffers
- Control / timing circuitry


86

Asynchronous & Synchronous SRAMs

87

Address Transition Detection Provides Clock for Asynch RAMs


(Circuit: each address input A0 .. AN-1 drives a delay element (delay td) and a transition detector; the resulting pulses are combined on a common precharged node to form the ATD clock that self-times asynchronous RAMs.)

88

Row Decoders
A collection of 2^R complex logic gates organized in a regular, dense fashion.

(N)AND decoder, 9 -> 512 (word line asserted for the matching address):
WL(0) = !A8·!A7·!A6·!A5·!A4·!A3·!A2·!A1·!A0
WL(511) = A8·A7·A6·A5·A4·A3·A2·A1·A0

NOR decoder, 9 -> 512:
WL(0) = !(A8 + A7 + A6 + A5 + A4 + A3 + A2 + A1 + A0)
WL(511) = !(!A8 + !A7 + !A6 + !A5 + !A4 + !A3 + !A2 + !A1 + !A0)
A small logic-level sketch of this decode follows.

89
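A logic-level sketch of the AND-style decode above, scaled down to 3 -> 8 for readability (Python; illustrative only - the slides use 9 -> 512):

```python
# Word line k is asserted exactly when the address equals k (AND decoder).
def and_decoder(addr_bits):
    """addr_bits: tuple of 0/1 values, MSB first; returns the word lines."""
    n = len(addr_bits)
    word_lines = []
    for k in range(2 ** n):
        # AND together each address bit or its complement, as in WL(k) above
        terms = [addr_bits[i] if (k >> (n - 1 - i)) & 1 else 1 - addr_bits[i]
                 for i in range(n)]
        word_lines.append(int(all(terms)))
    return word_lines

assert and_decoder((1, 0, 1)) == [0, 0, 0, 0, 0, 1, 0, 0]   # address 5 -> WL(5)
```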

A NAND decoder using 2-input pre-decoders

(Circuit: a NAND decoder built from 2-input predecoders. The pairs (A1, A0) and (A3, A2) are each decoded into four predecoded lines (!A1·!A0, !A1·A0, A1·!A0, A1·A0, and likewise for A3, A2); each word line WL0, WL1, ... is then formed from one predecoded line of each group.)
Splitting the decoder into two or more logic layers produces a faster and cheaper implementation.
90

Row Decoders (contd)


(Circuit: predecoded lines A0/·A1/, A0·A1/, A0/·A1, A0·A1 and the corresponding A2/A3 combinations are combined to generate the row selects R0/, R1/, R2/ and so forth.)

91

Dynamic Decoders
(Circuits: a dynamic 2-to-4 NOR decoder and a 2-to-4 MOS dynamic NAND decoder, each with precharge devices; word lines WL0-WL3 are evaluated from A0, A1 and their complements.)
Propagation delay is the primary concern.


92

Dynamic NOR Row Decoder


(Circuit: dynamic NOR row decoder - word lines WL0-WL3 are precharged to Vdd by Precharge/ and selectively discharged by NMOS devices driven by A0, !A0, A1, !A1.)

93

Dynamic NAND Row Decoder


(Circuit: dynamic NAND row decoder - series NMOS stacks driven by !A0, A0, !A1, A1, together with a Precharge/ device, evaluate the word lines WL0-WL3.)

94

Decoders
An n : 2^n decoder consists of 2^n n-input AND gates
- One is needed for each row of memory
- Build the AND from NAND or NOR gates
- Make the devices on the address lines minimum size; scale the devices on the decoder output to drive the word lines

(Schematics: static CMOS and pseudo-nMOS 2:4 decoders on A1, A0 driving word0-word3, with the transistor sizing of the buffer chains annotated.)

95

Decoder Layout
Decoders must be pitch-matched to the SRAM cell
- Requires very skinny gates

(Layout: a NAND gate plus buffer inverter fitted to the cell pitch, with A0-A3 and their complements running over the VDD/GND rails to the word line.)

96

Large Decoders
For n > 4, NAND gates become slow
- Break large gates into multiple smaller gates

(Schematic: word0-word15 decoded from A3-A0 using a tree of smaller gates.)

97

Predecoding
- Group address bits in predecoders
- Saves area
- Same path effort

(Schematic: A3-A0 drive 1-of-4-hot predecoded lines, which are then combined to generate word0-word15.)
A small sketch of this two-level decode follows.

98
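A small sketch of the two-level decode (Python; illustrative): each pair of address bits is predecoded into a 1-of-4-hot group, and every word line is then just a 2-input AND of one line from each group.

```python
def predecode_2(b1: int, b0: int):
    """1-of-4-hot predecoded lines for a 2-bit address group (order 00..11)."""
    return [int(b1 == i1 and b0 == i0) for i1 in (0, 1) for i0 in (0, 1)]

def decode_4bit(a3, a2, a1, a0):
    lo = predecode_2(a1, a0)            # predecoded lines for A1, A0
    hi = predecode_2(a3, a2)            # predecoded lines for A3, A2
    # word line 4*j + i = hi[j] AND lo[i]: only 2-input gates at the row pitch
    return [h & l for h in hi for l in lo]

assert decode_4bit(1, 0, 1, 1).index(1) == 0b1011    # address 11 -> word 11
```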

Column Circuitry
Some circuitry is required for each column
- Bitline conditioning - Sense amplifiers - Column multiplexing

Need hazard-free reading & writing of the RAM cell. The column decoder drives a mux; the two are often merged.

99

Typical Column Access

100

Pass Transistor Based Column Decoder


(Circuit: a 2-input NOR decoder on A1, A0 generates the selects S0-S3; each select gates a pair of pass transistors connecting one bit-line pair (BL3/!BL3 .. BL0/!BL0) onto the shared Data/!Data lines.)
Advantage: speed, since there is only one extra transistor in the signal path. Disadvantage: large transistor count.
101

Tree Decoder Mux


The column mux can use pass transistors
- Use nMOS only; precharge the outputs

One design uses k series transistors for a 2^k : 1 mux
- No external decoder logic is needed

(Schematic: a binary tree of pass transistors controlled by A0/!A0, A1/!A1, A2/!A2 selects one of the bit lines B0-B7 onto Y / !Y, which feed the sense amps and write circuits.)

102

Bitline Conditioning
Precharge the bit lines high before reads.
Equalize the bit lines to minimize the voltage difference when using sense amplifiers.
(Schematics: precharge transistors on bit / bit_b, plus an equalization transistor between them.)
103

Bit Line Precharging


Static pull-up precharge and clocked precharge.
(Schematics: pull-up devices on BL / !BL, with and without a clock.)
Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the non-discharged bit line to assist in precharging the discharged line.

104

Sense Amplifier: Why?


Bit-line capacitance is significant for a large array
- If each cell contributes 2 fF, 256 cells give 512 fF plus wire capacitance
- The cell pull-down (transistor) resistance is about 15 kΩ
- RC = 7.5 ns (assuming ΔV = Vdd)!

t ≈ R·C·ΔV / Vdd: we cannot easily change R, C or Vdd, but we can change ΔV, i.e. the smallest voltage that is sensed
- ΔV as small as <50 mV can be sensed reliably
A numerical sketch of this trade-off follows.

105
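A back-of-envelope sketch using the numbers on this slide (Python; the 2.5 V supply is an assumption):

```python
R_pulldown = 15e3           # ohm, cell pull-down resistance (slide value)
C_bitline  = 256 * 2e-15    # F, 512 fF for 256 cells (wire cap ignored)
Vdd        = 2.5            # V, assumed supply

def t_swing(dv):
    # t = R * C * dV / Vdd: the cell sinks roughly Vdd/R while discharging C
    return R_pulldown * C_bitline * dv / Vdd

print(f"full swing : {t_swing(Vdd) * 1e9:.2f} ns")   # ~7.7 ns (slide: ~7.5 ns)
print(f"50 mV swing: {t_swing(0.05) * 1e9:.2f} ns")  # ~0.15 ns with a sense amp
```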

Sense Amplifiers
Δt_p = C_b · ΔV / I_cell: C_b is large and I_cell is small, so make ΔV as small as possible.
Idea: use a sense amplifier that turns a small input transition into a full-swing output.
106

Differential Sensing - SRAM


(Circuits: (a) SRAM sensing scheme - the cell on word line WLi drives BL / !BL, which are precharged and equalized (PC, EQ) and feed a differential sense amplifier enabled by SE; (b) double-ended current-mirror amplifier; (c) cross-coupled amplifier.)

107

Latch-Based Sense Amplifier


(Circuit: a cross-coupled latch on BL / !BL with equalization (EQ) and sense enable (SE).)
- Initialized at its metastable point with EQ
- Once an adequate voltage gap has been created, the sense amp is enabled with SE
- Positive feedback quickly forces the output to a stable operating point
108

Sense Amplifier
(Circuit: bit / bit_b pass through isolation transistors into a regenerative amplifier clocked by sense clk.)

109

Sense Amp Waveforms


(Waveforms, 1 ns/div: the word line rises, the bit lines split by ~200 mV, the sense clock fires and the regenerative amplifier restores the full 2.5 V swing on BIT / BIT_b before the bit lines begin precharging again.)
110

Write Driver Circuits

111

Twisted Bitlines
Sense amplifiers also amplify noise
- Coupling noise is severe in modern processes
- Try to couple equally onto bit and bit_b
- Done by twisting the bit lines
(Layout: twisted pairs b0/b0_b .. b3/b3_b.)

112

Transposed-Bitline Architecture
(Figures: (a) straightforward bit-line routing - the cross-coupling capacitance Ccross couples unequally into the sense amps (SA); (b) transposed bit-line architecture, which balances Ccross between BL and !BL.)

113

114

DRAM in a nutshell

- Based on capacitive (non-regenerative) storage
- Highest density (Gb/cm2)
- Large external memory (Gb) or embedded DRAM for image, graphics and multimedia
- Needs periodic refresh -> overhead, slower

115

116

Classical DRAM Organization (square)

(Diagram: a square RAM cell array in which each intersection of a word (row) select line and a bit (data) line is a 1-T DRAM cell. The row address drives the row decoder; the column address drives the column selector & I/O circuits on the bit lines.)

The row and column address together select 1 bit at a time.
117

DRAM logical organization (4 Mbit)

118

DRAM physical organization (4 Mbit,x16)

119

Memory Systems
(Diagram: an n-bit address feeds the DRAM controller, which drives n/2 multiplexed address lines and the memory timing controller of a 2^n x 1 DRAM chip, with bus drivers on the w-bit data path.)

Tc = Tcycle + Tcontroller + Tdriver

120

Logic Diagram of a Typical DRAM


(Symbol: a 256K x 8 DRAM with multiplexed address pins A[8:0] and control signals RAS_L, CAS_L, WE_L, OE_L.)

Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low.
Din and Dout are combined (D):
- WE_L asserted (low), OE_L deasserted (high): D serves as the data input pin
- WE_L deasserted (high), OE_L asserted (low): D is the data output pin
Row and column addresses share the same pins (A):
- RAS_L goes low: pins A are latched in as the row address
- CAS_L goes low: pins A are latched in as the column address
- RAS/CAS are edge-sensitive
121

DRAM Operations
Write
- Charge the bit line HIGH or LOW and set the word line HIGH

Read
- The bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH
- Depending on the charge in the cell capacitor, the precharged bit line is pulled slightly higher or lower
- The sense amp detects the change

Explains why the cell capacitor can't shrink
- It must still drive the bit line sufficiently
- Increased density => increased parasitic (bit-line) capacitance

(Diagram: word line, bit line, cell capacitor and sense amp.)

122

DRAM Access
A 1M-bit DRAM = a 1024 x 1024 array of bits
- The 10 row address bits arrive first: Row Access Strobe (RAS); 1024 bits are read out
- The 10 column address bits arrive next: Column Access Strobe (CAS); the column decoder returns a subset of the bits to the CPU
A sketch of this address split is given below.

123
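A sketch of the multiplexed addressing above (Python; illustrative): the same 10 pins carry the row half and then the column half of a 20-bit address.

```python
# 1M-bit DRAM = 1024 x 1024 array: 10 row bits (RAS), then 10 column bits (CAS).
def split_address(addr_20bit: int):
    row = (addr_20bit >> 10) & 0x3FF    # upper 10 bits, latched on RAS_L
    col = addr_20bit & 0x3FF            # lower 10 bits, latched on CAS_L
    return row, col

row, col = split_address(0xABCDE)
print(f"row = {row}, col = {col}")      # -> row = 687, col = 222
```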

DRAM Read Timing


Every DRAM access begins with the assertion of RAS_L
- 2 ways to read: OE_L early or late relative to CAS_L

(Waveforms for a 256K x 8 DRAM: within one DRAM read cycle the row address is latched on the falling edge of RAS_L and the column address on the falling edge of CAS_L; the data pins go from high-Z / junk to data out one read-access time, plus the output-enable delay, later. Early read cycle: OE_L asserted before CAS_L. Late read cycle: OE_L asserted after CAS_L.)


124

DRAM Write Timing


Every DRAM access begins with the assertion of RAS_L
- 2 ways to write: WE_L early or late relative to CAS_L

(Waveforms for a 256K x 8 DRAM: within one DRAM write cycle the row and column addresses are latched by RAS_L and CAS_L; data in must be valid around the write strobe, and the write-access time is measured from it. Early write cycle: WE_L asserted before CAS_L. Late write cycle: WE_L asserted after CAS_L.)


125

DRAM Performance
A 60 ns (tRAC) DRAM can
- perform a row access only every 110 ns (tRC)
- perform a column access (tCAC) in 15 ns, but the time between column accesses is at least 35 ns (tPC); in practice, external address delays and bus turnaround make it 40 to 50 ns

These times do not include the time to drive the addresses off the microprocessor, nor the memory-controller overhead
- Driving parallel DRAMs, the external memory controller, bus turnaround, the SIMM module and its pins all add delay
- 180 ns to 250 ns latency from processor to memory is good for a 60 ns (tRAC) DRAM
126

1-Transistor Memory Cell (DRAM)


Write:
1. Drive the bit line
2. Select the row

Read:
1. Precharge the bit line
2. Select the row
3. The cell and the bit line share charge: a very small voltage change on the bit line (changes of ~1 million electrons can be detected)
4. Sense (with a fancy sense amp)
5. Write: restore the value

Refresh:
1. Just do a dummy read to every cell

(Schematic: row select gating the storage capacitor onto the bit line.)
127

DRAM architecture

128

Cell read: refresh is the art

ΔV = V'_BL − V_BL = (V_SN − V_BL) · Cs / (Cs + Cb)
A numerical sketch follows.

129
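A numerical sketch of this charge-sharing readout (Python; the capacitance and supply values are assumed for illustration):

```python
Cs   = 30e-15       # F, storage capacitance (cf. the 10..30 fF quoted later)
Cb   = 300e-15      # F, bit-line capacitance (assumed)
Vdd  = 2.5          # V, assumed
V_bl = Vdd / 2      # bit line precharged to Vdd/2

for V_sn, label in ((Vdd, "stored '1'"), (0.0, "stored '0'")):
    dV = (V_sn - V_bl) * Cs / (Cs + Cb)
    print(f"{label}: dV = {dV * 1e3:+.0f} mV on the bit line")  # about +/-114 mV
```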

Sense Amplifier

130

131

DRAM technological requirements


Unlike SRAM, a large Cb must be charged by the small sense flip-flop, which is slow
- Make Cb small: back-bias the junction capacitance, limit the block size
- A back-bias generator is required (triple well)

Prevent threshold loss in the word-line pass transistor: VG > Vccs + VTn
- Requires another on-chip voltage generator
- Requires VTn(wl) > VTn(logic) and thus a thicker oxide than logic
- Gives better dynamic data retention, as there is less subthreshold loss
- The DRAM process is unlike the logic process!

Must create a large Cs (10..30 fF) in the smallest possible area
- (-> 2 poly -> trench cap -> stacked cap)
132

Refreshing Overhead
Leakage:
- Junction leakage is exponential with temperature! (~25 ms retention @ 80 °C)
- It decreases the noise margin and eventually destroys the stored information
All columns in a selected row are refreshed when that row is read
- Count through all row addresses once per 3 ms (no writes are possible during refresh)
Overhead at a 10 ns read time for an 8192 x 8192 = 64 Mb array:
- 8192 x 1e-8 / 3e-3 ≈ 2.7%
Requires an additional refresh counter and I/O control; the arithmetic is sketched below.
133
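A one-line reproduction of the overhead arithmetic on this slide (Python):

```python
rows          = 8192       # 8192 x 8192 = 64 Mb array
t_row_refresh = 10e-9      # s, one row refreshed per 10 ns read
t_retention   = 3e-3       # s, every row visited once per 3 ms

overhead = rows * t_row_refresh / t_retention
print(f"refresh overhead = {overhead:.1%}")     # -> 2.7%
```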

Dummy cells

(Circuit: bit lines precharged to Vdd/2, with dummy cells on both sides providing the sense-amplifier reference when a word line fires.)
134

Alternative Sensing Strategy Decreasing Cdummy


Convert to differential sensing
- Create the reference in an identical structure (a dummy column)

Needs
- A method of generating the signal swing on the bit line

Operation:
- A dummy cell provides the reference; the active word line and the dummy word line are on opposite sides of the sense amp
- Amplify the difference between the data-column bit line and the dummy-column bit line

ΔV("1"/"0") = ± (Vdd/2) · 1 / (1 + Cb/Cs)

So a small Cs gives a small swing, and a large Cb gives a small swing.
(Overhead: fabricating the C/2 dummy capacitor.)

135

Alternative Sensing Strategy Increasing Cbitline on Dummy side

Double bit line
- The SA outputs D and !D are precharged to VDD through Q1, Q2 (Pr = 1)
- The reference capacitor Cdummy is connected to a pair of matched bit lines and is at 0 V (Pr = 0)
- The parasitic capacitance Cp2 on the dummy-side bit line is ~2x Cp1 on the data-side bit line, setting up a differential voltage (LHS vs. RHS) due to the rise-time difference
- The SA outputs (D, !D) become charged with a small LHS/RHS difference, which the regenerative action of the latch then amplifies
136

DRAM Memory Systems


(Diagram: an n-bit address feeds the DRAM controller, which drives n/2 multiplexed address lines and the memory timing controller of a 2^n x 1 DRAM chip, with bus drivers on the w-bit data path.)

Tc = Tcycle + Tcontroller + Tdriver

137

DRAM Performance
DRAM (read/write) cycle time >> DRAM (read/write) access time, typically about 2:1. Why?
- DRAM (read/write) cycle time: how frequently you can initiate an access
- DRAM (read/write) access time: how quickly you get what you want once you initiate an access
- DRAM bandwidth limitation: limited by the cycle time
138

Fast Page Mode Operation


Fast Page Mode DRAM
- An N x M SRAM register saves a whole row of the N-row x N-column DRAM array
- After a row is read into the register, only CAS is needed to access other M-bit blocks on that row: RAS_L remains asserted while CAS_L is toggled

(Timing: RAS_L stays low for the row address while CAS_L strobes the 1st, 2nd, 3rd, 4th ... M-bit column addresses, each delivering M bits of output.)
139

Page Mode DRAM Bandwidth Example


Page-mode DRAM example:
- Four 16-bit x 1M DRAM chips in a 64-bit module (8 MB module)
- 60 ns RAS+CAS access time; 25 ns CAS access time; latency to the first access = 60 ns, latency to subsequent accesses = 25 ns
- 110 ns read/write cycle time; 40 ns page-mode access time; 256 words (64 bits each) per page

Bandwidth takes into account 110 ns for the first cycle and 40 ns for CAS cycles:
- Bandwidth for one word = 8 bytes / 110 ns = 69.35 MB/s
- Bandwidth for two words = 16 bytes / (110 + 40) ns = 101.73 MB/s
- Peak bandwidth = 8 bytes / 40 ns = 190.73 MB/s
- Maximum sustained bandwidth = (256 words x 8 bytes) / (110 ns + 256 x 40 ns) = 188.71 MB/s
These figures are reproduced in the sketch below.
140
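A sketch reproducing the bandwidth figures above (Python; MB = 2^20 bytes, as the slides use):

```python
MB         = 2 ** 20
word_bytes = 8             # 64-bit module
t_first    = 110e-9        # s, full read/write cycle for the first access
t_page     = 40e-9         # s, page-mode (CAS-only) cycle
words_page = 256

one_word  = word_bytes / t_first / MB
two_words = 2 * word_bytes / (t_first + t_page) / MB
peak      = word_bytes / t_page / MB
sustained = words_page * word_bytes / (t_first + words_page * t_page) / MB
print(f"{one_word:.2f}, {two_words:.2f}, {peak:.2f}, {sustained:.2f} MB/s")
# -> 69.35, 101.73, 190.73, 188.71 MB/s
```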

4 Transistor Dynamic Memory


Remove the PMOS/resistors from the SRAM memory cell Value stored on the drain of M1 and M2 But it is held there only by the capacitance on those nodes Leakage and soft-errors may destroy value

141

142

First 1T DRAM (4K Density)


- Texas Instruments TMS4030, introduced 1973
- NMOS, 1M1P, TTL I/O
- 1T cell, open bit line, differential sense amp
- Vdd = 12 V, Vcc = 5 V, Vbb = -3/-5 V (Vss = 0 V)

143

16k DRAM (Double Poly Cell)


- Mostek MK4116, introduced 1977
- Address multiplexing, page mode
- NMOS, 2P1M
- Vdd = 12 V, Vcc = 5 V, Vbb = -5 V (Vss = 0 V)
- Vdd - Vt precharge, dynamic sensing
144

64K DRAM
- Internal Vbb generator
- Boosted word line and active restore: eliminate the Vt loss when writing a '1'
- x4 pinout

145

256K DRAM
Folded bit-line architecture
- Coupling noise appears as common mode on the bit-line pair
- Easy Y-access

NMOS, 2P1M
- Poly 1: plate; poly 2 (polycide): gate, word line; metal: bit line

Redundancy

146

1M DRAM
Triple-poly planar cell, 3P1M
- Poly 1: gate, word line; poly 2: plate; poly 3 (polycide): bit line; metal: word-line strap

Vdd/2 bit-line reference, Vdd/2 cell plate

147

On-chip Voltage Generators


- Power supplies for logic and memory
- Precharge voltage (e.g. VDD/2 for the DRAM bit line)
- Backgate bias (reduces leakage)
- Word-line select overdrive (DRAM)

148

Charge Pump Operating Principle


(Circuit: in the charge phase a capacitor is charged to ~Vin; in the discharge phase its bottom plate is driven to Vin, so the top plate is pushed to Vin + (Vin - dV), giving Vo ≈ 2·Vin for small losses dV.)

149

Voltage Booster for WL


(Circuit: word-line boost. The boost capacitor Cf is precharged so that Vcf(0) ≈ Vhi; when its driven plate steps up by Vhi, the word-line node rises to VGG ≈ Vhi + Vhi · Cf / (Cf + CL), where CL is the load capacitance on the boosted node. A numerical sketch follows.)
150
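A numerical sketch of the boosted word-line voltage (Python; the Cf, CL and Vhi values are assumptions for illustration):

```python
Vhi = 3.3         # V, word-line driver supply (assumed)
Cf  = 200e-15     # F, boost capacitor (assumed)
CL  = 50e-15      # F, load capacitance on the boosted node (assumed)

Vgg = Vhi + Vhi * Cf / (Cf + CL)    # charge on Cf is shared with CL during boost
print(f"VGG = {Vgg:.2f} V")         # -> ~5.9 V, enough to overcome the pass Vt
```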

Backgate bias generation

Use a charge pump. Backgate bias increases Vt -> reduces leakage, and reduces the junction capacitance Cj of the nMOSTs when applied to the p-well (triple-well process!): a smaller Cj gives a smaller Cb and hence a larger readout voltage.
151

Vdd / 2 Generation
(Circuit: a Vdd/2 generator for Vdd = 2 V; a divider produces ~1.5 V and ~0.5 V bias points that drive a push-pull output stage to ~1 V, assuming Vtn = |Vtp| ≈ 0.5 V and uN = 2 uP.)
152

4M DRAM
- 3D stacked or trench cell, CMOS, 4P1M
- x16 introduced; self refresh
- Build the cell in the vertical dimension: shrink the area while maintaining 30 fF of cell capacitance

153

154

Stacked-Capacitor Cells

(Figures: poly-plate stacked-capacitor cells - Hitachi 64 Mbit DRAM cross-section and Samsung 64 Mbit DRAM cross-section.)
155

Evolution of DRAM cell structures

156

Buried Strap Trench Cell

157

Process Flow of BEST Cell DRAM


1. Array buried n-well
2. Storage trench formation
3. Node dielectric (6 nm Teq.)
4. Buried strap formation
5. Shallow trench isolation formation
6. N- and P-well implants
7. Gate oxidation (8 nm)
8. Gate conductor (N+ poly / WSi)
9. Junction implants
10. Insulator deposition and planarization
11. Contact formation
12. Bitline (metal 0) formation
13. Via 1 / metal 1 formation
14. Via 2 / metal 2 formation

Shallow trench isolation -> replaces LOCOS isolation -> saves area by eliminating the bird's beak

158

BEST cell Dimensions

Deep Trench etch with very high aspect ratio


159

256K DRAM
Folded bit-line architecture
- Coupling noise appears as common mode on the bit-line pair
- Easy Y-access

NMOS, 2P1M
- Poly 1: plate; poly 2 (polycide): gate, word line; metal: bit line

Redundancy

160

161

162

Transposed-Bitline Architecture
(Figures: (a) straightforward bit-line routing - the cross-coupling capacitance Ccross couples unequally into the sense amps (SA); (b) transposed bit-line architecture, which balances Ccross between BL and !BL.)

163

Cell Array and Circuits


1-transistor, 1-capacitor cell
- Array example

Major circuits
- Sense amplifier
- Dynamic row decoder
- Word-line driver

Other interesting circuits
- Data-bus amplifier
- Voltage regulator
- Reference generator
- Redundancy technique
- High-speed I/O circuits

164

Standard DRAM Array Design Example

165

(Floorplan: 64K-cell subarrays (256 x 256); 1M cells = 64K x 16. Global word-line decode and drivers plus local WL decode run in the WL (row) direction, and column predecode runs in the BL (column) direction.)

166

DRAM Array Example (contd)


(Floorplan: a 512K array with Nmat = 16 mats of 256 x 256 cells (256 word lines x 2048 sense amplifiers), using interleaved sense amplifiers and a hierarchical row decoder/driver; the shared bit lines are not shown.)
167

168

169

170

Standard DRAM Design Feature


- Heavy dependence on technology
- The row circuits are completely different from SRAM
- Almost always analogue circuit design
- CAD: SPICE-like circuit simulator; fully handcrafted layout

171
