
ESE 570: SEMICONDUCTOR MEMORIES

Kenneth R. Laker, University of Pennsylvania, updated 02Apr15

A Typical Computer System

[Figure: block diagram of a typical computer system -- a CPU with split L1 caches (L1-D, L1-I) and an L2 cache on the system bus; a memory controller with two channels (Ch 1, Ch 2) of DRAM DIMMs; an AGP/GPU bus to video RAM; and an I/O controller bridging the PCI bus, USB bus, and other buses to a disk adapter and an Ethernet adapter.]

[Figure: memory classification -- Volatile memory (requires power to hold data) vs. Non-Volatile memory (no power required to hold data), e.g. ROM.]

CPU Memory Hierarchy

[Figure: cache hierarchy from the CPU chip outward -- L1 on-CPU: 1k to 64k SRAM (register file); L2: 64k to 4M cache; L3: 4M to 32M SRAM or DRAM (multi-core shared); L4: 8M off-chip cache memory.]

Locality and Caching

Memory hierarchies exploit locality by caching (keeping close to the processor) data that is likely to be used again. We do this because we can build large, slow memories and small, fast memories, but we can't build large, fast memories. When it works, we get the illusion of SRAM access time with disk-based memory capacity.

SRAM (static RAM) -- 5-20 ns access time, very expensive (on-CPU SRAM is faster still).
DRAM (dynamic RAM) -- 60-100 ns, cheaper.
Disk -- access time measured in milliseconds, very cheap.
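A rough numerical illustration of that illusion (a sketch only: the hit rates and the 5 ms disk latency are assumptions for the example, not values from these notes):

```python
# Average access time for an SRAM-cache + DRAM + disk hierarchy.
# Latencies roughly follow the slide (SRAM ~10 ns, DRAM ~80 ns, disk ~ms);
# the hit rates are assumptions for illustration.

SRAM_NS = 10.0       # SRAM access time (ns)
DRAM_NS = 80.0       # DRAM access time (ns)
DISK_NS = 5e6        # disk access time: 5 ms expressed in ns (assumed)

def avg_access_ns(cache_hit=0.99, dram_hit=0.9999):
    to_dram = 1.0 - cache_hit             # fraction of accesses missing SRAM
    to_disk = to_dram * (1.0 - dram_hit)  # fraction that also miss DRAM
    return SRAM_NS + to_dram * DRAM_NS + to_disk * DISK_NS

print(f"average access time = {avg_access_ns():.1f} ns")  # ~15.8 ns: near-SRAM
```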

Why Do We Care about Memory Hierarchy?

[Figure: relative performance of CPU vs. memory on a log scale (1 to 100,000), 1980 to 2010 -- the processor-memory performance gap grew about 50%/year.]

[Table: relative rankings of memory technologies by Access Time and Cycle Time (1* = best), with applications such as cache/PDAs.]

Access Time (tAC): the time required to read data from a single memory cell.
Cycle Time (tC): the time required to perform a read or write operation plus any recovery time before the next read/write operation can begin (a measure of the overall data rate).

TYPICAL RANDOM ACCESS MEMORY ARRAY ORGANIZATION
(ONE 2^M-BIT WORD PER ROW)

[Figure: memory array organized as 2^N rows of one 2^M-bit word each. An (N + M)-bit Address, Data, and Chip Control Signals enter through the CHIP I/O INTERFACE; SENSE AMPLIFIERS/DRIVERS sit between the array and the interface.]

Practical Issues:
1. N >> M
   a. Long, thin layout => awkward to fit into a system-chip floor plan.
   b. Long bit lines slow memory access, i.e. more parasitic capacitance.
2. 2^N x 2^M is very large, say 10^10 to 10^12 cells
   a. Long bit lines slow memory access, i.e. more parasitic capacitance.
Remedies:
1. Reorganize the memory by reducing the number of rows to 2^(N-k) and increasing the number of columns to 2^(M+k), i.e. use N-k row address bits and M+k column address bits (complicates the column decoder); see the sketch below.
2. Construct large memories from smaller modular blocks.
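A minimal sketch of remedy 1 (the values of N, M, and k are assumptions chosen for illustration):

```python
# Remedy 1: use N-k row address bits and M+k column address bits,
# giving 2^(N-k) rows and 2^(M+k) bit lines per row.

def array_shape(N, M, k):
    rows = 2 ** (N - k)
    cols = 2 ** (M + k)   # each row now holds 2^k words of 2^M bits
    return rows, cols

N, M = 20, 3              # assumed example: 2^20 words of 8 bits each
for k in (0, 4, 8):
    rows, cols = array_shape(N, M, k)
    print(f"k={k}: {rows} rows x {cols} columns (aspect {rows/cols:.0f}:1)")
```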

DRAM Chip Partitioned into Supercells

[Figure: a 128-bit DRAM chip organized as a 4 x 4 array of supercells (Rows 0-3, Cols 0-3), each supercell 8 bits wide. The Memory Controller (to the CPU) sends the row and column over a 2-bit addr bus, the selected row is copied into the Internal Row Buffer, and supercell (2, 1) is transferred over the 8-bit data bus.]

NONVOLATILE MEMORY: ROM

[Figure: NOR ROM array -- each bit line is a pseudo-nMOS NOR gate of the word lines.]


[Figure: NAND ROM array -- each bit line is a pseudo-nMOS NAND gate of the word lines.]

DESIGN OF ROW AND COLUMN DECODERS

N = 2 address bits access each of 2^N = 4 word lines.

Purpose of the ROW DECODER: reduce the number of external signals (or bits) needed to select a word or row from memory. N decoder address bits are needed to select a specific one of the 2^N words or rows in memory, one at a time.

[Figure: N = 2 address bits drive a row decoder selecting one of L = 2^N = 4 word lines of a MEMORY storing L = 2^N = 4 words, each 2^M bits wide.]

[Figure: NOR-based row decoder driving L = 2^N = 4 rows.]

Transistor count: (N x L) nMOS + (L) pMOS + 2N INVs (4N Xstrs).
TOTAL = NL + L + 4N Xstrs
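A quick numerical check of the transistor-count formula (the script simply evaluates NL + L + 4N; the N = 9 case anticipates the 512-row example that follows):

```python
# NOR-based row decoder: (N x L) nMOS + L pMOS + 2N inverters (4N Xstrs),
# where L = 2^N word lines.

def row_decoder_xstrs(N):
    L = 2 ** N
    return N * L + L + 4 * N   # NL + L + 4N

print(row_decoder_xstrs(2))    # N = 2, L = 4   ->   20 transistors
print(row_decoder_xstrs(9))    # N = 9, L = 512 -> 5156 transistors
```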

[Figure: column selection -- the row decoder selects one (1) word per row, and a column decoder selects among the 2^M bit lines; the circuit shown uses (2^M - 1 + 5M) Xstrs.]

[Figure: binary tree column decoder -- complementary column-address lines CA1/CA1', CA2/CA2', ..., CAM/CAM' steer bit lines C1 through C_2^M; each path runs through M series-connected nMOS pass Xstrs, and no separate decoder is needed.]

EXAMPLE: N = 9, i.e. 2^9 = 512 rows; M = 6, i.e. 2^6 = 64 cols.

[Figure: array corner showing rows R511, R512 and columns C63, C64; poly word line of width 1.5 µm, C_ox = 3.5 fF/µm^2, C_dbn = 1.8 fF.]

C_row = C_ox (L_nMOS x W_nMOS) = 3.5 fF/µm^2 x (2 µm x 1.5 µm) = 10.5 fF per cell
R_row = R_sheet-poly (L_poly / W_poly) = 20 Ω/sq x (6 µm / 1.5 µm) = 80 Ω per cell

[Figure: the word line modeled as a distributed RC ladder of per-cell sections (R_row = 80 Ω, C_row = 10.5 fF), driving gates out to G256; the waveform is evaluated at V_G64.]

Total word-line resistance: ΣR_k = 256 x 80 Ω = 20.48 kΩ
Total word-line capacitance: ΣC_j = 256 x 10.5 fF = 2688 fF
Distributed RC-line estimate: τ_row = 0.38 (ΣR)(ΣC) = 20.9 ns

Elmore Delay Formula, with N = 64:

τ_DN = Σ_{j=1..N} C_j Σ_{k=1..j} R_k = R_row C_row N(N+1)/2

τ_row = 0.69 τ_D64 = 1.2 ns (at V_G64)
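A short script evaluating the Elmore word-line delay with the per-cell R_row and C_row from this example (an illustrative check, using the slide's 0.69 factor to convert the Elmore sum to a delay estimate):

```python
# Elmore delay of an N-section RC word line:
#   tau_DN = R_row * C_row * N * (N + 1) / 2

R_ROW = 80.0        # ohms per cell (poly word line, 4 squares x 20 ohm/sq)
C_ROW = 10.5e-15    # farads per cell (gate load)

def elmore_delay(n):
    """Elmore time constant at the far end of an n-cell word line."""
    return R_ROW * C_ROW * n * (n + 1) / 2

tau_d64 = elmore_delay(64)
print(f"tau_D64 = {tau_d64 * 1e9:.2f} ns")         # ~1.75 ns
print(f"tau_row = {0.69 * tau_d64 * 1e9:.2f} ns")  # ~1.21 ns; slide: 1.2 ns
```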

REVIEW


τ_column

[Figure: bit line loaded by the drain junctions of all 512 rows, out to R512.]

C_column = 512 C_dbn = 512 x 1.8 fF = 0.92 pF ≈ 0.9 pF

With 128 cells per bit line and C_dbn = 11.8 fF: C_column = 128 C_dbn = 128 x 11.8 fF = 1.5 pF

τ_column = τ_PHL of the column pull-down.

Other Parameters:
V_OH = V_DD = 5 V
V_OL ≈ 0 V
V_T0n = -V_T0p = 1 V
µn C_ox = 20 µA/V^2

τ_column = 11 ns for C_column = 0.9 pF, and 18 ns for C_column = 1.5 pF.

τ_access = τ_row + τ_column = 1.2 ns + 11 ns = 12.2 ns
τ_access = τ_row + τ_column = 20.9 ns + 18 ns = 38.9 ns
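Putting the row and column pieces together (a sketch using the numbers above, and assuming, as the 12.2 ns total implies, that the 11 ns τ_PHL pairs with the 0.9 pF bit line):

```python
# Access time = word-line (row) delay + bit-line (column) delay.

C_DBN = 1.8e-15                # drain-body junction cap per cell (F)
c_column = 512 * C_DBN         # 512 rows hanging on one bit line
print(f"C_column   = {c_column * 1e12:.2f} pF")    # ~0.92 pF

tau_row = 1.2e-9               # Elmore row delay from the earlier slide (s)
tau_column = 11e-9             # tau_PHL for C_column ~ 0.9 pF (s)
print(f"tau_access = {(tau_row + tau_column) * 1e9:.1f} ns")   # 12.2 ns
```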

SRAM Column (or Differential Column): ACCESS DATA WHILE NOT MODIFYING THE DATA IN THE SRAM CELL.


6T CMOS SRAM Cell

[Figure: 6T CMOS SRAM cell schematic.]

(R_k from the ROW Decoder)

R_k = 0: M3 & M4 are OFF.

If R_k = 0 for ALL rows (all 2^N values of k), the bit-line capacitances C_C and C_NOT-C are pre-charged.
R_k = 1 selects the row for the four cell operations:
a. WRITE 1 OP
b. READ 1 OP
c. WRITE 0 OP
d. READ 0 OP

[Figure: WRITE 1 operation -- R_k -> 1 turns on access transistors M3 and M4.]

[Figure: READ 1 operation -- the column read circuitry senses the bit lines and interprets the result as a 1 data bit.]

[Figure: WRITE 0 operation -- R_k -> 1.]

[Figure: READ 0 operation -- the column read circuitry senses the bit lines and interprets the result as a 0 data bit.]


CMOS SRAM WRITE CKT OPERATION (M3 ON)

[Figure: SRAM write circuit -- R_k from the ROW DECODER selects the cell, and DATA with its complement is driven onto the bit lines through pass transistors gated by WB and WB'; the annotated node values (0)/(1) show the cell being forced to the written state.]

CMOS SRAM READ CIRCUIT

[Figure: differential sense amplifier (one per column) -- differential pair MA1/MA2 with tail transistor MA3 gated by Read Select, loads MA4, MA5, and MP2; the bit-line voltages V_C and V_NOT-C are its inputs, with R_k and WB/WB' selecting the accessed cell. The sense-amp gain sets how quickly a small bit-line differential is amplified to a full logic level.]

[Figure: 6T 1-bit CMOS SRAM cell in its column, with row select R_k.]

HISTORICAL EVOLUTION OF THE DRAM CELL

[Figure: the static cell's pull-up transistors (two per column) are moved out of the cell, turning the static RAM cell into a dynamic RAM cell.]

[Figure: DRAM cell evolution from 6 transistors down to 3, with intermediate cells built from transistors M1-M4.]

INDUSTRY STANDARD 1T-1C DRAM CELL

[Figure: 1T-1C DRAM cell -- access transistor M1 plus a storage capacitor.]

NOTE: Two-poly capacitors have very low dissipation.

3-T DRAM CELL

[Figure: 3T DRAM cell -- write bit line V_C with write select (WS), storage node V_B, and read bit line V_NOT-C with read select (RS).]

Uses a three-phase non-overlapping clock scheme with PC = pre-charge, Φ1 = RS = read, and Φ2 = WS = write. NOTE: the bit lines are no longer complements.

3-T DRAM CELL OPERATION - cont.

[Figure: precharge (PC = 1) charges both bit-line capacitances, the write line V_C and the read line V_NOT-C (C_NOT-C); during WRITE 0 (WS = 1, RS = 0) with DATA = 0, the storage node V_B is discharged and M2 conducts no current (I = 0). C_C >> C.]

CHARGE SHARING IN 3T DRAM WRITE 1

[Figure: when WS turns on, the bit-line capacitance C_C (at V_C) shares charge with the storage capacitance C (at V_B).]

When WS = 1:

V_R = (C_C V_C + C V_B) / (C_C + C)

Since C_C >> C:

V_R ≈ V_C, independent of V_B
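A one-line check of the charge-sharing result (the C_C and C values are assumptions chosen so that C_C >> C, as on the slide):

```python
# Charge sharing when WS turns on: V_R = (C_C*V_C + C*V_B) / (C_C + C).
# With C_C >> C, V_R ~= V_C regardless of the old stored voltage V_B.

def v_r(v_c, v_b, c_c=1e-12, c=50e-15):  # assumed: 1 pF bit line, 50 fF cell
    return (c_c * v_c + c * v_b) / (c_c + c)

for v_b in (0.0, 5.0):
    print(f"V_B = {v_b:.0f} V -> V_R = {v_r(5.0, v_b):.2f} V")  # both near V_C = 5 V
```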

3-T DRAM CELL OPERATION - cont.

[Figure, WRITE 1 (WS = 1, RS = 0): DATA = 1 turns on M_DATA (and M1), charging the previously discharged storage capacitance C. READ 1 (WS = 0, RS = 1): the stored V_B keeps M2 ON, so the precharged read bit line C_C discharges through M2; a falling V_data-out is interpreted as a 1. The 3-T DRAM cell is inverting, and Read 1 is non-destructive. C_C >> C.]

3-T DRAM CELL OPERATION - cont.

[Figure: READ 0 (WS = 0, RS = 1) -- the low stored V_B keeps M2 OFF, so the precharged read bit line stays high; a high V_data-out is interpreted as a 0. Read 0 is non-destructive, and the 3-T DRAM cell is inverting.]

3-T DRAM CELL OPERATION - cont.

[Figure: write-1 and write-0 timing waveforms for WS, D_in, and D_out -- D_in is captured while WS is high (WRITE), and D_out delivers the inverted stored bit on a read (READ). C_C >> C.]

1T-1C DRAM CELL OPERATION

[Figure: 1T-1C cell -- access transistor M1, gated by row line R, connects storage capacitor C (holding V_B) to the bit line of capacitance C_C. C_C >> C.]

WRITE 1 OP: D = 1, R = 1 (M1 ON) => C CHARGES to 1
WRITE 0 OP: D = 0, R = 1 (M1 ON) => C DISCHARGES to 0
READ OP: DESTROYS the bit stored on C => REFRESH is NEEDED

READ OP charge sharing:
STEP 1: pre-charge the bit-line capacitance C_C HIGH (V_BL0 = V_PRE = V_DD/2).
STEP 2: set R = 1 and detect ΔV_BL on C_C + C due to charge sharing.

R = 0 => V_BL0 = V_PRE
R = 1 => V_BL1 = (C_C V_PRE + C V_B) / (C_C + C)

ΔV_BL = V_BL1 - V_BL0 = (C_C V_PRE + C V_B) / (C_C + C) - V_PRE
      = (V_B - V_PRE) C / (C_C + C)

EXAMPLE: 1T DRAM READ OP

Assume a bit-line capacitance C_C = 1 pF, a storage capacitance C = 50 fF, and a bit-line pre-charge V_PRE = 1.25 V. Let the voltage stored on C be V_B = 2.5 V for a 1 and V_B = 0 V for a 0. Determine ΔV_BL1 for reading a 1 and ΔV_BL0 for reading a 0.

ΔV_BL1 = (C / (C_C + C)) (V_B - V_PRE) = (0.05 pF / (1 pF + 0.05 pF)) (2.5 V - 1.25 V) = +60 mV
ΔV_BL0 = (C / (C_C + C)) (V_B - V_PRE) = (0.05 pF / (1 pF + 0.05 pF)) (0 V - 1.25 V) = -60 mV
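The same arithmetic in a short script (all values from the example):

```python
# 1T-1C DRAM read signal: delta_V_BL = (C / (C_C + C)) * (V_B - V_PRE)

C_C   = 1e-12     # bit-line capacitance: 1 pF
C     = 50e-15    # storage capacitance: 50 fF
V_PRE = 1.25      # bit-line pre-charge voltage (V)

def delta_v_bl(v_b):
    return C / (C_C + C) * (v_b - V_PRE)

print(f"read 1: {delta_v_bl(2.5) * 1e3:+.0f} mV")   # +60 mV
print(f"read 0: {delta_v_bl(0.0) * 1e3:+.0f} mV")   # -60 mV
```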

SRAM vs. DRAM

Static RAM (SRAM)
+ Data is stored as long as the power supply is applied
+ Volatile or nonvolatile
- Large cells (6 T per cell) => fewer bits per unit chip area
- 4x to 10x larger than comparable DRAM
- 10x more expensive than comparable DRAM
++ Fast, due to a simple interface and efficient read/write operations
+ Used where speed is important, e.g. caches
+ Differential outputs
o Uses sense amplifiers to increase read performance

Dynamic RAM (DRAM)
++ Small cells (1 T to 3 T per cell) => more bits per unit chip area
+ 4x to 10x higher density than SRAM in the same chip area
- Periodic refresh required if DATA is stored for > 1 msec
- Volatile
- Slower, due to a more complex interface
- Row/column access is multiplexed
o Used where speed is less important than high capacity, e.g. main memory