Memory Design I
Kia Bazargan
University of Minnesota Dept. of ECE kia@umn.edu
Slides adapted from Prof. Chris H. Kim

Registers
• Used for storing data
• Structure
  N-bit wide
  Parallel/serial read/write
  Clocked
  Static/dynamic implementation

• Register files
  Multiple read/write ports possible
  Example: 32-bit wide by 16-word deep, dual-port parallel read, single-port parallel write register file
[Figure: register file, 16 words of 32 bits each] [©Hauck]
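A minimal behavioral sketch of the register file example above (16 words of 32 bits, two read ports, one write port). The class name `RegisterFile` and its method names are hypothetical, chosen only for illustration; the model ignores clocking and timing.

```python
class RegisterFile:
    """Behavioral model: 16 words x 32 bits, two read ports, one write port."""

    def __init__(self, num_words=16, width=32):
        self.num_words = num_words
        self.mask = (1 << width) - 1      # keeps stored values N bits wide
        self.words = [0] * num_words

    def read(self, addr_a, addr_b):
        # Dual-port parallel read: two words are returned in the same access.
        return self.words[addr_a], self.words[addr_b]

    def write(self, addr, value):
        # Single-port parallel write: one whole word is stored per access.
        self.words[addr] = value & self.mask


rf = RegisterFile()
rf.write(3, 0xDEADBEEF)
rf.write(7, 0x1_0000_0001)   # a 33-bit value is truncated to 32 bits
word_a, word_b = rf.read(3, 7)
```

Reading two addresses in one call stands in for the two read ports; a hardware register file would provide them through separate bit lines and word lines per port.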

EE 5324 - VLSI Design II - © Kia Bazargan


Implementing Registers Using Logic Gates
• Flip-flops
  Simple SR latch
  [Figure: SR latch with inputs S, R and outputs Q, Q']
  Flip-flops
  o JK, D, T
  o Clocked
  o Master-slave (edge-triggered)

Implementing Registers in CMOS
• Direct gate implementation too costly
A master-slave JK flip-flop uses 38 CMOS transistors

• Directly implement in transistors
  Example: clocked SR FF [Rab96] p.342
  [Figure: clocked SR flip-flop with inputs S, R, clock φ and outputs Q, Q']

  S  R | Q  Q'
  1  1 | Q  Q'
  1  0 | 0  1
  0  1 | 1  0
  0  0 | x  x

  Note: carefully size the S, R and φ transistors so that we can write

Implementing Registers in CMOS (cont.)
• Another example: D latch (register)
  Uses transmission gate
  When "WR" asserted, "write" operation will take place
  Stack D latch structures to get n-bit register
[Figure: D latch built from transmission gates controlled by WR and WR', input D, output Q]

Shift Registers: Idea
• Shift registers are used for iteratively shifting data
  Used in pipelining, bit-by-bit processing, etc.
[Figure: chain of latches D1, D2, D3 clocked by φ and φ']
• Problem?
  When clock goes high, the data will traverse the whole shift register chain in one clock cycle!
  Solution: use non-overlapping clocks φ1 and φ2. φ1 used by odd gates, φ2 by even gates (use transmission gates after D1', D2', D3').

Memory Architecture: the Big Picture
• Address: which one of the M words to access
• Data: the N bits of the word are read/written
[Figure: address decoder with inputs A0, A1, ..., Ak-1, k = log2(M), driving word select lines S0, S1, ..., SM-1; storage cells Word 0, Word 1, ..., Word M-1, each N bits]

Memory Access Timing: the Big Picture
• Timing:
  Send address on the address lines, wait for the word line to become stable
  Read/write data on the data lines
[Figure: READ and WRITE signals and DATA bus showing read cycle, read access, write cycle, write access, data valid, data written] [©Prentice Hall]
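The shift-register race described earlier (data traversing the whole chain while the clock is high) can be illustrated with a small behavioral sketch. It assumes each stage is a pair of pass gates, the odd one clocked by φ1 and the even one by φ2; the function names and the list-of-bits representation are hypothetical, and no analog behavior is modeled.

```python
def shift_transparent(chain, d_in):
    """Single clock, all latches open at once: the input races through every stage."""
    # Each open latch passes its input straight to the next one,
    # so every stage ends up holding the new input bit.
    return [d_in for _ in chain]


def shift_two_phase(chain, d_in):
    """One full cycle of non-overlapping phi1 then phi2."""
    # phi1 high: each odd gate samples the previous stage's output (or d_in).
    sampled = [d_in] + chain[:-1]
    # phi2 high: each even gate copies the sampled value to the stage output.
    # phi1 is low by now, so the new outputs cannot ripple any further.
    return list(sampled)


state = [0, 0, 0]
raced = shift_transparent(state, 1)      # data reaches every stage in one cycle
step1 = shift_two_phase(state, 1)        # data advances by exactly one stage
step2 = shift_two_phase(step1, 0)
```

With two phases the bit moves one stage per cycle, which is the behavior a shift register is supposed to have.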

Memory Cell Array Layout
• Memory performance (speed)
  Storage cell speed (read, write)
  Data bus capacitance
  Periphery: address decoders, sense amplifiers, buffers
• Memory area
  Cell array layout
• How to layout the cells array?
  Linear is bad:
  o Long data busses → large capacitance
  o A lot of cells connected to data bus
  o Decoder will have a lot of logic levels
[Figure: linear array; decoder with inputs A0, A1, ..., Ak-1 selects S0, ..., SM-1 for Word 0, ..., Word M-1, each N bits, followed by SenseAmp / Drivers]

Memory Cell Array Interface: Example
• Memory parameters: 16-bit wide, 1024-word deep
• Accessing word 9
  Address = 0000001001 (binary)
[Figure: decoder with A0 = 1, A1 = 0, A2 = 0, A3 = 1, ..., A9 = 0; S9 = 1 and all other select lines S0, ..., S1023 = 0; Word 0, ..., Word 1023, each 16 bits, followed by SenseAmp / Drivers]

Memory Cell Array Layout (cont.)
• Group the M words into M/L rows, each containing L words
• Benefits?
[Figure: row decoder with inputs A(log L), ..., Ak-1 selects rows S(0..L-1), S(L..2L-1), S(2L..3L-1), ..., S(M-L..M-1); each row holds L words of N bits, each column has its own SAmp/Drv; column decoder + MUX with inputs A0, ..., A(log L - 1) selects one N-bit word]

Memory Cell Array Access Example
• word = 16-bit wide (N), row = 8 words (L), address = 10 bits (k)
  M/L = 1024/8 = 128 rows
  address: row address = k − log2(L) = 7 bits (A3..A9), column address = log2(L) = 3 bits (A0..A2)
• Accessing word 9 = 0000001001 (binary)
[Figure: row decoder with A3 = 1, A4 = 0, ... selects row S(8..15) out of S(0..7), S(8..15), S(16..23), ..., S(1016..1023); column decoder + MUX with A0 = 1, A1 = 0, A2 = 0 selects Word 9; rows hold Word 0..7, Word 8..15, Word 16..23, ..., Word 1016..1023; 16-bit SAmp/Drv per column]
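The row/column split in the access example above can be worked through in a few lines. This is a sketch under the slide's parameters (M = 1024 words, L = 8 words per row); the function name `split_address` is hypothetical, and L is assumed to be a power of two.

```python
def split_address(addr, words_per_row=8, num_words=1024):
    """Split a word address into (row, column, number of rows)."""
    assert 0 <= addr < num_words
    col_bits = words_per_row.bit_length() - 1   # log2(L) for a power-of-two L
    column = addr & (words_per_row - 1)         # low log2(L) bits -> column decoder + MUX
    row = addr >> col_bits                      # remaining high bits -> row decoder
    num_rows = num_words // words_per_row       # M / L
    return row, column, num_rows


row, column, num_rows = split_address(9)
print(row, column, num_rows)
```

For word 9 the low three bits (001) select column 1 and the remaining bits select row 1, the row holding words 8 through 15.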

Decreasing Word Line Delay
• Word line delay comes into play!
  We used to have long busses, made 2D array → shorter busses
  But, longer word lines!
• How to decrease the delay on the word lines?
  Break the word line by inserting buffers
  Place the decoder in the middle
[Figure: (a) Drive the word line from both sides: polysilicon word line with metal word line; (b) Use metal bypass: polysilicon word line with metal bypass] [Rab96] p. 558 [©Prentice Hall]

Decreasing Word Line Delay (cont.)
• Place the decoder in the middle
• Add buffers to outputs of decoder
[Figure: Array-Structured Memory Architecture; k address lines feed a decoder placed between two memory cell arrays] [©Hauck]

Hierarchical Memory Structure
• Taking the idea one step further
  Shorter wires within each block
  Enable only one block addr decoder → power savings
[Figure: blocks with Blk EN inputs selected by the block address; row address and column address shared; global bus to global drivers / sense amplifiers]

Semiconductor Memory Classification
  Read-Write Memory
    Random Access: SRAM, DRAM
    Non-Random Access: FIFO, LIFO, Shift Register, CAM
  Non-Volatile Read-Write Memory
    EPROM, E2PROM, FLASH
  Read-Only Memory
    Mask-Programmed, Programmable (PROM)

Read-Write Memories (RWM)
• Basic storage elements of semiconductor memory
  SRAM: cell has gain, 6T, differential, logic compatible, FAST, LOW POWER
  DRAM: cell has no gain, 1T, single ended, DRAM process, refresh, slow, DENSE

Memory Scaling Trend
• High density is the primary design goal for memories
• Low voltage operation is essential for low power
[Figure: memory scaling trend] Itoh, IBM R&D, 2003

Memory Scaling Trend
• Long retention time → low Ioff
 – High Vt is required
• Fast access time → high Ion
 – High Vgs-Vt is required
• Vdd cannot be scaled down aggressively for low power consumption
[Figure: memory scaling trend] Itoh, IBM R&D, 2003

Why SRAMs are Important
[Figure: chip area split between cache and core logic at 0.18μm, 0.13μm and 0.09μm]
• Memories have better power efficiency compared to logic
• ~9.9B out of 10B transistors will be used for SRAMs
• Company with better SRAM design will dominate

Why SRAMs are Important
[Figure: normalized ION vs. normalized IOFF for NMOS and PMOS, 150nm, 110°C; about 2X spread in ION, 100X spread in IOFF]
$\sigma_{V_t} \propto \frac{1}{\sqrt{\text{Area}}} = \frac{1}{\sqrt{WL}}$ (Taur, Ning)
• Area is the number one concern → minimum sized devices
• Smaller devices have larger variation
• Delay variation, stability, leakage is a problem
• Central limit theorem doesn't hold (σ/μ)

Positive Feedback: Bi-Stability
[Figure: two cross-coupled inverters with Vo1 = Vi2 and Vo2 = Vi1; voltage transfer curves intersect at operating points A, B and C]

Meta-Stability
[Figure: a deviation δ from the operating points A, B and C on the curves Vi2 = Vo1 and Vi1 = Vo2]
Gain should be larger than 1 in the transition region

SRAM Memory Cell
[Figure: 6T SRAM cell with wordline WL and bit lines BL, BLB]
• NMOS access transistors
• Read and write uses the same port: need sufficient margins
• One wordline to access cell
• Two bit lines (BL, BLB) to carry the data
• Almost minimum size transistors for small cell area

SRAM Read Operation
[Figure: cell during read, WL = '1', BL and BLB precharged to '1', stored nodes at 0 and '1']
• Both bit lines are precharged to Vdd
• Wordline is fired for one of the cells on bit line
• Cell pulls down either BL or BLB
• Sense amp regenerates the differential signal
• Data should not flip after read access
• Driver TR must be stronger than access TR

SRAM Read Operation
[Figure: waveforms of WL, BL/BLB (about 50mV difference) and SA out] Murmann class notes
$\text{bitline delay} = \frac{C_{bitline} \, \Delta V_{bitline}}{I_{cell}}$
• For high density, large number of cells share bitline and wordline
 – Subarray organization for 32Kb: 128 WL's, 256 BL's

SRAM Read Operation
• Cbitline is large due to large number of cells attached
• Icell is small due to high density cells
• ∆Vbitline has to be minimized for high speed
 – < 100mV bitline voltage difference generated by SRAM cell
 – Let the sense amplifier finish the job
 – Increased noise sensitivity, circuit complexity
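A worked instance of the bitline delay relation, bitline delay = Cbitline · ΔVbitline / Icell, shows why the swing is kept small. The numbers below (200 fF bitline, 20 µA cell current, 1 V full swing) are assumed for illustration and do not come from the slides; only the 100 mV swing is taken from the bullet above.

```python
def bitline_delay(c_bitline, delta_v, i_cell):
    """Time for a constant cell current to develop delta_v on the bitline (seconds)."""
    return c_bitline * delta_v / i_cell


C_BITLINE = 200e-15   # assumed bitline capacitance: 200 fF
I_CELL = 20e-6        # assumed cell read current: 20 uA

t_small_swing = bitline_delay(C_BITLINE, 0.1, I_CELL)   # 100 mV, then the sense amp takes over
t_full_swing = bitline_delay(C_BITLINE, 1.0, I_CELL)    # assumed 1 V full-rail swing
print(f"{t_small_swing * 1e9:.2f} ns vs {t_full_swing * 1e9:.2f} ns")
```

Under these assumptions the small-swing read is about ten times faster than waiting for a full swing, since the delay scales linearly with ΔVbitline.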

SRAM Read Operation: Precharge
[Figure: three bitline precharge circuits (a), (b) and (c); precharge levels Vdd, Vdd and Vdd-Vtn]

SRAM Read Operation: Precharge
• Option (a)
 – Similar to dynamic logic precharge
 – Balance transistor to equalize bitline voltages
 – Short wordline pulse required to limit bitline swing
• Option (b)
 – Pseudo-NMOS type circuit
 – Bitline voltage clamped during read
• Option (c)
 – NMOS pullup instead of PMOS
 – Precharge levels are limited to Vdd-Vt
 – Can't operate at low Vdd

SRAM Cell Read Margin
[Figure: cell during read with WL and both bit lines at Vdd; the node storing '0' rises to Vx]
• When cell is not accessed (WL=0)
 – Data is safely kept inside the cell
 – High noise margin
• When cell is accessed (WL=Vdd)
 – Access transistor acts as a noise source
 – Data '0' is pulled up to Vx
 – Cell data can flip if Vx rises above Vtn
  Destructive read problem

Static Noise Margin
[Figure: butterfly curves, V(QB) vs. V(Q), for a cell with good SNM and a cell with bad SNM] E. Seevinck, 1987, JSSC
The size of the largest square enclosed in the butterfly curves = read static noise margin

CMOS SRAM Analysis (Read)
[Figure: 6T cell (M1, M4, M5, M6) during read with WL and both bit lines at VDD, bit line capacitance Cbit, Q = 0 and Q' = 1] J. Rabaey

Techniques to Improve Read Margin
Cell beta ratio = (W/L)drv / (W/L)access
• Increasing the size of the driver NMOS improves read margin
• But remember, area is the number one constraint in memory design
• Increasing cell size is not a good trade off

Techniques to Improve Read Margin
• High Vt transistors
 – Internal node on low side needs to rise to Vt or more to flip the cell
 – Virtually never happens when Vt is larger than half Vdd
 – Cell is extremely stable at ultra-low power design point
 – Beta ratio constraint is relaxed → smaller driver and larger access TR can be used for faster read and write
[Figure: butterfly curves comparing SNM for low Vt and SNM for high Vt]

Techniques to Improve Read Margin
• Boosted cell supply
 – Supply voltage of SRAM cell is higher than outside Vdd
 – Makes driver stronger than access, suppressing the rise in the low side
 – Effectively improves the beta ratio
 – Driver NMOS can be downsized, decreasing cell size
[Figure: cell supplied from Vdd+∆ while the periphery uses Vdd]

SRAM Write Operation
[Figure: cell during write, WL = '1', one bit line at '1' and the other at '0']
• Launch the write data on BL and BLB
• Word line signal is fired
• Low bit line value flips cell data
• Access TR must be stronger than PMOS load

CMOS SRAM Analysis (Write)
[Figure: 6T cell (M1, M4, M5, M6) during write with WL = VDD, BL = 1, BL' = 0, Q = 0 and Q' = 1] J. Rabaey
PR = (W/L)4 / (W/L)6

SRAM Cell Write Margin
[Figure: write margin transfer curves, axes from 0 to Vdd]

Techniques to Improve Write Margin
• Sizing: access TR vs. PMOS in latch
• Higher WL voltage for access TR
• Virtual VDD
[Figure: cell with a higher voltage applied to WL]

Sizing
Pull-up ratio = (W/L)pmos / (W/L)access
• Access transistor must be stronger than PMOS to pull the internal node below the trip point (typical pull-up ratio ~ 1.5)
• To avoid cell size increase, correct pull-up ratio achieved by controlling Vtn and Vtp

6T-SRAM Layout: Until 90nm
[Figure: cell layout with BL, BLB, VDD, GND and WL]

6T-SRAM Layout: From 65nm
[Figure: cell layout]
  Compact cell
  Bitlines: M2
  Wordline: strapped in M3

6T versus 4T SRAM
6T SRAM Cell
  Supply current is limited to the leakage current of transistors in the stable state
4T SRAM Cell
  High degree of compactness
  High power consumption

RAM Variations
• Many variations to the basic 6T SRAM cell
• More functionality, smaller cells
 – Dual read or single write cell
 – True multi-ported cell
 – Content addressable memory (CAM)
 – 4T memory cell
 – 3T memory cell
 – 2T memory cell
 – 1T DRAM cell

Dual Read or Single Write Cell
[Figure: cell with two wordlines WL0, WL1 and bit lines BL, BLB]
• Two wordlines, one for each access transistor
• Small increase in cell size
• Can either
 – read two different cells in one cycle
 – or write to one cell

Multi Port Cell
[Figure: cell with wordlines WL0, WL1 and bit lines BL0, BLB0, BL1, BLB1]
• Each port has separate address
• Memory access bandwidth is twice (ideally)
• "Write through": data written can be read by another port in the very same cycle

Multi-Port RAM Cells Array
• 7 words deep, 2-bit wide words, dual port mem
• To read from word j and write "d1d0" to word k simultaneously:
  Set SAj=1, and all other SA's=0
  Set SBk=1, and all other SB's=0
  Sense the values on bus_A0 and bus_A1
  Write d1d0 to bus_B0 and bus_B1
[Figure: array with select lines SA0, SB0, SA1, SB1, ..., SA7, SB7 and buses bus_A0, bus_A1, bus_B0, bus_B1]

Content Addressable Memory (CAM)
• Instead of address, provide data → find a match
  Applications: cache, physical particle collider
• Needs "Encoder":
  Inverse function of decoder
  Take a one-hot collection of signals and encode them
[Figure: content addressable memory cell array, m-bit wide words, 2^n rows, followed by an encoder with an n-bit output] [©Hauck]
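The CAM lookup and the encoder step can be sketched behaviorally as follows. The function name `cam_lookup` and the choice to return `None` when the match lines are not one-hot are assumptions made for this illustration; the slides do not say how multiple or missing matches are handled.

```python
def cam_lookup(words, key):
    """Return (match_lines, encoded_address) for a CAM holding `words`."""
    # One match line per row: 1 only if the stored word equals the key in every bit.
    match_lines = [1 if word == key else 0 for word in words]
    if sum(match_lines) != 1:
        # No match or several matches: the lines are not one-hot,
        # so this sketch reports no valid address.
        return match_lines, None
    # Encoder: inverse of a decoder, turns the one-hot lines into a row number.
    return match_lines, match_lines.index(1)


stored = [0b1010, 0b0111, 0b0001, 0b1100]
hit_lines, hit_addr = cam_lookup(stored, 0b0001)
miss_lines, miss_addr = cam_lookup(stored, 0b1111)
```

The list comprehension plays the role of the per-row match logic, and `index` plays the role of the encoder.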

Content Addressable Memory Cell
• Read and write like normal 6T memory cell
• Match signal is precharged to 1, pulled to 0 if no match
  Send data on bit' and data' on bit for matching
  Match remains 1 iff all bits in word match
[Figure: two CAM cells, each with row select, match, bit and bit' lines] [©Hauck]

Encoders
[Figure: content addressable memory cell array followed by an encoder]

Smaller RAM Cells
[Figure: two reduced-transistor memory cells]
(left cell in figure)
• Internal nodes don't go to Vdd
• Cell won't work at low Vdd
• High value stored is degraded
• Effective strength of NMOS driver is reduced
(right cell in figure)
• Need 2 wordlines, read WL and write WL
• Can have 1 or 2 bitlines (Read/Write)
• Not very small, since it has more wires
• Refresh needed

1-T DRAM Cell
[Figure: cell with word line WL, bit line BL, transistor M1, storage capacitor CS and bit line capacitance CBL; waveforms for "Write 1" and "Read 1": WL, node X rising from GND to VDD − VT, BL at VDD during write and at VDD/2 before sensing]
Write: CS is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between bit line and storage capacitance
$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$
Voltage swing is small, typically around 250 mV.
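The charge-redistribution read-out, ΔV = (VBIT − VPRE) · CS / (CS + CBL), can be evaluated numerically. The component values below (CS = 50 fF, CBL = 200 fF, VPRE = 1.25 V, a stored "1" of 1.9 V) are assumptions picked for this sketch and are not taken from the slides.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bitline voltage change after the cell shares its charge with the bitline."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


V_PRE = 1.25      # assumed precharge level (VDD/2 for VDD = 2.5 V)
C_S = 50e-15      # assumed storage capacitance: 50 fF
C_BL = 200e-15    # assumed bitline capacitance: 200 fF

dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)   # reading a stored 0
dv_one = bitline_swing(1.9, V_PRE, C_S, C_BL)    # reading a stored 1 at VDD - VT
print(f"{dv_zero * 1e3:.0f} mV, {dv_one * 1e3:.0f} mV")
```

With these values a stored 0 moves the bitline by about 250 mV downward and a stored 1 by about 130 mV upward; a larger CBL relative to CS shrinks both swings.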

DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
• DRAM memory cells are single ended in contrast to SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amp Operation
[Figure: bit line voltage VBL vs. time t, starting at VPRE; a small difference ΔV(1) develops after the word line is activated, and the bit line is driven to V(1) or V(0) after the sense amp is activated]

1-T DRAM Cell
[Figure: cross-section and layout; capacitor, transistor M1, metal word line, SiO2, field oxide, n+ regions, poly, inversion layer induced by plate bias, diffused bit line, polysilicon gate, polysilicon plate] [©Prentice Hall]
  Uses Polysilicon-Diffusion Capacitance
  Expensive in Area

Dynamic RAM 1-Transistor Cell: Layout
[Figure: cell layout]

Advanced 1-T DRAM Cells
[Figure: Trench Cell; cell plate Si, capacitor insulator, storage node poly, refilling poly, Si substrate, 2nd field oxide]
[Figure: Stacked-capacitor Cell; word line, insulating layer, cell plate, capacitor dielectric layer, transfer gate, isolation, storage electrode]

Good References on RAM
• K. Itoh, VLSI Memory Chip Design, Springer-Verlag New York, LLC
• Y. Nakagome, M. Horiguchi, T. Kawahara, and K. Itoh, Review and future prospects of low-voltage RAM circuits, IBM J R&D, Vol. 47, No. 5/6, 2003.
• R. W. Mann, W. W. Abadeer, M. J. Breitwisch, O. Bula, et al., Ultralow-power SRAM technology, IBM J R&D, Vol. 47, No. 5/6, 2003.
