High Speed Memory Alsc
CLK-to-WL timing is one of the most critical timings in high-speed SRAM design (about 50% of SRAM access time). It is important for both Read and Write. For Read, after the WL turns on, data from the cell appears on the bit-line, is sensed, and is sent to the data-path blocks. For Write, after the WL turns on, the data to be written is presented on the bit-line, which flips the cell. In high-speed designs the WL is "pulsed" only until the data is sensed (Read) / written (Write).
Clock to WL path
Location of Input buffers
Address Decoding/Multiplexing (PR Generation)
Row Redundancy using dynamic logic
Long RC lines are driven by big drivers switching at the regulated VDD supply, which increases current consumption in the set-up phase. If input buffers are kept close to the pad, the address sees a long RC line from the input buffer to the master/slave latch (typically kept at chip centre). Current consumption is a function of the address change pattern (more for address complement). Typical distance for a 36M (0.11um) part from the bond-pad to the middle of the chip is 7.5K.
Alternative: place the input buffer at chip centre and route the pad signal all the way. This increases the pin capacitance, as the entire 7.5K routing is seen on the bond-pad.
The entire RC line is driven by the external driver and not by the regulated VDD supply, so address switching current is independent of the address pattern.
Width of the pad signal is decided based on the ESD requirement plus pin capacitance.
The higher the width, the better for ESD, but the input capacitance becomes higher. Layout should avoid 90-degree bends for better signal waveforms; use 45-degree bends instead. Use top metal for better EM and lower capacitance. A low-cap ESD structure is required to meet the pin-capacitance spec. Shield the pad signals on both sides with supply lines to avoid delay variations due to signal switching conditions.
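The delay penalty of routing the pad signal all the way to chip centre can be estimated with an Elmore model of the distributed RC line. The sketch below is illustrative only: the per-micron resistance and capacitance values are assumptions, not numbers from this design.

```python
# Sketch: Elmore delay of a long pad-to-centre route modeled as N RC segments.
# The per-micron r and c values used below are illustrative assumptions.

def elmore_delay(r_per_um, c_per_um, length_um, segments=100):
    """Elmore delay of a distributed RC line split into equal segments."""
    dr = r_per_um * length_um / segments   # resistance of one segment
    dc = c_per_um * length_um / segments   # capacitance of one segment
    delay = 0.0
    upstream_r = 0.0
    for _ in range(segments):
        upstream_r += dr
        delay += upstream_r * dc           # each cap sees all upstream resistance
    return delay

# A 7.5K (7500 um) route with assumed 0.08 ohm/um and 0.2 fF/um:
d = elmore_delay(0.08, 0.2e-15, 7500)      # approaches 0.5*R_total*C_total
```

With these assumed parasitics the route alone contributes roughly half an RC time constant (~450ps here), which is why the driver location and metal choice matter so much.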
Issues: Shielding
Switching of lines in the flow-through path depends on external conditions, i.e. the switching pattern during setup/hold and the rise/fall times.
This effect can be severe, particularly if the pad signal is routed directly to the chip middle. The amount of dips/bumps on the supply lines depends on the rise/fall time of the signal. No logic should be placed on the shielding VDD/GND lines because of these dips/bumps. Hence minimum-width VDD/GND lines should be used for shielding to minimise the area overhead.
Master Latch
Required to meet hold time. Located in HMIDDLE to minimize hold-time skews across different addresses. Kept as close as possible to CLKGEN. IN-to-OUT delay should be minimal for good setup.
TG latch to register decoded addresses. The driver input is pulled low through 4 series nMOS, so speed suffers. High gate load on address inputs As/NAs.
Dynamic logic for the slave latch. The driver input is driven through 2 series nMOS (faster). Less input gate load on As/NAs. Area efficient (small NOR3 devices).
Master/slave latches should be placed close by so that minimum-sized devices can be used in the M-latch. Small pass-gate sizes (for CLK) in the master latch mean less gate loading for CLK and hence better hold time (1 M-latch for each address). The predecoder gate load at the input of the slave latch should be small so that the driver in the master latch (and also the pass-gate) can be kept as small as possible without sacrificing M-latch delay. Hence choice (2) for the slave latch is better.
Watch out for the connection RC (done in lower-level metal) of the clock signal to the M-latch/slave latch. Place M/S latches as close as possible to the main Clk trace (top metal). Typically an M/S latch exists for every address and control input, so the connection RC gets multiplied by the number of M/S latches. Keep the Clk connection away from the other latch connections (to reduce coupling caps).
Row-Redundancy
Evaluation is one of the bottlenecks to firing the WL (one of the gating items to fire the normal WL). Address-based redundancy is faster, but for different RD/WR address timing (e.g. QDR), PR-based redundancy is simpler.
Kept in HMIDDLE. Typically 1 redundant row per core; there is one signal carrying redundancy-eval info for every core.
A common clock controlling the WL pulse width is heavily loaded, as it has to drive decoders (with huge input cap) for both top and bottom halves of the core; local inverters are used for wave-shaping. If the clock information is instead sent through the redundancy-eval signal, it is specific to each core and hence less loaded; local wave-shaping inverters are not required.
Dynamic Row-Redundancy
Dynamic decoders have smaller area; easy to lay out in a tight space. Considerably reduces the gate load on long lines, hence sharper waveforms. Gate-sharing (series nMOS) reduces the gate load (e.g. common nMOS for PRI in the dynamic scheme). Separate WLE_NREDEN_CORESEL, as compared to a common WLE_SEL, means sharper clock waveforms without local wave-shaping inverters. For pulsed-high inputs, there are no timing-margin constraints in the dynamic scheme (evaluation during high). Less input gate loading for the dynamic scheme (3 times less loading for the SEGEN input).
Voltage sensing: the sense amplifier needs to be turned on after a delay from WL to allow the differential to build. Current sensing: the sense amplifier turns on as soon as WL turns on. Current sensing consumes more current (biasing) than voltage sensing.
Current sensing becomes risky in technologies with high leakage currents; voltage sensing can be made to work by delaying the SAMP enable further.
Two ways of connecting BL/BL_ to the SAMP: (a) through the column pass-gate (pMOS); (b) directly (no column pass-gate). In case (a), typically a 16:1 or 32:1 mux is used, i.e. 1 SAMP for 16/32 bit-line pairs.
In case (b) there is 1 SAMP per BL-BL_ pair: more circuitry in the core for the SAMP and related logic, and the SAMP layout has to fit in the BL-BL_ pitch, which favours the choice of a voltage SAMP.
The column pass-gate mux adds routing parasitics and wired-OR gate loading in the weakly driven critical path of the voltage differential on the SAMP. Typically pMOS are used, as the bit-lines are around VDD in read mode (pMOS passes 1 efficiently). A 32:1 mux involves the drain load of 32 pMOS and around 50um of routing; this adds 500ps of delay for the 80mV differential to appear on the SAMP nodes (CSM 0.13um). The WL pulse has to be wider to get the desired differential, which means a higher BL/BL_ split and hence higher precharge current / longer precharge time at the end of the read cycle. Not much speed gain with a 16:1, 8:1, or 4:1 mux.
A 16:1 mux is kept at the output of the SAMP, where the signals are strongly driven CMOS levels (faster).
The SAMP has to be laid out in the BL/BL_ pitch, so the SAMPBANK has a lot of devices packed in a tight area, and the SAMP input nodes are liable to coupling. BL/BL_ pitch (core DRC rules) and SAMP transistor rules (periphery DRC rules) vary across technologies, so the layout is difficult to port. More devices mean more leakage paths per BL/BL_ in the SAMPBANK, hence not suitable for designs with a low leakage-current spec. To minimize leakage current, the transistor length for logic in the SAMPBANK area is kept above the minimum channel length, thereby putting more constraints on the pitched layout design.
COLRED logic repairs faulty bit-lines, which translates to faulty-I/O repair. Typically a redundant bit-line can repair more than one I/O. The COLRED logic contains fuses corresponding to each column address plus I/O fuses.
Two schemes to mux the data from redundant and normal bit-lines:
(a) Provide separate RIOR lines for each redundant bit-line data-out and put a mux to select between RIOR and IOR lines. (b) Use the IOR lines to carry both normal and redundant bit-line data (wired-OR I/O line); the normal path must be disabled in case of COLRED evaluation.
No muxing at the immediate output of the SAMP; muxing at the input of the final read driver.
Scheme (a) involves a mux in the data path (post read driver), hence its ideal location is away from VMID and close to the I/O-path circuitry. Scheme (b) requires that evaluation happen with sufficient margin to disable the normal path, hence it should be kept close to the address latches for faster evaluation (in VMID).
An advantage of scheme (b) is less switching current, due to less routing from address to eval logic.
Echo Clocks
Prefetching for DDR operation
Write DIN muxing in case of address match (coherency)
Read I/O line muxing for X36, X18, X9 etc.
Output Stage
Wider data-valid window for QDR II; data is more or less coincident with K/K_ rising. The 1st data is half-cycle delayed in QDR II.
The input data-valid window required by any chip equals set-up time + hold time (tS + tH). Some ASICs/FPGAs have tS/tH of 1ns. QDR won't be able to interface with a chip with tS = tH = 1ns (data validity = 2ns), but QDR II will. A shorter data-valid window means little margin for the set-up/hold window of the interfacing logic IC.
Logic gate delay at the fast corner (higher voltage / cold temp / fast process) is about half that at the slow corner (low voltage / high temp / slow process). Hence tCO = 2.5ns (slow corner) results in tDOH = 1.2ns (fast corner). tCO includes external-clk-to-Q-latch-clk, Q-latch delay, predriver delay and output-buffer delay (quite a few gates). Hence the way to increase the data-valid window is to reduce tCO, i.e. reduce the number of gates in the clock-to-output path.
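The data-valid window arithmetic above can be made concrete with the usual DDR relation: one data eye per half cycle, shrunk by the spread between slow-corner tCO and fast-corner tDOH. A minimal sketch, assuming a 167 MHz operating frequency for illustration (the 2.5ns/1.2ns numbers are from the text):

```python
# Sketch: output data-valid window for a DDR output. The half-period minus
# the (tCO - tDOH) corner spread gives the eye width. The 167 MHz frequency
# is an assumed example; tCO/tDOH are the slow/fast corner values above.

def data_valid_window_ns(freq_mhz, tco_ns, tdoh_ns):
    half_period = 1000.0 / freq_mhz / 2.0   # DDR: one data eye per half cycle
    return half_period - (tco_ns - tdoh_ns)

w = data_valid_window_ns(167, 2.5, 1.2)     # ~1.7ns eye with these numbers
```

Cutting tCO (fewer gates in the clock-to-output path, or a DLL) directly widens the window, which is the motivation developed in the following slides.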
Traditional sync SRAMs have a single clock to control input and output. Flight time delays both the clock and the output data for the farthest SRAM.
Flight-time variation makes it difficult for the controller to latch the data.
Input clocks (K/K_) for command/address registration; output clocks (C/C_) for read output timing.
Separate command and I/O clocks eliminate flight-time differences for data (to be latched at the controller). The farthest SRAM is clocked for I/O first, and the returned clock is used to latch the data.
To avoid loading, separate C/C_ clocks can be routed from the controller to the 2 SRAMs. The nearest SRAM has the maximum skew between command and I/O clocks (spec for K/ to C/).
The DLL makes output data coincident with the rising edges of the C and C_ clocks (low tCO). Internally, DLL clocks are generated tCO earlier than the output clocks. The external tCO and tDOH specs come from the slow and fast corners respectively. Because of the small positive tCO, tDOH is negative.
DLL Issues
The DLL needs some number of clock cycles to lock after power-up (1024 cycles). The DLL is tuned for a particular frequency range, which puts a minimum-frequency spec on the QDR, unlike conventional sync SRAMs. The DLL can be turned off by a pin called DOFF; QDR II timing then becomes similar to QDR.
In QDR, hold time is positive (1.2ns). Hence the QDR controller can use K_/ to latch the 1st data, K/ to latch the 2nd data, and so on.
CQ & CQ_: free-running clocks, same frequency as the input clocks (C-C_ or K-K_), with a constant timing relationship between the data coming out of the QDR and the echo clocks.
The echo clock helps in generating DLL clocks that are in advance of the external clocks. Aim of the DLL: make CQ/CQ_ and Q coincident within tCO (0.45ns) of C/C_.
CQ/CQ_ are specific to each SRAM. This allows a point-to-point link between controller and SRAMs for data latching.
Constant timing relation between data and CQ clocks, so data can be latched with 100% accuracy. The rising edge of the echo clocks always occurs 0.1ns/0.2ns before the data. Echo clocks can be delayed to arrive centered with the data at the controller input, for equal tS/tH. This allows the use of QDR SRAMs from multiple sources.
Corresponding to read address A1, four data are output (Q11, Q12, Q13, Q14). Q11 and Q12 are accessed simultaneously in the Read-A1 cycle; similarly Q13-Q14 are accessed simultaneously in the next cycle (prefetch). Hence for a X36 (36-output) part, the internal data path contains 72 read data lines.
Read address A3 = write address A2 (previous match): Q11 = D21; Q12 = D22.
Read address A1 = write address A2 (current match): Q31 = D21; Q32 = D22. In both cases the write DIN should be routed to DOUT, ignoring the memory read.
I/O Muxing
Typically X36, X18, X9 options are provided on a single die (with bond option), hence the muxing. More options mean more muxing logic.
Q-Latch
Input(s) change every clock cycle (prefetch); outputs change every half cycle (DDR operation). The output driver is CMOS. Output-enable information is combined before the data goes to the Q-latch, to prevent junk data being driven on the output pins. For tristate: pMOS gate = VDD, nMOS gate = GND. Separate latches for the pull-down and pull-up paths, with separate set/reset logic for tristate. Logic should be added to tristate the output during power-up.
Programmable Impedance
Output drivers can be configured to have variable impedance (between 35 and 75 Ohms) to match load conditions. The chip reconfigures its output impedance to match the load every 1024 cycles, to track supply-voltage and temperature variations.
Output impedance can be changed on the fly during active read cycles (the pull-down nMOS is configured while the output data is '1', and vice versa).
Output predriver/driver
Ideally, separate I/O should make zero DIN latency possible. But zero-cycle write latency makes data coherency (latest data output) an issue.
Tight conditions in case of address match, i.e. A1 = A2: D21 would have to be routed through the mux to the output buffer within half a cycle.
A1 = A2 is not a match condition requiring DIN information, since Q11 comes before D21.
A2 = A3 is a valid match; D21/D22 arrive much before they are required to be routed to the output.
The WL corresponding to the write can't be turned on in the A2 cycle, since the data is delayed by 1 cycle. A read command can be given with a different address (A3) during the D21-D22 cycle, so the write can't be performed in the D21-D22 cycle either. Hold the data in registers till the next valid write command is given: Write-A3 is done when the Write-A4 (i.e. next valid write) command is given.
Interesting scenario
There is always one unfinished write command. The write address to be written (till the next valid write) is stored in a register which gets updated only on a valid write.
An address-match scenario can happen after any number of cycles (read address = unfinished write address).
D21-D22 are written together in the 1st half cycle; D23-D24 are written together in the 3rd half cycle.
The write is actually performed at the next write command, i.e. D21-D22 are "actually" written in the 1st half of the Write-A5 cycle, and D23-D24 are written in the 1st half of the next DSEL cycle (D51). Minimum 2 cycles between successive writes.
Data could be written in the same cycle as given, but to simplify the design, D21-D22 are written together during the Write-A5 half cycle. Minimum one cycle between successive writes.
There is a half-cycle delay between DIN at the pin and DIN actually being written in B2/B4 respectively; the DIN path is not speed critical.
Connecting BL/BL_ to the WRTDRV (2 ways): (a) through the column pass-gate (nMOS); (b) directly (no column pass-gate).
In case (a), typically a 16:1 or 32:1 mux is used, i.e. 1 WRTDRV for 16/32 bit-line pairs.
In case (b) there is one WRTDRV per BL-BL_ pair: more circuitry in the core for the WRTDRV and related logic, and the WRTDRV layout has to fit in the BL-BL_ pitch.
Since either BL or BL_ is driven to GND to write the cell, an nMOS mux is used. During write, a 0.7/0.13 nMOS device comes in series with the big write-driver pulldown (w = 3.4), effectively reducing its strength to flip the cell. At 1.2V/TT/120C, this series nMOS adds about 800ps of delay from TBUS \ to BL \ (16:1 CPG and 20u TBUS length). Not much speed gain with a 16:1, 8:1, or 4:1 mux.
There is no pull-up in the WRTDRV: since SEN2 is low during write, the SAMP pMOS keeps BL/NBL high. If BL is driven low by turning on MN8, this low passes (weakly) through the SEN2 pass-gate MP58 and turns on the pull-up of inverter I4, which keeps NBL at VDD.
Like the SAMP, the write driver has to be laid out in the BL-BL_ pitch. Its internal nodes are at digital levels (VDD/GND), unlike the analog voltages (voltage differential) in the SAMP, so there are fewer layout constraints/requirements. Since the logic is repeated for every BL/BL_ pair, channel lengths are kept above the minimum length to limit leakage current (standby-current spec).
Bit-lines are precharged to VDD between active cycles (RD/WR). During write, either BL or NBL is driven fully to ground, hence the BL swing during equalisation is very high. Write-to-Read equalisation is one of the most important critical timings in high-speed SRAMs. During read, because of the pulsed WL, typical BL/NBL splits are around 200+ mV, hence precharge after read is not critical.
Routing for the back-end EQ is longer, hence a smaller pMOS is used for back-end EQ.
The size of the LOGIC EQ pMOS is determined by the WR-to-RD timing. NEQ sees a big load; rise/fall time is an issue for faster EQ.
Equalisation takes more time only for the BL or NBL which is driven low.
Typically only a few bit-lines are driven low in a coreseg (e.g. in the ALSC QDR SRAM only 6 out of 192 bit-lines are driven low during WR).
The CPG selects which BL/NBL is driven low during write, hence it can be used to selectively turn on big EQ devices during precharge. Back-end EQ devices can be sized to take care of equalisation after read. Hence, to save current, big EQ devices should be turned on only after a write.
The big EQ device turns on only for the BL/NBL pair being written; EQ is less loaded.
The latch should be powered up so that NEQ_CPG = high. 8 transistors per BL/NBL; 2 leakage paths per BL pair. Difficult for pitched layout.
Fewer devices. NEQ_CPG goes low only for CPG = 1 during EQ high; NEQ_CPG floats for BLs with CPG = 0 during EQ high.
During standby, NEQ_CPG floats low for the last-written bit-line but floats high for all other BLs. Only 1 leakage path. Easy for pitched layout (3 transistors).
EQ behaviour changed: the default is low, with a self-timed pulse after WL \. Ensure EQ is low during power-up. NEQ_CPG floats high for BLs with CPG = 0 during EQ high. During standby, NEQ_CPG is taken solidly high (better!).
Separate supply for the SAMP to isolate its switching noise from the regular VDD rail. Max VDD under the minimum-current condition.
Decaps kept under top and (top-1) metal lines increase the capacitance by < 2% if thin orthogonal Metal1 is used for the supply connection of the decaps. A higher L for decaps means more decap can be laid out in a given area, but decap effectiveness reduces because of the series resistance due to the higher L. The length of decaps should be kept around 12 times the minimum channel length, to balance the parasitic series resistance of the decap transistor against the amount of decap. Put decaps on the VDDQ/VSSQ bus as part of the output-buffer layout. Keep decaps a little away from ESD structures, as they tend to store charge during an ESD event.
Layout: Drivers
Avoid keeping too many big drivers nearby which are likely to switch simultaneously.
Top metal has a higher thickness, hence the highest coupling cap. Top-metal and (top-1)-metal capacitance differ by 10% for a routing length of 3.5K. Alternate routing between top and (top-1) metal: use top metal for relatively longer distances and (top-1) metal for shorter ones. Delayed signals like sense clocks, I/O precharge, address pulse etc. can be routed in (top-1) metal; the routing delay can be taken into account in the overall timing. Setup-time-critical signals like redundancy info and WL clocks should be routed in top metal.
Many Thanks !!
To all QDR team members (SQ/SF; design and layout) for implementing the schemes and thorough simulations.