
Gain-Cell embedded DRAM: An alternative option for embedded memories
Prof. Adam Teman
Co-director EnICS Labs, Bar-Ilan University

14 October 2021
Who are we?

Prof. Alex Fish Prof. Adam Teman

Prof. Yossie Shor Dr. Itamar Levi

Prof. Osnat Keren Dr. Leonid Yavits
© Adam Teman, October 14, 2021
Outline
• Embedded Memories
• GC-eDRAM
• DRT and Refresh
• Other Designs
• Summary

Embedded Memories
The Computer Memory Hierarchy

The Importance of Embedded Memories
• Memories dominate area and power.
• Intel Pentium-M (2001): 2MB L3 Cache (Source: Intel)
• Intel 10th Gen “Comet Lake” (2020): 20MB L3 Cache (Source: Intel)
• IBM z15 CP and SC chips (2020): 256MB L3 Cache, 960MB L4 Cache
• Cerebras Wafer Scale Engine 2 (2021): 16GB On-Chip Memory (Source: wccftech)
Static is GOOD!
• A static circuit can restore its state after a disturbance.
[Figure: cross-coupled inverter pair (Q, QB) and its voltage transfer curves (VQ vs. VQB), showing the stable ‘0’ and ‘1’ states]
• High noise margins!
SRAM is GOOD!
• SRAM is the exclusive solution for embedded memories in most ICs.
[Figure: 6T SRAM bitcell schematic (BL, BLB, WL, M1 to M6, Q, QB); photos of TSMC 7nm SRAM, TSMC 5nm SRAM Test Chip, and Samsung 3nm GAA SRAM. Sources: TSMC, ST, ISSCC 2020, ISSCC 2021]
But… Nobody is Perfect
• SRAM is BIG
• 6 Transistors = 1 bit
• SRAM is Leaky
• Several VDD to GND paths
• SRAM is Ratioed
• Fails under voltage scaling
[Figure: 6T SRAM bitcell (BL, BLB, WL, M1 to M6) storing Q = 0V, QB = 1V]
Source: Chang (IBM), IEEE Proc. 2009
Dynamic is SMALL (and that’s GOOD!)
• DRAM can be made with a single transistor
• Up to 3X higher density than SRAM
• But the capacitor is very complicated to fabricate
• 1T-1C DRAM is fabricated in standalone chips at specialized fabs.
[Figure: trench capacitor and stacked capacitor DRAM cells. Sources: CISCO; Kang, McGraw-Hill’03]
Dynamic is complicated…
• Data retention is limited
• Lack of positive feedback in bitcell
• Leakage deteriorates levels
• Requires periodic refresh operations to ensure data integrity
• Memory availability is limited
• During refresh operations, memory cannot be accessed
$\text{Availability} = 1 - N_{rows} \cdot \frac{T_{clk}}{T_{ret}}$
• Static power is not only leakage
• Also includes refresh power: $P_{ret} = \frac{N_{rows}}{T_{ret}} (E_{read} + E_{write}) + P_{leak}$
• Retention power = leakage power + refresh power
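The two expressions above can be evaluated directly. The following is a minimal sketch, assuming one row is refreshed per clock cycle and that each refresh costs one read plus one write; the function names and the example values are hypothetical and only illustrate the arithmetic.

```python
def availability(n_rows: int, t_clk: float, t_ret: float) -> float:
    """Fraction of time the array is free for accesses: 1 - N_rows * T_clk / T_ret."""
    return 1.0 - n_rows * t_clk / t_ret


def retention_power(n_rows: int, t_ret: float, e_read: float,
                    e_write: float, p_leak: float) -> float:
    """Retention power: (N_rows / T_ret) * (E_read + E_write) + P_leak, in watts."""
    return n_rows / t_ret * (e_read + e_write) + p_leak


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    n_rows = 128       # rows in the array
    t_clk = 10e-9      # 10 ns clock period
    t_ret = 100e-6     # 100 us retention time
    e_rw = 1e-12       # 1 pJ per row read, 1 pJ per row write
    p_leak = 1e-6      # 1 uW leakage power

    print(f"availability    = {availability(n_rows, t_clk, t_ret):.4f}")
    print(f"retention power = {retention_power(n_rows, t_ret, e_rw, e_rw, p_leak):.3e} W")
```

With these assumed numbers, refreshing 128 rows takes 1.28 µs out of every 100 µs, so the array is unavailable for a little over 1% of the time.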
But SIZE does matter!
• Higher density trumps complex operation
• 1T-1C embedded DRAM
• Used (until?) recently in high-end servers
by IBM, Intel, …
• Provided in some process design kits.
• But…
• Requires expensive additional fabrication steps (deep trench capacitor).
• Not provided in many process design kits.
• Doesn’t scale well… most advanced node to date: GlobalFoundries 14nm
• Can we provide a logic-compatible embedded DRAM?
[Figure source: EET Asia]



Gain-Cell embedded DRAM

Introducing the “Gain-Cell”
• 1T-1C DRAM uses a single port for reading and writing
• Write: Drive charge through the port onto a storage capacitor.
• Read:
• Precharge bitline and enable charge sharing through the port
• The charge transferred from the storage node changes the bitline voltage
• A large storage capacitor is required to enable sensing this change
• It also destroys the stored data, requiring write-back
[Figure: 1T-1C cell with a single R/W port (WL, BL)]
• What if we provided a decoupled read port?
• We can amplify the stored charge (= “gain”)
• We can separately optimize read and write
• Read becomes non-destructive
• We get two-ported functionality
[Figure: gain cell with a separate write port (WWL, WBL) and read port (RWL, RBL)]
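To see why the 1T-1C read needs a large storage capacitor, the charge-sharing step can be worked through numerically. This is a minimal sketch using the textbook charge-conservation result for two capacitors connected together; the symbols C_S (storage capacitance) and C_BL (bitline capacitance) and all example values are assumptions, not taken from the slides.

```python
def bitline_swing(v_sn: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bitline voltage change after charge sharing between C_S and C_BL.

    Charge conservation gives a final voltage of
    (C_S * V_SN + C_BL * V_pre) / (C_S + C_BL), so the swing relative to
    the precharge level is (V_SN - V_pre) * C_S / (C_S + C_BL).
    """
    return (v_sn - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    # Hypothetical values: bitline precharged to 0.5 V, cell storing 1.0 V.
    for c_s in (2e-15, 20e-15):  # 2 fF vs. 20 fF storage capacitor
        dv = bitline_swing(v_sn=1.0, v_pre=0.5, c_s=c_s, c_bl=200e-15)
        print(f"C_S = {c_s * 1e15:.0f} fF -> bitline swing = {dv * 1e3:.1f} mV")
```

The swing scales with C_S / (C_S + C_BL), so a small storage node on a long bitline yields only a few millivolts to sense. A decoupled read port avoids this by using the storage node to control a read transistor instead of sharing its charge.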
Basic Gain Cell Operation
• All NMOS 2T Gain Cell
• Write Strong ‘0’, Weak ‘1’
• Boosted voltage for strong ‘1’
• Read:
• Precharge RBL, Pulse RWL low
• SN=‘0’ → RBL unchanged
• SN=‘1’ → RBL discharges
• All NMOS 3T Gain Cell
• Write is the same
• Read:
• Precharge RBL, Pulse RWL high
• SN=‘0’ → RBL unchanged
• SN=‘1’ → RBL discharges
[Figure: 2T and 3T gain cell schematics (MW, MR, SN, WWL, WBL, RWL, RBL) with write waveforms using Vboost. Annotations: “RWL driven through diffusion”; “RBL discharge dependent on other cells on row”; “RBL saturation depends on other cells in column”]
GC-eDRAM Advantages
• Compared to SRAM:
• Smaller cell size (2-4T vs. 6T)
• Low leakage
• Non-ratioed
• Two-ported
• Compared to 1T-1C DRAM:
• Logic-compatible
• Non-destructive read
• SRAM-like performance
But, charge leaks away…
• Subthreshold conduction
• Exponentially depends on MW’s VT, VGS, and temp
• Depends on voltage difference between SN and WBL
• GIDL and junction leakage
• Asymmetrical between ‘1’ and ‘0’, Increases with temperature
• Gate leakage
• Asymmetrical between ‘1’ and ‘0’, Independent of temperature

[Figure: leakage components when storing ‘1’ and when storing ‘0’]
Write access statistics
• Sub-threshold leakage depends on the relation between SN and WBL
• Scenario 1: Worst-case access
• After writing a cell, WBL is permanently opposite to stored data
• Scenario 2: Retention mode
• After writing the memory array, it remains in idle or read states, allowing WBL control → pre-(dis)charge or bias WBL
[Figure labels: Write ‘0’; Continuously Write ‘1’]
Data Retention Time Measurement
• Data Retention Time (DRT) is the time from a write until the stored data can no longer be read out correctly.
• Various approaches for measuring:
• Effective data retention time (EDRT)
• Voltage-based data retention time (VDRT)
• Current-based data retention time (IDRT)
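For intuition about the DRT definition above, a first-order estimate can be sketched as follows. It is not one of the measurement approaches listed above: it simply assumes a constant leakage current discharging the storage node, so the time to lose a given voltage margin is t = C · ΔV / I. The function name, parameters, and numbers are hypothetical.

```python
def estimate_drt(c_sn: float, delta_v: float, i_leak: float) -> float:
    """First-order retention time in seconds, assuming constant leakage.

    c_sn    -- storage node capacitance in farads (assumed value)
    delta_v -- voltage the node may lose before a read fails, in volts
    i_leak  -- total leakage current at the storage node, in amperes
    """
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_sn * delta_v / i_leak


if __name__ == "__main__":
    # Hypothetical example: 1 fF node, 300 mV margin, 10 pA leakage.
    drt = estimate_drt(c_sn=1e-15, delta_v=0.3, i_leak=10e-12)
    print(f"estimated DRT = {drt * 1e6:.1f} us")
```

In practice the leakage depends exponentially on voltage and temperature and differs between a stored ‘1’ and ‘0’, so this linear model only indicates the order of magnitude.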

Sources: R. Giterman, A. Bonetti, T. Noy, A. Teman, and A. Burg, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020
N. Edri, P. Meinerzhagen, A. Teman, A. Burg, and A. Fish, IEEE Transactions on Circuits and Systems I (TCAS-I), 2016

Dealing with Refresh

The problems with Data Retention Time
• The main barrier for GC-eDRAM is its limited DRT, which leads to:
• Increased power consumption: $P_{ret} \propto \frac{1}{DRT}$
• Lower availability: $\text{Availability} \propto 1 - \frac{T_{refresh}}{DRT}$
• This gets worse with transistor scaling, as the parasitic capacitance is reduced: $DRT \propto C_G \propto W \cdot L$
• In addition, DRT is a complex factor, as it is dependent on:
• Written voltage levels (Vboost, CI/CF)
• Read frequency: $I_{RBL} \propto V_{SN} \propto \frac{1}{T_{ret}}$
• Write statistics
• Data stored in neighboring cells (for 1T read port)
• Accordingly, a wide range of research has focused on extending the DRT
Different Bitcells
• Many combinations of bitcells have been proposed for
improving retention time and other circuit characteristics
[Figure: bitcell schematics: All PMOS 2T (Somasekhar 08, 09); 2T1D and 3T1D (Luk 04, Chang 07); Asymmetric 2T (Chun 12); Boosted 3T (Chun 09, 11); and a further cell attributed to Luk 06, Harel 21]
Dealing with CMOS Scaling
• The retention time of classic GC-eDRAM options drops significantly below 65nm
• For 28nm operation, a 4T internal-feedback gain cell (IFGC) was invented
• Silicon proven in both 28nm bulk and FD-SOI
[Figure panels: 180nm; 28nm]
Sources: R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid State Circuits (JSSC), 2018
R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017
R. Giterman, A. Teman, P. Meinerzhagen, A. Burg, and A. Fish, US Patent 10,002,660
Different Technologies
• Bulk CMOS technologies suffer from increasing subthreshold leakage
• 180nm provided DRTs on the order of milliseconds, reduced to tens of µs at 65nm
• The reduced leakage of FD-SOI and FinFET technologies provides new opportunities
[Figure: 28nm FD-SOI Test Chip; 16nm FinFET Test Chip]
Sources: R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017
R. Giterman, A. Shalom, A. Burg, A. Fish, and A. Teman, IEEE Solid State Circuit Letters, 2020
Body Biasing
• In mature processes, body biasing can be applied to lower leakage and extend DRT
• Silicon: 100mV RBB → 2.3X DRT Boost
• Can be more aggressively exploited in FD-SOI processes
Sources: P. Meinerzhagen, A. Teman, A. Fish, and A. Burg, IET Journal of Engineering (JoE), 2013
R. Giterman, A. Bonetti, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems II (TCAS-II), 2019
J. Narinx, A. Bonetti, N. Frigerio, C. Aprile, A. Burg, and Y. Leblebici, IEEE Asian Solid-State Circuits Conference (A-SSCC), 2019
R. Giterman and A. Teman, US Patent App. 17/257,893, 2021
Refresh Approaches
• Straightforward approach: ordinary periodic refresh (a.k.a. global refresh)
• Sequentially refresh the entire array at a frequency of 1/DRT
[Figure: timeline alternating between Normal Operation and Refresh]
• Reduced array availability can limit the adoption of GC-eDRAM
• Requires an access protocol that supports a busy signal
• Not tolerable for all applications
• Not feasible with a poor ratio of access time, number of rows, and DRT
• On-the-fly approaches can improve availability, e.g.:
• Row counters (Xiaoyao, 2007)
• Opportunistic refresh (Kazimirsky, 2016)

Hidden Refresh Algorithm
• Can we ensure 100% Availability?
• In order to provide a “drop-in” replacement for SRAM,
a GC-eDRAM macro must ensure 100% array availability.
• Hide the refresh using COIs (copies of instances)
[Figure: memory subarrays plus COIs (invisible to user); timeline in which Subarray 1, Subarray 2, …, Subarray N are refreshed in turn within one DRT]
Source: R. Golman, N. Nachum, T. Cohen, R. Giterman, A. Teman, IEEE Access, 2021
Refreshing FIFOs
• What if the access is strictly ordered, such as in a FIFO? Can we do any better?
• Yes.
• There is an upper bound on the number of interruptions that can occur. A FIFO of size S is guaranteed to be refreshed on time if:
$N_{DRT} \ge (S+1) + 2(S-1) = 3S - 1$
($N_{DRT}$ is the retention time in clock cycles)

• So we just need to trigger the refresh in time to ensure we can finish on time!
• Leads to very significant power savings (often no refresh is needed!)
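The FIFO condition stated above is easy to check for a given design point. The sketch below only evaluates the inequality from the slide; the function names and the example FIFO depth and retention time are hypothetical, and it does not model the refresh controller itself.

```python
def min_retention_cycles(fifo_size: int) -> int:
    """Smallest N_DRT that satisfies N_DRT >= (S + 1) + 2 * (S - 1) = 3S - 1."""
    if fifo_size < 1:
        raise ValueError("FIFO size must be at least 1")
    return (fifo_size + 1) + 2 * (fifo_size - 1)


def refresh_guaranteed(fifo_size: int, n_drt: int) -> bool:
    """True if a FIFO of this size meets the on-time refresh condition."""
    return n_drt >= min_retention_cycles(fifo_size)


if __name__ == "__main__":
    # Hypothetical design point: 64-entry FIFO, DRT of 10 us at a 100 MHz clock.
    size = 64
    n_drt = int(10e-6 * 100e6)  # retention time expressed in clock cycles
    print(min_retention_cycles(size))       # 3 * 64 - 1 = 191
    print(refresh_guaranteed(size, n_drt))  # 1000 cycles is well above the bound
```

For small FIFOs and retention times of microseconds or more, the bound is met with a wide margin at typical clock rates.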
Sources: T. Noy and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020
T. Noy and A. Teman, US Patent 10,803,920, 2021
Replica Cells
• Utilize replica cells to track data retention time across process variations and write statistics
• Silicon: 5X longer DRT, 5X lower refresh power
[Figure panels: Calibrated die: VDD tracking; Un-calibrated: W-disturb tracking; 5X improvement annotated]
Source: A. Teman, P. Meinerzhagen, R. Giterman, A. Fish, and A. Burg, IEEE Transactions on Circuits and Systems II (TCAS-II), 2014
Overlapping Refresh

Internal Refresh
Multi-Ported Gain-Cell

Double-pumped Read

Sources: O. Harel, Y. Nachum, R. Giterman, Microelectronics Journal (MEJ), 2020
E. Levy, A. Sfez, R. Golman, O. Harel, and A. Teman, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2020
R. Golman, R. Giterman, and A. Teman, IEEE Int. Conf. on Electronic Circuits (ICECS), 2018

Other Designs and Use Cases

Low-leakage Hybrid Memory
• A hybrid SRAM/GC-eDRAM cell can provide ultra-low leakage by:
• Power gating the supply during standby
• Relying on the dynamic storage of GC-eDRAM
• Using the SRAM latch to refresh the data
Sources: R. Giterman, A. Teman, P. Meinerzhagen, IEEE Transactions on Circuits and Systems II (TCAS-II), 2017
R. Giterman, A. Teman, P. Meinerzhagen, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2017
Radiation-Hardened Dynamic Memory
• A conventional 2T gain-cell is susceptible to bit-flips in only one direction
• Combine complementary 2T cells and one will never fail!
• When reading, if both outputs are complementary → No error
• If both outputs are the same (presumably data ‘1’) → an error has occurred
• Add parity to correct the error!
• Can also be used for retention time extension.
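The detect-then-correct idea in the bullets above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the published circuit: each data bit is stored as a (cell, complementary cell) pair, one even-parity bit per word is assumed to be stored reliably, and at most one pair per word is assumed to fail. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (cell output, complementary cell output)


def encode(bits: List[int]) -> Tuple[List[Pair], int]:
    """Store each bit with its complement and compute one even-parity bit."""
    pairs = [(b, 1 - b) for b in bits]
    parity = sum(bits) % 2
    return pairs, parity


def decode(pairs: List[Pair], parity: int) -> List[int]:
    """Read a word, flag pairs whose outputs agree, and fix one via parity."""
    bits: List[Optional[int]] = []
    bad: List[int] = []
    for index, (cell, comp) in enumerate(pairs):
        if cell != comp:          # complementary outputs: no error
            bits.append(cell)
        else:                     # identical outputs: an error has occurred
            bits.append(None)
            bad.append(index)
    if len(bad) > 1:
        raise ValueError("more than one failed pair; single parity cannot correct")
    if bad:
        known = sum(b for b in bits if b is not None) % 2
        bits[bad[0]] = parity ^ known  # the missing bit restores even parity
    return [b for b in bits if b is not None]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    pairs, parity = encode(word)
    pairs[1] = (1, 1)  # inject a one-direction flip (0 -> 1) in the cell storing '0'
    print(decode(pairs, parity) == word)
```

Because a flip can only go one way, a failed pair always reads as two equal outputs, which locates the error; the parity bit is then only needed to recover the value, not to find the position.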

Sources: R. Giterman, L. Atias, and A. Teman, IEEE Transactions on VLSI (TVLSI), 2016
R. Giterman, L. Atias, and A. Teman, US Patent 10,991,421
R. Giterman, R. Golman, and A. Teman, IEEE Access, 2019
True Approximate Storage
• Approximate computing does not require 100% error-free operation.
• However, this requires “graceful degradation”
• This is an inherent trait of DRT failures
[Figure: 28nm GC-eDRAM with reduced refresh frequency (panels: 1µs, 5µs, 10µs, 50µs); Integrated dynamic and static RAM (iD-SRAM)]
Sources: A. Teman, G. Karakonstantis, R. Giterman, P. Meinerzhagen, and A. Burg, DATE 2015
S. Ganapathy, A. Teman, R. Giterman, A. Burg, and G. Karakonstantis, IEEE NEWCAS, 2015
R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid State Circuits (JSSC), 2018
A. Kazimirsky, A. Teman, N. Edri, and A. Fish, IEEE Trans. VLSI (TVLSI), 2017
Ternary Bitcells
• Static bitcells are bi-stable
• And therefore, can only store two values (VDD and GND)
• But dynamic circuits can be at intermediate levels
• This provides the capability to implement a multi-level cell
• A 5T bitcell allows digital readout of ternary values
• Can be used for higher density
• Can be used for ternary logic (e.g., ternary weights)
Stored levels: 100 → ‘0’ (GND); 010 → ‘1’ (VDD/2); 001 → ‘2’ (VDD)
Precharge: RBLN → VDD, RBLP → GND
Readout: SN=GND → ‘11’; SN=VDD/2 → ‘01’; SN=VDD → ‘00’
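The readout codes listed above map onto ternary digits as a simple lookup. The sketch below is an illustrative decoder only: it assumes the two readout bits arrive as a two-character string in the order shown on the slide, treats ‘10’ as invalid because the slide does not list it, and the helper names are hypothetical.

```python
from typing import Dict, List

# Readout code -> stored ternary value, as listed on the slide.
READOUT_TO_TRIT: Dict[str, int] = {"11": 0, "01": 1, "00": 2}

# Stored ternary value -> storage node level, for reference.
TRIT_TO_LEVEL: Dict[int, str] = {0: "GND", 1: "VDD/2", 2: "VDD"}


def decode_readout(code: str) -> int:
    """Translate a two-bit readout code into a ternary digit (0, 1, or 2)."""
    if code not in READOUT_TO_TRIT:
        raise ValueError(f"invalid readout code: {code!r}")
    return READOUT_TO_TRIT[code]


def trits_to_int(trits: List[int]) -> int:
    """Interpret ternary digits (most significant first) as a base-3 integer."""
    value = 0
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError(f"not a ternary digit: {trit!r}")
        value = value * 3 + trit
    return value


if __name__ == "__main__":
    codes = ["00", "01", "11"]                # three cells read in sequence
    trits = [decode_readout(c) for c in codes]
    print(trits, [TRIT_TO_LEVEL[t] for t in trits], trits_to_int(trits))
```

Three ternary cells distinguish 27 values, compared with 8 for three binary cells, which is the density argument made in the bullets above.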

Cryogenic GC-eDRAM
• Cryogenic operation is used for certain applications:
• Quantum computing, infrared imaging, HPC
• Subthreshold leakage is highly suppressed under these conditions
• Dynamic memories could be a great option!
Source: E. Garzon, Y. Grinblatt, O. Harel, M. Lanuzza, and A. Teman, IEEE Transactions on VLSI (TVLSI), 2021

Summary and Conclusion

A decade of GC-eDRAM research
• I started researching gain cells in 2012
• More than 40 published papers.
• One full-length book.
• 13 taped out test chips
• GREENBELT1 (2012), 180nm
• GREENBELT2 (2013), 180nm
• CAMEL (2014), 65nm
• dynOR (2015), 28 FDSOI
• DAFNA (2016), 28nm
• BEER (2017), 28 FDSOI
• MARTINI (2018), 28 FDSOI
• KWAK (2019), 28 FDSOI
• ERGODEC (2020), 28 FDSOI
• NEGEV (2020), 16 FinFET
• Rosetta (2020), 65nm
• LEO-I (2021), 65nm
• Sansa (2021), 16 FinFET
• And much more to come…
• In-memory computing
• Dynamic CAMs
• Reliability studies
• and more
• One clear thing is that GC-eDRAM is different from other memories and requires specialized/targeted research
Architectural Modeling
• Large variety of design tradeoffs:
• Read and write peripherals: power vs. access time
• Different bit-cells: area vs. retention time
• Geometry of basic array: rows/columns
• Breakdown into sub-arrays for larger arrays

• GEMTOO: a GC-eDRAM Modeling Tool
[Figure: GEMTOO Modelling Tool linked to Data Retention Time, Refresh Rate, Bitcell Topology, Access Time, Memory Organization, Silicon Area, Memory Density, and Memory Bandwidth]
• GEMTOO is available for download at: https://www.epfl.ch/labs/tcl/resources-and-sw/gemtoo-a-gain-cell-embedded-dram-modeling-tool/
Source: A. Bonetti, R. Golman, R. Giterman, A. Teman, and A. Burg, IEEE Transactions on VLSI (TVLSI), 2020
And the next step: RAAAM
Delivering the Highest Density Volatile
Embedded Memories in Standard CMOS
Reduced Cost | Longer Battery-Life | Better Performance

Newest addition to the Silicon Catalyst family, October 2021
https://raaam-tech.com/
• Dr. Robert Giterman, CEO
• Prof. Andreas Burg, CTO
• Prof. Alex Fish, Technology Advisor
• Prof. Adam Teman, Technology Advisor
• Mr. Danny Biran, Business Advisor
Thank you
