Articles Ucompressed

Received 9 September 2022; revised 14 November 2022; accepted 10 December 2022.
Date of publication 26 December 2022;

date of current version 21 February 2023. The review of this article was arranged by Editor S. Menzel.
Digital Object Identifier 10.1109/JEDS.2022.3230542
3-D Monolithic Stacking of Complementary-FET

on CMOS for Next Generation
Compute-In-Memory SRAM
MD. AFTAB BAIG 1 , CHENG-JUI YEH1 , SHU-WEI CHANG2,3 (Member, IEEE), BO-HAN QIU1 ,
XIAO-SHAN HUANG1 , CHENG-HSIEN TSAI1 , YU-MING CHANG1 , PO-JUNG SUNG3 , CHUN-JUNG SU4 ,
TA-CHUN CHO 3 , SOURAV DE 1,5 (Member, IEEE), DARSEN LU 1 (Senior Member, IEEE),
YAO-JEN LEE 6 (Senior Member, IEEE), WEN-HSI LEE 2 ,
WEN-FA WU 3 , AND WEN-KUAN YEH3
1 Institute of Microelectronics, National Cheng Kung University, Tainan 701, Taiwan
2 Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
3 Taiwan Semiconductor Research Institute, Hsinchu 300091, Taiwan
4 Department of Electrophysics, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
5 Center Nanoelectronic Technologies, Fraunhofer-Institut für Photonische Mikrosysteme, 01109 Dresden, Germany
6 Institute of Pioneer Semiconductor Innovation, Industry Academia Innovation School, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
CORRESPONDING AUTHOR: D. LU (e-mail: darsenlu@mail.ncku.edu.tw)
This work was jointly supported by the Ministry of Science and Technology under Grant MOST-108-2634-F-006-008 and Grant MOST-109-2628-E-006-010-MY3,
and the Center for Future Computing, Miin Wu School of Computing, NCKU.
ABSTRACT Monolithic 3D stacking of complementary FET (CFET) SRAM arrays increases integration
density multi-fold while supporting the inherent SRAM advantages of low write power and near-infinite
endurance. We propose stacking multiple 8-transistor CFET-SRAM layers on regular CMOS periphery to
achieve an ultra-high-density array for computing-in-memory (CIM). CFET and regular CMOS (FinFET)
devices are measured and calibrated with BSIM-CMG compact model. SPICE simulations are performed
to evaluate the delay of CIM operation, power consumption, and analog computational error due to device
non-linearity. The impact of device non-linearity on neural network inference accuracy is evaluated using
the CIMulator simulation platform. Lower CFET current drive due to amorphous (deposited) silicon
channel is shown to have negligible impact on CIM operational delay in many cases, as the maximum
allowable current is limited by wiring resistance, not transistor drive strength while maintaining accurate
weighted sum. Compared to regular 2D CMOS FinFET array. CFET SRAM cells show an improvement
up to 57.19% in TOPS/W. Furthermore, the performance in TOPS/W mm2 is improved up to 19×. A
factor proportional to the number of stacked layers for monolithically stacked CFET SRAM cells, makes
it highly promising for future edge intelligence.
INDEX TERMS Complementary FET (CFET), compute-in-memory (CIM), monolithic 3D integration,

SRAM.
I. INTRODUCTION random-access memory (SRAM) is a promising candidate

The exponential growth of neural network (NN) size for to achieve on-chip CIM [1], [2] compared to other elec-
artificial intelligence (AI) has placed an enormous demand tronic memory types for its superior endurance, low write
on data-centric computational power. Computing-in-memory power, noise immunity, etc., yet it occupies the largest layout
(CIM) has come to the rescue by integrating processors area among all Fig. 1(a). This may be addressed by verti-
and data to overcome the von Neumann bottleneck. Static cally stacking nFET and pFET nanosheet devices into the
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 107

BAIG et al.: 3-D MONOLITHIC STACKING OF CFET ON CMOS FOR NEXT GENERATION CIM SRAM
FIGURE 2. Schematic showing two adjacent CFET 8T-SRAM cells, each has
two CFETs and an average of 2.5 double-layer n-channel nanosheet FET
FIGURE 1. (a) Advantages of SRAM cell in many aspects compared to (NSFETs) (RA2/RA’2 share one NSFET).
other memory devices, while lacking in integration density and
non-volatility [8]. (b) Combining n-channel and p-channel devices into one
CFET [4], and monolithic stacking of multiple CFET layers to improve the
device footprint. (c) Cross-sectional TEM images showing CFET structure
with p-channel on top and n-channel at bottom. Inset showing EDS
mapping of the TEM image for Ti and Hf.
complementary FET (CFET) structure Fig. 1(b) [3], [4], and

further stacking of CFET-SRAM layers via monolithic 3D
integration, reduces the footprint by more than a factor of
n, where n is the number of CFET-SRAM stacks Fig. 1(c).
Silicon channel deposited at low temperature (< 600◦ C) FIGURE 3. Layout of proposed CFET 8T-SRAM with schematic shown in
Fig. 2.
degrades carrier mobility, so transistor drive current is much
lower than regular CMOS [5], yet for CIM where multiple
SRAM rows are activated simultaneously for weighted sum
operation, in many cases maximum current is limited by II. CFET SRAM CELL DESIGN AND MONOLITHIC 3D
I-R drop rather than transistor drive strength. To mini- INTEGRATION
mize latency, we may employ regular CMOS FinFET in Compute-in-memory operations is efficiently realized using
the bottom single-crystalline silicon layer for peripheral 8T-SRAM cell. These cells have less read bitline discharge
drive/read-out circuitry to drive and sense CFET SRAM current compared to 7T-SRAM and a reduced footprint
arrays on top. The 8-transistor (8T) SRAM is suitable for compared to 10T-SRAM. However, 10T-SRAM are used to
CIM due to its high linearity and low read disturb. In this perform other Boolean operations. Hence, this paper con-
work, we designed and laid out CFET-based 8T-SRAM cell siders standard 6T-SRAM for memory and 8T-SRAM for
and compare it with regular CMOS FinFET 8T-SRAM pro- computing-in-memory applications. The CFET device com-
cessed with the same fabrication facilities in a similar test prising of bottom n-channel and top p-channel is fabricated
vehicle. BSIM-CMG [6] is calibrated to both technologies with a low temperature process (< 600◦ C) [4]. CFET device
and SPICE simulations are performed to compare the two is then used to realize an 8T-SRAM comprising of 2 p-
cases to quantify improvements in terms of power con- channel and 6 n-channel transistors (2P6N configuration)
sumption, linearity, and latency. We used the CIMulator [7] with the aid of an additional dummy transistor T1 sharing
platform to compare CIM-SRAM build in the two tech- the gate with transistor RA1 Fig. 2. The second read access
nologies in terms of NN inference accuracy with a given transistor RA2 is combined with the read access transistor
set of software-trained 5-bit-weight network, taking into of adjacent SRAM cell RA’2 to complete the structure. The
account hardware extracted device variation and nonlinearity layout of the structure is shown in Fig. 3, it has an area of
as found during SPICE simulation [1]. The advantages of 3D 240 λ2 compared to the standard FinFET area of 270 λ2 (not
monolithic-integrated CFET SRAM is highlighted in terms shown here for simplicity). Stacking the p-channel on top of
of CIM operations per power, and operations per power per the n-channel reduces the footprint for both 6T-SRAM and
area. 8T-SRAM as shown in Table 1. The advantage is reduced in
108 VOLUME 11, 2023

TABLE 1. Footprint reduction of 6T and 8T SRAM cells using CFET.
FIGURE 5. Transfer characteristics (Id − Vgs ) of n-channel (Blue) and

p-channel (Red) (a) CFET and (b) FinFET devices. Relatively low drain
current of CFET is due to the polycrystalline silicon channel. (c) Output
characteristics (Id − Vds ) and (d) output conductance (Gds − Vds ) of the
n-channel CFET device used for realizing the read access transistors of the
FIGURE 4. Benchmarking flow adopted in this study to evaluate and CFET SRAM cell.
compare CFET and FinFET technologies.
for n-channel and p-channel CFETs. The threshold volt-

the realization of the 8T-SRAM design due to the addition age is tuned using post-calibration adjustment of effective
of the dummy transistor for realizing the CFET structure. gate work function assuming Vth engineering is available
Further layout area reduction for CFET is possible via the by channel or gate stack engineering [13], [14]. 8T-SRAM
buried power rail approach [3], yet such method is not com- is cell is designed and the widths of the transistors are
patible with 3D monolithic stacking. The main advantage tuned for better hold, read, and write stability which is
of our polysilicon-channel CFET device is that its chan- described in Section IV. Peripheral components including
nel is deposited using Plasma Enhanced Chemical Vapour sense amplifiers, counter, and drivers are realized using
Deposition (PECVD) and solid-phase crystallization. This FinFET devices and perfected for speed and linearity and
enables us to monolithically stack one SRAM layer on top are described in Section V. Fig. 5(a) and 5(b) show the
of another by simply repeating the fabrication process steps. measured transfer characteristics of both the CFET and the
The stacking of multiple SRAM layers also provides shorter FinFET, respectively. One key difference between CFET and
global interconnects which minimizes R-C delay. Besides, FinFET devices is that the smaller current in CFET due to
the effective cost per layer would be less as we will use the the lower mobility of carriers in the polycrystalline channel.
same photomasks for each layer. However, the complexity Fig. 5(c) and 5(d) show the output characteristics and out-
of multiple layers will increase total processing costs and put conductance of n-channel CFET devices which will be
might cause yield reduction. discussed in detail in Section VI.
FinFET devices, on the other hand, are made of single-
crystalline silicon and hence has superior drive strength IV. READ, WRITE AND HOLD STABILITY OF CFET 8T-SRAM
compared to polysilicon. The FinFETs [9], [10], [11], [12] 8T-SRAM cells are designed for optimum stability, speed,
are fabricated with a gate-first high-K and metal gate process, and linearity. For accurate CIM results, and enhanced image
with Hf0.5 Zr0.5 O2 gate dielectric of 10 nm thickness. The recognition accuracy, superior read stability is needed. To
device is primarily made as part of ferroelectric-FinFET (Fe- enhance the read stability, an 8T SRAM cell is considered
FinFET). Nevertheless, the device can be used as a regular in which the additional two transistors provide the neces-
FinFET. sary decoupling of output loading from the storage node
during the read operation [15], thus significantly improving
III. DEVICE CHARACTERIZATION AND PARAMETER the read stability. In fact, for 8T-SRAM the read static noise
EXTRACTION margin (RSNM) is nearly the same as hold static noise mar-
For fair comparison, CFET and FinFET devices are made gin (HSNM) Fig. 6(a). Variation of HSNM as a function of
in the same nanofabrication facility with similar baseline Vdd is shown in Fig. 6(b). Read stability is also quantified
processes. Fabricated devices are then characterized and cal- by read N-curve that is produced by biasing both the bitlines
ibrated to BSIM-CMG multiple-gate CMOS compact model with Vdd . Sweeping the voltage at one of the storage nodes
(CFET and FinFET devices) following the benchmarking and measuring the current that flows into it. Write stability,
flow shown in Fig. 4. To ensure that the parameters are on the other hand, is quantified with write N-curves by bias-
consistent with the physical properties we follow the extrac- ing one bitline with Vdd , and the other bitline to ground [16].
tion method mentioned in the BSIM-CMG manual [6]. The Subsequently, we sweep the voltage at the storage node to
measured data show asymmetric threshold voltages (Vth ) extract nodal current. Fig. 6(c) and 6(d) show the critical
VOLUME 11, 2023 109

FIGURE 6. (a) Simulated hold static noise margin (HSNM) of 6T-SRAM and
read static noise margin (RSNM) of 8T-SRAM using CFET, (b) HSNM as a
function of Vdd , (C) Critical write current as a function of Vdd with inset FIGURE 8. Components of read and write peripheral circuits placed at the
showing the write N-curve, (d) Critical read current as a function of Vdd bottom of the CFET SRAM stack. Inset shows the discharge current flowing
with inset showing the read N-curve. The above curves are obtained after through read access transistors responsible for MAC nonlinearity.
fine tuning the SRAM cell stability and balancing of read and write speeds.
write operation for initialization of the neural network (NN)

weights and to update them. Selection of a single row is
made by selecting the layer first, followed by the row in that
layer, employing layer decoder and row decoder simultane-
ously. MAC operation or read operation is done by applying
pulses corresponding to the input image data on wordlines
that get multiplied with the contents of the SRAM cell. MAC
operation results in the current being discharged from the
SRAM cell in accordance to data stored in the SRAM cell
or weight WN and excitation on the read word line (RWL)
TWL . Discharged current IN gets added along the column
of the SRAM array and an equivalent amount of charge
T
WN × C1BL × 0 WL IN (t)dt is discharged from the charge-
sharing capacitors located on the bottom layer. The leftover
FIGURE 7. Block diagram of 3D monolithically stacked GAA CFET SRAM resultant voltage of capacitors (Eq. (1)) [1] is then used to
array. The peripheral read and write blocks are made of standard FinFET to estimate the MAC output using a flash Analog to Digital
retain speed; the SRAM arrays are made of CFET with deposited channel
for high density.
Converter (ADC). The overall block diagram is shown in
Fig. 8. The 8-bit input image is split into eight batches, each
comprising of 5-bit input pulses (8 × 25 = 256 pulses). The
write current and critical read current of the CFET SRAM adders and registers of the read periphery circuit carry out
cell as a function of Vdd with the inset showing the write the accumulation of partial sums of each cycle resulting from
N-curve and read N-curve respectively [17]. the splitting of an 8-bit input image (256 pulses).
TWL
1
V. 3D CFET SRAM COMPUTE-IN-MEMORY ARCHITECTURE VBL = W1 × × I1 (t)dt
CIM is a highly energy-efficient way of performing CBL 0
TWL
multiply-and-accumulate (MAC) operations. Fig. 7 shows 1
+ W2 × × I2 (t)dt
the proposed novel CIM architecture for 3D monolithically CBL 0
stacked CFET SRAM arrays. Peripheral circuitry aiding in TWL
1
data reading, data writing, and digital transformation of MAC + · · · + WN × × IN (t)dt . (1)
CBL 0
results are placed in the first layer (bottom) with single crys-
talline silicon for optimal read/write latency. Multiple CFET
SRAM layers with deposited silicon channels are placed VI. SPICE AND NEURAL NETWORK
above. The weights obtained from off-chip training of the COMPUTE-IN-MEMORY SIMULATION
neural network are stored as SRAM cell contents through To compare CFET and regular CMOS FinFET in terms of
write operation. CIM generally employs a word-by-word MAC latency, power consumption per operation, and impact
110 VOLUME 11, 2023

TABLE 2. Mapping of neural network layers 1 and 2 to CIM SRAM macro of

size 64 × 60 (rows × columns).
case of 64 × 64 for 5-bit precision the last 4 columns would

FIGURE 9. (a) Transient analysis of CFET SRAM array of size 64 × 5 be unusable. To keep the number of computational units or
(rows × columns) with 5-bit inputs, outputs, and weights. (b) Energy
consumed at various stages of the CIM Operation. SRAM arrays of CFET are
neurons of the NN low. We have considered one device
more energy efficient compared to that of FinFET due to lower wiring as synapse (1D1S) architecture [20] that makes use of a
capacitance. reference column of weights instead of negative synaptic
devices. For instance, layer 1 comprises of 784 × 200 × 5
SRAM cells. The 784 rows in layer 1 are realized by
using the 64 × 60 macro using 12 instances/copies of the
64 × 60 macro and using 16 rows from the 13th instance.
200 × 5 (columns × precision) is realized 16 instances of
64 × 60 macro and using 40 columns from the 17th instance.
Training the network-on-chip is fine given the SRAMs have
infinite endurance but the process is energy-draining and
training the NN every time is discouraged. Moreover, for
training a NN, one needs higher bit precision (8-bit) but
for image recognition (feed-forward inference) a lower bit
precision (4 or 5-bit) may be sufficient. Hence, we train
the network off the chip with 8-bit weights and optimize it
to 4 and 5-bit for feed forward inference. The above tech-
FIGURE 10. The CIMulator platform is run in python using a tensor flow
nique effectively reduces the footprint of the CIM macro and
library. The Neural network comprises 784, 200, and 10 nodes in input, reduces the energy consumed by the chip with very minimal
hidden, and output layers, respectively. Trained weights are then degradation in accuracy.
quantized. RBL nonlinearity is then added to the CIMulator platform to
conduct the realistic simulation. Retraining of the network is done to
Access transistors of the SRAM cell connected to the
address the RBL nonlinearity of the neural network. RBL needs to be operated in the saturation regime. In a
saturation regime, a device will pass a constant current irre-
spective of the voltage applied. The CFET technology is
of read bitline nonlinearity [1], SPICE simulation is per- newly developed and is particularly very vulnerable to pro-
formed with 4 and 5-bit unary input pulses (24 and 25 pulses cess variation. Fig. 5(c) and 5(d) shows a device having
respectively) of 0.8 V each on arrays with 64, 256 rows and finite output conductance due to process variation [4]. Finite
having 4 and 5-bit precision (weights). The resultant tran- output conductance causes degradation in the forward infer-
sient analysis and energy consumed in each of the step are ence accuracy of analog summing, or MAC operation [1].
shown in Fig. 9(a) and 9(b) respectively. The input pulse NN are in general robust to synaptic array non-idealities and
width to be applied on the worldline is tuned to ensure that one can often retrain a NN with a small number of epochs
VBL does not drop below 3% of Vdd after the application to recover back the lost accuracy as described in the next
of 24 and 25 pulses with all the SRAM weight being 1. paragraph.
Performance evaluation of CFET 8T-SRAM array is Fig. 11 shows the RBL curves of the CFET device for
obtained using CIMulator [7] a circuit-level benchmarking 4-bit and 5-bit ADC corresponding to 4-bit and 5-bit input
tool for neuromorphic circuits such as Neurosim [18]. We pulses and weights, respectively. nonlinearity θ is extracted
employ a 3-layer multi-layer perceptron (MLP) NN com- by fitting the ADC output to the nonlinearity model as shown
prising 784, 200, and 10 neurons respectively to recognize in the inset of Fig. 11. The value of θ is easily adopted in
handwritten digits from the MNIST [19] database. Table 2 the CIMulator platform to study the impact of the MAC non-
describes the realization of the NN weights and biases of linearity on forward inference accuracy for the entire NN.
layers 1 and 2 using the SRAM CIM Marco of the size Inference accuracy is severely degraded after considering
of 64 × 60 (Rows and Column) with 5-bit precision. The MAC nonlinearity. Inference accuracy can be improved by
dimension of the macro 64 × 60 is chosen because in the retraining the NN Fig. 12(a). Following a brief re-training
VOLUME 11, 2023 111

TABLE 3. Parasitic resistance and capacitance of the read bitline of SRAM

cells using CFET and FinFET. [17].
TABLE 4. Key performance parameters of CFET and regular CMOS (FinFET)

for array sizes with 64 rows and 256 rows.
FIGURE 11. RBL characteristics (θ ) modeling and fitting for FinFET and
CFET device for 4-bit inputs, and weights with the inset showing the MNIST
inference accuracy as a function of epochs during retraining.
VIII. PERFORMANCE BENCHMARKING

Table 4 shows the device, circuit, and system-level compar-
ison of CFET and regular CMOS FinFET technology for
SRAM-CIM. Both 64-rows and 256-rows arrays are consid-
ered for the evaluation. The area of the CFET SRAM cell
(Areacell ) is smaller than the FinFET SRAM cell due to the
FIGURE 12. (a) Image recognition accuracy is degraded due to MAC
non-linearity. A brief re-training significantly improves the accuracy. stacking of the n-channel and p-channel in CFET. Smaller
(b) Accuracy of 4 and 5-bit weight CFET SRAM cells at various stages of AI CFET SRAM cell area leads to small read bitline resistance
process flow.
and capacitance (RBL and CRBL ) of CFET SRAM arrays in
comparison to FinFET SRAM arrays. Read bit line discharge
current ICell is a function of RBL , CRBL . FinFET having a
(less than 5 epochs) inset of Fig. 11, accuracy is recovered to large current and 3× lower latency due to higher mobility,
85.19% and 89.63% for 4-bit and 5- bit weights, respectively. has slightly larger wiring capacitance for the 64 rows case.
Fig. 12(b) shows the accuracy of NN at various stages of CFET on the other hand, shows 3× lower power due to low
feed forward inference. It should be noted that the accuracy Icell and hence the power efficiency (TOPS/W) is better for
after retraining is limited to 90% is because of lower bit the CFET case. For 256 rows, to ensure that VBL does not
precision (4 and 5 bits). drop below 3% of Vdd , Icell is reduced by tuning the input
pulse voltage and therefore has low bit-line current for the
VII. PARASITIC EXTRACTION extracted RBL , CRBL values of 256 × 256 macro. The latency
Parasitic resistance is obtained using the equation R = of CFET and CMOS FinFET become similar due to similar
ρ(L/A). Resistivity ρ of the copper interconnect, A being Icell currents.
the cross-sectional area of the interconnect trench. Length CFET shows an improvement of 4.24% and 57.19% for
of the read bitline LRBL , and the corresponding resistances power efficiency in terms of TOPS/W due to lower CFET
RRBL per unit SRAM cell using CFET and FinFET devices SRAM cell area leading to lower parasitics and the bene-
is summarized in Table 3. Unit capacitance C/L of value fit becomes significant for large arrays. CFET SRAM cell
0.378 fF/μm [17] is used to obtain the read bitline capaci- requires about 7 masks for 1 stack of SRAM cells built
tance CRBL . Signals on wordline on the other hand do not using 2 metal layers. FinFET also needs 7 masks for the
impact the performance of the CIM if driven with sufficient realization of SRAM cells using 1 poly and 2 metal layers.
power. Hence, we consider the parasitics of bitline in our Multiple stacks of SRAM cells (3D Stacking) are workable
simulations. in the case of a CFET device. The realization of the 3-layer
112 VOLUME 11, 2023

MLP neural network described in Fig. 10 requires 17 stacks [2] Y.-D. Chih et al., “16.4 an 89TOPS/W and 16.3TOPS/mm2 all-digital
of CFET SRAM cells of dimension 64 × 60 and 4 stacks SRAM-based full-precision compute-in memory macro in 22nm for
machine-learning edge applications,” in Proc. IEEE Int. Solid-State
of SRAM cells of dimension 256 × 256. Stacking the lay- Circuits Conf. (ISSCC), vol. 64, 2021, pp. 252–254.
ers on top of each other greatly improves the performance [3] J. Ryckaert et al., “The complementary FET (CFET) for CMOS scaling
in terms of TOPS/(W-mm2 ) proportional to the number of beyond N3,” in Proc. IEEE Symp. VLSI Technol., 2018, pp. 141–142.
[4] S.-W. Chang et al., “First demonstration of CMOS inverter and 6T-
stacked layers. However, stacking would incur larger process- SRAM based on GAA CFETs structure for 3D-IC applications,” in
ing costs due to repeated processing steps for every SRAM Proc. IEEE Int. Electron Devices Meeting (IEDM), 2019, pp. 1–7.
stack at the expense of improved performance proportional [5] S. De et al., “Tri-gate ferroelectric FET characterization and modelling
for online training of neural networks at room temperature and 233K,”
to the stacked layers in terms of performance per power per in Proc. Device Res. Conf. (DRC), 2020, pp. 1–2.
layout area. [6] S. Khandelwal et al., BSIM-CMG 110.0.0: Multi-Gate MOSFET
Compact Model: Technical Manual, BSIM Group UC Berkeley,
Berkeley, CA, USA, 2014.
IX. CONCLUSION [7] H.-H. Le et al., “Ultralow power neuromorphic accelerator for deep
learning using Ni/HfO2/TiN resistive random access memory,” in Proc.
Measured CFET and regular CMOS FinFET devices are 4th IEEE Electron Devices Technol. Manuf. Conf. (EDTM), 2020,
accurately calibrated in transfer and output characteristics pp. 1–4.
using the BSIM-CMG compact model. 8T-SRAM cell is [8] W.-S. Khwa, D. Lu, C.-M. Dou, and M.-F. Chang, “Emerging NVM
circuit techniques and implementations for energy-efficient systems,”
realized using CFET technology and SPICE simulations in Beyond-CMOS Technologies for Next Generation Computer Design.
of 8T-SRAM CIM macros are conducted for CFET and Cham, Switzerand: Springer, 2019, pp. 85–132.
FinFET technologies. Evaluation of MAC latency, power, [9] M. A. Baig et al., “Compact model of retention characteristics of
ferroelectric FinFET synapse with MFIS gate stack,” Semicond. Sci.
and non-linearity is carried out, a non-linearity model is Technol., vol. 37, no. 2, 2021, Art. no. 24001.
developed and transferred to a higher-level NN simulator [10] S. De et al., “Ultra-low power robust 3bit/cell Hf0.5Zr0.5O2 ferroelec-
to evaluate system-level inference accuracy. Though FinFET tric FinFET with high endurance for advanced computing-in-memory
technology,” in Proc. Symp. VLSI Technol., 2021, pp. 1–2.
has an advantage in-terms of latency, CFET shows better [11] S. De et al., “Random and systematic variation in nanoscale
performance per power due to shorter SRAM cell height and Hf0.5 Zr0.5 O2 ferroelectric FinFETs: Physical origin and neuromor-
reduced wiring resistance and the efficiency becomes more phic circuit implications,” Front. Nanotechnol., vol. 3, Jan. 2022,
Art. no. 826232. [Online]. Available: https://www.frontiersin.org/
evident for large array sizes. Performance in (TOPS/W mm2 ) article/10.3389/fnano.2021.826232
gets improved by 19× and 7× for 17 and 4 stacked layers of [12] S. De et al., “Robust binary neural network operation from 233 K to
CFET SRAM cells of dimension 64 × 60 and 256 × 256 398 K via gate stack and bias optimization of ferroelectric FinFET
synapses,” IEEE Electron Device Lett., vol. 42, no. 8, pp. 1144–1147,
respectively. Device nonlinearity caused due finite output Aug. 2021.
conductance is recovered by retraining the NN with epochs [13] Gate-all-around field effect transistor having multiple thresh-
as few as 5. CFET being a gate-all-around device has supe- old voltages, by R. Bao et al. (2020, Mar. 10). U.S. Patent
10 586 854. [Online]. Available: https://www.freepatentsonline.com/
rior gate controllability and reduced short channel effects, 10586854.html
making it scalable beyond N3 [3]. In addition, CFET with [14] C.-Y. Huang et al., “3-D self-aligned stacked NMOS-on-PMOS
deposited channel has reduced footprint and is stackable in nanoribbon transistors for continued Moore’s law scaling,” in Proc.
IEEE Int. Electron Devices Meeting (IEDM), 2020, pp. 1–4.
comparison to regular CMOS. [15] J. Singh, S. P. Mohanty, and D. K. Pradhan, Robust SRAM Designs
and Analysis. New York, NY, USA: Springer, 2013.
[16] Q. Chen et al., “Critical current (ICRIT) based SPICE model extraction
ACKNOWLEDGMENT for SRAM cell,” in Proc. 9th Int. Conf. Solid-State Integr. Circuit
Technol., 2008, pp. 448–451.
The authors are grateful to the Taiwan Semiconductor [17] C.-J. Yeh, “Next generation compute-in-memory via monolith-
Research Institute for Nanofabrication facilities, and services. ically integrated 3D-stacked complementary-FET with conven-
National Center for High-Performance Computing for GPU tional CMOS,” M.S. thesis, Inst. Microelectron., NCKU, Tainan,
Taiwan, Jan. 2021. [Online]. Available: http://ir.lib.ncku.edu.tw/handle/
computing facilities. They appreciate PROPLUS Solutions 987654321/204107
for support on software licenses for model parameter [18] P.-Y. Chen, X. Peng, and S. Yu, “NeuroSim: A circuit-level macro
extraction. model for benchmarking neuro-inspired architectures in online learn-
ing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37,
no. 12, pp. 3067–3080, Dec. 2018.
[19] Y. LeCun. “The MNIST database of handwritten digits.” 2022.
REFERENCES [Online]. Available: https://ci.nii.ac.jp/naid/10027939599/en/
[1] Q. Dong et al., “15.3 A 351TOPS/W and 372.4GOPS compute-in- [20] S. N. Truong and K.-S. Min, “New memristor-based crossbar array
memory SRAM macro in 7nm FinFET CMOS for machine-learning architecture with 50-% area reduction and 48-% power saving for
applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), matrix-vector multiplication of analog neuromorphic computing,” J.
2020, pp. 242–244. Semicond. Technol. Sci., vol. 14, no. 3, pp. 356–363, 2014.
VOLUME 11, 2023 113

Received 17 December 2022; revised 18 February 2023; accepted 4 April 2023. Date of publication 10 April 2023; date of current version 18 April 2023.
The review of this article was arranged by Editor S. Reggiani.
Digital Object Identifier 10.1109/JEDS.2023.3265372
Monolithic Dual-Gate E-Mode

Device-Based NAND Logic Block
for GaN MIS-HEMTs IC Platform
YUHAO ZHU 1,2 , FAN LI1,2 , MIAO CUI1,2 , ZHICHENG FANG1,2 ,
ANG LI 1,2 (Graduate Student Member, IEEE), DONGYI YANG1,2 , YINCHAO ZHAO3 ,
HUIQING WEN 1,2 (Senior Member, IEEE), AND WEN LIU 1,2 (Member, IEEE)
1 School of Advanced Technology, Xi’an Jiaotong–Liverpool University, Suzhou 215123, China
2 Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ Liverpool, U.K.
3 School of Intelligent Manufacturing Ecosystem, Xi’an Jiaotong–Liverpool University, Suzhou 215123, China
CORRESPONDING AUTHOR: W. LIU (e-mail: wen.liu@xjtlu.edu.cn)
This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFB3604400; in part by the Suzhou
Science and Technology Program under Grant SYG202131; in part by the Key Program Special Fund in XJTLU under Grant KSF-T-07;
and in part by the XJTLU Research Development Funding under Grant RDF-20-02-43.
ABSTRACT In this work, dual-gate enhancement-mode (E-mode) device based NAND circuit (DG-NAND)
and the NAND block with double E-mode devices (DD-NAND) are developed and fabricated based on the
GaN MIS-HEMTs (metal-insulator-semiconductor-high-electron-mobility-transistors) platform. The DG-
NAND circuit has an area of 0.118 mm2 with the probe pad, which is 24% lower than the area of the
DD-NAND circuit. Both static and dynamic experimental results validate the advantages of performance
improvement of NAND circuits designed by dual-gate technology at an input voltage of 9 V. This paper
demonstrates the design potential of dual-gate NAND in an all-GaN MIS-HEMTs platform through
compact design.
INDEX TERMS Logic circuits, GaN, MIS-HEMTs, dual-gate E-mode device, monolithic integration.
I. INTRODUCTION HEMTs (high-electron-mobility-transistors) with Al2 O3 gate

AlGaN/GaN HEMTs become a promising candidate for oxide layer, which showed a lower reverse gate leak-
power converter switches in recent years due to high age current than its normally-on dual-gate Schottky HFET,
switching frequency, power density, and conversion effi- while the gate insulators are also used to enhance the
ciency [1], [2], [3], [4]. Stray inductances and commutation gate swing [13], [14], [15]. However, all the above applica-
loops are the main factors that limit the switching speed of tions are examined with depletion-mode (D-mode) devices.
GaN-based power electronics with discrete driver chips [5]. Since D-mode HEMT technology requires a negative volt-
The lateral structure of the GaN-on-Si technology allows age control, its counterpart, the enhancement-mode (E-mode)
the monolithic integration of several devices for a compact HEMT, is more suitable for the power conversion applica-
design to restrain parasitic effects [6]. tion [16], [17]. Wolf et al. [18] fabricated a monolithically
For GaN devices, the dual-gate technology could be integrated bidirectional switch based on E-mode p-GaN
applied to realize the unique function and reduce require- HFETs using dual-gate technology.
ments on the extra component [7], [8], [9], [10]. With the For a high-level integrated power system, dual-gate tech-
back gate technology, Hazra et al. [8] proposed a dual- nology is a feasible solution for monolithic design [19].
gate structure to control the threshold voltage. A cascade Chvála et al. demonstrated a monolithic NAND cell that
dual-gate structure is proposed by Lin et al. to suppress cur- is based on InAlN/GaN MIS-HEMTs with a maximum
rent collapse [11]. Gong et al. [12] reported a normally-on gate voltage of 3 V, and confirming the validity of the
dual-gate AlGaN/GaN MIS (metal-insulator-semiconductor) proposed models [20]. As an essential building block for
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
230 VOLUME 11, 2023

ZHU et al.: MONOLITHIC DUAL-GATE E-MODE DEVICE-BASED NAND LOGIC BLOCK
FIGURE 1. Schematic cross-section of the AlGaN/GaN MIS-HEMT

DG-NAND logic circuit.
digital circuits, implementing dual-gate E-mode NAND (DG-

NAND) by p-GaN HEMTs with a smaller area provides
an optimization method [21]. However, as a result of the
limitations of p-GaN technology with low gate breakdown
voltage, the gate driver voltage is limited to 5 V, which
requires extra protection circuits for an all-GaN integrated
circuit (IC) development. Moreover, the dynamic test is nec-
essary to analyze the superior characteristics of the dual-gate
design. FIGURE 2. DD-NAND logic circuit: (a) schematic of the logic circuit,
(b) microscope photo (E-mode device dimension: LGS /LG /LGD /WG =
In this work, we demonstrate double E-mode devices 5/3/10/200 µm). DG-NAND logic circuit: (c) circuit diagram, (d) microscope
NAND (DD-NAND) and dual-gate E-mode device NAND view.
logic gate circuits by monolithic integrated E/D mode
MIS-HEMTs. The static and dynamic measurements are
conducted to evaluate the performance of the logic gate. pad for the connection part to optimize the chip layout.
The experimental investigation is reported about the compact The D-mode device functions as an active load resistor and
solution to logic blocks for power IC applications. two E-mode gates serve as logic input. In this work, the
presented GaN-based dual-gate E-mode (DG-E) structure is
II. EXPERIMENTS AND DISCUSSION used to replace the counterpart DD-E structure. Thus the
Fig. 1 depicts a schematic cross-section of the proposed DG- DG-NAND logic circuit is formed, as shown in Fig. 2 (c).
NAND circuit by utilizing the AlGaN/GaN MIS-HEMTs The employment of dual-gate devices aims to replace two
technology. All devices are fabricated on a Si substrate con- E-mode HEMTs in the logic circuit design and eliminates
sisting of a 4.2 μm GaN buffer layer, a 420 nm layer GaN redundant ohmic contacts [28], which could reduce on-
channel layer, a 1 nm AlN spacer and a 25 nm AlGaN state resistance and parasitic capacitance. The corresponding
layer. GaN devices with D-mode and E-mode are capable of microscope views of the DD-NAND and DG-NAND circuits
forming logic circuits by Direct Coupled FET Logic (DCFL) are shown in Fig. 2 (b) and Fig. 2 (d), respectively. The size
structure [22], [23]. For E-mode devices, recessed-gate tech- of the layout with the probe pad reduces from 0.155 mm2
nology has been utilized [24], and MIS gates have been to 0.118 mm2 owing to the advantages of a dual-gate
implemented to extend the swing voltage of the gate and structure.
thus improve performance [25]. An interconnect layer of Al
is deposited to allow the integration of monolithic integrated A. EXPERIMENTAL SETUP
circuits. An in-depth description of the fabrication process is The static test platform is illustrated in Fig. 3 (a), and the
provided in [26]. The D-mode device features a 5 μm source experimental setup is represented by the blue and black
to gate distance (LGS ), a 3 μm gate length (LG ), a 5 μm connection. A fixed DC bias voltage is provided to V2 .
gate width (WG ), and a 10 μm drain to gate distance (LGD ). It is worth pointing out that DD-E structure measurements
For dual-gate E-mode device dimension, LGS /LG /LGD /WG use V1 and V2 to regulate the switching states of the two
= 5/3/5/200 μm, and the separation between the two gates E-mode devices. For DG-E device, V1 and V2 are utilized
is set to 10 μm. to control the G1 and G2 gates of the dual-gate E-mode
The schematic diagram of the GaN-based double device device, as shown in Fig. 3 (a) left part. Meanwhile, the static
E-mode (DD-E) structure is illustrated in the red border, performance of the NAND circuits is monitored by the same
as shown in Fig. 2 (a) and reported in [19]. Unlike tradi- analyzer, the supply voltage VDD is provided. The dynamic
tional NAND design uses two E-mode devices connected experimental setup is represented by the red and black arrows
in series [27], the two E-mode devices share the common in Fig. 3 (a). Two input signals V1 and V2 are supplied by
VOLUME 11, 2023 231

FIGURE 4. Comparison of DD-E structure and DG- E device at V2 is biased

at constant 9 V: (a) Transfer characteristics, (b) output characteristics.
FIGURE 3. (a) The experimental setup diagram of the DG-NAND logic

circuit for static (blue and black line) and dynamic measurement (red and
black line), (b) test setup, (c) monolithic integrated GaN driver with
DG-NAND circuit.
a signal generator, and they are set to output different logic

FIGURE 5. Comparison between measured voltage transfer characteristics
levels. The dynamic performance of the NAND logic circuit of the DG-NAND and DD-NAND logic circuits.
is monitored and recorded by an oscilloscope. The test setup
is demonstrated in Fig. 3 (b). Fig. 3 (c) shows an integration
of the logic unit into the pre-drive module [13], in which
based on the feedback signal [29]. The voltage transfer char-
the dual-gate NAND circuit could be utilized to replace the
acteristics (VTC) of the investigated DG-NAND circuit are
first-level DCFL inverter in the driver stage. This method
demonstrated in Fig. 5, and it is compared with the VTC
would contribute to the high integration level all-GaN power
of the DD-NAND circuit. A supply voltage VDD of 9 V
conversion system.
is applied. The minimum output voltage (VOL ) of the DD-
NAND circuit is about 1.69 V, and VOL of the DG-NAND
B. STATIC CHARACTERISTICS MEASUREMENT FOR DD-E circuit is nearly 1.56 V. Meanwhile, the output voltage of
STRUCTURE AND DG-E DEVICE 9 V are obtained in Fig. 5 at V1 = 0 V.
The characteristics of the devices and the NAND circuits The NAND dynamic test is used to emulate different oper-
were measured by an Agilent B1505A power device ana- ating conditions on an all-GaN integrated power chip. Logic
lyzer. As illustrated in Fig. 4 (a), the threshold voltage (Vth ) circuits are designed to generate a drive signal based on the
of the DD-E structure is 1.9 V, and the Vth of the DG-E input operating signal (V1 ) and enable signal (V2 ) as shown
device is 1.7 V. Fig. 4 (b) shows the ID -VD curve, the cur- in Fig. 3 (c). Different output states could be observed in
rent density (ID, max ) is approaching 124.8 mA/mm when the output waveforms by adjusting the phase between the
two gates are biased at 9 V for DD-E structure. As could input operating signal and enable signal. In Fig. 6, the volt-
be observed in Fig. 4 (b), the current density of the DG-E age output waveforms of the DG-NAND and DD-NAND
structure reaches a peak value of 133.9 mA/mm. logic circuits working at 100 kHz are monitored by the
oscilloscope. The black and red curves represent the gate
C. STATIC AND DYNAMIC CHARACTERISTICS voltages applied on the first gate and the second gate, respec-
MEASUREMENT FOR GAN-BASED TWO-INPUT NAND tively, from signal generators. When the DD-NAND circuit
LOGIC CIRCUIT and DG-NAND circuit work at a switching frequency of
In all-GaN monolithic IC design, the logic circuit serves as 100 kHz, the falling time of the DG-NAND circuit is 15
the drive control module applied to modify the drive voltage percent shorter than the DD-NAND circuit.
232 VOLUME 11, 2023

fewer devices and the die area drops by 24%. Indicated par-
asitic parameters are also reduced due to the elimination of
redundant ohmic contacts. The static characterization of the
dual-gate E-mode device NAND circuit indicates a lower
VOL about 1.46 V. From the dynamic measurement results,
the DG-NAND logic circuit observed less falling time, which
decreased by 15%. Furthermore, the dual-gate NAND will
be applied as a subunit of the drive unit development for
compact design and functional integration of the all-GaN
MIS-HEMTs IC platform.
REFERENCES
FIGURE 6. Measured output voltage waveforms of the DD-NAND (green [1] D. Kinzer and S. Oliver, “Monolithic HV GaN power ICs:
line) and DG-NAND (blue line) logic circuits. Performance and application,” IEEE Power Electron. Mag., vol. 3,
no. 3, pp. 14–21, Sep. 2016, doi: 10.1109/MPEL.2016.2585474.
[2] R. Sun, Y. C. Liang, Y.-C. Yeo, C. Zhao, W. Chen,
TABLE 1. Comparison for GaN-based dual-gate device. and B. Zhang, “All-GaN power integration: Devices to func-
tional subcircuits and converter ICs,” IEEE J. Emerg. Sel.
Topics Power Electron., vol. 8, no. 1, pp. 31–41, Mar. 2020,
doi: 10.1109/JESTPE.2019.2946418.
[3] K. J. Chen et al., “GaN-on-Si power technology: Devices and appli-
cations,” IEEE Trans. Electron Devices, vol. 64, no. 3, pp. 779–795,
Mar. 2017, doi: 10.1109/TED.2017.2657579.
[4] C.-L. Tsai et al., “Smart GaN platform: Performance & challenges,”
in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2017, pp. 1–4,
doi: 10.1109/IEDM.2017.8268488.
[5] S. Moench et al., “Monolithic integrated quasi-normally-off gate
driver and 600 V GaN-on-Si HEMT,” in Proc. IEEE 3rd Workshop
Wide Bandgap Power Devices Appl. (WiPDA), 2015, pp. 92–97,
doi: 10.1109/WiPDA.2015.7369264.
[6] D. Kinzer, “GaN power IC technology: Past, present, and future,” in
Proc. 29th Int. Symp. Power Semicond. Devices IC’s (ISPSD), 2017,
pp. 19–24, doi: 10.23919/ISPSD.2017.7988981.
[7] J. S. Moon et al., “<70% power-added-efficiency dual-
gate, cascode GaN HEMTs without harmonic tuning,” IEEE
Electron Device Lett., vol. 37, no. 3, pp. 272–275, Mar. 2016,
doi: 10.1109/LED.2016.2520488.
[8] S. Hazra, A. A. Shuvo, and A. G. Bhuiyan, “Electrical characteristics
of dual gate algan/gan high-electron mobility transistors,” in Proc.
Table 1 compares the dual gate technology for differ- 5th Int. Conf. Electr. Inf. Commun. Technol. (EICT), 2021, pp. 1–4,
ent applications. References [8], [10], and [12] apply a doi: 10.1109/EICT54103.2021.9733454.
[9] B. Kim, H. Q. Tserng, and P. Saunier, “A GaAs dual gate
second gate in the D-mode to improve device performance. power FET for operation up to K band,” in IEEE Int.
Papers [18] and [30] uses a second gate to complete Solid-State Circuits Conf. Dig. Tech. Papers, 1983, pp. 200–201,
the design of a bidirectional device. Both papers [20] doi: 10.1109/ISSCC.1983.1156508.
[10] L. Shi, M. Liu, N. Dong, and X. Lin, “Dual-metal-gate
and [21] utilize a dual-gate structure for the functional block AlGaN/GaN HEMTs for power application,” in Proc. 14th IEEE Int.
of the monolithic integrated GaN platform. Verifying the Conf. Solid-State Integr. Circuit Technol. (ICSICT), 2018, pp. 1–3,
dynamic performance and comparing it with the conventional doi: 10.1109/ICSICT.2018.8565730.
[11] D.-J. Lin, J.-Y. Yang, C.-K. Chang, and J.-J. Huang, “Study
design is necessary. In this paper, a logic circuit module of current collapse behaviors of dual-gate AlGaN/GaN HEMTs
for the GaN MIS-HEMTs platform is designed based on on Si,” IEEE J. Electron Devices Soc., vol. 10, pp. 59–64, 2022,
DG-E device and compared with the conventional design doi: 10.1109/JEDS.2021.3132429.
[12] R. Gong et al., “AlGaN/GaN dual gate MOS HFET for
of the DD-E structure through dynamic and static tests. power device applications,” in Proc. 10th IEEE Int. Conf.
Furthermore, this design also exploits the advantages of Solid-State Integr. Circuit Technol., 2010, pp. 1353–1355,
the MIS-HEMTs platform. The large gate swing lowers the doi: 10.1109/ICSICT.2010.5667654.
[13] M. Cui et al., “Monolithic integration design of GaN-based power
demand for gate voltage protection, resulting in a compact chip including gate driver for high-temperature DC–DC convert-
design. ers,” Jpn. J. Appl. Phys., vol. 58, no. 5, 2019, Art. no. 56505,
doi: 10.7567/1347-4065/ab1313.
[14] A. Li et al., “A monolithically integrated 2-transistor voltage reference
with a wide temperature range based on AlGaN/GaN technology,”
III. CONCLUSION IEEE Electron Device Lett., vol. 43, no. 3, pp. 362–365, Mar. 2022,
In this work, two types of AlGaN/GaN -based NAND logic doi: 10.1109/LED.2022.3146263.
circuits are demonstrated. Static and dynamic measurements [15] Y. Cai et al., “Low ON-state resistance normally-OFF AlGaN/GaN
MIS-HEMTs with partially recessed gate and ZrOx charge trapping
experimentally validate the NAND logic circuit of both struc- layer,” IEEE Trans. Electron Devices, vol. 68, no. 9, pp. 4310–4316,
tures. The dual-gate E-mode device NAND logic cell utilizes Sep. 2021, doi: 10.1109/TED.2021.3100002.
VOLUME 11, 2023 233

[16] H. Amano et al., “The 2018 GaN power electronics roadmap,” J. [24] W. Saito, Y. Takada, M. Kuraguchi, K. Tsuda, and I. Omura,
Phys. D, Appl. Phys., vol. 51, no. 16, Mar. 2018, Art. no. 163001, “Recessed-gate structure approach toward normally off high-Voltage
doi: 10.1088/1361-6463/aaaf9d. AlGaN/GaN HEMT for power electronics applications,” IEEE
[17] M. Giandalia, J. Zhang, and T. Ribarich, “650 V AllGaNTM power Trans. Electron Devices, vol. 53, no. 2, pp. 356–362, Feb. 2006,
IC for power supply applications,” in Proc. IEEE 4th Workshop doi: 10.1109/TED.2005.862708.
Wide Bandgap Power Device Appl. (WiPDA), 2016, pp. 220–222, [25] R. Sun, Y. C. Liang, Y.-C. Yeo, and C. Zhao, “Au-
doi: 10.1109/WiPDA.2016.7799941. free AlGaN/GaN MIS-HEMTs with embedded current sens-
[18] M. Wolf, O. Hilt, and J. Würfl, “Gate control scheme of monolithically ing structure for power switching applications,” IEEE Trans.
integrated normally OFF bidirectional 600-V GaN HFETs,” IEEE Electron Devices, vol. 64, no. 8, pp. 3515–3518, Aug. 2017,
Trans. Electron Devices, vol. 65, no. 9, pp. 3878–3883, Sep. 2018, doi: 10.1109/TED.2017.2717934.
doi: 10.1109/TED.2018.2857848. [26] M. Cui et al., “Monolithic GaN half-bridge stages with
[19] Y. Zhu, M. Cui, A. Li, F. Li, H. Wen, and W. Liu, “Monolithic integrated gate drivers for high temperature DC–DC buck
DFF-NAND and DFF-NOR logic circuits based on GaN MIS- converters,” IEEE Access, vol. 7, pp. 184375–184384, 2019,
HEMT,” in Proc. Int. Conf. IC Des. Technol. (ICICDT), 2021, pp. 1–4, doi: 10.1109/ACCESS.2019.2958059.
doi: 10.1109/ICICDT51558.2021.9626401. [27] X. Li et al., “GaN-on-SOI: Monolithically integrated all-GaN ICs
[20] A. Chvála et al., “Device and circuit models of InAlN/GaN for power conversion,” in Proc. IEEE Int. Electron Devices Meeting
D- and dual-gate E-mode HEMTs for design and characterisa- (IEDM), 2019, pp. 1–4, doi: 10.1109/IEDM19573.2019.8993572.
tion of monolithic NAND logic cell,” in Proc. 13th Int. Conf. [28] Z. Hu, S. Zhou, R. He, and Q. Zhang, “A broadband voltage variable
Des. Technol. Integr. Syst. Nanoscale Era (DTIS), 2018, pp. 1–6, attenuator with high-power tolerance and compact size based on dual-
doi: 10.1109/DTIS.2018.8368565. gate GaN HEMTs,” IEEE Trans. Power Electron., vol. 38, no. 5,
[21] M. Basler et al., “Building blocks for GaN power inte- pp. 6108–6115, May 2023, doi: 10.1109/TPEL.2023.3242474.
gration,” IEEE Access, vol. 9, pp. 163122–163137, 2021, [29] H. Xu, G. Tang, J. Wei, and K. J. Chen, “Integrated high-speed over-
doi: 10.1109/ACCESS.2021.3132667. current protection circuit for GaN power transistors,” in Proc. 31st
[22] M. D. Feuer et al., “Direct-coupled FET logic circuits on InP,” Int. Symp. Power Semicond. Devices ICs (ISPSD), 2019, pp. 275–278,
IEEE Electron Device Lett., vol. 12, no. 3, pp. 98–100, Mar. 1991, doi: 10.1109/ISPSD.2019.8757685.
doi: 10.1109/55.75724. [30] T. Morita et al., “650 V 3.1 mcm2 GaN-based monolithic bidi-
[23] G. Tang et al., “Digital integrated circuits on an E-mode GaN rectional switch using normally-off gate injection transistor,” in
power HEMT platform,” IEEE Electron Device Lett., vol. 38, no. 9, Proc. IEEE Int. Electron Devices Meeting, 2007, pp. 865–868,
pp. 1282–1285, Sep. 2017, doi: 10.1109/LED.2017.2725908. doi: 10.1109/IEDM.2007.4419086.
234 VOLUME 11, 2023

Articles Ucompressed

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Articles Ucompressed

Uploaded by

Copyright:

Available Formats

Received 9 September 2022; revised 14 November 2022; accepted 10 December 2022.

Date of publication 26 December 2022;

3-D Monolithic Stacking of Complementary-FET

CORRESPONDING AUTHOR: D. LU (e-mail: darsenlu@mail.ncku.edu.tw)

INDEX TERMS Complementary FET (CFET), compute-in-memory (CIM), monolithic 3D integration,

I. INTRODUCTION random-access memory (SRAM) is a promising candidate

VOLUME 11, 2023 107

complementary FET (CFET) structure Fig. 1(b) [3], [4], and

108 VOLUME 11, 2023

TABLE 1. Footprint reduction of 6T and 8T SRAM cells using CFET.

FIGURE 5. Transfer characteristics (Id − Vgs ) of n-channel (Blue) and

for n-channel and p-channel CFETs. The threshold volt-

VOLUME 11, 2023 109

write operation for initialization of the neural network (NN)

110 VOLUME 11, 2023

TABLE 2. Mapping of neural network layers 1 and 2 to CIM SRAM macro of

case of 64 × 64 for 5-bit precision the last 4 columns would

VOLUME 11, 2023 111

TABLE 3. Parasitic resistance and capacitance of the read bitline of SRAM

TABLE 4. Key performance parameters of CFET and regular CMOS (FinFET)

VIII. PERFORMANCE BENCHMARKING

112 VOLUME 11, 2023

VOLUME 11, 2023 113

Monolithic Dual-Gate E-Mode

CORRESPONDING AUTHOR: W. LIU (e-mail: wen.liu@xjtlu.edu.cn)

I. INTRODUCTION HEMTs (high-electron-mobility-transistors) with Al2 O3 gate

230 VOLUME 11, 2023

FIGURE 1. Schematic cross-section of the AlGaN/GaN MIS-HEMT

digital circuits, implementing dual-gate E-mode NAND (DG-

VOLUME 11, 2023 231

FIGURE 4. Comparison of DD-E structure and DG- E device at V2 is biased

FIGURE 3. (a) The experimental setup diagram of the DG-NAND logic

a signal generator, and they are set to output different logic

232 VOLUME 11, 2023

VOLUME 11, 2023 233

234 VOLUME 11, 2023

You might also like