
FPGA Based on Integration of Memristors and CMOS Devices
Wei Wang, Tom T. Jing, and Brian Butcher
College of Nanoscale Science and Engineering, University at Albany, SUNY
Albany, NY 12203, USA

Abstract—This paper introduces a novel CMOS-memristor hybrid reconfigurable architecture, mFPGA. Different from the existing crossbar-based CMOS-memristor architectures, mFPGA mainly consists of 1T1M-like structures that can be fabricated by using a CMOS-compatible process. These devices can efficiently establish FPGA block memories. More importantly, novel CMOS-memristor routing switches are developed to replace the CMOS routing switches to achieve significant density enhancement and power reduction. The simulation results demonstrate that 2D and 3D mFPGAs provide at least a 2X to 3X overall improvement in terms of area reduction, and a 20% lower power consumption, compared with corresponding CMOS FPGA architectures.

I. INTRODUCTION

A memristor, or “memory resistor”, was first proposed in 1971 by Prof. Leon Chua but was only recently discovered by researchers at Hewlett-Packard Labs [1]. This new device was shown to be the fourth fundamental electronic device in addition to resistors, capacitors, and inductors. One application of memristor and memristive devices is for a type of non-volatile random access memory, resistive memory (RRAM). RRAM applications are generally implemented as a two-terminal memristor (1M, Fig. 1a) through vias on top of the source or drain terminal of a transistor (1T, Fig. 1b). In this one-transistor-one-memristor (1T1M) structure, the transistor is used to write and read the resistance state of the 1M.

Figure 1. (a) Hysteresis characteristic of the resistive bistable memristor (1M), (b) 1T1M cell.

Integrating the memristor with CMOS transistors leads to the further advancement of building elements in FPGAs. High-end elemental-improvements for FPGAs are of importance for research and development since much of FPGA revenue comes from the significant increase of high-end FPGA chips. A typical FPGA architecture is shown in Fig. 2, which includes a cluster-based logic block (LB), switch block (SB), and connection block (CB). In terms of total area, delay, and power consumption of a typical FPGA, the LB accounts for 22%, while SBs, CBs, configuration memories, and interconnects account for 78% (both local and global interconnects are considered) [2].

Figure 2. A typical FPGA architecture (LB: logic block, CB: connection block, and SB: switch block).

In this paper, we develop various 1T1M cell designs and use them to build memory, routing, and logic components for CMOS-memristor hybrid FPGA, namely, mFPGA. Architectures based on 2D and 3D mFPGAs are detailed and followed by a benchmark circuit simulation. The results demonstrate that mFPGA, compared with CMOS FPGA, achieves a significant improvement in area reduction and power consumption.

The rest of this paper is organized as follows. Section II reviews the existing work and introduces new 1T1M-based elements. Section III presents the circuit designs of mFPGA as well as the architectures of 2D and 3D mFPGAs. Section IV shows the results of circuit simulation and Section V concludes the paper.

II. CMOS-MEMRISTOR ELEMENTS

A. Crossbar and 1T1M Elements

Several CMOS-memristor hybrid reconfigurable architectures, such as nanoPLA [4], CMOL [5], FPNI [6], and nFPGA [7, 8], have been developed by using the memristor-based crossbars with CMOS logic circuits. The main challenge of building these nanoscale crossbar-based FPGAs is that the integration of crossbars and CMOS devices is not easily established [8]. In particular, it is difficult to fabricate and orient special metal pins (two different heights) to connect CMOS transistors with the top or bottom nanowires [6, 8]. This design issue motivates us to consider 1T1M devices.

The 1T1M devices are easily integrated with CMOS devices to build FPGA components. Also, compared with crossbars that generally require diode-like junctions, 1T1M structures can be efficiently established by using various memristor materials. These memristors were successfully fabricated and integrated with CMOS devices as emerging memory circuits [9-11], establishing the solid foundation of the applications of 1T1M devices.

However, these existing studies mainly focus on the memory applications. Our work focuses on 1T1M structures as elements for

This work was supported in part by SRC FCRP, NSF EMT, Air
Force STTR, and International Sematech research grants.

978-1-4244-5309-2/10/$26.00 ©2010 IEEE 1963

Authorized licensed use limited to: Chungnam National University. Downloaded on October 14,2021 at 03:09:54 UTC from IEEE Xplore. Restrictions apply.
FPGA applications. The proposed FPGA is similar to the existing magnetic memory-based FPGA [12]. The CMOS-memristor hybrid devices are expected to provide a higher density and a fabrication flow that is more compatible with foundry-type CMOS technology. For example, HfO2 and ZrO2 materials have been used in CMOS and can be used to establish efficient resistive memory devices [9] as well as mFPGA.

B. 1T1M-based FPGA Elements

1) NOR-based 1T1M Array as Memory Element

Using 1T1M structures, the NOR-cell array provides faster access compared with the NAND-cell array that requires a larger area and a slower access. The NOR 1T1M array is used to replace SRAM cells used in the block memory of an FPGA, which leads to a 6X density enhancement as expected and referenced in [13]. Since information storage of the NOR array is based on the resistance change in the memristor, almost no power is required to maintain data storage. A 1T1M cell can significantly reduce the standby power relative to that of a corresponding SRAM cell, PSRAM, with a 6X improvement as shown in [13].

2) Two Novel CMOS-Memristor Routing Switches

As shown in Fig. 3a, the conventionally used CMOS routing switches consist of a pass transistor controlled by an SRAM cell of six transistors to provide the routing function. By integrating memristor devices with CMOS devices, we can achieve two new routing switches to improve the area and standby power consumption of FPGA. Note that the standby power of the 7-transistor (7T) SRAM routing switch depends on the standby power of the 6-transistor SRAM cell, PSRAM.

Figure 3. (a) A 7T SRAM routing switch and its ON/OFF operations, (b) A 2T1M routing switch and its ON/OFF operations, (c) A 2T2M routing switch and its ON/OFF operations.

TABLE I
COMPARISON BETWEEN THE 7T SRAM, 2T1M, AND 2T2M ROUTING SWITCHES FOR “ON” AND “OFF” CASES
Switch | Programming Stage | Operating Stage
7T SRAM | “ON”: 1.2V; “OFF”: 0V | “ON”: Vg = 1.2V; “OFF”: Vg = 0V; VAB = 0V – 1.2V
2T1M | “ON”: VP1 = 3V and VP2 = 0V; “OFF”: VP1 = 0V and VP2 = 3V | VAB = 0V – 1.2V
2T2M | “ON”: VP = -3V; “OFF”: VP = 3V | “ON”: Vg = 1.19V; “OFF”: Vg = 0.01V; VAB = 0V – 1.2V
Note: The pass transistor has Vdd = 1.2V, VT = 0.4V [14] and the memristor requires ±3V to program [10, 11, 13].

Fig. 3b compares our first design, the 2T1M structure, with Fig. 3a: two programming transistors are used to configure one junction. During the configuration stage, the programming voltages VP1 and VP2 pass through the two programming transistors to set the 1M junction to RON or ROFF (see Table I). If the 1M junction is RON, it will connect “A” and “B” during the operational stage. If the 1M junction is ROFF, it will disconnect “A” and “B”. The programming voltage |VP1 - VP2| is much larger than the operational voltage |VAB| as shown in Table I.

Our second design, a 2T2M structure, is seen in Fig. 3c. Compared with both Fig. 3a and Fig. 3b, this design has one programming transistor with a programming voltage VP used to configure two complementary junctions connected to Vdd and GND, respectively. Note that |Vdd - VP| is used to program the Vdd junction, while VP is used for the GND junction. (When one junction is ON, the other junction is OFF.) Table I summarizes the typical values of VP used to program the pass transistor in the programming stage; it also shows how combining one RON junction and one ROFF junction works as a voltage divider during the operational stage. The ROFF/RON ratio determines the pass transistor gate voltage VG. ROFF/RON must be large enough to ensure both VG < VT when the pass transistor is OFF, and VG > VT when the pass transistor is ON. Generally, the memristor has a ratio in the range of 10^4 [10, 11, 13]; this is much higher than the typical ratio of 119, which is assumed as a lower bound. This 2T2M routing switch operates exactly like an SRAM routing switch where VG has a modest reduction that does not affect the ON/OFF operation of the pass transistor.

Compared with a 7T SRAM cell, 2T2M or 2T1M can have around 3.5X density improvement. The standby power of a 2T2M or 2T1M switch is equal to 1/3 (2 transistors / 6 transistors) of PSRAM. The 2T1M switch is expected to operate faster than a 7T SRAM switch or a 2T2M switch, but the 2T2M leads to a more reliable circuit operation than both SRAM and 2T1M designs. The proposed mFPGA structure considers both new routing switches.

III. ARCHITECTURES AND CIRCUIT DESIGNS OF MFPGA

By utilizing 1T1M devices, we subsequently can develop circuits and architectures of 2D and 3D mFPGAs that achieve a higher density and lower power consumption.

A. 2D mFPGA

The proposed 2D mFPGA maintains the architecture of the baseline 2D FPGA (Fig. 2), while utilizing RRAM devices to build several FPGA building blocks. In particular, the block RAM memory is based on the NOR 1T1M arrays; CB and SB are designed by using 2T1M or 2T2M routing switches; LB is designed by using 1T1M cells.

Figure 4. The proposed mFPGA structure considers both new routing switches. (a) The 4-by-4 CB structure, (b) Replacement of a 7T switch with a 2T1M switch, (c) Replacement of a 7T switch with a 2T2M switch.

The 4-by-4 CMOS CB is shown in Fig. 4a [15]. By using the 2T1M routing device to replace the 7T SRAM switch, we can obtain a high-density and low-power design (Fig. 4b). We also consider the use of the 2T2M switch to replace the 7T switch in the CB (see Fig. 4c).

By using the 2T1M or 2T2M switch to replace each 7T SRAM switch in Fig. 5a and 5b, we can obtain CMOS-memristor designs for SB-1 and SB-2 (these two designs are for 1-bit switching operation as shown in Fig. 5c and 5d, respectively). The 4-bit CB and SB operations are summarized in Table II in terms of the area, delay, and power performance estimations.
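The 2T2M switch that replaces each 7T switch in the CB and SB designs sets its pass-transistor gate voltage with the resistive divider described in Section II-B. The following is a minimal Python sketch of that divider, not the authors' simulation setup. It assumes ideal resistors and assumes that the ON state places the Vdd-side junction in RON and the GND-side junction in ROFF (with OFF swapping them); the function names `gate_voltage` and `switch_levels` are hypothetical.

```python
def gate_voltage(vdd, r_to_vdd, r_to_gnd):
    """Voltage at the midpoint of two series resistors between Vdd and GND."""
    return vdd * r_to_gnd / (r_to_vdd + r_to_gnd)


def switch_levels(vdd, off_on_ratio, r_on=1.0):
    """Return (Vg when ON, Vg when OFF) for a 2T2M-style divider.

    Assumption: ON means the Vdd-side junction is RON and the GND-side
    junction is ROFF; OFF swaps the two states.
    """
    r_off = off_on_ratio * r_on
    vg_on = gate_voltage(vdd, r_to_vdd=r_on, r_to_gnd=r_off)
    vg_off = gate_voltage(vdd, r_to_vdd=r_off, r_to_gnd=r_on)
    return vg_on, vg_off


VDD, VT = 1.2, 0.4  # values taken from the note under Table I

for ratio in (119, 1e4):
    vg_on, vg_off = switch_levels(VDD, ratio)
    # The pass transistor switches correctly only if VT lies between the two levels.
    print(f"ROFF/RON = {ratio:g}: Vg(ON) = {vg_on:.4f} V, "
          f"Vg(OFF) = {vg_off:.4f} V, valid = {vg_off < VT < vg_on}")
```

Under these assumptions a ratio of 119 gives gate voltages of 1.19 V and 0.01 V, the values listed for the 2T2M switch in Table I, and a ratio of 10^4 moves both levels even closer to the supply rails.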

Figure 5. The 1-bit SB operation: (a) SB-1: CMOS design [15], (b) SB-2: CMOS design [15], (c) SB-1: CMOS-memristor design (four branches are required; only one branch is shown) and its equivalent circuit, (d) SB-2: CMOS-memristor design (four branches are required; only one branch is shown) and its equivalent circuit.

TABLE II
PERFORMANCE ESTIMATION OF CB AND SB DESIGNS FOR 4-BIT OPERATIONS
Block | Metric | CMOS SRAM design | 2T1M-based design | 2T2M-based design
4 × 4 CB (Fig. 4) | Area | 112 n-transistors | 8 n-transistors | 32 n-transistors
4 × 4 CB (Fig. 4) | Delay | τPASS | τPASS | τPASS
4 × 4 CB (Fig. 4) | Standby Power | 16 PSRAM | 5.3 PSRAM | 5.3 PSRAM
Four SB-1’s (Fig. 5 a, b) | Area | 440 n-transistors | 240 n-transistors | 240 n-transistors
Four SB-1’s (Fig. 5 a, b) | Delay | τSB-1 | τSB-1/3 | τSB-1
Four SB-1’s (Fig. 5 a, b) | Standby Power | 40 PSRAM | 13.3 PSRAM | 13.3 PSRAM
Four SB-2’s (Fig. 5 c, d) | Area | 432 n-transistors | 192 n-transistors | 192 n-transistors
Four SB-2’s (Fig. 5 c, d) | Delay | τSB-2 | τSB-2/3 | τSB-2
Four SB-2’s (Fig. 5 c, d) | Standby Power | 48 PSRAM | 16 PSRAM | 16 PSRAM
Note: The p-transistor in the SRAM cell is equivalent to an n-transistor. Each p-type transistor in the buffer (two inverters) or individual inverter is similar to two n-type transistors.

In the proposed mFPGA architecture, the LB CMOS design is also modified by utilizing 1T1M cells. Most commercial FPGAs use LBs (Fig. 6a) with basic-logic elements (BLEs, Fig. 6b) based on look-up tables (LUTs). Each LB contains N BLEs fed by I cluster inputs. The BLE consists of a K-input LUT and register, which feed a two-input MUX that determines whether the registered or unregistered LUT output drives the BLE output. A truth table and the related implementation for a 2-input LUT are illustrated by Fig. 6c and Fig. 6d, respectively. In the conventional design, SRAM is used as shown in Fig. 6e. Here, we use a 1T1M cell to replace each SRAM cell in the design of the FPGA LB LUT (Fig. 6f) to reduce area and power consumption.

As we analyzed before, one SRAM cell consists of six transistors while a 1T1M cell uses only one transistor. The BLE LUT consists of four SRAM cells and six additional pass transistors. Therefore, compared with the SRAM-based LUT design (24 + 6 = 30 transistors), the 1T1M-based design (4 + 6 = 10 transistors) can have a 3X density improvement. The standby power of the four 1T1M cells is equal to 1/6 (4 transistors / 24 transistors) of the four SRAM cells, leading to 2/3 PSRAM.

Figure 6. (a) LB (BLE: basic logic elements) [3], (b) BLE [3], (c) A truth table for a 2-input logic, (d) A LUT implementation, (e) LUT with SRAM, (f) LUT with 1T1M.

B. 3D mFPGA

The proposed mFPGA can be modified to a 3D FPGA structure. As shown in Fig. 7, the 3D mFPGA is a face-to-face two-layer structure. The use of 1T1M cells and circuits in 3D mFPGA is expected to give new opportunities, similar to 2D mFPGA, which will reduce power density and thus improve thermal performance of 3D ICs.

Figure 7. Architecture of the 2-layer 3D mFPGA based on the face-to-face bonding using bumps without area penalty.

IV. BENCHMARK SIMULATIONS

In order to demonstrate the efficiency of the proposed 2D and 3D mFPGAs, we simulate the Toronto-20 FPGA benchmark circuits for 2D and 3D mFPGAs following the methods described in [7, 8]. The simulation results are summarized in Table III, including area, critical-path delay, and total-power consumption (including both dynamic and standby power values). These results provide an accurate comparison between the proposed 2D and 3D mFPGAs with
their corresponding CMOS FPGA designs [2, 7, 8] for logic circuits (not considering block memories).

The simulation results in Table III demonstrate that the 2D mFPGA provides 2X to 3X improvement over the 2D-baseline FPGAs in terms of area, while the 3D mFPGA’s improvement over the 3D CMOS FPGA is less than 2X. The delay improvement of the 2T1M-based mFPGA is around 10% over the CMOS FPGA. Note that the 2T2M-based mFPGA has the same delay performance as the CMOS FPGA. Both 2T1M- and 2T2M-based mFPGAs have the total power improvement over the CMOS FPGA, which is around 20% for both 2D and 3D cases.

V. CONCLUSIONS

In this paper, we have introduced a new CMOS-memristor hybrid FPGA platform, mFPGA, which is based on the utilization of the CMOS-memristor devices. The proposed mFPGA has the following properties: (1) 1T1M-like devices are CMOS-compatible and easily integrated with CMOS circuits; (2) Novel 2T1M and 2T2M routing switches can significantly reduce the complexity of FPGA routing resources, leading to 2X to 3X density and 20% power improvement for the logic part; (3) RRAM-based block memories have 6X area and power reduction compared with the SRAM block memories; (4) The LB can utilize 1T1M to get efficient LUT design; (5) The mFPGAs maintain the existing designs and can utilize the existing CAD tools of the CMOS FPGA; (6) By using the face-to-face 3D integration method, a 2D mFPGA can be extended to a 3D FPGA, thus providing further density, speed, and power enhancement.

Due to the aforementioned superior properties, mFPGAs are expected to lead to innovation and technology breakthroughs in establishing reconfigurable ICs in the future nanotechnology era.

REFERENCES

[1] L.O. Chua, “Memristor-the missing circuit element”, IEEE Trans. on Circuit Theory, vol. 18, pp. 507-519, 1971.
[2] M. Lin, et al., “Performance benefits of monolithically stacked 3D-FPGA”, in: Proc. of FPGA, 2006.
[3] V. Betz, et al., “Architecture and CAD for deep-submicron FPGAs”, Kluwer Academic Publishers, 1999.
[4] A. DeHon, “Nanowire-based programmable architectures”, ACM J. on Emerg. Tech. Comput. Systems, vol. 1, pp. 109-162, 2005.
[5] D.B. Strukov and K.K. Likharev, “CMOL FPGA: a reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices”, Nanotechnology, vol. 16, pp. 888-900, 2005.
[6] G. Snider and S. Williams, “Nano/CMOS architecture using a field-programmable nanowire interconnect”, Nanotechnology, vol. 18, 2007.
[7] C. Dong, D. Liu, S. Haruehanroengra, and W. Wang, “3D nFPGA: A reconfigurable architecture for 3D CMOS/nanomaterial hybrid digital circuits”, IEEE Trans. Circuits and Systems I, vol. 54, pp. 2489-2501, 2007.
[8] D. Tu, M. Liu, W. Wang, and S. Haruehanroengra, “3D CMOL: A 3D FPGA using CMOS/nanomaterial hybrid digital circuits”, IET (IEE) Micro and Nano Letters, vol. 2, no. 2, pp. 40-45, 2007.
[9] W. Guan, S. Long, Q. Liu, M. Liu, and W. Wang, “Nonpolar nonvolatile resistive switching in Cu doped ZrO2”, IEEE Electron Device Lett., vol. 29, no. 5, pp. 434-438, 2008.
[10] W. Guan, M. Liu, S. Long, Q. Liu, and W. Wang, “On the resistive switching mechanisms of Cu/ZrO2:Cu/Pt”, Appl. Phys. Lett., vol. 93, pp. 223506, 2008.
[11] C.Y. Liu, et al., “Bistable resistive switching of a sputter-deposited Cr-doped SrZrO3 memory film”, IEEE Electron Device Lett., vol. 26, no. 6, pp. 351-353, 2005.
[12] N. Bruchon, et al., “Magnetic tunnelling junction based FPGA”, in: Proc. of FPGA, pp. 123-130, 2006.
[13] J. Mustafaa et al., “Comparison of three different architectures for active resistive memories”, Inter. J. Elect. & Commun., pp. 345-352, 2007.
[14] ITRS 2007 edition, emerging research devices.
[15] G. Lemieux et al., “Circuit design of routing switches”, in: Proc. of FPGA, pp. 452-455, 2002.

TABLE III
SIMULATION RESULT COMPARISON BETWEEN CMOS FPGAS AND MFPGAS USING BENCHMARK CIRCUITS (32NM TECHNOLOGY)
Columns 2-7: Area (µm2). Columns 8-11: Critical Path Delay (ns). Columns 12-17: Power (mW).
Circuit | 2D CMOS [8] | 2D mFPGA (2T1M) | 2D mFPGA (2T2M) | 3D CMOS | 3D mFPGA (2T1M) | 3D mFPGA (2T2M) | 2D CMOS [8] or mFPGA (2T2M) | 2D mFPGA (2T1M) | 3D CMOS or mFPGA (2T2M) | 3D mFPGA (2T1M) | 2D CMOS [7] | 2D mFPGA (2T1M) | 2D mFPGA (2T2M) | 3D CMOS | 3D mFPGA (2T1M) | 3D mFPGA (2T2M)
Alu4 | 137700 | 55080 | 60588 | 68850 | 47540 | 58654 | 5.1 | 4.59 | 2.04 | 1.8 | 0.062 | 0.056 | 0.058 | 0.0362 | 0.028 | 0.03
apex2 | 166050 | 66420 | 79704 | 83025 | 53210 | 61875 | 6 | 5.4 | 2.4 | 2.2 | 0.067 | 0.06 | 0.063 | 0.0421 | 0.033 | 0.036
apex4 | 414619 | 165848 | 197865 | 207310 | 102924 | 115432 | 5.5 | 4.95 | 2.2 | 2 | 0.042 | 0.038 | 0.04 | 0.0203 | 0.014 | 0.015
clma | 623194 | 249278 | 348976 | 311597 | 144639 | 157865 | 13.1 | 11.79 | 5.24 | 4.7 | 0.2 | 0.18 | 0.19 | 0.178 | 0.147 | 0.149
diffeq | 100238 | 40095 | 48765 | 50119 | 40048 | 43876 | 6 | 5.4 | 2.4 | 2.2 | 0.024 | 0.022 | 0.023 | 0.0152 | 0.011 | 0.014
elliptic | 213638 | 85455 | 118769 | 106819 | 62728 | 71903 | 8.6 | 7.74 | 3.44 | 3.1 | 0.069 | 0.052 | 0.053 | 0.0502 | 0.04 | 0.042
ex1010 | 391331 | 156532 | 219876 | 195666 | 98266 | 113245 | 9 | 8.1 | 3.6 | 3.2 | 0.113 | 0.092 | 0.093 | 0.096 | 0.079 | 0.081
ex5p | 100238 | 40095 | 52794 | 50119 | 40048 | 48765 | 5.1 | 4.59 | 2.04 | 1.8 | 0.0314 | 0.018 | 0.025 | 0.0105 | 0.009 | 0.011
frisc | 230850 | 92340 | 101574 | 115425 | 66170 | 85439 | 11.3 | 10.17 | 4.52 | 4.1 | 0.0627 | 0.046 | 0.049 | 0.0472 | 0.047 | 0.048
misex3 | 124538 | 49815 | 69877 | 62269 | 44908 | 47654 | 5.3 | 4.77 | 2.12 | 1.9 | 0.0513 | 0.036 | 0.047 | 0.0299 | 0.022 | 0.023
pdc | 369056 | 147622 | 189078 | 184528 | 93811 | 107653 | 9.6 | 8.64 | 3.84 | 3.5 | 0.101 | 0.081 | 0.089 | 0.087 | 0.061 | 0.063
s298 | 166050 | 66420 | 73062 | 83025 | 53210 | 55282 | 10.7 | 9.63 | 4.28 | 3.9 | 0.042 | 0.028 | 0.029 | 0.0261 | 0.019 | 0.021
S38417 | 462713 | 185085 | 204651 | 231357 | 112543 | 127530 | 7.3 | 6.57 | 2.92 | 2.6 | 0.124 | 0.102 | 0.108 | 0.122 | 0.091 | 0.093
S38584.1 | 438413 | 175365 | 210049 | 219207 | 107683 | 111235 | 4.8 | 4.32 | 1.92 | 1.7 | 0.136 | 0.112 | 0.115 | 0.121 | 0.098 | 0.099
seq | 151369 | 60548 | 67456 | 75685 | 50274 | 55672 | 5.4 | 4.86 | 2.16 | 1.9 | 0.065 | 0.049 | 0.052 | 0.042 | 0.033 | 0.034
spla | 326025 | 130410 | 179028 | 163013 | 85205 | 90378 | 7.3 | 6.57 | 2.92 | 2.6 | 0.087 | 0.068 | 0.069 | 0.0754 | 0.041 | 0.044
tseng | 78469 | 31388 | 53297 | 39235 | 35694 | 38976 | 6.3 | 5.67 | 2.52 | 2.3 | 0.029 | 0.016 | 0.017 | 0.0101 | 0.009 | 0.01
Avg. | 264382 | 105753 | 133848 | 132191 | 72877 | 81849 | 7.44 | 6.69 | 2.97 | 2.68 | 0.077 | 0.062 | 0.066 | 0.059 | 0.046 | 0.048
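The improvement figures quoted in Section IV can be cross-checked against the "Avg." row of Table III. The short Python sketch below does only that arithmetic; the dictionary layout and the helper names `gain` and `reduction_pct` are illustrative choices rather than part of the original simulation flow, and the numbers are copied from the table.

```python
# Average ("Avg.") row of Table III, copied from the table.
AVG = {
    "area_um2": {
        "2D CMOS": 264382, "2D 2T1M": 105753, "2D 2T2M": 133848,
        "3D CMOS": 132191, "3D 2T1M": 72877, "3D 2T2M": 81849,
    },
    "delay_ns": {
        # The 2T2M-based mFPGA shares the CMOS delay column in Table III.
        "2D CMOS": 7.44, "2D 2T1M": 6.69,
        "3D CMOS": 2.97, "3D 2T1M": 2.68,
    },
    "power_mw": {
        "2D CMOS": 0.077, "2D 2T1M": 0.062, "2D 2T2M": 0.066,
        "3D CMOS": 0.059, "3D 2T1M": 0.046, "3D 2T2M": 0.048,
    },
}


def gain(metric, dim, switch):
    """CMOS value divided by mFPGA value (2.5 means 2.5X smaller)."""
    row = AVG[metric]
    return row[f"{dim} CMOS"] / row[f"{dim} {switch}"]


def reduction_pct(metric, dim, switch):
    """Percentage reduction of the mFPGA value relative to the CMOS baseline."""
    row = AVG[metric]
    base = row[f"{dim} CMOS"]
    return 100.0 * (base - row[f"{dim} {switch}"]) / base


for dim in ("2D", "3D"):
    for switch in ("2T1M", "2T2M"):
        print(f"{dim} {switch}: area {gain('area_um2', dim, switch):.2f}X, "
              f"power -{reduction_pct('power_mw', dim, switch):.1f}%")
    print(f"{dim} 2T1M: delay -{reduction_pct('delay_ns', dim, '2T1M'):.1f}%")
```

On these averages the 2D area gain is about 2.5X for the 2T1M design and about 2X for the 2T2M design, both 3D area gains are below 2X, and the 2T1M delay reduction is close to 10%, in line with the discussion in Section IV.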
