Research Article
ISSN 1751-858X
Design and evaluation of a memristor-based look-up table for non-volatile field programmable gate arrays
Received on 16th April 2015
Revised on 23rd September 2015
Accepted on 27th October 2015
doi: 10.1049/iet-cds.2015.0217
www.ietdl.org
Abstract: This study presents the detailed design and analysis of a new memristor-based look-up table (LUT) for field programmable gate arrays (FPGAs). The proposed memory utilises memristors as storage elements with N-type metal–oxide–semiconductor transistors for row access. New WRITE and READ operations are proposed; the proposed LUT requires no additional circuit to handle the WRITE 1 (0) operation. Moreover, the WRITE operation requires only three power lines, and a RESTORE pulse is needed only for the READ 0 operation, thus saving 25% READ time when compared with previous methods. In addition, the proposed method does not require a REFRESH pulse and does not dissipate power during stand-by mode. Extensive simulation results are presented with respect to different operational features such as normalised state parameter (NSP), pulse width and LUT size. In addition to a circuit-level evaluation, the proposed LUT scheme has also been assessed with respect to FPGA implementation. Simulation results using sequential benchmarks mapped on Virtex4 and Virtex5 FPGAs show that the proposed non-volatile (NV) LUT outperforms existing static random access memory (SRAM) cell-based LUTs in terms of performance (delay time and energy-delay product).
IET Circuits Devices Syst., 2016, Vol. 10, Iss. 4, pp. 292–300
© The Institution of Engineering and Technology 2016
Table 1 Features of different NV memories

                             STT-MRAM   PCM       CBRAM    RRAM, memristors
cell size, F²                37         8–16      <20      >5
read latency, ns             <10        48        ∼20      <10
write latency, ns            12.5       40–150    ∼100     ∼10
energy per bit access, pJ    0.02       100       2        2
endurance                    >10^15     10^8      >10^5    10^5

Table 3 Voltage requirements for WRITE and READ operations on M11 using the proposed method

           Voltage on WL1   Voltage on WL2   Voltage on BL1   Voltage on BL2
write 1    Vdd              floating         GND              floating
write 0    −Vdd             floating         GND              floating
read       ±Vdd             GND              to load          to load
Table 4 Truth table of controller

A  B  WL1  WL2  G1  G2  Sel
0  0  1    0    1   0   0
0  1  1    0    0   1   1
1  0  0    1    1   0   0
1  1  0    1    0   1   1

applied at WL1. In addition, the proposed controller block is designed such that simultaneous READ and WRITE operations are not possible. Prior to writing the configuration bits to the LUT, all memristors are assumed to be in the off state (0 state).
The proposed memory block provides many qualitative advantages compared with an SRAM-based LUT. When the power to the proposed LUT is turned off, the NSPs [21, 22] of the memristors are retained due to their NV nature. Therefore, unlike an SRAM-based LUT, there is no power dissipation during the stand-by mode; moreover, the dynamic power dissipation is very small when compared with an SRAM-based LUT, as the latter consumes more power, particularly when switching the inverters. Six transistors (6 T) are usually required for storing a single bit in a complementary MOS (CMOS) SRAM-based LUT; hence, a two-input LUT (four bits) requires 24 transistors. In the proposed memory block, only four memristors and two transistors are required. A detailed evaluation of performance and related metrics is presented in subsequent sections. In the proposed work, the updated version of the simulation program with integrated circuit emphasis (SPICE) model of [23] is utilised for simulating the memristor, as it shows close resemblance to the HP Labs implementation [20]. Moreover, throughout this paper, unless otherwise specified, the default values of the parameters used in [21] are adopted.

3.1 WRITE 1 operation

For the WRITE 1 operation, consider a pulse (+Vdd) applied to WL until the NSP of the memristor (initially at ROFF) changes from ROFF (0) to RON (1); thus, a logic 1 is successfully written to the memory cell. Fig. 2a shows the value of the NSP, whereas Fig. 2b shows the output voltage at the load after applying +Vdd (+0.9 V) until 150 ns. It should be observed that the NSP fully changes from the initial value of 0 to 1 at about 135 ns; thus, the WRITE 1 operation is correctly executed.

3.2 WRITE 0 operation

For a WRITE 0 operation, consider a pulse (−Vdd) applied to WL until the NSP of the memristor (initially at RON) changes from RON (1) to ROFF (0). Thus, a logic 0 is successfully written to the memory cell. As shown in Fig. 2a for WRITE 0, −Vdd (−0.9 V) is applied to WL at 170 ns and the NSP changes from 1 and reaches the value of 0 at about 320 ns, i.e. the WRITE 0 operation is correctly executed.
Thus, unlike [11–13], the proposed WRITE operation requires applying +Vdd (+0.9 V) and −Vdd (−0.9 V) only to WL, i.e. it does not require an additional circuit to monitor the incoming data and then channel it to WL/BL depending on the value of the data for the WRITE operation. In addition, the proposed method does not require the application of +Vdd and +0.5 Vdd to WL and BL to unselect a memristor; therefore, the proposed method uses only three power rails (+Vdd, −Vdd and GND). Moreover, as shown later, in the proposed method the changes to the NSP of the unselected memristors are small when compared with [11–13].
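The controller truth table (Table 4) is small enough to be captured as Boolean relations. The following is a minimal sketch, not the authors' controller: the function name `controller` is hypothetical, and the relations (WL1 = NOT A, WL2 = A, G1 = NOT B, G2 = B, Sel = B) are inferred from the four rows of the table rather than stated in the text.

```python
# Sketch of the Table 4 controller mapping; the name `controller` is hypothetical
# and the Boolean relations are inferred from the four table rows.
def controller(a: int, b: int) -> dict:
    """Return the control signals listed in Table 4 for address bits A and B."""
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("A and B must be 0 or 1")
    return {
        "WL1": 1 - a,  # WL1 is 1 when A = 0
        "WL2": a,      # WL2 is 1 when A = 1
        "G1": 1 - b,   # G1 is 1 when B = 0
        "G2": b,       # G2 is 1 when B = 1
        "Sel": b,      # Sel follows B
    }


# The four rows of Table 4, in the column order WL1, WL2, G1, G2, Sel.
TABLE_4 = {
    (0, 0): (1, 0, 1, 0, 0),
    (0, 1): (1, 0, 0, 1, 1),
    (1, 0): (0, 1, 1, 0, 0),
    (1, 1): (0, 1, 0, 1, 1),
}

for (a, b), row in TABLE_4.items():
    signals = controller(a, b)
    assert tuple(signals[k] for k in ("WL1", "WL2", "G1", "G2", "Sel")) == row
```

The loop at the end simply confirms that the inferred relations reproduce every row of the table.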
3.3 READ 1(0) operation
Fig. 2 Response to a sequence of WRITE 1, READ 1, RESTORE, WRITE 0, READ 0 and RESTORE operations on a single memristor using the proposed method
a Applied voltage and resulting NSP
b Output voltage
accomplishes significant READ and RESTORE time savings when compared with [11–13], as analytically proven next.
Let α be the percentage of READ 1 operations and β be the percentage of READ 0 operations over a total number of N READ operations; Ttotal denotes the total time required to perform all N READs and is given by

$$T_{total} = \alpha \cdot N \cdot (T_1 + T_{r1}) + \beta \cdot N \cdot (T_0 + T_{r0}) \quad (1)$$

A similar simulation assessment (consecutive READ and RESTORE operations) has been performed on the methods presented in [11–13]. The effect on the NSP due to the READ 0 operation is similar to the proposed method; however, the READ 1 operation causes the NSP of the memristor to fall below the threshold value. Therefore, [11–13] require a REFRESH pulse when the NSP reaches the threshold value of 0.5, and a circuit is required to monitor the NSP of a memristor [21]. A more detailed analysis is presented in [21].
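Equation (1) can be evaluated numerically to see where a READ-time saving would come from. The sketch below rests on several assumptions that are not stated in the surviving text: T1 and T0 are read as the READ 1 and READ 0 pulse durations, Tr1 and Tr0 as the corresponding RESTORE durations, the sum is grouped as α·N·(T1 + Tr1) + β·N·(T0 + Tr0), all pulses are taken to have equal width, the READ mix is 50/50, and the baseline is assumed to RESTORE after every READ. The function name is hypothetical.

```python
# Sketch of equation (1) under the assumptions listed above; all numbers are hypothetical.
def total_read_time(alpha, beta, n, t1, tr1, t0, tr0):
    """Total time for n READs: a fraction alpha read 1, a fraction beta read 0."""
    return alpha * n * (t1 + tr1) + beta * n * (t0 + tr0)


n = 1000      # number of READ operations (hypothetical)
pulse = 1.0   # every READ and RESTORE pulse normalised to the same width (assumption)

# Assumed baseline: a RESTORE follows every READ.
baseline = total_read_time(0.5, 0.5, n, pulse, pulse, pulse, pulse)
# Proposed method: no RESTORE after READ 1, so Tr1 = 0.
proposed = total_read_time(0.5, 0.5, n, pulse, 0.0, pulse, pulse)

saving = 1.0 - proposed / baseline
print(saving)  # 0.25
```

With these particular inputs the saving comes out at 25%; other READ mixes or pulse widths would give a different figure.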
Table 5 NSP of unselected memristors under Scenario 1
LUT size | Number of affected, unselected memristors | Worst-case NSP among unselected memristors with initial value of 0

the proposed method the worst case of NSP is well below the threshold (i.e. the threshold is 0.5), whereas for [11–13] the worst-case NSP is always greater than the threshold level. Hence, the previous methods do not hold the memory state of the unselected memristors when performing a WRITE operation on the selected memristors.

Table 6 Average WRITE delay, energy and EDP of SRAM and the proposed memristor-based LUTs
LUT size | Average delay, ns | Average energy, pJ | Average EDP, pJns

Table 7 Average READ delay, energy and EDP of SRAM and proposed memristor-based LUTs
LUT size | Average delay, ps | Average energy, fJ | Average EDP, fJps

Table 8 Evaluation of the proposed and SRAM-based LUTs by mapping ISCAS’89 circuits on Virtex4 FPGA

Rotated benchmark-table data (values and labels in extraction order):
1.3686  3.5403  17.382  115.528  250.254  223.397  278.452  506.961  2800.221  20,665.076
SRAM | Average EDP, pJns
0.0703  0.2109  0.7648 | Proposed | 4.87  10.54  10.04  11.60  25.31  148.31  1105.20
READ
29.61  47.64  105.30  271.46  399.73  377.64  421.63  569.75  1339.15  3650.18
SRAM | Average delay, ns
Proposed | 0.1198  0.3640 | 0.223  0.899  1.325  1.334  1.382  2.240  5.580  15.451 | 0.7553

4.2 READ operation

The process of reading the LUT using the proposed method is performed by reading the selected cell in a row and then using the MUX to output the content of the selected cell. To substantially differentiate the output voltage between READ 0 and READ 1 operations, a grounded-row method (Fig. 2b) that connects the unselected rows to ground rather than leaving them floating (previous methods, Fig. 2a) is utilised in this paper. The following modelling analysis supports the utilisation of the proposed method.
Figs. 2a and b show the equivalent circuits of the 2 × 2 LUT under these methods at steady state. Rij denotes the resistance of the memristor Mij, RTj denotes the on resistance of transistor Tj and Vj denotes the output voltage at node Outj, where i = 1, 2 and j = 1, 2.
Under the floating-row method, let RM = R21 + R22 and RT = RT1 = RT2, so the output voltages V1 and V2 are given as follows

$$V_1 = \frac{V_{in} R_T R_M R_{12}}{S} + \frac{V_{in} R_T^2 (R_M + R_{11} + R_{12})}{S} \quad (3)$$

$$V_2 = \frac{V_{in} R_T R_M R_{11}}{S} + \frac{V_{in} R_T^2 (R_M + R_{11} + R_{12})}{S} \quad (4)$$

where

$$S = R_M R_{11} R_{12} + R_T (R_M R_{11} + R_M R_{12} + 2 R_{11} R_{12}) + R_T^2 (R_M + R_{11} + R_{12})$$

Under the grounded-row method, RT = RT1 = RT2 and the output voltages V1 and V2 are given by

$$V_1 = \frac{V_{in} R_{P1}}{R_{P1} + R_{11}} \quad (5)$$

$$V_2 = \frac{V_{in} R_{P2}}{R_{P2} + R_{12}} \quad (6)$$

where 1/RP1 = 1/RT + 1/R21 and 1/RP2 = 1/RT + 1/R22 are the parallel resistances between the unselected memristors and the transistors.
Consider the floating-row method; as shown in (3) and (4), the difference between the two outputs (V1 and V2) is decided solely by R11 and R12. Moreover, the second term of the two equations is significantly larger, making the difference between the two voltages very small. As an example, consider R11 = 19 kΩ and R12 = R21 = R22 = 100 Ω. As RT = 5.552 kΩ at 32 nm feature size and Vin = 0.9 V, the output voltages are V1 = 0.8399 V and V2 = 0.8695 V. Therefore, it is very difficult to distinguish between the two outputs when the two memristors have different values of NSP.
Now consider the grounded-row method, where the output voltage is effectively set by a voltage divider. Therefore, the output voltage depends mostly on the resistances of the column for the READ operation. Under the same conditions as previously considered for the floating-row method, the output voltages are now given by V1 = 0.0046 V and V2 = 0.4460 V. That is, there is now a clear distinction between a memristor with a 0 NSP and a memristor with a 1 NSP. This is made possible by the parallel arrangement between the unselected memristors and the transistor resistance, i.e. the selected memristor is likely to determine the outcome of the voltage divider.

Fig. 4 Worst-case difference between reading 0 and 1 NSP for different RON and LUT sizes for the grounded-row method

Next, the performance of the grounded-row method is studied for different array sizes; it is observed that the final value of the READ output voltage, and hence the difference, reduces. The worst case occurs when the resistance of all unselected memristors is the smallest (RON), i.e. in this case the unselected memristors on a column have NSPs of 1. Fig. 4 shows the output voltage difference when reading two memristors (one with an NSP of 1 and the other with an NSP of 0) for different values of RON and LUT sizes (the LUT size refers to the LUT memory size) in the worst case of the grounded-row method. As expected, with an increase in memory size, the difference in the output voltage decreases due to the decrease in the load resistance. Moreover, as RON increases (under a constant ROFF/RON ratio), the output voltage difference decreases, as caused by the decrease in the amount of current flowing in the circuit. As shown in Fig. 4, a comparator with a reference voltage of 0.1 V can sense logic 1 and 0 of an LUT with up to eight inputs. Thus, unlike [11–13], the proposed READ operation does not require a sense amplifier.
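The 2 × 2 numerical example in Section 4.2 can be checked with a few lines of nodal analysis. The sketch below is not the authors' simulation: it assumes a steady-state resistive model in which Vin drives the selected row through R11 and R12, each column is loaded to ground by RT, and, in the floating-row case, the unselected memristors R21 and R22 form a series path between the two columns. The function names are hypothetical. Under this reading, the resistor values quoted in the text reproduce the four quoted voltages.

```python
# Sketch: steady-state output voltages of a 2x2 array for both READ methods.
# Circuit topology is an assumed reading of the text; function names are hypothetical.
def parallel(r_a, r_b):
    """Resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


def floating_row(v_in, r11, r12, r21, r22, r_t):
    """Unselected row floating: solve the two column-node equations by Cramer's rule."""
    g11, g12, g_t = 1.0 / r11, 1.0 / r12, 1.0 / r_t
    g_m = 1.0 / (r21 + r22)  # unselected memristors in series between the columns
    a, b = g11 + g_t + g_m, -g_m
    c, d = -g_m, g12 + g_t + g_m
    det = a * d - b * c
    v1 = v_in * (g11 * d - b * g12) / det
    v2 = v_in * (a * g12 - c * g11) / det
    return v1, v2


def grounded_row(v_in, r11, r12, r21, r22, r_t):
    """Unselected row grounded: each column is a plain voltage divider, as in (5) and (6)."""
    r_p1, r_p2 = parallel(r_t, r21), parallel(r_t, r22)
    return v_in * r_p1 / (r_p1 + r11), v_in * r_p2 / (r_p2 + r12)


# Values quoted in the text: R11 = 19 kOhm, R12 = R21 = R22 = 100 Ohm, RT = 5.552 kOhm, Vin = 0.9 V
args = (0.9, 19e3, 100.0, 100.0, 100.0, 5552.0)
print([round(v, 4) for v in floating_row(*args)])
print([round(v, 4) for v in grounded_row(*args)])
```

The floating-row outputs differ by only about 30 mV, whereas the grounded-row outputs differ by roughly 0.44 V, which is the separation the text relies on.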
5 Simulation results

This section presents a simulation-based evaluation of the proposed scheme; a comparison with [11–13] is pursued with respect to the worst-case scenario (the results for the average case are reported in [21]). Moreover, a comparison is performed between the proposed memristor-based and SRAM-based LUTs.

5.1 WRITE operation

A detailed performance analysis is pursued by applying the WRITE operation to LUTs of different sizes and then comparing the results with [11–13]. The following scenarios (as in [22]) are simulated:
Scenario 1: WRITE 1 to all memristors.
Scenario 2: WRITE 0 to all memristors.
Scenario 3: WRITE 0 to a memristor while the NSPs of all memristors are initially 1.
Scenario 4: WRITE 1 to a memristor while the NSPs of all memristors are 0.
The worst […]
[…] memristor-based cell.

Rotated benchmark-table data (values and labels in extraction order):
115.22  109.36  126.85  272.80  1593.16  2.25  8.35  53.31  11,733.93
SRAM | Average EDP, pJns
38,974,005  1.42 × 10^10  8.08 × 10^10  6.13 × 10^11 | Proposed | 1.09 × 10^8  4.52 × 10^8  2.95 × 10^9  6.41 × 10^9  5.89 × 10^9  7.09 × 10^9
WRITE
32.167 | SRAM | 4.22  7.83  12.99  47.32  47.52  49.38  79.18  196.81  541.06
Average delay, ns
14,937.09  25,812.24  49,140.09  124,419.93  183,517.56  178,134.89  192,626.61  283,224.22  684,171.86  1,897,444.42
Proposed
Six input LUTs: 3  6  5  7  10  19  24  49  96  380
Benchmark circuit: 1238  1488  5378  298  400  510  820  953

[…] establish the minimum width of the READ signal. (ii) the MUX outputs the contents of the target cell. Fig. 6 shows the results of the worst-case READ operation applicable to the whole LUT for the proposed method and [11–13]. It is observed that the average and […] time and dissipates less energy for all LUT sizes. In [11–13], the unselected rows for the READ operation are left floating and this
significantly decreases the read voltage difference between the read 0 and 1 operations.
Next, the READ operation performance of the proposed method is compared with SRAM-based volatile LUTs. As shown in Table 7, the proposed method incurs a significantly reduced READ delay and EDP compared with an SRAM-based LUT. This confirms the viability of the proposed cell, because in an FPGA, READ operations are performed more often than WRITE operations.

5.3 Benchmark evaluation

The proposed NV and the volatile SRAM-based LUT designs are also evaluated and compared by implementing the ISCAS’89 sequential benchmark circuits on the Xilinx Virtex4 FPGA (XC4VLX100) and the Virtex5 FPGA (XC5VLX220). Each volatile LUT is replaced with the NV LUT version using the proposed memristor-based scheme; the interconnect routing of the benchmark circuits is kept the same as in the original Xilinx FPGAs. The average delay and energy are found using circuit simulation (for both the proposed and the SRAM-based LUTs) for the WRITE and READ operations of all LUTs used in implementing the different benchmark circuits on the given FPGA type. The results are presented in Tables 8 and 9. The average delay and energy required by the proposed LUT design during the WRITE operation are significantly high. However, for the READ operations, the delay required by the proposed LUT design is very small when compared with an SRAM-based LUT. This confirms the viability of the proposed cell as an LUT for an FPGA, because in an FPGA, READ operations are performed more often than WRITE operations.

Fig. 7 KT and TKT for different benchmark circuits

Next, the average performance of the proposed method is assessed by estimating the number of consecutive READ operations required immediately following a WRITE, such that the total delay incurred by the proposed scheme is equal to the delay incurred by the SRAM-based LUT.
Let TO,I,B denote the average delay time for operation O (either READ, R or WRITE, W) using scheme I (I = SRAM or I = MEM) for benchmark B; in a similar manner, it is possible to define EO,I,B for the average EDP under the same cases. Let

$$T_{O,I,B} = t_{W,I,B} + K_T \, t_{R,I,B} \quad (7)$$

$$E_{O,I,B} = e_{W,I,B} + K_E \, e_{R,I,B} \quad (8)$$
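The break-even idea behind (7) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: KT is read here as the smallest number of READs after one WRITE at which the accumulated delay of the memristor-based LUT no longer exceeds that of the SRAM-based LUT, the function name is hypothetical, and the delay values in the example are made up rather than taken from Tables 8 and 9. The same function applies to KE if EDP values are passed instead of delays.

```python
import math


# Sketch of a break-even READ count for equation (7); name and example values are hypothetical.
def break_even_reads(t_w_mem, t_r_mem, t_w_sram, t_r_sram):
    """Smallest integer K with t_w_mem + K*t_r_mem <= t_w_sram + K*t_r_sram, or None."""
    if t_w_mem <= t_w_sram:
        return 0  # no WRITE penalty to recover
    if t_r_mem >= t_r_sram:
        return None  # no per-READ advantage, so the WRITE penalty is never recovered
    return math.ceil((t_w_mem - t_w_sram) / (t_r_sram - t_r_mem))


# Made-up delays in arbitrary units: slow WRITE and fast READ for the memristor LUT.
k_t = break_even_reads(t_w_mem=1000, t_r_mem=1, t_w_sram=10, t_r_sram=3)
print(k_t)  # 495
```

In this example the WRITE penalty of 990 units is recovered at 2 units per READ, so 495 READs are needed; a larger per-READ advantage lowers the count proportionally.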
(7) and (8) denote the average performance metrics (delay time and EDP) using a given scheme for a benchmark under a single WRITE followed by KT or KE consecutive READ operations. KT and KE are the least integers such that TO,MEM,B is equal to or less than TO,SRAM,B and EO,MEM,B is equal to or less than EO,SRAM,B, respectively, for each benchmark circuit. The results for KT in different benchmark circuits when mapped to the Virtex4 and Virtex5 FPGAs are plotted in Fig. 7. The average value of KT is 656 and 490 for the Virtex4 and Virtex5, respectively; also, across all benchmark circuits, KT is smaller for benchmark circuits mapped to the Virtex5 FPGA. This occurs because the Virtex5 FPGA uses LUTs with a larger number of inputs when mapping the sequential circuits. Hence, the proposed LUT performs better with FPGAs having LUTs of larger size, and therefore the proposed scheme is shown to be scalable. The total time for a WRITE followed by KT READ operations in each benchmark circuit (denoted by TKT) is found and plotted in Fig. 7; TKT is a few microseconds for each benchmark circuit. The results for KE for different benchmark circuits when mapped to the Virtex4 and Virtex5 FPGAs are shown in Fig. 8. Similar to KT, KE is smaller for benchmark circuits mapped to the Virtex5 FPGA; however, the total time for a WRITE followed by KE READ operations in each benchmark circuit (denoted by TKE) is a few milliseconds (Fig. 8). Note, however, that the average values of KE for the Virtex FPGAs are substantially larger than those for KT.

6 Conclusion

This paper has proposed a novel memory block for an LUT of an FPGA using memristors as storage elements and NMOS transistors for column selection. An extensive analysis has been pursued; new WRITE and READ operations have been proposed for this new LUT. They have been analysed using the so-called NSP as a metric for memristor behaviour.
Unlike [11–13], the proposed WRITE operation requires +Vdd (+1.5 V) and −Vdd (−1.5 V) only to WL; moreover, it does not require an additional circuit to monitor the incoming data and then channel it to WL/BL depending on the value of the data for the WRITE operation. Therefore, the proposed memory block requires only three power lines (+Vdd, −Vdd and GND), one less rail than [11–13].
The proposed operations also affect other figures of merit; as there is no need to apply the RESTORE pulse for the READ 1 operation, a substantial reduction of energy dissipation is achieved. Methods such as those found in [11–13] require a RESTORE pulse after multiple READ 1 operations; they use a negative pulse for READ followed by a positive pulse for RESTORE. As shown by simulation results, the NSP of the memristor does not fall below the threshold value using the proposed method; this consists of a +Vdd pulse applied to WL (READ pulse) of the memristor. It has been shown that this READ 1 operation does not affect the value of the NSP in the proposed cell and memory block. Moreover, when applying a READ pulse, it has been shown that the energy dissipated depends on the READ pulse width; so the worst-case delay in the circuit must be considered when selecting the duration of the READ pulse, i.e. the READ pulse must be longer than this delay. The same consideration also applies to the WRITE for correct operation of the proposed memory block. The simulation results show that the consumed energy reduces when smaller feature sizes are employed for the NMOS. As expected, the increase in LUT size results in a modest increase of delay and energy dissipation (and their product); these increases are substantially less than those incurred by using the schemes proposed in [11–13].
In addition to a circuit-level evaluation, the proposed LUT scheme has also been assessed with respect to FPGA implementation. Simulation results using sequential benchmarks mapped on Virtex4 and Virtex5 FPGAs show that the proposed NV LUT outperforms existing SRAM-based LUTs in terms of performance (delay time and energy-delay product). This confirms the viability of the proposed cell as an LUT for an FPGA, as READ operations are performed more often than WRITE operations in an FPGA. In current work, we are developing other NV configurable resources (such as the switch blocks) for an FPGA as well as integrating the designs of all resources (inclusive of the controller) such that performance can be improved further.

7 Acknowledgments

This manuscript is an extended version of [21, 22] by the same authors.

8 References

1 Almurib, H.A.F., Kumar, T.N., Lombardi, F.: ‘Scalable application-dependent diagnosis of interconnects of SRAM-based FPGAs’, IEEE Trans. Comput., 2013, 63, (6), pp. 1540–1550
2 ‘Xilinx Spartan Data sheet’. Available at http://www.xilinx.com
3 ‘Xilinx Spartan™-3AN FPGAs’. Available at http://www.xilinx.com
4 ITRS: ‘International Technology Roadmap For Semiconductors 2011 Edition Executive Summary’, 2011
5 Deepaksubramanyan, B.S., Nu, A.: ‘Analysis of subthreshold leakage reduction in CMOS digital circuits’. Proc. of 50th Midwest Symp. on Circuits and Systems, 2007, no. 1, pp. 1400–1404
6 Zhao, W.S., Belhaire, E., Chappert, C., et al.: ‘Spin transfer torque (STT)-MRAM based run time reconfiguration FPGA circuit’, ACM Trans. Embedded Comput. Syst., 2009, 9, (2), article 14
7 Sumanta, C., Weisheng, Z., Jacques, O.K., et al.: ‘High density asynchronous LUT based on non-volatile MRAM technology’. Int. Conf. on Field Programmable Logic and Applications, 2010, pp. 374–379
8 Chen, Y., Zhao, J., Xie, Y.: ‘3D-nonfar: three-dimensional nonvolatile FPGA architecture using phase change memory’. Proc. 16th ACM/IEEE Int. Symp. Low Power Electronic Design, 2010, pp. 55–60
9 Wen, C., Li, J., Kim, S., et al.: ‘A non-volatile look-up table design using PCM (phase-change memory) cells’. Proc. VLSIC, 2011, pp. 302–303
10 Waser, R., Aono, M.: ‘Nanoionics-based resistive switching memories’, Nat. Mater., 2007, 6, pp. 833–840
11 Ho, Y., Huang, G.M., Li, P.: ‘Dynamical properties and design analysis for nonvolatile memristor memories’, Proc. IEEE Trans. Circuits Syst. I, 2011, 58, (4), pp. 724–736
12 Haron, N.Z., Hamdioui, S.: ‘On defect oriented testing for hybrid CMOS/memristor memory’. Proc. of ATS 2011, pp. 353–358
13 Xu, C., Dong, X., Jouppi, N.P., et al.: ‘Design implications of memristor-based RRAM cross-point structures’. Proc. Des. Autom. Test Eur., 2011, pp. 1–6
14 Turkyilmaz, O., Onkaraiah, S., Reyboz, M., et al.: ‘RRAM-based FPGA for ‘normally off, instantly on’ applications’, IEEE/ACM Int. Symp. on Nanoscale Architectures, 2012, pp. 101–108
15 Cong, J., Xiao, B.: ‘mrFPGA: a novel FPGA architecture with memristor-based reconfiguration’. Proc. of IEEE/ACM Int. Symp. on Nanoscale Architectures, 2011, pp. 1–8
16 Blanchard, P., Gopalan, C., Shields, J., et al.: ‘First commercial demonstration of an emerging memory technology for embedded flash using CBRAM’. Innovative Memory Technologies Workshop 2011, MINATEC, Grenoble – France
17 Burr, G.W., Kurdi, B.N., Scott, J.C., et al.: ‘Overview of candidate device technologies for storage-class memory’, IBM J. Res. Dev., 2008, 52, (4/5), pp. 449–464
18 Kryder, M.H., Kim, C.S.: ‘After hard drives – what comes next?’, IEEE Trans. Magn., 2009, 45, (10), pp. 3406–3413
19 Chua, L.O.: ‘Memristor – the missing circuit element’, IEEE Trans. Circuit Theory, 1971, 18, (5), pp. 507–519
20 Yang, J.J., Pickett, M.D., Li, X., et al.: ‘Memristive switching mechanism for metal/oxide/metal nanodevices’, Nat. Nanotechnol., 2008, 3, pp. 429–433
21 Kumar, T.N., Almurib, H.A.F., Lombardi, F.: ‘On the operational features and performance of a memristor-based cell for a LUT of an FPGA’. 13th IEEE Int. Conf. on Nanotechnology, 2013, pp. 71–76
22 Almurib, H.A.F., Kumar, T.N., Lombardi, F.: ‘A memristor-based LUT for FPGAs’. Proc. of the Ninth IEEE-NEMS, 2014, pp. 448–453
23 Biolek, Z., Biolek, D., Biolova, V.: ‘SPICE model of memristor with nonlinear dopant drift’, Radioengineering, 2009, 18, (2), pp. 210–214
24 Predictive Technology Model (PTM) website. Available at http://www.ptm.asu.edu/