
IET Circuits, Devices & Systems

Research Article

Design and evaluation of a memristor-based look-up table for non-volatile field programmable gate arrays

ISSN 1751-858X
Received on 16th April 2015
Revised on 23rd September 2015
Accepted on 27th October 2015
doi: 10.1049/iet-cds.2015.0217
www.ietdl.org

Haider Abbas F. Almurib 1, Thulasiraman Nandha Kumar 1 ✉, Fabrizio Lombardi 2


1 Faculty of Engineering, The University of Nottingham Malaysia Campus, Semenyih, Selangor, Malaysia
2 Department of Electrical and Computer Engineering, Northeastern University, Boston 02115, USA
✉ E-mail: nandhakumaar.t@nottingham.edu.my

Abstract: This study presents the detailed design and analysis of a new memristor-based look-up table (LUT) for field
programmable gate arrays (FPGAs). The proposed memory utilises memristors as storage elements with N-type metal–
oxide–semiconductor transistors for row access. New WRITE and READ operations are proposed; the proposed LUT
requires no additional circuit to handle the WRITE 1 (0) operation. Moreover, the WRITE operation of the
proposed method requires only three power lines, and a RESTORE pulse is needed only for the READ 0
operation, thus saving 25% of the READ time when compared with previous methods.
In addition, the proposed method does not require the REFRESH pulse and does not dissipate power during stand-by
mode. Extensive simulation results are presented with respect to different operational features such as normalised
state parameter, pulse width and LUT size. In addition to a circuit-level evaluation, the proposed LUT scheme has also
been assessed with respect to FPGA implementation. Simulation results using sequential benchmarks mapped on
Xilinx Virtex4 and Virtex5 FPGAs show that the proposed non-volatile LUT outperforms existing static random
access memory cell-based LUTs in terms of performance.

1 Introduction

Field programmable gate arrays (FPGAs) offer programmability at relatively low development cost and good performance [1]. The common FPGA architecture consists of a regular, programmable two-dimensional (2D) array of configurable logic blocks with programmable input/outputs along its perimeter [2]. All configurable resources [inclusive of the look-up tables (LUTs)] are controlled by the configuration bits stored in a static random access memory (SRAM) cell [2]. However, an SRAM is unable to retain the configuration bits should the power be lost. Hence, as a possible solution, non-volatile (NV) flash memory is integrated into the FPGA for storing the configuration bits [3]. However, this leads to issues such as a larger silicon area, an increase in cost [4] and a very slow data retrieval time. Moreover, at nanoscale technology nodes, a substantial increase of leakage current is encountered when the FPGA is in stand-by mode [5], hence causing additional power dissipation. Thus, an alternative NV memory block (as an LUT) based on the so-called memristor as storage device is proposed in this paper to overcome the above-mentioned issues.

Emerging NV memory technologies such as spin transfer torque (STT)-magnetic RAM (MRAM) [6, 7], phase change memory (PCM) [8–10], conductive bridge RAM (CBRAM) [10] and resistive RAM (RRAM) [11–14, 16] have been proposed to potentially supersede SRAM-based LUTs, because they show nearly zero stand-by power, high speed, high density and high endurance at low cost. A comparison between different NV memories [16–18] as per different metrics is presented in Table 1. The on/off resistance ratio of an STT-MRAM cell is comparatively poor; moreover, for robust operation, one of the most crucial requirements is the design of the sense amplifier [7]. The PCM-based FPGA of [8] incurs a long configuration time and high power dissipation when the power is on; the PCM-based LUT of [9] suffers from a high active leakage power during normal operation. Furthermore, a PCM requires a large programming current and incurs a resistance drift due to material relaxation. The switching voltage of the CBRAM is low, resulting in poor retention [10].

The memristor is a passive element postulated in [19] and realised by Hewlett-Packard (HP) Labs using a nanoscale thin film of titanium dioxide [20]. Memristor-based memories have been extensively analysed in the technical literature [11–15]; these memories have been advocated as a potential replacement for conventional NV flash memories due to their high density and low power consumption [12, 15]. An LUT in an FPGA must meet specific requirements to ensure that a memristor-based implementation is effective: its size is usually small (Table 1) and once programming (WRITE) for an application is accomplished, the FPGA requires a fast and NV READ operation (Table 1). A novel architecture that avoids the 3D stacking process for the interconnect of an FPGA has been proposed in [15]; it uses only memristors and metal wires for implementing the interconnect. Turkyilmaz et al. [14] have proposed a different technique; it uses NVSRAM memories with bipolar OxRRAM. A power reduction in stand-by mode is reported compared with a conventional SRAM-based FPGA. However, the above techniques employ six transistors (for SRAM) and two memristors per information bit; perfect clock gating is also assumed. Therefore, these methods suffer from dynamic power dissipation and incur substantial area and delay overheads. Moreover, these methods have complicated WRITE and READ operations.

This paper proposes a novel memory block for an LUT of an FPGA using memristors. As shown later in this paper, the proposed LUT has a fast access time and no power dissipation during stand-by mode. New WRITE and READ operations are proposed for the memory block; the proposed WRITE 1 (0) operation is performed by applying a pulse of +Vdd (−Vdd) only at the word line (WL). Similarly, the READ operation is carried out by applying a pulse of ±Vdd at WL, while the status of the selected memristor is found at the bit line (BL). The substantial differences between the proposed method and previous methods [11–13] are listed in Table 2. Simulation results for benchmark circuits in FPGA implementation are provided to substantiate the applicability of the proposed LUT scheme and its superior performance compared with an SRAM-based LUT.

IET Circuits Devices Syst., 2016, Vol. 10, Iss. 4, pp. 292–300
© The Institution of Engineering and Technology 2016
Table 1 Features of different NV memories

Feature | STT-MRAM | PCM | CBRAM | RRAM, memristors
cell size, F^2 | 37 | 8–16 | <20 | >5
read latency, ns | <10 | 48 | ∼20 | <10
write latency, ns | 12.5 | 40–150 | ∼100 | ∼10
energy per bit access, pJ | 0.02 | 100 | 2 | 2
endurance | >10^15 | 10^8 | >10^5 | 10^5

Table 2 Differences between proposed method and [11–13]

Feature | [11–13] | Proposed method
WRITE operation | data monitor circuit required; performed at WL and BL | data monitor circuit not required; performed only at WL
READ operation | negative pulse | positive pulse
RESTORE pulse | positive pulse; required for both READ 1 and 0 | negative pulse; required only for READ 0
REFRESH pulse | required | not required
V/2 biasing | required | not required
data integrity | affected | not affected
number of power rails | four | three

The submitted paper is significantly different in technical content from [21, 22] in terms of simulation and analysis of (i) performance of the normalised state parameters (NSPs) of the unselected memristors at different array sizes under the WRITE operation, (ii) a new READ method using an equivalent circuit model, (iii) worst-case performance of the READ and WRITE operations, (iv) the proposed NV and SRAM-based LUT designs for Xilinx Virtex4 and Virtex5 FPGAs to implement the International Symposium on Circuits and Systems (ISCAS)'89 sequential benchmark circuits.

2 Proposed LUT

The proposed memory block for a two-input LUT (Fig. 1) consists of two parts: (a) the memory array that consists of four nanocross wires (in which a memristor is connected at every junction); (b) a controller (dotted box of Fig. 1) that is utilised to provide appropriate control signals to the memory. The main focus of this paper is to design an LUT using memristors and evaluate its performance for the READ and WRITE operations. A controller for the memory inputs has been designed; it has 70 metal–oxide–semiconductor (MOS) transistors at the 45 nm technology node [24] and a static leakage current of 6061.568 pA; no issue with noise margin has been observed.

In the memory array, the horizontal wires are the WLs and the vertical wires are the BLs. Every BL is connected to the ground (GND) through an N-type MOS (NMOS) transistor (T1, T2); according to the input data, the controller block controls the data to be driven on WL. Moreover, it switches the transistors on and off by controlling the gate signals (G1 and G2) and selects (Sel) the appropriate BL value to the output (Out). A and B are the inputs to the control block, i.e. the two inputs of the LUT; Out is the output of the LUT. WRITE is executed on a column-wise basis when WriteEn is high. To WRITE to all memristors connected to a column, the controller selects the corresponding BL, generates the appropriate voltage based on the input data and applies it to the respective WLs. The voltage requirements for the signals of the WRITE and READ operations of the selected memristor (M11) are shown in Table 3. The truth table of the functions of the controller block is given in Table 4.

Table 3 Voltage requirements for WRITE and READ operations on M11 using the proposed method

Operation | Voltage on WL1 | Voltage on WL2 | Voltage on BL1 | Voltage on BL2
write 1 | Vdd | floating | GND | floating
write 0 | −Vdd | floating | GND | floating
read | ±Vdd | GND | to load | to load

As an example, let the inputs of the LUT (AB) be 00; the controller selects the memristor M11 for the operation (i.e. WRITE/READ) by turning on T1 and applying the appropriate voltage to WL1. Similarly, for selecting the memristor M22, the appropriate voltage is applied to WL2 and T2 is turned on. The output multiplexer is enabled only during the READ operation. The READ is performed on each memristor; the READ value is obtained at Out. ReadEn is made high to READ a memristor. Then, depending on the values of A and B, the controller selects the corresponding BL of the memristor and applies the READ voltage (Table 3) to the respective WL of the memristor. Moreover, the controller generates the appropriate select signal (0/1) to propagate the READ value to Out. For example, to READ M11, BL1 is selected, Sel is '0' and the READ voltage is

Fig. 1 Proposed memory block as a two-input LUT implemented using memristors
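
As an illustration of the controller behaviour of Section 2, the following minimal Python sketch decodes the LUT inputs (A, B) into the control signals listed in Table 4. The names `ControlSignals`, `controller` and `selected_memristor` are hypothetical and introduced only for this example; the mapping of Mij to row i (WL) and column j (BL) is an assumption inferred from the M11 and M22 examples in the text, and the sketch models only the logic values, not the voltages of Table 3.

```python
from typing import NamedTuple


class ControlSignals(NamedTuple):
    """Logic-level control signals of the two-input LUT (columns of Table 4)."""
    wl1: int
    wl2: int
    g1: int
    g2: int
    sel: int


def controller(a: int, b: int) -> ControlSignals:
    """Decode the LUT inputs (A, B) into control signals following Table 4."""
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("A and B must be 0 or 1")
    # In Table 4, A picks the active word line and B picks the active bit line.
    wl1, wl2 = (1, 0) if a == 0 else (0, 1)
    g1, g2 = (1, 0) if b == 0 else (0, 1)
    sel = b  # Sel routes the selected bit line to Out during a READ
    return ControlSignals(wl1, wl2, g1, g2, sel)


def selected_memristor(a: int, b: int) -> str:
    """Name of the selected cell, assuming Mij sits on WLi and BLj."""
    s = controller(a, b)
    row = 1 if s.wl1 else 2
    col = 1 if s.g1 else 2
    return f"M{row}{col}"


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, controller(a, b), selected_memristor(a, b))
```

With AB = 00 the sketch selects M11 (WL1 driven, T1 on) and with AB = 11 it selects M22 (WL2 driven, T2 on), which agrees with the two examples given in the text.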

applied at WL1. In addition, the proposed controller block is designed such that simultaneous READ and WRITE operations are not possible. Prior to writing the configuration bits to the LUT, all memristors are assumed to be in the off state (0 state).

Table 4 Truth table of controller

A | B | WL1 | WL2 | G1 | G2 | Sel
0 | 0 | 1 | 0 | 1 | 0 | 0
0 | 1 | 1 | 0 | 0 | 1 | 1
1 | 0 | 0 | 1 | 1 | 0 | 0
1 | 1 | 0 | 1 | 0 | 1 | 1

The proposed memory block provides many qualitative advantages compared with an SRAM-based LUT. When the power to the proposed LUT is turned off, the NSPs [21, 22] of the memristors are retained due to their NV nature. Therefore, unlike an SRAM-based LUT, there is no power dissipation during the stand-by mode; moreover, the dynamic power dissipation is very small when compared with an SRAM-based LUT, as the latter consumes more power, particularly when switching the inverters. Six transistors (6 T) are usually required for storing a single bit in a complementary MOS (CMOS) SRAM-based LUT; hence, a two-input LUT requires 24 transistors. In the proposed memory block, only four memristors and two transistors are required. A detailed evaluation of performance and related metrics is performed in subsequent sections. In the proposed work, the updated version of the simulation program with integrated circuit emphasis (SPICE) model of [23] is utilised for simulating the memristor, as it shows close resemblance to the HP Labs implementation [20]. Moreover, throughout this paper, unless otherwise specified, the default values of the parameters used in [21] are adopted.

3 Operational features

Consider first WRITE and READ on a single bit LUT in which one terminal of the memristor is connected to the WL, while the other terminal is connected to the BL; an NMOS acts as a load and, by controlling it (on/off), the corresponding memristor is selected/unselected for the WRITE/READ operation.

3.1 WRITE 1 operation

For the WRITE 1 operation, a pulse (+Vdd) is applied to WL until the NSP of the memristor (initially at ROFF) changes from ROFF (0) to RON (1); thus, a logic 1 is successfully written to the memory cell. Fig. 2a shows the value of the NSP, whereas Fig. 2b shows the output voltage at the load after applying +Vdd (+0.9 V) until 150 ns. It should be observed that the NSP fully changes from the initial value of 0 to 1 at about 135 ns; thus, the WRITE 1 operation is correctly executed.

3.2 WRITE 0 operation

For a WRITE 0 operation, a pulse (−Vdd) is applied to WL until the NSP of the memristor (initially at RON) changes from RON (1) to ROFF (0). Thus, a logic 0 is successfully written to the memory cell. As shown in Fig. 2a for WRITE 0, −Vdd (−0.9 V) is applied to WL at 170 ns and the NSP changes from 1 and reaches the value of 0 at about 320 ns, i.e. the WRITE 0 operation is correctly executed.

Thus, unlike [11–13], the proposed WRITE operation requires applying +Vdd (+0.9 V) and −Vdd (−0.9 V) only to WL, i.e. it does not require an additional circuit to monitor the incoming data and then channel it to WL/BL depending on the value of the data for the WRITE operation. In addition, the proposed method does not require the application of +Vdd and +0.5 Vdd to WL and BL to unselect a memristor; therefore, the proposed method uses only three power rails (+Vdd, −Vdd and GND). Moreover, as shown later, in the proposed method the changes to the NSP of the unselected memristors are small when compared with [11–13].

3.3 READ 1(0) operation

A pulse of +Vdd (with a duration of 10 ns) is applied to WL (READ pulse) of the memristor with NSP at 1(0) and the READ value is found across the NMOS. The simulation results are shown in Figs. 2a and b. The READ 1 operation performed at 155 ns shows that it did not affect the NSP of the memristor. Therefore, unlike [11–13], the proposed method does not require a RESTORE pulse following the READ 1 operation. Thus, the proposed method

Fig. 2 Response to a sequence of WRITE 1, READ 1, RESTORE, WRITE 0, READ 0 and RESTORE operations on a single memristor using the proposed method
a Applied voltage and resulting NSP
b Output voltage
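
The READ and RESTORE time budget of Section 3.3, formalised in (1) and (2), can be checked numerically. The sketch below is only an illustration under the paper's stated assumption that all pulse widths are equal; the function names are hypothetical, α is passed as a fraction rather than a percentage, and the default 10 ns pulse width is the READ pulse duration quoted in the text.

```python
def total_read_time(alpha, n_reads, t1, tr1, t0, tr0):
    """Total time for n_reads READs, equation (1).

    alpha is the fraction of READ 1 operations; beta = 1 - alpha is the
    fraction of READ 0 operations.
    """
    beta = 1.0 - alpha
    return alpha * n_reads * (t1 + tr1) + beta * n_reads * (t0 + tr0)


def saving_vs_previous(alpha, pulse_width=10e-9, n_reads=1000):
    """Fractional time saving of the proposed method over [11-13].

    Assumes equal pulse widths (T1 = T0 = Tr0 = T). The previous methods
    RESTORE after every READ (Tr1 = Tr0 = T); the proposed method
    RESTOREs only after a READ 0 (Tr1 = 0).
    """
    t = pulse_width
    previous = total_read_time(alpha, n_reads, t, t, t, t)
    proposed = total_read_time(alpha, n_reads, t, 0.0, t, t)  # equation (2)
    return (previous - proposed) / previous


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(f"alpha = {alpha}: saving = {saving_vs_previous(alpha):.2%}")
```

Under these assumptions the saving reduces to α/2, so it is 25% for an equal mix of READ 1 and READ 0 operations and reaches its 50% maximum when every READ is a READ 1.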

accomplishes significant READ and RESTORE time savings when compared with [11–13], as analytically proven next.

Let α be the percentage of READ 1 operations and β be the percentage of READ 0 operations over a total number of N READ operations; Ttotal denotes the total time required to perform all N READs and is given by

$$T_{total} = \alpha \cdot N \cdot (T_1 + T_{r1}) + \beta \cdot N \cdot (T_0 + T_{r0}) \qquad (1)$$

where T1 is the time required for a READ 1 operation, Tr1 is the RESTORE time after a READ 1, T0 is the time required for the READ 0 operation and Tr0 is the RESTORE time after the READ 0 operation.

Therefore, for the methods of [11–13], Tr1 = Tr0 ≠ 0, while for the proposed method, Tr1 = 0 and Tr0 ≠ 0. Assuming that all pulse widths are equal (i.e. T1 = T0 = Tr0 = T), the total time to perform all READ operations using the proposed method is given by

$$T_{total} = \alpha \cdot N \cdot T + \beta \cdot N \cdot (2 \cdot T) \qquad (2)$$

The largest saving in total time (Ts-max) that can be achieved using the proposed method is 50% of the Ttotal of [11–13]; it occurs when all READ operations are READ 1 (α = 1, β = 0). Alternatively, for an equal number of READ 1 and READ 0 operations (α = 0.5 and β = 0.5), the proposed method achieves a saving of 25% for Ttotal. Thus, the proposed method saves 25% of the READ and RESTORE times compared with [11–13].

A READ 0 operation performed at 355 ns shows (Fig. 2b) that the output voltage is low during the READ 0 operation. However, as shown in Fig. 2a, the NSP changes slightly toward RON. To restore the NSP to its original value, a pulse of −Vdd is applied for 10 ns; this signal is referred to as the RESTORE pulse. A circuit is required to control the application of the RESTORE pulse; this circuit operates depending on the NSP value of the memristor when a READ operation is performed. A RESTORE pulse is then applied at 365 ns to restore the NSP to its original value (0). However, this pulse does not fully restore the NSP to its original value due to the insufficient width of the RESTORE pulse. Hence, consecutive READ 0 operations may cause the NSP to reach the threshold level, thus eventually changing the value stored in the memory.

An assessment of the number of consecutive READ 0 and RESTORE operations can be found in [21]; a ringing behaviour has been reported in the operation of the memristor. For ringing to have no effect on the memory state, the width of the READ pulse must be smaller than or equal to the RESTORE pulse width [21]. Therefore, the proposed method does not require a REFRESH pulse to be applied after a few READ operations [21]. As described previously, an additional circuit is required to decide when to apply the RESTORE pulse.

A similar simulation assessment (consecutive READ and RESTORE operations) has been performed on the methods presented in [11–13]. The effect of the READ 0 operation on the NSP is similar to the proposed method; however, the READ 1 operation causes the NSP of the memristor to fall below the threshold value. Therefore, [11–13] require a REFRESH pulse when the NSP reaches the threshold value of 0.5 and a circuit is required to monitor the NSP of a memristor [21]. A more detailed analysis is presented in [21].

4 LUT operations

In this section, different features related to the LUT-level performance of the proposed LUT block (Fig. 1) are pursued in terms of the WRITE and READ memory operations by considering the scenario under which the memristor M11 is selected for the WRITE and READ operations, while the other memristors (M12, M21 and M22) are unselected.

4.1 WRITE operation

Now consider that all the memristors are in the ROFF state and the proposed WRITE 1 and 0 operations are sequentially performed on M11; as expected, the NSP of M11 changes from 0 to 1 (1 to 0) during the WRITE 1 (0) operation. However, the NSPs of the unselected memristors M21 and M12 show a slight change toward RON (well below the threshold value) during the WRITE 1 operation and no change during the WRITE 0 operation, respectively; therefore, the correct logic value (0) is stored in M21 and M12. The NSP of the unselected memristor M22 does not change during the WRITE 1 operation, but it changes significantly toward RON during the WRITE 0 operation. Hence, to overcome the change in the NSP of some of the unselected memristors, the WRITE operation of the proposed memory (i.e. the programming of an LUT) executes in two phases as follows:

• Phase 1 (Refresh Phase): Before writing to the array, all the memristors are written to 0.
• Phase 2 (Write 1): Write the 1 to all selected memristors of the LUT.

A similar experiment was performed using the methods proposed in [11–13] and the results are presented in Table 5. Columns 2 and 3 show the number of unselected memristors that are affected during the WRITE 1 operation in the proposed method and [11–13], respectively. Columns 4 and 5 represent the worst-case change in NSP of those affected (unselected) memristors using the proposed and previous methods [11–13]. The number of unselected memristors affected in the proposed method is twice that of [11–13]. However, for

Fig. 3 Equivalent circuit for LUT reading


a Floating-row method
b Grounded-row method
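
The two-phase programming of Section 4.1 (refresh every cell to 0, then write 1 only where the configuration requires it) can be sketched as a schedule generator. This is a minimal sketch, not the authors' controller: the function name `program_lut`, the step labels and the cell-by-cell ordering are assumptions made for illustration (the paper states that WRITE is executed on a column-wise basis), and no voltages or pulse widths are modelled.

```python
def program_lut(config_bits):
    """Return the WRITE schedule for a memristor LUT as (operation, row, column).

    config_bits is a list of rows, each a list of 0/1 configuration bits,
    indexed so that config_bits[i - 1][j - 1] is the bit stored in Mij.
    """
    steps = []
    # Phase 1 (Refresh Phase): write 0 to every memristor in the array.
    for i, row in enumerate(config_bits, start=1):
        for j, _ in enumerate(row, start=1):
            steps.append(("WRITE0", i, j))
    # Phase 2 (Write 1): write 1 only to the cells whose configuration bit is 1.
    for i, row in enumerate(config_bits, start=1):
        for j, bit in enumerate(row, start=1):
            if bit == 1:
                steps.append(("WRITE1", i, j))
    return steps


if __name__ == "__main__":
    # Two-input XOR as an example configuration: M12 and M21 store 1.
    for step in program_lut([[0, 1], [1, 0]]):
        print(step)
```

Because every cell is at 0 when Phase 2 starts, the disturbance of unselected cells during WRITE 0 reported above for M22 does not arise in Phase 2, which only issues WRITE 1 operations.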

the proposed method the worst case of NSP is well below the threshold (i.e. the threshold is 0.5), whereas for [11–13] the worst-case NSP is always greater than the threshold level. Hence, the previous methods do not hold the memory state of the unselected memristors when performing a WRITE operation on the selected memristors.

Table 5 NSP of unselected memristors under Scenario 1

LUT size | Number of affected, unselected memristors (proposed method) | Number of affected, unselected memristors ([11–13]) | Worst-case NSP among unselected memristors with initial value of 0 (proposed method) | Worst-case NSP among unselected memristors with initial value of 0 ([11–13])
2×2 | 2 | 1 | 0.179 | 0.868
4×4 | 6 | 3 | 0.242 | 0.838
6×6 | 10 | 5 | 0.261 | 0.827
8×8 | 14 | 7 | 0.269 | 0.812

4.2 READ operation

The process of reading the LUT using the proposed method is performed by reading the selected cell in a row and then using a MUX to output the content of the selected cell. To substantially increase the output voltage difference between the READ 0 and READ 1 operations, a grounded-row method (Fig. 3b) that connects the unselected rows to ground rather than leaving them floating (previous methods, Fig. 3a) is utilised in this paper. The following modelling analysis supports the utilisation of the proposed method.

Figs. 3a and b show the equivalent circuits of the 2 × 2 LUT under these methods at steady state. Rij denotes the resistance of the memristor Mij, RTj denotes the on resistance of transistor Tj and Vj denotes the output voltage at node Outj, where i = 1, 2 and j = 1, 2.

Under the floating-row method, let RM = R21 + R22 and RT = RT1 = RT2, so the output voltages V1 and V2 are given as follows

$$V_1 = \frac{V_{in} R_T R_M}{S} R_{12} + \frac{V_{in} R_T^2 \left(R_M + R_{11} + R_{12}\right)}{S} \qquad (3)$$

$$V_2 = \frac{V_{in} R_T R_M}{S} R_{11} + \frac{V_{in} R_T^2 \left(R_M + R_{11} + R_{12}\right)}{S} \qquad (4)$$

where

$$S = R_M R_{11} R_{12} + R_T \left(R_M R_{11} + R_M R_{12} + 2 R_{11} R_{12}\right) + R_T^2 \left(R_M + R_{11} + R_{12}\right)$$

Under the grounded-row method, RT = RT1 = RT2 and the output voltages V1 and V2 are given by

$$V_1 = \frac{V_{in} R_{P1}}{R_{P1} + R_{11}} \qquad (5)$$

$$V_2 = \frac{V_{in} R_{P2}}{R_{P2} + R_{12}} \qquad (6)$$

where 1/RP1 = 1/RT + 1/R21 and 1/RP2 = 1/RT + 1/R22 are the parallel resistances between the unselected memristors and the transistors.

Consider the floating-row method; as shown in (3) and (4), the difference between the two outputs (V1 and V2) is decided solely by R11 and R12. Moreover, the second term of the two equations is

Fig. 4 Worst-case difference between reading 0 and 1 NSP for different RON and LUT sizes for the grounded-row method

Fig. 5 Worst-case WRITE operation for different sizes of LUTs results


a Delay
b Energy
c Energy delay product (EDP)

Table 6 Average WRITE delay, energy and EDP of SRAM and the proposed memristor-based LUTs

LUT size | Average delay, ns (SRAM) | Average delay, ns (Proposed) | Average energy, pJ (SRAM) | Average energy, pJ (Proposed) | Average EDP, pJns (SRAM) | Average EDP, pJns (Proposed)
2×2 | 0.103 | 281.19 | 0.0016 | 30.53 | 0.00017 | 10,666
4×4 | 0.206 | 631.28 | 0.0067 | 97.50 | 0.0014 | 98,448
6×6 | 0.309 | 1151.76 | 0.0151 | 213.85 | 0.0047 | 435,339
8×8 | 0.412 | 1821.81 | 0.0269 | 379.5 | 0.0111 | 1,285,099

significantly larger, making the difference between the two voltages very small. As an example, consider R11 = 19 kΩ and R12 = R21 = R22 = 100 Ω. As RT = 5.552 kΩ at 32 nm feature size and Vin = 0.9 V, the output voltages are V1 = 0.8399 V and V2 = 0.8695 V. Therefore, it is very difficult to distinguish between the two outputs when the two memristors have different values of NSP.

Now consider the grounded-row method, where the output voltage is effectively set by a voltage divider. Therefore, the output voltage depends mostly on the resistances of the column for the READ operation. Under the same conditions as previously considered for the floating-row method, the output voltages are now given by V1 = 0.0046 V and V2 = 0.4460 V. That is, there is now a clear distinction between a memristor with a 0 NSP and a memristor with a 1 NSP. This is made possible by the parallel arrangement between the unselected memristors and the transistor resistance, i.e. the selected memristor is likely to determine the outcome of the voltage divider.

Next, the performance of the grounded-row method is studied for different array sizes; it is observed that the final value of the READ output voltage, and hence the difference, reduces as the array size increases. The worst case occurs when

Fig. 6 Worst-case READ operation for different sizes of LUTs results


a Delay
b Energy
c EDP
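
The worked example of Section 4.2 can be reproduced from (3)–(6). The sketch below is a plain numerical check rather than a circuit simulation; the function names are hypothetical, and the column-2 divider is written with R_P2 in both numerator and denominator, which is an assumption about the intended form of (6) (it does not affect this example because R21 = R22).

```python
def floating_row_outputs(v_in, r11, r12, r21, r22, r_t):
    """Output voltages (V1, V2) of the 2 x 2 LUT with the unselected row floating."""
    r_m = r21 + r22  # unselected cells in series between the two bit lines
    s = (r_m * r11 * r12
         + r_t * (r_m * r11 + r_m * r12 + 2 * r11 * r12)
         + r_t ** 2 * (r_m + r11 + r12))
    common = v_in * r_t ** 2 * (r_m + r11 + r12) / s  # second term of (3) and (4)
    v1 = v_in * r_t * r_m * r12 / s + common          # equation (3)
    v2 = v_in * r_t * r_m * r11 / s + common          # equation (4)
    return v1, v2


def grounded_row_outputs(v_in, r11, r12, r21, r22, r_t):
    """Output voltages (V1, V2) with the unselected row connected to ground."""
    r_p1 = r_t * r21 / (r_t + r21)  # RT in parallel with R21
    r_p2 = r_t * r22 / (r_t + r22)  # RT in parallel with R22
    v1 = v_in * r_p1 / (r_p1 + r11)  # equation (5)
    v2 = v_in * r_p2 / (r_p2 + r12)  # column-2 divider, assumed form of (6)
    return v1, v2


if __name__ == "__main__":
    # Values quoted in the text: R11 = 19 kOhm, R12 = R21 = R22 = 100 Ohm,
    # RT = 5.552 kOhm, Vin = 0.9 V.
    params = dict(v_in=0.9, r11=19e3, r12=100.0, r21=100.0, r22=100.0, r_t=5552.0)
    print("floating row:", floating_row_outputs(**params))
    print("grounded row:", grounded_row_outputs(**params))
```

With the quoted values, the floating-row outputs differ by roughly 30 mV, whereas the grounded-row outputs differ by roughly 440 mV, which is the separation the text relies on.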

Table 7 Average READ delay, energy and EDP of SRAM and proposed memristor-based LUTs

LUT size | Average delay, ps (SRAM) | Average delay, ps (Proposed) | Average energy, fJ (SRAM) | Average energy, fJ (Proposed) | Average EDP, fJps (SRAM) | Average EDP, fJps (Proposed)
2×2 | 320.25 | 3.073 | 0.46623 | 5.64 | 149.31 | 16.80
4×4 | 1094.96 | 5.87 | 1.71822 | 22.01 | 1881.39 | 122.05
6×6 | 3362.08 | 8.63 | 9.3831 | 49.23 | 3,154,677 | 398.82
8×8 | 14,922.75 | 11.37 | 692.839 | 87.29 | 10,339,066 | 928.43

Table 8 Evaluation of the proposed and SRAM-based LUTs by mapping ISCAS'89 circuits on Virtex4 FPGA

Benchmark circuit | Two input LUTs | Three input LUTs | Four input LUTs | WRITE average delay, ns (Proposed) | WRITE average delay, ns (SRAM) | WRITE average EDP, pJns (Proposed) | WRITE average EDP, pJns (SRAM) | READ average delay, ns (Proposed) | READ average delay, ns (SRAM) | READ average EDP, fJns (Proposed) | READ average EDP, fJns (SRAM)
298 | 4 | 6 | 11 | 11,443.12 | 3.917 | 17,862,481 | 0.38903 | 0.1137 | 17.168 | 0.037 | 0.452
400 | 11 | 14 | 19 | 22,960.73 | 7.938 | 69,873,405 | 1.50597 | 0.2313 | 33.293 | 0.147 | 1.691
510 | 13 | 17 | 55 | 47,936.33 | 16.186 | 3.26 × 10^8 | 7.18203 | 0.4672 | 75.274 | 0.689 | 8.761
820 | 10 | 26 | 72 | 62,885.94 | 21.238 | 5.6 × 10^8 | 12.352 | 0.6131 | 98.692 | 1.18 | 15.05
953 | 25 | 33 | 123 | 103,235.7 | 34.744 | 1.52 × 10^9 | 33.6919 | 1.0016 | 163.822 | 3.22 | 41.56
1238 | 27 | 41 | 155 | 128,498.1 | 43.198 | 2.37 × 10^9 | 52.3959 | 1.2448 | 204.626 | 5.01 | 64.88
1488 | 26 | 48 | 183 | 149,829.4 | 50.312 | 3.23 × 10^9 | 71.5095 | 1.4491 | 239.448 | 6.83 | 88.89
5378 | 65 | 96 | 206 | 202,309.5 | 68.973 | 5.65 × 10^9 | 123.559 | 1.9989 | 307.866 | 11.96 | 145.82
15,850 | 140 | 270 | 376 | 428,570.5 | 147.639 | 2.46 × 10^10 | 532.564 | 4.296 | 629.474 | 52.03 | 606.08
35,932 | 1192 | 489 | 1307 | 1,435,265 | 493.230 | 2.78 × 10^11 | 6031.66 | 14.34 | 2126.055 | 588.04 | 6923.80

the resistance of all unselected memristors is the smallest (RON), i.e. in this case the unselected memristors on a column have NSPs of 1. Fig. 4 shows the output voltage difference when reading two memristors (one with an NSP of 1 and the other with an NSP of 0) for different values of RON and LUT sizes (the LUT size refers to the LUT memory size) in the worst case of the grounded-row method. As expected, with an increase in memory size, the difference in the output voltage decreases due to the decrease in the load resistance. Moreover, as RON increases (under a constant ROFF/RON ratio), the output voltage difference decreases, as caused by the decrease in the amount of current flowing in the circuit. As shown in Fig. 4, the reference voltage of a comparator at 0.1 V can sense logic 1 and 0 of an LUT with up to eight inputs. Thus, unlike [11–13], the proposed READ operation does not require a sense amplifier.

5 Simulation results

This section presents a simulation-based evaluation of the proposed scheme; a comparison with [11–13] is pursued with respect to the worst-case scenario (the results for the average case are reported in [21]). Moreover, a comparison is performed between the proposed memristor-based and SRAM-based LUTs.

5.1 WRITE operation

A detailed performance analysis is pursued by applying the WRITE operation to LUTs of different sizes and then comparing the results with [11–13]. The following scenarios (as in [22]) are simulated:

Scenario 1: WRITE 1 to all memristors.
Scenario 2: WRITE 0 to all memristors.
Scenario 3: WRITE 0 to a memristor while the NSPs of all memristors are initially 1.
Scenario 4: WRITE 1 to a memristor while the NSPs of all memristors are 0.

The worst values for the WRITE operation at different array sizes are calculated based on the simulation results obtained for the above four scenarios. From the results (Fig. 5), it can be observed that the worst-case delay incurred by the proposed method is slightly less than that of [11–13]. The energy dissipated and the EDP of the proposed method, however, are significantly less than those of [11–13].

Next, the WRITE operation performance of the proposed method is compared with SRAM-based volatile LUTs. In this volatile LUT, each cell is a 6 T SRAM designed at 32 nm. Table 6 shows the results for the SRAM-based and the proposed NV LUTs; the average write time and the average EDP of the SRAM-based scheme are significantly less than those of the proposed memristor-based cell.

5.2 READ operation

The READ operation is performed on different array sizes considering the scenarios stated in [22]:

Scenario 5: READ 0 when only the NSP of one memristor is 0 (i.e. the NSPs of all other memristors are 1).
Scenario 6: READ 0 when the NSPs of all memristors are 0.
Scenario 7: READ 1 when only the NSP of one memristor is 1 (i.e. the NSPs of all other memristors are 0).
Scenario 8: READ 1 when the NSPs of all memristors are 1.

The process of reading the LUT is performed in two steps: (i) the row on which the selected (target) cell is located is initially read to determine the worst propagation delay of the READ, and therefore establish the minimum width of the READ signal; (ii) the MUX outputs the contents of the target cell. Fig. 6 shows the results of the worst-case READ operation applicable to the whole LUT for the proposed method and [11–13]. It is observed that the average and worst-case values of the EDP in both methods increase as the size of the array increases. Of the two, the proposed method incurs a significantly smaller EDP value for all LUT sizes. Moreover, the proposed method requires significantly less READ time and dissipates less energy for all LUT sizes. In [11–13], the unselected rows for the READ operation are left floating and this

Table 9 Evaluation of the proposed and SRAM-based LUTs by mapping ISCAS'89 circuits on Virtex5 FPGA

Benchmark circuit | Two input LUTs | Three input LUTs | Four input LUTs | Five input LUTs | Six input LUTs | WRITE average delay, ns (Proposed) | WRITE average delay, ns (SRAM) | WRITE average EDP, pJns (Proposed) | WRITE average EDP, pJns (SRAM) | READ average delay, ns (Proposed) | READ average delay, ns (SRAM) | READ average EDP, pJns (Proposed) | READ average EDP, pJns (SRAM)
298 | 1 | 3 | 3 | 3 | 4 | 14,937.09 | 4.22 | 38,974,005 | 0.7553 | 0.1198 | 29.61 | 0.0703 | 1.3686
400 | 2 | 5 | 6 | 9 | 4 | 25,812.24 | 7.83 | 1.09 × 10^8 | 2.25 | 0.223 | 47.64 | 0.2109 | 3.5403
510 | 0 | 2 | 5 | 11 | 17 | 49,140.09 | 12.99 | 4.52 × 10^8 | 8.35 | 0.3640 | 105.30 | 0.7648 | 17.382
820 | 4 | 3 | 7 | 25 | 47 | 124,419.93 | 32.167 | 2.95 × 10^9 | 53.31 | 0.899 | 271.46 | 4.87 | 115.528
953 | 13 | 9 | 10 | 31 | 71 | 183,517.56 | 47.32 | 6.41 × 10^9 | 115.22 | 1.325 | 399.73 | 10.54 | 250.254
1238 | 9 | 9 | 19 | 39 | 60 | 178,134.89 | 47.52 | 5.89 × 10^9 | 109.36 | 1.334 | 377.64 | 10.04 | 223.397
1488 | 11 | 10 | 24 | 24 | 76 | 192,626.61 | 49.38 | 7.09 × 10^9 | 126.85 | 1.382 | 421.63 | 11.60 | 278.452
5378 | 32 | 41 | 49 | 59 | 80 | 283,224.22 | 79.18 | 1.42 × 10^10 | 272.80 | 2.240 | 569.75 | 25.31 | 506.961
15,850 | 89 | 94 | 96 | 197 | 163 | 684,171.86 | 196.81 | 8.08 × 10^10 | 1593.16 | 5.580 | 1339.15 | 148.31 | 2800.221
35,932 | 1152 | 326 | 380 | 129 | 542 | 1,897,444.42 | 541.06 | 6.13 × 10^11 | 11,733.93 | 15.451 | 3650.18 | 1105.20 | 20,665.076

Fig. 7 KT and TKT for different benchmark circuits

significantly decreases the read voltage difference between the read 0
and 1 operations.
Next, the READ performance of the proposed method is
compared with that of SRAM-based volatile LUTs. As shown in Table 7,
the proposed method incurs a significantly reduced READ delay
and EDP compared with an SRAM-based LUT. This confirms the
viability of the proposed cell, because in an FPGA, READ
operations are performed more often than WRITE operations.

5.3 Benchmark evaluation

The proposed NV and the volatile SRAM-based LUT designs are
also evaluated and compared by implementing the ISCAS'89
sequential benchmark circuits on the Xilinx Virtex4 FPGA
(XC4VLX100) and the Virtex5 FPGA (XC5VLX220). Each
volatile LUT is replaced with the NV LUT version using the
proposed memristor-based scheme; the interconnect routing of the
benchmark circuits is kept the same as in the original Xilinx
FPGAs. The average delay and energy are found using circuit
simulation (for both the proposed and the SRAM-based LUTs) for
the WRITE and READ operations of all LUTs used in implementing
different benchmark circuits on the given FPGA type. The results are
presented in Tables 8 and 9. The average delay and energy
required by the proposed LUT design during the WRITE operation
are significantly high. However, for the READ operations, the
delay required by the proposed LUT design is very small when
compared with an SRAM-based LUT. This confirms the viability of
the proposed cell as an LUT for an FPGA, because in an FPGA,
READ operations are performed more often than WRITE operations.
Next, the average performance of the proposed method is assessed
by estimating the number of consecutive READ operations
required immediately following a WRITE, such that the total delay
incurred by the proposed scheme is equal to the delay incurred by
the SRAM-based LUT.
Let $T_{O,I,B}$ denote the average delay time for operation O (either
READ, R, or WRITE, W) using scheme I (I = SRAM or I = MEM)
for benchmark B; in a similar manner, it is possible to define
$E_{O,I,B}$ for the average EDP under the same cases. Let

$T_{O,I,B} = t_{W,I,B} + K_T \, t_{R,I,B}$ (7)

$E_{O,I,B} = e_{W,I,B} + K_E \, e_{R,I,B}$ (8)
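
As a minimal numerical sketch of (7) and of the equal-delay criterion stated above, the following Python function computes the smallest READ count at which one WRITE followed by that many READs is no slower with the memristor-based LUT than with the SRAM-based LUT. The function names, the argument names and all delay values are hypothetical and are not taken from the simulations; rounding the break-even point up to the next integer is likewise an assumption. The same arithmetic applies to (8) with EDP values in place of delays.

```python
import math


def break_even_reads(w_mem, r_mem, w_sram, r_sram):
    """Smallest integer K with w_mem + K*r_mem <= w_sram + K*r_sram.

    Returns None when the memristor READ is not faster than the SRAM READ,
    since the extra WRITE cost can then never be recovered.
    """
    write_penalty = w_mem - w_sram   # extra cost of one memristor WRITE
    read_gain = r_sram - r_mem       # saving per memristor READ
    if write_penalty <= 0:
        return 0                     # already no slower without any READ
    if read_gain <= 0:
        return None
    return math.ceil(write_penalty / read_gain)


def total_cost(w, r, k):
    """Right-hand side of (7) or (8): one WRITE followed by k READs."""
    return w + k * r


# Hypothetical delays in picoseconds (illustrative only, not measured data).
W_MEM, R_MEM = 1000, 2
W_SRAM, R_SRAM = 10, 5

k_t = break_even_reads(W_MEM, R_MEM, W_SRAM, R_SRAM)
print(k_t, total_cost(W_MEM, R_MEM, k_t), total_cost(W_SRAM, R_SRAM, k_t))
```

With these made-up values the break-even count is 330 READs, at which both schemes reach a total of 1660 ps; a slower memristor WRITE or a smaller READ advantage pushes the count higher.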

Fig. 8 $K_E$ and $T_{K_E}$ for different benchmark circuits

Equations (7) and (8) denote the average performance metrics (delay time and
EDP) using a given scheme for a benchmark under a single WRITE
followed by $K_T$ or $K_E$ consecutive READ operations. $K_T$ and $K_E$ are
the least integers such that $T_{O,MEM,B}$ is equal to or greater than
$T_{O,SRAM,B}$ and $E_{O,MEM,B}$ is equal to or greater than $E_{O,SRAM,B}$,
respectively, for each benchmark circuit. The results for $K_T$ in
different benchmark circuits when mapped to the Virtex4 and
Virtex5 FPGAs are plotted in Fig. 7. The average value of $K_T$ is
656 and 490 for the Virtex4 and Virtex5, respectively; also, across
all benchmark circuits, $K_T$ is smaller for benchmark circuits
mapped to the Virtex5 FPGA. This occurs because the Virtex5
FPGA uses LUTs with a larger number of inputs when mapping
the sequential circuits. Hence, the proposed LUT performs better
with FPGAs having LUTs of larger size, and therefore the
proposed scheme is shown to be scalable. The total time for a
WRITE followed by $K_T$ READ operations in each benchmark
circuit (denoted by $T_{K_T}$) is found and plotted in Fig. 7; $T_{K_T}$ is a
few microseconds for each benchmark circuit. The results for $K_E$
for different benchmark circuits when mapped to the Virtex4 and
Virtex5 FPGAs are shown in Fig. 8. Similar to $K_T$, $K_E$ is smaller
for benchmark circuits mapped to the Virtex5 FPGA; however, the
total time for a WRITE followed by $K_E$ READ operations in each
benchmark circuit (denoted by $T_{K_E}$) is a few milliseconds (Fig. 8).
Note, however, that the average values of $K_E$ for the Virtex FPGAs
are substantially larger than those for $K_T$.

6 Conclusion

This paper has proposed a novel memory block for an LUT of an
FPGA using memristors as storage elements and NMOS
transistors for column selection. An extensive analysis has been
pursued; new WRITE and READ operations have been proposed
for this new LUT. They have been analysed using the so-called
NSP as a metric for memristor behaviour.
Unlike [11–13], the proposed WRITE operation requires +Vdd
(+1.5 V) and −Vdd (−1.5 V) to be applied only to WL; moreover, it
does not require an additional circuit to monitor the incoming data
and then channel it to WL/BL depending on the value of the data for
the WRITE operation. Therefore, the proposed memory block requires
only three power lines (+Vdd, −Vdd and GND), one less rail than
[11–13].
The proposed operations also affect other figures of merit; as there
is no need to apply the RESTORE pulse for the READ 1 operation, a
substantial reduction of energy dissipation is achieved. Methods
such as those found in [11–13] require a RESTORE pulse after
multiple READ 1 operations; they use a negative pulse for READ
followed by a positive pulse for RESTORE. As shown by
simulation results, the NSP of the memristor does not fall below
the threshold value using the proposed method; this consists of a
+Vdd pulse applied to WL (READ pulse) of the memristor. It has
been shown that this READ 1 operation does not affect the value
of the NSP in the proposed cell and memory block. Moreover,
when applying a READ pulse, it has been shown that the energy
dissipated depends on the READ pulse width;
so the worst-case delay in the circuit must be considered when
selecting the duration of the READ pulse, i.e. the READ pulse
must be longer than this delay. The same consideration is also
applicable to the WRITE for correct operation of the proposed
memory block. The simulation results show that the energy
consumed decreases when smaller feature sizes are employed for the
NMOS. As expected, an increase in LUT size results in a modest
increase in delay and energy dissipation (and their product); these
increases are substantially less than those incurred by the
schemes proposed in [11–13].
In addition to a circuit-level evaluation, the proposed LUT scheme
has also been assessed with respect to FPGA implementation.
Simulation results using sequential benchmarks mapped on
Virtex4 and Virtex5 FPGAs show that the proposed NV LUT
outperforms existing SRAM-based LUTs in terms of performance
(delay time and energy-delay product). This confirms the viability of
the proposed cell as an LUT for an FPGA, as READ operations are
performed more often than WRITE operations in an FPGA. In
ongoing work, we are developing other NV configurable resources
(such as the switch blocks) for an FPGA as well as integrating the
designs of all resources (inclusive of the controller) such that
performance can be improved further.

7 Acknowledgments

This manuscript is an extended version of [21, 22] by the same authors.

8 References

1 Almurib, H.A.F., Kumar, T.N., Lombardi, F.: ‘Scalable application-dependent diagnosis of interconnects of SRAM-based FPGAs’, IEEE Trans. Comput., 2013, 63, (6), pp. 1540–1550
2 ‘Xilinx Spartan Data sheet’. Available at http://www.xilinx.com
3 ‘Xilinx Spartan™-3AN FPGAs’. Available at http://www.xilinx.com
4 ITRS: ‘International Technology Roadmap For Semiconductors 2011 Edition Executive Summary’, 2011
5 Deepaksubramanyan, B.S., Nu, A.: ‘Analysis of subthreshold leakage reduction in CMOS digital circuits’. Proc. of 50th Midwest Symp. on Circuits and Systems, 2007, no. 1, pp. 1400–1404
6 Zhao, W.S., Belhaire, E., Chappert, C., et al.: ‘Spin transfer torque (STT)-MRAM based run time reconfiguration FPGA circuit’, ACM Trans. Embedded Comput. Syst., 2009, 9, (2), article 14
7 Sumanta, C., Weisheng, Z., Jacques, O.K., et al.: ‘High density asynchronous LUT based on non-volatile MRAM technology’. Int. Conf. on Field Programmable Logic and Applications, 2010, pp. 374–379
8 Chen, Y., Zhao, J., Xie, Y.: ‘3D-nonfar: three-dimensional nonvolatile FPGA architecture using phase change memory’. Proc. 16th ACM/IEEE Int. Symp. Low Power Electronic Design, 2010, pp. 55–60
9 Wen, C., Li, J., Kim, S., et al.: ‘A non-volatile look-up table design using PCM (phase-change memory) cells’. Proc. VLSIC, 2011, pp. 302–303
10 Waser, R., Aono, M.: ‘Nature mater’. 2007, 6, pp. 833–840
11 Ho, Y., Huang, G.M., Li, P.: ‘Dynamical properties and design analysis for nonvolatile memristor memories’, Proc. IEEE Trans. Circuits Syst. I, 2011, 58, (4), pp. 724–736
12 Haron, N.Z., Hamdioui, S.: ‘On defect oriented testing for hybrid CMOS/memristor memory’. Proc. of ATS 2011, pp. 353–358
13 Xu, C., Dong, X., Jouppi, N.P., et al.: ‘Design implications of memristor-based RRAM cross-point structures’. Proc. Des. Autom. Test Eur., 2011, pp. 1–6
14 Turkyilmaz, O., Onkaraiah, S., Reyboz, M., et al.: ‘RRAM-based FPGA for ‘normally off, instantly on’ applications’, IEEE/ACM Int. Symp. on Nanoscale Architectures, 2012, pp. 101–108
15 Cong, J., Xiao, B.: ‘mrFPGA: a novel FPGA architecture with memristor-based reconfiguration’. Proc. of IEEE/ACM Int. Symp. on Nanoscale Architectures, 2011, pp. 1–8
16 Blanchard, P., Gopalan, C., Shields, J., et al.: ‘First commercial demonstration of an emerging memory technology for embedded flash using CBRAM’. Innovative Memory Technologies Workshop 2011, MINATEC, Grenoble – France
17 Burr, G.W., Kurdi, B.N., Scott, J.C., et al.: ‘Overview of candidate device technologies for storage-class memory’, IBM J. Res. Dev., 2008, 52, (4/5), pp. 449–464
18 Kryder, M.H., Kim, C.S.: ‘After hard drives – what comes next?’, IEEE Trans. Magn., 2009, 45, (10), pp. 3406–3413
19 Chua, L.O.: ‘Memristor – the missing circuit element’, IEEE Trans. Circuit Theory, 1971, 18, (5), pp. 507–519
20 Yang, J.J., Pickett, M.D., Li, X., et al.: ‘Memristive switching mechanism for metal/oxide/metal nanodevices’, Nat. Nanotechnol., 2008, 3, pp. 429–433
21 Kumar, T.N., Almurib, H.A.F., Lombardi, F.: ‘On the operational features and performance of a memristor-based cell for a LUT of an FPGA’. 13th IEEE Int. Conf. on Nanotechnology, 2013, pp. 71–76
22 Almurib, H.A.F., Kumar, T.N., Lombardi, F.: ‘A memristor-based LUT for FPGAs’. Proc. of the Ninth IEEE-NEMS, 2014, pp. 448–453
23 Biolek, Z., Biolek, D., Biolova, V.: ‘SPICE model of memristor with nonlinear dopant drift’, Radioengineering, 2009, 18, (2), pp. 210–214
24 Predictive Technology Model (PTM) website. Available at http://www.ptm.asu.edu/
