Abstract— Static random access memory (SRAM)-based ternary content-addressable memory (TCAM) on field-programmable gate arrays (FPGAs) is used for packet classification in software-defined networking (SDN) and OpenFlow applications. SRAMs implementing TCAM contents constitute the major part of a TCAM design on FPGAs and are vulnerable to soft errors. Protecting SRAM-based TCAMs against soft errors is challenging without compromising the critical path delay while maintaining a high search performance. This brief presents a low-cost and low-response-time technique for the protection of SRAM-based TCAMs. This technique uses simple, single-bit parity for fault detection, which has a minimal critical path overhead. It exploits the binary-encoded TCAM table maintained in SRAM-based TCAMs for update purposes to implement a low-response-time error-correction mechanism at low cost. The error-correction process is carried out in the background, allowing lookup operations to be performed simultaneously, thus maintaining a high search performance. The proposed technique provides protection against soft errors with a response time of 293 ns, while maintaining a search rate of 222 million searches per second on a 1024 × 40 size TCAM on an Artix-7 FPGA.

Index Terms— Field-programmable gate array (FPGA), memory architecture, soft errors, static random access memory (SRAM)-based ternary content-addressable memory (TCAM).

Manuscript received June 27, 2019; revised September 14, 2019 and November 10, 2019; accepted December 17, 2019. Date of publication February 20, 2020; date of current version March 20, 2020. This work was supported by the Samsung Research Funding and Incubation Center of Samsung Electronics under Project SRFC-TB1803-02. The EDA tools used in this work were supported by IDEC, Daejeon, South Korea.
Inayat Ullah and Jaeyong Chung are with the Department of Electronics Engineering, Incheon National University, Incheon 22012, South Korea (e-mail: jychung@incheon.ac.kr).
Joon-Sung Yang is with the Department of Systems Semiconductor Engineering, Yonsei University, Seoul, South Korea (e-mail: js.yang@yonsei.ac.kr).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2020.2968365

I. INTRODUCTION

Content-addressable memory (CAM) allows the stored content to be searched in parallel in a single cycle, achieving a high search performance. A binary CAM stores and searches data in only two states: "0" and "1," whereas a ternary CAM (TCAM) represents data in three different states: "0," "1," and a do-not-care state "x." TCAMs are extensively used in network systems for packet classification and filtering [1], [2].

Modern static random access memory (SRAM)-based field-programmable gate array (FPGA) technology offers the flexibility and reconfigurability with high performance required in software-defined networking (SDN) and OpenFlow network accelerators for big data [2].

Owing to the disturbances from high-energy neutron particles, circuits on SRAM-based FPGAs are susceptible to single-event upsets (SEUs) [3]. On-chip embedded memory is known to be the most vulnerable to SEUs in advanced process technologies because of its increasingly small and highly compact memory cells [4], [5].

An SEU in embedded memory generates a transient error that persists until the corrupted data is overwritten [5]. SEUs can result in either single-bit upsets (SBUs) or multiple-bit upsets (MBUs). Cuckoo hashing offers a low-cost solution for implementing efficient binary CAMs on FPGAs [6]. The TCAM function in most of the SRAM-based FPGA solutions is defined by the content of the configured embedded memories [i.e., block RAM (BRAM), distributed RAM (distRAM)] [7], and a transient error may lead to a false match/mismatch and return an incorrect match address. Accordingly, in the case of a soft error, the affected word of SRAM should be overwritten to retrieve the correct matching information during lookups. However, the protection of SRAM-based TCAM solutions is challenging without compromising the critical path delay while maintaining high search performance.

This brief presents a low-cost, low-response-time, and easy-to-integrate technique for the protection of SRAM-based TCAMs without compromising the search performance. The error detection is carried out in a simple way using single-bit parity checking at a minimal delay and logic overhead. The proposed error-correction technique exploits the redundant binary-encoded TCAM table maintained in SRAM-based TCAM solutions for update purposes to correct soft errors. It maintains a high search performance because the proposed error-correction mechanism is carried out in the background, allowing search operations to be performed simultaneously. The proposed error-correction technique has a low response time, ensuring a faultless TCAM design for lookups during almost the entire processing time.

II. SRAM-BASED TCAM ON FPGAS

On-chip SRAM memories in modern FPGAs are used for implementing TCAM solutions. For example, a 1 × 1 TCAM can be implemented using a 2 × 1 SRAM such that the match information for a "0" TCAM state is represented by storing a "1" at SRAM[0], a "1" state by storing a "1" at SRAM[1], and an "x" state by storing a "1" at both the SRAM[0] and SRAM[1] locations. A C-bit TCAM pattern can be implemented using a 1-bit SRAM of 2^C positions. The address of the SRAM represents the C-bit TCAM pattern, whereas the words of the SRAM store match/mismatch information for each word of the TCAM table against all the possible C-bit patterns. In this manner, a C-bit wide TCAM table of B words can be implemented using a B-bit wide SRAM of 2^C positions.

Researchers divide the wide TCAM bit patterns into smaller chunks as they do not scale well in terms of required memory in SRAM-based TCAMs. The W-bit wide bit patterns of a TCAM with depth D are divided into smaller chunks of C bits and then implemented using AND-cascaded 2^C × D size SRAMs [8], [9]. This is explained using a simplified implementation of an SRAM-based TCAM shown in Fig. 1. The 4-bit patterns of a 4-word deep TCAM are divided into two partitions of 4 × 2, which are then implemented using the two 4 × 4 SRAMs shown in Fig. 1(b). Consider a search key (1001) applied for the search operation: the first two bits (10) access the third word of the first SRAM (1100) and the last two bits (01) access the second word of the second SRAM (1001). The SRAM words read are ANDed to get the final match result (1000), which represents a match for rule R0.
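The mapping and lookup described above can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the authors' hardware design: the four ternary rules in `RULES` are hypothetical, chosen only so that the words read for key 1001 reproduce the values quoted in the text (they are not the contents of Fig. 1), and the helper names `chunk_matches`, `build_srams`, and `search` are illustrative.

```python
from itertools import product

def chunk_matches(pattern: str, key: str) -> bool:
    """True if a ternary pattern ('0', '1', 'x') matches a binary key chunk."""
    return all(p in ("x", k) for p, k in zip(pattern, key))

def build_srams(rules, chunk_bits):
    """Map W-bit ternary rules onto W/C SRAMs of 2^C words, D bits per word."""
    width = len(rules[0])
    srams = []
    for start in range(0, width, chunk_bits):
        sram = []
        # Enumerate addresses 0 .. 2^C - 1 as bit strings, in order.
        for bits in product("01", repeat=chunk_bits):
            addr = "".join(bits)
            # Bit i of the word is 1 when the chunk of rule i matches addr.
            word = "".join(
                "1" if chunk_matches(r[start:start + chunk_bits], addr) else "0"
                for r in rules
            )
            sram.append(word)
        srams.append(sram)
    return srams

def search(srams, key, chunk_bits):
    """Read one word per SRAM and AND them into the final match vector."""
    depth = len(srams[0][0])
    result = (1 << depth) - 1
    for i, sram in enumerate(srams):
        addr = int(key[i * chunk_bits:(i + 1) * chunk_bits], 2)
        result &= int(sram[addr], 2)
    return format(result, f"0{depth}b")

# Hypothetical rules R0..R3 (leftmost match bit corresponds to R0).
RULES = ["10x1", "1x10", "0100", "0x01"]
SRAMS = build_srams(RULES, chunk_bits=2)

print(SRAMS[0][2])                 # third word of the first SRAM
print(SRAMS[1][1])                 # second word of the second SRAM
print(search(SRAMS, "1001", 2))    # ANDed match vector
```

With these assumed rules, the two words read are 1100 and 1001 and their AND is 1000, that is, a match for R0 only. Each SRAM holds 2^C words of D bits, which is why narrow chunks are preferred over a single 2^W-deep memory.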
1063-8210 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 28, NO. 4, APRIL 2020 1085
table, providing redundancy of the matching information in SRAMs realizing TCAM.

To detect an SBU in SRAMs, the ER-TCAM adds a parity bit for each SRAM word as shown in Fig. 2(a). Error detection is performed on the words of SRAMs accessed during lookups. Once an error is detected in a word, the ER-TCAM uses the redundant information stored in the binary-coded TCAM table for correction.

For example, when a search key (0110) is applied to the SRAMs shown in Fig. 2(a), it accesses the second word (00101) of the first SRAM and the third word (01110) of the second SRAM. The calculated parity for the match bits of the second SRAM word (01110) does not match the stored parity, indicating the occurrence of an SBU in the SRAM as shown in Fig. 2(a). The ER-TCAM accesses the words of the SRAM storing the binary-encoded contents of the related TCAM partition one by one. The SRAM words read are matched with the corresponding bit pattern (10) to compute the match bits and associated parity (01010), collectively called the error-correction vector (ECV), as shown in Fig. 2(b). The computed ECV is then written over the corrupted SRAM word. In this way, the ER-TCAM ensures 100% coverage against SBUs. Moreover, the ER-TCAM is able to protect against a special case of MBUs in which an odd number of bits flip in an SRAM word; such errors are detected and corrected using the same process.

A. Architecture of the Proposed ER-TCAM

Fig. 3 shows the proposed ER-TCAM error-detection architecture. When an input search key is applied for lookup, the bits of the SRAM words read are EX-ORed to get an error signal. The error signals from the N SRAMs of the TCAM design are encoded to get a log2 N-bit error code that uniquely identifies the corrupted SRAM. The error code and related search-key bit patterns are forwarded to the error-correction module.

Fig. 3. Proposed ER-TCAM architecture for error detection.

Fig. 4 shows the ER-TCAM architecture for error correction, which mainly comprises an SRAM storing binary-encoded contents of the TCAM table, an ECV computation unit, an address generation unit (AGU), and a read/write controller. The MOD-D counter generates a new sequence of log2 D bits every cycle. The SRAM address is formed such that the log2 N bits of the SRAM ID constitute its most significant bits and point to the start of the corresponding sub-block in the SRAM, and the lower log2 D bits from the counter select SRAM words in the sub-block. In this way, the AGU accesses all the binary-encoded words of the corresponding partition of the TCAM table. The TCAM words read are matched with the C-bit pattern to get one match bit each cycle, thus requiring D clock cycles to compute the match bits and the associated parity bit, which compose the ECV. The read/write controller asserts the write-enable signal for the corresponding SRAM to write the computed ECV over the corrupted SRAM word.

During the error-correction process, the ER-TCAM allows search operations because the SRAMs realizing the TCAM function remain available for lookup operations. The ER-TCAM configures these SRAMs as simple dual-port RAM that performs a read and a write in parallel in the same clock cycle. Once the ECV is computed, it is written using the write port of the SRAM; thus, the error-correction process completely overlaps the search operations in the ER-TCAM. Although a soft error can also occur in the SRAM storing the binary-encoded TCAM table, its probability of occurrence is very low compared with that of the SRAMs realizing the TCAM, owing to its relatively small size. Still, the ER-TCAM is able to protect the SRAM storing the binary-encoded TCAM table using ECCs at very little cost in memory and error-correction latency overhead.

V. IMPLEMENTATION AND PERFORMANCE EVALUATION

A. FPGA Implementation and Results

ER-TCAM design experiments were conducted using Xilinx Vivado Design Suite 2018.3, targeting the Artix-7 FPGA device (xc7a200tffv1156-2). The maximum achievable clock rate and resource consumption of the ER-TCAM are reported using the Xilinx post-place-and-route results.

Table I lists the FPGA resource utilization for three different TCAM sizes (case I: 64 × 40, case II: 512 × 40, and case III: 1024 × 40) implemented using BRAM and distRAM resources for the ER-TCAM technique. The ER-TCAM implementation cases I, II, and III utilize 5, 38, and 76 36-kb BRAMs, respectively, plus 0.5, 1.5, and 2.5 36-kb BRAMs, respectively, for storing the binary-encoded contents of the respective TCAM tables, as shown in the second and fifth columns of Table I. The memory overhead in the ER-TCAM for maintaining a binary-encoded TCAM table is minimal as SRAM-based TCAMs utilize 57 BRAM bits for implementing a single TCAM bit, while the binary-encoded storage of a ternary bit requires only two bits, thus, a memory overhead of
TABLE I
FPGA IMPLEMENTATION RESULTS OF THE ER-TCAM

TABLE II
FPGA RESOURCE UTILIZATION COMPARISON OF VARIOUS BRAM AND DISTRAM-BASED TCAM PROTECTION ARCHITECTURES
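The parity-based detection and ECV-based correction described for the ER-TCAM architecture can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the authors' implementation: the partition contents in `ENCODED_TABLE` are hypothetical (picked so the results agree with the 0110 search-key example in the text), ternary symbols are kept as the characters '0', '1', 'x' instead of the two-bit encoding used in hardware, even parity is assumed, and the function names are illustrative.

```python
def parity(bits: str) -> str:
    """Even-parity bit: XOR of all bits in the string."""
    return str(bits.count("1") % 2)

def has_error(word: str) -> bool:
    """XOR of match bits and stored parity; True flags an odd number of flips."""
    return parity(word) == "1"

def error_code(error_flags):
    """Index of the first SRAM whose error signal is raised, or None."""
    for sram_id, flag in enumerate(error_flags):
        if flag:
            return sram_id
    return None

def agu_address(sram_id: int, count: int, depth: int) -> int:
    """SRAM ID forms the upper bits, the MOD-D counter the lower log2(D) bits.

    For a power-of-two depth this equals (sram_id << log2(depth)) | count.
    """
    return sram_id * depth + count

def compute_ecv(encoded_table, sram_id, chunk, depth):
    """Walk the D encoded words of one partition, one match bit per cycle."""
    match_bits = ""
    for count in range(depth):  # models the MOD-D counter
        pattern = encoded_table[agu_address(sram_id, count, depth)]
        hit = all(p in ("x", k) for p, k in zip(pattern, chunk))
        match_bits += "1" if hit else "0"
    return match_bits + parity(match_bits)

DEPTH = 4
# Hypothetical encoded table: partition 0 (rules R0..R3), then partition 1.
ENCODED_TABLE = ["10", "11", "0x", "00",
                 "01", "1x", "00", "x0"]

key = "0110"
words_read = ["00101", "01110"]      # words quoted in the text; second is corrupted
flags = [has_error(w) for w in words_read]
bad_sram = error_code(flags)
chunk = key[bad_sram * 2:(bad_sram + 1) * 2]
ecv = compute_ecv(ENCODED_TABLE, bad_sram, chunk, DEPTH)
print(bad_sram, chunk, ecv)
```

Under these assumptions the second SRAM (ID 1) is flagged, the pattern 10 is replayed against its partition, and the recomputed ECV is 01010, which would then be written over the corrupted word. The loop over `count` takes D iterations, mirroring the D clock cycles the text attributes to ECV computation.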
original TCAM content maintained on-chip for update purposes to provide protection with low area, critical path delay, and response time. The proposed method uses single-bit parity to detect faults at a minimal cost of logic and critical path delay. The proposed error resiliency technique, called ER-TCAM, employs the binary-encoded TCAM table in SRAM-based TCAMs to correct errors. SRAMs implementing the TCAM function are available for search operations during the error-correction process carried out in the background; thus, the proposed error-correction technique does not affect the data path processing. The ER-TCAM achieved a search performance of up to 250 million searches per second with an EDD of 8 ns and a deterministic error-correction time of 260 ns when tested on the Artix-7 FPGA device. In contrast, the error-correction time of other existing error-correction techniques is very high and non-deterministic. Compared with others, the ER-TCAM achieves at least two times better EDD. The ER-TCAM can be useful for network systems or other systems-on-chip that require SRAM-based TCAMs as a search engine for accelerating specific tasks.

REFERENCES

[1] P. He, W. Zhang, H. Guan, K. Salamatian, and G. Xie, "Partial order theory for fast TCAM updates," IEEE/ACM Trans. Netw., vol. 26, no. 1, pp. 217–230, Feb. 2018.
[2] W. Fu, T. Li, and Z. Sun, "FAS: Using FPGA to accelerate and secure SDN software switches," Secur. Commun. Netw., vol. 2018, Jan. 2018, Art. no. 5650205.
[3] T. Li, H. Liu, and H. Yang, "Design and characterization of SEU hardened circuits for SRAM-based FPGA," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 6, pp. 1276–1283, Feb. 2019.
[4] T. Li, H. Yang, H. Zhao, N. Wang, Y. Wei, and Y. Jia, "Investigation into SEU effects and hardening strategies in SRAM based FPGA," in Proc. 17th Eur. Conf. Radiat. Effects Compon. Syst. (RADECS), 2019, pp. 1–5.
[5] A. Ramos, R. G. Toral, P. Reviriego, and J. A. Maestro, "An ALU protection methodology for soft processors on SRAM-based FPGAs," IEEE Trans. Comput., vol. 68, no. 9, pp. 1404–1410, Mar. 2019.
[6] S. Pontarelli, P. Reviriego, and J. A. Maestro, "Parallel D-pipeline: A cuckoo hashing implementation for increased throughput," IEEE Trans. Comput., vol. 65, no. 1, pp. 326–331, Mar. 2015.
[7] P. Reviriego, A. Ullah, and S. Pontarelli, "PR-TCAM: Efficient TCAM emulation on Xilinx FPGAs using partial reconfiguration," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 8, pp. 1952–1956, Mar. 2019.
[8] Ternary Content Addressable Memory (TCAM) Search IP for SDNet V1.0; Xilinx Product Guide PG190, Xilinx, San Jose, CA, USA, Nov. 2017.
[9] W. Jiang, "Scalable ternary content addressable memory implementation using FPGAs," in Proc. 9th ACM/IEEE Symp. Archit. Netw. Commun. Syst., 2013, pp. 71–82.
[10] M. Irfan and Z. Ullah, "G-AETCAM: Gate-based area-efficient ternary content-addressable memory on FPGA," IEEE Access, vol. 5, pp. 20785–20790, 2017.
[11] P. Reviriego, S. Pontarelli, and A. Ullah, "Error detection and correction in SRAM emulated TCAMs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 2, pp. 486–490, Feb. 2019.
[12] I. Ullah, Z. Ullah, U. Afzaal, and J.-A. Lee, "DURE: An energy- and resource-efficient TCAM architecture for FPGAs with dynamic updates," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 6, pp. 1298–1307, Mar. 2019.
[13] Z. Ullah, K. Ilgon, and S. Baeg, "Hybrid partitioned SRAM-based ternary content addressable memory," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2969–2979, Dec. 2012.
[14] I. Ullah, Z. Ullah, and J.-A. Lee, "Efficient TCAM design based on multipumping-enabled multiported SRAM on FPGA," IEEE Access, vol. 6, pp. 19940–19947, 2018.
[15] Xilinx. (2016). 7 Series FPGAs Configurable Logic Block User Guide UG474 (V1.8). Accessed: Jun. 25, 2019. [Online]. Available: http://www.xilinx.com
[16] A. M. Keller and M. J. Wirthlin, "Impact of soft errors on large-scale FPGA cloud computing," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2019, pp. 272–281.
[17] Device Reliability Report, Xilinx, San Jose, CA, USA, 2018.
[18] D. Shah and P. Gupta, "Fast updating algorithms for TCAM," IEEE Micro, vol. 21, no. 1, pp. 36–47, Jan. 2001.
[19] S. R. Nassif, N. Mehta, and Y. Cao, "A resilience roadmap," in Proc. Design, Autom. Test Eur. Conf. Exhib., 2010, pp. 1011–1016.
[20] S. B. Akers, "A parity bit signature for exhaustive testing," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. CAD-7, no. 3, pp. 333–338, Mar. 1988.