I. INTRODUCTION
the NAND cells for several bits but also achieves high speed by
using the NOR cells for most bits. The CRSLD reduces the SL
power by recycling the charge of SLs without the SL precharge.
Fig. 4. Activations in NAND-MLs and NOR-MLs.
The organization of this paper is as follows. In Section II, we propose the PNN-CAM using the PNN-ML and the CRSLD. In Section III, we present performance comparisons and show test results of the fabricated chip. This paper ends with the conclusion in Section IV.

II. ARCHITECTURE

A. Pulsed NAND-NOR Match-Line Scheme

Fig. 2 shows the pulsed NAND–NOR match-line (PNN-ML) architecture. The CAM has PNN-MLs and a replica PNN-ML. Each PNN-ML consists of NAND cells and NOR cells. The PNN-ML utilizes the advantages of both NAND- and NOR-type CAMs. The NAND-ML in the NAND-type CAM consumes the least power but it is the slowest. The NOR-ML in the NOR-type CAM is the fastest but it dissipates the largest power.

Fig. 3(a) shows the NAND-ML with NAND cells. The average capacitance of the NAND-ML is […], where the matching probability of each bit is […] and […] is the drain capacitance of the transistors [3]. Its swing voltage is […] due to the voltage drop of the series NMOS transistors whose gate voltage is […]. The effective capacitance of the NAND-ML is […], where the degradation ratio is […].
1738 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 40, NO. 8, AUGUST 2005
Fig. 5. Power consumption in mismatched MLs (a) without the replica PNN-ML and (b) with the replica PNN-ML.
TABLE I
EFFECTIVE CAPACITANCE COMPARISONS OF MLs
Fig. 8. Power and delay comparisons of MLs when (a) n = 128 and m = 32 and (b) n = 512 and m = 144.
[…] the PNN-ML becomes slow. To improve the speed, it utilizes the hierarchical ML in [7]. Fig. 7 shows the architectures of the PNN-ML. When […], the PNN-ML is relatively fast. However, when […], it becomes slow. To improve the speed, the PNN-ML is hierarchically divided into four sub-PNN-MLs with […] bits. Each sub-PNN-ML consists of NAND cells and NOR cells. The search result is hierarchically generated by the four sub-PNN-MLs with three AND gates. Its delay is equal to the sum of the delays of the […]-bit sub-PNN-ML, two AND gates, and the hierarchical ML wires. Although the hierarchical PNN-ML consumes more power due to four NAND-MLs, 12 ML-precharge transistors, three AND gates, and the hierarchical ML wires, it is much faster.

Fig. 8 shows the power and delay comparisons of MLs. All simulations in this paper are performed in a 0.25-µm CMOS process with […] V. When n = 128 and m = 32, the PNN-ML consumes 20% more power but is three times faster than the NAND-ML. Also, it is 19% slower but consumes 89% less power than the NOR-ML. When n = 512 and m = 144, it consumes 76% more power but is 23 times faster than the NAND-ML. Also, it is 28% faster and consumes 90% less power than the NOR-ML. The PNN-ML saves ML power by reducing the number of activated MLs and by using the pulse operation. When […], the PNN-ML with the hierarchical ML is faster than the NOR-ML.

YANG AND KIM: LOW-POWER CAM USING PULSED NAND–NOR MATCH-LINE AND CHARGE-RECYCLING SEARCH-LINE DRIVER 1741

TABLE II
POWER COMPARISONS OF SLs

Fig. 9 shows the CAM architecture. During search operation, the I/O circuits catch and send the search data to the SLs. The SLs are connected to all memory cells. The CAM compares the search data on the SLs with the stored data. The 128 × 32 bit memory is divided into four sub-blocks of 32 × 32-bit memory in order to reduce the delay of the SL. When an ML is matched, the address of the matched ML is encoded in the ROM encoder. The encoded search address is sent to the output pad of the chip by the I/O circuits.
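The hierarchical match line and the search path described above can be sketched functionally as follows. This is a minimal sketch, not the fabricated design. It assumes that each word is split into four equal sub-words, that the three AND gates form a two-level tree (which is consistent with the stated delay of two AND gates), and that the encoder reports the lowest matching address. All function names are hypothetical.

```python
def sub_ml_match(stored, search):
    """Match result of one sub-ML: true only if every bit agrees."""
    return all(s == q for s, q in zip(stored, search))


def hierarchical_match(stored, search):
    """Combine four sub-ML results with three 2-input AND gates."""
    m = len(stored)
    if m % 4 != 0:
        raise ValueError("word length must be divisible by four")
    q = m // 4  # assumed sub-word length
    r = [sub_ml_match(stored[i * q:(i + 1) * q], search[i * q:(i + 1) * q])
         for i in range(4)]
    left = r[0] and r[1]    # first-level AND gate
    right = r[2] and r[3]   # first-level AND gate
    return left and right   # second-level AND gate


def search_cam(words, search):
    """Return the address of the first matched word, or None.

    Reporting the lowest matching address is an assumption of this sketch.
    """
    for address, word in enumerate(words):
        if hierarchical_match(word, search):
            return address
    return None


words = [[0] * 8, [1] * 8, [1, 0, 1, 0, 1, 0, 1, 0]]
print(search_cam(words, [1, 0, 1, 0, 1, 0, 1, 0]))  # 2
```

A signal passes through only two of the three gates on its way to the output, so the AND tree adds two gate delays to the sub-ML delay, matching the delay expression in the text.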
Fig. 12. Power comparisons of SLs when n = 128 and (a) […] = 0.5 and (b) […] = 0.2.
Fig. 13. Power comparisons of SLs according to n when (a) […] = 0.5 and (b) […] = 0.2.
[…] SL. As […] increases and […] decreases, the CRSLD saves more power.

III. PERFORMANCE COMPARISON AND TEST RESULTS

A. Performance Comparisons

Fig. 14 shows the energy and delay comparisons of various CAMs. For a fair comparison, the proposed PNN-CAM, the dynamic NAND-type CAM using the NAND-MLs (NAND-CAM) [3], the dynamic NOR-type CAM using the NOR-MLs (NOR-CAM) [2], the current saving CAM (CS-CAM) [6], and the Hybrid-type CAM (Hybrid-CAM) [7] are simulated in a 0.25-µm CMOS process with […] V.

The NAND-CAM consumes the least ML power but it is the slowest. The NOR-CAM is the fastest but it consumes the largest ML power because all MLs are precharged to […] and then discharged to ground. With the speed of the NOR-CAM, the CS-CAM reduces the ML power by supplying a large current to the matched ML and a small current to the mismatched MLs. Also, the SLs are not precharged. The Hybrid-CAM further reduces the ML and SL power by using a hierarchical search composed of a small NOR-type main-bank and several large NAND-type sub-banks. The result of the main-bank activates only one sub-bank. The main-bank is fast and consumes little power because it is a small NOR-CAM. The selected sub-bank consumes a small amount of power because it uses the NAND-MLs. To improve the speed of the NAND-MLs, it applies the hierarchical ML structures and inserts many ML-repeaters into the NAND-MLs. The Hybrid-CAM consumes the least power in both MLs and SLs among the previous CAMs.
Fig. 14. Energy/bit/search and delay comparisons when CAM sizes are (a) 128 × 32 bit and (b) 512 × 144 bit.
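To relate the energy/bit/search metric to more familiar quantities, the following worked arithmetic uses the chip figures reported in Section IV (17.2 fJ/bit/search, 128 × 32 bit, 260 MHz). It is a back-of-the-envelope sketch: it assumes one search per clock cycle at the maximum operating frequency and that the reported energy figure holds at that frequency. Neither assumption is stated in the paper, and the function names are illustrative.

```python
ENERGY_FJ_PER_BIT_SEARCH = 17.2  # reported chip-core energy
WORDS, BITS_PER_WORD = 128, 32   # reported CAM size
F_MAX_HZ = 260e6                 # reported maximum operating frequency


def energy_per_search_pj(energy_fj_per_bit, words, bits_per_word):
    """Energy of one full search in picojoules."""
    return energy_fj_per_bit * words * bits_per_word / 1000.0


def core_power_mw(energy_pj, searches_per_second):
    """Average power in milliwatts, assuming back-to-back searches."""
    return energy_pj * 1e-12 * searches_per_second * 1e3


e_pj = energy_per_search_pj(ENERGY_FJ_PER_BIT_SEARCH, WORDS, BITS_PER_WORD)
p_mw = core_power_mw(e_pj, F_MAX_HZ)
print(f"{e_pj:.2f} pJ per search, {p_mw:.1f} mW at 260 MHz")
# 70.45 pJ per search, 18.3 mW at 260 MHz
```

Under these assumptions, one search over all 4096 bits costs about 70 pJ, and continuous searching at the maximum frequency corresponds to roughly 18 mW of core power.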
TABLE III
FEATURES OF THE PNN-CAM CHIP

IV. CONCLUSION

The PNN-CAM is proposed to achieve low power and high speed. The PNN-CAM reduces the ML power by using the PNN-ML. The PNN-ML not only significantly reduces the ML power by activating only a few MLs by using the NAND cells for several bits but also achieves high speed by using the NOR cells for most bits. To reduce the delay of long MLs, the hierarchical ML is utilized. The PNN-CAM reduces the SL power by using the CRSLD. The CRSLD reduces the SL power by recycling the charge of SLs without the SL precharge. The small PNN-CAM with 128 × 32 bit consumes only 31% power with 19% speed degradation compared to the dynamic NOR-type CAM. The large PNN-CAM with 512 × 144 bit consumes only 21% power with 39% speed improvement. The PNN-CAM chip with 128 × 32 bit is fabricated in a 0.25-µm CMOS process with […] V. The chip core dissipates 17.2 fJ/bit/search. Its area is 0.32 mm². Its maximum operating frequency is 260 MHz.

REFERENCES

[1] F. Shafai et al., “Fully parallel 30-MHz 2.5-Mb CAM,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1690–1696, Nov. 1998.
[2] P. Lin et al., “A 1-V 128-kb four-set-associative CMOS cache memory using wordline-oriented tag compare (WLOTC) structure with content-addressable memory (CAM) 10-transistor tag cell,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 666–676, Apr. 2001.
[3] Y. L. Hsiao et al., “Power modeling and low-power design of content-addressable memories,” in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, 2001, pp. 926–929.
[4] H. Miyatake et al., “A design for high-speed low-power CMOS fully parallel content-addressable memory macros,” IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.
[5] C.-S. Lin et al., “A low power precomputation-based fully parallel content-addressable memory,” IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 654–662, Apr. 2003.
[6] I. Arsovski et al., “A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories,” IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
[7] S. Choi et al., “A 0.7 fJ/bit/search, 2.2 ns search time hybrid type TCAM architecture,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2004, pp. 498–499.
[8] I. Arsovski et al., “A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme,” IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[9] K. Pagiamtzis et al., “A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme,” IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.

Byung-Do Yang received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1999, 2001, and 2005, respectively.
He joined the Memory Division, Samsung Electronics, Kyungki-Do, Korea, in 2005, where he has been engaged in the design of DRAM. His research interests include low-power DRAM circuits.

Lee-Sup Kim received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1982 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1986 and 1990, respectively.
He was a Postdoctoral Fellow with the Toshiba Corporation, Kawasaki, Japan, during 1990–1993, where he was involved in the design of the high-performance DSP and single-chip MPEG2 decoder. Since March 1993, he has been with the Korea Advanced Institute of Science and Technology, Daejeon, Korea. In November 2002, he became a full Professor. His research interests are multimedia VLSI design, hardware implementation of signal processing algorithms, and low-power IC design.