I. INTRODUCTION
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 6, JUNE 2009
Fig. 2. (a) Search data example of five consecutive 0s. Case A/B is the TCAM
design without/with SE scheme. (b) Traditional SE scheme.
off the path-control switch to block the search data from being
further broadcast. Because the effective capacitance of SL is
decreased, the SL power consumption can be reduced.
III. TWO-LEVEL DCG SCHEME
A. SE Technique
If we consider only two consecutive search data on a single
bit, there are four search patterns, i.e., 0→0, 1→1, 0→1,
and 1→0. Because they involve no data transition, the 0→0 and 1→1
patterns are classified as quiet patterns. In contrast, 0→1 and
1→0 are classified as switch patterns. In the conventional
TCAM design, because the ML has to be charged high during
the precharge phase, both S and S̄ must be discharged to 0 to
avoid a possible short circuit. However, this discharge
increases unnecessary SL switching activity. For example,
Fig. 2(a) shows a search data pattern of five consecutive 0s.
In case A, i.e., the conventional TCAM design, the gray block
contains the values of S and S̄ during the precharge phase.
Clearly, the number of energy-consuming transitions (N0→1)
on SL is four, but they are all unnecessary switching activities.
As illustrated in Fig. 2(b), a straightforward solution to this
weakness is to introduce an additional transistor, N3,
that disconnects the pull-down path during the ML
precharge and then enables the search operation. This is referred to
as the SE technique, whose effect can be observed in case B
of Fig. 2(a), where the unnecessary SL switches are all
eliminated. Compared to the traditional TCAM without SE,
whose pull-down path consists only of N1 and N2, the use of SE
lengthens the pull-down path by one transistor. Based on our simulation, the optimal N3 width
is three times the N1 (or N2) width, at which the performance
penalty is about 4.2%. Fortunately, this performance penalty
can be compensated by the two-level DCG scheme proposed
in the following section.
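The transition count in Fig. 2(a) can be reproduced with a minimal sketch. The function names, the bit-to-line encoding (a search bit of 1 drives S high, a 0 drives S̄ high), and the choice to model a precharge phase only between consecutive searches are assumptions made for illustration; they are not taken from the paper.

```python
def sl_levels(bits, search_enable):
    """Return the sequence of (S, S_bar) levels seen on one search-line pair.

    bits          -- consecutive search data on a single bit position
    search_enable -- True models case B (SE scheme), False models case A
    """
    levels = []
    for i, bit in enumerate(bits):
        if not search_enable and i > 0:
            # Case A: both lines are discharged to 0 during the ML precharge
            # phase that separates two consecutive searches.
            levels.append((0, 0))
        # Evaluation phase (assumed encoding): 1 -> S high, 0 -> S_bar high.
        levels.append((bit, 1 - bit))
    return levels


def count_rising(levels):
    """Count the energy-consuming 0->1 transitions over both lines."""
    rising = 0
    for prev, cur in zip(levels, levels[1:]):
        rising += sum(1 for p, c in zip(prev, cur) if p == 0 and c == 1)
    return rising


five_zeros = [0, 0, 0, 0, 0]
case_a = count_rising(sl_levels(five_zeros, search_enable=False))
case_b = count_rising(sl_levels(five_zeros, search_enable=True))
print(case_a, case_b)
```

Under these assumptions, case A yields four 0→1 transitions for five consecutive 0s, and case B yields none, which agrees with the counts stated in the text.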
B. L1 DCG
Fig. 3 shows the general configuration of a TCAM array
with N prefixes. The function of the routing table lookup is to
find the longest one in all the prefixes that match the incoming
Fig. 5. L2 DCG example, where the granularity (GL2) is 4, and GNL2 is the
L2 gating node.
Fig. 3.
Fig. 4.
Fig. 6. Match delay for both the conventional and two-level DCG TCAM
designs, where X-Y means that GL1 = X and GL2 = Y.
Fig. 7. Column power dissipated in the switch pattern for both the conventional and two-level DCG TCAM designs.
TABLE I
COLUMN POWER CONSUMPTION (IN WATTS) FOR BOTH THE
CONVENTIONAL AND TWO-LEVEL DCG TCAM DESIGNS.
CONFIGURATION 16-8 IS THE BEST TO REDUCE
THE SWITCH POWER CONSUMPTION
TABLE II
COLUMN ENERGY CONSUMPTION FOR BOTH THE CONVENTIONAL
AND TWO-LEVEL DCG TCAM DESIGNS
its column power shows a step wave. The power data are
summarized in Table I, where the worst and best values are for
the X = 0 and X = 128 cases, respectively. The average value
is obtained by averaging the results of all 129 cases, i.e., X =
0 to 128, assuming that every case has the same occurrence probability.
In Table I, the key observations are as follows. 1) The
conventional TCAM has two features. First, because both
S and S̄ must be preset to 0, the power consumption
of the quiet pattern is almost equal to that of the switch pattern.
Second, its column power is independent of the number of
continuous X cells. As shown in Table I, the values are always 3.753E-05 and
3.776E-05 W for the quiet and switch patterns, respectively. 2) Because
there is no SL switching in the quiet pattern, our design
consumes almost no power compared to the conventional TCAM, and
the difference among the three cases is hardly noticeable. 3) In
the worst case of the switch pattern, because there is no X cell
to help our design reduce the SL power, the additional
L1 and L2 gating nodes incur a net power penalty.
Consequently, for all configurations, the worst-case switch power
is larger than that of the conventional TCAM. 4) For the
switch pattern, the best configuration is 16-8, in which
our design incurs the least power penalty, i.e., 13%, in the
worst case while achieving the largest power reduction, i.e.,
35%, in the average case.
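The percentages quoted for configuration 16-8 are relative changes against the conventional switch power. The sketch below shows that arithmetic. The only figure taken from the text is the conventional value of 3.776E-05 W; the two design powers are back-computed from the rounded 13% and 35% figures and are not entries read from Table I. The function names are hypothetical.

```python
# Conventional switch-pattern column power quoted in the text (watts).
CONVENTIONAL_SWITCH_POWER_W = 3.776e-05


def relative_change(design_power_w, baseline_power_w):
    """Signed fractional change of the design power against the baseline.

    Positive values are a power penalty, negative values a power reduction.
    """
    return (design_power_w - baseline_power_w) / baseline_power_w


def implied_power(baseline_power_w, change):
    """Power implied by a given fractional change (illustrative only)."""
    return baseline_power_w * (1.0 + change)


# Back-computed from the rounded percentages, NOT values from Table I.
worst_16_8_w = implied_power(CONVENTIONAL_SWITCH_POWER_W, +0.13)
average_16_8_w = implied_power(CONVENTIONAL_SWITCH_POWER_W, -0.35)

print(f"worst case:   {worst_16_8_w:.3e} W")
print(f"average case: {average_16_8_w:.3e} W")
```

Because the conventional column power does not depend on the number of X cells, the same baseline applies to both the worst and the average case.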
For a fair comparison, the evaluation metric is the energy,
which is the product of the match delay (MD) and the search power.
The column energy consumption is summarized in Table II, in
which only the results of the average case are presented. In
Table II, the best configuration is still 16-8, which achieves a
70% average SL energy reduction compared to the conventional
NOR-type TCAM design.
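The energy metric defined above is a simple product, so an energy reduction can be sketched as follows. The delay values and the design power are hypothetical placeholders chosen only to exercise the formula; they are not taken from Fig. 6 or Table II and do not reproduce the 70% figure.

```python
def column_energy(match_delay_s, search_power_w):
    """Energy metric from the text: match delay (MD) times search power."""
    return match_delay_s * search_power_w


def energy_reduction(md_conv_s, p_conv_w, md_new_s, p_new_w):
    """Fractional energy reduction of the new design against the conventional one."""
    return 1.0 - column_energy(md_new_s, p_new_w) / column_energy(md_conv_s, p_conv_w)


# Hypothetical inputs for illustration only.
md_conv_s = 1.0e-9           # placeholder conventional match delay
md_new_s = 0.9e-9            # placeholder match delay of the gated design
p_conv_w = 3.776e-05         # conventional switch power quoted in the text
p_new_w = 0.65 * p_conv_w    # placeholder: a 35% power reduction

print(f"{energy_reduction(md_conv_s, p_conv_w, md_new_s, p_new_w):.1%}")
```

The product form shows why energy is the fairer metric: a design that lowers power but lengthens the match delay gains less in energy than its power figure alone suggests, and one that also shortens the delay gains more.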
V. CONCLUSION