An Efficient Test Design for Verification of Cache Coherence in CMPs

2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing
Mamata Dalui
Department of Computer Science and Engineering
National Institute of Technology
Durgapur, West Bengal, India
mamata.06@gmail.com

Biplab K Sikdar
Department of Computer Science and Technology
Bengal Engineering and Science University
Shibpur, West Bengal, India
biplab@cs.becs.ac.in
Abstract: The data coherence in the cache systems of CMPs with thousands of processors needs to be more accurate and reliable. This work proposes an effective solution to this issue by introducing highly efficient test logic alongside the cache controller. The design is based on the modular structure of Cellular Automata (CA); a special class of CA, referred to as the SACA (single length single cycle attractor CA), is introduced to identify inconsistencies in the cache line states of the processors' private caches. The hardware implementation of the proposed test logic can ensure quick verification of cache inconsistencies in CMPs. The proposed design eliminates the need for the large storage and the complex data structures commonly used to verify data coherency in a multiprocessor system.

Keywords: Cache coherence, Chip Multi-Processor, Fault detection, Coherence controller

978-0-7695-4612-4/11 $26.00 © 2011 IEEE. DOI 10.1109/DASC.2011.72

I. INTRODUCTION

Chip Multi-Processors (CMPs) with thousands of on-chip cores are more susceptible to faults, owing to the effects of technology scaling [1] as well as to the noncompliance of schemes that were targeted at small systems. Technology scaling adds new forms of defects at the deep submicron level that pose serious threats to ensuring data consistency in the processors' private caches. Therefore, the current need is to develop schemes for high-speed verification of inconsistencies in cache data without incurring a major cost.

In CMPs, the L1 cache is the private cache of a processor core. The L2 cache is shared among the cores and is kept coherent with all the L1s. The cache coherence controller (CC), responsible for ensuring consistency in shared data, is one of the most important hardware components [2]. Even a minor defect in the CC of a chip multiprocessor can lead to a major data inconsistency in the CMP's L1 caches. If the CC wrongly computes a cache line state as 'shared' (S) instead of 'modified' (M), it fails to issue the invalidation message. This can seriously damage the performance as well as the reliability of the system. On the other hand, setting an 'M' state instead of 'S' results in unnecessary message delivery and, in effect, a huge power loss. Therefore, maintaining coherency of shared data in CMPs is essential for ensuring the correctness of computation as well as the power efficiency of a system.

Since the reliability of CMPs is an important issue, a number of works address it from different perspectives [8], [9], [10], [11]. Schemes that ensure coherency in CMPs with thousands of cores through frequent communication along the global wires are reported in [3], [4], [5], [6], [7]. This communication among the L1 caches seriously affects the system performance as well as the energy usage. However, power consumption has emerged as the first-order design metric for future CMPs [1][2].

The above scenario motivates us to develop a scheme to determine the accuracy of data consistency in the CMP's cache system [12]. The design should function at speed with the system and should be energy efficient. In this work, we propose the design of an efficient test logic entrusted with the verification of data inconsistencies in the CMP's private caches. The solution is based on the theory of a special class of cellular automata (CA) [13] referred to as the single length single cycle attractor CA (SACA) [14], [15]. At each stage of cache access, the proposed test logic checks the state of a cache line at all the processors' private caches. For any inconsistency recorded, due to a defect in the cache system, the SACA of the fault detection unit points to an attractor indicating inconsistency in the cache system.

The hardware implementation of the proposed test logic can quickly detect a violation of cache coherence in CMPs. This unconventional SACA-based scheme demands minimal wire communication and interconnect access, which can effectively reduce power dissipation. Further, the modular structure of cellular automata (CA) [13][14] makes the solution suitable for a system with billions of cores; that is, it is highly scalable.

The relevant CA preliminaries and a brief on cache coherence are introduced in the next sections.

II. CA PRELIMINARIES

A Cellular Automaton (CA) consists of a number of cells organized in the form of a lattice. It evolves in discrete space and time, and can be viewed as an autonomous finite state machine (FSM). Each cell stores a discrete variable at time t that refers to the present state (PS) of the cell. The next state (NS) of the cell at (t+1) is affected by its state and the
states of its neighbors at time t. In this work, we concentrate on such 3-neighborhood (self, left and right neighbors) CA, where a cell has two states, 0 or 1, and the next state of the i-th cell is

$S_i^{t+1} = f_i(S_{i-1}^t, S_i^t, S_{i+1}^t)$,

where $S_{i-1}^t$, $S_i^t$ and $S_{i+1}^t$ are the present states of the left neighbor, self and right neighbor of the i-th cell at time t, and $f_i$ is the next state function. The states of the cells at t, $S^t = (S_1^t, S_2^t, \ldots, S_n^t)$, is the present state of the CA. Therefore, the next state of an n-cell CA is

$S^{t+1} = (f_1(S_0^t, S_1^t, S_2^t), f_2(S_1^t, S_2^t, S_3^t), \ldots, f_n(S_{n-1}^t, S_n^t, S_{n+1}^t))$.

Figure 1. An n-cell null boundary CA (cells 1, ..., i−1, i, i+1, ..., n; each cell is a flip-flop (FF) driven by the combinational logic circuit f_1, ..., f_i, ..., f_n; both boundaries are null)

The next state function of the i-th CA cell can be expressed in the form of a truth table (Table I). The decimal equivalent of the 8 outputs is called Rule $R_i$. In a 2-state 3-neighborhood CA, there can be $2^8$ (256) rules. Five such rules, 15, 14, 192, 207 and 240, are illustrated in Table I. The first row lists the $2^3$ (8) possible combinations of present states of the (i−1)-th, i-th and (i+1)-th cells at t. The last five rows indicate the next states of the i-th cell at (t+1), forming the rules 15 (NS = $S'_{i-1}$), 14 (NS = $S'_{i-1} S_i + S'_{i-1} S_{i+1}$), 192 (NS = $S_{i-1} S_i$), 207 (NS = $S'_{i-1} + S_i$) and 240 (NS = $S_{i-1}$) respectively, where $S'$ denotes the complement of $S$.

Table I. Truth table of the CA rules 15, 14, 192, 207 and 240
Present state   111  110  101  100  011  010  001  000   Rule
(RMT)           (7)  (6)  (5)  (4)  (3)  (2)  (1)  (0)
Next state       0    0    0    0    1    1    1    1     15
Next state       0    0    0    0    1    1    1    0     14
Next state       1    1    0    0    0    0    0    0    192
Next state       1    1    0    0    1    1    1    1    207
Next state       1    1    1    1    0    0    0    0    240

The following terminologies are relevant for the current work.

Definition 1. The set $R = \langle R_1, R_2, \ldots, R_i, \ldots, R_n \rangle$ of rules that configure the cells of a CA is called the rule vector of the CA.

Definition 2. If all the CA cells obey the same rule, then the CA is a uniform CA; otherwise it is a non-uniform/hybrid CA.

Definition 3. A CA is a null boundary CA (Figure 1) if the left (right) neighbor of the leftmost (rightmost) terminal cell is permanently fixed to 0-state.

Definition 4. A CA is reversible if its states form only cycles in the state transition diagram (Figure 2); otherwise, the CA is irreversible (Figure 3).

Figure 2. A 4-cell reversible CA (state transition diagram)

Figure 3. A 4-cell irreversible CA (state transition diagram)

Definition 5. From the view point of Switching Theory, a combination of the present states (1st row of Table I) can be considered as the Min Term of a 3-variable ($S_{i-1}^t$, $S_i^t$, $S_{i+1}^t$) switching function. Each column of the first row of Table I is referred to as the Rule Min Term (RMT). The column 011 of Table I is the 3rd RMT. The next states corresponding to this RMT are 1 for Rules 15, 14 and 207, and 0 for Rules 192 and 240.

Definition 6. A set of states of a CA that forms a loop (cycle), such as 7 → 7 and 9 → 1 → 9 of Figure 3, is referred to as an attractor. The attractors of single length cycle, such as 7 → 7 of Figure 3, are of our current interest.

III. CACHE COHERENCE IN CMPS

In the present work, we assume the shared bus architecture to describe our scheme. In CMPs, the bus-based cache coherence protocol connects all the L1 caches through a shared bus (Figure 4), and each L1 cache miss generates coherence messages. All the other L1s of the system update the states of their valid data (cache lines) in accordance with the coherence messages.
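As a rough illustration of how the L1 caches update the state of a line in response to coherence messages, the following Python sketch models one cache line on a shared bus in a fault-free system. It assumes the 3-state MSI protocol that Section IV adopts; the function name `apply_event` and the list-of-letters representation are hypothetical and are not part of the proposed design.

```python
def apply_event(states, i, event):
    """Return the states of one cache line after processor i reads or writes it.

    states: list of 'M', 'S' or 'I', one entry per private L1 cache.
    Illustrative fault-free MSI model only (an assumption, not the paper's hardware).
    """
    if event == "write":
        # The writer holds the only valid copy; every other cache is invalidated.
        nxt = ["I"] * len(states)
        nxt[i] = "M"
        return nxt
    if event == "read":
        if states[i] != "I":
            return list(states)  # read hit: no coherence message, no state change
        # Read miss: a Modified owner writes back and drops to Shared.
        nxt = ["S" if s == "M" else s for s in states]
        nxt[i] = "S"
        return nxt
    raise ValueError("event must be 'read' or 'write'")


print(apply_event(["I", "M", "I", "I"], 0, "read"))   # ['S', 'S', 'I', 'I']
print(apply_event(["S", "S", "S", "S"], 2, "write"))  # ['I', 'I', 'M', 'I']
```

A fault in the CC corresponds to the recorded states deviating from what such a model returns.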
The cache coherence controller (CC) is entrusted with ensuring the consistency of shared data [12] among the processor cores. However, a fault in the CC logic may lead to an inconsistency in cache lines, which in effect leads to error-prone computation as well as a huge power loss.

The scheme proposed in [2], which uses a two-step verification technique (local verification and global verification) for the CC logic, is the only effective solution proposed so far to address this issue for CMPs. It involves computation-intensive steps to detect inconsistencies in the shared data of different caches.

In this work, we propose a cellular automata based high-speed verification/test logic that is scalable and cost effective in terms of area overhead as well as power consumption.

Figure 4. CMPs with CC and test unit (processor cores P1, P2, ..., Pn with private L1 caches C1, C2, ..., Cn; a shared L2 cache; the CC and the CA based test unit)

IV. OVERVIEW OF THE PROPOSED TEST DESIGN

In the current presentation, we consider a basic bus-based 3-state MSI protocol for the CMP's cache system. It maintains three cache states: Modified (M), Shared (S) and Invalid (I) (Figure 5).

Figure 5. State transition diagram of 3-state MSI (snoopy) protocol (states Invalid (I), Shared (S, read-only) and Modified (M); 'B' is the cache line (block); Pjs refers to the other processors; a read miss follows the write-back)

This work detects faults/defects in the CC logic that lead to an incorrect recording of cache states. In column 1 of Table II, we describe the different possible current states (a summary of the states is shown in Figure 5) of the cache system. All of these are coherent states; that is, when the system is in such a state, the proposed cache inconsistency detection logic should respond CH (coherent). The event shown in column 2 causes a transition of the states of a cache line (say, B) at the different Cs (private caches of different processors) from the current state to the desired next state (column 3). During this transition, a faulty system may record incorrect states of B at different processors' caches (noted in column 4). For the current design, we assume that a faulty recording is due to a communication failure or a design defect in the CC logic.

A faulty recording may not always lead to an incoherent state. The effect of a fault is either CH or ICH (incoherent), as noted in the last column of Table II. The entry 'All Cs I[S]' in the table means that the cache line B is in the Invalid [Shared] state at all the caches.

Columns 1, 4 and 5 of Table II indicate that the fault detection unit should respond CH for the following states of cache line B:
Case 1: All the caches (Cs) as I (invalid)
Case 2: All the caches (Cs) as S (shared)
Case 3: Some Cs I and others S
Case 4: One cache M and all others I.
On the other hand, for the following incoherent states
Case 5: One cache as M, at least one S and others I
Case 6: Two caches as M and others I
it should respond ICH.

In the proposed design, we introduce a CA-based model to realize the test logic (unit) so that it correctly responds either CH or ICH, following the above six cases, in a cache system. For a chip multiprocessor with n caches (C1, C2, ..., Cn, where Ci is the cache of processor Pi), we employ an (n+2)-cell CA (the two terminal cells do not represent a cache) at the test unit. If the cache line B at Ci has its state as M (Modified), then the i-th cell of the CA is configured with rule $R_M$. Similarly, if the state of B at Ci is S or I, the corresponding CA cell is configured with rule $R_S$ or $R_I$.

Now, at each transition from a current cache state to a next cache state of Table II, the test unit forms the CA by setting the rule of each CA cell according to the state of B at the corresponding cache Ci. Then the CA is run for 't' (= n+2) time steps with the all 1s seed. It settles either to an attractor 'CH' for a coherent state or to 'ICH' for an incoherent, that is, faulty state, satisfying the above six cases. The following subsection reports the selection of rules for $R_M$, $R_I$ & $R_S$.

V. CA BASED DESIGN OF CC TEST UNIT

The scheme described in the earlier section demands that the CA constructed from $R_M$, $R_I$ & $R_S$ should form single length single cycle attractors; preferably, each CA should have a single length cycle attractor. That is, the design demands SACA. Further, the SACA should correctly distinguish Cases 1-4 from Cases 5-6 of Section IV.
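The SACA requirement can be made concrete with a small simulation. The sketch below is a minimal illustration, assuming the usual rule numbering of Section II (bit k of the rule number is the next state for RMT k) and a null boundary; the helper names `step`, `settle` and `is_saca` are hypothetical. The exhaustive check is only practical for short CA, and rule 204 appears purely as a contrasting example.

```python
from itertools import product


def step(state, rules):
    """Apply one synchronous update to a null-boundary 3-neighborhood CA."""
    padded = (0,) + tuple(state) + (0,)  # null boundary: both outer neighbors are 0
    return tuple(
        # RMT index = 4*left + 2*self + right; that bit of the rule is the next state
        (rules[i] >> (4 * padded[i] + 2 * padded[i + 1] + padded[i + 2])) & 1
        for i in range(len(rules))
    )


def settle(rules):
    """Run the CA from the all-1s seed for len(rules) time steps."""
    state = (1,) * len(rules)
    for _ in range(len(rules)):
        state = step(state, rules)
    return state


def is_saca(rules):
    """Exhaustively check that all states reach one and the same fixed point."""
    n = len(rules)
    all_states = list(product((0, 1), repeat=n))
    fixed = [s for s in all_states if step(s, rules) == s]
    if len(fixed) != 1:
        return False
    for s in all_states:
        for _ in range(2 ** n):  # long enough to enter whatever cycle s leads to
            s = step(s, rules)
        if s != fixed[0]:
            return False
    return True


print(is_saca([192] * 4), is_saca([204] * 4))  # True False
print(settle([15] * 6))                        # (1, 0, 1, 0, 1, 0)
```

Here `rules` is the full rule vector, so for the test unit it would have n+2 entries including the two terminal cells, and `settle` then runs the n+2 steps mentioned in Section IV.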
Table II. State transitions
Current Cache States (1) | Event (2) | Desired Next States (3) | Faulty Next States (4) | Effect of Fault (5)
All Cs I | Pi writes | Ci M and all others I | Ci M and all others I | Coherent state (CH)
All Cs S | Pi writes | Ci M and all others I | Ci M and all others I | Coherent state (CH)
 | | | Ci M and others S & I | Incoherent state (ICH)
Cs are I & S | Pi writes | Ci M and all others I | Ci M and others S & I | Incoherent state (ICH)
Cj M and all others I | Pi reads | Ci & Cj S, others I | Cj M, Ci S and others I | Incoherent state (ICH)
 | Pi writes | Ci M and all others I | Ci & Cj M and others I | Incoherent state (ICH)
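The expected response of the test unit for the entries of Table II can be written down as a small software reference against which a CA-based detector may be compared. The sketch below is only such a reference, not the proposed CA logic; the function name `classify` is hypothetical, and treating every combination outside Cases 1-4 (for example three Modified copies) as ICH is an assumption.

```python
def classify(states):
    """Return 'CH' or 'ICH' for the states ('M', 'S' or 'I') of one cache line."""
    modified = sum(1 for s in states if s == "M")
    shared = sum(1 for s in states if s == "S")
    if modified == 0:
        return "CH"   # Cases 1-3: only I and/or S copies
    if modified == 1 and shared == 0:
        return "CH"   # Case 4: one M, all others I
    return "ICH"      # Cases 5 and 6 (and, by assumption, anything else)


# Faulty next states of Table II, instantiated for four caches.
print(classify(["M", "I", "I", "I"]))  # CH
print(classify(["M", "S", "I", "I"]))  # ICH
print(classify(["M", "M", "I", "I"]))  # ICH
```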
A. The SACA rules

Since the next state of a single length cycle attractor is the attractor itself, there should be at least one RMT (Section II) of each cell rule $R_i$ of the rule vector R for which the cell does not change its state in the next time step. For example, the RMT xdx (x=0/1, d=0/1) of a rule $R_i$ is considered to find the next state of the i-th cell when the current states of its left neighbor ((i−1)-th cell), self and right neighbor ((i+1)-th cell) are x, d and x respectively. To get a single length cycle attractor, the RMT xdx of $R_i$ is to be d (0/1). It implies that the state change in the i-th cell is d → d. That is, for rule $R_i$, if the RMTs 0(000), 1(001), 4(100) or 5(101) are 0, then the CA cell i, configured with $R_i$, does not change its state. Similarly, if the RMTs 2(010), 3(011), 6(110) or 7(111) are 1 in a rule $R_i$, a cell configured with $R_i$ can stick to its current state in the next time step. For example, when a CA cell is configured with the rule 204, all its RMTs help the formation of a single length cycle attractor. On the other hand, the RMTs of rule 51 deny the attractor formation.

Property 1: [15] A rule $R_i$ can contribute to the formation of single length cycle attractor(s) if at least one of the RMTs 0, 1, 4 or 5 is 0, and/or at least one of the RMTs 2, 3, 6 or 7 is 1. That is, RMT k = 0 for at least one k ∈ {0, 1, 4, 5}, and/or RMT k = 1 for at least one k ∈ {2, 3, 6, 7}.

Figure 6. State transition diagram of a 4-cell uniform CA
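Property 1 can be checked mechanically: an RMT supports a single length cycle exactly when the next state it prescribes equals the middle (self) bit of the RMT. The following Python sketch, with hypothetical helper names, lists the RMTs for which a rule follows or denies Property 1 and counts the former, which is the quantity by which [15] groups the rules.

```python
def property1_rmts(rule):
    """RMTs (0-7) whose next state equals the cell's own present state."""
    # (rmt >> 1) & 1 is the middle bit of the RMT, i.e. the present state of the cell.
    return {rmt for rmt in range(8) if ((rule >> rmt) & 1) == ((rmt >> 1) & 1)}


def denied_rmts(rule):
    """RMTs for which the rule denies Property 1."""
    return set(range(8)) - property1_rmts(rule)


def group(rule):
    """Number of RMTs for which the rule follows Property 1 (0 to 8)."""
    return len(property1_rmts(rule))


print(group(207), group(204), group(51))  # 6 8 0
print(sorted(denied_rmts(192)))           # [2, 3]
```

Rule 204 follows Property 1 for all 8 RMTs and rule 51 for none, matching the two examples given above.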

Based on Property 1, the 256 rules are classified [15] in 9 groups (group 0-8). The rule 207 (11001111) is in group 6 as it follows Property 1 for 6 RMTs. A CA configured with the rules that maintain Property 1 for most of the RMTs can be a probable candidate for a single length single cycle attractor CA. The following observations are the outcome of our extensive experimentations [15].

Observation 1. Most of the rules of group 6 form single length cycle attractor CA. Out of these, some rules (e.g. 201) form multi-graph and some (e.g. 136, 192) form single graph, that is, SACA (Figure 6).

Observation 2. A number of rules from group 4 form both the single & multi-length cycles and single & multi-graphs. The state transition diagram of a 4-bit CA is noted in Figure 7. The single (state 10) & multi-length (states 8 & 11) cycle attractors are shown. It is multi-graph. The rules that form only single length cycle uniform CA are 0, 10, 15, 20, 24, 30, 36, 40, 46, 66, 80, 85, 90, 96, 106, 120, 130, 144, 160, 166, 170, 180, 184, 219, 226, 235, 240, 249, 255. The rules 0, 10, 15, 24, 40, 66, 80, 85, 96, ... form single graph, i.e. SACA (Table III, group 4).

Figure 7. State transition diagram of a 4-cell uniform CA

Observation 3. In group 2, the uniform CA designed only with the rule 34/48 can form SACA.

Observation 4. The uniform CA designed with the rules of group 3 form single length cycle attractors only. Among those, 2, 16, 32, 42, 56, 98, 112, 162, and 176 form SACA.

It is observed that to form an SACA, the CA rules should follow Property 1. However, a rule (e.g. rule 204) that maintains Property 1 for all its RMTs can't form an SACA.

Property 2: To form a uniform SACA with rule 'r', the 'r' must deny Property 1 for some RMTs.

The rule 48 of group 2 denies Property 1 for 6 RMTs. On the other hand, rule 192 of group 6 denies Property 1
only for 2 RMTs. Each of these rules forms SACA for all lengths.

Table III displays the rules that form SACA of any arbitrary length. The SACA, synthesized following Property 2, is employed to develop the proposed scheme for detecting any violation of cache coherence.

Table III. CA rules for uniform SACA
Group   Rules for SACA
2       34, 48
3       2, 16, 32, 42, 56, 98, 112, 162, 176
4       0, 10, 15, 24, 40, 66, 80, 85, 96, 130, 144, 160, 170, 184, 226, 240, 255
5       8, 64, 128, 138, 143, 152, 168, 194, 208, 213, 224
6       136, 192

B. SACA rule selection for $R_M$, $R_I$ & $R_S$

The rules $R_M$, $R_I$ & $R_S$ required for the design should primarily form uniform SACA. That is, these should follow Property 1 and 2. Such rules are shown in Table III.

However, in Case 1 & 2 (Section IV), the CA formed are the uniform SACA. Say, the attractors of these classes of CA are $A_1$ & $A_2$. For Case 3 & 4, it results in hybrid CA, say, with attractors $A_3$ & $A_4$. Similarly, the CA formed in Case 5 & 6 are also hybrid. Let us assume these form attractors $A_5$ & $A_6$ respectively. The best selection of $R_M$, $R_I$ & $R_S$, therefore, can be such that

Cond 1: $\{A_1, A_2, A_3, A_4\}$, belonging to CH, and $\{A_5, A_6\}$, belonging to ICH, are different.

The CA for Case 6 results from the uniform CA for Case 1 through hybridization with $R_M$. Now, to ensure $A_6 \neq A_1$, the following property is to be satisfied.

Property 3: If a uniform SACA with rule $R_x$ is hybridized by a cell rule $R_y$, it can generate new attractors only if the set of RMTs of $R_y$ for which Property 1 is denied is not a subset of the set of RMTs of $R_x$ for which Property 1 is also denied.

For example, the RMTs of rule 192 ($R_x$) for which Property 1 is denied are 2, 3 and the similar set for rule 207 ($R_y$) is 0, 1 (Figure 9). Therefore, rules 192 and 207 follow Property 3. That is, the CA that results from a uniform SACA (with rule 192) due to faults at single or multiple nodes is a hybrid CA (hybridized with rule 207). It generates multiple attractors and, when initialized with the all 1s seed, settles to an attractor with LSB = 1 (Figure 8).

Figure 8. State transition diagrams of the CA resulted due to hybridization: a) CA<192, 207, 192, 192>; b) CA<192, 207, 207, 192>

Figure 9. Structure of rule 14, 15, 192 and 207 (the figure marks the RMTs for which Property 1 is denied)
RMT        111  110  101  100  011  010  001  000
Rule 192    1    1    0    0    0    0    0    0
Rule 207    1    1    0    0    1    1    1    1
Rule 15     0    0    0    0    1    1    1    1
Rule 14     0    0    0    0    1    1    1    0

Further, it is to be noted that both the CA formed for Case 4 and Case 6 result from hybridization of the uniform SACA of Case 1. It signifies that selecting one set of $R_M$, $R_I$ & $R_S$ satisfying Cond 1 is hard.

In the current work, we propose a 2-stage (Stage 0 & Stage 1) solution to satisfy Cond 1, assuming an even number of L1 caches (processors) in the system.

Stage 0: For a system with n processors, we choose
$R_M$ = 240, $R_I$ = 15 and $R_S$ = 14.

Theorem 1: The null boundary n-cell CA, configured with rule 15 and 240 in any sequence, forms SACA. The depth of the CA is n.

Theorem 2: If a null-boundary CA, configured with rule 15, is hybridized with rule 14, its state transition behavior remains as that of the uniform CA configured with rule 15; that is, rule 14 is absorbed into rule 15.

This combination of rules forms two classes of (n+2)-cell CA (even length) with LSB as '0' and '1' (the rules of the two terminal cells are always set to 15), while considering the all 1s seed (initial seed), for Case 1-6. The LSB '0' signifies Case 1, 2, 3 & 6 and '1' is for Case 4 & 5. However, Cond 1 states that the Case 6 attractor can't belong to the attractor set of Case 1, 2 & 3, and the Case 4 attractor must be in the set of attractors for Case 1, 2 & 3. This misclassification is corrected in Stage 1 of the verification process.

Stage 1: (a) To differentiate Case 6 and Case 1, 2 & 3 at the end of Stage 0, we select
$R_M$ = 207, $R_I$ = 192 and $R_S$ = 192.

As the rules 192 & 207 follow Property 3, the attractor of the hybrid CA (Case 6) is different from the attractor of the (n+2)-cell uniform CA with rule 192 (Figure 8). The LSB of the attractors for Case 1, 2 & 3 is 0 (CH) but it is 1 for Case 6 (ICH) (here the terminal cells are set to 192).

(b) To separate out Case 4 and Case 5 at the completion of Stage 0, the selected rules are
$R_M$ = 192, $R_I$ = 192 and $R_S$ = 207.

Here, the presence of $R_S$ in the (n+2)-cell hybrid CA (the terminal cells are with rule 192) generates an attractor with LSB '1' (Figure 8(a)) that corresponds to Case 5 (ICH). Case 4 corresponds to the uniform CA with rule 192, and the LSB of its attractor is '0'.

The two stage process is described in Figure 10. The hardware realization is reported in the next subsection.

Figure 10. CA based two-stage verification of data inconsistencies (Stage 0: Rm=240, Ri=15, Rs=14, giving attractor LSB 0 or 1; Stage 1 after attractor 0: Rm=207, Ri=192, Rs=192; Stage 1 after attractor 1: Rm=192, Ri=192, Rs=207; a Stage 1 attractor LSB of 0 gives CH and 1 gives ICH)

C. Realization of the test unit

The states M, I and S of a cache line are represented as 00, 10 and 11 respectively (01 is a don't care state). At Stage 0 (encoded as '0'), the state '00' of cache line B (at Ci) implies that rule 240 ($R_M$) is to be set; it is 15 ($R_I$) for state '10' and rule 14 ($R_S$) for '11'.

To set the CA cell rule at Stage 1 (encoded as 1), the design accepts the state (00/10/11) of B at Ci and the LSB (0/1) of the attractor generated at Stage 0.

The hardware realization of the design is shown in Figure 11. The 16 to 1 multiplexer (MUX) and the other combinational logic (shown in the figure) generate the next state of a CA cell. For example, if the state of cache line B at cache Ci is 'modified' ($R_M$), that is, $P_{i1}P_{i0}$ = 00, and the attractor at Stage 0 is 1 ('attractor' of Figure 11 = 1), then the rule selected for the i-th CA cell at Stage 1 ('stage' = 1 in Figure 11) is 192 (Figure 10). In Figure 11, this sets the select lines of the MUX as 0011. It implies that the output of the MUX is $S_{i-1} \cdot S_i$, that is, the rule 192. The part of the next state logic shown in Figure 11 is also shared in realizing the next state logic of cell i-1 and cell i+1.

Figure 11. Hardware realization of test unit (a 16 to 1 MUX with data lines 0-15 and one output; its select lines are Pi1 Pi0, the state of cache line B at Ci taken from the 2n-bit states of B at the different caches, together with the 1-bit 'attractor' and the 1-bit 'stage')

VI. EXPERIMENTAL RESULTS

This section reports the experimental results establishing the effectiveness of the CA based scheme for verification of cache inconsistencies in CMPs. In the experimentation, we assume CMPs with 16 to 1024 processors (column 1 of Table IV). For each such system, we generate an arbitrary sequence of states of a cache line B at different private (L1) caches. Column 2 of Table IV notes the chosen sequence of cache states. For an n (number of processors), we choose all the six cases (Case 1 - Case 6 of Section IV). The CA rules configured for the cache line states in Stage 0 and Stage 1 are shown in Column 3 and Column 4. Column 5 and Column 6 report the attractors formed after the CA have been run for (n+2) steps. The LSB (0/1) of an attractor is the rightmost bit of its entry. Depending upon the LSB of the attractor at Stage 0, the Stage 1 CA cell rules are set. It can be observed that the LSB of an attractor at Stage 1 correctly indicates whether the cache states are coherent ('0' for coherent (CH), '1' for incoherent (ICH)). The decision is noted in the last column of Table IV.

Table IV. Experimental results
No. of processors | Cache states | CA rules, Stage 0 | CA rules, Stage 1 | Attractor, Stage 0 | Attractor, Stage 1 | Decision
16 | I I ... I | 15 15 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
16 | S S ... S | 14 14 ... 14 | 192 ... 192 | 1...0 | 0...0 | CH
16 | I S ... I | 15 14 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
16 | I M I ... I | 15 240 15 ... 15 | 192 192 192 ... 192 | 10...1 | 0...0 | CH
16 | I M S I ... I | 15 240 14 15 ... 15 | 192 192 207 ... 192 | 10...1 | 0...1 | ICH
16 | I M M I ... I | 15 240 240 15 ... 15 | 192 207 207 ... 192 | 10...0 | 0...1 | ICH
32 | I I ... I | 15 15 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
32 | S S ... S | 14 14 ... 14 | 192 ... 192 | 1...0 | 0...0 | CH
32 | I S ... I | 15 14 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
32 | I M I ... I | 15 240 15 ... 15 | 192 192 192 ... 192 | 10...1 | 0...0 | CH
32 | I M S I ... I | 15 240 14 15 ... 15 | 192 192 207 ... 192 | 10...1 | 0...1 | ICH
32 | I M M I ... I | 15 240 240 15 ... 15 | 192 207 207 ... 192 | 10...0 | 0...1 | ICH
64 | I I ... I | 15 15 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
64 | S S ... S | 14 14 ... 14 | 192 ... 192 | 1...0 | 0...0 | CH
64 | I S ... I | 15 14 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
64 | I M I ... I | 15 240 15 ... 15 | 192 192 192 ... 192 | 10...1 | 0...0 | CH
64 | I M S I ... I | 15 240 14 15 ... 15 | 192 192 207 ... 192 | 10...1 | 0...1 | ICH
64 | I M M I ... I | 15 240 240 15 ... 15 | 192 207 207 ... 192 | 10...0 | 0...1 | ICH
256 | I I ... I | 15 15 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
256 | S S ... S | 14 14 ... 14 | 192 ... 192 | 1...0 | 0...0 | CH
256 | I S ... I | 15 14 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
256 | I M I ... I | 15 240 15 ... 15 | 192 192 192 ... 192 | 10...1 | 0...0 | CH
256 | I M S I ... I | 15 240 14 15 ... 15 | 192 192 207 ... 192 | 10...1 | 0...1 | ICH
256 | I M M I ... I | 15 240 240 15 ... 15 | 192 207 207 ... 192 | 10...0 | 0...1 | ICH
1024 | I I ... I | 15 15 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
1024 | S S ... S | 14 14 ... 14 | 192 ... 192 | 1...0 | 0...0 | CH
1024 | I S ... I | 15 14 ... 15 | 192 ... 192 | 1010...10 | 0...0 | CH
1024 | I M I ... I | 15 240 15 ... 15 | 192 192 192 ... 192 | 10...1 | 0...0 | CH
1024 | I M S I ... I | 15 240 14 15 ... 15 | 192 192 207 ... 192 | 10...1 | 0...1 | ICH
1024 | I M M I ... I | 15 240 240 15 ... 15 | 192 207 207 ... 192 | 10...0 | 0...1 | ICH

VII. CONCLUSION

This work proposes an efficient solution for detecting faults in the logic entrusted with cache coherence. The solution targets Chip Multi-Processors with thousands of processors. It avoids rigorous computational and communication overhead, assuring a robust and scalable design. The proposed design is developed around the regular structure of CA (Cellular Automata). A special class of CA called SACA is introduced. This enables a simple hardware realization of the design, leading to quick identification of incoherency in a cache system.

REFERENCES

[1] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi. Modeling the effect of technology trends on the soft error rate of combinational logic. International Conference on Dependable Systems and Networks, June 2002, pp. 389-398.
[2] Hui Wang, Sandeep Baldawa, Rama Sangireddy. Dynamic Error Detection for Dependable Cache Coherency in Multicore Architecture. VLSI Design conference, January 2008.
[3] Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, John B. Carter. Interconnect-Aware Coherence Protocols for Chip Multiprocessors. The 33rd IEEE International Symposium on Computer Architecture (ISCA'06), 2006.
[4] Akira Yamawaki, Masahiko Iwane. Coherence Maintenances to realize an efficient parallel processing for a Cache Memory with Synchronization on a Chip-Multiprocessor. ISPAN, 2005.
[5] Jichuan Chang, Gurindar S. Sohi. Cooperative Caching for Chip Multiprocessors. ISCA, 2006.
[6] Rana Ejaz Ahmed. Energy-Aware Cache Coherence Protocol for Chip-Multiprocessors. IEEE CCECE/CCGEI, Ottawa, May 2006.
[7] Alberto Ros, Manuel E. Acacio, Jose M. Garcia. A Direct Coherence Protocol for Many-Core Chip Multiprocessors. IEEE Trans on Parallel and Distributed Systems, Vol. 21, No. 12, December 2010.
[8] Linzhi Ning, Wenbin Yao, Jun Ni, Nianmin Yao. Fault-Tolerant CMP Architecture Based on SMT Technology. IEEE International Multisymposium on Computer and Computational Sciences, 2007.
[9] Rui Gong, Kui Dai, Zhiying Wang. Transient Fault Recovery on Chip Multiprocessor based on Dual Core Redundancy and Context Saving. IEEE Int Conference for Young Computer Scientists, 2008.
[10] Ransford Hyman, Koustav Bhattacharya, Nagarajan Ranganathan. Redundancy Mining for Soft Error Detection in Multicore Processors. IEEE Trans on Computers, Vol. 60, No. 8, August 2011.
[11] Pramod Subramanyan, Virendra Singh, Kewal K. Saluja, Erik Larsson. Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[12] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 3rd Edition. Morgan Kaufmann, 2003.
[13] S. Wolfram. Cellular Automata and Complexity: Collected Papers. Addison Wesley, 1994.
[14] P. Pal Chaudhuri, D. Roy Chowdhury, S. Nandi, and S. Chatterjee. Additive Cellular Automata: Theory and Applications, volume 1. IEEE Computer Society Press, California, USA, ISBN 0-8186-7717-1, 1997.
[15] Sukanta Das, Nazma N Naskar, Sukanya Mukherjee, Mamata Dalui and Biplab K Sikdar. Characterization of CA Rules For SACA Targeting Detection of Faulty Nodes In WSN. ACRI-2010.
