You are on page 1of 8

5.1 [10/10/10/10/10/10/10] <5.

2> for each part of this exercise, assume the initial cache
and memory state as illustrated in Figure 5.35. Each part of this exercise specifies a
sequence of one or more CPU operations of the form:
P#: <op> <address> [<value>] where P# designates the CPU (e.g., P0), <op> is the
CPU operations (e.g., read or write), <address> denotes the memory address, and
<value> indicates the new word to be assigned on a write operation. Treat each action
below as independently applied to the initial state as given in Figure 5.35. What is the
resulting state (i.e., coherence state, tags, and data) of the caches and memory after
the given action?

Show only the blocks that change; for example, P0.B0: (I, 120, 00 01) indicates that
CPU P0s block B0 has the final state of I, tag of 120, and data words 00 and 01. Also,
what value is returned by each read operation?

Figure 5.35 Multicore (point-to-point) multiprocessor.


a. [10] <5.2> P0: read 120
P0: read 120 P0.B0: (S, 120, 0020) returns 0020

b. [10] <5.2> P0: write 120 80


P0: write 120 80 P0.B0: (M, 120, 0080)
P3.B0: (I, 120, 0020)

c. [10] <5.2> P3: write 120 80


P3: write 120 80 P3.B0: (M, 120, 0080)

d. [10] <5.2> P1: read 110


P1: read 110 P1.B2: (S, 110, 0010) returns 0010

e. [10] <5.2> P0: write 108 48


P0: write 108 48 P0.B1: (M, 108, 0048)

f. [10] <5.2> P0: write 130 78


P0: write 130 78 P0.B2: (M, 130, 0078)
M: 110 0030 (write back to memory)

g. [10] <5.2> P3: write 130 78

P3: write 130 78 P3.B2: (M, 130, 0078)

5.3 [20] <5.2> Many snooping coherence protocols have additional states, state
transitions, or bus transactions to reduce the overhead of maintaining cache
coherency. In Implementation 1 of Exercise 5.2, misses are incurring fewer stall cycles
when they are supplied by cache than when they are supplied by memory. Some
coherence protocols try to improve performance by increasing the frequency of this
case. A common protocol optimization is to introduce an Owned state (usually denoted
O). The Owned state behaves like the Shared state in that nodes may only read
Owned blocks, but it behaves like the Modified state in that nodes must supply data on
other nodes read and write misses to Owned blocks. A read miss to a block in either
the Modified or Owned states supplies data to the requesting node and transitions to
the Owned state. A write miss to a block in either state Modified or Owned supplies
data to the requesting node and transitions to state Invalid. This optimized MOSI
protocol only updates memory when a node replaces a block in state Modified or
Owned. Draw new protocol diagrams with the additional state and transitions.

Protocol diagrams

WRITE MISS OR INVALIDATE FOR THIS BLOCK

Shared
Invalid CPU READ

PLACE READ MISS ON BUS

CPU READ HIT


PLACE WRITE MISS ON BUS
ABORT MEMORY ACCESS
WRITEBACK BLOCK

CPU WRITE

WRITE MISS FOR


THIS BLOCK
READ MISS
Owned
Modified
WRITEBACK BLOCK
ABORT MEMORY ACCESS

CPU WRITE HIT


PLACE INVALIDATE ON BUS
CPU READ HIT
CPU READ HIT
CPU WRITE HIT
5.6 [20/20/20/20/20] <5.2> Assume the cache contents of Figure 5.35 and the timing of
Implementation 1 in Figure 5.36. What are the total stall cycles for the following code
sequences with both the base protocol and the new MESI protocol in Exercise 5.5?
Assume that state transitions that do not require interconnect transactions incur no
additional stall cycles.

a. [20] <5.2> P0: read 100


P0: write 100 40
P0: read 100, read miss, satisfied in memory, no sharers MSI: S, MESI: E
P0: write 100 40, MSI: send invalidate, MESI: silent transition from E to M
MSI: 100 + 15 = 115 stall cycles
MESI: 100 + 0 = 100 stall cycles

b. [20] <5.2> P0: read 120


P0: write 120 60
P0: read 120, read miss, satisfied in memory, sharers both to S
P0: write 120 60, both send invalidates
Both: 100 + 15 = 115 stall cycles

c. [20] <5.2> P0: read 100


P0: read 120
P0: read 100, read miss, satisfied in memory, no sharers MSI: S, MESI:E
P0: read 120, read miss, memory, silently replace 120 from S or E
Both: 100 +100 = 200 stall cycles, silent replacement from E

d. [20] <5.2> P0: read 100


P1: write 100 60
P0: read 100, read miss, satisfied in memory, no sharers MSI: S, MESI:E
P1: write 100 60, write miss, satisfied in memory regardless of protocol
Both: 100 + 100 = 200 stall cycles, dont supply data in E state

e. [20] <5.2> P0: read 100


P0: write 100 60
P1: write 100 40
P0: read 100, read miss, satisfied in memory, no sharers MSI: S, MESI: E
P0: write 100 60, MSI: send invalidate, MESI: silent transition from E to M
P1: write 100 40, write miss, P0s cache, write back data to memory
MSI: 100 +15 + 40 +10 = 165 stall cycles
MESI: 100 + 0 +40 + 10 = 150 stall cycles

5.9 [10/10/10/10/15/15/15/15] <5.4> for each part of this exercise, assume the initial
cache and memory state in Figure 5.38. Each part of this exercise specifies a
sequence of one or more CPU operations of the form:
P#: <op> <address> [ <value> ]

Figure 5.37 Multichip, multicore multiprocessor with DSM

Figure 5.38 Cache and memory states in the multichip, multicore multiprocessor
Where P# designates the CPU (e.g., P0,0), <op> is the CPU operation (e.g., read or
write), <address> denotes the memory address, and <value> indicates the new word to
be assigned on a write operation. What is the final state (i.e., coherence state,
sharers/owners, tags, and data) of the caches and memory after the given sequence of
CPU operations has completed? Also, what value is returned by each read operation?

a. [10] <5.4> P0,0: read 100


L1 hit returns 0x0010, state unchanged (M)

b. [10] <5.4> P0,0: read 128


L1 miss, will replace B1 in L1
L2 miss, will replace B1 in L2 which has address 108.
L1 will have 128 in B1 (shared)
L2 will have 128 in B1 (DS, P0,0)
Memory directory entry for 108 will become <DS, C1>
Memory directory entry for 128 will become <DS, C0>

c. [10] <5.4> P0,0: write 128 78


L1 miss, will replace B1 in L1
L2 miss, will replace B1 in L2 which has address 108.L1 will have 128 in B1 (modified)
L2 will have 128 in B1 (DM, P0,0)
Memory directory entry for 108 will become <DS, C1>
Memory directory entry for 128 will become <DM, C0>

d. [10] <5.4> P0,0: read 120


L1 miss, will replace B0 in L1 which has address 100, and return 0x0020.
L2 miss, will replace B0 in L2 which has address 100.
L1 will have 120 in B0 (shared)
L2 will have 120 in B0(DS, P0,0)
Memory directory entry for 120 will become <DS, C0, C1>

e. [15] <5.4> P0,0: read 120


P1, 0: read 120
L1 miss, will replace B0 in L1 of P0,0 hich has address 100, and return 0x0020.
L2 miss, will replace B0 in L2 which has address 100.
P1,0: read 120
L1 miss, will replace B0 in L1 of P1,0 which has address 130, and return 0x0020.
L2 hit
P0,0 L1 will have 120 in B0 (shared)
P0,0 L1 will have 120 in B0 (shared)
L2 will have 120 in B0 (DS, P0,0)
Memory directory entry for 120 will become <DS, C0, C1>

f. [15] <5.4> P0, 0: read 120


P1,0: write 120 80

L1 miss, will replace B0 in L1 of P0,0 which has address 100, and return 0x0020.
L2 miss, will replace B0 in L2 which has address 100.
P1,0: write 120 80
L1 miss, will replace B0 in L1 of P1,0 which has address 130. Invalidate other copies.
L2 hit.
P0,0 L1 will have 120 in B0 (invalid)
P0,0 L1 will have 120 in B0 (modified)
P3,1 L1 will have 120 in B0 (invalid)
L2 S,0will have 120 in B0 (DM, P0,0)
L2 S,1will have 120 in B0 (D I, -)
Memory directory entry for 120 will become <D M, C0 >

g. [15] <5.4> P0, 0: write 120 80


P1,0: read 120

L1 miss, will replace B0 in L1 of P0,0 which has address 100. Invalidate other copies.
L2 miss, will replace B0 in L2 which has address 100.
P1,0: read 120
L1 miss, change to shared state and return 0x0080
L2 hit.
P0,0 L1 will have 120 in B0 (shared)
P0,0 L1 will have 120 in B0 (shared)
P3,1 L1 will have 120 in B0 (invalid)
L2 S,0 will have 120 in B0 (DS, P0,0;P1,0)
L2 S,1 will have 120 in B0 (DI, -)
Memory directory entry for 120 will become <DS, C0>
h. [15] <5.4> P0,0: write 120 80
P1,0: write 120 90

L1 miss, will replace B0 in L1 of P0,0 which has address 100. Invalidate other copies.
L2 miss, will replace B0 in L2 which has address 100.
P1,0: write 120 90
L1 miss, will replace B0 in L1 of P1,0 which has address 130. Invalidate other copies.
L2 hit.
P0,0 L1 will have 120 in B0 (invalid)
P0,0 L1 will have 120 in B0 (modified)
P3,1 L1 will have 120 in B0 (invalid)
L2 S,0 will have 120 in B0 (DM, P1,0)
L2 S,1will have 120 in B0 (D I, -)
Memory directory entry for 120 will become <DM, C0>

You might also like