UNIT IV
Coherence Misses
Def: The misses that arise from interprocessor communication. They can be broken down into two separate sources:
True sharing misses
False sharing misses
P1 Write x1
P2 Read x2
Example Result
1: True sharing miss (invalidate P2)
2: False sharing miss. x2 was invalidated by the write of P1, but that value of x1 is not used in P2
3: False sharing miss. The block containing x1 is marked shared due to the read in P2, but P2 did not read x1. A write miss is required to obtain exclusive access to the block
4: False sharing miss
5: True sharing miss
Solution Explanation
1. This event is a true sharing miss, since x1 was read by P2 and needs to be invalidated from P2.
2. This event is a false sharing miss, since x2 was invalidated by the write of x1 in P1, but that value of x1 is not used in P2.
3. This event is a false sharing miss, since the block containing x1 is marked shared due to the read in P2, but P2 did not read x1. A write miss is required to obtain exclusive access to the block.
4. This event is a false sharing miss for the same reason as step 3.
5. This event is a true sharing miss, since the value being read was written by P2.
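The five classifications above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not the method used in the slides: the function name `classify_misses` is hypothetical, the block is assumed to start in the shared state in both caches with both words already read by both processors, and accesses 3 to 5 of the sequence are inferred from the explanations rather than listed in the example table. The rule it applies is a simplification: a miss on an invalid block is a true sharing miss only if the requested word was written by another processor since the copy was lost, and a write upgrade is a true sharing miss only if another sharer has used the word being written.

```python
def classify_misses(accesses, processors=("P1", "P2"), words=("x1", "x2")):
    """Classify each access to one shared block as 'true', 'false', or None (hit)."""
    # Assumption: every cache starts with the block in Shared state,
    # and every processor has already read every word in it.
    state = {p: "S" for p in processors}            # "I", "S" or "M"
    touched = {p: set(words) for p in processors}   # words used in the current copy
    stale = {p: set() for p in processors}          # words written by others since invalidation
    results = []
    for p, op, w in accesses:
        others = [q for q in processors if q != p]
        if state[p] == "I":
            # Miss on an invalidated block: true sharing only if this word changed.
            kind = "true" if w in stale[p] else "false"
            touched[p], stale[p] = set(), set()
            for q in others:
                if state[q] == "M":
                    state[q] = "S"                  # the owner supplies the block
            state[p] = "S"
        elif op == "write" and state[p] == "S":
            # Upgrade miss: true sharing only if a sharer actually used this word.
            sharers = [q for q in others if state[q] != "I"]
            if sharers:
                kind = "true" if any(w in touched[q] for q in sharers) else "false"
            else:
                kind = None
        else:
            kind = None                             # ordinary hit
        touched[p].add(w)
        if op == "write":
            for q in others:
                if state[q] != "I":
                    state[q] = "I"                  # invalidate the other copy
                    touched[q], stale[q] = set(), set()
                stale[q].add(w)                     # remember which word changed
            state[p] = "M"
        results.append(kind)
    return results


# Accesses 1 and 2 come from the example table; 3 to 5 are assumed.
sequence = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P1", "write", "x1"),
    ("P2", "write", "x2"),
    ("P1", "read", "x2"),
]
print(classify_misses(sequence))
```

With a one-word block every one of these misses would touch a word that really was shared, which is why false sharing misses depend on the block size.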
Performance Measurements
The performance of symmetric shared-memory multiprocessors is measured on the following workloads:
1. Commercial workload
2. Multiprogramming & OS workload
3. Scientific / Technical workload
The following models are used for the performance measurements of the commercial workload:
AlphaServer 4100
Configurable simulator model
1. L2 is a 96 KB on-chip, unified, 3-way set-associative cache with a 32-byte block size, using write-back.
2. L3 is an off-chip, combined, direct-mapped 2 MB cache with 64-byte blocks, also using write-back.
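To make the two cache descriptions concrete, the sketch below derives the number of blocks, sets, offset bits, and index bits from the stated parameters. The helper name `cache_geometry` is hypothetical, and it assumes 1 KB = 1024 bytes and 1 MB = 1024 KB, with a direct-mapped cache treated as 1-way set associative.

```python
def cache_geometry(size_bytes, block_bytes, ways):
    """Return block count, set count, and address field widths for a cache."""
    blocks = size_bytes // block_bytes
    sets = blocks // ways
    return {
        "blocks": blocks,
        "sets": sets,
        # bit_length() - 1 is log2 for exact powers of two
        "offset_bits": block_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


KB = 1024          # assumption: binary kilobytes
MB = 1024 * KB

l2 = cache_geometry(96 * KB, 32, ways=3)   # on-chip unified L2
l3 = cache_geometry(2 * MB, 64, ways=1)    # off-chip direct-mapped L3
print("L2:", l2)
print("L3:", l3)
```

Under these assumptions the L2 has 1024 sets of three 32-byte blocks, and the L3 has 32768 sets of one 64-byte block each.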
Explanations
True sharing and false sharing misses are unchanged going from a 1 MB to an 8 MB L3 cache.
Uniprocessor cache misses (instruction, capacity/conflict, compulsory) improve as cache size increases.
The L3 cache is simulated as two-way set associative.
The cold, false sharing, and true sharing misses are unaffected by the L3 cache.
The contribution to memory access cycles increases as the processor count increases, primarily due to increased true sharing.
The increase in true sharing miss rate leads to an overall increase in memory access cycles per instruction.
A significant I-cache performance loss (at least for the OS)
I-cache miss rate in the OS for a 64-byte block size, 2-way set associative: 1.7% (32 KB), 0.2% (256 KB)
I-cache miss rate at user level: 1/6 of the OS rate
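As a quick arithmetic check, the user-level rates implied by the figures above can be computed directly. This sketch assumes the 1/6 ratio holds at both cache sizes; the names `OS_ICACHE_MISS_RATE` and `user_icache_miss_rate` are illustrative only.

```python
# OS instruction-cache miss rates from the slide (64-byte blocks, 2-way), keyed by size in KB
OS_ICACHE_MISS_RATE = {32: 0.017, 256: 0.002}

# Assumption: the user-level rate is one sixth of the OS rate at every size
USER_TO_OS_RATIO = 1 / 6


def user_icache_miss_rate(cache_kb):
    """Estimate the user-level I-cache miss rate for a given cache size in KB."""
    return OS_ICACHE_MISS_RATE[cache_kb] * USER_TO_OS_RATIO


for size_kb in sorted(OS_ICACHE_MISS_RATE):
    print(f"{size_kb} KB: user-level miss rate ~ {user_icache_miss_rate(size_kb):.2%}")
```

This gives roughly 0.28% at 32 KB and 0.03% at 256 KB for user code.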
Reasons
The behavior of the OS is more complex than that of the user processes for these reasons:
1. The kernel initializes all pages before allocating them to a user.
2. The kernel shares data and has a nontrivial coherence miss rate.
Stay constant