Lecture 17: Memory Hierarchy - Five Ways To Reduce Miss Penalty (Second Level Cache)
RHK.S96 1
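Review: Summary
$$\text{CPU time} = \text{IC} \times \left(\text{CPI}_{\text{execution}} + \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty}\right) \times \text{Clock cycle time}$$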
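As a quick illustration of why miss penalty matters, here is a minimal sketch that evaluates the summary formula directly. The workload numbers (instruction count, base CPI, accesses per instruction, miss rate, penalty, cycle time) are made up for the example, and the function name `cpu_time` is hypothetical.

```python
def cpu_time(ic, cpi_execution, mem_accesses_per_instr, miss_rate, miss_penalty, cycle_time):
    """CPU time from the summary formula; miss_penalty is in clock cycles."""
    # memory-stall cycles contributed per instruction
    stall_cpi = mem_accesses_per_instr * miss_rate * miss_penalty
    return ic * (cpi_execution + stall_cpi) * cycle_time


# Hypothetical workload: 1M instructions, base CPI 1.1, 1.3 memory accesses per
# instruction, 2% miss rate, 50-cycle miss penalty, 2 ns clock cycle
before = cpu_time(1_000_000, 1.1, 1.3, 0.02, 50, 2e-9)
after = cpu_time(1_000_000, 1.1, 1.3, 0.02, 25, 2e-9)  # miss penalty halved
print(f"{before * 1e3:.2f} ms -> {after * 1e3:.2f} ms")
```

With these invented numbers the stall term adds 1.3 cycles per instruction, so halving the miss penalty lowers CPU time from 4.80 ms to 3.50 ms while the miss rate is unchanged.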
2. Subblock Placement to Reduce Miss Penalty
Don't have to load the full block on a miss
Keep a valid bit per subblock to indicate which subblocks hold valid data
(Originally invented to reduce tag storage)
[Figure: Valid Bits]
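The idea can be sketched with one hypothetical cache line: a single tag shared by several subblocks (which is where the tag-storage saving comes from), each with its own valid bit, so a read miss fetches only the requested subblock. The class and method names are illustrative, and the model ignores data values and replacement policy.

```python
class SubblockLine:
    """Hypothetical cache line: one tag shared by all subblocks, one valid bit each."""

    def __init__(self, subblocks_per_block):
        self.tag = None
        self.valid = [False] * subblocks_per_block

    def read(self, tag, subblock):
        """Return (hit, subblocks_fetched) for a read of one subblock."""
        if self.tag == tag and self.valid[subblock]:
            return True, 0
        if self.tag != tag:
            # a different block takes over the line: invalidate every subblock
            self.tag = tag
            self.valid = [False] * len(self.valid)
        # fetch only the missing subblock instead of the full block
        self.valid[subblock] = True
        return False, 1


line = SubblockLine(4)
print(line.read(7, 2))  # miss: fetch subblock 2 only
print(line.read(7, 2))  # hit
print(line.read(7, 3))  # same tag, but subblock 3 not yet valid: miss
```

Each miss here transfers one subblock, whereas a line without subblocks would transfer all four on every miss.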
4. Non-blocking Caches to Reduce Stalls on Misses
A non-blocking (lockup-free) cache allows the data cache to continue supplying cache hits during a miss
"Hit under miss" reduces the effective miss penalty by continuing to serve CPU requests during a miss instead of ignoring them
"Hit under multiple miss" or "miss under miss" may further lower the effective miss penalty by overlapping multiple misses
Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses
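To make "effective miss penalty" concrete, here is a toy timing model, not a real cache controller, comparing a blocking cache, hit under miss, and miss under miss. It assumes one access issued per cycle, a 1-cycle hit, a fixed miss penalty, and, importantly, that no instruction waits on the data of an outstanding miss; the function name and trace format are hypothetical.

```python
import heapq


def total_cycles(trace, miss_penalty, max_outstanding):
    """Cycles to finish a trace of accesses (True = hit, False = miss).

    max_outstanding = 0: blocking cache, the CPU stalls for every miss.
    max_outstanding = 1: hit under miss.
    max_outstanding > 1: hit under multiple misses / miss under miss.
    """
    cycle = 0
    in_flight = []  # min-heap of completion cycles of outstanding misses
    for is_hit in trace:
        # retire misses whose data has already returned
        while in_flight and in_flight[0] <= cycle:
            heapq.heappop(in_flight)
        if is_hit:
            cycle += 1
        elif max_outstanding == 0:
            cycle += miss_penalty  # blocking: nothing proceeds until the miss returns
        else:
            if len(in_flight) == max_outstanding:
                # all miss slots busy: stall until the earliest outstanding miss completes
                cycle = max(cycle, heapq.heappop(in_flight))
            heapq.heappush(in_flight, cycle + miss_penalty)
            cycle += 1
    # the trace is finished only when every outstanding miss has returned
    return max([cycle] + in_flight)


trace = [False, True, True, True, False, True]  # miss, 3 hits, miss, hit
for k in (0, 1, 2):
    print(k, total_cycles(trace, 16, k))
```

With a 16-cycle penalty this short trace takes 36 cycles blocking, 32 with hit under one miss, and 20 when two misses may overlap. The extra bookkeeping for in-flight misses is the controller complexity the slide mentions; real programs that consume load results soon after a miss gain less than this model suggests.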
[Figure: per-benchmark bars, legend Base, 0->1, 1->2, 2->64; Integer: eqntott, espresso, xlisp, compress; Floating Point: mdljsp2, ear, fpppp, tomcatv, swm256, doduc, su2cor, wave5, mdljdp2, hydro2d, alvinn, nasa7, spice2g6, ora]
FP programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
Int programs on average: AMAT = 0.24 -> 0.20 -> 0.19 -> 0.19
8 KB data cache, direct mapped, 32-byte blocks, 16-cycle miss penalty
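The averages above are easier to compare as fractions of the Base value. This short sketch only does that arithmetic on the numbers copied from the slide; the variable and function names are illustrative.

```python
# Average AMAT values copied from the slide, in legend order
labels = ["Base", "0->1", "1->2", "2->64"]
amat = {
    "FP": [0.68, 0.52, 0.34, 0.26],
    "Int": [0.24, 0.20, 0.19, 0.19],
}


def relative_to_base(values):
    """Each AMAT as a fraction of the Base AMAT (the first entry)."""
    base = values[0]
    return [v / base for v in values]


for kind, values in amat.items():
    ratios = relative_to_base(values)
    print(kind, ", ".join(f"{label}: {r:.2f}" for label, r in zip(labels, ratios)))
```

By this measure the FP programs end at about 38% of the Base AMAT, while the integer programs flatten out at about 79%, so overlapping many misses pays off mainly for the floating-point codes.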
Definitions:
Local miss rate: misses in this cache divided by the total number of memory accesses to this cache (Miss Rate_L2)
Global miss rate: misses in this cache divided by the total number of memory accesses generated by the CPU (Miss Rate_L1 x Miss Rate_L2)
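The two definitions differ only in the denominator. A minimal sketch, assuming a simple two-level hierarchy in which every L1 miss becomes exactly one L2 access (write-backs ignored); the counts are hypothetical.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (L1 miss rate, L2 local miss rate, L2 global miss rate).

    Assumes every L1 miss turns into one access to L2, so L2 sees l1_misses accesses.
    """
    l1_rate = l1_misses / cpu_accesses
    l2_local = l2_misses / l1_misses       # misses in L2 / accesses to L2
    l2_global = l2_misses / cpu_accesses   # misses in L2 / accesses by the CPU
    return l1_rate, l2_local, l2_global


# Hypothetical counts: 1000 CPU references, 40 miss in L1, 20 of those also miss in L2
l1, local, glob = miss_rates(1000, 40, 20)
print(f"L1 {l1:.0%}, L2 local {local:.0%}, L2 global {glob:.0%}")
```

Here the L2 local miss rate looks poor (50%) only because L2 sees just the references L1 already failed on; the global rate (2%, equal to 4% x 50%) is the fraction of all CPU references that go to memory.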
[Figure: two panels plotted against Cache Size, one on a linear scale and one on a log scale]
[Figure: bar chart by Block Size: 16 -> 1.36, 32 -> 1.28, 64 -> 1.27, 128 -> 1.34, 256 -> 1.54, 512 -> 1.95]
Solution to aliases
Hardware: guarantee that every cache block has a unique physical address
Software: require aliases to agree in their lower n address bits; as long as those bits cover the index field and the cache is direct mapped, aliases map to the same block, so each block is unique
This is called page coloring
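Page coloring can be checked with plain address arithmetic. This sketch assumes a hypothetical virtually indexed, direct-mapped 64 KB cache with 32-byte blocks, so the block offset (5 bits) plus index (11 bits) spans the lower 16 bits; the addresses and helper names are made up for illustration.

```python
def cache_index(addr, block_bytes, num_sets):
    """Set index of an address in a direct-mapped cache (power-of-two sizes)."""
    return (addr // block_bytes) % num_sets


def same_color(va1, va2, n_bits):
    """True if two addresses agree in their lower n_bits bits."""
    mask = (1 << n_bits) - 1
    return (va1 & mask) == (va2 & mask)


BLOCK, SETS = 32, 2048  # hypothetical 64 KB direct-mapped cache
N_BITS = 16             # block offset (5 bits) + index (11 bits)

va_a = 0x0001_2340  # two virtual aliases of the same physical data
va_b = 0x0045_2340  # same lower 16 bits: same color
va_c = 0x0045_A340  # differs inside the index bits: a different set

print(same_color(va_a, va_b, N_BITS), cache_index(va_a, BLOCK, SETS) == cache_index(va_b, BLOCK, SETS))
print(same_color(va_a, va_c, N_BITS), cache_index(va_a, BLOCK, SETS) == cache_index(va_c, BLOCK, SETS))
```

Aliases with the same color always select the same set, so a direct-mapped cache can hold at most one copy of the data; an alias like `va_c` would create a second copy in another set, which is exactly what the software restriction rules out.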
2. Avoiding Translation: Process ID Impact
Black: uniprocess
Light gray: multiprocess, when the cache is flushed on a process switch
Dark gray: multiprocess, when a process ID tag is used
Y axis: miss rates up to 20%
X axis: cache size from 2 KB to 1024 KB
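The gap between the light and dark gray bars comes from flushing. The toy model below, a hypothetical virtually addressed direct-mapped cache, counts misses for two processes taking turns: without a process ID in the tag the cache must be flushed on each switch (two processes may use the same virtual address for different data), while a PID tag lets each process's blocks survive the switch. Names, sizes, and the trace are illustrative assumptions.

```python
def count_misses(trace, num_sets, block_bytes, use_pid_tag):
    """Misses for a virtually addressed, direct-mapped cache.

    trace: list of (pid, virtual_address) pairs.
    """
    lines = [None] * num_sets  # stored key per set
    misses = 0
    current_pid = None
    for pid, vaddr in trace:
        if pid != current_pid:
            if not use_pid_tag and current_pid is not None:
                lines = [None] * num_sets  # flush on process switch
            current_pid = pid
        block = vaddr // block_bytes
        index, tag = block % num_sets, block // num_sets
        key = (pid, tag) if use_pid_tag else tag  # the PID becomes part of the tag
        if lines[index] != key:
            misses += 1
            lines[index] = key
    return misses


# Two hypothetical processes taking turns, each touching two blocks
trace = [("A", 0), ("A", 32), ("B", 64), ("B", 96)] * 2
print(count_misses(trace, 16, 32, use_pid_tag=False))
print(count_misses(trace, 16, 32, use_pid_tag=True))
```

With flushing every access after a switch misses (8 of 8); with PID tags only the 4 first-time references miss.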
[Figure: address fields Address Tag, Index, Block Offset, shown against the Page Offset]
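The fields in this figure can be extracted with shifts and masks. A minimal sketch, assuming power-of-two block, set, and page sizes; the function name and example sizes are hypothetical. It also reports whether the index plus block offset fits inside the page offset, a common reason to draw these fields together: those bits are the same in the virtual and physical address, so they can be used before translation finishes.

```python
def split_address(addr, block_bytes, num_sets, page_bytes):
    """Return (tag, index, block_offset, page_offset, index_in_page_offset)."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    page_bits = page_bytes.bit_length() - 1
    block_offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    page_offset = addr & (page_bytes - 1)
    # True when the index and block offset lie entirely within the untranslated page offset
    fits = offset_bits + index_bits <= page_bits
    return tag, index, block_offset, page_offset, fits


# Hypothetical: 32-byte blocks, 4 KB pages, 128 sets vs. 256 sets
print([hex(f) if not isinstance(f, bool) else f for f in split_address(0x12345678, 32, 128, 4096)])
print([hex(f) if not isinstance(f, bool) else f for f in split_address(0x12345678, 32, 256, 4096)])
```

With 128 sets the 5 offset bits and 7 index bits exactly fill the 12-bit page offset; doubling to 256 sets pushes one index bit into the translated part of the address.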
Tag match and valid bit not set: the tag match means this is the proper block, so once the data is written into the subblock it is appropriate to turn the valid bit on.
Tag mismatch: this is a miss, and the write has already modified the data portion of the block. Because this is a write-through cache, however, no harm is done: memory still has an up-to-date copy of the old value. Only the tag needs to be changed to the address of the write, and the valid bits of the other subblocks cleared, because the valid bit for this subblock has already been set.
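The two cases can be sketched as a write routine for a hypothetical single-line, write-through cache with one word per subblock. Assumptions, not from the slide: word addresses, a dictionary standing in for memory, and the data and valid bit being written before the tag comparison resolves, as in the case analysis above.

```python
class OneLineSubblockCache:
    """Hypothetical single-line, write-through cache, one valid bit per one-word subblock."""

    def __init__(self, words_per_block, memory):
        self.n = words_per_block
        self.memory = memory  # dict: word address -> value
        self.tag = None
        self.valid = [False] * words_per_block
        self.data = [0] * words_per_block

    def write(self, addr, value):
        tag, sub = divmod(addr, self.n)
        self.memory[addr] = value   # write-through: memory always gets the new value
        self.data[sub] = value      # data written before the tag check completes
        if tag == self.tag:
            self.valid[sub] = True  # tag match: proper block, turn this valid bit on
        else:
            # tag mismatch: the overwritten word is still in memory, so just retag
            # the line and clear the valid bits of the other subblocks
            self.tag = tag
            self.valid = [i == sub for i in range(self.n)]

    def read_hit(self, addr):
        tag, sub = divmod(addr, self.n)
        return tag == self.tag and self.valid[sub]


mem = {}
cache = OneLineSubblockCache(4, mem)
cache.write(5, 42)  # mismatch: line now holds block 1, only subblock 1 valid
cache.write(6, 7)   # match: subblock 2 becomes valid too
cache.write(9, 1)   # mismatch: block 2 takes over, subblocks 1 and 2 of block 1 lost
print(cache.read_hit(9), cache.read_hit(6), mem)
```

The old contents of the line are simply dropped on a mismatch; nothing is lost because write-through already placed every written value in memory.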
[Table: summary of techniques rated for MR (miss rate), MP (miss penalty), HT (hit time), and Complexity (0-3); row labels not recoverable]
[Figure: CPU vs. DRAM, 1980-2000]
1960-1985: Speed = f(no. operations)
1995: Pipelined Execution & Fast Clock Rate, Out-of-Order Completion, Superscalar Instruction Issue
1995: Speed = f(non-cached memory accesses)
What does this mean for compilers? Operating systems? Algorithms? Data structures?