CSCE430/830 Computer Architecture: Memory Hierarchy: Set-Associative Cache
CSCE430/830
Memory: Set-Associative $
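Cache performance
Miss-oriented Approach to Memory Access:
$$CPUtime = IC \times \left( CPI_{Execution} + \frac{MemAccess}{Inst} \times MissRate \times MissPenalty \right) \times CycleTime$$
$$CPUtime = IC \times \left( CPI_{Execution} + \frac{MemMisses}{Inst} \times MissPenalty \right) \times CycleTime$$
With the memory component separated out through the average memory access time (AMAT):
$$CPUtime = IC \times \left( \frac{AluOps}{Inst} \times CPI_{AluOps} + \frac{MemAccess}{Inst} \times AMAT \right) \times CycleTime$$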
In reality:
$$MemoryStallCycles = IC \times \frac{MemAccess}{Inst} \times MissRate \times MissPenalty = IC \times (1 + 0.5) \times 0.02 \times 25 = IC \times 0.75$$
For gcc, the frequency for all loads and stores is 36%.
Instruction cache miss rate for gcc = 2%.
Data cache miss rate for gcc = 4%.
If a machine has a CPI of 2 without memory stalls
and the miss penalty is 40 cycles for all misses,
how much faster is a machine with a perfect cache?
CPI with stalls = 2 + 2% x 40 + 36% x 4% x 40 = 2 + 0.8 + 0.576 = 3.376
Speedup with a perfect cache = 3.376 / 2 = 1.69
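With the clock rate doubled and the memory speed unchanged, the miss penalty becomes 80 cycles.
For gcc, the frequency for all loads and stores is 36%
Instruction miss cycles
= IC x 2% x 80
= 1.600 x IC
Data miss cycles
= IC x 36% x 4% x 80
= 1.152 x IC
Total memory stall cycles = 2.752 x IC, so CPI with the fast clock = 2 + 2.752 = 4.752
Speedup = (I x CPIslowClk x Clock period) / (I x CPIfastClk x Clock period / 2)
= 3.376 / (4.752 x 0.5)
= 1.42 (not 2)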
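The gcc arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `cpi_with_stalls` and its parameter names are made up for illustration, and it assumes, as the slides do, one instruction fetch per instruction and the same miss penalty for instruction and data misses.

```python
def cpi_with_stalls(base_cpi, ls_freq, i_miss_rate, d_miss_rate, penalty):
    """CPI including stalls from instruction-fetch misses and data misses."""
    inst_stalls = i_miss_rate * penalty            # one fetch per instruction
    data_stalls = ls_freq * d_miss_rate * penalty  # only loads/stores touch data
    return base_cpi + inst_stalls + data_stalls

# Values taken from the gcc example: CPI 2, 36% loads/stores, 2% / 4% miss rates.
slow = cpi_with_stalls(2.0, 0.36, 0.02, 0.04, penalty=40)
fast = cpi_with_stalls(2.0, 0.36, 0.02, 0.04, penalty=80)  # clock doubled

perfect_cache_speedup = slow / 2.0        # stalled machine vs. perfect cache
clock_speedup = slow / (fast * 0.5)       # fast clock halves the cycle time

print(round(slow, 3), round(fast, 3))
print(round(perfect_cache_speedup, 2), round(clock_speedup, 2))
```

Doubling the clock rate buys well under a factor of two because the stall cycles double along with it.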
Fundamental Questions
Q1: Where can a block be placed in the upper level?
(Block placement)
Q2: How is a block found if it is in the upper level?
(Block identification)
Q3: Which block should be replaced on a miss?
(Block replacement)
Q4: What happens on a write?
(Write strategy)
[Figure: main memory drawn as words at byte addresses 00 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C 40 44 48 4C]
Direct-mapped cache with four blocks (Block 0 to Block 3); reference sequence of memory blocks: 0, 8, 0, 6, 8

Mem Block   DM Hit/Miss   Cache contents after the access
0           miss          Block 0: Mem[0]
8           miss          Block 0: Mem[8]  (Set 0 contains Mem[0]. Overwrite with Mem[8])
0           miss          Block 0: Mem[0]  (Set 0 contains Mem[8]. Overwrite with Mem[0])
6           miss          Block 0: Mem[0], Block 2: Mem[6]
8           miss          Block 0: Mem[8], Block 2: Mem[6]  (Set 0 contains Mem[0]. Overwrite with Mem[8])

[Figure: direct-mapped placement; memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 share their low-order bits and map to the same cache block]
Fully associative cache (a single set, Set 0); reference sequence of memory blocks: 0, 8, 0, 6, 8

Mem Block   Hit/Miss   Set 0 contents after the access
0           miss       Mem[0]
8           miss       Mem[0], Mem[8]
0           hit        Mem[0], Mem[8]
6           miss       Mem[0], Mem[8], Mem[6]
8           hit        Mem[0], Mem[8], Mem[6]

[Figure: fully associative placement; any memory block (addresses 00011 to 01111 shown) can be placed in any of Block 0 to Block 7 of Set 0]
PRO:
CON:
Two-way set-associative cache (Set 0 and Set 1, two blocks per set); reference sequence of memory blocks: 0, 8, 0, 6, 8

Mem Block   Hit/Miss   Set 0 contents after the access
0           miss       Mem[0]
8           miss       Mem[0], Mem[8]
0           hit        Mem[0], Mem[8]
6           miss       Mem[0], Mem[6]  (Mem[8], the least recently used block, is replaced)
8           miss       Mem[8], Mem[6]  (Mem[0] is replaced)

Set 1 stays empty: blocks 0, 8, and 6 all map to Set 0.
[Figure: set-associative placement with sets 00, 01, 10, and 11, each holding Block 0 and Block 1; Mem block 0 (00000) and Mem block 8 (01000) map to the same set]
CON:
Must search set for hit/miss
Associativity Considerations
DM and FA are special cases of SA cache
Set-Associative: n/m sets; m blocks/set (associativity=m)
Direct-Mapped: m=1 (1-way set-associative, associativity=1)
Fully-Associative: m=n (n-way set-associative, associativity=n)
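Because direct-mapped and fully associative caches are just the m=1 and m=n cases, one small simulator can reproduce all three hit/miss walkthroughs for the reference sequence 0, 8, 0, 6, 8. The sketch below is illustrative only: the function `simulate`, the four-block cache size, and the use of LRU replacement within a set are assumptions chosen to match the traces, not a description of any particular hardware.

```python
from collections import OrderedDict

def simulate(trace, num_blocks, ways):
    """Return 'hit' or 'miss' for each block reference (LRU within a set)."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for block in trace:
        s = sets[block % num_sets]          # set index = block address mod sets
        if block in s:
            s.move_to_end(block)            # mark as most recently used
            results.append("hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)       # evict the least recently used block
            s[block] = True
            results.append("miss")
    return results

trace = [0, 8, 0, 6, 8]
direct_mapped = simulate(trace, num_blocks=4, ways=1)   # m = 1
two_way = simulate(trace, num_blocks=4, ways=2)         # m = 2
fully_assoc = simulate(trace, num_blocks=4, ways=4)     # m = n

for name, result in [("DM", direct_mapped), ("2-way", two_way), ("FA", fully_assoc)]:
    print(name, result)
```

Only the `ways` argument changes between the three runs, which is the point of treating DM and FA as special cases of a set-associative cache.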
Address fields: Tag | Index | Block Offset
Fully associative: no index
Direct mapped: large index
[Figure: one cache entry with Tag 0x00001C0 and Data 0xff083c2d]
[Figure: direct-mapped cache SRAM selected by the Cache Index field of ADDR. Each entry holds a valid bit V (DATA[59]), a Tag (DATA[58:32]), and a Data word (DATA[31:0]). Example Tag/Data entries shown: 0x00001C0 / 0xff083c2d, 0x0000000 / 0x00000021, 0x0000000 / 0x00000103, 0x23F0210 / 0x00000009. HIT = 1 when the selected entry is valid and its tag equals (=) the tag bits of ADDR.]
[Figure: four-way set-associative cache with 256 sets (Index 0 to 255). The 32-bit address supplies a 22-bit tag and an 8-bit index; each set holds four Tag/Data pairs, the four tag comparisons drive a 4-to-1 multiplexor, and the outputs are Hit and a 32-bit Data word.]
Key idea:
Advantages:
Disadvantage:
More tag bits
More hardware
Higher access time
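The address split used by the four-way cache above can be sketched in Python. The field widths (22-bit tag, 8-bit index, 2-bit byte offset) come from the figure; the function name `split_address` and the sample address are hypothetical and chosen only for illustration.

```python
def split_address(addr, index_bits=8, offset_bits=2):
    """Split a 32-bit address into (tag, index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)               # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # selects the set
    tag = addr >> (offset_bits + index_bits)               # remaining high bits
    return tag, index, offset

sample = 0x12345678                     # arbitrary example address
tag, index, offset = split_address(sample)
print(hex(tag), index, offset)
```

Raising the associativity while keeping the capacity fixed removes index bits and adds them to the tag, which is where the "more tag bits" cost comes from.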
tag in 11110111
How many total bits are needed for a direct-mapped cache with
64 KBytes of data and 8-word blocks, assuming a 32-bit address?
How many total bits would be needed for a 4-way set-associative
cache to store the same amount of data?
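One way to work both questions is sketched below. It rests on assumptions the slides do not state: 4-byte words, byte addressing, one valid bit per block, and no dirty or replacement-state bits. The function `total_cache_bits` is a name invented for this example.

```python
def total_cache_bits(data_bytes, words_per_block, ways, addr_bits=32, word_bytes=4):
    """Total storage bits: data + tag + valid bit for every block."""
    block_bytes = words_per_block * word_bytes
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1       # log2 for a power of two
    offset_bits = block_bytes.bit_length() - 1   # block offset + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1   # data + tag + valid
    return num_blocks * bits_per_block

dm_bits = total_cache_bits(64 * 1024, 8, ways=1)   # 16-bit tags
sa_bits = total_cache_bits(64 * 1024, 8, ways=4)   # 18-bit tags
print(dm_bits, sa_bits)
```

Under these assumptions the direct-mapped cache has 2^11 blocks of 256 + 16 + 1 = 273 bits, and the 4-way cache has the same 2^11 blocks of 256 + 18 + 1 = 275 bits, because two index bits move into the tag.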
Replacement Algorithms
When a block is fetched, which block in the target set should be
replaced?
Optimal algorithm:
replace the block that will not be used for the longest time
(must know the future)
Usage based algorithms:
Least recently used (LRU)
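The gap between LRU and the optimal algorithm can be seen on a tiny example. The sketch below models a single set that holds `capacity` blocks; the function names and the two-block capacity are assumptions made for illustration, and the optimal policy is only computable here because the whole trace is known in advance.

```python
def lru_misses(trace, capacity):
    """Misses with least-recently-used replacement."""
    cache = []                          # least recently used block first
    misses = 0
    for block in trace:
        if block in cache:
            cache.remove(block)         # will be re-appended as most recent
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)            # evict the least recently used block
        cache.append(block)
    return misses

def optimal_misses(trace, capacity):
    """Misses when evicting the block whose next use is farthest away."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            future = trace[i + 1:]
            def next_use(b):
                # Blocks never used again sort as "infinitely" far away.
                return future.index(b) if b in future else len(future)
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

trace = [0, 8, 0, 6, 8]
lru = lru_misses(trace, capacity=2)
opt = optimal_misses(trace, capacity=2)
print(lru, opt)
```

On this trace LRU evicts block 8 just before it is needed again, while the optimal policy evicts block 0, which is never referenced afterwards.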
Write Through
[Figure: a processor store updates both the cache and memory; loads are served by the cache]
Write Back
Store by processor only updates cache line
Modified line written to memory only when it is evicted
Requires dirty bit for each line
[Figure: a processor store updates only the cache; memory is updated when the modified line is evicted; loads are served by the cache]
Store Miss?
Write-Allocate
Bring written block into cache
Update word in block
Anticipate further use of block
No-write Allocate
Main memory is updated
Cache contents unmodified
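A rough way to compare the two write strategies is to count how often memory is written for a sequence of stores. The sketch below is a simplification under stated assumptions: a direct-mapped cache with write-allocate, a trace that contains only stores (given as block numbers), and a final flush of dirty lines for write-back. The function `memory_writes` and the sample trace are hypothetical.

```python
def memory_writes(stores, num_blocks, policy):
    """Count writes that reach memory for a direct-mapped, write-allocate cache."""
    if policy == "write-through":
        return len(stores)                  # every store also updates memory
    lines = {}                              # cache index -> (block, dirty bit)
    writes = 0
    for block in stores:
        idx = block % num_blocks
        resident = lines.get(idx)
        if resident is not None and resident[0] != block and resident[1]:
            writes += 1                     # dirty line written back on eviction
        lines[idx] = (block, True)          # the store makes the line dirty
    # Flush lines that are still dirty at the end of the trace.
    writes += sum(1 for _, dirty in lines.values() if dirty)
    return writes

stores = [0, 0, 0, 0, 1, 1]                 # repeated stores to two blocks
wt = memory_writes(stores, num_blocks=4, policy="write-through")
wb = memory_writes(stores, num_blocks=4, policy="write-back")
print(wt, wb)
```

Write-through counts one memory write per store, whereas write-back counts one block write per dirty line, so repeated stores to the same block favor write-back.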
Cache Basics
Cache: a level of temporary memory storage between
the CPU and main memory. It improves overall memory
speed by taking advantage of the principle of locality.
The cache is divided into sets; each set holds blocks from a
particular group of main memory locations.
Cache parameters
Cache size, block size, associativity
Classifying Misses: 3C
Compulsory: The first access to a block is not in the cache, so
the block must be brought into the cache. Also called cold start
misses or first reference misses.
(Misses in even an Infinite Cache)
Capacity: If the cache cannot contain all the blocks needed
during execution of a program, capacity misses will occur due
to blocks being discarded and later retrieved.
(Misses in Fully Associative Size X Cache)
Conflict: If the block-placement strategy is set associative or direct
mapped, conflict misses (in addition to compulsory & capacity
misses) will occur because a block can be discarded and later
retrieved if too many blocks map to its set. Also called collision
misses or interference misses.
(Misses in N-way Associative, Size X Cache)
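The parenthesized definitions suggest a way to split a measured miss count into the three Cs: compulsory misses are the misses of an infinite cache, capacity misses are the extra misses of a fully associative cache of size X, and conflict misses are the extra misses of the actual N-way cache. The sketch below assumes LRU replacement and block-address traces; the function names are invented for illustration. Note that with LRU the conflict count can come out negative on unusual traces, so this is a bookkeeping convention rather than an exact attribution.

```python
from collections import OrderedDict

def count_misses(trace, num_blocks, ways):
    """Misses for an LRU cache with num_blocks blocks and the given associativity."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)            # refresh recency on a hit
            continue
        misses += 1
        if len(s) == ways:
            s.popitem(last=False)           # evict least recently used
        s[block] = True
    return misses

def classify_3c(trace, num_blocks, ways):
    compulsory = len(set(trace))            # first reference to each block
    fully_assoc = count_misses(trace, num_blocks, num_blocks)
    actual = count_misses(trace, num_blocks, ways)
    return {
        "compulsory": compulsory,
        "capacity": fully_assoc - compulsory,
        "conflict": actual - fully_assoc,
    }

breakdown = classify_3c([0, 8, 0, 6, 8], num_blocks=4, ways=1)
print(breakdown)
```

For the four-block direct-mapped cache and the reference sequence 0, 8, 0, 6, 8, three misses are compulsory and the other two are conflict misses, since a fully associative cache of the same size would have hit on them.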
[Figure: 3Cs Absolute Miss Rate (SPEC92). Miss rate (0 to 0.14) for 1-way, 2-way, 4-way, and 8-way associativity, divided into Conflict, Capacity, and Compulsory misses; x-axis ticks 16, 32, 64, 128. Compulsory misses are vanishingly small.]
[Figure: the same Conflict, Capacity, and Compulsory breakdown for 1-way to 8-way associativity, plotted as a percentage of all misses (ticks 0% to 80% shown); x-axis ticks 16, 32, 64, 128.]
Section 5.5
MemoryAccess
Instruction
Section 5.3
Section 5.4
[Figure: Miss Rate (0% to 20%) plotted for cache sizes 1K, 4K, 16K, 64K, and 256K; x-axis ticks 16, 32, 64, 128, 256]
[Figure: three sketches plotting Miss Penalty, Miss Rate, and Avg. Memory Access Time against Block Size]
Victim Cache
[Figure: the CPU address is compared (=?) against the tags of the data cache and of the victim cache; a write buffer sits between the caches and Lower Level Memory]
Pseudo-associative caches
Operate exactly as direct-mapped caches on a
hit, thus preserving the advantages of
direct mapping;
In case of a miss, another block is checked (as
in set-associative caches), by simply inverting the
most significant bit of the index field to find the
other block in the pseudoset.
real hit time < pseudo-hit time
too many pseudo hits would defeat the purpose
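The second probe location can be computed with a single XOR. A minimal sketch, assuming an 8-bit index field; the function name `pseudo_set_partner` and the sample index are hypothetical.

```python
def pseudo_set_partner(index, index_bits):
    """Index of the other block in the pseudoset: invert the top index bit."""
    return index ^ (1 << (index_bits - 1))

first_probe = 0b00000101
second_probe = pseudo_set_partner(first_probe, index_bits=8)
print(format(first_probe, "08b"), format(second_probe, "08b"))
```

Applying the function twice returns the original index, so each block has exactly one partner.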
Adding an L2 Cache
If a direct mapped cache has a hit rate of 95%, a hit time of
4 ns, and a miss penalty of 100 ns, what is the AMAT?
AMAT = Hit time + Miss rate x Miss penalty = 4 + 0.05 x 100 = 9 ns
Trace caches
Finds a dynamic sequence of instructions including taken
branches to load into a cache block:
Put traces of the executed instructions into cache blocks as
determined by the CPU
Branch prediction is folded into the cache and must be
validated along with the addresses to have a valid fetch.
Disadvantage: store the same instructions multiple times
Calculating AMAT
If a direct mapped cache has a hit rate of 95%, a hit
time of 4 ns, and a miss penalty of 100 ns, what is the
AMAT?
AMAT = Hit time + Miss rate x Miss penalty = 4 + 0.05 x 100 = 9 ns
Impact on Performance
Suppose a processor executes at
Clock Rate = 200 MHz (5 ns per cycle), Ideal (no misses) CPI = 1.1
50% arith/logic, 30% ld/st, 20% control
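Suppose that 10% of data memory operations get a 50 cycle miss penalty
Suppose that 1% of instructions get the same miss penalty
CPI = ideal CPI + average stalls per instruction
= 1.1 (cycles/ins)
+ [0.30 (DataMops/ins) x 0.10 (miss/DataMop) x 50 (cycle/miss)]
+ [1 (InstMop/ins) x 0.01 (miss/InstMop) x 50 (cycle/miss)]
= (1.1 + 1.5 + 0.5) cycle/ins = 3.1
AMAT = (1/1.3) x [1 + 0.01 x 50] + (0.3/1.3) x [1 + 0.1 x 50] = 2.54
In general, AMAT = HitTime + MissRate x MissPenalty, weighted here by the share of instruction and data accesses:
$$AMAT = \frac{1}{1.3}\left(HitTime_{Inst} + MissRate_{Inst} \times MissPenalty_{Inst}\right) + \frac{0.3}{1.3}\left(HitTime_{Data} + MissRate_{Data} \times MissPenalty_{Data}\right)$$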
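Both results, and the 9 ns answer from the earlier AMAT question, can be reproduced with a short script. This is a sketch under the slide's own numbers; the helper `amat` and the variable names are chosen for illustration, and the 1-cycle hit time is the value used in the weighted AMAT line.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

ideal_cpi = 1.1
data_ops_per_inst = 0.30                 # 30% of instructions are ld/st
penalty = 50                             # cycles, same for both kinds of miss

# CPI = ideal CPI + data-miss stalls + instruction-miss stalls
cpi = ideal_cpi + data_ops_per_inst * 0.10 * penalty + 1.0 * 0.01 * penalty

# 1.3 memory accesses per instruction: 1 fetch plus 0.3 data accesses
accesses_per_inst = 1.0 + data_ops_per_inst
weighted_amat = (
    (1.0 / accesses_per_inst) * amat(1, 0.01, penalty)
    + (data_ops_per_inst / accesses_per_inst) * amat(1, 0.10, penalty)
)

simple_amat_ns = amat(4, 0.05, 100)      # 95% hit rate, 4 ns hit, 100 ns penalty

print(round(cpi, 2), round(weighted_amat, 2), simple_amat_ns)
```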
Example:
16KB I&D: Inst miss rate=0.64%, Data miss rate=6.47%
32KB unified: Aggregate miss rate=1.99%
[Figure: two hierarchies. One: Proc, Unified Cache-1, Unified Cache-2. The other: Proc, I-Cache-1 and D-Cache-1, Unified Cache-2.]