
Lecture 19: Cache Basics

• Today’s topics:

   – Out-of-order execution
   – Cache hierarchies

• Reminder:

   – Assignment 7 due on Thursday

1
Multicycle Instructions

• Multiple parallel pipelines – each pipeline can have a different
  number of stages

• Instructions can now complete out of order – must make sure
  that writes to a register happen in the correct order
2
An Out-of-Order Processor Implementation
[Figure: branch prediction and instruction fetch feed an instruction
fetch queue; instructions pass through decode & rename into the
issue queue (IQ) and execute on the ALUs; the reorder buffer (ROB)
holds entries Instr 1–6 with tags T1–T6 alongside the register file
(R1–R32); results are written to the ROB and tags are broadcast to
the IQ. The example code is renamed as follows:]

   Original code        Renamed code
   R1 ← R1+R2           T1 ← R1+R2
   R2 ← R1+R3           T2 ← T1+R3
   BEQZ R2              BEQZ T2
   R3 ← R1+R2           T4 ← T1+T2
   R1 ← R3+R2           T5 ← T4+T2
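
A minimal Python sketch of the renaming step shown above (an
illustration, not the lecture's exact hardware; it assumes every
instruction is assigned a ROB tag, matching the slide's T1–T5):

def rename(instrs):
    rmap = {}                                  # architectural reg -> latest tag
    out = []
    for i, (dst, srcs) in enumerate(instrs, start=1):
        srcs = [rmap.get(s, s) for s in srcs]  # read current mappings
        tag = "T%d" % i                        # ROB entry for this instruction
        if dst is None:                        # branch: no destination
            out.append("BEQZ " + srcs[0])
        else:
            rmap[dst] = tag                    # dst now lives in ROB entry 'tag'
            out.append("%s <- %s + %s" % (tag, srcs[0], srcs[1]))
    return out

code = [("R1", ["R1", "R2"]),   # R1 <- R1 + R2
        ("R2", ["R1", "R3"]),   # R2 <- R1 + R3
        (None, ["R2"]),         # BEQZ R2
        ("R3", ["R1", "R2"]),   # R3 <- R1 + R2
        ("R1", ["R3", "R2"])]   # R1 <- R3 + R2
print("\n".join(rename(code)))  # reproduces the renamed column above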
3
Cache Hierarchies

• Data and instructions are stored on DRAM chips – DRAM
  is a technology that has high bit density, but relatively poor
  latency – an access to data in memory can take as many
  as 300 cycles today!

• Hence, some data is stored on the processor in a structure
  called the cache – caches employ SRAM technology, which
  is faster, but has lower bit density

• Internet browsers also cache web pages – same concept

4
Memory Hierarchy

• As you go further, capacity and latency increase

Registers:                      1 KB     1 cycle
L1 data/instruction cache:      32 KB    2 cycles
L2 cache:                       2 MB     15 cycles
Memory:                         1 GB     300 cycles
Disk:                           80 GB    10M cycles

5
Locality

• Why do caches work?


   – Temporal locality: if you used some data recently, you
     will likely use it again
   – Spatial locality: if you used some data recently, you
     will likely access its neighbors

• No hierarchy: average access time for data = 300 cycles

• 32 KB 1-cycle L1 cache that has a hit rate of 95%:

  average access time = 0.95 x 1 + 0.05 x (301)
                      = 16 cycles
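
A one-line check of this arithmetic (a sketch; the hit rate and
latencies are the slide's numbers):

def amat(hit_rate, l1_cycles, mem_cycles):
    # hits cost the L1 latency; misses pay L1 plus memory latency
    return hit_rate * l1_cycles + (1 - hit_rate) * (l1_cycles + mem_cycles)

print(amat(0.95, 1, 300))   # -> 16.0 cycles, as above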

6
Accessing the Cache
[Figure: indexing the data array. The byte address 101000 is split
into index and offset bits: with 8-byte words, the low 3 bits are
the offset within a word, and with 8 words in the cache, the next
3 bits (here 101 = set 5) index into the sets of the data array.
Direct-mapped cache: each address maps to a unique location in the
cache.]
7
The Tag Array
[Figure: adding the tag array. The byte address 101000 again splits
into index and offset bits (8-byte words); the remaining high-order
bits form the tag, stored in a tag array parallel to the data array
and compared against the incoming address's tag on every access.
Direct-mapped cache: each address maps to a unique location in the
cache.]


8
Example Access Pattern
Assume that addresses are 8 bits long.
How many of the following address requests are hits/misses?
4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10…

[Figure: the same direct-mapped cache as before – byte address
101000, 8-byte words, tag array and data array, with the stored
tag compared on each access.]
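
A minimal simulation of this cache to check the answer (assuming
the configuration from the previous slides: 8-byte blocks, 8 sets,
direct-mapped):

OFFSET_BITS, INDEX_BITS = 3, 3          # 8-byte blocks, 8 sets

def simulate(addresses):
    tags = [None] * (1 << INDEX_BITS)   # one tag per set; cache starts empty
    for a in addresses:
        block = a >> OFFSET_BITS
        index = block & ((1 << INDEX_BITS) - 1)
        tag = block >> INDEX_BITS
        if tags[index] == tag:
            print("%3d: hit" % a)
        else:
            print("%3d: miss" % a)
            tags[index] = tag           # fill the set on a miss

simulate([4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10])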


9
Increasing Line Size
A large cache line size → smaller tag array,
fewer misses because of spatial locality

[Figure: the byte address 10100000 is split into a tag, an index,
and a 5-bit offset for the 32-byte cache line size (block size);
tag array and data array as before, with one tag per line.]
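
A tiny sketch of the first claim (assuming, for illustration, a
4 KB direct-mapped cache – a size the slide does not specify):

CAPACITY = 4 * 1024                     # hypothetical 4 KB cache
for line in [8, 16, 32, 64]:
    lines = CAPACITY // line            # one tag per cache line
    print("%2d-byte lines -> %3d tags" % (line, lines))
# doubling the line size halves the number of tags to store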


10
Associativity
Set associativity → fewer conflicts; wasted power
because multiple data and tags are read

[Figure: a 2-way set-associative cache – the byte address 10100000
indexes into Way-1 and Way-2 of the tag array and data array; both
ways are read and each stored tag is compared against the address
tag.]
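
A sketch of a 2-way lookup (hypothetical tags and data, showing why
both ways burn power on every access):

def lookup(set_ways, tag):
    # the hardware reads both ways' tags and data unconditionally...
    read = [(w["tag"], w["data"]) for w in set_ways]
    # ...then compares both stored tags against the address tag
    for stored_tag, data in read:
        if stored_tag == tag:
            return data                 # data from the way that matched
    return None                         # miss: no way matched

ways = [{"tag": 0b10, "data": "A"}, {"tag": 0b11, "data": "B"}]
print(lookup(ways, 0b11))               # -> 'B'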
11
Associativity
How many offset/index/tag bits if the cache has
64 sets,
each set has 64 bytes,
4 ways?

[Figure: the same 2-way set-associative tag array / data array
diagram as the previous slide, for byte address 10100000.]

12
Example

• 32 KB 4-way set-associative data cache array with 32-byte
  line size

• How many sets?

• How many index bits, offset bits, tag bits?

• How large is the tag array?
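
A sketch that works out these numbers (assuming 32-bit addresses,
which the slide does not state):

from math import log2

def geometry(capacity, ways, line, addr_bits=32):
    sets = capacity // (ways * line)    # capacity / (ways x line size)
    offset = int(log2(line))
    index = int(log2(sets))
    tag = addr_bits - index - offset
    return sets, offset, index, tag

sets, offset, index, tag = geometry(32 * 1024, 4, 32)
print(sets, offset, index, tag)         # 256 sets, 5 offset, 8 index, 19 tag
print(sets * 4 * tag, "tag bits")       # 256 x 4 x 19 = 19456 bits (~2.4 KB)
# The previous slide's question (64 sets x 64 bytes per set, 4 ways,
# i.e. 16-byte lines): geometry(64 * 64, 4, 16) -> 64, 4, 6, 22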

13
Cache Misses

• On a write miss, you may either choose to bring the block
  into the cache (write-allocate) or not (write-no-allocate)

• On a read miss, you always bring the block in (spatial and
  temporal locality) – but which block do you replace?
   – no choice for a direct-mapped cache
   – randomly pick one of the ways to replace
   – replace the way that was least-recently used (LRU) –
     see the sketch below
   – FIFO replacement (round-robin)
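
A minimal LRU sketch for one set (the list order tracks recency;
'ways' is the set's associativity – illustrative, not hardware):

def access(set_blocks, tag, ways):
    if tag in set_blocks:
        set_blocks.remove(tag)          # hit: move tag to the MRU end
        set_blocks.append(tag)
        return "hit"
    if len(set_blocks) == ways:
        set_blocks.pop(0)               # miss in a full set: evict LRU
    set_blocks.append(tag)
    return "miss"

s = []
for t in [1, 2, 3, 1, 4, 2]:            # 3-way set: the access to 4
    print(t, access(s, t, 3))           # evicts tag 2, the LRU block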

14
Writes

• When you write into a block, do you also update the
  copy in L2?
   – write-through: every write to L1 → a write to L2
   – write-back: mark the block as dirty; when the block
     gets replaced from L1, write it to L2

• Write-back coalesces multiple writes to an L1 block into one
  L2 write, as the sketch below illustrates
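
A sketch of the coalescing (hypothetical names; the dirty bit is
the only state that matters here):

class Block:
    def __init__(self, tag):
        self.tag, self.dirty = tag, False

def write(block):
    block.dirty = True                  # update the L1 copy, defer the L2 write

def evict(block, l2_writes):
    if block.dirty:
        l2_writes.append(block.tag)     # one L2 write for many L1 writes

l2 = []
b = Block(tag=0x40)
for _ in range(3):
    write(b)                            # three writes to the same L1 block...
evict(b, l2)
print(len(l2))                          # ...become a single L2 write (prints 1)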

• Write-through simplifies coherence protocols in a
  multiprocessor system, as the L2 always has a current
  copy of the data
15
Types of Cache Misses

• Compulsory misses: happen the first time a memory
  word is accessed – the misses for an infinite cache

• Capacity misses: happen because the program touched
  many other words before re-touching the same word – the
  misses for a fully-associative cache

• Conflict misses: happen because two words map to the
  same location in the cache – the misses generated while
  moving from a fully-associative to a direct-mapped cache
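
A sketch that labels each miss by replaying a trace through three
model caches – infinite, fully-associative LRU, and the real
direct-mapped one (sizes and trace are illustrative assumptions):

from collections import OrderedDict

def classify(trace, num_blocks, block_size=8):
    seen = set()                        # infinite cache: every block ever touched
    fa = OrderedDict()                  # fully-associative LRU, num_blocks entries
    dm = {}                             # direct-mapped: index -> tag
    for addr in trace:
        b = addr // block_size
        idx, tag = b % num_blocks, b // num_blocks
        if dm.get(idx) == tag:
            kind = "hit"
        elif b not in seen:
            kind = "compulsory"         # missed even by an infinite cache
        elif b not in fa:
            kind = "capacity"           # missed even with full associativity
        else:
            kind = "conflict"           # only the direct-mapped cache missed
        seen.add(b)
        if b in fa:
            fa.move_to_end(b)
        else:
            fa[b] = True
            if len(fa) > num_blocks:
                fa.popitem(last=False)  # evict the LRU block
        dm[idx] = tag
        print(addr, kind)

classify([4, 7, 68, 4, 7], num_blocks=8)  # second access to 4 is a conflict miss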

16