
Lecture 19: Cache Basics

• Today’s topics:

   – Out-of-order execution
   – Cache hierarchies

• Reminder:

   – Assignment 7 due on Thursday

1
Multicycle Instructions

• Multiple parallel pipelines – each pipeline can have a different
  number of stages

• Instructions can now complete out of order – must make sure
  that writes to a register happen in the correct order
2
An Out-of-Order Processor Implementation
[Figure: branch prediction and instruction fetch feed an instruction
fetch queue; instructions pass through decode & rename into the
issue queue (IQ) and execute on the ALUs; the reorder buffer (ROB)
holds entries Instr 1–6 with tags T1–T6 alongside the register file
(R1–R32); results are written to the ROB and tags are broadcast to
the IQ. The example code is renamed as follows:]

   Original code        Renamed code
   R1 ← R1+R2           T1 ← R1+R2
   R2 ← R1+R3           T2 ← T1+R3
   BEQZ R2              BEQZ T2
   R3 ← R1+R2           T4 ← T1+T2
   R1 ← R3+R2           T5 ← T4+T2
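
A minimal Python sketch of the renaming step shown above (an
illustration, not the lecture's exact hardware; it assumes every
instruction is assigned a ROB tag, matching the slide's T1–T5):

def rename(instrs):
    rmap = {}                                  # architectural reg -> latest tag
    out = []
    for i, (dst, srcs) in enumerate(instrs, start=1):
        srcs = [rmap.get(s, s) for s in srcs]  # read current mappings
        tag = "T%d" % i                        # ROB entry for this instruction
        if dst is None:                        # branch: no destination
            out.append("BEQZ " + srcs[0])
        else:
            rmap[dst] = tag                    # dst now lives in ROB entry 'tag'
            out.append("%s <- %s + %s" % (tag, srcs[0], srcs[1]))
    return out

code = [("R1", ["R1", "R2"]),   # R1 <- R1 + R2
        ("R2", ["R1", "R3"]),   # R2 <- R1 + R3
        (None, ["R2"]),         # BEQZ R2
        ("R3", ["R1", "R2"]),   # R3 <- R1 + R2
        ("R1", ["R3", "R2"])]   # R1 <- R3 + R2
print("\n".join(rename(code)))  # reproduces the renamed column above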
3
Cache Hierarchies

• Data and instructions are stored on DRAM chips – DRAM
  is a technology that has high bit density, but relatively poor
  latency – an access to data in memory can take as many
  as 300 cycles today!

• Hence, some data is stored on the processor in a structure
  called the cache – caches employ SRAM technology, which
  is faster, but has lower bit density

• Internet browsers also cache web pages – same concept

4
Memory Hierarchy

• As you go further, capacity and latency increase

Registers:                      1 KB     1 cycle
L1 data/instruction cache:      32 KB    2 cycles
L2 cache:                       2 MB     15 cycles
Memory:                         1 GB     300 cycles
Disk:                           80 GB    10M cycles

5
Locality

• Why do caches work?


   – Temporal locality: if you used some data recently, you
     will likely use it again
   – Spatial locality: if you used some data recently, you
     will likely access its neighbors

• No hierarchy: average access time for data = 300 cycles

• 32 KB 1-cycle L1 cache that has a hit rate of 95%:

  average access time = 0.95 x 1 + 0.05 x (301)
                      = 16 cycles
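
A one-line check of this arithmetic (a sketch; the hit rate and
latencies are the slide's numbers):

def amat(hit_rate, l1_cycles, mem_cycles):
    # hits cost the L1 latency; misses pay L1 plus memory latency
    return hit_rate * l1_cycles + (1 - hit_rate) * (l1_cycles + mem_cycles)

print(amat(0.95, 1, 300))   # -> 16.0 cycles, as above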

6
Accessing the Cache
[Figure: indexing the data array. The byte address 101000 is split
into index and offset bits: with 8-byte words, the low 3 bits are
the offset within a word, and with 8 words in the cache, the next
3 bits (here 101 = set 5) index into the sets of the data array.
Direct-mapped cache: each address maps to a unique location in the
cache.]
7
The Tag Array
[Figure: adding the tag array. The byte address 101000 again splits
into index and offset bits (8-byte words); the remaining high-order
bits form the tag, stored in a tag array parallel to the data array
and compared against the incoming address's tag on every access.
Direct-mapped cache: each address maps to a unique location in the
cache.]


8
Example Access Pattern
Assume that addresses are 8 bits long.
How many of the following address requests are hits/misses?
4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10…

[Figure: the same direct-mapped cache as before – byte address
101000, 8-byte words, tag array and data array, with the stored
tag compared on each access.]
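
A minimal simulation of this cache to check the answer (assuming
the configuration from the previous slides: 8-byte blocks, 8 sets,
direct-mapped):

OFFSET_BITS, INDEX_BITS = 3, 3          # 8-byte blocks, 8 sets

def simulate(addresses):
    tags = [None] * (1 << INDEX_BITS)   # one tag per set; cache starts empty
    for a in addresses:
        block = a >> OFFSET_BITS
        index = block & ((1 << INDEX_BITS) - 1)
        tag = block >> INDEX_BITS
        if tags[index] == tag:
            print("%3d: hit" % a)
        else:
            print("%3d: miss" % a)
            tags[index] = tag           # fill the set on a miss

simulate([4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10])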


9
Increasing Line Size
A large cache line size → smaller tag array,
fewer misses because of spatial locality

[Figure: the byte address 10100000 is split into a tag, an index,
and a 5-bit offset for the 32-byte cache line size (block size);
tag array and data array as before, with one tag per line.]
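
A tiny sketch of the first claim (assuming, for illustration, a
4 KB direct-mapped cache – a size the slide does not specify):

CAPACITY = 4 * 1024                     # hypothetical 4 KB cache
for line in [8, 16, 32, 64]:
    lines = CAPACITY // line            # one tag per cache line
    print("%2d-byte lines -> %3d tags" % (line, lines))
# doubling the line size halves the number of tags to store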


10
Associativity
Set associativity → fewer conflicts; wasted power
because multiple data and tags are read

[Figure: a 2-way set-associative cache – the byte address 10100000
indexes into Way-1 and Way-2 of the tag array and data array; both
ways are read and each stored tag is compared against the address
tag.]
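
A sketch of a 2-way lookup (hypothetical tags and data, showing why
both ways burn power on every access):

def lookup(set_ways, tag):
    # the hardware reads both ways' tags and data unconditionally...
    read = [(w["tag"], w["data"]) for w in set_ways]
    # ...then compares both stored tags against the address tag
    for stored_tag, data in read:
        if stored_tag == tag:
            return data                 # data from the way that matched
    return None                         # miss: no way matched

ways = [{"tag": 0b10, "data": "A"}, {"tag": 0b11, "data": "B"}]
print(lookup(ways, 0b11))               # -> 'B'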
11
Associativity
How many offset/index/tag bits if the cache has
64 sets,
each set has 64 bytes,
4 ways?

[Figure: the same 2-way set-associative tag array / data array
diagram as the previous slide, for byte address 10100000.]

12
Example

• 32 KB 4-way set-associative data cache array with 32-byte
  line size

• How many sets?

• How many index bits, offset bits, tag bits?

• How large is the tag array?
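
A sketch that works out these numbers (assuming 32-bit addresses,
which the slide does not state):

from math import log2

def geometry(capacity, ways, line, addr_bits=32):
    sets = capacity // (ways * line)    # capacity / (ways x line size)
    offset = int(log2(line))
    index = int(log2(sets))
    tag = addr_bits - index - offset
    return sets, offset, index, tag

sets, offset, index, tag = geometry(32 * 1024, 4, 32)
print(sets, offset, index, tag)         # 256 sets, 5 offset, 8 index, 19 tag
print(sets * 4 * tag, "tag bits")       # 256 x 4 x 19 = 19456 bits (~2.4 KB)
# The previous slide's question (64 sets x 64 bytes per set, 4 ways,
# i.e. 16-byte lines): geometry(64 * 64, 4, 16) -> 64, 4, 6, 22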

13
Cache Misses

• On a write miss, you may either choose to bring the block
  into the cache (write-allocate) or not (write-no-allocate)

• On a read miss, you always bring the block in (spatial and
  temporal locality) – but which block do you replace?
   – no choice for a direct-mapped cache
   – randomly pick one of the ways to replace
   – replace the way that was least-recently used (LRU) –
     see the sketch below
   – FIFO replacement (round-robin)
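
A minimal LRU sketch for one set (the list order tracks recency;
'ways' is the set's associativity – illustrative, not hardware):

def access(set_blocks, tag, ways):
    if tag in set_blocks:
        set_blocks.remove(tag)          # hit: move tag to the MRU end
        set_blocks.append(tag)
        return "hit"
    if len(set_blocks) == ways:
        set_blocks.pop(0)               # miss in a full set: evict LRU
    set_blocks.append(tag)
    return "miss"

s = []
for t in [1, 2, 3, 1, 4, 2]:            # 3-way set: the access to 4
    print(t, access(s, t, 3))           # evicts tag 2, the LRU block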

14
Writes

• When you write into a block, do you also update the
  copy in L2?
   – write-through: every write to L1 → a write to L2
   – write-back: mark the block as dirty; when the block
     gets replaced from L1, write it to L2

• Write-back coalesces multiple writes to an L1 block into one
  L2 write, as the sketch below illustrates
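
A sketch of the coalescing (hypothetical names; the dirty bit is
the only state that matters here):

class Block:
    def __init__(self, tag):
        self.tag, self.dirty = tag, False

def write(block):
    block.dirty = True                  # update the L1 copy, defer the L2 write

def evict(block, l2_writes):
    if block.dirty:
        l2_writes.append(block.tag)     # one L2 write for many L1 writes

l2 = []
b = Block(tag=0x40)
for _ in range(3):
    write(b)                            # three writes to the same L1 block...
evict(b, l2)
print(len(l2))                          # ...become a single L2 write (prints 1)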

• Write-through simplifies coherence protocols in a
  multiprocessor system, as the L2 always has a current
  copy of the data
15
Types of Cache Misses

• Compulsory misses: happen the first time a memory
  word is accessed – the misses for an infinite cache

• Capacity misses: happen because the program touched
  many other words before re-touching the same word – the
  misses for a fully-associative cache

• Conflict misses: happen because two words map to the
  same location in the cache – the misses generated while
  moving from a fully-associative to a direct-mapped cache
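
A sketch that labels each miss by replaying a trace through three
model caches – infinite, fully-associative LRU, and the real
direct-mapped one (sizes and trace are illustrative assumptions):

from collections import OrderedDict

def classify(trace, num_blocks, block_size=8):
    seen = set()                        # infinite cache: every block ever touched
    fa = OrderedDict()                  # fully-associative LRU, num_blocks entries
    dm = {}                             # direct-mapped: index -> tag
    for addr in trace:
        b = addr // block_size
        idx, tag = b % num_blocks, b // num_blocks
        if dm.get(idx) == tag:
            kind = "hit"
        elif b not in seen:
            kind = "compulsory"         # missed even by an infinite cache
        elif b not in fa:
            kind = "capacity"           # missed even with full associativity
        else:
            kind = "conflict"           # only the direct-mapped cache missed
        seen.add(b)
        if b in fa:
            fa.move_to_end(b)
        else:
            fa[b] = True
            if len(fa) > num_blocks:
                fa.popitem(last=False)  # evict the LRU block
        dm[idx] = tag
        print(addr, kind)

classify([4, 7, 68, 4, 7], num_blocks=8)  # second access to 4 is a conflict miss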

16