
Computer Architecture

Cache Memory

Outline
 Cache Memory Introduction
 Memory Hierarchy
 Direct-Mapped Cache
 Set-Associative Cache
 Cache Performance

Introduction
• Memory access time is important to performance!
• Users want large memories with fast access times; ideally, an unlimited amount of fast memory.
• Principle of locality: programs access a relatively small portion of their address space at any instant of time.

Levels of the Memory Hierarchy
From closest to the CPU to farthest:
• Registers
• Cache level(s)
• Main memory
• Magnetic disk
• Optical disk or magnetic tape
Farther away from the CPU: lower cost per bit, higher capacity, increased access time/latency, lower throughput/bandwidth.

Cache
Processor ↔ Cache (small, fast memory) ↔ Main memory (large, inexpensive, slow); words move between the processor and the cache, blocks between the cache and main memory.
• The processor does all memory operations with the cache.
• Hit – if the requested word is in the cache, the read or write operation is performed directly in the cache, without accessing main memory.
• Miss – if the requested word is not in the cache, a block of words containing the requested word is brought into the cache, and then the processor request is completed.
• Block – the minimum amount of data transferred between the cache and main memory.



Temporal & Spatial Locality
 There are two types of locality:
TEMPORAL LOCALITY (locality in time): if an item is referenced, it will likely be referenced again soon. Data is reused.
SPATIAL LOCALITY (locality in space): if an item is referenced, items in neighboring addresses will likely be referenced soon.
 Most programs contain natural locality in structure. For example,
most programs contain loops in which the instructions and data
need to be accessed repeatedly. This is an example of temporal
locality.
 Instructions are usually accessed sequentially, so they contain a
high amount of spatial locality.
 Also, data access to elements in an array is another example of
spatial locality.



Placement and Replacement

• Placement of data refers to where a block from main memory is stored in the cache.
• Replacement of data refers to deciding which block is evicted from the cache (and hence how long data resides in the cache) when a new block must be brought in.



Three major placements
 Direct mapped
 Fully Associative
 Set Associative

Direct-Mapped Placement
 A block can only go into one place in the cache

 Determined by the block’s address (in memory space)

 The index number for block placement is usually given by some low-
order bits of block’s address.

 (block address) modulo (# of blocks in the cache)
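As a minimal sketch (plain Python; the cache size used here is a hypothetical example, not from the slides), the placement rule is just this modulo computation:

```python
def direct_mapped_index(block_address, num_cache_blocks):
    """Index of the one cache block that can hold this memory block."""
    return block_address % num_cache_blocks

# Example: with 8 cache blocks, memory block 29 maps to index 29 % 8 = 5.
print(direct_mapped_index(29, 8))
```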

Direct-Mapped Cache: A Simple Example
• Main memory blocks have addresses of the form 0000xx through 1111xx; the two low-order bits (xx) select the byte within the word.
• The cache has four one-word blocks, with index values 00, 01, 10, and 11; each entry holds a Valid bit, a Tag, and the Data.
• Q1: Is it there? Compare the entry's tag with the two high-order memory address bits to tell whether the memory block is in the cache.
• Q2: How do we find it? Use the next two low-order address bits (the index) to determine which cache block to check, i.e., (block address) modulo (# of blocks in the cache).
Direct-Mapped Cache
• 32-word word-addressable memory (5-bit word addresses 00000 through 11111).
• Cache of 8 blocks, block size = 1 word.
• Cache address: the low-order 3 bits of the memory address are the index (the local address within the cache); the high-order 2 bits are the tag stored with the block.
• tag | index → memory address; for example, tag 11 with index 101 corresponds to memory address 11101.
Example cache contents (index : tag):
000 : 00   001 : 10   010 : 11   011 : 01
100 : 01   101 : 00   110 : 10   111 : 11
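A small sketch of this lookup in Python (the function and variable names are illustrative, not from the slides): the 5-bit word address is split into a 2-bit tag and a 3-bit index, and a hit requires a valid entry whose stored tag matches.

```python
NUM_BLOCKS = 8      # 8 one-word blocks -> 3 index bits
INDEX_BITS = 3      # the remaining 2 high-order bits are the tag

def split_address(addr):
    """Split a 5-bit word address into (tag, index)."""
    index = addr % NUM_BLOCKS     # low-order 3 bits
    tag = addr >> INDEX_BITS      # high-order 2 bits
    return tag, index

def is_hit(cache, addr):
    """cache maps index -> (valid, tag); a hit needs a valid entry with a matching tag."""
    tag, index = split_address(addr)
    valid, stored_tag = cache.get(index, (False, None))
    return valid and stored_tag == tag

print(split_address(0b11101))   # prints (3, 5), i.e. tag 11, index 101
```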


Direct-Mapped Cache with Two-Word Blocks
• 32-word word-addressable memory (5-bit word addresses).
• Cache of 4 blocks, block size = 2 words.
• Cache address: 2-bit tag, 2-bit index (the local address), and 1-bit block offset selecting the word within the block.
• tag | index | block offset → memory address; for example, tag 11, index 10 and block offset 1 correspond to memory address 11101.
Example cache contents (index : tag):
00 : 00   01 : 11   10 : 00   11 : 10
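A corresponding sketch for this configuration (again with illustrative names), adding the block-offset field:

```python
# 32-word memory, 4 blocks of 2 words: 2-bit tag | 2-bit index | 1-bit block offset
BLOCK_OFFSET_BITS = 1
INDEX_BITS = 2

def split_address(addr):
    """Split a 5-bit word address into (tag, index, block_offset)."""
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset

print(split_address(0b11101))   # prints (3, 2, 1): tag 11, index 10, block offset 1
```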


Fully-Associative Cache
• 32-word main memory; the two low-order bits of a byte address select the byte within the word (byte offset).
• Cache of 8 blocks, block size = 1 word.
• A memory block may be placed in any cache block, so there is no index field: the entire 5-bit block address is stored as the tag.
• Cache address: tag (5 bits) | byte offset (2 bits); for example, memory address 11101 00 has tag 11101 and byte offset 00.
• When a needed block is not in the cache and the cache is full, a replacement policy chooses the victim; here the least recently used (LRU) block is replaced.
Example cache contents (one 5-bit tag per block): 00000, 10001, 11010, 01011, 01100, 00101, 10110, 11111.
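A minimal lookup sketch (Python, hypothetical helper names): since any cache block may hold the data, every stored tag must be compared against the block address.

```python
def fully_associative_lookup(cache_tags, block_address):
    """In a fully-associative cache the block may sit anywhere,
    so every stored tag is compared with the block address."""
    for way, tag in enumerate(cache_tags):
        if tag == block_address:
            return way        # hit: which cache block holds it
    return None               # miss: any block may be chosen as the victim (e.g., the LRU one)

# Hypothetical contents: the stored tag is the whole 5-bit block address
tags = [0b00000, 0b10001, 0b11010, 0b01011, 0b01100, 0b00101, 0b10110, 0b11111]
print(fully_associative_lookup(tags, 0b11010))   # hit in block 2
print(fully_associative_lookup(tags, 0b01010))   # miss
```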
Two-Way Set-Associative Cache
• 32-word main memory; the two low-order bits of a byte address are the byte offset.
• Cache of 8 blocks organized as 4 sets of 2 blocks each; block size = 1 word.
• Cache address: tag (3 bits) | index (2 bits, selects the set) | byte offset (2 bits); for example, memory address 111 01 00 has tag 111, set index 01, and byte offset 00.
• A memory block may be placed in either block of its set; the two tags in the indexed set are compared with the address tag. On a miss, the LRU block of that set is replaced.
Example cache contents (set index : tags of the two ways):
00 : 000 | 011   01 : 100 | 001   10 : 110 | 101   11 : 010 | 111
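A lookup sketch for this configuration (Python, illustrative names): the index selects the set, then both ways of that set are checked.

```python
NUM_SETS = 4        # 8 blocks / 2 ways per set

def set_associative_lookup(sets, block_address):
    """2-way set-associative: the index picks the set, then both ways' tags are compared."""
    index = block_address % NUM_SETS      # low-order 2 bits of the block address
    tag = block_address >> 2              # remaining 3 high-order bits
    for way, stored_tag in enumerate(sets[index]):
        if stored_tag == tag:
            return index, way             # hit
    return None                           # miss: replace the LRU way of this set

# Hypothetical contents matching the slide: set -> [tag of way 0, tag of way 1]
sets = [[0b000, 0b011], [0b100, 0b001], [0b110, 0b101], [0b010, 0b111]]
print(set_associative_lookup(sets, 0b11101))  # block 11101: set 01, tag 111 -> miss, replace LRU way
```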
Page Replacement Algorithms

• Want the lowest page-fault rate.
• Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults and page replacements on that string.
• In all our examples, we use a few recurring reference strings.

The FIFO Policy
• Treats the page frames allocated to a process as a circular buffer: when the buffer is full, the oldest page is replaced; hence first-in, first-out.
• A frequently used page is often the oldest, so it will be repeatedly paged out by FIFO.
• Simple to implement: requires only a pointer that circles through the page frames of the process.
FIFO Page Replacement
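The original slide showed a worked figure here. As a stand-in, here is a minimal FIFO simulation sketch (Python; the reference string below is a common textbook example, not necessarily the one from the missing figure):

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()              # oldest page at the left
    resident = set()
    faults = 0
    for page in reference_string:
        if page in resident:
            continue              # hit: FIFO does not reorder on a hit
        faults += 1
        if len(frames) == num_frames:
            victim = frames.popleft()     # evict the oldest page
            resident.remove(victim)
        frames.append(page)
        resident.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))
```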

Optimal Page Replacement
• The Optimal policy selects for replacement the page that will not be used for the longest period of time.
• Impossible to implement (need to know the future), but it serves as a standard against which to compare the other algorithms we shall study.

Optimal Page Replacement
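The figure that illustrated this is not reproduced here; a small simulation sketch (Python, illustrative names) shows how the policy can be computed offline when the whole reference string is known in advance:

```python
def opt_faults(reference_string, num_frames):
    """Count page faults under the Optimal (Belady) policy.
    Needs the entire future reference string, which is why it cannot be implemented online."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = reference_string[i + 1:]
        # Evict the resident page whose next use is farthest away (or never comes).
        def next_use(p):
            return future.index(p) if p in future else float("inf")
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults
```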

The LRU Policy

• Replaces the page that has not been referenced for the longest time.
• By the principle of locality, this should be the page least likely to be referenced in the near future.
• Performs nearly as well as the optimal policy.
LRU Page Replacement
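Again the worked figure is omitted; a minimal LRU simulation sketch (Python) uses an ordered dictionary to track recency:

```python
from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    """Count page faults under LRU replacement."""
    frames = OrderedDict()            # least recently used page at the front
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict the least recently used page
        frames[page] = True
    return faults

# Running fifo_faults, lru_faults and opt_faults on the same reference string
# and frame count makes the comparisons on the next slides concrete.
```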

Comparison of OPT with LRU
Comparison of FIFO with LRU

• LRU recognizes that pages 2 and 5 of the example reference string are referenced more frequently than the other pages, but FIFO does not.
Cache Performance



Performance

When caches were originally introduced, the typical system had a single
cache. More recently, the use of multiple caches has become the norm.

MULTILEVEL CACHES
As logic density has increased, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Compared with a cache reachable via an external bus, the on-chip cache reduces the processor's external bus activity and therefore speeds up execution times and increases overall system performance. When the requested instruction or data is found in the on-chip cache, the bus access is eliminated.



The inclusion of an on-chip cache leaves open the question of whether an off-chip, or external, cache is still desirable. Typically, the answer is yes, and most contemporary designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). The reason for including an L2 cache is the following: if there is no L2 cache and the processor makes an access request for a memory location not in the L1 cache, then the processor must access DRAM across the bus.

Due to the typically slow bus speed and slow memory access time, this
results in poor performance. On the other hand, if an L2 SRAM (static RAM)
cache is used, then frequently the missing information can be quickly
retrieved. If the SRAM is fast enough to match the bus speed, then the data
can be accessed using a zero-wait state transaction, the fastest type of bus
transfer.

Some high-performance systems also include an additional L3 cache, which sits between the L2 cache and main memory. Its arrangement differs, but the principle is the same.
The cache is placed both physically closer and logically closer to the CPU than the main memory.
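As an illustration of why the L2 helps (the latencies and miss rates below are hypothetical, not taken from this text), the effect can be estimated with a simple average memory access time calculation:

```python
# Hypothetical latencies (in cycles) and miss rates -- illustrative only.
L1_HIT = 1
L2_HIT = 10        # fast external SRAM cache
DRAM = 100         # main memory across the slow bus

l1_miss_rate = 0.05
l2_miss_rate = 0.20   # fraction of L1 misses that also miss in L2

# Without an L2 cache: every L1 miss goes all the way to DRAM.
amat_no_l2 = L1_HIT + l1_miss_rate * DRAM

# With an L2 cache: an L1 miss usually hits in the faster L2 SRAM.
amat_with_l2 = L1_HIT + l1_miss_rate * (L2_HIT + l2_miss_rate * DRAM)

print(amat_no_l2)     # 6.0 cycles per access on average
print(amat_with_l2)   # 2.5 cycles per access on average
```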



VALID BIT / DIRTY BIT
When a program is first loaded into main
memory, the cache is cleared, and so while a
program is executing, a valid bit is needed to
indicate whether or not the slot holds a line that
belongs to the program being executed.
There is also a dirty bit that keeps track of
whether or not a line has been modified while it
is in the cache. A slot that is modified must be
written back to the main memory before the slot
is reused for another line.
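A minimal sketch of how these two bits might be kept with each cache line (Python; the structure and helper below are illustrative, not a specific hardware design):

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    valid: bool = False    # does this slot hold a line belonging to the running program?
    dirty: bool = False    # has the line been modified while in the cache?
    tag: int = 0
    data: bytes = b""

def replace_line(slot: CacheLine, new_tag: int, new_data: bytes, write_back):
    """Reuse a slot for another line, writing the old line back first if it is dirty."""
    if slot.valid and slot.dirty:
        write_back(slot.tag, slot.data)   # a modified line must reach main memory before reuse
    slot.valid, slot.dirty = True, False
    slot.tag, slot.data = new_tag, new_data
```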
