
Computer Architecture

Cache Memory

Outline
 Cache Memory Introduction
 Memory Hierarchy
 Direct-Mapped Cache
 Set-Associative Cache
 Cache Performance

Introduction
• Memory access time is important to performance!
• Users want large memories with fast access times; ideally, an unlimited amount of fast memory.
• Principle of locality: programs access a relatively small portion of their address space at any instant of time.

Levels of the Memory Hierarchy
From closest to the CPU to farthest:
• Registers
• Cache level(s)
• Main memory
• Magnetic disk
• Optical disk or magnetic tape
Farther away from the CPU: lower cost per bit, higher capacity, increased access time/latency, lower throughput/bandwidth.

Cache
Processor ↔ Cache (small, fast memory) ↔ Main memory (large, inexpensive, slow); words move between the processor and the cache, blocks between the cache and main memory.
• The processor does all memory operations with the cache.
• Hit – if the requested word is in the cache, the read or write operation is performed directly in the cache, without accessing main memory.
• Miss – if the requested word is not in the cache, a block of words containing the requested word is brought into the cache, and then the processor request is completed.
• Block – the minimum amount of data transferred between the cache and main memory.



Temporal & Spatial Locality
 There are two types of locality:
TEMPORAL LOCALITY (locality in time): if an item is referenced, it will likely be referenced again soon. Data is reused.
SPATIAL LOCALITY (locality in space): if an item is referenced, items in neighboring addresses will likely be referenced soon.
 Most programs contain natural locality in structure. For example,
most programs contain loops in which the instructions and data
need to be accessed repeatedly. This is an example of temporal
locality.
 Instructions are usually accessed sequentially, so they contain a
high amount of spatial locality.
 Also, data access to elements in an array is another example of
spatial locality.



Placement and Replacement

• Placement of data refers to where a block from main memory is stored in the cache.
• Replacement of data refers to deciding which block is evicted from the cache (and hence how long data resides in the cache) when a new block must be brought in.



Three major placements
 Direct mapped
 Fully Associative
 Set Associative

Direct-Mapped Placement
 A block can only go into one place in the cache

 Determined by the block’s address (in memory space)

 The index number for block placement is usually given by some low-
order bits of block’s address.

 (block address) modulo (# of blocks in the cache)
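As a minimal sketch (plain Python; the cache size used here is a hypothetical example, not from the slides), the placement rule is just this modulo computation:

```python
def direct_mapped_index(block_address, num_cache_blocks):
    """Index of the one cache block that can hold this memory block."""
    return block_address % num_cache_blocks

# Example: with 8 cache blocks, memory block 29 maps to index 29 % 8 = 5.
print(direct_mapped_index(29, 8))
```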

Direct-Mapped Cache: A Simple Example
• Main memory blocks have addresses of the form 0000xx through 1111xx; the two low-order bits (xx) select the byte within the word.
• The cache has four one-word blocks, with index values 00, 01, 10, and 11; each entry holds a Valid bit, a Tag, and the Data.
• Q1: Is it there? Compare the entry's tag with the two high-order memory address bits to tell whether the memory block is in the cache.
• Q2: How do we find it? Use the next two low-order address bits (the index) to determine which cache block to check, i.e., (block address) modulo (# of blocks in the cache).
Direct-Mapped Cache
• 32-word word-addressable memory (5-bit word addresses 00000 through 11111).
• Cache of 8 blocks, block size = 1 word.
• Cache address: the low-order 3 bits of the memory address are the index (the local address within the cache); the high-order 2 bits are the tag stored with the block.
• tag | index → memory address; for example, tag 11 with index 101 corresponds to memory address 11101.
Example cache contents (index : tag):
000 : 00   001 : 10   010 : 11   011 : 01
100 : 01   101 : 00   110 : 10   111 : 11
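A small sketch of this lookup in Python (the function and variable names are illustrative, not from the slides): the 5-bit word address is split into a 2-bit tag and a 3-bit index, and a hit requires a valid entry whose stored tag matches.

```python
NUM_BLOCKS = 8      # 8 one-word blocks -> 3 index bits
INDEX_BITS = 3      # the remaining 2 high-order bits are the tag

def split_address(addr):
    """Split a 5-bit word address into (tag, index)."""
    index = addr % NUM_BLOCKS     # low-order 3 bits
    tag = addr >> INDEX_BITS      # high-order 2 bits
    return tag, index

def is_hit(cache, addr):
    """cache maps index -> (valid, tag); a hit needs a valid entry with a matching tag."""
    tag, index = split_address(addr)
    valid, stored_tag = cache.get(index, (False, None))
    return valid and stored_tag == tag

print(split_address(0b11101))   # prints (3, 5), i.e. tag 11, index 101
```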


Direct-Mapped Cache with Two-Word Blocks
• 32-word word-addressable memory (5-bit word addresses).
• Cache of 4 blocks, block size = 2 words.
• Cache address: 2-bit tag, 2-bit index (the local address), and 1-bit block offset selecting the word within the block.
• tag | index | block offset → memory address; for example, tag 11, index 10 and block offset 1 correspond to memory address 11101.
Example cache contents (index : tag):
00 : 00   01 : 11   10 : 00   11 : 10
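A corresponding sketch for this configuration (again with illustrative names), adding the block-offset field:

```python
# 32-word memory, 4 blocks of 2 words: 2-bit tag | 2-bit index | 1-bit block offset
BLOCK_OFFSET_BITS = 1
INDEX_BITS = 2

def split_address(addr):
    """Split a 5-bit word address into (tag, index, block_offset)."""
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset

print(split_address(0b11101))   # prints (3, 2, 1): tag 11, index 10, block offset 1
```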


Fully-Associative Cache
• 32-word main memory; the two low-order bits of a byte address select the byte within the word (byte offset).
• Cache of 8 blocks, block size = 1 word.
• A memory block may be placed in any cache block, so there is no index field: the entire 5-bit block address is stored as the tag.
• Cache address: tag (5 bits) | byte offset (2 bits); for example, memory address 11101 00 has tag 11101 and byte offset 00.
• When a needed block is not in the cache and the cache is full, a replacement policy chooses the victim; here the least recently used (LRU) block is replaced.
Example cache contents (one 5-bit tag per block): 00000, 10001, 11010, 01011, 01100, 00101, 10110, 11111.
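A minimal lookup sketch (Python, hypothetical helper names): since any cache block may hold the data, every stored tag must be compared against the block address.

```python
def fully_associative_lookup(cache_tags, block_address):
    """In a fully-associative cache the block may sit anywhere,
    so every stored tag is compared with the block address."""
    for way, tag in enumerate(cache_tags):
        if tag == block_address:
            return way        # hit: which cache block holds it
    return None               # miss: any block may be chosen as the victim (e.g., the LRU one)

# Hypothetical contents: the stored tag is the whole 5-bit block address
tags = [0b00000, 0b10001, 0b11010, 0b01011, 0b01100, 0b00101, 0b10110, 0b11111]
print(fully_associative_lookup(tags, 0b11010))   # hit in block 2
print(fully_associative_lookup(tags, 0b01010))   # miss
```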
Two-Way Set-Associative Cache
• 32-word main memory; the two low-order bits of a byte address are the byte offset.
• Cache of 8 blocks organized as 4 sets of 2 blocks each; block size = 1 word.
• Cache address: tag (3 bits) | index (2 bits, selects the set) | byte offset (2 bits); for example, memory address 111 01 00 has tag 111, set index 01, and byte offset 00.
• A memory block may be placed in either block of its set; the two tags in the indexed set are compared with the address tag. On a miss, the LRU block of that set is replaced.
Example cache contents (set index : tags of the two ways):
00 : 000 | 011   01 : 100 | 001   10 : 110 | 101   11 : 010 | 111
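A lookup sketch for this configuration (Python, illustrative names): the index selects the set, then both ways of that set are checked.

```python
NUM_SETS = 4        # 8 blocks / 2 ways per set

def set_associative_lookup(sets, block_address):
    """2-way set-associative: the index picks the set, then both ways' tags are compared."""
    index = block_address % NUM_SETS      # low-order 2 bits of the block address
    tag = block_address >> 2              # remaining 3 high-order bits
    for way, stored_tag in enumerate(sets[index]):
        if stored_tag == tag:
            return index, way             # hit
    return None                           # miss: replace the LRU way of this set

# Hypothetical contents matching the slide: set -> [tag of way 0, tag of way 1]
sets = [[0b000, 0b011], [0b100, 0b001], [0b110, 0b101], [0b010, 0b111]]
print(set_associative_lookup(sets, 0b11101))  # block 11101: set 01, tag 111 -> miss, replace LRU way
```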
Page Replacement Algorithms

• Want the lowest page-fault rate.
• Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults and page replacements on that string.
• In all our examples, we use a few recurring reference strings.

The FIFO Policy
• Treats the page frames allocated to a process as a circular buffer: when the buffer is full, the oldest page is replaced; hence first-in, first-out.
• A frequently used page is often the oldest, so it will be repeatedly paged out by FIFO.
• Simple to implement: requires only a pointer that circles through the page frames of the process.
FIFO Page Replacement
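The original slide showed a worked figure here. As a stand-in, here is a minimal FIFO simulation sketch (Python; the reference string below is a common textbook example, not necessarily the one from the missing figure):

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()              # oldest page at the left
    resident = set()
    faults = 0
    for page in reference_string:
        if page in resident:
            continue              # hit: FIFO does not reorder on a hit
        faults += 1
        if len(frames) == num_frames:
            victim = frames.popleft()     # evict the oldest page
            resident.remove(victim)
        frames.append(page)
        resident.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))
```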

Optimal Page Replacement
• The Optimal policy selects for replacement the page that will not be used for the longest period of time.
• Impossible to implement (need to know the future), but it serves as a standard against which to compare the other algorithms we shall study.

Optimal Page Replacement
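The figure that illustrated this is not reproduced here; a small simulation sketch (Python, illustrative names) shows how the policy can be computed offline when the whole reference string is known in advance:

```python
def opt_faults(reference_string, num_frames):
    """Count page faults under the Optimal (Belady) policy.
    Needs the entire future reference string, which is why it cannot be implemented online."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = reference_string[i + 1:]
        # Evict the resident page whose next use is farthest away (or never comes).
        def next_use(p):
            return future.index(p) if p in future else float("inf")
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults
```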

The LRU Policy

• Replaces the page that has not been referenced for the longest time.
• By the principle of locality, this should be the page least likely to be referenced in the near future.
• Performs nearly as well as the optimal policy.
LRU Page Replacement
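Again the worked figure is omitted; a minimal LRU simulation sketch (Python) uses an ordered dictionary to track recency:

```python
from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    """Count page faults under LRU replacement."""
    frames = OrderedDict()            # least recently used page at the front
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict the least recently used page
        frames[page] = True
    return faults

# Running fifo_faults, lru_faults and opt_faults on the same reference string
# and frame count makes the comparisons on the next slides concrete.
```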

Comparison of OPT with LRU
Comparison of FIFO with LRU

• LRU recognizes that pages 2 and 5 of the example reference string are referenced more frequently than the other pages, but FIFO does not.
Cache Performance



Performance

When caches were originally introduced, the typical system had a single
cache. More recently, the use of multiple caches has become the norm.

MULTILEVEL CACHES
As logic density has increased, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Compared with a cache reachable via an external bus, the on-chip cache reduces the processor's external bus activity and therefore speeds up execution times and increases overall system performance. When the requested instruction or data is found in the on-chip cache, the bus access is eliminated.



The inclusion of an on-chip cache leaves open the question of whether an off-chip, or external, cache is still desirable. Typically, the answer is yes, and most contemporary designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). The reason for including an L2 cache is the following: if there is no L2 cache and the processor makes an access request for a memory location not in the L1 cache, then the processor must access DRAM across the bus.

Due to the typically slow bus speed and slow memory access time, this
results in poor performance. On the other hand, if an L2 SRAM (static RAM)
cache is used, then frequently the missing information can be quickly
retrieved. If the SRAM is fast enough to match the bus speed, then the data
can be accessed using a zero-wait state transaction, the fastest type of bus
transfer.

Some high-performance systems also include an additional L3 cache, which sits between the L2 cache and main memory. Its arrangement differs, but the principle is the same.
The cache is placed both physically closer and logically closer to the CPU than the main memory.
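As an illustration of why the L2 helps (the latencies and miss rates below are hypothetical, not taken from this text), the effect can be estimated with a simple average memory access time calculation:

```python
# Hypothetical latencies (in cycles) and miss rates -- illustrative only.
L1_HIT = 1
L2_HIT = 10        # fast external SRAM cache
DRAM = 100         # main memory across the slow bus

l1_miss_rate = 0.05
l2_miss_rate = 0.20   # fraction of L1 misses that also miss in L2

# Without an L2 cache: every L1 miss goes all the way to DRAM.
amat_no_l2 = L1_HIT + l1_miss_rate * DRAM

# With an L2 cache: an L1 miss usually hits in the faster L2 SRAM.
amat_with_l2 = L1_HIT + l1_miss_rate * (L2_HIT + l2_miss_rate * DRAM)

print(amat_no_l2)     # 6.0 cycles per access on average
print(amat_with_l2)   # 2.5 cycles per access on average
```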



VALID BIT / DIRTY BIT
When a program is first loaded into main
memory, the cache is cleared, and so while a
program is executing, a valid bit is needed to
indicate whether or not the slot holds a line that
belongs to the program being executed.
There is also a dirty bit that keeps track of
whether or not a line has been modified while it
is in the cache. A slot that is modified must be
written back to the main memory before the slot
is reused for another line.
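A minimal sketch of how these two bits might be kept with each cache line (Python; the structure and helper below are illustrative, not a specific hardware design):

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    valid: bool = False    # does this slot hold a line belonging to the running program?
    dirty: bool = False    # has the line been modified while in the cache?
    tag: int = 0
    data: bytes = b""

def replace_line(slot: CacheLine, new_tag: int, new_data: bytes, write_back):
    """Reuse a slot for another line, writing the old line back first if it is dirty."""
    if slot.valid and slot.dirty:
        write_back(slot.tag, slot.data)   # a modified line must reach main memory before reuse
    slot.valid, slot.dirty = True, False
    slot.tag, slot.data = new_tag, new_data
```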
