CMP3010: Computer Architecture
L09: Memory Hierarchy II
Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Review
• Fully Associative Cache
• Set Associative Cache
• Cache performance
• Cache vs virtual memory
• Software optimization based on caching
Review: Memory Hierarchy
[Figure: the memory hierarchy.]
Review: Memory Hierarchy
Principle of locality: programs access a relatively small portion of their address space at any instant of time.
• Temporal locality: if an item is referenced, it will tend to be referenced again soon.
• Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon.
Review: Terminologies
• Hit: the data appears in some block in the upper level.
  – Hit rate: the fraction of memory accesses found in the upper level.
  – Hit time: time to access the upper level, which consists of cache access time + time to determine hit/miss.
• Miss: the data needs to be retrieved from a block in the lower level.
  – Miss rate = 1 − (hit rate).
  – Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor.
• Hit time << miss penalty.
Review: The Basics of Caches
• How do we know if a data item is in the cache?
• How do we find it?
Review: Direct Mapped Cache
[Figure: direct-mapped cache organization.]
Review: Read and Write Policies
• Two write options when the data block is already in the cache (a write hit):
  • Write through: write to the cache and memory at the same time.
    • Isn't memory too slow for this?
  • Write back: write to the cache only; write the cache block back to memory when that block is replaced on a cache miss.
    • Needs a "dirty" bit for each cache block.
    • Control can be complex.
Review: Write Miss Policies
• Write allocate (also called fetch on write): the data at the missed-write location is loaded into the cache, followed by a write-hit operation. In this approach, write misses are like read misses.
• No-write allocate (also called write-no-allocate or write around): the data at the missed-write location is not loaded into the cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only. A toy simulation contrasting the two policies follows.
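To make the two policies concrete, here is a minimal sketch (entirely illustrative: the direct-mapped structure, sizes, and all names are assumptions, and the cache shown is write-through):

#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped, write-through cache with one word per block,
   used only to contrast the two write-miss policies. */
#define SETS 256
#define MEM_WORDS (1u << 20)

typedef struct { bool valid; uint32_t tag; uint32_t data; } Line;

static Line     cache[SETS];
static uint32_t memory[MEM_WORDS];    /* word-addressed backing store */

void write_word(uint32_t addr, uint32_t value, bool write_allocate)
{
    uint32_t idx = addr % SETS, tag = addr / SETS;
    Line *l = &cache[idx];

    if (!(l->valid && l->tag == tag)) {   /* write miss */
        if (!write_allocate) {
            memory[addr] = value;         /* write around: memory only */
            return;
        }
        l->valid = true;                  /* write allocate: fetch the */
        l->tag   = tag;                   /* block, then fall through  */
        l->data  = memory[addr];          /* to the write-hit path     */
    }
    l->data      = value;                 /* write hit */
    memory[addr] = value;                 /* write through to memory */
}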
What about other mappings?
Flexible Placement of Blocks: Associativity
Where can memory block 12 be placed in an 8-block cache?
• Fully associative: anywhere in the cache.
• 2-way set associative (4 sets): anywhere in set 0 (12 mod 4).
• Direct mapped: only into cache block 4 (12 mod 8).
Fully Associative
• Fully associative cache: push the set-associative idea to its limit!
  • Forget about the cache index.
  • Compare the cache tags of all cache entries in parallel.
  • Example: with 32-byte blocks, we need N 27-bit comparators.
• By definition: conflict misses = 0 for a fully associative cache.
[Figure: a fully associative cache. The 32-bit address splits into a 27-bit cache tag (bits 31–5) and a byte select (bits 4–0, e.g. 0x01); each entry holds a valid bit, a cache tag, and 32 bytes of cache data, and every tag is compared in parallel.]
A Four-Way Set Associative Cache
• N-way set associative: N entries for each cache index.
• N direct-mapped caches operate in parallel.
• Example: four-way set associative cache:
  • The cache index selects a "set" from the cache.
  • The four tags in the set are compared in parallel.
  • Data is selected based on the tag result.
Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
• Random
• Least recently used (LRU)
  • LRU cache state must be updated on every access.
  • A true implementation is only feasible for small sets (2-way); see the sketch below.
• First in, first out (FIFO), a.k.a. round-robin
  • Used in highly associative caches.
Replacement only happens on misses.
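For 2-way sets, true LRU needs only one bit per set. A minimal sketch (the names and set count here are assumptions, not from the slides):

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 128

/* lru_is_way1[s] == true means way 1 of set s is the least recently used. */
static bool lru_is_way1[NUM_SETS];

/* Update on every access (hit or fill) to 'way' of 'set':
   the way just touched makes the *other* way the LRU one. */
void lru_update(uint32_t set, int way)
{
    lru_is_way1[set] = (way == 0);
}

/* On a miss, evict the least recently used way of 'set'. */
int lru_victim(uint32_t set)
{
    return lru_is_way1[set] ? 1 : 0;
}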
Quiz
Assume a 16 Kbyte cache that holds both instructions and data.
Additional specs for the 16 Kbyte cache include:
- Each block will hold 32 bytes of data
- The cache would be 4-way set associative
- Physical addresses are 32 bits
Q1: How many blocks would be in this cache?
Q2: How many bits of tag are stored with each block entry?
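A worked solution (arithmetic not on the original slide):
• Blocks: 16 KB / 32 B per block = 512 blocks.
• Sets: 512 blocks / 4 ways = 128 sets, so the index is 7 bits; the 32-byte block gives a 5-bit offset.
• Tag: 32 − 7 − 5 = 20 bits stored with each block entry.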
Cache Performance
Measuring Cache Performance
Impact of cache misses on performance
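In the standard formulation, misses add stall cycles to execution time:
Memory-stall cycles = Memory accesses × Miss rate × Miss penalty
CPU time = (CPU execution cycles + Memory-stall cycles) × Clock cycle time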
Example
[Figures: worked example and its solution.]
Improving Cache Performance
Average memory access time (AMAT) = Hit time + Miss rate × Miss penalty
To improve performance:
• reduce the hit time
• reduce the miss rate
• reduce the miss penalty
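For instance, with illustrative numbers (assumed here, not from the slides), a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give:
AMAT = 1 + 0.05 × 100 = 6 cycles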
Sources of Cache Misses
• Compulsory (cold start, first reference): first access to a block
• Misses that would occur even with infinite cache
• “Cold” fact of life: not a whole lot you can do about it
• Conflict (collision):
• Multiple memory locations mapped to the same cache location
• Solution 1: increase cache size
• Solution 2: increase associativity
• Capacity:
• Cache cannot contain all blocks accessed by the program
• Solution: increase cache size
Reducing Miss Penalty Using Multilevel Caches
• Use a smaller L1 if there is also an L2:
  • Trade increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty.
  • Reduces average access time.
CPU ↔ L1 ↔ L2 ↔ DRAM
Performance of Multilevel Caches
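With two levels, the L1 miss penalty is itself the time to access L2 (and, on an L2 miss, DRAM), so the AMAT expands recursively:
AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty)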
Effect of Cache Parameters on Performance
• Larger cache size
+ reduces capacity and conflict misses
- hit time will increase
• Higher associativity
+ reduces conflict misses
- may increase hit time
• Larger block size
+ reduces compulsory misses (exploits spatial locality)
- increases conflict misses and miss penalty
Quiz
• Suppose a processor executes at
  • Clock rate = 1 GHz (1 ns per cycle), ideal (no misses) CPI = 1.5
  • 40% arith/logic, 40% ld/st, 20% control
• Suppose that 5% of memory operations (involving data) incur a 100-cycle miss penalty
• Suppose that 2% of instructions incur the same miss penalty
Determine how much faster a processor with a perfect cache that never missed would run.
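A worked solution (arithmetic not on the original slide):
• Data stalls per instruction: 40% ld/st × 5% miss rate × 100 cycles = 2 cycles.
• Instruction stalls per instruction: 2% × 100 cycles = 2 cycles.
• Real CPI = 1.5 + 2 + 2 = 5.5, versus 1.5 with a perfect cache.
• Speedup = 5.5 / 1.5 ≈ 3.67×.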
Is Virtual Memory the Same as Caching?
Virtual Memory vs Cache Memory
• Virtual memory increases the capacity of main memory; cache memory increases the effective access speed of the CPU.
• Virtual memory is not a memory unit, it is a technique; cache memory is hardware.
• The operating system manages virtual memory; hardware manages the cache.
• The size of virtual memory can be greater than main memory; the size of cache memory is less than main memory.
How can we benefit from cache?
Software Optimization via Blocking
• When dealing with arrays, we can get good performance from the memory system if we store the array so that accesses to it are sequential in memory. What about a matrix?
• How is a matrix stored?
  • Row major (row by row)
  • Column major (column by column)
• A 512×512 matrix needs 1 MB, much bigger than a level-1 cache, so it doesn't fit in the cache!
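A concrete note on the two layouts (standard C indexing, spelled out here for the code below): in a row-major n×n array, element (i, j) lives at a[i*n + j]; in a column-major layout it lives at a[i + j*n]. The DGEMM code below uses column-major indexing, as its comments show.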
Software Optimization via Blocking
• How is matrix multiplication done? The slide shows only the inner j loop; the enclosing function (named dgemm here) and the i loop are restored so the code is complete:

void dgemm(int n, double *A, double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
        {
            double cij = C[i+j*n];            /* cij = C[i][j] */
            for (int k = 0; k < n; k++)
                cij += A[i+k*n] * B[k+j*n];   /* cij += A[i][k]*B[k][j] */
            C[i+j*n] = cij;                   /* C[i][j] = cij */
        }
}
Software Optimization via Blocking
Do we need to store all three matrices in the cache at once? Isn't that increasing cache misses due to replacement?
[Figure legend: white = not accessed; light grey = old access; dark grey = new access.]
Software Optimization via Blocking: Blocked DGEMM
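A sketch of the blocked version (in the spirit of the textbook's blocked DGEMM; BLOCKSIZE, do_block, and dgemm_blocked are names chosen here, and n is assumed to be a multiple of BLOCKSIZE). It computes C one BLOCKSIZE×BLOCKSIZE submatrix at a time, so the submatrices of A, B, and C being worked on all fit in the cache together:

#define BLOCKSIZE 32

/* Accumulate one BLOCKSIZE x BLOCKSIZE block of C, starting at (si, sj),
   using the block of A starting at (si, sk) and the block of B at (sk, sj). */
static void do_block(int n, int si, int sj, int sk,
                     double *A, double *B, double *C)
{
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j)
        {
            double cij = C[i+j*n];            /* cij = C[i][j] */
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i+k*n] * B[k+j*n];   /* cij += A[i][k]*B[k][j] */
            C[i+j*n] = cij;                   /* C[i][j] = cij */
        }
}

void dgemm_blocked(int n, double *A, double *B, double *C)
{
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}

Because each do_block call touches only about 3 × BLOCKSIZE² doubles, a block size of 32 keeps the working set around 24 KB, small enough for a typical L1 cache, so the same data is reused many times before being evicted.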
Thank you