
Name ______ Solutions _______

Part A: Memory Hierarchy (30 points)


All addresses in this Part are Physical Addresses.

Suppose we are running the following code:

#define ARRAY_SIZE 4

for (int i = 0; i < ARRAY_SIZE; i++) {
    S[i] = P[i] + Q[i];
}

The arrays S, P and Q have 4 entries each, and hold integer values (4 bytes at each entry).

The base memory addresses for each (P at 0x1A20, Q at 0x1C60, S at 0x2BA0) are given by the initial register values below.

Suppose this code translates to the following assembly instruction sequence.

// Initial values:
// Rp = 0x1A20, Rq = 0x1C60, Rs = 0x2BA0
// Ri = 4

loop: LD   R1, (Rp)     // R1 <- Mem[Rp]
      LD   R2, (Rq)     // R2 <- Mem[Rq]
      ADD  R3, R1, R2   // R3 <- R1 + R2
      ST   R3, (Rs)     // Mem[Rs] <- R3
      ADDI Rp, 4        // Rp <- Rp + 0x4
      ADDI Rq, 4        // Rq <- Rq + 0x4
      ADDI Rs, 4        // Rs <- Rs + 0x4
      SUBI Ri, 1        // Ri <- Ri - 1
      BNEZ Ri, loop     // if Ri != 0, branch to loop

Page 1 of 14

This code produces the following 12 memory accesses to the cache hierarchy:

LD 0x1A20
LD 0x1C60
ST 0x2BA0
LD 0x1A24
LD 0x1C64
ST 0x2BA4
LD 0x1A28
LD 0x1C68
ST 0x2BA8
LD 0x1A2C
LD 0x1C6C
ST 0x2BAC

We will construct multiple memory hierarchies and analyze the miss rates for each.


Question A-1
Consider the following cache hierarchy with a L1 connected to a L2, which is connected to
DRAM.

All cache lines are 16B.

The L1 is Direct Mapped, with 4 sets.

The L2 is 2-way Set Associative, with 4 sets.

The L2 is Non-Inclusive with L1.

The Non-Inclusion policy is as follows:

• All L1 allocations also result in L2 allocations
• All L1 evictions result in L2 allocations
• L2 evictions do not result in L1 evictions

L1 is write-through with respect to L2, i.e., any writes to addresses in L1 propagate data to L2 at the same time. Thus only the data in L2 is considered dirty.

L2 is a write-back cache with respect to DRAM → dirty data from L2 has to be written back
to DRAM when the line gets evicted.

Question A-1.1 (10 points)


On the next page, the initial state of the L1 and L2 are given. Dirty data is circled in L2.

Update the state of the L1 and L2 for each of the accesses. For each entry you can write the {tag,
index} for simplicity and circle the dirty data. The writeback column specifies the cache line
whose data is being written back to DRAM.


(All twelve accesses map to set 2 of both the L1 and the L2, so only set 2 is shown. The other sets keep their initial contents: L1 sets 0/1/3 hold 0x110 / 0x1A5 / 0x1BB, and L2 holds 0x110 in set 0 and 0x1BB in set 3. Dirty L2 lines, circled in the original, are marked with *.)

Access     | L1 Hit? | L1 Set 2 | L2 Hit? | L2 Set 2 (Way 0 / Way 1) | Writeback
-----------+---------+----------+---------+--------------------------+----------
(initial)  |         | 0x11E    |         | 0x11E* / inv             |
LD 0x1A20  | No      | 0x1A2    | No      | 0x11E* / 0x1A2           | --
LD 0x1C60  | No      | 0x1C6    | No      | 0x1C6  / 0x1A2           | 0x11E
ST 0x2BA0  | No      | 0x2BA    | No      | 0x1C6  / 0x2BA*          | --
LD 0x1A24  | No      | 0x1A2    | No      | 0x1A2  / 0x2BA*          | --
LD 0x1C64  | No      | 0x1C6    | No      | 0x1A2  / 0x1C6           | 0x2BA
ST 0x2BA4  | No      | 0x2BA    | No      | 0x2BA* / 0x1C6           | --
LD 0x1A28  | No      | 0x1A2    | No      | 0x2BA* / 0x1A2           | --
LD 0x1C68  | No      | 0x1C6    | No      | 0x1C6  / 0x1A2           | 0x2BA
ST 0x2BA8  | No      | 0x2BA    | No      | 0x1C6  / 0x2BA*          | --
LD 0x1A2C  | No      | 0x1A2    | No      | 0x1A2  / 0x2BA*          | --
LD 0x1C6C  | No      | 0x1C6    | No      | 0x1A2  / 0x1C6           | 0x2BA
ST 0x2BAC  | No      | 0x2BA    | No      | 0x2BA* / 0x1C6           | --


Question A-1.2 (2 points)


What is the L1 Miss Rate (L1 Misses/L1 Accesses) ?

12 / 12 = 100%

Question A-1.3 (2 points)


What is the L2 Miss Rate (L2 Misses/L2 Accesses) ?

12 / 12 = 100%

Question A-1.4 (1 point)


How many times does a write back from the L2 to the DRAM take place?

4: one writeback of 0x11E and three of 0x2BA, as listed in the writeback column of the A-1.1 table.


Question A-2
We add a Victim Cache next to the L1 cache:

All evicted data from L1 goes to L2 as before, but a copy is also retained in the Victim Cache.

The victim cache has 4 entries, and is fully associative.

Upon a L1 miss, first the Victim Cache is checked, before going to L2.

IF THE DATA IS FOUND IN EITHER THE DIRECT MAPPED CACHE OR THE VICTIM CACHE, IT IS CONSIDERED A L1 HIT.

The line is brought into L1 and the evicted line from L1 is added to the Victim Cache. Victim
Caches are Exclusive with respect to L1 – i.e., either L1 or the Victim Cache will have a cache
line, never both.

Question A-2.1 (2 points)


Suppose L1 has an overall hit rate of 90%. Of these hits, 70% hit in the direct mapped cache and
take 1-cycle, while 30% hit in the victim cache and take 4 cycles. A L1 miss takes 50 cycles to
bring the data into L1. What is the average memory access time?

Hit Time = 0.7*1 + 0.3*4 = 1.9 cycles

AMAT = Hit Time + MissRate * Miss Penalty = 1.9 + 0.1*50 = 6.9 cycles

Question A-2.2 (6 points)


Update the state of the L1 and Victim Caches for the same set of memory accesses.


(Again every access maps to L1 set 2; sets 0/1/3 keep 0x110 / 0x1A5 / 0x1BB. An "X" marks a Victim Cache entry invalidated because its line moved back into L1, since the Victim Cache is exclusive with L1.)

Access     | L1 Hit? | L1 Set 2 | VC Hit? | VC Way 0 | VC Way 1 | VC Way 2 | VC Way 3
-----------+---------+----------+---------+----------+----------+----------+---------
(initial)  |         | 0x11E    |         | inv      | inv      | inv      | inv
LD 0x1A20  | No      | 0x1A2    | No      | 0x11E    | inv      | inv      | inv
LD 0x1C60  | No      | 0x1C6    | No      | 0x11E    | 0x1A2    | inv      | inv
ST 0x2BA0  | No      | 0x2BA    | No      | 0x11E    | 0x1A2    | 0x1C6    | inv
LD 0x1A24  | Yes     | 0x1A2    | Yes     | 0x11E    | X        | 0x1C6    | 0x2BA
LD 0x1C64  | Yes     | 0x1C6    | Yes     | 0x11E    | 0x1A2    | X        | 0x2BA
ST 0x2BA4  | Yes     | 0x2BA    | Yes     | 0x11E    | 0x1A2    | 0x1C6    | X
LD 0x1A28  | Yes     | 0x1A2    | Yes     | 0x11E    | X        | 0x1C6    | 0x2BA
LD 0x1C68  | Yes     | 0x1C6    | Yes     | 0x11E    | 0x1A2    | X        | 0x2BA
ST 0x2BA8  | Yes     | 0x2BA    | Yes     | 0x11E    | 0x1A2    | 0x1C6    | X
LD 0x1A2C  | Yes     | 0x1A2    | Yes     | 0x11E    | X        | 0x1C6    | 0x2BA
LD 0x1C6C  | Yes     | 0x1C6    | Yes     | 0x11E    | 0x1A2    | X        | 0x2BA
ST 0x2BAC  | Yes     | 0x2BA    | Yes     | 0x11E    | 0x1A2    | 0x1C6    | X


Question A-2.3 (2 points)


What is the L1 Miss Rate (L1 Misses/L1 Accesses) ?

3 / 12 = 25%

Question A-2.4 (5 points)


George P. Burdell is interested in designing a memory hierarchy optimized for this code. He looks
at the memory access pattern and the L1 Miss Rate and believes that he can reduce it even further
without increasing the size of any of the caches. His solution is to reduce the size of the Victim
Cache to two entries, and use the remaining two entries for some other structure connected to L1,
as shown below.

What structure do you think George has in mind? Briefly describe the function of this structure,
and a recipe to allocate it and access it.


Prefetcher

There are essentially three cache lines, 0x1A2, 0x1C6 and 0x2BA, that are accessed repeatedly, one after the other.

In Question A-2.2 you would have noticed that at any time the L1 direct-mapped cache holds one of these lines and the Victim Cache holds the other two.

The Victim Cache removes the conflict misses, but there are still compulsory misses the first time each of these lines is accessed.

The idea is that once 0x1A2 is accessed, a prefetcher can bring in 0x1C6 and 0x2BA and place them in the prefetch buffer. Upon any access, the cache should look up both the Victim Cache and the prefetch buffer. In this case, there will be only one miss, and every other access will be a hit.


Part B: Caches and Virtual Memory (37 points)


Suppose we have a virtually-indexed, physically-tagged cache with the following specs:
• Cache Line Size = 8 Bytes
• Number of sets = 8
• Number of ways per set = 2 (i.e., 2-way set associative cache).

The page size is 256 bytes, and the byte-addressed machine uses 16-bit virtual address and 16-bit
physical address. Do not worry about aliasing problems for this question.

Question B-1 (2 points)


What is the size of the cache?

8 x 2 x 8B = 128B

Question B-2 (8 points)


The following diagram shows the corresponding breakdown of a virtual address and a physical
address in this system (index, tag, and byte offset). Replace “A”, “B”, “C” and “D” with bit
indexes showing the size of each field. Note that tags should contain the minimum number of bits
required to provide the information needed to check whether the cache hits.

A: __5__ B: __2__ C: __7__ D: __5__


Question B-3 (10 points)

Now, we test the cache by accessing the following virtual address. We provide the
corresponding binary number for the virtual address.

0x0151 (binary: 0000 0001 0101 0001)
The TLB is Fully Associative with 4 ways and LRU replacement. The table below shows the
current TLB states and the LRU stack for the TLB.
The Cache also uses the LRU replacement policy for each set. The LRU way bit represents the
way that is least recently used. (Note that this bit should also be updated if necessary.)

Update the TLB LRU state and the Cache state after accessing 0x0151.
You may only fill in the elements in the table when a value changes from the previous table. Write
tags in hexadecimal numbers. If the memory access is a cache hit, write “Hit” in the appropriate
entry. Otherwise update the cache states as necessary.
Note that the cache uses physical tags.

TLB (Fully Associative, 4 ways):

Way | VPN  | PPN
0   | 0x00 | 0x0A
1   | 0x01 | 0x1A   <- hit: the VPN of 0x0151 is 0x01, so PPN = 0x1A
2   | 0x02 | 0x2A
3   | 0x03 | 0x3A

LRU Stack for the TLB (way 1 becomes MRU after the hit):

Position | Initial State | New State
MRU      | way 3         | way 1
LRU+2    | way 0         | way 3
LRU+1    | way 1         | way 0
LRU      | way 2         | way 2

Cache Tag Array (PA = 0x1A51; set index 2, physical tag 0x69; the access misses, so the tag is installed in the LRU way of set 2 and the LRU bit flips; blank "New" entries are unchanged):

idx | LRU Way | Tag (Way 0) | Tag (Way 1) | New LRU Way | New Tag (Way 0) | New Tag (Way 1)
0   | 0       | inv         | 0x40        |             |                 |
1   | 0       | inv         | inv         |             |                 |
2   | 0       | 0x8B        | 0x14        | 1           | 0x69            |
3   | 0       | inv         | inv         |             |                 |
4   | 0       | inv         | inv         |             |                 |
5   | 1       | 0x3F        | 0xAA        |             |                 |
6   | 0       | inv         | inv         |             |                 |
7   | 1       | 0xC3        | 0x1F        |             |                 |


Suppose we label the 16 cache lines from A to P, as shown in the figure below for our 2-way set
associative cache. Recall that our line size is 8 bytes.

[Figure: Cache Configuration. 8 sets x 2 ways; lines A-H label way 0 of sets 0-7, and lines I-P label way 1.]

Question B-4 (6 points)


Suppose the cache is virtually indexed. A program wants to read from a virtual address 0x6E,
and the physical address of this virtual address 0x6E is unknown (could be arbitrary). Enumerate
all lines (A,B,C,D,…) that can possibly hold the content of the virtual address 0x6E, for each
given page size in the next table.

Page Size | Line(s) to which VA 0x6E can be mapped
16 bytes  | F, N
32 bytes  | F, N
64 bytes  | F, N


Question B-5 (6 points)


Now, suppose the same cache is physically indexed. A program wants to read from the same
virtual address 0x6E, and the physical address of this virtual address 0x6E is unknown (could be
arbitrary). Enumerate all lines (A,B,C,D,…) that can possibly hold the content of the virtual
address 0x6E, for each given page size in the next table.

Page Size | Line(s) to which VA 0x6E can be mapped
16 bytes  | B, D, F, H, J, L, N, P
32 bytes  | B, F, J, N
64 bytes  | F, N

Question B-6 (3 points)

We have two possible organizations for our Page Tables: Linear and Hierarchical as shown
below:

Suppose each Page Table Entry is 2 bytes (16-bits) wide.


Virtual and Physical addresses are 16 bits.
Suppose Page Sizes are 64 Bytes.


For the Hierarchical Page Table, suppose we want the L1 page table to fit exactly within one
page. What will be the size of the Linear and Hierarchical L2 Page Tables?

Size of Linear Page Table:

Offset = 6 bits
Linear PT Index = 16 - 6 = 10 bits
Linear Page Table Size = 2^10 x 2B = 2KB

Size of Hierarchical Page Table L2:

Offset = 6 bits
L1 PT Size = 64B => L1 PT Index = log2(64B / 2B) = 5 bits
=> L2 PT Index = 16 - 6 - 5 = 5 bits
L2 PT Size = 2^5 x 2B = 64B

Question B-7 (2 points)


One of the advantages of having Virtual Addresses is that the same program can also run on a different machine with a smaller physical memory. Suppose that for correct (though very slow) operation of the program, its complete linear page table + one page for data + one page for system (OS) must fit in physical memory. Suppose DRAM sizes are always powers of 2. What is the minimum-sized DRAM that we would need if our page size is 64B, Virtual Addresses are 16 bits, and PTEs are 16 bits? What would be the corresponding size of the physical address?

Linear PT size (2KB) + one data page (64B) + one system page (64B) = 2176B

Need DRAM of size 4096B, the next power of 2. Correspondingly, the physical address is 12 bits (2^12 = 4096).
