
EG211: Computer architecture

Assignment 3: Cache: Marks 45


Deadline: Submit by July 3, 2022, 11:59 pm

Start early. No deadline extensions will be given.

General instructions on what to submit:

• Can be done in groups of 2 or 3 students

• Submit

◦ the code

◦ A report showing snapshots

• There will be a demo of the assignment

• Submit individual files and not compressed files

• All the files you submit should have the following format: <roll_numbers>_filename

• Only one person in the group will submit the assignment

• If you are getting an error and are unable to get the result, submit a snapshot of the error in the report

What should the report contain?

• Output graphs/tables

• Observations from experiments a, b, and c

Design:

This assignment should be done in Python using the nmigen library. Instructions for installing
nmigen and its syntax can be found here.

Sample code for adding two numbers can be found here.

For the caches assignment:

Your nmigen ports are:

Input signals: address.


Output signals: hit (hit = 1 if it is a cache hit else it will be 0).

Other than the address, if you want to provide custom inputs, such as the number of index bits,
the number of tag bits, etc., you can pass them when instantiating the class and receive them in
the constructor.

The cache can be implemented as an Array of Signals (refer to the nmigen documentation). Each
Signal in the Array can hold the tag bits and a valid bit. There is no need to store any data.

Addresses should be read from the given trace files and given as input from the
if __name__ == "__main__" block inside the process function (refer to the example code).

A similar design approach can also be used for the branch prediction question.

Question 1 – Caches [45 marks]


Problem statement:

a. Design a 4-way set-associative cache of size 512 kilobytes with a block size of 4 bytes.
Assume a 32-bit address. Figure out how many cache lines you need. (15 marks)

You need not implement a main memory, which means you do not need to store data in the
cache. (Implementing a 2^32-byte memory would be impossible!)

[ Hint: For reporting hit/miss, all you need to check is a tag match, and a valid bit, so there is no
need to have data in the cache ]
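The hint above can be cross-checked with a small pure-Python reference model that tracks only tags and validity. This is NOT the required nmigen design: the class name, method names, and the LRU replacement policy are assumptions (the assignment does not fix a replacement policy). It can serve to sanity-check the hit/miss counts your nmigen simulation reports.

```python
class RefCache:
    """Pure-Python N-way set-associative cache model: tags + validity only."""

    def __init__(self, size_bytes, block_bytes, ways):
        self.block_bits = block_bytes.bit_length() - 1      # log2(block size)
        num_sets = size_bytes // (block_bytes * ways)
        self.index_bits = num_sets.bit_length() - 1         # log2(number of sets)
        self.ways = ways
        # Each set is an ordered list of tags; front = most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        index = (address >> self.block_bits) & ((1 << self.index_bits) - 1)
        tag = address >> (self.block_bits + self.index_bits)
        lines = self.sets[index]
        if tag in lines:                 # hit: refresh LRU order
            lines.remove(tag)
            lines.insert(0, tag)
            return True
        lines.insert(0, tag)             # miss: fill, evict LRU line if full
        if len(lines) > self.ways:
            lines.pop()
        return False

# Geometry for part (a): 512 KB, 4-byte blocks, 4-way associative.
c = RefCache(512 * 1024, 4, 4)
print(c.index_bits)                      # 15 index bits -> 32768 sets
print(32 - c.index_bits - c.block_bits)  # 15 tag bits in a 32-bit address
```

With these parameters the 32-bit address splits into 15 tag bits, 15 index bits, and 2 block-offset bits, giving 131072 cache lines in total.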

Output: You need to report hit/miss rates of the cache for the input memory trace files (5 traces)
listed under "Inputs to your code" below.

b. Increase the cache size to 2048 kB and repeat the experiment. Note the change in hit/miss
rates. (10 marks)

Grading for a + b:

• The code carries 10 marks.


• The report carries 5 marks.
• The demo/viva carries 5 marks.
c. Keeping the cache size at 512 kB, vary the block size among 1 byte, 4 bytes, 8 bytes, and
16 bytes. Note that the number of cache lines reduces as the block size increases at a fixed
cache size. Repeat the experiment for all the trace files, and note the hit/miss rates.
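The line-count arithmetic for part (c) can be checked directly: at a fixed capacity, lines = cache size / block size, so doubling the block halves the line count. A quick sketch:

```python
# Number of cache lines at a fixed 512 KB capacity for each block size
# in part (c).
for block_bytes in (1, 4, 8, 16):
    lines = 512 * 1024 // block_bytes
    print(block_bytes, lines)   # 1 -> 524288, 4 -> 131072, 8 -> 65536, 16 -> 32768
```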

d. Vary the associativity from 1-way to 16-way for a fixed cache size of 512 kB, and plot the
variation of hit rates.

Grading for c+d:

• The code carries 15 marks.


• The report carries 5 marks.
• The demo/viva carries 5 marks.

Inputs to your code:

Use the memory trace files at this location

https://cseweb.ucsd.edu/classes/fa07/cse240a/proj1-traces.tar.gz as input.

The trace file will specify all the memory accesses/addresses that occur in a certain program.
Each line in the trace file will specify a new memory reference and has the following fields:

• Access Type: A single character indicating whether the access is a load ('l') or a store ('s').
You can ignore this field. For reporting hit/miss, it does not matter whether it is a Load/Store

• Address: A 32-bit integer (in unsigned hexadecimal format) specifying the memory address
that is being accessed. This is the only field you need.

• Instructions since last memory access: Indicates the number of instructions of any type that
executed since the last memory access (i.e. the one on the previous line in the trace).
For example, if the 5th and 10th instructions in the program's execution are loads, and there are
no memory operations between them, then the trace line for the second load has "4" in
this field. You can safely ignore this field.
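Since only the address field matters, each trace line can be parsed with a few lines of Python. A hedged sketch (the function name is my own):

```python
def read_addresses(path):
    """Return the 32-bit addresses from one trace file, ignoring the
    access-type and instruction-gap fields."""
    addresses = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                addresses.append(int(fields[1], 16))  # hex address field
    return addresses
```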

Output from the code:

• Hit rates or miss rates for all 5 traces for the 3 different experiments can be reported in
an Excel sheet or in the form of a table/graph in the report.

• Submit the graphs/table/excel with your observations in the report


Question 2: Branch Predictors [45 marks]
Introduction

In this assignment, you will explore the effectiveness of branch prediction. A binary
instrumentation tool is used to generate a trace of branches and their outcomes. Your task will
be to use this representative trace to evaluate the effectiveness of a few branch prediction
schemes seen in the class. To do this, you'll write a program that reads in the trace and
simulates different branch prediction schemes and different predictor sizes.

Trace and Trace Format

The trace given to you is a subset of 16 million conditional branches. As unconditional branches
are always taken, they are excluded from the trace. Each line of the trace file has two fields.
Below are the first four lines of the trace file:

3086629576 T

3086629604 T

3086629599 N

3086629604 T

The first field is the address of the branch instruction (in decimal). You will need to convert these
addresses to a 32-bit binary number to proceed with this assignment. The second field is the
character "T" or "N" for branch taken or not taken. The trace file can be downloaded from here.

Question 2a [15 marks]

We are going to look at the simple static branch prediction policies of "always predict taken" and
"always predict not taken". Write a program to read in the trace and calculate the mis-prediction
rate (that is, the percentage of conditional branches that were mis-predicted) for these two
simple schemes.
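Since each static policy mis-predicts exactly the branches whose outcome disagrees with it, the computation reduces to counting 'T' outcomes. A minimal sketch (the function name is an assumption):

```python
def static_mispredict_rates(outcomes):
    """outcomes: list of 'T'/'N' characters from the trace."""
    taken = sum(1 for o in outcomes if o == 'T')
    total = len(outcomes)
    # "Always taken" mis-predicts every N; "always not taken" every T.
    return (100.0 * (total - taken) / total,   # always-taken rate
            100.0 * taken / total)             # always-not-taken rate

# The four sample trace lines shown above (T T N T):
print(static_mispredict_rates(['T', 'T', 'N', 'T']))  # (25.0, 75.0)
```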

The following are needed for the report:

• Results and output screenshots


• Which of these two policies is more accurate?

Grading:

• The code carries 5 marks.


• The report carries 5 marks.
• The demo/viva carries 5 marks.
Question 2b [30 marks]

The simplest dynamic branch direction predictor is an array of 2^n two-bit counters. It is advised
to follow the notation discussed in class: strongly taken (00), weakly taken (01), weakly not taken
(10), and strongly not taken (11).

Prediction: To make a prediction, the predictor selects a counter from the table using the lower-
order n bits of the instruction's address (its program counter value). The direction prediction is
made based on the value of the counter.

Training: After each branch (correctly predicted or not), the hardware increments or decrements
the corresponding counter to bias the counter toward the actual branch outcome (the outcome
given in the trace file).

Initialization: Although initialization doesn't affect the results in any significant way, your code
should initialize the predictor to "strongly taken".

Your task is to analyze the impact of predictor size on prediction accuracy. Write a program to
simulate the two-bit predictor. Use your program to simulate varying sizes of the predictor.
Generate data for predictors with 2^2, 2^3, 2^4, 2^5, ..., 2^20 counters (the address is a 32-bit
binary number). These sizes correspond to predictor index sizes of 2 bits, 3 bits, 4 bits, 5 bits, ...,
20 bits. Generate a line plot of the data using MS Excel or some other graphing program. On the y-
axis, plot "percentage of branches mis-predicted" (a metric in which smaller is better). On the x-
axis, plot the log of the predictor size (that is, the number of index bits).
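The prediction, training, and initialization rules above, with the class encoding (00 = strongly taken ... 11 = strongly not taken), can be sketched as follows; the function name and the (address, outcome) input format are assumptions:

```python
def two_bit_mispredict_rate(branches, n):
    """branches: list of (address, 'T' or 'N'); n: number of index bits."""
    counters = [0] * (1 << n)            # 0 = strongly taken (initialization)
    mispredicts = 0
    for address, outcome in branches:
        i = address & ((1 << n) - 1)     # low-order n bits of the PC
        predict_taken = counters[i] < 2  # 00/01 predict taken; 10/11 not taken
        if predict_taken != (outcome == 'T'):
            mispredicts += 1
        # Train: saturate the counter toward the actual outcome.
        if outcome == 'T':
            counters[i] = max(counters[i] - 1, 0)
        else:
            counters[i] = min(counters[i] + 1, 3)
    return 100.0 * mispredicts / len(branches)
```

Sweeping n from 2 to 20 over the full trace then gives the data points for the plot.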

The following are needed for the report:

• Results and output screenshots


• The line plot described above
• What is the best mis-prediction rate obtained in the analysis carried out?
• How large must the predictor be to reduce the number of mis-predictions by
approximately half as compared to the better of "always taken" and "always not taken"?
Give the predictor size both in terms of number of counters as well as bits.
• At what point does the performance of the predictor max out? That is, how large does
the predictor need to be before it captures almost all of the benefit of a much larger
predictor?

Grading:

• The code carries 10 marks.


• The report carries 10 marks; each item listed above carries 2 marks.
• The demo/viva carries 10 marks.
